Browse skills & tools

skill ★ 21,403

Authoring and Running Promptfoo Evaluation Suites

This skill guides developers through authoring comprehensive promptfoo evaluation suites for robust regression testing and quality assurance. It covers defining prompts, structuring test cases, implementing various assertions, and validatin…

promptfoo/promptfoo promptfoo evaluation qa regression-testing

skill ★ 24,025

E2E Behavior Validation for Agentic Systems

This skill guides developers in creating robust end-to-end tests using Playwright, focusing on validating core product behaviour and data flow rather than superficial UI states. It provides patterns for testing complex agentic interactions,…

mastra-ai/mastra e2e-testing playwright behavior-validation agent-testing

skill ★ 4

MCP Server Evaluation Creator

Provides a structured methodology for generating complex, multi-hop Q&A pairs to benchmark the effectiveness of MCP servers through verifiable tool-use evaluations.

jmrplens/gitlab-mcp-server mcp evaluation benchmarking llm-testing