Browse skills & tools

skill ★ 8

Structured Multi-Alternative Comparison

A systematic framework for evaluating multiple alternatives using consistent criteria, a comparison matrix, and evidence-based decision recommendations.

n24q02m/wet-mcp decision-making comparison-matrix evaluation-framework structured-analysis

skill ★ 50

Eval-Driven Development Framework for AI Agents

This skill provides a formal framework for implementing Eval-Driven Development (EDD) within AI coding sessions. It enables developers to define capability and regression tests, track agent reliability using metrics like pass@k, and generat…

tan-yong-sheng/ai-vision-mcp evaluation-framework edd ai-testing regression-testing

skill ★ 73,580

Bootstrap Realtime Evaluation Environments

Automates the scaffolding of new realtime evaluation environments within the OpenAI Cookbook by configuring harnesses, prompts, tools, and datasets. It includes automated validation via smoke and full evaluation runs to ensure the new setup…

openai/openai-cookbook realtime-evals scaffolding openai-cookbook automated-testing