Tag: evaluation-framework
skill
★ 8
Structured Multi-Alternative Comparison
A systematic framework for evaluating multiple alternatives using consistent criteria, a comparison matrix, and evidence-based decision recommendations.
skill
★ 50
Eval-Driven Development Framework for AI Agents
This skill provides a formal framework for implementing Eval-Driven Development (EDD) within AI coding sessions. It enables developers to define capability and regression tests, track agent reliability using metrics like pass@k, and generat…
skill
★ 73,580
Bootstrap Realtime Evaluation Environments
Automates the scaffolding of new realtime evaluation environments within the OpenAI Cookbook by configuring harnesses, prompts, tools, and datasets. It includes automated validation via smoke and full evaluation runs to ensure the new setup…