This utility measures core retrieval metrics (Hit@5, MRR, nDCG@10) against a corpus of golden queries. It compares current performance against a saved baseline and provides actionable interpretation and tuning recommendations for diagnosing…
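The three metrics named above can be sketched as follows. This is a minimal illustration, not the utility's actual implementation: it assumes binary relevance (a document is either relevant or not), ranked result lists without duplicates, and a golden set stored as a mapping from query to relevant document IDs. The names `hit_at_k`, `reciprocal_rank`, `ndcg_at_k`, `evaluate`, and `compare_to_baseline` are hypothetical.

```python
import math
from typing import Dict, List, Set


def hit_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Return 1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # The ideal ranking places every relevant document first (capped at k).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(
    results: Dict[str, List[str]], golden: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Average each metric over all golden queries.

    A query missing from `results` is scored as an empty ranking.
    """
    totals = {"hit@5": 0.0, "mrr": 0.0, "ndcg@10": 0.0}
    if not golden:
        return totals
    for query, relevant in golden.items():
        ranked = results.get(query, [])
        totals["hit@5"] += hit_at_k(ranked, relevant, k=5)
        totals["mrr"] += reciprocal_rank(ranked, relevant)
        totals["ndcg@10"] += ndcg_at_k(ranked, relevant, k=10)
    return {name: total / len(golden) for name, total in totals.items()}


def compare_to_baseline(
    current: Dict[str, float], baseline: Dict[str, float]
) -> Dict[str, float]:
    """Return current minus baseline per metric (positive means improvement).

    Assumption: a metric absent from the baseline is treated as 0.0.
    """
    return {name: current[name] - baseline.get(name, 0.0) for name in current}


if __name__ == "__main__":
    # Hypothetical golden queries and retrieval results, for illustration only.
    golden = {"q1": {"a"}, "q2": {"b"}}
    results = {"q1": ["a", "x"], "q2": ["x", "y", "b"]}
    scores = evaluate(results, golden)
    print(scores)
    print(compare_to_baseline(scores, {"hit@5": 1.0, "mrr": 0.5, "ndcg@10": 0.7}))
```

Averaging per query keeps every golden query equally weighted, so a per-metric delta against the baseline reads directly as "better or worse on the typical query". Graded relevance labels, if the golden set has them, would need a different gain term in `ndcg_at_k`.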