Tag: llm
Multi-model consensus for agent predictions
This tool generates a mathematically optimal consensus prediction by aggregating outputs from multiple LLMs or models. It automatically weights contributions based on historical accuracy and calculates entropy to measure model disagreement.
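A minimal sketch of how accuracy-weighted aggregation with an entropy readout could work. The listing does not show the tool's actual formula, so everything here is an assumption: the function name `consensus`, the input shapes, weighting each model linearly by its historical accuracy, and using the Shannon entropy of the pooled distribution as a proxy for disagreement are all hypothetical.

```python
import math


def consensus(predictions, accuracies):
    """Pool per-model label probabilities, weighted by historical accuracy.

    predictions: list of dicts mapping label -> probability (one dict per model)
    accuracies:  list of historical accuracy scores, one per model (hypothetical input)
    Returns (pooled distribution, Shannon entropy in bits).
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one model needs a positive accuracy score")
    # Assumption: weights are simply accuracies normalized to sum to 1.
    weights = [a / total for a in accuracies]

    labels = sorted({label for p in predictions for label in p})
    pooled = {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, predictions))
        for label in labels
    }

    # Renormalize in case an input distribution did not sum to exactly 1.
    norm = sum(pooled.values())
    pooled = {label: value / norm for label, value in pooled.items()}

    # Entropy of the pooled distribution: 0 when all mass sits on one label,
    # larger when the weighted models spread mass across labels.
    entropy = -sum(v * math.log2(v) for v in pooled.values() if v > 0)
    return pooled, entropy


if __name__ == "__main__":
    pooled, entropy = consensus(
        [{"yes": 0.9, "no": 0.1}, {"yes": 0.6, "no": 0.4}],
        [0.8, 0.2],
    )
    print(pooled, entropy)
```

Linear pooling is only one option; a log-odds or Bayesian combination would weight confident models differently, and the real tool may use either.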
Zero-API-Key Web Search and Verification Tool
A robust, command-line utility designed for source-backed web search, evidence-aware claim verification, and deep page browsing. It supports multi-engine SERP retrieval and includes a Web Unlocker to access geo-blocked or rate-limited conte…
Manual Context Compaction for Long Agentic Workflows
This skill suggests manual context compaction at logical breakpoints during extended, multi-phase tasks. Rather than relying on arbitrary auto-compaction, it prompts the user to explicitly compact the working memory when transitioning betwe…
Intelligent Mistral Model and Tool Router
This skill acts as a central routing layer, intelligently selecting the optimal Mistral model and associated tool for any given task. It determines the best combination of model (e.g., codestral, magistral) and capability (e.g., OCR, classi…
Focused code review of diffs with Codestral
This skill performs a focused code review of a provided diff, automatically detecting the most relevant focus area (e.g., security, performance, API design). It utilizes the Codestral model to provide concrete, high-signal findings and conc…
Multi-speaker audio transcription and action item extraction
This skill transcribes multi-speaker audio recordings using diarization, then classifies each speaker's turns to extract structured action items, decisions, and open questions. It outputs a comprehensive, per-speaker dispatch summary suitab…
Persistent Memory and Knowledge Graph for Agents
This system provides persistent, structured memory for agents, allowing them to store, recall, and connect context across sessions. It utilizes a hybrid search mechanism combining vector, BM25, and graph traversal for robust knowledge compo…
Persistent memory for AI coding agents
This tool provides persistent, queryable memory for AI agents, allowing them to store critical decisions, user preferences, and lessons learned. It uses a hybrid retrieval architecture (vector similarity and full-text search) to ensure agen…
Optimize RAG chunk size and retrieval settings
This skill systematically sweeps various chunk sizes and retrieval modes (keyword, hybrid, rerank) to benchmark and determine the optimal configuration for your specific corpus. It reports standard metrics like nDCG and MRR against a golden…
Evaluate retrieval quality metrics and performance
This utility measures core retrieval metrics (Hit@5, MRR, nDCG@10) against a corpus of golden queries. It compares current performance against a saved baseline and provides actionable interpretation and tuning recommendations for diagnosing…
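For reference, the three metrics named above can be computed as in the sketch below. It assumes binary relevance and a golden set given as a mapping from query to relevant document IDs; the function names and the `evaluate` input shape are hypothetical and are not taken from the utility itself.

```python
import math


def hit_at_k(ranked, relevant, k=5):
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k=10):
    """nDCG@k with binary gains (assumption: no graded relevance labels)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0


def evaluate(results, golden):
    """Average each metric over all golden queries.

    results: dict query -> ranked list of document IDs from the retriever
    golden:  dict query -> set of relevant document IDs
    """
    n = len(golden)
    if n == 0:
        raise ValueError("golden set is empty")
    totals = {"hit@5": 0.0, "mrr": 0.0, "ndcg@10": 0.0}
    for query, relevant in golden.items():
        ranked = results.get(query, [])
        totals["hit@5"] += hit_at_k(ranked, relevant, 5)
        totals["mrr"] += reciprocal_rank(ranked, relevant)
        totals["ndcg@10"] += ndcg_at_k(ranked, relevant, 10)
    return {name: value / n for name, value in totals.items()}
```

Comparing against a saved baseline then amounts to running `evaluate` on both result sets and diffing the returned dictionaries per metric.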
LLM cost tracking and routing policy management
This skill manages and optimizes LLM usage costs by dynamically routing tasks to the most cost-effective models. It provides comprehensive reporting on savings, usage policies, and routing quality metrics.
Systematic Knowledge Base Audit and Cleanup
This skill systematically reviews stored memories to maintain knowledge base integrity by identifying and resolving duplicates, contradictions, and stale entries. It supports advanced knowledge graph hygiene, including detecting orphaned ed…
Persistent, Versioned Memory for AI Agents
This tool provides agents with a structured, versioned knowledge base, allowing for the storage of facts and complex summaries via keypaths. It supports semantic search, full project history tracking, and conflict detection, acting as a rob…
Automated Codebase Documentation Generator Pipeline
This skill orchestrates a comprehensive, multi-phase pipeline using specialized agents to analyze a codebase, conduct deep research, and generate structured documentation. It ensures conflict-free file ownership and includes a final QA vali…
Extract structured data from meeting transcripts
This skill ingests various meeting transcript formats (e.g., VTT, SRT, TXT) and uses advanced NLP to extract structured entities such as decisions, action items, attendees, and commitments. It stores these findings into persistent memory, e…
Import Emails and Extract Structured Entities
This skill connects to an email MCP to ingest emails into persistent memory. It systematically extracts structured entities—including contacts, tasks, events, and transactions—while maintaining full provenance via source quoting and unique …
Import and structure chat history from various sources
This skill ingests conversation transcripts from diverse sources—including ChatGPT, Claude, and Slack exports—and structures them into persistent memory. It extracts key entities such as decisions, tasks, and contacts, reconstructing a trac…
Systematic Comparative Project Analysis Across Repositories
This skill systematically assesses a target project or concept by comparing it against the foundational context of all repositories loaded from the truth layer. It generates structured reports covering competitive positioning, potential par…
Store Chat Conversations to Neotoma Knowledge Graph
This skill reviews the current chat transcript, generating a structured preview of all data to be persisted in Neotoma. It ensures canonical storage by creating dual agent_message records per turn (user and assistant) and requires explicit …
Open WebSearch: Advanced Live Web Retrieval Skill
This skill provides comprehensive, multi-source web retrieval, managing complex setup via local CLI/daemon or workspace MCP tools. It intelligently prioritises direct URL fetching, focused searches, and GitHub READMEs while adhering to stri…
Comprehensive Testing Guidelines for Inbox-Zero
Provides comprehensive guidance for testing the application, covering unit, integration, and end-to-end workflows. It details running specialized tests for LLM features and cross-model evaluation.
Structured workflow for bug fixing and testing
This skill automates the process of fixing bugs within a feature or module by following a structured workflow. It supports optional error classification (implementation, spec, or architectural) and ensures all fixes are validated with regre…
Guidelines for Robust LLM Integration
This skill outlines best practices for implementing Language Model functionality, focusing on structured code patterns, mandatory Zod schema validation, and robust error handling. It guides developers on separating system and user prompts w…
Guidelines for LLM Testing and Development
Provides comprehensive guidelines for developing robust tests for LLM-related functionality. It advises developers on fixture usage, particularly regarding model and provider selection, ensuring tests remain flexible.