Tag: observability
Real-time Agent Conversation and Cost Monitor
This tool monitors Higress access logs in real time, aggregating multi-turn conversations by session ID. It provides detailed visibility into token usage, cost statistics, and full conversation history via a web interface and CLI.
Build production AI agent backends in Python
This SDK enables the development of robust, production-ready AI agent backends using Python. It supports multiple agent frameworks (e.g., LangGraph, CrewAI) and provides features like streaming, persistent memory, and full observability via…
CloudBase Agent SDK for Production AI Backends
This SDK enables the construction of robust, production-grade AI agent backends using Python. It supports multiple agent frameworks, offering features like streaming protocols, persistent memory integration, and full observability tooling.
Agent lifecycle hooks for observation and interception
Allows developers to observe and intercept agent execution at critical lifecycle points, including tool calls, context compression, and sub-agent interactions. Hooks can be registered to implement custom logic for tracing, mocking, or loggi…
Deploy OpenTelemetry Observability Stack to Kind
This skill automates the deployment of a comprehensive OpenTelemetry stack, including Prometheus, Grafana, Tempo, and the OTEL Collector, onto a local Kind cluster. It establishes a complete observability environment suitable for testing te…
Query Netdata Cloud via REST API
This skill enables querying Netdata Cloud via its REST API to retrieve time-series metrics, logs, network flows, and topology data. It provides the necessary procedures for interacting with spaces, rooms, and nodes using an API token.
Detect Production Regressions in Langfuse
Proactively identifies production regressions by comparing recent Datadog error logs, spans, and API latency against baseline benchmarks across multiple environments. It generates a structured findings table for human review before optional…
Langfuse Datadog Production Query Recipes
Provides predefined Datadog query shapes for investigating Langfuse production telemetry across multiple environments. It facilitates research into tenant activity, API usage, queue behaviour, and system metrics.
Opik TypeScript SDK for Tracing
Provides architectural patterns and implementation guidelines for the Opik TypeScript SDK, covering asynchronous batching, flushing strategies, and error handling.
Sentry Logging and Error Handling Audit
An automated code review skill designed to enforce correct logging and error handling patterns within the Sentry MCP codebase. It ensures 4xx errors are handled without creating Sentry issues, while 5xx errors are correctly captured via log…
CloudBase Agent Python SDK
A Python SDK for building production-ready AI agent backends with support for LangGraph, CrewAI, and LlamaIndex. It enables streaming via the AG-UI protocol and integrates persistent memory, observability, and various toolsets including MCP…
Systematic Dynatrace Production Investigation
A structured workflow for investigating production incidents using Dynatrace data, covering problem triage, root cause analysis via DQL, and security vulnerability reviews.
Deploy OpenTelemetry Stack to Kind
Automates the deployment of an OpenTelemetry observability stack, including Prometheus, Grafana, and Tempo, to a Kind cluster via Helm. It provides a ready-to-use environment for testing metrics collection and distributed tracing capabiliti…
Istio Service Mesh Management
Manages Istio service mesh configurations for traffic management, security, and observability within Kubernetes clusters. It enables canary deployments, mTLS enforcement, and sidecar troubleshooting.
Cilium and Hubble Network Observability
Manages eBPF-based Cilium networking and Hubble traffic observability within Kubernetes clusters. This skill enables the implementation of network policies, monitoring of traffic flows, and troubleshooting of connectivity issues.
Implementing Canonical Log Lines
Provides architectural guidelines for implementing wide events, also known as canonical log lines, to support high-cardinality structured logging for debugging and analytics.
AgentCore Runtime Session Investigation
Enables the investigation of Amazon Bedrock AgentCore runtime sessions by querying CloudWatch Logs Insights. It facilitates session-to-trace resolution, OpenTelemetry noise filtering, and the analysis of tool invocations, errors, and token …