Tag: observability

tool ★ 8,394

Real-time Agent Conversation and Cost Monitor

This tool monitors Higress access logs in real time and aggregates multi-turn conversations by session ID. It exposes token usage, cost statistics, and full conversation history through a web interface and a CLI.

higress-group/higress log-monitoring session-tracking token-usage llm-cost
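
The session aggregation described above can be pictured with a minimal sketch. It assumes each log line is a JSON object with hypothetical `session_id`, `input_tokens`, and `output_tokens` fields, and uses made-up per-token prices; the actual Higress access log format and the tool's real implementation may differ.

```python
import json
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens (illustrative, not a real price list).
PRICE_PER_1K = {"input": 0.001, "output": 0.002}


def aggregate_sessions(lines):
    """Group JSON log lines by session_id and sum token counts per session."""
    sessions = defaultdict(
        lambda: {"turns": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log stream
        record = json.loads(line)
        stats = sessions[record["session_id"]]
        stats["turns"] += 1
        stats["input_tokens"] += record.get("input_tokens", 0)
        stats["output_tokens"] += record.get("output_tokens", 0)
    return dict(sessions)


def session_cost(stats):
    """Estimate the cost of one session from its summed token counts."""
    return (
        stats["input_tokens"] * PRICE_PER_1K["input"]
        + stats["output_tokens"] * PRICE_PER_1K["output"]
    ) / 1000


# Assumed sample log lines; the field names are hypothetical.
SAMPLE_LOG = [
    '{"session_id": "s1", "input_tokens": 100, "output_tokens": 50}',
    '{"session_id": "s2", "input_tokens": 10, "output_tokens": 5}',
    '{"session_id": "s1", "input_tokens": 200, "output_tokens": 150}',
]

if __name__ == "__main__":
    for session_id, stats in aggregate_sessions(SAMPLE_LOG).items():
        print(session_id, stats, session_cost(stats))
```

Keying the running totals on the session ID lets turns that arrive interleaved with other sessions still roll up into one conversation.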
skill ★ 1,010

Build production AI agent backends in Python

This SDK enables the development of robust, production-ready AI agent backends using Python. It supports multiple agent frameworks (e.g., LangGraph, CrewAI) and provides features like streaming, persistent memory, and full observability via…

TencentCloudBase/CloudBase-MCP python ai-agent agent-backend langgraph
skill ★ 1,010

CloudBase Agent SDK for Production AI Backends

This SDK enables the construction of robust, production-grade AI agent backends using Python. It supports multiple agent frameworks, offering features like streaming protocols, persistent memory integration, and full observability tooling.

TencentCloudBase/CloudBase-MCP python agent-sdk fastapi langgraph
skill

Agent lifecycle hooks for observation and interception

Allows developers to observe and intercept agent execution at critical lifecycle points, including tool calls, context compression, and sub-agent interactions. Hooks can be registered to implement custom logic for tracing, mocking, or loggi…

lobehub/lobe-chat agent-hooks lifecycle observability tracing
skill

Deploy OpenTelemetry Observability Stack to Kind

This skill automates the deployment of a comprehensive OpenTelemetry stack, including Prometheus, Grafana, Tempo, and the OTEL Collector, onto a local Kind cluster. It establishes a complete observability environment suitable for testing te…

StacklokLabs/toolhive open-telemetry observability prometheus grafana
skill ★ 78,845

Query Netdata Cloud via REST API

This skill enables querying Netdata Cloud via its REST API to retrieve time-series metrics, logs, network flows, and topology data. It provides the necessary procedures for interacting with spaces, rooms, and nodes using an API token.

netdata/netdata netdata rest-api observability monitoring
skill ★ 27,411

Detect Production Regressions in Langfuse

Proactively identifies production regressions by comparing recent Datadog error logs, spans, and API latency against baseline benchmarks across multiple environments. It generates a structured findings table for human review before optional…

langfuse/langfuse datadog regression-detection production-monitoring observability
skill ★ 27,411

Langfuse Datadog Production Query Recipes

Provides predefined Datadog query shapes for investigating Langfuse production telemetry across multiple environments. It facilitates research into tenant activity, API usage, queue behaviour, and system metrics.

langfuse/langfuse datadog telemetry langfuse observability
skill ★ 19,338

Opik TypeScript SDK for Tracing

Provides architectural patterns and implementation guidelines for the Opik TypeScript SDK, covering asynchronous batching, flushing strategies, and error handling.

comet-ml/opik typescript opik observability tracing
skill ★ 693

Sentry Logging and Error Handling Audit

An automated code review skill designed to enforce correct logging and error handling patterns within the Sentry MCP codebase. It ensures 4xx errors are handled without creating Sentry issues, while 5xx errors are correctly captured via log…

getsentry/sentry-mcp sentry logging error-handling observability
skill

CloudBase Agent Python SDK

A Python SDK for building production-ready AI agent backends with support for LangGraph, CrewAI, and LlamaIndex. It enables streaming via the AG-UI protocol and integrates persistent memory, observability, and various toolsets including MCP…

TencentCloudBase/CloudBase-AI-ToolKit python ai-agents langgraph fastapi
skill ★ 115

Systematic Dynatrace Production Investigation

A structured workflow for investigating production incidents using Dynatrace data, covering problem triage, root cause analysis via DQL, and security vulnerability reviews.

dynatrace-oss/dynatrace-mcp dynatrace incident-response observability dql
skill ★ 1,806

Deploy OpenTelemetry Stack to Kind

Automates the deployment of an OpenTelemetry observability stack, including Prometheus, Grafana, and Tempo, to a Kind cluster via Helm. It provides a ready-to-use environment for testing metrics collection and distributed tracing capabiliti…

Stacklok/toolhive opentelemetry kind kubernetes helm
skill ★ 888

Istio Service Mesh Management

Manages Istio service mesh configurations for traffic management, security, and observability within Kubernetes clusters. This skill enables canary deployments, mTLS enforcement, and sidecar troubleshooting.

rohitg00/kubectl-mcp-server istio kubernetes service-mesh traffic-management
skill ★ 888

Cilium and Hubble Network Observability

Manages eBPF-based networking and Hubble traffic observability within Kubernetes clusters. This skill enables the implementation of network policies, monitoring of traffic flows, and troubleshooting of connectivity issues.

rohitg00/kubectl-mcp-server kubernetes cilium hubble ebpf
skill ★ 598

Implementing Canonical Log Lines

Provides architectural guidelines for implementing wide events, or canonical log lines, to facilitate high-cardinality, structured logging for advanced debugging and analytics.

neondatabase/mcp-server-neon logging observability structured-logging wide-events
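
As a rough illustration of the canonical log line idea, the sketch below accumulates fields over the life of one request and emits a single wide JSON event at the end. The class name, field names, and the `handle_request` function are hypothetical and are not taken from the skill or the repository.

```python
import json
import time


class CanonicalLogLine:
    """Collects fields during one request and emits them as one wide event."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self._start = time.monotonic()

    def add(self, **fields):
        """Attach more context; later values overwrite earlier ones."""
        self.fields.update(fields)

    def emit(self):
        """Write the whole event as a single structured JSON line."""
        elapsed_ms = (time.monotonic() - self._start) * 1000
        self.fields["duration_ms"] = round(elapsed_ms, 3)
        line = json.dumps(self.fields, sort_keys=True)
        print(line)
        return line


def handle_request(user_id, path):
    """Hypothetical handler: one event per request, emitted exactly once."""
    event = CanonicalLogLine(path=path, user_id=user_id)
    try:
        # Each stage adds its own context instead of logging separately.
        event.add(db_queries=2, cache_hit=False)
        event.add(status=200)
    except Exception as exc:
        event.add(status=500, error=type(exc).__name__)
        raise
    finally:
        event.emit()
    return event


if __name__ == "__main__":
    handle_request("user-123", "/api/items")
```

Because every field lands on one line, high-cardinality values such as `user_id` can be filtered and grouped without joining several log statements.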
skill ★ 9,069

AgentCore Runtime Session Investigation

Enables the investigation of Amazon Bedrock AgentCore runtime sessions by querying CloudWatch Logs Insights. It facilitates session-to-trace resolution, OpenTelemetry noise filtering, and the analysis of tool invocations, errors, and token …

awslabs/mcp aws bedrock cloudwatch agentcore