Browse skills & tools

skill ★ 4,829

Local Speech-to-Text Transcription with Whisper

This skill uses the Whisper CLI to transcribe speech to text locally across various audio formats. Developers can process audio files without external API keys or network connectivity.

the-open-agent/openagent speech-to-text audio-transcription whisper local-processing

tool ★ 4,829

Transcribe audio files using the OpenAI Whisper API

This tool transcribes audio by calling the OpenAI Whisper API endpoint. Developers submit audio files, optionally specifying a language or prompt, and receive the output as plain text or structured JSON.

the-open-agent/openagent audio-transcription openai-whisper api-call audio-processing

tool ★ 18,765

Local API for Screen Activity and Memory Retrieval

This tool provides programmatic access to a local REST API, allowing agents to query comprehensive user data including screen recordings, audio transcripts, UI elements, and persistent memories. It supports advanced search, activity summari…

screenpipe/screenpipe local-api screen-recording activity-logging memory-retrieval

tool ★ 18,765

Local API for Screen Activity Analysis

This tool provides a comprehensive local REST API for querying and analyzing user activity data, including screen recordings, audio transcripts, and UI element context. Developers can programmatically retrieve usage summaries, perform targe…

screenpipe/screenpipe local-api usage-analytics screen-recording ui-context

tool

Local Audio Transcription via Whisper CLI

Perform local speech-to-text transcription using the Whisper command-line interface. It supports various audio formats and model sizes to balance speed and accuracy without requiring an API key.

casibase/casibase speech-to-text audio-transcription whisper cli

skill

OpenAI Whisper Audio Transcription API

Transcribe audio files via the OpenAI Whisper API endpoint. The implementation supports custom base URLs for compatible proxies and allows for language hints or prompts.

casibase/casibase openai-whisper audio-transcription speech-to-text api-integration

tool

AI Audio Transcription using Whisper AI

This tool transcribes various audio formats (mp3, wav, etc.) into text using Whisper AI. It supports auto-language detection and returns detailed segment breakdowns and timestamps via a paid API endpoint.

ntriq-gh/ntriq-agentshop audio-transcription speech-to-text whisper llm

tool

Batch audio transcription using Whisper inference

Transcribes up to 500 audio files in a single API call using local Whisper inference. The service accepts an array of audio URLs and, optionally, a target language.

ntriq-gh/ntriq-agentshop audio-transcription batch-processing whisper llm

skill ★ 6

Multi-speaker audio transcription and action item extraction

This skill transcribes multi-speaker audio recordings using diarization, then classifies each speaker's turns to extract structured action items, decisions, and open questions. It outputs a comprehensive, per-speaker dispatch summary suitab…

Swih/mistral-mcp audio-transcription diarization meeting-minutes action-items

tool

Local API for User Activity and Memory

This tool provides programmatic access to a local REST API for querying comprehensive user activity data, including screen recordings, audio transcripts, UI elements, and persistent memories. It allows agents to analyze usage patterns, summ…

mediar-ai/screenpipe local-api user-analytics screen-recording memory-storage

tool

Screenpipe Local API Interface

Query local screen recordings, audio transcriptions, and UI elements via a REST API. It provides programmatic access to user activity, application usage, and visual context through searchable metadata and frame retrieval.

mediar-ai/screenpipe screen-recording api activity-tracking ocr

skill ★ 372,633

OpenAI Audio Transcription API Wrapper

This skill provides a wrapper for transcribing audio files using various OpenAI models, including gpt-4o and whisper-1. It supports advanced features such as speaker diarization, language specification, and custom prompts.

openclaw/openclaw audio-transcription openai whisper speech-to-text

skill ★ 12

Citedy URL Content Ingestion

Converts URLs into structured data, including transcripts, summaries, and metadata, for YouTube videos, web articles, PDFs, and audio files. The output can be fed into LLM pipelines regardless of the source media type.

Citedy/citedy-seo-agent content-ingestion url-extraction youtube-transcription pdf-parsing