Tag: audio-transcription
Local Speech-to-Text Transcription with Whisper
This skill uses the Whisper CLI to transcribe speech to text locally across various audio formats. Developers can process audio files without external API keys or network connectivity.
Transcribe audio files using the OpenAI Whisper API
This tool transcribes audio by calling the OpenAI Whisper API endpoint. Developers submit audio files, optionally specify a language or prompt, and receive the output as plain text or structured JSON.
Local API for Screen Activity and Memory Retrieval
This tool provides programmatic access to a local REST API, allowing agents to query comprehensive user data including screen recordings, audio transcripts, UI elements, and persistent memories. It supports advanced search, activity summari…
Local API for Screen Activity Analysis
This tool provides a comprehensive local REST API for querying and analyzing user activity data, including screen recordings, audio transcripts, and UI element context. Developers can programmatically retrieve usage summaries, perform targe…
Local Audio Transcription via Whisper CLI
Perform local speech-to-text transcription using the Whisper command-line interface. It supports various audio formats and model sizes to balance speed and accuracy without requiring an API key.
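As a rough illustration of the local workflow described above, the sketch below assembles a Whisper CLI invocation in Python. It assumes the open-source `whisper` command is installed and on the PATH; the helper name `build_whisper_command` and the file name `meeting.mp3` are hypothetical and chosen only for this example.

```python
import shlex
from typing import List, Optional


def build_whisper_command(
    audio_path: str,
    model: str = "base",
    language: Optional[str] = None,
    output_format: str = "txt",
    output_dir: str = ".",
) -> List[str]:
    """Build an argv list for the Whisper CLI (hypothetical helper)."""
    cmd = [
        "whisper",
        audio_path,
        "--model", model,                  # smaller models are faster, larger ones more accurate
        "--output_format", output_format,  # e.g. txt, srt, vtt, json
        "--output_dir", output_dir,
    ]
    if language:
        # Without a language hint, Whisper detects the language itself.
        cmd += ["--language", language]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_whisper_command("meeting.mp3", model="small", language="en")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the transcription; building the list separately keeps the command easy to inspect and test.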
OpenAI Whisper Audio Transcription API
Transcribe audio files via the OpenAI Whisper API endpoint. The implementation supports custom base URLs for compatible proxies and allows for language hints or prompts.
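The custom base URL option can be pictured with a minimal sketch that only prepares the request and sends nothing. The helper name `build_transcription_request` is hypothetical, and the proxy address in the usage note is a placeholder; the `/audio/transcriptions` path and the form field names follow the public OpenAI API, which a compatible proxy is assumed to mirror.

```python
from typing import Dict, Optional, Tuple

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def build_transcription_request(
    base_url: str = DEFAULT_BASE_URL,
    model: str = "whisper-1",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
) -> Tuple[str, Dict[str, str]]:
    """Return the endpoint URL and form fields for a transcription call (hypothetical helper)."""
    # Tolerate a trailing slash on the configured base URL.
    url = base_url.rstrip("/") + "/audio/transcriptions"
    fields = {"model": model, "response_format": response_format}
    if language:
        fields["language"] = language  # language hint, e.g. "en"
    if prompt:
        fields["prompt"] = prompt      # optional context to guide the transcription
    return url, fields
```

For example, `build_transcription_request("http://localhost:8080/v1/", language="en")` points the same request at a local proxy. The actual call would send these fields as multipart form data together with the audio in a `file` field and an `Authorization: Bearer` header.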
AI Audio Transcription using Whisper AI
This tool transcribes various audio formats (mp3, wav, etc.) into text using Whisper AI. It supports auto-language detection and returns detailed segment breakdowns and timestamps via a paid API endpoint.
Batch audio transcription using Whisper inference
Transcribes up to 500 audio files in a single API call using local Whisper inference. The service accepts an array of audio URLs and an optional target language.
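When a job has more files than the 500-per-call limit, the URL list has to be split across several calls. The sketch below shows one way to do that; the payload keys `audio_urls` and `language` and the helper name `batch_payloads` are assumptions made for illustration, not the service's documented schema.

```python
from typing import Dict, Iterator, List, Optional

MAX_FILES_PER_CALL = 500  # limit stated in the description above


def batch_payloads(
    urls: List[str],
    language: Optional[str] = None,
    limit: int = MAX_FILES_PER_CALL,
) -> Iterator[Dict[str, object]]:
    """Yield one request payload per chunk of at most `limit` URLs (hypothetical helper)."""
    for start in range(0, len(urls), limit):
        # Key names are assumed; check the service's actual request format.
        payload: Dict[str, object] = {"audio_urls": urls[start:start + limit]}
        if language:
            payload["language"] = language
        yield payload
```

Each yielded payload can then be posted as its own API call, so 1,200 URLs would become three calls of 500, 500, and 200 files.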
Multi-speaker audio transcription and action item extraction
This skill transcribes multi-speaker audio recordings using diarization, then classifies each speaker's turns to extract structured action items, decisions, and open questions. It outputs a comprehensive, per-speaker dispatch summary suitab…
Local API for User Activity and Memory
This tool provides programmatic access to a local REST API for querying comprehensive user activity data, including screen recordings, audio transcripts, UI elements, and persistent memories. It allows agents to analyze usage patterns, summ…
Screenpipe Local API Interface
Query local screen recordings, audio transcriptions, and UI elements via a REST API. It provides programmatic access to user activity, application usage, and visual context through searchable metadata and frame retrieval.
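A query against such a local REST API might be prepared as in the sketch below. The base address `http://localhost:3030`, the `/search` path, and the `q`, `content_type`, and `limit` parameters are assumptions about a typical Screenpipe setup and should be checked against the running instance; the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(
    query: str,
    content_type: str = "audio",
    limit: int = 10,
    base_url: str = "http://localhost:3030",  # assumed default address
) -> str:
    """Build a search URL for the local API (hypothetical helper, assumed parameters)."""
    params = {"q": query, "content_type": content_type, "limit": limit}
    # urlencode escapes spaces and special characters in the query text.
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client; keeping URL construction separate makes it simple to log or test queries before sending them.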
OpenAI Audio Transcription API Wrapper
This skill provides a wrapper for transcribing audio files using various OpenAI models, including gpt-4o and whisper-1. It supports advanced features such as speaker diarization, language specification, and custom prompts.
Citedy URL Content Ingestion
Converts URLs into structured data including transcripts, summaries, and metadata for YouTube videos, web articles, PDFs, and audio files. It enables seamless ingestion of diverse media types into LLM pipelines.