Tag: ocr

tool

Batch Screenshot Text and UI Extraction

Extract text, UI elements, and data tables from up to 500 screenshots in a single API call. The service uses a flat-fee USDC payment model via the x402 protocol.

ntriq-gh/ntriq-agentshop batch-extraction screenshot-analysis ocr ui-elements
tool

Batch Document Intelligence Processing API

This tool processes up to 500 document images in a single batch call, providing OCR, classification, table extraction, and summarisation. It accepts an array of image URLs and supports various analysis types via a structured JSON payload.

ntriq-gh/ntriq-agentshop document-intelligence batch-processing ocr extraction
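
A minimal sketch of how a caller might prepare input for a batch endpoint like the one described above, given the stated limit of 500 images per call. The listing does not document the request schema, so the field names `images` and `analysis` and the helper `build_batch_payloads` are hypothetical and used only for illustration; no endpoint URL or payment step is shown.

```python
import json
from typing import Iterator

# Limit taken from the listing: up to 500 document images per batch call.
MAX_IMAGES_PER_CALL = 500


def build_batch_payloads(
    image_urls: list[str], analysis_types: list[str]
) -> Iterator[dict]:
    """Yield one JSON-serialisable payload per chunk of at most 500 URLs.

    The keys "images" and "analysis" are assumed names, not the real schema.
    """
    if not analysis_types:
        raise ValueError("at least one analysis type is required")
    for start in range(0, len(image_urls), MAX_IMAGES_PER_CALL):
        yield {
            "images": image_urls[start:start + MAX_IMAGES_PER_CALL],
            "analysis": list(analysis_types),
        }


if __name__ == "__main__":
    # Placeholder URLs; 1,201 images need three calls under a 500-image limit.
    urls = [f"https://example.com/scan-{i}.png" for i in range(1201)]
    for payload in build_batch_payloads(urls, ["ocr", "table_extraction"]):
        body = json.dumps(payload)  # what would be sent as the request body
        print(len(payload["images"]), len(body) > 0)
```

Splitting on the client side keeps each request within the documented limit, so a large archive becomes a small, predictable number of calls.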
skill ★ 6

Mistral MCP Integration for OpenClaw

Enables OpenClaw to access Mistral-specific capabilities via the Mistral MCP server, including OCR, Codestral FIM, and Voxtral audio processing. It also provides access to durable workflows, moderation, and batch API endpoints.

Swih/mistral-mcp mistral-ai mcp openclaw ocr
skill ★ 6

PDF Invoice Data Extractor

Extracts structured line-item data from digital and scanned PDF invoices using Mistral OCR and chat models. It facilitates accounting reconciliation by parsing vendor details, dates, and VAT amounts into structured formats.

Swih/mistral-mcp pdf-extraction ocr mistral-ai invoice-processing
skill ★ 6

Structured Contract Analysis and Risk Scoring

This skill processes PDF or scanned contracts by first running OCR with Mistral's document AI, then extracting structured data, including key clauses and associated risk levels, via JSON schema enforcement. It provides a com…

Swih/mistral-mcp contract-analysis ocr risk-scoring json-schema
tool

Screenpipe Local API Interface

Query local screen recordings, audio transcriptions, and UI elements via a REST API. It provides programmatic access to user activity, application usage, and visual context through searchable metadata and frame retrieval.

mediar-ai/screenpipe screen-recording api activity-tracking ocr
skill ★ 2

Batch Book Translation and OCR Pipeline

An automated pipeline for processing historical book scans through image-cropping, Gemini-powered OCR, and context-aware translation. It supports both real-time Lambda workers and cost-effective Gemini Batch API integration.

Embassy-of-the-Free-Mind/sourcelibrary-v2 batch-processing ocr translation gemini-api
skill ★ 85

MinerU Document Extractor

MinerU is a CLI tool and agent skill for reliable document parsing, converting PDFs, scanned documents, images, and Word/PowerPoint/Excel files into Markdown, HTML, LaTeX, or DOCX. It offers both a fast, zero-setup mode and a precision mode…

opendatalab/MinerU-Ecosystem document-parsing pdf ocr markdown
skill ★ 136,096

Comprehensive PDF Processing and Manipulation Skill

This skill handles PDF documents, covering extraction of text, tables, and images as well as structural modifications like merging, splitting, rotating, and encrypting fi…

anthropics/skills pdf-processing pdf-manipulation text-extraction table-extraction