ElevenLabs Speech to Text converts spoken words into written text with high accuracy across many languages and recording conditions.
It offers two models: Scribe v2 and Scribe v2 Realtime. Scribe v2 converts recorded audio and video into text, which makes it suitable for generating captions, subtitles, and editable transcripts.
Scribe v2 can use context to transcribe specific terms accurately, mark non-speech sound events in the transcript, and identify and label each speaker in a conversation.
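To illustrate how speaker labels might be turned into a readable transcript, here is a minimal sketch that merges consecutive words from the same speaker into turns. The word-entry shape (`text`, `start`, `end`, `speaker_id`) and the sample data are assumptions made for illustration, so check the actual response format in the ElevenLabs documentation before relying on it.

```python
from itertools import groupby


def speaker_turns(words):
    """Merge consecutive word entries from the same speaker into one turn.

    `words` is a list of dicts with the assumed keys
    `text`, `start`, `end`, and `speaker_id`.
    """
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker_id"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],   # start time of the first word in the turn
            "end": group[-1]["end"],      # end time of the last word in the turn
            "text": " ".join(w["text"] for w in group),
        })
    return turns


# Hypothetical word-level output, including a tagged sound event.
sample_words = [
    {"text": "Hello", "start": 0.0, "end": 0.4, "speaker_id": "speaker_0"},
    {"text": "there.", "start": 0.5, "end": 0.9, "speaker_id": "speaker_0"},
    {"text": "(laughter)", "start": 1.0, "end": 1.6, "speaker_id": "speaker_1"},
    {"text": "Hi!", "start": 1.7, "end": 2.0, "speaker_id": "speaker_1"},
]

for turn in speaker_turns(sample_words):
    print(f'{turn["speaker"]} [{turn["start"]:.1f}-{turn["end"]:.1f}]: {turn["text"]}')
```

Each turn keeps the start time of its first word and the end time of its last word, which is the information a caption or subtitle file would need.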
Scribe v2 Realtime is built for live use cases such as calls, meetings, and AI systems that need immediate transcription.
It uses a streaming design to return results as audio arrives while maintaining accuracy. It also includes voice activity detection and accurate speech segmentation for smoother live processing.
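For readers unfamiliar with voice activity detection, the sketch below shows the general idea with a simple energy threshold. It is a generic textbook approach and not a description of how Scribe v2 Realtime works internally; the function names, frame size, and threshold are all illustrative assumptions.

```python
import math


def frame_energies(samples, frame_size):
    """Return the RMS energy of each non-overlapping frame of `samples`."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def speech_segments(samples, frame_size=160, threshold=0.1):
    """Return (start_frame, end_frame) pairs whose RMS energy exceeds `threshold`.

    `end_frame` is exclusive. Samples are assumed to be floats in [-1.0, 1.0].
    """
    energies = frame_energies(samples, frame_size)
    segments = []
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx                      # speech begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx))    # speech ends
            start = None
    if start is not None:                    # audio ended while still in speech
        segments.append((start, len(energies)))
    return segments


# Two silent frames, three loud frames, one silent frame.
audio = [0.0] * 320 + [0.5] * 480 + [0.0] * 160
print(speech_segments(audio))
```

A production detector would typically add smoothing and a trained model, but the segment boundaries it produces serve the same purpose: deciding where an utterance starts and stops so that it can be transcribed as a unit.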
Both Scribe models support more than 90 languages and can be integrated into your products through the ElevenLabs API.
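As a rough idea of what an integration could look like, the following sketch prepares and sends a transcription request for a recorded file. The endpoint URL, the `xi-api-key` header, the form field names, and the `scribe_v2` model identifier are assumptions that should be verified against the official API reference. The upload step uses the third-party `requests` package.

```python
# Assumed endpoint; confirm against the official API reference.
API_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_request(api_key, model_id="scribe_v2", diarize=True, tag_audio_events=True):
    """Build the headers and form fields for a transcription request.

    The header and field names, and the default model id, are assumptions.
    """
    headers = {"xi-api-key": api_key}
    data = {
        "model_id": model_id,
        "diarize": str(diarize).lower(),                    # label speakers
        "tag_audio_events": str(tag_audio_events).lower(),  # mark non-speech sounds
    }
    return headers, data


def transcribe(path, api_key):
    """Upload an audio or video file and return the parsed JSON response."""
    import requests  # third-party dependency: pip install requests

    headers, data = build_request(api_key)
    with open(path, "rb") as media:
        response = requests.post(
            API_URL, headers=headers, data=data, files={"file": media}, timeout=120
        )
    response.raise_for_status()
    return response.json()
```

Calling `transcribe("meeting.mp3", api_key)` with a hypothetical file name would send the file for transcription. Keeping `build_request` separate makes the request parameters easy to inspect and test without a network call.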
Pros
Multilingual transcription
Real-time transcription
Supports 90+ languages

Cons
No offline support
Doesn't support all languages
No free tier
