Conformer-2 is an AI model for automatic speech recognition and the successor to Conformer-1. It improves recognition of proper nouns and alphanumeric sequences and holds up better in noisy settings.
These gains are attributed to training on a large corpus of English audio. Conformer-2 matches Conformer-1's word error rate while improving metrics that matter more to users, such as proper-noun and alphanumeric accuracy.
Relative to Conformer-1, these improvements came from a larger training set and from using more models to generate pseudo-labels.
Changes to the inference process have also reduced Conformer-2's latency. A key advance in Conformer-2 is its training method, which uses model ensembling.
Instead of relying on a single 'teacher' model to produce pseudo-labels for unlabeled audio, training draws labels from multiple 'teachers', which yields a more versatile and robust model.
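The source does not say how the teachers' outputs are combined. As a minimal sketch, assuming a simple consensus rule, the code below picks, for one audio clip, the teacher transcript with the smallest total word-level edit distance to all the other teachers' transcripts. The functions `word_edit_distance` and `select_consensus` and the sample transcripts are hypothetical illustrations, not AssemblyAI's pipeline.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two word sequences (each insert/delete/substitute costs 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete wa
                curr[j - 1] + 1,           # insert wb
                prev[j - 1] + (wa != wb),  # substitute (no cost if the words match)
            )
        prev = curr
    return prev[-1]


def select_consensus(transcripts: List[str]) -> str:
    """Hypothetical rule: return the teacher transcript closest, in total word edits, to all others.

    A teacher that makes an idiosyncratic mistake disagrees with the rest,
    so its transcript tends to have a larger total distance and is not chosen.
    """
    if not transcripts:
        raise ValueError("need at least one teacher transcript")
    tokens = [t.lower().split() for t in transcripts]

    def total_distance(i: int) -> int:
        return sum(word_edit_distance(tokens[i], other) for other in tokens)

    best = min(range(len(tokens)), key=total_distance)  # ties go to the earliest teacher
    return transcripts[best]


if __name__ == "__main__":
    teachers = [
        "send the invoice to acme corp by friday",
        "send the invoice to acne corp by friday",   # teacher 2 mishears a proper noun
        "send the invoice to acme corp buy friday",  # teacher 3 mishears a different word
    ]
    print(select_consensus(teachers))  # prints "send the invoice to acme corp by friday"
```

Each teacher errs in a different place, so the transcript that agrees most with the others wins and neither single-teacher error becomes the label. This rule only selects an existing transcript; a word-level voting scheme (in the style of ROVER) could instead merge the best words from each teacher.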
Drawing labels from several teachers reduces the impact of any single model's errors. The development of Conformer-2 also involved studying how data and parameter counts should scale together, which led to a larger model trained on more audio.
This work was intended to realize the untapped potential that the 'Chinchilla' paper identified for large language models, which it found were undertrained for their size. Despite its larger size, Conformer-2 responds faster than Conformer-1, bucking the usual trend that larger models are slower and more costly to run.
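For context, the Chinchilla result for language models can be summarized as follows: training compute C for a model with N parameters trained on D tokens is approximately 6ND, and at a fixed compute budget the optimal N and D each grow roughly as the square root of C, so parameters and data should be scaled together. This is the LLM finding only; how it was carried over from text tokens to hours of audio for Conformer-2 is not stated here.

```latex
C \approx 6\,N\,D, \qquad
N_{\mathrm{opt}}(C) \propto C^{a}, \quad
D_{\mathrm{opt}}(C) \propto C^{b}, \qquad a \approx b \approx 0.5
```

Under this relation, doubling the compute budget calls for roughly \(\sqrt{2} \approx 1.41\times\) more parameters and \(1.41\times\) more training data, rather than spending the extra compute on model size alone.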
Trained on 1.1 million hours of audio
Better recognition of proper nouns
Enhanced alphanumeric recognition
Trained only on English language data
Potential bias from its teachers
Lacks support for multiple languages