The AI capability that converts spoken language into written text — enabling voice assistants, transcription services, and hands-free interfaces by understanding human speech across accents, languages, and noisy environments.
In Depth
Speech recognition, formally called Automatic Speech Recognition (ASR), is the technology that converts audio of human speech into written text. Early systems relied on separately built acoustic models, pronunciation dictionaries, and language models combined through complex pipelines. Modern speech recognition is dominated by end-to-end deep learning models that map audio to text with a single neural network. OpenAI's Whisper model, trained on 680,000 hours of multilingual audio, demonstrated that a single Transformer-based model could approach human-level transcription accuracy on English while also transcribing dozens of other languages.
The challenge of speech recognition extends far beyond simply matching sounds to words. Real-world speech is messy: people speak with different accents, speeds, and volumes; background noise interferes; multiple speakers overlap; and homophones (words that sound the same but have different meanings) require contextual understanding to transcribe correctly. Modern systems handle these challenges through large-scale training on diverse audio data and the integration of language models that use context to resolve ambiguities.
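As a rough illustration of how a language model can use context to resolve homophones, the sketch below rescores candidate transcripts with a toy bigram model. The corpus and the function names (`train_bigrams`, `log_score`, `pick_transcript`) are hypothetical and chosen only for this example. Production systems use far larger models and combine this score with acoustic evidence.

```python
import math
from collections import Counter

# Tiny illustrative corpus (hypothetical); a real model is trained on far more text.
CORPUS = [
    "they left their coats over there",
    "their house is over there",
    "put it over there",
    "they lost their keys",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, with a start-of-sentence marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_score(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen word pairs from scoring log(0).
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def pick_transcript(candidates, unigrams, bigrams):
    """Return the candidate transcript the language model finds most likely."""
    return max(candidates, key=lambda c: log_score(c, unigrams, bigrams))

unigrams, bigrams = train_bigrams(CORPUS)
# Two acoustically identical candidates that differ only in a homophone.
print(pick_transcript(["they lost there keys", "they lost their keys"], unigrams, bigrams))
```

Both candidates sound the same, so the acoustic signal alone cannot separate them. The bigram counts favor "lost their" and "their keys", which is the kind of contextual evidence the paragraph above describes.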
Speech recognition is a foundational technology for many AI applications. Voice assistants (Siri, Alexa, Google Assistant) depend on it as their primary input modality. Call centers use it for real-time transcription and sentiment analysis. Accessibility tools provide subtitles and transcriptions for deaf and hard-of-hearing users. Medical documentation systems allow doctors to dictate notes. The accuracy of modern ASR, often exceeding 95% of words transcribed correctly on clean speech (a word error rate below 5%), has made voice interaction a natural, mainstream interface for technology.
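ASR accuracy figures like the one above are usually derived from word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. Word accuracy is then commonly reported as 1 minus WER. The following is a minimal sketch of that calculation; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions for illustration, and real evaluations typically apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.3f}, word accuracy: {1 - wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference, so "accuracy" computed as 1 minus WER is best read as a convenience figure rather than a true percentage.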
Speech recognition converts human speech to text using deep learning — it is the enabling technology for voice assistants, transcription services, and the entire voice-first interface paradigm.