Definition: Recognizes individual words, spoken with pauses between them, by a specific person.
Accuracy: Typically achieves 90 to 98% recognition accuracy for vocabularies of 100 to 10,000 or more words.
Training: In speaker-dependent training, the user repeats the full vocabulary so the system can adapt to that voice, which improves accuracy.
Environment: Recognition rates are higher in quiet environments, with head-mounted microphones, and when the vocabulary is chosen carefully (for example, avoiding similar-sounding words).
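To make the 90 to 98% accuracy range concrete, the sketch below estimates how many words a user would have to correct. It assumes, as a simplification, that each word is recognized independently with the same accuracy; the function name `expected_misrecognitions` is hypothetical.

```python
def expected_misrecognitions(num_words: int, accuracy: float) -> float:
    """Expected number of misrecognized words, assuming (simplification)
    each word is recognized independently with the given accuracy (0..1)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return num_words * (1.0 - accuracy)


# Errors per 100 spoken words across the typical accuracy range.
for acc in (0.90, 0.95, 0.98):
    errors = expected_misrecognitions(100, acc)
    print(f"{acc:.0%} accuracy -> {errors:.0f} errors per 100 words")
```

Even at the top of the range, a user issuing 100 commands should expect a couple of misrecognitions, which is why error correction has to be part of the design.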
Continuous-speech recognition:
Definition: Aims to recognize naturally flowing, continuous speech, like the science-fiction vision of HAL, the computer in 2001: A Space Odyssey. In practice this is far harder, partly because spoken words run together and the system must detect where one word ends and the next begins.
History: Many research projects pursued this goal during the dot-com boom; expectations ran high, but the results were disappointing.
Issues: Speech dictation products work, but high error rates and the effort of repairing errors remain serious problems that degrade document quality.
Cognitive Load: Dictation can add cognitive load, because speaking competes for the same limited mental resources needed to plan content and form sentences.
Voice information systems:
Appeal: The human voice is a compelling medium for communication and for delivering information.
Use Cases: Stored speech is commonly used in telephone-based information systems, such as Interactive Voice Response (IVR) systems for government services, tourist information, and after-hours messages.
Cost-Effectiveness: IVR systems can provide good customer service at a low cost with proper development methods and metrics.
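The structure behind a typical IVR system is a menu tree: each node plays a stored-speech prompt, and a keypad press selects a submenu. The sketch below models that tree under the assumption that invalid keypresses simply replay the current menu; `MenuNode`, `navigate`, and the prompt texts are all hypothetical and not taken from any particular IVR product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One IVR menu: a stored-speech prompt plus keypad choices (hypothetical model)."""
    prompt: str
    options: dict[str, MenuNode] = field(default_factory=dict)  # key -> submenu


def navigate(root: MenuNode, keypresses: str) -> list[str]:
    """Return the prompts played as a caller presses the given keys.

    An unrecognized key keeps the caller in the current menu, so that
    menu's prompt is played again.
    """
    node = root
    played = [node.prompt]
    for key in keypresses:
        node = node.options.get(key, node)  # stay in place on an invalid key
        played.append(node.prompt)
    return played


# Hypothetical tourist-information line.
hours = MenuNode("The visitor center is open daily from 9 a.m. to 5 p.m.")
events = MenuNode("This week's events are listed on our website.")
root = MenuNode(
    "Press 1 for opening hours. Press 2 for events.",
    {"1": hours, "2": events},
)

print(navigate(root, "1"))
```

Keeping the menu as data rather than hard-coded branches makes it easier to log which paths callers take, one way to gather the metrics that keep IVR service quality high.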
Speech generation:
Definition: A successful technology for producing spoken output, widely used in consumer products and telephone systems.
Implementation: Inexpensive, compact, and reliable systems play back digitized speech segments (canned speech) in applications such as automobile navigation, internet services, and utility-control rooms.
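Canned speech works by stitching prerecorded clips together in sequence. The sketch below assembles a navigation instruction such as "In 200 meters, turn left" from a hypothetical clip inventory; the clip names, the `CLIPS` table, and `build_playlist` are illustrative assumptions, and actual audio playback is left out.

```python
# Hypothetical inventory of prerecorded clips: phrase -> audio file name.
CLIPS = {
    "in": "in.wav",
    "100": "100.wav",
    "200": "200.wav",
    "500": "500.wav",
    "meters": "meters.wav",
    "turn left": "turn_left.wav",
    "turn right": "turn_right.wav",
}


def build_playlist(action: str, distance_m: int) -> list[str]:
    """Return the clips to play, in order, for 'In <distance> meters, <action>'.

    Raises KeyError if a needed phrase was never recorded.
    """
    return [CLIPS["in"], CLIPS[str(distance_m)], CLIPS["meters"], CLIPS[action]]


print(build_playlist("turn left", 200))
```

The trade-off is visible in the `KeyError`: a canned-speech system can only say what was recorded in advance, which keeps it cheap and predictable but limits its vocabulary.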
Non-speech auditory interfaces:
Definition: Goes beyond speech to include individual audio tones as well as richer presentation of information through sound and music.
Examples: Computer systems use tones for warnings or acknowledgments; keyboards and mobile devices produce electronically generated sounds, such as key clicks, as feedback.
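Simple warning and acknowledgment tones are just short sine waves. The sketch below generates them as in-memory WAV data using only Python's standard library (`wave`, `struct`, `math`); the specific frequencies and durations, and the convention of a short high tone for acknowledgment and a longer low tone for warnings, are hypothetical choices for illustration.

```python
import io
import math
import struct
import wave


def make_tone(freq_hz: float, duration_s: float, sample_rate: int = 8000,
              volume: float = 0.5) -> bytes:
    """Return a mono, 16-bit PCM WAV file (as bytes) containing a sine tone."""
    n_samples = round(sample_rate * duration_s)
    frames = bytearray()
    for i in range(n_samples):
        sample = volume * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(sample * 32767))  # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


# Hypothetical convention: short high tone = acknowledgment, longer low tone = warning.
ACK_TONE = make_tone(880.0, 0.1)
WARNING_TONE = make_tone(440.0, 0.4)
```

The returned bytes can be written to a `.wav` file or handed to any audio player; varying pitch and duration is the simplest way to make two feedback sounds easy to tell apart.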