Sphinx: An Open Source Speech-to-Text Engine

I attended the Toronto Asterisk Users’ Group meeting tonight, and one of the hot topics discussed over dinner was speech-to-text (i.e. speech recognition). Text-to-speech (TTS) in Asterisk is already well handled by Festival and the corresponding Asterisk application, but I think you’ll agree that speech recognition is a far more interesting topic. (Unless you hate Emily, Bell Canada’s vocal equivalent of the stupid Microsoft paperclip.)

Carnegie Mellon University has long had a group working on a recognition engine called Sphinx, funded by a DARPA grant. I’m told that Sphinx-II, the original C version, is available as an application for Asterisk, but later versions of Sphinx have much higher accuracy. Sphinx-3 is also written in C, and Sphinx-4 is written entirely in Java. Sphinx differs from many other speech recognition systems in that it does not have to be trained for each speaker, which makes it ideal for telephony applications. Instead, you supply it with a dictionary of known waveforms (the bigger the dictionary, the more RAM it uses). Mike Ashton of QualityTrack claims over 96% accuracy with Sphinx, which he uses to strip sensitive information out of recorded phone calls in a call centre monitoring application.

This is really fascinating technology, and the best part is that despite having been developed under a DARPA grant, it’s open source! Apparently this was one of the conditions the CMU researchers set when they first agreed to accept the grant, and the community is the better for it. According to the site, it’s rather difficult to install and set up, particularly for those of us with no background in speech patterns and the like, but perhaps one day I’ll have a system that I can dial and say “Please reboot programGuide”, and Asterisk will do the right thing.
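As a rough sketch of how a spoken command like “Please reboot programGuide” might be handled: Sphinx-4 accepts grammars written in JSGF (Java Speech Grammar Format), so a small command vocabulary could be described with a grammar along the lines of the string below. Everything here is an assumption made for illustration, including the grammar name, the rule names, the list of actions and targets, and the `parse_command` helper. The Python part does not call Sphinx or Asterisk at all; it only shows how text that a recognizer has already produced could be mapped to an action and a target.

```python
import re

# Hypothetical JSGF grammar for a tiny command vocabulary. The rule names,
# actions and targets are illustrative assumptions. A real recognizer would
# also need pronunciation dictionary entries for words like "programguide".
COMMAND_GRAMMAR_JSGF = """\
#JSGF V1.0;
grammar commands;

public <command> = [please] <action> <target>;
<action> = reboot | restart | stop;
<target> = programguide | asterisk;
"""

# The same vocabulary, mirrored in Python so recognized text can be checked.
ACTIONS = ("reboot", "restart", "stop")
TARGETS = ("programguide", "asterisk")

_COMMAND_RE = re.compile(
    r"^(?:please\s+)?(?P<action>{})\s+(?P<target>{})$".format(
        "|".join(ACTIONS), "|".join(TARGETS)
    )
)


def parse_command(transcript):
    """Return (action, target) if the text fits the grammar, else None."""
    # Lowercase and collapse whitespace before matching.
    normalized = " ".join(transcript.lower().split())
    match = _COMMAND_RE.match(normalized)
    if match is None:
        return None
    return match.group("action"), match.group("target")
```

With this sketch, `parse_command("Please reboot programGuide")` gives `("reboot", "programguide")`, and anything outside the vocabulary gives `None`, so the dialplan side would only ever see commands the grammar allows.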

  1. Sphinx is not that difficult to set up if you use one of the examples as a base and learn how to write a JSGF grammar. If you are not going to set up a complex dialog system, the grammar should be fairly straightforward.
    I did a project at the university during the summer, and we used Sphinx to implement voice control for a robot. The hard part for us was training our English/American pronunciation.

    Greetings from Germany.

  2. I agree with Engelke: Sphinx is more difficult to use (and configure) when you have to create an acoustic model (to recognize a language) than when you only have to create a simple grammar.
    During my training I also integrated Sphinx-4 with Asterisk and got some good results. The hard job here is getting an appropriate acoustic model that fits telephony requirements (8 kHz).

  3. Could you tell us whether you know of any speech-to-text software that could be packaged as a PBI for PC-BSD, to help people with disabilities use OOo?


  4. I am doing my project on "speech to text". Can you tell me how I should proceed with it using Sphinx-4?
    I am also doing voice and speech recognition. It's comparatively easy in MATLAB.
    Can this entire thing be done using Sphinx? I need some direction.

  5. @ Deepak

    I'm doing a project along the same lines as yours.
    I have done speech recognition, but I also want to do speech-to-text now.
    Have you made any further progress?
    This is my mail: [email protected]

  6. Howdy, this is kind of off topic, but I was wondering whether blogs use WYSIWYG editors or whether you have to code manually with HTML.
    I’m starting a blog soon but have no coding know-how, so I wanted to get advice from someone with experience. Any help would be enormously appreciated!