GNU/Linux: Current State of Voice Dictation and Recognition

I have a friend who is suffering from a degenerative chronic disease. It is slowly destroying his ability to use his hands to type and interact with his personal computer. This makes it difficult for him to correspond by e-mail, type documents in a word processor or do any other task that requires much interaction with the keyboard or mouse. He recently purchased Dragon NaturallySpeaking to use with his Windows Vista based PC in an attempt to overcome this problem. For now he is holding off on training NaturallySpeaking, for another reason.

That other reason is that my friend wants me to help him move to GNU/Linux on his new Dell PC (8 GB memory, 500 GB hard disk, a 16x DVD±RW drive plus a second read-only DVD-ROM drive, AMD Phenom 9750 2.4 GHz quad-core processor, Vista Home Premium OS, MS Works, etc.). He will run the Vista that came with it in a virtual machine in case he needs something that is not available on Linux. One of the primary requirements is assistive software that will allow him to interact with the PC for command and control. I am happy to report that at least this level of voice support is available. But the assistive software must also take dictation fairly accurately so he can more easily use e-mail and a word processor.

For the past three weeks my research assistant and I have been searching the WWW for dictation software that works under GNU/Linux. We have found this to be an exercise in frustration with several dead ends. My assistant found projects that some forum post or blog post claimed were meant for dictation and sent me the URLs for follow-up. Yet when I followed up, the projects turned out to be “dead”, could not be made to actually work for dictation, or were active but in a state of perpetual research. The developers of the latter seem to be more interested in getting a master's degree or a PhD than in actually driving the projects forward to be usable for people with real needs. Until recently no project seems to have been actively solving the problem of desktop dictation software for modern GNU/Linux systems. If any were, neither I nor my assistant could find evidence of it in the form of even alpha-level software to try. This is quite discouraging for those of us currently seeking a solution for a disabled friend or loved one who needs voice dictation to be able to use a GNU/Linux based PC effectively.

One particularly discouraging find was that IBM has stopped development of ViaVoice for GNU/Linux and has pulled what was available from the market. As a former user of ViaVoice on IBM OS/2 Warp 4, I know it to be an excellent product. I did not need ViaVoice when I was using it; I used it because it was available and had a bit of a “This is neat!” factor about it for me. I know from that experience that it can be used as a voice dictation system once properly trained. It also handles voice command and control quite well. I had no reason to think the GNU/Linux version would no longer be available and had hoped I could get it for my friend. Unfortunately, someone or some committee at IBM has decided our disabled friends do not need ViaVoice any longer. This is shameful. Some things should be done regardless of the “bottom line”. Assistive technology for our disabled fellow man is one such thing.

With the discouraging part behind us, I want to look at what is being done and at recent developments as of February 2010. Just recently the simon project announced on its web log an upcoming 1½ year benefit project. The announcement includes the following:

Abstract:
With the help of verbal control provided by simon using terms of everyday language, useful scenarios and areas of application shall be created to enable an easy use of new communication technologies such as the internet, telephone and multimedia applications for elderly people. Moreover, additional security can be provided, for example, a reminder for the user to take a medication.

While this announcement does not specifically mention work on the dictation problem, it is at least proof that the assistive software simon is moving forward with its voice user interface. We can only hope that the research will turn simon into a useful, dictation-capable voice interface for GNU/Linux. Unfortunately, simon uses the HTK toolkit, which is not under the GPL and has its own rather restrictive license that includes this clause:

2.2 The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.

This restrictive license means the HTK toolkit cannot be distributed with a GNU/Linux distribution. It also means simon is unlikely to be included in many distributions, since it relies on this toolkit for the heavy lifting of back-end speech processing.

I am very glad to report there is hope for open source, GNU/Linux distribution friendly back-end processing in the CMU Sphinx project, started in the School of Computer Science at Carnegie Mellon University. CMU Sphinx uses a BSD-style license, which does not restrict redistribution. With my apologies to the project members with whom I have interacted, CMU Sphinx does seem to be one of those projects that is interested only in the speech engine, for research purposes, to get those master's degrees and PhDs I mentioned above. If this project is doing any work on a useful front-end, I could not find it. While a back-end processor is necessary for speech recognition and dictation, it is only half of the problem. I think a PhD or three could be had by working on that useful front-end voice dictation system for GNU/Linux.

There are other projects I could mention but I will leave those for anyone who wants to comment about them.

Here is what I think is needed today. Some project needs to work on both speech back-end processing and voice dictation, using a license or licenses that allow free distribution of the source code and binaries for the entire project. Then this project needs to put the pieces together so the FOSS community can start using it right away, reporting bugs and making feature requests. We need a 1960s-era “moon shot”, starting in 2010, for GNU/Linux voice dictation for our disabled fellow man. However, instead of taking several years of careful research and planning to “get to the moon”, this project would follow the model of release early, release often, and let us all work on this together, starting with an early alpha as soon as possible. After all, no one is likely to die from trying alpha-level software, and we will all benefit sooner from a usable voice processing system for dictation.

Edit Mon Mar 1 13:20:03 CST 2010: Corrected the phrase “second read only Blu-ray drive”, which was wrong. The correct drive type is now mentioned.