GNU/Linux: Current State of Voice Dictation and Recognition

I have a friend who is suffering from a degenerative chronic disease. It is slowly destroying his ability to use his hands to type and interact with his personal computer. This makes it difficult for him to correspond in e-mail, type documents in a word processor or do any other task that requires much interaction using the keyboard and/or mouse. He recently purchased Dragon NaturallySpeaking to use with his Microsoft Vista based PC in an attempt to overcome this problem. For now, though, he has put off training NaturallySpeaking for another reason.

That other reason is that my friend wants me to assist him in moving to GNU/Linux on his new Dell PC (8 GB memory, 500 GB hard disk, a 16x DVD±RW drive, plus a second read-only DVD-ROM drive, an AMD Phenom 9750 2.4 GHz quad-core processor, Vista Home Premium, MS Works, etc.). He will run the Vista that came with it in a virtual machine in case he needs something that is not available on Linux. One of the primary requirements is assistive software that will allow him to interact with the PC for command and control. I am happy to report there is at least this level of voice support available. But this assistive software must also take dictation fairly accurately so he can more easily use e-mail and a word processor.

For the past three weeks my research assistant and I have been searching the WWW for dictation software that works under GNU/Linux. We have discovered this to be an exercise in frustration with several dead ends. My assistant found projects that some forum or blog post purported to be for dictation and sent me the URLs for follow-up. Yet when I followed up on these projects they were either “dead”, could not be made to actually work for dictation, or were active but in a state of perpetual research. The developers of the latter seem more interested in earning a Masters degree or a PhD than in driving their projects forward to be usable by people with real needs. Until recently no project seemed to be actively solving the problem of desktop dictation software for modern GNU/Linux systems. If any were, neither I nor my assistant could find evidence of it in the form of even alpha-level software to try. This is quite discouraging for those of us currently seeking a solution for a disabled friend or loved one who needs voice dictation to be able to use a GNU/Linux based PC effectively.

One particularly discouraging find was that IBM has stopped development of ViaVoice for GNU/Linux and has pulled what was available from the market. As a former user of ViaVoice on IBM OS/2 Warp 4 I know it to be an excellent product. I did not need ViaVoice when I was using it, I just used it because it was available and had a bit of a “This is neat!” factor about it for me. I know from experience with it back then that it can be used as a voice dictation system once properly trained. It also handles voice command and control quite well. I had no reason to think the GNU/Linux version would no longer be available and had hopes I could get that for my friend. Unfortunately, someone or some committee at IBM has decided our disabled friends do not need ViaVoice any longer. This is shameful. Some things should be done regardless of the “bottom line”. Assistive technology for our disabled fellow man is one such thing.

With the discouraging part behind us, I want to look at what is being done and at recent developments as of February 2010. Just recently the simon project announced an upcoming 1½-year benefit project on its web log. The announcement includes the following:

With the help of verbal control provided by simon using terms of everyday language, useful scenarios and areas of application shall be created to enable an easy use of new communication technologies such as the internet, telephone and multimedia applications for elderly people. Moreover, additional security can be provided, for example, a reminder for the user to take a medication.

While this announcement does not specifically mention work on the dictation problem, it is at least proof that the assistive software simon is moving forward with its voice user interface. We can only hope that the research done will turn simon into a useful, dictation-capable voice interface for GNU/Linux. Unfortunately, simon uses the HTK toolkit, which is not GPL and has its own rather restrictive license that includes this clause:

2.2 The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.

This restrictive license means the HTK toolkit cannot be distributed with a GNU/Linux distribution, which in turn means simon is unlikely to be included in many distributions, as it relies on this toolkit for the heavy lifting of back-end speech processing.

I am very glad to report there is hope for open source, GNU/Linux distribution friendly back-end processing with the CMU Sphinx project, started in the School of Computer Science at Carnegie Mellon University. CMU Sphinx uses a BSD-style license which does not restrict redistribution. With my apologies to the project members with whom I have interacted, the CMU Sphinx project does seem to be one of those interested mainly in the speech engine for research purposes, in pursuit of those Masters degrees and PhDs I mentioned above. If any work is being done by this project on a useful front-end, I could not find it. While a back-end processor is necessary for speech recognition and dictation, it is only half of the problem. I think a PhD or three could be had by working on that useful front-end voice dictation system for GNU/Linux.

There are other projects I could mention but I will leave those for anyone who wants to comment about them.

Here is what I see as needed today. Some project needs to work on both back-end speech processing and voice dictation, using a license or licenses that allow free distribution of the source code and binaries for the entire project. Then this project needs to put the pieces together for the FOSS community to start using right away, reporting bugs and making feature requests. We need a 1960s-era “moon shot”, starting in 2010, for GNU/Linux voice dictation for our disabled fellow man. However, instead of taking several years of careful research and planning to “get to the moon”, this project would follow the model of release early, release often, and let us all work on this together, starting with an early alpha as soon as possible. After all, no one is likely to die from trying alpha-level software, and we will all benefit from a usable voice dictation system sooner.


Notice: All comments here are approved by a moderator before they will show up. Depending on the time of day this can take several hours. Please be patient and only post comments once. Thank you.

Edit Mon Mar 1 13:20:03 CST 2010: Correct the phrase “second read only Blu-ray drive” as that was incorrect. The correct drive type is now mentioned.


Published by

Gene A.

Gene is a "Unix Guy", network technologist, system trouble-shooter and IT generalist with over 20 years experience in the SOHO and SMB markets. He is familiar with and conversant in eComStation (a.k.a. OS/2), DOS (PC, MS and Free), Unix, Linux and those GUI based systems from Microsoft. Gene is also a follower of Jesus (forgiven, not perfect), and this does inform his world view.

28 thoughts on “GNU/Linux: Current State of Voice Dictation and Recognition”

  1. All,

    Well, this sucks. Somehow, WordPress set itself to “Users must be registered and logged in to comment” at some point in the past month. So, anyone wanting to comment has been unable to do so. I apologize for that and will keep a closer eye on the site settings from now on.


  2. All,

    For the record, this article is not intended to criticise any project, not even CMU Sphinx. It is intended to put voice recognition and dictation in front of the Linux community to give it some attention. If you have a problem with what I wrote, fine. But realize that the idea is to get more people involved in solving the problem. Even though it is not a glowing, fawning review of current GNU/Linux speech technology the more that see this article, the better.

    Of course if you don’t agree with me you are probably a Nazi or something. 😉


  3. Gene–

    Don’t blame IBM. They sold ViaVoice to ScanSoft (Now Nuance) in 2003. Nuance, true to its roots in Xerox, has some peculiar ideas about marketing. They also own Dragon, by the way–so they are the big dog on the speech recognition market.

    Does Dragon work under Wine or one of its derivatives? If not, perhaps it would be worthwhile to encourage getting that compatibility working? I have used both ViaVoice and Dragon on occasion, and Dragon is presently the state of the art in speech recognition and dictation.

  5. I still have my old ViaVoice disc somewhere in my storage room from when I used to use it. It is sad that IBM dropped this when they did; they should at least release what code they can to the open source community. It would be great to have it up and running on my Linux machines, and I know several people for whom such software would be almost life changing. Keyboards and arthritis don’t mix well.

  5. I would have thought that the ideal device for this would be a large Android tablet with a mouth operated pointer. Google also has an excellent voice-to-text application (although this is server based).

    Since Android is open source, I guess it might make a good platform for developing a touch-screen PC with voice recognition and text-to-speech for those with partial disabilities.

  6. Just a curiosity…

    Have you tried using any of the ‘available’ windows software, and processing through ‘Wine’ to work on your Linux project?

  7. IBM’s Viavoice should be set free, imo. Dragon’s Naturally Speaking does work under Linux+Wine. I set it up for an office about three years ago, and it worked properly.

  8. To be effective and useful for people who can’t use their hands, more than just dictation function is needed. An efficient correction function for recognition errors and efficient mouse control function is also needed. Dragon NaturallySpeaking has these functions, MacSpeech and MS Windows speech recognition do not. Via Voice had some of this functionality but was awkward to use. Not enough people liked using it to make it financially successful. And why don’t MacSpeech and MS Windows SR have the additional functionality? It would take more time/expense to put in than the developers are willing/able to.

    Speech recognition is based on artificial intelligence. It’s taken thousands of work hours from many people to develop speech recognition to where it is now. Dragon NaturallySpeaking works well on Windows today. For a few hundred dollars for Windows and Dragon NaturallySpeaking, most people with proper training can use it today. Or wait years to see if anyone puts in the thousands of hours needed to recreate the wheel for people who think that saving that money is worth waiting or because they don’t want to use Windows or both.

    There doesn’t seem to be enough people yet who are both not using windows and want SR to pay anyone to make it worthwhile to put the time in. And as it seems that they want it for free, the funding will have to be secured from elsewhere. And then what – give the software away for free? I don’t see it happening, it makes no sense. Accomplishing SR is not trivial, it is highly complex and takes a huge amount of time and very specific expertise to create a result that works well.

  9. There are three things you need for speech recognition: a speech engine, an acoustic model and a language model. There is a speech engine called Julius that can be used for the interpretation. The problem is that Julius only understands Japanese at the moment. I’ve been donating to the GPL’d Voxforge speech repository project for a while, which will be used for the acoustic model and language model. Voxforge is a project to collect enough English dictation to use the Julius engine for English. It has a goal of 140 hours of dictation for its first release. Y’all might think about contributing.
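As an illustration of how those three components fit together, here is a rough sketch of a Julius `.jconf` configuration file. The file names are placeholders, not paths from any real model distribution; only the option flags themselves come from Julius.

```text
# hypothetical julius.jconf -- wiring the three components together
-h     acoustic/hmmdefs     # acoustic model (HMM definitions)
-hlist acoustic/tiedlist    # HMM list accompanying the acoustic model
-d     lm/english.bingram   # statistical language model (binary N-gram)
-v     lm/english.dict      # pronunciation dictionary
-input mic                  # capture audio from the microphone
```

The engine (Julius), the acoustic model (`-h`/`-hlist`), and the language model (`-d`/`-v`) are all separate pieces, which is exactly why a project like Voxforge can supply the English models independently of the engine itself.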

  10. David (comment #4) Thank you for the comment.

    I appreciate the information on IBM and ViaVoice. I did not turn up that little tidbit in my research. As for using Dragon NaturallySpeaking under WINE, I have read and been told in IRC on Freenode that is possible. I have been told it does not interface with GNU/Linux applications from WINE so I am not sure how one would use it to create an e-mail in Kmail or Evolution.

  11. the old rang (comment #7) Thanks for reading.

    I will be trying to get Dragon NaturallySpeaking working in WINE for my friend. However, as I stated in response to David’s comment above, it will not interface with GNU/Linux applications. That makes it less useful for my friend, but we will see how it goes. I intend to write a follow-up article after doing that.

  12. Dan Multer (comment #9) Thank you for reading and for your comment.

    I appreciate your taking the time for those explanations. In answer to your question, “And then what – give the software away for free?”, let me clarify. IBM ViaVoice was not “free” when I used it. If the commercial version of ViaVoice for GNU/Linux were still available I would purchase it today for my friend.

    If we in the GNU/Linux community end up with a “free” speech recognition and processing system for command, control and dictation that will be great. But I nor my friend require it to be “free”. We just want a native application today that will work on GNU/Linux for him. Sadly, that is not yet available it seems.

  13. Jared (comment #10) Thank you for the comment.

    Your explanation of the need for contributions to Voxforge is appreciated. Folks, follow that URL and help with some voice contributions. I understand that North American adult male voices are fairly well represented. The need is for adult female voices, children’s voices, and elderly male and female voices.

  14. McWolf (comment #15), Thank you for reading.

    Here is part of the information about ORCA from the web site:

    Orca is a free, open source scriptable screen reader. Using various combinations of speech, braille, and magnification, Orca helps provide access to applications and toolkits that support the AT-SPI (e.g., the GNOME desktop).

    ORCA is not what is needed for my friend. He can see just fine. He needs input capability that does not involve the use of one’s hands.

  15. Is there any mileage in encouraging IDE developers to “scratch an itch” so that they can avoid RSI while coding? Hands-free usage is not just for the clinically disabled; enlightened self-interest could help here.
    – I know nothing about this field, but I guess all IDEs have some sort of rule engine for grammars (in code language, however rudimentary), the ability to apply shortcuts to switch edit modes (think refactoring tools), extensible document models, and the ability to accept text input. Might it be that some of these could be extended into a language model? It seems the acoustic model and speech engine are both firmly in the aural domain, but the control aspects of the problem may in part be resolved this way. My tuppence worth – hope someone reads this and helps.
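The rule-engine idea above can be sketched in a few lines: a table mapping recognized phrases to actions, with anything unmatched treated as dictated text. This is an illustrative toy, not part of any existing project; the phrase table and actions are made up, and in a real system the input strings would come from whatever speech engine is in use.

```python
# Toy command-and-control dispatcher: maps recognized phrases to actions.
# In a real system the phrase would come from a speech engine such as
# Julius or Sphinx; here it is just a plain string.

def make_dispatcher(grammar):
    """Return a function that looks up a recognized phrase in the grammar
    and runs the matching action, falling back to plain text insertion."""
    def dispatch(phrase, buffer):
        action = grammar.get(phrase.strip().lower())
        if action is not None:
            return action(buffer)
        # No command matched: treat the phrase as dictated text.
        return buffer + phrase + " "
    return dispatch

def delete_last_word(buf):
    """Drop the final word from the buffer, if there is one."""
    trimmed = buf.rstrip()
    return trimmed.rsplit(" ", 1)[0] + " " if " " in trimmed else ""

# Hypothetical grammar: phrase -> action on the text buffer.
grammar = {
    "delete last word": delete_last_word,
    "clear buffer": lambda buf: "",
}

dispatch = make_dispatcher(grammar)
text = ""
for utterance in ["hello world", "delete last word", "goodbye"]:
    text = dispatch(utterance, text)
print(text.strip())  # → hello goodbye
```

The point of the sketch is that the grammar is just data: an IDE (or any front-end) could expose its existing shortcut actions through such a table without knowing anything about the acoustic side.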

  16. Reply to number 12:

    Just a thought… Have you used the ‘gaming version’ enhancements of WINE?

    If you have some kind of voice input working, you might also speak to Nintendo about some of their control input. Since the game pad and other Wii controllers are essentially just devices that make the program work, it would be a good idea to address this to them, as they are looking for more avenues to enter all areas of computer usage with their systems. They might even have an application already, since I believe they have voice input in some games, if only sound. Just a thought. (Hmmm? Thinking… kinda made muh head hurt)

  17. I actually just tried out several voice recognition software packages, and had varying degrees of failure.

    I agree that the software is far too hard to use at this point, and could use some spiffying up on the usability end. I’ve got quite a bit of experience with GTK and Python, so I’m tempted to start making an interface for Sphinx (which appears to have Python bindings).

  18. Hello Gene,

    I just wanted to mention that there is a related project that you are invited to participate in all you want. That’s how it is when no one gets paid.

    The project is VINUX – the project is for Visually Impaired users and we’ve used both pure Debian and now Ubuntu. We are very much interested in helping those with other impairments and we’ve discussed what you have mentioned on our development list (see google groups).


    Best wishes,


  19. Hello everybody

    I have also been looking for quite a long time for a workable speech-to-text solution, but overall I could not find anything that works really well under Linux.

    It would be great to have a good solution which enables me to record notes on my mobile or any other device and turn them into text later.

    A project which does look serious to me was

    Greetings Dr.Allcome

  20. The main problem with Linux speech-to-text (all platform independent accuracy debates aside) is with building a UI. Nothing will *ever* reach the level of integration of that found in Dragon Naturally Speaking, due to the fact that there is such a wide variety of graphical libraries at play in Linux. Not to mention the fact that every damn Linux distro wants to be different… I mean, don’t get me wrong, I love Linux. I use it as a primary platform both at work & home, and I love it. It’s the very essence of personal freedom spawned from the creative minds of those both altruistic and intelligent. But until some level of uniting happens, there will always be problems with implementing a UI that works across all these variations.

  21. Linux and Unix have an elegant, mature, and versatile serial user interface that can readily be extended to flexibly support voice-to-text and text-to-voice: the Unix command line interface (using your choice of shell, probably bash?).

    An extended emacs mode could also provide a suitable UI framework…


    Lar Kaufman

  22. I for one would be glad to pay for a speech-control and speech-to-text program that works under Linux. I am an avid Linux end-user and have severe arthritis. I am not a coder and can’t contribute very much except a little money and some dictation for the Voxforge project. If a few hundred people like myself were willing to commit to between $5 and $10/mo and a few hours of work, I think that someone with the skills to get the program running would then be willing to commit to the work. Let’s see if we can’t get it going.

  23. What’s so wrong with an 80% solution? A generic and portable utility that simply takes dictation, converts it to text, and stores it in its own text box, then assists the user in copying it from there and pasting it into the app of their choice. Further integration would certainly be great and necessary for some individuals, but this much of the entire functional feature set would still be great by itself and could form the working technical basis for further front-end integration for all kinds of desktops.
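The hand-off step of that 80% solution can be sketched very simply: push the finished chunk of dictated text onto the clipboard so any application can paste it. This is a minimal sketch, assuming the common `xclip` utility is available on the user's X desktop; the function names are made up for illustration.

```python
# Sketch of the "80% solution" hand-off: take a finished chunk of
# dictated text and push it onto the X clipboard for pasting anywhere.
# Assumes the xclip command-line utility is installed.

import subprocess

def clipboard_command(selection="clipboard"):
    """Build the xclip command line for writing to the given X selection."""
    return ["xclip", "-selection", selection]

def copy_to_clipboard(text):
    """Pipe dictated text into xclip so any application can paste it."""
    subprocess.run(clipboard_command(), input=text.encode("utf-8"), check=True)
```

Usage would be a single call, e.g. `copy_to_clipboard("Dear Sir, ...")`, after which the user pastes into Kmail, Evolution, or any other application; nothing about the receiving application needs to be known, which is precisely why the 80% approach is so portable.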
