
GNU/Linux: Current State of Voice Dictation and Recognition

I have a friend who is suffering from a degenerative chronic disease. It is slowly destroying his ability to use his hands to type and interact with his personal computer. This makes it difficult for him to correspond by e-mail, type documents in a word processor or do any other task that requires much interaction using the keyboard and/or mouse. He recently purchased Dragon NaturallySpeaking to use with his Microsoft Vista-based PC in an attempt to overcome this problem. For now, though, he has held off on training NaturallySpeaking for another reason.

That other reason is that my friend wants me to help him move to GNU/Linux on his new Dell PC (8 GB memory, 500 GB hard disk, a 16x DVD±RW drive plus a second read-only DVD-ROM drive, AMD Phenom 9750 2.4 GHz quad-core processor, Vista Home Premium OS, MS Works, etc.). He will run the Vista that came with it in a virtual machine in case he needs something that is not available on Linux. One of the primary requirements is assistive software that will allow him to interact with the PC for command and control. I am happy to report there is at least this level of voice support available. But this assistive software must also take dictation fairly accurately so he can more easily use e-mail and a word processor.

For the past three weeks my research assistant and I have been searching the WWW for dictation software that works under GNU/Linux. We have found this to be an exercise in frustration with several dead ends. My assistant found projects that some forum post or blog post claimed were meant for dictation and sent me the URLs for follow-up. Yet, when I followed up on the projects, they were either “dead”, could not be made to actually work for dictation or were active but in a state of perpetual research. The developers of the latter seem to be more interested in getting a master's degree or a PhD than in actually driving the projects forward to be usable for people with real needs. Up until recently no project seems to have been actively solving the problem of desktop dictation software for modern GNU/Linux systems. If any was, neither I nor my assistant can find evidence of it in the form of even alpha-level software to try. This is quite discouraging for those of us currently seeking a solution for a disabled friend or loved one who needs voice dictation to be able to effectively use a GNU/Linux-based PC.

One particularly discouraging find was that IBM has stopped development of ViaVoice for GNU/Linux and has pulled what was available from the market. As a former user of ViaVoice on IBM OS/2 Warp 4 I know it to be an excellent product. I did not need ViaVoice when I was using it. I just used it because it was available and had a bit of a “This is neat!” factor about it for me. I know from experience with it back then that it can be used as a voice dictation system once properly trained. It also handles voice command and control quite well. I had no reason to think the GNU/Linux version would no longer be available and had hopes I could get that for my friend. Unfortunately, someone or some committee at IBM has decided our disabled friends do not need ViaVoice any longer. This is shameful. Some things should be done regardless of the “bottom line”. Assistive technology for our disabled fellow man is one such thing.

With the discouraging part behind us I want to look at what is being done and recent developments as of February 2010. Just recently the simon project announced an upcoming 1½ year benefit project on its web log. The announcement includes the following:

Abstract:
With the help of verbal control provided by simon using terms of everyday language, useful scenarios and areas of application shall be created to enable an easy use of new communication technologies such as the internet, telephone and multimedia applications for elderly people. Moreover, additional security can be provided, for example, a reminder for the user to take a medication.

While this announcement does not specifically mention work on solving the dictation problem, it is at least proof that the assistive software simon is moving forward with its user voice interface. We can only hope that the research done will turn simon into a useful, dictation-capable voice interface for GNU/Linux. Unfortunately, simon uses the HTK toolkit, which is not GPL-licensed and has its own rather restrictive license that includes this clause:

2.2 The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.

This restrictive license means the HTK toolkit cannot be distributed with a GNU/Linux distribution. It also means simon is unlikely to be included in many distributions, as it relies on this toolkit for the heavy lifting of back-end speech processing.

I am very glad to report there is hope for open source, GNU/Linux distribution-friendly back-end processing with the CMU Sphinx project, started in the School of Computer Science at Carnegie Mellon University. CMU Sphinx uses a BSD-style license, which does not restrict redistribution. With my apologies to the project members with whom I have interacted, it does seem the CMU Sphinx project is one of those that is interested only in the speech engine for research purposes, to get those master's degrees and PhDs I mentioned above. If there is any work being done by this project on a useful front-end, I could not find it. While a back-end processor is necessary for speech recognition and dictation, it is only half of the solution. I think a PhD or three could be had by working on that useful front-end voice dictation system for GNU/Linux.

There are other projects I could mention but I will leave those for anyone who wants to comment about them.

Here is what I see as needed today. Some project needs to work on both speech back-end processing and voice dictation using a license or licenses that allow free distribution of the source code and binaries for the entire project. Then this project needs to put the pieces together so the FOSS community can start using them right away, reporting bugs and making feature requests. We need a 1960s-era “moon shot” starting in 2010 for GNU/Linux voice dictation for our disabled fellow man. However, instead of taking several years of careful research and planning to “get to the moon”, this project would follow the model of release early, release often and let us all work on this together, starting with an early alpha as soon as possible. After all, no one is likely to die from trying alpha-level software, and we will all benefit from a usable voice processing system for dictation sooner.



Edit Mon Mar 1 13:20:03 CST 2010: Correct the phrase “second read only Blu-ray drive” as that was incorrect. The correct drive type is now mentioned.


28 comments to GNU/Linux: Current State of Voice Dictation and Recognition

  • I actually just tried out several pieces of voice recognition software, and had varying degrees of failure.

    I agree that the software is far too hard to use at this point, and could use some spiffying up on the usability end. I’ve got quite a bit of experience with GTK and Python, so I’m tempted to start making an interface for Sphinx (which appears to have Python bindings).

  • David J. Ring, Jr.

    Hello Gene,

    I just wanted to mention that there is a related project that you are invited to participate in all you want. That’s how it is when no one gets paid.

    The project is VINUX – the project is for Visually Impaired users and we’ve used both pure Debian and now Ubuntu. We are very much interested in helping those with other impairments and we’ve discussed what you have mentioned on our development list (see google groups).

    See http://www.vinux.org.uk/

    Best wishes,

    David

  • Dr.Allcome

    Hello everybody

    I have also been looking for quite a long time for a workable speech-to-text solution, but overall I could not find anything that works really well under Linux.

    It would be great to have a good solution which enables me to record notes with my mobile or any other device and create a text out of it later.

    One project that does look serious to me is http://www.voxforge.org/.

    Greetings Dr.Allcome

  • youshoulduseunix

    The main problem with Linux speech-to-text (all platform independent accuracy debates aside) is with building a UI. Nothing will *ever* reach the level of integration of that found in Dragon Naturally Speaking, due to the fact that there is such a wide variety of graphical libraries at play in Linux. Not to mention the fact that every damn Linux distro wants to be different… I mean, don’t get me wrong, I love Linux. I use it as a primary platform both at work & home, and I love it. It’s the very essence of personal freedom spawned from the creative minds of those both altruistic and intelligent. But until some level of uniting happens, there will always be problems with implementing a UI that works across all these variations.

  • Lar Kaufman

    Linux and Unix have an elegant, mature, and versatile serial user interface that can readily be extended to flexibly support voice-to-text and text-to-voice: the Unix command line interface (using your choice of shell, probably bash?).

    An extended emacs mode could also provide a suitable UI framework…

    -lar

    Lar Kaufman

  • [...] Theres sphinx in the Ubuntu repository but do read this http://blog.eracc.com/2010/02/28/gnu…d-recognition/ [...]

  • I for one would be glad to pay for a speech-control and speech-to-text program that works under Linux. I am an avid Linux end-user and have severe arthritis. I am not a coder and can’t contribute very much except a little money and some dictation for the Voxforge project. If a few hundred people like myself were willing to commit to between $5 and $10/mo and a few hours of work, I think that someone with the skills to get the program running would then be willing to commit to the work. Let’s see if we can’t get it going.

  • David Mc

    What’s so wrong with an 80% solution? A generic and portable utility that simply takes dictation, converts it to text, stores it in its own text box and then assists the user with copying it from there and pasting it into the app of their choice. Further integration would certainly be great and necessary for some individuals, but even this much of the full feature set would be great by itself and could form the working technical basis for further front-end integration for all kinds of desktops.
