
Robot - Text-to-Speech Overlay

Robot is a Windows application that lets you type text on the fly and have it spoken back through one or more audio output devices of your choice.

The application shows a small overlay on the screen when it is active that includes a text input, a message indicating the basic hotkeys, and a status bar. This overlay can only be activated by the focus hotkey so that it cannot interrupt anything.


Download complete package as a .zip: 32-bit | 64-bit

Or download individual files and put them together into a folder:

The executables provided above are built using Visual Studio 14.0. Additional builds using gcc can be found in the packages directory and the builds directory.


The overlay is a small rectangle that stays on top of windows and can only be focused using the hotkey. It is slightly transparent when not active.

F1 for focus; F4 to close

This is what the overlay looks like by default.

While text is being synthesized into a .wav file, a small red bar appears below the message. While the text is being spoken, the bar is blue.

Here are some examples of spoken text using the default voice installed on Windows 7, Microsoft Anna:

Description of required files

This application uses the following files to properly function:

  • robot.exe
    The application itself, which ties all of the following files together.
  • robot.dll
    An additional library file that supplies the hooking functions to temporarily block user input while the application has focus. These functions are required to be in a .dll file rather than the .exe itself since they are hooked globally.
  • voice.vbs
    A VBScript file that converts the given text to a .wav file, which is played back afterwards. The text-to-speech generation is performed using this file for two reasons:
    1. The code is very simple as a VBScript file, likely simpler than it would be in C++.
    2. The voices available depend on whether the process using the Speech API is 32-bit or 64-bit. A 32-bit process usually offers a larger choice of voices, so switching cscript.exe between its 32-bit and 64-bit versions may change which voices can be selected.
  • cscript.exe
    The executable that runs the voice.vbs file. This is included on Windows by default. It can be set to a 32-bit process by running "C:\Windows\SysWOW64\cscript" instead.
  • sox.exe
    An audio processing application that plays the audio to a device and can optionally apply filters. A list of available filters can be found on the SoX documentation page, under "Effects".
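
As a rough illustration of the 32-bit versus 64-bit choice described for cscript.exe, the Python sketch below builds the path to either version. The function name `cscript_path` and its parameters are hypothetical and not part of Robot. It assumes a 64-bit Windows installation, where the 32-bit system binaries live in SysWOW64 and the 64-bit ones in System32.

```python
import ntpath  # Windows path rules, usable on any platform


def cscript_path(prefer_32bit=True, windir=r"C:\Windows"):
    """Return the path of the 32-bit or 64-bit cscript.exe.

    Assumes 64-bit Windows: SysWOW64 holds the 32-bit binaries,
    System32 holds the 64-bit ones.
    """
    folder = "SysWOW64" if prefer_32bit else "System32"
    return ntpath.join(windir, folder, "cscript.exe")


if __name__ == "__main__":
    print(cscript_path(prefer_32bit=True))
    print(cscript_path(prefer_32bit=False))
```

The resulting path is the kind of value that could be passed to the --exe-cscript option.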

robot.exe takes the text the user inputs, passes it to voice.vbs to generate a .wav file, and then passes the .wav file to sox.exe for filtering and playback.
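
The two-step flow above can be sketched in Python as a pair of command builders. This is only an approximation under stated assumptions, not how robot.exe is actually implemented: the function names are hypothetical, the arguments that voice.vbs expects are not documented on this page and are therefore passed through untouched, and using SoX's `-d` (default device) or `-t waveaudio <device>` output is a guess at the playback step.

```python
def build_voice_command(voice_args, cscript="cscript", script="voice.vbs"):
    """Command that runs voice.vbs under cscript.

    voice_args is passed through as-is; the real argument format
    of voice.vbs is not known here (assumption).
    """
    return [cscript, "//Nologo", script, *voice_args]


def build_sox_command(wav_path, filters, sox="sox", device=None):
    """Command that plays wav_path through SoX with optional effects.

    device=None uses SoX's default output (-d); otherwise the Windows
    waveaudio driver is selected by device name (assumption).
    """
    cmd = [sox, wav_path]
    if device is None:
        cmd.append("-d")
    else:
        cmd += ["-t", "waveaudio", device]
    return cmd + list(filters)


if __name__ == "__main__":
    print(build_voice_command(["<voice.vbs arguments>"]))
    print(build_sox_command("speech.wav", ["pitch", "-400", "bass", "+10"]))
```

Each list could be handed to `subprocess.run` in turn, with the second step started only after the first has written the .wav file.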

Usage and settings

Some usage information and options can be displayed by running robot.exe --help. For the sake of brevity, this does not show full descriptions of all available options. Instead, they are described in more detail on this page.

Furthermore, there is no user interface for editing the settings, nor a file to save the settings to. Instead, users can create a batch file that executes robot.exe with any command line options. For example:

robot.exe ^
    --voice "Microsoft Anna" ^
    --rate 0 ^
    --device "<your-audio-device-here>" ^
    --exe-cscript "C:\Windows\SysWOW64\cscript" ^
    --exe-voice "voice.vbs" ^
    --exe-sox "sox" ^
    -x center -y 80 ^
    --filters pitch -400 bass +10
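
For anyone who prefers a script over a batch file, the same invocation can be assembled as an argument list. The sketch below is a hypothetical helper, not something shipped with Robot. It uses only the options shown in the example above and assumes that --filters must come last because it takes all remaining arguments.

```python
def robot_command(options, filters=(), exe="robot.exe"):
    """Build an argument list for robot.exe.

    options: sequence of (flag, value) pairs, kept in the given order.
    filters: SoX effect tokens, appended after --filters at the end
             (assumed to consume the rest of the command line).
    """
    cmd = [exe]
    for flag, value in options:
        cmd += [flag, str(value)]
    if filters:
        cmd += ["--filters", *filters]
    return cmd


if __name__ == "__main__":
    cmd = robot_command(
        [
            ("--voice", "Microsoft Anna"),
            ("--rate", 0),
            ("--exe-cscript", r"C:\Windows\SysWOW64\cscript"),
            ("-x", "center"),
            ("-y", 80),
        ],
        filters=["pitch", "-400", "bass", "+10"],
    )
    print(cmd)  # pass to subprocess.Popen(cmd) to launch the overlay
```

Passing a list rather than a single string avoids having to quote values such as "Microsoft Anna" by hand.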

Example filters

  • Filters 1
    pitch -400 bass +10
  • Filters 2
    tremolo 1000 80 pitch 400 gain 4
  • Filters 3
    chorus 0.4 0.8 20 0.5 0.10 2 -t pitch +200 bass +10 gain 4
  • Filters 4
    pitch +150 phaser 0.6 0.66 3 0.6 2 -t phaser 0.5 0.8 3 0.6 2 -s highpass 1000 gain 15
  • Filters 5
    chorus 0.4 0.8 20 0.5 0.10 2 -t echo 0.9 0.8 33 0.9 echo 0.7 0.7 10 0.2 echo 0.9 0.2 55 0.5 gain 10 bass +40 gain 13
  • Filters 6
    overdrive 10 echo 0.8 0.8 5 0.7 echo 0.8 0.7 6 0.7 echo 0.8 0.7 10 0.7 echo 0.8 0.7 12 0.7 echo 0.8 0.88 12 0.7 echo 0.8 0.88 30 0.7 echo 0.6 0.6 60 0.7 gain 13
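
Each filter line above is a chain of SoX effects, where an effect name is followed by that effect's arguments. The Python sketch below splits such a chain into its individual effects, which can help when reading or tweaking the longer examples. The function name and the effect list are assumptions for illustration: the list covers only the effects used on this page, not everything SoX supports.

```python
# Effect names that appear in the example filters on this page.
SOX_EFFECTS = {
    "bass", "chorus", "echo", "gain", "highpass",
    "overdrive", "phaser", "pitch", "tremolo",
}


def split_effects(chain):
    """Split a filter string into (effect, [arguments]) pairs."""
    effects = []
    for token in chain.split():
        if token in SOX_EFFECTS:
            effects.append((token, []))       # start a new effect
        elif effects:
            effects[-1][1].append(token)      # argument of current effect
        else:
            raise ValueError(f"argument before any effect name: {token!r}")
    return effects


if __name__ == "__main__":
    for name, args in split_effects("tremolo 1000 80 pitch 400 gain 4"):
        print(name, args)
```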