Voice Dictation via Voxtype
I use Voxtype on Debian for local dictation in terminal and GUI applications. This post documents the setup I currently run.
§Core config
My main configuration lives at ~/.config/voxtype/config.toml.
After trying several engines, I settled on SenseVoice (sensevoice in the config file) as my default for day-to-day dictation. It is accurate enough for general writing, and its latency is much lower than Whisper's on my machine.
Engines and models I tested:
- Whisper: large-v3-turbo
- Moonshine: base
- Parakeet: parakeet-tdt-0.6b-v3
- SenseVoice: small-fp32 (current choice)
Core settings:
    engine = "sensevoice"
I leave the other engine sections in the configuration file as-is. The top-level engine = "..." setting determines which engine Voxtype actually uses.
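Because the unused engine sections stay in the file, it is easy to lose track of which engine is actually selected. The following is a minimal bash sketch, not part of Voxtype, that prints the top-level engine value and ignores anything inside a [section]. The function name voxtype_active_engine is hypothetical, and the sketch assumes the value is a double-quoted string on a single line, as in the setting shown above.

```shell
# Hypothetical helper: print the top-level `engine` value from a config file.
# Reading stops at the first [section] header, so keys that live inside
# per-engine sections are never reported.
voxtype_active_engine() {
  local config="${1:-$HOME/.config/voxtype/config.toml}"
  awk '
    /^[[:space:]]*\[/ { exit }               # first table header ends the top level
    /^[[:space:]]*engine[[:space:]]*=/ {
      if (match($0, /"[^"]*"/)) {            # take the quoted value on this line
        print substr($0, RSTART + 1, RLENGTH - 2)
        exit
      }
    }
  ' "$config"
}
```

Calling voxtype_active_engine with no argument reads ~/.config/voxtype/config.toml; passing a path reads that file instead. It prints nothing if no top-level engine line is found.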
§Debian package install
I install the Debian package from the release page.
Release page: https://github.com/peteonrails/voxtype/releases/
At the time of writing, the package file is voxtype_0.6.3-1_amd64.deb.
    curl -LO https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype_0.6.3-1_amd64.deb
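The version number appears twice in that URL, which makes it easy to get wrong when a new release comes out. As a small bash sketch, the URL can be built from the version instead. The function name voxtype_deb_url is hypothetical, and the "-1" package revision and amd64 architecture are simply copied from the file name above; other releases may use a different naming scheme.

```shell
# Hypothetical helper: build the release download URL for a given version.
# Assumes the naming pattern of the 0.6.3 package; revision and architecture
# default to the values seen in that file name.
voxtype_deb_url() {
  local version="$1" revision="${2:-1}" arch="${3:-amd64}"
  printf 'https://github.com/peteonrails/voxtype/releases/download/v%s/voxtype_%s-%s_%s.deb\n' \
    "$version" "$version" "$revision" "$arch"
}
```

With this, the download becomes curl -LO "$(voxtype_deb_url 0.6.3)". A downloaded .deb can then be installed with sudo apt install ./voxtype_0.6.3-1_amd64.deb, which also pulls in any dependencies the package declares.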
§Switching backends (Whisper vs ONNX)
Voxtype backend selection has two layers:
- Backend binary (system-level):
  - sudo voxtype setup onnx --enable -> switch to the ONNX binary
  - sudo voxtype setup onnx --disable -> switch back to the Whisper binary
- Engine selection (config/CLI):
  - engine = "sensevoice" / engine = "moonshine" / engine = "whisper"
  - or voxtype --engine whisper ...
For optional GPU acceleration:
    sudo voxtype setup gpu --enable
Check the active backend/GPU with:
    voxtype setup onnx --status
§System integration
I install ydotool on Debian/KDE with Wayland so text pastes correctly after recognition.
    sudo apt install ydotool
The Debian package already ships a user service (/usr/lib/systemd/user/voxtype.service), so I enable it with:
    systemctl --user daemon-reload
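Note that daemon-reload only makes systemd re-read its unit files. Starting a unit and having it start on login is normally done with systemctl enable --now. The bash sketch below chains both steps, assuming the unit name from the package path above. The run helper, the voxtype_service_enable function, and the DRY_RUN variable are hypothetical names added for illustration so the commands can be previewed before they are executed.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Reload user units, then enable and start the packaged user service.
voxtype_service_enable() {
  run systemctl --user daemon-reload
  run systemctl --user enable --now voxtype.service
}
```

Running DRY_RUN=1 voxtype_service_enable prints the two systemctl commands without touching the service; running it without DRY_RUN executes them.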
If I change the configuration later, I restart it with:
    systemctl --user restart voxtype.service
Check the service status with:
    systemctl --user status voxtype.service
I remap F6 to F13 with keyd and then bind Voxtype to F13 to avoid conflicts with common application shortcuts.
/etc/keyd/default.conf:
    [main]
Apply the mapping:
    sudo systemctl restart keyd
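For reference, a keyd file that performs an F6 to F13 remap usually needs an [ids] section to say which keyboards it applies to, plus the mapping itself under [main]. The bash sketch below prints one plausible version. It is an assumption about what such a file looks like, not a copy of my actual config: the wildcard * applies the mapping to every keyboard, and the function name keyd_f6_to_f13_conf is hypothetical.

```shell
# Hypothetical helper: print a keyd config that remaps F6 to F13.
# Assumes the mapping should apply to all keyboards ("*" under [ids]).
keyd_f6_to_f13_conf() {
  cat <<'EOF'
[ids]
*

[main]
f6 = f13
EOF
}
```

Calling keyd_f6_to_f13_conf prints the file for review. Piping it into sudo tee /etc/keyd/default.conf would write it, but that overwrites any existing keyd configuration, so check the current file first.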
I also raise inotify limits to avoid watcher errors on Debian:
/etc/sysctl.conf:
    fs.inotify.max_user_watches=1048576
Load the new values:
    sudo sysctl --system
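To confirm that the running kernel picked up the new limit, the current value can be read back from /proc. A minimal bash sketch, with the hypothetical function name inotify_watches_at_least; the optional second argument exists only so a different file can be substituted for the /proc path.

```shell
# Hypothetical helper: succeed if the inotify watch limit is at least the
# wanted value. Defaults to the value set above and to the kernel's /proc file.
inotify_watches_at_least() {
  local wanted="${1:-1048576}"
  local file="${2:-/proc/sys/fs/inotify/max_user_watches}"
  local current
  current="$(cat "$file")" || return 2
  printf 'fs.inotify.max_user_watches = %s (wanted >= %s)\n' "$current" "$wanted"
  [ "$current" -ge "$wanted" ]
}
```

Running inotify_watches_at_least with no arguments prints the current limit and returns a non-zero status if it is still below 1048576, which usually means the sysctl file was not loaded.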
This setup gives me fast day-to-day dictation, and SenseVoice offers the best balance of latency and accuracy I have found so far.