Voice Dictation via Voxtype

I use Voxtype on Debian for local dictation in terminal and GUI applications. This post documents the setup I currently run.

§Core config

My main configuration lives at ~/.config/voxtype/config.toml.

After trying several engines, SenseVoice (sensevoice in the config file) is my default for day-to-day dictation. It is accurate enough for general writing, and latency is much lower than Whisper on my machine.

Engines and models I tested:

Whisper: large-v3-turbo
Moonshine: base
Parakeet: parakeet-tdt-0.6b-v3
SenseVoice: small-fp32 (current choice)

Core settings:

engine = "sensevoice"

[hotkey]
key = "F13"
modifiers = []
mode = "toggle"
enabled = true

[sensevoice]
model = "small-fp32"

[moonshine]
model = "base"

[parakeet]
model = "parakeet-tdt-0.6b-v3"

[whisper]
model = "large-v3-turbo"
language = ["en", "zh"]
translate = false

[output]
mode = "paste"
# adjusted for my keyd remapping; without it this would be ctrl+shift+v or ctrl+v
paste_keys = "super+rightshift+c"
fallback_to_clipboard = true

I leave the other engine sections in the configuration file as-is. The top-level engine = "..." setting determines which engine Voxtype actually uses.
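
To confirm which engine the file actually selects, I can read the top-level key directly. This is a minimal sketch, assuming the engine key sits above the first [section] header as it does in my file. active_engine is a helper name I made up for illustration; it is not a Voxtype command.

```shell
# Hypothetical helper (not part of Voxtype): print the top-level engine value.
# Assumes the engine key appears before the first [section] header and has
# no trailing comment on the same line.
active_engine() {
    awk '
        /^\[/ { exit }                          # stop at the first table header
        /^engine[ \t]*=/ {
            sub(/^engine[ \t]*=[ \t]*/, "")     # drop the key and the equals sign
            gsub(/"/, "")                       # drop the surrounding quotes
            print
            exit
        }
    ' "$1"
}

# Usage: active_engine ~/.config/voxtype/config.toml
```

An engine key that only appears inside a section such as [whisper] is ignored on purpose, since only the top-level setting picks the engine.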

§Debian package install

I install the Debian package from the release page.

Release page: https://github.com/peteonrails/voxtype/releases/

At the time of writing, the package file is voxtype_0.6.3-1_amd64.deb.

curl -LO https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype_0.6.3-1_amd64.deb
sudo apt install ./voxtype_0.6.3-1_amd64.deb
voxtype --version
voxtype setup model
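
For later upgrades, the download URL can be derived from the version number. This is a sketch only: it assumes future releases keep the same v<version>/voxtype_<version>-1_amd64.deb naming as 0.6.3, which I have not verified, and voxtype_deb_url is a hypothetical helper name.

```shell
# Hypothetical helper: build the .deb download URL for a given version.
# Assumes the release naming scheme stays the same as for 0.6.3.
voxtype_deb_url() {
    version="$1"
    printf 'https://github.com/peteonrails/voxtype/releases/download/v%s/voxtype_%s-1_amd64.deb\n' \
        "$version" "$version"
}

# Usage: curl -LO "$(voxtype_deb_url 0.6.3)"
```

If the naming changes, the release page remains the place to check.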

§Switching backends (Whisper vs ONNX)

Voxtype backend selection has two layers:

Backend binary (system-level):
- sudo voxtype setup onnx --enable -> switch to the ONNX binary
- sudo voxtype setup onnx --disable -> switch back to the Whisper binary

Engine selection (config or CLI):
- engine = "sensevoice" / engine = "moonshine" / engine = "whisper"
- or voxtype --engine whisper ...

For optional GPU acceleration:

sudo voxtype setup gpu --enable

Check the active backend/GPU with:

voxtype setup onnx --status
voxtype setup gpu --status
ls -l /usr/bin/voxtype

§System integration

I install ydotool on Debian/KDE with Wayland so text pastes correctly after recognition.

sudo apt install ydotool

The Debian package already ships a user service (/usr/lib/systemd/user/voxtype.service), so I enable it with:

systemctl --user daemon-reload
systemctl --user enable --now voxtype.service

If I change the configuration later, I restart it with:

systemctl --user restart voxtype.service

Check the service status with:

systemctl --user status voxtype.service

I remap F6 to F13 with keyd and then bind Voxtype to F13 to avoid conflicts with common application shortcuts.

/etc/keyd/default.conf:

[main]
f6 = f13

Apply the mapping:

sudo systemctl restart keyd

I also raise inotify limits to avoid watcher errors on Debian:

/etc/sysctl.conf:

fs.inotify.max_user_watches=1048576
fs.inotify.max_user_instances=1024

Load the new values:

sudo sysctl --system
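
To check that the new limits are in effect, the values can be read back from /proc. The following is a rough sketch. check_inotify_limit is a hypothetical helper of mine, and its optional third argument (a directory to read from) exists only so the function can be tried against test files.

```shell
# Hypothetical helper: report whether an inotify limit meets a target.
# $1 = file name under fs.inotify, $2 = minimum wanted value,
# $3 = optional directory to read from (defaults to /proc/sys/fs/inotify).
check_inotify_limit() {
    dir="${3:-/proc/sys/fs/inotify}"
    current=$(cat "$dir/$1") || return 2    # unreadable or missing limit file
    if [ "$current" -ge "$2" ]; then
        echo "$1=$current ok"
    else
        echo "$1=$current below $2"
        return 1
    fi
}

# Usage:
#   check_inotify_limit max_user_watches 1048576
#   check_inotify_limit max_user_instances 1024
```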

This setup gives me fast day-to-day dictation, and SenseVoice offers the best balance of latency and accuracy I have found so far.