I use Voxtype on Debian for local dictation in terminal and GUI applications. This post documents the setup I currently run.

§Core config

My main configuration lives at ~/.config/voxtype/config.toml.

After trying several engines, I settled on SenseVoice (sensevoice in the config file) as my default for day-to-day dictation. It is accurate enough for general writing, and its latency is much lower than Whisper's on my machine.

Engines and models I tested:

  • Whisper: large-v3-turbo
  • Moonshine: base
  • Parakeet: parakeet-tdt-0.6b-v3
  • SenseVoice: small-fp32 (current choice)

Core settings:

engine = "sensevoice"

[hotkey]
key = "F13"
modifiers = []
mode = "toggle"
enabled = true

[sensevoice]
model = "small-fp32"

[moonshine]
model = "base"

[parakeet]
model = "parakeet-tdt-0.6b-v3"

[whisper]
model = "large-v3-turbo"
language = ["en", "zh"]
translate = false

[output]
mode = "paste"
# keyd remaps my keys, so this chord stands in for the usual ctrl+shift+v or ctrl+v
paste_keys = "super+rightshift+c"
fallback_to_clipboard = true

I leave the other engine sections in the configuration file as-is. The top-level engine = "..." setting determines which engine Voxtype actually uses.
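
With several engine sections in one file, it helps to confirm which engine the top-level key actually selects. This is a minimal sketch, assuming the key is written in the simple engine = "name" form shown above and appears before any [section] header; active_engine is a hypothetical helper name, not a Voxtype command.

```shell
# Hypothetical helper: print the value of the top-level `engine` key.
# Assumes the key is written as: engine = "name", before any [section] header.
active_engine() {
    config="${1:-$HOME/.config/voxtype/config.toml}"
    # Quit at the first [section] so keys inside tables are never matched.
    sed -n -e '/^\[/q' \
        -e 's/^engine[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$config"
}
```

Called with no argument, it reads ~/.config/voxtype/config.toml; a different path can be passed as the first argument.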

§Debian package install

I install the Debian package from the release page.

Release page: https://github.com/peteonrails/voxtype/releases/

At the time of writing, the package file is voxtype_0.6.3-1_amd64.deb.

curl -LO https://github.com/peteonrails/voxtype/releases/download/v0.6.3/voxtype_0.6.3-1_amd64.deb
sudo apt install ./voxtype_0.6.3-1_amd64.deb
voxtype --version
voxtype setup model
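
The version number appears in both the release tag and the file name, so the download URL can be built from a single value when upgrading. This is a sketch that assumes later releases keep the naming pattern seen for 0.6.3, which is worth checking against the release page; deb_url is a hypothetical helper.

```shell
# Hypothetical helper: build the .deb download URL for a given version.
# Assumes the v<version> tag and voxtype_<version>-1_amd64.deb naming seen for 0.6.3.
deb_url() {
    version="$1"
    printf 'https://github.com/peteonrails/voxtype/releases/download/v%s/voxtype_%s-1_amd64.deb\n' \
        "$version" "$version"
}

deb_url 0.6.3
```

For 0.6.3 this prints the same URL used in the curl command above, so it can be dropped in as curl -LO "$(deb_url 0.6.3)".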

§Switching backends (Whisper vs ONNX)

Voxtype backend selection has two layers:

  1. Backend binary (system-level):
    • sudo voxtype setup onnx --enable -> switch to ONNX binary
    • sudo voxtype setup onnx --disable -> switch back to Whisper binary
  2. Engine selection (config/CLI):
    • engine = "sensevoice" / engine = "moonshine" / engine = "whisper"
    • or voxtype --engine whisper ...

For optional GPU acceleration:

sudo voxtype setup gpu --enable

Check the active backend/GPU with:

voxtype setup onnx --status
voxtype setup gpu --status
ls -l /usr/bin/voxtype
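
The ls -l check can be reduced to a single resolved path. This is a sketch that assumes /usr/bin/voxtype is a symlink to the active backend binary, which the ls -l check suggests but which I have not confirmed from the package itself; active_binary is a hypothetical helper.

```shell
# Hypothetical helper: print the file that a path finally resolves to.
# Assumes /usr/bin/voxtype is a symlink to the active backend binary.
active_binary() {
    # readlink -f (GNU coreutils) follows every symlink in the chain.
    readlink -f "${1:-/usr/bin/voxtype}"
}
```

Running it before and after sudo voxtype setup onnx --enable should show whether the target changed.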

§System integration

I install ydotool on Debian/KDE with Wayland so text pastes correctly after recognition.

sudo apt install ydotool

The Debian package already ships a user service (/usr/lib/systemd/user/voxtype.service), so I enable it with:

systemctl --user daemon-reload
systemctl --user enable --now voxtype.service

If I change the configuration later, I restart it with:

systemctl --user restart voxtype.service

Check the service status with:

systemctl --user status voxtype.service

I remap F6 to F13 with keyd and then bind Voxtype to F13 to avoid conflicts with common application shortcuts.

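/etc/keyd/default.conf (the relevant mapping; keyd also expects an [ids] section in this file to select which keyboards it applies to):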

[main]
f6 = f13

Apply the mapping:

sudo systemctl restart keyd

I also raise inotify limits to avoid watcher errors on Debian:

/etc/sysctl.conf:

fs.inotify.max_user_watches=1048576
fs.inotify.max_user_instances=1024

Load the new values:

sudo sysctl --system
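
To confirm the new limits took effect, the live values can be compared with the ones set above. This is a minimal sketch; check_inotify is a hypothetical helper, and the optional directory argument exists only so the check can be pointed somewhere other than /proc.

```shell
# Hypothetical helper: succeed only if both inotify limits are at least
# the values written to /etc/sysctl.conf above.
check_inotify() {
    dir="${1:-/proc/sys/fs/inotify}"   # the kernel's live values by default
    status=0
    for pair in max_user_watches=1048576 max_user_instances=1024; do
        name="${pair%%=*}"             # text before the "="
        want="${pair#*=}"              # text after the "="
        have="$(cat "$dir/$name")"
        if [ "$have" -ge "$want" ]; then
            echo "ok   $name=$have"
        else
            echo "low  $name=$have (wanted $want)"
            status=1
        fi
    done
    return "$status"
}
```

It prints one line per setting and returns non-zero if either value is below the target, so it can be used in a shell conditional.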

This setup gives me fast day-to-day dictation, and SenseVoice offers the best balance of latency and accuracy I have found so far.