Hub and spoke model for STT

mike99mac · December 1, 2025, 9:59pm

Working with Ryan, a comp sci student, we got whisper running on a RasPi 5 and an Nvidia Jetson Orin Nano.

The RasPi (a “spoke”) transcribes the JFK sample in about 2.5 seconds. After much digging to get the Nvidia GPU actually used, the Jetson (the “hub”) does it in about 1.2 seconds, roughly twice as fast.

lsenv
...
STT speeds:
Hub:
executing cmd: curl http://kinglion:5002/stt -s -H Content-Type: audio/wav --data-binary @/home/pi/minimy/jfk.wav
{"text":"and so my fellow americans ask not what your country can do for you ask what you can do for your country."}

real    0m1.245s
user    0m0.128s
sys     0m0.016s
Spoke:
executing cmd: curl http://localhost:5002/stt -s -H Content-Type: audio/wav --data-binary @/home/pi/minimy/jfk.wav
{"text":"and so my fellow americans ask not what your country can do for you ask what you can do for your country"}

real    0m2.539s
user    0m0.039s
sys     0m0.000s

The model we are working toward is to have a fast “hub” machine do the STT, but if it is down, the “spoke” can still do so locally.
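
For anyone who wants to try the same idea, here is a rough Python sketch of the hub-first, spoke-fallback logic. It is not our actual code. The two URLs are the ones from the timing output above; the function names `post_wav` and `transcribe` and the 3-second timeout are made up for illustration.

```python
import urllib.request

# URLs come from the timing output above; everything else here is an assumption.
HUB_URL = "http://kinglion:5002/stt"
SPOKE_URL = "http://localhost:5002/stt"


def post_wav(url, wav_bytes, timeout):
    """POST raw WAV bytes to an STT endpoint and return the response body as text."""
    req = urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def transcribe(wav_bytes, urls=(HUB_URL, SPOKE_URL), timeout=3.0, post=post_wav):
    """Try each STT server in order; return (url, body) from the first that answers."""
    last_error = None
    for url in urls:
        try:
            return url, post(url, wav_bytes, timeout)
        except OSError as err:
            # URLError, HTTPError, timeouts and refused connections are all OSError.
            last_error = err
    raise RuntimeError(f"no STT server reachable: {last_error}")


if __name__ == "__main__":
    with open("/home/pi/minimy/jfk.wav", "rb") as f:
        used_url, body = transcribe(f.read())
    print(used_url, body)
```

The timeout matters most here: if the hub is powered off rather than just refusing connections, the spoke waits that long before falling back, so it should sit a little above the hub's normal response time.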