Working with Ryan, a computer science student, we got Whisper running on a RasPi 5 and an Nvidia Jetson Orin Nano.
The RasPi (a “spoke”) transcribes the JFK clip in about 2.5 seconds. After much digging to get the Nvidia GPU to actually be used, the Jetson (the “hub”) does it about twice as fast, in roughly 1.2 seconds.
lsenv
...
STT speeds:
Hub:
executing cmd: curl http://kinglion:5002/stt -s -H Content-Type: audio/wav --data-binary @/home/pi/minimy/jfk.wav
{"text":"and so my fellow americans ask not what your country can do for you ask what you can do for your country."}
real 0m1.245s
user 0m0.128s
sys 0m0.016s
Spoke:
executing cmd: curl http://localhost:5002/stt -s -H Content-Type: audio/wav --data-binary @/home/pi/minimy/jfk.wav
{"text":"and so my fellow americans ask not what your country can do for you ask what you can do for your country"}
real 0m2.539s
user 0m0.039s
sys 0m0.000s
The model we are working toward is to have a fast “hub” machine do the STT, but if it is down, the “spoke” can still do so locally.
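A minimal sketch of that hub-first, spoke-fallback idea, assuming both machines expose the same /stt endpoint seen in the curl commands above and reply with JSON containing a "text" field. The function names, the 3-second timeout, and the injectable `post` argument are illustrative assumptions, not part of the existing setup.

```python
import json
import urllib.request

# URLs taken from the curl commands above; list order is the preference order.
STT_ENDPOINTS = [
    "http://kinglion:5002/stt",   # hub (Jetson Orin Nano)
    "http://localhost:5002/stt",  # spoke (local RasPi 5)
]


def post_wav(url, wav_bytes, timeout):
    """POST raw WAV bytes to url and return the decoded JSON reply."""
    req = urllib.request.Request(
        url, data=wav_bytes, headers={"Content-Type": "audio/wav"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def transcribe(wav_bytes, endpoints=STT_ENDPOINTS, timeout=3.0, post=post_wav):
    """Try each endpoint in order; return (url, text) from the first that answers.

    timeout is an assumed value, chosen only to sit above the hub time
    measured above. post is injectable so the logic can be exercised
    without a network.
    """
    last_error = None
    for url in endpoints:
        try:
            reply = post(url, wav_bytes, timeout)
            return url, reply["text"]
        except (OSError, ValueError, KeyError) as err:
            # OSError covers refused connections, timeouts and HTTP errors;
            # ValueError covers a reply that is not JSON; KeyError a missing "text".
            last_error = err
    raise RuntimeError("no STT endpoint answered") from last_error
```

Calling transcribe() with the bytes of jfk.wav would return the hub's answer when kinglion is reachable and the local answer otherwise. The returned URL shows which machine did the work, which is handy when comparing timings.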