trillion operations per second.
Used desktop, $50 (Lenovo M73). Nvidia 1030, $75 each (x2). This runs Mycroft, a local copy of Wikipedia, and Mimic2 easily. I actually have two 1030s, so it can do Kaldi or DeepSpeech as well. Kaldi works better than DeepSpeech right now, but it's a bit slower. Performance for everything but STT is quite reasonable. Mimic and DeepSpeech are both working toward lowering the requirements needed for usefulness as well. *Edited to add: it also runs the personal backend and frontend bits easily.
I have a Neural Compute Stick; its limited model compatibility makes it moot for use with Mycroft currently.
I'm loving the idea of a Mycroft local server… I have two recently decommissioned ProLiant servers and a few Quadro cards lying around. I'd be happy to lend them to this project if it proceeds… Keep me posted.
New to Mycroft, but VERY excited about the project and the personal server. Like others, I run a DL380 G7 at home with VMware. I also have a fairly powerful Linux desktop with 32 GB RAM, a 9-core AMD processor, and a lower-powered GPU. Looking forward to learning and growing with you guys!
A quick and dirty TTS backend for local mimic2 instances: https://github.com/el-tocino/localcroft
This one basically just pulls the WAV from Mimic and returns it, skipping visemes and some other (useful) safety checks, but it does work locally. It keeps local copies of responses, which speeds up repeat responses quite a bit. It will work…poorly…with very long responses, odd numbers, etc. Also be aware that the first response from a newly started mimic2 demo_server instance will probably trip the request timeout; subsequent responses should work well. After restarting, load the web page and verify it responds so you can avoid this.
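The caching described above can be sketched as a small helper: hash the utterance text to a stable filename, serve the cached WAV if it exists, and only hit the local mimic2 demo_server on a miss. The URL template, cache directory, and function names here are assumptions for illustration, not the actual names used in the localcroft repo.

```python
import hashlib
import os
import urllib.parse
import urllib.request

# Assumed local mimic2 demo_server endpoint and cache location.
MIMIC2_URL = "http://localhost:3000/api/tts?text={}"
CACHE_DIR = "/tmp/mimic2_cache"

def cache_path(text):
    """Map an utterance to a stable cache filename via its SHA-1 hash."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".wav")

def get_wav(text):
    """Return WAV bytes for `text`, fetching from mimic2 only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = cache_path(text)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    url = MIMIC2_URL.format(urllib.parse.quote(text))
    with urllib.request.urlopen(url) as resp:
        wav = resp.read()
    with open(path, "wb") as f:
        f.write(wav)
    return wav
```

Hashing the text (rather than using it directly as a filename) sidesteps path-length and special-character problems, which is why repeat responses come back fast: the second request for the same sentence never touches mimic2.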
If you're running a local copy of DeepSpeech, Mycroft can already be pointed at your own instance via the local config*. I am not yet enamored of this option, though I'm going to try making a custom language model to see if it works better for me. This repo has the script I use to start it up; adjust to your local settings as needed: https://github.com/el-tocino/DSSS
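For reference, pointing Mycroft at a local DeepSpeech server is done in the `stt` section of your local `mycroft.conf`. A sketch of what that fragment looks like follows; the host and port are placeholders for wherever your own instance listens, so adjust to your setup.

```json
{
  "stt": {
    "module": "deepspeech_server",
    "deepspeech_server": {
      "uri": "http://localhost:8080/stt"
    }
  }
}
```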
Between these, you’re 95% of the way to running local. The remainder is currently left as an exercise for the reader, and will get addressed at some point in the future by the core folks. Also skills will still do their thing, of course.
Can you give a hint about the machine requirements (CPU, RAM, etc.) for the deepspeech-server?
Is it feasible on desktop CPUs, or should I stay away when I don't have a GPU?
What is necessary for, say, 1:1 performance (1 second of speech requiring 1 second of DeepSpeech processing)?
You'd want a GPU for close-to-real-time processing. It doesn't need to be a massive one, but speed costs; how fast can you afford to go? I use an Nvidia 1030 and don't see much of a delay with it.
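The 1:1 measure asked about above is usually called the real-time factor (RTF): processing time divided by audio duration, where 1.0 means a second of speech takes a second to transcribe and anything below 1.0 is faster than real time. A minimal sketch for measuring it against any STT call (the function name here is just illustrative):

```python
import time

def real_time_factor(transcribe, wav_seconds):
    """Measure STT speed as processing time / audio length.

    `transcribe` is any zero-argument callable that runs the STT step;
    `wav_seconds` is the duration of the audio clip being transcribed.
    RTF == 1.0 means exactly real time; lower is faster.
    """
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return elapsed / wav_seconds
```

Run it a few times and ignore the first call: model warm-up (loading weights onto the GPU) usually makes the first transcription much slower than steady state, which matches the first-response timeout behavior mentioned earlier in the thread.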
Anything works for me. I used flaskex as a template, but I have nothing against replacing it; I just wanted something quick to enable pairing so I could develop the backend.
Try running/checking the examples in the “examples” folder.
It's packaged so you can just pip install it. Those commands launch the server from inside a Python script; eventually I'll make it so you can just run the module, but for now write that in any Python file and run it.
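Since the backend is Flask-based (built from the flaskex template, per the earlier post), "write that in any Python file and run it" amounts to something like the following. This is a hypothetical stand-in, not the personal backend's actual API: the app-factory name, route, and port are all assumptions.

```python
from flask import Flask

def create_app():
    """Build a minimal Flask app with a stub pairing endpoint."""
    app = Flask(__name__)

    @app.route("/pair", methods=["GET"])
    def pair():
        # Stub response so a Mycroft device can complete pairing locally.
        return {"paired": True}

    return app

def main():
    # Bind to all interfaces so a Picroft elsewhere on the LAN can reach it.
    create_app().run(host="0.0.0.0", port=5000)
```

Calling `main()` from any script starts the server; this is the same pattern the quoted launch commands follow, just wrapped in a function.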
No need for separate VMs, either; you can run them together on one server or, like me, on the Picroft itself. Separate machines are still an option if you want that.
OK, it seems I have it running now. I can start the Mycroft CLI, and when I ask it to tell me a joke I do get a (bad) joke as a reply. I do see some errors in the logs… will have a look at them.
I'm installing a GUI on my server now (Ubuntu 18 server with no desktop) so I can have a look at the GUI. Maybe then I can have a go at helping you out with the frontend.
I'm completely new here but looking to help. I'm excited to see the great work already underway on a personal home server. I have a local copy of DeepSpeech running with the model available from the DeepSpeech website. I've run it with a couple of hardware configs, including one with a Titan V, but my STT accuracy is not at all usable, and considerably frustrating. Switching to the Mycroft DeepSpeech backend per PR-1503 provides much improved accuracy. I would like to help with the accuracy of the local config. Does anyone else have accuracy problems with the DeepSpeech STT? Does anyone know for sure whether it's the accuracy of the server model, or whether the Mycroft backend uses additional analytics to improve on the raw DeepSpeech conversion? I suspect both a better model trained on the tagging done via the Mycroft website and perhaps additional processing. My investigations continue; I'll post back if I discover anything of value.
I made a minor update to the local DeepSpeech STT. https://github.com/el-tocino/localcroft
It's not perfect and could probably use an update to pass the file handle directly instead of writing out to a ramdisk, but it works slightly better from my Picroft to the local DS instance than the default method. Check the saved WAV (at /ramdisk/ds_audio.wav) to see if it sounds like what you expect and whether there is residual noise in front. The trim bits are there to remove the start_listening noise that sat at the front of my audio clips; your instance may vary.
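The "trim bits" idea, cutting the start_listening chime off the front of the clip before DeepSpeech sees it, can be done with nothing but the stdlib `wave` module. A sketch under stated assumptions: the 0.5 s default is only a guess, since how much to trim depends on your device, and the function name is mine, not the repo's.

```python
import wave

def trim_leading(in_path, out_path, seconds=0.5):
    """Write a copy of a WAV file with the first `seconds` of audio removed.

    Useful for dropping the start_listening noise that precedes the
    spoken command; tune `seconds` to your own recordings.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        drop = min(int(params.framerate * seconds), params.nframes)
        src.setpos(drop)                     # skip the leading frames
        frames = src.readframes(params.nframes - drop)
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)                # same rate/channels/width
        dst.writeframes(frames)              # header frame count fixed on close
```

Trimming by a fixed duration is crude; inspecting the saved WAV, as the post suggests, is how you find the right value, and a silence-detection pass would be the more robust follow-up.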
Bumping this, as the personal backend repo is up for testing. I was able to get it running and connect a Mycroft instance to it. In conjunction with local DeepSpeech and mimic2 instances, it works completely locally (somewhat… DeepSpeech still isn't great for me).
If you do run it anywhere it can touch the internet, please be sure to use SSL. If you need a certificate, look into Let's Encrypt; they issue no-cost certs. You have to renew every three months, but that can be scripted if you search around.
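The scripted renewal mentioned above is typically a cron job calling certbot, Let's Encrypt's client. Certbot's `renew` subcommand only replaces certificates that are close to expiry, so running it frequently is safe; the schedule below is just one common choice.

```shell
# Crontab entry: attempt renewal twice a day at off-peak minutes.
# `certbot renew` is a no-op unless a cert is within its renewal window.
17 3,15 * * * certbot renew --quiet
```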