Google Coral Edge TPU

StuartIanNaylor · February 19, 2020, 11:08am

Has anyone tested how much of a boost a Google Coral Edge TPU gives a Pi4 with Picroft?
Deepspeech would benefit but FANN used in Padatious doesn’t?!?
Surprised a tensorflow-lite neural net isn’t preferred over FANN.
Precise is the same I guess.

I name drop the Google product as it will likely have a lot of support, just wondered if someone had tried.
I was just wondering, with the diversification of load that generally runs on Mycroft, could each mic input trigger an instance?
That would really throw a cat amongst the Alexa/Google pigeons…

Dominik · February 19, 2020, 11:38am

No need for a TPU/GPU: “DeepSpeech v0.6 with TensorFlow Lite runs faster than real time on a single core of a Raspberry Pi 4”.

Inference performance is not the problem here, but word error rate (WER).
Even though the current DeepSpeech model has a WER of 7.5% in tests, this is still too much for real-world applications, unless you want to repeat your phrases multiple times until you trigger the correct intent…
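For reference, WER is the word-level edit distance between the recognised text and the reference transcript, divided by the number of reference words. A minimal sketch of that calculation follows; the function name and the sample phrases are made up for illustration and are not taken from DeepSpeech or Mycroft.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words.
example = word_error_rate("turn on the light", "turn on the night")
```

Here `example` is 0.25. As a rough feel for the 7.5% figure: if word errors were independent (a simplifying assumption), a five-word command would come through without any error only about 68% of the time (0.925 to the fifth power).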

StuartIanNaylor · February 19, 2020, 12:08pm

Is that with training or just the pre-trained models?

I should just use my USB headset and give it a go, but waiting for mic and speaker.
I have had a look at a few vids and often there is quite a delay; maybe it was a Pi3?
I was presuming it was wake → STT → Intent Parser → TTS, and that apart from network latency you could accelerate the whole process.

Are there any vids showing it in action on a Pi 4?

Dominik · February 19, 2020, 1:25pm

Another quote from the article I have linked in my previous post: “It achieves a 7.5% word error rate on the LibriSpeech test clean benchmark” - this relates to the pre-trained model for DeepSpeech 0.6. LibriSpeech is not “real world talk” but based on public domain audio books.

The overall delay you observe on the Pi3 has several factors, one being the network latency for STT (with a cloud service). Another factor is that currently this is not a streaming STT, so Mycroft waits for silence to detect the end of the spoken phrase. In noisy environments this may not be possible, so it cuts off after the maximum recording time (10 seconds if I remember correctly). So in the worst case it takes up to 10 seconds after the wake word before the actual STT is started.
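The wait-for-silence behaviour can be sketched roughly like this. This is not Mycroft's actual listener code: the function name, the threshold and the trailing-silence length are hypothetical, and only the 10-second cap comes from the description above.

```python
from typing import Iterable, Tuple


def record_until_silence(
    frame_levels: Iterable[float],
    frame_seconds: float = 0.1,
    silence_threshold: float = 0.02,  # hypothetical: below this a frame counts as silent
    trailing_silence: float = 0.5,    # hypothetical: pause length that ends the phrase
    max_seconds: float = 10.0,        # the cut-off mentioned above
) -> Tuple[float, str]:
    """Return (seconds recorded, reason for stopping)."""
    # Work in whole frames to avoid floating-point drift when adding 0.1 repeatedly.
    silent_frames_needed = round(trailing_silence / frame_seconds)
    max_frames = round(max_seconds / frame_seconds)
    frames = 0
    quiet_frames = 0
    heard_speech = False
    for level in frame_levels:
        frames += 1
        if level >= silence_threshold:
            heard_speech = True
            quiet_frames = 0
        else:
            quiet_frames += 1
        if heard_speech and quiet_frames >= silent_frames_needed:
            return frames * frame_seconds, "silence"
        if frames >= max_frames:
            return frames * frame_seconds, "timeout"
    return frames * frame_seconds, "stream ended"
```

In a quiet room, one second of speech followed by a pause stops the recording after about 1.5 seconds. With constant background noise above the threshold, the loop never sees a silent frame and only returns at the 10-second timeout, which is the worst case described above.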

For DeepSpeech there is a streaming API but this only shifts the problem of “phrase ending detection” to the DeepSpeech side…

And yes, a beefier CPU like the Pi4 will speed up the intent parsing.

StuartIanNaylor · February 19, 2020, 2:48pm

I was looking at the roadmap and just getting up to speed with Mycroft.

Being a noob my assumptions are relatively blind but it would seem generally there is a shift to neural network technologies for key Mycroft components.
I was just interested if anyone had tried tensorflow with the Pi now it has an alternative to GPU-based offerings.

" Identify hardware targets and create recommendations list

As rapidly evolving technologies, the requirements to run full STT and TTS services is a moving target. But recognizing the demands helps shape the system, too, so is valuable to look at early. Examine options such as:

Hosted STT/TTS hardware recommendations and limitations, e.g. “1 GTX 1080 can handle X Mycroft units” or “1 FPGA (brand and type need to be specced) can handle Y Mycroft units”
Look at lower-power options such as STT/TTS passthru with account randomization"

It was just curiosity but I did wonder “1 Coral Edge TPU can handle X Mycroft units” with the Pi4 being able to stream to multiple dumb WiFi speaker/mics.
They are still pretty pricey in comparison to the cost of a Pi4 but $75 gives 4 TOPS and being USB the Pi4 might be able to handle x2.
But my thoughts were along the lines of: could a Pi4 & TPU handle 4 or more Mycroft units? It’s likely they will drop in cost, and overall the price is not far out of line with multiple Google/Amazon offerings.

So much is in the neural network domain that even parsing text semantics from webpages could also be accelerated greatly.
Like in this article, as a semantic-aware webcrawler would be pretty damn awesome: https://medium.com/teleporthq-io/understanding-the-web-parsing-web-pages-semantically-805ef857854d

With the roadmap and current neural net libs chosen, as a noob I am head scratching as to why not tensorcore-light for all?

Also has anyone played with GPU/TPU acceleration? I’m just extremely curious how much load the TPU can handle and what the resultant load on the Pi4 is.
You can still run CPU-based tensorcore-light but at any time add a TPU and gain much acceleration…

There are quite a few now on the market but the Coral edge just seems to be getting the most Pi focus.

Dominik · February 19, 2020, 7:49pm

From what I read here in the Mozilla forums, DeepSpeech currently will not run on the Coral Edge TPU, as the TPU does not support the full TensorFlow Lite function set and DeepSpeech’s TFLite model requires some functions that are not supported.

Regarding TTS inference performance: Mozilla TTS Tacotron + GriffinLim vocoder, which has rather low quality, is 2-3x realtime on a GTX1080Ti (6 seconds audio take 2-3 seconds for inference). Tacotron2 and/or higher quality vocoders like WaveRNN or WaveGlow are slower…
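To spell out the “2-3x realtime” arithmetic: the factor is simply audio length divided by inference time. A tiny sketch, with a made-up function name:

```python
def realtime_factor(audio_seconds: float, inference_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    if inference_seconds <= 0:
        raise ValueError("inference_seconds must be positive")
    return audio_seconds / inference_seconds


# The figures quoted above: 6 s of audio synthesised in 2-3 s.
slow = realtime_factor(6.0, 3.0)  # 2.0
fast = realtime_factor(6.0, 2.0)  # 3.0
```

Anything below 1.0 would mean synthesis takes longer than the sentence itself lasts.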

Another issue with the smaller Edge computing devices like Coral TPU or Jetson Nano is the rather small amount of available RAM that limits size of model that can be loaded. Therefore usually these devices can only run one model at a time, so there is no parallel STT and TTS. (Loading a DeepSpeech model on my Jetson Nano can take up to a minute).

StuartIanNaylor · February 19, 2020, 8:16pm

Dunno not sure about TfLite as think they are awaiting replies.

github.com/mozilla/DeepSpeech

Feature request: Full integer quantization for tflite: Coral edge TPU compatibility

opened 06:27AM - 01 Sep 19 UTC

jacobjennings

For support and discussions, please use our [Discourse forums](https://discourse….mozilla.org/c/deep-speech).

If you've found a bug, or have a feature request, then please create an issue with the following information:

- **Have I written custom code (as opposed to running examples on an unmodified clone of the repository)**: No
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: Kubuntu 18.04
- **TensorFlow installed from (our builds, or upstream TensorFlow)**: N/A
- **TensorFlow version (use command below)**:
- **Python version**:
- **Bazel version (if compiling from source)**:
- **GCC/Compiler version (if compiling from source)**:
- **CUDA/cuDNN version**:
- **GPU model and memory**:
- **Exact command to reproduce**:

Instructions from https://coral.withgoogle.com/docs/edgetpu/compiler/
Download pretrained model for DeepSpeech 0.5.1
edgetpu_compiler output_graph.tflite

Feature request

I picked up the Coral USB ML accelerator which can run inference on tflite models with additional restrictions:
https://coral.withgoogle.com/products/accelerator
https://coral.withgoogle.com/docs/edgetpu/models-intro/

"Note: Starting with our July 2019 release (v12 of the Edge TPU runtime), the Edge TPU supports models built with TensorFlow's post-training quantization, but only when using full integer quantization (you must use the TensorFlow 1.15 "nightly" build and set both the input and output type to uint8). Previously, we supported only quantization-aware training, which uses "fake" quantization nodes to simulate the effect of 8-bit values during training. So although you now have the option to use post-training quantization, keep in mind that quantization-aware training generally results in a higher accuracy model because it makes the model more tolerant of lower precision values."

Include any logs or source code that would be helpful to diagnose the problem. For larger logs, link to a Gist, not a screenshot. If including tracebacks, please include the full traceback. Try to provide a reproducible test case.

deepspeech/deepspeech-0.5.1-models$ edgetpu_compiler output_graph.tflite
Edge TPU Compiler version 2.0.258810407
INFO: Initialized TensorFlow Lite runtime.
Invalid model: output_graph.tflite
Model not quantized
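The “Model not quantized” message comes down to number formats: per the quoted note, the Edge TPU wants full integer (uint8) tensors. As a rough illustration of what mapping floats to uint8 means, here is a generic affine quantization sketch in plain Python. It is not the TensorFlow Lite converter and not DeepSpeech code; the function names are made up.

```python
from typing import List, Tuple


def quantize_uint8(values: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to 0..255 using a scale and a zero point."""
    # The range is widened to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 when every value is 0
    zero_point = round(-lo / scale)
    quantized = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)
```

Each restored value differs from the original by at most about one quantization step (`scale`). That loss of precision is why the quoted note says quantization-aware training generally gives a more accurate model than quantizing after the fact.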

Dunno, haven’t a clue about RAM, as with 2 KB RAM I presumed the models were not TPU based.
Are we talking language model? As not sure how the 46MB or whatever it is resides.

[EDIT]
They seem to use Discourse more frequently, so it’s looking like you’re right.
https://discourse.mozilla.org/t/edgetpu-board-support/53994/2

baconator · February 19, 2020, 8:52pm

Coral support is limited. I would not pick one up at this point. Wait for a more generally usable version or updates to current software to take advantage of its capabilities.

As for “N GPUs can handle X Mycroft units”, that’s not a very clear metric, either. There are significantly more elements needed to make that a coherent equation.

StuartIanNaylor · February 20, 2020, 12:02am

Yeah it seems it works ok with the object identification demo.
It’s my noobness as I thought tensorcore-lite support would mean it would support tensor-lite?!?

e.g. “1 GTX 1080 can handle X Mycroft units” or “1 FPGA (brand and type need to be specced) can handle Y Mycroft units”

Was from the Mycroft roadmap and only quoted in a similar vein of curiosity that X or Y of a brand and type could be tested.

I always struggle getting through any ML documentation as it always seems extremely longwinded and painful.
I was hoping it was going to be purchase and install with tensorflow-lite, which I keep calling tensorcore, but it doesn’t matter, it’s currently a cul-de-sac.
Apols for all the questions but just getting my bearings and direction.

baconator · February 20, 2020, 2:30am

Lots of marketing, lots of tech, shove into a casing and you have the sausage that is modern “AI”.

StuartIanNaylor · February 20, 2020, 4:05am

Not so sure about the sausages in the casing, from personal experience seems to be us outside the case trying to develop.
There is a disconnect in the knowledge hierarchy of ML, but those up at the top close to the source get results.
I played around with Unity and their OpenML stuff and the samples worked perfectly, but my usual approach to hacking (what I call programming) was a pointless disaster.
The model collection and hyperparameters seem more like an arcane art than the simple brute force of “it works” or “it doesn’t, check the error msg”.

TFL and GPU seem to be currently x86_64 only, whereas the Arm/Raspberry version is native_client only.

I read enough of lissyx’s posts that if they are failing to compile then I am not even going to bother.

hobbyist · May 8, 2020, 1:44pm

Hello, can I get faster CNN training time by using the Google Coral Dev rather than a PYNQ-Z1? Can I get faster CNN training time with the Google Coral Dev compared to a Jetson Nano? Can anyone who has used it give some advice?

StuartIanNaylor · May 8, 2020, 4:15pm

Yes but many only offer a subset of tensorflow compatibility.
The Coral works with the Google image project but I haven’t seen another project for it.
They are coming down in price, but how restrictive their subset is still means that unless you know of a working project, prob don’t bother.

hobbyist · May 8, 2020, 4:24pm

“The Coral works with the Google image project but haven’t seen another project for it.”

What do you mean? If I buy it and run custom-built tensorflow CNN code, won’t it work??

StuartIanNaylor · May 8, 2020, 4:25pm

Prob not as they offer a subset of tensorflow lite and have you seen another project for it?
Or any mention anywhere of another project that supports it?
If you can rewrite using its specifics then yes, but no one seems to.

Deepspeech might be supported now they are using 1.15, but prob not.

hobbyist · May 11, 2020, 9:12am

Is this better than the Google Coral Dev?

What I mean is: is it more flexible as far as the ML programs I can use are concerned?

StuartIanNaylor · May 11, 2020, 1:02pm

No as they are all very similar with different compatibility issues that you will have to research yourself.
The google coral is as good as any and you could take the image kit and feed it with Mel-Frequency Cepstral Coefficients, or MFCCs.
Basically voice images and the standard image classification with that input should work.
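To make “voice images” concrete: an MFCC matrix is a 2-D grid (frames × coefficients) that can be treated like a small greyscale image. Below is a dependency-free sketch of that idea. The frame length, hop, filter count and coefficient count are arbitrary choices for illustration, all function names are made up, and a real project would more likely use an audio library with a fast FFT. It says nothing about whether a model trained on such input would compile for the Edge TPU.

```python
import cmath
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT of a Hamming-windowed frame (slow, but needs no libraries)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * hz / sample_rate) for hz in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):       # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def dct2(values: List[float], n_coeffs: int) -> List[float]:
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]


def mfcc_image(samples: List[float], sample_rate: int = 8000, frame_len: int = 256,
               hop: int = 128, n_filters: int = 20, n_coeffs: int = 13) -> List[List[float]]:
    """Return one row of MFCCs per frame: a frames x coefficients grid."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        spec = power_spectrum(samples[start:start + frame_len])
        # Floor the energies so the logarithm is always defined.
        energies = [max(sum(w * p for w, p in zip(row, spec)), 1e-10) for row in bank]
        rows.append(dct2([math.log(e) for e in energies], n_coeffs))
    return rows


# 0.1 s of a 440 Hz tone at 8 kHz stands in for recorded speech.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
image = mfcc_image(tone)
```

With these settings the 800-sample tone yields a grid of 5 frames by 13 coefficients; longer audio simply adds rows, which is what an image classifier would see as a taller picture.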
I think google were/are in the process of improving what’s available, not sure what the state of play is.

Asus say they are going to ‘support’ it, but if it’s any better or worse than the Pi offering via Google image, dunno.

Just don’t expect to grab Deepspeech, compile and fly, as Deepspeech even runs a fork of tensorflow 1.15 and I have no idea what stage it is at with accelerators such as that.

You can try but I think it’s best to say that actual compatibility and what’s available might be big constraints.

I want one but think it prob might be a disappointment in what I can run.

The best overall compat are the new Nvidia RTX cards and after that it’s all downhill, with earlier cards often needing earlier versions of tensorflow as performance is badly affected.
My graphics card is a mweh GTX780 pretty old now and don’t even bother trying to use it.

Deepspeech prob would benefit from an accelerator if it would work, as on a Pi at least it’s single-threaded.

baconator · May 11, 2020, 2:54pm

That’s literally a google coral tpu plus an SBC, so a direct competitor from a different vendor.

Other than a Jetson TX2 or Xavier board from nvidia, there’s not much in the SBC space that’s viable for anything but custom or very specific ML work yet. If you’re looking to train, get an add-in board for a desktop or go the cloud route. SBCs are inference boards.

hobbyist · May 24, 2020, 6:17pm

Does the NPU in this https://tinker-board.asus.com/prod_tinker-edge-r.html have anything to do with a TPU? Is it faster than the Google Coral Dev/Asus Tinker Edge T?

Dominik · May 24, 2020, 7:25pm

TPU = Tensor Processing Unit - this can be seen as a GPU that is specialized/optimized for tensor operations (vector and matrix multiplication)

NPU = Neural Processing Unit (based on FPGA) where you can load the model directly to the processing unit. This may give excellent performance but as a drawback programming is quite complicated. Most available model/algorithms are for visual processing (object detection) so this would not be my choice when it comes to speech recognition.

In absolute numbers: Rockchip RK3399Pro + NPU is up to 2.4 TOPS, Google Coral TPU is rated up to 4 TOPS.