Privacy aspects for Voice Assistants

I shared some of my personal thoughts and questions (sorry, just questions, no answers) on privacy aspects of voice assistants in this video on my channel. Mycroft is, of course, part of the video.

I am really curious about your thoughts, especially on the question:
“Is a privacy aware voice assistant the same as the capability to run completely offline?”


Personally, I find the focus on voice privacy a little strange, as that data is not as important as the service data that big data extracts.
From the actual voice data they could perhaps extract gender and approximate age, and guess at a regional accent, but I don’t think they do, or at least they aren’t bothered about that.
Google has been slowly working toward offline ASR, and it looks like they have cracked it with their new Pixel 6 phones, where speech recognition is offline and private.
They don’t care about the voice data, because it is the service data they are interested in, and they still capture that regardless of offline ASR, through internet search, home automation and even music selection.
That is where Google is going and likely others will follow.

So really the main question should be: can a completely offline voice assistant still be a smart assistant? If its services are not offline in some way, your privacy is always compromised, and that is why Google is happy to provide offline ASR: it will track you through its services.

There are no half measures in privacy; otherwise all you are doing is selective data sharing. It’s not really the voice side we should be worried about. The question is whether offline services, i.e. skills, can compete and at least decouple from tracking mechanisms.

Information services we should be able to decouple; consumer services such as ordering or digital delivery, not so; control services can be completely offline.

Open source needs killer offline control services and decoupled information services that can compete with non-private big data. Consumer skills are a problem, and I guess it comes down to how much you value or need them.

Nothing you do online is private, starting with the log maintained by your ISP.

Thanks for your reply. If I understand you correctly, you’re less worried about cloud-based speech processing and more about metadata and usage data from online skills and consumer services?

Have you tried OVOS and its latest images, which can run everything offline without needing a Mycroft backend? If not, I would suggest giving OVOS another look. Online services that need API key access, like weather, will never be truly offline, but there are other ways to tie into a service without requiring a pairing process, and that is what OVOS aims at. If your goal is to use internet services without identification, then OVOS is worth a shot.

Mimic 3 is doing great work running in real time on Pi hardware, but what about STT? Why not give Vosk a try… Even with a limited model it seems to run fine with the skills Mycroft currently has, and the same goes for OVOS. It’s just a matter of when Coqui STT becomes usable in real time on the next RPi hardware, and then STT via proxy servers will be a thing of the past.

Google is moving ASR offline because that voice data has little value for a lot of expense, but from their services they know when you wake up, what you buy, what you listen to, what you voice search for, and so on.
The units can build voice profiles locally to assign that data to a cloud profile, and cloud-based speech processing is less of a concern because it is coming to an end anyway; we are now purely in a transition.
Their natural language processing (NLP) and natural language understanding (NLU) for selecting the appropriate skill are far ahead of what we have in open source, and so are their skills, which drive usage. That is all they need to keep very precise data profiles without any need for expensive voice data.
Currently it’s only Google’s flagship phones that offer privacy-aware offline ASR, but it will not be long before that filters down. Then the choice is purely about quality of service, and how to match it with equivalent open source services while decoupling from big data tracking and profiling.
They are not really interested in our voice data, as it is costly for little return, and the direction of Google’s technology push makes it obvious they don’t need it. So why be concerned about that element of privacy while so much else is being tracked and assigned to a profile?

The Open Voice Network recently published a document exploring privacy aspects unique to voice; it is worth reading.

A lot of the danger comes from the context surrounding a voice UI: it usually handles sensitive data, usually in sensitive places. The audio data itself might not be very relevant, but once it becomes text it is trivial to store and parse.

Theoretically, Mycroft (the company) might not really be running Selene on their servers, but a modified version that stores our locations (IP + pairing info) and transcripts while building a massive profile to sell to third parties. I doubt this is happening, but if we need to trust that it isn’t, then it’s not private: tomorrow someone could buy Mycroft and this could change behind the scenes. Privacy needs to be trustless; otherwise it’s just a promise.

On the audio side, I would worry about voice fingerprints as that technology becomes ubiquitous: any device with a mic could identify you. I wouldn’t put it past Google or some governments to build massive databases for this, but it’s not something I am currently concerned about.

IMHO the only way to ensure privacy is to keep things local. This means no cloud and no reaching out to the internet at all unless absolutely necessary; at the very least, the option to work that way should be available. Many of us would not mind trading accuracy for privacy (and performance, and reliability, and user choice, and …).

This also has practical implications: mycroft-core is useless without internet, since it refuses to launch unless connected.

That depends on who you define as “us”. Technically interested people (like you and me) might see it this way, but an average user might stop using a locally hosted voice assistant that misunderstands many commands.

But I agree that it would be a nice goal to host your voice assistant (speech processing and backend) completely locally.


Indeed, accuracy is especially important in voice, but we are reaching a point where offline STT is becoming very usable. If you are using Mycroft to answer general-purpose questions or play music (song names are tricky even for Google, since they often contain uncommon word combinations or even made-up words), you will notice a lot of mistakes. But if you just want to control your smart home, an offline option works perfectly, since it’s a limited vocabulary domain, and most importantly, your device doesn’t become useless if there is an internet outage!

In OVOS I added the concept of fallback STT, so you can set up an online option as the main STT and an offline one as a fallback in case the first fails. It is basically the same thing mycroft-core does for TTS, falling back from Mimic 2 to Mimic 1. But for real privacy, the online option should mean self-hosted, not some third-party service.
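The fallback idea can be pictured as an ordered chain of engines where the first one that succeeds wins. This is a minimal sketch, not the OVOS or mycroft-core plugin API: `FallbackSTT`, `self_hosted_stt` and `local_stt` are hypothetical names, and the two engine functions are stand-ins (one simulating an unreachable self-hosted server, one standing in for an on-device recognizer such as Vosk).

```python
from typing import Callable, List, Optional, Sequence, Tuple

# An STT engine here is any callable that takes raw audio bytes and returns a
# transcript, or raises an exception (network error, timeout, ...) on failure.
STTEngine = Callable[[bytes], str]


class FallbackSTT:
    """Try each engine in order and return the first successful transcript.

    Illustrative only: this is not the OVOS or mycroft-core API.
    """

    def __init__(self, engines: Sequence[Tuple[str, STTEngine]]) -> None:
        self.engines: List[Tuple[str, STTEngine]] = list(engines)
        self.last_used: Optional[str] = None

    def transcribe(self, audio: bytes) -> str:
        errors = []
        for name, engine in self.engines:
            try:
                text = engine(audio)
            except Exception as exc:  # any failure moves on to the next engine
                errors.append(f"{name}: {exc}")
                continue
            self.last_used = name
            return text
        raise RuntimeError("all STT engines failed: " + "; ".join(errors))


def self_hosted_stt(audio: bytes) -> str:
    # Hypothetical stand-in for a request to a self-hosted STT server;
    # it simulates an outage so the fallback path is exercised.
    raise ConnectionError("server unreachable")


def local_stt(audio: bytes) -> str:
    # Hypothetical stand-in for an on-device recognizer (e.g. a Vosk model).
    return "turn on the kitchen light"


stt = FallbackSTT([("self-hosted", self_hosted_stt), ("local", local_stt)])
print(stt.transcribe(b"\x00" * 320), "via", stt.last_used)
```

Keeping the self-hosted engine first and the local one last means an outage degrades accuracy rather than making the device useless, while no third-party service is ever in the chain.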

There is a ridiculous number of voice assistants hooked up to the internet now, with some estimates as high as 4 billion.
I don’t think it’s audio that is sent; it is very likely MFCCs (mel-frequency cepstral coefficients), a nonlinear “spectrum-of-a-spectrum” representation that can represent speech with very little data and is extremely lossy.
I haven’t really seen any recent packet-sniffing analysis of current home assistants, although a lot could be deduced from simple man-in-the-middle packet sniffing.
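To put a rough number on “very little data”, here is a back-of-the-envelope comparison of raw audio and an MFCC stream. The front-end settings (16 kHz, 16-bit mono PCM, 13 coefficients every 10 ms, stored as float32) are assumed typical values, not what any particular vendor is known to use, and the sketch does not confirm that any assistant actually sends MFCCs.

```python
# Assumed, typical front-end settings; not taken from any vendor's product.
SAMPLE_RATE_HZ = 16_000  # 16 kHz mono audio
BITS_PER_SAMPLE = 16     # 16-bit PCM
HOP_MS = 10              # one feature frame every 10 ms
N_MFCC = 13              # 13 cepstral coefficients per frame
BITS_PER_COEFF = 32      # each coefficient stored as float32


def pcm_bitrate() -> int:
    """Raw PCM bit rate in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def mfcc_bitrate() -> int:
    """MFCC feature-stream bit rate in bits per second."""
    frames_per_second = 1000 // HOP_MS
    return frames_per_second * N_MFCC * BITS_PER_COEFF


ratio = pcm_bitrate() / mfcc_bitrate()
print(f"PCM: {pcm_bitrate()} bit/s, MFCC: {mfcc_bitrate()} bit/s, ratio {ratio:.1f}x")
```

Under these assumptions the feature stream is roughly six times smaller than raw PCM. Note that MFCCs are also a classic input feature for speaker recognition, so “lossy” does not automatically mean “anonymous”.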

It’s the text data and service use that break privacy; the audio side isn’t much of a consideration. It comes down purely to the skill type. Online consumer skills cannot offer any privacy, because user data is part of the transaction. Online information skills can be decoupled, as DuckDuckGo does with Google, but that depends heavily on the end destination site. That leaves control skills, which can easily be made private.
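The three skill types and what each allows can be written down as a small lookup table. This is only one reading of the argument above, not a formal or standard taxonomy; `SkillType`, `PrivacyProfile` and `privacy_options` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SkillType(Enum):
    CONTROL = "control"          # lights, switches, alarms
    INFORMATION = "information"  # encyclopedia lookups, web search
    CONSUMER = "consumer"        # ordering goods, streaming services


@dataclass(frozen=True)
class PrivacyProfile:
    can_run_offline: bool        # works with no internet connection at all
    can_decouple_identity: bool  # a proxy/anonymizer can hide who is asking
    caveat: str


# One reading of the argument above; not a formal or standard taxonomy.
PRIVACY_BY_TYPE = {
    SkillType.CONTROL: PrivacyProfile(True, True, "can stay on the local network"),
    SkillType.INFORMATION: PrivacyProfile(
        False, True, "depends on what the end destination site collects"
    ),
    SkillType.CONSUMER: PrivacyProfile(
        False, False, "user data is part of the transaction"
    ),
}


def privacy_options(skill_type: SkillType) -> str:
    """Summarize how (or whether) a skill of this type can be made private."""
    profile = PRIVACY_BY_TYPE[skill_type]
    if profile.can_run_offline:
        return "run it offline"
    if profile.can_decouple_identity:
        return f"decouple it from your identity ({profile.caveat})"
    return f"no private option ({profile.caveat})"


for skill_type in SkillType:
    print(f"{skill_type.value}: {privacy_options(skill_type)}")
```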

Starting with online consumer skills: they cannot be made private, but you can still create offline alternatives for media you already own, say an offline music library/player that indexes its own metadata and creates its own lexicon and specific ASR model.
We don’t have a skill server like this because of the development effort required, which will never happen with embedded, framework-specific skills, given how many skills are needed and how relatively small the herd of developers is.
Open source needs standalone skill servers that are interoperable across frameworks, so they can serve bigger herds.
So you can make offline alternatives, but turning them into real killer apps takes considerable development effort; otherwise they are likely to end up as simpler, more mediocre alternatives, unless that effort is shared with bigger crowds.
Online information skills can use sites like DuckDuckGo and Wikipedia, but privacy on many end sites is hard to ensure without an anonymity layer, ranging from a free anonymizing network like Tor (slow, and connection not guaranteed) to an embedded VPN offered by a privacy-enabled voice assistant.
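The offline music library idea, where the player indexes its own metadata and derives its own lexicon, could start with something like the following. It is a minimal sketch under assumptions: the track metadata is hypothetical sample data, and the phrase templates (“play …”, “play … by …”, “play album …”) are made up for illustration. Some offline recognizers, Vosk among them, can be restricted to such a phrase list; this sketch only produces the list (as JSON) and does not call any recognizer API.

```python
import json
import re
from typing import Iterable, List, Mapping

WORD_RE = re.compile(r"[a-z0-9']+")


def normalize(text: str) -> str:
    """Lowercase and keep only word characters, so phrases match STT output."""
    return " ".join(WORD_RE.findall(text.lower()))


def build_phrases(tracks: Iterable[Mapping[str, str]]) -> List[str]:
    """Generate the spoken commands the local library can actually satisfy."""
    phrases = set()
    for track in tracks:
        title = normalize(track.get("title", ""))
        artist = normalize(track.get("artist", ""))
        album = normalize(track.get("album", ""))
        if title:
            phrases.add(f"play {title}")
            if artist:
                phrases.add(f"play {title} by {artist}")
        if artist:
            phrases.add(f"play {artist}")
        if album:
            phrases.add(f"play album {album}")
    return sorted(phrases)


def build_lexicon(phrases: Iterable[str]) -> List[str]:
    """The distinct words a restricted recognizer would need to know."""
    return sorted({word for phrase in phrases for word in phrase.split()})


library = [  # hypothetical metadata, as read from the files' tags
    {"title": "Blue in Green", "artist": "Miles Davis", "album": "Kind of Blue"},
    {"title": "So What", "artist": "Miles Davis", "album": "Kind of Blue"},
]
phrases = build_phrases(library)
print(json.dumps(phrases, indent=2))
print(build_lexicon(phrases))
```

Because the vocabulary comes only from what is on disk, the recognizer never has to guess at made-up song names it has never seen, which is exactly where general-purpose STT struggles.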

Controls are easy: many consumer devices can be reflashed with Tasmota/ESPHome firmware, or even purchased preflashed, and use Home Assistant as a standalone skill server.
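With open firmware, control can stay entirely on the local network. A minimal sketch, assuming Tasmota’s HTTP web-request form `http://<device-ip>/cm?cmnd=<command>` and a device without a web password; the IP address is hypothetical. ESPHome devices are normally driven through Home Assistant’s own integration instead and are not covered here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def tasmota_command_url(host: str, command: str) -> str:
    """Build a Tasmota HTTP command URL for a device on the local network."""
    # quote() encodes the space in "Power TOGGLE" as %20.
    return f"http://{host}/cm?cmnd={quote(command)}"


def send_command(host: str, command: str, timeout: float = 3.0) -> str:
    """Send the command over the LAN and return the device's reply as text."""
    with urlopen(tasmota_command_url(host, command), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical device address; only the URL is built here, nothing is sent.
print(tasmota_command_url("192.168.1.50", "Power TOGGLE"))
```

No cloud account or vendor app is involved: the request goes straight from the assistant to the device.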

Real privacy really means offline, but a VPN lets you ensure a lot that you cannot without one.
I do disagree about accuracy and usability, though, as open source is completely barren when it comes to embedded libraries for input audio processing, unless you add costly hardware.
Thanks to high volumes, commercial voice assistants have had those libraries running on extremely low-cost, low-power devices such as the Marvell ARMADA 1500 Plus.
In terms of embedded DSP, open source seems to have nothing, and the input stage of the audio processing chain obviously has a huge effect on the accuracy of everything after it.

Concerning privacy, I have two challenges that are hard to address (when using cloud-based STT):

  • Biometrics extracted from voice
  • Injection of problematic content via free text

Biometric data collection and processing is a highly sensitive topic, at least in the EU. It can become major leverage against citizens and must be made extremely transparent, if used at all. I don’t know how to prevent it technically.

Injection of content is a fundamental issue in conversational AI, in my opinion. Users can talk or write about other people; as a consequence, the assistant collects and processes personal data. An example is a Rasa chatbot connected to Mycroft. The chatbot gets utterances from Mycroft. If they contain something like “(name and surname) is a rapist and lives in (real address)”, I am not sure what this means (I know it is a complex topic, but it requires elaboration nevertheless).
This issue is very hard to address. My ideas are:

  • a mechanism that detects potentially sensitive utterances and asks the author/speaker whether they are aware they are injecting personal data (raising awareness and training users)
  • a mechanism to anonymize utterances on device, e.g. by removing names (technically enforcing anonymous content)
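Both ideas could share one on-device pass: redact what looks like personal data, and treat any finding as the trigger to ask the speaker before anything is forwarded (for example to a Rasa chatbot). This is a minimal sketch under loud assumptions: the list of known names is a hypothetical input (say, from the user’s contacts), the address regex only covers one English-style format, and a real system would more likely use a named-entity recognizer than regular expressions.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Crude, hypothetical pattern for English-style street addresses, e.g. "12 Main Street".
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s)+(?:Street|St|Road|Rd|Avenue|Ave|Lane|Ln)\b"
)


@dataclass
class RedactionResult:
    text: str
    findings: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def is_sensitive(self) -> bool:
        # Anything found means the assistant should ask the speaker first.
        return bool(self.findings)


def redact(utterance: str, known_names: Iterable[str]) -> RedactionResult:
    """Replace known personal names and address-like spans with placeholders."""
    findings: List[Tuple[str, str]] = []
    text = utterance
    # Longest names first, so "Jane Doe" is replaced before a bare "Jane".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        if pattern.search(text):
            findings.append(("name", name))
            text = pattern.sub("[NAME]", text)
    for match in ADDRESS_RE.findall(text):
        findings.append(("address", match))
    text = ADDRESS_RE.sub("[ADDRESS]", text)
    return RedactionResult(text, findings)


result = redact("Jane Doe lives at 12 Main Street", ["Jane Doe"])
if result.is_sensitive:
    print("Personal data detected:", result.findings)
print(result.text)
```

Only the redacted text would ever leave the device; whether to forward it at all is left to the speaker’s answer.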

A special challenge is that an assistant where users are anonymous also protects malicious actors. In corporate environments it is probably not an issue because users (employees) will not be anonymous (pseudonymous at best).

Don’t we already have this? Sir Tim Berners-Lee, an open-source developer (with a fairly decent reputation :)), is leading the Solid project, which “aims to radically change the way Web applications work today, resulting in true data ownership as well as improved privacy.” (source: Tim Berners-Lee - Wikipedia).

To me, Mycroft and Solid are a match made in heaven. A prototype is needed …

@Thorsten, great video and great brainstorming on privacy. Thank you!

-Mike Mac

Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data

That’s not really what I was discussing: the problem isn’t our data stores, it’s the services we have to visit that do not respect privacy, since tracking and ad pushing are a huge part of their revenue and they are not going to give that up.
No consumer service, from media streaming to downloads, is going to relinquish that, so you have to go offline. Any site you visit will track you through various APIs, from Shopify to Google Analytics. You can send a ‘Do Not Track’ signal, but most sites will ignore it, and your purchases will be shared as data for ad pushing.

Even if your collected personal data is in a protected pod, it doesn’t matter, as they already have the data they need; you cannot use any consumer service without being tracked. There is no such thing as a private transaction.

But to be honest, things like ‘What is the price of a plane ticket or a theatre seat?’ are something I have never used on a voice assistant, and by the 80/20 rule nearly all common uses can be handled without needing data from tracking sites.
There are certain sites, like Wikipedia, with different revenue models that let them offer a degree of privacy, but such sites are the exception rather than the rule.
That leaves alarms, control, and offline music, since private streaming services do not exist: ad pushing is core to their revenue streams.
Take Spotify, for example:

Information we may share

See this table for details of who we share to and why.

Service providers
  Categories of data: User Data; Street Address Data; Usage Data; Voice Data; Payment and Purchase Data; Survey and Research Data
  Reason for sharing: So they can provide their services to Spotify. These service providers include those we hire to:
  • give customer support
  • operate the technical infrastructure we need to provide the Spotify Service
  • assist in protecting and securing our systems and services (e.g. Google’s reCAPTCHA)
  • help market Spotify’s (and our partners’) products, services, events and promotions

Payment partners
  Categories of data: User Data; Payment and Purchase Data
  Reason for sharing: So they can process your payments, and for anti-fraud purposes.

Advertising partners
  Categories of data: User Data; Usage Data
  Reason for sharing: So they can help us deliver more relevant advertising to you on the Spotify Service, and help measure the effectiveness of ads. For example, our ad partners help us facilitate tailored advertising.

  What is tailored advertising?
  • This is when we use third party information to tailor ads to be more relevant to you. This is also known as interest based advertising.
  • An example of a tailored ad is when an ad partner has information suggesting you like cars. This could enable us to show you ads about cars.

  How to control tailored advertising:
  • You can control tailored advertising in your account Privacy Settings under ‘Tailored Ads’.
  • You can also control tailored advertising for some podcasts using the link in the episode’s show description. This applies where the content provider is funding their podcast by inserting either tailored advertising or content-based advertising into the podcast itself. These controls are managed by the hosting platform for the podcast, which might not be Spotify.

  If you are ‘opted out’ of Tailored Ads in your Privacy Settings, you may still get advertising on ad-supported services (e.g. podcasts or the Free Service Option). Such advertising is based on your registration information and the content you are currently streaming on our services. For example, if you are listening to a cooking podcast, you may hear an ad for a food processor.

Marketing Partners
  Categories of data: User Data; Usage Data
  Reason for sharing: To promote Spotify with our partners. We share certain User Data and Usage Data with these partners where necessary to:
  • enable you to participate in Spotify promotions, including trials or other bundled offers
  • to promote Spotify in media and advertising published on other online services
  • help us and our partners to measure the effectiveness of Spotify promotions

  Examples of partners include:
  • marketing or sponsorship partners
  • websites and mobile apps who sell us advertising space on their services
  • device, app and mobile partners who also offer Spotify promotions

  Our partners may also combine the personal data we share with them with other data they collect about you, e.g. your use of their services. We and our partners may use this information to present you with offers, promotions, or other marketing activities that we believe will be relevant to you.

Hosting Platforms
  Categories of data: Usage Data
  Reason for sharing: Hosting platforms host podcasts so that they can deliver them to you. We share certain data, such as your IP address, with the hosting platforms when you play a podcast. Spotify owns two hosting platforms, Megaphone and Anchor. We also allow you to stream podcasts available from other hosting platforms not owned by Spotify.

  Podcast providers should explain in the show or episode description which platform is hosting the podcast. Please see the hosting platform’s own privacy policy for how they use data shared with them.

Other partner sharing
  Categories of data: User Data; Usage Data; Survey and Research Data
  Reason for sharing: To help us understand and improve the performance of our products and partnerships. You can see and remove many partner connections under ‘Apps’ in your account.

Academic researchers
  Categories of data: User Data; Usage Data
  Reason for sharing: For activities such as statistical analysis and academic study, but only in a pseudonymised format. Pseudonymised data is where your data is identified by a code rather than your name or other information that directly identifies you.

Spotify Measurement Companies
  Categories of data: User Data; Usage Data
  Reason for sharing: We share data with the following Spotify companies to measure the effectiveness of ad campaigns that run on the Spotify Service:
  • In Defense of Growth Incorporated d/b/a Podsights
  • Chartable Holding, Inc.

Other Spotify group companies
  Categories of data: User Data; Street Address Data; Usage Data; Voice Data; Payment and Purchase Data; Survey and Research Data
  Reason for sharing: To carry out our daily business operations and so we can maintain and provide the Spotify Service to you.

Law enforcement and other authorities
  Categories of data: User Data; Usage Data
  Reason for sharing: When we believe in good faith it’s necessary for us to do so, for example:
  • to comply with a legal obligation
  • to respond to a valid legal process (such as a search warrant, court order, or subpoena)
  • for our own or a third party’s justifiable interest, relating to:
    • national security
    • law enforcement
    • litigation (a court case)
    • criminal investigation
    • protecting someone’s safety
    • preventing death or imminent bodily harm

Purchasers of our business
  Categories of data: User Data; Street Address Data; Usage Data; Voice Data; Payment and Purchase Data; Survey and Research Data
  Reason for sharing: If we were to sell or negotiate to sell our business to a buyer or possible buyer. In this case, we may transfer your personal data to a successor or affiliate as part of that transaction.

So if enterprises such as Am*zon and Walm*rt don’t respect your privacy, then perhaps someday a new enterprise, such as Stuffmart, will come along that only accesses data from your Solid pod and claims never to store or share it. Then people who are concerned with privacy can shop at Stuffmart and reject the other big players.

Idealistic? Perhaps so, but it is a goal to strive for…

Thinking about an upside, these new enterprises would never have to store their customers’ data, so it could never be stolen from them. All they would need would be their customers’ Solid pod URIs.

-Mike Mac

I don’t know of a single entity that uses Solid pods, so it doesn’t really come into the discussion of skill privacy: there are no sites for the skill to use, and you just cannot use any type of consumer skill and retain privacy, as they all track and share data.
For information skills, there are a few exceptions, like Wikipedia.
For control skills, you need to flash open source firmware to ensure full privacy for home automation devices.

You cannot really be online and expect privacy; it starts with your ISP’s log and spiders out to nearly any site you visit.

There is loads you can accomplish offline with an opensource AI.

Yeah, and in 1990 I had never heard of the World Wide Web :laughing:

-Mike Mac

Yeah, and have you ever heard of the Semantic Web and RDF? That was a big and really important push by TimBL, and it got buried because of commercial revenue.
It’s a pipe dream, and I am just talking about privacy now with certain skill types.

OK, getting back to privacy, here’s a possible scenario:

Me: Hey Mycroft - what is my current privacy setting?
Mycroft: High
Me: Hey Mycroft - what are the possible privacy settings?
Mycroft: Low, medium, high, and paranoid
Me: Hey Mycroft - please set my privacy setting to paranoid.
Mycroft: OK, Done - I am now working completely offline and will inform you if I have to use the Internet
Me: Hey Mycroft - I need some more cat food.
Mycroft: OK, your preferred provider is Stuffmart.  Is it OK for me to use the Internet to communicate with them?
Me: Yes
Mycroft: OK, Stuffmart has your preferred cat food for a dollar a can. Should I order some?
Me: Yes, 36 cans please.
Mycroft: Is it OK for me to use the Internet to communicate with your preferred credit card company?
Me: Yes
Mycroft: OK, done, your cat food will arrive in two days. I am now working offline.
-Mike Mac
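The “paranoid” dialogue above boils down to a gate that every skill must pass before it touches the network. A minimal sketch, assuming the four levels from the dialogue; only PARANOID is modelled (explicit consent per destination), because the dialogue does not say how the lower levels behave, so here they simply allow the request. `NetworkGate` and `console_consent` are hypothetical names, and Stuffmart is the fictional shop from the scenario.

```python
from enum import IntEnum
from typing import Callable, List


class PrivacyLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    PARANOID = 3


class NetworkGate:
    """Decide whether a skill may go online under the current privacy level.

    Hypothetical policy: only PARANOID is modelled; it asks for explicit
    consent for every destination, and the lower levels simply allow it.
    """

    def __init__(self, level: PrivacyLevel, ask_user: Callable[[str], bool]) -> None:
        self.level = level
        self.ask_user = ask_user
        self.log: List[str] = []

    def allow(self, destination: str) -> bool:
        if self.level < PrivacyLevel.PARANOID:
            allowed = True
        else:
            allowed = self.ask_user(
                f"Is it OK for me to use the Internet to communicate with {destination}?"
            )
        self.log.append(f"{destination}: {'allowed' if allowed else 'denied'}")
        return allowed


def console_consent(question: str) -> bool:
    # Stand-in for a spoken yes/no exchange; here the user always says "yes".
    print("Mycroft:", question)
    return True


gate = NetworkGate(PrivacyLevel.PARANOID, console_consent)
if gate.allow("Stuffmart") and gate.allow("your preferred credit card company"):
    print("Order placed; back to working offline.")
print(gate.log)
```

The log gives the user an auditable record of every time the assistant went online, which fits the point made earlier that privacy should not rest on trust alone.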

That is the point: why go to all the bother if you are not bothered about privacy and use online services and voice assistants anyway?
There is low-cost hardware with high-quality DSP input audio processing, so the question in terms of privacy is: if you don’t mind online tracking, why would you use a supposed ‘Privacy Aware Voice Assistant’ when voice audio is likely the one element of data they are not storing en masse?
There is no pick & mix in privacy: either you are private or you are not.
When it comes to privacy, security and robustness, it really needs to be offline, and more focus needs to go on core skills within what should be a loose framework that can use a pick & mix of offline voice technologies.
You are controlling your home, your castle, and working offline without relying on internet connections is a big consideration. It needs quality skills to enable it, since part of the problem online is that APIs change and often render skills inoperable, which also affects quality.

Thanks for that really interesting discussion.

Originally I thought most people would be nervous about their actual voice being transferred and processed in the cloud, but it seems more critical when it comes to “metadata” and use cases: how you use your voice assistant in your personal everyday life.