Licensing a Language: How the copyright system abuses fundamental human rights

Originally published at: Licensing a Language: How the copyright system abuses fundamental human rights - Mycroft

Who owns the language that you speak? The people who created it? The people who use it? The people who document it? No one at all?

It’s a question that came up recently in our Lingua Franca project – our multilingual parsing and formatting library. A contributor had used a website to check their translation and wanted to do the right thing by acknowledging them. I’m glad they did, because we want to acknowledge the work we build upon, and because it was a clear flag that we couldn’t accept their contribution.

Unfortunately, that use of a third-party source means that at least some of their contribution was potentially not theirs to submit. It may belong to someone else, even though it was written in their own native language.

You might be asking yourself… WTF?

How can you copyright a language?

The answer, like so much of our legal system, is a big pile of grey goop, and each country has its own version. In general, you can’t copyright a language or general-use phrases. In effect, however, that’s exactly what happens.

Let’s take a quick look at the Macquarie Dictionary license. Of particular note:

Except as expressly permitted in Clause 2.1, the Licensee warrants that it will not, nor will it license or permit others to, directly or indirectly, without Macquarie’s prior written consent:

(e) use the Licensed Material to create any derivative work, product or service, or merge the Licensed Material with any other product, database, or service;


Seems fair enough. Clearly I can’t copy their database and create a competing dictionary website. But what constitutes a derivative work in the context of language could be pretty broad, and you can quickly take it to comical extremes. If I use the Macquarie Dictionary to learn English and then create anything using my newly acquired language skills, is that a derivative work? Seems stupid until someone gets sued for it… (actually it would still be stupid).

Even more interesting is a right they do grant and the restriction placed on that:

…Macquarie grants to the Licensee the following non-exclusive rights (the “Rights”) for the Term:

(d) create a hypertext link to any part of the Licensed Material provided that no person other than the Licensee may use such hypertext link.


So technically, sharing a link to any part of their site that contains “Licensed Material” with anyone else would violate the terms of this license agreement.

How can this be legal?

I’m sure their argument would be that they aren’t licensing the language itself; they are licensing their collation and interpretation of it. This is a valid argument. As a company they have put a lot of time, money and effort into producing this content. I think most people would agree that setting up a competing dictionary website that just rips off their content would be unfair and unjust.

What though, does it mean for the broader use of this knowledge?

I haven’t looked at their definition for a triangle, but I’m fairly sure it will be something like “a two-dimensional shape with three sides”, or perhaps “a three-sided two-dimensional shape”. There just aren’t too many different ways you can say what a triangle is. Does this mean they have copyrighted their particular order of half a dozen words? Does this mean I just violated their copyright?

Given I haven’t checked, perhaps this is a case of “Schrödinger’s Copyright”. I may be violating and not violating their copyright at the same time until the state of the system is observed. To play it safe I’m just going to leave that box closed.

The legal, social and cultural ramifications of this are enormous, particularly for Indigenous and First Nations people around the world. What does it mean for a company to own even a subset of your language? How might that affect your fundamental human right to use your own language (particularly in regard to Articles 13, 14 and 16 of the United Nations Declaration on the Rights of Indigenous Peoples)?

There is no way to do justice to this immensely important topic in this post. If you’re interested in learning more, I’d strongly recommend exploring the topic of Indigenous Data Sovereignty.

What does this mean for Mycroft?

At the end of the day, we can’t accept potentially copyrighted material into Mycroft projects no matter how fundamental it is. If you’re ever unsure about what you are submitting, please ask in Mycroft Chat, the Forums, or contact us directly.

It’s the same for any other project crowd-sourcing content whether that is open source or not. If you contribute to OpenStreetMap you can’t just open a Google Maps window and start copying across street names. The street names themselves aren’t owned by Google, but the digital collection of them accessible through Google Maps is.

So if you are submitting translations or other content to a Mycroft project, please do not take these from third-party sources. It must be entirely your own work, which you license Mycroft to use through our Contributor License Agreement.

In this case, you retain ownership over your contribution and hey presto, for better or worse – you own a small slice of human language.

I would suggest contacting the Electronic Frontier Foundation (EFF) to see what they have to say. They’ve got a lot of experience with copyright related things.

The problem is a lot of these “copyrights” probably aren’t legally valid.

Copyright generally requires a unique original arrangement.

So you can’t copyright anything that has ever been copyrighted before (it would be a derivative) or that has been put into the public domain.

While you might be able to produce original arrangements that can be valid copyrights, it’s a lot harder to have a valid copyright on specific small parts.

Where the boundary occurs is the grey area.

Though this varies somewhat by country, technically speaking, if anyone anywhere has ever written (or otherwise created) something before, they would hold the copyright, and between 50 and 100 years after their death (varying between countries, with some exceptions) it becomes public domain. So a lot of language can’t be copyrighted, as it is already public domain.

However, proving this can be problematic, and the legal arguments could be expensive.

Translation copyrights

Translations themselves are generally derivative works, with the translator potentially being a co-author depending on the “creativity” required for the translation.

The co-author copyright would then generally again be lifetime + (50-100) years.

I doubt any machine translator can directly count as creative, and therefore be a co-author with a copyright claim (its “owner/operator” may have a license, depending on what you agreed to when using it).

Third party website translations

In regard to using third parties to check short translations, my non-legal opinion is:

If the website does the translation (you put in your untranslated text), it is probably your copyright, but it depends on exactly how the translator works and what you agreed to. So make sure the license of the website is compatible with Creative Commons (CC BY or CC BY-SA) or that it states that you retain copyright ownership.

E.g. Google Translate appears to list its licenses here

If you use example sentences from third-party websites, they are either under the website’s copyright or licensed to it, so again make sure the license of the website is compatible with Creative Commons.

If you write the translation entirely yourself, then check it using a third-party website (dictionary etc.) but make no changes, it’s your copyright (if applicable).

If you write the translation yourself but check the meanings of individual words, or select individual words by meaning, using a third-party website, it’s your copyright again, as the meaning of words can’t be copyrighted (only the specific wording or examples).

Hi @yaomtc - we’re big fans of the EFF :smile:

@fireblade I very much agree. Unfortunately what is legally valid and what someone can convince a court to proceed with are very different, so for the moment we’re taking the very safe route.