Google Translate is even better now, part 2

"Google Translate learns 24 new languages"
Isaac Caswell, Google blog (5/11/22)

==========

[Image: illustrated green globe with the word "hello" translated into different languages]

For years, Google Translate has helped break down language barriers and connect communities all over the world. And we want to make this possible for even more people — especially those whose languages aren’t represented in most technology. So today we’ve added 24 languages to Translate, now supporting a total of 133 used around the globe.

Over 300 million people speak these newly added languages — like Mizo, used by around 800,000 people in the far northeast of India, and Lingala, used by over 45 million people across Central Africa. As part of this update, Indigenous languages of the Americas (Quechua, Guarani and Aymara) and an English dialect (Sierra Leonean Krio) have also been added to Translate for the first time.

[Image: the Google Translate bar translates the phrase "Our mission: to enable everyone, everywhere to understand the world and express themselves across languages" into different languages]

Caption: Translate's mission translated into some of our newly added languages

Here’s a complete list of the new languages now available in Google Translate:

  • Assamese, used by about 25 million people in Northeast India
  • Aymara, used by about two million people in Bolivia, Chile and Peru
  • Bambara, used by about 14 million people in Mali
  • Bhojpuri, used by about 50 million people in northern India, Nepal and Fiji
  • Dhivehi, used by about 300,000 people in the Maldives
  • Dogri, used by about three million people in northern India
  • Ewe, used by about seven million people in Ghana and Togo
  • Guarani, used by about seven million people in Paraguay, Bolivia, Argentina and Brazil
  • Ilocano, used by about 10 million people in northern Philippines
  • Konkani, used by about two million people in Central India
  • Krio, used by about four million people in Sierra Leone
  • Kurdish (Sorani), used by about 15 million people in Iraq and Iran
  • Lingala, used by about 45 million people in the Democratic Republic of the Congo, Republic of the Congo, Central African Republic, Angola and the Republic of South Sudan
  • Luganda, used by about 20 million people in Uganda and Rwanda
  • Maithili, used by about 34 million people in northern India
  • Meiteilon (Manipuri), used by about two million people in Northeast India
  • Mizo, used by about 830,000 people in Northeast India
  • Oromo, used by about 37 million people in Ethiopia and Kenya
  • Quechua, used by about 10 million people in Peru, Bolivia, Ecuador and surrounding countries
  • Sanskrit, used by about 20,000 people in India
  • Sepedi, used by about 14 million people in South Africa
  • Tigrinya, used by about eight million people in Eritrea and Ethiopia
  • Tsonga, used by about seven million people in Eswatini, Mozambique, South Africa and Zimbabwe
  • Twi, used by about 11 million people in Ghana
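
As a rough cross-check of the "over 300 million" figure quoted earlier, the per-language estimates in the list can simply be added up. The sketch below copies the numbers from the list as given (in millions); the variable names are illustrative only, and the sum treats the estimates as non-overlapping, which is an assumption, since bilingual speakers may be counted more than once.

```python
# Speaker estimates in millions, copied from the list above.
# Assumption: the figures do not overlap (no speaker counted twice).
speakers_millions = {
    "Assamese": 25, "Aymara": 2, "Bambara": 14, "Bhojpuri": 50,
    "Dhivehi": 0.3, "Dogri": 3, "Ewe": 7, "Guarani": 7,
    "Ilocano": 10, "Konkani": 2, "Krio": 4, "Kurdish (Sorani)": 15,
    "Lingala": 45, "Luganda": 20, "Maithili": 34,
    "Meiteilon (Manipuri)": 2, "Mizo": 0.83, "Oromo": 37,
    "Quechua": 10, "Sanskrit": 0.02, "Sepedi": 14, "Tigrinya": 8,
    "Tsonga": 7, "Twi": 11,
}

# Add up the estimates and report how many languages were counted.
total = sum(speakers_millions.values())
print(f"{len(speakers_millions)} languages, about {total:.0f} million speakers")
```

On these figures the total comes to roughly 328 million across 24 languages, which is consistent with the post's "over 300 million".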

This is also a technical milestone for Google Translate. These are the first languages we’ve added using Zero-Shot Machine Translation, where a machine learning model only sees monolingual text — meaning, it learns to translate into another language without ever seeing an example. While this technology is impressive, it isn't perfect. And we’ll keep improving these models to deliver the same experience you’re used to with a Spanish or German translation, for example. If you want to dig into the technical details, check out our Google AI blog post and research paper.

We’re grateful to the many native speakers, professors and linguists who worked with us on this latest update and kept us inspired with their passion and enthusiasm. If you want to help us support your language in a future update, contribute evaluations or translations through Translate Contribute.

==========

[Thanks to Don Keyser]



24 Comments

  1. Jonathan Smith said,

    May 12, 2022 @ 7:55 pm

    "We’re grateful to the many native speakers, professors and linguists who worked with us on this latest update and kept us inspired with their passion and enthusiasm."
    Wow, what corny pricks. Google Translate not only doesn't support languages like Cantonese and Taiwanese with tens of millions of passionate speakers and tons of easily accessible text but actively removes support upon realizing its work in these areas could prove problematic for Dragonfly the Remix. So I must amend the comment on the linked Reddit thread to "Diu lei lo mo gong chan tong AND Google!!!" True scum.
    https://www.reddit.com/r/Cantonese/comments/omxajf/bring_cantonese_support_back_to_google_translate/

  2. KIRINPUTRA said,

    May 12, 2022 @ 10:41 pm

    Tip of an iceberg. Taioanese doesn't even have an ISO 639 code. I guess Cantonese does, practically if not technically. In a sociopolitical sense, Cantonese & Taioanese are almost analogous to the spoken Arabic languages — except Mandarin is sociopolitically a one-stop shop, unlike Standard Arabic, and a huge clock is ticking.

    Given the no-minorities-please "physics" of East Asian society, it seems impossible for Cantonese or Taioanese speakers to break away as a non-Mandarin-aligned sociolinguistic minority, which would better allow Google to help. I.e. the idea of de-aligning from Mandarin as-a-minority-if-necessary doesn't exist; it's "all or nothing".

    I can see how Google could be a positive force for Cantonese in spite of it all, if it cared. But any well-meant attempt for Taioanese at this time would probably just deepen the hole, like the Kongsī Taioanese channel, with its 24/7 subtitles in a "seductively pseudo-official" Mando-Taioanese hybrid script.

    Glad to see Ilokano up there at last, though.

  3. KIRINPUTRA said,

    May 12, 2022 @ 10:42 pm

    > it seems impossible for Cantonese or Taioanese speakers to break away as a non-Mandarin-aligned sociolinguistic minority,

    I meant "it seems impossible for SOME Cantonese or Taioanese speakers to break away as a non-Mandarin-aligned sociolinguistic minority"….

  4. David Morris said,

    May 12, 2022 @ 11:32 pm

    20,000 use Sanscrit. If that hasn't changed since its heyday, how and why hasn't it? If it has, what makes it still Sanscrit?

  5. Parvez Qadir said,

    May 13, 2022 @ 8:38 am

    Saraiki has 26 million speakers. It is an important language in Pakistan and India. Please add Saraiki.

  6. Robert T McQuaid said,

    May 13, 2022 @ 12:17 pm

    I was skeptical that Google could translate a language with fewer than a million speakers, so I did a round-trip translate of an article on Virgin Orbit to and from Dhivehi. The result was quite readable.

  7. Coby L said,

    May 13, 2022 @ 1:31 pm

    Sanskrit, and Latin, but not Ancient Greek?!!

  8. Philip Anderson said,

    May 13, 2022 @ 3:37 pm

    @Coby L
    I believe Sanskrit and Latin are spoken within some religious communities; Ancient Greek probably isn’t.

  9. Tom Davidson said,

    May 13, 2022 @ 5:13 pm

    When I type in a word, phrase or sentence, the Google Search Results do not show the "Translate this page" link.

  10. J.W. Brewer said,

    May 13, 2022 @ 6:04 pm

    @Philip Anderson – but there are a reasonable number of currently-living speakers of any number of modern languages who from time to time would like to get a quick-and-dirty translation of a text written in Ancient Greek (or post-ancient-but-medieval Greek: there are different ways to split up the history). Probably significantly more than non-Tsongaphones who would want a quick-and-dirty translation into their L1 of a text written in Tsonga, although to be fair perhaps the Tsonga functionality is more for the benefit of those who are literate in Tsonga who would like a translation from some other language they don't know (maybe including some earlier form of Greek?) into Tsonga. Which is perfectly legitimate, it's just that for any given language added to google translate's repertoire, the user base for "translate into that language" and "translate from that language" is by definition going to be different folks.

    I have myself from time to time stuck bits of ancient-ish Greek (including New Testament/patristic/Byzantine) into google translate although I knew it would treat them as if they were modern Greek, and … the results varied pretty wildly in quality, because the distance between earlier forms of Greek and modern Greek (both in lexical meaning and grammatical structure) is not consistent but can vary dramatically between short text A and short text B. I have in my younger years formally studied ancient-ish Greek and ought in principle still to be able to do my own translations into English, but at this point in my life that can be a slow and taxing process, so having an automated first-approximation would be helpful.

  11. Bloix said,

    May 13, 2022 @ 9:06 pm

    These comments remind me of this; start at minute 2:01 (those who don't watch Louis C.K. on principle should stay away):
    https://www.youtube.com/watch?v=nUBtKNzoKZ4

  12. Philip Anderson said,

    May 14, 2022 @ 5:21 am

    @J.W. Brewer
    I am interested in reading Ancient Greek too. My point was that Google Translate may well be concentrating on communication for language communities (as a public service) rather than as a service to curious individuals. You and I may or may not agree with that, but it’s my hypothesis for the different availability.
    Also, Google Translate already has (Modern) Greek, and it doesn’t AFAIK support more than one variant of ANY language – one English, one Spanish, one Arabic. LL has already discussed (without a consensus) why Latin and Sanskrit are regarded as different languages, but Ancient Greek etc are diachronic variants of the modern language – but that’s the way many people view languages.

  13. Philip Taylor said,

    May 14, 2022 @ 5:43 am

    I do have a philosophical problem with the view that "Ancient Greek etc are diachronic variants of the modern language". With "Modern Greek etc are diachronic variants of the ancient language" I would have no problem, but how an ancient language can be viewed as a variant of a modern language I simply cannot see.

  14. CD said,

    May 14, 2022 @ 11:20 am

    The linked material is fascinating — this moves past training pairwise AIs on existing translated text, and finds that what a model learns in one kind of translation can be applied to unrelated languages. So a question for linguists: what does a large scale model, like the 1000-language model described, know? Is there really a universal grammar?

  15. Terpomo said,

    May 14, 2022 @ 8:35 pm

    J. W. Brewer, Baidu's translator supports Ancient Greek but it's pretty bad.
    Philip Taylor, if we're to have a linguistically coherent definition of 'variant', wouldn't it have to be reflexive? That is, if A is a variant of B then B is a variant of A, since there are no criteria on which to judge which is the "real" or "proper" language.

  16. Philip Taylor said,

    May 15, 2022 @ 2:14 am

    I would argue "no, it does not have to be reflexive". At the time that Ancient Greek was spoken, it could not have been thought of as a variant of Modern Greek, since the latter did not yet exist. Now that Modern Greek does exist, it is reasonable to postulate that it is a variant of Ancient Greek, even if not all would agree. My EUR 0,02.

  17. Philip Anderson said,

    May 15, 2022 @ 4:08 am

    @Philip Taylor, Terpomo
    Point taken. I meant, and should have said, that Ancient Greek and Modern Greek are regarded as variants of Greek. What ‘Greek’ means then becomes the question: the superset of variants that each Greek speaker would recognise as Greek, their language and not another one?

  18. TR said,

    May 15, 2022 @ 3:24 pm

    Given how many thousands of ungrammatical Latin tattoos GT is responsible for, I'm a bit ambivalent about its expansion to other ancient languages.

  19. Terpomo said,

    May 15, 2022 @ 4:40 pm

    Jonathan Smith, I have to wonder if that really is why Google doesn't support translation and speech synthesis in Cantonese given that Baidu's translator does. It's difficult to believe Baidu is somehow LESS beholden to the CCP than Google is.

  20. Terpomo said,

    May 15, 2022 @ 4:42 pm

    David Morris, from what I understand there are a few thousand people in India who list their mother tongue as "Sanskrit" on census forms for political reasons but there's no evidence of actual native Sanskrit speakers. Though it's possible the 20,000 is actually an estimate of the number of scholars who have some decent knowledge of the language.

  21. Andreas Johansson said,

    May 16, 2022 @ 2:57 am

    It struck me as slightly unlikely there'd be a language in South Africa with 14 million speakers that I hadn't even heard of, and, happily for my vanity, it turns out that Sepedi is also, perhaps better*, known as Northern Sotho, which I had heard of.

    * The WP article is called Northern Sotho language, which may be indicative.

  22. Phillip Helbig said,

    May 16, 2022 @ 3:57 am

    Although it doesn’t offer as many languages, and many are done via English, in some respects DEEPL is better than Google Translate.

    https://www.deepl.com/translator

  23. Anand Manikutty said,

    May 18, 2022 @ 11:47 am

    > David Morris, from what I understand there are a few thousand people in India
    > who list their mother tongue as "Sanskrit" on census forms for political reasons
    > but there's no evidence of actual native Sanskrit speakers.

    Ne, kompreneble ne. Vi eraras. Estas multaj homoj en Barato, kiuj markas sanskriton kiel sia gepatra lingvo ĉar ĝi *estas* ilia gepatra lingvo.

    Trans:
    No, you are mistaken. There are plenty of people in India who mark Sanskrit as their native language because it *is* their native language.

    ~

    1. La nombro da individuoj, kiuj parolas sanskriton, certe ne estas nula. Ĉar sanskrito estas liturgia lingvo, ekzistas diversaj komunumoj kiuj uzas ĝin preskaŭ ĉiutage. Ekzemple, pastroj.

    2. Sanskrito staras sur Nivelo 4 de la Vastigita Gradigita Intergeneracia Disrupcio-Skalo. Ĝi ne estas en danĝero esti formortinta.

    Trans.

    1. The number of individuals who speak Sanskrit is certainly not zero. Since Sanskrit is a liturgical language, there are various communities who use it almost daily. For instance, priests.

    2. Sanskrit stands on Level 4 of the Expanded Graded Intergenerational Disruption Scale. It is not in danger of being extinct. Plenty of people use it.

    [=+=]
    Nur kelkaj informoj. Mi ne havos tempon respondi al ajnaj demandoj krom se vi respondos en Esperanto, sed, kiel ajn, ĉi tiu estas la situacio kun sanskrito nun.

    Trans.

    Just some information, fyi. I won't have time to respond to any questions unless you reply in Esperanto, but, anyway, this is the situation with Sanskrit right now.

  24. Anand Manikutty said,

    May 18, 2022 @ 11:52 am

    > 20,000 use Sanscrit. If that hasn't changed since its heyday, how and why hasn't it?

    La nombroj certe ŝanĝiĝis ekde la glortempo de sanskrito.

    Trans.

    The numbers have certainly changed since the heyday of Sanskrit. Certainly, this aspect of the history of Sanskrit is well attested to.

    > If it has, what makes it still Sanscrit?
    Ĝi estas ankoraŭ la sama lingvo kun la sama gramatiko.

    Trans.

    It is still the same language with the same grammar. Sanskrit is pretty standardized also. So, it is still the same language.
