Translation quality is not great. A sentence like "maak de verlichting een beetje meer warm hier" is absolutely not idiomatic Dutch, and "wat is de tijdt in de andere tijdzones" contains an easily caught spelling error. "laat me mijn wekkers zien" is quite ridiculous; it means: "show me my (alarm) clocks". "ontvang me updates van bram´s facebook van het weekend" is ungrammatical. Well, not great.
Jack here from Alexa. Thanks for the feedback. Quality control was nontrivial, to put it succinctly, but we certainly always want to be better. I'll have to check whether the issues you've noticed were detected in the judgment scores. For the first utterance you mentioned, all three raters gave a score of 1 for spelling_score, which means "There are 1-2 spelling errors." So that's good.
Though we re-collected some utterances with low scores, we didn't have the budget to get perfect scores for all utterances. As such, we decided to include all utterances along with the scores from the 3 raters, such that users can perform filtering as they'd like. Some may want to keep the noise intact to help with training.
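For anyone who wants to do that filtering, here's a minimal sketch. Note that the field names (`judgments`, `grammar_score`, `spelling_score`) and the assumption that spelling_score 2 means "no errors" are based on how the scores are described in this thread, not necessarily the dataset's exact schema:

```python
# Keep only utterances where all three raters gave a grammar score of at
# least `min_grammar` and a spelling score of at least `min_spelling`.
def keep(utterance, min_grammar=3, min_spelling=2):
    return all(
        j["grammar_score"] >= min_grammar and j["spelling_score"] >= min_spelling
        for j in utterance["judgments"]
    )

# Toy records mimicking the per-rater judgments discussed above.
examples = [
    {"utt": "which alarms do i have",
     "judgments": [{"grammar_score": 4, "spelling_score": 2},
                   {"grammar_score": 4, "spelling_score": 2},
                   {"grammar_score": 4, "spelling_score": 2}]},
    {"utt": "wat is de tijdt in de andere tijdzones",
     "judgments": [{"grammar_score": 4, "spelling_score": 1},
                   {"grammar_score": 3, "spelling_score": 1},
                   {"grammar_score": 4, "spelling_score": 1}]},
]

filtered = [e["utt"] for e in examples if keep(e)]
# The second utterance is dropped because a rater flagged spelling errors.
```

Raising or lowering the thresholds lets you trade off cleanliness against keeping some of the noise for training.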
"Laat me mijn wekkers zien" is indeed a bit odd, but it makes sense if you think of it in the context of a voice assistant that displays information on a screen.
Ah, that's unfortunate to hear. In this case I see grammar_score ranging from 2 ("Some errors: the meaning can be understood but it doesn't sound natural in your language") to 4 ("Perfect"). It looks like one of the raters was too generous with a rating of 4.
I think this feedback is helping me to realize that we should be more explicit about our philosophy of keeping some of the noise in the dataset and allowing people to filter based on judgments.
That said, people do say some strange and ungrammatical things to virtual assistants (not saying this example is representative per se), so it's nice to include some of the odd ones.
I guess what happened is a user of your system tried to be funny. In this utterance there are 3 short fully grammatical sentences. And a 4th one which is not very grammatical but fully understandable, commanding the device to clean the carpet :)
- Cleaning is good.
- Dust is so bad.
- Make your miracle.
- Cleanly my carpet. [1]
[1] original is an ungrammatical imperative
Some (more) human review would have been needed.
The Finnish samples are full of very weird utterances, too. Some read like something people might have written over 50 years ago, but nobody would speak like this.
Maybe they were reviewed on Mechanical Turk with modest requirements and payment? Well, you get what you pay for...
Absolutely. I thought the same thing. If virtual assistants [1] were widely used and they didn't fully understand all the natural language a user might want to use, users would adapt and start to use a restricted language.
[1] Personally I think it's a useless concept, and the examples in the dataset mostly confirmed my view. Come on, a time zone conversion. Do we want to get so stupid that we cannot calculate it ourselves anymore? Adjusting a light? Most of us sit too much, and standing up is good for your body. Such technology is not only useless, but actively bad. Not to mention the surveillance capitalism aspect, which is probably also there, or even prevailing.
I would probably ask the assistant “Toon mijn wekkers”. Never actually used one in Dutch though and any “command” sounds odd to me in Dutch even though it’s my native language
Luckily, this matters a lot less than one might think. A good AI architecture can tolerate a few % of noise in the data. For example, A LOT of people in the Oscar dataset spelled the German "Metzger" as "Metzker" but AIs trained on it still perform great and strongly prefer the correct spelling.
Hi everyone, Jack FitzGerald here from the Alexa team who created this new dataset, the corresponding code, the leaderboard, etc. I'd be happy to answer any questions you might have, and I'll jump in with the other comments already here. Thank you.
Jack, this project and competition remind me of the Netflix Prize so much! Can’t wait till we see what people can do with this dataset. Congrats on the launch!
The sentences in Hebrew are hilarious. Either fixated on Omer Adam[1] or very formal language that I don't think people would actually use. Or formal sentences asking about Omer Adam's greatest hits.
The paper [1] has an interesting discussion of European familiar/formal pronouns:
Many languages encode different levels of politeness through their use of pronouns. Many European languages distinguish between "familiar" and "formal" pronouns, with the "formal" pronouns often morphologically identical to a plural. In French, the second-person singular "tu" is used between friends, while the second-person plural "vous" is used when speaking to a group, or to an individual of higher social rank (such as an employee to a manager). These politeness systems are very heavily influenced by social context, and the MASSIVE dataset gives us a chance to see how people adapt their language when speaking to a virtual assistant instead of another human.
Historically those languages have included English too. "you" and "ye" being the formal/plural form (and indeed "you" can still be used to refer to either an individual or a group) with "thou" and "thee" being the now mostly (although not entirely) disused informal version.
> while the second-person plural “vous” is used when speaking to a group, or to an individual of higher social rank (such as an employee to a manager).
From my experience "vous" is not always about social rank, but often about "distance". If you go to a supermarket, you'll use "vous" with the employees and they'll use it with you. Most of the time when I use "vous", the other person will also use it; asymmetric "vous"/"tu" exchanges are rarer.
Are the sentences in the dataset supposed to be correct? (specifically, the `.utt` key)
I'm seeing some strange results in multiple languages that don't look correct. Even some English ones look incorrect (though I'm not a native English speaker, so maybe I'm wrong). Here are some examples:
- "i want to play that music one again"
- "what's that the album is current music from"
- "which alarms do i have"
Maybe I'm misunderstanding the purpose of the dataset...
Experience (both from work on either Alexa or a similar system in previous years, and as an actual user) tells me that yes, people do often say things that don't "look correct". From personal experience I know that something about the pressure of knowing that the device will detect the end of your utterance before you're ready, if you're silent for too long, can end up rushing you and scrambling your brain a little. Even if you ordinarily have near-perfect instincts for English grammar and sentence construction, you can end up saying some pretty weird things.
But thinking of them as user mistakes, and framing them as "correct" or "incorrect", is counterproductive here. It may be a useful distinction as a learner, when you're trying to improve your own command of the language. But when they're inputs to your system, you have to do your best to do what the user meant, if that's possible to infer. All three of your examples are actually clear, if nonstandard (play music again; tell me what album the currently playing music is from; list my alarms). So a system that takes these as input should handle them as valid requests, not classify them as "incorrect". It might be different for a system designed to teach 'proper' English, rather than a system designed to enact the user's will.
These seem like things people probably said to Alexa. There's probably a combination of speech recognition errors and genuinely ungrammatical speech. Spoken language is often ungrammatical.
It’s kinda wild to think about how incomplete written language is compared to spoken language. Punctuation does a lot of heavy lifting. If I write “I think well the I mean what album is this the song thats playing from” it takes some work to parse, but that’s what an ASR system has to do. A more human transcription might be “I think-- well the-- I mean, what album is this (the song that’s playing) from?”
On top of that, these are translations of text that isn't well suited for translation. Most likely a list of English texts was handed to translators without the intended use cases or extra context being communicated to them. Some of the texts are correct only in a literal sense, or acceptable only in translated literature, yet they still passed review.
Looks like a sentence merged from separate ones. In that context, you could derive multiple intentions. So while not grammatically correct, it could still be useful.
- "i want to play that music one again"
I want to play that music again.
I want to play that one again.
I want that music again.
Play that music again.
Play that one again.
They are supposed to mimic what an intelligent voice assistant might encounter, so not always grammatical, but both the original dataset and the localizations were crowdsourced. Despite efforts at quality control some errors might persist, but as mentioned by another commenter shortened or cutoff phrases or re-phrasings are common.
- unanimous grammar score: 4, spelling score: 2:
3046: "今どうしても中華料理が食べたいのでテイクアウトで注文させてください": "I really want to eat Chinese right now, is it okay if I order some delivered(asking the boss nicely)" , (4,2),(4,2),(4,2)
13367: "単語を定義するとき文で明確にします": "[this will be]clarified in sentence when defining the word", (4,2),(4,2),(4,2)
13258: "ビーエムダブリューのマイレージ": "Mileage [reward program] for BMW", (4,2),(4,2),(4,2)
13986: "トップモデルの車": "Top-style car" or "car of top [beauty] model", (4,2),(4,2),(4,2)
14556: "この会社の株価を更新してください": "Please update stock price for this company", (4,2),(4,2),(4,2)
14592: "スマホのサーキットは何なのか教えて": "Tell me what is the smartphone raceway", (4,2),(4,2),(4,2)
15414: "マネージャーが必要だ": "There will have to be a manager", (4,2),(4,2),(4,2)
- 2/3 grammar score: 4, spelling score: 2:
6235: "素晴らしい映画を見たのでコピーがリリースされたら予約してください": "Because I watched a magnificent movie, please reserve in case a knockoff is let go", (4,2),(4,2),(3,2)
14401: "あなたが私に探した最後のことを教えてください": "Tell me the last thing you have searched of me", (2,2),(4,2),(4,2)
14407: "感じることができるの": "Are you able to sense that", (4,2),(2,2),(4,2)
- lower scores than the above:
14744: "ブロイラーは何ですか、どのように使えばいいですか": "Broiler [meat] is what, how may I use it(for you/for community)", (4,2),(4,1),(3,2)
15456: "ユナイテッド航空にあなたが私の荷物を無くしたのに怒るとツイートをしたいけど": "Though I want to tweet to [Mr.]United Airlines that you become enraged despite you losing my luggage", (4,2),(4,2),(2,2)
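Grouping the per-rater (grammar, spelling) pairs above programmatically makes it easy to separate unanimous verdicts from split ones. A small sketch (the tuple format just mirrors my listing above, not the dataset's actual schema):

```python
# Each utterance id maps to three (grammar_score, spelling_score) pairs,
# one per rater, as in the listing above.
scores = {
    3046:  [(4, 2), (4, 2), (4, 2)],
    6235:  [(4, 2), (4, 2), (3, 2)],
    14401: [(2, 2), (4, 2), (4, 2)],
}

def unanimous(judgments):
    # True when all three raters gave identical (grammar, spelling) scores.
    return len(set(judgments)) == 1

unanimous_ids = sorted(i for i, j in scores.items() if unanimous(j))
```

Only 3046 is unanimous here; the other two have at least one dissenting rater, which is exactly where a second human review pass would be cheapest to target.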
I've since given this a bit more thought. Part B questions 3 and 4, from Fig. 3 in the paper, ask only whether the sentence sounds natural and is (technically) correct, without specifying the context: that the sentence is supposed to be an utterance spoken to a voice assistant. From my own comment above:
3046: "I really want to eat Chinese right now, is it okay if I order some delivered"
14556: "Please update stock price for this company"
15414: "There will have to be a manager"
These are all context errors. What is not acknowledged as widely as I'd like is that translations of short sentences like "I need to see the manager" are situational, beyond depending on tone and politeness. Consider the following:
- I have complaints as a customer, and *I need to see the manager*.
- I need to escalate this issue, and *I need to see the manager*.
- You asked what I am about to do, and *I need to see the manager*.
- "The manager" is a dashboard app, and *I need to see the manager*.
I won't list my translations because I've already dumped way too many CJK characters here, but each of the above four should come out differently from human translators (especially the last one).
Yeah, definitely, but the goal (transfer learning across languages) is super good, and this seems like a good dataset for that purpose (though I suspect it's entirely insufficient to learn an NLP model from scratch).
The paper provides baseline results on how some pre-trained open source models will perform, when fine-tuned on this dataset. https://arxiv.org/abs/2204.08582
There is also Apertium, a rule-based system which is very good for some closely-related pairs that have had a lot of work put into them (especially translation between Romance languages, e.g. Spanish→Catalan, and Norwegian Bokmål→Nynorsk), and the only OK translator for some lesser-resourced languages (e.g. Northern Saami→Norwegian Bokmål), but very underdeveloped for anything to/from English (it feels a bit pointless writing rules for English where there is so much available data; RBMT shines where there's not enough available data, ie. most of the languages of the world). It's packaged for most distros: https://packages.debian.org/search?suite=sid&arch=any&search...
Argos Translate fits with those requirements, but its translation quality is not great. I'd love to see something like DeepL (that has really good translation quality) running locally, but I think that's a pipe-dream for now.
Sorry, we didn't have enough space to fit it on the blog post. You can find the list of languages on page 4 of our paper: https://arxiv.org/pdf/2204.08582.pdf
Just today I was looking at how to use Alexa to help my son with his school Spanish. I'm not convinced it's perfect, but it looks like it could be part of the puzzle.