
Show HN: Unbabel API – Human Corrected Machine Translation - vasco_
http://blog.unbabel.com/post/75063388957/translation-api
======
etrain
This type of crowdsourcing meets ML is a really nice example of where we can
leverage humans and machines to the best of their current abilities.

It would be great if the feedback from the human workers could get
reintegrated into the translation models in an online fashion so that they get
better over time. I realize they're probably outsourcing their machine
translation, but that would be a terrific fully integrated pipeline.

~~~
gracaninja
I am Unbabel's CTO. Thanks for your comment. Our goal is exactly to use that
feedback to improve our MT systems. We are currently outsourcing our MT, while
training our own models (using Moses). Besides generating parallel text, the
types of data we will be collecting (e.g. the chain of edits performed by each
editor on a task) will allow new and interesting algorithms to update the
translation models.
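
As a rough illustration of the kind of edit data mentioned here, the sketch below extracts word-level edits between a machine translation and its human-corrected version using Python's standard `difflib`. The function name `edit_ops` and the sample sentences are hypothetical, and nothing here is claimed to be Unbabel's actual pipeline.

```python
from difflib import SequenceMatcher


def edit_ops(mt_output: str, post_edited: str):
    """Return the word-level edits that turn MT output into the corrected text.

    Each edit is a tuple (operation, removed_words, added_words), where
    operation is one of "replace", "delete" or "insert".
    """
    src = mt_output.split()
    dst = post_edited.split()
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the spans the editor changed
            ops.append((tag, " ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return ops


# Hypothetical example: an editor fixes a verb form and adds an article.
for op in edit_ops("the cat sit on mat", "the cat sits on the mat"):
    print(op)
```

Pairs like these (what the machine produced, what the human changed it to) are one plausible form of training signal for updating a translation model.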

~~~
sushirain
I remember reading your NLP papers back in my academic days. Great work. One
typo: in the
[https://www.unbabel.com/pricing/](https://www.unbabel.com/pricing/) page
"from Portuguese to Portuguese can take some time".

~~~
Noxchi
And Italian to Portuguese should be not available.

~~~
vasco_
Why should it not be available?

~~~
byoung2
There is currently a gray dot, which for the other languages is used to
indicate that the source and destination languages are the same. For Italian to
Portuguese this is not the case. It should have one of the other colored
icons.

~~~
vasco_
Ah, thanks, sorry, bug on our part, being corrected right now, thank you for
pointing it out.

------
vmarsy
I didn't know the product by itself, I'm not commenting about the API.

Seems a really good idea, however I looked at:

[http://news.unbabel.co/fr/fobo-est-presente-a-san-
francisco-...](http://news.unbabel.co/fr/fobo-est-presente-a-san-francisco-
pour-devenir-la-maniere-la-plus-rapide-et-la-plus-facile-pour-vendre-votre-
consumer-electronics/)

Vs.

[http://techcrunch.com/2014/01/10/fobo/](http://techcrunch.com/2014/01/10/fobo/)

and the translation, by _5_ people, is really poor: Just look at the first
line :

>Maintenant, vous sauriez déjà que Craigslist n'est pas bon comme un lieu pour
vendre votre produits.

This translation is unpleasant to read and has some mistakes. Google translate
gave me a better translation.

I guess the problem comes from the fact that translators are not native in the
target language. When you use the product, can you request native language
translators ?

~~~
vasco_
Also, you are correct, one of the things we are noticing is that requiring
that the translator be native in the target language significantly improves
quality. As we grow the community, we will optimize for that. Thanks for
commenting.

~~~
vmarsy
As mmastrac said, if there's a way to add hints to the translator on how to
translate small sentences (for a mobile app for instance) I'll definitely use
your product!

For me, having a native translator seems a must have, I hope your community
will grow enough for it to be possible!

~~~
vasco_
There is already a way to send comments to the translator. Thank you for your
comment. When you send the message you can define the tone and give specific
instructions. We also have a beta version of the Android app and are almost
ready to launch on iPhone. Also, we already have a lot of natives on the
platform. For paid tasks, the quality is vastly superior to the news. I would
invite you to try. Top up and experiment with giving instructions to the
translators. And please get in touch with us, I would love to chat.

------
zmmmmm
There is a kernel of genius in this: I can't translate a word of Chinese, but
I can usually do a _good_ job of _correcting_ a good machine translation. If
the machine does the first part, I can finish the last part, and I never knew
Chinese at all! So this would appear to dramatically reduce the requirement
for the extremely rare skill - being highly expert in two languages - down to
the very common skill of being expert in one language, and just passable in
the other. I love it.

~~~
vasco_
Thank you for your comment, that is exactly where we are trying to get to.

------
kvnlw
I use unbabel a lot, and the really fast translation has a lot of implications
that aren't immediately obvious. You can communicate with international
customers efficiently in their own language. You don't have to build
translation time into the end of your release cycle. It's really sad when you
have to punt a last minute feature because you don't have time for translation
before the deadline.

~~~
vasco_
Thanks for the comment, we love our customers and we hope we continue to bring
value for you and your customers.

------
lignuist
> Human Corrected Translations for 1 cent per word

"Per word" of the source language, or the target language? Sum of both? What
about languages which have a different concept of "words" in written text
(e.g. Chinese, Turkish, ...).

And by the way... "cent" of which currency? :)

Edit: I just saw that the list of supported languages does not contain
languages with "exotic" types of word boundaries (yet).

~~~
vasco_
We are still trying to figure that out, there is no clear answer regarding how
to base the pricing. Perhaps it should be based on words that were actually
corrected? So for the time being we are basing it only on source language words.
For Chinese, for example, we probably will do it based on characters, but
ideas are welcome.

~~~
logn
I think only pricing the source language makes sense. That way the consumer
knows the cost going in and there's no incentive for you to provide more
verbose translations.
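
A minimal sketch of source-side pricing as discussed above, assuming whitespace-separated words for most languages and non-space characters for languages such as Chinese. The names `billable_units` and `quote_cents` are hypothetical, the rate is the "1 cent per word" figure quoted in the thread, and the currency is left unspecified because the thread does not state it.

```python
PRICE_CENTS_PER_UNIT = 1  # "1 cent per word" from the thread; currency not stated


def billable_units(source_text: str, per_character: bool = False) -> int:
    """Count billable units in the source text only.

    Words are split on whitespace (a simplification); with per_character=True
    every non-whitespace character counts as one unit instead.
    """
    if per_character:
        return sum(1 for ch in source_text if not ch.isspace())
    return len(source_text.split())


def quote_cents(source_text: str, per_character: bool = False) -> int:
    """Price known up front, since it depends only on what the customer sends."""
    return billable_units(source_text, per_character) * PRICE_CENTS_PER_UNIT


print(quote_cents("Your score is %1"))
print(quote_cents("你好 世界", per_character=True))
```

Because the count never looks at the translation, a more verbose target text cannot raise the price, which is the incentive argument made above.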

------
mmastrac
How do you guys deal with "context" around translations? Lots of shorter
strings in web application require some hand-holding for translators to let
them know where the phrase will be used, as the original English text might
translate to vastly different things depending on surrounding
functionality/text.

~~~
vasco_
That is a great point. We have some ideas on how to tackle that problem (like
integrating with Pontoon, from Mozilla, for example), but right now there
would be some strings where that would be a problem. Surprisingly that has
proven less of an issue than we had feared originally. In any case, it is a
real problem for localization.

------
camillomiller
I have read some of your news translated in Italian by your service. That's a
very good translation, though in some parts the automatic nature of the
original translation is still detectable. E.g. The piece I read is the one
about Adolf Hitler Platz, translated from TechCrunch. Commas are the dead
giveaway: they're still used like in English. There's also a coordinate clause
introduced by an em dash – we don't usually do that in Italian.

That's to say that the service is super interesting, but I guess the final
user still has some manual editing to do if he/she wants to use this kind of
translations in a professional environment. Given the price, it's still a
great deal.

One last thing: the table in your home page shows that the translation from
English to Italian is not available (Italy's listed only under the "from"
column and not in the "to" row, if I'm reading that correctly).

Good luck with the product :)

~~~
vasco_
Thank you, we are working hard to keep improving. Keep in mind that the news
is done by the whole community, while paid tasks are only done by the best
editors.

------
TapocoL
Just curious, how is dynamic text handled in the Unbabel API? Can you pass
something like "Your score is %1" where %1 would be replaced by the first
parameter? Or would you have to request "Your score is 1", "Your score is 2",
"Your score is 3", etc separately?

~~~
gracaninja
Right now you can pass variables. We tell editors to ignore those variables.
For instance, if your text is "Your score is %1" and you wanted to translate to
Portuguese, the translation would be "A sua pontuação é %1".
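
On the client side, one way to work with placeholders like `%1` is to check that they survive translation and then substitute the values afterwards. The sketch below assumes the `%N` placeholder style from the example above; the helper names are hypothetical and are not part of the Unbabel API.

```python
import re

# Matches positional placeholders such as %1, %2 (assumed format).
PLACEHOLDER = re.compile(r"%\d+")


def placeholders(text: str):
    """Return the placeholders in a string, sorted so that order changes are allowed."""
    return sorted(PLACEHOLDER.findall(text))


def placeholders_preserved(source: str, translation: str) -> bool:
    """True if the translation contains exactly the placeholders of the source."""
    return placeholders(source) == placeholders(translation)


def fill(template: str, *args) -> str:
    """Replace %1 with the first argument, %2 with the second, and so on."""
    return PLACEHOLDER.sub(lambda m: str(args[int(m.group()[1:]) - 1]), template)


source = "Your score is %1"
translation = "A sua pontuação é %1"
if placeholders_preserved(source, translation):
    print(fill(translation, 42))
```

Comparing sorted lists rather than positions lets a translator move a placeholder within the sentence, which some target languages require.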

------
nowarninglabel
Pretty neat, will have to try it out for Kiva. We translate a giant volume of
words every year (we've translated nearly 100 million words total).

As it stands, we have an awesome community of volunteer translators who take
care of most of our needs, but sometimes in times of high demand we could use
help getting through a large batch of loans to translate. That said, when we
tested out some external services for leveraging machine translation and
translation memories, what we found were a few problems that keep us from
being able to leverage external solutions:

1) Our volunteers don't like "post-editing", meaning what you are doing here
of a human manually fixing up a machine translation. Since you are paying your
translators, I imagine they don't mind though.

2) It seems the majority of companies are focused on English -> Foreign
Language, whereas the vast majority of our translation needs are Foreign
Language -> English, and this proved decisive with most of the software being
geared in such a way.

3) Our partners are often in remote areas with not always the highest level of
education, and often are writing in a language that is their second language
(say perhaps French in a Senegal where the person's native language is Wolof),
so the grammar of the French is not going to be great to begin with. This
throws off the machine translation and makes it nearly impossible to develop a
translation memory that is segmented in the right way to actually produce
usable translation suggestions.

4) We need to review the text for policy guidelines (say for instance a
partner puts in directions to a business by accident in a region where our
borrowers are anonymized for safety reasons). But if we send a translation out
to a service like yours and then just have the English back, and then need to
report an issue in it back to the partner, the reviewer who would just know
English would not be able to communicate back to the partner the issue and
identify it in the original language version.

Anyways, just some food for thought in what we've had trouble with in the
space of trying to help us get our lenders connected to our borrowers by
providing them accurate translations of the borrowers' stories.

~~~
vasco_
Thank you for your comment, very insightful concerns and advice. We would love
to chat with you about your experience with crowd translation, it would be
really helpful. Would it be possible to get in touch?

One thing that we could be useful for is to actually help your reviewers to
communicate with the partner, that is to help in translating the communication
itself.

In any case, Kiva is an amazing organization, so obviously we would love to
find out how we can help in any way.

------
codegeek
Interesting idea. I am wondering though about "send message in your native
language". How will that work exactly ? I tried the demo but do you actually
support the native language script ? I got "* Unfortunately we currently do
not support this language. Please try a diferent message." How can I type in
Chinese for example ?

On a side note, just something that I find all the time. The unbabel blog does
not link anywhere to the main site and I had to type unbabel.com manually to
go there. Isn't this something that you guys care about in terms of traffic
source ?

~~~
sofia_
The demo is currently restricted to 4 languages: English, Portuguese, Spanish
and French. We do have Chinese editors and can support Chinese characters in
the live product, but are not offering it right now, as we don't have critical
mass.

Thanks for the heads up about the blog. Corrected at the end of the post.

------
hardwaresofton
Was going to make a product similar to this exclusively for Japanese language
-- interested to see how you guys do - I think the premise might be largely
flawed though, you do not want more than one person translating a large
document. Context is important in a lot of languages, and incoherent writing
style (as translation is almost never an idempotent function, there is always
more than one way to say something) often seems unprofessional.

(nvm, I think my idea is differentiated enough that I still might make it some
day)

~~~
vasco_
Hi, that is a good point. It is something we are working hard on. I think that
starting with machine translation helps with maintaining coherence in style.
In a way, it is as if the first translator (the machine) translated the whole
document. The rest can be tackled with a lot of preprocessing and
post-processing. That being said, our method is not suited for really long
forms, such as 25 page documents or novels. Anything that requires creative
translation would probably need a professional translator dedicated to the
project. I would be interested in talking to you about your idea for Japanese.
Anything I can do to help let me know.

~~~
hardwaresofton
So, I'm not likely to pursue my own idea (though I registered a domain name a
long time ago and have a MVP that's rough around the edges) -- but I don't
want to be an idea-hoarder, I'd like to see where you guys go with this, so I
want to make suggestions (if you don't mind, I'm really not trying to sound
uppity at all)

\- I don't think that starting with machine translation to maintain coherence
in style is a good idea, while AI is still in its infancy. Things like
sentiment detection, NLP, etc are still too infant in my opinion -- this is
baked into the premise of the idea as a whole... We still NEED humans to write
good translations - it seems unreasonable to start at the assumption that you
will get high-quality output from the imperfect machine process that you are
trying to improve (if that makes sense).

I think at best, you will START with bad style, at worst, people will
essentially re-translate the chunks to make more sense anyway, and you're left
with the hodge-podge.

\- If your method WAS suited for medium/long-form, I would suggest adding
another tier of worker-bee: the proof-reader. Allow worker bees to
apply/become proof readers, and create multiple proofs for large documents.
These workers would have qualified for longer-form proofing and possibly
editing. A possible increase to the relative pay of the proof-readers (as they
are even more closely linked to your revenue and customer satisfaction, and
are doing more work to boot), and providing multiple or a combined proof to
the customer (up-charge for this) would be a great addition to what you
already offer. This will probably do wonders for quality control, and will
remove the problem above (I think, to the extent humanly possible). This also
gives the people who work with you chance for improvement, chance to build a
personal brand, and a chance to take pride in their work (and maybe even build
personal/business relationships/trust that benefit the company).

\- Why not play in all the vertical space that you guys are in? Part of my
version of this service dictates a flat rate for a certain length, and a CLEAR
indication that that kind of service is for people with small blurbs to
translate. Some companies only need to translate small blurbs (disconnected
paragraphs, tag lines, etc), and could benefit immensely and constantly (if
you make a brochure for your company, or even an earnings report, etc, you
would need this service EVERY month/year, for example). I don't think you
would have to make too many structural changes to accommodate such a group of
potential customers.

\- I have not operated a system like this at scale, so all my suggestions are
largely baseless (keep that in mind please)

Oh and my idea was to rid the world of "Engrish", especially at the corporate
level.

~~~
vasco_
Thanks for your comments, really insightful ideas. A lot of what you say we are
already seeing. For example in Turkish, where the quality of the MT output is
not as good as let's say Spanish, we are already seeing our translators
replacing entire chunks of text. Interestingly, in other language pairs, the
output is close enough that there are usually minimal changes in the output.

We have been thinking about having the editor position, we are experimenting
with the concept and how it fits with our current workflow.

Anyway, thank you for your ideas, really cool.

~~~
hardwaresofton
glad that some of it made sense, I really liked the site and obviously believe
in the idea, interested to see where you guys will go with it!

------
TezzellEnt
This is an awesome approach to human crowdsourced translation. The project I
was working on back in early 2012
([http://crowdlation.com](http://crowdlation.com)) was going to use machine
translation coupled with human editing and reviews in a very similar fashion.
I'm glad that this idea has such positive feedback, and I wish you guys the
best of luck! Feel free to reach out to me via the email in my profile if you
guys would like to chat.

~~~
vasco_
Thanks, I reached out through LinkedIn. I am looking forward to talking to you
about your experience in crowdlation.

------
exolab
I just saw a machine-translated text that was given to a translator I know for
"post-editing". Basically you have to delete everything that the machine
translated and do it all over again.

If you want to offer that for 1 cent per word you are going to get exactly
what you are paying for. 90% of qualified translators are bad enough that I
would never let them translate anything for me. I cannot imagine the remaining
10% will work for 1 ct/word.

~~~
vasco_
Thank you for your comment. There is a lot of variation depending on the
language. In Turkish, that tends to happen, a lot of times the translation
needs to be redone, but in EN-SP it is surprisingly good. The crowd aspect of
it tends to help with the quality problem. Still a lot of work to do though,
but we are off to a promising start.

------
edwintorok
Since the title is 'Human Corrected .. Translation' I assumed the API includes
a way for the end-user to provide feedback on the translation. According to
the docs there is a way to provide some instructions, define topics, but all
prior to the translation.

Also what if you want to translate multiple snippets of text, and keep them
somehow consistent? (for example some .po files from a project, translated one
entry at a time).

~~~
gracaninja
There is an endpoint to report a translation which we will release soon (it's
at the end of the documentation), meanwhile you can report a translation
directly to us.

We have an endpoint for bulk translations. We are working on a way to submit
XLIFF and PO files directly. Part of the consistency is achieved by the first
step of MT. We are working on keeping consistency between editors by
propagating their changes.
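
To make the idea of propagating an editor's change concrete, here is a minimal sketch that applies one correction to every entry sharing the same source string, as might happen across the entries of a .po file. The entry layout and the function name `propagate` are assumptions for illustration only; the thread does not describe how Unbabel implements this.

```python
def propagate(entries, source, corrected):
    """Return a new list where every entry with this source gets the corrected target.

    Entries are plain dicts with "source" and "target" keys (hypothetical layout).
    The input list and its dicts are left unmodified.
    """
    return [
        {**entry, "target": corrected} if entry["source"] == source else entry
        for entry in entries
    ]


# Hypothetical entries, translated one at a time into Portuguese.
entries = [
    {"id": 1, "source": "Save", "target": "Salvar"},
    {"id": 2, "source": "Cancel", "target": "Cancelar"},
    {"id": 3, "source": "Save", "target": "Salvar"},
]

# An editor changes the translation of "Save" in entry 1;
# the same choice is then applied to entry 3.
for entry in propagate(entries, "Save", "Guardar"):
    print(entry["id"], entry["source"], "->", entry["target"])
```

Exact-match propagation is the simplest case; matching near-identical strings would need a fuzzier comparison and probably a human check.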

------
colinbartlett
Just a heads up for OP, looks like there's a mistake in your matrix of supported
languages:

[https://s3.amazonaws.com/unbabel-assets-
production/img/chart...](https://s3.amazonaws.com/unbabel-assets-
production/img/chart_1.png)

Italian - Portuguese has a "n/a" style dot, but Portuguese - Portuguese
translation says "Can take some time".

~~~
vasco_
Thanks! :) I guess that one we could do pretty much instantaneously. Thank you
for pointing it out.

------
lolexplode
Very interesting concept! I registered, and will keep an eye out for when/if
you guys add languages that I speak. I wonder if it's an error on my end, but
my profile says I'm in Arrifana, Portugal. I can only seem to change my
country of birth, which for the record isn't Portugal, so I wonder where that
is pulled from.

------
trey_swann
Very cool!

Let's say I wanted to Unbabel something from German to English. How long is
'Can take some time?'

Also, how long does it take to Unbabel something given 'Regular service'
conditions?

Last question, how long before 'Unbabel' catches on as a verb?

~~~
vasco_
Thanks Trey. Our goal is to get to 15 minutes of translation time. That would
make it usable for email and customer service messages. Right now it depends
on when our users are awake and on the language pair. The fastest time has been
a few minutes; the average is around an hour.

~~~
trey_swann
That's fantastic! Great turnaround.

------
sunkarapk
Have you guys looked at
[https://github.com/pksunkara/alpaca](https://github.com/pksunkara/alpaca) to
generate API client libraries (SDK) instead of spending time on developing
them?

~~~
vasco_
I will look at it. Thanks for the info.

------
aluhut
Great platform. I hope you add more languages soon. My English is not good
enough but I could provide you with 2 East European languages and translate
them into German.

Also payment in BC or similar would be great.

~~~
vasco_
Thank you. We are planning to add Bitcoin at some point, paying our translators
across the world is sometimes hard.

------
sinzone
this is an interesting API to have onto Mashape:
[http://mashape.com](http://mashape.com)

~~~
vasco_
Certainly, we started posting it there, haven't had a chance to finish it, but
will certainly do so. Mashape is a great website.

------
jbverschoor
Congrats vasco! Your pitches were very good

------
lsimoes
Good luck with your project.

------
davidslv
I wish you guys the best :)

