Launch HN: Lang (YC S19) – Internationalization Built for Devs
128 points by cyrieu on Aug 6, 2019 | 46 comments
Hey HN! We’re Eric, Peter, and Abhi, founders of Lang (https://www.langapi.co). We help developers quickly translate their apps into foreign languages by combining internationalization SDKs with a command-line interface that integrates directly with human translators.

Previously, we all worked on internationalization and localization tooling for companies. In our experience, companies don’t think about translation until it’s too late, and the tech debt builds up very fast. It’s a nightmare to receive a task that says “translate the app into Spanish.” Choosing the right open-source framework, refactoring the entire codebase, and integrating with human translators is a massive effort. As engineers, we wanted to work on features - not put every string in our codebase into a translations.json file. In our months of internationalization work, we couldn’t find a good all-in-one toolkit. So we built Lang.

Like other internationalization libraries, Lang gives you a tr() function. Wrap your strings with tr(), and we’ll show your users translations that correspond to their language settings at run-time. But how do you actually get the translations? Open-source frameworks like Polyglot.js stop here, but Lang doesn’t. Run “push,” and our command-line tool will parse your code files, find tr() calls, collect newly added strings, and send them to human translators for you. For JavaScript, we use Babel to construct an Abstract Syntax Tree (AST) of your code, and traverse the tree to find tr()’d strings. For a developer, this makes it simple to add/remove/update strings: just run “push” in your terminal. You can track the status of your translations on our dashboard, and when they’re done just run “pull.” We’ll generate a translation file for you, and connect it with our tr() function. You own the file - Lang doesn’t make any network requests for translations at run-time, and your translations always load, even if our service is down.
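As an illustration, here's a minimal sketch of how a tr()-style lookup over a generated translation file might work. The names and file shape are hypothetical - this is not Lang's actual implementation:

```javascript
// Hypothetical sketch of a tr()-style lookup (not Lang's actual code).
// A generated translations file maps source strings to per-locale text;
// tr() looks up the current locale and falls back to the source string.
const translations = {
  "Welcome to Lang": { es: "Bienvenido a Lang", fr: "Bienvenue sur Lang" },
};

let currentLocale = "es"; // in practice derived from the user's language settings

function tr(source) {
  const entry = translations[source];
  return (entry && entry[currentLocale]) || source;
}

console.log(tr("Welcome to Lang")); // "Bienvenido a Lang" when locale is "es"
console.log(tr("Untranslated text")); // falls back to the original string
```

Because the lookup table ships with the app, no network request is needed at run-time, which is what makes translations load even when the service is down.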

This works for static strings in the code, but what about dynamic content from the backend or database? We expose a function called liveTr(), which takes a string argument. The first time liveTr() sees an untranslated string, it sends it to Lang for translation and returns the string in its original language; on subsequent calls, it fetches the translation on demand. We’ve shipped liveTr() with built-in caching to reduce the number of network requests, and we have self-hosted solutions for users with high uptime requirements. This is a feature companies commonly build in-house for internationalization, and we want to make it available to all devs.
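A rough sketch of the liveTr() behavior described above - return the source string on first sight, request a translation in the background, and serve the cached result afterwards. Function names and the cache shape are made up for illustration:

```javascript
// Hypothetical sketch of liveTr()-style behavior (not Lang's actual code).
const cache = new Map();

async function requestTranslation(text, locale) {
  // Stand-in for a network call to a translation service.
  return `[${locale}] ${text}`;
}

function liveTr(text, locale) {
  const key = `${locale}:${text}`;
  if (cache.has(key)) {
    const cached = cache.get(key);
    return cached.done ? cached.value : text; // still in flight: original text
  }
  const entry = { done: false, value: text };
  cache.set(key, entry);
  requestTranslation(text, locale).then((translated) => {
    entry.done = true;
    entry.value = translated;
  });
  return text; // first call always returns the source string
}
```

The trade-off is that the very first viewer of a dynamic string sees it untranslated; every viewer after that gets the cached translation without a round-trip.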

Lang currently supports JavaScript and TypeScript apps (React, React Native, Vue, etc.), with closed betas for Django, Android, and iOS. Give us a try at https://www.langapi.co/signup - machine translations are free, so you can see your app in another language in minutes. If you use human translations, we charge $99/month for our tooling, and 6-8 cents per word translated. A lot of our work is inspired by open source, and we want to give back - if you’re building an open-source project or non-profit, ping us at eric@langapi.co. We’ll drop the monthly fee :)

The HN community builds amazing products, and we’re sure there are plenty of people here who have translated their apps - we’d love to hear your experiences in this area and your feedback on how we can improve!

I like the idea, but you are putting more weight behind the tooling than I would. I don't find translation tooling to be cumbersome, so especially if machine translations are free, I don't find the price point for human translations to be compelling.

What would be compelling is if you could pro-actively call out the bigger gotchas in translation - grammatical differences that make you change word orders, different mechanism for handling plurals, etc. If you could preemptively warn us, even before a "push" that we may hit a problem, I'd take a closer look. For example, flagging a line saying, "Hey, it looks like you are using phrasing that will be problematic in <Italian/Hindi/Russian/etc.> Here's why..."

Great point - we'd like to add support for common gotchas. Right now we abstract all plural logic away from the user by automatically requesting every plural form of a phrase for each language. We've also noticed that translations often break layouts, so we're shipping a tool that notifies you when a translation is x% longer or shorter than the original string.

We would love to explore more ways we could help pro-actively solve these problems. Ping me at abhi@langapi.co if you want to talk more!
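The length-delta check described above could be sketched roughly like this - the threshold, function name, and input shape are all hypothetical:

```javascript
// Hypothetical sketch of a length-delta warning: flag translations more
// than `threshold` (here 40%) longer or shorter than the source string,
// since large deltas often break layouts.
function lengthWarnings(pairs, threshold = 0.4) {
  return pairs
    .filter(({ source, translated }) => {
      const ratio = translated.length / source.length;
      return ratio > 1 + threshold || ratio < 1 - threshold;
    })
    .map(
      ({ source, translated }) =>
        `"${source}" -> "${translated}" (${translated.length} vs ${source.length} chars)`
    );
}

// German UI strings are a classic offender for layout overflow.
console.log(lengthWarnings([{ source: "OK", translated: "Akzeptieren" }]));
```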

Mozilla’s Fluent project [1] seems like a really thoughtful and comprehensive approach to many of these problems.

[1] https://projectfluent.org/

We evaluated Fluent when we built the product and found that it didn't offer much more than ICU, the international standard. Our platforms have full support for ICU syntax, including plurals.
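As an aside, the plural categories behind ICU come from CLDR, and modern JS runtimes expose them directly via Intl.PluralRules. This small example shows why a phrase needs a different number of plural forms per language:

```javascript
// Plural categories differ per language: English has two (one/other),
// while Russian distinguishes one/few/many/other. Intl.PluralRules is
// built into modern JS runtimes and uses CLDR, the same data as ICU.
const en = new Intl.PluralRules("en");
const ru = new Intl.PluralRules("ru");

console.log(en.select(1)); // "one"
console.log(en.select(5)); // "other"
console.log(ru.select(1)); // "one"
console.log(ru.select(2)); // "few"
console.log(ru.select(5)); // "many"
```

Requesting every category up front, as described above, means the translator supplies all the forms a language actually needs.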

We recently needed to translate our Chrome extension (languagelearningwithnetflix.com) into 18 languages. We are poor and had less than $1000 for the job.

Here's our approach.

1. Move all strings into a Google doc. Takes about 8 hours.

2. Organise strings into groups with screenshots, think carefully, split strings, look for reused strings, reword things to make them simpler and easier to translate, add notes to some strings to make the meaning more explicit. Very tedious part of job, 2-3 days work.

3. Put the doc, editable via link, on Upwork with a fixed payment, somewhat generous for the translation wordcount. Check that each translator is a native speaker of the target language, has good feedback, and ideally some IT/programming experience. Order translations for the languages we can check ourselves (5-6 languages).

4. Check the translations received for issues. Translators typically misinterpret the same things, as the source was not clear enough. Fix these issues. Maybe 4 hours work.

5. Now send to 10+ other translators for the languages we don't understand. Cross fingers that these will be ok.

6. Check translations of labels for consistent usage of semicolons, capitals, full stops, etc. Struggle with zh/ja/ko.

7. Use a small JS script to transform CSV output from sheets to JSON for chrome.i18n.

8. Cycle through all locales and check for overflowing text or other issues.

9. For any extra strings we might need later, we can try Microsoft's UI translations database, or else Google Translate (which is mostly OK - we can check the reverse translation).
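Step 7's conversion can be sketched like this - assuming a simple "key,translation" export with no quoted commas (a real exporter would want a proper CSV parser):

```javascript
// Minimal sketch: convert "key,translation" CSV rows from a spreadsheet
// export into the messages.json shape chrome.i18n expects,
// i.e. { "key": { "message": "text" } }.
function csvToChromeI18n(csv) {
  const messages = {};
  for (const line of csv.trim().split("\n")) {
    const [key, ...rest] = line.split(",");
    messages[key] = { message: rest.join(",").trim() };
  }
  return messages;
}

const csv = "greeting,Hola\nfarewell,Adiós";
console.log(JSON.stringify(csvToChromeI18n(csv), null, 2));
```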

Honestly this all was quite a lot of boring work, but we probably ended up with reasonable translations at a good price, and managed to pay translators reasonable money.

> Struggle with zh/ja/ko.

FYI, these are frequently called CJK (Chinese, Japanese, Korean). Although that has more to do with the fact that each character takes multiple keystrokes to type, and is thus harder to support, than with the locale.

Wow, that's very interesting - what a time-consuming experience. You can imagine that this method would not be sustainable for larger codebases. Let me know at abhi@langapi.co next time you go through this process and we'll help you out :).

This is very cool! It took us months to build the equivalent tooling for reddit (except we were using crowd sourced translations instead of machine based ones).

Do your clients do local caching, or is my uptime dependent on your uptime (unless I code in my own caching I suppose)?

Great question, and thanks so much for sharing your experience at reddit! For static strings, we create resource files and download them directly into your codebase, so your uptime is totally independent of ours once you deploy your app.

For dynamic strings in the database/generated by users, our servers need to be up to receive + handle the translation request, but that will be cached the next time you try to look the translation up.

I'm glad to hear you guys put thought into reliability from the start!

The Hindi example on your front page is translated in a grammatically incorrect manner.

Instead of saying "Lang me aapka swaagat hai", you've got "me aapka swaagat hai Lang".

It's like saying "Lang Welcome to".

Nice idea though, if you can iron stuff like the above out.

Great catch, thank you! We'll update the Hindi example :)

Does using the tooling depend on being translated through Lang? We are going to be setting up new internationalization tooling soon but have a robust human translation community already that we wouldn't want to bypass.

Our tools send the strings to a dashboard. At this point, you can have your own translators add translations and pull them back in. We have developed flows which make it easy for your own translators to go through your phrases, see the context (description + screenshots) and add appropriate translations.

Sending translations to the agencies we've partnered with is completely optional.

That's a great product! But... I can't change the language of your website?!

Our site is internationalized with our own product, and we have Spanish translations. If you change your Chrome language setting to Spanish, you can see it. We plan to add a manual language switcher soon.

I worked on internationalization extensively before (Pootle, specifically).

The good:

The library's API seems pretty well-thought-out. A good i18n API in JS/TS is highly needed, even more so one that works well with React. I use i18next in my projects but it's mediocre, although I don't know that the difficulties I end up facing with it wouldn't show up here.

The bad:

Pricing. Sorry but translation services are extremely competitive, and players that have been around a long time such as Crowdin, Transifex and Weblate have the benefit of being already trusted by name by a huge community of devs and translators.

You also talk about open source a lot, but I'm disappointed your web tooling doesn't seem to be open source and self-hostable. This is one point where you really could differentiate yourself.

The ugly:

It looks like you've pretty concretely tied your i18n API and your translation UI together. I can't see your UI or whether it's any good, but I'm likely to want to use your API with a different translation service, or your translation service with a different API.

Also, please - Google OAuth is basically a requirement for any B2B service.

(I'm happy to give more thoughts on a video/screenshare chat if you like, feel free to reach out, email in my profile; always want to help new players in the i18n space)

This looks super cool. I got a bunch of questions!

1. Am I correct in understanding this is meant as a client-side only solution for now? Right now we have a pretty complicated translations process that needs to support translations that are spread across the client and server. Would this support a hybrid approach like that?

2. Another question I have is where does the `translations.json` file come from and where is that stored? Is that just generated by the CLI and then we have to deal with serving that however we want?

3. Is there one `translations.json` file per language? One with all of them? Are there performance concerns with sending large files like that to the client? This is a general question from me to other developers of large sites: how do you deal with tons of translations?

4. Any plans to support existing translations? E.g. if I have an existing set of translations keys and values can I plug those in somewhere? I know y'all are bootstrapping so it wouldn't surprise me if that's a Future feature.

Again, love the idea of this, and it would be super cool if this solves the complexity problems we currently have with supporting a ton of languages.


For some background, our current solution involves generating a rather large (> 1MB) `translations.json` file for each language, which we serve to the client via a CDN. A typical map of keys to values.

We create the keys ourselves as we go along, something like `dashboard.salesCard-helpText`. Then we have a kludgy Drupal instance to populate the key and value, and add some tagging to show it needs to be translated. Translations get entered into that Drupal instance. All of this is entered manually. Then that gets used to generate the `translations.json` file I mentioned earlier.

We have plans in the future to overhaul the process.

1. We support both client- and server-side frameworks (Django, Python, Node.js). We're adding support for more as fast as we can, including Java and Rails.

2. The 'translations.json' file is autogenerated by the CLI and updated each time you pull translations. It's automatically bundled in the deploy process so you don't need to do any extra work.

3. Currently we generate a single translations.json file. Size hasn't been an issue yet, but we plan to add splitting to reduce it. For dynamic content, which can be large, we can serve the content via a CDN. We could also give clients a microservice if they'd like to self-host, or directly update a cache on disk on the deployed machines. We're still experimenting with the best/easiest way to do this.

4. Sure thing, we have a file upload in our dashboard right now, but we want to add this to our CLI to make it more accessible.

I understand your frustration, and one of our core philosophies is to do away with keys completely. Our keys are auto-generated and never touched by the user, so the code can contain the actual text, which is a lot more readable. Ping me at abhi@langapi.co and I'd love to help solve your problems.

Congrats on the launch! I can totally see the need for this. What (if anything) would you say has changed in the recent years to make this technologically feasible for the first time? Have you looked into how the latest advances in NLP eg. GPT-2 has the potential to substitute translators entirely?

Thank you for the kind words! Two main reasons for our approach:

1) JavaScript-based apps these days have complex rendering logic that makes the HTML-parser method of finding and translating strings infeasible. Every company we've worked at has needed to extract each string and wrap it with a special `translate` function in their codebase.

2) We make heavy use of the Babel and TypeScript compilers to work with JavaScript ASTs, and there's been huge progress on those recently.
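To make the extraction step concrete, here's a deliberately naive regex-based version. As described above, the real approach walks a Babel/TypeScript AST instead, which correctly handles nesting, escapes, and template literals:

```javascript
// Naive sketch of string extraction: find tr("...") calls with a regex.
// Real tooling walks an AST; a regex misses template literals, escaped
// quotes, and dynamic arguments, but shows the shape of the problem.
function extractTrStrings(source) {
  const found = [];
  const re = /\btr\(\s*["']([^"']+)["']\s*\)/g;
  let m;
  while ((m = re.exec(source)) !== null) found.push(m[1]);
  return found;
}

const code = `const a = tr("Hello"); const b = tr('Goodbye');`;
console.log(extractTrStrings(code)); // [ "Hello", "Goodbye" ]
```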

We've thought about NLP, but quality is a huge concern of ours, and we're not quite ready yet to roll that out to companies. If that's something you're interested in, would love to chat, send me an email at HN username + gmail!

I have a small framework I built for developing small web apps faster and it also has a translation engine very much like this, basically what it does is:

    <span class="text-muted">$T(Detajet e blerësit)</span>
and it translates it to:

    buf.WriteString(T(`Detajet e blerësit`))
T is a function that takes a string and returns a string from a static map that gets built when the app starts or when somebody translates a label.

Cool stuff! I would love to hear more about it. Please ping me at abhi@langapi.co if you want to talk.

Correct me if I'm wrong but wouldn't I still have to take the time to wrap everything in the codebase? I feel like that's the majority of the painful work.

Developers with large codebases have complained that it takes weeks to manually wrap all their strings. We have an experimental tool for ReactJS that can wrap all user-facing strings with our function instantly. It's currently in beta, but we recently used it to onboard a codebase with over 1000 strings in half an hour. We're also looking to build this auto-wrapping service for other frameworks.

Ping me at abhi@langapi.co if you want a demo of this beta tool or if you want it for another framework.

Hi, we developed a similar solution, minus the handling of translations itself - fortunately we have in-house people who can supply that.

A few questions:

- Some translations are a bit context-dependent: what happens if I don't agree with the translation?

- Sometimes we do some kind of media localization, e.g. users in France will see a different image than users in Portugal. Are you a translation-only shop, or do you plan to do some kind of l10n?

Best of luck!

Good question, we let you provide as much context to the translators as you want (description, tone, and screenshots). If you still don't like the translation, you can comment and it will be redone taking your comment into account. Additionally, if you have employees who want to review the translations, they can do that on the platform.

We currently plan to handle localization of text (regular, dates, currency, time, gender, plurals, etc.). Handling images is interesting though; I'd love to explore that more if it's a pain point for you. Ping me at abhi@langapi.co - I'd love to talk!

Couldn't a bad actor abuse liveTr() and call it with a ton of random strings to make me pay to translate a ton of garbage data?

A bad actor could only abuse liveTr() to spend your money if they had access to your API keys, which should be kept secret. In that case, you'd also be able to report bad strings coming in through your dashboard, and we'd take care of the bad actor and refund you immediately!

Presumably you would have to build protections into your own app to prevent that, much like any other user input abuse.

Looks great! It'd be cool if the homepage was a live demo that allowed you to toggle the language.

Our framework automatically checks the browser language preferences. Change your browser language preference to Spanish and refresh to see the translations. We'll also add the language toggle soon to make it more obvious.
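Picking a locale from the browser's preference list (navigator.languages) usually boils down to matching exact tags, then base languages, then a fallback. A small hypothetical sketch of that negotiation:

```javascript
// Hypothetical locale negotiation: match the user's preference list
// (e.g. navigator.languages) against the locales the app supports,
// trying exact tags first, then base languages, then a fallback.
function pickLocale(preferred, supported, fallback = "en") {
  for (const tag of preferred) {
    const lower = tag.toLowerCase();
    if (supported.includes(lower)) return lower;
    const base = lower.split("-")[0];
    if (supported.includes(base)) return base;
  }
  return fallback;
}

console.log(pickLocale(["es-MX", "en-US"], ["en", "es"])); // "es"
console.log(pickLocale(["de-DE"], ["en", "es"])); // "en" (fallback)
```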

I use gettext for whatever back-end language I'm using. I much prefer handling this sort of thing from the back-end. That way it doesn't matter what the front-end is. It just works.

For frameworks that use gettext, we integrate with it! Gettext extracts the phrases into .po files, but these still need to be translated. Running 'langapi push' pushes out all the .po files in the codebase, and running 'langapi pull' pulls the translated versions back in and automatically saves them in the correct locations.
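For readers unfamiliar with the format, a .po file is essentially a list of msgid/msgstr pairs. A minimal reader sketch (assuming simple single-line entries, no multiline strings or plural forms) shows what a pull step has to merge back in:

```javascript
// Minimal .po reader sketch (hypothetical, simplified): handles only
// single-line msgid/msgstr pairs and skips the empty-msgid header entry.
function parsePo(text) {
  const entries = {};
  let msgid = null;
  for (const line of text.split("\n")) {
    const id = line.match(/^msgid "(.*)"$/);
    const str = line.match(/^msgstr "(.*)"$/);
    if (id) msgid = id[1];
    else if (str && msgid) entries[msgid] = str[1];
  }
  return entries;
}

console.log(parsePo('msgid "Hello"\nmsgstr "Hola"')); // { Hello: "Hola" }
```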

I'd love to see support for other server-side languages, like C# (preferably something that can blend in with existing resource files) and Java.

We're actually building out support for Java right now and would be more than happy to explore other server-side languages you're interested in. If you have more specific requests ping me at abhi@langapi.co and I'd love to talk!

As I mentioned - C#. The translation mechanism uses so-called "resource" files, with a master file (English) and secondary files for other languages.

Very interesting. Question about the pricing page: it's unclear to me what is meant by phrases in the context of "100 Phrases" for the $99 plan?

That's great feedback - we're still working out the best way to do pricing. A phrase is any text wrapped with our 'tr' function. We price by phrases currently because it's an industry norm, but we're open to changing it if it causes confusion.

I think the confusion for me was that there's also a pay-per-word price; a limit on the number of phrases then doesn't feel logical to me. The number of phrases, as tr() calls, also depends quite a bit on how I've set up the translation, and differs per project.

I would expect to pay a monthly fee for the online service + a fee for each word. Also, 100 unique tr()'s seem a bit limiting? (even smaller projects for me quickly get > 1000 separate tr calls)

This looks great & hi Abhi! Will this support left script languages?

Hey! Requesting and receiving translations works with any language, including right-to-left (RTL) scripts. In the future, we want to solve the page-layout problem for RTL languages, as we've seen it's a big pain point.

Cool, I’d like to use this for my startup. Could I give you a call?

Absolutely, email me at HN username + gmail and I can onboard you :)

Wow, this is amazing!

Thanks :)
