How Telegram Messenger circumvents Google Translate's API (danpetrov.xyz)
392 points by decrypt 24 days ago | 278 comments

Someone deleted an interesting comment about adversarial interoperability [0]

I’d love to see and give money to a project to create and maintain easy to use and stable “adversarial interoperability” APIs for as many services and products as possible.

Perhaps companies and projects would not often use these directly because of the risks (hopefully some would, though!) but individuals could drop the library or the URL to a server hosting it into their apps to gain extra features.

If standardised, whole open source apps could be built around them: apps that query and analyse data from services, and that aggregate and automate use of those services, including optimising prices, taking advantage of offers, and using undocumented APIs to the user's advantage.

Maybe something architected and incentivised like https://thegraph.com/ for adversarial interop and undocumented APIs. Building it as a network of nodes and funding it with crypto would make it harder to attack and take down.

[0] https://www.eff.org/deeplinks/2019/10/adversarial-interopera...

Do you know Woob? https://woob.tech/

So they finally gave in to the complaints about the name (https://lists.symlink.me/pipermail/weboob/2021-February/0016...). woob is excellent but unfortunately its very existence means there is a problem. Hopefully they will gain visibility and problematic services will adopt standard protocols

This is very wishful. What is the incentive for enabling content access which bypasses their business (i.e. ad impressions)?

This is nice, and feature rich. It's available in Homebrew, where I was able to try it out (if I ran Nix I'd have been able to use nix-shell for that). However, it's France- and French-oriented, not the world-wide interface I expected it to be. Bands, for example, only contains a metal database. Recipes: 5 out of 6 entries are French. Travel: I don't see German, Dutch, UK, or Belgian public transport.

I can't speak for Germany/Belgium/NL but we have actual APIs for public transportation in the UK (trains/tube at least, not so sure about buses).

Ahah looking at the logos triggered my frenchness, this thing is from the motherland.

Holy s*t this is amazing! A set of web-scraper automations!

Looks excellent.

"pip3 install woob" succeeds but e.g. "woob config-qt" returns "not a woob command". Are the -qt versions not available via pip? I have python3-pyqt5 installed.

Qt applications are maintained and packaged separately:

pip install woob-qt


Thanks, I'll suggest they add that to https://woob.tech/install

Wow that is really cool!

Thanks! This looks really cool

> Perhaps companies and projects would not often use these directly because of the risks

If you provide a solution to someone’s problem, it will be used all over the place.

The biggest companies won’t use these things, but plenty of smaller companies and individual programmers without oversight would use them without a second thought, at least until they’re caught.

Telegram is a relatively large business and here they are abusing an API exactly like you suggest.

This is a really good idea!

Along those lines: maybe we could use a middleware pattern for APIs, frameworks, etc where the interface/package would be built as a layer above two or more services.

That way the developer could switch between them at any time, or even failover automatically.

So for example, rather than going onto GitHub to download an SDK for something like Mailgun, you'd download a middleware framework built on Mailgun and Sendgrid.
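
To make the idea concrete, here is a minimal sketch of such a failover layer in Python. Everything in it is hypothetical: `FailoverMailer`, `ProviderError`, and the two stub adapters are made-up names for illustration, and real adapters would wrap each vendor's actual HTTP API rather than the stand-ins used here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Raised by a provider adapter when a send attempt fails."""


# A provider adapter is any callable taking (to, subject, body).
SendFn = Callable[[str, str, str], None]


@dataclass
class FailoverMailer:
    """Tries each provider in order and stops at the first success."""
    providers: List[Tuple[str, SendFn]]
    log: List[str] = field(default_factory=list)

    def send(self, to: str, subject: str, body: str) -> str:
        errors = []
        for name, send_fn in self.providers:
            try:
                send_fn(to, subject, body)
            except ProviderError as exc:
                errors.append(f"{name}: {exc}")
                self.log.append(f"{name} failed, trying next provider")
                continue
            return name  # name of the provider that accepted the message
        raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for vendor adapters (not real SDK calls).
def mailgun_stub(to, subject, body):
    raise ProviderError("simulated outage")


sent = []


def sendgrid_stub(to, subject, body):
    sent.append((to, subject, body))


mailer = FailoverMailer([("mailgun", mailgun_stub), ("sendgrid", sendgrid_stub)])
used = mailer.send("user@example.com", "Hello", "Test message")
print(used)
```

The calling code only ever sees `mailer.send(...)`, so reordering or swapping providers is a one-line change to the list passed in.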

This pattern could be used to identify vulnerabilities in software at the conceptual level, by helping developers to avoid marrying their code to individual providers like AWS. Some mission critical software could even be certified as using all adversarial interoperability frameworks.

It could even help third party services pull themselves up by their bootstraps, if they get added to one of these middlewares. An instant user base without having to rely on marketing or word of mouth.

And could be used to identify monopolies when there's no middleware for a service.

I was advocating for this for years but gave up when I talked to someone at PagerDuty who was just straight out against it, for almost irrational reasons. I kept trying to make the case that a company like theirs literally can't be taken offline without risking huge chunks of the internet, so why wouldn't they want to run their whole business on an abstracted-away platform so they could, say, switch to Windows if Linux had a 0day, etc.

> Too many moving pieces. Too much work.

It still boggles my mind that this is the way we do things. "Patch fast!" Ok cool. That's going to keep working out forever.

I totally agree.

I would love to have applications that are more tools than products and weave together these APIs and middlewares, both in querying and visualising, combining, enriching, analysing, filtering etc. data and also taking the output and actioning it.

With the amount of data and services that are now available online we should have superhuman capabilities but the productisation of the internet has left us stuck in company run silos fighting “user journeys”, undocumented APIs and EULAs.

> Someone deleted an interesting comment about adversarial interoperability

That was me. The comment was rapidly amassing upvotes, which made me feel that not only was the comment preaching to the choir but it had the potential to derail the discussion around the actual issue of Telegram's use of Google's Translate APIs.

I think https://plaid.com 's entire business is pretty much an adversarial interoperability webscraping play for banking. The business exists!

youtube-dl does this for video sites, it works on more than just YouTube despite its name.

teller.io does similar for banks.

I would love to see a coordinated effort along the same lines for things with non public APIs. It is however a huge ask as internal APIs are unstable and constantly changing/actively working against things like this which is a huge amount of work to keep up with.

Such APIs could be running on https://fluence.network with interoperability and composition across different services.

I don’t understand the way this was implemented.

They are bound to get in trouble with Google for this, but they can’t easily pull the feature. They can’t just be like „oh you’ve had translate for two weeks now, but now we can’t pay for it, so it’s gone.“

What is the long term thinking behind this? Or is this just developers and management not communicating?

Yeah I'm a bit shocked honestly that it made it into an application as widely used as Telegram. It's bound to be detected eventually and the feature will suddenly break. Such a strange software engineering decision.

They can't even plausibly pretend that they didn't know and it's all a big misunderstanding given the lengths they went to obfuscate it in the code.

This seems like one of those features that leadership demanded with a “just make it happen” decree and no budget for API calls.

Then some developers facing a deadline cobbled together something that “just made it happen” so they could kick the can down the road with something that worked, ideally long enough to collect their bonuses and find a new job so it becomes someone else’s problem.

Or maybe Telegram the company just likes to abuse other people’s things and see how long they can get away with bad behavior. Who knows.

Telegram's problem is that unlike WhatsApp they don't have a billion dollar corporation backing them. I wouldn't be surprised if they were bleeding money.

Don't really think telegram works this way, you are imagining it like a Big N

“Google cut us off—they’re the bad guys, not us. Blame them.”

…but with more elegant phrasing.

s/elegant phrasing/corporate doublespeak

blah blah blah on 31st of February 1970 Google unanimously decided to terminate our access to their Translate API blah blah blah blah

Telegram knows how to play the PR game. It’s going to be a bit more elegant than that, with a “we will rise from the ashes” tone. They know what they’re doing, and I doubt they had any intention of getting away with this.

This is Telegram. They are actually known for elegant phrasing, not corporate doublespeak.

> They are bound to get in trouble with Google for this

From first look, I don't think they are. Telegram gets a new feature, Google gets more data to mine. It's a win-win. I just hope they'll be clear with their users about sending data to Google.

As a small company who spends $70-80k per year on Google's official Translate API, it's disappointing if Google allows this type of abuse to continue.

If they don't want to pay, they should be using a free open source alternative like https://github.com/LibreTranslate/LibreTranslate
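
For anyone curious what switching would look like, below is a rough sketch of calling a LibreTranslate instance from Python using only the standard library. It assumes a self-hosted instance reachable at `http://localhost:5000` (adjust to your deployment) and the JSON `/translate` endpoint; the helper names `build_translate_request` and `translate` are made up for this example. Only the request is built in the demo line, so nothing is sent over the network until `translate` is called.

```python
import json
import urllib.request

# Assumption: a self-hosted LibreTranslate instance; adjust to your deployment.
BASE_URL = "http://localhost:5000"


def build_translate_request(text, source, target, base_url=BASE_URL, api_key=None):
    """Build (but do not send) a POST request for the /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    if api_key:  # only some instances require a key
        payload["api_key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, source, target, **kwargs):
    """Send the request and return the translated string."""
    req = build_translate_request(text, source, target, **kwargs)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["translatedText"]


# Demo: inspect the request without contacting any server.
req = build_translate_request("Hello, world", "en", "es")
print(req.get_method(), req.full_url)
```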

Definitely would love to learn more about your use case. Recently I started to use DeepL and it’s great if the language selection is enough for you.

Personally I have found DeepL to be more accurate in the languages they support than Google Translate. They used to support far fewer, but for the languages they did, it was pretty great. It supports way more now though!

Still no Turkish on DeepL, even though it is one of the most widely spoken languages across much of Europe and uses the Latin alphabet.

Yet tiny European languages like Latvian are supported, as are very difficult translation targets such as Estonian and Hungarian.

My hopes are dashed every time they add another tiny European language and Turkish remains off the table. :-(

Contrary to Turkish, Estonian, Latvian and Hungarian are all EU official languages. That means that all EU legislation is legally required to be translated to these languages and it is readily available in all these languages for free to train the AI model for translating to those languages.

Indeed, also see linguee.com which uses (mainly) official EU documents to feed an enormous amount of word and phrase translations. Beautiful site that I've been using a long time - and only learn this minute that they are in fact also owned by deepL.

I am unsure if the support for those languages is better vs. Google Translate, but the small set of languages it used to support a year or so ago is definitely better. I remember French and Spanish being way better. DeepL's Polish is not that great, that I can say for certain! Not sure if better than Google Translate's though.

Qualitatively better, or quantitatively? If the latter, do you have some metrics you can share?


> do you have some metrics you can share?

Not really. Try to have discussions using DeepL and Google Translate. Ask native speakers which one was more accurate and whatnot.

I do not know French, but DeepL allowed me to speak to someone using the language, and apparently at some point some people thought I was a native speaker!

Crabs in a bucket.

Not everyone can afford to use these APIs or is able to get permission to use the external APIs. The internal APIs are very good for people who don't mind taking the risk of using a potentially unstable API.

Then don’t run a business with services you can’t afford?

But using internal APIs is free. I can afford to use them since I don't mind having to update my usage of them / work around rate limiting.

Do you also only eat free samples from the grocery store?

Which part of https://translate.google.com/ web page says the service provided by the page is a free sample of commercially available API?

The ToS, I expect

I’ve looked for 5 minutes there, and found nothing relevant. They don’t even have additional specific TOS for their translate service, the only link from https://policies.google.com/terms/service-specific?hl=en-US under “Translate” section points back to their common TOS at https://policies.google.com/terms?hl=en-US

Even if they had a TOS for Translate, I’m pretty sure it would be unenforceable unless they hid that page behind a paywall or required a Google account. Merely visiting a publicly available web page doesn’t create a contractual relationship between the end user and the web server owner.

I've spent some time looking into this and found very little. The closest I could get is that if you are a Google Cloud customer, it forbids you from reverse engineering other Google APIs that are not documented as supported, which would probably include this use case. No idea if they're a Google Cloud customer though.

> No idea if they're a Google Cloud customer though.

Even if they are, I have doubts that’s enforceable either. One doesn’t need to reverse engineer an API to consume that API, and it’s hard to find out who did the reverse engineering. The reverse engineering might be accomplished by someone else who’s not a Google Cloud customer, like an unrelated person answering a question on stackoverflow.com.

Those are usually rate limited.

I'd recommend pizza and beer at tech events instead (albeit the nutritional content of your diet could be more important than free food).

This can't work for long. Translate is a profit center for google, and this also shows others that they can disregard google's monetization model for translate.

Commercial use of those APIs is common, despite translate being pretty expensive. Also, GCP's current leadership is so hell bent on nickel-and-diming their customers, and their compensation packages are so dependent on value share growth, that they simply can't afford anyone openly violating their pricing models. Especially a popular app. My guess is this will be down within the first week of January.

I'm curious what techniques they will use to differentiate between Telegram and non-Telegram users. If I were them, I'd simply use my leverage and threaten to remove the app from the Play store unless they remove/fix the offending code - it's much simpler than an eternal cat-and-mouse game, with possible collateral damage.

Ah, you're suggesting Google uses its position on the Play Store to fend off abusers of an unrelated service that happens to be owned by Google as well? I don't think that will work in Google's favor when they try to defend any anti-trust cases slung their way, or those active right now. Google has many other subtle ways that garner less attention to ward off any misuse: subtly downrank in the search index, throttle any traffic heading their way, suggest other applications in the Play Store on top, accidentally flag Telegram as a "risky" platform, etc.

I see what you mean but surely there must be precedents of apps being pulled from the Play Store because they abused 3rd party APIs, be it Google's or somebody else's?

I mean why even bother obfuscating the URL otherwise, surely they expected that it could be caught in the review process.

Well, Apple has no qualms about it[0] so why should Google?

[0] https://dcurt.is/apple-card-can-disable-your-icloud-account

stop giving them ideas.

I don't think you're serious as most of these ideas aren't that novel / intricate, but in general I'd rather publicise these ideas so they're general knowledge and everyone would be aware of the absurd amount of power Google wields.

;-) Happy New year!

You too =D

Given that Google runs reCAPTCHA - which is almost certainly the world's most widely used browser fingerprinting system - they have ample experience with is-it-a-real-browser cat and mouse games.

In fact if you use this API constantly, you are presented with a reCAPTCHA.

In this case, Telegram may just display the captcha and let the user solve it.

That's likely also why Telegram doesn't proxy every translation request over their server: so that it is users individually requesting a small number of translations, from their phone, getting around the quota of free APIs "naturally".

Honestly those are so easy to circumvent. I built translation functionality into an IRC bot in the past and it was so easy to avoid recaptcha if you did stuff in a clean way.

I think they will look for a legal solution. I doubt they will change the API; my guess is it exists to support some other services google sells -- and adding heuristics to detect freeloaders may impact legitimate customers. Nope, they will get a cease and desist letter, and if they happen to use any other GCP services, or rely on Google to do business, this will likely be pretty persuasive.

> I'm curious what techniques they will use to differentiate between Telegram and non-Telegram users.

A cease-and-desist letter from Google legal tends to work pretty well as a technique in these cases.

I'm pretty sure Telegram is outside of US jurisdiction and can trivially respond with a middle finger.

Google's only real option here is to either engage in cat & mouse trying to block this usage or threaten a Play Store removal which comes with its own drawbacks (Telegram has significant marketshare).

Ummm, the UK has a legal system too. I sure would love it if we could "trivially respond with a middle finger" to legal issues. This isn't Russia we are talking about here.

As someone who has often defended Telegram I am somewhat puzzled by this one.

While the legal aspects of this might have to be decided by someone more skilled than me I feel they are morally on the same ground as early Google and if Google makes a big case of it it might backfire spectacularly.

More interesting is that Telegram sends user texts directly to Google without any proxying (did I get that right and has the author studied it carefully enough?).

This might (again, if this blog post is correct and I read it correctly) be an actually dangerous move from Telegram. Unlike the problems that many here worry about regarding E2E-encryption, this can potentially drag Telegram down to WhatsApp levels, sending huge amounts of user data straight into Google.

Then of course, we'll need to see. Very much of what Telegram has done security wise is very well thought out and has improved over time.

Recently for example when I started my backup of one of the groups I participate in I had to confirm from a mobile client or wait 24 hours to start backup. Account recovery is almost automagically simple but has some nifty touches to prevent account hijacking. Settings to delete the account if I fail to log in have existed for years; I wonder if they even did this before Google launched it.

So now I am anxious to know if Telegram has done something brilliant again or if this is a turning point.

> Very much of what Telegram has done security wise is very well thought out and has improved over time

This is not my understanding of the situation at all. There's no end-to-end encryption by default [0], and the end-to-end encryption they do have received significant controversy at launch [1] for being essentially a "roll your own" crypto solution which indeed ended up being found to have some issues [2].

They disable the OS backup and instead they effectively store all their users' contacts, messages, media, etc. directly on their servers except for the conversations that the users directly opt out of by turning on e2e. They've promised since 2014 to open source everything but the backend, which stores all this data, is still closed source.

For small group or individual messaging, whatsapp, signal, or matrix are far better choices. I think it's worth acknowledging that telegram has a much bigger focus on large groups and therefore has to make different security tradeoffs, so I think if we consider telegram a social media service it's pretty good -- but is not the best messenger.

[0]: https://www.howtogeek.com/710344/psa-telegram-chats-arent-en...
[1]: https://www.vice.com/en/article/wnx8nq/why-you-dont-roll-you...
[2]: https://eprint.iacr.org/2015/1177.pdf

You are very precise in your criticism, have my upvote!

A couple of things:

1. the crypto has been improved significantly after the launch as far as I know. That release was back in the dark ages about half a year after WhatsApp got caught sending data unencrypted (and I'm using that word in its original meaning).

2. Can we agree to stop recommending WhatsApp soon?

Thanks! That's kind to say.

I shared some notes on crypto issues more recently in another post above, but I would concede it's generally more battle tested than the first version released at this point. The choice to start with home rolled crypto at all continues to concern me, but more importantly the fact that it's not default is now my biggest sticking point if I'm honest.

I think WhatsApp comes with an asterisk in that I'd certainly recommend signal over it, but most non-technical people have never heard of signal so given the choice between WhatsApp and Telegram I'd personally opt for WhatsApp based on their e2e encryption by default, but I could understand if someone personally gave more weight to a distrust of Meta (even if encrypted) than they do to Telegram and made use of Telegram's secret chats.

Last I heard there is nothing wrong with the E2EE in Telegram at this point in time. OTTOMH, the stuff that people were complaining about wasn't anything that anyone would realistically care about, particularly for something not really that secureable like instant messaging on a smartphone.

The current version has no publicly known issues, but suspicion is the only valid response when an entity rolls their own cryptography for production when such well-studied and secure options already exist.

> suspicion is the only valid response when an entity rolls their own cryptography for production

Strong disagree. I would rather have multiple solutions in production, and Telegram's is also well-studied.

Most established approaches were well studied before they were put in production, while Telegram has had the luxury of beta testing their protocol in real time on their 500 million users, and the work isn't yet done with issues discovered as recently as this year:

> But ultimately they prove that the four key issues “could be done better, more securely and in a more trustworthy manner with a standard approach to cryptography,” said ETH Zurich Professor Kenny Paterson, who was part of the team that uncovered the flaw.


I'd suggest you actually read the paper[1] you're referring to, because it's actually a lot less critical of Telegram's approach than you make it out to be.

We are better off that Telegram exists as an alternative. We don't need Signal's protocol (also used in Whatsapp etc) to be a potential SPOF.

To quote: The central result of this work is a proof that the use of symmetric encryption in Telegram’s MTProto 2.0 can achieve the security of a robust bidirectional channel if small modifications are made.

1. https://mtpsym.github.io/paper.pdf

I have read this paper. Not sure where you feel I mischaracterized it. The section you just quoted goes on to discuss caveats, including a recommendation that they swap out their low level cryptographic implementation with one that uses a standard and well-vetted library, and then criticizes them for not having e2e on by default.

> for being essentially a "roll your own" crypto solution

This is technically incorrect. They use well known cryptographic algorithms for encryption and authentication. However, their protocol is a unique combination of these cryptographic algorithms to serve a specific purpose. That is a different thing, and a different perspective, than the ”normal” use case for saying ”don’t roll your own crypto.”

"roll your own" has no official definition I'm aware of, so I'm not sure what you mean by "technically" incorrect. This could quickly become a discussion of semantics unfortunately.

To be specific with what I mean, it is "roll your own" in the sense that 1) their protocol combines cryptographic primitives in unrecommended ways with no particular justification and in the sense that 2) they literally have invented and written many of these primitives themselves.

As a concrete example, the IGE mode used by Telegram for AES is not "well known"; in fact, to the best of my knowledge Telegram is the only popular software in existence that uses this mode. IGE is avoided by all modern cryptographers because the authentication is broken; however, Telegram uses this mode but not the authentication. There's no particular justification for this decision over just using an AES mode that everyone uses, but there are a lot of potential pitfalls if not done carefully.
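
For readers unfamiliar with IGE (Infinite Garble Extension), the chaining can be sketched as below. This is only an illustration of how the mode links blocks together, not Telegram's code: the block function is a deliberately trivial toy stand-in for AES (an assumption made to keep the example dependency-free, and in no way secure), and the IV layout (previous ciphertext block followed by previous plaintext block) and all function names are choices made for this sketch.

```python
BLOCK = 16  # bytes, matching the AES block size


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# TOY block "cipher" for illustration only: it is NOT AES and NOT secure.
def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]  # rotate left by one byte


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    unrotated = block[-1:] + block[:-1]  # rotate right by one byte
    return xor(unrotated, key)


def blocks(data: bytes):
    if len(data) % BLOCK:
        raise ValueError("data must be a multiple of the block size")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ige_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Assumed IV layout: previous ciphertext block, then previous plaintext block.
    c_prev, p_prev = iv[:BLOCK], iv[BLOCK:]
    out = []
    for p in blocks(plaintext):
        # c_i = E(p_i XOR c_{i-1}) XOR p_{i-1}
        c = xor(toy_encrypt_block(key, xor(p, c_prev)), p_prev)
        out.append(c)
        c_prev, p_prev = c, p
    return b"".join(out)


def ige_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    c_prev, p_prev = iv[:BLOCK], iv[BLOCK:]
    out = []
    for c in blocks(ciphertext):
        # p_i = D(c_i XOR p_{i-1}) XOR c_{i-1}
        p = xor(toy_decrypt_block(key, xor(c, p_prev)), c_prev)
        out.append(p)
        c_prev, p_prev = c, p
    return b"".join(out)


key = bytes(range(16))
iv = bytes(range(32, 64))
message = b"sixteen byte msg" * 2
ciphertext = ige_encrypt(key, iv, message)
recovered = ige_decrypt(key, iv, ciphertext)
print(recovered == message)
```

Note that nothing in this chaining authenticates the ciphertext, which is why a separate integrity check is needed on top of it.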

When you ask what type of authenticated encryption scheme Telegram uses, their answer is "none of them". Normally this would be a huge problem. However, they have solved this with their own custom "security checks". These are also not "well known" because Telegram invented them.

When you ask what type of PKCS they use for padding, the answer is "we came up with our own padding algorithm". This was not "well known" because Telegram invented it. The padding issue they fixed in v2, since it turned out to introduce a vulnerability, but I assume you get what I mean by this point.

Contrast this with Signal, which is also ultimately a unique protocol but is one that was implemented using robust, tested primitives in recommended combinations with boring standard choices for all of the details.

An advantage of Telegram is that they attract less attention from the authorities. Also Telegram seems to have a more, shall we say, lax attitude about criminal activity.

> For small group or individual messaging, whatsapp ... [is] far better choice.

The whatsapp that leaks your data to the FBI in real time [1], or do you mean some other whatsapp?

[1]: https://news.ycombinator.com/item?id=29381917

As the first response to that post says, the only thing that the FBI got was metadata if iCloud backups were enabled; it's impossible for whatsapp to leak your actual messages because they're on device encrypted.

These comments on whatsapp, which appear with regularity by the way, are misleading and just inflammatory.

> it's impossible for whatsapp to leak your actual messages because they're on device encrypted

Have you seen the source code to claim there are no backdoors? Wait, you haven't because it's a proprietary piece of crap produced by the worst corp in existence - Facebook aka Meta. The same entity that's shamelessly involved in Putin's Russia level political censorship and yet we are to trust it according to the parent comment. Right...

This is entirely false, stop spreading disinformation.

Facebook shills are strong here, I see. Here you are: https://propertyofthepeople.org/document-detail/?doc-id=2111...

Excerpt from the FBI report:

> Pen Register: Sent every 15 minutes, provides source and destination of every message. [without a warrant of any kind]

(And who knows what other three letter agencies can get.)

BuT ThAt'S jUsT mEtAdAta, wE hAvE nOtHinG tO hIdE. Then keep using it, but don't spread your bullshit to others.

Telegram is great if you like shiny native features like stickers and having lightweight native clients, but at everything else Telegram is at risk of losing in the long-term.

The big reason for this is that Telegram decided to roll everything mostly on their own (including e.g. MTProto). Telegram is not compatible with Matrix unless you use a bridge, and it is not e2e encrypted (unless you use mobile 1-to-1 secret chats). The server side code is proprietary, and the builds of the clients that are published to the app stores could be anything.

While I love using Telegram right now for talking to some groups of friends, I would look at supporting https://matrix.org , since it will likely become the de-facto standard of building messaging platforms.

>Telegram is great if you like shiny native features like stickers and having lightweight native clients, but at everything else Telegram is at risk of losing in the long-term.

Whatever ends up winning is going to need:

  - Native clients on all major platforms
  - Full support for all the fun little extras like emojis, reactions, gifs, file transfers etc.
  - True multi-device support that doesn't require any sort of forwarding from another device
  - Group chats
  - Searchable history
  - Your full history to automatically load when you log in on a new device (manually transferring isn't going to be an acceptable solution)
  - No concept of selecting a server or anything. Users need to be able to just log in with a username/password and carry on.
  - E2E encryption that doesn't sacrifice the user experience

Anything missing from this list? Also, does Matrix support all of that? Last time I checked Matrix out it seemed clunky and confusing (especially for non-technical users) and it was missing a ton of the 'basics' that people expect out of a chat app.

I think I can speak to how Matrix deviates from your list:

- There are technically native clients on every platform, so best kind of correct? However, the "official/main/most popular" client is Electron on Desktop. Partial credit?

- Yup

- Yup, even when using E2E, which is a hell of an accomplishment. You transfer keys from other devices, but not entire messages.

- Yup. E2E or not, your choice.

- Searchable history plus E2E is... hard, to say the least. Some clients will index your conversations while they happen, but that's obviously not the perfect solution. That said, the APIs are so open that I've written Python scripts before that download and search entire rooms. It would be possible for a client to do the same, though I don't think any do. Non-encrypted rooms are trivial to search, of course.
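
A script of that kind might look roughly like the sketch below. It is not the commenter's code, just one way to do it under stated assumptions: it pages backwards through the client-server API's room `/messages` endpoint, only matches unencrypted `m.room.message` events (encrypted events would need decrypting first, which is not attempted here), and the helper names `make_fetcher` and `search_room` are made up. The demo at the bottom runs against canned pages, so no homeserver or access token is needed to try it.

```python
import json
import urllib.parse
import urllib.request


def make_fetcher(homeserver, access_token, room_id, limit=100):
    """Return a function that fetches one page of room history (newest first)."""
    def fetch_page(from_token=None):
        params = {"dir": "b", "limit": str(limit)}
        if from_token:
            params["from"] = from_token
        url = "{}/_matrix/client/v3/rooms/{}/messages?{}".format(
            homeserver.rstrip("/"),
            urllib.parse.quote(room_id, safe=""),
            urllib.parse.urlencode(params),
        )
        req = urllib.request.Request(
            url, headers={"Authorization": "Bearer " + access_token}
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return fetch_page


def search_room(fetch_page, needle):
    """Walk the history page by page; return matching message bodies."""
    matches, token = [], None
    needle = needle.lower()
    while True:
        page = fetch_page(token)
        for event in page.get("chunk", []):
            body = event.get("content", {}).get("body")
            if event.get("type") == "m.room.message" and isinstance(body, str):
                if needle in body.lower():
                    matches.append(body)
        token = page.get("end")  # pagination token for the next, older page
        if not token or not page.get("chunk"):
            return matches


# Offline demo with canned pages instead of a real homeserver.
fake_pages = {
    None: {
        "chunk": [{"type": "m.room.message", "content": {"body": "Lunch at noon?"}}],
        "end": "t1",
    },
    "t1": {
        "chunk": [
            {"type": "m.room.encrypted", "content": {}},
            {"type": "m.room.message", "content": {"body": "lunch moved to 1pm"}},
        ]
    },
}
results = search_room(lambda tok: fake_pages[tok], "lunch")
print(results)
```

Against a real server you would pass `make_fetcher(homeserver, token, room_id)` instead of the lambda.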

- This as well. As before, keys transfer from other devices, messages load from the server.

- This seems like it was engineered to exclude Matrix. The default in every client is matrix.org, and there's no reason you ever need to change it if you're not concerned with it. In fact, most clients make it a couple clicks to change it (https://app.element.io/#/login).

- Not totally sure this is possible, but Matrix comes very close. On par with Signal, though with different tradeoffs (stored history, for example).

- The native clients for Matrix suck. Even the mobile clients for Matrix are full of bugs.

- No custom emojis; every chat application known to man has regular emojis supported in UTF-8, so the author must be talking about custom ones. Which Matrix still does not have: https://github.com/matrix-org/matrix-doc/pull/1951

- I don't think doing what PGP does is really impressive, but okay, fine, one point.

- Matrix group chats are broken and this is why Synapse eats resources like a bear.

- No searchable history on all but one Electron client on one platform when using E2E is terrible, and further supports the argument that all clients suck.

- Point; this is pretty convenient.

- XMPP sucks. Matrix is modern XMPP. People don't like getting confused with servers and similar nonsense, and when your homeserver goes down, you're out of luck. Federation sucks. The question wasn't made to exclude Matrix, it was made to point out that federation sucks. Matrix didn't invent federation; it chose it long after it failed.

- E2E degrades experience greatly. To list my two biggest complaints: It ruins search for all but one client, and the UX around keys is terrible. I frequently have conversations with incredibly technical people and they'll still get absolutely stumped by the UX around keys, because it's awful.

Two out of eight isn't bad.

I use Matrix every day. I have for years; long before the recent rebrand, and multiple presidents have vacated office since I started using Matrix. I love Matrix. But there's no reason to act like it's some golden goose when there are problems from 2015 that are no closer to being fixed than they were at the time. It's a comfortable protocol for usage by people who have powerful computers. For everyone else, it still isn't great.

> - I don't think doing what PGP does is really impressive, but okay, fine, one point.

it's more than PGP, it includes variable PFS, automatic key exchanges

> - Matrix group chats are broken and this is why Synapse eats resources like a bear.

I have heard it's because Synapse is a proof of concept that went into production

> Federation sucks. The question wasn't made to exclude Matrix, it was made to point out that federation sucks. Matrix didn't invent federation; it chose it long after it failed.

I disagree. Federation is a burden, but it enables interoperability between independent parties.

> But there's no reason to act like it's some golden goose when there are problems from 2015 that are no closer to being fixed than they were at the time.

There's also no reason to do the same thing into the other direction.

I also heard that Matrix will run off with your wife and kick your dog. It's a re-implementation of BonziBuddy, and will fail for the same reasons.

Signal seems to have everything you listed except for the full history transferring to a new device (works on Android but not as well with iOS). Though like you implied, it is done manually. I'm still not convinced that this can be done without making major sacrifices to security because I really do want a trustless system where my messages aren't stored on a company's servers (I am okay with optional backups though, so there's a middleground).

> No concept of selecting a server or anything.

This has been one of the big concepts that has bugged me with Matrix. It also is why I'm confused with why people pit Matrix vs Signal. Honestly I see Signal/WhatsApp/iMessage/WeChat as competitors whereas Matrix/Slack/Teams/IRC is a different ecosystem. But I can't get my parents or grandma to use something like Matrix (or even Slack) but they are able to use things in the former category. In fact, this has been one of the great successes of WhatsApp (looking at India with all the aunties and uncles using WA or China with WeChat).

> Anything missing from this list?

- Pinned messaging

- Other class of extras/plugins: on-device translation, calendar reminders, etc

Pinning messages is important for search, but seems to be overlooked frequently (I use this a lot in slack). I often know something is important and need to find it again in a day or two (e.g. traveling) but will also be talking with the other person and that message gets pushed. Pinning lightens the load of searching. It also lightens the load of backups as most people truly want a very small subset of their messages saved but are only aware of an all or nothing approach.

Plugins will be important as well. To complement pinning calendar reminders are great. Google does stuff like this frequently like when you get an email about a flight and then your phone's home screen will have all the information on it. It's also naive to think that you can think of all the things people would want. That's why smartphones have been so successful, because they provide the ecosystem. This isn't too dissimilar from creating a super app. But there's none where the ecosystem is fully secure.

> Signal/WhatsApp/iMessage/WeChat as competitors whereas Matrix/Slack/Teams/IRC is a different ecosystem. But I can't get my parents or grandma to use something like Matrix

I can't get why people need to put Matrix in either Box. It's a communication protocol. Client UX is completely independent, like you can have K9Mail and Thunderbird

Signal desktop clients are not native. They use Electron and are much slower than Telegram.

That's true. Something I wish they would fix but I think now it is tech debt.

- An option to not have a password at all and log in just by having a phone.

It's a 100% must have feature for a phone IM, most people will forget a password the very moment they are forced to create it.

I agree that it's a great option, but unfortunately it's also not secure at all. You're what, one SIM swap away from having your whole chat history owned?

> Your full history to automatically load when you log in on a new device

Most people don't actually need full history in most situations, just recent history.

If you can do recent history, you can probably do the full history.

I personally search my deep history regularly. I might be looking for a recipe, link, someone's contact information, an address... There are many reasons having the full history available is important, and "losing" it by getting a new device is a terrible user experience.

Telegram the app, is the best among all messenger apps.

Telegram the company, maybe not.

It is best because they don't bother encrypting user data, loading it directly from its own servers.

I meant feature-wise, the app is spectacular, but security-wise, it's not that trustworthy.

Exactly. Features are way easier with everything plain text.

Matrix still has a way to go but Element is now actually usable.

On the Telegram security side of things my group of friends uses it as a more modern IRC. So no NSA proof security is truly even expected. We even bridge some IRC channels to Telegram with bots.

> and the builds of the clients that are published to the app stores could be anything.

Isn't Telegram one of very few that provides verifiable builds, (including on iOS if you root it)?

I might be wrong but I think not.

Edit, see : https://core.telegram.org/reproducible-builds

So it seems even on this point Telegram shines.

Telegram doesn’t even post their source code to match their releases on macOS and iOS. Sometimes they’ll do a code dump somewhere around that time, but it’s not a guarantee.

In the same boat as Signal there.

WhatsApp doesn't post source code at all.

> Very much of what Telegram has done security wise is very well thought out and has improved over time

Though I'm certainly not a cryptography expert, I used to work on Tails OS and some Tor-related projects, and I feel I know where/how to listen to the experts.

Having said that, I am a hard disagree on the quoted statement.

My understanding is that there have been very few improvements that they weren't dragged into. IMHO Telegram is a reckless tool from a cryptographic point of view, and still highly suspect.

> My understanding is that there have been very few improvements that they weren't dragged into. IMHO Telegram is a reckless tool from a cryptographic point of view, and still highly suspect.

Again, I cannot speak about the crypto but I can speak about

- bugs: the last time Telegram had a known bug where data could realistically leak anywhere except inside Telegram's infrastructure (which of course is a big deal) was around the time it launched, as far as I know. Looking at Signal (which I recommend for anything super secret), they've had a couple of really nasty bugs in its much shorter lifetime: an RCE in the desktop client and spuriously sending images to people other than the intended recipient are just two.

- things they haven't been dragged into: anti-hijacking, deletion on account inactivity, working backup, not syncing secret chats between clients and more.

> without any proxying (did I get that right and has the author studied it carefully enough?).

Most likely, since the user-agent rotation code is in the app itself. If it were a Telegram proxy, the proxy would do its own UA and IP hopping and the clients would use their default UAs.

At a certain point, I wonder why Google's abuse team don't simply look for 3+ occurrences of User Agent strings because UA rotation is rarely used for legitimate purposes.

> UA rotation is rarely used for legitimate purposes

It’s not uncommon for hundreds of users to share a single public IPv4 IP address through an ISP-provided NAT. The same applies to corporate LANs with a single uplink channel.

These users are going to have a random mix of UAs matching the market share of web browsers and operating systems, all hitting the same web server from a single IP address.
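To make the NAT point concrete, here is a rough Python sketch of the naive "3+ user agents per IP" rule. The threshold, the function name and the sample log are all made up for illustration; nothing here reflects how Google actually detects abuse.

```python
from collections import defaultdict

def flag_rotating_ips(requests, threshold=3):
    """Return IPs that sent `threshold` or more distinct User-Agent strings.

    `requests` is an iterable of (ip, user_agent) pairs (hypothetical log format).
    """
    agents_by_ip = defaultdict(set)
    for ip, user_agent in requests:
        agents_by_ip[ip].add(user_agent)
    return {ip for ip, agents in agents_by_ip.items() if len(agents) >= threshold}

# Hypothetical traffic: one office NAT with three ordinary browsers,
# and one home user with a single browser.
log = [
    ("203.0.113.7", "Firefox on Linux"),
    ("203.0.113.7", "Chrome on Windows"),
    ("203.0.113.7", "Safari on macOS"),
    ("198.51.100.2", "Chrome on Windows"),
    ("198.51.100.2", "Chrome on Windows"),
]

flagged = flag_rotating_ips(log)
```

The shared NAT address ends up flagged even though nobody behind it is rotating anything, which is the false-positive problem with counting UAs per IP.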

I mean on the play store side, where they scan app submissions for TOS violations before they even hit the store. UA rotation on the client side is rarely used for good.

HTTP requests are just text, how would you even go about scanning for that?

As from the blog post, the source is public[0] and the Android review process is almost entirely automated static/dynamic analysis of apps submitted, so it wouldn't be super hard to find UA-like strings and have some elevated manual review if there are a lot of them (if they decided to implement this sort of abuse policy).

0: https://github.com/DrKLO/Telegram/blob/c1c2ebaf4690fd91c116d...
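A minimal sketch of what such a static check could look like, assuming the reviewer already has the app's string constants extracted into a list. The regex, the threshold of three and the function names are my own guesses for illustration, not anything the Play Store is known to run.

```python
import re

# Loose pattern for browser-style User-Agent literals (an assumption,
# not a rule any real app store is known to use).
UA_PATTERN = re.compile(r"Mozilla/\d+\.\d+ \([^)]*\)")

def count_ua_literals(strings):
    """Count distinct string constants that look like User-Agent values."""
    return len({s for s in strings if UA_PATTERN.search(s)})

def needs_manual_review(strings, threshold=3):
    """Flag an app whose constants contain `threshold` or more UA-like strings."""
    return count_ua_literals(strings) >= threshold

# Hypothetical constants pulled out of an app binary.
constants = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64) Gecko/20100101 Firefox/90.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "https://example.com/api",
]

flag = needs_manual_review(constants)
```

This only catches literals that are stored whole; anything assembled at runtime slips straight past it.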

Google very likely only scans the input application, I'm not sure why you would bother with an automated system to detect a code repo for it when the majority of applications on the Play Store are closed-source and there's a low confidence if the builds are not repeatable.

Anyway, regardless of that it sounds like it would be easily defeated with the following C format string:

"%s-%s: %s/%s (%s) %s/%s %s/%s"

with argument list:

"User", "Agent", "Mozilla", "5.0", "X11; Ubuntu; Linux x86_64", "Gecko", "2010000", "Firefox", "90.0"

For bonus points you can make those floating points, too, and split it up a bit further. Now nobody can scan for this without a lot of false positives (The strings are going to display in anything that embeds a web browser or references it, lol) and you get ultimate flexibility.
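For anyone who wants to see it run, here is the same trick in Python, whose `%` operator accepts the same `%s` placeholders as the C format string above. The template and argument list are copied from the comment; the function and variable names are mine.

```python
# Fragments that individually look nothing like a User-Agent header.
TEMPLATE = "%s-%s: %s/%s (%s) %s/%s %s/%s"
PARTS = (
    "User", "Agent",
    "Mozilla", "5.0",
    "X11; Ubuntu; Linux x86_64",
    "Gecko", "2010000",
    "Firefox", "90.0",
)

def build_header_line():
    """Assemble the header line at runtime from the fragments."""
    return TEMPLATE % PARTS

header_line = build_header_line()
```

Neither the template nor any single fragment contains the text "User-Agent" or a complete UA value, so a scanner looking for whole literals has nothing to match.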

Why would they? This is the Google Play Store, not the Apple App Store with all its inane rules.

I think that’s only possible if they ban TCP/IP for play store apps, enforce that in the OS kernel (SELinux can probably do), and instead expose the one and only high-level HTTP API for apps.

These days Google isn't up to much on a technical level ;-)

They cannot even fix the old verbatim feature that they broke a few years ago, so how should they be able to stop this without breaking something else?

Yep, this is somewhat hyperbolic but I'll write it anyway. I want my old Google back.

Tools -> All Results -> Verbatim is confirmed broken? I thought something was fishy...

Hasn't worked consistently for me for years.

To be precise: I think it works sometimes, but I know it doesn't work most of the time unless verbatim means something completely different from what I think ;-)

You can verify this quickly by searching for something slightly unusual or very specific, applying verbatim and checking that most of the results still don't contain your words.

it's my understanding that Telegram won't automatically translate messages unless the user chooses to click the translate button, and the option to enable the translate button does disclose that translated message content is sent to google.

"It's a bold strategy, Cotton. Let's see if it pays off for 'em"

Deciding to use the Google Translate API in a way that bypasses Google's API-key system seems like a dangerous game. Google controls your access to the Android platform† and now that this blog post has been published, it seems like Google could remove the app from the Play Store for unauthorized access of Google services.

If they'd found a way to use an API from some third party, maybe that third party would try and shut it down or whatnot. In this case, it feels like they're poking the bear - especially given how much traffic they might throw at it. At some point, Google might get annoyed that an API that they charge a lot of money for is being used for free and somewhat legitimately remove Telegram from the Play Store. Google can pretty legitimately claim that the Telegram app was accessing Google's servers in an unauthorized way and that they went through steps to obfuscate their access which shows that they knew what they were doing was wrong and tried to hide it.

This seems like a bold move. Google might simply shrug and not care. Google might decide that they'll remove Telegram from the Play Store permanently. Google might decide they'll only allow Telegram in the Play Store if it doesn't have translation features. If Google removes Telegram from the Play Store, that's basically the end of Telegram. As people bought new phones, the number of people reachable on Telegram would dwindle‡. As the app no longer could receive updates, eventually it would become old and stale. They'd have to start moving to another platform whether WhatsApp or Signal or Matrix.

†sure, other stores and side-loading exist on Android, but Google does control access for the vast majority of Android users (at least in the US/Europe).

‡yes, maybe one can transfer apps and side-loading does exist, but the number of users would dwindle

Personally, I think Telegram has enough reach that removing/blocking it would get a lot of unwanted political attention against Google - I mean, a lot of politicians and decision makers seem to use Telegram depending on the country. So I don't think they would do that.

I think they will just figure out how to break this feature.

Telegram is also popular within the crypto circles and those are a lucrative target for advertisers which Google's business relies on. Banning it from Android may shrink that market.

I think it's possible to construct a (very weak) argument for the random user agent rotation, but why split the string if not to avoid being flagged?

On the other hand, I find it hard to believe that Telegram would risk a Play Store ToS violation, given how many tens of millions of users use the app.

Pretty sure at the point you have over a billion(!) installs, even Google affords some leniency towards its Play Store policies. Or at least we are about to find out anyway..

Meanwhile, indie developers with smaller user base are subject to unappealable automated decisions.

Telegram is well-known for operating in grey areas.

I'm not sure if Google will start flagging the IP addresses of the users because of each request having a different agent. That would render normal Translate unworkable for them too!

These users will just say "google translate down", shrug and go to Bing or competing translate services.

Isn't Google going to move to always having the user agent be the same anyhow? They've already decided to break that contract with the tech community, so I don't see that they have much room to talk there.

That contract was torn up a long time ago. Or do you really browse the web with "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"?

Would DeepL be cheaper?

On one hand, it's quite asshole-ish. On the other, Google is serving broken frontends to their services and charging ridiculous prices for their APIs. When I tried to make a third-party search using Google's engine, I exhausted the limit in less than an hour. It'd cost me like $40/mo to get what I get for free using their crappy frontend.

> On the other, Google is serving broken frontends to their services and charging ridiculous prices for their APIs.

How does that make this okay? Nobody is entitled to get a company’s services for free just because you think their price is too high or their front ends aren’t built to your liking.

You could use the "turnabout is fair play argument". If you publish a web page, and don't specifically block google, they scrape your content, and use it for their own purposes. And even use it for "rich snippets", products other than search, etc. You're basically doing the same to them...using their content for your own purposes until they specifically block you.

Disagree. The web is clearly architected such that publishing a webpage makes it public and crawlable. You don’t “block Google”, you specify that the site is not for crawling in robots.txt according to well-known standards. This is all basically the contract of the internet and it shouldn’t be surprising to anyone.

Google specifically does not publish their API for free consumption by other companies, yet that’s what’s happening here anyway. The company is also using specific tricks to circumvent detection of the behavior.

In your analogy, this would be like a crawler ignoring robots.txt and then scraping the content for their own website with zero attribution to the source, which is nothing like Google indexing your site with full attribution and driving traffic to it for you.

Regardless, “turnabout is fair play” is unequivocally not a legally or even ethically acceptable standard, so that argument wouldn’t actually hold up anywhere anyway.
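For reference, this is roughly how a well-behaved crawler honours that contract. The sketch uses Python's standard `urllib.robotparser`; the robots.txt content, the bot name and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def may_crawl(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

allowed = may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/blog/post")
blocked = may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/private/notes")
```

Nothing technically stops a crawler from skipping this check, which is why it works as a convention rather than as access control.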

I don’t understand your argument. There is no actual “publishing” of web sites or APIs on the web. You simply make something available at a URL, and it’s up to anyone else to discover that URL. In this regard, your personal web site is no different than this Google Translate web API.

> this would be like a crawler ignoring robots.txt

Google ignores the noindex directive in robots.txt now. You're supposed to put it in your HTTP response headers or HTML meta tags...

`noindex` was a Google-specific rule that was never officially documented nor supported. I think they were perfectly entitled to withdraw support for it, especially considering there are alternatives.


"driving traffic to it for you."

I did mention rich snippets.

Google isn't entitled to get my personal data for free either, yet they do it anyway.

"Nobody is entitled to get a company’s services for free just because you think their price is too high or their front ends aren’t built to your liking." Tell this to Google!

Like Telegram did with the Translate API, there is also a way to get an unlimited API for search results. You have to find one of Google's old mobile pages.

The following predictable chain of events will happen. Someone working at Google will read this blog post and report it internally. Google will contact Telegram and inform them that they are violating the Play Store agreement and could they please use the official API instead. Telegram will remove the feature as they can't spend the GDP of the Earth on translations. The end.

Another plausible alternative is that Telegram embeds an open source NLP model in the app so translation can be performed locally (offline).

To be fair we know that mobile hardware resources can be extremely limited, and I’d wager a server side model will always be bigger and therefore better.

But recently there’s been a lot of amazing progress in techniques to shrink large models down while preserving most of the accuracy (eg quantization aware training, etc).
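Quantization-aware training itself needs a full training loop, so as a much simpler stand-in here is plain post-training symmetric int8 quantization in pure Python. It only illustrates the size/accuracy trade-off; the weights are made up and real toolchains do considerably more than this.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]

weights = [0.82, -0.31, 0.05, -1.27, 0.64]  # made-up example weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Worst-case rounding error introduced by the round trip.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of the four a float32 would take, at the cost of a rounding error of at most half the scale per weight.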

With some additional constraints on scope (maybe only supporting the handful of languages the user needs?), I believe a sharp team of a few experts could deliver this fairly quickly, with reasonable results that would of course improve over time.

Google also provides an official on-device translation SDK that is free-as-in-beer (https://developers.google.com/ml-kit/language/translation).

The models are around 20M for any single language so there is a not insignificant cost on the user in terms of delay for downloading, data usage and disk space.

Disclaimer: I work for Google and worked on the ML Kit Translate SDK, but I don't speak for Google.

Is integration with some Google Services App required or do those models also run on de-googled Android distributions?

ML Kit only runs on devices with Play Services, sorry.

Don't know why they can't just roll their own... At this point you could probably even do reasonably well on-device.

The entire dev team of Telegram is less than 30 people. That's why.

I guess this would also be in a legal grey zone/break TOS, but I wonder if they could reuse the offline models the official Google Translate app downloads.

> worst case you could just self host a translation service

We do offer an official SDK that does that, see https://developers.google.com/ml-kit/language/translation

(I work for Google but do not speak for it)

I would trust a 3rd party on-device service slightly more, given that Google's incentives aren't exactly aligned with making that service high quality.

I don't think their incentives are exactly misaligned with it being high quality, either. If better translation can get folks to see more ads, they might not care much about losing out potential api-based translation revenue.

What? How would this violate ToS? iirc even the Google translate app does on device translation.

And worst case you could just self host a translation service.

Google's translation is good, but it's not that much better than what you can get OSS.

e: Misunderstood your comment (I believe the quote was at the top and you edited), now I see that you were referring to your idea as the potential ToS violation. I agree, you can't violate IP law in your app.

As in, the translate models are owned by Google and AFAIK they don't allow anyone, even other developers, to use them. It's against ToS to use intellectual property/code/ml models/etc without permission from the owner

> 11.2 If You use third-party materials, You represent and warrant that You have the right to distribute the third-party material in the Product. You agree that You will not submit material to Google Play that is subject to third -party Intellectual Property Rights unless You are the owner of such rights or have permission from their rightful owner to submit the material.


> Google's translation is good, but it's not that much better than what you can get OSS.

Disagree on this one. For the languages of the European Union and a couple other outliers, sure it's close. For the long, long tail of languages software developers typically don't care about and are outside the wealthy world Google is the only thing that is remotely intelligible.

Reminds me of this thread a while ago:


It's smart.

It allows Telegram users to hide in plain-sight, within the noise of other Google Translate web users.

I'm pretty sure that using the official pre-built java SDK, as suggested by the author, would allow Google to cluster the content of Telegram users (since app-specific id/token should be sent).

Other than that, a great read and kudos to the author for shedding light on it.

Edit: typo.

I think Google can still cluster Telegram users pretty easily, especially now that the method is in the open.

Yes, Telegram fakes the user-agent, but the rest of the request still looks very different from a request an actual browser would do. (No referrer, missing headers, different connection pooling behaviour, possibly different TLS and HTTP2 behaviour, etc).

So if Google is doing any detection for browser vs non-browser requests, those requests should show up as suspicious.
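A toy version of that kind of check, just to show the idea. The list of "expected" headers and the function names are assumptions picked for illustration; a real detector would also look at TLS and HTTP/2 behaviour, which this sketch ignores entirely.

```python
# Headers a mainstream browser usually sends on a page-initiated request.
# The exact list is an assumption chosen for illustration.
EXPECTED_BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding", "referer")

def missing_browser_headers(headers):
    """Return the expected header names absent from `headers` (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_BROWSER_HEADERS if name not in present]

def looks_like_browser(headers, max_missing=0):
    """Heuristic: treat the request as browser-like if few headers are missing."""
    return len(missing_browser_headers(headers)) <= max_missing

# A request that only spoofs the User-Agent.
spoofed = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# A hypothetical request carrying the usual browser headers.
browser = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://translate.google.com/",
}
```

Here `looks_like_browser(spoofed)` comes back false while `looks_like_browser(browser)` comes back true, so a rotated User-Agent on its own does not make the request blend in.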

If they used cronet, they could get past these checks.

> It's smart.

On the contrary - it's the most stupid thing to do. The only result will be their users wondering soon why this function is broken.

If Telegram or Google users would pay for services, they wouldn’t treat them like the product being sold.

It doesn't look very well hidden if there are blog posts about it...

The users are hiding among all the web traffic to translate.google.tld; the point isn't that Telegram doing this is top-secret, undiscoverable magic sauce. It's open source (GPL2): https://github.com/DrKLO/Telegram/blob/9e740dfd4d2b1ab6b8ed2...

A rotated user agent does not hide anything from Google.

Shell game street fraud (with cups and balls) is also "smart" in some way, but it's not really the right thing to do.

Telegram should have disclosed that every time someone uses this feature, their IP address is leaked to Google.

also the content of the translated message is leaked in plain text too?

That is disclosed

I think they already do: https://imgur.com/a/7UIFLxT

Well, the plain text, not the IP, but that should be implicit with how web services work.

I wouldn't say it's obvious to either laymen or technical users, some apps will proxy a web service for the exact reason of hiding user info.

Telegram includes google services baked in to the app for things like maps.

There is a Telegram-FOSS fork with most Google services removed but the translation feature is not removed (yet?) in the code.

I doubt users of Telegram care much or they wouldn't use Telegram in the first place.

Ooooh, GDPR?

Do you really expect Telegram to get the GDPR hammer when Google and Facebook are still around and do orders of magnitude worse?

Easier target?

I really don't understand this. Is Telegram a legitimate app? If so, then why are they attempting to rip off other companies' work without paying them? You want an integration with a translation API? Then pay a fair price for one, or build your own?

If Telegram really can't afford an integration, just make a translate button that opens a link to https://translate.google.com/?sl=es&tl=en&text=API%20de%20tr...

Edit: not to mention the privacy implications of sending messages to Google.
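Building that kind of link takes a few lines. This sketch uses only the `sl`, `tl` and `text` query parameters visible in the URL above; whether the page keeps honouring them is up to Google, and the function name is mine.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://translate.google.com/"

def build_translate_url(source_lang, target_lang, text):
    """Build a link that opens the Translate web page with the text prefilled."""
    query = urlencode(
        {"sl": source_lang, "tl": target_lang, "text": text},
        quote_via=quote,  # encode spaces as %20, as in the link above
    )
    return f"{BASE_URL}?{query}"

url = build_translate_url("es", "en", "hola mundo")
```

The app would hand the resulting URL to the system browser, so the user lands on Google's own page and sees whatever Google chooses to show there.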

Okay, how about I open an embedded WebView of https://translate.google.com/?text=..., with the viewport carefully set to only show the translation results?

I used something like this years past for image resizing, the URL was: https://images1-focus-opensocial.googleusercontent.com/gadge...

It is now blocked, always responds 403, maybe tweaking some request parameters can make it work again.

Edit: if you want to try it out the parameters I used were:

- container: focus (there are other values I cannot find anymore)

- url: urlencoded URL of the image to be resized

- resize_w: width in px

- resize_h: height in px
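A small sketch of putting those four parameters together. The endpoint path is cut off in the comment above, so this only builds the query string and leaves the base URL to the reader; the function name and the example image URL are made up.

```python
from urllib.parse import urlencode

def build_resize_query(image_url, width_px, height_px, container="focus"):
    """Build the query string from the four parameters listed above."""
    return urlencode({
        "container": container,
        "url": image_url,  # urlencode percent-encodes this value
        "resize_w": width_px,
        "resize_h": height_px,
    })

query = build_resize_query("https://example.com/cat.png", 320, 240)
```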

It still works, just not with a direct url. It's checking that there's a referrer, I think. So that it only works if the image is in a web page:


One thing I don't see mentioned here is that the Google Cloud version of Translate is actually different than the user-facing one at translate.google.com. At least when I tried it a year ago, the Google cloud version was vastly inferior. I suspect it has to do with licensing agreements around certain datasets. Very curious if anyone knows more on this...

There are bound to be duplicate phrases for translation over all the many Telegram users. Why not cache to avoid API calls. How many times do you have to use the API to translate "OK" or other commonly used words.
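Something along these lines, sketched with Python's `functools.lru_cache`. The backend function is a hypothetical stand-in that just tags the text, and the cache size is an arbitrary choice.

```python
from functools import lru_cache

backend_calls = []  # records every phrase that actually reaches the backend

def call_translation_backend(text, target_lang):
    """Hypothetical stand-in for a real (paid or rate-limited) translation API."""
    backend_calls.append((text, target_lang))
    return f"[{target_lang}] {text}"

@lru_cache(maxsize=10_000)
def translate_cached(text, target_lang):
    """Translate `text`, reusing the stored result for repeated phrases."""
    return call_translation_backend(text, target_lang)

first = translate_cached("OK", "de")
second = translate_cached("OK", "de")  # served from the cache
third = translate_cached("OK", "fr")   # different target, new backend call
```

A per-device cache like this only saves repeat calls for one user. Deduplicating across all users would need a shared service in the middle, which means routing message text through it.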

Visiting a publicly available web page doesn’t create contractual obligation between end users and web server owners.

If Google views what Telegram is doing as abuse, then how is it different from what end users are doing while interacting with the https://translate.google.com/ web page? Especially if these end users are running an ad blocker or two in their web browser? BTW, uBlock Origin blocked 4 pieces of content on that web page.

Because Telegram has their app in the Play Store, which does require legal agreements, that presumably forbid this sort of thing.

Quite a lot of libraries exist to do this. But doing this in an app with a large user base looks offensive. Solution would be for some decent open source translation APIs to appear.

There are plenty of open source, decent translation APIs, the catch being that obviously nobody is going to pay for your translation compute.

Any ideas which ones are the best? I would quite like to use a good open-source API, and am happy to pay for my own compute. (I've just used Google's before because it was handy. But I wouldn't use it the way Telegram has done.)

It depends if you have pretrained models or not.

Huggingface offers a few models that are pre-trained [0], OpenNMT is a good framework [1] as is MarianMT [2]. Many of the best MarianMT models have been ported to HuggingFace.

If you don't want to self host, huggingface offers these APIs at a cost.

[0]: https://huggingface.co/models?pipeline_tag=translation [1]: https://opennmt.net/ [2]: https://marian-nmt.github.io/
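If you want to try one of the ported MarianMT models, a minimal sketch with the `transformers` pipeline API looks like this. It assumes `transformers`, a backend such as PyTorch and `sentencepiece` are installed, and that a `Helsinki-NLP/opus-mt-<src>-<tgt>` checkpoint exists for your pair (not every pair does). The helper names are mine.

```python
def opus_mt_model_name(source_lang, target_lang):
    """Name of the Helsinki-NLP MarianMT checkpoint for a language pair."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

def translate(text, source_lang, target_lang):
    """Translate `text` locally; downloads the model on first use."""
    # Imported lazily so this file loads without transformers installed.
    from transformers import pipeline

    translator = pipeline(
        "translation", model=opus_mt_model_name(source_lang, target_lang)
    )
    return translator(text)[0]["translation_text"]

model_name = opus_mt_model_name("es", "en")
# translate("Hola, ¿cómo estás?", "es", "en")  # needs network and disk space
```

Building the pipeline on every call is wasteful; in anything real you would create it once per language pair and reuse it.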
