Amazon Is Downloading Apps From Google Play and Inspecting Them (rajbala.com)
310 points by rajbala on Mar 29, 2014 | 109 comments

This seems to be the natural outcome of Amazon's excellent customer service policy, where they have on numerous occasions[0] refunded costs for hacked instances. When they commit to that policy, they have a huge incentive to limit customer security breaches.

I love examples like that where a company's policies result in incentives that are so well-aligned with those of their users. Does anyone have other good examples to share?

[0] https://securosis.com/blog/my-500-cloud-security-screwup and http://vertis.io/2013/12/17/an-update-on-my-aws-bill.html are two examples.

I had this idea that Amazon was this huge corporation where I'd never be able to get through to a human. I spawned a bunch (4 actually) of instances in some region and forgot about them. Due to their bad UI (at the time), I couldn't see them when I logged in, so I ended up getting charged heavily for them. Called up and they refunded me. Turns out I never terminated the instances, just stopped them. Next month, I got billed again. Given the misunderstanding on my part, they more than happily refunded me and renewed my free year of the micro instance.

I really like their customer service and hope it stays how it is. I think if Walmart or (insert big box company here) did stuff like that, people would shop there more. Look at Nordstrom with their insane refund policy (in the customer's favor). I think Costco does something like this as well, but I'm not a cardholder.

> Facebook has concealed the profiles of anyone on the social network who used the same email and password combination as those exposed after the recent Adobe hack


Facebook at that point had a huge incentive to keep those accounts from being taken over by spammers.

Credit card companies are a good example of this too. Since they're responsible for fraudulent charges, there's a huge incentive for them to detect them.

Although Walmart in their lawsuit against Visa for fees pointed out that Visa is slowing the adoption of security features in the US. http://www.foxbusiness.com/industries/2014/03/27/wal-mart-fi...

Actually it's usually the merchant who gets screwed.

Banks are liable for funds withdrawn from personal (not business!) accounts without your authorization.

They are not financially responsible. If there's a chargeback, stolen credit card, etc., they reverse the charge on the merchant. In other words, the merchant pays for the items purchased with stolen credit cards. All the credit card companies need to do is keep theft low enough that the perception of them is good. Make no mistake: the credit card companies do NOT pay for items purchased with stolen credit cards, the merchants do.

I don't think they are inspecting the app; they don't need to. They can see that there is a higher-than-average number of API accesses from a given platform, using the AWS Secret Key as the login credential.

I think this is a more plausible approach (and likely cheaper resource-wise to implement). Though, if the authentication requests somehow identify the Android app in question, it might be easy for Amazon to then perform a follow-up and download the app to verify their suspicions and avoid false-positives.

I don't think it would be that hard to automate decompilation and fire off an email after looking for secret keys. Either way it's good to see.

This exists as a service - http://apkscan.nviso.be/ does a scan of your APK and registers requests to servers, lists hardcoded strings and other possible issues.

I received a similar email about AWS keys checked into a public GitHub repo. The email was very specific about this being the issue, so I suspect they were crawling and not merely detecting a strange usage pattern.

I don't think they're looking for higher than average API calls for a given key because my charges were completely expected.

API calls from different IP Addresses plus any other information that gets through in the request packet.

raj, just saw your post on here. I was wondering if you were the same guy Dennis in Delaware was trying to connect us to. We were doing the large scale touchscreen collaboration stuff

Wait, there's someone in Delaware besides me? Unpossible!

Nope, not I. :)

Ok thanks lol

But then how would you automatically identify the app name as shown in his email? I think it's easier to download the app. There are only ~1M Android applications out there. You can probably crawl the Android store with just 1 or 2 servers using the algorithm I described below with a trie. Identifying patterns from random data seems way harder to get right, but maybe that's because I have a CS background rather than a statistics one.

Most Android HTTP clients put the package name in the user agent header. It's trivial for Amazon to find the app as a result.

Also, Amazon would use Aho-Corasick if they were really in the mood for violating the Google Play TOS.

They clearly say that they've detected my access credentials in the app. There's no way to associate my credentials to my app without downloading the app first and inspecting it.

Perhaps, but the assumed order of events here might not be what you think. I doubt that Amazon is scanning every app in the Android app store to find unsecured AWS credentials.

It's more likely that they have an alert when the secret key is used to access an account from many different IP addresses, and looking at the user-agent string in the HTTP headers probably pointed them towards an Android app.

It wouldn't be too hard for them to then look to see who owns the AWS account and then search for that person's name in the Android app store.

Facebook does the same thing. I got a notice about an application I published years ago in March:

> Security Notice - Your App Secret

> We see that your app, XYZ, is embedding the Facebook integration’s App Secret inside the Android Play Store app bundle for your app. This is a serious vulnerability that violates our published recommendations for proper login security. Someone with access to the app secret for your app can act on behalf of the app - this includes changing configurations for the app, accessing some types of information associated with people who have granted permissions to the app, and posting on behalf of those people.

> To mitigate this sizable risk, we have reset the app secret for your app. If your app is mobile-only, this should not cause any issues. If it has a server-side component, there is a greater likelihood that it has caused some issues for your app that you will need to address. Going forward, please do not include the app secret in your app bundle, or disclose it publicly. You can read more about app secrets and how to secure your Facebook app here.

Now this is interesting. Could we imagine a service that would be in charge of protecting your customers' secrets?

You would provide a list of secret strings, and ask to have them monitored on search engines but also from mobile applications, browser extensions, published JARs etc.


But seriously, treat them like passwords.

Don't have the service store the secrets, have the service store hashes of the secrets with a regex for prefiltering (because hashing every word everywhere would be prohibitively expensive).
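
A minimal Python sketch of that idea, assuming the monitored secrets are 40-character base64-style strings (the shape of an AWS secret access key). The names `CANDIDATE`, `fingerprint`, and `find_leaks` are made up for illustration; the regex picks out plausible candidates so that only those get hashed and compared against the stored hashes.

```python
import hashlib
import re

# Assumption: secrets are exactly 40 characters from the base64 alphabet.
# The lookarounds skip runs that are part of a longer base64-looking blob.
CANDIDATE = re.compile(r"(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])")


def fingerprint(secret: str) -> str:
    """Hash a secret so the monitoring service never stores it in the clear."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def find_leaks(text: str, known_hashes: set) -> list:
    """Return the hashes of monitored secrets that appear in text."""
    leaks = []
    for match in CANDIDATE.finditer(text):
        digest = fingerprint(match.group(0))
        if digest in known_hashes:
            leaks.append(digest)
    return leaks
```

One limitation of this prefilter: a secret sitting inside a longer run of base64 characters is skipped, so a real service would probably also need a sliding-window pass.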

> Don't have the service store the secrets

Why not? You can use the service to make sure it doesn't leak its own secrets so it's safe ;-)

But seriously yes I really like your approach.

You could even provide a second API to do the opposite: given a block of text, see if there's any sensitive string inside. Google & co could use it before publishing an app in their store.

Either way you still have to trust another third party to keep your secrets safe. Even if the secrets weren't publicly leaked, any compromise of this service affects any service whose keys you have stored there.

That's actually kind of awesome. Good on Amazon for taking security seriously.

That's exactly how I felt. And they wrote such a detailed email with helpful links and everything.

If they are really downloading apps to inspect, I suspect the email is an automated template.

But someone went through the trouble of writing the template. Most companies wouldn't even bother checking apps in the first place.

I understand perfectly how people end up mistakenly pushing credentials into public source repos when releasing server-side stuff. But I don't get how a seemingly sane person develops an application intended for distribution to the public which contains AWS credentials.

At what point in your development process do you say "I want this application, which will be distributed to unknown persons, to contain the means to control my AWS account."?

Some people don't realise their apps can be decompiled, it's not a question of sanity.

I don't think I've ever encountered that particular illusion in anyone making a living off writing compiled code, only very new developers and non-engineer managers. It's one of the few securityish things that seems to be successfully beaten into everyone's head pretty early on.

(And more often than not, they get there all by themselves -- such people usually appear on my radar asking questions showing they've figured out for themselves that it's a bad idea, they just need help turning that knowledge into practice.)

I give talks at development conferences where I mostly blow people's minds by showing how easy it is to pull information out of binaries (both statically and dynamically); there are always tons of questions from the audience afterwards about "but we do X? doesn't that make us safe?", so I have to sit there shooting down a ton of silly ways of obfuscating their data, showing how each one could be defeated, but it "clicks" for everyone that there is no safe way to do this.

I think this happens all the time. Amazon has a solution for managing their access credentials that I admittedly wasn't using, but how many vendors do not?

If you're using a web services API from a 3rd party that requires developer authentication keys you may be storing those keys in the code because there's not a great alternative.

The obvious and universal solution to APIs that don't have the kinds of facilities AWS does is that your app does not talk directly to the third party. You construct your own API that runs on your own servers and permits only those operations the users are supposed to be able to perform.

It's not an "alternative", it's the correct solution.

In fact, you still have to do a lighter-weight version of it with AWS -- you need an API to generate and hand out the restricted keys to your apps.

With a few very rare exceptions, you don't use third-party APIs as a complete substitute for building your own services, you use them to make building your own services easier.
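
To make the "your own API" point concrete, here is a small Python sketch of the server-side gate. Everything in it is hypothetical: the operation names, the `backend` callable standing in for the third-party client, and the per-user prefix. The only point is that the third-party credentials live with `backend` on the server, and the app can only ask for operations on an allowlist.

```python
ALLOWED_OPERATIONS = {"list_own_files", "upload_file"}  # hypothetical names


class Forbidden(Exception):
    """Raised when the app asks for an operation users may not perform."""


def handle_request(user_id, operation, params, backend):
    """Run one app request against the third-party backend for this user.

    backend is a stand-in for whatever client holds the real credentials.
    """
    if operation not in ALLOWED_OPERATIONS:
        raise Forbidden(operation)
    # Scope every call to the authenticated user so one user cannot
    # reach another user's data.
    scoped = {**params, "prefix": f"users/{user_id}/"}
    return backend(operation, scoped)
```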

I'm not necessarily disagreeing with you, but it's not always practical and perhaps not even possible.

For example the push notifications SDK from Urban Airship and app analytics SDK from Flurry depend on having credentials stored in the app.

These examples are not unique to them. I don't disagree that it's wrong, but I don't know how to work around this to be candid.

Those are examples of AWS-like facilities. The embedded keys are not secret credentials that allow people to control your account! If you are embedding your account credentials from Urban Airship or Flurry in your app, you are badly misusing their APIs. They provide facilities for generating certificates/keys for each application.

Urban Airship actually instructs you to create a plist file for an iOS app where you specify your production app keys.


The point is that these keys do not let you control the account: they only let you inject potentially-fake data; if these keys also let you register new applications, delete data, download data, or send information to third parties, then that would be a serious problem. (In the case of Urban Airship, as opposed to Flurry, I don't know as much about the specific use case, but it would surprise me if the scenario were drastically different.)

I'm not embedding account credentials for Flurry and UA in my app. I'm embedding app keys, and while those don't allow someone to take over my account, they could certainly wreak havoc with push notifications.

I once worked on a website that had a "sql.asp" page where you could enter any arbitrary SQL into a textarea and submit it. The ONLY security it had was the obscurity of its URL.

This was implemented deliberately and with full knowledge of managers and developers.

Stuff like this happens....

I do not think that anyone ever said that. More likely, they just put the keys in a string to do a quick test, with the intention of fixing it later. Then they simply forgot about it, and "later" never happened.

Anyone who reads the article can see that the author is drawing conclusions from conjecture.

"We were made aware" does not equal "we are downloading apps and inspecting them."

If they were doing that, that would be great! But let's not leap to conclusions.

They (or someone working with them) would have had to download the app and inspect it. They clearly tell me that they've detected access credentials in the app itself.

That's not conjecture.

I see nothing wrong here. They are probably doing this now because it is in fact a major problem, even with large, professionally developed apps. About 8 months ago I did a brief analysis of the then-current Vine APK and relatively quickly extracted their S3 credentials (they were not stored in plain text, but close enough). Very bad idea.

The number of valid EC2 keys you can find with a simple GitHub search is mind-blowing (or was, at least, when I tried it a month ago).

I'm not implying that there's anything wrong. Quite the contrary actually.

I reported this to them over a year ago, and was told they fixed it. Never bothered to check again tho.

MixRank analyzes mobile apps (Android and iOS) and we often see apps with embedded API secrets, private keys, and passwords. It's really surprising.

If you'd like to send an email like this to your users, send me an email (in profile) and I can query our database and check to see if any of them are including their api keys.

Couldn't they just look at the user agent and know that the hit to their API is coming from an Android device rather than a server?

http://developer.android.com/reference/java/net/HttpURLConne... might not have a default User-Agent header that identifies android

Of course a developer could change this, but yes the default user-agent string for an Android app using HttpURLConnection identifies it clearly as Android: http://www.gtrifonov.com/2011/04/15/google-android-user-agen...

This is a bad idea. It's easy to change the user agent to whatever you want.

There is no reason why it would be a bad idea.

False positives (people who are legitimately using AWS credentials from their phone for some reason, or somebody who is legitimately using AWS credentials from their computer but with an incorrect useragent for some reason) would cause an inconvenience as time is wasted to inspect it, but ultimately little harm would be done.

False negatives (improperly using AWS credentials but with a useragent that looks reasonable) would not be a deviation from the status quo.

You don't need 0% false negative and false positive rates to make this sort of sanity checking worthwhile. Even if you only find a few of the many instances of improperly used credentials, you're better off than if you had done nothing.

(Of course there is the issue of correlating misused credentials with the specific application that is misusing them. I don't know how that is done if they are basing their investigation off of useragents.)

Most probable.

They did a good thing, title feels slightly misguiding. Could they have figured it out based on API access locations being random?

What's misguiding?

No intention to misguide. I think it's completely accurate. They downloaded my app, inspected it, found AWS credentials and emailed me as a result.

I think the misguiding part (or at least, what the tone suggests) is that the order of events is 1) Amazon downloading and scanning all apps and then 2) looking for AWS credentials in the code.

I think that is what's happening. They would have no other way to identify my app and my AWS credentials otherwise.

I wonder how they would identify a string that appears to be an API secret and query their database for it. For every plausible string in every app? I guess they decompile it and find string literals of the correct length?

AWS knows the clients that are connecting to it. All they have to detect is that a large amount of traffic is coming from a wide distribution of mobile devices. This indicates they embedded the creds into the APK. If they got the creds from a server during runtime, it would be safer to proxy to AWS through the server itself, and never distribute the sensitive data. This would result in only a few proxies connecting to AWS.

Amazon surely has automated this with monitoring. I doubt they ever scan Google Play and download the APKs and scan them. Not only is that extremely wasteful it's most definitely violating the Google Play terms of service.

It wouldn't even need to be a "large amount of traffic", just traffic using the AWS secret key from more than a handful of IP addresses in the same time window would be suspicious.
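
A rough Python sketch of that check, assuming a hypothetical access log of (timestamp, key ID, source IP) tuples. The function name, window size, and threshold are made up for illustration and say nothing about what Amazon actually runs.

```python
from collections import defaultdict


def flag_shared_keys(requests, window_seconds=3600, max_ips=5):
    """Return key IDs seen from more than max_ips distinct IPs in one window.

    requests: iterable of (timestamp_seconds, key_id, source_ip) tuples.
    """
    # Fixed buckets keep the sketch short; a sliding window would be
    # more precise around bucket boundaries.
    ips_per_bucket = defaultdict(set)
    for timestamp, key_id, source_ip in requests:
        bucket = int(timestamp // window_seconds)
        ips_per_bucket[(key_id, bucket)].add(source_ip)

    return {
        key_id
        for (key_id, _bucket), ips in ips_per_bucket.items()
        if len(ips) > max_ips
    }
```

A key used by a handful of servers stays under the threshold, while a key baked into an app shows up from as many addresses as there are phones running it.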

Probably just looked in strings.xml and perhaps for some obvious variable names / validation against string values. There might be some hashing check they can do that means they don't query every valid string in their database.

You can build a trie of all the AWS keys and then, for each Android binary, traverse each series of bytes until it either terminates at a leaf node or fails to continue. If it fails, you start over again on the next byte. Or in other words, the trie allows you to easily test if a given byte is the start of an AWS key, and so then you just check every possible offset. I believe the run time will only be O(n*k), where n is the size of the binary and k is the size of the AWS key, plus enough memory to store the trie, which is probably relatively small; less than 1m AWS accounts?
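
The trie scan described above can be sketched in a few lines of Python. This is the naive restart-at-every-offset version, so it matches the O(n*k) estimate; `build_trie` and `scan` are names made up for this sketch, and the keys are assumed to be available as byte strings.

```python
def build_trie(keys):
    """Build a nested-dict trie over byte values; None marks the end of a key."""
    root = {}
    for key in keys:
        node = root
        for byte in key:
            node = node.setdefault(byte, {})
        node[None] = key
    return root


def scan(data: bytes, trie):
    """Yield (offset, key) for every known key found in data."""
    for start in range(len(data)):
        node = trie
        pos = start
        # Walk the trie for as long as the bytes keep matching.
        while pos < len(data) and data[pos] in node:
            node = node[data[pos]]
            pos += 1
            if None in node:
                yield start, node[None]
```

Each starting offset walks at most k bytes before the trie runs out, which is where the n*k bound comes from.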

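Using Aho-Corasick [1] the runtime would be about O(n + m + z), where n is the size of the binary, m is the total length of all the keys, and z is the number of matches.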

[1] http://en.m.wikipedia.org/wiki/Aho–Corasick_string_matching_...

This is probably a good thing and also automated.

It could have raised alarms and then been personally investigated. You could monitor the distribution of AWS connections per client. You could easily determine that accessing the same account from many Android devices is probably the result of poor security practices.

I would think they inspect apps based on the number of hits generated to AWS.

The advantages of doing this are that it 1) shows Amazon is looking out for its customers (well, also for itself) and 2) proves it has proactively notified the customer and done its due diligence.

This step could serve as solid proof in any later dispute over security issues and/or related costs.

Smart, I'd say.

I'm curious why some apps need API access to AWS. What's the use case? Surely not to spin up an EC2 instance when the user clicks a button? Save files to S3? I'm not being sarcastic, genuinely curious. And what's the proposed solution suggested by AWS?

It's hard to say without knowing the app, but my first guess would be for storing files in S3 or pushing a message into an SQS queue.

AWS supports temporary access keys, and one of the recommended solutions is to have an API which generates temporary credentials for a specific task that will expire shortly after.


Here's a link: http://docs.aws.amazon.com/STS/latest/UsingSTS/CreatingSessi...

Just off the top of my head:

- Store/retrieve state in/from DynamoDB or RDS

- Pull an object from S3

- Send an SNS notification

- Add a message to an SQS queue

- Dispatch email via SES

Save files to S3.

You can do that with signed forms and similar techniques, though. No need to have the key on the client side (and lots of reasons not to).

The flow is roughly this:

1) Client: "Hey, I want to upload a file."

2) Server: "Okay, here's a temporary key good for the next <n> minutes. The file has to be named <blah> and can't be more than <x> MB long" (there are other restrictions you can set, too, IIRC)

3) Client posts the form to S3 including the temporary key as a field.

4) Result.
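
The flow above can be sketched generically in Python with an HMAC-signed, time-limited grant. To be clear about assumptions: this is not S3's actual policy or signature format, and `SERVER_SECRET`, `issue_grant`, and `verify_upload` are hypothetical names. It only shows why the client never needs the long-lived key: the server signs a small policy, and the storage side checks the signature and the restrictions.

```python
import base64
import hashlib
import hmac
import json
import time

SERVER_SECRET = b"stays-on-the-server"  # hypothetical; never shipped in the app


def issue_grant(filename, max_bytes, ttl_seconds, now=None):
    """Server side (step 2): return (policy, signature) for one upload."""
    now = time.time() if now is None else now
    policy = json.dumps(
        {"name": filename, "max_bytes": max_bytes, "expires": now + ttl_seconds},
        sort_keys=True,
    )
    encoded = base64.b64encode(policy.encode("utf-8"))
    signature = hmac.new(SERVER_SECRET, encoded, hashlib.sha256).hexdigest()
    return encoded.decode("ascii"), signature


def verify_upload(encoded_policy, signature, filename, size, now=None):
    """Storage side (step 3): accept only an authentic, unexpired, matching grant."""
    now = time.time() if now is None else now
    expected = hmac.new(
        SERVER_SECRET, encoded_policy.encode("ascii"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False
    policy = json.loads(base64.b64decode(encoded_policy))
    return (
        policy["name"] == filename
        and size <= policy["max_bytes"]
        and now < policy["expires"]
    )
```

Someone who pulls a grant out of the app gets, at worst, one upload with that name and size limit until it expires, instead of the account's key.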

great for them. i worked for an unnamed company who was shipping AWS credentials in clients for years. worse, they were not clients that required a packaged binary (no need to decompile). it's long since patched but i can't believe no one ever sniffed that out.

We wrote a blog post that shows how you can authenticate your users and get temporary security credentials from AWS based on the user tokens, to avoid putting your keys on the client (both JavaScript apps in the browser and native apps). This technique uses Auth0, so you don't have to deploy a TVM, and it works with all the APIs (S3, EC2, SQS, SES, etc.). Behind the scenes, we generate a SAML token based on the user's JSON Web Token and exchange that for AWS temporary credentials using the AssumeRoleWithSAML AWS API.


Does Google Play have a public API for downloading APKs? Does it work for paid apps as well? (I'm not able to construct good keywords for search here: Google thinks I'm looking for an APK for the store app instead)

> Does Google Play have a public API for downloading APKs? Does it work for paid apps as well?

There's no "public API" but have fun: https://github.com/kanzure/googleplay-api

(because otherwise how does Google Play itself participate in downloading apps?)

Well, this is very cool and an approach that some security companies are taking at the moment. "Security outside your network" they call it.

I'm working on something similar myself (a side/pet project so far). I don't have any working software at the moment, just some "INTEL", and it is incredible how easily anyone could compromise or hurt people and companies just by using information they have published themselves.

If anyone more technical (I'm looking at you, devs!) wants to team up to create a service like this please get in touch.

I hope other developers see this and take action if they aren't properly securing cloud API keys. Data access by an unauthorized party is not something you want to deal with.

I wonder if any malicious parties have been doing this as well.

That's exactly what Amazon is trying to prevent.

I'm being dumb. I can see that it is preferable to embed credentials for a restricted IAM acct, not your root/master AWS account.

But how does using a TVM improve the situation? Surely you still need to embed creds which allow the app to use the TVM? In that case, an attacker can extract those creds, and ask the TVM for a time-limited token any time they like.

How does using a TVM improve security over embedding the creds of a restricted account?

Your token service would authenticate users using their credentials for your system.

Still not sure how that helps. In both cases, we have creds embedded in the app which can be used (and only used) for access to the AWS resource.

In one case directly (via an IAM limited account), in another via a token they can request. In both cases, the acct is limited to one specific AWS resource. In both cases, the creds can be revoked centrally. In both cases the creds are embedded in the app.

Smart people who build these things (AWS) seem to think a TVM is a better solution. I don't understand why.

I asked myself the same question and don't have an answer yet.

What are common use cases for AWS in mobile apps? (where the app needs direct connection to AWS)

Reading/writing to S3 was my use case. I'm sure there are others.

They also scan for keys on GitHub. They are proactive in terms of security!

a free security audit of your app, pretty cool ;)

Or somebody else found it and notified Amazon.

Was your source obfuscated?

Does it matter? A constant string is a constant string.

Well, if you want to obfuscate a constant string in your code, you can, e.g. by generating it dynamically from summing two hardcoded integer arrays. Not to say you should, but you can.
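
For illustration, a Python sketch of that two-array trick, with a made-up placeholder instead of a real key; `split` and `recover` are hypothetical names. The literal no longer appears in the shipped data, but whoever has the app also has `recover`, so this hides the string from a plain text search and nothing more.

```python
def split(secret: str, offset: int = 7):
    """Split a string into two integer arrays whose element-wise sum spells it."""
    a = [(ord(ch) * 31 + offset * i) % 50 for i, ch in enumerate(secret)]
    b = [ord(ch) - x for ch, x in zip(secret, a)]
    return a, b


def recover(a, b):
    """What the app does at runtime, and what an attacker can do just as easily."""
    return "".join(chr(x + y) for x, y in zip(a, b))


# Build time: only the two arrays would be embedded in the app.
PART_A, PART_B = split("NOT-A-REAL-KEY")
```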

That sounds an awful lot like DRM with all its failed approaches…

Sounds like? That's what obfuscation is (or more accurately, obfuscation is what DRM is). No one ever claimed, or ever will claim, that obfuscation stops people from seeing what you're doing.

Kudos to Amazon.

So am I.

Ultimately, Web Identity Federation or Federated Identity is the only way to secure apps in walled gardens, which means aligning yourself with a virtual land baron. I, for one, welcome our new fiefdom overlords. Everything else is just pushing new credentials through temp credentials and obfuscating it with protocol complexity.

Go on you Amazon.

is decompiling an app legal? does it not break someone's terms of service?

Decompilation isn't even needed. They know the credential being sent so an if(strings | grep -c "foo") != 0 SendEmail() is all that is required.
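
The same strings-and-grep idea as a short Python sketch. The function names are made up, and the input is assumed to be the raw bytes of an already-unpacked file: an APK is a zip archive, so its entries would have to be extracted first.

```python
import re


def printable_strings(data: bytes, min_length: int = 4):
    """Yield runs of printable ASCII, roughly what the Unix strings tool prints."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    for match in re.finditer(pattern, data):
        yield match.group(0).decode("ascii")


def contains_credential(data: bytes, credential: str) -> bool:
    """True if a known credential appears in any printable run of data."""
    return any(credential in s for s in printable_strings(data))
```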

You must be new here.

One of the things that justifies the higher prices.


Conjecture, and I guess you're welcome? My guess is if you embedded your Google cloud credentials in your app and it was compromised Google would be happy to bill you, terminate your account, or otherwise provide zero latitude as a customer. At least they dropped their prices, right?
