Groupon leaks entire Indian user database (risky.biz)
209 points by Garbage on June 28, 2011 | 90 comments



We have begun notifying our subscribers and advising them to change their Sosasta passwords as soon as possible. We will keep our Indian subscribers fully informed as we learn more.

This is a lie. Neither I nor my brother has heard from them. Keep in mind that this happened on Friday and it's already Tuesday here. In the meantime, I have been spammed four times, by e-mail and text message, about deals that I don't care about.


I suppose we also have no verification that the passwords are no longer stored in plain text, so changing them seems to be postponing the inevitable -- the site is now a target for getting hacked again, with a delicious plain text payload as the reward for anyone doing it.

If my doctor was leaking my medical records left and right to advertisers around town, I would sue him... at some point leaking my online identity has to have some sort of repercussions behind it besides me getting online and being openly angry on the internet.

Yep, that'll do it ;)

ADDENDUM. Speak of the devil (another HN story): http://www.consumeraffairs.com/news04/2011/06/cloud-site-dro...


Wrote a quick command line Gmail password changer for this type of thing; if I see my username I can go ahead and start changing passwords. Hopefully I'll pound more out this weekend. http://christopherwoodall.com/gmail.py


> If my doctor was leaking my medical records left and right to advertisers around town, I would sue him... at some point leaking my online identity has to have some sort of repercussions behind it besides me getting online and being openly angry on the internet.

What would you do? Sue them? The EULA you accepted usually forbids it, or limits the damages you can claim to a few bucks. I am not a lawyer, but I doubt such an EULA would be declared void in court.


Somehow, I think this wouldn't be the case. Otherwise, doctors would be throwing EULA agreements left and right, and saving millions on malpractice insurance fees.


They do. They are called waivers. You sign them all the time when using the health care system.


I've never seen a complete waiver of rights. What you typically sign is an agreement to try arbitration before (or instead of) suing in the courts. I don't believe it is possible to waive the ability to be penalized for negligence, just the venue the penalty is assessed in.

Of course, said arbiter is much less likely to give out a seven digit award than a jury is.


Though I am not a subscriber, I have received information from some of my friends that they are getting these warning emails.


How is that spam? You opted-in to receive the e-mails.


Not if those emails are from spammers harvesting the Groupon DB that is now available on Google.


He said he was spammed about deals.


Most spam offers deals.


He's counting deals he doesn't care about as spam.


Unless telepathy is invented, all subscription email can be called spam, then.


Groupon takes security and privacy very seriously.

Just not seriously enough to encrypt passwords.


Here's the communication they sent. This is totally a lie - my email was in the database, and they are only talking about an issue "potentially affecting" subscribers:

Hi SoSasta Subscriber,

Over this weekend, we've been alerted to a security issue potentially affecting subscribers of Sosasta. We wanted to let you know that the issue has been brought under control and your accounts are secure. However, as a precautionary measure, we recommend that you change your SoSasta password immediately, by visiting the SoSasta website (Sign-In using your existing password, then click on Profile followed by Change Password). If you use the same email/password combination at other websites, we recommend you change those passwords as soon as possible, too.

Please be aware that none of your financial information (Credit Card, Debit Card, NetBanking etc) has been compromised since this information is not stored on SoSasta, as per law.

If you have any concerns or find any unusual changes in your SoSasta account, please contact our Customer Support team as soon as possible at 1800 103 2111 between 9.30 a.m. and 6.30 p.m. IST, Monday to Saturday so that we can review your account.

You should know that we are working aggressively to prevent this from happening again. Sosasta takes security and privacy very seriously -- it's important to us to provide you with a safe shopping experience of the highest quality, and we will do everything possible to keep your trust. Please accept our apology for any inconvenience or concern we've caused.

Sincerely, SoSasta Customer Support


Was your password in plaintext? Just to confirm what others have been saying.


as we are currently groupon bashing, take a look at groupon.de (germany) http://www.groupon.de/deals/berlin

scroll down ... down ... down ... there it is (gray text on black background), the crappiest example of SEO i have seen in a long long time. keyword stuffing is so 2004.

(translated from the German) "As the capital of the Federal Republic, Berlin is known for its sights and its wide range of leisure activities. ... ... Berlin deal ... ... discounts ... ... to save money ... ... voucher ... bla ... ... offers of the Berlin deal ... ... wellness offers ... ... restaurant vouchers ... ... leisure experiences, events and services in Berlin ... ... shopping and online shop. ... ... Berlin vouchers ... ... "

i would have guessed that a multi billion dollar company could at least hire a decent SEO guy.


While arguably interesting, I don't see how this is related to the linked password leak


yeah, i know, but shitty SEO is not worth a separate 'submit' - also the overall theme of this thread is 'just another poor business decision / f*ck up by groupon' and this seems to fit the bill


Maybe Russian search engines fall for keyword stuffing? I do not know what the primary search engine of Russia is.


It's Yandex, but I don't see how that's relevant since this is a German site


I have no idea where I read Russian. My bad. :\


Ok, my mouth is literally agape at the number of database dumps indexed by Google. I guess I just never thought of searching for something so simple and now I'm floored at how often this seems to happen. How does anyone possibly allow a data dump to come anywhere near somewhere Google could index it?


> How does anyone possibly allow a data dump to come anywhere near somewhere Google could index it?

Let me just put this sql dump in the web root for a couple hours to copy over to the test server.


Google also has to know that the file is there, before indexing it, either from a link available to Google, or from the website's sitemap, or by activating directory listing in Apache, or some other shit like that.


This is very simple. If you're using Chrome Browser, ChromeOS, or Google Toolbar, then google is using their pagerank tech ... or essentially sending the url you type into the browser to their servers for ranking purposes. If you can access it freely on the net, assume it is already indexed, even if there are no links to it.


Is this true? Can you (or anyone) point to some kind of reference or evidence? If valid, I'd consider this an almost dangerous breach of privacy.


As requested:

http://en.wikipedia.org/wiki/PageRank#The_intentional_surfer...

> The Google toolbar sends information to Google for every page visited, and thereby provides a basis for computing PageRank based on the intentional surfer model.

For it to display a pagerank, it has to send the url to Google (otherwise, how is it going to know what to display the rank for?). Google can then send the crawler to that address later.

> If valid, I'd consider this an almost dangerous breach of privacy.

I don't believe they monitor who is going where. Just where people are going. Although it would be trivial for them to monitor who is going where ...

Also, an FYI, if you are logged in to Google and you're using their search engine, then they ARE monitoring you. Check out Google Web History.


Thanks for the link.

I was concerned more with content indexing of URLs that are not meant to be public, to the point where that content could show in search results. Imagine my editor emails me a link to a blog article for approval before publishing. Or, as a designer, you create a draft of a web page to show to your client; and for the convenience of said client, you prefer not to have it password protected (nor take the time to set it up - you have enough to do!)

In both cases, imagine that someone loads the URL in their Chrome browser. If that action resulted in the URL being added to the googlebot's itinerary, even though no publicly visible webpage links to it, the result could be the exposure of information that we don't want. Or for the blog post example, it could even affect SEO by causing a duplicate content penalty.

Of course we can password protect the page, exclude the urls in robots.txt, etc. But there is a labor cost and inconvenience to having to do that, and there is always risk that something would slip through.

That said, what I write above is likely pure speculation; I don't know of any evidence that Google is actually doing this, and it seems unlikely to me that they would.


Directory listing was on. Searching google's cache for http://www.sosasta.com/uploaded/ will confirm as much.


Searching on cache:http://www.sosasta.com/uploaded/ doesn't show a result now from here (Australia.)

Also, searching on link:http://www.sosasta.com/uploaded/ doesn't show anyone linking to it. Even if the directory is there, it had to get there in the first place somehow.


To: mycolleague@gmail.com

The database dump is here: http://www.secretserver.com/database.sql.gz

Don't tell anyone.


You really think google adds private emails to its public index? Get real.


If the URL is a publicly accessible webserver with no robots.txt telling them to stay off it, I wouldn't be surprised if it gets fed to the crawler.


Google can even index URLs that are disallowed in robots.txt (it just doesn't crawl their content).


The email wasn't indexed but (maybe) a link in the email was.

Google honours robots.txt, X-Robots-Tag headers, etc., but everything else is fair game.
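
To make the robots.txt point concrete, here is a minimal sketch of how a compliant crawler decides whether it may fetch a URL, using Python's standard `urllib.robotparser`. The robots.txt contents below are an assumption made up for illustration; nothing in this thread shows what the site actually served.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example. The path mirrors the
# directory mentioned in the thread, but the rule itself is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /uploaded/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "Googlebot") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_crawl("http://www.sosasta.com/uploaded/users/xyz.sql"))  # False
print(may_crawl("http://www.sosasta.com/index.html"))              # True
```

Note that this only tells well-behaved crawlers to stay away; the file is still downloadable by anyone who knows or guesses the URL.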


They scan my private emails to target advertising at me - why would they not follow links (as the link obviously denotes something I'm interested in)?

And, as the others state, if it's not robots.txt denied then why not add it to the public index?


A company I worked for had a clever script that automatically put all content on the server into a Google sitemap.


Did anyone manage to get a copy of the sql file? A password analysis of a largely Indian audience could be pretty interesting.


I'm mostly wondering if their usernames and passwords are strictly ascii, or if they're using another alphabet.
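
If someone did run that analysis, the ASCII question is a short check. A rough sketch, with made-up sample values (none of them come from the actual dump):

```python
def ascii_share(values):
    """Return the fraction of strings that contain only ASCII characters."""
    if not values:
        return 0.0
    return sum(1 for v in values if v.isascii()) / len(values)


# Invented sample data for illustration only.
samples = ["rahul123", "pa$$w0rd", "पासवर्ड", "priya_k"]
print(ascii_share(samples))  # 0.75
```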


Most Indian computers I've seen are standard American or British layouts. India has a lot of English speakers [1] and most computer-savvy folk, especially the kind that use Groupon will definitely know enough English to use English usernames and passwords.

[1] http://en.wikipedia.org/wiki/List_of_countries_by_English-sp...


I can guess how this happened.

---------------------

First, take the db dump, for backups, setting up another server, etc.

$ mysqldump -u <user> -p<password> <db name> > xyz.sql

Now, let's move the db dump file to the webroot. I hate SSH, FTP, rsync -- too complicated for me. I like clicking hyperlinks. KISS FTW!

I guess nobody will notice that file is present here. How can they know, I won't tell them!

$ mv xyz.sql public_html/uploaded/users

now, I can download it simply by going to

http://www.sosasta.com/uploaded/users/xyz.sql

See how easy this is? Why complicate things unnecessarily?

---------------------

I guess the guy never imagined that mighty Google would index this and that people from all around would download the file, resulting in a major security breach.

This is what you get when you act ignorant or plain lazy. poor guy...lol
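
For what it's worth, catching this kind of leftover file is cheap to automate. Below is a minimal sketch that walks a web root and flags anything that looks like a database dump. The suffix list and the function name are my own assumptions for illustration, not anything the site is known to have used.

```python
import os
import tempfile

# Assumed, non-exhaustive list of suffixes that often indicate a dump or backup.
SUSPICIOUS_SUFFIXES = (".sql", ".sql.gz", ".dump", ".bak")


def find_exposed_dumps(web_root: str) -> list:
    """Return paths (relative to web_root) of files that look like dumps."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            if name.lower().endswith(SUSPICIOUS_SUFFIXES):
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, web_root))
    return sorted(hits)


# Demo on a throwaway directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "uploaded", "users"))
    open(os.path.join(root, "uploaded", "users", "xyz.sql"), "w").close()
    open(os.path.join(root, "index.html"), "w").close()
    print(find_exposed_dumps(root))
```

Run from cron or a deploy hook, a check like this would at least complain before a crawler finds the file.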


> This is what you get when you act ignorant or plain lazy.

Or, in the general case[1], when 'industry standard' tools are a PITA to use. Simpler solutions are always preferred, for better or worse.

[1] I'm not defending here the person that caused the SoSasta breach.

EDIT: Formatting.


Incidents like this make me wonder whether success has anything to do with talent. Even a mediocre developer wouldn't make such a mistake, and yet these people were acquired by Groupon. So, I think, is it all about who you know?


It appears you're putting the equal sign between developer ability and business success. Being a great developer lands you a good job with a 6-figure yearly salary. Being successful in business requires a different sort of talent.

Business "talent" is primarily about knowing what matters. There are lots of Groupon clones or other startups where founders can't do basic arithmetic on user acquisition costs or lifetime customer value, or they choose to work on scaling the backend prematurely or on solving things that will not matter unless they achieve product/market fit and scale. Recognizing priorities and working on what matters is also talent, or more precisely, a lot of hard work.

Once you have product/market fit, doing risk management is a must and nowadays it seems to be a lacking skill (look at this incident or the Dropbox story from a couple of days ago). But doing risk management is like translating your business into French: if you're big it pays off to do it but otherwise you should be more worried about not dying.

Therefore I'm not surprised when I see that people who are focused on not dying more than risk management have been more successful in reaching... "success" (managing it once you've got it is another business)


I am a 20-something, right out of college: a first-time startup founder and developer with almost zero business knowledge. It's hard to accept the truth you are saying, but I am learning.


Just make sure you can tie everything you do to your customer's needs. If you find yourself thinking about technology for technology's sake (e.g. "let's rewrite in Scala" or "We need a 12-tier system to manage scalability") (for you curtsy user...)), start worrying.

Fortunately, there is an easy cure: go talk to customers. Your good customers (or potential customers) will point you the right way.


As you get older, you realize the world isn't a meritocracy. Only the young, poor, and/or foolish will believe that.


If you're lucky, you might also realize that being condescending is alienating and a bad way to get your point across.


and they were ex-googlers by the way.


Sosasta means "so cheap" in Hindi. Maybe they are too cheap to spend any effort on security of their users' data.


Sasta means cheap ... the so is English ;)


But the intended meaning is 'so cheap'.


Sosastaa == too cheap


In clear text? Really? How is this even allowed anymore?


And indexed by Google. That is terrible.


It might have been good that it was indexed by Google, since the problem was found quicker. Completely agree: how can cleartext still be allowed? It's harder to regulate overseas, but would it be possible to have an actual law in the USA against storing passwords and other confidential data in cleartext? There are various consumer protections for things such as food, cars, and toys; could similar protections be legislated for data?


Of course. Keywords: PCI, MIPSA/HIPAA, etc.

More: https://secure.wikimedia.org/wikipedia/en/wiki/Information_p...


PCI isn't law, it's the product of an industry group, and is arguably more about the protection of the networks and issuers than consumers.

There is even less guidance and specific regulation pertaining to the encryption and security of banking records. The audits and regulations that are in place are more about overall controls than technological measures.


Yes, my impression is that passwords are not treated with the same seriousness as say credit card numbers, or health info. That (passwords) was actually what I was curious about, before I generalized my comment.


Passwords are one of the safeguards (perhaps the most important) that protect personal/sensitive data.


I didn't even laugh, I just facepalmed.


Obligatory post I make on these stories: Use PwdHash for Firefox and Chrome. It's very low overhead and provides an extra layer of protection from lazy programmers.


That only tackles the one-master-password-for-every-site issue. It does not solve the problem of the password being stored in plaintext.

For example, Facebook has a central ID and if they don't protect the password that gets exposed, someone could use the password to withdraw money from another section of the website.


You can store money on Facebook?


Not quite. At the moment it's virtual currency [0]. My point was that security holes increase as a service increases in complexity. Especially when it's used for everything and becomes a hard lesson in SPoF for users, like the money example.

[0] http://www.facebook.com/credits/


Yeah, nobody can prevent a leaked password from being used on the site from which it was leaked.

This extension is helpful because people reuse passwords: a leaked password cannot be used to cause damage on other sites.

Obviously, I agree that sites shouldn't store passwords in plaintext, but good luck enforcing that.


PwdHash looks great, and it would be nicer if it were implemented by all browsers. To take the idea further, what if the HTML5 standard for input type=password automatically did the hashing before forms are submitted? On any form submission, browsers would hash like PwdHash by using the current domain. There would also be no need to use a prefix like PwdHash does. Those are some quick thoughts, and I think there are better ideas that could be implemented that don't require work from consumers. It won't protect against things like weak passwords, but even a little contributes to security.
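
The general idea of deriving a different password per domain can be sketched in a few lines. To be clear, this is not PwdHash's actual algorithm or output encoding; it is an illustrative assumption using HMAC-SHA256 keyed by the master password, and `site_password` is a made-up name.

```python
import base64
import hashlib
import hmac


def site_password(master_password: str, domain: str, length: int = 16) -> str:
    """Derive a per-site password from a master password and a domain.

    Illustrative only: real tools differ in hash choice and encoding.
    """
    digest = hmac.new(
        master_password.encode("utf-8"),   # key: the secret the user remembers
        domain.lower().encode("utf-8"),    # message: the site being logged in to
        hashlib.sha256,
    ).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


a = site_password("correct horse", "sosasta.com")
b = site_password("correct horse", "example.com")
print(a != b)  # True: a leak on one site does not reveal the other password
```

The site still sees (and may store in plaintext) whatever string is submitted; the gain is only that the string is useless elsewhere.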


I use PasswordMaker which has support for pretty much every browser and uses the current url for hashing (and you can always override the default salt). http://passwordmaker.org/


Holy cow. This thing is even better than PwdHash (ability to avoid special chars in particular, and multiple platforms supported, and it's actually being maintained). Thanks!


> On any form submission, browsers would hash like PwdHash by using the current domain.

Nitpick: works great until you move domains, or try to log in from another subdomain, or use a redirect in your /etc/hosts file, etc., etc.

Assuming that the submitted hash would (should!) be salted and hashed again server-side anyway, simply running it through bcrypt would be enough, I think.

An optional attribute could be added, too: <input type="password" salt="sosasta" />. If we wanted to go further, the salt could be a randomly generated nonce that would be submitted as another field; POST['password'] = 'whatever', POST['password_salt'] = 'sosasta'
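
For the server-side half, a minimal sketch of "salt and hash again" follows. bcrypt is not in Python's standard library, so this swaps in PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in; the iteration count and function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key) for storage instead of the plaintext."""
    salt = os.urandom(16)  # fresh random salt per user
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("whatever")
print(verify_password("whatever", salt, stored))   # True
print(verify_password("something", salt, stored))  # False
```

Because the salt is stored next to the hash and reused at login, the server can verify the password without ever keeping it in plaintext.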


I don't think a nonce would work, because the server wouldn't be able to verify whether the hashed value you sent was correct or not. The whole point is to never send your 'real' pwd to the site, because they're going to do something idiotic like store it in plaintext, and then you have to change it everywhere.


Heads oughta roll for this one. The dev, the supervisor, his supervisor. Probably all the way up to VP.

It's a stupid, boneheaded mistake, but one of those that could only be made in an environment where security is extremely lax. Easiest way to fix the environment here is to just fire everyone involved.


Continuing on that line of thought - a simple search like 'filetype:sql phpmyadmin' also shows a lot of 'interesting' results.


Maybe Google should start working with a security firm so that once its bots crawl a leaked database, they notify the website owner immediately.


How would Google's algorithm tell whether a database is leaked or not? Get emails from the firm to assess it? Maybe a honeypot email?

In that case, it seems far easier for developers to put honeypot emails in their databases and query search engines hourly to see when those become available.

That is assuming the database gets released, let alone exposed for indexing.
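
The honeypot idea is simple to prototype. A rough sketch, assuming you seed the users table with unguessable addresses and later scan any suspicious text (a paste, a search result snippet) for them; the `canary-` prefix, the domain, and both function names are hypothetical.

```python
import secrets


def make_canary_email(domain: str = "example.com") -> str:
    """Create an unguessable address to plant in the user table."""
    token = secrets.token_hex(8)  # 16 random hex characters
    return f"canary-{token}@{domain}"


def find_canaries(text: str, canaries: set) -> set:
    """Return the planted addresses that appear in the given text."""
    lowered = text.lower()
    return {c for c in canaries if c.lower() in lowered}


planted = {make_canary_email() for _ in range(3)}
leaked_text = "INSERT INTO users VALUES ('x', 'pw', '%s');" % next(iter(planted))
print(len(find_canaries(leaked_text, planted)))  # 1
```

The search-engine polling part is left out on purpose, since how you query an engine depends on its API and terms of use.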


Well, you know, a .sql file containing usernames, passwords, and emails is very likely a leaked database.
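
As a sketch of how crude such a detector could be: the heuristic below flags text that has SQL statements, the three column names, and several email addresses. The thresholds, column names, and regex are assumptions; a real classifier would need far more care to avoid false positives such as tutorials and schema examples.

```python
import re

CREDENTIAL_COLUMNS = ("username", "password", "email")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def looks_like_user_dump(sql_text: str, min_emails: int = 3) -> bool:
    """Heuristically decide whether text resembles a dumped user table."""
    lowered = sql_text.lower()
    has_sql = "insert into" in lowered or "create table" in lowered
    has_columns = all(col in lowered for col in CREDENTIAL_COLUMNS)
    email_count = len(EMAIL_PATTERN.findall(sql_text))
    return has_sql and has_columns and email_count >= min_emails


sample = """
CREATE TABLE users (username VARCHAR(50), password VARCHAR(50), email VARCHAR(100));
INSERT INTO users VALUES ('a', 'pw1', 'a@example.com');
INSERT INTO users VALUES ('b', 'pw2', 'b@example.com');
INSERT INTO users VALUES ('c', 'pw3', 'c@example.com');
"""
print(looks_like_user_dump(sample))                      # True
print(looks_like_user_dump("Just a blog post about SQL"))  # False
```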


I think Google would rather just not index such data.


I can't think of a reason why Google would care... I certainly can't think of a reason why Google would care enough to spend money on staff/research in order to get that type of content out of the index.

If you don't want something indexed, don't put it on the web. And sure as hell don't link to it.


True. If you are incompetent enough to expose your most sensitive data on the internet yourself, why would you expect others to come to your aid? Even if Google does not index it, what is stopping other search engines from indexing it, or hackers from getting access to it directly? There are a lot of tools that will automatically index all the content of a website at the click of a button. So I think it is best to put the onus of protecting user data on the website collecting it. If they cannot, then maybe they should just use some third-party authentication and minimize the amount of sensitive data that they need to keep on their servers.


Yes, do not index AND possibly inform the website.


If companies themselves are willing to compromise their users' email IDs and passwords, then who can protect them? The hackers have just shown people the mistakes that were made. Even a small child knows how to hide his password from other people.


anybody got that sql?


All the data has been removed and only exists in sha1 hash form but you can see if you were affected at https://shouldichangemypassword.com/
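
A lookup against a list of SHA-1 hashes could work roughly like this. How that site actually normalizes and hashes its entries is not stated here, so the lowercasing and trimming below are assumptions, as are the function names and sample addresses.

```python
import hashlib


def sha1_of_email(email: str) -> str:
    """Hash a normalized email address (normalization is an assumption)."""
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()


def is_listed(email: str, leaked_hashes: set) -> bool:
    """Check an address against a set of hex-encoded SHA-1 hashes."""
    return sha1_of_email(email) in leaked_hashes


# Invented sample list for illustration only.
leaked = {sha1_of_email("someone@example.com")}
print(is_listed(" Someone@Example.com ", leaked))  # True
print(is_listed("other@example.com", leaked))      # False
```

Unsalted SHA-1 of an email address is easy to brute-force for anyone holding a candidate list, so this hides the data from casual browsing rather than from a determined attacker.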


I am not normally in favor of legislation, but I'd be okay with a fine for US-based companies that leak and expose this kind of data. Specifically a harsher fine for cleartext or anything less than bcrypt.


Nothing is gonna happen to Groupon (or the Indian subsidiary). OTOH, Dropbox just got hit with a class action lawsuit.


How about something similar to HIPAA?

I worry, though, that it would end up making things more difficult for developers while not improving things for the end users - much like the European/Dutch cookie law.


Your User Information - 100% Off!


Sue this irresponsible company into oblivion.



