Twitter sells multi-billion tweet archive (rt.com)
52 points by narad on Mar 1, 2012 | 35 comments

"Twitter has sold billions of archived tweets believed to have vanished forever. A privacy row has erupted as hundreds of companies queue up to purchase users’ personal information from the new database."

Is there actually any information in this? What do they mean by "believed to have vanished forever"? According to the BBC news article [1], "private accounts and tweets that have been deleted will not be indexed by the site."

Historically Twitter have been pretty good about ensuring, in their licences at least, things like deleted tweets are deleted even from external archives. It's what makes compiling a database of tweets - even for research purposes - quite difficult.

[1] http://www.bbc.co.uk/news/technology-17178022

Normal people believe that information that is nearly impossibly difficult to find is as good as gone forever. (This is mostly as these same "normal people" also often don't understand that website scraping can be automated; a sad misapplication of almost-reasonable cost heuristics to a system they don't understand.)

However, on Twitter there is an extra wrinkle: neither the site nor the API allow you to go back more than 800 tweets for any view, whether it be "search results matching X" or "tweets that mention me" or "tweets posted by user Y" (the one exception being "your own tweets", where you can go back 3200).

This means that if you tweet a lot, it is impossible for another user to go back very far into your history. Even you may not be able to go back that far in your own history. As an example: if you send 20 tweets a day, your history will be inaccessible to you within 6 months, and inaccessible to other users within 6 weeks.
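The arithmetic behind that example can be checked with a rough sketch. It assumes the 800 and 3200 limits quoted above and a constant tweeting rate; the function name is made up for illustration and nothing here talks to Twitter.

```python
# Limits as quoted in the comment above (2012); treated here as assumptions.
OTHERS_LIMIT = 800   # most recent tweets reachable in any view by other users
OWN_LIMIT = 3200     # most recent tweets reachable in "your own tweets"


def days_until_unreachable(limit: int, tweets_per_day: float) -> float:
    """Days after which a tweet falls outside the most recent `limit` tweets,
    assuming a constant posting rate."""
    if tweets_per_day <= 0:
        raise ValueError("tweets_per_day must be positive")
    return limit / tweets_per_day


own_days = days_until_unreachable(OWN_LIMIT, 20)        # 3200 / 20 = 160 days
others_days = days_until_unreachable(OTHERS_LIMIT, 20)  # 800 / 20 = 40 days

print(f"own history: {own_days:.0f} days (~{own_days / 30:.1f} months)")
print(f"other users: {others_days:.0f} days (~{others_days / 7:.1f} weeks)")
```

At 20 tweets a day this gives 160 days and 40 days, which is where the "within 6 months" and "within 6 weeks" figures come from.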

Given that this information is seemingly just gone--inaccessible, unable to be found from the user interface of any client, or Twitter's website, or searches on Twitter--while you and I know that they didn't delete it (in fact, Twitter even implies as much in their documentation somewhere: that the 800/3200 are only "temporary" limits), normal people are going to consider that data "vanished forever".

If you know the id of a tweet you can still access it I believe. https://api.twitter.com/1/statuses/show.json?id=20 still works, which allows you to construct https://twitter.com/#!/jack/status/20 which also still works. So, it's only searches which don't go that far back.
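As a small sketch of that construction: the two URL patterns below are copied from the links in this comment (2012-era, and not guaranteed to work today), the helper names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# URL patterns taken from the comment above; assumed, not verified against a live API.
API_BASE = "https://api.twitter.com/1/statuses/show.json"
WEB_BASE = "https://twitter.com/#!"


def api_url(tweet_id: int) -> str:
    """Build the API lookup URL for a single tweet id."""
    return f"{API_BASE}?{urlencode({'id': tweet_id})}"


def web_url(screen_name: str, tweet_id: int) -> str:
    """Build the web permalink once the author's screen name is known."""
    return f"{WEB_BASE}/{screen_name}/status/{tweet_id}"


print(api_url(20))
print(web_url("jack", 20))
```

The screen name has to come from somewhere (in the comment, from the API response), so `web_url` takes it as an argument rather than guessing it.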

Right. You can also find a lot of old tweets using Google (which is, of course, hit or miss). As I said, you (a general you of "people on HN", not just the person I responded to) and I know that it isn't deleted. You and I know all sorts of things about websites that the normal person does not, which means we honestly can (and do) come off as if we know deep hacker magic secrets for demonstrating what are, to us, rather obvious things. When we design websites (and privacy policies...) we need to remember this.

Remember: for a normal person, those somewhat scary-looking URLs are likely opaque. To the extent that people spend any time pulling them apart, it turns into a game of "guess the really long number, by hand, typing attempts into the URL bar of their web browser". They are unlikely even to figure out that it is monotonically increasing (which may or may not even help given the volume of the site); and, even if they did, for them the probability of guessing one of those numbers would still be effectively zero.

I imagine the "believed to have vanished forever" is based on the common misconception that because Twitter's search only goes back about a week, they don't save older tweets anywhere.

Wait, so this "article" makes a bunch of claims against Twitter based on zero authoritative links and a bunch of unsourced quotes from ... the Daily Mail?

And we're supposed to take this seriously? Come on, doesn't it take more than a bunch of unsubstantiated claims for the HN community to jump on something and just take it at face value?

The only thing that annoys me is that Twitter won't sell my own archive back to me! I'd happily hand over money for that.

Am I the only one not shocked or offended? I post publicly on Twitter. Anyone can do that anyway. … Unless I'm missing something.

The one possible privacy problem that I can see is that they are selling tweets that users can no longer control - if I said something a year ago that I wish I could take back ("Hi everyone, just to let you know I'm gay" or whatever), I'm now unable to delete it, or even review what it was I said, but it can still be sold and tied to me.

Still, not something I personally have an issue with, and I suspect not something most people would have an issue with.

Agreed. I use twitter as my public micro-blogging/rating/raving platform, and for them to sell data that I assumed was already being pulled by anyone via the API is a non issue to me.

The people who are shocked and offended don't understand that they aren't the users of these "free" social networks (Twitter, Facebook, LinkedIn, etc.). They / we are in fact the product, and once you realize that your data is being sold to others, you will change the way you interact with the networks.

So true. Public is public. Anyone could have data-mined all these tweets. But the difficult question that may arise is the "right to delete". IMO it's a hard dilemma: history vs. privacy. Can someone decide to delete every trace of themselves and destroy everything they built? On the other hand, we need to keep a record of what happened in the past, don't you think?

As long as they don't sell DMs or private account tweets, I don't really see the big issue. By saying that, I mean I'm comfortable with companies having what was already public in the first place.

If you were publicly tweeting and then went private just before this database was closed off, would those tweets get out?

What if you switched back and forth between private and public a bunch of times? Were those private tweets lost or did they reappear when you went public?

My gut says everything stays in the database and it's all being sold.

Related: http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acq... Library of Congress already has an archive of public tweets.

Twitter is a business, just like any other business, trying to turn a profit. Protect your account if you don't want your public data being pulled. Anyone can make a timeline API request...

"privacy betrayed" -> any of this data was ever private. it wasn't.

Unless you opted to have a private profile. I wonder if those private tweets are included in the bundle.

I agree, however: if you want to keep your thoughts private, don't post them on the Internet. When you do that, they don't belong to you anymore.

> I wonder how if those private tweets are included in the bundle.


"private accounts and tweets that have been deleted will not be indexed by the site"

Thanks! I admit not reading TFA.

Millions of web apps are crawling your tweets, and you're worried about this? If you want privacy, don't use social media.

Once upon a time it was the consumer who was confused about what they owned. We all bought vinyl records because it meant we could hear the music we wanted to. We thought this meant we could do what we liked with the music and even though we couldn't the limitations of the technology meant the artist and record company profits never really suffered. One day digital music came along and suddenly our assumptions about what we could do with the music we owned became a serious problem for record company profits and the whole planet shook.

Now we have all become artists; our words, our emails, our tweets and posts the new music. The Twitters and the Googles are the new record companies and once more the confusion between music and vinyl has arisen.

Is there a crowdsourced public archive of tweets? If not, we should build one.

And then sell it to the big corps for 10% less than twitter... excellent idea :)

95% of these tweets are ones with links to "enlarge", "boost", "find an adult partner", "xxx" :))

OMG. They'll know everything I ate the past two years!

I wonder how it gets over this [1].

[1] http://tweetcc.com/



> By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).

> You agree that this license includes the right for Twitter to make such Content available to other companies, organizations or individuals who partner with Twitter for the syndication, broadcast, distribution or publication of such Content on other media and services, subject to our terms and conditions for such Content use.

> Such additional uses by Twitter, or other companies, organizations or individuals who partner with Twitter, may be made with no compensation paid to you with respect to the Content that you submit, post, transmit or otherwise make available through the Services.

You can put a CC license on your tweets, and it'll grant those additional copyright rights to anyone, but it doesn't limit what Twitter can do with the content you posted one bit.

I don't get what you can do with twitter data...

TweetReports, Twazzup, Topsy, and Twitscoop. oneRiot was acquired.


Check out DataSift; they got press coverage a couple of days ago for their tweet-analysis platform.

Instead of downvoting, can someone give some examples?

Company C launched product P on date D. Did mentions go up? Are tweets about company/topic/place/store positive or negative? What do people tweet about when inside our store?
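To make the first of those questions concrete, here is a toy sketch that counts keyword mentions before and after a launch date. The function name, the product name "widget", and the sample tweets are all made up for illustration; a real analysis would need proper text matching and far more data.

```python
from datetime import date


def mentions_before_after(tweets, keyword, launch):
    """Count tweets containing `keyword` (case-insensitive substring match)
    posted strictly before `launch` versus on or after it."""
    before = after = 0
    needle = keyword.lower()
    for posted_on, text in tweets:
        if needle in text.lower():
            if posted_on < launch:
                before += 1
            else:
                after += 1
    return before, after


# Hypothetical sample data, invented purely for this example.
sample = [
    (date(2012, 2, 27), "can't wait for the Widget"),
    (date(2012, 2, 28), "lunch was great"),
    (date(2012, 3, 1), "Widget is out today!"),
    (date(2012, 3, 2), "got my widget, love it"),
]

before, after = mentions_before_after(sample, "widget", date(2012, 3, 1))
print(f"mentions before launch: {before}, on or after launch: {after}")
```

Sentiment and "what do people tweet about in our store" are harder problems; this only covers the simple volume comparison.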

"analyze tweets posted each day for anything said about their products and services."
