
Twitter sells multi-billion tweet archive - narad
http://rt.com/news/twitter-sells-tweet-archive-529/
======
ajanuary
"Twitter has sold billions of archived tweets believed to have vanished
forever. A privacy row has erupted as hundreds of companies queue up to
purchase users’ personal information from the new database."

Is there actually any information in this? What do they mean by believed to be
vanished forever? According to the BBC news article [1] "private accounts and
tweets that have been deleted will not be indexed by the site."

Historically Twitter have been pretty good about ensuring, in their licences
at least, things like deleted tweets are deleted even from external archives.
It's what makes compiling a database of tweets - even for research purposes -
quite difficult.

[1] <http://www.bbc.co.uk/news/technology-17178022>

~~~
saurik
Normal people believe that information that is nearly impossibly difficult to
find is as good as gone forever. (This is mostly as these same "normal people"
also often don't understand that website scraping can be automated; a sad
misapplication of almost-reasonable cost heuristics to a system they don't
understand.)

However, on Twitter there is an extra wrinkle: neither the site nor the API
allow you to go back more than 800 tweets for any view, whether it be "search
results matching X" or "tweets that mention me" or "tweets posted by user Y"
(the one exception being "your own tweets", where you can go back 3200).

This means that if you tweet a lot, it is impossible for another user to go
back very far into your history. Even you may not be able to go back that far
in your own history. As an example: if you send 20 tweets a day, your history
will be inaccessible to you within 6 months, and inaccessible to other users
within 6 weeks.
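
As a rough check of that arithmetic, here is a minimal sketch. It assumes a constant posting rate and takes the 800 and 3200 limits exactly as quoted above; the function and constant names are made up for illustration.

```python
# Back-of-the-envelope check: how long until a tweet posted today
# falls outside a view that only exposes the newest N tweets?

OTHERS_LIMIT = 800   # limit quoted above for most views
OWN_LIMIT = 3200     # limit quoted above for "your own tweets"


def days_until_unreachable(window_size: int, tweets_per_day: float) -> float:
    """Days until a tweet scrolls past a window of the newest `window_size` tweets."""
    if tweets_per_day <= 0:
        raise ValueError("tweets_per_day must be positive")
    return window_size / tweets_per_day


others_days = days_until_unreachable(OTHERS_LIMIT, 20)
own_days = days_until_unreachable(OWN_LIMIT, 20)

print(f"other users: {others_days:.0f} days (~{others_days / 7:.1f} weeks)")
print(f"yourself:    {own_days:.0f} days (~{own_days / 30:.1f} months)")
```

At 20 tweets a day this gives 40 days for other users and 160 days for yourself, which is where "within 6 weeks" and "within 6 months" come from.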

Given that this information is seemingly just gone--inaccessible, unable to be
found from the user interface of any client, or Twitter's website, or searches
on Twitter--while you and I know that they didn't delete it (in fact, Twitter
even implies as much in their documentation somewhere: that the 800/3200 are
only "temporary" limits), normal people are going to consider that data
"vanished forever".

~~~
warp
If you know the id of a tweet you can still access it I believe.
<https://api.twitter.com/1/statuses/show.json?id=20> still works, which allows
you to construct <https://twitter.com/#!/jack/status/20> which also still
works. So, it's only searches which don't go that far back.
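
A small sketch of that construction, for illustration only. The two URL patterns are copied from the links above; the helper name is hypothetical, and the code only builds strings, so it makes no claim about whether either endpoint still responds.

```python
# Build the API and web URLs for a tweet from its numeric id.
# URL patterns are taken verbatim from the comment above; nothing is fetched.


def tweet_urls(tweet_id: int, screen_name: str) -> dict:
    """Return the API and web URLs for a tweet, given its id and author."""
    return {
        "api": f"https://api.twitter.com/1/statuses/show.json?id={tweet_id}",
        "web": f"https://twitter.com/#!/{screen_name}/status/{tweet_id}",
    }


urls = tweet_urls(20, "jack")
print(urls["api"])
print(urls["web"])
```

The screen name has to come from somewhere (for instance the API response), since the id alone does not tell you the author.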

~~~
saurik
Right. You can also find a lot of old tweets using Google (which is, of
course, hit or miss). As I said, you (a general you of "people on HN", not
just the person I responded to) and I know that it isn't deleted. You and I
know all sorts of things about websites that the normal person does not, which
means we honestly can (and do) come off as if we know deep hacker magic
secrets for demonstrating what are, to us, rather obvious things. When we
design websites (and privacy policies...) we need to remember this.

Remember: for a normal person, those somewhat scary-looking URLs are likely
opaque. To the extent that people spend any time pulling them apart, it turns
into a game of "guess the really long number, by hand, typing attempts into
the URL bar of their web browser". They are unlikely even to figure out that
it is monotonically increasing (which may or may not even help given the
volume of the site); and, even if they did, for them the probability of
guessing one of those numbers would still be effectively zero.

------
primigenus
Wait, so this "article" makes a bunch of claims against Twitter based on zero
authoritative links and a bunch of unsourced quotes from ... the Daily Mail?

And we're supposed to take this seriously? Come on, doesn't it take more than
a bunch of unsubstantiated claims for the HN community to jump on something
and just take it at face value?

~~~
hopeless
The only thing that annoys me is twitter won't sell my own archive back to
_me_! I'd happily hand over money for that

------
zachinglis
Am I the only one not shocked or offended? I post publicly on Twitter. Anyone
can do that anyway. … Unless I'm missing something.

~~~
corin_
The one possible privacy problem that I can see is that they are selling
tweets that users can no longer control - if I said something a year ago that
I wish I could take back ("Hi everyone, just to let you know I'm gay" or
whatever), I'm now unable to delete it, or even review what it was I said, but
it can still be sold and tied to me.

Still, not something I personally have an issue with, and I suspect not
something most people would have an issue with.

------
dutchbrit
As long as they don't sell DMs or private account tweets, I don't really see
the big issue. By saying that, I mean I'm comfortable with companies having
what was already public in the first place.

~~~
joezydeco
If you were publicly tweeting and then went private just before this database
was closed off, would those tweets get out?

What if you switched back and forth between private and public a bunch of
times? Were those private tweets lost or did they reappear when you went
public?

My gut says _everything_ stays in the database and it's all being sold.

------
narad
Related: <http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/>
Library of Congress already has an archive of public tweets.

------
methoddk
Twitter is a business, just like any other business, trying to turn a profit.
Protect your account if you don't want your _public_ data being pulled.
Anyone can make a timeline API request...

------
jond3k
"privacy betrayed" -> implies any of this data was ever private. It wasn't.

~~~
zedr
Unless you opted to have a private profile. I wonder if those private
tweets are included in the bundle.

I agree, however: if you want to keep your thoughts private, don't post them
on the Internet. When you do that, they don't belong to you anymore.

~~~
psquid
> I wonder if those private tweets are included in the bundle.

<http://www.bbc.co.uk/news/technology-17178022>

"private accounts and tweets that have been deleted will not be indexed by the
site"

~~~
zedr
Thanks! I admit not reading TFA.

------
viana007
Millions of web apps crawling your tweets and you're worried. If you want
privacy, don't use social media.

------
Toenex
Once upon a time it was the consumer who was confused about what they owned.
We all bought vinyl records because it meant we could hear the music we wanted
to. We thought this meant we could do what we liked with the music and even
though we couldn't, the limitations of the technology meant the artist and
record company profits never really suffered. One day digital music came along
and suddenly our assumptions about what we could do with the music we owned
became a serious problem for record company profits and the whole planet
shook.

Now we have all become artists; our words, our emails, our tweets and posts
the new music. The Twitters and the Googles are the new record companies and
once more the confusion between music and vinyl has arisen.

------
revorad
Is there a crowdsourced public archive of tweets? If not, we should build one.

~~~
kamjam
And then sell it to the big corps for 10% less than twitter... excellent idea
:)

------
hippich
95% of these tweets are ones with links to "enlarge", "boost", "find an adult
partner", "xxx" :))

------
islon
OMG. They'll know everything I ate the past two years!

------
af3
I wonder how it gets over this [1].

[1] <http://tweetcc.com/>

~~~
ceejayoz
Easily.

<https://twitter.com/tos>

> By submitting, posting or displaying Content on or through the Services, you
> grant us a worldwide, non-exclusive, royalty-free license (with the right to
> sublicense) to use, copy, reproduce, process, adapt, modify, publish,
> transmit, display and distribute such Content in any and all media or
> distribution methods (now known or later developed).

> You agree that this license includes the right for Twitter to make such
> Content available to other companies, organizations or individuals who
> partner with Twitter for the syndication, broadcast, distribution or
> publication of such Content on other media and services, subject to our
> terms and conditions for such Content use.

> Such additional uses by Twitter, or other companies, organizations or
> individuals who partner with Twitter, may be made with no compensation paid
> to you with respect to the Content that you submit, post, transmit or
> otherwise make available through the Services.

You can put a CC license on your tweets, and it'll grant those additional
copyright rights to anyone, but it doesn't limit what Twitter can do with the
content you posted one bit.

------
suking
I don't get what you can do with twitter data...

~~~
suking
Instead of downvoting, can someone give some examples...

~~~
gizzlon
Company C launched product P on date D.. did mentions go up? Are tweets about
company/topic/place/store positive or negative? What do people tweet about
when inside our store?
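
To make the first question concrete, here is a minimal sketch of counting mentions before and after a launch date. Everything in it is an assumption for illustration: the product name "WidgetX", the launch date, the sample tweets, and the simple case-insensitive substring match standing in for real mention detection.

```python
from datetime import date, timedelta


def mentions_around_launch(tweets, keyword, launch, window_days=7):
    """Count tweets mentioning `keyword` in the windows before and after `launch`.

    `tweets` is an iterable of (date, text) pairs. The "before" window is
    [launch - window_days, launch) and the "after" window is
    [launch, launch + window_days).
    """
    start = launch - timedelta(days=window_days)
    end = launch + timedelta(days=window_days)
    needle = keyword.lower()
    before = after = 0
    for day, text in tweets:
        if needle not in text.lower():
            continue  # not a mention
        if start <= day < launch:
            before += 1
        elif launch <= day < end:
            after += 1
    return before, after


# Hypothetical sample data, not real tweets.
sample = [
    (date(2012, 2, 20), "WidgetX leak?"),                    # too early, ignored
    (date(2012, 3, 1), "Heard rumours about WidgetX"),       # before
    (date(2012, 3, 5), "WidgetX is out today!"),             # after (launch day)
    (date(2012, 3, 6), "Just bought a widgetx, love it"),    # after
    (date(2012, 3, 7), "Lunch was great"),                   # no mention
    (date(2012, 3, 20), "Still using WidgetX"),              # too late, ignored
]

before, after = mentions_around_launch(sample, "WidgetX", date(2012, 3, 5))
print(f"mentions in the week before: {before}, in the week after: {after}")
```

The sentiment question would need an actual classifier on top of this; the counting part is the easy half.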

