Hey Twitter: Give us our Tweets (zachholman.com)
139 points by holman on Sept 24, 2010 | 75 comments

If I whisper the most beautiful poem over the telephone to my lover today, am I going to complain about the phone company not archiving that call properly when I want to revisit it 20 years from now? You go into Twitter knowing it's about the ebb and flow of current conversation, not about what was said when it first came out. If Twitter wants to eventually offer that, that's cool, but that's not the case right now, and you shouldn't expect it any more than you should expect 7-11 to keep their security footage of you for your perusal indefinitely into the future.

edit: from Twitter's Terms of Use (http://twitter.com/tos): "This license is you authorizing us to make your Tweets available to the rest of the world and to let others do the same. But what’s yours is yours – you own your content."

You own your content. You are responsible for it.

While everything you said is correct, at the same time, there's no sense in losing access to data that's recorded and is sitting right there for the taking, if only there were a more convenient way to extract it.

It would be very wise of Twitter to offer a bulk export function - not because they have to, but because it's to Twitter's benefit to not have me thinking twice about whether I should post to Twitter or communicate in some other, more archive-friendly way.

From Twitter's perspective, you're absolutely right. From a user's perspective, the bulk reliance on singular companies to manage our data is an issue that I'm afraid is going to become dangerous.

Look at the 2.5 hour Facebook outage from a couple of days ago. If Facebook were to be hit by an airplane or asteroid, a seriously significant portion of 500,000,000 people's lives would be lost, if only just in photos. If my house were to have the same thing happen, 150 people would experience a somewhat lesser loss.

I have thousands of emails between close friends in Gmail that I would like to preserve, but no convenient system exists for archiving them away from Google. The point of archive is to have redundant systems on which to rely. Archiving my Gmail with Google is therefore not an archive since it's still only one system.

It's like leaving a tasty pork chop in the 'fridge at work and expecting the cleaning lady to not toss it over the weekend. I really think archive should be the responsibility of the user.

I have thousands of emails between close friends in Gmail that I would like to preserve, but no convenient system exists for archiving them away from Google.

How about IMAP?

Tried it, volume is too large. Every client I've tried crashes before it can get everything, and there is still the issue of convenient storage format. I'll probably have to write my own archiver.
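
For anyone going the write-your-own route, here is a minimal sketch of a resumable archiver, assuming Python's standard imaplib. It saves each message as its own .eml file named by UID and skips files that already exist, so a crash partway through only costs the message in flight. The function name archive_mailbox, the one-file-per-message layout, and the host and folder in the usage comment are illustrative assumptions, not what anyone in this thread actually ran.

```python
import imaplib
from pathlib import Path


def archive_mailbox(conn, out_dir):
    """Save every message in the selected mailbox as <uid>.eml.

    Messages whose file already exists are skipped, so the function can
    simply be re-run after a crash. Returns the number of new files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Ask the server for the UIDs of all messages in the selected mailbox.
    status, data = conn.uid("SEARCH", None, "ALL")
    if status != "OK" or not data or not data[0]:
        return 0

    saved = 0
    for uid in data[0].split():
        target = out / f"{uid.decode()}.eml"
        if target.exists():
            continue  # already archived on an earlier run
        status, msg_data = conn.uid("FETCH", uid, "(RFC822)")
        # A successful fetch returns a (header, raw message) tuple first.
        if status != "OK" or not msg_data or not isinstance(msg_data[0], tuple):
            continue
        target.write_bytes(msg_data[0][1])
        saved += 1
    return saved


# Hypothetical usage (host, folder and credentials are assumptions):
#   conn = imaplib.IMAP4_SSL("imap.gmail.com")
#   conn.login("you@example.com", "app-password")
#   conn.select('"[Gmail]/All Mail"', readonly=True)
#   archive_mailbox(conn, "gmail-archive")
#   conn.logout()
```

Plain .eml files are just the raw RFC 822 messages, so most mail clients and scripts can read them later.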

I wrote this a few years ago to archive my Gmail emails to text files (and the attachments to whatever they are).

I haven't run it in a while but when I did it worked and was able to download my emails from the very beginning without crashing. Anyway, you might find it useful...


I have been using Zimbra Desktop as an email client, http://www.zimbra.com/products/desktop.html, on my Mac with several Gmail accounts with no problems. There are versions for Windows and Linux.

So use getmail_fetch through the POP server; it downloads a few hundred at a time into a maildir format. Hard to get more convenient and storable than that (tar it up and gzip or 7zip it).
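
The "tar it up and gzip" step can be sketched with Python's standard tarfile module. This assumes the mail has already been downloaded into a Maildir directory (with the usual cur/new/tmp subfolders); the function name archive_maildir and the paths in the usage comment are made up for illustration.

```python
import tarfile
from pathlib import Path


def archive_maildir(maildir, archive_path):
    """Pack a Maildir directory into a gzip-compressed tarball.

    Returns the number of message files found in cur/ and new/.
    """
    maildir = Path(maildir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store everything under the directory's own name, not its full path.
        tar.add(str(maildir), arcname=maildir.name)
    return sum(
        1
        for sub in ("cur", "new")
        for entry in (maildir / sub).glob("*")
        if entry.is_file()
    )


# Hypothetical usage (paths are assumptions):
#   count = archive_maildir(Path.home() / "Maildir", "gmail-backup.tar.gz")
#   print(f"archived {count} messages")
```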

That reminds me of Jason Scott's comments about trusting "the cloud" with your data and not keeping it locally: http://ascii.textfiles.com/archives/1717

That's exactly right

I agree, for the most part. But there are some subtleties:

1. Yes, Twitter's real-time just like the phone, but it's not only real-time like the phone. You can revisit any tweet you like. They support viewing old tweets, they support listing old tweets. My only concern is that they place limits on how much they list.

2. Yes, we're responsible for our content, but the significance of services changes over time. Tomorrow's Hacker News startup might be interesting, you might drop some content in, but most people aren't going to immediately create an offsite backup. Likewise, the tweets we made in the first two years of Twitter's life probably seemed worthless to us at the time, but now their value has increased. It just would be nice to have access to that.

They place limits on how much they list for, most likely, technical difficulties with guaranteeing full history access.

This means that if you want a full history of your tweets, then it's your responsibility to maintain it. Would it be a good service to its users if Twitter maintained full history? Yes, but that's not your argument. Your argument is that it's a user's right.

Does it make sense to rely on the original provider for archive? As I say in my other reply to the other replier, I really think diversity in archive options is important to the overall infrastructure of the 'Net.

This is precisely why I built http://tweetsaver.com ... I wanted a nice way to search through ALL my old tweets AND keep them around. I'm always retweeting awesome content, favoriting interesting things, and sometimes I just want to remember what I said without having to click MORE MORE MORE MORE for 100 pages to find it on twitter.com. Maybe someday Twitter will turn on a feature that kills my app, but for now I think it's the best way to archive, organize and search the great content that flows on twitter.

Please don't design your site so that I have to authorize my account and only then find out that I have to fork over $5 or $10 for the service (and de-authorize my account). Only a "Pricing" link in the top-right corner indicates that the site is behind a paywall.

you don't have to authorize your account to find out it's behind a paywall... but I agree, the flow needs to be improved.

$120/year is not worth it to me. $10/year would be. Just my feedback. :-)

You can run Tweetnest yourself http://pongsocket.com/tweetnest/ so that's free, here's mine: http://tweets.rythie.com/ for example.

There are other solutions around if you look.

Seconding Tweetnest. It doesn't save the favorites, unless they're by yourself, but it's a nice compromise.

And it's worked without a hitch.

I agree with Tweetnest. It's an easy and free way to have a backup of all your tweets.

It's nice that it groups tweets by month and day, so you can pinpoint that one tweet you tweeted that day. Got mine running here: http://jarqu.es/t/

we tried yearly pricing and it turned out to be a big pain.. people forgot about their subscriptions and were confused/angry when it renewed, people's credit card numbers changed or expired after a year and caused subscriptions to expire, etc etc... $10/yr pricing on a monthly subscription doesn't work because there are just too many fees. Appreciate the feedback though... still trying to figure out the right price.

Which fees? If you mean credit card processing fees, then even $20 would make more sense, no? You have to remember: your service isn't twitter so I think people would be less inclined to pay a total of $5-10/month. (I don't know this market well enough, but that's my initial takeaway.)

I'm thinking if they are storing 3 million tweets that's got to be about 900 customers meaning $9000/month in revenue.

I recently signed up for http://tweetstreamapp.com. $5 for a year of weekly backups, and they do some neat data visualizations, too. Well worth the fiver.

How did you decide that price point? Are your operating costs very high?

to be honest, it was mostly based on "what would I pay for this service" ... operating costs at this point are pretty low

Point rss2email at the rss feed of your twitter account, and then set up your mail server to filter those emails into a special twitter folder. Job done.

Archival of 'cloud' data is an issue I've been thinking a lot about lately. People are putting so much of their lives on the internet these days without generally giving too much thought to permanence and availability. Twitter's only been around a few years and people are already running into retention issues. The tweet from the article will still be valuable 30 years from now - will twitter even be around then? Will all the tweets from the current system have been migrated to whatever tools we're using in the future?

I've been playing around with a project to locally archive a bunch of data sources that interest me (email, instant messaging logs, Twitter, SMS, some blog and social news comments) in a straightforward and open data format. Unfortunately this type of tool might be something that most people don't realize they need until it's too late.

It's not local, and not in an open data format, and not free, but you might like http://www.backupify.com/, they have a nice list of services that they can back up for you (haven't used it, so wouldn't know how good they are in practice).

Backupify works, but the formatting is odd. Your Twitter stream is packaged as a PDF book.

I value my past tweets too but Twitter is having a hard enough time keeping the service up as it is. I'd rather they focus on making the API more reliable before working on giving access to older data.

It really is a shame they haven't provided any mechanism, even if it were less convenient than the standard API interface.

I've seen a lot of talk from Twitter that "you can trust us", "we still have all of your tweets", but it really doesn't help much if there isn't any way to get at them. Has there been any discussion from Twitter that this is even on their radar as a near-term priority?

I was sort of surprised that the Library of Congress work did not include some sort of web.archive equivalent that allowed anyone access to the entire database — Twitter has shown a willingness to let other people solve hard problems it didn't want to, including when it pointed developers to a 3rd-party firehose for a while.

Well, I think you just found a way for Twitter to make some money. Remove the theoretical access and Web limit for accounts, or increase the limit to an impossibly high number.

It is much akin to NYT charging for access to old articles.

They may have to store more tweets in the current database or charge a retrieval fee to obtain it from the archive. They could give it away for free too...

That's like what Flickr do by letting you browse the last 200 photos unless you get a pro account.

Charge for mapreduce jobs that can run over the whole corpus.

I'm going to plug Anil Dash and Gina Trapani's open source PHP project ThinkUp (http://thinkupapp.com/). It started off as a pet project by Gina Trapani, but now it's used by the White House to try to engage the American public more.

ThinkUp can archive your tweets to your server for any reason you want to. Not only does it archive tweets, but it also allows you to view some nice analytics such as a map of where replies to a tweet came from, a chart of your followers over time, who your most active friends are, etc.

But wait, there's more! If you check it out right now, we'll throw in Facebook integration, for absolutely free! That's right. For absolutely nothing, you can archive your Facebook statuses, and preserve all those precious memories.

You can get all this for $0.00. That's right, $0.00. Just go to http://github.com/ginatrapani/thinkup/ and download the code today. You can create plugins for it, modify it, and do anything to it the GPL allows you to do.

Yeah, this is a frequent pain point for a large number of users. If you search Twitter for "how can I see my old tweets", there are thousands of people wanting that feature.

No wonder dozens of services (or even curl-based shell scripts or tutorials) are available to back up the tweets (shameless plug for my own http://sparrw.com/, which focuses on easy searching of past tweets and treating tweeted links as sort of auto-bookmarks, which is what I'm often using Twitter for).

But all this works only if you're quick enough, and set up some sort of backup system before the magic 3200 limit :(
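
The "be quick enough" part boils down to merging each freshly fetched window into a local archive before older tweets scroll past the limit. A minimal sketch of that merge, assuming each tweet is a dict with a unique "id" key (the dict shape and the name merge_tweets are assumptions for illustration; how the tweets get fetched is left out):

```python
def merge_tweets(archive, fetched):
    """Add newly fetched tweets to the archive, keyed by tweet id.

    `archive` is a dict mapping id -> tweet and is updated in place.
    Tweets already present are left untouched. Returns how many were new.
    """
    added = 0
    for tweet in fetched:
        if tweet["id"] not in archive:
            archive[tweet["id"]] = tweet
            added += 1
    return added


# Example: two backup runs whose fetched windows overlap.
archive = {}
merge_tweets(archive, [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}])
merge_tweets(archive, [{"id": 2, "text": "second"}, {"id": 3, "text": "third"}])
print(sorted(archive))  # [1, 2, 3]
```

As long as a backup runs before more than 3,200 new tweets pile up, consecutive windows overlap and nothing falls through the gap.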

Be careful what you wish for, Zach. One of the defining features of real-time is that it's very spur-of-the-moment. Unfortunately, that often leads to poorly thought out posts. In the current scheme of things that weakness is mitigated by the fact that sufficiently old posts are effectively inaccessible.

Can you imagine the implications of an account's entire tweet archives being open to the public? A prospective employer could look up every mention you ever made about them or their industry. Venting tweets would be forever inscribed in history. That's not good. It's bad.

Preservation of Twitter and other real-time services is definitely important, but not just for personal use. I'm also interested in viewing the tweet history of my friends, evolution of certain hash tags, etc.

My startup http://keepstream.com is involved in this real-time curation (Twitter now, more services later). I would love to chat with anyone interested in the subject; my contact information is in my profile.

There is also http://jetwick.com where you can search your (and also others' !!) tweets. The first time you search for a user you probably won't get results. But please ;-), come back 10 minutes later. Then 50 tweets are searchable and stay searchable for a long time. How long depends on how popular you make this service ;-) I.e. if you use the service regularly you can create a free (!) archive. Again: no payment necessary (this is important for us!), and not even registration at the moment.

There are a lot more features; read through the about page (e.g. filter via dates, languages, sort against date, query dependent trends, ...)

Check out http://pongsocket.com/tweetnest/ .. it's open source, free, PHP (easy to install), yada yada. It's a simple PHP Twitter backup system that also presents your tweets in a reasonably attractive archive on your own site. Example: http://peterc.org/twitter/

(Yes, there's also http://thinkupapp.com/ - but it's more complex and has more features/supported services. I found TweetNest crazy easy to get started with though.)

Thankfully I took a CSV dump of my tweets 2006-2008 but I'm missing quite a few in the gap :-(

The Library of Congress has them; if we could get a copy we could index it all somewhere to reconstruct a copy of people's early timelines http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acq...

I was thinking recently about getting all the tweets from 2006, it should be about 2 million tweets, so should be possible to fetch by sequentially walking through the ids.

This is great. It's going to be a goldmine for historians in a few years. Twitter is becoming a finger on the pulse of the world.

Only in that we'll be able to look back and, definitively, say that, yes, we have always been as stupid as we are now, with no discernible difference.

Twitter is going to be huge!

Heatmap of your foursquare checkins. I'm sure these guys would be up for doing it for geotweets.


Incidentally, the limit of 3200 tweets via the API is the same as the 160-page limit on the website. 20 tweets per page * 159 additional pages = 3180, plus the 20 on the first page makes 3200.

i like it that tweets are ephemeral. it's good that way. why the need to archive every-fucking-thing? imagine how shit the world would be if everyone could browse everything they ever said? rather than the things memorable enough to remember. or memorable enough to someone else that they'd remember? imagine how many useful people would become lawyers?

WHAT ARE YOU THINKING!!!!!!!!!!!!???????????????

Historians & archaeologists value having an unfiltered look at what people were doing & thinking, because it's more honest than, well, looking at memorable events in retrospect with 20/20 hindsight, filtered through biases that only happened later.

You can always favorite those really important tweets, or even resort to printing them in a compendium by Tweetbook.com. Some of my earliest tweets in a book form that I can go back and reflect on.

I agree that Twitter can do a better job of allowing us to search tweets by topic, but that's a 3rd party app waiting to happen. In the meantime, take a screenshot and post it on your website or on Flickr.

I'm glad Twitter doesn't keep a record of my tweets. Perhaps for what you want, they could make it an opt-in service, but for invasion of my privacy, I'd just as soon not. And for their service load (media needed to store your tweets and server bandwidth time needed to make them accessible to you) I can well see how it might be a payable service.

Who said Twitter DOESN'T keep a record of your tweets? What's been said is that Twitter doesn't let users access any archive of your tweets they may well have, and that they may well continue to mine. Your point makes me want to revisit the standard terms under which users license their original tweets to Twitter.

Um. Hang on a sec... on the bottom of the same page you link to:

  Twitter still maintains a database of all
  the tweets sent by a user. However, to
  ensure performance of the site, this
  artificial limit is temporarily in place

So maybe you can't do it now - but it's not lost (yet).

Hmmm. Long ago I sent examples of the "NULL" issues for Twitter status URLs that I had bookmarked (possibly before favorites were available) as replies to the Twitter team -- not sure where any of that went, but while a database is likely to exist, a guaranteed and representative content store may not be as likely.

I recommend using The Archivist, which is free and will automatically save your tweets forevermore, for easy download at any later time. I've used it a couple of times to dig out old tweets of my own! http://archivist.visitmix.com/

They could simply slap a 'last visited' flag on every tweet and discard those that had not been looked at after a year or two. You always have the option of backing up your own data; how much storage is it going to take to periodically scrape your tweet history into a text file?
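
As a rough answer to the storage question, a back-of-the-envelope sketch. It assumes the 140-character tweet limit and a worst case of 4 bytes per character in UTF-8, and counts tweet text only (no timestamps or other metadata); the function name max_archive_bytes is made up for illustration.

```python
def max_archive_bytes(tweet_count, chars_per_tweet=140, bytes_per_char=4):
    """Upper bound on the size of a plain-text tweet archive, in bytes.

    Assumes every tweet uses the full character limit and every character
    takes the UTF-8 worst case of 4 bytes. Metadata is not counted.
    """
    return tweet_count * chars_per_tweet * bytes_per_char


# The full 3,200-tweet window that the API exposes:
print(max_archive_bytes(3200))  # 1792000 bytes, i.e. under 2 MB
```

Even with generous overhead for dates and IDs, one full scrape fits in a few megabytes.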

I recall hearing that it's due to a scaling/DB limit. Something like they can't actually go back further due to memory limits. I've no idea if this is true or not. If the Twitter team skype in at the next UK #devnest I shall try and remember to ask.

I've heard (second-hand) that Twitter isn't discarding users' tweets after they hit 3,200, they just don't provide API access to them. They're still sitting in a DB somewhere, just inaccessible currently, until Twitter provides access to them.

Yes, back when they were using an RDBMS they said that once they had moved over fully to Cassandra, the 3,200 limit was going to be lifted. Someone I know that works at Twitter now was always going on about Twitter archiving and had written Python scripts for backing tweets up... So there might be some hope.

Keep on banging on at Twitter about it as users. It is your data, so demand it back from Twitter!

I had always assumed that this would become a feature of Twitter if they ever offered a pay version of the service (higher API limits and complete access to past tweets). I imagine that some people would pay for that.

I agree, I save old emails, old letters, old receipts - it sucks that I don't have access to all my twitter messages, wtf? And please don't tell me that "you're entitled to a full refund" crap.

I've been using http://www.backupify.com to backup my twitter account. It was really interesting to go back over my first tweets.

I completely agree. I saved the link to a tweet from Biz Stone about an app I launched 2 years ago (twootball), but now it's gone.

Found it! Googled "biz stone twootball" http://twitter.com/biz/status/976238704

"The website at www.twootball.com appears to host malware – software that can hurt your computer or otherwise operate without your consent. Just visiting a site that hosts malware can infect your computer."

sigh... yeah I need to bring that project back to life. The iPhone app is still alive but the site was put on hold because we didn't have enough resources.

I guess last time I checked they had problems with the favorites

It's just a technological failure (because it requires too much cpu power to dig through a non-indexed non-relational db) and they don't want to show their shame of poor design.

Can you imagine any other service surviving where you cannot look up older data?

BTW Google has a social search that indexes twitter and facebook but it's not complete by any means.

as a journalist -- I loved this post. there is a lot we could mine and then synthesize. really need better ways to try to tell more _meaningful_ stories and cover all this 'now, now, now' content.

@coryhaik washington post

Didn't someone recently create a way to go back and see your first tweet?

Yes, but those types of services only work if you have fewer than 3,200 statuses, or, depending on the service, if your first tweet was published while their service was active and collecting your tweets (which isn't likely if your account is old).

What services are these?

we should be talking about how to build the federated social web as the solution, not just bitching about the problem

backupify - it's the new ronco showtime rotisserie - just set it and forget it.

It's funny to me to see people prizing their tweets so highly.

I reviewed mine a while back and realised they were inconsequential drivel, so I deleted them along with the account. Now I have a read-only account for reading the tweets of some folk who tend to be funny to read. No techies trying to demonstrate value, no not-that-interesting acquaintances, no naive political rants from relatives - much better.
