
I use Posterous, Weebly, Hacker News, and Twitter, and love them all. I don't pay a nickel to any of them and never will.

I hope very much that they all last forever.

But I will lose nothing if any or all of them disappear without giving me a chance to retrieve my data. Why? Because everything I've ever posted to any of them was already saved on my hard drive first. A simple solution for a frugal, OCD hacker who wants the best of all worlds.

<HN201203141055.txt>

[UPDATE: Several people have suggested automating this or making it easier. My response: If you're a hacker and more than 5% of what you type goes anywhere other than your code repository, you should probably reevaluate what you're doing before you make it any easier to do it.]




Idea: a browser extension that automatically saves a copy of every form you fill out, as well as content you submit to common XHR-based sites.


I thought of that (it's rather easy to do in Firefox), but the content would be lost in the noise of all the little search forms and automated XHR requests. I think it makes more sense to write a scraper to download your data from the website after it's submitted, which is essentially what the Locker Project[1] is trying to do.

[1]: http://lockerproject.org/
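
A sketch of that kind of scraper for HN itself, using the Algolia HN Search API (the endpoint and field names are that API's; the username and output filename are placeholders):

    import json
    import requests  # third-party: pip install requests

    USER = "yourusername"  # placeholder: the account you want to archive
    API = "https://hn.algolia.com/api/v1/search_by_date"

    page, hits = 0, []
    while True:
        data = requests.get(API, params={
            "tags": "comment,author_" + USER,  # only this user's comments
            "hitsPerPage": 100,
            "page": page,
        }).json()
        hits.extend(data["hits"])
        page += 1
        if page >= data["nbPages"]:
            break

    # One local JSON file with every comment, its timestamp, and its id.
    with open("hn_comments_%s.json" % USER, "w") as f:
        json.dump(hits, f, indent=2)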


Using HTML classes, IDs, and other identifying attributes, I'd think it would be fairly easy to set up intelligent filtering that differentiates between different types of input fields. Discriminating between input and textarea alone probably already gets you 90% of the way.


+ Ignore forms with a password field


And hope nobody trojans the Firefox extension you're using to handle passwords "specially".


All Firefox extensions can read the passwords if they're trojaned, regardless of whether they handle them or not.


Oh, no, I just meant that most forms with a password field are either login or signup forms, which I wouldn't want to bother preserving locally.



I assume that Reginald's complaint was not about the data itself, but about the fact that he will now have to move his blog elsewhere - the drag of setting it up, and of making sure that everyone can find it.


Any major platform has import tools from the other large platforms.

In fact, it's ironic that Posterous once wooed users by building simple import tools to pull in data from WordPress (among others). Now, people are using WordPress's import tools to pull their Posterous blogs back into the WP platform.

If you have your own domain name set up, rather than a free subdomain tied to the blogging platform, there is no issue of worrying about people finding you. Same goes for using RSS if you use Feedburner or have your feed available at a standard location on your own domain.


In addition, there is the problem that links from other websites to your blog will stop working, and your Google search rankings will be affected.

I have a custom domain name for my Posterous blog, so links to the main site will not break, but links to individual posts will break if the new blogging platform doesn't follow the same link naming convention as Posterous (which it most likely doesn't).

Do you guys have any ideas/tricks for how I can avoid issues like the above, and for how to make sure that certain posts which currently rank high in Google search results keep their rank, if possible?


You can create a set of 301 redirects to make sure traffic that is sent directly to a given post finds the correct page. It will also let search engines re-index your content at the new address.

You can do this by dumping the relative paths of your previous blog posts and current ones into your .htaccess file. This won't get all of the old links (category/tag links, for instance) but at least direct links to a specific post will end up in the right place.
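
If you have the old and new paths side by side, generating those rules is trivial to script. A sketch (the CSV filename and its two-column layout are made up for illustration):

    import csv

    # redirects.csv (hypothetical): one "old_path,new_path" pair per line,
    # e.g.  /2012/03/my-post,/posts/my-post
    with open("redirects.csv") as f:
        for old, new in csv.reader(f):
            # Apache mod_alias permanent redirect; paste this script's
            # output into your .htaccess file.
            print("Redirect 301 %s %s" % (old, new))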


I guess the only way to accomplish that is to set up the new blog on my own server where I will have access to a .htaccess file. That is, I can't use any hosted blog like Blogger, Tumblr, or wordpress.com. Is this correct, or are there any hosted blog services that will give you this fine-grained control? Or are there any things I can do on the domain registrar side?

Alternatively, is it a good or bad idea to route all traffic to my blog through a VPS I have, so that all the old blog post addresses go to the right place on one of the hosted blog services? (I'm just trying to avoid hosting my own blog, since I assume there are a lot of headaches involved)


I decided a while ago to move my personal site from Posterous to GitHub Pages [1]. It's been a little tricky to set up, but only because I'm fussy and haven't messed with CSS before. Still not finished, since I can't find a sensible way to extract my content from Posterous (that doesn't involve the API).

With GitHub pages, you don't need to worry about the hosting aspect and you'll always have a copy of your site locally. Pretty sure you can deal with the .htaccess file stuff too.

If you decide to look into this, check out Jekyll-bootstrap too [2].

[1] http://pages.github.com/

[2] http://jekyllbootstrap.com/


I moved mine from Tumblr to GitHub recently and used a migration script to grab all the posts and turn them into Markdown files. There's a script that works with Posterous too: https://github.com/mojombo/jekyll/wiki/blog-migrations


Yup, I already tried that script and I believe it's out of date. Open pull request at https://github.com/mojombo/jekyll/pull/477

edit: I've just realised I don't need the code to be merged into jekyll in order to use it (facepalm).

edit2: ok, maybe not as straightforward as I thought. I'm not really a coder so it takes me longer to understand what scripts are doing (and then tweak them so they work for me).


Why not use the API?


If you don't have access to the .htaccess file you can use WP plugins to set up 301s, e.g. http://wordpress.org/extend/plugins/simple-301-redirects/

Routing all blog traffic through your VPS is a bad idea, IMHO. Your VPS becomes a single point of failure for your blog - if it stops working, all of your posts are unavailable, not just the ones that require 301s.


Why do you think that having your own Wordpress stand-alone blog is such a problem?


"If you're a hacker and more than 5% of what you type goes anywhere other than your code repository, you should probably reevaluate what you're doing before you make it any easier to do it."

You know, I was more than willing to "agree to disagree" until I read this. Sorry, but I just don't buy the idea that I should spend 95% of my time in front of a computer writing code. I don't think it's practical or even preferable.

Writing things other than code is a useful thing for a hacker. It certainly doesn't seem to have hurt pg, does it? Plus, I can't think of very many people whose emails I've read (myself included) who wouldn't benefit from more writing practice.


I think you misunderstood; he suggested using a (version-controlled) repository to store 95% of anything you type (emails, docs, blog posts, etc.), not spending 95% of your time coding.


He wrote 'type', not 'time'. I think there is a huge difference.


How? The way I see it, it just makes things more extreme, because you type orders of magnitude more in a ten-minute comment than in ten minutes of coding.


95% of your computer typing is far less than 95% of all your time.


But still, if 95% of what you type is code, it takes much, much more time to type than if it were free-form text.

In other words, it means 'if 99.9% of the time you spend typing isn't spent typing code, you're doing it wrong'. Or at least that's how I understood it.


It's worth mentioning that, if you're not doing regular backups of your hard drive, storing something on your hard drive as a "backup" is a gargantuanly bad idea. Use AWS or some other redundant cloud solution; the likelihood of Amazon shutting down AWS within the next <X> <time unit>s is a lot smaller than the likelihood that your hard drive will fail in the same timeframe.
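
For the cloud copy, the upload itself is a few lines. A sketch using boto3 (the bucket name and file paths are placeholders; credentials come from the usual AWS config/environment):

    import boto3  # third-party: pip install boto3

    s3 = boto3.client("s3")
    # Push a local archive into a bucket you own; S3 stores it redundantly.
    s3.upload_file("backup.tar.gz", "my-backup-bucket", "archives/backup.tar.gz")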


As natural as it feels in everyday use, Dropbox is really an amazing thing.


Seconding this. Dropbox is the first "backup" system I really, really like.


These days, most folks posting here have a backup regime for their PCs.


My experience would lead me to disagree.

I would hazard a guess that a lot of people reading this are very capable of creating a backup system, but they 'don't have the time' to either implement it or double-check that it's working.

A lot of smart people who deal in abstractions (like computing) are paralyzed. They understand how to do a task, but never seem to have the motivation to follow through, especially if the task has no immediate reward, or no obvious indication that it's done.


That might be true - but in my little corner of the world, Apple have solved that problem for their users: even most of my least technical friends know the importance of keeping regular backups, and Time Machine has reduced the friction enough that proper backups are the rule rather than the exception. I also see a significant number of friends/colleagues in the Linux camp telling people how they've got backup regimes that work "just like Time Machine", so Apple having shipped a "good enough and very easy-to-use" backup system has consequences reaching beyond people running Apple hardware - when people have Macs at home with Time Machine, they ask difficult questions at work when the IT people tell them recent revisions of their files aren't available…

Going forward, I see Dropbox or something like it making "backups to additional local spindles" quaint and anachronistic. Why back up to an external drive next to my computer, when "the cloud" arranges to have copies (and archives) of all my data on multiple machines I own, as well as on Dropbox's (aka Amazon's) servers?


I agree concerning non-tech people. From my experience, they start doing backups once they've lost a disk.

However, considering how ridiculously easy it is to set up a simple rsync script, I doubt that many tech people don't do backups. Proper backups include frequent integrity checks, but just setting up an rsync script already gets you a long way. (If your data is encrypted, though, a single flipped bit can cause havoc, which makes frequent integrity checks absolutely mandatory.)
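
For the record, the whole "script" can be this small. A sketch (paths are placeholders; -c trades speed for a crude integrity check by comparing checksums instead of timestamps):

    import subprocess

    SRC = "/home/me/writing/"              # placeholder source directory
    DST = "backup-host:/backups/writing/"  # placeholder rsync-over-ssh target

    # -a preserves metadata, -c compares checksums (catches silent corruption),
    # --delete mirrors removals so the backup matches the source exactly.
    subprocess.check_call(["rsync", "-ac", "--delete", SRC, DST])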


Sysadmin here; I've set up Amanda (and those damn disk-"tape" rotation scripts) and CrashPlan before, and both are currently broken (mostly due to the awfully high failure rate of WD and Seagate "green" drives).

TL;DR I agree with the above.


Really? I have lots of WD Greens and none have ever failed me. They've been in constant operation (though mostly parked and idling) since the first generation came out. With me for a long time, they've been around the world. What do you suggest instead? Where do you see problems? I guess my usage profile is very different from yours.


Wow, do you always copy+paste or download everything you write on the internets?


Yes.

I write everything in TextPad first, saving as I go with Control-S to the appropriate directory using the date & time in the file name. I like to work full screen in large letters and I love lime green on black.
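
For what it's worth, that naming scheme is a one-liner if you ever want to script it. A sketch, assuming the site prefix and minute granularity of the filename above:

    from datetime import datetime

    # e.g. HN201203141055.txt: site prefix + YYYYMMDDHHMM
    filename = datetime.now().strftime("HN%Y%m%d%H%M") + ".txt"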

I have one corporate client that deletes all emails every 6 weeks (for legal reasons). Seems like I'm the only one who has a record of anything because of my practice.

I operate under the assumption that anything free will go away at some point. I plan on moving somewhere else with my data. Also, it's really easy to find anything on my hard disk.

A little background:

Years ago, I found my best friend from college and we started to exchange emails. I emailed him a trivia quiz with 20 questions. He later told me that he was laughing like crazy while he answered them for the next hour. His wife called him when he was on Question #19. By the time he returned, his computer had crashed and he lost everything. (This was obviously pre-Gmail.) Now we'll never know what he answered.

I don't know what will be important later, so I'll never let anything like that ever happen to me.


I'm pleased it works for you.

I'm gently surprised you haven't created a more automatic solution. Something that's autosaving (with a timestamp in the name) every X seconds, perhaps. Something that allows you to enter text direct to the browser text field but still saves a copy for you.

I'm sure many people would like to automatically capture everything they type and have it saved, but would not have the discipline to copy and paste.

But, as I say, if it works for you that's all you need.


Years ago, when computers were more crash-prone and autosave wasn't yet a feature in MS Word, I ran a keylogger on my own computer for just this reason. On several occasions, it saved me from having to rewrite multiple pages of school assignments I would otherwise have lost to crashes.


I like Microsoft's OneNote for this because it removes the saving/naming from the workflow, and it has a decent search.


<i>I have one corporate client that deletes all emails every 6 weeks (for legal reasons).</i>

Isn't this kinda weird? How exactly is he operating his business, such that he needs to leave absolutely no paper trail?


You can surround your text in asterisks to achieve italicization.

http://news.ycombinator.com/formatdoc


Thanks!


Do you sincerely think that everything you (or anyone) write on sites like this is worth saving for later? I don't think so. 99.99% of comments are inane and worth nothing (including mine).

What do you do with those texts? Read them back months later and say "oh, what a sharp mind I have"? And what about the context (others' messages you reply to, or that reply to yours)? Do you also save those? Are they as worthy as yours?

I'm really surprised somebody does this.


Funny you should ask that, of Ed in particular.

http://news.ycombinator.com/item?id=2564099

Of course, you are welcome to judge it as simple ego-stroking or whatever, but quite a few people would disagree.


There's the "It's All Text!" Firefox extension, which might help here. It can open your favorite text editor for text areas, which makes it easier to save a backup copy (I guess you could change it to do this automatically) - no copy/paste needed.


That's a pretty neat extension, thanks! To make it work on my machine I had to follow this though: https://addons.mozilla.org/en-US/firefox/addon/its-all-text/...


Interesting. How do you deal with / save / manage permalinks and inbound links from other sites? Or is it a case of those just not being important enough to worry about? (Since Google has a record of all the content it believes is currently on the web.)


What exactly do people do with this data, specifically for sites like Hacker News? I don't think I've ever particularly cared about my own comments. Without context they're pretty much useless anyway.


I'd go further and say that even with context my comments on hacker news are practically useless. I see online discussions as a pastime, not something that needs to be conserved on triple redundant backups.


"If you're a hacker and more than 5% of what you type goes anywhere other than your code repository, you should probably reevaluate what you're doing before you make it any easier to do it."

I'm a hacker but I'm also a blogger, marketer, entrepreneur and many other things. And I am sure many other users here also are. The idea that you should only write code all the time is flawed, at best.

And hard drives crash. And other backups also can get lost.

But I agree one shouldn't worry much about online data being lost. And for any data you really care about, have some backups, yes.


Interesting idea. I do a similar thing with OneNote for long outgoing mails that take a long time to compose.

However, how do you deal with threaded conversations and tracking the one-liner replies to those mails? A large number of the mails I write fall into this category.

Someone I know wrote his own email client that is essentially a long text buffer, where email conversations are persisted to a single text file. He has commands that let him select portions of text from that file to use in his replies. He's used that system since the 1980s and still has all of his email from back then logged in individual files.


I don't get it; email is the easiest thing in the world to back up and view offline. You can just run, e.g., offlineimap to periodically download the messages, and then use virtually any email client to view them.

Tracking replies is done automatically as long as you actually click "reply", because mail clients add the In-Reply-To header with the ID of the message you're replying to.

I'm not trying to be obnoxious, I just don't understand what the issue is.


I have a cron job that archives all my gmail content to my local box in mbox format. I use getmail: http://pyropus.ca/software/getmail/
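
If you'd rather not depend on getmail, a rough equivalent fits in a page of standard-library Python (a sketch; server, credentials, and file names are placeholders, and Gmail needs IMAP enabled):

    import imaplib
    import mailbox

    imap = imaplib.IMAP4_SSL("imap.gmail.com")
    imap.login("you@gmail.com", "your-password")      # placeholders
    imap.select('"[Gmail]/All Mail"', readonly=True)  # quoted: name has a space

    mbox = mailbox.mbox("gmail-archive.mbox")  # local mbox archive
    typ, data = imap.search(None, "ALL")
    for num in data[0].split():
        typ, msg_data = imap.fetch(num, "(RFC822)")
        mbox.add(msg_data[0][1])  # raw RFC822 message straight into the mbox
    mbox.flush()
    imap.logout()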


*how do you deal with threaded conversations and tracking the one-liner replies to those mails?*

I only save that last copy in the same .txt file. I email it to myself, Ctrl-A, Ctrl-C, Ctrl-V, X-Delete.


I plan on taking this approach and moving my blog to GitHub Pages w/ Jekyll. That way it will all be neatly backed up in a local repo as well, with the benefit of version control.

edit: Has anyone here migrated from Google Sites to Jekyll before?


That's what scripting is for! 10 minutes with Python + lxml and I have a script that saves everything I write to a text file (usually in JSON to keep the structure).

For example, I use Read It Later, so I wrote a script to fetch the URLs and then use wget to mirror each URL.

Tumblr is even easier: open, e.g., the Liked page, use xdotool in a bash loop to keep pressing Page Down, then just save the page with the browser itself.
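
The Read It Later half of that is roughly the following (a sketch; urls.txt stands in for whatever export the service gives you):

    import subprocess

    # One URL per line, exported from the reading-list service.
    with open("urls.txt") as f:
        for url in f:
            url = url.strip()
            if not url:
                continue
            # -p: fetch page requisites (images, CSS), -k: rewrite links for
            # local viewing, -E: add .html extensions where needed.
            subprocess.call(["wget", "-p", "-k", "-E", url])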


I think carrying this to its logical conclusion will lead you to a self-hosted instance of something like http://www.engag.io meets http://thinkupapp.com/ meets https://singly.com/

Which would be a very useful tool indeed.


"But I will lose nothing..." Oh come now - surely your time was valuable. And your enthusiasm/willingness to give another service hours and hours of your time for the hope that they will be around "forever" will almost certainly wane if these services went down for the count.


And of course you grab a copy of what everybody else says too, since things like HN and Twitter are mostly about context...


Well, I pretty much do, but my process is automatic and heavy on third-parties: http://www.gwern.net/Archiving%20URLs


What you've alluded to is actually how people will interact with nearly all services in the future: through a proxy. Social media sites take great pains to make it difficult to migrate off of them. But what they, and their investors, don't want you to know is that eventually these services will be reduced to APIs that your proxy (an intelligent agent) communicates with.

Proxies already exist for linking your Twitter and Facebook feeds, and this trend is going to continue and possibly one day even replace the WWW we use today.


Why would the site owner allow this?


Site owners can't prevent their data from going through a proxy. If they don't provide an API, the service is going to look backward, and be less useful, in the future. This type of system is inevitably coming, because there's no way a human can possibly keep up with the number of new content services coming out.



