

Ask YC: Who's pushing their data to 'the cloud' - inovica

Just ordered a new laptop and I've decided to try to push as much of my data online as I can to make life more flexible. I'm just curious to see who here does this, and if so, what do you use? I'm thinking of my regular work files, my photos, etc. I'd like to keep some files locally as well, but I'm thinking that anything older than 30 days could be stored online - especially if it's on a replicated system like Amazon S3.
======
bootload
_"... Just ordered a new laptop and I've decided to try to push as much of my
data online as I can to make life more flexible. I'm just curious to see who
here does this and what do you use if so? ..."_

WHY

I do. Any post I make I have squared away. This allows me to make my site the
definitive collection of my data. It also allows Google to index it. I have
control over my own content, and if for some reason a third-party site wants to
exert control, I still have my stuff. Unlike Slashdot, which didn't allow any
tools to save posts - hence I lost from about 1996 to 2002's worth of comments.
[0]

HOW

There are 2 ways of looking at this. You can either generate your _"stuff"_
from a single point [0] at your machine, save it and pump it out, OR use the
web apps as clients and suck the data back via RSS, Atom, JSON etc.
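The pull side can be sketched with nothing but the standard library. This is a minimal illustration, not the author's actual code; a hard-coded sample string stands in for the HTTP fetch:

```python
# Minimal sketch of the "pull" approach: parse an RSS feed and keep
# the title/link/date of each item. A real version would fetch the
# XML with urllib.request; a sample string stands in for that here.
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>my links</title>
  <item><title>Ask YC</title><link>http://news.ycombinator.com</link>
        <pubDate>Sat, 02 Feb 2008 14:22:00 GMT</pubDate></item>
</channel></rss>"""

def pull_items(rss_xml):
    """Return a list of dicts, one per <item> in the feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "date": item.findtext("pubDate"),
        })
    return items

print(pull_items(SAMPLE_RSS))
```

The same loop works for any site that exposes a feed, which is exactly why the "not every site has an API or good RSS feed" complaint below bites so hard.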

I've been doing a bit of both: pushing stuff out from my blog engine after I've
cached it, and now beginning to suck up the various websites I frequent. So
far I have:

- OUT: flickr (blog, tags, images), twitter (snippets), hackerid (hackernews
data), links (various links I save, including links to hackernews & export to
delicious)

- IN: hackernews (all posts every 15m, friends)

- IN TODO: wordy (words I use), spock (tags), colourlovers (colours),
librarything (my library), amazon (new books), lastfm (what I'm currently
listening to), delicious (new links I find), twitter (friends), flickr
(friends, processed images, text, tags)

Now, as you can see, that's a lot of data. Some of the things I'm finding:

- It's easier to push than pull if you want an accurate copy, because you save
before you export

- Pulling data means you don't have to write an interface to capture the data
first - you simply call RSS

- Not every site has an API or good RSS feed

- Linking data together is not easy except by time, though you could try to
match by friend (ie: friend is on twitter, flickr, hackernews)

- Displaying the data effectively is difficult, simply because of its volume
and complexity. A good example of how to do this is
<http://friendfeed.com> - clear, simple, and it pretty much allows for good reading.

I'm now at a stumbling block with the templating engine I'm using, so I'm
pretty keen to just extract the data as Atom, RSS and JSON - as individual feeds
or a mashed feed by date - and write a Javascript-based website to avoid having
to deal with heavyweight blog engines. Let the data go free and see how
people use it.
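The "mashed feed by date" idea reduces to a flatten-and-sort. A sketch (the site names and items here are made up for illustration):

```python
# Sketch of the "mashed feed": merge items pulled from several sites
# into one stream, newest first, ready to dump as JSON for a
# Javascript front end. The feeds below are invented sample data.
import json

twitter = [{"site": "twitter", "ts": "2008-02-01T09:00:00Z", "text": "snippet"}]
flickr  = [{"site": "flickr",  "ts": "2008-02-02T14:22:00Z", "text": "photo"}]
hn      = [{"site": "hackernews", "ts": "2008-01-30T21:05:00Z", "text": "comment"}]

def mash(*feeds):
    """Flatten the per-site feeds and sort newest first. ISO 8601
    timestamps sort correctly as plain strings, so no date parsing
    is needed - one payoff of timestamping everything consistently."""
    merged = [item for feed in feeds for item in feed]
    return sorted(merged, key=lambda item: item["ts"], reverse=True)

print(json.dumps(mash(twitter, flickr, hn), indent=2))
```

The JSON output is exactly what a static Javascript page could render, sidestepping the blog engine entirely.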

The key thing to realise when you are compiling your data is to timestamp it at
the DB level (if you are using one) in ISO1606 format, and maybe add a tag layer
over the top so you can get the benefit of tagging across data layers.
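A minimal sketch of that idea with sqlite3: every row gets an ISO 8601 UTC timestamp (the format the comment means - see the correction downthread), and a separate tag table cuts across sources. The table and column names are made up for illustration:

```python
# Sketch of timestamping at the DB level with a tag layer on top.
# Uses sqlite3; schema and names are invented for illustration only.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY,
                        source TEXT, body TEXT,
                        ts TEXT);              -- ISO 8601, UTC
    CREATE TABLE tags  (item_id INTEGER, tag TEXT);
""")

db.execute("INSERT INTO items (source, body, ts) VALUES (?, ?, ?)",
           ("twitter", "a snippet", "2008-02-02T14:22:00Z"))
db.execute("INSERT INTO tags (item_id, tag) VALUES (1, 'microblog')")

# The tag layer cuts across data sources: one query, any number of silos.
rows = db.execute("""SELECT items.source, items.ts FROM items
                     JOIN tags ON tags.item_id = items.id
                     WHERE tags.tag = 'microblog'
                     ORDER BY items.ts""").fetchall()
print(rows)
```

Because the timestamps are TEXT in a fixed-width ISO format, `ORDER BY ts` gives correct chronological order without any date functions.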

It's turning out to be an interesting project.

[0] <http://goonmail.customer.netspace.net.au/2005DEC131709.html>

~~~
bayareaguy
I've heard of ISO 8601 timestamps but never 1606 (and neither has google it
seems). What is that?

~~~
bootload
You are dead right. I was wrong. I meant "ISO 8601". Late night post and I
should have checked ~ <http://www.ietf.org/rfc/rfc3339.txt>

~~~
gojomo
From this RFC I learned that:

    
    
      Standard time in the Netherlands was exactly 19
      minutes and 32.13 seconds ahead of UTC by law 
      from 1909-05-01 through 1937-06-30.
    

Those wacky Dutch!

~~~
bootload
_"... From this RFC I learned that: ..."_

The bit I use is a bastardisation of _"1985-04-12T23:20:50.52Z"_. A shorter
version as a string, say "20070202T1422", stripping out the hyphens, colons and
the Zulu marker.

~~~
bayareaguy
Actually, I don't believe "20070202T1422" is a legal 8601 timestamp. My reading
of the partial ABNF below is that you must designate the time zone with a "Z" or
a specific offset (e.g. +08:00).

    
    
       time-secfrac    = "." 1*DIGIT
       time-numoffset  = ("+" / "-") time-hour ":" time-minute
       time-offset     = "Z" / time-numoffset
    
       partial-time    = time-hour ":" time-minute ":" time-second
                         [time-secfrac]
       full-date       = date-fullyear "-" date-month "-" date-mday
       full-time       = partial-time time-offset
    
       date-time       = full-date "T" full-time
    

If your data is going to stick around a while (and possibly loaded into some
other system in the future) you should be sure to encode the timezone.
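The offset requirement is easy to check mechanically. A sketch in Python (note that `strptime`'s `%z` accepts both "Z" and "+08:00"-style offsets in Python 3.7+):

```python
# Sketch of checking the time-offset rule from the ABNF above:
# a date-time must end in "Z" or an explicit offset like +08:00.
from datetime import datetime

def is_rfc3339(stamp):
    """True if stamp parses as an RFC 3339 date-time with an offset."""
    for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S.%f%z"):
        try:
            datetime.strptime(stamp, fmt)
            return True
        except ValueError:
            pass
    return False

print(is_rfc3339("1985-04-12T23:20:50.52Z"))    # -> True (the RFC's own example)
print(is_rfc3339("2007-02-02T14:22:00+08:00"))  # -> True
print(is_rfc3339("20070202T1422"))              # -> False: no offset, no separators
```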

~~~
bootload
_"... I don't believe "20070202T1422" is a legal 8601 timestamp. ..."_

True - but it's close enough for me in one timezone with limited space for
display. It does double duty as both an 8601 hack and a human-readable
title. Adding the extra "-" and "Z" makes it harder to read, and accuracy to
the second is simply not required. Trade-offs I'm willing to make.
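For what it's worth, the compact form can be upgraded to a legal timestamp later, provided you know which zone it was recorded in (UTC is assumed in this sketch; substitute the real local offset):

```python
# Sketch of expanding the separator-free "20070202T1422" form into a
# legal RFC 3339 timestamp by attaching the zone it was recorded in.
# UTC is an assumption here - use the real local offset in practice.
from datetime import datetime, timezone

def expand(compact, tz=timezone.utc):
    """Parse the compact form and re-emit it with an explicit offset."""
    dt = datetime.strptime(compact, "%Y%m%dT%H%M").replace(tzinfo=tz)
    return dt.isoformat()

print(expand("20070202T1422"))  # -> 2007-02-02T14:22:00+00:00
```

That keeps the short, readable form for display while letting the data survive a future import into another system.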

------
iseff
As one of the developers/founders of Openomy (<http://www.openomy.com>), I'm
keenly interested in this.

Our goal is to provide a place where people can store and access all their
online files. We want to be the Online File System. We're heading to the
"cloud" already and it's becoming increasingly difficult for us to manage our
own files. My photos are locked in Flickr, my docs in Zoho, etc.

We want to break down the "data silos" and provide a single interface to all
your "cloud storage", as well as a single API for developers to code against
to access users' data in meaningful and interesting ways.

We're taking a multi-pronged approach to this. One prong (currently launched)
is to allow you to upload your files through our API or our web interface, and
we'll store them (using Amazon's S3 for reliability and scalability). You can
access the files and their metadata through the UI or through the API. You can
tag them, share them, make them public, etc.

Our next step is to provide many backing stores - that is, import your files
from Flickr, Zoho, etc. We'll keep things synced up, so that Openomy becomes
your Online File System, storing the metadata of everything you do online.
Then, the UI becomes the equivalent of Windows Explorer or Finder, and the API
becomes the equivalent of the single file system API provided to you as a
desktop developer.
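That "single file system API over many silos" analogy might look something like the sketch below. To be clear, every class and method name here is hypothetical, invented for illustration - none of it is Openomy's actual API:

```python
# Hypothetical sketch of the "one file-system API over many silos"
# idea; these class and method names are invented, not Openomy's API.
class Backend:
    """One silo (Flickr, Zoho, S3...) presented as a flat file store."""
    def __init__(self, name, files):
        self.name, self.files = name, files   # files: {path: metadata}

class CloudFS:
    """The single interface a developer codes against, like a desktop
    file-system API that happens to span several remote services."""
    def __init__(self, backends):
        self.backends = backends

    def listdir(self):
        """List every file across every backing store, FS-style."""
        return sorted(f"{b.name}/{path}"
                      for b in self.backends for path in b.files)

fs = CloudFS([Backend("flickr", {"cat.jpg": {"tag": "pets"}}),
              Backend("zoho",   {"notes.doc": {"tag": "work"}})])
print(fs.listdir())  # -> ['flickr/cat.jpg', 'zoho/notes.doc']
```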

We think it's going to be, at the very least, extremely fun to build. :)

------
secorp
I keep copies of all of my files online using a distributed storage grid that
we've created.

(bias alert: I'm deeply involved in the project I'm about to mention)

A few years ago I realized that I'd lost data to hard-drive crashes and
mistakes, and wanted some sort of safe place to put my data. We then started
a project (<http://allmydata.org>) which provides a decentralized, secure
storage fabric. You can create your own grid if you'd like, create one with
friends, or use our managed grid (for a service fee).

I think that Amazon S3, Nirvanix, and others are starting to provide nice
basic services in this area as well, but I do think there is a lot of room for
retaining and saving information that is gently locked into web services
(Flickr, blogs, LinkedIn, etc.).

------
justtease
I've been thinking about the same thing. There's a Flickr tool for iPhoto if
you use a Mac, and I currently use Furl for bookmarking.

------
ubudesign
I use our own service at <http://www.i2drive.com> not only to store files but
also to edit files on the remote server. It uses WebDAV for the back-end, which
lets you map a drive from Windows or mount it from Mac and Linux.

------
DanielBMarkham
There are some products that do P2P+Server data sharing. I'm thinking Groove
especially, but I'm sure there are more.

This allows me to put my data "out there" among all the computers/friends that
I share my space with, without publishing the information for the whole world
to see. So I have ease-of-access, backups, online/offline access, etc without
having to go public. Plus it tightly integrates with my operating system and
office software, as opposed to my having to write a bunch of hand-coded stuff.

~~~
tx
Microsoft is so dead... I was intrigued by your post and went ahead to
research Groove a bit more, but their site is full of "marketing messages" and
I ran out of patience looking for an answer to "WHAT DOES GROOVE DO?", so
your post remains the only information I have about this service.

~~~
bayareaguy
<http://en.wikipedia.org/wiki/Microsoft_Groove> has a pretty good summary.

It certainly says something when you can get better information from Wikipedia
than from Microsoft about one of their own products.

