
Camlistore – open-source personal storage system for life - hswolff
http://camlistore.org/?
======
NhanH
I can't quite grasp what this is (as in, what the software is, is that a
server, a client, or a combination of both), or how could I get started in
using it.

>Your data should be alive in 80 years, especially if you are

Is there any special significance to 80 years? And exactly how is it safe? The
"Download" sections have me installing the server on my box, and my hdd
certainly won't survive 80 years.

Under storage, it says that

>Implementations are trivial and exist for local disk, Amazon S3, Google
Storage, etc.

How does encryption work? I can't find out at which step the encryption is
done (client side or server side). The home page mentions it's "private by
default". Given that I don't know much about security and cryptography in
general, how safe would I be using this, for whatever purposes it could be
used for?

Mentioned under potential use cases: filesystem backups and document
management/CMS. That's ... interesting.

~~~
lovemenot
I am not authoritative, but a glance at the documentation suggests encryption
is not explicitly part of this spec. Since a blob may be anything, one
instance of a blob can be a file encrypted by external means.

I believe Private by default just indicates that objects are not exposed until
they are shared: like a file system is private by default and a Github repo is
not.

~~~
xiaomai
Looks like there is an encrypt module that uses AES-128:
[http://camlistore.org/pkg/blobserver/encrypt/](http://camlistore.org/pkg/blobserver/encrypt/).
The modules are composable so you could encrypt your remote replicas of the
main store (or hopefully all of them?)

------
natural219
If you're confused as to what this project is about or why it's important, I
highly recommend watching the first ~10 minutes of the video posted on the
main site. Brad Fitzpatrick does a much better job of explaining the value
proposition.

I personally think this is one of the most hugely important ideas/protocols to
come out of the last decade. Even if Camlistore doesn't do it, it's hard to
imagine software programmers of the future _not_ agreeing on a shared protocol
to fill this huge, huge need.

I actually wrote a proposal for a Mozilla grant recently outlining a couple of
the reasons why decoupling storage from user interface is a fundamentally good
thing for society[1].

[1] [https://www.newschallenge.org/challenge/2014/submissions/sto...](https://www.newschallenge.org/challenge/2014/submissions/stop-letting-facebook-own-your-data-web-applications-should-ask-you-for-your-data-not-the-other-way-around)

~~~
XorNot
I'm 18 minutes in and I have no idea what they're actually doing.

I get what they seem to _think_ they're doing.

But "storing arbitrary" data is a problem _everyone_ in the world is trying to
solve. It's what a filesystem _is_. It's what backup tools do. It's a pretty
well attacked problem.

A bunch of SHA1 hash addressed data is almost as useless as a raw disk with no
filesystem.

Similarly, I'm seeing a lot of JSON. Who says JSON will be remembered in 80
years? It's ASCII! But that's not much of a guarantee.

There are a lot of claims here, which don't seem to translate into something
which seems actually useful. For example, "immutable" sounds good till you run
out of disk space because you have 1000 copies of a slightly different VM image
in the system. It's pretty easy to build a system which stores "everything".
It's a lot harder to build one which is _useful_.

------
zrail
Original post from three years ago:
[https://news.ycombinator.com/item?id=2156374](https://news.ycombinator.com/item?id=2156374)

Camlistore has always been a project that I'd like to try out someday but
every time I've looked it's seemed more hypothetical than real. Kenneth Reitz
(of python requests fame, among other things) put together a much smaller
thing called Elephant[1] that I've been tempted to explore as well, and it's
sort of in the same vein.

[1]:
[https://github.com/kennethreitz/elephant](https://github.com/kennethreitz/elephant)

~~~
lovemenot
From Elephant's Github readme:

>> Suddenly, your data becomes as durable as S3

Seems slightly less ambitious.

------
PhasmaFelis
This is a subject of great interest to librarians and archivists--how to store
various types of data so that both it and its associated indexes remain fully
accessible and searchable, with a minimum of maintenance, across decades or
centuries, even as formats, maintainers, and institutions rise and fall.

Anyone know if these guys are working with existing archival groups and
standards? It would be a shame if they're reinventing the wheel.

~~~
PhantomGremlin
> how to store various types of data so that both it and its associated
> indexes remain fully accessible and searchable, with a minimum of
> maintenance, across decades or centuries, even as formats, maintainers, and
> institutions rise and fall

Very nicely written description of something useful. IMO much better than the
words on that website.

What I don't like about the website is the "jargon". These words are from just
the first paragraph:

    
    
       formats
       protocols
       modeling
       synchronizing
       post-PC
       objects
       FUSE
    

Huh?

The website text was probably written by an engineer. Reminds me of a great
quote:

    
    
       "engineers are all basically high-functioning
       autistics who have no idea how normal people
       do stuff" - Cory Doctorow

~~~
icebraining
You're assuming the website is _targeted_ at normal people. Considering the
state of the software, I find that a suspicious assumption. Most likely, the
jargon is used because the intended audience can understand it.

~~~
PhantomGremlin
You may be right. Maybe it's best to only have jargon until the software is
ready for prime time.

My general counter argument to jargon filled websites is the great ending of
Trading Places:

    
    
       What about lunch?
       The lobster or the cracked crab?
       What do you think?
    
       Can't we have both?
    

Why can't a website have _both_ a clear explanation for normal people,
followed by all the jargon necessary for the target audience?

------
rabino
If you are learning / playing with Go(Lang) you should definitely take a look
at this source.

As a tool it's a bit rough yet, but it's going to be awesome.

Disclaimer: I have a man crush on Brad Fitzpatrick. Well, mostly on his
code.

~~~
bronson
[http://sfist.com/2014/05/19/silicon_valley_recap_youre_gay_f...](http://sfist.com/2014/05/19/silicon_valley_recap_youre_gay_for.php)

~~~
rabino
Exactly. But keep it quiet because I'm going to meet Brad at a conference in a
few months and I don't want to freak him out.

------
buro9
My understanding of this is that it's "A git style CMS for all your data"...
so you can't nuke things as there's history of it, and you can put any data
you want into it.

Where I struggle is that either the definition of "your data" is narrow, or I
shouldn't be using it for all my data.

Back in 1999 when I first learned about MP3, I started ripping my CDs. I have
several thousand CDs, this took a lot of time. Before I completed the task, at
a rate of a few CDs each evening, FLACs came into my life and I started back
at the beginning. I deleted the MP3s as I replaced them with FLACs.

I really don't ever need to keep some data. But maybe it's not the kind of
data that I should be putting in Camlistore? I think of it as my data, after
all these are my CDs.

I struggle with the concept of Camlistore as I have an 18TB NAS in RAID6, 12TB
usable... and it's 80% full. If I had history I'd have a storage problem
today.

I'm perhaps an outlier, I chose to self-host my data locally rather than rely
on cloud based things. And I chose to keep everything... photos, documents,
email, video, music. And everything I keep is in the highest possible quality:
FLACs, DVD VOBs, raw photos, etc.

But then... who is Camlistore aimed at if not the people who like to store and
have control over their own data?

I guess I just find delete too valuable a feature for the larger data I store.

And perhaps I'm just wrong on the use-case, maybe it's really "for all your
data (that you cannot re-acquire)". I just don't want to ever rip those CDs
again. But if I do, those old versions are dead to me.

~~~
epaulson
It's not quite as much like git as you might think it is. It's git-style in
that it stores data as blobs named by hash and tracks everything with pointers
to those blobs, but isn't as committed to keeping everything forever. Git is
designed to be able to reconstruct a set of data at any point in that data's
history, so it makes sense to keep all previous data in its storage system.

However, even git will delete data if you delete the "tree" metadata, ie you
nuke some branch that has no downstream dependencies because you never merged
it or there are no branches off of it. In that case, if the blobs aren't
reachable by any tree/graph, git can garbage collect those blobs.

Camlistore does the same thing: if you delete all pointers to the data, those
blobs might eventually be reclaimed. As a matter of implementation, camlistore
doesn't do that today, but it's not the case that camlistore can't or won't
let you delete data.
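
To make the "unreachable blobs can be reclaimed" idea concrete, here is a minimal mark-and-sweep sketch. It is not Camlistore's code (as noted above, Camlistore doesn't do this today), and everything in it is an assumption made for illustration: the in-memory `blobs` dict, the `sha1-` ref format, and a hypothetical `"parts"` field that lists the refs a JSON blob points at.

```python
import hashlib
import json


def put(blobs, data):
    """Store bytes under a name derived from their SHA-1 and return that name."""
    ref = "sha1-" + hashlib.sha1(data).hexdigest()
    blobs[ref] = data
    return ref


def reachable(blobs, roots):
    """Mark phase: follow pointers from the roots and collect every ref seen."""
    seen = set()
    stack = list(roots)
    while stack:
        ref = stack.pop()
        if ref in seen or ref not in blobs:
            continue
        seen.add(ref)
        try:
            doc = json.loads(blobs[ref].decode("utf-8"))
        except ValueError:
            continue  # plain data blob, it points at nothing
        # "parts" is a hypothetical pointer field, not a real schema name.
        if isinstance(doc, dict) and isinstance(doc.get("parts"), list):
            stack.extend(doc["parts"])
    return seen


def collect_garbage(blobs, roots):
    """Sweep phase: delete every blob that no root can reach."""
    live = reachable(blobs, roots)
    dead = [ref for ref in blobs if ref not in live]
    for ref in dead:
        del blobs[ref]
    return dead
```

Dropping the last pointer to a blob does not delete it by itself; the blob only goes away when a sweep like `collect_garbage(blobs, roots)` runs, which is the same split between "unreferenced" and "reclaimed" that git has.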

------
tokenizerrr
I just tried to get this running using release 0.7, but I am puzzled by the
web interface. I have not managed to upload a file through it, and once I did
upload a file through the commandline tool camput it did not seem to show up
on the web interface. Seems like a cool concept, and I hope I'm doing
something wrong since I would like this to work.

------
KMag
Spoiler: if you came here all excited about the name beginning with Caml,
Camli stands for Content-Addressable Multi-Layer Indexed. It isn't related to
the Categorical Abstract Machine Language. It's written in Go, not OCaml.
(Yes, I know long ago Zinc was substituted for the Categorical Abstract
Machine down in the depths of OCaml.)

~~~
angersock
Is OCaml used anywhere?

I've only met one OCaml programmer in my life, and they were a graduate
student and rather strange.

~~~
amirmc
It's used in a lot of places, including Facebook for Hack and Bloomberg as
well as the oft cited Jane Street.

OCaml users:
[http://ocaml.org/learn/companies.html](http://ocaml.org/learn/companies.html)

FB Hack:
[http://cufp.org/2013/julien-verlaguet-facebook-analyzing-php...](http://cufp.org/2013/julien-verlaguet-facebook-analyzing-php-statically.html)

JaneStreet:
[https://blogs.janestreet.com/category/ocaml/](https://blogs.janestreet.com/category/ocaml/)

------
rlpb
This sounds like what git-annex does today, except with a front-end. How is it
different from this understanding?

git-annex stores things content addressed, gives me different views into the
data (tags, etc), and supports different back-ends (S3, remote rsync, local
filesystem, external disks, etc). Isn't this exactly what is described here?

------
jamii
[http://nymote.org/](http://nymote.org/) is running along similar lines and
has some serious technical chops behind it (including some of the original Xen
folks). I'm not so sure about their UX skills, but it's worth keeping an eye
on.

~~~
amirmc
(I'm the site author) Is the UX comment something about the site or the tools?

If it's the site, that's on me and I have a bunch of work queued up to better
represent the tools. Specific feedback would be welcome.

If it's about the tools, the first UI is the command line as we expect these
to be components that developers can use. In terms of the initial applications
we refer to, they would effectively be CardDAV and CalDAV servers so you'd
hook in your existing apps to them.

~~~
jamii
To phrase it better - I don't have any idea whether the UX skills of the team
match up with the technical skills. I wasn't criticising anything in
particular, just noting a lack of knowledge on my part.

------
genericacct
You can't overwrite your data and can't delete it either? I am puzzled.

~~~
adamgravitis
Similar semantics to git... it's a super robust approach.

~~~
XorNot
How is it _different_ to git? Or I guess bup, more appropriately.

What is _this_ doing that solves so many problems that apparently they can't
outline what it is actually doing on the frontpage of the website in clear
language?

------
rakoo
To everyone who doesn't understand what this is all about, I suggest you read
the presentations and watch the videos [0]. They're going deeper into what
camlistore is and can do.

[0]
[https://news.ycombinator.com/item?id=7842629](https://news.ycombinator.com/item?id=7842629)

------
malkia
As a non-english speaker, I always associated this with caml/ocaml and the ml
languages - and I still do somehow even after I have visited the site and read
about it, and what it is.

What does "camli" mean?

~~~
epaulson
It's an acronym: "Content-Addressable, Multi-Layer, Indexed" Storage. Nothing
to do with ocaml.

Content Addressable: What things are named depends on their content. Two
identical things have the same name. For example, the "name" or "key" for the
data is the SHA-1 of the data, à la git.
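
A rough sketch of what content addressing means in practice. The `BlobStore` class and the `sha1-` prefix on the name are assumptions for illustration, not Camlistore's actual API.

```python
import hashlib


def blob_ref(data):
    """Name a blob by its content: the hash algorithm, a dash, the hex digest."""
    return "sha1-" + hashlib.sha1(data).hexdigest()


class BlobStore:
    """Toy in-memory store that only knows about bytes and their refs."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        ref = blob_ref(data)
        self._blobs[ref] = data  # identical bytes land on the same key
        return ref

    def get(self, ref):
        return self._blobs[ref]

    def __len__(self):
        return len(self._blobs)
```

Putting the same bytes twice returns the same ref and stores only one copy, and anyone holding a ref can re-hash what `get` returns to check it wasn't corrupted.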

Multi-Layer: The whole storage stack is built out of several layers. The blob
store sits on the bottom, and only knows about bytes, and access is via the
SHA-1 of those bytes. Things that you might store (Files, directories, sets,
collections of tweets, social graphs, etc) build on top of the blob store by
additional blobs that hold pointers to data blobs. Again, it's sort of like
git. A front-end might sit on top of that abstraction.
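
Roughly, the layering could look like the sketch below: a "file" is just one more blob, holding JSON that points at the data blobs. The field names (`type`, `name`, `parts`) and the tiny fixed chunk size are made up for illustration and are not Camlistore's real schema.

```python
import hashlib
import json

blobs = {}  # bottom layer: ref -> raw bytes, nothing else


def put(data):
    """Blob layer: store bytes under their SHA-1 and return the ref."""
    ref = "sha1-" + hashlib.sha1(data).hexdigest()
    blobs[ref] = data
    return ref


def put_file(name, content, chunk_size=4):
    """File layer: split content into data blobs, then store a pointer blob."""
    parts = [put(content[i:i + chunk_size])
             for i in range(0, len(content), chunk_size)]
    meta = {"type": "file", "name": name, "parts": parts}  # hypothetical fields
    return put(json.dumps(meta, sort_keys=True).encode("utf-8"))


def read_file(file_ref):
    """Follow the pointers in the metadata blob back down to the bytes."""
    meta = json.loads(blobs[file_ref].decode("utf-8"))
    return b"".join(blobs[ref] for ref in meta["parts"])
```

The blob layer never learns what a file is. `put_file` and `read_file` are built purely on `put` and lookups by ref, so other types (directories, sets, tweets) could be layered on the same way.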

Indexed: blobs of JSON that have a few special attributes are recognized and
indexed. So, you might have a bunch of blobs with these special attributes
(ie, "tag") and be able to ask the indexer "Give me all blobs with tag equal
to foo", rather than having to search through the blobs directly.
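
As a sketch of the indexing idea: the indexer looks at each blob as it arrives, and if the blob parses as JSON with a recognized attribute, it records the ref. The `TagIndex` class is hypothetical and the only attribute it understands is the `"tag"` example from above.

```python
import json
from collections import defaultdict


class TagIndex:
    """Toy indexer: maps a tag value to the refs of blobs that carry it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def observe(self, ref, data):
        """Called once per stored blob; blobs that aren't tagged JSON are skipped."""
        try:
            doc = json.loads(data.decode("utf-8"))
        except ValueError:  # covers both bad UTF-8 and bad JSON
            return
        if isinstance(doc, dict) and isinstance(doc.get("tag"), str):
            self._by_tag[doc["tag"]].add(ref)

    def with_tag(self, tag):
        """Answer 'all blobs with tag equal to X' without scanning the store."""
        return sorted(self._by_tag.get(tag, ()))
```

The index is derived data: it can be thrown away and rebuilt by replaying every blob through `observe`, which is why the blob store itself can stay dumb.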

------
gnopgnip
This sounds very similar to Diaspora. The whole project is about replacing
social networking's idea of giving all of your info to a 3rd party with
sharing only some info with people you trust.

------
fiatjaf
This is the perfect solution to the layman's constant loss of data due to
Windows breakages that lead to disks being completely reformatted.

