
Perkeep – Open-source data modeling, storing, search, sharing and synchronizing - noncoml
https://perkeep.org/
======
mastax
Since people are confused about what this is I'll write a summary (from old
memory so it's probably 80% correct)

It is a consumer-oriented storage system that is:

\- Content addressable

\- Indexed

\- Tag-oriented (vs. hierarchical)

\- Permissions, encryption, compression, sharing, etc.

\- Spans storage across machines and clouds

\- FUSE mountable

\- Has CLI and Web interfaces built-in

The intent is to be a personal data dumpster that you can throw all of your
files and other data (tweets, etc.) into for search and backup.

The website could be better organized to convey this information quickly.

~~~
bradfitz
Camlistore (renamed to Perkeep) author here.

It is true that the website needs some love & updated docs. We've been working
on Camlistore for 8 years now (with a few drier spells) but our focus has
never been marketing. If anything, we didn't want too many non-nerd users for
a number of years because it wasn't ready for non-developer usage. That's
starting to change.

We have pretty good docs for configuration and such, but we lack some concise
high-level text about what the project is and why.

I'll prioritize that.

~~~
jasim
For everyone else reading this, here's more context. I once tried creating
durable physical storage that spanned multiple external hard-disks with a
single logical schema, but then discovered Camlistore and git-annex and
decided to let more competent people build it.

The idea is that we should be able to own and manage our personal data - which
runs into terabytes across one lifetime - without having to trust and/or pay
the big cloud companies. So Camlistore from its earliest days had an integrated
photo gallery, since multimedia is where most of the bytes are consumed.

The whole thing once carried the label of the IndieWeb movement (which we should
revive), and Wired wrote about it here - [https://www.wired.com/2013/08/indie-web/](https://www.wired.com/2013/08/indie-web/)

Brad Fitzpatrick is also the creator of LiveJournal where he wrote the
original version of Memcached in Perl. He also wrote OpenID, and then went on
to work with Rob Pike and team on the Go Programming language. Camlistore was
one of the earliest projects written in Go (before Hashicorp made it cool) and
I imagine that had something to do with him getting into the language itself,
but that's for Brad to clarify :)

~~~
toomuchtodo
Brad also wrote MogileFS (omg files!) at LiveJournal, a self-hosted precursor
to cloud object stores like AWS’ S3.

[https://code.google.com/archive/p/mogilefs/](https://code.google.com/archive/p/mogilefs/)

------
nayuki
Some previous threads on Camlistore/Perkeep:

* [2014 Jun] [https://news.ycombinator.com/item?id=7842629](https://news.ycombinator.com/item?id=7842629)

* [2011 Jan] [https://news.ycombinator.com/item?id=2156374](https://news.ycombinator.com/item?id=2156374)

------
euske
The thing is that nothing is good enough to keep data for a lifetime. Hardware
might break, a supply might be discontinued and a software maintainer might
disappear. You'll need to keep refreshing the data from one device to another,
for the rest of your life. That said, I'm curious how easily this system can
handle porting from one device or service to another, in varying formats and
architectures. The only way to stay relevant is to constantly keep
changing/adapting to new things.

~~~
bradfitz
A huge focus of the project is on human-readable schemas and formats. Even if
all specs & source code of the project are lost, the data should still be
recoverable by a curious archaeologist.

Between replicating across several companies as well as your own hardware &
having friends & family mirror your stuff (encrypted or not), the idea is
that some copies will continue to exist.

Hardware failures are a given. Companies failing and friends & family dying are
also a given. Natural disasters too. The only option seems to be trusting
nothing and replicating all your data to lots of places, in future-friendly
formats, and that's what Perkeep aims to do. And then a ton of tooling on top
of that.

~~~
euske
Interesting. I thought plaintext + .tar.gz or .zip format on either FAT or
ext2 fs is the best bet for forward compatibility, and anything beyond that is
too complex or obscure for future archaeologists. The obvious problem is the
searchability, but I'd imagine in future that indexing a few TB of text/image
will be a breeze.

------
davidbanham
Looks like there's been some nice progress since I last looked at Camlistore!
The importers from cloud services like Twitter look really interesting.

------
natural219
Camlistore & Brad Fitzpatrick's original writings are what initially got me
into decentralized web advocacy. Since then, I've moved on from this project,
since it seems to move at a very slow pace and the authors do not seem very
interested in widespread user adoption.

With this name change, I'm slightly more interested again. We'll have to see
in the coming months whether they become ready to displace actual large social
media platforms or whether it remains a toy project.

------
nerdponx
How does this work?

~~~
jamestomasino
That was my first question, too. After clicking through a few links and even
opening up an intro presentation I was left unsatisfied and closed the tab.
This project desperately needs an FAQ or overview video up-front.

~~~
jbob2000
It downloads and catalogs a bunch of crap onto a local hard drive.

~~~
munk-a
... but hard drives don't last forever? And if that is all it's doing, why not
just save the stuff to your hard drive in the first place?

I am so confused by what these people do.

~~~
mikestew
I spent about ten minutes on the site, so hardly a domain expert. I’m still
confused, too. But as best I can understand, you have the option of storage on
S3, Azure, and the like. I assume that with a plug-in/driver, you could store
anywhere you like.

But non-local storage does seem to be designed in, because there is text like
“if there’s a daemon running rsync in the background, you’re doing it wrong”
and “if your UI requires marking folders to be synched/not synched, it’s
broken”, so there appears to be an assumption of putting your data elsewhere.

~~~
NateDad
It's like your own personal google drive / dropbox / git repo

It's a content-addressable storage system. There are plugins to import or export
from various major services like Foursquare, Twitter, etc., and plugins that let
you store stuff in S3, Mongo, Google Cloud Storage, etc.
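
For anyone unfamiliar with the term, here is a minimal sketch of what "content addressable" means. It is not Perkeep's code: the `BlobStore` class and its `put`/`get` methods are hypothetical names, the store is just an in-memory dict, and SHA-256 is an assumed hash choice for illustration.

```python
import hashlib


class BlobStore:
    """Toy in-memory content-addressable store (hypothetical, not Perkeep's API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, so identical
        # bytes always map to the same reference.
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        # Look a blob up by the reference that put() returned.
        return self._blobs[ref]


store = BlobStore()
ref_a = store.put(b"hello")
ref_b = store.put(b"hello")  # same bytes, same reference, stored once
```

Since the reference is computed from the bytes, storing the same content twice costs nothing extra, and a blob can be checked against its own name after it has been copied somewhere else.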

~~~
spondyl
I suppose technically running something like OwnCloud with a plugin to fetch
your content (text/photos etc) from the various social networking APIs would
look near identical from the outside?

------
linsomniac
I've been watching Camlistore for a few years. I peek in on it every once in a
while, long enough between that I usually can't remember the name. I like the
look of it, but haven't been convinced to go from my decade old ZFS setup to
Camlistore.

I feel like OwnCloud is more compelling, from a glance. Anyone use one or both
and able to comment?

~~~
bradfitz
Camlistore author here.

If you only store files, sure, use ZFS.

Perkeep (Camlistore) doesn't write to a block device. It has storage backends
for a filesystem (which can be ZFS) and any number of cloud object storage
providers (S3, GCS, etc).

Perkeep's main value over a fancy POSIX filesystem is storing nameless things
(tweets, other social media content + interactions, bookmarks) in common
schemas, and permitting search over it all, and then having a variety of ways
to browse it (CLI, FUSE, API, web UI, etc).

It's also good at sync to & from things any which way without merge conflicts.
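
One way to picture the "no merge conflicts" point, assuming blobs are immutable and keyed by a hash of their content (a sketch of the general idea, not Perkeep's actual sync code; `blob_ref`, `sync`, and the two dicts are hypothetical):

```python
import hashlib


def blob_ref(data: bytes) -> str:
    # Assumed hash choice for illustration only.
    return hashlib.sha256(data).hexdigest()


def sync(src: dict, dst: dict) -> int:
    """Copy every blob that dst lacks from src; return how many were copied."""
    copied = 0
    for ref, data in src.items():
        if ref not in dst:
            dst[ref] = data
            copied += 1
    return copied


# Two stores that each hold something the other lacks.
laptop = {blob_ref(b): b for b in (b"photo", b"tweet")}
server = {blob_ref(b): b for b in (b"tweet", b"bookmark")}

sync(laptop, server)  # server gains "photo"
sync(server, laptop)  # laptop gains "bookmark"
```

Two stores can only share a key if they hold the same bytes, so syncing in either direction is a set union and there is never a conflicting version to reconcile.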

------
tabeth
How is this any better than just burning your data to a Blu-ray, which lasts
centuries when stored under proper conditions (theoretically, anyway)? I need
to give this a closer look.

~~~
ams6110
Not having to worry if there will be any Blu-Ray readers available in a
century.

~~~
tantalor
Seriously. The only device I have which can read a CD-ROM is my car. The PS4
can read Blu-Ray and DVD but not CD-ROM.

~~~
freeflight
If you really need to read a CD-ROM then getting your hands on a SATA DVD-
Drive, which usually are able to read CD-ROM, shouldn't be that big of a
problem. Without looking hard I'd probably come up with 3 spare ones in my
basement alone.

Tho I don't think that many of the self-burned CDs from 2 decades ago are
still any good; I know mine usually ain't.

~~~
djrogers
Then I’d need to find a computer with a SATA interface... looking 10 years in
the future it’d be even less easy.

~~~
jwfxpr
I have friends who work in the IT section for an under-resourced cultural
institute focused on the preservation and recording of disappearing cultures
and ethnic groups, including the preservation of speech and utterances in
languages that now have no living native speakers. They discovered recently,
to their alarm, that the only surviving copies of some recordings were now on
old 3½ and 5¼ inch floppies that had somehow been stored without accurate
cataloguing. They are struggling to 1) find equipment that can read the discs,
2) interface with the disc drives, 3) work out what is actually on each disc
and what file formats are in use (they have good guesses, but no certainty)
and 4) find software that will be compatible with those formats.

They have neither the skills nor the budget for either in-house or outsourced
forensics on this. At this point they don't even know what exactly might be
lost to humanity's knowledge, and to the descendants of these people, forever.

~~~
figgis
Would something like

[https://www.officedepot.com/a/products/490099/Bytecc-
Interna...](https://www.officedepot.com/a/products/490099/Bytecc-Internal-
Floppy-Drive-with-FlashCard/)

work?

~~~
Tempest1981
Floppy + SD card reader -- looks handy. But still has an IDE connector. Recent
motherboards only have SATA. The review here says you can't access the floppy
via USB, fwiw:
[https://www.newegg.com/Product/Product.aspx?Item=N82E1682019...](https://www.newegg.com/Product/Product.aspx?Item=N82E16820192022)

Maybe something like this: (USB floppy) [https://www.amazon.com/External-
Floppy-Portable-Windows-Requ...](https://www.amazon.com/External-Floppy-
Portable-Windows-Required/dp/B00RXEWOAA)

------
stevekemp
I've been keeping an eye on this project for years, because it seems well-
designed, and the authors are very capable developers.

The biggest problem I found was getting documentation on replication. Having
two+ servers mirror-each other, across the internet, seems like a good idea
given that otherwise you have a single point of failure as you import all your
media/files.

------
teddyh
I’d be interested in a system for converting existing stuff from, for example,
the Firefox “ScrapBook” plugin, to this format. (The ScrapBook plugin is not
compatible with Firefox 57’s plugin API, so anyone who upgrades to Firefox 57
immediately loses all their saved ScrapBook pages.)

~~~
sp332
I have no idea how compatible this is, but someone is working on a new
version. [https://addons.mozilla.org/en-
US/firefox/addon/scrapbookq/](https://addons.mozilla.org/en-
US/firefox/addon/scrapbookq/)

------
andrepd
The perfect tool for a digital hoarder like myself. Will follow this with
attention.

------
didibus
So, it's just a document server that can be run over multiple computers? I was
expecting something peer to peer. If I understand correctly, you can think of
this as a dropbox that you can self host?

------
kindfellow92
What is the target audience of this? What are the intended use cases?

Is this supposed to be used directly by users or as an API for a user-facing
application? How is this different from a document DB like MongoDB?

~~~
flarg
Long time follower of the project here... So far it's been aimed at geeks who
want to archive their content from the cloud, eg tweets, but it also stores
files. Because of the way it is designed I've always thought there is a
compelling use case for its use as a file and object store for organizations
where auditing of data records is expected and sharing of data is a
requirement.

------
brotherjerky
So is this ready for prime time yet? I used to follow camlistore, and it was
still a little rough even for CLI nerds.

~~~
gh02t
So I just downloaded it and played around and as far as I can tell there is no
way to delete files. Or, more specifically there is a way but it's not
implemented or otherwise accessible as far as I can figure from the rather
sparse documentation.

If someone would like to explain to me how (if?) the garbage collection works
I'd appreciate it, because I like the concept and kinda want to use this, but
deleting stuff is a rather important feature for me. All I could find
searching was a post by the devs saying it was already mostly implemented but
not finished and not a priority...

[https://github.com/camlistore/camlistore/issues/792](https://github.com/camlistore/camlistore/issues/792)

Like, I understand that this is a spare time project (I think) but not
considering deleting/pruning files to be an important feature is really
confusing to me. In its current state, if I accidentally upload the wrong
file, am I now stuck with it forever?

Edit: ok I figured out how to at least delete things in the UI (clicking the
check mark opens a side menu apparently, `camput delete` doesn't seem to do
anything), but as far as I can tell it doesn't actually delete them from the
database without running a garbage collect, which isn't implemented so it just
hangs around in purgatory.

------
j7ake
Is this possibly a Dropbox replacement? Do I have to host the files on my own
server?

------
tradersam
Alternatively: "Hard-drives let you permanently keep your stuff, for life"

~~~
melq
Hard drives are an especially bad choice for lifetime reasons, and SSDs don't
solve the problem either :P

~~~
tradersam
I don't agree — that's why things like _redundancy_ are commonplace. :D

~~~
mulmen
“You're weak on logic, that's the trouble with you. You're like the guy in the
story who was caught in a sudden shower and who ran to a grove of trees and
got under one. He wasn't worried, you see, because he figured when one tree
got wet through, he would just get under another one.”

[http://multivax.com/last_question.html](http://multivax.com/last_question.html)

~~~
anderspitman
Huh, I don't think I've seen a reference to that story in years, but just
emailed it to a coworker a couple hours ago.

~~~
mulmen
One of my all time favorites. Though it took me a while to remember the source
of the quote. I thought it had been used in the context of global warming so
google didn’t turn up much. Then I remembered it’s actually from a story about
universal cooling.

------
passwordqq2
Question if anybody gets to this: I'm taking a break from work and computers
for a year. How would you guys suggest I store my kdbx data securely in a
failsafe manner without worrying about forgetting passwords or losing paper
chits or USB keys?

Edit: after seeing some good suggestions about physical storage, I've decided
to increase the difficulty of the question, hard mode- How would you do this
without physical stuff? (more, new answers about physical welcome too)

~~~
jacquesm
For something on the timescale of a year I would just keep the system that you
already have up and running. If it were much longer than that I'd go with a
bank vault that contains the access keys and something like tarsnap and yet
another backup with another cloud provider.

~~~
passwordqq2
I'm assuming all my electronics fry, papers burn and memory goes away. (to
be safe)

Bank vault might be a good idea (assuming they id me fine)

------
zyxzkz
I was gonna say, this sounds like Camlistore.

~~~
DiThi
Because it is! (edit: oh I see it's in the header)

