
Perkeep: personal storage system for life - setra
https://perkeep.org/doc/overview
======
buro9
[https://perkeep.org/](https://perkeep.org/) is the new(ish) name for
Camlistore, created by Brad Fitzpatrick and with a lot of active developers.

From the home page (rather than the linked overview):

> Perkeep (née Camlistore) is a set of open source formats, protocols, and
> software for modeling, storing, searching, sharing and synchronizing data in
> the post-PC era. Data may be files or objects, tweets or 5TB videos, and you
> can access it via a phone, browser or FUSE filesystem.

Things Perkeep believes:

\+ Your data is entirely under your control

\+ Open Source

\+ Paranoid about privacy, everything private by default

\+ No SPOF: don't rely on any single party (including yourself)

\+ Your data should be alive in 80 years, especially if you are

~~~
peterwwillis
> Your data should be alive in 80 years, especially if you are

How do they deal with obsolescence?

Software that used to exist 50 years ago doesn't run today, and most of those
formats (if they aren't text formats) are either obsolete or completely
unsupported. Emulators exist, but nobody actually uses them. Part of this is
because software becomes obsolete over time, and part of _that_ is because
hardware becomes obsolete.

How are they going to make software today that will run on new computers in 80
years, or how will they make software and data formats backwards compatible
for 80 years?

~~~
tialaramex
I sympathize with your skepticism but I think 1968 is so quantitatively and
qualitatively different that it's not a very helpful comparison.

In 1968 nobody had personal computers, they were not a thing. ASCII is still
really new, "files" aren't really a thing yet, the Multics system is under
development and nobody has yet made the pun "Unix" let alone named an
operating system.

What formats are you thinking of that weren't text formats but are now
"obsolete or completely unsupported" ? The Joint Technical Committee (home of
JPEG, MPEG, and so on) isn't even an _idea_ yet, many of the people who'll
form this committee are undergraduates or still in school. Machines aren't
storing pictures, they're barely storing meaningful text, it's mostly numbers,
big calculations.

If we ask about 40 years ago instead, things are hugely different. By this
point Unix exists, ADVENT exists, ASCII has "won". There is no Internet, no X
Window System yet, and there still isn't a Joint Technical Committee but
already the documents, software and systems are familiar because we're still
using them. At home there is Pong, and in pinball arcades the new Space
Invaders, both are nicely emulated today.

~~~
kbenson
> I sympathize with your skepticism but I think 1968 is so quantitatively and
> qualitatively different that it's not a very helpful comparison.

It's sort of like automobiles in 1968 advertising how they are made with care
and detail so they'll last, and made to be easy to work on so you can expect
them to actually have people (or yourself) that know how to fix them decades
later. People could easily come out and say most of what made a car in 1918
was very different to then, all the way down to the tires themselves.
Industries that have had multiple decades of general use mature quite a bit,
and people don't like to throw away stuff that works (or that they're fond
of). We'll still have computers capable of running a von Neumann architecture
in 50 years, whether through hardware or software, and that's assuming we
can't just port/compile to newer systems if they aren't as extreme of
departures.

I still occasionally play computer games written in the 1980s, generally
through DOSBox or something similar. I think the most likely reason we have to
lose access to running this software is if we lose access to running all
software, in which case nobody will really care (not that I think that's
remotely likely, just that it's the most likely scenario where that holds).

------
rwbt
I've recently started using Fossil[0] to archive all my personal data. It
works rather brilliantly. Technically you can use any VCS but Fossil is unique
in that the entire repo is a single SQLite db, so it's very easy to backup and
restore. Not to mention the web UI to have a quick glance before checking out
any files. Even better I can sync flawlessly between multiple hard drives and
computers. I've a few separate branches for Docs/Photos etc. I checkout the
related branch and just add more files whenever needed. After files are added
to the repo, I just remove the working copy. There are some limitations though
like files larger than 2GB aren't supported.

[0] - [https://www.fossil-scm.org/](https://www.fossil-scm.org/)

~~~
star-techate
Fossil's also very easy to put online, needing at a minimum a two-line bash
file to function as a CGI script.

Maybe more relevant to private data, the builtin wiki makes a good personal
knowledge database.

The next version of fossil will have a forum (seen already at
[https://fossil-scm.org/forum/forum](https://fossil-scm.org/forum/forum)).
With the time sorting for threads, that might be good for temporal data that
you wouldn't want to put in a wiki.

------
setra
TL;DR: This is a content-addressed data store similar to IPFS (although this
project is older). You can configure one of several backends such as local
file storage, S3, SSH, etc. It includes an organization system based on tags
and other metadata. You can construct a FUSE filesystem representation based
on a query. A web UI exists allowing exploration of existing files, uploading,
etc.
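
To make "content addressed" concrete, here is a minimal sketch in Python. It is
not Perkeep's actual API or storage format: `BlobStore`, `put`, `get`, `tag`
and `find` are hypothetical names, and the `sha256-` prefix is only an
illustrative choice.

```python
import hashlib


class BlobStore:
    """Toy in-memory content-addressed store (hypothetical, not Perkeep's API)."""

    def __init__(self):
        self._blobs = {}  # ref -> raw bytes
        self._tags = {}   # tag name -> set of refs

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, not chosen by the user.
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]

    def tag(self, ref: str, name: str) -> None:
        # Tags are metadata layered on top of the immutable blob.
        self._tags.setdefault(name, set()).add(ref)

    def find(self, name: str) -> list:
        return sorted(self._tags.get(name, set()))


store = BlobStore()
ref_a = store.put(b"holiday photo bytes")
ref_b = store.put(b"holiday photo bytes")  # identical content, identical ref
store.tag(ref_a, "holiday")
```

Storing the same bytes twice yields the same reference, so duplicates collapse
for free, and organization (tags here) lives beside the data rather than in a
path.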

------
BlackLotus89
I'm looking for something like perkeep, but with the ability to add
(scientific) metadata. Oftentimes when doing science for the university your
research fund is attached with clauses that obligate you to store all data of
your research for a timespan of 10-20 years and to do (who would have guessed)
scientific research - which entails saving information with every data point:
When was the data obtained, how was it obtained, who generated it, for which
experiment, what's the copyright on this, is it anonymized, pseudonymised, is
it connected to any other research, what's the doi/arxiv/ark-id connected to
it,....

An archive where you drag and drop your files that can upload everything to a
s3 storage (no not amazon s3) and tag metadata to it would be a dream. Right
now there is no good solution for this and in the beginning I took a deep look
at camlistore and hoped for a solution in it. (I looked at upspin, ipfs and
other solutions as well). If someone has a solution for this or if perkeep
could be expanded (or has the option somehow hidden somewhere) I would be very
happy if somebody could point me in the right direction.

------
jonbronson
It seems weird that deletion is prohibited. As we grow as people, sometimes we
no longer want to associate something with ourselves. A photo we don't want to
remember, for instance. This feels like an unnecessary restriction.

~~~
have_faith
> no delete support

Yeah that's a show stopper. There's just way too many scenarios where you
_need_ to delete something.

~~~
amelius
For instance if forced by law.

~~~
1ris
Delete is not (or only very poorly) supported in git as well. For almost all
use cases this is the correct way.

~~~
skybrian
Perkeep is for single users so the use case to compare with is a private git
repo.

If you're not publishing anything, reverting the last change is easy and a
rebase isn't that hard.

------
skybrian
The bottom of the page says last updated in 2013, but the name has been changed
and the latest version does seem to be 0.10. This was previously called
camlistore.

Is it still the case that you can't delete anything? Although rarely needed,
that seems like a showstopper these days. Irreversible actions are bad UI.

~~~
jonbronson
Not to mention a violation of GDPR.

~~~
detaro
Software is not a violation of the GDPR. GDPR means you can not use it for
some things, but given the focus on a _personal_ storage system it's less
relevant.

~~~
jonbronson
Yes, software itself is safe. But eventually you'll want this stored online. At
that point, the company hosting your data will be obligated to comply, but by
design, cannot. In that sense, it's worse than simply noncompliant. It's
virally noncompliant. Any software that uses it will also be affected.

~~~
Aengeuad
The GDPR is effectively irrelevant here unless your goal is to host Perkeep as
a service. Yes, if you upload your own personal database to Dropbox then
Dropbox does still have GDPR obligations to you, but those obligations do not
extend to managing your files for you. As an analogy, if you were to upload a
zip file to Dropbox it would not be reasonable to expect them to remove a file
from within that zip file at your request.

------
zestyping
Is there a user guide anywhere?

I'm having trouble finding one. The "Getting Started" page just says "run the
daemon" and not much more. There are pages on how to set the many
configuration options.

What if I just want to use Perkeep, or find out what the experience of using
it is like? Is there a friendly walkthrough or tutorial? Or an introduction to
the concepts one needs to understand as a user, not as a developer?

------
mikepurvis
Looks like a pretty interesting project, and it's been consistently worked on
for seven years, which is definitely something:

[https://github.com/perkeep/perkeep/graphs/code-frequency](https://github.com/perkeep/perkeep/graphs/code-frequency)

Anyone have a testimonial from the perspective of a user or hacker on it?

------
adrianratnapala
It's really worth thinking about the idea of not having filenames by default.
They give a good example: if you take photos you don't want to name them,
instead you want automatically collected metadata (like creation time) and
some UI for easily searching by that metadata.

So it's basically a correct idea, but I want to know what is needed to make it
work.

I remember the Palm Pilot tried to do this by pretending not to have files,
and having "databases" instead. The result was that the palm-pilot database
just became an obscure, inconvenient file format.

On the other hand, the modern giant internet storage services do a pretty good
job of "freeing" you from filenames, letting you get at photos, docs and other
stuff.

On the other, other, hand, there might be something about the _personal_
aspect of perkeep that makes it more like the palm-pilot.

~~~
joshka
The reason for a filename is identity. This might be automatically assigned
based on metadata (e.g. creator+date+index), but it's definitely necessary.

~~~
adrianratnapala
Right, so to be clear by "filename" I did mean something like "filename the
user actually cares about".

Almost any database (including a filesystem) has a primary key, which can be
thought of as a file-name. Filesystems are unusual in that ordinary users
sometimes want to explicitly deal with the records (files) and their keys
(names).
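
As a rough illustration of records that have a primary key but no name the
user cares about, here is a small sketch using Python's built-in `sqlite3`
module. The table layout, column names and the `sha256-...` ids are made up
for the example; this is not how Perkeep indexes things.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key is an opaque id; the user never has to type or see it.
conn.execute(
    "CREATE TABLE photos (id TEXT PRIMARY KEY, taken_at TEXT, camera TEXT)"
)
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        ("sha256-aaa", "2018-06-14T09:30:00", "phone"),
        ("sha256-bbb", "2018-07-02T18:05:00", "phone"),
        ("sha256-ccc", "2018-06-20T12:00:00", "dslr"),
    ],
)

# "Show me everything from June 2018" is a metadata query, not a filename.
# ISO 8601 timestamps sort correctly as plain strings.
rows = conn.execute(
    "SELECT id FROM photos WHERE taken_at BETWEEN ? AND ? ORDER BY taken_at",
    ("2018-06-01", "2018-06-30T23:59:59"),
).fetchall()
june_ids = [row[0] for row in rows]
```

The key still exists, as joshka says it must, but only the software handles
it; the user only ever deals with the automatically collected metadata.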

------
jimmy1
There was some discussion earlier about the former Camlistore, and how it
differs from the Upspin project in a couple threads here
([https://news.ycombinator.com/item?id=13700492](https://news.ycombinator.com/item?id=13700492))
but maybe the authors can chime in here and restate what the different
usecases would be between Upspin and Perkeep -- it seems like they are
targeting the same audience: personal users wanting to back up data. The
biggest point of emphasis is that these are _not_ to be used for enterprises,
and using them as such would be an anti-pattern, but I'm curious as to how the
breakdown goes after that.

~~~
BlackLotus89
This was answered by bradfitz himself
[https://news.ycombinator.com/item?id=13700968](https://news.ycombinator.com/item?id=13700968)

------
milin
Where does it store the data?

~~~
skybrian
Seems to be local disk or Amazon S3.

[https://perkeep.org/doc/server-config](https://perkeep.org/doc/server-config)

------
Walkman
The thing about files is that they are never going away and they are simple
like a rock. If you want to avoid any type of lock-in ever, just store things
in files.

~~~
chriswarbo
I think you're making a category error: files are an _interface_ , they don't
actually store anything (the underlying filesystem may or may not do that).
Obvious counterexamples to "just store things in files" are /proc on Linux,
pifs ( [https://github.com/philipl/pifs](https://github.com/philipl/pifs) )
and Plan9.

Note that Perkeep provides a FUSE interface, i.e. you _can_ use files.

Being slightly less facetious, it depends on the filesystem. Files can easily
disappear if, say, a disk crashes or there's a network outage.

Those problems can be avoided if we make backups and distribute copies across
several disks and machines, but that gives us a synchronisation problem:

\- If something gets renamed during an outage, how do we know that it was a
rename rather than a brand new file?

\- If we find that two nodes have different content in files with the same
name/path, which one is "correct"?

\- If we don't have much local storage (say, a netbook or a 'phone or a
raspberrypi), how can we take part in the storage?

\- How can we cache things to avoid remotely accessing the same data over and
over?

\- How can we keep data self-contained, i.e. without needing external
metadata/keys/parity info/etc.?

These are hard problems, and Perkeep is a very promising solution to some of
them.
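
For the rename question in particular, identifying data by its content rather
than its path goes a long way. A minimal sketch in Python, assuming we can
snapshot each node as a `{path: content}` dict; `detect_renames` and `digest`
are hypothetical helpers for illustration, not anything from Perkeep.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def detect_renames(before: dict, after: dict):
    """Compare two {path: content} snapshots; return (renames, added, removed)."""
    old_by_hash = {digest(content): path for path, content in before.items()}
    renames, added, claimed = [], [], set()
    for path, content in after.items():
        if path in before:
            continue  # same path on both sides: not a rename candidate
        old_path = old_by_hash.get(digest(content))
        # A rename: the same bytes used to live at a path that is now gone.
        if old_path is not None and old_path not in after and old_path not in claimed:
            renames.append((old_path, path))
            claimed.add(old_path)
        else:
            added.append(path)
    removed = [p for p in before if p not in after and p not in claimed]
    return renames, added, removed


before = {"img_001.jpg": b"\xff\xd8 photo", "notes.txt": b"todo"}
after = {"beach.jpg": b"\xff\xd8 photo", "notes.txt": b"todo", "new.txt": b"hello"}
renames, added, removed = detect_renames(before, after)
```

This only covers files that were moved without being edited, and it says
nothing about which side "wins" when the same path has different content on
two nodes; that conflict still needs a policy.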

------
kuwze
Past discussion[0].

[0]:
[https://news.ycombinator.com/item?id=15928685](https://news.ycombinator.com/item?id=15928685)

------
rsync
"You are in control of your Perkeep server(s), whether you run your own copy
or use a hosted version."

Can the perkeep server be an SSH/SFTP login ? Or is there a server side
component that would need to be running ?

I've thought in the past about the intersection between (camlistore) and
rsync.net but it's not obvious what that looks like ...

------
sehugg
I've been looking for a system that lets me track replication of
online/offline data, as well as a search tool + format obsolescence report on
files. I once started writing such a thing using Python + SQLite. It's kind of
trickier than it seems.

------
eismcc
This is in the same spirit as some OSS work I did a few years back, to enable
similar scenarios

[https://github.com/briangu/cloudcmd](https://github.com/briangu/cloudcmd)

------
gramakri
This looks like an article from 2013. "Last updated 2013-06-12" is in the
footer

~~~
bovermyer
The date on that page is old, but the source code was last updated only a
couple days ago, and the last release was in May of this year.

------
milin
Hmmm how is this different from Box/Dropbox etc?

~~~
komali2
I don't think you can host your own Dropbox server, or run dropbox locally.

Furthermore, dropbox uses folder structures, and can only sync folder-by-
folder, and to have one folder synced requires EVERYTHING in that folder being
synced.

There are many other differences that are listed in the article.

