There are some bits (permanodes and claims) for adding metadata to objects (filename, timestamp, geo location, and other attributes - I think even arbitrary JSON) and for authentication/sharing. A few really cool bits around modularity: blob servers can be composed over the network - you can transparently split your blob storage across multiple machines, databases, and cloud services, and set up replication and maybe encryption (unclear to me whether that part works or not).
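For a feel of what those metadata bits look like, here's roughly the shape of a permanode blob and of a claim setting an attribute on it. This is a from-memory sketch, so treat the exact field names as illustrative; real claims are also cryptographically signed, which I'm omitting here. A permanode is just an anchor with some randomness to make it unique:

```json
{
  "camliVersion": 1,
  "camliType": "permanode",
  "random": "815cd259-..."
}
```

and a claim attaches mutable metadata to it by reference:

```json
{
  "camliVersion": 1,
  "camliType": "claim",
  "permaNode": "sha224-...",
  "claimType": "set-attribute",
  "attribute": "title",
  "value": "vacation photos",
  "claimDate": "2020-01-01T00:00:00Z"
}
```

Since both blobs are immutable and content-addressed, "mutation" is just appending new signed claims and letting the indexer fold them into the current state.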
Importing data from different services is not really its core competency, at least not yet. It can ingest anything you can put on your file system, and there are importers for a few third-party services (see https://github.com/perkeep/perkeep/tree/master/pkg/importer), but that's about it.
One thing that I'm still trying to figure out, if you happen to know: how does it handle deduplication (if at all)? And what about redundancy and backups? I've been glancing through the docs and I do see mention of replication to another Perkeep instance, but that's not quite what I'm looking for.
Then there is also some logic to chunk large objects into small pieces, or "blobs". These small chunks are what the storage layer actually works with, rather than the original unlimited-length blobs the user uploaded. Chunking helps store multiple versions of the same large file (say, a large VM image) space-efficiently - the system only needs to store the set of unique chunks, which can be much smaller than N full but slightly-different copies of the same file. But I personally find that it degrades performance to the point of being unusable for my use case: a multi-TB, multi-million-file store of immutable media files. If chunking/snapshotting/versioning is important for your use case, I'd look more towards backup-flavored tools like restic, which share many of these storage ideas with Perkeep.
Redundancy and backup are handled by configuring the storage layer (the "blobserver") to do it. Perkeep's blobservers are composable - you can have leaf servers storing your blobs, say, directly in a local filesystem directory, on a remote server over sftp, or in an S3 bucket, and you can compose them via special virtual blobserver implementations into bigger and more powerful systems. One such virtual blobserver is https://github.com/perkeep/perkeep/blob/master/pkg/blobserve... - it takes the addresses of two or more other blobservers and replicates your reads and writes across them.
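As a concrete sketch (from memory, so the exact handler names and args may be slightly off), the low-level server config wires this up declaratively: a replica handler in front of a local-disk store and an S3 store might look something like this, with S3 credentials and the rest of the config omitted:

```json
{
  "prefixes": {
    "/bs/": {
      "handler": "storage-replica",
      "handlerArgs": {
        "backends": ["/sto-disk/", "/sto-s3/"]
      }
    },
    "/sto-disk/": {
      "handler": "storage-filesystem",
      "handlerArgs": {"path": "/var/lib/perkeep/blobs"}
    },
    "/sto-s3/": {
      "handler": "storage-s3",
      "handlerArgs": {"bucket": "my-perkeep-blobs"}
    }
  }
}
```

The rest of the system only ever talks to "/bs/", so the fan-out to disk and S3 is invisible to everything above the storage layer.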
You give it the addresses of the source and destination blobservers; it enumerates the blobs in both and copies the source blobs missing from the destination into the destination server.
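That logic is simple enough to sketch. Modeling a blobserver as a map from content hash to bytes (the names here are mine, not Perkeep's API), the sync step is just a set difference plus a copy:

```go
package main

import "fmt"

// BlobServer is a minimal stand-in for a blob store: content hash -> bytes.
type BlobServer map[string][]byte

// syncBlobs copies every blob present in src but missing from dst,
// mirroring the enumerate-and-copy sync described above.
// It returns the number of blobs copied.
func syncBlobs(src, dst BlobServer) int {
	copied := 0
	for ref, blob := range src { // "enumerate" the source
		if _, ok := dst[ref]; !ok { // missing from the destination?
			dst[ref] = append([]byte(nil), blob...)
			copied++
		}
	}
	return copied
}

func main() {
	src := BlobServer{"sha224-aaa": []byte("hello"), "sha224-bbb": []byte("world")}
	dst := BlobServer{"sha224-aaa": []byte("hello")}
	n := syncBlobs(src, dst)
	fmt.Printf("copied %d blob(s); destination now has %d\n", n, len(dst))
}
```

Because blobs are immutable and content-addressed, this is naturally idempotent: running the sync again copies nothing, and there are no conflicts to resolve.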