
I made an app that lets you split a file into horcruxes - jesseduffield
https://github.com/jesseduffield/horcrux-ui
======
sudhirj
One related scheme is fountain codes, where you can split a file into a
pseudo-infinite stream of blocks such that collecting any N blocks will almost
certainly allow reconstruction. Very useful in UDP / satellite transmission,
where you can keep broadcasting these blocks and clients can listen at their
convenience.

The state-of-the-art codec is RaptorQ. I’ve got a Go library that uses the
slightly older Raptor standard to do chunking:

[https://github.com/sudhirj/pump](https://github.com/sudhirj/pump)

~~~
jodrellblank
What happens if you can't get N blocks, but say N-1 or N-2? Can you get a
partial reconstruction, or nothing at all?

~~~
pkulak
It's actually probabilistic, though the probabilities very quickly approach 0
and 1 on each side.

But to answer your question, yes, you could recover some of the file if you
didn't get as many pieces as you needed. Been a year or so since I did any
work with fountain codes, but I believe most implementations send all the
chunks, followed by n error correction chunks, so it would depend on how many
real chunks you got. The error corrections wouldn't get you anywhere though.

~~~
toolslive
Online codes [0] just generate random linear equations over the source chunks
and send both the generated right-hand side and the seed that generated the
equation. This way there's no state to keep (now, which equations did I
send?). Most generated equations are of degree 2 (an equation with two ones);
some (about 1% iirc) are of degree 1 (pure data). Having fewer than the
critical mass of equations will give you partial reconstruction, but all bets
are off regarding how much you will get. They were used in storage by
Amplidata in their Amplistor at some point.
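A toy sketch of the seed-plus-right-hand-side idea, under simplifying assumptions: chunks are equal-length byte strings, each equation is the XOR of randomly chosen chunks, and the degree weights are made up. The scheme on the linked Wikipedia page also adds an outer code and decodes by peeling; this sketch uses plain Gaussian elimination over GF(2) instead. All function names are hypothetical.

```python
import random
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def equation_for_seed(seed: int, n: int) -> list[int]:
    """Re-derive which source chunks an equation covers from its seed alone.
    The degree weights are a hypothetical toy choice, not the Online codes spec."""
    rng = random.Random(seed)
    degree = rng.choices([1, 2, 3], weights=[10, 60, 30])[0]
    return rng.sample(range(n), min(degree, n))


def encode(chunks: list[bytes], seed: int) -> tuple[int, bytes]:
    """Send only (seed, right-hand side); the receiver rebuilds the left side."""
    rhs = reduce(xor_bytes, (chunks[i] for i in equation_for_seed(seed, len(chunks))))
    return seed, rhs


def decode(packets: list[tuple[int, bytes]], n: int) -> dict[int, bytes]:
    """Gauss-Jordan elimination over GF(2). Returns every chunk that is already
    pinned down, so too few packets still give a partial reconstruction."""
    rows = {}  # pivot bit -> [mask, value], kept in fully reduced form
    for seed, rhs in packets:
        mask = sum(1 << i for i in equation_for_seed(seed, n))
        value = rhs
        # Cancel every existing pivot bit out of the new equation.
        for bit, (row_mask, row_value) in rows.items():
            if mask >> bit & 1:
                mask ^= row_mask
                value = xor_bytes(value, row_value)
        if mask == 0:
            continue  # redundant equation, adds no information
        pivot = mask.bit_length() - 1
        # Remove the new pivot bit from the existing rows.
        for row in rows.values():
            if row[0] >> pivot & 1:
                row[0] ^= mask
                row[1] = xor_bytes(row[1], value)
        rows[pivot] = [mask, value]
    # A row whose mask is a single bit is a fully recovered chunk.
    return {bit: value for bit, (mask, value) in rows.items() if mask == 1 << bit}
```

Calling `decode` on a handful of packets returns only the chunks it can already solve for, which is the "partial reconstruction, but no promises how much" behaviour described above.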

[0]
[https://en.wikipedia.org/wiki/Online_codes](https://en.wikipedia.org/wiki/Online_codes)

------
widforss
I almost posted a snarky comment about how this is just Shamir's Secret
Sharing, which is in no way new.

But, hey, this is really cool. It was probably really fun to write, and it
illuminates a cool scheme that too few people know anything about.

~~~
ur-whale
Turns out the only other Shamir secret sharing app I know of that's actually
usable (ssss or some such name) only does the deed for _keys_, while this does
it for _files_.

This is actually pretty useful, but sort of sad it drags the whole go
ecosystem with it (who knows what go will look like in 20 years and if this
app will still compile and work).

~~~
amelius
> This is actually pretty useful, but sort of sad it drags the whole go
> ecosystem with it (who knows what go will look like in 20 years and if this
> app will still compile and work).

This is something the CS community needs to solve, imho: provide an
executable language with a formal specification that is guaranteed to be
available indefinitely. And to make this more useful, other programming
languages should provide back-ends targeting this language.

~~~
WorldMaker
For better and worse, isn't that what WASM is/has become/will become?

------
k_sze
1\. Write your will;

2\. Split it into N + 1 horcruxes: give one to each of your N children and the
remaining piece to a lawyer;

3\. Force them to all come together to decrypt the will for fairness.
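Requiring every single piece doesn't actually need Shamir's scheme. A minimal sketch, assuming plain XOR splitting (a swapped-in technique, not what the horcrux app does; function names are hypothetical): for N children plus a lawyer you would call it with N + 1 parts.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_all_required(data: bytes, parts: int) -> list[bytes]:
    """N-of-N split: every part except the last is pure random noise, and the
    last one is data XOR all the noise, so any missing part leaves only noise."""
    noise = [secrets.token_bytes(len(data)) for _ in range(parts - 1)]
    return noise + [reduce(xor_bytes, noise, data)]


def combine_all(parts: list[bytes]) -> bytes:
    """XOR of all parts cancels the noise and leaves the original data."""
    return reduce(xor_bytes, parts)
```

The order in which the parts are combined doesn't matter, but a single absent part makes the rest useless, which is exactly the weakness raised in the replies.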

~~~
SoylentOrange
Then what happens if one child, knowing they won’t get anything, refuses to
provide their piece?

BTW, K-of-N secret sharing is well known; the classic scheme is Shamir’s Secret Sharing.

[https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing](https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing)
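A toy sketch of K-of-N sharing for a single integer secret, assuming arithmetic mod the Mersenne prime 2^127 - 1 (an illustrative choice); it is not the horcrux app's implementation, which works on whole files. With, say, a 3-of-5 split, one refusing or careless child no longer blocks recovery.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be a smaller integer


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Hide `secret` as f(0) of a random degree-(k-1) polynomial; share i is (i, f(i))."""
    if not 0 <= secret < PRIME or not 1 <= k <= n:
        raise ValueError("need 0 <= secret < PRIME and 1 <= k <= n")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def combine(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the shares at x = 0 to recover f(0)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * -xm % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any k of the n shares work, in any order; fewer than k leave every possible secret equally likely.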

~~~
hyko
Even worse, what if one or more children lose their bits and _can’t_ provide
their piece?

~~~
acid__
Have each child make horcruxes of their horcrux :)

------
tyingq
Interesting. Feels like you could accomplish the same effect with regular
encryption though.

Base64-encode the key and pad it with random data so it matches the size of
one part when splitting into N-1 parts. Then split the encrypted file into N-1
Base64-encoded parts. For lowish values of N, you could then just try
decrypting with each candidate "key" until something readable emerges. The
key size, algorithm, etc. could be prepended to each part in plaintext.

Or, if you want a variation where no parts are optional, put a piece of the key
in every split part, with a sufficiently long key.

~~~
ur-whale
Does your scheme support M-of-N recovery?

~~~
tyingq
No, that's a good point. I was misled by some of the comments that assumed the
posted scheme didn't either. However, I'm assuming _"all parts present"_ is
one of many desired use cases.

------
nanomonkey
See also Dark Crystal [[https://darkcrystal.pw](https://darkcrystal.pw)], which
uses Shamir's Secret Sharing to break your secret into "shards", allowing you
to also set the number of friends needed to recreate the secret (fewer than
the total number of shards). Sharing is done over your social network
(currently Briar and Secure Scuttlebutt).

------
nikeee
There is also the CLI tool ssss that uses Shamir secret sharing to split data:

[https://linux.die.net/man/1/ssss](https://linux.die.net/man/1/ssss)

~~~
filoeleven
Huh, so the predecessor to Horcrux is also an oblique Harry Potter reference,
in that it is named in Parseltongue.

~~~
myself248
Boooo. Hiss.

------
hirundo
> Q) This isn't really in line with how horcruxes work in the harry potter
> universe!

> A) It's pretty close! You can't allow any one horcrux to be used to
> resurrect the original file (and why would you that would be useless) but
> you can allow two horcruxes to do it (so only off by one). Checkmate HP
> fans.

Not buying it, and the fact that this is the first FAQ is evidence that the
author doesn't really either. A better fit for Tom Riddle's horcruxes would
simply be a lossily compressed copy of the file. That would admittedly be
pretty useless, unless perhaps the copy contains a lossy copy of your soul.

But then Virgin Galactic is also a pretty good name even though they haven't
yet left the solar system. That should be his defense: it's just a cool name.

~~~
chrismorgan
Also I don’t _think_ this program requires the murder of an innocent to create
each horcrux.

~~~
saagarjha
Depending on your definition of “innocent”, neither does a Horcrux, really.

------
Tomino
Very cool! I built something similar for a patent application about 5 years
ago, for proximity image encryption. The idea was that an image was split into
X encrypted pieces, each still a valid image (distorted or something custom).
If you wanted to see the image again, you had to be in close proximity to the
other parties holding those pieces. A BLE beacon served as the proximity check
for the prototype.

------
sneeuwpopsneeuw
A friend of mine made something like this for a blockchain hackathon once,
around 2 years ago. The techniques he used were relatively simple. It starts
with some elliptic-curve cryptography math to split a single main key into
multiple keys. Every person would then have a full copy of the encrypted files,
and when enough people combined their keys on the blockchain, they would get the
main key to decrypt the files; from then on it would be public that the files
had been opened, and by whom.

This app seems to use Shamir's Secret Sharing, which I am not familiar with,
but as far as I understand from the Wikipedia article about it, it works
roughly the same way but is more general.

I'm interested to see if people will actually use this. If anyone has
additional explanations of the differences between these algorithms, that
would be much appreciated.

------
compsciphd
What is the difference between Reed-Solomon erasure codes and Shamir secret
sharing?

I.e. if I just split the secret into a number of Reed-Solomon error-correcting
blocks (where n blocks are sufficient to recover the full data), is that
fundamentally different?

~~~
OskarS
Yes, it's very different: if you split some error-coded thing across several
files, each individual file is going to give you some chunk of the data. It's
not secret in any way; it's just that you've only got, like, 1/7th of it.

With Shamir's secret sharing, having anything less than the required number of
files is useless: you can't decrypt _any_ of the data unless you reach the
required number.

~~~
compsciphd
hmm, googling around after my Q found this

[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.2829&rep=rep1&type=pdf)

where the authors explicitly write: "Shamir's scheme for sharing secrets is
closely related to Reed-Solomon coding schemes."

now, my eyes glaze over when it comes to the math, so I'll need to look it
through a few times before I understand what they are saying.
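One way to see the quoted relationship: both schemes evaluate a polynomial at n points and recover it from any k of them by interpolation. The difference is what sits on the polynomial. A sketch, assuming integers mod the prime 2^61 - 1 (an illustrative field, not GF(2^8) as real Reed-Solomon libraries typically use); all names are hypothetical.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime used as an illustrative field size


def interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def rs_encode(data: list[int], n: int) -> list[tuple[int, int]]:
    """Systematic erasure code: the k data symbols ARE f(1..k); parity is f(k+1..n)."""
    base = [(i + 1, d) for i, d in enumerate(data)]
    return base + [(x, interpolate(base, x)) for x in range(len(data) + 1, n + 1)]


def rs_decode(shares: list[tuple[int, int]], k: int) -> list[int]:
    """Any k stored symbols pin down f, so f(1..k) gives the data back."""
    return [interpolate(shares[:k], i + 1) for i in range(k)]


def shamir_split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Only f(0) carries information; the other k-1 points are random."""
    base = [(0, secret)] + [(x, secrets.randbelow(P)) for x in range(1, k)]
    return [(x, interpolate(base, x)) for x in range(1, n + 1)]


def shamir_combine(shares: list[tuple[int, int]], k: int) -> int:
    return interpolate(shares[:k], 0)
```

The erasure code stores n symbols for k symbols of data, and each stored symbol is (or directly depends on) data. Shamir stores n symbols for one symbol of secret, and any k-1 shares are uniformly random, which is the "useless below the threshold" property described above.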

~~~
lowercase1
You could also look at
[http://web.eecs.utk.edu/~plank/plank/papers/FAST-2011.pdf](http://web.eecs.utk.edu/~plank/plank/papers/FAST-2011.pdf)

You can combine the reconstruction properties of Reed-Solomon (you need k of n
pieces) with the All-or-Nothing Transform, which encrypts data so that you need
all of it to decrypt.

Then you can essentially do Shamir secret sharing without the storage overhead.
A 1 MB file split so that you need 5 pieces would have 200 KB pieces instead of
1 MB pieces. This comes at the cost of exchanging information-theoretic
security for computational security.
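A sketch of the all-or-nothing step, assuming a hash-based construction in the spirit of Rivest's package transform, with SHA-256 used as both keystream and mixing function. This is a swapped-in simplification, not necessarily what the linked paper uses, and the function names are hypothetical. The package would then be cut into k pieces and fed to a k-of-n erasure code such as Reed-Solomon, which isn't shown here.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """SHA-256 in counter mode as a simple keystream (illustrative only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aont_package(data: bytes) -> bytes:
    """Encrypt under a random key, then hide the key behind a hash of the
    whole ciphertext: without every byte of the body, the key is unrecoverable."""
    key = secrets.token_bytes(32)
    body = _xor(data, _keystream(key, len(data)))
    tail = _xor(key, hashlib.sha256(body).digest())
    return body + tail


def aont_unpackage(package: bytes) -> bytes:
    body, tail = package[:-32], package[-32:]
    key = _xor(tail, hashlib.sha256(body).digest())
    return _xor(body, _keystream(key, len(body)))
```

The overhead is a fixed 32 bytes, so a 1 MB file still splits into roughly 200 KB pieces when any 5 are needed.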

------
Multicomp
So is this an alternative to MultiPar / QuickPar / par2 from the Usenet days?

Looks good. I always try to build redundancy into my offline backups, as if
the given backup in my hand is the last one that hasn't been cooked / melted /
flooded etc. ...because one day, it just might be! I'm talking about the
worst-worst-case scenario, with triple redundancy: online-offsite (can be
ransomwared), offline-onsite (can be flooded/burned), and online-onsite (1st
line of defense, i.e. Syncthing or a NAS).

~~~
mnw21cam
Par2 is current, still receiving bugfixes, and is used every day as part of my
file backup system.

~~~
alternatetwo
RAR files with a recovery record are also really helpful for archiving files.
I add 3% redundancy to any archive I create with it because I've had some
byte errors in the past.

------
verroq
How efficient are these splits? If a 100mb file is split into 5, with 3 needed
to recombine, we’d expect the pieces to be at least 33.3mb, wouldn’t we?

~~~
dj_mc_merlin
Each share is (roughly) the same size as the original. In effect, all of them
are just encrypted versions of the original.
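Putting numbers on the 3-of-5, 100mb example: a rough sketch that ignores per-share headers and padding. The "dispersal" figures are the smallest possible equal-sized pieces for any k-of-n split, since any k pieces together must hold the whole file; the function name is hypothetical.

```python
def storage(file_mb: float, k: int, n: int) -> dict[str, float]:
    """Rough per-piece and total sizes, ignoring headers and padding."""
    return {
        "shamir_piece": file_mb,             # every Shamir share is about file-sized
        "shamir_total": file_mb * n,
        "dispersal_piece": file_mb / k,      # lower bound for equal-sized k-of-n pieces
        "dispersal_total": file_mb * n / k,
    }
```

For `storage(100, 3, 5)` that is 100mb per share (500mb total) with Shamir, versus about 33.3mb per piece (about 166.7mb total) at best.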

~~~
ur-whale
Ah, this is slightly disappointing.

For some reason, my gut was telling me each piece would be smaller than the
original.

I wonder if this (horcruxes of size ~ 1/N) is actually possible.

~~~
wjn0
Without some form of compression, I don't think it is.

In particular, consider your example (5 horcruxes with 3 needed to
reconstruct). View the original file as the interval (0, N) and view it as a
set covering problem. If each horcrux covers an interval of size N/3, then if
any pair overlaps, there is no third horcrux that can complete the covering.
This is a contradiction because 5 horcruxes of size N/3 must overlap
somewhere.

------
r0rshrk
Dropbox does something similar for cold storage:
[https://dropbox.tech/infrastructure/how-we-optimized-magic-p...](https://dropbox.tech/infrastructure/how-we-optimized-magic-pocket-for-cold-storage)

------
postit
I love it when the universe throws my own ideas back at me, but with a slightly
better implementation.

The backstory on this was me freaking out when I had a newborn coming and I
wanted my legacy to be handed to him at the right age if something happened to
me.

------
enimodas
Recently similar:
[https://news.ycombinator.com/item?id=23541949](https://news.ycombinator.com/item?id=23541949)

------
bokwoon
heh, you seem to have messed up the bracket order for the markdown links. I
memorise it as the mnemonic "square bracket": first the square [], then the
bracket ().

~~~
hinkley
[I have claimed something](here is my proof)

------
dougmwne
It would be a fun modification to require 6/7 or 5/7 files, so that you needed
to bring a certain number of pieces, but not every piece. Inspired by the RAID 5
algorithm, which has enough parity to survive one drive failure in a group of 3
or more.
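A single RAID 5 style parity block gives exactly the "all but one" threshold (e.g. 6/7). A minimal sketch with hypothetical names, assuming equal-length blocks; unlike Shamir's scheme it provides no secrecy, because every data block is stored in the clear.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(blocks: list[bytes]) -> list[bytes]:
    """Append one XOR parity block, RAID 5 style (blocks must be equal length)."""
    return blocks + [reduce(xor_bytes, blocks)]


def rebuild(stripe: list[bytes], missing: int) -> bytes:
    """Any single lost block is the XOR of all the others."""
    return reduce(xor_bytes, (b for i, b in enumerate(stripe) if i != missing))
```

Surviving two missing pieces (5/7) needs a second, independent parity, as in RAID 6 or a Reed-Solomon code.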

~~~
petters
It looks like this is exactly what the linked software does?

~~~
dougmwne
Right you are, it does support thresholds.

