
Distributing NixOS with IPFS - robto
https://sourcediver.org/blog/2017/01/18/distributing-nixos-with-ipfs-part-1/
======
chriswarbo
I've been following these github issues for a while; fetching sources from
IPFS seems like a great step forward for resiliency in general, and quite a
natural one for Nix considering things are already immutable. Using IPFS as a
binary cache is nice, as it would lower the maintainers' burden and make out-
of-tree experimentation easier, i.e. without damaging the integrity of nixpkgs
and cache.nixos.org.

I hadn't even thought about using the FUSE integration of IPFS, but it makes a
lot of sense. Nix is a lazy language, and the nixpkgs repository basically
defines one big value: a set of name/value pairs for every package it contains
(as well as various libraries for e.g. working with Python packages, Haskell
packages, etc.). The only difference between installed/uninstalled packages is
whether anything's forced the contents to be evaluated yet.

Likewise, an IPFS FUSE mount conceptually contains the whole of IPFS. The only
difference between downloaded/undownloaded files is whether anything's forced
the contents to be evaluated yet.

~~~
ris
This article doesn't mention the most significant fallout of the IPFS idea
(imo), which is that of .nar deduplication, as detailed in the issue
[https://github.com/NixOS/nix/issues/859](https://github.com/NixOS/nix/issues/859)
(point 4).

Perhaps a nail in the coffin of one of Nix's biggest absurdities.

------
cjbprime
Very cool.

One benefit of schemes like this that people don't talk about much is that, by
no longer downloading from an expected place, you're removing the possibility
for a compromised developer or server operator to selectively serve up malware
to a targeted user. Instead you're getting the file over bittorrent and
checking its hash, and you could gossip with other bittorrent clients to
confirm that everyone's trying to get the same hash.

Compare with the state of the art in most software updates, which is that you
connect to some download server and it could serve signed malware to people on
its target list and probably no-one would notice.

(Schemes that use some of these techniques to take out the single point of
malware-insertion have been called "Binary Transparency" schemes, as an analog
to Certificate Transparency.)
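
The check being described can be sketched in a few lines of Python (the helper name is illustrative, not any real updater's API):

```python
import hashlib

def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Accept a downloaded artifact only if it hashes to the digest we
    already knew to expect; a compromised mirror then can't target one
    user with different bytes without failing this check."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

# In a content-addressed network the address *is* the expected digest,
# so every peer can verify independently and gossip about mismatches.
blob = b"some package contents"
address = hashlib.sha256(blob).hexdigest()
assert verify_download(blob, address)
assert not verify_download(b"targeted malware", address)
```

The point is that the digest travels separately from the bytes, so no single server is trusted for both.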

~~~
m3ta
Sorry, I don't understand. If the problem is a hypothetical situation where a
compromised developer uploads malicious code, then how does IPFS relieve any
pressure from that circumstance?

Individual IPFS nodes are certainly blindly trusting the developer's signature
as a stamp of approval. Adding more nodes doesn't make that problem better. It
makes it worse by providing a greater false sense of security.

In the case of a compromised server operator, as long as hosting company X is
smaller than Amazon, it's always better to use Amazon's cloud service to
mitigate the possibility of server operator tampering.

~~~
icebraining
I think the point is that you can't serve a malicious version to a specific
user, you must serve it to everyone, which makes it much easier to detect.

------
Ericson2314
I want to ditch the Nar format as soon as possible. IPFS's unixfs format is
too rich, however.

When will the IPFS people finish up
[https://github.com/ipld/cid](https://github.com/ipld/cid) so we can link
whatever content addressable data we want?

I'd use git tree objects, despite SHA-1, because they're widely supported. Or
use a format identical to git tree objects, but with IPFS's multihash and
SHA-1 banned.

Point is, the underlying protocol should be agnostic to the hashing scheme; we
should have a trait/type class like

    
    
      /// Node in tree
      trait Payload {
        type Hash: HashingTrait;

        fn unpack(p: Payload) -> (Vec<u8>, Set<Hash>);
        fn pack(unpacked: (Vec<u8>, Set<Hash>)) -> Payload;

        // Implement either and get the other for free!
        fn hash_packed(p: Payload) -> Hash { hash_unpacked(unpack(p)) }
        fn hash_unpacked(u: (Vec<u8>, Set<Hash>)) -> Hash { hash_packed(pack(u)) }
      }
    
    

any `(Hash, Payload)` that can define a `(binary blob, Set<Hash>) -> Hash` and
a Payload function should work.
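
As a rough approximation of that interface in plain Python (class and field names are made up, and a real version would be generic over the digest function):

```python
import hashlib

class Node:
    """A content-addressed node: an opaque blob plus the set of hashes
    it references -- roughly the (Vec<u8>, Set<Hash>) pair above."""

    def __init__(self, blob: bytes, children: frozenset):
        self.blob = blob
        self.children = children  # hex digests of referenced nodes

    def pack(self) -> bytes:
        # Canonical serialization: sorted child hashes, then the blob,
        # so the same logical node always packs (and hashes) identically.
        header = "".join(sorted(self.children)).encode()
        return header + b"\x00" + self.blob

    def hash(self) -> str:
        # The only place a concrete digest appears; swapping this out is
        # exactly the hashing-scheme agnosticism being asked for.
        return hashlib.sha256(self.pack()).hexdigest()

leaf = Node(b"file contents", frozenset())
root = Node(b"tree listing", frozenset({leaf.hash()}))
```

Here `pack`/`hash` play the roles of `pack` and `hash_packed`; anything that can enumerate its child hashes fits the same shape.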

~~~
whyrusleeping
Hey! IPFS Dev here.

The cid stuff has been implemented and initial support for it is landing in
our 0.4.5 release (which will be soon; hopefully a release candidate within a
week).

With that and IPLD, you can craft arbitrary objects in JSON or CBOR (there's a
1-to-1 mapping; objects are stored as CBOR) and work with them in ipfs. For
example, I could make an object that looks like:

    
    
      {
        "Contents": {"/": "QmHashOfPkgContents"},
        "Compression": "bzip2",
        "NarSize": 12345,
        "References": {
          "foo": {"/": "QmHashOfFoo"},
          "bar": {"/": "QmHashOfBar"}
        },
        "Signature": "signature info, or a link to the signature"
      }
    

(please excuse my attempt at recreating a nar file in rough json).

This object could then be put into ipfs with:

    
    
      cat thing.json | ipfs dag put
    

And you would get an ipld object that you can move around in ipfs, and do fun
things like:

    
    
      ipfs get <thatobjhash>/Contents
    

to download the package contents, or:

    
    
      ipfs get <thatobjhash>/References/foo
    

to get the referenced package (or open that hash/path in an ipfs gateway to
browse the package graph for free in your browser :) )

~~~
Ericson2314
IPLD does allow storing tons of data, but custom schemas allow _restricting_
the data referenced in arbitrary ways.

IPLD, last I checked, supports relative paths (which can make certain cycles),
and not every node child gets its own hash. This is too much flexibility for
my purposes (Nix or otherwise).

Also, when interfacing with legacy systems like git repos, one needs to
dereference a legacy hash without knowing what it points to, which is easiest
done with custom schemas.

Now, granted, custom schemas aren't a super fine-grained solution, as every
node in the network that cares about the data needs to implement the schema,
but they are a useful tool for these reasons (and that downside doesn't apply
to private networks).

~~~
lgierth
Also see the multicodec table for the codes for ethereum/bitcoin/zcash/stellar:

[https://github.com/multiformats/multicodec/blob/2725f3c5cd7b...](https://github.com/multiformats/multicodec/blob/2725f3c5cd7b7b192ee90c6c1a5b90c76024eb81/table.csv#L165-L182)

~~~
Ericson2314
Ok, so it's good we can finally refer to other node types. But I worry about
putting all that in a single namespace. The IPLD node types constitute
different hashing strategies, as I describe above, but stuff like media codecs
is _orthogonal_ to hashing strategies: media of various sorts, given a hashing
strategy, will be treated as black-box binary data for the foreseeable future.

The big takeaway here is I _really_ like the idea of IPFS, and want to be a
full fan, but everywhere I look I see dubious interfaces. I see what already
looks like legacy cruft, and they haven't even hit 1.0!

------
twoodfin
I've felt for a while that a standard, widely-implemented, distributed
content-addressable store is one of the biggest missing pieces of the modern
internet. Glad to see any steps in that direction.

I'll know real progress has been made when my browser can resolve something
like:

cas://sha256/2d66257e9d2cda5c08e850a947caacbc0177411f379f986dd9a8abc653ce5a8e
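
Such a scheme is easy to prototype. A sketch of what resolving this hypothetical cas:// address could mean on the client side (the scheme itself is imaginary):

```python
import hashlib
from urllib.parse import urlparse

def accept_bytes(url: str, data: bytes) -> bool:
    """For a made-up cas://<algo>/<digest> address, fetched bytes are
    accepted only if they hash to the digest embedded in the address --
    it doesn't matter which server or peer actually supplied them."""
    parsed = urlparse(url)
    algo, digest = parsed.netloc, parsed.path.lstrip("/")
    return hashlib.new(algo, data).hexdigest() == digest

data = b"page contents"
url = "cas://sha256/" + hashlib.sha256(data).hexdigest()
assert accept_bytes(url, data)
```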

~~~
lucian1900
That's exactly what the WWW is, though. Your browser knows how to resolve a
domain with DNS and fetch a file at a certain path over HTTP.

~~~
ashark
It's the difference between "get whatever this is" and "get _this_ , wherever
it is".

------
civodul
Nice project! Guix had a GSoC student working on binary distribution using
GNUnet's file sharing component a while back:
[https://gnu.org/s/guix/news/gsoc-update.html](https://gnu.org/s/guix/news/gsoc-update.html)
That has not led (yet?) to production code, but there might be ideas worth
sharing.

------
k__
It's almost ridiculous how well the two fit together.

I had the feeling NixOS has a bit of a hard time getting users and proving
that it's a superior solution to ansible/docker/chef/etc., probably because of
its mediocre UX, haha.

But this would add another killer feature to it.

------
vog
Very interesting development. It would be great to see NixOS as an early
adopter for IPFS.

BTW, there is a small typo:

    
    
        IPFS is aims to create the distributed net.
    

It should be:

    
    
        IPFS aims to create the distributed net.

~~~
sly010
This is a great idea. A lot of businesses rely heavily on old versions of open
source packages always being available. The one time someone unpublished an
npm package, half of the nodejs stacks went with it.

edit: Didn't mean to hit reply. Sorry.

~~~
notheguyouthink
It doesn't really solve that use case though, does it?

E.g., IPFS isn't permanent hosting; it's only hosting for as long as there are
seeds, like bittorrent. Hypothetically, if a package is very old there may be
no seeds for it anywhere. Someone (NixOS/etc.) will still have to pay for
hosting.

~~~
IanCal
It does provide a solution, though. If someone is still hosting it, you can
access it in the same way. Nobody can independently pull a specific version
down. They could remove their copy and hope nobody else is also providing it,
but if it was a big problem other people could quickly start providing it and
nobody would notice the difference.

Perhaps the equivalent for "someone will still have to pay for hosting" is
that, although that's true, anyone can put money into the pot to keep it going
or bring it back; it's not reliant on the _original_ creator to keep paying
for it.

------
citrusui
I'm really excited to see what the future holds for IPFS! However, hosting
websites with custom domains is not quite feasible yet. Using IPFS' DNS (IPNS)
means you have to keep the IPFS daemon running constantly, or else the files
will be purged within an hour.

~~~
diggan
> However, hosting websites with custom domains is not quite feasible yet

Sure, point your domain to your IPFS hash and use dnslink; it's actually quite
reliable already. That's how ipfs.io is hosted, for example, and we haven't
hit any issues so far.

> Using IPFS' DNS (IPNS) means you have to keep the IPFS daemon running
> constantly, or else the files will be purged within an hour

So, the files won't be purged, but the record you push out with IPNS won't be
valid after 24 hours. You can solve this easily by using /ipfs/:hash instead
of /ipns/:id and it won't disappear.

~~~
citrusui
Yet again I failed to include important details... :p

Yeah, it certainly is possible to host static sites on IPFS -- I have been
testing it on my site[0] just for kicks. However, since I really enjoy using
my domain name, rather than "ipfs.io/ip{f|n}s/$hash", I'm reluctant to try
adapting my site to IPFS. I am aware that this is an alpha product, though,
and I can't stress enough how cool it is for files containing the same data to
be given the same ID (hash). That way you don't have to run shasum ever again
:D

[0]: [https://ipfs.citrusui.me](https://ipfs.citrusui.me)

(on a random note: what's up with /blog returning the IPFS blog, instead of my
own content? did i misconfigure something?)

~~~
lgierth
The HTTP gateway included in go-ipfs will try to use a Host header on requests
for constructing an /ipns path. So e.g.
[http://ipfs.io/docs](http://ipfs.io/docs) gets turned into /ipns/ipfs.io/docs
-- this is actually how we host all webpages within the ipfs project.

edit: And the part which turns /ipns/ipfs.io into an /ipfs path is called
dnslink: [https://github.com/ipfs/go-dnslink](https://github.com/ipfs/go-dnslink)
-- it resolves to what's in TXT _dnslink.ipfs.io.
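
The TXT-record side of that is simple enough to sketch (the actual DNS query is elided; this only shows the record format being described, and the hash is a placeholder):

```python
def parse_dnslink(txt_record: str):
    """A dnslink TXT record at _dnslink.<domain> has the form
    'dnslink=/ipfs/<hash>' or 'dnslink=/ipns/<name>'; the gateway
    rewrites /ipns/<domain> to whatever path it finds there."""
    prefix = "dnslink="
    if txt_record.startswith(prefix):
        return txt_record[len(prefix):]
    return None  # not a dnslink record (e.g. SPF and friends)

assert parse_dnslink("dnslink=/ipfs/QmSomeHash") == "/ipfs/QmSomeHash"
assert parse_dnslink("v=spf1 -all") is None
```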

> (on a random note: what's up with /blog returning the IPFS blog, instead of
> my own content? did i misconfigure something?)

I'm so sorry, that's a bug I put into the nginx configuration -- will fix it!

------
drdre2001
This is a really great idea! Reminds me of other projects that are working on
integrating IPFS with the Operating System:
[https://github.com/vtomole/IPOS](https://github.com/vtomole/IPOS)

------
rkeene2
Good to see other people are inventing AppFS (
[http://appfs.rkeene.org/](http://appfs.rkeene.org/) ) :-)

~~~
HurrdurrHodor
Please don't use SHA-1. It's almost broken.

~~~
lgierth
Use multihashes for hash algorithm agility :)
[https://github.com/multiformats/multihash](https://github.com/multiformats/multihash)
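
The framing itself is tiny. A sketch of encoding a sha2-256 digest in the multihash style, where 0x12 is the registered function code for sha2-256:

```python
import hashlib

def multihash_sha256(data: bytes) -> bytes:
    """Multihash framing: <fn code><digest length><digest>. The
    self-describing prefix is what lets a system migrate to a new hash
    function later without ambiguity about which one produced a value."""
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest

mh = multihash_sha256(b"hello")
assert mh[0] == 0x12 and mh[1] == 32 and len(mh) == 34
```

(Real multihash uses a varint for the code; a single byte happens to suffice for sha2-256.)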

~~~
rkeene2
Support for multiple hashing algorithms is already built in and mandatory;
everywhere a hashing operation is used, the hashing algorithm must also be
specified.

------
matthewbauer
Stage 2 seems problematic, at least the way I see it. Most users have at least
a thousand derivations; is it possible to fuse-mount each one?

Also: I think some people are unaware that Nix hashes are not content
addressable. The best solution (which OP is proposing) is probably to use the
.nar hashes in IPFS, which are content addressable.
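
The distinction can be sketched like this (entirely illustrative; real Nix store hashing involves much more than a single digest):

```python
import hashlib

def input_addressed(build_recipe: bytes) -> str:
    """Nix-style: the store hash is over the *inputs* (the derivation),
    so the address is fixed before the build even runs."""
    return hashlib.sha256(build_recipe).hexdigest()[:16]

def content_addressed(build_output: bytes) -> str:
    """IPFS-style: the hash is over the *output* bytes, so identical
    outputs deduplicate regardless of how they were built."""
    return hashlib.sha256(build_output).hexdigest()[:16]

# Two recipes that happen to produce the same bytes get two input
# addresses but one content address -- the mismatch being pointed at.
assert input_addressed(b"recipe v1") != input_addressed(b"recipe v1, reordered")
assert content_addressed(b"same output") == content_addressed(b"same output")
```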

------
anonbanker
Someone should do something similar with Gentoo's portage, because the
potential of IPFS could lead to amazing things, like verified pre-compiled
-march=native builds for every architecture Gentoo supports.

~~~
taktoa
For a while, I have been interested in the idea of modifying the NixOS stdenv
(standard build environment) to use a compiler that emits LLVM bitcode, and
then having a function that takes any derivation to an equivalent derivation
containing the result of running the LLVM IR through the specializer for your
architecture. This would mean that you can share a binary cache with others,
but still get `-march=native` performance. There's also some pretty
interesting ideas along these lines wrt. randomly permuting instructions to
prevent ROP attacks (you could even implement that as yet another package ->
package function, so that you don't have to do the _full_ set of LLVM
optimizations for every package at boot time).

