
Hashpipe – Pipe iff the hash matches - _prometheus
http://jbenet.github.io/hashpipe/
======
mpeg
I think this is a topic worth raising, I spend so much time auditing the
"magic" in open source projects to try and find out how they work behind the
scenes, and whether they are doing anything to protect me.

For example:

Pip (python package manager) tells you to run a "secure" script off the
internet [0], where the only real protection is being behind SSL. But, surely
I can just check the code of that script? Oh wait, it's embedding a zip file
inside the python script, so I need to manually unzip it and look into it
before I can start to consider it safe.

I like what CoreOS does, they don't even provide SSL download links for their
isos, it's all plain text http, so they stress that you need to verify the
image signature against their GPG signing key which is distributed via github.

[0] [https://bootstrap.pypa.io/get-pip.py](https://bootstrap.pypa.io/get-pip.py)

~~~
hueving
>so they stress that you need to verify the image signature against their GPG
signing key which is distributed via github.

If someone has the ability to MiTM HTTPS connections, they could just as
easily MiTM the victim's connection to github to return a different GPG
signing key. Also, the protection requires active checking rather than coming
'for free' like it does by downloading via HTTPS, so 95%+ of the users won't
bother and won't have any protection.

I think CoreOS took a big step backwards here. Just offer the download links
via HTTPS in addition to the GPG key. That way people that don't bother with
GPG have protection.

~~~
mpeg
SSL does nothing to guarantee the integrity of code you download and run off
the internet.

I'm not worried about MITM attacks; SSL provides adequate protection against
those in most cases. But if someone gains access to the coreos systems, SSL
would not help.

In this case, I have a GPG key saved for the coreos image signing, which is
hopefully done offline. I got that key through a third party I place a certain
amount of trust in (github), and I can always verify it on other channels.

~~~
hueving
But did you verify it on other channels? Or did you do like 99.9% of users and
assume nobody would be attacking you? My point is mainly that gpg is such a
usability nightmare that it's effectively a broken security model.

------
pritambaral
I think this one isn't too far from usability:

curl -o filename url && sha256sum -c <(echo "PRECOMPUTEDHASH  filename") && sh
filename

And works without requiring installation of third-party tools too!

~~~
comex
This is non-portable, as OS X has no sha256sum out of the box. But it does
have shasum from Perl, and Linux distros typically come with Perl, so 'shasum
-ba256' should work...

I wonder if there is any _concise_ way to do this without needing to save a
file to disk. I can't think of one, as it requires splitting the input in two,
which bash can do with >(command) but not sequentially or with the ability to
communicate from the subshell to the outer one.

~~~
tbrownaw
You need to buffer the entire input _somewhere_.

You don't get your hash without consuming the entire input.

You can't start feeding the input to the shell, until you verify the hash.

You don't want to download it twice, in case it comes back different the
second time (or in case your network is slow).

At the point that you are verifying the hash, the entire file must exist on
your system _somewhere_. Disk, memory, OCR-friendly printout, whatever.

Buffering to memory can only work if the input is "small", whatever that
means. And pipelines aren't meant to do this, so you'll have to do something a
bit odd (ie, confusing to anyone trying to understand your code) to make it
work.

So, buffer to disk.

.

Or, use Perl. Perl solves everything.

~~~
e12e
> Buffering to memory can only work if the input is "small", whatever that
> means.

Well, increasingly it means "up to 8GB or more". Or put another way, RAM is
increasing faster than network speed (or the speed of light for that
matter...). So, I'd say that, yes, for many things you'd want to download from
the internet, caching in RAM is absolutely an option?

------
dcposch
I don't see how this gives additional security. When you run

    curl https://project.com/script.sh | sh

...you're relying on three things:

1. That the people running the project are trustworthy

2. That the server hasn't been compromised

3. That the CA system will ensure you're talking to the correct server

(I can think of recent news stories where each of those were violated.)

If, instead, you go to [https://project.com](https://project.com), read the
instructions, and paste in the following command...

    curl https://project.com/script.sh | hashpipe <somehash> | sh

...then you're relying on those same three things! Someone who wants to serve a
modified version of `script.sh` just has to serve modified instructions as
well. You also have a new requirement: you have to get a trusted install of
hashpipe first.

~~~
vincentclair
You add an extra layer of security if you receive the hash over a different
channel, such as encrypted messaging.

That means that even if the CA system is broken, you only execute the
intended contents.

~~~
ZoFreX
> That means that even if the CA system is broken, you only execute the
> intended contents.

No it doesn't - how are you installing hashpipe? ;)

------
vortico
Deployment one-liners with hashpipe will only work if hashpipe is installed,
which is just as difficult for users to install properly as the software
itself. Then you'd need something like this:

> Simply install using `curl hashpi.pe | bash`

~~~
RegEx
This is a really, really interesting point!

In addition to your point (and perhaps it has already been made elsewhere in
the thread), the type of people who would install hashpipe are exactly the
kind of people who would avoid the practice entirely.

Perhaps there's some mechanism for convincing people to use the tool, and if
that mechanism ends up being easier than convincing people to stop piping
directly to bash, it sounds like it would be worth pursuing :)

~~~
_prometheus
yeah, this is just a stopgap for me. in the end i want:

      ipfs daemon --mount &
      sh /ipfs/QmTpnQL97XEHmyt54mgEwf5BN8gJWvw4sGgwQzjqtBwLX6

(this actually works, btw :)

~~~
e12e
Thanks for pimping ipfs -- I hadn't heard of it before. Initial discussion on
hn:

[https://news.ycombinator.com/item?id=8069836](https://news.ycombinator.com/item?id=8069836)

github(?): [https://github.com/ipfs/go-ipfs](https://github.com/ipfs/go-ipfs)

Recent re-submission of new(?) home page:
[https://news.ycombinator.com/item?id=9321209](https://news.ycombinator.com/item?id=9321209)

------
quicksnap
Since the main use case for this utility is verifying network shell scripts,
it would be interesting to see a query param convention, so we could use a
tool such as:

> hashcurl http://load.this/script?hash=QmUJPTFZnR2CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8 | sh

~~~
gojomo
When the just-over-the-next-hilltop promised-land nirvana of content-centric
networking arrives, the hash will be enough to locate & download the content –
so you shouldn't even need a URL:

      $ hashcurl QmUJPTFZnR2CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8 | sh

Maybe it's even a special filesystem path, that contains (but does not list)
everything-that's-nameable-and-findable:

      $ sh /everything/QmUJPTFZnR2CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8

(BTW, personally not a fan of the opaque 'multihash' format, which obscures
the algorithm-in-use to save a few characters.)

~~~
_prometheus
this already works. install ipfs:
[http://ipfs.io/docs/install](http://ipfs.io/docs/install) then:

        ipfs init
        ipfs daemon &
        sleep 20  # sorry this will go away
        ipfs mount
        sh /ipfs/QmTpnQL97XEHmyt54mgEwf5BN8gJWvw4sGgwQzjqtBwLX6

you can see it on the web at:
[http://gateway.ipfs.io/ipfs/QmTpnQL97XEHmyt54mgEwf5BN8gJWvw4...](http://gateway.ipfs.io/ipfs/QmTpnQL97XEHmyt54mgEwf5BN8gJWvw4sGgwQzjqtBwLX6)

~~~
gojomo
That's great! I hope that IPFS, or something of its ilk, can be the promised-
land to which I alluded.

If I were to install it, how hard/breaking would it be to change the access-
path to something more grandiosely descriptive like '/everything/'?

~~~
whyrusleeping
it's a simple matter of changing the ipfs config file

~~~
_prometheus
<3

------
tlrobinson
Keep in mind whatever script you run through this could download additional
unverified code.

~~~
_prometheus
Great point!! -- \o read this everyone o/ \--

Please please dont use hashpipe thinking you'll be super safe about
everything. It only raises the bar a bit! It solves my biggest gripe with most
`curl <url> | sh` things, which is that any MITM can own my machines without
compromising the origin http servers.

(of course, if the HTTP server + page you got the checksum from is owned too--
good luck!).

------
aablkn
You might want to `set -o pipefail` in your bash, because a failing process
does not stop things from getting piped:

`echo OK | false | echo OK2` --> this command returns a zero exit code even
though `false` returns a non-zero exit code. If you do `set -o pipefail`, the
entire pipe will fail with the non-zero exit code (of `false`).
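
A quick way to see the difference, using `false | true` as a stand-in for a
failing stage feeding a succeeding one:

```shell
# Without pipefail, a pipeline's exit status is the last command's:
bash -c 'false | true; echo $?'                    # prints 0
# With pipefail, any failing stage fails the whole pipeline:
bash -c 'set -o pipefail; false | true; echo $?'   # prints 1
```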

~~~
_prometheus
great point. if you have a good idea of how hashpipe should do it better pls
comment at:
[https://github.com/jbenet/hashpipe/issues](https://github.com/jbenet/hashpipe/issues)

at least hashpipe stops the output, so this would not run the bad code.
hashpipe would stop it:

      cat evil | hashpipe <some-other-hash> | sh

without pipefail, it would still exit 0, but at least the evil would be
contained.

------
tyho
Be warned that this reads everything until EOF into memory; doing something
like:

        hashpipe QmUJPTFZnR2CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8 < large_file

will produce unexpected behaviour.

~~~
_prometheus
yeah, hashes are decided by every single bit including the last one, and not a
single bit should be output until the hash matches. It currently buffers
everything in memory, but might do this:
[https://github.com/jbenet/hashpipe/issues/1](https://github.com/jbenet/hashpipe/issues/1)

(some settings don't have disk, though).

hashpipe is intended for typical executable use cases (usually <50MB).

~~~
doomrobo
Loading everything into memory at once shouldn't be necessary to produce a
hash of the entire input. All of the hash functions currently supported allow
for incremental hashing. That means you can hash in blocks instead of all at
once.
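
A quick illustration: the digest depends only on the bytes, not on how they
are chunked, which is what lets a streaming hasher process fixed-size blocks
in constant memory:

```shell
# sha256("abc") is the same whether the bytes arrive at once or in pieces.
printf 'abc' | sha256sum
{ printf 'a'; printf 'bc'; } | sha256sum
# both print ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad  -
```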

~~~
unsoundInput
The input still needs to be cached for eventual output in case the hash
matches I assume.

~~~
stouset
More importantly, so nothing is output if the hash _doesn 't_ match.

------
jrlocke
A polarizing name, if perhaps less so amongst programmers.

~~~
_prometheus
yeah, i was going to call it `hashcat` -- maybe i should do that. or
`pipehash`.

~~~
nibbler
already have a hashcat purring on my computer...

[https://hashcat.net/oclhashcat/](https://hashcat.net/oclhashcat/)

~~~
_prometheus
all the good names are taken. i guess it's time we give up on not clashing.

------
bfg
If the code url and hash are provided at the same place, why wouldn't an
attacker just MitM that and switch them both? What does this add?

Also anyone who is not serving code over https should fix that immediately.

------
moe
I would say the author now morally owes the world an equally convenient PGP-
signer/verifier.

~~~
_prometheus
you're right
[https://github.com/jbenet/hashpipe/issues/3](https://github.com/jbenet/hashpipe/issues/3)

------
jakeogh
Cool. The use case that comes to mind is knowing you'll get the file you
think you're getting when you look it up in some local or remote hash-indexed
storage scheme.

It would be nice if we had (it prob exists already) a standard block-based
scheme so files larger than RAM could be handled without writing to disk.
Making something like a .torrent for every file, using that as the per-block
checksum, and then checking the hash of the concatenated checksums could do
it. Right? I'm sure there is a real name for that but idk it.
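
The construction described here (a hash over concatenated per-block hashes)
is essentially a hash list, which is roughly how BitTorrent's per-piece
hashes work. A rough sketch, with `bigfile` as a placeholder input:

```shell
# Hash list sketch: split into fixed-size blocks, hash each block, then
# hash the concatenated block hashes to get one root digest. Only one
# block needs to be hashed at a time.
split -b 1048576 bigfile block.
sha256sum block.* | cut -d' ' -f1 > blockhashes
sha256sum blockhashes   # a single root hash identifying the whole file
```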

Really this seems like something that the FS should be asked to do. ZFS seems
to be close, but I haven't found out how to address a block by its hash yet.

------
wyc
Great, thank you for this software! Any idea if there are trusted 3rd party
hashes for popular install scripts?

If a website wants you to:

curl [http://sketchyurl.com/script.sh](http://sketchyurl.com/script.sh) |
hashpipe PRECOMPUTEDHASH | sh

it might be even worse, giving you only the facade of additional security.

~~~
_prometheus
That's a bigger problem, because we essentially need full PKI. My preferred
solution is via [http://ipfs.io](http://ipfs.io) -- but i may be a bit biased
:)

I think we can get halfway there with a "signed hash" construction, but yeah--
PKI...

------
latchkey
I love this idea, as it has bothered me for a while now that homebrew's
initial install just points to a script on github.

That said, now we just need someone to hack gobuilder.me and replace the
binary distribution of hashpipe with something that always returns the input.

~~~
_prometheus
Great point!! working towards fixing that (the general problem of safe
execution of binaries). signed releases will help.

I will release a self-describing signed package for this soon.

------
nawitus
How does it protect from man-in-the-middle attacks? The man in the middle can
simply replace the hash. If there's an additional communication channel to
provide the hash you could simply provide the whole command with it.

~~~
_prometheus
not everything is hosted in the same place. most binaries use CDNs different
from the websites. e.g. github + s3.

(see my other comments on this page-- hashpipe is only meant to raise the bar
a little)

------
humanarity
This is cool. What was your thinking about potentially hashing whole commands
(and even their executables) in addition to their parameters? You could be the
hash (with as you say PKI) gateway for all execution... :)

------
LukeHoersten
I implemented multihash in Haskell:
[http://hackage.haskell.org/package/multihash](http://hackage.haskell.org/package/multihash)

------
ForHackernews
Maybe OSX developers should just stop asking people to pipe random scripts
into their shell. This is a nice idea with a clever name, but it's kind of a
false sense of security.

------
jamiesonbecker
Hopefully people will only ever pipe from HTTPS (PLEASE!), but this is still
a great idea and a big step forward!

------
cheald
I am rather amused that the author provides precompiled binaries without hash
checksums. Heh.

------
languagehacker
Getting this in before the gratuitous negativity brigade starts hammering
down:

This is an implementation of a stupid joke looking for a problem. If someone
actually needed this in their toolbelt, they could use perl or even
(amazingly) bash to handle this problem.

I would personally not name anything I worked on after a song from one of
Weezer's shittiest albums, but that's just me.

~~~
_prometheus
many execution contexts don't have perl. i have some <10MB VMs in mind.

