There's a long tradition of using this kind of approach in capability systems. If you do it right, you can have globally unique identifiers that are not human readable (probably, some sort of public key which is also routable), and then let humans assign local "petnames" (labels) to them in a way that is actually pretty usable, if everyone is using software designed for it.
Problems come when, say, you want to put your web address on a billboard. In an ideal world the billboard would somehow transmit the advertiser's public key to the viewer's device so that the viewer could then look them up, but obviously we don't have any particular tech for doing that. So instead we create this whole complex system by which people can register human-readable identities, which in turn requires a centralized name service (yes, DNS is centralized), certificate authorities (ugh), etc.
Similarly, whenever you tell your friend about some third-party entity (another person, company, whatever), you should be giving them the public key of that entity. But that's not really practical. We need some sort of brain implant for this. :)
Augmented-reality tech (whatever comes next in the line of tech Google Glass is in) would presumably do this as one of its primary use-cases, though. As soon as there's a reader-device that knows how to passively scan for and "absorb" encountered pubkeys into your keychain, a "signed link emitter in hybrid QR-code/NFC format" would become as commonplace as printed URLs are today, because they'd actually be useful over-and-above URLs.
As already pointed out by ivoras here, magnet: links are close to exactly what you're looking for. It also reminds me of, for example, the Freenet CHK/SSK/USK system, or several other things with similar designs.
You should also not use SHA-1 hashes for uniqueness or checksumming anymore: they're too weak.
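For anyone who wants the concrete version: a minimal sketch (Python, names mine) of deriving a content address from a file with SHA-256 rather than SHA-1:

    import hashlib

    def content_address(path, algo="sha256"):
        """Hash a file in chunks and return a URN-style content address."""
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return "urn:%s:%s" % (algo, h.hexdigest())

    # e.g. content_address("jquery-2.1.3.min.js")
    # -> "urn:sha256:<64 hex chars>"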
This would avoid the inefficient situation we have at present, where the same jQuery script is fragmented across dozens of CDNs, causing a new request each time even when previous instances are already cached by the browser.
You almost never point directly to a dependency as a standalone file.
Doing so would mean 15-30 requests per webapp, and since browsers only allow 5-6 parallel requests per origin, it would slow down the page considerably.
I think we've been doing some very silly things with data over the last few decades related to the UPDATE statement.
Why not just consider all published data immutable? Look at book publishing as an analogy. There are multiple printings of a book. If there are corrections or updates, they don't retroactively affect the previous printings. Why can't we look at data published on the Internet in the same manner?
If you want to update something you'll have to publish a brand new version. This also mirrors versioning in software libraries.
CRDTs, immutable data structures, eventually consistent data... from UI programming to big data, these are more than just eternally recurring trends. We're learning some very lasting things about how computers should deal with data.
With respect to web content, what are you proposing? If I go to site.com/product1data and you update the price, we certainly don't want the URL to change. In such situations, how would a versioned system add any value, and how would it be exposed in the UI?
This applies to the names and categorizations of things as well.
As for updates, imagine you're a shopkeeper and in the morning you publish a table of prices, titles, and content-addressable hashes.
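To make that concrete, the morning price list could be nothing more than a small signed document mapping titles and prices to content hashes; this is just a sketch, and every field name and value here is invented:

    price_list = {
        "published": "2015-03-02T08:00:00Z",
        "publisher": "ed25519:8c1f...",          # the shopkeeper's public key (placeholder)
        "items": [
            {"title": "Widget, deluxe", "price_usd": "19.99",
             "content": "urn:sha256:5f1d..."},   # hash of the product description
            {"title": "Widget, basic",  "price_usd": "9.99",
             "content": "urn:sha256:a07b..."},
        ],
    }
    # The list itself is published immutably: its own hash is the day's "edition",
    # and tomorrow's prices are simply a new edition with a new hash.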
Now, for the whole naming-of-the-things... take your pick: ICANN or Namecoin-like.
Claim ownership of a top-level name and then you can point it at whatever you want.
It would also be possible to store and distribute new versions as patches to previous versions by linking between the patch set, old version, and new version.
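A patch record in such a scheme could be as small as an immutable triple linking the three hashes; again a sketch, with made-up field names:

    patch_record = {
        "old":   "urn:sha256:1111...",  # hash of the previous version
        "new":   "urn:sha256:2222...",  # hash of the full new version
        "patch": "urn:sha256:3333...",  # hash of the diff that turns old into new
    }
    # A client holding "old" can fetch just the (usually much smaller) patch,
    # apply it, and verify the result against "new" before trusting it.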
Or perhaps I don't understand "Why can't we look at data published on the Internet in the same manner".
In terms of commerce this is somewhat analogous to bait-and-switch.
In this thread I'm mainly referring to digital content as an end in itself, not to digital content as a reference to physical products.
As for referencing physical products, be they automobiles or paintings, deriving a direct cryptographic hash isn't possible, but GUIDs are. Cars already have serial numbers. Paintings have signed certificates from experts.
If I'm on a website buying a used car I definitely want the price list to be linking to the GUID, that is, to a reference of the object itself.
(I was going to give your team a shout-out if you hadn't beat me to it!)
As a more specialized project, you'd expect IPFS to be better at the part of the problem it solves, for the same reason a sprinter sprints faster than a decathlete. (Not to mention that JB is awesome.) On the other hand, not having an identity model other than public keys (or, to put it differently, not trying to square Zooko's triangle) imposes certain problems on IPFS that urbit doesn't have. For instance, with routable identities, you don't need a DHT, and so the idea of a hash-addressed namespace is less interesting.
That said, it would be easy to imagine a world in which urbit either could talk to IPFS, or even layered its own filesystem (which has a fairly ordinary git structure under the hood) over IPFS. Like I said, it's a cool project.
The chosen-name can be a UUID, but doesn't have to be something so semantically-opaque. It's more likely to be a tree-namespace like traditional domain-centric URLs. The publishers-key replaces the role of the domain-name as the 'authority' portion of the URL.
(Once upon a time, I suggested 'kau:' – for Keyed AUthority – as a URI-scheme for such URLs in a location/protocol-oblivious web: http://zgp.org/pipermail/p2p-hackers/2002-July/000719.html )
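Very roughly, such a URL could look like the sketch below, with a fingerprint of the publisher's key sitting in the authority slot where a hostname would normally go (illustrative only; this isn't the exact syntax from that old proposal):

    from urllib.parse import urlsplit

    # 'kau' = Keyed AUthority: the "host" is a fingerprint of the publisher's
    # public key, and the path is whatever tree-namespace the publisher chooses.
    example = "kau://b5bb9d8014a0f9b1d61e21e796d78dcc/articles/2015/content-names"

    parts = urlsplit(example)
    print(parts.netloc)  # publisher-key fingerprint (the real 'authority')
    print(parts.path)    # publisher-chosen name within that authority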
- First you create a "random" blob that has an identity (called a permanode), which you sign with your private key. It acts as an "anchor" you can link to.
- Then you sign a piece of json that references the permanode and the content you wish; it effectively means "I, owner of key XXX, claim that permanode called 12345 now references value ABCD".
- To get the content of a permanode, you search all modifications and merge them to obtain the final value
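Here's a toy version of that flow; this is my own simplification rather than Camlistore's actual schema, and signing is reduced to a stub:

    import hashlib, json

    def blob_ref(data: bytes) -> str:
        return "sha256-" + hashlib.sha256(data).hexdigest()

    def sign(obj: dict, key: str) -> dict:
        # Placeholder: a real system would attach an actual signature here.
        return dict(obj, signer=key)

    # 1. A permanode: a random, signed blob whose only job is to have an identity.
    permanode = sign({"type": "permanode", "random": "d41d8cd98f00"}, key="mykey")
    permanode_ref = blob_ref(json.dumps(permanode, sort_keys=True).encode())

    # 2. Claims: signed statements that the permanode now points at some content.
    claims = [
        sign({"type": "set-content", "permanode": permanode_ref,
              "value": "sha256-aaaa...", "at": "2015-03-01T10:00:00Z"}, key="mykey"),
        sign({"type": "set-content", "permanode": permanode_ref,
              "value": "sha256-bbbb...", "at": "2015-03-02T09:00:00Z"}, key="mykey"),
    ]

    # 3. Resolution: gather all claims about the permanode and "merge" them
    #    (here, merging just means taking the newest one).
    current = max(claims, key=lambda c: c["at"])["value"]
    print(current)  # -> "sha256-bbbb..."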
There should be a way for people to search a particular server by name without being in cahoots with every other party who wants to get involved in it.
The link to The Mess We're In he refers to ( https://www.youtube.com/watch?v=lKXe3HUG2l4 ) is a fun and accessible talk he gave at the Strange Loop conference last year.
I don't understand this statement. Dispensing with encryption makes the request and returned content susceptible to passive observation. Saying that this approach is resistant to active manipulation assumes the existence of some kind of web of trust between hashed documents. At some point you're going to have to click on a hash without knowing its provenance. How do you know you're not being phished?
Also, consider: what benefit do you get by verifying the entire file? Some applications may wish to read the first few bytes to ensure the file is openable by the application, but in the end the first N bytes can fool you if the last M bytes are malicious. So you would open the file in a sandbox to minimize impact.
Tomorrow everyone discovers that MD5 has been compromised by some organisation with a lot of money (obviously this happened long ago).
So the author needs to re-publish it as sha:adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
And all my links are suddenly broken and I can't provide a mapping from old to new.
And then someone breaks SHA...
All I'm saying is that content-addressed hashing doesn't obviate the need for secure transport and trust.
But SHA2-256 and up, and many other hashes, are still safe for this purpose and likely to remain so for decades – and perhaps indefinitely.
So within the lifetime of an application or even a person, secure-hash-naming does obviate the need for secure transport and trust. Also note that 'secure' transport and trust, if dependent on things like SSL/TLS/PKI, also relies on the collision-resistance of secure hash functions – in some cases even weaker hash functions than anyone would consider for content-naming.
(For the extremely paranoid, using pairs of hash functions that won't be broken simultaneously, and assuming some sort of reliable historical-record/secure-timestamping is possible, mappings can be robust against individual hash breaks and refreshed, relay-race-baton-style, indefinitely.)
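A crude sketch of that pair-of-hashes idea, just to show the shape of it (SHA-256 plus SHA3-256 here, my choice of pair):

    import hashlib

    def dual_name(data: bytes) -> str:
        """Name content under two unrelated hash functions at once, so a break
        of either one alone doesn't let an attacker forge the pair."""
        a = hashlib.sha256(data).hexdigest()
        b = hashlib.sha3_256(data).hexdigest()
        return "urn:sha256+sha3:%s:%s" % (a, b)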
So even that hypothetical example – with an early, old, and ultimately flawed secure hash – reveals hash-based naming as more robust than the alternatives.
And in practice, hash-names are as strong as or stronger than the implied alternative of "trust by source" – because identification of the source is, under the covers, also reliant on secure hashes… plus other systems that can independently fail.
We have experience now with how secure hash functions weaken and fail. It's happened for a few once-trusted hashes, with warning, slowly over decades. And as a result, the current recommended secure hashes are much improved – their collision-resistance could outlive everyone here.
Compare that to the rate of surprise compromises in SSL libraries or the PKI/CA infrastructure – several a year. Or the fact that SSL websites were still offering sessions bootstrapped from MD5-based PKI certificates after MD5 collisions were demonstrated.
If you use a tree hash, the side sending you content can even include compact proofs that what they're sending you is a legitimate part of a full-file with the desired final hash.
So for example, if receiving a 10GB file, you don't have to get all 10GB before learning any particular relayer is a dishonest node.
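A minimal sketch of the kind of check involved, using a toy binary Merkle tree; real formats (THEX, BitTorrent's merkle extensions, etc.) differ in the details:

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def verify_chunk(chunk: bytes, proof, root: bytes) -> bool:
        """proof is a list of (sibling_hash, side) pairs from leaf to root,
        where side is 'L' if the sibling sits on the left."""
        node = h(chunk)
        for sibling, side in proof:
            node = h(sibling + node) if side == "L" else h(node + sibling)
        return node == root

    # A relayer sends each chunk together with its proof; the receiver can
    # reject a bogus chunk immediately instead of waiting for all 10GB.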
1. lets you talk about individual objects by hash-based URNs;
2. or lets a publisher insert versioned streams of objects using document-signing with deterministic subkeying‡; gives the stream as a whole a UUID-based URN; and then lets clients query for either the latest, or for any fixed version-index of a given object-stream;
3. and which does a sort of pull-based store-and-forward of content—every node acting as a caching proxy for every other node.
I'm really surprised nobody has just built this trimmed-down design and called it a "distributed object storage mesh network" or somesuch. A public instance of it would beat the Bittorrent DHT at its own game; and private instances of it would be competitive with systems like Riak CS.
† Which is perfectly sensible even for Freenet itself; you could always just run your Freenet node as a Tor hidden service, now that both exist. Tor cleanly encapsulates all the problems of anonymous packet delivery away; the DHT can then just be a DHT.
‡ This is similar to Bitcoin's BIP0032 proposal, but the root keys are public keys and are available in the same object-space as the transactions. Given that you have the root public key, you can both 1. prove that all the documents were signed with keys derived from this key, and also 2. figure out what the "nonce" added to the root key to create the subkey was in each case. If the inserting client agrees to use a monotonically-increasing counter for nonces, then the subkey-signed documents are orderable once you've recovered their subkeys.
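In the spirit of that trimmed-down design, a bare-bones node could keep just two maps: an immutable one keyed by hash (point 1) and one for publisher streams keyed by (stream, version) (point 2). Sketch only; signature checking and subkey recovery are elided:

    import hashlib, uuid

    class Node:
        def __init__(self):
            self.objects = {}   # "urn:sha256:..." -> bytes (immutable)
            self.streams = {}   # (stream_urn, version) -> object urn

        def put_object(self, data: bytes) -> str:
            urn = "urn:sha256:" + hashlib.sha256(data).hexdigest()
            self.objects[urn] = data
            return urn

        def append_to_stream(self, stream_urn: str, version: int, obj_urn: str):
            # A real node would demand a signature from the stream's root key here.
            self.streams[(stream_urn, version)] = obj_urn

        def get(self, stream_urn: str, version=None) -> bytes:
            if version is None:  # "latest"
                version = max(v for (s, v) in self.streams if s == stream_urn)
            return self.objects[self.streams[(stream_urn, version)]]

    node = Node()
    stream = "urn:uuid:" + str(uuid.uuid4())
    node.append_to_stream(stream, 1, node.put_object(b"draft"))
    node.append_to_stream(stream, 2, node.put_object(b"final"))
    assert node.get(stream) == b"final"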
You mean something like this?
The other thing in this area to watch is ipfs.
Things like git hashes and the bitcoin blockchain are already giving us pieces that head in this direction.
One thing I would not like to see would be total lockdown of the worldwide body of documents, where every digital creation is perfectly and irrefutably tagged and tracked back to its creator and on every step along the way. I'm not sure what all the bad consequences of this could be, but it doesn't give me a warm and fuzzy feeling.
For a content website, something like YouTube's 11-character IDs is pretty good. It works for them, though it will surely become a problem at some point, but that might be 5+ years out. And if they add one more character, they get many more combinations.
That's assuming this is about URLs and not files. Can anyone correct me if I'm wrong?
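For what it's worth, the numbers are roomier than they might sound, assuming YouTube's roughly 64-character URL-safe alphabet:

    # Assuming an alphabet of 64 URL-safe characters:
    print(64 ** 11)   # 73786976294838206464  (~7.4e19 possible 11-char IDs)
    print(64 ** 12)   # 64x that again with one extra character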
Joe Armstrong's proposal seems to boil down to this:
- Identities as UUIDs
- State as the payload of UUID URIs
- Values as SHA-1 URIs
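A toy rendering of that split, with SHA-256 standing in for SHA-1 given the concerns raised elsewhere in the thread:

    import hashlib, uuid

    values = {}      # value URI    -> immutable bytes
    identities = {}  # identity URI -> value URI of its current state

    def publish_value(data: bytes) -> str:
        uri = "sha256:" + hashlib.sha256(data).hexdigest()
        values[uri] = data            # values never change once published
        return uri

    def new_identity() -> str:
        return "uuid:" + str(uuid.uuid4())

    def set_state(identity_uri: str, value_uri: str):
        identities[identity_uri] = value_uri   # only the pointer is mutable

    me = new_identity()
    set_state(me, publish_value(b"hello, world"))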
previous discussion: https://news.ycombinator.com/item?id=6996398
It's a very interesting project, worth a look
Objects with identity, like a buildpack, are given a UUID. That UUID is stable, but the hash changes on disk depending on the exact file uploaded (because you can replace buildpacks).
File paths include both UUID and hash.
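So a stored path might look something like this; the exact layout is my guess, but the point is that the UUID names the thing while the hash names the bytes:

    # Hypothetical path layout (not necessarily the platform's actual scheme):
    #   <uuid>   = stable identity of the buildpack
    #   <sha256> = hash of the exact file currently uploaded
    path = ("buildpacks/1c6b7a52-3a8e-4f1d-9c2b-0d5e6f7a8b9c/"
            "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08/"
            "ruby_buildpack.zip")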
Edit: If you're downvoting, please explain. This is more or less factually correct.
"openssl speed" reports RIPEMD-160 being a few percent faster than SHA256 on my computer.
What makes you say it's "less resistant to collisions"? I don't think there are any serious cryptanalytical attacks on RIPEMD-160.
RIPEMD has a 256-bit variant, but it hasn't received enough scrutiny.
We don't care about the likelihood of producing some random collision; we care about the likelihood of producing some specific collision (which is not vulnerable to the birthday attack). http://en.wikipedia.org/wiki/Preimage_attack
The reason SHA-1 is considered insufficient is that it is cryptographically broken https://marc-stevens.nl/research/papers/PhD%20Thesis%20Marc%...
I.e. the chance of finding a collision is substantially higher than would be expected from an ideal PRF.
As far as I know, there are no serious cryptanalytical attacks on RIPEMD-160, and 160 bits is more than sufficient for cryptographically unique identifiers.
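The back-of-envelope numbers behind that, for a generic (unbroken) n-bit hash:

    # Generic attack costs against an unbroken n-bit hash:
    n = 160
    collision_work = 2 ** (n // 2)   # birthday bound: ~2^80 for RIPEMD-160
    preimage_work  = 2 ** n          # ~2^160 to hit one specific, given hash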
- Popular/important websites refer to my library as hashname://..., trusting that this refers to the version of the library that they audited.
- I can then create a new, malicious version of the library that has the same hash and use it to infect popular sites.
Allowing collisions breaks the immutability requirement, which impacts security in many important cases.
The reason SHA-1 is insecure is that it is cryptographically broken, and the same attack takes less than 2^60 attempts.
All hashes have collisions. There are no cryptographic hashes that can promise zero collisions.
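Right: by pigeonhole they must exist; the question is whether anyone can find one. A rough sense of the odds of an accidental collision, assuming a 256-bit hash and the birthday approximation:

    # Infinitely many inputs map onto finitely many digests, so collisions
    # must exist. But the chance of stumbling on one by accident:
    N = 10 ** 15                     # a quadrillion documents
    p = N * (N - 1) / 2 / 2 ** 256
    print(p)                         # ~4.3e-48, i.e. effectively zero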