Hacker News
Wikipedia-IPFS: An exploration to host Wikipedia in IPFS (github.com/santhoshtr)
143 points by bpierre on May 9, 2020 | 48 comments

> I think these issues are going to continue until some big use cases emerge and the protocol becomes mainstream.

Nailed it. Remember Freenet? https://web.archive.org/web/20130908073158/http://www.thegua...

There's always a chance that Freenet is the IBM Simon or the Apple Newton and IPFS is the iPhone. Or the VFX1 vs. the Oculus. But no. While surely the tech has evolved from the shitty Java app that Freenet was, all those other examples had immediately obvious use cases and just needed the tech to catch up, which it did.

I remember discussions of Freenet on "news for nerds" forums in the early millennium, and it was obvious that that particular technology was doomed and would never be widely adopted. You see, people were reluctant to run a Freenet instance because that would involve hosting and passing on content from other users as well. While all Freenet content was encrypted, people were horrified by the thought that if the crypto algorithm were ever broken, they would suddenly be revealed to have been hosting child pornography on their computers (unwittingly, but still). Thus, distributed platforms today must allow people to pick and choose what content they wish to host.

It hasn't even been necessary to break the encryption. Police can just run Freenet nodes that serve child porn, and then log the IPs of peers that download chunks of those files. While it's true that Freenet does a good job of obscuring data routing, arguably providing plausible deniability, in practice it's nontrivial to convince juries of that.

The question being: how is that supposed to be any different from any other service operated by anybody else?

If a pedophile signs up for AT&T, is AT&T not routing their illegal data? Or Starbucks if they use the WiFi there? If they sign up for AWS or some other hosting provider, isn't the provider hosting their illegal data?

The obvious solution is to make actual knowledge a prerequisite to liability. The pedophile who knows what it is goes to jail, the random person who is only providing a generic service to the public does not.

In reality, police identify Freenet users that have "downloaded" child pornography. And then they arrest those people, impound all their gear, and file charges.

So defendants now face the challenge of convincing juries that they were just relaying data to other users. And that's not trivial. Or at least, it's expensive to hire expert witnesses. So many just accept some plea bargain.

These are the stories fearmongers tell because it happened to one or two people many years ago.

The same thing has happened to hosting providers. Sometimes police are malicious or incompetent and screw up the lives of innocent people. But that doesn't have anything to do with Freenet; that can even happen to you driving down the street, when some dirty cop needs a bust and decides to pull over a random car and plant drugs in it.

The answer isn't to never do anything, it's to fix the systems that oppress innocent people for no good reason. And in the meantime you can't live your life in fear of low-probability oppression by defective authority figures.

> These are the stories fearmongers tell because it happened to one or two people many years ago.

Here's a recent one:[0]

> Gibson’s arrest grew out of an ongoing probe of the “Freenet” — an online network that allows users to anonymously share images, chat on message boards and access sites, the probable cause statement says.

0) https://eu.courierpostonline.com/story/news/2020/02/09/craig...

> also had some 900 images of suspected child pornography on the hard drive, says a criminal complaint.

So not an example of someone being arrested just for operating a node, then.

No, but the article implies that he was identified by the fact that he was running Freenet.

By all means, then, go for it.

Me, I'll run my Freenet nodes in anonymously leased VPS, and access them via Tor.

But actually, I won't, because there's not much of interest there.

> Me, I'll run my Freenet nodes in anonymously leased VPS, and access them via Tor.

Some would say that's the way to do everything. But it doesn't exactly make it easy for the average Joe.

And that's kind of the point too. Police behave badly more often when there is a network with only 500 people on it, because they don't really understand it and nobody is paying attention. But when there are only 500 people on it, those 500 people can all be using five proxies and blockchains and credit default swaps and whatever else.

Meanwhile by the time it's popular enough that Joe wants to use it, it's also popular enough that Inspector Clouseau is no longer on the case by and large.

Hey, I respect Freenet hugely. It's almost 20 years old, and it's been ~well maintained throughout. People do hate on Java, but I've had no problems with it, using OpenJDK in Debian.

But I don't buy the argument that it's targeted just because it's too small. No matter how many used Freenet, it would get targeted because it's way too laid back about child porn. Sure there's child porn on Tor onion sites, but it's not so easy to find, since the Hidden Wiki has been cleaned up. On Freenet, however, it's a top level category on one of the featured search sites.

If the same percentage of people used Freenet as use email then it couldn't be "targeted" because targeting implies some kind of special notice, but there's nothing special about using email. Which is true even if email providers exist who are "way too laid back about child porn" etc.

It was also terribly slow.

Indeed, I think it's the slowness which killed it more than anything.

Also, I remember it supported a very limited subset of CSS even for the time, and there was no way to make websites dynamic since there was no JS; that for sure did not help.

>Thus, distributed platforms today must allow people to pick and choose what content they wish to host.

IPFS works more like torrents than FreeNet. If you run an IPFS node, it only hosts content that you pinned on it yourself.

IPFS also has none of the anonymity. So it's a tradeoff.

It seems more likely that IPFS is the GM-NAA I/O to something in the future's OS/360. The concept is cool, but it doesn't seem like it's there yet.

IPFS is waiting for the use case to catch up. If/when we start to colonize Mars, the InterPlanetary File System might actually be useful. (Only being somewhat tongue-in-cheek.)

Small static sites like this one sound like a good use case: https://ipfs.io/ipfs/QmZCGRVJNjVJtEqtS2HwhELszNBSU2XNwuBkFEQ...

Actually, IPFS' network resilience is a great fit for high-latency overlay networks.

I believe that is their point.

Where do you see that?

You're right. I read your comment and thought it was in reference to the thought of an interplanetary network, not realizing you were talking about a present-day use case. Sorry. I've actually heard about IPFS being used by some private cloud/on-prem operators to speed up delivery of container images.

> While surely the tech has evolved from the shitty Java app that Freenet was,

Shitty or not, Freenet did and still does provide anonymity, which IPFS does not.

That's dangerously incorrect. Freenet provides no anonymity. What it provides is "plausible deniability".

All Freenet peers know each other's IP addresses. There's no onion routing. There is the option of running in "darknet" mode, that is, connecting only with known peers, ideally people you know and trust. But that provides no access to data from the global Freenet opennet. For that, at least one darknet peer must risk peering freely with the global opennet.

Although there's no onion routing, there is a very effective method for routing data through peer networks. With that, it's arguably impossible to distinguish those who share and access data from those who merely relay it. However, the "plausible deniability" argument depends on understanding how Freenet routing works.

If you want to check out Freenet, never run a node at home. Use a VPS that you lease and access only via Tor. Ideally, connect to Tor via nested VPN chains. Use cryptocurrency that's been thoroughly anonymized. And set up the Freenet WebGUI as a Tor onion service.

Well, by that definition Tor doesn't provide anonymity either.

A common definition of anonymity is that, given a set of users, it's impossible to determine which element (person) of the set did the action (published a document), or to have a better chance of guessing than guessing at random. I think Freenet mostly provides this property (with plenty of devil in the details).

I agree that Freenet allows publication that's both anonymous and not readily censorable. What it doesn't allow is anonymous access, except in the sense of plausible deniability that depends on highly technical arguments.

Tor actually does provide anonymity through onion routing. It's not effective against global adversaries, and doesn't claim to be. But it's hard to imagine how any low-latency overlay network could resist global adversaries. Chaff, caching and mixing would help, but they aren't that practical or effective unless you accept greater latency.

But your argument was "All Freenet peers know each other's IP addresses". All the nodes in the Tor network also know each other's IP addresses. (I suppose clients aren't really public, especially if you are using an obfuscating bridge.) I don't understand how you are distinguishing between "plausible deniability" and "anonymity". I feel like Freenet and Tor either both have these properties or both don't, depending on what your definitions of those terms are.

That's not to say they have the same threat model or provide the same protections. They are very different systems that have different properties - but at a broad-brush high level they both roughly allow you to publish stuff without people being able to easily track the information back to you under various assumptions.

> But it's hard to imagine how any low-latency overlay network could resist global adversaries.

This is a bit of an aside, but making a scalable low-latency anonymous network is tricky. If you drop the scalability requirement, you can use dining cryptographers.

There's a key distinction between Tor and Freenet. With Freenet, all nodes host content, access content, and relay traffic for other nodes.

But with Tor, only clients (including onion services) host and access content, and they don't relay traffic. Conversely, relays merely relay traffic, and don't host or access content. Also, only entry guards ever see the IP addresses of clients. And they only see encrypted data going to middle relays.

So attacks like the ones police have used against Freenet are impossible, because clients never connect directly to each other. The closest analog is taking over an onion site and then serving malware to users. But that's much harder than running nodes, serving child porn, and logging IPs.

Also, given that relays retain no content, operators have so far managed to escape liability.

I hadn't heard much about IPFS but the way this is built, with the immutability and hash keys etc, sounds very much like FreeNet. That's the first thing I thought of, and predates this by about 20 years. I remember when that promised to be the next big thing and save us from the commercialisation of the internet that was in full swing then.

Unfortunately, that didn't turn out very well. Let's just say the 'fringes of free speech' took the upper hand and really started defining the platform in the public view, to the point of becoming strongly associated with it. Why would you run a node if it was mainly used for horrible stuff? Same as what happened to Tor services, but to a bigger extent.

I really believe in the idea of freedom of speech. But some kind of control is unfortunately needed. Otherwise a platform like this will always spiral out of control and kill itself. I think it's one of those things that sounds great in theory but doesn't work in real life.

Perhaps some kind of user voting could be implemented, but it would be extremely slow, too slow to catch up with new content being published. It's a very hard thing to tackle. Free speech is unfortunately not as black and white as its advocates like to see it.

I think comparisons to FreeNet are misleading, because IPFS doesn't let you publish content to other people's nodes, and it doesn't try to provide any way to browse content in IPFS. It's more like torrents, if torrent applications were tuned to serve content to web browsers quickly.

It seems like every single thread about IPFS on HN has several people asking "what happens if you run an IPFS node and a random person online puts something illegal on it?", and that's just not a thing that happens with IPFS.

Freenet and IPFS are both content-addressable retrieval systems, so they are similar in that way. However, otherwise they are very different, and there have been many other systems based on the idea of addressing things by hash. BitTorrent and its DHT are two very notable systems of that type, and unlike Freenet they are known for their speed. They are also known for people getting sued for piracy. I don't think the comparison between Freenet and IPFS is very illuminating.

> Let's just say the 'fringes of free speech' took the upper hand...

Luckily for you, that blurry line between mainstream and "fringe" is rapidly shifting, and the list of permissible opinions continues to shrink. More and more people will be forced to decide how important guilt by association is to them, and will either make use of the stigmatized tech - or simply shut up.

> But some kind of control is unfortunately needed.

To what end? Are we protecting the children? Are we preventing the rise of Turbo-Hitler?

> Otherwise a platform like this will always spiral out of control and kill itself.

How do you imagine that happens? Shaming from people saying exactly what you have just said, leading to reduced uptake, justifying censorship - because only bad people use the dark web.

Here is a clue: look at what has happened to lesbians, they are already going through this. I know this because I frequent a thoroughly demonized site run by a free-speech absolutist (well, almost). One day a whole bunch of these women showed up to complain about some insane pre-op transsexual who had succeeded in getting them censored on every major platform. These ladies absolutely hated the site and everyone on it, but they had no alternative.

Well, the problem with providing a platform for truly free speech is that there's always someone ready to find that extreme niche that is really not acceptable to even the staunchest free-speech advocate. As you say, the 'almost' is an important thing. As soon as you permit everything, someone crazy will find that niche and start overflowing your platform with it because they have nowhere else to go.

I'm not doing the demonizing, just pointing out that this inevitably happens.

PS: Who's against lesbians??? I'm pretty sure they are protected by law against discrimination, including in the US?

The problem is one of the platforms' own making, because they want the legal protection of being a platform in addition to the commercial benefits of being a publisher. This has been allowed to happen despite the fact that such aggressive "content curation" clearly makes the likes of Twitter and Youtube publishers. That should come at the cost of safe harbor protections. A platform's only duty, with regard to the policing of speech, is in cases of copyright violation and court orders; as you can imagine, adherence to the law would make it difficult to control political narratives and sell laundry detergent ad space. The "almost" aspect is more to do with ambiguities in the laws themselves, as well as untested and conflicting international laws.

The solution is very simple: enforce the law, force the choice between platform and publisher. If there truly is a need to curtail free speech on a legally protected platform - pass a law making that speech illegal. I'd be curious to see how many heads explode when people learn that "hate speech" is not illegal - and couldn't be made so in the US without hilarious consequences.

I'd rather not name the individual in Canada that was targeting women, but I will say this: some biologically born men have found an interesting loophole in both the political arguments and legal protections afforded on the basis of class and identity. Some lesbians were unimpressed with the idea that they had a moral obligation to grant equal sexual preference to both biologically born women and biologically born men who declared themselves to be women. This is something that has been going on for a while now, and nobody is really allowed to talk about it - because "hate speech".

I recently tried to use IPFS to share 20 gigabytes of data on a local network, using ipfs-desktop.

Apparently it hashes the entire file, meaning it will read it entirely, so I killed it because it took way too long. Apparently there are other methods for fingerprinting large files, like reading a limited set of chunks. I asked on IRC but I did not get a thorough answer.

To share files in a local network FTP or SMB/CIFS may be better suited.

IPFS is content-addressable storage, by design over entire files. Reading a limited set of chunks poses problems for data integrity and deduplication collisions. One bit flip in a JPG may destroy the entire image; thankfully, with currently used hash functions, the intact and broken files will have different "fingerprints". One-way hash functions have an extremely low probability of collision with a full file scan. However, when you skip part of a file, the probability of missing a change is roughly the number of bytes not scanned divided by the total file size.
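The trade-off above can be sketched in a few lines. This is a toy illustration, not IPFS's actual chunking scheme: `sampled_fingerprint` is a hypothetical "fast" fingerprint that hashes only a few fixed offsets, and it misses a bit flip that a full hash always catches.

```python
import hashlib
import os

def full_hash(data: bytes) -> str:
    """Hash every byte: any single bit flip changes the digest."""
    return hashlib.sha256(data).hexdigest()

def sampled_fingerprint(data: bytes, chunk_size: int = 4, chunks: int = 3) -> str:
    """Hypothetical fast fingerprint: hash a few fixed offsets only.
    A change in an unsampled region goes undetected."""
    h = hashlib.sha256()
    n = len(data)
    for i in range(chunks):
        off = (n * i) // chunks
        h.update(data[off:off + chunk_size])
    return h.hexdigest()

data = bytearray(os.urandom(1024))
before_full = full_hash(bytes(data))
before_fast = sampled_fingerprint(bytes(data))

data[517] ^= 0x01  # flip one bit in a region the sampler never reads

print(full_hash(bytes(data)) != before_full)           # True: full hash notices
print(sampled_fingerprint(bytes(data)) == before_fast)  # True: sampler misses it
```

The sampler reads 12 of 1024 bytes, so roughly 99% of single-byte corruptions slip past it, matching the "bytes not scanned divided by total size" estimate.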

It’s a shame modern file systems don’t generate and store hashes from common algorithms as file metadata.

Some file systems, like ZFS and btrfs, support block-level checksums. However, a file-level checksum is not trivial with a random-write file system. Imagine 1 byte being changed in a 20 GB file: that would require a full file scan to update the checksum. Recalculating a file checksum from all the block checksums could be a solution, however far from any standard.

Wouldn't a merkle tree, like torrents or dat use, avoid this? You'd only have to generate a hash of the changed chunk, then it's really low cost to update the root hash.

Yep, my last sentence also covered Merkle trees.

I don't understand why you got impatient. Just let it finish, and then you can share easily enough across your entire network.

Awesome! But wouldn't ZeroNet be a better fit for this case? Change propagation is built in.

How long does it take to propagate an edit with this? The previous PoC I saw (2+ years ago) worked from a static dump from the Kiwix project.

Author of this new PoC here. IPNS propagation is slow in the current implementation, and it is a blocker for instant propagation irrespective of the number of peers. In my discussions with IPFS developers, I was told that they are going to look into this issue in upcoming versions.

Theoretically milliseconds.

Practically, I don't think the code for real-time updates is written yet.

Oh damn. For a second there I confused IPFS with pingfs: https://github.com/yarrick/pingfs

Not the same thing, not the same thing at all...
