Network Coding in P2P Networks (adlrocha.substack.com)
86 points by adlrocha on Aug 23, 2020 | 16 comments



I did some research on network coding a few years ago (see [1] if interested), and I was very excited by the potential until I read the paper by Li, Li, and Lau that proves the maximum gain in throughput we can expect in undirected networks is 2x. https://iqua.ece.toronto.edu/papers/zp-transit.pdf This makes me think there is no point in introducing all this complexity... just stick to packet networks and routing, and you're at most 2x less efficient than the optimal network coding scheme...

[1] A short literature review paper on network coding dating back to 2009: https://minireference.com/static/tmp/network_coding_review.p...


When you've made all other optimizations and your performance is bottlenecked by that last bit of efficiency, then perhaps it is interesting! 2x is nothing to sneeze at.

If your application is costing users two million dollars a year in bandwidth ... then halving it would potentially save a million dollars a year-- and could well be worth an impressive engineering effort.


Real-world incremental complexity is very high. To say nothing of the lack of an ecosystem.

I fondly recall interviewing at one of the faang companies years ago, when they ran out of things to ask me, and the guy turned to me and said: "Let's talk about network coding". My response, "Why would anyone want to do that?", probably didn't help land me the job.


Network coding is very interesting but the maximum gains from the network coding itself are fairly limited... a small constant factor. In attempting to apply network coding I've found that vulnerability to (maliciously) corrupt data significantly limits the utility.

If this is an area that interests you, you might also be interested in efficient set reconciliation-- where a party can communicate its set to another party using bandwidth equal to exactly the size of the set difference with the other party's set, without having any idea of what the other party's set contains.

This can result in fairly large bandwidth savings in P2P gossip applications without reducing robustness-- and could be combined with network coding in places where network coding is applicable.

I worked on an efficient implementation of setrecon https://github.com/sipa/minisketch/ and an application of it to Bitcoin ( https://arxiv.org/abs/1905.10518 ).
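
For a feel of the idea, here's a toy Go sketch of the degenerate capacity-1 case: if the two sets differ in exactly one element, XORing fixed-size "sketches" recovers it. Real sketches like minisketch generalize this to a difference of size d using d field elements-- the library's actual API differs, this is just the concept.

    package main

    import "fmt"

    // xorSketch folds a set of 64-bit IDs into a single 64-bit "sketch".
    // If two sets differ in exactly one element, XORing the two sketches
    // recovers that element. A real sketch (e.g. minisketch) handles a
    // difference of size d with d field elements instead of one.
    func xorSketch(ids []uint64) uint64 {
        var s uint64
        for _, id := range ids {
            s ^= id
        }
        return s
    }

    func main() {
        alice := []uint64{0xa1, 0xb2, 0xc3, 0xd4}
        bob := []uint64{0xa1, 0xb2, 0xc3} // Bob is missing 0xd4

        // Each side sends only its fixed-size sketch, not the whole set.
        missing := xorSketch(alice) ^ xorSketch(bob)
        fmt.Printf("element in the symmetric difference: %#x\n", missing)
    }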


Seems pretty weird to waste substantial (1000x) CPU and disk I/O to figure out what you are missing, and if you have enough pieces.

Why not just use a reed-solomon based approach and turn 1000 pieces into 1100 and then you know you can recover the file with any 1000 of those 1100 pieces? That enables you to actively search for the rare pieces to improve the health of the network and as a side benefit you use 1000x less CPU and I/O.
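
Something like this rough sketch, using the third-party klauspost/reedsolomon Go library (10+1 shards standing in for the 1000/1100 split; API names as I recall that library, so treat the details as approximate):

    package main

    import (
        "bytes"
        "fmt"
        "log"

        "github.com/klauspost/reedsolomon" // third-party RS library
    )

    func main() {
        const dataShards, parityShards = 10, 1 // stand-in for 1000/1100

        enc, err := reedsolomon.New(dataShards, parityShards)
        if err != nil {
            log.Fatal(err)
        }

        file := bytes.Repeat([]byte("some file contents "), 1000)

        // Split into 10 data shards, then compute 1 parity shard.
        shards, err := enc.Split(file)
        if err != nil {
            log.Fatal(err)
        }
        if err := enc.Encode(shards); err != nil {
            log.Fatal(err)
        }

        // Lose any one shard...
        shards[3] = nil

        // ...and rebuild it from the remaining 10 of 11.
        if err := enc.Reconstruct(shards); err != nil {
            log.Fatal(err)
        }
        ok, _ := enc.Verify(shards)
        fmt.Println("recovered:", ok)
    }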

Adding significant complexity so that sometimes you get a missing piece one time step sooner seems like a huge waste.


Out of curiosity I once compressed a 3-hour lecture down to 70 MB and was totally surprised by how watchable it still was. It got me wondering: out of that 10 GB ultra uber HD XXL video there is about 50 MB that you must have, 100-200 MB of data that would be really useful, but the rest quickly drops to "nice to have but not at all necessary". (My numbers are mostly imaginary.)

I strongly think that packaging and delivery methods should be specifically designed for the goods being shipped. (Sand is not glass)

Poor TV and radio signals used to gradually degrade the picture and audio. Even with partial reception you could still figure out useful things, like whether your city was going to get bombed.

Today on p2p you get a multi-GB blob of data that you can't do anything with until you have all of it. Streaming will only give you the beginning, which could be worse than not watching anything.

But if we take, say, a 100-fold overhead and do 70 GB of reads for a 700 MB video... (I guess a few more numbers here and below)

From the '80s to the '90s memory went up from 16 kB to 1 MB. From the '90s to 2000, from 1 MB to 512 MB. From 2000 to 2010: 512 MB to 48 GB. 2020, 128 GB? (Rather disappointing compared to the previous jump.) Is 512 GB a reasonable guess for 2030? I don't think we will see the 32,000-fold increase of 1980-2000, but 2 or 4 times seems reasonable? 70 GB of reads wouldn't be such a big deal.

Monitors will grow of course but houses are getting smaller. We can do all 4 walls, then the ceiling and even the floor, increase frame rate and resolution but it has to stop eventually. (haha)


Progressive images - see demo images a bit down the page here: https://blog.codinghorror.com/progressive-image-rendering/

It's pretty cool on the rare occasion when you stumble across a huge image on the web (think astronomy); first a low res version pops up and then you can see the top-to-bottom loading as it gets gradually replaced by higher resolution versions.


Well, when you’ve covered all the walls and the ceiling and the floor, and maxed out the frame rate and the resolution, you can start stacking displays on top of each other to get volumetric display on all said surfaces.

https://www.hackster.io/news/creating-a-3d-volumetric-displa...

Depending on how you do things, it might not require that much more data reads, memory, etc. Which is a shame, if the goal was to make the requirements much much higher :p


Reed-Solomon might run into issues with its n^2 runtime if you make too many pieces. Fountain codes or other codes, though, could do the job.


Take a look at tornado codes if you want a rated (instead of rateless) erasure code. They're related to fountain codes (and linked from the Wikipedia article), and do basically the same thing as Reed-Solomon but with quasi-linear encoding, IIRC n(log*n). I'd have to check whether the patents have expired, but my master's work was adapting BitTorrent to be backed by tornado codes. It works surprisingly well, since the original BitTorrent is structured around blocks anyway. I have the Ruby code around somewhere. When I modeled it, once the last seed left you'd need about one quarter to one half of the number of non-seed nodes in the network to be able to reconstruct the file.


Ooh interesting, I'd be interested in that code/work. I'm somewhat familiar with Tornado codes but haven't looked in years


I'll have to go searching (it's been 13 years), but conceptually a tornado encoder is a multi-stage pipeline with input blocks and output blocks, depending on how the bipartite graph is set up. Once you have that, you extend the BitTorrent block dictionary a bit and you're pretty much there.
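
Roughly like this, as a sketch of a single XOR stage driven by a bipartite graph (a real tornado code cascades several such stages and caps the last one with a conventional erasure code; names here are made up for illustration):

    package main

    import "fmt"

    // encodeStage computes one stage of a tornado-style encoder: the
    // bipartite graph is a list, per check block, of which input blocks
    // feed it, and each check block is just the XOR of its neighbours.
    func encodeStage(inputs [][]byte, graph [][]int) [][]byte {
        checks := make([][]byte, len(graph))
        for c, neighbours := range graph {
            block := make([]byte, len(inputs[0]))
            for _, in := range neighbours {
                for i, b := range inputs[in] {
                    block[i] ^= b
                }
            }
            checks[c] = block
        }
        return checks
    }

    func main() {
        inputs := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc"), []byte("dddd")}
        graph := [][]int{{0, 1}, {1, 2, 3}} // 2 check blocks over 4 inputs
        for i, c := range encodeStage(inputs, graph) {
            fmt.Printf("check %d: %x\n", i, c)
        }
    }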


Heh, don't let computer science blind you to the real-world realities of running something. Sure, the n^2 time will dominate somewhere before infinity, but that doesn't mean Reed-Solomon is slow for practical use cases.

I have a 19GB test file on a desktop Xeon e3-1230 v5 (a quad core Intel from 2015) and an encoder written in Go (not the fastest language).

I can turn a 19GB file into 10 2.4GB files in 44 seconds, and I can recover the original with any 8 of the 10 files. That sounds pretty slow, but it is reading 19GB and writing 24GB. In fact, when I watch with top it looks like it spends a good while reading in the 19GB, spends 1-2 seconds running the Reed-Solomon calculation on 4 cores, then spends a bunch of time writing out the 10 2.4GB files.

So Reed-Solomon is fine even on a 6-year-old computer with a pretty large file. I suspect a current 8-core desktop from 2020 could do even a 100GB file without taking noticeably long. After all, a 100GB file shared over a p2p network is going to take WAY longer to transfer than the Reed-Solomon calculation. Similarly, I expect a recent-vintage Android or iOS phone could manage something similar, not too differently from a 6-year-old desktop.


RS coding is itself more complicated, as you already mention.

Also, it requires some form of coordination. IIRC (I never really learned the details of RS codes), RS-coded blocks cannot be mixed, i.e. two encoders have to code the blocks with the same initialization parameters.

By contrast, network coders can just emit random linear combinations, and receivers quickly accumulate enough independent blocks to decode all of the original blocks.
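
A sketch of what a coding node does, assuming GF(2^8) arithmetic and that the coefficient vector rides along with each coded payload:

    package main

    import (
        "fmt"
        "math/rand"
    )

    // gfMul multiplies two bytes in GF(2^8) with the polynomial x^8+x^4+x^3+x+1.
    func gfMul(a, b byte) byte {
        var p byte
        for i := 0; i < 8; i++ {
            if b&1 != 0 {
                p ^= a
            }
            hi := a & 0x80
            a <<= 1
            if hi != 0 {
                a ^= 0x1b
            }
            b >>= 1
        }
        return p
    }

    // combine emits one coded block: a random linear combination of whatever
    // blocks this node currently holds, plus the coefficient vector a receiver
    // needs to eventually solve for the originals. No coordination required.
    func combine(blocks [][]byte, rng *rand.Rand) (coeffs []byte, coded []byte) {
        coeffs = make([]byte, len(blocks))
        coded = make([]byte, len(blocks[0]))
        for i, blk := range blocks {
            coeffs[i] = byte(rng.Intn(256))
            for j, b := range blk {
                coded[j] ^= gfMul(coeffs[i], b)
            }
        }
        return coeffs, coded
    }

    func main() {
        blocks := [][]byte{[]byte("block-one"), []byte("block-two"), []byte("block-3!!")}
        rng := rand.New(rand.NewSource(1))
        // Once a receiver has 3 combinations with linearly independent
        // coefficient vectors, Gaussian elimination over GF(2^8) recovers
        // all 3 original blocks.
        c, payload := combine(blocks, rng)
        fmt.Printf("coefficients: %v\npayload: %x\n", c, payload)
    }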


I love your Reed-Solomon idea.


I never knew! What fascinating stuff. The idea that anyone who has the entire file could generate a block that fits as the final piece seems totally magical. You could do a final even block or odd, group every 3rd, every 4th, first half second half... and then the "last" block just magically fits... for everyone... mind blown!



