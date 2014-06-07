Hacker News
BitTorrent vs. HTTP (haxx.se)
I wish browsers and operating systems implemented BitTorrent by default. Not only would it make the file-sharing experience (downloading, sending, etc.) much better, it would also open the door to a lot of cool P2P stuff on the client side.

It's a standard, we know it works and how.

It also sounds like a cool caching technique: since people already have the resource cached on their local system, why not allow them to distribute it?

HTTP headers could indicate when to flush the cache (just as they do today) and provide the last known MD5/SHA-1/whatever digest to make sure the page has not been tampered with (say it's checked when the download is complete, and the download is retried if the digest does not match; that should not happen often anyway). It obviously won't work for pages that serve auth-related content, but it would be great for assets.
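
A rough sketch of that check-then-retry idea, assuming the expected digest arrives out of band (for example in a response header). The names `verify_digest` and `fetch_verified`, and the two fetch callables, are made up for illustration and are not part of any browser or BitTorrent API.

```python
import hashlib


def verify_digest(data: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if data hashes to expected_hex under the given algorithm."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected_hex.lower()


def fetch_verified(fetch_from_peer, fetch_from_origin, expected_hex, algorithm="sha256"):
    """Try the peer copy first; retry from the origin if the digest mismatches.

    Both fetch arguments are hypothetical zero-argument callables returning bytes.
    """
    data = fetch_from_peer()
    if verify_digest(data, expected_hex, algorithm):
        return data, "peer"
    # The peer copy was corrupted or tampered with: fall back to the origin server.
    data = fetch_from_origin()
    if verify_digest(data, expected_hex, algorithm):
        return data, "origin"
    raise ValueError("digest mismatch from both peer and origin")
```

The digest is only as trustworthy as the channel it came over, so in practice it would have to be delivered by the origin itself over an authenticated connection.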

I guess one problem could be that page loads would be slower (it depends on the ability to parallelize and to contact geographically close peers, I suppose), but it would mean a much lighter load on servers.

Yeah, that's cool, but I don't want to pay to be a part of Microsoft's/Apple's/you-name-it's infrastructure.

I don't want to distribute software, or potentially be forced to.

With recent developments in net neutrality and data plans, p2p could drastically impact my data plan and cost me money.

Sure, I can see why you would see potential in it, but for me P2P data distribution is out of the question, at least for now.

Opera 12 had torrent...

It's incredible how innovative Opera was at the time. Nowadays it's just another useless Chrome.

Opera's layout engine and UI were incredibly fast, responsive, and lightweight, remaining the fastest for decades. I have no doubt it was the product of a very small team of brilliant engineers. Its userscript support was also top-notch, allowing things that are tricky or impossible even today in a browser extension.

Too bad it couldn't keep up with the whole Web 2.0 sh*tstorm.

Yes, and it also had an IRC client. Having lots of features unrelated to web browsing did not make it better.

But it was better. The UX was way too good, even if things like standards implementation weren't great.

I seem to recall that back then Opera stood shoulder to shoulder with Mozilla about standard correctness.

I seem to recall that the Mozilla suite had an IRC client as well.

An interesting tidbit: the original bittorrent implementation mimicked the download dialog for a browser of that era (back then, each download opened a new dialog box, with its own progress bar). That is, the user experience for a normal download (click on a .mov) and for a bittorrent download (click on a .torrent which downloads a .mov) was almost identical.

> While clients (probably) could ask for chunks in a chronological order and then get the data in a sequential manner [...]

Why probably? This is exactly how the immensely popular Popcorn Time worked.

It's literally in the same sentence. In the part that you elided:

"it will not at all distribute the load as evenly among the peers as the "random access" method does."

If people only stay to download the file and then disconnect, there will be a higher concentration of parts from the beginning of the file than from the end. This reduces the resiliency of the swarm.

I only wanted to point out that the beginning of the sentence made it sound like a theoretical concept that might work but that no one has ever tried or seen in practice. The concerns are definitely valid.

Depends on the algorithm but you need not go 100% one way or the other. The basic rule can be 70% download next part and 30% download rare bits. The swarm is slightly less resilient, but seeders are still generally uploading the rare bits not the start of the file. This also tends to make a faster swarm as new downloads have something to trade.
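
A minimal sketch of such a mixed strategy, under my own assumptions: peers are modelled as sets of piece indices, and `pick_piece` is a hypothetical helper, not how any particular client implements it.

```python
import random
from collections import Counter


def pick_piece(have, peer_pieces, num_pieces, sequential_prob=0.7, rng=random):
    """Pick the next piece index to request, or None if nothing useful is available.

    have: set of piece indices we already hold.
    peer_pieces: list of sets, one per connected peer, of the pieces that peer holds.
    """
    # Count how many peers hold each piece.
    availability = Counter()
    for pieces in peer_pieces:
        availability.update(pieces)

    # Pieces we still need that at least one peer can give us, in file order.
    candidates = [i for i in range(num_pieces) if i not in have and availability[i] > 0]
    if not candidates:
        return None

    if rng.random() < sequential_prob:
        return candidates[0]  # next piece in playback order
    # Otherwise rarest first; ties are broken by the lower index.
    return min(candidates, key=lambda i: (availability[i], i))
```

Setting `sequential_prob` to 1.0 gives pure in-order streaming and 0.0 gives pure rarest-first, so the 70/30 split is just one point on that dial.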

Side note: AFAIK Popcorn Time generally uses very crowded swarms (e.g. YIFY torrents), so chronological downloads do not kill the swarms.

Chronological downloads never really kill swarms. If everyone is downloading chronologically, then the earlier pieces are more in demand, but they also have more supply.

The only thing that kills swarms is an average seed ratio < 1.

They do, as the supply of the final pieces is low, and if the peers holding them go offline the torrent could die. This happens a lot with less popular torrents.

The non-random fetching is also how super-seeding works [1].

[1] https://en.wikipedia.org/wiki/Super-seeding

Most clients support this in some way. In qBittorrent it's a context menu option, for example.

He forgot to mention a very important aspect of BitTorrent: the uTP (micro-TP) protocol.

uTP is a very nice BitTorrent-over-UDP protocol that implements the LEDBAT congestion control algorithm [1].

[1] https://en.m.wikipedia.org/wiki/LEDBAT

One thing I always fail to see in these types of comparison articles is that BitTorrent will happily destroy any ability for those sharing your connection to use it if you let it.

Probably due to the sorry state of affairs that is the consumer router space, running BitTorrent renders even web browsing painfully slow and sometimes completely nonfunctional.

Why BitTorrent does this while normal http transfers do not is not clear to me. Perhaps due to the huge number of connections made.

Either way, when given a choice I'll always take a direct HTTP transfer over a torrent, for no other reason than the fact that I'd like to be able to watch cat videos while the download completes.

If you can put a Linux router between your LAN and the uplink, you can use the Wonder Shaper.

http://lartc.org/wondershaper/

I've been using it for 15 years and it's still working great. Even with multiple P2P clients, stuff like HTTP, SSH, and gaming keep a low latency. Also you learn a lot about networks just by configuring it :-)

Really? I'll take Bittorrent any time, because, no matter how large the transfer, I can just leave my client to download even after I close my browser. Since I have a home NAS with Deluge installed, I can even turn my PC off and the torrent will keep going (and will send a notification to my phone when done), etc.

> Why BitTorrent does this while normal http transfers do not is not clear to me.

Two key reasons, usually, both related to congestion control (or practical lack thereof).

> Perhaps due to the huge number of connections made.

This is one of those reasons: unless the other end of your incoming connection is prioritising interactive traffic somehow, packets for each stream will get through at more or less the same rate once the connection is saturated. So if you have an SSH link and are requesting an HTTP(S) stream (for a web page or that cat video) while a torrent process has 98 connections receiving data, then of every 100 packets coming down the link only two are for your interactive processes. On a fast enough link this isn't an issue, but "fast enough" needs to be "very fast" in such circumstances, as it is relative to the combined speed of all the hosts sending data. You can mitigate this by telling the torrent client to use minimal incoming connections (limiting incoming bandwidth can have some effect but is generally ineffective, as bandwidth limits like that need to be applied on the other side of the link).
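
To put numbers on that, here is a back-of-the-envelope sketch. It assumes every flow gets an equal share of a saturated link, which real TCP only approximates; the function names and the 8-connection figure are mine, picked for illustration.

```python
def interactive_share(bulk_flows: int, interactive_flows: int) -> float:
    """Fraction of a saturated link left to interactive flows,
    assuming every flow gets an equal share (a simplification)."""
    total = bulk_flows + interactive_flows
    if total <= 0:
        raise ValueError("need at least one flow")
    return interactive_flows / total


def interactive_rate(link_rate: float, bulk_flows: int, interactive_flows: int) -> float:
    """Combined rate available to the interactive flows, in the units of link_rate."""
    return link_rate * interactive_share(bulk_flows, interactive_flows)


# The example from the comment: 98 torrent connections and 2 interactive ones.
share = interactive_share(98, 2)
# Capping the torrent client at 8 connections leaves a much larger share.
share_limited = interactive_share(8, 2)
```

Under that assumption, cutting the torrent client from 98 connections to 8 raises the interactive share from 2% to 20% of the link.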

The other problem is that control packets, such as those for connection handshakes and so forth, fight for space on the same link as those carrying data. As soon as the connection is saturated in either direction, so that packets are queued for more than an instant, latency in both directions takes a massive hit. This is particularly noticeable on asymmetric links such as many residential arrangements. You can mitigate this by throttling the outgoing traffic, either within the torrent client or at other parts of the network (assuming the traffic isn't hidden in a VPN link, which would mean you can't reliably distinguish it from other encrypted packets), and by reserving some bandwidth to give priority to interactive traffic and protocol-level control packets. But you have very little control (usually practically none) over traffic coming the other way, as the measures have to be taken before the packets hit the choke point, and you don't control those hosts; your ISP does. (They will implement some generic QoS filtering/shaping, but more than that requires traffic inspection, which we don't want them to do, and they don't want the responsibility either, legally or in terms of providing/managing the relevant computing capacity.)

(the above is a significant simplification - network congestion is one of those real world things that quickly gets very complicated/messy!)

Simply limit your client's allowed number of connections and limit its allowed bandwidth.

As mentioned by someone else, yeah, this is almost certainly the TCP ACK thing. If you throttle the upload back to about 10 KB/s under your max upload speed, it won't choke your download ability.

The reason is that most ISPs offer much higher download speeds than upload speeds.

BitTorrent very quickly uses 100% of your upload speed, which effectively breaks the internet. The solution is to limit the upload speed in your client.

The most common problem is with asymmetric network connections with limited upload bandwidth. If you don't limit your upload rate, BitTorrent will consume your upload bandwidth so thoroughly that even TCP ACKs for other applications aren't sent in a timely fashion.
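
For the curious, an upload cap like the one described in these replies is commonly built as a token bucket. The sketch below is a generic version with an injectable clock; the class name and parameters are illustrative and not taken from any actual BitTorrent client.

```python
import time


class TokenBucket:
    """Allow at most rate_bytes_per_s on average, with bursts up to burst_bytes."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_send(self, nbytes):
        """Return True and spend tokens if nbytes may be sent now, else False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A client would call `try_send` before each write and queue the data when it returns False. Setting the rate a little below the measured uplink speed, as suggested above, leaves headroom for other applications' ACKs.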

Why we still use HTTP is beyond me. And I don't mean the speed issues. Why have a protocol that's so complicated when most of the things we need to build with it are either simpler or reimplement parts of the protocol?

Could you elaborate on your issues with HTTP a bit? What kind of protocol would do a better job?

A minimal implementation of HTTP (and I'm strictly talking about the transport protocol, not about HTML, JS, ...) is dead simple and relatively easy to write.

Of course there's a ton of extensions (gzip compression, keepalive, chunks, websockets, ...), but if you simply need to 'add HTTP' to one of your projects (and for some reason none of the existing libraries can be used) it shouldn't take too many lines of code until you can serve a simple 'hello world' site.
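
To give a feel for how few lines that is, here is a bare-bones sketch using only the socket module. It handles `GET /` and nothing else, assumes the whole request arrives in a single read, and the port number is arbitrary; it is a toy, not something to expose to the internet.

```python
import socket


def build_response(status: str, body: bytes, content_type: str = "text/plain; charset=utf-8") -> bytes:
    """Assemble a complete HTTP/1.1 response with a Content-Length header."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def handle_request(raw: bytes) -> bytes:
    """Look only at the request line and answer GET / with a greeting."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = request_line.split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return build_response("400 Bad Request", b"bad request\n")
    method, path, _version = parts
    if method != "GET":
        return build_response("501 Not Implemented", b"only GET is supported\n")
    if path != "/":
        return build_response("404 Not Found", b"not found\n")
    return build_response("200 OK", b"hello world\n")


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Accept connections forever, one request per connection."""
    with socket.create_server((host, port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                raw = conn.recv(65536)  # sketch: assumes the request fits in one read
                conn.sendall(handle_request(raw))
```

Calling `serve()` and pointing a browser at http://127.0.0.1:8080/ should show the greeting. Keep-alive, chunked bodies and header parsing are exactly the extensions this leaves out.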

On top of all that, it's dead simple to put any one of the many existing reverse proxies/load balancers in front of your custom HTTP server to add load balancing, authentication, rate limiting (and all of those can be done in a standard way)

Furthermore, HTTP has the huge advantage of being readily available on pretty much every piece of hardware that has even the slightest idea of networking. Any new technology would have to fight a steep uphill battle to convince existing users to switch.

Have I mentioned that it's standardized and open?

My friend once mentioned that FTP would be a good option, I'm not sure why though. I think they regarded HTTP as superfluous for the purpose of what we use the web for.

Anyone who has to deal with FTP on firewalls would say FTP (and FTPS) would do well to disappear from the world.

It's not just firewalls. The fact that (unencrypted) FTP is still widely used today when better alternatives like SFTP (via SSH) have existed for years strikes me as odd.

(I'm speaking about authenticated connections. For anonymous access, which should be read-only anyway, you're usually better off using HTTP.)

HTTP tends to be faster for what we use the web for: [0]

FTP does have some advantages, but HTTP has more advanced support for resuming connections, virtual hosting, better compression, and persistent connections, to name a few.

[0] https://daniel.haxx.se/docs/ftp-vs-http.html

Nice trolling! :)

http://i3.kym-cdn.com/photos/images/original/000/732/170/796...

HTTP is quite a good protocol. Simple, extensible to a sane extent, but not overly extensible (XMPP, I'm thinking of you). HTTP is not accidentally successful. FTP is a bad joke (stateful; has a binary mode but is 7-bit by default; uses multiple connections (unless in passive mode)).

Interesting...

I bet if we had used FTP instead of HTTP for serving HTML right from the start, FTP would today have all of the same extensions and the same people would argue that it is too bloated :) (HTTP started as a pretty minimalistic protocol back in the day)

I often find the discrepancy between what HTTP was originally designed for (serving static HTML pages) and all the different things it's being used for today highly amusing. Yes, some of today's applications of HTTP border on abuse, but its versatility (combined with its simplicity) fascinates me.

I'm not as familiar with HTTP 2, so I'll only talk about the previous specifications...

Which are dead simple to construct, send, receive and parse.

Really.

For example, let's curl -I (fetch the headers only, not the body) for the spec: http://www.ietf.org/rfc/rfc7230.txt

HTTP/1.1 200 OK
Date: Tue, 24 Jan 2017 12:00:55 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=df57c7720b704a40e4c3367bbe248771c1485259254; expires=Wed, 24-Jan-18 12:00:54 GMT; path=/; domain=.ietf.org; HttpOnly
Last-Modified: Sat, 07 Jun 2014 00:41:49 GMT
ETag: W/"3247b-4fb343e4dcd40-gzip"
Vary: Accept-Encoding
Strict-Transport-Security: max-age=31536000
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
CF-Cache-Status: EXPIRED
Expires: Tue, 24 Jan 2017 16:00:54 GMT
Cache-Control: public, max-age=14400
Server: cloudflare-nginx
CF-RAY: 326353e6a6a257a7-IAD

A bunch of newline-separated (CRLF) key-value mappings, some with a DSL (such as Set-Cookie).

It gives you a status message instantly, a date to check against your cache, a Content-Type for your parser, the acceptable encoding, also for your parser, and a bunch of other values for your cache. All for free.
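
As a rough illustration of how little parsing that takes, here is a sketch that splits a response head into the status line and a header dictionary. `parse_response_head` is a name I made up, and it skips details a real parser has to handle, such as obsolete line folding and malformed input.

```python
def parse_response_head(head: bytes):
    """Split an HTTP/1.x response head into (version, status, reason, headers).

    headers maps lower-cased field names to a list of values, since a field
    such as Set-Cookie may legitimately appear more than once.
    """
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # an empty line ends the header section
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return version, int(status_code), reason, headers
```

Field names are case-insensitive, which is why the sketch lower-cases them before storing.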

As for the body of the content? For a gzipped value like this, it's everything outside the header, until EOF. That's not quite as easy as when the content-length parameter is given, but hardly difficult for parsing.

HTTP is easy.

In fact, HTTP is so easy that incomplete HTTP servers can still serve up real content, and browsers can still read it.

HTTPS is more complicated, though it becomes much easier if you simply rely on certificate stores and CAs. But HTTPS is a different protocol.

> As for the body of the content? For a gzipped value like this, it's everything outside the header, until EOF. That's not quite as easy as when the content-length parameter is given, but hardly difficult for parsing.

This is chunked and keep-alive. Things get a little trickier.

True, you keep the connection open and receive the length of the expected bytes, then said bytes, until a length of 0 is sent. Still simple enough that there are a dozen implementations of less than a page, only a search away.
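
One such less-than-a-page version, as a sketch: it decodes a chunked body that is already fully in memory, ignores chunk extensions and trailers, and the name `decode_chunked` is my own.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body held entirely in memory."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, optionally followed by ";extensions".
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-length chunk marks the end; trailers are ignored here
        out += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk not terminated by CRLF")
        pos += size + 2
    return bytes(out)
```

On a real keep-alive connection you would read from the socket incrementally instead, but the framing logic is the same.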

Basic HTTP is dead simple, it works, and it has many add-ons with backward compatibility (one can still use a basic HTTP client or server in most cases) and even a new version fully optimized for today's needs (and in binary form, too).

Why we use HTTP you ask? Because it works. How about that.

Based on my Snort statistics, a small but increasing number of sites and service providers are starting to expose support for the QUIC protocol.

QUIC is not an application-layer protocol.

