Shutting down FTP services (kernel.org)
140 points by danirod on Jan 28, 2017 | 77 comments



This sentence stood out to me:

>>The protocol is inefficient and requires adding awkward kludges to firewalls and load-balancing daemons

I have always been aware that ftp across firewalls can be wonky, but never stopped to ask why.

http://www.ncftp.com/ncftpd/doc/misc/ftp_and_firewalls.html

>>The primary problems that the FTP poses to firewalls, NAT devices, and load-balancing devices (all of which will simply be referred to as "routing devices" and not "routers" since gateway machines generally aren't problematic) are:

>>Additional TCP/IP connections are used for data transfers;

>>Data connections may be sent to random port numbers;

>>Data connections may originate from the server to the client, as well as originating from the client to the server;

>>Data connections destination addresses are negotiated on the fly between the client and server over the channel used for the control connection;

>>The control connection is idle while the data transfer takes place on the data connection.

What a protocol.


The control/data connections are actually pretty neat, and for the era (actual working Internet) it was a good idea that worked quite well. I understand it looks "hacky" today - but that's largely due to far worse hacks lower down the stack.

NAT came along, and everyone broke everything. Now you can't make up neat protocols like this any more - everything must be user-initiated TCP or server based. I do wonder how much development and innovation this has held back. Even games back in the day had far more interesting network models - that were more sustainable long-term.

NAT was likely the first major crack in the wall of the open Internet, making it much weaker and its users more compliant than before. I strongly believe that's one of the major reasons why we have what we're left with today.


Don't forget that FTP also lets you connect two servers together with the data connection going between them, so you can transfer a file from one server to another without downloading it. (http://www.proftpd.org/docs/howto/FXP.html)

Of course this awesome ability was exploited by hackers (the FTP Bounce Attack: https://www.cert.org/historical/advisories/CA-1997-27.cfm - and they say that inventing fancy names for attacks is a new thing). This is why we can't have nice things :-)


Yeah, I used to use FXP; it was handy to be a client actor in that role, but... it got exploited and was mostly abandoned.


Thank you, guys. You prompted me to learn a bit about the protocol. I want to share what I learned after skimming over the RFC.

A typical session looks like:

    $ telnet ftp.kernel.org 21
    Trying 149.20.4.69...
    Connected to ftp.all.kernel.org.
    Escape character is '^]'.
    220 Welcome to kernel.org
    USER anonymous
    331 Please specify the password.
    PASS
    230 Login successful.
    PASV
    227 Entering Passive Mode (149,20,4,69,119,142).
Then I was baffled. What does 149,20,4,69,119,142 mean? Okay, 149,20,4,69 looks like an IP address for ftp.kernel.org. But what do they want me to do with 119,142? Turns out it's a port number divided into two octets. Basically they are asking me to connect to port 119 * 256 + 142 = 30606.
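
In code, that arithmetic looks roughly like this (a quick Python sketch; the helper name is mine, not part of any library):

    import re

    def parse_pasv(reply):
        # reply looks like: "227 Entering Passive Mode (149,20,4,69,119,142)."
        h1, h2, h3, h4, p1, p2 = map(
            int, re.search(r"\(([\d,]+)\)", reply).group(1).split(","))
        return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2

    print(parse_pasv("227 Entering Passive Mode (149,20,4,69,119,142)."))
    # ('149.20.4.69', 30606)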

So, on the second terminal I run:

    $ telnet 149.20.4.69 30606
    Trying 149.20.4.69...
    Connected to 149.20.4.69.
    Escape character is '^]'.
Back to the first one:

    LIST
    150 Here comes the directory listing.
    226 Directory send OK.
And I get the listing I requested, on the second terminal:

    drwxr-xr-x    9 ftp      ftp          4096 Dec 01  2011 pub
    Connection closed by foreign host.
    $
Instead of using PASV, you can request the FTP server to connect back to you, with PORT. First, you need to start listening on your WAN interface (if you have one; if you don't, you are out of luck) with `nc -l your.ip.goes.here 2560`. Then, request the server to connect by typing `PORT your,ip,goes,here,10,0` (NB: use commas, not dots). Of course, don't forget to replace your.ip.goes.here with your real WAN IP address.
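
The PORT argument is the same encoding in the other direction; a tiny sketch of building it (the address below is a placeholder):

    def port_argument(ip, port):
        # "203.0.113.7", 2560  ->  "203,0,113,7,10,0"
        return ",".join(ip.split(".") + [str(port // 256), str(port % 256)])

    print("PORT " + port_argument("203.0.113.7", 2560))   # PORT 203,0,113,7,10,0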

Cheers.


This was cool! Thank you for sharing!


Why is this design better than the HTTPS design, where you make an (integrity-protected) request for a file and you get a (framed, integrity-protected) response for that file over the same TCP connection?

I understand the argument for peer-to-peer gaming, but I don't see why for file transfer, FTP's design is better.

The big difference I see is that a separate TCP connection has separate flow control and avoids head-of-line blocking for multiple simultaneous transfers, but for static file transfers, the only cause of head-of-line blocking should be actual congestion, which would impact every connection simultaneously, right? (That is, it's better for the Internet to route all traffic between two hosts, where neither side is CPU-bound, over a single TCP connection so that TCP can do its thing.)

And certainly active mode (server connects back to client) doesn't seem like it has any benefits to congestion or anything else: once a connection is established, it doesn't matter who established it.


Imagine you're not bandwidth-limited and you want to transfer multiple files. One is on Disk A, one is on Disk B, one is on NFS, etc. Sending them all at once exploits parallelism, and the transfer on the faster disk doesn't get stuck behind the transfer on the slower disk.

One file per connection also makes the implementation simpler. If you want to use one TCP connection for multiple transfers then you need some kind of a header and logic to determine start and end of file, and new control messages get delayed in queue behind already-buffered data. If you have a data connection for exactly one file then you just call sendfile() and then close().
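
As a rough sketch of how simple that per-file path can be (Python, assuming `conn` is the freshly accepted data socket and `path` is the requested file; not taken from any real server):

    import os

    def send_one_file(conn, path):
        # One file per data connection: no framing or length header needed;
        # end of file is signalled simply by closing the socket.
        with open(path, "rb") as f:
            os.sendfile(conn.fileno(), f.fileno(), 0, os.fstat(f.fileno()).st_size)
        conn.close()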

And transferring a file with active mode FTP looks like this:

  Client -> Server: I want to transfer a file, connect to 1.2.3.4:12345.
  Server -> Client: TCP SYN to 1.2.3.4:12345
Passive mode requires another half trip latency:

  Client -> Server: I want to transfer a file, passive mode.
  Server -> Client: OK, connect to 4.3.2.1:54321.
  Client -> Server: TCP SYN to 4.3.2.1:54321
A different protocol could avoid the extra latency by having the server always use the same port, but that makes the implementation more complicated again. The server could get a data connection from the client before it gets the associated control message. Now it has to deal with matching them up, another timeout in case the control message never arrives, etc.

It's possible to run FTP over TLS. Hardly anybody does that because it makes the NAT problem worse: the NAT device can't even snoop the control connection to map the port. FTP programs could fix this by mapping the data port using something like NAT-PMP or PCP, but I'm not aware of any that currently do.


Unfortunately, these days crypto really is a mandatory feature, and sendfile doesn't work with TLS anyway. (IMO it should - secure communication should be an OS-level service rather than requiring each application to figure it out, just like the OS provides TCP rather than requiring each application to implement reliable connections - but that's another story.)


Hmmm. sendfile is great because it lets the kernel arrange for DMA straight from disk to the network card, in addition to bypassing context switches and copies to and from userspace buffers. If the kernel gains an SSL stack (the encrypted version of the file is going to be different for each connection), you can't DMA and you'll need to make copies into kernelspace buffers to do the encryption on. Is skipping the context switch enough to make a meaningful performance difference?


Yes. And I believe Solaris was the first to discover this. Here's a paper Netflix put out covering their first attempt. Measurable gains, hopefully more coming in the future.

https://openconnect.netflix.com/publications/asiabsd_2015_tl...

edit: check this post first

https://news.ycombinator.com/item?id=12037748


If you do it that way you could add hardware encryption support to the network card and be back to using DMA.


Oh, that's fascinating and awesome! Thanks for the links.


> Is skipping the context switch enough to make a meaningful performance difference?

Only in (very) high-bandwidth cases and (maybe!) for low-power cases. For most things it's just added complexity for no benefit.


FTP is broken regardless of NAT. It flat out does not work with a default deny firewall policy.


FTP was invented in either 1971 or 1985, depending on whether you think RFC 114 or RFC 959 best describes the basic protocol.

Simple packet-filtering firewalls were invented in 1988.

In addition you can force passive mode and confine FTP to a range of ports on just about every server and client out there. So your statement that FTP "flat out does not work" is untrue, unless you mean that your firewall is so much default-deny that you cannot even open incoming ports.
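
From the client side, forcing passive mode is usually a one-liner; for instance with Python's ftplib (the host is a placeholder, and passive is already ftplib's default):

    from ftplib import FTP

    ftp = FTP("ftp.example.org")
    ftp.login()                 # anonymous
    ftp.set_pasv(True)          # client opens both connections outbound
    ftp.retrlines("LIST")
    ftp.quit()

On the server side, most daemons likewise let you pin the passive data ports to a fixed range, so the firewall only needs port 21 plus that range open.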

However, while it makes me a bit sad, I have to agree that there are better, simpler and more secure protocols now for downloading files. HTTP/2 in particular.


How well does HTTP/2 handle the transfer of whole directory trees?


By sequentially transmitting whole directory trees? FTP-the-protocol never had anything useful to say here. It's always been about FTP-the-client, and clients didn't need things like "control channels" and "active mode" to do what they were doing.

FTP is a pretty conceited protocol, and one that deserves to go away. People say it made more sense in the 80's, but I was there, and even then *NIX/Solaris admins found FTP to be about as much fun as that IRC pingback identd service, which is to say not very fun.


There's WebDAV, but it seems to me it never caught on for some reason, and it's mostly dead these days.


I tried implementing WebDAV servers some 10+ years ago. Back then this was a major pain with different clients expecting and sending different properties etc. No idea how this looks these days.


I think I last tried it four years ago or so. Not much had changed. I briefly used Apache as a server; I think the server was fine, but the open source clients were flaky at best. I also used to have a hosting account that included WebDAV support, among others, but I never got any client to work with it except for the web interface bolted on top of that thing.

I'm surprised to see WebDAV mentioned here, to be honest, I thought it was a dead horse everywhere except in enterprise.


It's still a little kludgy, but it got a lot better in November when Firefox joined Chrome and Edge in implementing "folder upload". https://developer.mozilla.org/en-US/Firefox/Releases/50#File...


In the use case of kernel.org a folder download feature would be needed, maybe even recursive.


We offer rsync for that purpose.


Good point, I didn't really think of that. I have used wget -r but that's not very HTML 5 lol


How about WebDAV? Is that minimally relevant these days? If not, what has superseded it?

The modern web is much more about routes, resources, and URLs than directories and files.


...and can you transfer data from one server to another without downloading it locally?


rsync will do it nicely.


Sorry, but HTTP/2 is not a panacea for everything, especially for transferring large files. FTP would actually transfer files faster than HTTP/2.


Agreed. HTTP2 is garbage.


The original RFC[1] dates from 1971, before the internet, firewalls, NAT, etc. So, not too surprising.

[1]https://tools.ietf.org/html/rfc114


That's the new and improved version of FTP! Before "passive mode", the old one would try to connect directly to your machine on some random port. If you were behind a NAT firewall you couldn't use it at all.

On top of that, there's never been a coherent standard for what directory listings are supposed to look like, so there's no reliable way to emit directories in a machine-readable format. It's always been arbitrary text.

The whole thing is junk and has deserved to die the day it was born.


>>> The primary problems that the FTP poses to firewalls, NAT devices, and load-balancing devices (all of which will simply be referred to as "routing devices" and not "routers" since gateway machines generally aren't problematic) are:

This reminds me how some companies such as Nestle use the term "chocolate candy", because in order to call it chocolate, it actually needs to be chocolate :)

>>>Additional TCP/IP connections are used for data transfers;

>>>Data connections may be sent to random port numbers;

>>>Data connections may originate from the server to the client, as well as originating from the client to the server;

>>>Data connections destination addresses are negotiated on the fly between the client and server over the channel used for the control connection;

>>>The control connection is idle while the data transfer takes place on the data connection.

>What a protocol.

These behaviors, though, allow things like FXP, where you can actually transfer files from one FTP server to another without the data having to go through you (this is especially useful when those FTP servers are on a fast connection, but you're not).

The part about using random ports is a half-truth: in active mode, data transfers came from port 20 on the server. The client opened an ephemeral port of its own (to avoid mix-ups when multiple users are accessing the same server) and the server connected to it from port 20.

As NAT was introduced, this became difficult (and in fact NAT made a lot of things difficult; I'm glad IPv6 doesn't have NAT), so a PASV (passive) mode was introduced, where the server picked a random port and waited for the client to connect to it.

The passive mode made firewalling a bit harder, but not that difficult. As an admin you could control the range of ports used by the server, and firewalls such as IPFilter had a built-in FTP proxy[1]. It's also possible to do this in other firewalls, but it might require external FTP proxy rules.

[1] http://www.freebsdwiki.net/index.php/IPFILTER_(IPF)_Firewall...


That's only scratching the surface; the problems with FTP are numerous. It's a horrible protocol built on kludges and it really has no purpose in the modern era of the internet. There's already numerous better transfer protocols so the sooner FTP dies the better.


The worst part is the unspecified directory listings. They look standardized but aren't. It's basically impossible to write a parser for them.
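
To give a flavour of the guesswork involved, here's a sketch of the sort of heuristics a client ends up writing (both sample lines are typical formats, not taken from any particular server):

    # Two servers, two "standard-looking" listing formats:
    unix_style = "drwxr-xr-x    9 ftp      ftp          4096 Dec 01  2011 pub"
    dos_style  = "12-01-11  03:45PM       <DIR>          pub"

    def guess_name(line):
        parts = line.split()
        # Heuristic: assume "ls -l" style if it starts with a mode string...
        if line[:1] in "dl-" and len(parts) >= 9:
            return " ".join(parts[8:])   # already wrong for some date/locale variants
        # ...otherwise hope it's DOS style and the name is the last field.
        return parts[-1]                 # breaks on filenames containing spaces

    print(guess_name(unix_style), guess_name(dos_style))   # pub pub

RFC 3659's MLSD command was added later precisely to provide a machine-readable listing, but support for it is far from universal.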


What is the most efficient?

What is the most reliable for NAT traversal for all of the various implementations?


FTP doesn't support compression, error correction, or partial transfers. So any protocol that supports some or all of the above will improve efficiency and data reliability (and reliability affects efficiency too).

As for NAT traversal, p2p protocols would obviously suffer the same issues, but anything with a stronger client/server relationship would fare much better. Something like SFTP is a better alternative to FTP in those instances.


Given that FTP was mostly used to transfer compressed files, compression in the protocol wools be a bad idea as well as a layer violation. Error correction wasn't really necessary either (and is also possibly a layer violation).


Unfortunately, neither of those two statements is true.

1) FTP predates widespread usage of compressed file formats by quite a long time. JPEG, MPEG formats (MP3, AVI, etc), ZIP-archived XML document formats (OOXML, ODF, etc), they're all comparatively new in relation to FTP. GIF is much older than the above but still more than a decade younger than FTP, and frankly GIF wasn't even that widely used pre-WWW, whereas uncompressed bitmaps like BMP and PCX were commonplace. However, a lot of text files were copied around in the early days of FTP (hence why FTP has an ASCII transfer mode in addition to binary) and, given the modem speeds of the era, compression would have really helped - albeit it's questionable whether the machines were powerful enough to do real-time deflating even against text files. That isn't so much a problem these days, so it's well worth having compression built into the protocol if just to catch that extra 5%. Even if you think you don't care as an end user, it matters to us sysadmins hosting high-traffic servers :)

2) TCP does have a CRC-32 checksum but that doesn't help against failed downloads, memory or other hardware errors etc. A lot of more modern protocols will compare a source and destination hash of each chunk of data and/or the completed file to ensure that the two copies are identical. Frankly, I wouldn't trust FTP for archiving data for that reason alone.
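
The whole-file check is easy to do out of band; a minimal sketch (file names are placeholders; kernel.org publishes sha256sums files for exactly this purpose):

    import hashlib

    def sha256_of(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.hexdigest()

    # "expected" would come from a sums file published alongside the download,
    # rather than trusting the transport's checksums alone.
    expected = open("sha256sums.txt").read().split()[0]   # placeholder file
    assert sha256_of("linux-4.9.tar.xz") == expected      # placeholder file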


FTP does indeed predate compress (ca. 1984), but I'm unsure of the penetration of IP at the time. UUCP networking was more common and that really only carried 7-bit US ASCII. (Including quite a lot of gifs, I assure you. :-) ) And, yes, on the fly compression would have killed most workstations at the time. My own experience, though, from the late '80s and '90s, was primarily using FTP to get .tar.Z files.

For failed downloads, memory errors, and so on, a hash in the protocol isn't that helpful. You really do want a whole-file hash check after the file is written persistently. (TCP's checksum is a performance optimization, although it doesn't work very well with wireless networks.)

P.s. "wools"? Stupid android keyboard.


I was on about checksumming the file on disk rather than the data in transit. That's how rsync works.

I'd forgotten about FTP predating TCP though. Good point.


The really cool modems of the 1988-1992 period implemented proprietary on-the-fly compression on top of 2400 baud. This was standardized as V.42bis.

As a reference, I could usually read a text file faster than 2400 baud (roughly 240 characters per second) but not faster than 2400 baud with V.42bis compression.


The proprietary predecessor was MNP5 (Microcom Network Protocol). The cool thing about MNP5 was that you could implement it on the host PC. There was a DOS terminal program and a FOSSIL driver which worked with my ancient 2400 baud modem (so old it didn't do AT commands) that made BBSing more tolerable.


"TCP does have a CRC-32 checksum"

Where?

TCP has a checksum in the header, but it is neither 32 bits nor is it a CRC.
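
For reference, it's the 16-bit ones'-complement sum described in RFC 1071; a minimal sketch of the idea:

    def internet_checksum(data: bytes) -> int:
        # One's-complement sum of 16-bit words, folded and inverted (RFC 1071).
        if len(data) % 2:
            data += b"\x00"                           # pad odd-length input
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
        return ~total & 0xFFFF

    print(hex(internet_checksum(b"\x45\x00\x00\x3c")))   # 0xbac3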


Maybe I was thinking of the Ethernet frame check instead of the TCP header? 32-bit did seem a bit high when I wrote it, I must admit.


FTP is for static files, why would you do on the fly compression? People used to just transfer precompressed files like tar.gz files or .zip files.

FTP has an ascii transfer mode because operating systems have different line endings and this automatically translates for them.


> FTP is for static files, why would you do on the fly compression? People used to just transfer precompressed files like tar.gz or zip files

Indeed they did. But that was when the majority of users were technicians. These days we need a protocol for the layman.

> FTP has an ascii transfer mode because operating systems have different line endings and this automatically translates for them.

I know, I left that part out because it wasn't relevant to my point, but I'm glad you raised it, as it's another feature that isn't necessary these days: aside from Notepad.exe, every text editor on the planet can handle different line endings. Thus ASCII mode is just another way to break files for inexperienced users (I'm not really happy with the kludge some clients use either: guessing the correct transfer mode from the file extension).
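
What ASCII mode does in transit amounts to roughly this, which is exactly why it mangles anything that isn't text (the bytes below are just the standard PNG signature, used for illustration):

    png_signature = b"\x89PNG\r\n\x1a\n"
    # What an ASCII-mode download to a Unix host effectively does:
    translated = png_signature.replace(b"\r\n", b"\n")
    print(translated == png_signature)   # False: the file is now corrupt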


FTP does have some support for partial transfers. One can request a specific start point in the file.
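
That's the REST command; in Python's ftplib it's exposed as the `rest` argument, so a resuming downloader is short (a rough sketch; host and paths are placeholders):

    import os
    from ftplib import FTP

    def resume_download(host, remote, local):
        # REST tells the server where to start; we append to the local file.
        offset = os.path.getsize(local) if os.path.exists(local) else 0
        with FTP(host) as ftp, open(local, "ab") as out:
            ftp.login()   # anonymous
            ftp.retrbinary("RETR " + remote, out.write, rest=offset)

    resume_download("ftp.example.org", "pub/some/file.tar.xz", "file.tar.xz")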


Have you ever tried resuming a failed download using that feature though? Support for it is so bad that it's usually quicker to just redownload the whole file from scratch.


I have, actually. At work I made a tool that backs up and compresses a SQL Server database into a folder, which I then download via FTP to restore on another server. I was having connection issues, so I made the downloader reconnect and resume on disconnect, fixing the issue.


Some FTP servers support compression: http://www.proftpd.org/docs/contrib/mod_deflate.html


There are edge cases for nearly all of FTP's failings, but none of them are part of the default standard protocol, which means nearly everyone ends up falling back to the lowest common denominator.


> FTP doesn't support compression

The FTP standard (RFC959) actually includes RLE compression (see section 3.4.3).

(Almost no one implements it, since RLE is a pretty poor compression algorithm.)
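
For what it's worth, the idea is just run-length encoding of repeated bytes; a sketch of the concept (not RFC 959's exact on-the-wire block format):

    def rle_encode(data: bytes) -> bytes:
        # Collapse runs of a repeated byte into (count, byte) pairs.
        out, i = bytearray(), 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i] and j - i < 255:
                j += 1
            out += bytes([j - i, data[i]])
            i = j
        return bytes(out)

    print(rle_encode(b"aaaaabbbc"))   # b'\x05a\x03b\x01c'

It only wins on long runs of identical bytes, which is presumably why nobody bothered.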


There's also this amusing article about FTP: http://mywiki.wooledge.org/FtpMustDie


FTP may look crufty today but it was an excellent protocol for its time, and offers capabilities you can't really get with any other current protocol.

Really, the problem is that NAT is a terrible idea and screws up the idea of a fully end-to-end Internet.


FTP has been superseded by a wide range of better protocols for a good 20 years now. It doesn't offer anything that hasn't already been reinvented and improved upon in other protocols. It's about time people stopped defending FTP and let it die. After all, we wouldn't be having the same conversation about recommending telnet or rsh instead of ssh, or teletypes over HD monitors. I'm not somebody who advocates the latest technologies for the sake of being modern (e.g. I still rock IRC on Irssi), but sometimes there are quantifiable good reasons for switching away from older tech.


> FTP may look crufty today but it was an excellent protocol for its time, and offers capabilities you can't really get with any other current protocol.

This is some weird astroturfing of the FTP protocol's goodwill, but it was never particularly good. Even for its time it's a curiously conceited protocol that is remarkably complicated without doing very much that a similarly intelligent RMI-style interface couldn't have handled.

And in practice, many FTP clients basically left the control channel concept behind and began to use it as if it were streamed RMI, opening up multiple connections for transfers to help users deal with even 1980's network environments.

It stands out as complicated for its time.


This is tangential, but I don't think gumpy's comment could be considered "astroturfing".

https://en.m.wikipedia.org/wiki/Astroturfing


Almost all downloads these days are over HTTP or native apps. Probably for the better that we let FTP go. If we do a custom one, it should have benefits the popular solution doesn't have: maybe performance (e.g. Tsunami, UDT) or security (something with TLS). There are plenty of solutions for availability with existing stuff, so I leave that off.


The big thing we seem to be missing (likely because "everyone" is using some kind of _drive service) is uploading.


HTTP does support uploading. But even that aside, there are plenty of better protocols for uploading than FTP. Eg SFTP, p2p, heck even cloud storage services if you want something easy for the layman.


There's always rz!


> while kinda neat and convenient, offering a public NFS/CIFS server was a Pretty Bad Idea

\\live.sysinternals.com\tools says Hi


That's WebDAV, not CIFS. You can also access WebDAV paths with the UNC notation.


I pretty much avoid it; even on sites that need it for WordPress, you can use SFTP instead with the wp sftp plugin.


Why can't you run FTP on a CDN?


FTP uses multiple connections that need to coordinate. It only works if these connections are from/to the same machine which is difficult to guarantee using a CDN.


Kinda sad, as I often fire up FTP URLs when looking for tarballs of source releases.


IMHO a stupid decision. There are many mirrors that will be affected. FTP has its place.


They're the ones who have the logs and information, and will know how the service is and isn't being used. Not you (unless you happen to be an actual kernel.org admin?).

Just because it doesn't make sense to you, does not make it stupid.


Mirrors almost universally use rsync, not FTP.


I mean FTP mirrors that offer a 1:1 FTP directory mirror.


Literally no one does that. rsync-over-SSH is often used for this, sometimes plain rsync, sometimes custom stuff running atop SFTP (which is a completely different thing from FTP(S)).


1998 was so long ago. I was 8 years old! The kernel was released the year I was born. Then, in like 2011, I first heard of Linux and downloaded Ubuntu 10.


Read the whole paragraph; the 1998 date was about mounting a public NFS share.

Now they want to discontinue FTP. Two different things.


The Linux kernel was released in 1991. If you were born in 1991 you were 6 or 7 in 1998.


I was working at a high school as a systems/network administrator using NT 3.5/4.0, Novell NetWare 3/4/5, and SuSE Linux 5.x. Doesn't seem so long ago to me.



