Hacker News
SFTPGo: A Full Featured SFTP Server in Go (github.com/drakkan)
242 points by ngaut on July 26, 2019 | 57 comments

This looks like a very interesting project to make a lightweight sftp server, though the dependency on an SQL db makes it a bit harder to install.

Go has a really good set of crypto primitives, an excellent ssh implementation maintained by the core team, and a great sftp server and client library which makes this project possible.

I recently added an SFTP server to rclone: https://rclone.org/commands/rclone_serve_sftp/ - this can serve any of the cloud providers rclone supports (or local disk) over SFTP. It runs on Windows/macOS/Linux too.

It was a joy to add this as the Go libraries are very well thought out and easy to use.

Not sure if that's the case in this context, but usually the SQLite dependency can almost be seen as no dependency at all - it's just a file on disk without any external dependency.

On the other hand, it requires a C compiler, so it's definitely heavier than a pure Go application.

Serious question: why hasn't anyone rewritten it in Go? It's been a while since I've looked at it, but C to Go doesn't seem that difficult to translate.

Edit: just looked at the amalgamation again, it's huge... Still surprised no one has tried it yet that I know of. Pre-processor directives would probably make it even more difficult though.

Edit2: and it already works, so what's the point, other than it being an extremely challenging problem?

Sqlite has been rewritten in both Go and C#.

It's really not that hard. The code is clearly documented and everything works in roughly the same way. Easily achievable by a single developer in a few months.

The SQLite code emulates classes with structs and methods (they're even called classes in the code). Thus it's pretty easy to translate into languages with reference and struct support.

Memory allocation is separated into an allocation provider, which is easy to implement in GC languages (just create the struct requested).
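To make the "allocation provider" point concrete, here is a minimal sketch of what such a pluggable memory layer might look like in a GC language. The interface and names are hypothetical (loosely modelled on the shape of SQLite's sqlite3_mem_methods), not taken from any actual port:

```go
package main

import "fmt"

// Allocator mirrors the shape of a pluggable memory layer like
// SQLite's sqlite3_mem_methods; the names here are hypothetical.
type Allocator interface {
	Alloc(n int) []byte
	Free(b []byte)
	Size(b []byte) int
}

// gcAllocator satisfies the interface trivially in a garbage-collected
// language: allocation is just `make`, and Free is a no-op.
type gcAllocator struct{}

func (gcAllocator) Alloc(n int) []byte { return make([]byte, n) }
func (gcAllocator) Free(b []byte)      {} // the GC reclaims it
func (gcAllocator) Size(b []byte) int  { return len(b) }

func main() {
	var a Allocator = gcAllocator{}
	buf := a.Alloc(64)
	fmt.Println(a.Size(buf)) // 64
	a.Free(buf)
}
```

The port can thread this interface through wherever the C code calls its malloc wrapper, and the GC implementation "just creates the struct requested", as described above.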

The parser and VM opcode file is tricky. The opcodes are parsed from comments in one big file. I'm not a fan of that. But once generated for a particular version, pretty easy to translate.

Testing is another can of worms. Translating the exact version of Tcl, too, is the best course of action; this is required to run and pass the (hundreds of thousands of) baseline tests. Other correctness tests (present in SQLite because of branch-coverage requirements for aircraft software) are harder to translate. The pre-release burn-in test is also not publicly available.

Preprocessor directives are not used as macros; rather, they are used to enable features. In a port you can, for example, choose to exclude WAL or specific optimizations. FTS can be left out if not needed. The different lock implementations can be deferred to the stdlib.

If you're interested in the perf overhead, here's a small benchmark of a C# implementation I maintain:


Can you point to the pure Go re-implementation of Sqlite? I was looking for such a thing but never found it. Instead I had to go with a k/v store like Bolt.

That's not it. It was a GitLab project, but searching for that is pretty difficult. I'll edit this comment if I find it.

Edit: Here it is: https://gitlab.com/cznic/sqlite

SQLite databases are often not shared between applications. Do you know of a similar SQL, in-process database in pure Go (even if not completely similar feature-wise)?

If you were planning on using this on a PaaS host where the file system is ephemeral, this would be a non-starter.

What would the real-world use case be for running this server, whose purpose is to facilitate transferring files from one computer's filesystem to another, on an ephemeral filesystem?

If what's meant here is that it would be hard to run this on a container platform because of the ephemeral nature of containers, presumably you would be using persistent volumes to keep your state safe across the container lifecycle.

That being said, if you really wanted to use SQLite for the database, you'd do the exact same thing: use a persistent volume.
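For illustration, a hypothetical Kubernetes fragment for the persistent-volume approach might look like this (the image name and mount path are assumptions, not from the project's docs):

```yaml
# Hypothetical sketch: mount a PersistentVolumeClaim at the data
# directory so uploaded files (or the sqlite file) survive pod restarts.
apiVersion: v1
kind: Pod
metadata:
  name: sftpgo
spec:
  containers:
    - name: sftpgo
      image: drakkan/sftpgo   # image name is an assumption
      volumeMounts:
        - name: data
          mountPath: /srv/sftpgo
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: sftpgo-data
```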

I wrote an interface that allows SFTP transfer to Azure Blob Storage (which doesn't support SFTP natively) and doesn't store anything on the machine running the service. It was really just supposed to be a frontend.

Does it need to be relational? Perhaps boltdb could do it


I’ve been looking for something precisely like Bolt, thank you for the link!

Then you might also want to check out BadgerDB. I can't say which is better, but I know that both are popular and in production use, so probably each with their pros and cons (e.g. read vs. write speed).

Thanks for the kind words Nick... And thanks for rclone. :)

"though the dependency on an SQL db makes it a bit harder to install"

There's only one table..."users". SQL seems like a bit of overkill for this. Maybe there are future plans to save more state.

A lot of the world runs on SFTP. The flexible user management in this code would have saved my team about a month of time implementing our own solution.

Part of the utility of SFTP to clients and admins is the fact there are no additional requirements other than SSHD to function.

This seems to fly in the face of that.

If you are willing to install more software, why not use a more feature filled file server?

OpenSSH is very much tied to system users, but sometimes you might want to give access to external users that don’t need to exist as actual unix users.

Because SSH and SFTP are so closely tied together, configuration via PAM is pretty hard and inconvenient: creating fake users via PAM for SFTP will also create them for SSH, and there's no easy way to map all such virtual users to the same user ID. Also, because OpenSSH has zero support for virtual users, aside from a PAM configuration you also need an NSS configuration, and now all your virtual users in some database have suddenly become system users on your box.

SFTP as a protocol, on the other hand, is very convenient compared to, say, FTP over TLS, because it uses a single TCP port and it was created this century.

So having this self-contained project is useful when you need to allow third parties access to files but you also don't want to create system users for them or risk f'ing something up with PAM.

I'd second this; I've long struggled to come up with an easy-to-deploy sshd/sftp/chroot configuration that permits easy database-driven configuration without extra shell access. You have to fight a lot of defaults to get this just right.

Would the OpenSSH upstream accept patches for an unprivileged sshd/sftp subsystem to make it easier to use their battle-tested code?

If you just want to share files and not give the option to upload files, you could try graft...


graft serve ./*.txt

will serve every txt file in the current dir...

For that use case, I just use `python3 -m http.server` (or the Python 2 equivalent), especially since Python is almost always already installed.

Cool... is SSL supported for secure file transfer?

If you don't mind, I have a question about graft.

> graft will prompt for a password, run an sftp server and promote it via zeroconf.

Is that a one time password that will be used by the "receiver" to download the file?

No, it is a static, unchangeable password. Graft can't "manage" user accounts; it just has one user with a password. It does not support keyfiles or other authentication mechanisms.

I wrote graft to have a simple portable tool for transferring files in a network without shares - the main idea behind it was to run:

graft serve myfiles/*.txt

on the server side and then

graft receive

on the client side without having to remember the ip or hostname - because zeroconf / mdns is used, it will find the server automatically, if the network is not too big. If there is more than one server, it will prompt you to choose the right one.

I only used SFTP, because it is a secure way to transfer files over the network.

Proftpd can do sftp-only out of the box.

On the other hand, its SQL backend modules insist on fetching a password and comparing it using their own functionality, and there's no way to get the user's typed password into a custom query.

That means that there's no way to authenticate users if you use a password hashing scheme not supported by the bundled modules (bcrypt, for example).

I am not sure why I didn't realize that sooner, but that's awesome, thank you for pointing that out.

Seems like a use-case for just wrapping regular SFTP in a virtualized userland with definable hooks for things like PAM’s user data lookups. I believe that gVisor does this?

Thanks for the explanation! The OP's question was the first one that popped into my mind too...

To be fair, it is written in Go - a language where a lot of projects built can just be single, portable binaries with no dependencies (as this project appears to be).

With this solution you can easily create fake accounts just for SFTP, and the installation is very simple. In other words, it solves the biggest security problem with SFTP accounts: what you usually want is only to give people access to a few files, and not to give them full system accounts.

I actually had to implement something very similar. The use case was an SFTP upload interface where clients would use the same login credentials as the web portal to upload to unique subfolders in Azure Storage.

What do you recommend?


A lot of more advanced servers are either (1) proprietary, (2) unmaintained, (3) hard to use, (4) require a Linux VM, or all of the above.

I agree; unless some robust solution is needed, I prefer just the daemon and using the CLI to push/pull content.

I did a project a while ago for fun using go to implement an sftp server for a dropbox account. (Though I used openssh for the ssh server).


Go seems like a really good fit for this kind of project

Yeah, I'm hoping to see use of Go routines in useful rewrites of projects like rsync with parallel file uploads. It can lead to clean, extensible, and maintainable code with still low overhead and high throughput.

It's in C again, but since you mentioned rewrites and rsync: https://github.com/kristapsdz/openrsync

I've been avoiding starting a project myself which will require SFTP. Passing text files around via SFTP is how a lot of the world works. It's still painful to implement this, and makes the transfer of data slow (polling for new files to process etc). This introduces a big delay in any data feedback cycles. Managing users with system accounts is painful too.

This project looks to alleviate a lot of these problems.

It'd be great to be able to register webhooks so that it could send events to external systems. Ideally I'd like to know when a file is created/deleted etc without having to walk directories on the sftp server on a regular basis.

Events are now supported: you can configure custom system commands and/or HTTP notifications on SFTP upload, download, delete or rename.

Yes, if you can avoid polling by getting events it can be a good system. But really you want to be notified when a file is closed successfully, not closed after a timeout or other error.

For me, in a previous system I uploaded as a .tmp file, then the uploader renames it to .zip at the end of the upload. The rename is atomic, so there's no chance of the consumer reading a partial file.

I was dealing in CSV, TSV and XML files. I chose to upload them all in a zip wrapper not only because they compress well, but also because each file has a CRC and won't be extracted if it is damaged or truncated. It's hard to tell if a raw CSV file has been truncated.
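The truncation-detection property of the zip wrapper described above can be demonstrated with Go's stdlib: the zip central directory lives at the end of the file, so a cut-off upload fails to open at all, whereas a truncated raw CSV would be silently accepted. A small sketch:

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// makeZip builds an in-memory zip holding a single file, mimicking the
// "wrap the CSV in a zip" upload scheme.
func makeZip(name string, data []byte) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, _ := zw.Create(name)
	w.Write(data)
	zw.Close()
	return buf.Bytes()
}

// zipReadable reports whether the archive opens at all. Because the
// central directory sits at the end of the file, a truncated upload
// fails here immediately.
func zipReadable(b []byte) bool {
	_, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))
	return err == nil
}

func main() {
	full := makeZip("report.csv", []byte("a,b,c\n1,2,3\n"))
	fmt.Println(zipReadable(full))               // true
	fmt.Println(zipReadable(full[:len(full)-8])) // false: truncated
}
```

Per-file CRC checks on extraction catch in-place corruption as well, on top of the missing-directory check shown here.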

If you use AWS, there's a managed S3 SFTP service. Solves the availability and notification problems.

They way overcharge though ($300/month).

Just drop files with GUIDs into stage dirs and send all staged files over to another stage dir on the other side. When successful, archive on both ends of the transfer.

Can this project be used as a library and integrated into an existing program?

I'm asking because I'm currently working on a little document management system for home use. It looks like the only sane way to integrate with document scanners is (S)FTP upload or email (SMTP). I have tried to use off-the-shelf FTP servers for this and inotify to get notified of new uploads. The solutions work OK, but are hard to set up and rather brittle (and limited to Linux).

Go seems to be the ideal language for this project because of its concurrency model and the ease of distribution (as a single binary).

- Has anyone here experience with building systems like this (integrating via FTP/SMTP)?

- Any recommendations for languages and libraries?

The linked project seems to be a wrapper around


Ah, thanks! I should have checked the code myself. The library looks good. Not too much documentation (https://github.com/pkg/sftp/issues/267, https://github.com/pkg/sftp/issues/247), but straightforward and battle-tested.

You might need to double check your requirements — SFTP is a very different protocol from FTP/FTPS. Using SFTP for something like document scanners without strong Unix roots would be surprising to me, at least.

I’m currently architecting a transport administration system written fully in Go which includes XMLEDI over SFTP; this project is perfect as we need to integrate with the current IAM.

I was looking for something like this when I was asked to get an FTP working on kubernetes...

Don't ask why...

Go is a very convenient option if you have an API agreement based on dropping files on an SFTP server.

Using JSON for configuration: Fail

What would your preferred config file formats be, and why?

Any format which enables inline comments, so INI or YAML.
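To illustrate the point about inline comments (the keys below are hypothetical, not taken from SFTPGo's actual config schema):

```yaml
# YAML and INI let you annotate each setting in place; JSON cannot.
sftpd:
  bind_port: 2022    # non-privileged port, no root needed
  idle_timeout: 15   # minutes; 0 disables the check
```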

SFTPGo switched to Viper for configuration, so it now supports JSON, YAML, TOML, env vars, etc.

The configuration format is your choice.
