Self-Contained Pure-Go Web Server with Lua, MD, HTTP/2, QUIC, Redis Support (github.com/xyproto)
216 points by Propolice on April 5, 2019 | 60 comments



I wonder why they didn't include Let's Encrypt integration - it's completely painless using the acme library, and that would prevent the whole "HTTP or HTTPS?" discussion around HTTP/2


It's in progress. Algernon is an open source project where I am the main contributor, and I develop Algernon in my spare time. Pull requests are welcome.


I'd love to help, but my coding time is already taken building a product.

I pretty much followed the instructions here: https://godoc.org/golang.org/x/crypto/acme/autocert

edit, better here: https://blog.kowalczyk.info/article/Jl3G/https-for-free-in-g...

I didn't believe it could be that simple, but it worked first time and has proven really robust.
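
For reference, the core of it is only a few lines. A minimal sketch along the lines of that guide (the domain name and cache directory are placeholders):

    package main

    import (
        "crypto/tls"
        "net/http"

        "golang.org/x/crypto/acme/autocert"
    )

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("hello, HTTPS"))
        })

        m := &autocert.Manager{
            Prompt:     autocert.AcceptTOS,
            HostPolicy: autocert.HostWhitelist("example.com"), // placeholder domain
            Cache:      autocert.DirCache("certs"),            // persist certificates between restarts
        }

        // Serve the ACME HTTP-01 challenge on :80 and redirect everything else to HTTPS.
        go http.ListenAndServe(":80", m.HTTPHandler(nil))

        srv := &http.Server{
            Addr:      ":443",
            Handler:   mux,
            TLSConfig: &tls.Config{GetCertificate: m.GetCertificate},
        }
        srv.ListenAndServeTLS("", "")
    }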


It is even easier: https://github.com/mholt/certmagic

Edit: Taken from certmagic docs

Instead of:

    // plaintext HTTP, gross
    http.ListenAndServe(":80", mux)

Use CertMagic:

    // encrypted HTTPS with HTTP->HTTPS redirects - yay!
    certmagic.HTTPS([]string{"example.com"}, mux)


"Files that are sent to the client are compressed with gzip, unless they are under 4096 bytes."

That's interesting. Is that a common optimization? I hadn't heard of any other web server doing that.


I did some quick benchmarking and for files under roughly 4096 bytes, not compressing with gzip was faster.

It's not terribly exact, +-1000 bytes would probably not make a big difference, but I think it's a good default.

And of course, some people may have unique use cases where a custom threshold may be better.
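
For anyone who wants to reproduce this kind of comparison, here is a rough sketch (not the exact benchmark; the payload is synthetic). Save it as a _test.go file and run "go test -bench=.":

    package main

    import (
        "bytes"
        "compress/gzip"
        "io"
        "io/ioutil"
        "testing"
    )

    // Roughly 3000 bytes of compressible text, just under the 4096-byte threshold.
    var payload = bytes.Repeat([]byte("some fairly compressible text "), 100)

    func BenchmarkPlain(b *testing.B) {
        for i := 0; i < b.N; i++ {
            io.Copy(ioutil.Discard, bytes.NewReader(payload))
        }
    }

    func BenchmarkGzip(b *testing.B) {
        for i := 0; i < b.N; i++ {
            var buf bytes.Buffer
            zw := gzip.NewWriter(&buf)
            zw.Write(payload)
            zw.Close()
            io.Copy(ioutil.Discard, &buf)
        }
    }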


Nginx uses a similar optimization with a configurable threshold defaulted to 1kb.


Correct, the NGINX configuration should be something like:

    gzip on;                 # enable gzip compression of responses
    gunzip on;               # decompress gzipped upstream responses for clients without gzip support
    gzip_http_version 1.1;   # only compress for HTTP/1.1 and later requests
    gzip_proxied any;        # also compress responses to proxied requests
    gzip_comp_level 5;       # compression level (1 = fastest, 9 = smallest)
    gzip_disable "msie6";    # skip compression for old IE6 user agents
    gzip_vary on;            # add "Vary: Accept-Encoding" so caches keep both variants
    gzip_min_length 2048;    # only compress responses larger than 2048 bytes


I heard somewhere (some blog's comment section, I believe) that gzip actually reduces the security of HTTPS; maybe someone can confirm / explain that?


Compression technologies, including gzip, obviously have the goal of making things smaller by predicting later data based on earlier data. If the later data looks more like the earlier data, the result is smaller than if it was random gibberish. Compression!

If an attacker controls /some/ of this data, and would like to read /other parts/, they can abuse compression to measure whether the parts they don't know are "like" the part they control, because if they are then the compression will make the results shorter than otherwise which they can passively measure.

It's not a problem to move a compressed object over a secure channel on its own, the problem arises if either you try to compress the channel which is moving objects from different origins (e.g. a cookie set by a random advertising web site and your Facebook password) or compress a composite object e.g. maybe your backups mixed with a file you downloaded from a dodgy "pirate" video site.
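
As a toy illustration of that length side-channel (synthetic strings, no real protocol): a page that reflects a correct guess next to the secret compresses a few bytes smaller than one reflecting a wrong guess.

    package main

    import (
        "bytes"
        "compress/gzip"
        "fmt"
    )

    func compressedLen(body string) int {
        var buf bytes.Buffer
        zw := gzip.NewWriter(&buf)
        zw.Write([]byte(body))
        zw.Close()
        return buf.Len()
    }

    func main() {
        const secret = "session=s3cr3ttoken"
        for _, guess := range []string{"session=xxxxxxxxxxx", "session=s3cr3ttoken"} {
            // The attacker controls "guess"; the secret is reflected elsewhere in the page.
            page := "<p>" + secret + "</p><p>" + guess + "</p>"
            fmt.Printf("%q -> %d compressed bytes\n", guess, compressedLen(page))
        }
    }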


In a scenario where an attacker can see the encrypted message (e.g. by monitoring network traffic) and can affect part of the message being encrypted, they can use the compression to their advantage. They can, for example, try different inputs and observe the length of the encrypted message; if the length drops for a certain input, that means the input contains a string similar to another part of the plaintext, and the compression algorithm did its job and used that to reduce the size.

This came up in 2012 with CRIME[1] (from the researchers who also created the BEAST[2] exploit) and later with the BREACH[3] vulnerability (back when it was considered cool to come up with a catchy name, a logo and a website for specific vulnerabilities).

[1] https://en.wikipedia.org/wiki/CRIME

[2] https://en.wikipedia.org/wiki/Transport_Layer_Security#BEAST...

[3] https://en.wikipedia.org/wiki/BREACH


This shouldn't be true in any sane system.


I'd be surprised if it didn't exist in every compression middleware.

For example, https://github.com/expressjs/compression/blob/dd5055dc92fdea...


I don't see anything like that documented for apache's mod_deflate/zlib.


Seems not, but there is for mod_gzip for example. I'd definitely consider it common.


Also, according to the spec HTTP servers may not always honour the value in the `Accept-Encoding` header[0].

> Even if both the client and the server supports the same compression algorithms, the server may choose not to compress the body of a response, if the identity value is also acceptable.

I've actually run into this twice in my career, and it has been a surprise to those around me in both cases. Both times it was in the context of small payloads where the server was applying some heuristic about whether to encode or not (e.g. a status page stops sending gzipped output when the server is becoming "unhealthy").

[0]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ac...


Makes more sense to use the verb "negotiating" with Accept-* headers rather than "honoring".

This makes obvious sense once you consider that the client tells the server which compression formats it supports in every request, yet not every data format is compressible, nor does the server necessarily support any candidate compression format.

For example, the server wouldn't gzip a jpeg since it's already compressed.

All Accept-* headers are like this. e.g. the server doesn't necessarily support any of the languages requested in the Accept-Language header, but it doesn't hurt to ask. You always have to inspect the response headers to see the result of negotiation.


"Accept-Encoding" means only that the client also understands specific encoding (in this case compression) it is still up to the server to chose what to dot. There was a time when browsers didn't support any compression. This header was introduced to signal to server what is acceptable by the client, that's why the header allows specifying multiple compression algorithms.

It is similar with a header that's quite useful but which, for some reason, very few sites honor: with "Accept-Language" the browser can specify which languages are preferred, but it is up to the server whether to honor it (for example, the given language version might not be available).
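
A minimal sketch of that server-side decision in Go (the 1400-byte cutoff is an arbitrary example, not any particular server's default):

    package main

    import (
        "compress/gzip"
        "net/http"
        "strings"
    )

    const gzipThreshold = 1400 // roughly one MTU worth of payload

    func maybeGzip(w http.ResponseWriter, r *http.Request, body []byte) {
        // The client only advertises support via Accept-Encoding; the server still decides.
        if len(body) >= gzipThreshold && strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
            w.Header().Set("Content-Encoding", "gzip")
            zw := gzip.NewWriter(w)
            defer zw.Close()
            zw.Write(body)
            return
        }
        w.Write(body) // identity encoding: small payloads go out as-is
    }

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            maybeGzip(w, r, []byte(strings.Repeat("hello, content negotiation\n", 100)))
        })
        http.ListenAndServe(":8080", nil)
    }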


Cloudfront is similar [1]:

> The file size must be between 1,000 and 10,000,000 bytes.

[1]: https://docs.aws.amazon.com/AmazonCloudFront/latest/Develope...


Tomcat, one of the most widely used Java application servers and the default server behind Spring Boot applications, defaults to a "compressionMinSize" value of 2048.

The docs do not explain why.


Not a direct answer, but also interesting to consider:

As everything will end up in a packet when sent through the network stack, you might want to choose your minimum input size in such a way that the gzip-compressed output is big enough. Why big enough? Nagle's algorithm [1]

So yet another reason to think about 'what to gzip'.

[1] https://en.wikipedia.org/wiki/Nagle%27s_algorithm


Applications that always know exactly what they want to send disable this algorithm, as that article explains, by setting TCP_NODELAY or its moral equivalents in their framework. A web server will almost invariably set TCP_NODELAY.

More sophisticated algorithms can either decide exactly which packets to send, or use TCP_CORK to shove part of a packet into a buffer before they add the rest of the stuff, e.g. preparing HTTP headers and then adding the static document that goes after them.
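
In Go, for example, this is a per-connection toggle (Go's net package already sets TCP_NODELAY by default, so the call below is mostly illustrative):

    package main

    import (
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            if tc, ok := conn.(*net.TCPConn); ok {
                tc.SetNoDelay(true) // true sets TCP_NODELAY (disables Nagle); false re-enables Nagle
            }
            go func(c net.Conn) {
                defer c.Close()
                c.Write([]byte("HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"))
            }(conn)
        }
    }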


If you read the page you linked, it says the algorithm applies to data of any size.


I don't know about "under 4096 bytes" but I have heard of not compressing data that is under ~1500 bytes. Part of the thinking is this - if your result data (plus HTTP overhead) is already smaller than the data payload of an IP packet (MTU settings come into play here), then you are spending CPU time that will not save you any network I/O time.


Yes, also compressing and decompressing a small amount of bytes may take longer than just sending it uncompressed.


This is quite impressive, but this claim is a bit wrong:

> All in one small self-contained executable.

Size of algernon executable: 24.4 MiB

Size of nginx-full executable: 1.1 MiB

Size of apache2 executable: 648K


For self-contained architecture-specific server binaries, there is no practical difference between 240KB, or 2.4MB, or 24.4MB, or even, at a stretch, 244MB. It's not worth mentioning or optimizing for, except as novelty. I wish people would stop golfing with these numbers.


You have it backwards: the apache or nginx size is not for novelty. Go is just a pig and its binaries grow with every new release, and the size isn't really due to static linking or debug symbols, because the binaries are huge even when those options are disabled.

Right now a "hello world" application in Go has comparable size to an OS with full GUI.


You would need to find an application where the binary image size is a problem. In an age where 1TB SSDs are 300 USD this will be ... challenging. You could fit about a thousand gigabyte-sized images on that one SSD, and I suspect you would hit CPU and I/O limits well before you have a thousand different binaries running for real.

Media storage would go on spinning rust disks anyways separate from the SSD(s).


The consideration isn’t binary size, but hot-loop cache coherency. Big binaries cause code-segment cache-line eviction.


Then that's a typical case of optimizing on the wrong metric.

Find a case where it's actually too slow, ok, but saying "A does X, and X can lead to Y, therefore A does Y" is wrong.

If you say the size of the binary is an issue, then give the issue, not how it could (or not) be an issue.


We observed noticeable difference in deployment time and auto scaling based on container size.


Who cares? Literally, what problem does it cause, or what does it make worse? It is totally immaterial.


- it increases the amount of time needed to fetch and run a container (quite noticeable when you have an app that scales out and you are updating it)

- it increases the amount of storage needed to keep multiple versions of containers (when you have an internal app and do frequent releases, this adds up quickly)

- it increases the amount of data transferred on every deployment

- it increases the amount of memory used (the whole point of containers was to use hardware efficiently (Borg), although a lot of people today miss that reason and run containers on VMs)


True. Luckily, most Go binaries can be upx'd ( https://upx.github.io/ ) for a fraction of their original size. Just put it into your Dockerfile as a part of the build process.


This works and helps with storage/transit. A word of warning though: many AV products like to flag UPX'd executables (only important if it's not an internal tool), and the binary will take longer to start and use more memory. My understanding is that UPX essentially applies standard compression to the executable and prepends a decompression stub to the front of it.


With UPX the entire docker image with Algernon takes only 9MB.


You have to ask yourself what is that space used for??

They're all compiled languages doing (roughly) the same operations. There is an order of magnitude difference in the number of instructions in one compared to the other.

Apache is more likely to be entirely cached whereas the others aren't.

Size matters for performance... if you're not CPU bound, fine, but to say it's immaterial is naive.


Even if we don't know which particular problem it might cause right now, being wasteful when you can avoid it is never a good idea.

Image size is currently not the most important metric, but - judging by how your average node.js package already looks today - demanding that people ignore it completely will probably set us on the road to multi-TB images that also contain the developer's favorite desktop environment in the medium-term future.


Algernon does a bit more than plain Nginx or Apache. 24.4 MiB also includes bloat. See: https://github.com/golang/go/issues/27266 https://github.com/golang/go/issues/2559

Hopefully we can get smaller binaries by Go 1.13.


My Apache installation with modules takes 4.3MB, and I'm quite sure Apache with modules can do more than Algernon.


If that's the main criticism you have of the project, I'd say that's pretty good.

In fact, the readme of this project is really thorough!


Very civilized comment from Mr. Caddy himself. Heartening, not least considering the kind of vendettas some projects - Caddy among them - have to put up with from competing developers.


Well, yeah, it's definitely an interesting approach. I guess it's much leaner than running a whole Ubuntu container or VM that has all of these things installed, or running a gigantic bloated Javascript toolchain just to convert the scss or md files to css or html.


A big part is almost certainly the difference between dynamic and static linking.

Can you run ldd on all of these and then report the combined size for each binary+libraries?


As someone else commented, the huge size of Go executables is down to a design decision to include a map of functions for panic reporting. There was a whole discussion on this recently on HN.

I don't know why the grandparent was downvoted. Go binaries are not small and the claim that this is a "small" single executable is untrue.

Hopefully the Go team will give us a flag to decide for ourselves whether to optimise for executable size or initialisation time. I know I'm fed up with uploading 50 MB files over dodgy wifi+vpn connections to update my server.

(edit fix repetition of design)


You can actually pass LDFLAGS to any go tool (e.g. go build[1]). The flags are specified in the linker documentation[2]

[1] https://golang.org/cmd/go/#hdr-Compile_packages_and_dependen... [2] https://golang.org/cmd/link/
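
For example, stripping the symbol table and DWARF debug info shaves off some size, though it does not remove the runtime's function metadata discussed elsewhere in the thread:

    go build -ldflags="-s -w"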


which is useful, except as far as I'm aware it doesn't cover this case.

The original is at: https://science.raphael.poss.name/go-executable-size-visuali...


That's one thing that might motivate me to set up CI/CD on a server.


I don't have nginx installed (got the number from a web download of the .deb package), but running this for apache:

    $ echo $((`ldd /usr/sbin/apache2 | cut -d">" -f 2 | sed "s/(.*$//;s/ //" | xargs du -L | cut -f 1 | sed "s/$/+/" | xargs echo` 0))
    3200

So 3.2 MB of shared library dependencies. 1.8 MB just being the libc which is almost guaranteed to be used by a different program already.


> 1.8 MB just being the libc which is almost guaranteed to be used by a different program already

Not if we're talking about containers :)


I don't think it is fair that you are being downvoted.

Size does matter, and not just in the sense that it uses resources. The largest part of the 24 MB probably never gets executed, but it adds unnecessary complexity that may hide bugs and security flaws.


Algernon would be a lot smaller if it was not statically compiled and compiled with gccgo instead.

Sadly, the Go package that provides support for QUIC does not compile with gccgo, yet.


apache2 is mostly modules that are loaded at runtime.


On my FreeBSD machine, all files in the entire apache package (including modules, manpages, headers, default pages, graphics for displaying directories in gif and png formats, and tools) take 4.3MB.



How does this compare to OpenResty?


I suppose this one offers an all-in-one package, while openresty is really just an nginx server with builtin Lua(JIT) support.


What is the benefit of using this? In what scenario would this excel? Thanks.


Good question. I'm not sure if it excels in any scenario. There are specialized web servers that excel at caching or at raw performance. There are dedicated backends for popular front-end toolkits like Vue or React. There are dedicated editors that excel at editing and previewing Markdown, or HTML.

I guess the main benefit is that Algernon covers a lot of ground, with a minimum of configuration, while being powerful enough to have a plugin system and support for programming in Lua. There is an auto-refresh feature that uses Server-Sent Events when editing Markdown or web pages. There is also support for the latest in Web technologies, like HTTP/2, QUIC and TLS 1.3. The caching system is decent. And the use of Go ensures that smaller platforms like NetBSD and systems like the Raspberry Pi are also covered. There are no external dependencies, so Algernon can run on any system that Go supports.

The main benefit is that it is versatile, fresh, and covers many platforms and use cases.

For a more specific description of a potential benefit, a more specific use case would be needed.



