Show HN: Open-source non-blocking NIO Java HTTP Server (github.com/fusionauth)
59 points by mooreds on Oct 24, 2022 | hide | past | favorite | 37 comments

Another person commented the same, but a "project loom" virtual threads based HTTP server implementation would be very desirable right now. A new NIO based server probably doesn't add a whole lot of value to the ecosystem. But note, I'm not discounting the effort here.

I do like the fact that this project attempts a no-dependency implementation. That's nice. And of course, there's always room for one more HTTP server implementation, if it offers some alternative to existing solutions (throughput, API, processing model, etc.). But any "new" HTTP server project starting from scratch today would be wise to base its implementation on virtual threads (in my opinion).

Thanks for the feedback. Full disclosure, I work for the company which open sourced this project.

I haven't been following Loom closely, but it appears pieces of it are still being put into Java SE and it hasn't been fully delivered.

For instance, in Java 19, virtual threads are still in preview[0].

If you wanted to build a web server running on an LTS of Java with Loom, you'd have to wait until Sep 2023[1] (though of course you could write code against versions 19+ too).

I think it'd be great to file a feature request asking for loom support[2], but do you really think loom is ready to be the foundation of a prod ready web server right now?

Sorry if that sounds like FUD, but I truly don't know. The two projects[3][4] linked in the InfoQ article are explicitly experimental or demo apps.

That said, I did find this thread of projects that support loom now[5].

0: https://www.infoq.com/news/2022/09/java19-released/

1: https://www.oracle.com/java/technologies/java-se-support-roa...

2: https://github.com/FusionAuth/java-http/issues

3: https://github.com/rokon12/project-loom-slides-and-demo-code

4: https://github.com/nipafx/loom-lab

5: https://twitter.com/nipafx/status/1567448335367151616

Not FUD, these are good questions. In my opinion, since you asked, writing an http server using Loom (preview mode and all) is the only reasonable way to get ahead of the market, so to speak. It's not like I am an insider or have any special access, nor do I have any personal proofs to my statements, but I do believe that Project Loom / virtual threads will hold up well in production. I think there will be lots of interesting improvements and use-cases that will be drawn out of loom, and using it now in preview mode would be a reasonable way to get ahead.

That being said, yes everything is very "experimental" right now. But a start-from-scratch no-dependencies project such as yours (or anyone else's for that matter) might be the best approach versus trying to retrofit an existing server.

I wrote most of the server code for the project and I actually looked extensively at Loom.

We decided to anchor the project to LTS Java versions only. The issue with Loom is that, as a preview feature, you can't use it without producing compiled code that is hard to deploy on newer LTS versions. We ran into this with Java 14, where we used a bunch of that release's preview features. When we upgraded to Java 17, it caused a number of issues and we had to rebuild almost everything.

Once the next LTS has been released and Loom is top-level, we'll definitely look at using it as long as the Java community is willing to make the jump with us.

In the meantime, I think our threading implementation works quite nicely and scales really well. Let me know what you think if you review the code.

> If you wanted to build a web server running on a LTS of Java with loom, you'd have to wait until Sep 2023[1]

People who want LTS inherently do not want change. I highly doubt they would consider adopting this project.

> do you really think loom is ready to be the foundation of a prod ready web server right now?

I think the same can be said about java-http. It’s not even on a 1.0 release. Why choose it over battle-tested Netty or Jetty?

I think that's actually what most people are waiting for with Loom. They want it to be fully baked into the JDK and ready for production first. Then they will start using it.

For java-http, once it has been tested a bit more, we'll likely release a 1.0.0 version, but it's already in production, so it works right now.

In terms of Netty or Jetty, why pull in all the dependencies and overhead if all you need is an HTTP server? java-http solves the 25-year-old HTTP server problem we've had in Java.

In the past, you either had to learn Netty or use Jetty/Tomcat/JBoss/Glassfish/WebLogic/etc. In my opinion, these tools are complex, bloated, and legacy. Most other platforms have a simple HTTP server (Node, Ruby, etc.). Java has lacked this for a long time, and we've been forced to use things like Tomcat when we didn't need JEE or WARs or the complexity.

The JDK team added a simple one in version 18 (JEP 408, jwebserver), I think. But they have specifically stated it won't be production quality (ugh). Not sure why they made that decision, honestly.
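For reference, the JDK has actually bundled a bare-bones embeddable server (com.sun.net.httpserver) since Java 6. It isn't pretty, but a minimal 200 responder really is just a few lines; a sketch (the class name and port are mine):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Minimal "hello" server using the JDK's built-in com.sun.net.httpserver.
// No dependencies, but also no TLS setup, routing, or performance tuning.
public class BuiltinServer {
  public static HttpServer start(int port) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/", exchange -> {
      byte[] body = "ok".getBytes();
      exchange.sendResponseHeaders(200, body.length); // status + Content-Length
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    return server;
  }

  public static void main(String[] args) throws Exception {
    start(8080);
  }
}
```

It uses a plain thread-pool executor under the hood, so it's a far cry from a tuned NIO server, but it shows the "few lines to start" bar that java-http is aiming at.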

Need a simple HTTP server that only takes a few lines of code to start, is production quality, requires no dependencies, is crazy fast, scales well, and is only 140k? No problem. Just use java-http! :)

> a "project loom" virtual threads based HTTP server implementation would be very desirable right now

Oracle is doing this with Helidon Nima: https://helidon.io/nima

If you’re only getting 500 qps out of Netty, something is deeply wrong with your setup. I’ve written multiple HTTP & socket servers based on Netty (albeit 6+ years ago) and they all could handle 20k+ qps.

Probably a combination of Netty having a small default thread pool size (2*cores) and the benchmark doing something blocking or compute intensive. Tomcat defaults to 200 threads, which would explain the difference.

Edit: Though the page says "The controller does nothing except return a simple 200".
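The thread-pool theory is easy to sanity-check with Little's law: if each request blocks a pool thread for B seconds, an N-thread pool tops out near N/B requests per second. A back-of-the-envelope sketch (the 8-core and 2 ms figures are illustrative assumptions, not from the benchmark):

```java
// Back-of-the-envelope pool-size bound: with B seconds of blocking work per
// request, a pool of N threads caps out near N / B requests per second.
public class PoolBound {
  public static double maxRps(int threads, double blockingSeconds) {
    return threads / blockingSeconds;
  }

  public static void main(String[] args) {
    // Assumed 8 cores and 2 ms blocking per request:
    System.out.println(maxRps(2 * 8, 0.002)); // Netty-style 2*cores pool -> ~8k rps
    System.out.println(maxRps(200, 0.002));   // Tomcat's default 200 threads -> ~100k rps
  }
}
```

That shape of gap (small event-loop pool vs. 200 blocking threads) would explain a Tomcat-beats-Netty result, but only if something in the handler actually blocks.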

Yeah, really doesn't add up - 2 ms just to return a status code? What could it possibly be doing with all that time?

The Nagle algorithm interfering with a small packet size, perhaps?

The 2ms delay would be just about right.
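If Nagle is the suspect, disabling it is a one-line socket option per connection. A sketch with NIO channels (the helper class and method names are mine):

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

// Disable Nagle's algorithm so small writes are sent immediately instead of
// being coalesced while waiting for ACKs -- the classic source of small,
// fixed, millisecond-scale stalls on tiny HTTP responses.
public class NoDelay {
  public static SocketChannel withNoDelay(SocketChannel channel) throws IOException {
    channel.setOption(StandardSocketOptions.TCP_NODELAY, true);
    return channel;
  }
}
```

Most server frameworks expose the same option (Netty via ChannelOption.TCP_NODELAY), so it's worth checking whether the benchmarked servers differ in their default.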

I ran the load tests and couldn't explain it either. I adjusted the thread pools, buffer sizes and a bunch of other parameters and couldn't get Netty to scale.

I think Netty tries too hard to be everything to everyone. This makes it really hard to determine how to configure it properly across a bunch of different versions with lots of incompatibilities.

I wrote java-http with the concept of not doing that. It's purpose built for HTTP and high performance.

Once I have some time, I'll publish my Netty setup and let the community bang on it and see if they can beat my RPS. At 65k, it might be hard though. :)

What's the hardware being used for your test? I get 55k RPS with a basic 200 responder with zio-http[0] (which uses Netty) on my i5-6600K, and over 20k RPS for an e2e POST endpoint that does write batching to Postgres (committing the insert before responding to all of the clients in the batch with their individual db-generated ids). Postgres, the client (vegeta[1]), and the app were all on the same machine. I think that was with keep-alive, with around 256 clients for the basic responder and 1024 for the one that writes to the db. There's a recently merged PR for zio-http that does 1M RPS on whatever machine they test on[2], so Netty can absolutely scale to high RPS.

[0] https://github.com/zio/zio-http

[1] https://github.com/tsenart/vegeta

[2] https://github.com/zio/zio-http/pull/1659

Would love to see your setup!

Sounds good. Once I get the project published, it will include all of the load tests for each server as well as the setup and code for it all. Might be a couple of weeks or so, but it will be a separate GH project. Something like java-http-performance.

If Netty gets you that number, you should spend the time understanding why, because your benchmark is 100% wrong.

It's a bit hard to believe that one of the most used and fastest Java frameworks is choking on a trivial test.

I thought so as well. See my comment on the other thread about Netty. I'm sure someone that is a Netty expert or committer could figure it out, but it's so complex that it makes it nearly untenable.

Regarding the venerable Tomcat, they [somewhat] recently added support for Unix domain sockets.

* https://github.com/apache/tomcat/pull/402

* https://github.com/apache/tomcat/pull/532

We fronted the server with haproxy LTS. Our initial testing showed roughly an order of magnitude [10x] increase in the number of requests the server could handle.

It's not completely plug-and-play; we still had to write a custom valve to set the request's remote IP address and some other TCP-ish stuff, but nevertheless the capacity far outstripped our need for the technology.

I am interested in the nonblocking and parallelism of modern server architectures.

I am curious what server architecture leads to the best performance. There was an interesting post in the last few days about epoll being broken due to thundering herd.

My understanding is Apache is a forking web server whereas Nginx is an epoll event looped.

I wrote an epoll echo server that multiplexes multiple connections to threads and uses a multiconsumer multiproducer RingBuffer to send sockets to different threads.


My plan is to also parallelise send() and recv() on different threads for sending and receiving in parallel with different threads.
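For a JVM flavor of that design, the Selector API is Java's portal to epoll/kqueue. A minimal single-threaded echo loop looks roughly like this (real designs hand the read off to worker threads via a queue/ring buffer instead of echoing inline, as described above):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Single-threaded Selector (epoll-style) echo server: one event loop
// multiplexing accept + read readiness across all connections.
public class EchoServer implements Runnable {
  private final Selector selector;
  private final ServerSocketChannel server;

  public EchoServer(int port) throws IOException {
    selector = Selector.open();
    server = ServerSocketChannel.open();
    server.bind(new InetSocketAddress(port)); // port 0 picks an ephemeral port
    server.configureBlocking(false);
    server.register(selector, SelectionKey.OP_ACCEPT);
  }

  public int port() throws IOException {
    return ((InetSocketAddress) server.getLocalAddress()).getPort();
  }

  @Override public void run() {
    ByteBuffer buf = ByteBuffer.allocate(4096);
    try {
      while (selector.isOpen()) {
        selector.select(); // block until some channel is ready
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
          SelectionKey key = it.next();
          it.remove();
          if (key.isAcceptable()) {
            SocketChannel client = server.accept();
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ);
          } else if (key.isReadable()) {
            SocketChannel client = (SocketChannel) key.channel();
            buf.clear();
            int n = client.read(buf);
            if (n < 0) { client.close(); continue; } // peer closed
            buf.flip();
            while (buf.hasRemaining()) client.write(buf); // echo back
          }
        }
      }
    } catch (IOException | ClosedSelectorException | CancelledKeyException ignored) { }
  }
}
```

The fan-out variant replaces the inline echo with a hand-off of the decoded work to a consumer pool, which is essentially what a multiproducer/multiconsumer ring buffer buys you.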

I am yet to use NIO in Java but I've heard Netty is high performance.

I would like to use io_uring for maximum performance.

One of my hobbies is trying to build a multithreaded architecture that can support full parallelism and scale with hardware.

Node.js and Python rely on process-based parallelism, but C, Rust, Java, and C# can provide real parallelism. I work on various multithreaded problems on my GitHub.

I am yet to add network support to the algorithms in this repository.


Bit awkward timing with project loom looming in the future

Aren't loom and nio somewhat orthogonal?

Agreed that the threading parts of this project might need to be looked at/reworked when loom is released.

More discussion here: https://www.reddit.com/r/java/comments/nsb2w1/will_loom_rend...

Isn’t the point of project loom that you can just use pseudo-blocking IO using the traditional thread API on a fiber scheduler, making NIO in application code obsolete?

I haven’t tried Loom yet, but the thread API looks untouched? That was my understanding.

But yeah … opening this project, I was scanning for Java 19 / Loom references.

Fair enough. I added a PR to clear up why Loom isn't used: https://github.com/FusionAuth/java-http/pull/1/files

Aren't loom and nio somewhat orthogonal?

No, they are virtually the same abstraction.

Actually, Loom is about threading and helps support NIO. You'll still need Selectors, Channels, and ByteBuffers with Loom, you'll just be able to pass off the parsing and handling to a Fiber. You might be able to get away with doing the IO blocking with Fibers, but it likely won't scale. Non-blocking IO is still way faster at the OS level so my guess is that Loom will simply replace 10-20 lines of code in java-http and the majority of the IO will be the same.

All blocking IO when run in fibers will be potentially non-blocking. That is literally the point of Loom. Also, non-blocking IO is almost never faster than blocking IO but you can handle way more connections with it. You use NIO for scale not for latency. Due to fairness concerns you will also want to hand off heavy computation to native threads vs fibers. Fibers are for IO.
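That model is easy to sketch with the plain blocking Socket API on virtual threads (assumes Java 19+ with preview enabled, or 21+ where virtual threads are final; the trivial handler here is mine):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One virtual thread per connection, written with the ordinary blocking
// Socket API. When read()/write() would block, the JDK parks the virtual
// thread and frees the carrier OS thread -- no Selector in application code.
public class VirtualThreadServer {
  public static void serve(ServerSocket server) throws IOException {
    try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
      while (!server.isClosed()) {
        Socket client = server.accept();      // blocks only the accept loop
        pool.submit(() -> handle(client));    // cheap virtual thread per client
      }
    }
  }

  static void handle(Socket client) {
    try (client) {
      // Trivial fixed response; a real server would parse the request first.
      client.getOutputStream().write(
          "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n".getBytes());
    } catch (IOException ignored) { }
  }

  public static void main(String[] args) throws IOException {
    serve(new ServerSocket(8080));
  }
}
```

Whether this matches a hand-rolled Selector loop on throughput is exactly the open question in this thread, but it shows why Loom removes NIO from *application* code even though the JDK still uses non-blocking IO underneath.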

I don't think this is accurate. The Loom documentation says specifically that it is a concurrency model to replace native threads with fibers. It says very little about non-blocking IO, except that it is a use-case that it assists with:


The NIO section covers a bit about how the fibers release the channels, but the OS will not know that. Non-blocking IO is about interrupts at the hardware level that let the application code know when bytes are ready to be read or written. Fibers help by fanning out the work, but they don't replace this concept.

FTA, it's pretty clear.

  Typically, a virtual thread will unmount when it blocks on I/O or some other blocking operation in the JDK, 
  such as BlockingQueue.take(). When the blocking operation is ready to complete (e.g., bytes have been 
  received on a socket), it submits the virtual thread back to the scheduler, which will mount the virtual 
  thread on a carrier to resume execution.

  The mounting and unmounting of virtual threads happens frequently and transparently, and without blocking any 
  OS threads. For example, the server application shown earlier included the following line of code, which 
  contains calls to blocking operations: response.send(future1.get() + future2.get());

  These operations will cause the virtual thread to mount and unmount multiple times, typically once for each call to get() and possibly 
  multiple times in the course of performing I/O in send(...).

> Non-blocking IO is about interrupts at the hardware level

This isn't relevant. We only care from the perspective of userspace.

I'm not clear why there is a distinction here. Any HTTP server can easily use a non-blocking Selector to handle the I/O operations and then perform the application logic on Threads or Fibers. My point is that Loom fundamentally is a threading model that works well for blocking I/O.

What is unclear is whether or not Loom will increase performance of the server I/O (not the application logic code) that is already working well with non-blocking Selectors. I doubt it. But if someone has a Loom implementation of a plain HTTP server (nothing complex or JEE), I'd be up for some benchmarking exercises.

Overall though, I don't think Loom negates the usefulness of java-http. We built a super simple API, with no dependencies, that works with Java 17 and above, supports TLS natively, and is insanely fast.

Am I missing something?

True, at the OS level the JVM libs/frameworks are going to rely on some thread pool doing non-blocking IO. Maybe a fairer framing than calling them the same abstraction is event loop/callbacks versus fibers.

A performant server should be able to do, on 10 GbE:

- 6M/s ~4KiB text/plain HTTP/1.1

- 500k/s <1KiB simple JSON responses to a single, uncached query

- 3M/s <1KiB HTTP/1.1 cached queries

If you want TLS performance, terminate it elsewhere or have a dedicated crypto (not crypto-mining) acceleration card in each server and do DSR.

Until you can afford your own 24/7 web SRE team, put Cloudflare in front of most of it and let them soak up traffic. :)

For speed, the choices usually revolve around Rust, JS+C++, or C++. Let me think: I'll take the one without buffer overflows or RCEs.

If you're trying to run a giant Spring app, good luck and take an Azul C4 GC with you on your journey.

> A performant server should be able to do, on 10 GbE:

> - 6M/s ~4KiB text/plain HTTP/1.1

According to my numbers, 4 KiB * 6M/s is roughly 196 Gbps; that really must be a very performant server, and network card.

255 Tbps links exist in research labs. 100 Gbps is COTS hardware.

Benchmarks are usually performed locally between a load test program and the local server. Bottlenecks are imposed at the boundaries of storage iops, memory bandwidth, cache bandwidth, cpu context switching, external io, cache miss/hits, and overhead of the program.

Anything production and performant likely leverages DPDK and multiple 40/100 Gbps NICs/HCAs, but doesn't need to serve that much because it has CDNs and "cloudflares" in front of it. Apps with dynamic data have to be sharded in such a way that they can be horizontally scaled. WhatsApp scaled nicely with Erlang BEAM/HiPE on FreeBSD on a low hardware budget for the volume of traffic. HiPE isn't particularly fast or efficient compared to Rust or the Azul C4 JVM, but it saved a lot of thread contention by making copies and avoided cache invalidations by making almost everything immutable.

I appreciate the simplicity of the API compared with other libraries. Especially when setting up TLS. That by itself is a big win IMHO.

Thanks! Feel free to log any issues you encounter. TLS is complex, but I think I have it working properly now.

I don't like callback-hell HTTP servers in Java, since then you have to go chase async libraries for everything.
