Talking to Postgres Through Java 16 Unix-Domain Socket Channels (morling.dev)
124 points by pjmlp on Feb 5, 2021 | 73 comments

Love that you included the failed attempts and documented the ideas which you had and followed. Those tend to teach as much as the successful solution (especially for someone like me, who is not a Java native and hasn’t used the more modern versions).

As a software support tech, things like that are extremely helpful.

"If you configure X like Y, you will see Z problem". Especially on new releases of software this cuts down a huge amount of trial and error by your support engineers trying to figure out what's gone wrong.

Thanks, really appreciating this sort of feedback, it's very motivating!

Often the failures are more illuminating than the actual solutions: you will see how different approaches can fail, and become better able to appreciate the solution.

It's very easy to write out a finished blog post with recipe-like instructions, but you will be throwing the baby out with the bathwater, as the reader might miss out on crucial insights into why the solution is the way it is.

I'm glad java is *finally* getting first-class unix domain socket support. It was one of my only qualms about using the language. I wish they offered more explicit unix signal handling, though I know some of the signals are reserved for internal use so it's probably a bit hard to let that cat out of the bag.
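For anyone who hasn't tried the new API yet, here is a minimal sketch (not taken from the article) of what the JDK 16 support looks like: a server channel and a client channel talking over a socket file, all in one thread. The class name `UdsEcho`, the method `roundTrip` and the temp-directory socket path are made up for illustration, and it assumes JDK 16 or later on a Unix-like OS.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; requires JDK 16+.
final class UdsEcho {

    // Sends `message` from a client channel to a server channel over a
    // Unix domain socket and returns what the server echoed back.
    static String roundTrip(String message) {
        try {
            Path dir = Files.createTempDirectory("uds");
            Path socketFile = dir.resolve("echo.sock");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);
            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(address);
                // connect() succeeds once the connection is queued in the
                // listen backlog, so one thread can play both roles here.
                try (SocketChannel client = SocketChannel.open(address);
                     SocketChannel accepted = server.accept()) {
                    byte[] payload = message.getBytes(StandardCharsets.UTF_8);
                    writeFully(client, ByteBuffer.wrap(payload));
                    ByteBuffer atServer = readFully(accepted, payload.length);
                    writeFully(accepted, atServer);
                    ByteBuffer reply = readFully(client, payload.length);
                    return new String(reply.array(), StandardCharsets.UTF_8);
                }
            } finally {
                // Closing the channel does not remove the socket file.
                Files.deleteIfExists(socketFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeFully(SocketChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    private static ByteBuffer readFully(SocketChannel channel, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) < 0) {
                throw new EOFException("peer closed the connection early");
            }
        }
        buffer.flip(); // switch from filling the buffer to draining it
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

Note that this only covers stream sockets through the NIO channel classes, which is the scope of the JDK 16 feature as discussed in this thread.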

It has always been a couple of JNI calls away anyway, this just makes it more practicable.

Interesting that they left out socketpair() and sendmsg() for file descriptors though.

Doing it via JNI or JNA (for no C code) is beyond trivial. For instance, we have an integration with systemd; the code is around one screen.

If you trust your local connections over a socket, you may also map the OS user the app runs as locally to the Postgres database user in pg_ident.conf. We could use that map through peer authentication mode for local connections made by that app's OS user. What I meant to say is that it is not limited to md5 or other modes; it's not that it doesn't work with peer.

This was a recent discovery for me -- you can verify not only the peer uid/gid but also the pid using getsockopt so_peercred. It's very neat indeed (and this is what postgres uses in the pg_ident.conf):


More on Unix domain socket channels in JDK 16, by the author of the feature: https://inside.java/2021/02/03/jep380-unix-domain-sockets-ch...

This is slightly off-topic but how many 'connections' can an app make over a Unix socket and how would you manage an in-app connection pool like HikariCP or similar when using a Unix socket?

You should be limited only by the number of fds allowed on your system / for your user(s). I believe each connection would be counted as an fd for the server side and the client side; and the listen socket is also a fd on the server side. So max_fds / 2 - 1; assuming you have no other fds open (which is pretty hard to do, but gives you a ceiling).
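To make that ceiling concrete, here is the arithmetic as a tiny sketch. `FdCeiling` and `maxConnections` are illustrative names, and the model follows the assumption above that client and server share one fd limit and nothing else is open.

```java
// Hypothetical helper illustrating the max_fds / 2 - 1 estimate.
final class FdCeiling {

    // Each connection costs one fd on the client side and one on the
    // server side, and the server also holds one listening socket.
    static long maxConnections(long maxFds) {
        if (maxFds < 0) {
            throw new IllegalArgumentException("maxFds must not be negative");
        }
        return Math.max(0, maxFds / 2 - 1);
    }

    public static void main(String[] args) {
        System.out.println(maxConnections(1024)); // prints 511
    }
}
```

In practice a pool would be sized far below this, since the database's own connection limit is usually hit long before the fd limit.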

On FreeBSD, you can tune kern.maxfiles up to the number of physical memory pages / 4 [1], despite the nearby comments suggesting a lower limit. You may need to tune some other values as well, and have a pretty tight software stack (or a lot of swap) to average less than 16k of ram per fd.

[1] https://github.com/freebsd/freebsd-src/blob/master/sys/kern/...

It is a fun gimmick, but it is not useful for anything realistic.

You have two situations: 1. The application must/can run on the database server 2. The application must be able to run separately on dedicated app server (which is... almost every case).

If the application has to run on the database server, it can use localhost which is realistically as efficient as domain socket but is more flexible.

If the application has to run on another host, there is no problem, you can't use domain sockets anyway.

>it can use localhost which is realistically as efficient as domain socket but is more flexible.

Actually, Unix domain sockets are substantially more flexible than localhost. For example, Unix domain sockets have filesystem-based access controls, they can have meaningful paths instead of magic numbers, and they are trivially used across different containers (just bind mount the socket file into whatever container, just like any other file).

> Unix domain sockets have filesystem-based access controls

This is dependent on the operating system. The Linux kernel obeys access permissions on the file inode, but Solaris doesn't. (I always forget how AIX, macOS, FreeBSD, NetBSD, and OpenBSD behave.) But even on Linux there's the classic race condition if your umask isn't set correctly when you invoke bind. Interestingly, on Linux you can fchmod the descriptor before calling bind, which is better than temporarily changing the umask around bind, since the umask change is not thread friendly: the umask is process-global, so it would affect whatever files are being opened in other threads at that moment.

But you don't need to rely on the permissions of the socket file inode itself. You can just create a directory with the restrictive permissions and then bind the socket into that directory. That's perfectly portable, presumably even to Windows, and also avoids umask races. In fact, using directories this way is almost always the superior option (e.g. for temporary files, etc), though it's a tad more leg work so unfortunately people rarely use that pattern.
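In Java 16 terms, the directory pattern might look roughly like the sketch below. The names `PrivateSocketDir`, `bindInPrivateDir` and `app.sock` are hypothetical, and it assumes JDK 16+ on a POSIX filesystem; the demo deletes the socket and directory again, which a real server would only do at shutdown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical demo class; requires JDK 16+ and a POSIX filesystem.
final class PrivateSocketDir {

    // Creates an owner-only directory, binds a Unix domain socket inside it,
    // and returns the directory's permission string as seen on disk.
    static String bindInPrivateDir() {
        try {
            // The directory is created with mode 0700 atomically, so there is
            // no window in which other users could reach the socket path.
            Path dir = Files.createTempDirectory("app-sockets",
                    PosixFilePermissions.asFileAttribute(
                            PosixFilePermissions.fromString("rwx------")));
            Path socketFile = dir.resolve("app.sock");
            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(UnixDomainSocketAddress.of(socketFile));
                return PosixFilePermissions.toString(Files.getPosixFilePermissions(dir));
            } finally {
                Files.deleteIfExists(socketFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bindInPrivateDir());
    }
}
```

Whatever mode the socket file itself ends up with, nobody but the owner can traverse the directory to reach it.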

But all Unix systems also support querying credentials over Unix domain sockets, where "credentials" basically means the UID and GID of the peer[1], which permits supporting user- or group-based authentication without passwords. I'm pretty sure Postgres supports this; that is, you can configure Postgres to allow user foo to access DB bar without a password, token, or signed cert. I submitted a patch many years ago so this would work on OpenBSD. (At the time OpenBSD provided getpeereid for querying peer credentials, but Postgres only supported SO_PEERCRED and some other mechanisms. Years later OpenBSD eventually adopted the SO_PEERCRED approach, which is what Linux uses, and made getpeereid a wrapper, because sometimes it's not worth swimming upstream. None of these are defined by POSIX but the capability is supported one way or another on all the extant Unix systems.)

[1] Depending on the OS, credentials can include other information, like PID.
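As far as I can tell, JDK 16 also exposes peer credentials to Java code through `jdk.net.ExtendedSocketOptions.SO_PEERCRED`, which yields a `jdk.net.UnixDomainPrincipal` carrying user and group (no PID). A minimal sketch, with `PeerCredentials` and `peerUserName` as made-up names; it assumes JDK 16+ on a platform where the option is supported, and `getOption` throws `UnsupportedOperationException` where it is not.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.net.ExtendedSocketOptions;
import jdk.net.UnixDomainPrincipal;

// Hypothetical demo class; requires JDK 16+.
final class PeerCredentials {

    // Connects to a throwaway Unix domain socket from within this process and
    // returns the OS user name the server side sees for the connecting peer.
    static String peerUserName() {
        try {
            Path dir = Files.createTempDirectory("uds");
            Path socketFile = dir.resolve("peer.sock");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);
            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(address);
                try (SocketChannel client = SocketChannel.open(address);
                     SocketChannel accepted = server.accept()) {
                    // Ask the kernel who is on the other end of the connection.
                    UnixDomainPrincipal peer = accepted.getOption(ExtendedSocketOptions.SO_PEERCRED);
                    return peer.user().getName();
                }
            } finally {
                Files.deleteIfExists(socketFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer user: " + peerUserName());
    }
}
```

A server could compare that name against an allow-list, which is the same idea as the peer authentication described above.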

well, X11's abstract sockets try to disagree about the access-control part

Author here; SQL Proxies definitely are a relevant use case. Also low latency scenarios, where you want to have the (query) capabilities and semantics of Postgres instead of an in-memory or embedded database. Depending on the workload, the advantage over TCP is significant (see links in the post).

You mean SQL Proxy available on the VM/container where the application runs and application connecting locally to the proxy?

I can see how it sometimes may make the application simpler and maybe, potentially, how you can have single proxy to serve multiple tenants (containers with applications) so that they maybe don't have to keep their own pools of connections.

The services would talk to each other locally and use the proxy to talk to the database using single connection pool.

But I don't see how this couldn't be as effective with TCP loopback (which actually is hacked to not invoke any TCP).

Granted, domain sockets are a bit faster than localhost (https://redis.io/topics/benchmarks), but if this makes a difference for your application you are likely using the wrong architecture.

There are other benefits.

UDS can be permissioned using standard unix USER/GROUP which adds more flexible security structure (more so when you combine UDS with SE Linux).

This is opposed to localhost which is much harder to firewall from-localhost->to-localhost

I don't exactly see that as any better security. The assumption nowadays is that once something breached application safeties and has user level access to a machine it is as if it had root level access to that machine.

The reason is that there is just too much surface between an application and the operating system to be able to guarantee safety.

And so we put applications into their separate VMs or containers and treat the entire container as part of the application. There are no other assets on that container.

If the container boundary is meaningful, you can do things like run TLS termination in a container (because OpenSSL is a quagmire), and connect to the server via unix socket. Then the container has no need for outgoing network sockets, and limited filesystem access.

Of course, if the container boundary isn't meaningful, you've complicated your system design for nothing.

And either way, TLS termination in a separate process is a potentially significant penalty to performance, as you'll have a lot more copies.

>Also low latency scenarios

Low latency (e.g. HFT) = no database at all, esp. not on the fast paths. Mostly direct buffers for everything, including interprocess communication via file mapped memory.

> localhost which is realistically as efficient as domain socket

That depends on your platform. FreeBSD actually runs localhost as tcp, and you can have delays, packet loss when the buffer is full, and congestion collapse on localhost. It's lots of fun, but if I asked for tcp and got tcp, I'm not terribly surprised. It's not like BSD hasn't had a better option for local sockets since forever (did unix domain sockets ship at the same time as tcp? I don't know, but they both shipped before I had a computer at home), so my fault for choosing poorly or using a language that doesn't support nice things.

localhost is slower than Unix sockets on Linux as well. Try a benchmark with Redis or KeyDB and the performance differences are clear.

I've also found Unix domain sockets useful when wiring together services in CI or testing environments. In such situations, performance is important but all services are known to be running on the same machine.

"the access method for the local connection type must be switched from the default value peer (which would try to authenticate using the operating system user name of the client process) to md5."

I wonder if they tried scram-sha-256. MD5 seems like a bad idea these days.

Author here; mode of authentication didn't matter that much for that local exploration, so I haven't tried SCRAM-SHA-256. Great point though, it'd definitely be preferable in production.

Omg it is java16 already and the whole world is still on java8

To be fair, the chain is Java 8, Java 11, Java 17 (TBA) if you only consider LTS versions.

Java doesn't have LTS versions. All versions are equal, and vendors sell long-term support services for arbitrary versions of their choosing. The most recent version you can buy an LTS service for from Oracle is 11, and most, though not all, vendors pick the same versions as Oracle for their support service.

BTW, LTS is mostly designed for legacy applications, i.e. those that don't see much development. The recommended version for use in production by applications that are heavily maintained is always the current version (15, and soon 16). It is true, however, that many people still misunderstand Java's LTS, and perhaps confuse it with the notion of LTS in other ecosystems, and so many choose an old version and buy LTS even if it's not the best option for them.

(I work at Oracle on OpenJDK)

Sure, so if I go to adoptopenjdk, the home page offers up 2 LTS versions (8,11) , and "LATEST". Same for Amazon's Corretto. And they all make note of LTS or not.

I believe you, but that message isn't front and center on the various OpenJDK distributions.

Sadly, while Java is otherwise careful to enforce various standards, anyone is free to call whatever they like LTS. OpenJDK (the source of most JDK distributions) has no concept of LTS (e.g., you'll find no mention of LTS here: https://openjdk.java.net/projects/jdk/11/). Different vendors can call LTS whatever they like, and what that actually entails depends on the vendor.

If we define LTS to at least include long-term maintenance in the form of bug and security fixes of the entire JDK, then no one offers a free LTS. Free offerings that call themselves LTS are just builds of the OpenJDK Updates project, that backports fixes from mainline, and so only includes fixes for the intersection of the old version and the current one (i.e. no "free LTS" offers patches for, say, CMS or Nashorn, and multiple other components that exist in 11 or 8 but not in 16). "Free LTS" offerings are only recommended for legacy hobby projects. Important production software should use either the current version (free) or buy an LTS service for an old version from some trusted vendor. The only version that is fully and freely maintained is the current one, and it also offers the cheapest upgrade process and the best performance. Don't think that if the vendor calls something LTS and ships updates that include some fixes you're actually getting some fully-maintained JDK.

BTW, AdoptOpenJDK is not affiliated or involved with OpenJDK, and is made by an IBM team that is exceptionally uninvolved and unfamiliar with OpenJDK compared to all other OpenJDK distributions (Oracle, Red Hat, SAP, Azul, Bellsoft, and Amazon).

Hm, my impression is that usually OpenJDK 8 is the fastest:



(edit: added missing version (slightly embarrassed) and second link)

It's definitely the slowest, and by pretty significant margins. Those tests did, however, uncover bad default settings in JDK 14 on some machine sizes, which happened to be those Phoronix ran their benchmarks on: https://kstefanj.github.io/2020/04/16/g1-ootb-performance.ht... Those bad defaults are fixed in 15.

Moving away from 8 results in a significant saving in hardware/hosting costs for the vast majority of large applications.

Linux distributions too, they all seem to offer three Java packages: one for Java 8, one for Java 11, and one for Java "latest". The default is either the Java 8 one or the Java 11 one.

I've seen a reluctance to move from Java 8 because of the issues upgrading past Jigsaw (e.g. the build I'm working on today bleeds warnings about groovy doing illegal reflective access - and that's just a cosmetic issue) and so many shops decided to wait until the "LTS" Java 11 came along to make the jump.

Now that they're there and with the memory of the upgrade pain there's reluctance to upgrade again before Java 17. I think once they do that and see a much more normal upgrade path for that leap they'll be a lot more willing to track all the releases and not just the LTS ones (real or not it's perceived as being a thing).

I agree. Funnily enough, though, the upgrade pain to 9+ had nothing to do with Jigsaw, because its encapsulation hasn't even been turned on (unless you count warnings). It was just the biggest and most visible change, so people blamed their unrelated upgrade issues on it. The actual cause for many issues was that some popular libraries hacked into off-specification, undocumented JDK internals and relied on them (Groovy did that a lot), and they simply changed, as they're allowed to. The most disruptive specification change was actually dropping the "1." prefix from the version string, that many codebases used to parse.

Starting with JDK 16, Jigsaw is finally being fully turned on, and is starting to block access to internal JDK classes unless the application explicitly allows it. Ultimately, this will force libraries not to rely on internal implementation details and so get themselves tied to a specific version.
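To illustrate the version-string point: code that assumed a leading "1." broke when it disappeared in 9. Below is a small sketch of a parser that tolerates both schemes; `JavaVersion.feature` is a made-up helper, not a JDK API. On JDK 10 and later, `Runtime.version().feature()` gives the same number without any string parsing.

```java
// Hypothetical helper; not part of the JDK.
final class JavaVersion {

    // Extracts the feature release number from a java.version-style string,
    // e.g. "1.8.0_282" -> 8, "11.0.10" -> 11, "16" -> 16, "17-ea" -> 17.
    static int feature(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Up to Java 8 the scheme was "1.<feature>"; from 9 on the feature
        // number comes first.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(feature(System.getProperty("java.version")));
    }
}
```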

While I remember, here's a good example of an (IMO) bad decision being made due to the LTS meme: https://lists.ubuntu.com/archives/ubuntu-release/2018-Februa...

Point taken about the actual underlying causes of the upgrade pain though.

Speaking personally I very much appreciate all the work that goes into delivering OpenJDK and I think some of the anti-Java groupthink that exists on HN is regrettable.

I suspect there is a silent majority who enjoy Java's industrial reliability, its longevity, and the sheer volume of libraries and online resources, and can't be bothered raising their heads above the parapet every time someone starts ranting on about how crap OOP is.

> and so many shops decided to wait until the "LTS" Java 11 came along to make the jump

Even if they didn't decide to wait, they had to wait until all libraries they depend on made the jump. Many of them took a long time (as late as last year, I was still seeing "fix something which breaks in Java 9" in library changelogs), and sometimes the compatibility with Java 9 was only available on a newer major release of the library, which sometimes required upgrades to major releases of other libraries, and so on, and to make it even more painful some of these major releases dropped support for older Java releases like Java 7, which some of your clients might be using.

Migrating to Java 11 was actually more painful than migrating to Java 9, since Java 11 removed some of the things deprecated by Java 9 (this was a very short deprecation cycle, only one year between deprecation and removal), like all of J2EE; and IIRC, not all of the removed things were available as separate libraries (for instance, I recall CORBA being listed as not available separately, so once it was removed from the JVM, there was no alternative if you depended on it).

And there's the fact that, if you expect to run on the default-installed JVM in the latest release of a popular enterprise Linux distribution, you have to keep compatibility with Java 8. So all this pain is for next to zero gain (the only real gain is some GC improvements, if your client is willing to risk using Java 11 and encountering obscure bugs in your software).

Well, the main gain we're seeing that attracts most companies is money, sometimes lots of it. Savings on hardware due to performance and footprint improvements are often so large that they very quickly offset any migration costs. Here's a pretty common experience from LinkedIn: https://youtu.be/1AYTFRUTyao?t=14411. Systems that don't migrate are usually those that carry little hosting costs and/or make little money.

Second, as someone who works on OpenJDK I can tell you that the belief that JDK 8 is more stable than 15 (or even 11) is misguided either based on how they're actually maintained (an order of magnitude more people maintain 15 than 8, and while they backport some fixes, they don't backport them all, and we've had a few incidents of bugs introduced through misapplied patches) and the actual bug data. It is true that some new features might be less stable, but it's safer, at this point, to use 15 (or 11) than 8.

> I can tell you that the belief that JDK 8 is more stable than 15 (or even 11) is misguided either based on how they're actually maintained

The instability comes not from bugs in the JVM, but from how even tiny changes in the JVM can interact with the software running above it. For instance: Java 15 changed the return value of the getMessage() method of the NullPointerException generated when accessing a null pointer. That shouldn't have made any difference; yet, there is code which accidentally depends on its former return value, and that code is probably in a little used corner of the system where it will break when you least expect.

Java 8 is a known variable; you can expect it to have far fewer breaking changes than chasing the latest Java release (and if you are on the "latest Java release" train, you have to keep chasing the latest Java release, since there's little if any overlap in their support periods). It's a solid base for you to build your software on top.
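A sketch of the kind of accidental dependency meant here, with invented names (`NpeHandling`, `describe`, `describeBrittle`, `provoke`). The brittle variant relies on JVM-generated NullPointerExceptions having a null message, which stopped being true once helpful NPE messages were switched on by default; the robust variant looks only at the exception type.

```java
// Hypothetical demo class contrasting two ways of classifying a failure.
final class NpeHandling {

    // Robust: decides by exception type, so message wording is irrelevant.
    static String describe(Throwable t) {
        return (t instanceof NullPointerException) ? "null dereference" : "other failure";
    }

    // Brittle: assumes a JVM-generated NPE carries no message. That held on
    // Java 8 but not on releases that generate helpful NPE messages.
    static String describeBrittle(Throwable t) {
        return (t.getMessage() == null) ? "null dereference" : "other failure";
    }

    // Triggers a JVM-generated NPE and hands it back for inspection.
    static NullPointerException provoke() {
        String s = null;
        try {
            s.length();
            throw new AssertionError("unreachable");
        } catch (NullPointerException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        NullPointerException e = provoke();
        System.out.println("robust:  " + describe(e));
        System.out.println("brittle: " + describeBrittle(e));
    }
}
```

The brittle line's output depends on the JVM version and flags, which is exactly the problem.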

Yeah, I was going to chip in; there are two kinds of stable.

You can say "Java N" is more stable than "Java 8" because they crash less often under fewer circumstances.

But you can also say "Java 8" is more stable than "Java N" because we know most of the bugs in "Java 8" already so we can avoid them (and we don't exercise the unknown bugs very often because we'd have worked around them by now if we did).

Often that second kind of stability is more important than the other one.

> It's a solid base for you to build your software on top.

Agreed. It is what I usually find at client installations in backend systems (at least for the last five years). IMHO it would be pointless to ask them to re-test/re-certify all their already running applications (in turn, specified for Java 8 by their respective providers), with promises about (unrequested) performance gains.

BTW, I remember in the past the only way for me to "push" some clients to upgrade from Java 5/6 (to Java 8) was the security/regulatory threat related to running an "unsupported" version of Java: think of the regulatory move from TLS 1.1 to 1.2 and now into 1.3.

> Java 8 is a known variable

You might think that, but that's not the case. At least as far as OpenJDK goes, some new features, including huge ones (like JFR) have been backported to it. I believe that the NPE message is being considered for backport, too.

> It's a solid base for you to build your software on top.

It's old tech, that -- unless you actually buy paid support -- is only partially maintained through backports from mainline only. If you're not buying support, it's a less solid, more risky base than the current version.

Thanks for the link, very useful information.

Did they ever resolve the sun.misc.Unsafe replacement question, since it was slated for deprecation/removal at some point?

Basically every high performance app uses that to bypass the vagaries of GC. It may not impact a day-to-day enterprise dev, who is going to be behind by 8 versions for years due to the issues with enterprise IT deployment/policy/management/prioritization.

I have to hand it to Oracle, this every six months release thing was a brilliant way to squeeze money from customers and absolutely screw over the OSS users and ecosystem in the long run.

Google/Android or Amazon could have acquired Java for decimal points of yearly revenue. Likely less than google's legal fees.

sun.misc.Unsafe has not been removed or blocked, and yes, there's also safe replacements -- VarHandles and the foreign memory API -- for much of its functionality, which will be removed piecemeal.
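For readers who have only seen the Unsafe idiom, here is a rough sketch of the VarHandle equivalent of a compare-and-swap loop on a plain field (available since Java 9). `CasCounter` is an illustrative class, not something from the JDK; in real code `AtomicInteger` would do this particular job.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical lock-free counter built on a VarHandle instead of Unsafe.
final class CasCounter {

    private static final VarHandle VALUE;

    static {
        try {
            // Look up a handle to the private instance field `value`.
            VALUE = MethodHandles.lookup().findVarHandle(CasCounter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile int value;

    // Retries the compare-and-set until no other thread has raced us.
    int incrementAndGet() {
        int current;
        do {
            current = (int) VALUE.getVolatile(this);
        } while (!VALUE.compareAndSet(this, current, current + 1));
        return current + 1;
    }

    int get() {
        return (int) VALUE.getVolatile(this);
    }

    public static void main(String[] args) {
        CasCounter counter = new CasCounter();
        counter.incrementAndGet();
        System.out.println(counter.incrementAndGet()); // prints 2
    }
}
```

Unlike Unsafe, the handle is type-checked and obeys access control, so it can't be pointed at arbitrary memory.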

> I have to hand it to Oracle, this every six months release thing was a brilliant way to squeeze money from customers and absolutely screw over the OSS users and ecosystem in the long run.

Huh? How? We've open sourced the entire JDK, making it -- for the first time in Java's 25-year history -- 100% open-source, and we did away with major releases altogether, keeping only the minor ones, and making upgrades cheaper and easier. It's true that some people have not yet internalised what the change means, though. All change is scary, but this one is definitely better for users and for FOSS Java.

In my opinion (obviously a downvoted one but whatever), it fundamentally destabilized the whole ecosystem with a massive deluge of near-breaking changes.

I get for startups or people running immediate-to-production deployments that it is doable.

But Java's ecosystem is boring, staid enterprise, and those people are still running critical systems on Windows XP or worse. IF they are dragging their feet on THAT...

The churn basically stuns these people into 1) staying on Java8 for a LONG time, or 2) look at rewriting the system in something else, or 3) scaring them into paying money.

I have been doing mostly groovy for the last 10 years so not a single feature in the Java besides invokedynamic (which was really a JVM feature not a Java feature) has impacted me in a good way. Meanwhile... that churn.

As a Cassandra user I have seen the "what the hell do we target" conversation play out for 4.0 planning on the dev list, besides the sun.misc stuff. Maybe since (gah AMAZON's Corretto, seriously we have to rely on AMAZON being a good steward of OSS) then the coordination of patching and providing builds has stabilized outside of Oracle offered downloads, but it was a bit unsettling for a while.

So while I get that a lot of people see that as "being nimble" and "moving fast", those are precisely NOT what Java has historically been, for better or worse.

What you're describing is real in some places (Java's ecosystem also includes much/most of Apple, Netflix, Amazon, Google, Twitter, Uber and many others like them), but it is not a result of the new release model. The free support period of JDK 8 ended, like all versions before it, after five years. People who'd want to stay on 8 would have had to pay even if the release model stayed the same.

As to the changes, they aren't "near breaking." There have been few breaking spec changes, and they were very small. What breaks is what had always broken code: reliance on internal undocumented classes that change. The reason this happens more now than in the past is not due to spec backward compatibility -- we're as compatible as ever for most intents and purposes -- but because Java was stagnant for a good while, starting with Sun's dying days, and Oracle has been steadily increasing investment in the platform, which means more work done, which means more implementation changes. I think few would say this is a bad thing, but it also means that more code that does what it's warned not to do -- breaks. We're trying to stop this by turning on encapsulation in JDK 16, which would gradually stop libraries from hacking into the JDK and becoming stuck at any specific version.

As someone who very much felt the pain of stagnation in Java during the long release-cycles of the Java 5 to 8 era, I'm wildly enthusiastic about the new six-monthly release cycle.

Most of my clients use OpenJDK and don't pay for support of Java per se, so there are plenty of us outside the blast radius enjoying the benefits.

I can't find a reference offhand, but from following the jdk-dev mailing list, the philosophy is not to remove sun.misc.Unsafe until all of its features are available in other public APIs. They're well aware of how important it is.

Hopefully someone else can chip in with a better reference or I can find an appropriate cite to back this up.

Edit: Ah, here's an example: https://openjdk.java.net/jeps/396

> It is not a goal to remove, encapsulate, or modify any critical internal APIs of the JDK for which standard replacements do not yet exist. This means that sun.misc.Unsafe will remain available.

It's a shame that Clojure only 'officially' supports the LTS versions. That is then reflected in what versions libraries and tools support so things seem to regularly break near the bleeding edge.

The current version isn't the bleeding edge. It's the version most recommended for use in production. The bleeding edge is Early Access release. If things break for new versions, the root cause needs to be found, because this shouldn't happen if libraries target the spec.

I feel like the blood is in the eye of the beholder. When I upgrade JVMs and stuff in Emacs’s Clojure ecosystem breaks, that’s painful for me, and so I reflexively avoid doing that.

But what happens if a feature I use gets deprecated in version 15/16?

First, commonly used features aren't deprecated frequently. Second, you'd get a warning at least a year in advance (the more important feature -- the longer the warning). BTW, note that when something is removed from mainline, it is no longer maintained in Updates (because there are no more patches to the component to backport), and so it's not supported in "free LTS" offerings. If you want to continue using removed features for a few more years your only option is to buy long-term support.

good point

There is no reason to use the LTS versions.

Meanwhile the Python ecosystem is still trying to get rid of Python 2. Python 3 was released in 2008. The successor to Java 8 (Java 9) was released in 2017.

A marketing trick that Google takes advantage of when selling Kotlin to Android devs unaware of the real Java ecosystem.

Dalvik is hurting Kotlin's evolution too: soon Loom, value types, the Vector API and many more improvements will not be available to Kotlin on Android, but will be on the desktop.

Yep, and that is what the "I don't need Java" #KotlinFirst movement doesn't get.

Could you clarify? What's the relation between Kotlin and Java 8 and/or Google?

I think the thrust is that Google would happily convince everybody that the whole world is on Java 8, because that would make both the stalled version of Java on Android look less dated, and further Kotlin a much more modern language than the Java you can get anywhere (I'm not sure how "direct" Google's benefit from Kotlin adoption is... it certainly makes it more respectable as the primary Android lang).

The truth is more like - Kotlin is a much more modern language than the version of Java available on Android, but the gap between Kotlin and any of the previous 2 LTS Java versions is way smaller.

Kotlin is pretty pleasant, though, IMO.

Kotlin is MUCH more pleasant than Java; it's at the same time more strict and less verbose. And it's a good move to have a different language for Dalvik that can reuse the already developed API.

If the Android team was honest about it, they would support the latest Java version and let the developers choose between both languages on their own, without anti-Java marketing.

You don't see iOS team bashing Objective-C on twitter, or doing blog posts and videos, regarding how productive Swift developers are, or using Objective-C 1.0 samples vs Swift.

Sure :) I agree with most of this, it just feels disingenuous to pit Kotlin vs Java 8 and use that to bash Java/elevate Kotlin. IMO Kotlin stands on its own vs Java - nullable types and flow-sensitive typing are real standout features that don't need to use a 7-year old version of Java to compare against.

More or less it.

They use Android Java samples in their Java vs Kotlin talks on why to choose Kotlin.

Then not only is Android Java a cherry-picked subset of Java 8, the features they are adding support for, up to Java 11, are mostly related to the standard library, and even those are not all covered.

Then there is the feeling of anti-Java culture in Android, with #KotlinFirst, despite the fact of Android Studio, Android Frameworks, Gradle being Java based.

The irony is that they don't get that as Java and JVM moves forward, Kotlin will eventually be forced to differentiate between JVM and ART worlds.

My take agreeing with this was not well-received elsewhere: https://news.ycombinator.com/item?id=26009804

I did a lot of Java on Android, and it was not representative of the Java I do on the serverside.

Indeed, Android Java is Google's J++.

Microsoft learned their lesson and are now even an OpenJDK OEM, after buying jClarity.

The SV darling can do no wrong.

Java 11 is the current LTS and seems to finally be somewhat stable.
