Talking of trees and caches, back in school I remember learning about splay trees. I’ve never actually seen one used in a production system though, I assume because of the poor concurrency. Has anyone heard of any systems with tree rebalancing based on the workload (ie reads too not just writes)?
Presumably depends on the country and the laws. Keep in mind that Apple considers this a new interesting use case - not a killer feature for AirPods. They wouldn’t risk AirPod sales with a gray interpretation of the law.
If your clojure pods are getting OOMKilled, you have a misconfigured JVM. The code (e.g. eval or not) mostly doesn't matter.
If you have an actual memory leak in a JVM app, what you want to see is a java.lang.OutOfMemoryError. It means the heap is full and has no space for new objects even after a GC run.
Getting OOMKilled means the JVM asked the OS for memory the cgroup limit wouldn't allow, and the kernel killed the process on the spot. The problem is that the JVM at that moment thinks _it should be able to allocate memory_ - i.e. it's not trying to garbage collect old objects - it's just calling malloc for some unrelated reason. It never gets a chance to say "man, I should clear up some space cause I'm running out". The JVM doesn't know the cgroup memory limit.
So how do you convince the JVM that it really shouldn't be using that much memory? It's...complicated. The big answer is -Xmx but there's a ton more flags that matter (-Xss, -XX:MaxMetaspaceSize, etc). Folks think that -XX:+UseContainerSupport fixes this whole thing, but it doesn't; there's no magic bullet. See https://ihor-mutel.medium.com/tracking-jvm-memory-issues-on-... for a good discussion.
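A quick way to sanity-check what limits the JVM actually picked up inside a container is to ask it directly (the class name here is made up; the Runtime calls are standard):

```java
// Prints the heap ceiling the JVM settled on (the effective -Xmx) and the
// CPU count it sees. If maxMemory is close to the cgroup limit, there's no
// headroom left for metaspace, thread stacks, GC structures, etc. - and
// that's when the kernel OOM killer shows up.
public class JvmLimits {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("maxMemory   = " + rt.maxMemory() / (1024 * 1024) + " MiB");
        System.out.println("totalMemory = " + rt.totalMemory() / (1024 * 1024) + " MiB");
        System.out.println("processors  = " + rt.availableProcessors());
    }
}
```

Run it with the same flags as your service to see what your `-Xmx`/`-XX:MaxRAMPercentage` settings actually resolve to.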
> It never gets a chance to say "man I should clear up some space cause I'm running out".
To add to everything you said, depending on the type of framework you are using sometimes you don't even want it to do that. The JVM will try increasingly desperate measures, looped GC scans, ref processing, and sleeps with backoffs. With a huge heap, that can easily take hundreds to thousands of ms.
At scale, it's often better to just kill the JVM right away if the heap fills up. That way your open connections don't have all that extra latency added before the clients figure out something went wrong. Even if the JVM could recover this time, usually it will keep limping along and repeating this cycle. Obviously monitor, collect data, and determine the root cause immediately when that happens.
Of course you’re right and you really want to avoid getting to GC thrashing. IMO people still miss the old +UseGCOverheadLimit on the new GCs.
That said trying to enforce overhead limits with RSS limits also won’t end well. Java doesn’t make guarantees around max allocated but unused heap space. You need something like this: https://github.com/bazelbuild/bazel/blob/10060cd638027975480... - but I have rarely seen something like that in production.
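For what it's worth, the standard JMX API can express a retained-heap watchdog like the Bazel one. This is only a sketch (class name and exit code are made up); it arms the post-GC usage threshold on each heap pool and halts if a collection still leaves the heap above the limit:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.NotificationEmitter;

// Exit deliberately when heap usage *after* a GC stays above a fraction of
// max, instead of thrashing or waiting for the kernel OOM killer.
public class RetainedHeapWatchdog {
    public static void install(double maxRetainedFraction) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();
                if (max > 0) {
                    // Fires only on post-collection usage, i.e. retained objects.
                    pool.setCollectionUsageThreshold((long) (max * maxRetainedFraction));
                }
            }
        }
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                System.err.println("Retained heap over limit after GC; exiting");
                Runtime.getRuntime().halt(42); // skip shutdown hooks on purpose
            }
        }, null, null);
    }
}
```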
This is one of the areas where OpenJ9 does things a lot better than HotSpot. OpenJ9 uses one memory pool for _everything_, HotSpot has a dozen different memory pools for different purposes. This makes it much harder to tune HotSpot in containers.
Would be great if it included sequential sampling as well: https://www.evanmiller.org/ab-testing/sequential.html . Especially given how A/B tests usually get run in product companies, a peek-proof method helps quite a bit.
Easier to explain with coin flips. Let’s say we do 100 flips - we know the “most likely” thing to happen is 50 heads and 50 tails. The actual probability of that is C(100, 50) / 2^100 = 0.079.
So about an 8% chance. You’re significantly more likely (ie 92% chance) to see _something else_. And that’s _the most_ likely outcome.
So tldr - it’s not so much that “you never see an all tails sequence in practice” - you’re actually unlikely to see any particular sequence. All the probabilities get astonishingly low very quickly.
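The arithmetic is easy to check mechanically. A small sketch (exact binomial coefficient via BigInteger, then one division at the end):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// P(exactly k heads in n fair flips) = C(n, k) / 2^n
public class CoinFlips {
    static double exactHeads(int n, int k) {
        BigInteger c = BigInteger.ONE;
        // Incremental binomial: C(n, i+1) = C(n, i) * (n - i) / (i + 1);
        // each division is exact, so BigInteger arithmetic stays precise.
        for (int i = 0; i < k; i++) {
            c = c.multiply(BigInteger.valueOf(n - i))
                 .divide(BigInteger.valueOf(i + 1));
        }
        return new BigDecimal(c)
                .divide(new BigDecimal(BigInteger.TWO.pow(n)), MathContext.DECIMAL64)
                .doubleValue();
    }

    public static void main(String[] args) {
        // prints ~0.0796
        System.out.printf("P(exactly 50 heads in 100 flips) = %.4f%n",
                exactHeads(100, 50));
    }
}
```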
I’ve come to a similar conclusion about “self-service BI” myself but my solution is somewhat different. The solution I have is move the layer of abstraction higher: make extremely customizable dashboards but do not expose SQL to business users.
An example of this might be a dashboard with 20 filters and 20 parameters controlling breakdown dimensions and “assumptions used.” So asking “how did Google ads perform in the last month broken down by age group” is about changing 3-4 preset dropdowns. Parameters are also key here - this way you only expose the knobs that you’ve vetted - not arbitrary SQL.
Obviously this is a hard dashboard to build and requires quite a bit of viz expertise (eg experience with Looker or Tableau or Excel), but the result is that 70% of questions do become self-service. The other 30% - abandon hope. You will need someone to translate business questions into data questions, and that's a human problem.
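The "vetted knobs" idea can be as simple as whitelisting which identifiers a dropdown may inject into a query. A minimal sketch - table, column names, and class are all hypothetical:

```java
import java.util.Set;

// Business users pick a breakdown dimension from a dropdown; only
// identifiers on this whitelist ever reach the SQL string.
public class VettedQuery {
    private static final Set<String> DIMENSIONS =
            Set.of("age_group", "channel", "country");

    static String breakdownQuery(String dimension) {
        if (!DIMENSIONS.contains(dimension)) {
            throw new IllegalArgumentException("dimension not vetted: " + dimension);
        }
        // Safe to interpolate: the identifier comes from a fixed whitelist,
        // never from free-form user input.
        return "SELECT " + dimension + ", SUM(spend) AS spend "
                + "FROM ad_performance GROUP BY " + dimension;
    }
}
```

Values (date ranges, thresholds) still go through bound parameters as usual; the whitelist is only for the parts of SQL that placeholders can't express, like identifiers.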
Yes - know which questions get asked often and make those dashboards. Now the CFO or whoever can just open those dashboards when they want answers to those questions for any given timeframe.
My experience, both as a builder and as someone who has to sit in the C-suite, is that the person then comes with their ill-understanding of the data and tries to convince everyone A is B when it isn't.
I have the same issue with my router - it supports 1 Gbps, but that's a total across all ports, and that total includes the WAN port. So the max throughput you can get from one port to the internet is half the max - 500 Mbps.
Is that really true? With full duplex, you should be able to route 1Gbps in and 1Gbps out simultaneously. Are you sure you aren't just hitting a CPU or ISP bandwidth limit?
I’m going to go for the esoteric opinion: MariaDB. Specifically to get system-versioned tables. Imagine having the full change history of every single row available for whatever odd task needs it, without the performance penalty of keeping historical data in the same table. It can be a huge amount of leverage.
If that’s not your interest, I will admit that Postgres array support is far ahead of any of the MySQLs. Most ORMs don’t use it but you can get optimal query prefetching via array subqueries.
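For anyone who hasn't seen system versioning: a minimal MariaDB (10.3+) sketch - the table and columns are made up, the syntax is real:

```sql
-- One clause turns on full row history.
CREATE TABLE accounts (
    id      INT PRIMARY KEY,
    balance DECIMAL(12, 2)
) WITH SYSTEM VERSIONING;

-- Normal queries see only current rows:
SELECT * FROM accounts;

-- Time travel: the row set as it existed at a past instant.
SELECT * FROM accounts
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-01 00:00:00';
```

History can also be split into its own partition so current-row queries don't scan it.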
I’m 100% with you regarding system versioned tables. However, I think they’re coming to Postgres soon-ish. I was following the tracker for a while and it looked like it was done.
I haven’t tried this framework, but just looking at some examples, the WebSocket and SSE APIs seem a bit out of place. In a Loom world you’d expect these to use channels or queues or something similar in a loop - not callbacks. Callback hell is exactly what virtual threads try to avoid. I’m not sure if Loom has any primitives like Go’s multi-channel select that might make this workable, though.
In either case Loom and frameworks that use it are super exciting! I’m looking forward to removing the 20 different thread pools we need to avoid deadlocks and just using the common one in our Clojure apps.
We don't have selectable channels yet, but they're not needed as much in Java as they are in Go, because often multiple channels are used to signal cancellation, whereas in Java there's a standard mechanism for cancellation (Thread.interrupt() at the low level, with Future.cancel being higher level, and JEP 428's structured concurrency being higher level still https://openjdk.org/jeps/428).
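The queue-in-a-loop pattern plus interrupt-based cancellation looks roughly like this. A sketch with a plain platform thread so it runs on any recent JDK; in a Loom app you'd start it with Thread.ofVirtual().start(...) instead, and the blocking take() would park the virtual thread cheaply:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One consumer loop per connection, cancelled via Thread.interrupt()
// rather than the Go idiom of a separate "done" channel per worker.
public class QueueLoop {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> messages = new ArrayBlockingQueue<>(16);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String msg = messages.take(); // blocks; interruptible
                    System.out.println("got: " + msg);
                }
            } catch (InterruptedException e) {
                // The standard JVM cancellation signal: clean up, exit the loop.
                System.out.println("cancelled");
            }
        });
        consumer.start();

        messages.put("hello");
        messages.put("world");
        Thread.sleep(100);       // let the consumer drain the queue
        consumer.interrupt();    // cancel - no done-channel needed
        consumer.join();
    }
}
```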
They mention this in the doc, but it seems like the future solution to preloading resources will involve HTTP Status 403 [1]. That seems like a great alternative to server push.
Currently it seems like the best option is to use Transfer-Encoding: chunked and make sure you flush out an HTML header that includes the preload directives before you spend any time rendering the web page. This is a pain - especially if something in the rendering fails and you now have to figure out how to deliver...something to the client. It also limits other headers significantly - e.g. cookies.
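For reference, the interim-response mechanism is 103 Early Hints (RFC 8297): the server sends a small preliminary response with Link headers while it's still rendering, then the real response later on the same connection. Roughly (hypothetical resource path):

```
HTTP/1.1 103 Early Hints
Link: </style.css>; rel=preload; as=style

HTTP/1.1 200 OK
Content-Type: text/html

<!doctype html>...
```

This sidesteps the chunked-flush dance: headers like Set-Cookie stay on the final response, and a rendering failure can still become a clean 500.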
Lol yes of course I meant 103. Though I’m ready for an http status code that’s like that: “I can’t give you what you want, but here’s something you didn’t even ask for instead.”