
From the code samples it's hard to tell whether or not this has to do with de-serialization though. It would have been fun to see profiling results for tests such as these.

Author here, I'm away from my computer atm, but I can cook up a repo with each test in a few hours when I get home.

I designed the tests as a drag race because that mimics my real world usage.


That's nice - I'd encourage you to play around with attaching e.g. JMC [1] to the process to better understand why things are as they are.

I tried recreating your DataInputStream + BufferedInputStream test (wrote the 1brc data to separate output files, read them using your code - I had to guess at the ResultObserver implementation though). On my machine it ran in roughly the same time frame as yours - ~1 min.

According to Flight Recorder:

  - ~49% of the time is spent in reading the strings (city names). Almost all of it in the DataInputStream.readUTF/readFully methods.
  - ~5% of the time is spent reading temperature (readShort)
  - ~41% of the time is spent doing hashmap look-ups for computeIfAbsent()
  - About 50GB of memory is allocated - 99.9% of it for Strings (and the wrapped byte[] arrays in them). This likely causes quite a bit of GC pressure.
Hash-map lookups are not de-serialization, yet they likely affected the benchmarks quite a bit. The rest of the time is mostly spent reading and allocating strings. I would guess that is true for some of the other implementations in the original post as well.
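
For reference, this is roughly the loop I profiled - the ResultsObserver and the surrounding variables (tempFile, records, stats) are my guesses at the original setup:

  try (var in = new DataInputStream(new BufferedInputStream(
          Files.newInputStream(tempFile)))) {

      for (int i = 0; i < records; i++) {
          String city = in.readUTF();        // ~49%: readUTF/readFully
          int temperature = in.readShort();  // ~5%: readShort
          stats.computeIfAbsent(city, k -> new ResultsObserver())  // ~41%: hashmap lookup
               .observe(temperature / 100.);
      }
  }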

[1] https://github.com/openjdk/jmc

edit: better link to JMC


JFR only samples running Java methods.

I would guess at least some of the bottlenecks are in hardware, the operating system or in native code (including the JVM) in this case.


JMC is indeed a valuable tool, though what you see in any Java profiler is to be taken with a grain of salt. The string parsing and hash lookups are present in most of the implementations, yet some of them are up to 10 times faster than the DataInputStream + BufferedInputStream code.

It doesn't seem like it can be true that 90% of the time is spent in string parsing and hash lookups if the same operations take 10% of the time when reading from a FileChannel and ByteBuffer.


Aren't the versions that take 10% of the time only reading each city name once, and then doing an array lookup rather than a hashmap lookup?

Nope, see for example "Custom 1":

  var buffer = ByteBuffer.allocate(4096);
  try (var fc = (FileChannel) Files.newByteChannel(tempFile, 
                        StandardOpenOption.READ)) 
  {

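    // flip the freshly allocated buffer so remaining() is 0 and the first loop iteration triggers a refill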
    buffer.flip();

    for (int i = 0; i < records; i++) {

        if (buffer.remaining() < 32) {
            buffer.compact();
            fc.read(buffer);
            buffer.flip();
        }

        int len = buffer.get();
        byte[] cityBytes = new byte[len];
        buffer.get(cityBytes);
        String city = new String(cityBytes);
        int temperature = buffer.getShort();

        stats.computeIfAbsent(city, k -> new ResultsObserver())
             .observe(temperature / 100.);
    }
  }

My bad - I got confused, as the original DIS+BIS took ~60s on my machine. I reproduced the Custom 1 implementation locally (before seeing your repo) and it took ~48s on the same machine. JFR (which you honestly can trust most of the time) says that the HashMap lookup is now ~50% of the time, with the String constructor call at ~35%.

Hi,

Please add https://github.com/apache/fury to the benchmark. It claims to be a drop-in replacement for the built-in serialization mechanism so it should be easy to try.
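
For what it's worth, usage looks roughly like this - a sketch based on its README, not something I've run; the package has moved between io.fury and org.apache.fury over time, so treat the names as approximate:

  import org.apache.fury.Fury;
  import org.apache.fury.config.Language;

  public class FuryDemo {
      public static class Measurement {
          public String city;
          public short temperature;
      }

      public static void main(String[] args) {
          // Build a reusable Fury instance and register the classes we want to serialize
          Fury fury = Fury.builder()
                  .withLanguage(Language.JAVA)
                  .build();
          fury.register(Measurement.class);

          Measurement m = new Measurement();
          m.city = "Stockholm";
          m.temperature = 123;

          byte[] bytes = fury.serialize(m);
          Measurement back = (Measurement) fury.deserialize(bytes);
          System.out.println(back.city + " " + back.temperature);
      }
  }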


Will do!

Hi! Crate author here, happy to see my old project getting mentioned. Let me know if you have any questions!


Do you actively use this function in any projects? What was your inspiration to write the crate?


I don't actively use it, unfortunately. The main inspiration was to faster sort inputs into the https://github.com/BurntSushi/fst crate, which I in turn used to try to build a search library.


fwiw, works on my machine (linux+ff). No failed requests in inspector.


It works on my machine now too


This got me thinking - the salinity of the water in the Bothnian Bay is very low (it seems to be about 1/10th of ocean water). Wouldn't that affect electrolysis?


Possibly—but if that’s a concern, you can also get some from the North Atlantic.


This assumes a clear interface. Which assumes that you get the interfaces right - but what's the chance of that if the code needs rewriting?

Most substantial rewrites cross module boundaries. With microservices, changing a module boundary is harder than in a monolith, where it can be done in a single commit/deploy.


Most times you won't need a well-formed (REST) API to start with, and many times you will never need it. Most small projects should start without one and create one when need arises.


Sure, you won't "need" them. But it's very likely that you'll have resources that behave very well for REST-like operations - the user comes to mind, for things like the user profile page, user settings, user detail, etc.

A REST endpoint is likely handy to use across your application without having to replicate the same serialization over and over again.

Also, many frameworks remove all the boilerplate for those operations (e.g. Django class-based views), so creating one is a really good starting point.


Counter-example: https://javalin.io/ uses Servlets, and seems to be doing quite fine without annotations.


Elasticsearch is decent at using non-text criteria provided that they are:

1. In the same document (a document in ES is a JSON object)

2. You have indices for them. ES (and Lucene) supports indices on raw text values and numbers as well.

ES does not do well with relations (joins). You can de-normalize data to deal with that, but that makes data consistency harder.


An alternative if you want a bit more help with charting, without client side JS, is to use d3-shape (https://github.com/d3/d3-shape) to server-side render SVGs.


It is quite common that you only need to optimize very small parts of a program to this level. The rest of the program can be written in more conventional styles.

You could of course FFI into e.g. C for those parts, but that is usually harder to maintain than a few well optimized java classes.


> It is quite common that you only need to optimize very small parts of a program to this level.

It's a quite common myth developers believe about performance. Hotspots do happen sometimes, but once they have been optimized you quickly end up with a flat profile and an "everything is slow" problem. And in some types of apps, the majority of the code is performance critical.


It depends - even when you run into an "everything is slow" problem, it might be that it's like 1 endpoint out of 2000 that causes performance issues. In this case, you might need to focus very much on performance for that endpoint, but maybe not for other endpoints. Profilers can help you figure out what code to focus on.

If the majority of code is performance critical, the tradeoffs are of course different.


Nah, "everything is slow" is vanishingly rare in practice.


Depends on the application. In areas such as compilers, databases, CAD, game engines, simulation, distributed analytics, and machine learning, the only code that isn't on the critical path is some configuration / control plane / UI - which is a minority of the code.


Certainly not in the entirety of those areas. I've worked in a couple of them and functionality was a higher priority than performance and we picked our tools accordingly.


The cases you’re talking about are the rare ones. Source: I did this professionally and now recreationally.


> You could of course FFI into e.g. C for those parts, but that is usually harder to maintain than a few well optimized java classes.

Hopefully the Foreign Function & Memory API [0] makes FFI so much easier that we get to drop down to C without much fuss.

[0] https://openjdk.org/jeps/442
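
As a taste of it, here is (roughly) the classic strlen example from the JEP - written against the JDK 21 preview API, so it needs --enable-preview and some names (e.g. allocateUtf8String) may differ in later releases:

  import java.lang.foreign.*;
  import java.lang.invoke.MethodHandle;

  public class StrlenDemo {
      public static void main(String[] args) throws Throwable {
          // Look up C's strlen and bind it to a MethodHandle: (MemorySegment) -> long
          Linker linker = Linker.nativeLinker();
          MethodHandle strlen = linker.downcallHandle(
                  linker.defaultLookup().find("strlen").orElseThrow(),
                  FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

          try (Arena arena = Arena.ofConfined()) {
              // Copy a Java String into off-heap memory as a NUL-terminated C string
              MemorySegment cString = arena.allocateUtf8String("Hello");
              long len = (long) strlen.invokeExact(cString);
              System.out.println(len); // 5
          }
      }
  }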

