
How fast can a BufferedReader read lines in Java?
https://lemire.me/blog/2019/07/26/how-fast-can-a-bufferedreader-read-lines-in-java/
======
elFarto
The first issue I can see with that code is it's not doing what he expects. He
does this to read the file into a StringBuffer:

    
    
      bf.lines().forEach(s -> sb.append(s));
    

However, this ends up concatenating all the lines into one giant line, since the
Strings that lines() produces have the newline characters stripped. This causes
the second lines() call to read a single 23 MB line (for the file produced by
gen.py). This is less than optimal.
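
For comparison, re-appending a separator would keep the line boundaries (a
one-line sketch, not from the post):

    
    
      bf.lines().forEach(s -> sb.append(s).append('\n'));
    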

The fastest version I managed to write was:

    
    
        public void readString5(String data) throws IOException {
             int lastIdx = 0;
             for (int idx = data.indexOf('\n'); idx > -1; idx = data.indexOf('\n', lastIdx))
             {
                parseLine(data.substring(lastIdx, idx));
                lastIdx = idx+1;
             }
             parseLine(data.substring(lastIdx));
        }
    

Not the prettiest thing, but it went from 0.594 GB/s to 1.047 GB/s. Also, it
doesn't quite do the same as the lines() method, but that's easily changed.

~~~
ricardobeat
That's a pretty big error if you're correct. What does it say about the
language when a CS professor falls for this on a 40 line file?

~~~
bhaak
Getting benchmarks right is difficult, even for a CS professor. The language
doesn't even enter there.

What does it say about you that you're asking a leading question in this way?

~~~
ricardobeat
I'm always interested in thoughts about language design and programming in
general. We have fifty+ years of commercial software development and it still
sucks. This caught my eye as the kind of error that shouldn't even be allowed
to happen.

~~~
bhaak
There are several simple data transformation steps. There's nothing inherently
wrong with these steps. The compiler can't read your mind and guess what you
wanted to do.

How could this be prevented by any programming language, existing or
hypothetical?

Even if the metadata that the appended strings contain no newlines were passed
along, the second call to lines() would more likely use it as an optimization
hint than raise an error.

~~~
ricardobeat
To give an example: if the lines() method returned instances of a Line class,
the string buffer and its append() method would be aware and still create a
collection of lines instead of one long string. Not necessarily nice and would
probably create issues in many other interfaces, but problems like this _can_
be solved with careful language and API design. Note that his end goal was to
'read lines' and not use any particular string abstractions. Nothing to get
worked up about here.
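
A purely hypothetical sketch of that idea (Line and LineBuffer are invented
names, not part of any Java API): a line-aware buffer can't silently glue lines
together, because it only accepts values that are known to be single lines.

    
    
        final class Line {
            private final String text; // guaranteed to contain no line terminators
            Line(String text) {
                if (text.indexOf('\n') >= 0 || text.indexOf('\r') >= 0)
                    throw new IllegalArgumentException("not a single line");
                this.text = text;
            }
            @Override public String toString() { return text; }
        }
    
        final class LineBuffer {
            private final java.util.List<Line> lines = new java.util.ArrayList<>();
            void append(Line line) { lines.add(line); } // stays a collection of lines
            int lineCount() { return lines.size(); }
        }
    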

~~~
pvg
That would be a very clunky approach to put it mildly - the explicit design
intent of BufferedReader is to let you read lines without caring what the line
separator is - it's a lossy transformation. You also can't subclass String.
You'd be adding something very unwieldy for an edge case that is directly
contrary to the purpose of the API. The 'mistake' (such as it is) of the
author here is simply using the wrong API - there are shorter, more direct
ways to read a file into a string without any of these contortions.

    
    
        data = new String(Files.readAllBytes(Paths.get(filename)));
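
On Java 11+, there's an even more direct route (a one-line sketch, assuming the
file is UTF-8 text):

    
    
        String data = Files.readString(Path.of(filename));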

------
derefr
I don't know what Java's BufferedReader is doing, but it's probably not the
optimal thing in terms of IO throughput. I would blame the algorithm long
before blaming anything inherent about the JVM.

Erlang is another language where "naive" IO is kind of slow.
[https://github.com/bbense/beatwc/](https://github.com/bbense/beatwc/) is a
project someone did to test various methods of doing IO in Erlang/Elixir, and
their performance for a line-counting task, relative to the Unix wc(1)
command.

It's interesting to see which approaches are faster. Yes, parallelism gains
you a bit, but a much larger win comes from avoiding the stutter-stop effect
of cutting the read buffer off whenever you hit a newline. Instead, the read
buffer should be the same size as your IO source's optimal read-chunk size (a
disk block; a TCP huge packet), and you should grab a whole buffer-ful of
lines at a time, do a _pattern-matching binary scan_ to collect all the
indices of the newlines, and then use those indices to part the buffer out as
_slice references_.

This achieves quite a dramatic speedup, since most of the time you don't need
movable copies of the lines, and can copy the line (or more likely just part
of it) yourself when you need to hold onto it.

This approach is _probably_ also already built in to Java's "better" IO
libraries, like NIO.
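
A rough Java sketch of that chunked-scan idea (my own illustration, not from
the linked project; the block size is an assumption):

    
    
        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.file.Path;
        import java.nio.file.StandardOpenOption;
    
        public class ChunkedLineScan {
            // Read fixed-size blocks and scan each block for newlines, instead of
            // stopping the read at every line boundary.
            public static long countLines(Path file) throws IOException {
                long lines = 0;
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // assumed block size
                    while (ch.read(buf) != -1) {
                        buf.flip();
                        for (int i = 0; i < buf.limit(); i++) {
                            if (buf.get(i) == '\n') {
                                lines++;
                                // a real parser would slice out buf[prev+1 .. i] here
                            }
                        }
                        buf.clear();
                    }
                }
                return lines;
            }
        }
    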

~~~
ubu7737
Java NIO does exactly what you describe. IO is chunked optimally, the entire
buffer is pulled, the API exposes a select() mechanism just like any libc, and
users are expected to frame their own received data.

edit: Lemire is showing a lack of what Fowler describes as "mechanical
sympathy".

[https://martinfowler.com/articles/lmax.html#QueuesAndTheirLa...](https://martinfowler.com/articles/lmax.html#QueuesAndTheirLackOfMechanicalSympathy)
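
For reference, the select() style being described looks roughly like this (my
own sketch of a generic NIO read loop, not code from the post; the port number
and buffer size are arbitrary):

    
    
        import java.io.IOException;
        import java.net.InetSocketAddress;
        import java.nio.ByteBuffer;
        import java.nio.channels.*;
        import java.util.Iterator;
    
        public class SelectLoop {
            public static void main(String[] args) throws IOException {
                Selector selector = Selector.open();
                ServerSocketChannel server = ServerSocketChannel.open();
                server.bind(new InetSocketAddress(9000));
                server.configureBlocking(false);
                server.register(selector, SelectionKey.OP_ACCEPT);
    
                ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
                while (true) {
                    selector.select(); // block until at least one channel is ready
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {
                            SocketChannel client = server.accept();
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        } else if (key.isReadable()) {
                            buf.clear();
                            if (((SocketChannel) key.channel()).read(buf) == -1) {
                                key.channel().close(); // peer closed the connection
                                continue;
                            }
                            buf.flip();
                            // application-level framing (e.g. splitting on '\n') goes here
                        }
                    }
                }
            }
        }
    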

~~~
Matthias247
A bit of nitpicking on that: even Java's NIO API isn't very efficient. select()
introduces a ton of garbage due to dynamically allocating lists for selected
keys, and keys have a lot of synchronization overhead. That's why frameworks
like Netty gained a lot of performance by building their own selectors using
JNI.

------
Sindisil
Honestly, it seems that nearly everyone here is missing his point.

Some of the blame for that probably lies with his headline choice, but he
clearly states at the end of this post:

""" This is not the best that Java can do: Java can ingest data much faster.
However, my results suggest that on modern systems, Java file parsing might be
frequently processor-bound, as opposed to system bound. That is, you can buy
much better disks and network cards, and your system won’t go any faster.
Unless, of course, you have really good Java engineers.

Many firms probably just throw more hardware at the problem. """

It's not _about_ this piece of code. It's not even about Java. In the previous
post, which he mentions at the start of this one, he pointed out:

""" These results suggest that reading a text file in C++ could be CPU bound
in that sense that buying an even faster disk would not speed up your single-
threaded throughput. """

So, I take his point to be that one shouldn't make assumptions about
performance. Rough performance scales -- such as have been posted here many
times (e.g. [1]) -- make great rules of thumb for implementation choices or as
a guide for where to look first for bottlenecks. To optimize in the __real
world __, though, you 're best served using __real measurements __.

[1] [https://www.prowesscorp.com/computer-latency-at-a-human-scal...](https://www.prowesscorp.com/computer-latency-at-a-human-scale/)

------
znpy
The post does nothing to explain how or why; it just throws out a couple of
outputs from an unspecified machine and makes no comparison.

It has no baseline and no specs. For all I know, he could have got his 0.5
GB/sec on an old Pentium II processor.

There is no analysis.

I am perplexed.

~~~
aristus
Lemire is one of the leading experts on string matching and the author of
several core libraries you probably use every day.

 _edit_ fine, so instead maybe click on the links in the post to see that this
article is just one of a series. He's probably tired of copy-pasting the specs
of his reference hardware (Skylake
[https://arxiv.org/pdf/1902.08318.pdf](https://arxiv.org/pdf/1902.08318.pdf))
since all he's concerned about is the relative performance of different
software.

There is a difference between "I'm being dumb because I don't know what I'm
doing" and "I'm being lazy because I've done it 1,000 times and the target
audience knows what I mean".

~~~
adrianN
Then they should know enough to give at least some theories that can explain
the difference and tell us something more about the test setup.

------
tawy12345
I'm amazed at how upset some commenters are about a blog post that did a toy
experiment and didn't actually make any strong claims.

I'm actually a stickler about good benchmarks - it riles me when people draw
sweeping conclusions from poorly-designed experiments. Lemire is actually one
of the good ones. If you want something more fully developed than a blog post,
read one of his papers.

I personally really enjoy his blog because of this - he's good at picking
interesting exploratory experiments that provide some insight, without trying
to over-generalize from the results. If you read his conclusion, the point is
that there is a good probability that even relatively simple programs are CPU-
bound. His experiment supports that point. My experience also matches that -
I've seen a lot of data processing code that could be I/O bound in theory
(i.e. a perfect implementation could max out CPU or network) but is CPU bound
in practice. Usually because of string manipulation, regexes, or any number of
other things.

> This is not the best that Java can do: Java can ingest data much faster.
> However, my results suggest that on modern systems, Java file parsing might
> be frequently processor-bound, as opposed to system bound. That is, you can
> buy much better disks and network cards, and your system won’t go any
> faster. Unless, of course, you have really good Java engineers.

~~~
jnordwick
I can make a lot of things CPU bound with a crappy implementation. I'm not
going to write a blog post about any of them.

------
ubu7737
This is absurd; the original platform libraries don't account for the fastest
use cases in any specialized IO scenario.

Java NIO channel should have been used for this. It was demonstrated back in
the early 2000s with the "Grand Canyon" demo achieving very good throughput
for its time, and it's still the gold standard.
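
One way to use an NIO channel for this kind of scan (my own sketch, not the
Grand Canyon demo; it assumes the file fits in a single mapping, i.e. under
2 GB) is to memory-map the file and walk the bytes directly:

    
    
        import java.io.IOException;
        import java.nio.MappedByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.file.Path;
        import java.nio.file.StandardOpenOption;
    
        public class MappedScan {
            public static long countNewlines(Path file) throws IOException {
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                    // Map the whole file read-only; no copy into a Java byte[] is needed.
                    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                    long count = 0;
                    while (map.hasRemaining()) {
                        if (map.get() == '\n') count++;
                    }
                    return count;
                }
            }
        }
    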

------
adrianN
So what's the reason for this? Is it maybe because of some Unicode shenanigans?
Java characters are 16-bit IIRC, and strings have some forty bytes of constant
overhead.

~~~
d2mw
I'm no Java ninja, but a few things jump out of
[https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/19fb8f93c...](https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/19fb8f93c59dfd791f62d41f332db9e306bc1422/src/java.base/share/classes/java/io/BufferedReader.java#L314):

- at least one heap allocation for every line. After it finds the EOL it first
uses 'new String' followed by '.toString()'

- the C++ version will almost certainly be backed by memchr() behind the
scenes, which will use SIMD instructions where it makes sense (e.g. a large
enough scan size, probably true in this case). The Java version is a manual
bytewise-coded loop.

- the C++ version is reusing its output buffer, so there are no reallocations
assuming the same string length or less

No idea about encodings in Java; maybe that is playing a role too.

~~~
zlynx
I haven't looked recently but several years ago I was shocked to discover the
GNU libstdc++ didn't use strchr or memchr. It used a hand-coded for loop
because it was a template for various kinds of character. There was no
specialization for 8-bit char, either.

As a result std::string was disgustingly slow compared to C code.

------
vbezhenar
Java has many inefficient parts. For example, there's no immutable array
concept (or ownership concept, like in Rust), so a lot of unnecessary array
copies happen in the JDK. String is not well designed. There was an attempt to
abstract the String concept into CharSequence, but a lot of code still uses
Strings.

I made a similar benchmark. The idea is as follows: we have a 2 GB byte array
(because arrays in Java have a 32-bit limit, lol) filled with values 32..126,
imitating ASCII text, and the value 13, imitating newlines.

The first test simply XORs the whole array. That's the ideal result, which
should correspond to memory bandwidth.

The second test wraps this array in a ByteArrayInputStream, converts it into a
Reader using InputStreamReader with UTF-8 encoding, reads lines using
BufferedReader, and in the end also XORs every char value.

For 2 GB I get 516 ms as the ideal time (3.8 GB/s, which is still almost an
order of magnitude less than the theoretical 19.2 GB/s DDR4 speed) and 3566 ms
for the BufferedReader version, so you could get almost a 7x speed improvement
with a better implementation.

Benchmark: [https://pastebin.com/xMD4W8mn](https://pastebin.com/xMD4W8mn)
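
For anyone who doesn't want to open the pastebin, here is a rough sketch of the
two tests described above (my own reconstruction, with a smaller array and no
timing; the real benchmark is in the link):

    
    
        import java.io.BufferedReader;
        import java.io.ByteArrayInputStream;
        import java.io.IOException;
        import java.io.InputStreamReader;
        import java.nio.charset.StandardCharsets;
        import java.util.Random;
    
        public class XorVsBufferedReader {
            public static void main(String[] args) throws IOException {
                byte[] data = new byte[256 * 1024 * 1024]; // stand-in for the 2 GB array
                Random rnd = new Random(42);
                for (int i = 0; i < data.length; i++) {
                    // mostly printable ASCII, with an occasional 13 acting as a newline
                    data[i] = (byte) ((i % 100 == 99) ? 13 : 32 + rnd.nextInt(95));
                }
    
                // Test 1: XOR the raw bytes, approximating the memory-bandwidth ceiling.
                long xor1 = 0;
                for (byte b : data) xor1 ^= b;
    
                // Test 2: the same bytes through InputStreamReader + BufferedReader.
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        new ByteArrayInputStream(data), StandardCharsets.UTF_8));
                long xor2 = 0;
                for (String line = br.readLine(); line != null; line = br.readLine()) {
                    for (int i = 0; i < line.length(); i++) xor2 ^= line.charAt(i);
                }
                System.out.println(xor1 + " " + xor2); // keep both results live
            }
        }
    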

------
nullwasamistake
Eh, he didn't use NIO. BufferedReader is an ancient Java relic. Like reading
from stdin in C, it's not made to be fast; it's there for convenience and
backwards compatibility.

Read a file using something like Vert.X, which is optimized for speed. I'm
100% confident it will be faster than the naive C approach.

~~~
djhworld
Do you have an example of an alternative to BufferedReader using the NIO APIs?

I do a lot of work with large GZIP files that are read line by line using
standard IO (i.e. GZIPInputStream(FileInputStream), etc.), but your comment has
really made me second-guess my choice of doing that...
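
For context, the standard-IO pattern in question looks roughly like this (a
sketch, not anyone's production code; buffer sizes are arbitrary):

    
    
        import java.io.BufferedReader;
        import java.io.FileInputStream;
        import java.io.IOException;
        import java.io.InputStreamReader;
        import java.nio.charset.StandardCharsets;
        import java.util.zip.GZIPInputStream;
    
        public class GzipLines {
            public static void process(String path) throws IOException {
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(
                                new GZIPInputStream(new FileInputStream(path), 64 * 1024),
                                StandardCharsets.UTF_8),
                        1 << 20)) {
                    for (String line = br.readLine(); line != null; line = br.readLine()) {
                        // handle one line at a time; memory use stays bounded
                    }
                }
            }
        }
    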

~~~
nullwasamistake
I would check out the compression handlers in Netty, which underlies Vert.X
and many other projects that need high IO performance.

You should be able to hack something together that feeds zero-copy buffers
into Netty's compression handlers. Maybe using the Netty or Vert.X file API, or
maybe just raw NIO.2.

I'm not sure how fast this would be, but my gut says "very". Netty can easily
saturate 40 Gigabit Ethernet lines, and file IO should have less overhead.
That's ~5 gigabytes a second.

It's going to be a good bit of coding for sure. Vert.X/Netty/NIO2 are all
async and pretty low level. They're generally 1 thread per core, and along
with SSD read patterns you're probably best off reading files in parallel, one
per core. Might not be worth the effort.

You may want to look into Zstandard as well. It's superior to Gzip in most
ways when you need something fast but decent.

------
tantalor
One thing that jumps out at me is that the test code writes to the "volume"
variable in the read loop, I assume to count the number of bytes in the file,
but never reads it back. A clever compiler will optimize away those writes, the
string length check, the loop over the lines, and actually reading the file.

I'm not saying that's happening here, but it's a basic fact when writing
benchmarks that you have to actually test something real and not a transient
property of the program _after_ the compiler has had its chance to be really
smart.
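
A common way to guard against that (a sketch assuming a JMH harness, which is
not what the post uses) is to make the benchmark consume the value so the JIT
cannot prove the work is dead:

    
    
        import org.openjdk.jmh.annotations.Benchmark;
        import org.openjdk.jmh.infra.Blackhole;
    
        public class LineCountBench {
            @Benchmark
            public void countBytes(Blackhole bh) {
                long volume = 0;
                // ... read lines and accumulate their lengths into volume ...
                bh.consume(volume); // keeps "volume", and the work behind it, live
            }
        }
    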

------
barbarbar
I am a bit confused. The scanFile reads from a file. But the readLines inside
the for loop is reading from a StringReader - and not from a file?

------
chvid
At least two problems with the Java code. Concatenation of strings using the
plus operator creates a new string and copies the content of the old, which
pushes the complexity of the code from O(n) to O(n²), where n is the number of
lines. Secondly, order is not guaranteed with the forEach operation on streams.

The correct way to do it is using collect(Collectors.joining("\n")) or a
straightforward imperative style (without streams).

I don't think the general statement holds (that Java, or BufferedReader in
particular, is CPU bound).

~~~
pvg
Where do you see string concatenation with the plus operator in the posted
code?

~~~
chvid
Haha - nowhere ... I completely misread his code.

------
IloveHN84
But the whole Stream API is so terrible for performance... you write one line
of code and you're already at O(n^5).

------
pjmlp
Completely wrong.

It is like asserting something about C based on GCC-specific behaviour.

Java is not a single implementation of the language.

~~~
tom_mellior
Agreed. I think you are being downvoted because you forgot to paste in your
benchmark results.

~~~
pjmlp
Yeah, I forgot that we aren't allowed to have opinions without doing the hard
work.

So now I have to go out and do benchmarks across all these implementations
just to prove they aren't the same?!?

[https://en.m.wikipedia.org/wiki/List_of_Java_virtual_machine...](https://en.m.wikipedia.org/wiki/List_of_Java_virtual_machines)

------
nottorp
Java is... java.

I was once working on an Android app on a cheap custom board with 128 MB of
RAM (don't ask why Android on a single-function custom board; it wasn't my
decision).

Among other things, I had to parse an 80,000-line CSV file. Splitting and the
rest of the processing created so many temporary strings that the system ran
out of RAM. We eventually gave up.

~~~
TheChaplain
I've worked with gigabyte-sized CSV files myself without issues, so it was
likely your implementation rather than a fault of the Java language.

~~~
adrianN
Did you do it in 128 megs of RAM?

~~~
Zach_the_Lizard
Depending on what needs to be done with the CSV files, it's very possible to
do it in 128MB of RAM. For example, if we need to read the rows, transform
them a bit, and then write to another file, we can read up to N rows,
transform them, and then write them. That should result in bounded memory
consumption because only up to N rows need to be kept. Similar strategies are
possible if the rows are used as input to an ETL job, calling a Web service
with the results of parsing the file, etc.
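
A minimal sketch of that bounded-memory approach (the per-row transform is
hypothetical; the point is just that it reads and writes one row at a time):

    
    
        import java.io.BufferedReader;
        import java.io.BufferedWriter;
        import java.io.IOException;
        import java.nio.charset.StandardCharsets;
        import java.nio.file.Files;
        import java.nio.file.Path;
    
        public class CsvTransform {
            public static void transform(Path in, Path out) throws IOException {
                try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
                     BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                    for (String row = reader.readLine(); row != null; row = reader.readLine()) {
                        writer.write(row.toUpperCase()); // stand-in for the real transform
                        writer.newLine();
                    }
                }
            }
        }
    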

Editing a file gets trickier, though it's not impossible. Maybe a [piece
table](https://en.wikipedia.org/wiki/Piece_table) plus some smart buffering
could keep memory consumption below some constant, letting it work for large
files, with the downside of lower performance for files larger than whatever
that constant is?

