
How can it crunch through 1GB in 1s? Even just reading the data would take longer on any system I know.



According to Apple's MacBook Pro marketing page [0], their SSD supports sequential read speeds of up to 3.2GB/s. So this isn't as far-fetched as you imply, even if we're discounting other factors like the filesystem cache.

[0] https://www.apple.com/macbook-pro/


Yes, I see these specs all the time but I never see them materialize in the real world. Agree about the cache but that's also an easy way to fool yourself when you are measuring performance.


> easy way to fool yourself when you are measuring performance

In this case isn't the "already in RAM" test a more accurate reflection of performance anyway, as we are talking about the performance of grep and not the IO subsystem?

There are many cases where grep's performance won't be bottlenecked by IO, or at least not impacted directly by a bottleneck there. Essentially anywhere the input is coming from another program, and even if that upstream task is IO bound it may be sending output to grep much faster than it is reading its own input (perhaps gunzip <file | grep searchtext).

And in the case of searching a log file interactively, it is common that you won't run grep on the file just once in a short space of time, but a couple of times as you refine your search, so for most of the runs it will be in cache (assuming the file is not too large).


The data could also be in the VFS cache/buffers for a number of reasons depending on the system setup and access pattern; including being written.


This usage is literally ideal for pretty much any file I/O - it's a straight sequential read. Even spinning rust will achieve >400MB/s on this type of load. Take a look at the sequential burst speeds at QD1, first chart: https://www.anandtech.com/show/13761/the-samsung-970-evo-plu...

Nearly every SSD listed achieves well over 1GB/s in an actual benchmark, not just on a spec sheet. And these are just boring old off-the-shelf consumer drives. Nothing crazy.


HDDs almost never sustain 400 MB/s unless you are talking about something pretty exotic. 5400 RPM drives are generally in the 100-130 MB/s range, and 7200 RPM drives are proportionally faster but still usually under 200 MB/s.


https://www.tomshardware.com/reviews/wd-red-10tb-8tb-nas-hdd...

So yeah maybe not over 400MB/s, but all of them are over 200MB/s. Sequential speeds really spiked as densities kept increasing.


Definitely seeing GB/s IO spikes on my Samsung NVMe drives. E.g. when persisting application state into sqlite.

Note that you're not going to get this with SATA SSDs; you need NVMe. It's a 5x difference in throughput and IOPS.


On my puny laptop with an SSD I get ~400MB/s from cold cache, and ~5GB/s after the first run. So the answer is likely "it's in the FS cache".

That's a very common use-case with grep. Either grepping a file you recently wrote, or running grep multiple times as you refine the regex, at which point the files will be in the FS cache.
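
If you want to see where the time is actually going on your own box, a crude sequential-read timer is enough. Just a sketch (the 1 MiB chunk size is arbitrary, and on Linux you'd have to drop the page cache between runs, e.g. echo 3 > /proc/sys/vm/drop_caches as root, to get a genuinely cold first number):

    /* readbench.c - time a plain sequential read of one file.
       Build: cc -O2 readbench.c -o readbench
       Run:   ./readbench /path/to/big.log   (path is up to you) */
    #define _POSIX_C_SOURCE 200809L
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s FILE\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        static char buf[1 << 20];               /* 1 MiB chunks, arbitrary */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        long long total = 0;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            total += n;

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%lld bytes in %.3f s = %.1f MB/s\n",
               total, secs, total / secs / 1e6);
        close(fd);
        return 0;
    }

Run it twice back to back and the second number is the FS cache speaking, not the disk.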


As TFA mentions,

> #1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

TFA is incredibly short, and will explain it much better than I can.


> it AVOIDS LOOKING AT EVERY INPUT BYTE

This would not help, since the backing storage doesn't provide support for this kind of resolution. It would end up reading in the entire file anyways, unless your input string is on the order of an actual block.


Sure it would help, not for the IO part, but the CPU-bound part of actually checking each character, which is apparently a much lower bound in this case.
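
For context, the skipping comes from the Boyer-Moore family of algorithms: compare each candidate window starting from its last byte, and when that byte can't occur at that position in the pattern, slide the window forward by up to the whole pattern length. A rough Horspool-style sketch (simplified; not GNU grep's actual code, which adds memchr fast paths and more):

    #include <stddef.h>

    /* Horspool variant of Boyer-Moore: return a pointer to the first
       occurrence of pat (length m) in text (length n), or NULL.
       For long patterns most input bytes are never examined at all. */
    static const char *horspool(const char *text, size_t n,
                                const char *pat, size_t m)
    {
        if (m == 0 || m > n)
            return m == 0 ? text : NULL;

        /* Skip table: how far the window may slide when the byte under
           its last cell is c.  Bytes that never occur in the pattern
           allow a full-length jump of m. */
        size_t skip[256];
        for (size_t c = 0; c < 256; c++)
            skip[c] = m;
        for (size_t i = 0; i + 1 < m; i++)
            skip[(unsigned char)pat[i]] = m - 1 - i;

        size_t pos = 0;
        while (pos + m <= n) {
            /* Compare the window right to left, last byte first. */
            size_t i = m;
            while (i > 0 && text[pos + i - 1] == pat[i - 1])
                i--;
            if (i == 0)
                return text + pos;              /* full match */
            pos += skip[(unsigned char)text[pos + m - 1]];
        }
        return NULL;
    }

With a ten-byte pattern and text that rarely contains the pattern's bytes, the inner loop only touches roughly one byte in ten.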


Yeah, that's why the article talks about decreasing the amount of CPU work. From the context of disk IO though (which is what this thread seems to be about) this can't help.


DMA would like a word with you


I'm still not seeing how this significantly changes things?


Loading the file from disk into memory doesn't require reads by the CPU. That's significantly different from the CPU doing comparisons (or even reads) on each byte.


That doesn't avoid it needing to read the data off disk though. AFAIK even SSDs still only read in 4K chunks.

As the other reply mentioned though, it's just that MacBook SSDs are that fast.


All SSDs are that fast (and way faster). Nothing special about MacBook ones.


"GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE"

Somewhat confusing since it has to look at every byte to find the newlines. They are using a pretty specific definition of "look".


This is why the linked article specifically says it does not look for newlines:

> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!


... until you ask for line numbers.


It can't avoid breaking the output into lines, though, so it probably looks for newlines after a match is found.

I assume the Boyer Moore preprocessor reads a lot of bytes also.

Not disputing it's more efficient, but there's no magic. It avoids reading some bytes when and if it can.


> It can't avoid breaking the output into lines

It can whenever you don't ask for line numbers, can't it?

> It probably looks for newlines after a match is found

Probably, yeah. Counting number of newlines in a range, without caring just where they fall, can probably be pretty darned fast with vector instructions. Idk if that's worth the additional pass, though.
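
For what it's worth, you don't even need hand-written vector code for the counting pass: libc's memchr is usually SIMD-accelerated already, so counting newlines in a range can be as simple as this sketch:

    #include <stddef.h>
    #include <string.h>

    /* Count '\n' bytes in buf[0..len).  memchr does the scanning, and in
       most libcs it is vectorized, so long ranges run close to memory
       bandwidth. */
    static size_t count_newlines(const char *buf, size_t len)
    {
        size_t count = 0;
        const char *p = buf;
        const char *end = buf + len;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            count++;
            p++;                /* resume just past the newline we found */
        }
        return count;
    }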


It doesn't need to look for newlines. That is, looking for ^ABCD is not harder than looking for ABCD. The set of matches for the former is a subset of the latter, so if you have some fast way of doing the latter you have a fast way of doing the former (with an additional check for the preceding newline).

Another way of looking at it is just considering the ^ another character (plus one-off special handling for start of file).
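
Concretely, a sketch of that "search first, check the byte before the hit afterwards" idea, using glibc's memmem as a stand-in for whatever fast fixed-string search sits underneath (not grep's actual code):

    #define _GNU_SOURCE            /* memmem() is a GNU extension */
    #include <stddef.h>
    #include <string.h>

    /* Find pat only at the start of a line, without a separate scan for
       newlines: run the ordinary substring search, then look at the one
       byte preceding each candidate hit. */
    static const char *find_at_line_start(const char *buf, size_t len,
                                          const char *pat, size_t patlen)
    {
        const char *p = buf;
        size_t left = len;

        while ((p = memmem(p, left, pat, patlen)) != NULL) {
            if (p == buf || p[-1] == '\n')      /* start of file or line */
                return p;
            p++;                                /* false candidate, go on */
            left = len - (size_t)(p - buf);
        }
        return NULL;
    }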


The SM0512G in this machine is capable of over 1.5GB/s in sequential read.

It's also possible that the file is cached in memory (I ran grep a few times through the file before I carried out the specific measurements).


Others have mentioned the SSD and VFS cache, but spinning disks in a RAID configuration can easily surpass this in raw sequential read performance.


Not really "easily" - the second test does 1.5 GB in 0.622s for a throughput of 2.41 GB/s.

If we assume something like 100 MB/s sustained for spinning disks, that's roughly two dozen drives striped together just to reach 2.41 GB/s, even ignoring overheads.


100 MB/s sustained is very slow these days. Even a budget 7200 RPM Barracuda will hit >150 MB/s in the real world, and it goes up from there.


Even at 200 MB/s that is at least a dozen disks in zero redundancy RAID to get the needed speed.

This test was hitting the OS disk cache.


NVMe SSDs can read about 3GB/s these days.


Faster than that even, but you're not likely to see them in laptops at the moment :D

Samsung's PM1725b (https://www.samsung.com/semiconductor/ssd/enterprise-ssd/MZP...) has a Seq. Read of 6300 MB/s and Seq. Write of 3300 MB/s.


This is answered in the article.


No it isn't. Unless grep is capable of skipping entire blocks (4096 bytes) it still has to pull that data off the disk.


No because most blocks will be found in the file system cache.



