Grab – simple but very fast grep (github.com/stealth)
62 points by pmoriarty on Dec 21, 2014 | 35 comments

I'm the author of ag[1]. File searching stuff interests me, so I took a look at grab. Sadly, grab only compiles on Linux right now. Some of the mmap flags and cpu_set_t aren't defined on OS X. I was curious about the performance claims, so I benchmarked grab and ag on my code directory. Times are medians of five runs. I used a Lenovo X140e running Ubuntu 14.10. It has a 160GB Intel SSD, 8GB of RAM, and an AMD A4-5000 (4 x 1.5GHz Jaguar).

    ggreer@boron:~/code% du -sh .
    8.3G	.

    ggreer@boron:~/code% time ag cpu_set_t
    ag cpu_set_t  4.45s user 5.25s system 295% cpu 3.285 total

    ggreer@boron:~/code% time grab -R cpu_set_t .
    grab -R cpu_set_t .  13.31s user 21.67s system 35% cpu 1:38.28 total

Ag is 30x faster here, but these benchmarks aren't a fair fight. Ag ignores binary and hidden files by default. If I tell ag to do an unrestricted search, it's still 2x faster (43 seconds vs 98 seconds). Even with cold caches (echo 3 | sudo tee /proc/sys/vm/drop_caches between each run), ag beats grab handily:

    ag -u cpu_set_t  19.62s user 32.56s system 90% cpu 57.433 total

    grab -R cpu_set_t .  15.48s user 37.89s system 37% cpu 2:22.67 total

I haven't profiled grab yet, but there's definitely some low-hanging fruit. For example, it looks like grab could get a big speedup by detecting literal patterns and using strstr() instead of a whole PCRE engine. Also, FileGrep::find is calling pthread_mutex_lock/unlock even if there are no matches to print. Adding a condition around that makes grab 1.5x faster.
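To make the literal-pattern idea concrete, here is a minimal sketch in C. It is not code from grab or ag: the names is_literal_pattern and find_literal are hypothetical, and the metacharacter list is a conservative assumption. It also assumes a NUL-terminated haystack, which an mmap'd file generally is not, so a real tool would need a length-aware search instead of strstr().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when `pattern` contains none of the listed regex
 * metacharacters, so a plain substring search finds the same matches
 * as compiling it as a regex (assuming case-sensitive matching). */
bool is_literal_pattern(const char *pattern)
{
    assert(pattern != NULL);
    /* strpbrk() returns NULL when none of the listed bytes occur. */
    return strpbrk(pattern, "\\^$.|?*+()[]{}") == NULL;
}

/* Hypothetical fast path: returns a pointer to the first occurrence of
 * `pattern` in the NUL-terminated `haystack`, or NULL if there is none.
 * A caller would fall back to the regex engine whenever
 * is_literal_pattern() returns false. */
const char *find_literal(const char *haystack, const char *pattern)
{
    assert(haystack != NULL);
    assert(is_literal_pattern(pattern));
    return strstr(haystack, pattern);
}
```

With this check, a search for cpu_set_t would take the fast path, while something like cpu_.*_t would still go through the regex engine.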

I'm glad I took a look at grab. Despite its shortcomings, I learned from it. I'll definitely try out a few tricks grab uses that ag doesn't, such as thread affinity.

One more thing: The author of grab is right about counting newlines. It does hurt performance. Still, I enable it by default in ag. I think the tradeoff is worthwhile.
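For anyone wondering where that cost comes from: to print a line number, every byte before the match has to be scanned for '\n'. A rough sketch of that work in C, using memchr(), is below. The function names are made up for illustration and this is not how ag or grab actually do it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts '\n' bytes in the first `len` bytes of `buf`. The whole prefix
 * has to be scanned, which is the extra work line numbers cost. */
size_t count_newlines(const char *buf, size_t len)
{
    assert(buf != NULL);
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL) {
            break; /* no more newlines in the remaining bytes */
        }
        count++;
        p = nl + 1; /* continue just past the newline we found */
    }
    return count;
}

/* 1-based line number of the byte at `offset` within `buf`. */
size_t line_number_at(const char *buf, size_t offset)
{
    return count_newlines(buf, offset) + 1;
}
```

A tool that skips line numbers can stop as soon as it finds the match. One that prints them pays for this scan on every file that has a hit.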

1. https://github.com/ggreer/the_silver_searcher

Edit: The mutex locking change was so straightforward that I submitted a PR: https://github.com/stealth/grab/pull/2
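The general shape of that kind of change looks something like the sketch below. To be clear, this is not the actual patch: flush_matches and print_lock are hypothetical names, and it assumes each thread buffers its matches before printing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* One lock shared by all worker threads so output lines don't interleave. */
static pthread_mutex_t print_lock = PTHREAD_MUTEX_INITIALIZER;

/* Prints `count` buffered match lines to `out` and returns how many were
 * printed. The lock is only taken when there is something to print. */
size_t flush_matches(FILE *out, const char *const *lines, size_t count)
{
    assert(out != NULL);
    if (count == 0) {
        return 0; /* no matches: skip the lock/unlock pair entirely */
    }
    pthread_mutex_lock(&print_lock);
    for (size_t i = 0; i < count; i++) {
        fputs(lines[i], out);
        fputc('\n', out);
    }
    pthread_mutex_unlock(&print_lock);
    return count;
}
```

Most files in a typical search have no matches, so the early return keeps the common case away from the lock.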

> using strstr() instead of a whole PCRE engine

This is probably faster on a lot of platforms with the common grep use-cases, yeah (i.e. very short literal strings). But as a common practice imo using the C library string functions in performance-sensitive code is tricky, because it can lead to some wildly variable performance between platforms, compared to directly using a portable-C implementation of a good algorithm.

I was bitten by strstr() in particular in the past. I had a large slowdown when taking something pretty simple (filter/extraction code) that ran fine on Linux, and trying it on FreeBSD. It turns out that strstr() on FreeBSD implements the naive O(nm) algorithm: a loop around strncmp() that tries a match at each possible starting location [1]. That's fine for short search strings but increasingly bad for long ones. I had assumed that a typical strstr() would use Boyer-Moore or something of that sort past a threshold, but on many platforms it doesn't (GNU libc, at least in recent versions, does switch to Two-Way [2]).
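As an illustration of what "directly using a portable-C implementation" can look like, here is a small Boyer-Moore-Horspool search. It is a sketch, not a drop-in strstr() replacement, and horspool_search is a name I made up. Horspool skips ahead quickly on typical text, but its worst case is still O(nm); Two-Way is what you want if you need a linear-time guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool substring search over length-delimited buffers.
 * Returns a pointer to the first match in `hay`, or NULL if none. */
const char *horspool_search(const char *hay, size_t hay_len,
                            const char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    if (needle_len == 0) {
        return hay; /* empty needle matches at the start, like strstr() */
    }
    if (needle_len > hay_len) {
        return NULL;
    }

    /* shift[c] = how far the window may slide when its last byte is c. */
    size_t shift[UCHAR_MAX + 1];
    for (size_t i = 0; i <= UCHAR_MAX; i++) {
        shift[i] = needle_len;
    }
    for (size_t i = 0; i + 1 < needle_len; i++) {
        shift[(unsigned char)needle[i]] = needle_len - 1 - i;
    }

    size_t pos = 0;
    while (pos <= hay_len - needle_len) {
        if (memcmp(hay + pos, needle, needle_len) == 0) {
            return hay + pos;
        }
        /* Slide based on the haystack byte under the window's last slot. */
        pos += shift[(unsigned char)hay[pos + needle_len - 1]];
    }
    return NULL;
}
```

Because it takes explicit lengths, it also works on buffers that aren't NUL-terminated, and it behaves the same on every platform regardless of what the local libc does.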

[1] http://svnweb.freebsd.org/base/head/lib/libc/string/strstr.c...

[2] http://www-igm.univ-mlv.fr/~lecroq/string/node26.html

Thank you so much for Silver Searcher. I depend on it every day. One feature I would like is the ability to ignore very large files that get in the way when I'm searching.

Ag is amazing. Thank you for such a useful tool. I no longer need find + grep now that I use ag.

One thing I wish for is a cache of the results for a search term, so that the next search for the same term would be instant. This would be good for a large static file set.

I just wanted to thank you for ag. It's literally the most used command in my shell history and I consider it crucial to my work. So, thanks!

Thanks for ag. It was a game changer for me.

I was coming here to post about ag, thank you for the silver searcher.

> Sadly, grab only compiles on Linux right now.

the_silver_searcher is a great piece of software and I use it every day, but why would your main development platform be anything but Linux in this day and age? OS X has inferior performance, no official package manager and it's being produced by people hostile to open source developers. Why in the name of Jah would you subject yourself to that?

I have a MacBook Air. I also have a ThinkPad dual-booting Ubuntu and Windows 8.1. When you see them side-by-side, you'll understand why the mac goes in my travel bag[1]. Others might not mind the difference in weight or size, but I do.

Also, even though the ThinkPad X140e is certified by Ubuntu to be compatible[2], it took me 4 months of messing around before I could change the screen brightness. That issue ruined battery life and made my laptop unusable at night. I still can't get bluetooth to work. Others have been luckier than me, but hardware support on Linux can still be spotty. With a mac, I don't have to worry about that.

But that's just my experience. Others (such as yourself) love using Linux on their laptops. It's entirely possible that people simply have different preferences or workflows. Instead of feigning shock or getting upset, just use what you like.

1. http://abughrai.be/pics/DSC_8737.JPG

2. http://www.ubuntu.com/certification/hardware/201309-14195/

>Instead of feigning shock or getting upset

It's not shock, it's disappointment. You put up with this shit[1] because it's shiny?

[1]: http://openbenchmarking.org/prospect/1304096-FO-RARINGOSX81/...

Please stop.

Because the UI is better. I personally make the tradeoff for more openness and power and use Linux, but I can see why people use MacOS.

Just a slight recommendation for those interested in a good OSX-y looking GNU/Linux distribution: check out Elementary OS [1]. It is heavily inspired by OSX and overall it is very responsive and attractive.

[1] http://elementaryos.org/

Because the GUI on OS X is actually useable, and upgrades don't automatically cause audio to stop working.

There are tens of popular Linux distributions out there. Choose one that doesn't rely on PulseAudio instead of caving in for the shiny GUI and the used car salesman bullshit.

I cut my teeth on Solaris, then Linux. I'm a *nix guy at heart.

I really want to use Linux, but I'm on OS X for a single reason: my living depends on my operating system reliably working well, after every update. OS X gives me this, and Linux, sadly, doesn't.

Maybe you haven't invested enough time in choosing and getting to know your distro. My Gentoo Linux ~amd64 is the most reliable system I encountered and it's being rebooted only when I want to upgrade the kernel.

Because OS X "just works" and I don't need the level of control Linux gives me for a personal laptop (I use Linux on my servers).

Also the UI is far better than any Linux distro can offer.

My main development machine is a desktop that I assembled myself so the hardware was bought with Linux in mind from the start. My laptop is an old Dell Inspiron 1525 with zero problems under Linux because I did my homework before choosing it.

>Also the UI is far better than any Linux distro can offer.

I'm a programmer. I do most of my work in a terminal emulator. Openbox takes care of the rest.

I'm a programmer as well and also do most of my work in a terminal, but looks are still important to me. Especially font rendering, which Linux does worse than Windows or OSX.

> Especially font rendering which is something that Linux does subpar compared to Windows or OSX.

Not any more: http://www.infinality.net/blog/infinality-freetype-patches/

People like what they like. This is not the place for a flame war.

No, this is the place where you get with the program or get downvoted into oblivion.


Homebrew: The missing package manager for OS X

A package manager that only handles a subset of the software, in its own part of the filesystem, while duplicating a bunch of libraries already installed is not "The package manager". It's a crutch.

It informs me when the package it's installing is a duplicate of an OSX package; I can decide if it's worth it on a case-by-case basis. By installing in a separate directory and symlinking into `/usr/local/bin`, it's cleanly separated and easy to nuke at any time. Oh, plus the ability to find problems across all packages with `brew doctor`; it's actually helped me fix problems.

If it's a crutch, it's a damn useful crutch that I can't live without on OSX, and that I'd use over any other package manager on OSX in a heartbeat.

Having a faster grep-like tool installed on a production BSD or illumos machine sounds reasonable.

Some people like things you don't like. Get over it.

It might not be a replacement for ag today but it is 500 lines of C++ code w/o dependencies. Sounds like a great resource to learn from for people like me :P

Are there any benchmarks of this vs ag (https://github.com/ggreer/the_silver_searcher)?

Some of the optimizations are similar (mmap and pcre_study), while others are opposed (ag uses pthreads, grab claims disk I/O is the bottleneck and threads slow things down).

> Are there any benchmarks of this vs ag (https://github.com/ggreer/the_silver_searcher)?

Ask, and ggreer (https://news.ycombinator.com/item?id=8781374) shall give (within 10 minutes). :-)

Surprised to find that no one has yet mentioned ack, which is another developer-focused grep-like thing: http://beyondgrep.com

Because we now have [ag](https://github.com/ggreer/the_silver_searcher), which is similar to ack, but much faster

what's wrong with GNU grep???

Not all that much, to be honest. It has many more features than the clones and is slower in some conditions.

I suspect improving grep would have been a better idea than making tools that aren't as versatile as grep, but...
