Livegrep: Live searching the Linux kernel source code (livegrep.com)
39 points by alpb on Apr 20, 2012 | 20 comments

That was my first search too. I think it'd be interesting to see a list of the most searched terms.

Hey there. I built this. Top 10 queries, all lengths:


Top 10, >=4 characters:


I weep for humanity :)

Bug report: when a line contains multiple instances of a search term, the service returns that line twice, and both results highlight only the first occurrence. It should either highlight the second occurrence in the second result or consolidate the two results into one.

Any chance of the backend source for this being released? It's 2012; I shouldn't be stuck with `grep -r` and OpenGrok on a good day.

Yes, Russ Cox's codesearch is great; I use it to search around 50 million lines of code. Queries are so fast I've never bothered to measure them, and reindexing the whole thing takes only a minute or two on a five-year-old laptop. It's fast enough that I reindex everything at every shell login.

I started using ack from http://betterthangrep.com recently and like it a lot. There are even more tools listed at http://betterthangrep.com/more-tools/

I use ack as well, but I am working on a multi-gigabyte codebase at work and it's time to look for a tool with indexing.

Is there an equivalent CLI tool? (Preprocess a tree of files, possibly keep it in RAM as a daemon, and answer queries in realtime.)

Russ Cox's codesearch: https://code.google.com/p/codesearch/

Russ Cox wrote the old Google Code Search; codesearch is a Go implementation based on the same ideas.

I wrote one about six years ago, though it wasn't RAM-based: it built a Lucene index, crawled a Subversion repository, and presented a web query interface. It was a quick project to write. These days, git grep is hard to beat.

Are they really searching the static source in real time across threads? That's what I understand from the description. Why don't they use something like Lucene? The searched content is pretty much static, so why not just index it? Wouldn't that be much faster?

Lucene, and most other indexing solutions (to my knowledge) index based on words, not arbitrary substrings.

So they can efficiently answer questions like "Give me all documents containing the word 'Linus'", but not necessarily "All documents containing the string '#def'".
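To make that concrete, here's a toy Python sketch (not Lucene's actual machinery; document contents are made up) of a word-level inverted index. Whole-word lookups work, but a substring like '#def' never appears as a token, so the index simply has no entry for it:

```python
import re
from collections import defaultdict

docs = {
    1: "Linus wrote the kernel",
    2: "#define MAX 10",
}

# Word-level inverted index: each word token maps to the set of doc ids
# that contain it.
word_index = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"\w+", text):
        word_index[word].add(doc_id)

print(word_index.get("Linus"))  # {1} -- whole-word lookup works
print(word_index.get("#def"))   # None -- '#def' is not a token, so a
                                # word index can't answer substring queries
```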

That said, I am indexing -- I have my own custom backend that stores an in-memory index, which lets me do arbitrary substring search (and more complicated queries, such as most character classes) much faster than a full scan.

That said, the backend will fall back on a full regex search if necessary.

Do you use anything like a trigram index (see rsc's wonderful posts about how Google Code Search worked, and https://code.google.com/p/codesearch/ for a Go implementation) to speed up the regex codepath?
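For reference, the trigram trick rsc describes boils down to indexing every 3-character window of each document, intersecting posting lists to get candidate documents, and then verifying candidates with a real substring/regex scan. A toy sketch (illustrative Python, not codesearch's actual code):

```python
from collections import defaultdict

docs = {
    1: "Linus wrote the kernel",
    2: "#define MAX 10",
    3: "int main(void) { return 0; }",
}

def trigrams(s):
    """Every 3-character substring of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

# Index every trigram of every document.
tri_index = defaultdict(set)
for doc_id, text in docs.items():
    for t in trigrams(text):
        tri_index[t].add(doc_id)

def candidates(query):
    """Docs containing every trigram of the query: a superset of the
    true matches, cheap to compute via posting-list intersection."""
    ids = None
    for t in trigrams(query):
        ids = tri_index[t] if ids is None else ids & tri_index[t]
    return ids or set()

def search(query):
    # The index only narrows the search; a real scan confirms each hit.
    return {d for d in candidates(query) if query in docs[d]}

print(search("#def"))  # {2}
```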

I'm using a different data structure -- a suffix array -- but the concept is pretty similar. I started work on this before Russ released his codesearch implementation, but I did read his blog posts while I was working on this.
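For anyone curious, the suffix-array idea fits in a few lines of Python (a toy version; the real backend is obviously more involved): sort the starting offsets of every suffix lexicographically, then binary-search for the contiguous range of suffixes that begin with the query.

```python
text = "#define MAX 10\nint main(void)\n"

# Suffix array: starting offsets of all suffixes, sorted lexicographically.
# (O(n^2 log n) construction here; real implementations do much better.)
sa = sorted(range(len(text)), key=lambda i: text[i:])

def find_all(query):
    """All offsets where `query` occurs, via two binary searches over
    the sorted suffixes (each comparison looks at |query| characters)."""
    def prefix(i):
        return text[sa[i]:sa[i] + len(query)]
    # Lower bound: first suffix whose prefix >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose prefix > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[i] for i in range(start, lo))

print(find_all("in"))  # [4, 15, 21]
```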

How is it so quick? I attempted writing a similar thing in node to grep the Rails source code, but it was far from real-time.
