Bug report: when a line contains multiple instances of the search term, your service returns that line twice, and both results highlight only the first occurrence, rather than highlighting the second occurrence in the second result or consolidating the two results into one.
Yes, Russ Cox's codesearch is great; I use it to search roughly 50 million lines of code. Queries are so fast I've never bothered to measure them, and reindexing the whole thing only takes a minute or two on a five-year-old laptop. It's so fast that I reindex everything at every shell login.
I wrote one about 6 years ago, though not RAM-based: built on a Lucene index, it crawled a Subversion repository and presented a web query interface. It was a quick project to write.
These days, git grep is hard to beat.
Are they really searching the static source in real time with those threads? That's what I understand from the description. Why don't they use something like Lucene? The searched content is pretty much static, so why not just index it? Wouldn't that be much faster?
Lucene, like most other indexing solutions (to my knowledge), indexes based on words, not arbitrary substrings.
So they can efficiently answer questions like "Give me all documents containing the word 'Linus'", but not necessarily "All documents containing the string '#def'".
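A toy sketch of the difference (my own illustration, not Lucene's actual internals): a word-based inverted index never stores '#def' as a key because it isn't a token, while an n-gram index over the raw text can narrow any substring query to candidate documents and then verify with a scan.

```python
import re
from collections import defaultdict

docs = {
    1: "Linus wrote the kernel",
    2: "#define MAX 10",
}

# Word index: keys are whole tokens, so only whole-word queries work.
word_index = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"\w+", text):
        word_index[word].add(doc_id)

# Trigram index: keys are every 3-char substring, so any query of
# length >= 3 can be narrowed to candidates, then verified by scanning.
tri_index = defaultdict(set)
for doc_id, text in docs.items():
    for i in range(len(text) - 2):
        tri_index[text[i:i + 3]].add(doc_id)

def tri_search(query):
    trigrams = [query[i:i + 3] for i in range(len(query) - 2)]
    candidates = set.intersection(*(tri_index[t] for t in trigrams))
    # Trigram hits are only candidates; confirm with a real scan.
    return {d for d in candidates if query in docs[d]}

print(word_index.get("Linus"))  # {1}
print(word_index.get("#def"))   # None: '#def' was never a token
print(tri_search("#def"))       # {2}
```

The verification scan at the end matters: matching all trigrams of a query is necessary but not sufficient for containing the query itself.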
That said, I am indexing -- I have my own custom backend that stores an in-memory index that lets me do arbitrary substring search (and more complicated queries, such as most character classes) much faster than a full search.
If necessary, though, the backend will fall back on a full regex search.
Do you use anything like a trigram index (see rsc's wonderful posts about how Google Code Search worked, and https://code.google.com/p/codesearch/ for a Go implementation) to speed up the regex codepath?
I'm using a different data structure -- a suffix array -- but the concept is pretty similar. I started work on this before Russ released his codesearch implementation, but I did read his blog posts while I was working on this.
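For anyone unfamiliar with the structure, here is a minimal suffix-array sketch (my own naive illustration, not the commenter's actual backend): sort all suffixes of the text, then binary-search for the query, since every substring is a prefix of some suffix.

```python
import bisect

text = "abracadabra"

# Naive construction: store (suffix, start) pairs sorted lexicographically.
# Real suffix arrays store only integer offsets and use linear-time
# builders (e.g. SA-IS) over an entire corpus, not one string.
suffixes = sorted((text[i:], i) for i in range(len(text)))

def find(query):
    """Return the sorted start positions of every occurrence of query."""
    # First suffix >= query; all matches sit in one contiguous run.
    lo = bisect.bisect_left(suffixes, (query,))
    hits = []
    while lo < len(suffixes) and suffixes[lo][0].startswith(query):
        hits.append(suffixes[lo][1])
        lo += 1
    return sorted(hits)

print(find("abra"))  # [0, 7]
```

The contiguous-run property is what makes suffix arrays attractive here: lookup is O(m log n) binary search rather than a scan, for any substring, not just whole words.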