Ask HN: Search engines with regular expressions?
6 points by gnosis on Feb 15, 2010 | 3 comments
Does anyone know of any search engines that allow searching with regular expressions?

The only one I know of is Google Code Search, but I'm looking for a search engine that searches ordinary web pages, not just source code.

I think regex search would be a really useful feature that would greatly increase the relevance and value of search results, and I'm kind of surprised that virtually no search engines have implemented it by now.

I think this would be really computationally expensive. You can build indexes of words, but building indexes of arbitrary regular expressions is much harder. I think Google did a study and found that even increasing the number of results per page was detrimental to their traffic because the pages took longer to load. If they allowed regex searches their page load times would increase by much more.
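
To make the indexing point concrete, here is a rough Python sketch (my own toy example, not how any real search engine works). The corpus and the names DOCS, build_word_index, word_search and regex_search are all made up for illustration. A word query is a single dictionary lookup, while a regex query in this naive version has to scan every document:

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; the document ids and text are invented for illustration.
DOCS = {
    1: "regex search would be useful",
    2: "search engines index words",
    3: "indexes make word lookup fast",
}

def build_word_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def word_search(index, word):
    """A single dictionary lookup, no matter how many documents exist."""
    return sorted(index.get(word.lower(), set()))

def regex_search(docs, pattern):
    """Naive approach: there is no index to consult, so scan every document."""
    compiled = re.compile(pattern)
    return sorted(doc_id for doc_id, text in docs.items() if compiled.search(text))

INDEX = build_word_index(DOCS)
```

Here word_search(INDEX, "index") only finds document 2, because "indexes" is a different key, whereas regex_search(DOCS, r"\bindex") finds documents 2 and 3 at the cost of reading all of the text.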

However, I would also like a search engine like this. I don't know how to make it profitable but I think it would be very useful.

What is a use case in searching everyday text webpages for which regexes would help, in your experience?

Well, here's an example of a search that can be done with regexes, but not with a simple boolean word/phrase search:

Find a word/phrase that is (or is not) followed closely, but not immediately, by another word/phrase, with arbitrary characters in between (regexes let you specify exactly how many arbitrary characters separate the two words/phrases).

For each of the constituent words/phrases, regexes also allow alternatives to be specified. I suppose the same could be done with multiple ordinary searches, but regexes make doing this much easier and more convenient.
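
As a rough illustration of the two points above, here is a sketch using Python's re module. The words (cat, dog, food, toy) and the 5 to 40 character window are arbitrary choices of mine, and followed_closely and not_followed are hypothetical helper names:

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# "cat" or "dog", then 5 to 40 arbitrary characters, then "food" or "toy".
# The lower bound of 5 rules out an immediate follow such as "dog food".
NEAR = re.compile(r"\b(?:cat|dog)\b.{5,40}?\b(?:food|toy)\b", FLAGS)

# "cat" or "dog" with no "food" or "toy" starting within the next 40 characters.
NOT_NEAR = re.compile(r"\b(?:cat|dog)\b(?!.{0,40}?\b(?:food|toy)\b)", FLAGS)

def followed_closely(text):
    """True if a first-group word is followed closely, but not immediately."""
    return NEAR.search(text) is not None

def not_followed(text):
    """True if some first-group word has no second-group word shortly after it."""
    return NOT_NEAR.search(text) is not None
```

The alternatives live in the (?:...|...) groups, so adding another word is a one-character-class edit rather than another separate search.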

With regexes you can search for fragments of a word. I've seen some ordinary search engines sometimes automatically search for a root word with various endings, but that's not quite the same.
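
For instance, a fragment search might look like this in Python (the fragment "graph" and the helper name words_containing are just examples I picked):

```python
import re

def words_containing(fragment, text):
    """Return every whole word in text that contains the given fragment."""
    # re.escape keeps any special characters in the fragment literal.
    pattern = re.compile(r"\b\w*" + re.escape(fragment) + r"\w*\b", re.IGNORECASE)
    return pattern.findall(text)

SAMPLE = "A photographer drew a graph of typographic errors"
MATCHES = words_containing("graph", SAMPLE)
```

Unlike stemming, this matches the fragment anywhere in the word, including the middle.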

Something else that regexes allow that no ordinary search engine does (as far as I know) is searching for non-alphanumeric characters. This can be a lifesaver, to disambiguate some words, or to make sure you're only searching for something at the beginning or end of a sentence (or in a heading, maybe delimited by a dash or a colon).
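
A small sketch of what I mean, again with Python's re module. The sample text and the pattern names HEADING, SENTENCE_END and CPP are invented for illustration:

```python
import re

# Made-up sample text for illustration.
TEXT = (
    "Summary: regex search\n"
    "We wrote a summary. The summary was short.\n"
    "C++ is not C#."
)

# "summary" only as a heading label: at the start of a line, then a colon or dash.
HEADING = re.compile(r"^summary\s*[:\-]", re.IGNORECASE | re.MULTILINE)

# "summary" only when it ends a sentence.
SENTENCE_END = re.compile(r"\bsummary[.!?]", re.IGNORECASE)

# Literal punctuation tells "C++" apart from "C#" or a bare "C".
CPP = re.compile(r"C\+\+")
```

A plain word search would treat all three occurrences of "summary" the same way, and would most likely drop the "++" entirely.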

Those are just a few things off the top of my head. I'm sure lots of other use-cases would present themselves, once you started using such search engines regularly (pardon the pun).
