

Ask HN: Search engines with regular expressions? - gnosis

Does anyone know of any search engines that allow searching with regular expressions?<p>The only one I know of is Google Code Search, but I'm looking for a search engine that searches ordinary web pages, not just source code.<p>I think regex search would be a really useful feature that would greatly increase the relevance and value of search results, and I'm kind of surprised that virtually no search engines have implemented it by now.
======
amock
I think this would be really computationally expensive. You can build indexes
of words, but building indexes of arbitrary regular expressions is much
harder. I think Google did a study and found that even increasing the number
of results per page was detrimental to their traffic because the pages took
longer to load. If they allowed regex searches their page load times would
increase by much more.
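
To make the indexing point concrete, here is a minimal Python sketch. The three-page corpus and the function names are made up for illustration, and the regex path is the naive approach, not how any real engine necessarily works. A word query is a single lookup in a prebuilt index, while a regex query falls back to scanning every page:

```python
import re
from collections import defaultdict

# Hypothetical three-page corpus, for illustration only.
pages = {
    "a": "regular expressions make searching text flexible",
    "b": "search engines build an index of words",
    "c": "an index maps each word to the pages containing it",
}

# Inverted index: word -> set of page ids, built once up front.
index = defaultdict(set)
for page_id, text in pages.items():
    for word in text.split():
        index[word].add(page_id)


def word_search(word):
    # A single hash lookup; cost does not grow with the number of pages.
    return sorted(index.get(word, set()))


def regex_search(pattern):
    # Naive approach: without a specialized index, every page is scanned.
    compiled = re.compile(pattern)
    return sorted(pid for pid, text in pages.items() if compiled.search(text))
```

Here `word_search("index")` touches only the index entry for that word, whereas `regex_search(r"search\w*")` has to run the pattern over all three pages.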

However, I would also like a search engine like this. I don't know how to make
it profitable but I think it would be very useful.

------
tokenadult
What is a use case in searching everyday text webpages for which regexes would
help, in your experience?

~~~
gnosis
Well, here's an example of a search that can be done with regexes, but not
with a simple boolean word/phrase search:

Find a word/phrase that is (or is not) followed closely, but not immediately,
by another word/phrase, with arbitrary characters in between (regexes would
let you specify exactly how many arbitrary characters separate the two
words/phrases).
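
As a rough sketch of that kind of query in Python's `re` syntax: the words "regex" and "search", the 5 to 40 character window, and the helper name `matches` are all assumptions chosen for illustration.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# "regex" followed by "search" with 5 to 40 arbitrary characters in between,
# so the two words are close to each other but not adjacent.
near = re.compile(r"\bregex\b.{5,40}?\bsearch\b", FLAGS)

# The negated form: "regex" NOT followed by "search" inside that window.
not_near = re.compile(r"\bregex\b(?!.{5,40}?\bsearch\b)", FLAGS)


def matches(pattern, text):
    # True if the pattern occurs anywhere in the text.
    return pattern.search(text) is not None
```

With these, `matches(near, "A regex can make web search better")` succeeds, while the adjacent phrase "regex search" does not match `near` because only one character separates the words.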

For each of the constituent words/phrases, regexes also allow alternatives to
be specified. I suppose the same could be done with multiple ordinary
searches, but regexes make doing this much easier and more convenient.
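
For instance, a single pattern can cover several spellings of each constituent. This is only a sketch: the particular alternatives and the helper name `find_all` are invented for the example.

```python
import re

# Any of three spellings (optionally plural), followed by one of three nouns.
pattern = re.compile(
    r"\b(?:regex|regexp|regular expression)s?\s+(?:search|matching|engine)\b",
    re.IGNORECASE,
)


def find_all(text):
    # Return every matching phrase exactly as it appears in the text.
    return [m.group(0) for m in pattern.finditer(text)]
```

Doing the same with plain phrase searches would take nine separate queries (three spellings times three nouns), before even counting plurals.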

With regexes you can search for fragments of a word. I've seen some ordinary
search engines sometimes automatically search for a root word with various
endings, but that's not quite the same.
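
A fragment search might look like the following sketch, where the fragment "graph" and the helper name `words_containing` are arbitrary choices for illustration:

```python
import re

# Any word that contains "graph" anywhere inside it, not just as a root.
fragment = re.compile(r"\b\w*graph\w*\b", re.IGNORECASE)


def words_containing(text):
    # Return each whole word in which the fragment occurs.
    return fragment.findall(text)
```

Unlike stemming, this finds the fragment in the middle of a word ("photographic") or at its end ("paragraph") as well as at the start.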

Something else that regexes allow that no ordinary search engine does (as far
as I know) is searching for non-alphanumeric characters. This can be a
lifesaver, to disambiguate some words, or to make sure you're only searching
for something at the beginning or end of a sentence (or in a heading, maybe
delimited by a dash or a colon).
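
A few hedged examples of what I mean, in Python's `re` syntax. The specific queries and variable names are made up, and I'm assuming a typical word index simply discards the punctuation involved:

```python
import re

# "C++" specifically, which a word index would likely reduce to plain "C".
cpp_only = re.compile(r"\bC\+\+")

# The word "search" closing a sentence: followed by . ! or ? and then
# whitespace or the end of the text.
sentence_end = re.compile(r"\bsearch[.!?](?:\s|$)")

# "Search" opening a heading line, delimited by a colon or a dash.
heading = re.compile(r"^Search\s*[:\-]\s+\S", re.MULTILINE)
```

Each pattern hinges on characters (`+`, `.`, `:`, `-`, line starts) that an ordinary keyword query gives you no way to express.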

Those are just a few things off the top of my head. I'm sure lots of other
use-cases would present themselves, once you started using such search engines
regularly (pardon the pun).

