Hacker News
A look at search engines with their own indexes (seirdy.one)
53 points by Seirdy on March 11, 2021 | 9 comments



(Am author) Most "alternatives" to the big three search engines (Google, Bing, Yandex aka GBY) just proxy their results from GBY. I took a look at 30 non-meta search engines with their own crawlers/indexers to find actual alternatives.

Gemini link: gemini://seirdy.one/2021/03/10/search-engines-with-own-indexes.gmi

Feedback + additions welcome.


Good article. Petal search and Gowiki were completely new to me and I pay pretty close attention to other search engines (I created Runnaroo).

Another area to focus on when reviewing search engines is the enriched results (i.e. "Instant Answers", "Deep Searches", etc.).

The average search user has come to expect these types of results almost as much as the organic results.


Enjoyed this. I look forward to seeing more people/companies take on building their own indexes from scratch. It's not only fun (to some extent) but can be very valuable given the right use case.

Over the past year, we've built a search engine at my company that now gives us significant business value and a competitive advantage. Its index holds fewer than 1 million records, it is currently for internal use only, and listings live for only 30 days after inclusion. Would that work for a general-purpose search engine? No. But for very domain-specific, tight use cases, even a small index can make a difference to business processes.


Editor for https://www.searchenginemap.com/ here. It needs an update, so it misses some recent entrants. This article covers the situation pretty accurately, from what I know from tracking this too. Longer lists can be found at https://twitter.com/SearchEngineMap/lists for those with a presence on Twitter.


I'm getting an error message when trying to view the lists ("Something went wrong"); I've tried multiple browsers and devices. Trying to save it to the Wayback Machine didn't work either.

I managed to archive a snapshot with archive.today; when clicking on the "Search Engine (Crawler)" list (which I assume is the list you were referring to), I was greeted by a "This page doesn't exist" error.

I then replaced "twitter.com" with a Nitter-based Twitter proxy (nitter.snopyta.org), and found these engines: https://nitter.snopyta.org/SearchEngineMap/lists/Search-Engi...
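For anyone who wants to script the same host swap, here is a minimal sketch in Python. The function name `to_nitter` and the set of Twitter hostnames it checks are assumptions made for illustration; the Nitter instance is the one mentioned above and may not stay available.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames treated as "Twitter" here; this set is an assumption, not an exhaustive list.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}


def to_nitter(url: str, nitter_host: str = "nitter.snopyta.org") -> str:
    """Return `url` with a Twitter host swapped for a Nitter instance.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname in TWITTER_HOSTS:
        # Only the host changes; scheme, path, query and fragment are kept.
        parts = parts._replace(netloc=nitter_host)
    return urlunsplit(parts)


if __name__ == "__main__":
    print(to_nitter("https://twitter.com/SearchEngineMap/lists"))
```

Passing a different `nitter_host` points the same rewrite at another instance.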

I really like your Mojeek engine and the Search Engine Map, but I'm not sure Twitter is the best place to compile this information.


Very strange about the Twitter lists. You are, of course, correct that Twitter is not the best place to compile this; it's just a way to follow multiple search engines/services there. I have a spreadsheet that forms the basis of the graph shown on Search Engine Map, so we should surface that.


Is the Common Crawl index [1] not being used by search engines? Could someone chime in as to its relative anonymity in many such articles?

[1] https://commoncrawl.org/


My experience with Common Crawl is that it's targeting researchers, because the publish cycle is so long (in web freshness terms) and there doesn't appear to be any webhook or Atom feed that would let CC indicate only the diffs a potential consumer would need to sync.


By the title, I expected that you had searched for Google assets on Google, but maybe that's my poor English.



