(Am author) Most "alternatives" to the big three search engines (Google, Bing, Yandex aka GBY) just proxy their results from GBY. I took a look at 30 non-meta search engines with their own crawlers/indexers to find actual alternatives.
Enjoyed this. I look forward to seeing more people/companies take on building their own indexes from scratch. It's not only fun (to some extent) but can be very valuable given the right use case.
We've built a search engine at my company over the past year that now gives us significant business value and a competitive advantage. Its index has fewer than 1 million records, it's currently for internal use only, and listings only live for 30 days after inclusion. Would that work for a general-purpose search engine? No. But for very domain-specific, tight use cases, even a small index can make a difference to business processes.
Editor for https://www.searchenginemap.com/ here. It needs an update, so it misses some recent entrants. From what I know from tracking this myself, the article covers the situation pretty accurately. Longer lists can be found at https://twitter.com/SearchEngineMap/lists for those engines with a presence on Twitter.
I'm getting an error message when trying to view the lists ("Something went wrong"); I've tried multiple browsers and devices. Trying to save the page to the Wayback Machine didn't work either.
I managed to archive a snapshot with archive.today; when clicking on the "Search Engine (Crawler)" list (which I assume is the list you were referring to), I was greeted by a "This page doesn't exist" error.
Very strange about the Twitter lists. You are, of course, correct that Twitter is not the best place to compile a list; it's just a way to follow multiple search engines/services there. I have a spreadsheet that forms the basis of the graph shown on Search Engine Map, so we should surface that.
My experience with Common Crawl is that it's aimed at researchers: the publish cycle is long (in web-freshness terms), and there doesn't appear to be any webhook or Atom feed that would let CC announce only the diffs a potential consumer would need to sync.
Gemini link: gemini://seirdy.one/2021/03/10/search-engines-with-own-indexes.gmi
Feedback + additions welcome.