A look at search engines with their own indexes (2021) (seirdy.one)
77 points by mnem 16 days ago | 23 comments



I have been somewhat impressed by Mojeek, but it does have two obvious flaws:

1) It's not really good for localized search, though it might be if you're local to the US or UK.

2) No !bangs. Coming from Ecosia I frequently just do !w !maps !yt because I know where I want the answer to come from

For English-language searches, it's completely usable, but not quite as good as Bing or Google. I really wanted to try to use Mojeek as my default for an extended period of time, but the lack of good local search makes it a bit annoying.


Local search and location-aware search is probably Google's biggest moat against smaller search engines. Bing does it passably, but it's arguably still pretty bad.

What's worse is that it's probably hard to ever get working well without the internet-scale profiling Google has access to.


> Local search and location-aware search is probably Google's biggest moat

The European Union, at least, has limited that a bit by preventing Google from linking Google Maps from their SERP.

So now, if you're in the EU, local results will display a map but you can't click on it.


They might be doing it differently, but Ecosia uses Bing and has really good localized search, at least for Denmark. There is very little difference between Google and Bing these days; if anything, I'd say Bing is the better search engine.


Perfect, localized search is the most annoying thing there is.

Related:

A look at search engines with their own indexes (2021) - https://news.ycombinator.com/item?id=31820149 - June 2022 (114 comments)


Looking at the top 3 from all of them is helpful. I have done a lot of part sourcing for engineering work and have begun using searx because of the aggregation. There are also some other tools to use when searching for obscure, out-of-stock, supply-chain-induced woes.


Is there some 80/20 rule for web indexing?

I’m not saying having deep per-page indexing of Reddit, for example, isn’t useful. But is there any value in a breadth-focused index that is far cheaper to maintain?


Almost certainly. Internet search is above all a problem of improving the signal to noise ratio.

There's an inordinate number of documents that will never be a good search result for any query. That includes trivial pages with barely anything to index in them, but also sign-up forms, cookie policies, and redundant information (e.g. any given man page exists in dozens if not hundreds of identical copies on the web).
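
To make that concrete, here is a minimal sketch of the two cheapest filters: dropping pages with barely anything to index, and dropping exact copies by hashing normalized text. The `MIN_WORDS` threshold, the function names, and the example URLs are all made up for illustration; a real crawler would likely use near-duplicate detection rather than exact hashes.

```python
import hashlib
import re

MIN_WORDS = 5  # assumed cutoff for "barely anything to index"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_documents(docs: dict[str, str]) -> dict[str, str]:
    """Keep the first URL seen for each distinct body; drop near-empty pages."""
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for url, body in docs.items():
        text = normalize(body)
        if len(text.split()) < MIN_WORDS:
            continue  # too little content to be a useful result
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a page already kept
        seen.add(digest)
        kept[url] = body
    return kept


# Hypothetical crawl: two copies of one man page and a bare sign-up form.
docs = {
    "https://a.example/man/ls": "ls - list directory contents in the current folder",
    "https://b.example/man/ls": "LS - list   directory contents in the current folder",
    "https://c.example/signup": "Sign up",
}
kept = filter_documents(docs)
```

Only the first man page copy survives here; the second hashes to the same digest after normalization, and the sign-up page falls under the word cutoff.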


> cookie policies

Unless you're specifically searching for other websites' cookie policies (e.g. to understand how they work, or to do research on them, or just to plainly copy them...)


https://index.network for composable, user-owned semantic indexes. Disclaimer: I work there.


Can we get a list for 2024?


This is a living document. Last updated a few weeks ago.

https://git.sr.ht/~seirdy/seirdy.one/log/master/item/content...


Missed exa.ai! Embeddings-based search engine with its own index


How does an embeddings-based search work without hallucinating bad links?


Not sure what they are doing but embeddings and hallucination are completely separable imo (you can have hallucination even without embedding-based retrieval). Likely you have an embedding for the query which is close to the embedding of the doc for some measure of similarity. That could be semantic similarity or even user behavior.


Embeddings aren't generative AI.

They're just vectors of arbitrary dimension, and similarity is calculated by an n-dimensional function.
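
A rough sketch of what that looks like, using cosine similarity as the similarity function (one common choice; I don't know what exa.ai actually uses). The three-dimensional vectors and the example.com URLs are toy values; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Rank indexed URLs by similarity to the query vector, best first."""
    scored = [(cosine_similarity(query_vec, vec), url) for url, vec in index.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored[:top_k]]


# Toy index: URL -> made-up embedding of that page.
index = {
    "https://example.com/cats": [0.9, 0.1, 0.0],
    "https://example.com/dogs": [0.1, 0.9, 0.0],
    "https://example.com/tax": [0.0, 0.0, 1.0],
}
results = search([0.8, 0.2, 0.0], index)
```

Every result is a key that already exists in the index, so this kind of retrieval can only return pages that were crawled; nothing is generated, which is why it can't make up links.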


A little tangential but does anyone know if there are any modern web directories?

I'm wondering because it seems like, due to the amount of spam on the web, there needs to be more human curation as opposed to algorithms deciding what websites are valuable or not.



Ohh I remember Google Directory, good times. They closed it down for good in 2011.

Dmoz was also closed, but it seems like there's a "new" Dmoz called Curlie [1], founded by some of the original team members.

[1] https://curlie.org


It needs updating to include you.com, Perplexity, etc. Most of those are Google reskins/emulators, but they are there nonetheless.


Perplexity have their own index now, though it's not clear to me how much they use that over Bing in their core experience.

It's also hard to find information about it (they really need to write more about it), but it's mentioned in this article: https://thenewstack.io/more-than-an-openai-wrapper-perplexit...


> a look at search engines with their own indexes



