A look at search engines with their own indexes (2021)

mrweasel · 2024-06-09T19:32:04 1717961524

I have been somewhat impressed by Mojeek, but it does have two obvious flaws:

1) It's not really good for localized search, though it might be if you're local to the US or UK.

2) No !bangs. Coming from Ecosia, I frequently just do !w, !maps or !yt because I know where I want the answer to come from.
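For anyone unfamiliar with bangs, here is a minimal sketch of how a bang redirect could work. The `BANGS` table, the `resolve` function and the URL templates (including the fallback search URL) are assumptions for illustration, not how Ecosia or any other engine actually implements it.

```python
from urllib.parse import quote_plus

# Hypothetical bang table: the URL templates are illustrative assumptions.
BANGS = {
    "!w": "https://en.wikipedia.org/wiki/Special:Search?search={q}",
    "!yt": "https://www.youtube.com/results?search_query={q}",
    "!maps": "https://www.google.com/maps/search/{q}",
}

# Assumed fallback when the query contains no known bang.
DEFAULT = "https://www.mojeek.com/search?q={q}"


def resolve(query: str) -> str:
    """Return the URL a query should be sent to, honoring the first known bang."""
    words = query.split()
    # Pick the first word that is a known bang, if any.
    bang = next((w for w in words if w in BANGS), None)
    template = BANGS[bang] if bang else DEFAULT
    # Drop the bang itself from the search terms before URL-encoding them.
    terms = " ".join(w for w in words if w != bang)
    return template.format(q=quote_plus(terms))
```

With this sketch, `resolve("!w danish pastry")` goes straight to Wikipedia's search page, while a query without a bang falls through to the default engine.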

For English-language searches, it's completely usable, but not quite as good as Bing or Google. I really wanted to try to use Mojeek as my default for an extended period of time, but the lack of good local search makes it a bit annoying.

marginalia_nu · 2024-06-09T20:21:16 1717964476

Local search and location-aware search is probably Google's biggest moat against smaller search engines. Bing does it passably, but it's arguably still pretty bad.

What's worse is that it's probably hard to ever get working well without the internet-scale profiling Google has access to.

reddalo · 2024-06-09T21:16:11 1717967771

> Local search and location-aware search is probably Google's biggest moat

The European Union, at least, has limited that a bit by preventing Google from linking Google Maps from their SERP.

So now, if you're in the EU, local results will display a map but you can't click on it.

mrweasel · 2024-06-10T07:44:22 1718005462

They might be doing it differently, but Ecosia uses Bing and has really good localized search, at least for Denmark. There is very little difference between Google and Bing these days; if anything, I'd say Bing is the better search engine.

ldng · 2024-06-19T19:20:05 1718824805

Perfect, localized search is the most annoying thing there is.

dang · 2024-06-09T19:25:58 1717961158

instagib · 2024-06-10T01:09:11 1717981751

Looking at all of the top 3 is helpful. I have done a lot of part sourcing for engineering work and have begun using searx because of the aggregation. There are also some other tools to use when dealing with obscure, out-of-stock, supply-chain-induced woes.

Waterluvian · 2024-06-09T18:51:42 1717959102

Is there some 80/20 rule for web indexing?

I’m not saying having deep per-page indexing of Reddit, for example, isn’t useful. But is there any value in a breadth-focused index that is far cheaper to maintain?

marginalia_nu · 2024-06-09T19:07:26 1717960046

Almost certainly. Internet search is above all a problem of improving the signal to noise ratio.

There's an inordinate number of documents that will never be a good search result for any query. That includes trivial pages that have barely anything to index in them, but also sign-up forms, cookie policies, and redundant information (e.g. any given man page exists in dozens if not hundreds of identical copies on the web).
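As a rough illustration of that kind of noise filtering, here is a minimal sketch that drops near-empty pages and exact duplicate copies before indexing. The names `content_fingerprint`, `filter_pages` and the `MIN_WORDS` threshold are hypothetical, and this is not a description of how any particular search engine does it.

```python
import hashlib
import re

# Hypothetical threshold: pages with fewer words are treated as having nothing to index.
MIN_WORDS = 5


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_pages(pages: dict[str, str]) -> dict[str, str]:
    """Keep the first copy of each distinct document and skip trivial pages."""
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for url, text in pages.items():
        if len(text.split()) < MIN_WORDS:
            continue  # barely anything to index
        fingerprint = content_fingerprint(text)
        if fingerprint in seen:
            continue  # identical copy of a page already kept
        seen.add(fingerprint)
        kept[url] = text
    return kept
```

This only catches copies that are identical after whitespace and case normalization. Catching near-duplicates (say, the same man page wrapped in different site chrome) would need something fuzzier, such as shingling or a similarity hash.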

reddalo · 2024-06-09T21:17:09 1717967829

> cookie policies

Unless you're specifically searching for other websites' cookie policies (e.g. to understand how they work, or to do research on them, or just to plainly copy them...)

serafettin · 2024-06-10T13:21:15 1718025675

https://index.network for composable, user-owned semantic indexes. Disclaimer: I work there.

wakawaka28 · 2024-06-09T20:43:54 1717965834

Can we get a list for 2024?

marginalia_nu · 2024-06-09T20:48:03 1717966083

This is a living document. Last updated a few weeks ago.

https://git.sr.ht/~seirdy/seirdy.one/log/master/item/content...

jeffreyw128 · 2024-06-09T19:04:40 1717959880

Missed exa.ai! Embeddings-based search engine with its own index

HeatrayEnjoyer · 2024-06-09T19:24:30 1717961070

How does an embeddings-based search work without hallucinating bad links?

janalsncm · 2024-06-09T19:46:07 1717962367

Not sure what they are doing but embeddings and hallucination are completely separable imo (you can have hallucination even without embedding-based retrieval). Likely you have an embedding for the query which is close to the embedding of the doc for some measure of similarity. That could be semantic similarity or even user behavior.
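To make that concrete, here is a minimal sketch of embedding-based retrieval, assuming cosine similarity as the measure. The `INDEX` contents, its example.com URLs and the three-dimensional vectors are made up for illustration; a real system would get its vectors from an embedding model and use an approximate nearest-neighbor index rather than a full sort.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy index: URL -> hand-made document embedding.
INDEX = {
    "https://example.com/rust-tutorial": [0.9, 0.1, 0.0],
    "https://example.com/sourdough-recipe": [0.0, 0.2, 0.9],
    "https://example.com/go-concurrency": [0.8, 0.3, 0.1],
}


def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k indexed URLs whose embeddings are closest to the query."""
    ranked = sorted(
        INDEX,
        key=lambda url: cosine_similarity(query_vec, INDEX[url]),
        reverse=True,
    )
    return ranked[:k]
```

Nothing here generates text: the results are always keys that already exist in the index, so the ranking can be bad but the links can't be invented.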

cyanydeez · 2024-06-09T22:26:56 1717972016

Embeddings aren't generative AI.

They're just vectors of arbitrary dimension, and similarity is calculated by an n-dimensional function.

raytopia · 2024-06-09T20:13:32 1717964012

A little tangential but does anyone know if there are any modern web directories?

I'm wondering because it seems like, due to the amount of spam on the web, there needs to be more human curation as opposed to algorithms deciding what websites are valuable or not.

marginalia_nu · 2024-06-09T20:18:30 1717964310

https://ooh.directory/ is one

reddalo · 2024-06-09T21:27:14 1717968434

Ohh I remember Google Directory, good times. They closed it down for good in 2011.

Dmoz was also closed, but it seems like there's a "new" Dmoz called Curlie [1], founded by some of the original team members.

[1] https://curlie.org

danielcampos93 · 2024-06-09T18:39:18 1717958358

It needs updating to include you.com, perplexity, etc. Most of those are Google reskins/emulators, but they are there nonetheless.

simonw · 2024-06-09T19:05:27 1717959927

Perplexity have their own index now, though it's not clear to me how much they use that over Bing in their core experience.

It's also hard to find information about it (they really need to write more about it), but it's mentioned in this article: https://thenewstack.io/more-than-an-openai-wrapper-perplexit...

marginalia_nu · 2024-06-09T18:42:53 1717958573

> a look at search engines with their own indexes