
Elaborate?



Virtually all public search engine endpoints see an insane amount of bot activity, often several queries per second.

If you delegate queries to e.g. Google or Bing at that rate, you'll be IP-blocked in a heartbeat.


Ah duh, for some reason my mind didn't go to hosting the search instance locally and I misunderstood.

btw thank you for Marginalia! The spirit of the small web is very important to me.


Search engines: they scrape the web, but get narky when scraped themselves.


The difference is that a crawler paces its requests, respects robots.txt and rate limits, and doesn't typically trigger 50-100 MB of disk I/O per request.
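
Roughly, that polite behaviour looks something like this (a Python sketch; the host, user agent and delay are placeholders, not tied to any real engine):

    import time
    import urllib.error
    import urllib.request
    import urllib.robotparser

    # Rough sketch of a "polite" crawler: check robots.txt before fetching,
    # and pace requests instead of firing several per second.
    BASE = "https://example.com"          # placeholder host
    USER_AGENT = "my-polite-crawler/0.1"  # placeholder user agent
    DELAY_SECONDS = 5  # a real crawler would also honour any Crawl-delay directive

    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(BASE + "/robots.txt")
    robots.read()

    for path in ["/", "/some/page"]:
        url = BASE + path
        if not robots.can_fetch(USER_AGENT, url):
            print("robots.txt disallows", url)
            continue
        try:
            with urllib.request.urlopen(url) as resp:
                print(url, resp.status)
        except urllib.error.URLError as exc:
            print(url, "failed:", exc)
        time.sleep(DELAY_SECONDS)  # pacing: stay well below abusive request rates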

Like, I don't mind automated access to my search engine. I even offer a public API to that effect, which you can in fact hook into SearXNG. What I mind is when one jabroni with a botnet decides their search traffic is more important than everyone else's and grabs all the compute for themselves via a Sybil attack.
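
Using an API like that the "nice" way is simple enough; something like this Python sketch (the instance URL is made up, and SearXNG's JSON output format has to be enabled by the operator, so treat it as illustrative):

    import json
    import time
    import urllib.parse
    import urllib.request

    # Paced, identified queries against a SearXNG-style JSON API,
    # instead of a botnet blasting the instance.
    INSTANCE = "https://searx.example.org"  # placeholder instance

    def search(query):
        params = urllib.parse.urlencode({"q": query, "format": "json"})
        req = urllib.request.Request(
            f"{INSTANCE}/search?{params}",
            headers={"User-Agent": "my-little-script/0.1"},  # identify yourself
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp).get("results", [])

    for q in ["marginalia search", "small web"]:
        for r in search(q)[:3]:
            print(r.get("title"), "-", r.get("url"))
        time.sleep(2)  # keep the query rate modest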


It is a metasearch engine, so it uses other search engines. The point is to let multiple people use it, so that Google et al. don't know who's using their service. I.e. it is a glorified proxy.
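
Conceptually it's just fan-out and merge; a toy sketch (the upstream URLs are stand-ins, and SearXNG parses each engine properly rather than returning raw pages):

    import concurrent.futures
    import urllib.parse
    import urllib.request

    # Fan the same query out to several upstream engines and collect the
    # responses; the upstreams only ever see the proxy's IP, never yours.
    UPSTREAMS = [
        "https://upstream-a.example/search?q={q}",
        "https://upstream-b.example/search?q={q}",
    ]

    def query_upstream(url_template, q):
        url = url_template.format(q=urllib.parse.quote(q))
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def metasearch(q):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            pages = pool.map(lambda u: query_upstream(u, q), UPSTREAMS)
        return list(pages)  # a real engine would parse and de-duplicate results here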

Honestly, I just use Kagi. Though I need to find some way to limit my searches to 300 per month.


Isn't Kagi also really a delegator? I've heard they delegate to Brave, among others.


That does not negate what OP said. Your IP will still get blocked very quickly.

Although existing SearX instances have been running for years, and they don't seem to be dropping like flies...


Well, I host a public instance. My IP is still not blocked. YMMV.


Your IP address will get burned.



