
Elaborate?



Virtually all public search engine endpoints see an insane amount of bot activity, often several queries per second.

If you delegate queries to e.g. Google or Bing at that rate, you'll be IP-blocked in a heartbeat.


Ah duh, for some reason my mind didn't go to hosting the search instance locally and I misunderstood.

btw thank you for Marginalia! The spirit of the small web is very important to me.


Search engines: they scrape the web, but get narky when scraped themselves.


The difference is that a crawler paces its requests, respects robots.txt and rate limits, and doesn't typically trigger 50-100 MB of disk I/O per request.
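
Roughly, that polite behaviour looks something like this (a Python sketch; the host, user agent and delay are placeholders, not tied to any real engine):

    import time
    import urllib.error
    import urllib.request
    import urllib.robotparser

    # Rough sketch of a "polite" crawler: check robots.txt before fetching,
    # and pace requests instead of firing several per second.
    BASE = "https://example.com"          # placeholder host
    USER_AGENT = "my-polite-crawler/0.1"  # placeholder user agent
    DELAY_SECONDS = 5  # a real crawler would also honour any Crawl-delay directive

    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(BASE + "/robots.txt")
    robots.read()

    for path in ["/", "/some/page"]:
        url = BASE + path
        if not robots.can_fetch(USER_AGENT, url):
            print("robots.txt disallows", url)
            continue
        try:
            with urllib.request.urlopen(url) as resp:
                print(url, resp.status)
        except urllib.error.URLError as exc:
            print(url, "failed:", exc)
        time.sleep(DELAY_SECONDS)  # pacing: stay well below abusive request rates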

Like, I don't mind automated access to my search engine. I even offer a public API to that effect, which you can in fact hook into SearXNG. What I mind is when one jabroni with a botnet decides their search traffic is more important than everyone else's and grabs all the compute for themselves via a Sybil attack.
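
Using an API like that the "nice" way is simple enough; something like this Python sketch (the instance URL is made up, and SearXNG's JSON output format has to be enabled by the operator, so treat it as illustrative):

    import json
    import time
    import urllib.parse
    import urllib.request

    # Paced, identified queries against a SearXNG-style JSON API,
    # instead of a botnet blasting the instance.
    INSTANCE = "https://searx.example.org"  # placeholder instance

    def search(query):
        params = urllib.parse.urlencode({"q": query, "format": "json"})
        req = urllib.request.Request(
            f"{INSTANCE}/search?{params}",
            headers={"User-Agent": "my-little-script/0.1"},  # identify yourself
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp).get("results", [])

    for q in ["marginalia search", "small web"]:
        for r in search(q)[:3]:
            print(r.get("title"), "-", r.get("url"))
        time.sleep(2)  # keep the query rate modest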


It is a metasearch engine, so it uses other search engines. The point is to let multiple people use it, so that Google et al. don't know who's using their service. I.e. it is a glorified proxy.
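
Conceptually it's just fan-out and merge; a toy sketch (the upstream URLs are stand-ins, and SearXNG parses each engine properly rather than returning raw pages):

    import concurrent.futures
    import urllib.parse
    import urllib.request

    # Fan the same query out to several upstream engines and collect the
    # responses; the upstreams only ever see the proxy's IP, never yours.
    UPSTREAMS = [
        "https://upstream-a.example/search?q={q}",
        "https://upstream-b.example/search?q={q}",
    ]

    def query_upstream(url_template, q):
        url = url_template.format(q=urllib.parse.quote(q))
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def metasearch(q):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            pages = pool.map(lambda u: query_upstream(u, q), UPSTREAMS)
        return list(pages)  # a real engine would parse and de-duplicate results here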

Honestly, I just use Kagi. Though I need to find some way to limit my searches to 300 per month.


Isn't Kagi also really a delegator? I've heard they delegate to Brave, among others.


That does not negate what OP said. Your IP will still get blocked very quickly.

Although existing SearX instances have been running for years, and they don't seem to be dropping like flies...


Well, I host a public instance. My IP is still not blocked. YMMV.


Your IP address will get burned.



