If anyone is interested in searches applied to the full text of every page in your browser history, or to only select pages that you bookmark, check out our project DownloadNet (formerly, and possibly, futurely: "DiskerNet").
It hooks into your browser to give you an augmented experience. The UI is pretty simple (think 1997 era google but without CSS haha), and we don't do anything super complex with search (but could in future), but it works not bad. Check it out!!!
Oh, it also makes your content (again either everything you browsed or only what you bookmarked) available offline. So if you work on an oil rig, or shipping, or long haul freight, can be a good way to browse as normal but save yer satellite bandwidth!!!
Me neither, it really would benefit from better documentation since I like the idea a lot.
I just tried it out and it seems to be tied to Chrome. Since I use Firefox and Chromium as my daily drivers this does not work for my case. I understand that they probably rely on some Chrome internals to dig through the content, but a SOCKS proxy approach would have worked better and would have no need to switch between a "save" and "serve" mode. But then again I only scratched the surface of it because of the lack of browser support. Will keep an eye on this one though!
it's unclear to me why anyone, particularly anyone with even a passing interest in what the topic of this submission has to offer, would be even remotely interested in being the "master archivist of your own internet browsing."
i don't need anything else archiving anything related to my internet browsing except for my human brain. and yes, that's just me...
but how is the shameless plug of this not just therefore off-topic but diametrically-opposed-to-total-personal-privacy tool appropriate here?
Is funny because this totally offline and locally hosted search engine in DownloadNet is potentially the most private of all.
I get if you’re not interested, but I imagine people interested in locally hosted search-related solutions, may be.
Your view is probably more personal and hard to support in general given this, and given the comment’s position and votes indicating at least some people are interested.
I totally understand why you wouldn’t want your browsing history archived anywhere. But that is what search engines do somewhat. It’s okay, everyone’s different.
i self-host my human brain online in my own skull. i feed it and nurture it so that it can continue to perform and offer me the highest level of privacy i could possibly maintain.
The entire field of Information Technology exists because the world disagrees that the thing in your skull has an acceptable level of intellectual and recall performance. And here we are on a social site dedicated to the pursuit of making it better. And so are you!
before you go... i apologize for being a dick about it. i'd have to really reflect some more on why it felt necessary to go about it in this way, which is inevitably a deeply personal reflection.
but if i may just say, privacy as a concept for a truly egalitarian society is something very near and critical in my opinion. marketing, on the other hand, is not.
This takes me back. Before Google, meta search tools increased your odds of finding a decent answer between the spammy results from Alta Vista, Hotbot, Lycos, etc.
Not just avoiding spam but some meta search engines (Dogpile IIRC) could also search specialist search engines like White Pages and Yellow Pages (long before Yelp etc existed). You'd be able to find business listings and contact info that wasn't normally found on web search engines. They could also include FTP search results which was useful as public anonymous FTPs had yet to fall from use.
> SearXNG protects the privacy of its users in multiple ways regardless of the type of the instance (private, public). Removal of private data from search requests comes in three forms:
> 1. removal of private data from requests going to search services
> 2. not forwarding anything from a third party services through search services (e.g. advertisement)
> 3. removal of private data from requests going to the result pages
The docs mention a caveat below at "What are the consequences of using public instances?":
> If someone uses a public instance, they have to trust the administrator of that instance. This means that the user of the public instance does not know whether their requests are logged, aggregated and sent or sold to a third party.
All of that is fine but by simply having your IP, Google can continue to profile you in countless ways with data they collect in other ways and it wouldn't be expensive for them at all.
i think since 'IP address' has become something of a baseline non-technical understanding of one of the critical components of networking, it becomes increasingly difficult for non-netpeeps to fully grasp the many uses and non-uses of addressing.
a proxy (or proxies) and how they can shield but one or many of 'your' IP addresses throughout an egress packet's many hops (and from who or what destination it or those addresses can be shielded) is a pretty advanced concept when you think about it.
not to mention that, at this point, bare source IP address is a pretty dilute tracker compared to other current methods of identity profiling or traffic fingerprinting.
a few examples of a self-hosted design that would not, include policy-based routing over a VPN with one or multiple tunneled hops, or through another external proxy. (and then there's also that 'onion' routing 'protocol' there—but i'm not clear if/how that integrates with clearnet destinations like publicly-accessible search engines if at all.)
I would assume that the relaying can strip the request from identifying information such as IP, cookies and other tracking mechanisms that you get when visiting e.g. google.com.
privacy is achieved through the proxy and therefore aggregation of disparate requests/queries. some anonymity is therefore achieved, at least from the perspective of source search engine operators, by blending into 'the crowd.'
but the idea is not necessarily anonymity so much as privacy by foiling the creation of any even somewhat accurate marketing/data profile derived from 'your search.'
Thank you, that makes a lot of sense. Stateless is very good for privacy and I agree with that approach for a multi-user instance, (which I suppose is the most common use-case).
I'm picturing more of an instance-wide configuration of domain blocks for a private, single-user, self-hosted instance. But I understand this may not be the intended use of the project.
Google used to do that, but then stopped.
You can still do it manually by excluding them in (every) search you do, but the list can get long and it is far from a good user experience.
Kagi has this feature built in and it is a good user experience.
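For anyone doing the manual exclusion by hand, here is a minimal sketch of automating it. It assumes the upstream engine honours a `-site:` operator (not every engine does), and the function name and blocklist are made up for illustration:

```python
def exclude_domains(query: str, blocked: list[str]) -> str:
    """Append a -site: term for each blocked domain to a search query."""
    exclusions = " ".join(f"-site:{domain}" for domain in blocked)
    return f"{query} {exclusions}".strip()


# Hypothetical personal blocklist; swap in the domains you never want to see.
BLOCKED = ["pinterest.com", "manualslib.com"]

print(exclude_domains("dishwasher manual pdf", BLOCKED))
# prints: dishwasher manual pdf -site:pinterest.com -site:manualslib.com
```

The query still grows with every blocked domain, so this only hides the tedium; it does not fix the length problem.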
You can also use the uBlacklist browser plugin.
My problem with that is that it slows everything down.
I am not certain, but I think all the work is done after the search is complete. That is, it filters the actual results.
The two above limit it from ever being part of the result.
uBlacklist does just that with Google and some other search engines.
I use it with Firefox to filter out pinterest junk from search results. Also available for Chrome and Safari.
just installed it to try. For the people that want to give it a try also, I noticed that several of the public lists contain legitimate websites such as canva or reddit
I trust myself a bit more than I trust someone else to run my queries sadly. I understand that they claim to store no user data or associations etc, but honestly, it's just their word.
> I understand that they claim to store no user data or associations etc, but honestly, it's just their word.
My guess is that if they are found to do so, then they open themselves up to lawsuits. Not collecting data isn't merely a perk - it's practically the reason Kagi exists.
Another big reason not to keep this stuff is just the cost of dealing with requests from law enforcement. At some point you start getting them.
If you don't have any logs you can just always say the princess is in another castle, since you can't provide data that doesn't exist.
If on the other hand you do have the requested information, you need to determine the validity of the request, and then extract the data; or refuse to comply and possibly put yourself at legal risk. For a smaller business that's probably a can of worms you'd rather avoid opening.
Only if you expose it publicly without auth while routing queries through your residential connection, which is not an advised configuration.
For personal use, you can run it directly on your machine or access over VPN. Queries to upstream search engines can be forwarded over proxies or VPNs as you see fit. Some work fine over tor and some can go over commercial or DIY tunnels.
To add, I have been running an instance for years for family and friends. I run it behind nginx basic auth with a config that sets a forever cookie the first time you log in. Really simple. Another good option is Cloudflare Zero Trust.
A ~dozen. Several are technical and use it because it includes several private and paid engines on request.
Config is in a git repo I give access to if requested. One of the technical users modified it to keep pretty minimal logs. I guess they are trusting me to actually use that config but trust is pretty high in the group so not really an issue.
I think this needs a lot more clarification than is provided in this thread.
If you run it locally, and only you use it, then you won't get blocked - a given search engine will see about the same number of requests as if you used it directly.
Add a few house members and you'll still be fine.
(I ran the original searx for a year or two locally - no issues at all).
Difference is a crawler paces the requests, respects robots.txt and rate limits, and doesn't typically invoke 50-100MB disk I/O per request.
Like I don't mind automated access to my search engine, I even offer a public API to the effect, that you can in fact hook into SearXNG. What I mind is when one jabroni with a botnet decides their search traffic is more important than everyone else's and grabs all the compute for himself via a sybil attack.
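To make the "paces the requests" part concrete, here is a rough sketch of a well-behaved client that enforces a minimum gap between calls. The `Pacer` class is hypothetical and not part of SearXNG or any search API; the clock and sleep functions are injectable only so it can be tested without waiting:

```python
import time


class Pacer:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    pacer = Pacer(min_interval=2.0)
    for query in ["first query", "second query"]:
        pacer.wait()  # the second iteration sleeps for roughly two seconds
        print("would send:", query)
```

A real client would also honour robots.txt and whatever rate limits the service publishes; this only covers the spacing between requests.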
It is a metasearch engine, so it uses other search engines. The point is to let multiple people use it, so that Google et al. do not know who's using their service. I.e., it is a glorified proxy.
Honestly, I just use Kagi. Though I need to find some way to limit my searches to 300 per month.
Lots of people publicly host searx instances. There's a list of publicly available instances online, but if you are looking for a tool that randomly redirects you to an instance for every search you do on your browser's bar, you can use neocities: https://searx.neocities.org/changelog
I use this all the time. A downside is that sometimes you land on an instance that doesn't provide any results or gives you really poor ones. This has been happening less frequently recently.
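The random-redirect idea is simple enough to sketch. This is not how the neocities page is implemented, just an illustration; the hostnames are placeholders rather than real instances, and it assumes each instance exposes the usual `/search?q=` endpoint:

```python
import random
from urllib.parse import urlencode

# Placeholder hostnames, not real instances; fill in from the public instance list.
INSTANCES = ["https://searx.example.org", "https://search.example.net"]


def random_search_url(query: str, instances: list[str] = INSTANCES) -> str:
    """Pick one instance at random and build a search URL for the query."""
    base = random.choice(instances).rstrip("/")
    return f"{base}/search?{urlencode({'q': query})}"


print(random_search_url("self hosted search"))
```

Picking uniformly at random is also why you sometimes land on a poor instance; weighting by past success would be one way around that.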
Running it on a machine that also does NAT for many other machines helps to prevent getting blocked by upstream search engines like DuckDuckGo. It'd be good if access to certain upstream search engines could be sent through, say, a proxy set up elsewhere to prevent this very, very common problem, if you can't run it from an IP used for other things.
I'd like to figure out how to have a mode where my search is 100% literal - where every word I type must be in the search results exactly as I type them. Perhaps that's the equivalent of putting a "+" in front of each word, and putting each word in quotes? It's annoying that my words are constantly getting changed for me because there aren't many results, which I expressly don't want.
Like mentioned elsewhere, I want to be able to explicitly exclude certain domains. I get that SearXNG wants to be stateless, but I could either configure a separate URL for it or simply configure it for all searches. For instance, if I search for a PDF manual for something, I never, ever, ever want to see anything from "manualslib.com" and sites like it.
Other than these things which'd be nice to address, I'd say running SearXNG and encouraging people to use it instead of Google has worked quite well :)
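For the 100% literal mode, a small sketch of the quote-every-word idea. Whether the upstream engines actually treat quoted words as exact matches is up to each engine, so treat that as an assumption; the function name is made up:

```python
def literal_query(text: str) -> str:
    """Wrap each whitespace-separated word in double quotes."""
    # Drop any quotes the user typed so the output never nests them.
    words = text.replace('"', " ").split()
    return " ".join(f'"{word}"' for word in words)


print(literal_query("thinkpad x220 bios whitelist"))
# prints: "thinkpad" "x220" "bios" "whitelist"
```

This could live in a browser keyword search or a tiny wrapper in front of the instance, which keeps the instance itself stateless.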
People always sell Sear, but myself, I'm a fan of presearch.com
I have no affiliation with them whatsoever or financial interest. I have no interest in their crypto based business model.
In fact I think their lack of google or bing style search result filtering is entirely due to lack of funding and/or prioritizing other things more important to success, not due to taking a stand on free speech or anything like this. And that's perhaps how it was in the early days of the internet, when maslow's hierarchy of corporate needs focused on trying to make the thing work versus public relations goodfeels and presenting only rightspeech.
Anyway, if I'm looking for some topic I believe google would be known to filter heavily, or something esoteric, I take a look at presearch to get a second opinion. I'd also love to see archive.org do something similar, archive.org has an amazing collection of data, poorly indexed and poorly searchable.
Been running this as the default on all my devices for the past year and I couldn't be happier. Have only had it choke twice and it just needed to be updated to be back in business.
That's clever. X-ING (like those 'crossing' roadsigns), so it's like Search-ching.
There's quite some similarity between the CH and the X sound in English.
But, as this is HN probably someone with a PhD in comparative phonetics will explain why this is a common and infuriating misunderstanding of layfolken.
I've run into exactly that. By putting my SearXNG on a machine that also does NAT for a busy network, this can be avoided. This is definitely one instance where IPs from a colo are a bad thing and residential IPs are a good thing ;)
Web content itself has gone to shit these days, in order to win google’s SEO game to win google’s Adsense game. “Google going to shit” is just a second order effect (or third/fourth depending how you look at it).
The good content has not disappeared. So it is still google going to shit if it can't make out what is good and what isn't, which was the reason people started using it in the first place 25 years ago.
Google search has gone to shit since Google+ .... or more precisely, when they removed the plus operator in Google search around 2011. And no, the quotes aren't as good.
My bet is that Google will become "Google TV" and search won't be possible. They will just show you what they want. They'll probably frame it as "AI knows what you want to see".
Maybe they should ban Google instead of TikTok (I don't use either though).
It does, I run it that way with an optional fan-out to my personal YaCy instance. Here's the relevant part of settings.yml:
  - name: yacy
    engine: yacy
    categories: general
    search_type: text
    base_url: https://yacy.searchlab.eu
    shortcut: ya
    disabled: true
    # required if you aren't using HTTPS for your local yacy instance
    # https://docs.searxng.org/dev/engines/online/yacy.html
    # enable_http: true
    # timeout: 3.0
    # search_mode: 'global'
Change 'disabled' to 'false' and point it at whatever YaCy instance you want to use. It can use the 'general' and 'images' categories.
use when i'm tired of picking obfuscated Fumo plushies or Minecraft screencaps on https://4get.ca/. i don't even know what a Fumo plushie is, never mind six of them.
https://github.com/dosyago/DownloadNet