Today, Google accounts for 92% of all Internet searches, and Google and Bing together hold 96% of the search market. Most other search engines, including the popular privacy-focused ones, simply fetch their results from Google or Bing and reorder them, because building a search engine is neither easy nor cost-effective. As a result, a few big tech companies control what people see.
Right Dao is different. We are a fully independent search engine: we run our own infrastructure and built our technology from the ground up. That enables us to show search results free from the search monopolies' manipulation.
We invite you to give it a try. Meanwhile, we are constantly improving quality and adding features.
FAQ: open source?
We think a search engine's code is somewhat sensitive in general. The engine depends on many subsystems, including storage, scheduling, and indexing. If spammers could see the ranking code, they could use it to push their spam websites up the results. We are not sure there is a viable way to open source a search engine.
Looks promising. But how do we know you are doing your own indexing and not simply buying in results from Google or Bing, like pretty much all the other search startups? And if you are doing your own indexing, I see you include some very large sites like Wikipedia, which will cost a lot of money to re-index regularly. How are you going to pay for that sustainably?
Our results are different from those of other search engines. Wikipedia publishes regular database dumps (https://dumps.wikimedia.org/), which are relatively cheap to index. Overall, our scale is currently small and the cost is manageable.
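A dump can be read as one sequential stream instead of being crawled page by page. The sketch below is a hypothetical illustration, not Right Dao's pipeline: it assumes the standard MediaWiki `pages-articles` XML export format and uses Python's `xml.etree.ElementTree.iterparse` to yield `(title, wikitext)` pairs without loading the whole file into memory. The function names are made up for this example.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def local_name(tag: str) -> str:
    # ElementTree reports namespaced tags as "{namespace}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export."""
    title, text = "", ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            elem.clear()  # drop the finished page's subtree to keep memory low


if __name__ == "__main__":
    # Tiny in-memory stand-in for a file such as
    # enwiki-latest-pages-articles.xml.bz2 from dumps.wikimedia.org.
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><title>Dao</title><revision><text>The way</text></revision></page>"
        b"</mediawiki>"
    )
    with bz2.open(io.BytesIO(bz2.compress(sample)), "rb") as f:
        for title, text in iter_pages(f):
            print(title, len(text))
```

Because the parser only ever holds the current page, the cost is dominated by one sequential read of the compressed file rather than by millions of HTTP fetches.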
Looks very decent. Make it configurable and I'll pay for a subscription.
What I mean is: instead of indexing the entire internet with ad hoc ranking that tries to guess what I want, let me whitelist and blacklist domains and configure the ranking myself. I'd begin with Stack Overflow, Hacker News and arXiv, and probably blacklist Pinterest and other paywalled or walled-garden sites.
From time to time, your search engine could suggest results from other sources so I could update my whitelist.
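The whitelist/blacklist idea could work as a re-ranking pass over the engine's own results. This is a minimal sketch under stated assumptions: each result is assumed to carry a numeric relevance score (higher is better), and `Result`, `personalize`, and the weight scheme are hypothetical, not an existing Right Dao feature.

```python
from dataclasses import dataclass
from typing import Dict, List, Set
from urllib.parse import urlsplit


@dataclass
class Result:
    url: str
    score: float  # the engine's own relevance score; higher is better (assumed)


def host_of(url: str) -> str:
    # urlsplit already lowercases the hostname; strip a leading "www."
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def in_domain(host: str, domain: str) -> bool:
    # "export.arxiv.org" belongs to "arxiv.org"; "notarxiv.org" does not.
    return host == domain or host.endswith("." + domain)


def personalize(results: List[Result], whitelist: Dict[str, float],
                blacklist: Set[str]) -> List[Result]:
    """Drop blacklisted domains, multiply whitelisted ones by their weight, re-sort."""
    out: List[Result] = []
    for r in results:
        host = host_of(r.url)
        if any(in_domain(host, d) for d in blacklist):
            continue
        weight = max((w for d, w in whitelist.items() if in_domain(host, d)), default=1.0)
        out.append(Result(r.url, r.score * weight))
    return sorted(out, key=lambda r: r.score, reverse=True)
```

Boosting rather than filtering to the whitelist keeps other sources visible, which is what would let the engine "suggest results from other sources" for the user to add later.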
It could index PDF files, or even show their summaries using an ML model, perhaps for an additional fee.
Another idea that's been bothering me for a while is searching for movies or songs. If you figure out how to show me the most interesting movies of 2020 (according to my filters), I'd pay for that. Even more so for music.
This comment made me envision a tabbed results page: one tab with results from my curated, first-choice sources, and a second tab with general results based on the search engine's best-guess algorithms.
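One simple way to get the two tabs is to partition a single ranked list rather than run two searches. A sketch, assuming the engine returns one ranked list of URLs and the user keeps a set of curated domains; `is_curated` and `split_tabs` are hypothetical names.

```python
from typing import List, Sequence, Set, Tuple
from urllib.parse import urlsplit


def is_curated(url: str, sources: Set[str]) -> bool:
    """True if the URL's host is a curated domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return any(host == s or host.endswith("." + s) for s in sources)


def split_tabs(ranked_urls: Sequence[str],
               sources: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition one ranked list into (curated tab, general tab).

    Each tab keeps the engine's original order, so the general tab is still
    the engine's best guess, minus whatever already appears in the curated tab.
    """
    curated: List[str] = []
    general: List[str] = []
    for url in ranked_urls:
        (curated if is_curated(url, sources) else general).append(url)
    return curated, general
```

A single query feeding both tabs keeps the cost the same as an ordinary search.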
"The Services are offered from the United States of America and, regardless of your place of residence or access location, your use of them is governed by the laws of the United States of America. Right Dao makes no representations that the Sites are appropriate for use in other locations or are legal in all jurisdictions. Those who access the Sites from other locations do so at their own risk and consent to the transfer and processing of their data in the United States of America and any other jurisdiction throughout the world."
That right there is half the reason I don't use Google, and they at least pretend to follow European laws. So, yeah, right, no thanks.
Here's the problem with EU regulations: they really do bar small projects and hobbyists, and they assume that even the smallest website is backed by a corporation with the resources to comply with fairly complex rules.
Actually, that's not really true. If you don't store any user information, you're compliant.
The rules only apply once you start storing such information. Nowadays the same is true in California, given its recent data protection laws.
Also, Right Dao is under New York law, so I guess it has to follow US law.
That is a horribly wrong viewpoint. Information required for business purposes, e.g. to write invoices or file taxes, is user information you are allowed to retain.
It's not about not having information, it's about having consent before acquiring it.
I think you are misunderstanding the point of view. These regulations impose a whole set of requirements on a hobbyist, nonprofit or tiny business: writing code to track regulatory compliance and ensuring that the various processes the law requires exist. Those requirements are often more complex and costly than the core business.
For small businesses that don't have a large mess of legacy stuff to clean up, the requirements aren't that bad. Yes, it is extra effort, but mostly documentation, and much of it can be minimized by keeping as little data as possible.
You search for something, and we hope the results are useful to you as the consumer. We won't try to be overly smart and show you information we think might be more fitting or relevant because we tracked you down, found that you live near a super-potent AdWords customer, and therefore rank their results higher in your search results.
I tried a few searches in Japanese. The results for search terms written in kanji were okay, but searches for terms written in kana—such as アメリカ or ぴかぴか—yielded no results at all.
"Dao" here is probably not a reference to Daoism, but to "decentralised autonomous organisations".
Most often this is just a cyberpunky way of trying to avoid tax or legal liability, but I'm not sure either necessarily applies here. It would be interesting to know how the project is structured and funded.
Have you considered using Common Crawl [1], and if so, how did it compare with running your own spiders?
Long-term, a combination of theirs and your own could be optimal.
Using their dumps has strengths and weaknesses. On the one hand, they have already done the crawling and dealt with throttling, etc. They offer monthly dumps for general content and daily dumps for news [2].
On the other hand, it's a huge pile of data to wade through, and their index format might not be the one you'd prefer. The official archive and index are hosted on AWS, which may decide where you process it. (I'm not sure whether other providers maintain a copy as well.)
By "huge", specifically:
> October 2020 [...] contains 2.71 billion web pages or 280 TiB of uncompressed content.
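To put "huge" in per-page terms, the two quoted figures give a quick back-of-the-envelope average. This uses nothing beyond the numbers in the quote and treats TiB as 2^40 bytes.

```python
# Average uncompressed size per page in the October 2020 Common Crawl,
# computed only from the two figures quoted above.
TIB = 2 ** 40  # bytes per tebibyte

pages = 2.71e9
total_bytes = 280 * TIB

avg_bytes = total_bytes / pages
print(f"~{avg_bytes / 1024:.0f} KiB per page")  # ~111 KiB per page
```

That is roughly 114 kB of raw content per page, repeated for every page in every monthly crawl.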
From our analysis a few years ago, that was to be the approach for the now-defunct Snagz.net [3] (which never fully launched because co-founders were unable to join due to extenuating circumstances).
Impressed by the speed of the results. Some of my more obscure tests didn't give relevant results, but I guess this is still early on. Do you have any numbers about the size of the index, and where you're aiming to go?
My big question: What's behind the name? I find it a bit confusing and not very memorable at first sight, maybe an explanation would help.
From Wikipedia: Dao is a Chinese word signifying the "way", "path", "route", "road"... In most belief systems, the word is used symbolically in its sense of 'way' as the 'right' or 'proper' way of existence...
If I search for a person by given name and family name, with quotation marks around the whole string, I would expect the hits that contain that exact text to be at the top of the list. Google manages this for my name, but Right Dao does not.
If I search for my name like this: "kevin whitefoot", the first 27 hits on Google are directly relevant, and my name appears in the link or in the extracted text. Right Dao, on the other hand, returns a list where most of the hits do not contain my name as quoted, only the two words separately, which means those hits are completely irrelevant because they refer to a completely different person.
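The expected behaviour for a quoted query is usually implemented with a positional index: besides recording which documents contain a term, the index records where in each document it occurs, so a phrase match requires the terms at consecutive positions. The sketch below is the textbook technique with hypothetical names, not Right Dao's code; the name from the comment is used only as sample data.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and treat every non-alphanumeric character (incl. quotes) as a separator.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


class PositionalIndex:
    def __init__(self) -> None:
        # term -> doc_id -> positions (ascending) where the term occurs in that doc
        self.postings: DefaultDict[str, DefaultDict[int, List[int]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add(self, doc_id: int, text: str) -> None:
        for pos, term in enumerate(tokenize(text)):
            self.postings[term][doc_id].append(pos)

    def phrase(self, query: str) -> Set[int]:
        """Return docs where all query terms occur consecutively, in query order."""
        terms = tokenize(query)
        if not terms:
            return set()
        # A phrase match needs every term in the doc, so intersect the postings.
        docs = set(self.postings.get(terms[0], {}))
        for t in terms[1:]:
            docs &= set(self.postings.get(t, {}))
        hits: Set[int] = set()
        for d in docs:
            # Candidate start positions: wherever term i sits at start + i.
            starts = set(self.postings[terms[0]][d])
            for i, t in enumerate(terms[1:], start=1):
                starts &= {p - i for p in self.postings[t][d]}
            if starts:
                hits.add(d)
        return hits
```

A document containing both words far apart (e.g. "Kevin Smith met Ann Whitefoot") passes the intersection step but fails the position check, which is exactly the distinction the comment asks for.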
Currently our scale is small and the cost is manageable. Please send your site to the email address on the About Us page. (We don't have incremental indexing yet, so we have to replace the entire index with newer results once in a while.)
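Without incremental indexing, a common pattern is to rebuild a complete index off to the side and then swap it in as a whole. Right Dao hasn't described its mechanism; this is a generic build-then-swap sketch with hypothetical names, and a toy in-memory dictionary stands in for a real on-disk index. It also shows why freshness lags under this scheme, as in the Hyperforce report in this thread: a page published after the last rebuild cannot appear until the next full rebuild.

```python
from typing import Dict, Iterable, List, Set, Tuple

Index = Dict[str, Set[str]]  # toy inverted index: term -> URLs containing it


def build_index(docs: Iterable[Tuple[str, str]]) -> Index:
    """Build a complete index from scratch from (url, text) pairs."""
    index: Index = {}
    for url, text in docs:
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(url)
    return index


class Searcher:
    """Serves queries from one index that is never mutated once published."""

    def __init__(self, index: Index) -> None:
        self._index = index

    def search(self, term: str) -> List[str]:
        index = self._index  # take one snapshot for the whole query
        return sorted(index.get(term.lower(), set()))

    def replace_index(self, docs: Iterable[Tuple[str, str]]) -> None:
        # The slow rebuild happens off to the side while the old index keeps serving.
        new_index = build_index(docs)
        # Publishing is a single attribute rebinding, so queries see either the
        # old index or the new one, never a half-built mix.
        self._index = new_index
```

Incremental indexing would instead update postings for changed pages in place, trading this simplicity for fresher results.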
And yet you haven't told us how you are funded or how you plan to be... And you retain data. I honestly don't see any reason to think you are any better than your competitors.
How often is this indexed? I have a great example for today, December 3rd. Yesterday, Salesforce announced a new product, Hyperforce. It's a new way to deploy their product, basically a big deal from a big company. Searching for Salesforce Hyperforce gets no results on Right Dao, but plenty on DuckDuckGo.
No, we use Kubernetes, gRPC and protobuf as the foundation, and Prometheus and Grafana for monitoring. Our systems are mostly in C++, with some components, such as the crawler, in Go. No open source tool is used for indexing and search.
https://rightdao.com