From the article -- What concerns me is how obscurely the results have disappeared from three search engines, which also happen to command a fair share of the market after Google.
What the author really means is "they have disappeared from the only other English language index in the US." Yandex, an English language index hosted in Russia, returns the Wikivoyage queries as expected.
If I were to hazard a guess (and I am guessing here) the answer is here (also in the article): The first result links directly to Wikitravel. Wikitravel is the original project and a schism in the project resulted in the creation of Wikivoyage, ...
Based on this piece of information, I would guess that Wikitravel sent Bing a DMCA takedown notice telling them that Wikivoyage infringed on their copyrighted material, and as a result Wikivoyage was de-indexed from general results.
Wikitravel does not have any copyrights to anything, so I would hope not. It's all contributor-created content licensed under CC-by-SA which is free to copy under certain terms.
More importantly, Wikitravel is basically a dead man walking, and Wikivoyage is a living project.
> Wikitravel does not have any copyrights to anything, so I would hope not.
I guess I’d ask what authority you’re speaking from, because a quick perusal of their terms of use directly contradicts your claim. When you post content you sign over all ownership to them, and they do not allow scraping it.
> You automatically grant and assign to us, and you represent and warrant that you have the right to grant and assign to us, a perpetual, irrevocable, unlimited, fully paid, fully sub-licensable (through multiple tiers), worldwide license to copy, perform, display, distribute, prepare derivative works from (including, without limitation, incorporating into other works) and otherwise use any content that you post. You also expressly grant and assign to us all rights and causes of action to prohibit and enforce against any unauthorized copying, performance, display, distribution, use or exploitation of, or creation of derivative works from, any content that you post (including but not limited to any unauthorized downloading, extraction, harvesting, collection or aggregation of content that you post).
> Any copying, aggregation, display, distribution, performance or derivative use of our sites and services or any content posted on our sites and services whether done directly or through intermediaries (including but not limited to by means of spiders, robots, crawlers, scrapers, framing, iframes or RSS feeds) is prohibited.
For authority: I hold copyright, which I've never assigned, to a "few" pages.
Wikitravel is using my content under the CC-by-SA license. The license they are using my content under does not permit them to forbid someone from copying the content.
You are correct in the abstract, but the "-SA" in the license that the parent post mentioned stands for "share-alike", the same thing as "copyleft". Wikitravel cannot forbid someone from copying other parts of the page without violating their license to the parent poster's content.
The schism mentioned in the original post was a result of IB taking over Wikitravel; the content that was forked over on Wikivoyage mostly would have preexisted IB’s ownership of Wikitravel. So the terms that mention IB would not be the same terms the content in question was submitted under.
So cjensen is right... They do not have any copyrights. They only have a license to use your copyrighted works themselves.
But you notice it doesn't say it's an exclusive license. Then they claim you've given them the right to "prohibit and enforce against any unauthorized copying". Which might be true, but I don't see how they have any right to define "unauthorized", since it's not their copyright, and their license isn't exclusive.
Having someone assign a license to you doesn't normally grant you standing to sue for copyright infringement and definitely doesn't allow you to file a DMCA takedown notice, which requires you to affirm that you are the copyright holder, not a licensee. Copyright trolls have been smacked down pretty hard in US courts.
https://wikitravel.org/shared/Copyleft seems to imply that you're allowed to share the content as long as you attribute it, which I'm almost certain wikivoyage is.
Wikitravel and Wikivoyage articles are usually functionally identical. Sometimes there is information in one that is not in the other.
However, when I look at the actual edit history, Wikivoyage does indeed seem more active. So you've got a point there.
You needn't use your real name, of course, but for HN to be a community, users need some identity for other users to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...
As a former moderator at both sites who frequently explained this policy to contributors, no I didn't ask the maintainers, who once sued a moderator over the schism, any questions.
Yandex does plenty of its own filtering. For example, it filters all the sites in this list, https://eais.rkn.gov.ru/en/, which is used for political censorship in Russia.
Yes, that would be my guess. I talked to some people from Wikivoyage years ago, and one of the problems they talked about is that they were getting penalized for "duplicate content" a lot even though Wikivoyage was clearly the better alternative in every way (according to them anyway, I can't attest if that's true or not, or if it's still true since this was years ago).
It's not hard to imagine the "duplicate score ranking" got one-upped to "spam domain ranking".
"better alternative in every way" is overstating things a bit, but my experience when I've sporadically done side-by-side comparisons is that Wikivoyage tends to be more up to date and complete. However, there have definitely been a few articles where Wikitravel was considerably better. Neither is very good for business listings (out of date, spammy, badly sampled). WV tends to be better with practical information, like transportation options from the airport, and how to use public transit.
I think the general idea is that power users on travel community sites mostly choose to use WV at this point, but WT tends to rank higher in searches, so it may get more edits from other people.
Given that they have already been in court, I'm even more convinced that the DMCA is involved.
Here is my amended idea: Wikitravel added some sort of "gotcha" asset to their site, Wikivoyage scraped their site and sucked it up, and when it appeared on Wikivoyage, Wikitravel sent the DMCA notice.
I could not find Wikivoyage's registered DMCA agent, so it is entirely possible that Bing didn't know who to send notice of the takedown to.
Basically there seems to be a lot of bad blood between the two sites and as silly as it is, DMCA is an excellent "sniping" tool for taking cheap shots at someone you don't like.
Is it not the case that almost every non-Google, US-based search engine is basically Bing under the hood? It's a market I am not too familiar with, but it seems I've heard that repeatedly over the years.
DuckDuckGo gets the bulk of its organic results from Bing[1], and Bing doesn’t seem to show pages from Wikivoyage[2], so contacting Bing to address that issue should fix it in DuckDuckGo, as well as any other search engines that use Bing’s index.
I've gotten into a habit of instinctively searching for queries in multiple major search engines as I've often found that many useful results are found by one but not another. Most of the time, notable differences seem to be in NLP or 'Oh, I know what you're looking for!' algorithms or are the result of DMCA or other requests, but I've noticed a few situations like this where results differ or seem special-cased, sometimes amusingly, for mysterious reasons (edit: another comment points out possible copyright issues with Wikivoyage).
Here are a couple cases where I archived results pages from Bing, Google, and Yandex with the Wayback Machine and archive.is.
A particularly puzzling case as to why the result might be filtered: one particular Wikipedia page. Searching by its title, using the queries:
victorian erotica wikipedia
or
"victorian erotica" site:en.wikipedia.org
Bing refuses to present the page, while Google and Yandex offer it with no qualms. Interestingly, when I discovered this, the article's talk page was the first result in Bing, but now even that no longer appears.
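The habit of running one query through several engines can be scripted. Below is a minimal sketch in Python, not something from the thread: the endpoint URLs and query-parameter names (`q` for Bing and Google, `text` for Yandex) are assumptions, and `build_search_urls` is a hypothetical helper. It only builds URLs that carry the query and nothing else, which also keeps them clean for archiving.

```python
from urllib.parse import urlencode

# Assumed endpoints and query-parameter names; verify before relying on them.
ENGINES = {
    "bing": ("https://www.bing.com/search", "q"),
    "google": ("https://www.google.com/search", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def build_search_urls(query):
    """Return one URL per engine that carries only the search query."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in ENGINES.items()
    }


if __name__ == "__main__":
    # Print the same query for each engine so the results can be compared by hand.
    for name, url in build_search_urls("wikivoyage malta").items():
        print(f"{name}: {url}")
```

Opening the printed URLs side by side (or submitting them to an archive) makes it easier to spot when one index drops a page that the others still return.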
Another query, where the issue is likely that it borders on one of a diverse variety of sensitive and/or politically controversial topics where results sometimes seem deliberately tuned in one index or another (in my anecdata: mostly by Google), is the title of one of the hoax papers submitted in the grievance studies affair, which appears verbatim in many pages about it:
my struggle to dismantle my whiteness a critical race examination of whiteness from within whiteness
Bing and Yandex exhibit expected behavior. Google, however (which is usually the best at determining the subject of a query consisting of an out-of-context quote) seems to only focus on a small fraction of keywords and finds no results that are remotely related to the grievance studies affair itself.
I've seen discussions about the decreasing relevancy of search results in general where some have suggested that some sort of central repository be started for archiving interesting search results. I wonder if there have been any efforts toward such a project?
Note: when archiving search pages, I stripped all unnecessary parameters from the URLs. For example,
Oh, I didn't word that very well! What I meant was a repository of search results pages that are themselves interesting, especially as the results change over time...
Edit:
...and when they highlight the differences between search engines.
For example, Google's NLP allows it to just barely identify the video referenced by the query:
microsoft ceo repeating a word
And with this query, all three major engines return a jumble of results from amusingly diverse contexts:
Except a new internet directory probably wouldn't be very useful without some kind of search. If those old directories were big enough and didn't have a search worth using, the directory wasn't very useful.
I agree with Google doing more "controversial topic guiding" than others - though perhaps this is just their superior language model.
Interesting, Google seems to have fixed my go-to example. The query "<college newspaper name> sexual assault false report <year>" (for my college and sophomore year) used to return the expected article on Bing but fail to return it entirely on Google, unless the exact headline was used. Somebody seems to have fixed this bug though - shows up as first for both today.
I'm not sure if you can do it on behalf of a website, but Bing Webmaster Support has been pretty responsive in my experience with my own domains, like an actual person emailed me back after a couple days. And since Bing seems to feed a lot of DDG's index, if it gets back in Bing it will probably also go back to normal in DDG.
I just tried going to wikivoyage.org directly and did a search for Malta there. Guess what: it works. You already know what website you want, so go there. Do your business there. Cut out the middleman hoovering up your data (even though DDG isn't supposed to do that, it's still wasted navigation).
Although it's true that people who know they want to use Wikivoyage can get around this, people who're searching for something like "travel to Malta" are the ones really affected by having the Wikivoyage results elided.
I googled "Malta travel guide" and the WikiTravel link is on page 3 of the results, the WikiVoyage one is on page 5.
I guess this is a very heavy market, and there are a lot of SEO optimized commercial pretty-looking "travel guide" sites (although WikiTravel apparently belongs to that category as well).
...yes, but again, you might be typing "travel to Malta". Wikivoyage being completely scrubbed from the index means it has no chance of helpfully showing up in those results.
But again, you are missing my point because you're focused on making your own. I understand that this site is suddenly missing some traffic for whatever reason, nefarious or not. That's a given. Only the people using the search phrase "wikivoyage malta" would realize this. The people who search for "travel to Malta" would just get whatever results are available, for whatever reasons. Sure, these people might miss out on some valuable information, or they might not, and nobody cares except tech nerds. The world did not stop spinning on its axis because of this.
In principle, yes, but your point isn't relevant to the issue of Wikivoyage and other sites disappearing from search results. Someone may not even know of Wikivoyage and be delighted to discover it through a search.
For most queries, on most sites, searching sitename + query in google will give you better results than using the site's native search. Nobody is going to waste their time learning what the exceptions are.
I gave Brave Search a try and it did return lots of results for Wikivoyage [1], but the first result was a page in German, and yet I'm not in Germany and don't even speak German. The English page was 4th down. Still, I prefer Brave's search results to those of DuckDuckGo. I just don't like how much stuff is missing from DDG and Bing (DDG's actual search provider). DDG feels like a proxy for Bing, so if I were to choose a different search engine, I'd just go straight to Bing and bypass DDG altogether. Anyway, give Brave Search a chance! [2]
What I realized some time ago, but can't explain (I didn't investigate, though), is that different versions of DDG have different indexes, or at least return different results.
This is once again confirmed with the example in the article:
I switched from Google to DDG about a year ago, but I just switched to Startpage about a week ago... let's see how it goes (I was getting bored of Bing's results).
But I wonder how long Wikivoyage has been blocked from DDG and/or Bing...
Startpage was my preferred search engine for the past two years; however, in the past half year, many of my searches have been blocked when they contain the "site:" param. The situation got worse in the past few months, when some searches with double quotes started getting blocked too, which is a deal breaker for me.
For blocked search, I meant Startpage showed a page claiming unusual network activity, and presented a report form. There was no captcha, and there's no way to continue the search from there.
I wasn't using VPN when this happened, and I encountered this on my home WiFi, office WiFi, and mobile network, MacBook and Android.
What efforts has DDG actually made to build a decentralized search index or anything to that degree? Do they make a profit, and is their business model literally just acting as a proxy for Bing and whatnot?
I searched for it right now -- DDG for "Wikivoyage" doesn't turn up the site Wikivoyage on the first page of results. Amusingly, the linked article is on the first page.
Yeah, for me it's the Wikipedia article about Wikivoyage first. The infobox to the right is still there, but I assume that's separate from the index (and searching e.g. "Wikivoyage London" gets no useful results).
It doesn't show up for me on the first page of results in the main page, but the info box to the right has a direct link to it. Not sure what that means, but it's kind of weird and I've never seen anything like that before.
Of course, I'd never heard of WikiVoyage before this, so nice publicity either way, I guess.