
One of the world’s most visited websites that nobody is aware of (2017) - handpickednames
https://sijmen.ruwhof.net/weblog/1623-one-of-the-worlds-most-visited-websites-that-nobody-is-aware-of
======
shakna
> According to another source, a public address book at locatefamily.com,
> someone under the name Vladimir S. Nesterenko is living at [REDACTED] in
> Moscow. Vladimir lives according to Google Streetview it’s an apartment
> complex far away from the Moscow city center...

Really? Someone investigating with journalists, who may be considered a
journalist themselves (?), actually publishes the address of someone that
they are accusing of running a scam?

There can only be two reasons for doing something like that:

a) You're an idiot, and didn't consider the implications.

b) You're asking for the public to exact a justice that you think you can't.

~~~
scottlocklin
It's extremely likely that this person isn't responsible for the website and
doesn't know anything about it. Criminal or shady activities in Russia (and
everywhere else) are often laundered through cutouts.

You can find an example of this in "Magnitsky act: behind the scenes" where
Bill Browder allegedly laundered his corporate holdings for tax fraud through
various handicapped people, and eventually gangsters who were conveniently
murdered.

~~~
Fins
Using that piece of rather silly Putin's propaganda as a source lowers your
credibility greatly.

~~~
scottlocklin
Ahem. The maker of that documentary is anti-Putin to the extent he supports
the theory that Putin blew up some Russian apartment buildings to start a war.

Browder, on the other hand, is actively suppressing the documentary; even if
none of the things said about him in the documentary are true (in which case
you'd think he would sue the guy who made it for libel; a guy who lives in the
UK), that's pretty shady.

~~~
gandhium
That filmmaker was anti-Putin and he still lives in Russia? Totally
believable.

~~~
rjplatte
That's not uncommon. Only very high-profile Putin opposers disappear.

~~~
Fins
Or we do not hear about low-profile ones. Being high profile (Navalny,
Khodorkovsky) might even be an insurance policy to keep one at least alive.

------
NKosmatos
Thanks for letting us know about this great "sharing" site :-)

Lots of garbage in it, but it also has some interesting docs. I'm more
interested in the technical details of it: how it works internally for
scraping, text extraction, indexing, random user creation, uploading of
stuff and so on.

I understand the privacy and copyright infringement concerns, but it's not
like they've hacked into those other sites and uploaded their files. They're
using existing and "open for everyone" pages. Also the estimates for revenue
are highly exaggerated IMHO. Sure they're making some €€€, but not in the
millions range :-)

On a final note, how is this vastly different from, let's say,
[https://www.scribd.com](https://www.scribd.com) ?

~~~
mtnGoat
This is quite easy to replicate. Just need about 5 jobs that run 24/7
harvesting data.

------
hahla
I read through the article and I’m baffled. I couldn’t watch the news video or
understand what was going on at the bottom of the post... but were news
outlets and the government involved for a site that’s skimming PDFs, hosting
them and then slapping on ads? This isn’t something new, and if I recall
correctly SlideShare did the same thing to grow to where they are today. Is it
legal? I don’t know, but these types of sites are a dime a dozen and nothing
new.

~~~
ericabiz
From the article: "Multiple tax files of Dutch citizens had been published via
www.docplayer.nl"

I don't speak or read Dutch, so I couldn't watch the video clip, but my guess
is these tax files contained important and sensitive information. They were
probably scraped from a government website.

~~~
zandjager
They probably did, but someone made the mistake of putting them somewhere
they were publicly available. Also, the "Russian" made tools available for
taking down the content, tools which according to the article were working in
an effective and timely manner.

------
EMRZ
Sorry if I don't understand this totally, but why is scraping for pdf and
ppt/pptx files and mirroring them illegal?

If you can reach those files just by scraping, it means they are somehow open
to public access.

No joke, I am genuinely asking.

~~~
consp
Scraping, mostly no. Using them to make profit: yes. Since you then violate
the copyright of the author as the original action was solely to make it
public (assuming no profit was intended).

edit: Rehosting is not allowed as far as I know in Dutch law if you are
attempting to make a profit off it without requesting it from the original owner.
Rehosting it and not taking advantage and linking to the original article is
allowed as far as I know. But maybe ask your lawyer for info if you really
want to know.

~~~
prepend
If a file is available through non-auth http it’s unclear what the copyright
is. I think it depends on the particular item whether copyright was violated.

Making a file public without restriction doesn’t mean another can’t make
profit (my ISP makes a profit by transmitting the file to me; gmail makes
profit when I email the file to me; google makes a profit when they cache the
file; archive makes a profit when they archive a file; etc etc).

I think if this were non-public docs then the case is clearer. But by
releasing a document publicly with unlimited access via URL the author
explicitly allows unlimited distribution (and due to the nature of tcp/ip
redistribution). If an author wants to restrict distribution then they should
restrict distribution using available protocols.

~~~
dragonwriter
> If a file is available through non-auth http it’s unclear what the copyright
> is.

Protocol has nothing to do with copyright. “Non-auth http” doesn't suddenly
make copyright murky.

> But by releasing a document publicly with unlimited access via URL the
> author explicitly allows unlimited distribution (and due to the nature of
> tcp/ip redistribution).

There is an implicit license to exactly the redistribution necessary to effect
access by URL, sure, but this is not “unlimited distribution”.

Republishing is, particularly, not what is licensed.

~~~
JdeBP
There is no implicit licence. This is an oft-circulated but fallacious
rationalization made by computer people that is not in line with the law.

The law does not recognize any such thing, and until the turn of the 21st
century this was a recognized hole. Technically, the World Wide Web (and
indeed other systems from FidoNet to SMTP) was a violation of many countries'
copyright laws. Legislators fixed the hole, but not by introducing the idea of
implicit licencing.

The EU introduced provisions in 2001, by way of a Directive, that the making
of temporary transient copies for the likes of HTTP and other network
transmission mechanisms to work _was explicitly not copyright violation_ in
the first place. This became Netherlands law in 2004, and U.K. law in 2003.
Neither the Directive nor the implementing Acts and Instruments talk about
licensing. It is simply not a violation of copyright by definition.

* [https://eur-lex.europa.eu/legal-content/en/ALL/?uri=CELEX:32...](https://eur-lex.europa.eu/legal-content/en/ALL/?uri=CELEX:32001L0029)

* [https://www.ivir.nl/publicaties/download/RIDA2005_206.pdf](https://www.ivir.nl/publicaties/download/RIDA2005_206.pdf)

* [http://wetten.overheid.nl/jci1.3:c:BWBR0001886&hoofdstuk=I&p...](http://wetten.overheid.nl/jci1.3:c:BWBR0001886&hoofdstuk=I&paragraaf=5&artikel=13a&z=2018-10-11&g=2018-10-11)

* [https://www.legislation.gov.uk/uksi/2003/2498/regulation/8/m...](https://www.legislation.gov.uk/uksi/2003/2498/regulation/8/made)

* [https://www.legislation.gov.uk/uksi/2003/2498/note/made](https://www.legislation.gov.uk/uksi/2003/2498/note/made)

~~~
dahart
> There is no implicit license. This is an oft-circulated but fallacious
> rationalization

There is implicit copyright, I assume that’s what the parent comment was
referring to. In the US, it is automatically illegal to copy something and
redistribute it under copyright law [1]. The same is true in the EU [2]. While
copyrights are not licenses, the parent comment is correct in the sense that
one does not have a license to distribute copied content until one is granted
that license explicitly by the copyright holder.

[1] [https://www.copyright.gov/help/faq/faq-general.html#register](https://www.copyright.gov/help/faq/faq-general.html#register)

[2] [https://euipo.europa.eu/tunnel-web/secure/webdav/guest/docum...](https://euipo.europa.eu/tunnel-web/secure/webdav/guest/document_library/observatory/documents/div/FAQs%20on%20Copyright,%20Summary%20Report%20January%202017.pdf)

~~~
JdeBP
You and xe were both quite clearly talking about implicit licences.

------
55555
The ad revenue estimate could be off by as much as an order of magnitude. This
is a non-story. There are millions of spam websites on the internet.

~~~
desdiv
Unlike most spam websites, this one actually provides a useful and helpful
function. The next time I 404 on a PDF I'll be sure to search for it on this
site.

~~~
bcaa7f3a8bbc
Seriously, how long has he been online as an "ethical hacker"? Everyone who
has experience looking for texts online should know that there are millions
of websites of this type out there, the spam farms. Many of these sites don't
even have the files ( _a common tactic is putting the links to your spam site
in a PDF, and naming the PDF after some other document people are looking for
but you don't actually have, to disguise your spam as a file, polluting the
search results for everyone, bastard!_), and are just getting visitors to
click ads, or scamming people into entering personal information or paying
for another related service or a premium account, and they can probably make
more money. Some other websites are even distributing malicious software.

Meanwhile this website is actually an "honest" one that works for the visitors
as advertised! I think it would be a great injustice to selectively enforce
the copyright laws by seizing just this individual website and its income.
After all, 30% of the websites on the entire Internet are like this. I don't
think the blog post really has a point.

~~~
JoeAltmaier
This one is a little different. There's the blatant, willful copyright
violation. And misrepresenting the doc authors as endorsing this website.

------
boffinism
Misleading title. As the article points out, the Alexa ranking puts it outside
the top 200k most visited sites.

~~~
furicane
You do realize that if it's in the top 10 million, out of an estimated ~640
million websites, it's right at the top? It's not misleading at all, it's
quite to the point, just like the article itself.

~~~
hahla
Alexa is generally a poor metric. In my experience directly comparing Google
Analytics to Alexa for hundreds of sites, anything above 50k rank is not
pushing meaningful traffic.

------
jrobn
Here is the real question everybody should ask themselves: In today’s world,
Why do I try to work honestly for a dollar?

~~~
emptybits
It's a very important question. IMO we should be most mindful of our work and
moneymaking since we spend most of our waking life doing these things.

When we are dishonest in our moneymaking, we are probably creating unnecessary
suffering for others. That's enough reason for some.

IMO it also feels easier to love and be loved (e.g. by friends, family, your
children, etc.) if you are honest in your dealings with others.

~~~
stackola
> IMO we should be most mindful of our work and moneymaking since we spend most
> of our waking life doing these things.

Not if you whip up an aggregator site in a week that makes money on autopilot

------
arbuge
The upload buttons of those sites seem to be there mainly to create plausible
deniability. If any content creates trouble which cannot be made to easily go
away just by taking it down, the site owners can claim they did not upload it
themselves and that they don't have any records of who did. Or they could also
create fake upload records if they want to stretch things a bit further. In
practice, I think it's a very safe bet that 99.999999% of the documents on
those sites have been obtained through scraping other sites, and are not user-
submitted content.

------
Traubenfuchs
Who in their right mind puts their real whois information online? On my urls,
all that's real is the email address.

I applaud Vladimir Nesterenko for making money like this.

~~~
whatshisface
There's no telling whether or not that's his real name.

Edit: as a commenter below pointed out, just because a person with a name
lives at an address, they may not run a website that they're listed as
running. After all you probably live at an address, and if I was a fraudster I
could list you.

~~~
SOLAR_FIELDS
Except that if you read the article the reporters confirmed that a guy by that
name lived at that apartment complex by asking the neighbors. There’s no
guarantee that the person who registered is actually the person who lives
there but someone answers to that name at the address provided apparently.

~~~
prepend
Again, I wouldn’t be so sure. It’s a useful practice to use real people’s info
for precisely situations like this.

Step 1) register foo.com

Step 2) make email address John.doe@foo.com

Step 3) register bar.com using John.doe@foo.com as contact info. Include
contact info for John Doe with a real address and random phone number

This meets Whois requirements and makes it a bit harder for any real lookups.
It’s easier than providing wholly fake info because journalists will be
digging. When journalists contact John Doe, who denies owning bar.com, they
will work it like an uncooperative subject.

~~~
Traubenfuchs
> Whois requirements

Are there any? My street name is literally "private" and my phone number
"000..." or something like that.

~~~
crystalPalace
"Because registration data connects individuals or organizations with domain
names, domain name registrants are required to provide accurate and reliable
contact details. If the domain name registrant knowingly provides inaccurate
information, fails to update information within seven days of any change, or
does not respond within 15 days to an inquiry about accuracy, the domain name
may be suspended or cancelled." \- ICANN
([https://whois.icann.org/en/primer#field-section-3](https://whois.icann.org/en/primer#field-section-3))

------
dTal
Wow, the title doesn't do this justice. "Google makes millions from Russian
organized crime" might be better.

~~~
jerf
"Organized" usually implies a, well, _organization_. This seems like it may be
just one guy. I don't see anything about this site that would particularly
require an "organization".

~~~
zandjager
"Seems like it may be just one guy." One cannot make this statement weaker. We
don't know and we cannot know now whether it is just one guy.

~~~
stackola
It's very well possible though, so jumping to "organized crime" is also a
stretch

~~~
jessaustin
It's a well-known logical principle: always assume Russian organized crime!

------
freeflight
Why would a Russian individual host his, rather shady, operation in the smack
dab middle of Germany, and not somewhere in eastern Europe?

Why would an individual like that, who goes through the effort of obfuscating
Google analytics code, leave the domain registry information out in the open
like that?

Imho none of this really adds up, the whole setup seems to be quite complex
and sophisticated, I have a hard time believing that anybody who's as clever
as that, would screw it all up over the domain registry information.

~~~
Illniyar
That's easy to answer, people are not as smart and/or professional as you
think they should be. They can even do some smart things but be completely
ignorant in others.

------
holtalanm
the internet is full of these types of download sites.

meh.

dude really shouldn't have doxxed him though. we don't even know if the guy in
the whois is really the guy behind the site, since you can put anything in
there.

------
rukshn
Whoever is running the slideplayer website is doing damn good SEO.

I see more slideplayer links in my Google searches every day than I see
SlideShare.

I always wondered why slideplayer which is a crappy ad galore website has more
content than SlideShare. Very interesting post.

------
zhte415
From the article

> an empire that makes a million dollar a year by illegally hosting 24.3
> million files copied from other sites

That's quite an interesting stat: that 24.3 carrots can be thrown on the web,
made hard to download, and a dollar can be earned.
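
Spelled out as a quick Python sketch, taking the article's two numbers at face value (other commenters here argue the revenue estimate is speculative, so the inputs are assumptions rather than facts):

```python
# Both inputs come from the article's claim quoted above and are estimates.
FILES_HOSTED = 24_300_000
REVENUE_PER_YEAR_USD = 1_000_000

files_per_dollar = FILES_HOSTED / REVENUE_PER_YEAR_USD
cents_per_file = 100 * REVENUE_PER_YEAR_USD / FILES_HOSTED

print(f"{files_per_dollar:.1f} files per dollar per year")
print(f"about {cents_per_file:.1f} cents per file per year")
```

So each hosted file would need to earn only around four cents a year for the claimed total to hold.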

------
intralizee
The site is nice. I can search "psychology" and view 400 pdfs. Click on one,
and I can view the pdf at the top of the browser with the transcription below.
All text of the transcription is Google translated when using Chrome. It makes
me think about how it must be to have financial freedom and just dump money
on viewing whatever you want to read. I haven't found a good translator for
pdfs, so the transcription on the site is nice. Also, most if not all of the
pdfs are not complete, which makes me think this site is beneficial to the
source publishers.

------
empath75
YouTube was full of pirated content when it started too. I don’t really see
what he’s doing wrong.

~~~
justin66
YouTube was full of pirated content when I looked at it this morning...

------
dangero
I wonder if this article ended up improving search ranking and revenues for
the creator. Another form of automated SEO generation I see is YouTube users
that scrape images and 3rd party news articles, then convert them into a video
with a slide show and overlaid text to speech. Curious what their ad revenue
looks like. I assume a bot does all the work and creates thousands of videos
per day using that approach.

------
listic
> His platform is full of PDF files that are copied from other sites and re-
> hosted. This is illegal

Isn't 'illegal' a bit of a strong term for that?

~~~
aaronmdjones
No. An action can be (civilly) illegal without being a crime (i.e. criminally
illegal), but it's still illegal nonetheless.

~~~
aidenn0
Also, in the US at least, commercial copyright infringement can be criminally
illegal.

------
myro
But wait, earning this money and storing it in an AdSense account is tricky,
and there's no such thing as buying bitcoin with an AdSense balance, right? To
be able to withdraw that amount the owner is expected to provide tax
information and so on..

~~~
zandjager
This is the real question. He could do it legitimately, as Google doesn't
really care where your income comes from. Hence the implication of Google in
this story as a party.

------
phkahler
IANAL but it seems like a good idea to put a copyright notice on all
documents. One can hope a site like this doesn't scrape docs with copyright
notices, but if they do it anyway, wouldn't you have a better case to sue for
damages?

------
7j
The most annoying thing is that the traffic is coming from Google.

Lately Google search results are full of such low quality results, computer
generated text, etc.

Same applies also to YT - videos with 2x speed, inverted colors etc.

~~~
vidyesh
>Same applies also to YT - videos with 2x speed, inverted colors etc.

What is this? Can you provide some links to these kinds of videos? Thank you.

------
Sujan
All the visitor data is based on faulty data - there is no connection between
the numbers his sources report and reality. If you have your own modestly
popular website, go check and see what those sites report for you. Those
numbers are totally made up.

That of course also means, that the revenue estimate is absolute speculation.

Even better: The original question "where did these files come from?" is not
even answered.

------
jabberthemutt
Clickbait garbage: _According to Alexa the site is ranked as the 209,334 most
visited site in the world, and the 3,945 most popular site in The
Netherlands._

~~~
JeanMarcS
I was asking myself whether the intention is to say that it's one of the most
visited among the websites nobody knows about, or whether it's just pure
clickbait.

As the first possibility is impossible to know (if nobody knows about
them...), I agree with you and vote clickbait!

------
intea
Mirror (March '18):
[https://web.archive.org/web/20180328023025/https://sijmen.ru...](https://web.archive.org/web/20180328023025/https://sijmen.ruwhof.net/weblog/1623-one-of-the-worlds-most-visited-websites-that-nobody-is-aware-of)

------
woogiewonka
Ready, set, go! Let's see how many scrapers HN readers will build now to try
to copy this "business" model.

~~~
mgamache
There's more to it than scraping. I would love a deep dive on how he/she was
able to get page rank for the sites. Usually these people employ black hat SEO
that works for a while, but eventually fails when Google updates their ranking
algorithm. I always wondered if there are sites that could game the system
indefinitely. Exactly the kind of info you don't announce on black hat SEO
forums.

~~~
jdhawk
Change your User-Agent to one of Googlebot's, and start hitting the pages
hosting a PDF. Probably some hints there.
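
A minimal sketch of that idea with Python's standard library, assuming the site varies its output on the `User-Agent` header alone. The helper names and the example URL are hypothetical; the Googlebot string is the commonly published desktop one. Sites that verify crawlers by reverse DNS will not be fooled by this, so an identical response proves nothing either way.

```python
import urllib.request

# Commonly published Googlebot User-Agent string, plus an ordinary
# browser-like string to compare against (both are just header values).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> urllib.request.Request:
    """Build a GET request that presents the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_body(url: str, user_agent: str) -> bytes:
    """Fetch a page with the given User-Agent and return the raw body."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
        return resp.read()


def differs_for_googlebot(url: str) -> bool:
    """Return True if the body served to 'Googlebot' differs from the browser one."""
    return fetch_body(url, GOOGLEBOT_UA) != fetch_body(url, BROWSER_UA)


# Usage (performs real network requests, so it is left commented out):
# print(differs_for_googlebot("https://example.com/some-document-page"))
```

A byte-for-byte comparison is crude, since ads and timestamps change between requests; diffing the extracted text would be a more useful next step.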

------
alanlamm
The article does not provide evidence that the users are fake (may well be
true, but presumption of innocence applies). If they are not fake, and reports
of copyright violation to the platform are duly considered, then it would
appear to comply with DMCA in much the same way as Youtube or Facebook do - am
I wrong?

~~~
dumbfounder
If it walks like a duck, talks like a duck, and acts like a duck, then it's
probably a duck. But you still need an ornithologist to put that duck in jail.
Or something like that.

------
wyqydsyq
> one of the world's most visited sites

> According to Alexa the site is ranked as the 209,334 most visited site in
> the world

Clickbait much? I think to be considered "one of the world's most visited" it
should at least be in the top 1,000

------
sheeshkebab
Is this some kind of political prep piece for passing legislation in Europe
against google, internet archive, and other search engines indexing public
content?

~~~
pjc50
No, it's a takedown of an ad farm site posing as a document hosting service.

~~~
zandjager
So what are two essential differences from google?

------
samstave
I love how we are kvetching about the legality of
rehosting/profiteering/copyright etc and not a single concern for the cancer
that ads truly are...

I understand the economic lubricant that ads provide to keep many things
connected and operating, but it's such a one way flow that ads are a syphon
more than they are a grout.

------
cosmic_quanta
This is a fascinating investigation!

------
rcdwealth
The guy would do the same, but he cannot even dream of that income.

Reports loaded with envy have no place on my list.

Let the Russian live.

