
How Did Yandex Discover a Private URL? - foxylad
I recently set up a VPS server to provide a private service for an Appengine
app. To test exception handling, I set up a test URL /boo.hoo/test, and
visited it a couple of times with Chrome browser on Ubuntu to make sure things
worked the way I expected. This URL has never been used by the app.

I forgot to remove the test code, and this morning (three weeks later)
received an error message, which shows my exception handling works... except I
hadn't visited the URL and I can't understand how anyone else knew it existed.

The request log shows (long sequences of base64 replaced with "..."):

    46.42.171.81 - - [31/May/2018:06:40:12 +1200] "GET /boo.hoo/test HTTP/1.1" 401 3817 "http://yandex.ru/clck/jsredir?from=yandex.ru%3Bsearch%3Bweb%3B%3B&text=&etext=1804.R2Y-...&state=...&sign=...&keyno=0&cst=...&ref=...&l10n=ru&cts=1527705451111&mc=4.949890056" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 YaBrowser/18.2.1.174 Yowser/2.5 Safari/537.36"

So how did Yandex discover this URL?
======
gingerlime
We're just investigating a similar issue on our site. We have an internal URL
that is only available to admins (it returns a 403 to everyone else and is not
linked from any public page), and it suddenly got crawled by Yandex. We
checked the IPs the crawl came from and they are legit.

We have only a handful of admins with access to this resource. It's possible
one of them had a malicious extension I suppose, but even if they did, how did
this leak to Yandex?

My colleague also found this discussion[0] recently, which reports a similar
problem.

[0] https://www.webmasterworld.com/webmaster/4829663.htm

~~~
foxylad
What IP address did your request come from? Someone made the point that my
request didn't come from Yandex itself, just appeared to be referred from
them.
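
One way to check that point is to split the log line into its fields and look
at the client IP, the referer and the user agent separately. The following is
only a minimal sketch, assuming the server writes the usual "combined" access
log format; `summarize_request` and the keys it returns are hypothetical names
made up for illustration, and the elided "..." values are kept exactly as
posted.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined log format:
#   ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_request(line):
    """Pull out the fields that hint at who sent the request (hypothetical helper)."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("not a combined-format log line")
    referer = urlsplit(match["referer"])
    query = parse_qs(referer.query)
    return {
        "client_ip": match["ip"],
        "path": match["path"],
        "status": int(match["status"]),
        "referer_host": referer.hostname,
        "referer_path": referer.path,
        # The "from" parameter of the referer, if present.
        "referer_from": query.get("from", [None])[0],
        # YaBrowser is an end-user browser token, not a crawler token.
        "agent_has_yabrowser": "YaBrowser" in match["agent"],
    }


# The log line from the original post, with the elided values left as "...".
LINE = (
    '46.42.171.81 - - [31/May/2018:06:40:12 +1200] '
    '"GET /boo.hoo/test HTTP/1.1" 401 3817 '
    '"http://yandex.ru/clck/jsredir?from=yandex.ru%3Bsearch%3Bweb%3B%3B'
    '&text=&etext=1804.R2Y-...&state=...&sign=...&keyno=0&cst=...&ref=...'
    '&l10n=ru&cts=1527705451111&mc=4.949890056" '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/63.0.3239.132 YaBrowser/18.2.1.174 '
    'Yowser/2.5 Safari/537.36"'
)

if __name__ == "__main__":
    for key, value in summarize_request(LINE).items():
        print(f"{key}: {value}")
```

Read this way, the referer is a yandex.ru redirect URL and the user agent is a
desktop browser, which is consistent with someone following a link from a
Yandex page rather than with a crawler fetch. The referer header is set by the
client, so this is a hint and not proof.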

~~~
gingerlime
77.88.47.81 (which has a reverse PTR of 77-88-47-81.spider.yandex.com.).

We also saw that this page was indexed by Yandex when we searched on their
website. This page wasn't/isn't indexed by Google, Bing, DuckDuckGo etc by the
way...
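
For anyone wanting to repeat the IP check, the usual approach is a
forward-confirmed reverse DNS lookup: resolve the IP to its PTR name, check the
name's domain, then resolve that name back and confirm it returns the same IP.
This is a rough sketch, not our actual tooling. The suffix list is an
assumption based on Yandex's published guidance for verifying its robots, and
`is_yandex_crawler` with its two injectable lookup parameters is a hypothetical
helper written for illustration.

```python
import socket

# Assumption: Yandex crawler PTR names end in one of these domains.
YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")


def is_yandex_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check (IPv4 only in this sketch).

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    default to the system resolver; they can be replaced for offline testing.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        # Step 1: IP -> PTR name (drop the trailing dot of a fully qualified name).
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False

    # Step 2: the PTR name must sit under a Yandex domain.
    if not hostname.endswith(YANDEX_SUFFIXES):
        return False

    try:
        # Step 3: the name must resolve back to the same IP, since whoever
        # controls the reverse zone for an IP can set any PTR name they like.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Calling `is_yandex_crawler("77.88.47.81")` with the default resolvers needs
network access, and the answer depends on what DNS returns at the time.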

------
natch
Is /boo.hoo by itself (without the /test subpath) public? Adding /test to
anything seems a pretty obvious heuristic for a search indexing crawler to
try.

Also since Chrome is a Google product, you should expect they are logging what
you do. So maybe Yandex has breached them? Yikes.

Other possibilities are somewhere in the network between you and the server,
or somewhere in the Ubuntu networking stack(!). Maybe you thought of all
these, but I'm mentioning them since you didn't.

~~~
foxylad
No, boo.hoo on its own returns a 404, and there is no other instance of
boo.hoo in the app's routes. I hate to sound so paranoid, but I'm guessing
this is evidence of Russia's infiltration of the internet.

------
QuinnyPig
Did you have any extensions enabled?

~~~
foxylad
No extensions - I thought of that. I'm reasonably security-aware: I use
Ubuntu, and DuckDuckGo for search.

