A Recycled IP Address Caused Me to Pirate Books by Accident (nickjanetakis.com)
260 points by nickjj on May 8, 2018 | 98 comments



This is known as "subdomain takeover" and is definitely a common problem. It's probably one of the most frequently reported bug types on hackerone.

I wonder -- has anyone written code to spin up EC2 instances and check for subdomains pointing to the IP? Not sure how you could do that efficiently (does rdns work after the IP has been recycled?), but a starting point might be gathering as many NXDOMAIN subdomains as possible, filtering the ones at cloud providers, and starting instances until you get a match.
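A rough sketch of the checking half, not tied to any provider's API (the instance IP and candidate subdomains below are placeholders you'd gather elsewhere, e.g. from passive DNS dumps or certificate transparency logs):

    import socket

    # Hypothetical inputs: the IP a cloud provider just handed you, and a list
    # of suspected-stale subdomains collected beforehand.
    NEW_INSTANCE_IP = "203.0.113.42"
    CANDIDATE_SUBDOMAINS = ["ssl.example.com", "staging.example.org"]

    def resolves_to(name, ip):
        try:
            # Collect every A/AAAA answer for the name and check for a match.
            return ip in {ai[4][0] for ai in socket.getaddrinfo(name, None)}
        except socket.gaierror:
            return False  # NXDOMAIN or no answer

    matches = [d for d in CANDIDATE_SUBDOMAINS if resolves_to(d, NEW_INSTANCE_IP)]
    print("dangling records pointing at this instance:", matches or "none")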


Bing has been known to be a great search engine for "reverse IP searches." https://www.bing.com/search?q=ip%3A208.109.192.70

If you're fast enough, it could still be cached in Bing when you search.


That's surprisingly good. And it turns out that if I search Google for my server's IP, I'll find even more things I host, albeit through listings on a bunch of random semi-garbage sites.



Try passing 1.1.1.1 if you want a no-js version of the infinitely scrolling website.


That’s pretty cool. They must be buffering output themselves and sending raw http responses over the socket.


Interesting. I used this to discover that someone is pointing their (pretty decent) domain name towards my server. I won't, but I could theoretically make it display anything of my choosing.


Is it a private server? It might be a shared host so that many domains are pointing to the same IP.


Yes, private server with dedicated IP.


I suggested something similar while working at an AWS competitor - automated RBL searching. A huge number of our public IPs were used by spammers or phishers on the free tier. They would use a public IP until it was blocked, then release that and get another from the pool.

As a cloud host, it's embarrassing to have to explain to your customers that their shiny new IP is on an RBL because it was last used to steal passwords.
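The lookup side of that automation is tiny, for what it's worth; an RBL is just a DNS zone you query with the reversed octets of the IP. A minimal sketch (zen.spamhaus.org is one well-known list; note that some lists refuse queries routed through big public resolvers, and a provider would want to check several zones before handing out an IP):

    import socket

    def is_listed(ip, zone="zen.spamhaus.org"):
        # Reverse the IPv4 octets and query them under the blacklist zone.
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)  # any A answer means the IP is listed
            return True
        except socket.gaierror:
            return False                 # NXDOMAIN: not on this list

    print(is_listed("203.0.113.42"))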


Did this company have an automated process for dealing with abuse tickets? Usually you’ll get at least one abuse ticket before an IP is added to an RBL, right? It shouldn’t come as a surprise when your IP is added to the list after you’ve had X number of abuse complaints about the IP.

Seems like a better (or at least, complementary) solution would be more proactive monitoring of the abuse@ email and prompt terminations of clients who generate too many complaints.


We didn't have an automated process - we manually investigated and shut down accounts. On the supply side, we had poor fraud detection, also manual, almost entirely one person.

There was lots of room to automate things like this, but instead we had a robust internal Powershell library and numerous Slack bots. They embraced the "do things that don't scale" mantra a little too literally.


Then it's not a shiny new IP; it's a tarnished used IP that hasn't been refurbished.


Which is the only reality on IPv4.


Sadly, the more common practice seems to be adding the entire IP blocks of known server/cloud providers to blacklists.


Definitely true in my experience, which is funny because it renders the whole point of the RBL entirely moot. It’s certainly not a sustainable solution.


It's comparable to using an RBL that excludes dynamic IP clients. The difference is one of culpability IMO, where the dynamic IP clients are likely to be innocents with compromised boxes but these cloud instance spammers are actively hostile (insofar as spammers are hostile).


I think I experienced this.

Received an email from campus IT that my phone was compromised.

On further digging, it turned out to be a game I would load up every once in a while on the campus wireless. That game used a popular/legit Chinese CDN to host something on its news page, which was flagged.

It was easier to just use my wireless phone package than to explain that my phone wasn't compromised.


I actually blogged about this in 2015; it's a problem for basically all cloud providers that allow recycled allocation of IPs: https://www.bishopfox.com/blog/2015/10/fishing-the-aws-ip-po...


Is it possible for IPv4 to not recycle IPs?

Is IPv6 big enough to not recycle IPs in the modern high-churn deployment world? (It's probably big enough to give every human (or payment-card-holding customer) a few million blocks of IPs that they can personally control, and recycle IPs within each accountable block.)


For IPv4, clearly no, at this point.

For IPv6, clearly yes, at least so far. It's pretty typical to get at least a /64 aka LAN with a VPS. That's 18 x 10^18 addresses. ISP customers typically get a /56 (256 LANs) or a /48 (65536 LANs).
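The arithmetic behind those figures, if anyone wants to check it (Python):

    # Addresses in a single /64, and how many /64 LANs fit in a /56 and a /48.
    print(2 ** 64)         # 18446744073709551616 (~18 x 10^18)
    print(2 ** (64 - 56))  # 256
    print(2 ** (64 - 48))  # 65536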


Sites like DomainTools have vast numbers of DNS records indexed and allow reverse lookups by IP address (so not actual reverse DNS), which would make your idea pretty doable.


I believe https://securitytrails.com/ is a new player in this space


I wrote a script a few years ago to do the same thing with Digital Ocean floating IPs

https://gist.github.com/rxall/5305bcaa008e51685998


> Not sure how you could do that efficiently...

You could probably do that quite efficiently with Passive DNS data. There are a bunch of providers (e.g. FarSight, RiskIQ, many others) that collect and aggregate DNS request data and make it searchable over time.

RDNS is probably not going to be helpful. I'm not aware of any cloud provider setting a PTR record by default, and I think most won't allow you to do that at all.


GCP as well as Digital Ocean definitely allow for configuring PTR records. AWS does too, I think. At least with GCP, you have to demonstrate ownership of the domain.


I used to do this. I once ran across a prod HRC VPN subdomain and got myself banned from r/netsec.

https://www.reddit.com/r/netsec/comments/5wizf8/hillaryclint...


Should be simple. Just log the 'Host' header from HTTP terminations, right?
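As a rough sketch of that idea: a throwaway listener on the new IP that records every Host header it sees will reveal any stale DNS names still pointing at it (the port and response code below are arbitrary choices):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class HostLogger(BaseHTTPRequestHandler):
        def do_GET(self):
            # The Host header is whatever domain the client thinks it's talking to.
            print("request for host:", self.headers.get("Host"))
            self.send_response(204)  # "No Content" -- acknowledge and move on
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # silence the default per-request access log

    # Binding port 80 needs root; pick a high port if you're just testing.
    HTTPServer(("0.0.0.0", 80), HostLogger).serve_forever()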


> But before deleting it, I copied the IP address so I could open a support ticket on DigitalOcean. I figured they would like to know that someone is illegally distributing content on one of their servers. Now that they know the IP address, they can shut it down.

It’s also possible you exposed a web service that wasn’t meant to be public.


This. The article was interesting but the copyright knighthood put me off.

What if the person was simply serving those files for himself over the internet (I've done it countless times) and Google caught it because the author was careless with handling DNS entries? Now DO has an IP and an accusation, more power is given to the DMCA-strike-first-ask-later status quo, all for what? It's not child pornography we're talking about, it's books for Christ's sake. There's no harm, it does not affect your life, why go through the effort of bringing trouble to someone else because of your own lack of care with sensitive issues such as DNS entries?


> What if the person was simply serving those files for himself over the internet

There are a couple of reasons to believe that this is not the case.

First, there were thousands of them. Someone having thousands of books is not unreasonable, of course, but both the breadth and depth of this collection is such that it is extremely unlikely it is someone's personal library.

Second, the PDFs aren't the actual books. They are just short blurbs describing the book and containing download deep links into bookfreenow.com. A couple examples [1] [2]. Clicking to create an account so you can start downloading redirects through some ad companies (and possibly some shady affiliate marketing companies), eventually reaching some download site (I think) that tells you no free slots are available and asks you to make an account.

(The bookfreenow.com pages for each book all seem to be the same template with just the book info substituted. Even the comments on each page are from the same people, at the same times, and say the exact same things, except they have the correct book title on each page. They aren't even trying to make it look like the comments are legit.)

[1] http://bookfreenow.com/downloads/the-lm3900-a-new-current-di...

[2] http://bookfreenow.com/downloads/essential-orthopaedics-by-j...


> First, there were thousands of them. Someone having thousands of books is not unreasonable, of course, but both the breadth and depth of this collection is such that it is extremely unlikely it is someone's personal library.

Sounds like my personal library actually.

> Second, the PDFs aren't the actual books. They are just short blurbs describing the book and containing download deep links into bookfreenow.com. A couple examples [1] [2]. Clicking to create an account so you can start downloading redirects through some ad companies (and possibly some shady affiliate marketing companies), eventually reaching some download site (I think) that tells you no free slots are available and asks you to make an account.

Well that’s a lot harder to explain in charitable terms! So is this even piracy, or just some kind of scam based on the promise of piracy?


It's a scam. I was searching for some ebook for free and there are many links like that. It's funny that scammers might be more successful saving paid books than copyright warriors :)


> It's funny that scammers might be more successful saving paid books than copyright warriors

You'll never actually get the book, because they don't have it. It's a scam to try to trap people who are trying to find free downloads of ebooks rather than paying for them.

(It's also kind of a funny definition of "save" you have there. With all due respect, if you want to save paid books, you should -- crazy as this may sound -- pay for them.)


Saving is the wrong word, I guess. I mean that someone who's trying to find a pirated book will just stop trying after a few unsuccessful attempts. For example, copyright owners are forcing Google to hide search results with pirated content. But maybe polluting search results with fake content is a better strategy.


You can find the same thing for movie piracy in video form on YouTube. And old Napster/Kazaa had it for music files. The medium is the scammessage :-)


Probably the second one. At some point in the signup process it will offer a free trial and require a credit card. Not sure what happens next because I've never proceeded, but I'm pretty sure they don't actually have the books they claim to have.


Shady business tactics don't make it illegal, and don't warrant a hosting provider block. There is worse on the Internet.


The only thing "shady" going on here is this discussion's unquestioned acceptance of propaganda language -- "piracy"/"pirate" -- as a descriptive term for alleged copyright infringement; copyright infringement which nobody has proven occurred in the first place. See https://www.gnu.org/philosophy/words-to-avoid.html#Piracy for more on how "piracy" is propaganda.


> What if the person was simply serving those files for himself over the internet (I've done it countless times) and Google caught it because the author was careless with handling DNS entries?

Author here, I reached out to DO because as a fellow content creator I felt morally obligated to report this.

I wouldn't like someone pirating my content and the people who created those 390,000+ PDFs put a lot of their time into making their content.

> There's no harm, it does not affect your life, why go through the effort of bringing trouble to someone else because of your own lack of care with sensitive issues such as DNS entries?

It does affect my life. I noticed that there have been a couple of copyright infringement notices submitted against my domain (because of this PDF incident).

When you run an online business and your website is your entire brand, something like that is a big deal.

Also if you Googled for my name before I removed the A record, that SSL subdomain was coming up which was competing with my actual site's content. Not good!


> Also if you Googled for my name before I removed the A record, that SSL subdomain was coming up which was competing with my actual site's content. Not good!

Maybe do 301 redirects from ssl.nickjanetakis.com to your homepage; it can help with SEO.


I would agree with you in principle, but this is just another one of those spam sites that flood the search results with useless crap when you're looking for bookz; there's no copyright infringement here, just spam. They're easily distinguished from "real" PDFs of books because the text clearly doesn't make any sense.

Thus, in a "the enemy of my enemy is my friend" sort of way, I'm thankful for the OP for removing another fake ebook site from the Internet.


Then the webserver admin should work out how to stop Google crawling his site and stop his site responding to any DNS name.


Default Nginx and Apache configs serve all IPs. If you don't share your server IP, it's reasonable to think your content won't be indexed by Google.


Google seems to be very good at finding new never seen before content. If you don't want it public then put some kind of authentication on it.


But Google also always respects robots.txt. If you don't want your content to appear, just put a deny-all for all domains. You can easily set that up in nginx by returning static content; you don't even need to create a file in the right directories.


Unless directory listing is turned on, I can’t see how Google would be able to index these PDFs given that no external links point to them.


And presumably it doesn't guess PATHs because Larry Page isn't in jail.


A personal library of 390,000 books? That's quite the collection. I wonder how much that person paid for those books?


It's not clear how OP reached that number, though. If they went by the number of results reported by Google, that's only a rough estimate.


You guessed it. I based it on the Google search results.

I did go back dozens upon dozens of pages (just skipping around) to spot check it and there were a ton of pages.

There are a lot fewer results now because I wrote this article 6 days before I published it. At this point the A record has been removed for almost a week.


There's something very /r/latestagecapitalism about this but I can't quite put my finger on it.


Just got some new IPs and I've been on the other end of this. Some staging domain of a website still points to one of the IPs I got, and there's a health checker that keeps trying to ping /health on the domain.

Nowhere near the scale of this though, just some background noise I'll ignore


Same here. Soon after launching a cloud hosted virtual machine I noticed that it was getting a lot of unexpected HTTP traffic on port 80. They were GET requests with parameters that suggested to me that it was coming from some kind of Javascript browser tracker.

I rebooted the VM to get a new IP address and the traffic stopped. It's somebody else's problem now.


The problem for you is that you can't do much against it. The domain owner can easily change the record; you can't. The only way is to blackhole all requests coming in for domains other than the ones you want to host. Or is there any way to get the DNS records checked?


> Nowhere near the scale of this though, just some background noise I'll ignore

Same situation here. However, being on the receiving end of a "formerly" Russian camgirl site makes this a little more than just noise. Any good ideas of what one could do with that?


Serve them a cheap gzip b0mb (example [0]) - it should move their ass to clean up things.

[0] https://blog.haschek.at/post/f2fda


The users are often innocent; they just want to use the old page. Best is to just serve a 403 or refuse the connection.


That’s a nice way of getting your ass in a heap of trouble, especially if you experience a drive-by by a legitimate service.

Essentially don’t do anything you wouldn’t do in a wrong number call or if someone is knocking on your door looking for the previous tenant.


This is a decent reminder to get rid of DNS entries to IP addresses you no longer control.

That said ...

> A few months ago I started to receive an absurd amount of notifications, but I ignored them.

Really? I find it pretty amazing that he chalked it up to “Google is probably on drugs”, without even investigating at all!


I can see myself doing the same thing, particularly if I didn't have much free time when it happened. I wouldn't expect Google Alerts to warn me of a security issue!


The problem is, when the Google Alert hit my inbox it didn't show the ssl.nickjanetakis.com subdomain in the alert snippet. It just showed "Nick Janetakis".

Still, I should have clicked through to see what was up, but then again, the links looked very suspicious. I don't make a habit of clicking a bunch of unknown links sent via email, especially not when running Windows.


Second this. I found Google to be the best indicator for any issues with a web page. They have pretty mature systems and crawl actively; anything flagging up there is likely to be an issue. Even if it had been spam, you'd want to know who creates spam pages impersonating your domain.


Ha!

One of my old staging subdomains had an old Digital Ocean address left in it for a bit while we migrated some servers, and Google indexed some random ebook pirate site too; here[1] is a snap of the logs for anyone who is curious. Once I updated the DNS, Googlebot started to blow us up.

[1] https://node.zeneval.com/snaps/a79fe276b688da7b589ce539c9a4a...

I never would have even noticed, had it not been for Googlebot indexing the crap out of us and causing tens of thousands of sessions to be created in a short time, which threw our Munin graphs off the charts.

The site we were staging ran fine, Redis handled it without breaking a sweat, but we're not a public facing service, so I just straight up blocked Googlebot w/ an nginx rule.


> I know I made a stupid mistake by not removing the A record but this could happen to anyone. I would like to see more services only allow for DNS based authentication by adding TXT records.

There are plenty of reasons why one would prefer HTML verification over TXT DNS verification. It's usually faster and more predictable. Plus, DNS is far from being completely secure.
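For comparison, the TXT flow the quoted article asks for is a single lookup from the verifying service's side. A hedged sketch using the third-party dnspython package (the record name and token below are made up):

    import dns.resolver  # third-party "dnspython" package

    def txt_token_present(name, expected):
        # The service tells the user to publish `expected` as a TXT record at
        # `name`, then polls until it shows up (or gives up).
        try:
            answers = dns.resolver.resolve(name, "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return False
        return any(expected in rdata.to_text() for rdata in answers)

    # Both the record name and the token here are made-up examples.
    print(txt_token_present("_verify.example.com", "token=abc123"))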


It's also pretty nice if you're a provider hosting a website for a client (e.g. GitHub Pages, Shopify, etc). Getting them to point an A record to us is hard enough, but at least it's only once. Then you can use HTML verification for setting up LE certificates, Analytics, etc.


Just out of curiosity, how do you know this wasn't just an unintended side effect of someone hosting a website on a DO box? Namely, was the box just responding to anything that would connect to it?

Google has probably already crawled that domain previously, and when it asked for that IP address, it found some other person's website.


The screenshot shows the blog author's name attached to all of the search results as a proper, spaced name (as opposed to a domain substring). It looks like intended impersonation.


I'm not very security minded. I have old domains and subdomains that I used to use that have long since passed - that lived on server IPs that have also long since passed on from my ownership as well.

I just double checked all of my old stuff - and there's not a trace left out there. Apparently, I cleaned up all my old DNS entries as things moved on, even though none of those domains are my 'brand' (as the author states his is). As a non-security-minded person, I find it hard to believe a security-minded person, who's good at his trade, forgets to do this.


> The only way someone is going to gain access to my server is if they manage to gain access to my workstation and steal my SSH key pair.

If it has happened before, you'd be foolish to think that it's impossible to happen again.

https://nvd.nist.gov/vuln/detail/CVE-2016-1247


What does this have to do with anything even remotely related to this article, other than it being a webserver? Symlink takeover is not a new vulnerability, and if someone has a user account on your server, you're already owned anyway. Escalating to root is almost always trivial.


The author spends the beginning of the article talking about how he takes security very seriously, that his webserver is practically uncompromisable, and that the odds of it being compromised are so remote because he has "the reflexes of a highly trained ninja" and doesn't run nginx as root.

I'm pointing out that his server isn't as uncompromisable as he's trying to lead the reader to believe.


If the author is truly a "ninja" they wouldn't be running their web application as the nginx www-data user in the first place, and then a web application exploit wouldn't inherently give anyone access to the nginx user either to exploit the log-rotation mechanism via symlink. One can read more about the CVE you linked here[1]. But basically the gist of it is this:

> As the /var/log/nginx directory is owned by www-data, it is possible for local attackers who have gained access to the system through a vulnerability in a web application running on Nginx (or the server itself) to replace the log files with a symlink to an arbitrary file.

This assumes the web application is also running as www-data, which wouldn't be that smart.

[1] https://legalhackers.com/advisories/Nginx-Exploit-Deb-Root-P...


From the article...

> My site is static too, which means it’s only being hosted through nginx from a non-root user.


Yeah, so then you have to exploit nginx, not a web application. Good luck with that. If someone can get RCE through nginx alone, you're already toast.


What would be the reason to use someone else's domain that you don't have control over to point to an IP address?


They probably didn't even know that it was accessible via that domain. Their webserver responded to any request with the default site, and Google decided to crawl ssl.nickjanetakis.com and found all the PDFs.


That's what I thought too but then I noticed that someone bothered to put "- Nick Janetakis" in the titles of those PDF pages (check the screenshot in the article).


Well spotted!

I don't think that's exactly what was going on though, although perhaps somebody else can chime in.

I don't think the "- Nick Janetakis" is actually in the title of the PDF; rather, Google has appended it to the actual title (the end of which has been replaced with an ellipsis).

I think Google can get this from either the title of an HTML page or from an og:site_name entry of an HTML page (I'm not 100% on all this). It's possible that Google took these from the "actual" ssl.nickjanetakis.com and still remembers the og:site_name and applies it to the PDF files?


I think that may be something Google appends to its search results for some links?

I googled my own domain (site:flurdy.com) and it appends "- flurdy" to some of my static pages. But not for all, especially not for subdomain apps. So I am not 100% sure.


So that they end up being the ones that are blamed for whatever illegal stuff the perp is up to.


What's the advantage of that compared to using a bare IP? You still need a server to host your illicit content, so you're still exposed that way. Also, DMCA requests are sent to service providers, not to whoever owns the domain. The whole arrangement is probably worse than a bare IP because your site can be "taken down" by someone else with no warning.


Which is what happened here but not before a few hundred thousand files were copied.

The whole point is that these domain/IP combinations are forgotten which means it could take a long time before the issue is discovered.


At a school I used to attend, the firewall filtered "inappropriate" content (which is a fun story on its own...). It was a poor system, and in theory, this would have been a loophole around it...


I guess a spammer would be able to use the domain ranking for their spam until Google detects that the content has changed. Probably easier than promoting a new spam domain.


What puzzles me is that there are already sites that offer free subdomains. Wouldn't those be sufficient?


So, if I had a subdomain set up that way, and sci-hub or somebody came along and started using it, would I have any legal obligation to do anything about it? Could my domain be seized?


I expect the domain registrar terms and conditions would get your domain shut down sooner or later.


Google is aggressive when it comes to crawling (which I think is OK), so it's very much possible that the one who hosted these PDFs had no idea that Google had crawled the site, or that it was under that domain.


It seems like they knew what they were doing, given that the search results have the author's name attached to them. (As a proper name, rather than cutting "nickjanetakis" out of "ssl.nickjanetakis.com")


What's the purpose for spreading PDFs like this? Are there ways to embed malware into PDFs so they attack the host machine of whoever downloads the files?


When this happened to me, I jumped to the same conclusion, that the PDFs must be a honeypot or something.

But yes, PDFs are exploitable, like any file format.

https://www.sans.org/security-resources/malwarefaq/pdf-overv...


There are people who feel good about sharing non-free materials with others. Perhaps this approach is for users for whom torrenting isn't an option.


Yep, like me


I don't disagree with the idea, but it is hard to think of a world where we only optionally pay for things. Do you think it's okay because the cost is prohibitive? Or because if we truly value it we will support it whether it is free or not? I don't necessarily know how to reason this issue out.


> The odds of that are remote because my workstation never leaves my office and I have the reflexes of a highly trained ninja.

Does HE ever leave his office?


If you're using cookies for sessions on your main domain, this can be a very big flaw.


> I have Google Alerts set up so I get emailed when people link to my site. A few months ago I started to receive an absurd amount of notifications, but I ignored them. I chalked it up to “Google is probably on drugs”.

Between this quote and the bozo-level advice in the "Domain Validation Should Be More Strict" section ("I would like to see more services only allow for DNS based authentication by adding TXT records" is going to solve this problem? permanently decommissioning IPv6 addresses?), the one lesson I can take away from this article is to stay as far away as possible from any of this guy's security related courses.




