Regarding point two, OP should connect to a VPN in Japan or somewhere he very much isn't, use incognito mode, and see if the same content is served. I've seen hacked sites that are set up to serve normal content to the region where the attacker thinks the site owner lives, but serve phishing content or malware or whatever to everyone else.
A 301 fits that bill, because then the owner's browser will keep showing the good (cached) content even when traveling.
Can you get Google Safe Browsing to do that? I feel like my reports fall on deaf ears because SMS spammers' URLs would only serve 'bad' pages to $MyCountry (and nowadays do it behind a captcha, fuck you hCaptcha).
I have seen attacks where visiting the site directly doesn't show anything out of the ordinary, but visits coming from Google (based on the Referer header) are shown different content. I've also seen ones where only a User-Agent of Googlebot would see the modified version of the site.
(I doubt that is the case in OP's situation, but I have seen both of those methods of "hiding" multiple times now)
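A quick way to test for both of those yourself is to fetch the page with a few different header profiles and compare what comes back. A rough sketch using Python's requests library (the URL is a placeholder, and note that dynamic pages can differ between fetches anyway, so a hash mismatch is a hint, not proof):

    import hashlib
    import requests

    URL = "https://example.com/"  # placeholder: the site under investigation

    # Header profiles that commonly trigger cloaked content.
    PROFILES = {
        "plain": {},
        "googlebot": {"User-Agent": ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                                     "+http://www.google.com/bot.html)")},
        "from-google": {"Referer": "https://www.google.com/"},
    }

    for name, headers in PROFILES.items():
        resp = requests.get(URL, headers=headers, timeout=10)
        digest = hashlib.sha256(resp.content).hexdigest()[:12]
        print(f"{name:12} status={resp.status_code} final={resp.url} sha256={digest}")

    # Differing hashes or final URLs across profiles suggest cloaking.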
Yes, this is how most WordPress malware works - they inject/publish ad or keyword spam content on the site if the user agent is Googlebot. Regular users don't get the ads. It's partly why most people never realise their site has been hacked.
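For anyone curious, the injected code usually boils down to a one-line user-agent check. This is only a sketch of the shape of it (Python/WSGI for illustration; real infections are typically injected PHP):

    from wsgiref.simple_server import make_server

    PAGE_HTML = "<html><body>the normal page</body></html>"
    SPAM_HTML = "<div><a href='https://spam.example/'>keyword spam</a></div>"

    def app(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        body = PAGE_HTML
        if "googlebot" in ua:
            # Only the crawler ever sees the spam, so the owner
            # browsing their own site notices nothing.
            body += SPAM_HTML
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body.encode()]

    make_server("", 8000, app).serve_forever()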
Or, try a mobile user-agent. I've seen loads of phishing pages that will only serve their malicious payloads to phones - this is especially common with the scams that are sent via SMS.
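In the checker sketch above that's just one more profile; the UA string below is only an example (real iPhone Safari strings vary by version):

    PROFILES["iphone"] = {
        "User-Agent": ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                       "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                       "Version/17.0 Mobile/15E148 Safari/604.1")
    }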
Yeah, this is a good call-out. If the site is being used for drive-by or targeted malware, there may be other checks happening alongside the redirect, such as user agent, country of origin (like you mentioned), installed plugins, OS, or even time of day.
If they detect something that matches what they want, they may throw in some intermediate 301s to pages that attempt to infect the user, while still ultimately redirecting to the "normal" page.
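Those intermediate hops are invisible if your client follows redirects automatically, so walk the chain by hand. A sketch, again with Python requests and a placeholder URL:

    import requests
    from urllib.parse import urljoin

    url = "https://example.com/"   # placeholder: the suspect site
    for _ in range(10):            # cap the chain so a loop can't run forever
        resp = requests.get(url, allow_redirects=False, timeout=10)
        print(resp.status_code, url)
        location = resp.headers.get("Location")
        if not location:
            break
        url = urljoin(url, location)  # Location may be relative

    # Every 30x hop printed here is a page that had a chance to serve
    # an exploit before passing you along to the "normal" destination.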
Just a note: 301s are super sticky, and browsers cache them, even across incognito sessions. Your best bet is to use a fresh browser after reconnecting to avoid false results.
On Chromium-based browsers, if you open the Developer Tools (F12, or right-click and Inspect) and go to the Network tab, you can tick 'Disable cache'.
In my experience this solves the sticky 301 problem, with the caveat that it only applies while DevTools stays open.
Works perfectly for these kinds of investigations, or if you made a mistake during site development.
I'm not GP, but a decade ago, when I started out as a web developer, I made the mistake of using 301s in production, and at the time we never figured out how to get browsers to re-learn the responses for those pages without drastic measures.
I still never use 301s for that reason. Things may have changed, but I dare not try!
> I still never use 301s for that reason. Things may have changed, but I dare not try!
I use 301 for http:->https: redirects because (a) I doubt we're going back, (b) it prevents some cleartext leaks (like the Host header), and (c) it is slightly cheaper.
> we never figured out how to get the browser to re-learn the responses for those pages without drastic measures.
If you control the target URL it's easy: just redirect back. Seriously, the browser won't loop; it'll fetch the content again, and once it stops seeing a 301 it will forget that nonsense ever happened. This is why 301 is usually a fine default for same-site redirects, or if the redirect target is encoded in the URL (such as in tracking URLs).
The big no-no is 301ing to a URL you can't control, unless you put appropriate Cache-Control headers on the redirect.
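Both halves of that fit in a few lines. A sketch with Python's stdlib http.server (the paths and max-age are made up for illustration): /old is the page that was mistakenly 301'd, now serving content again, and /new temporarily redirects back to shake loose any cached 301s, with a Cache-Control cap so this redirect can't itself get stuck forever.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/old":
                # The bad 301 has been removed; /old serves content again.
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"back to normal\n")
            elif self.path == "/new":
                # Temporary back-redirect: browsers holding a cached
                # /old -> /new 301 land here, bounce back to /old, see
                # a 200 there, and drop the cached redirect.
                self.send_response(301)
                self.send_header("Location", "/old")
                # The escape hatch: cap how long any 301 may be cached.
                self.send_header("Cache-Control", "max-age=300")
                self.end_headers()
            else:
                self.send_response(404)
                self.end_headers()

    HTTPServer(("", 8000), Handler).serve_forever()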
Yeah, that's a good point, but one way to think about a CDN is as a web browser that you control, so I say do it even with a CDN, and remember you can always just flush the "browser" cache! (or in CloudFront's case: create an invalidation and wait a few seconds)
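For CloudFront that flush is a single API call, e.g. via boto3 (the distribution ID and path here are placeholders):

    import time
    import boto3

    cloudfront = boto3.client("cloudfront")
    cloudfront.create_invalidation(
        DistributionId="E1234567890ABC",  # placeholder distribution ID
        InvalidationBatch={
            # "/*" would flush everything; here just the bad redirect.
            "Paths": {"Quantity": 1, "Items": ["/old"]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )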
You can disable caching in Firefox's developer tools too; this covers such cached redirects. Very useful combined with persisting the network log, so it isn't cleared when a redirect navigates the page.
There's a related site compromise where a hacked webserver behaves normally except that, when the referrer is google.com, it appends a JavaScript redirect to the end of every page.
You go straight to example.com, everything looks normal. You click through to example.com from Google, you end up on a page selling herbal dick pills. The site owner yells at Google thinking it's their fault. Googlebot itself never gets served the redirect.
You should be able to do the same thing with 301 redirects.
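Either variant shows up if you diff the response with and without a Google referrer. One more sketch (placeholder URL):

    import difflib
    import requests

    URL = "https://example.com/"  # placeholder: the suspect page

    plain = requests.get(URL, timeout=10).text
    via_google = requests.get(
        URL, headers={"Referer": "https://www.google.com/"}, timeout=10
    ).text

    # Anything that only appears with the Google referrer, e.g. an
    # appended <script>location.href=...</script>, stands out here.
    for line in difflib.unified_diff(
        plain.splitlines(), via_google.splitlines(), lineterm=""
    ):
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            print(line)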