Can someone point out the author's robots.txt where the offense is taking place? ...

denschub · 2024-12-30T18:42:20 1735584140

the robots.txt on the wiki is no longer what it was when the bot accessed it. primarily because I clean up my stuff afterwards, and the history is now completely inaccessible to non-authenticated users, so there's no need to maintain my custom robots.txt.

alphan0n · 2024-12-30T20:24:09 1735590249

https://web.archive.org/web/20240101000000*/https://wiki.dia...

denschub · 2024-12-30T20:31:25 1735590685

notice how there's a period of almost two months with no new index, just until a week before I posted this? I wonder what might have caused this!!1

(and it's not like they only check robots.txt once a month or so. https://stuff.overengineer.dev/stash/2024-12-30-dfwiki-opena...)

alphan0n · 2024-12-30T21:37:07 1735594627

:/ Common Crawl archives robots.txt and indicates that the file at wiki.diasporafoundation.org was unchanged in November and December from what it is now. Unchanged from September, in fact.

https://pastebin.com/VSHMTThJ

https://index.commoncrawl.org/
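
For anyone wanting to repeat that comparison, here is a minimal sketch of the "was the file unchanged between captures" check. It assumes the index returns one JSON object per line with `timestamp` and `digest` fields (the usual CDX-style output); the sample records and digest values below are made-up placeholders, not real captures of the wiki. Note that this can only show what the file looked like at capture time, so changes deployed and reverted between two captures would not appear.

```python
import json

def content_changes(cdx_lines):
    """Return (timestamp, digest) for each capture whose digest differs
    from the previous capture, in chronological order."""
    records = sorted(
        (json.loads(line) for line in cdx_lines if line.strip()),
        key=lambda r: r["timestamp"],
    )
    changes = []
    last_digest = None
    for record in records:
        if record["digest"] != last_digest:
            changes.append((record["timestamp"], record["digest"]))
            last_digest = record["digest"]
    return changes

# Placeholder records (hypothetical timestamps and digests, for illustration only).
SAMPLE = [
    '{"timestamp": "20240915000000", "digest": "AAAA"}',
    '{"timestamp": "20241110000000", "digest": "AAAA"}',
    '{"timestamp": "20241210000000", "digest": "BBBB"}',
]

for timestamp, digest in content_changes(SAMPLE):
    print(timestamp, digest)
```

A single entry in the result means every capture had the same content; more than one means the file differed between at least two captures.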

denschub · 2024-12-31T04:21:05 1735618865

just for you, I redeployed the old robots.txt (with an additional log-honeypot). I even manually submitted it to the web archive just now so you have something to look at: https://web.archive.org/web/20241231041718/https://wiki.dias...

they ingested it twice since I deployed it. they still crawl those URLs - and I'm sure they'll continue to do so - as others in that thread have confirmed exactly the same. I'll be traveling for the next couple of days, but I'll check the logs again when I'm back.

of course, I'll still see accesses from them, as most others in this thread do, too, even if they block them via robots.txt. but of course, that won't stop you from continuing to claim that "I lied". which, fine. you do you. luckily for me, there are enough responses from other people running medium-sized web stuffs with exactly the same observations, so I don't really care.
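
A rough sketch of how such a log check could look, using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the request list are all hypothetical stand-ins (the actual file and logs are not shown in this thread); the idea is just to flag requests that the rules would have disallowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, not the one actually deployed on the wiki.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /index.php
"""

def violations(robots_txt, requests):
    """Return the (user_agent, path) pairs that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent, path in requests
        if not parser.can_fetch(agent, path)
    ]

# Hypothetical (user agent, path) pairs, as they might be pulled from an access log.
SAMPLE_REQUESTS = [
    ("ExampleBot", "/Main_Page"),
    ("OtherBot", "/Main_Page"),
    ("OtherBot", "/index.php?title=Main_Page&action=history"),
]

for agent, path in violations(ROBOTS_TXT, SAMPLE_REQUESTS):
    print(agent, path)
```

Any pair this prints is a request that a crawler honoring the file should not have made.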

alphan0n · 2024-12-31T04:32:35 1735619555

What about the CommonCrawl archives, which clearly show the same robots.txt that allows all, from September through December?

You’re a phony.

denschub · 2024-12-31T04:50:33 1735620633

Here's something for the next time you want to "expose" a phony: before linking me to your investigative source, ask for exact date-stamps when I made changes to the robots.txt and what I did, as well as when I blocked IPs. I could have told you those exactly, because all those changes are tracked in a git repo. If you asked me first, I could have answered you with the precise dates, and you would have realized that your whole theory makes absolutely no sense. Of course, that entire approach is moot now, because I'm not an idiot and I know when commoncrawl crawls, so I could easily adjust my response to their crawling dates, and you would of course claim I did.

So I'll just wear my "certified-phony-by-orangesite-user" badge with pride.

Take care, anonymous internet user.

alphan0n · 2025-01-01T02:14:08 1735697648

>I'm not an idiot and I know when commoncrawl crawls

When will commoncrawl crawl your site again?

alphan0n · 2025-01-01T04:42:43 1735706563

Gentleman’s bet. If you can accurately predict the day of commoncrawl's crawl for four of the next six months, I’ll donate $500 to the charity of your choice. Fail, and you donate $100 to the charity of my choice.

alphan0n · 2025-01-01T07:04:42 1735715082

Or heck, $1000 to the charity of your choice if you can do 6 of 6, no expectation on your end. Just name the day from February to July, since you’re no idiot.

alphan0n · 2024-12-31T05:06:15 1735621575

◔_◔