Hacker News | prologic's comments

I always thought this was WASM's "value add": the "virtual machine" of the browser (although we had this back in the day with Shockwave, Flash and Java applets too, hmmm :D)


Yeah, I guess Sun really missed the opportunity to showcase a Java applet as the 'model/controller' for a web page 'view'. Instead we just got applets as a little window in a big window.


This kind of reminds me of the days when Java™ was popular. I agree: if we're pushing to build software that targets another machine (WASI), have we really improved anything?


It's much easier to sandbox, in theory. We need an easy on-ramp to get existing software running, but once people are comfortable writing software wasm-first, we can really see the net improvements.


I'm one of those people. I think it comes down to intent. Those of us who do this carry an implicit goodwill: the data isn't abused, and the infrastructure behind it (often self-hosted) isn't overwhelmed. "Big Tech" just makes this worse, because their motivations aren't the same as ours (the small web).


I've read about Anubis, cool project! Unfortunately, as pointed out in the comments, it requires your site's visitors to have JavaScript™ enabled. This is totally fine for sites that already require JavaScript™ to enhance the user experience, but not so great for static sites and others that need no JS at all.

I built my own solution that blocks these "Bad Bots" at the network level. I block several large "Big Tech / Big LLM" networks entirely at the ASN (BGP) level, using MaxMind's database and a custom WAF and reverse proxy I put together.


A significant portion of the bot traffic TFA is designed to handle originates from consumer/residential space. Sure, there are ASN games being played alongside reputation fraud, but it's very hard to combat. A cursory investigation of our logs showed these bots (which make ~1 request from a given residential IP) are likely in ranges that our real human users occupy as well.

Simply put, you risk blocking legitimate traffic. This solution does as well, but for most humans the actual risk is much lower.

As much as I'd love to not need JavaScript and to support users who run with it disabled, I've never once had a customer or end user complain about needing JavaScript enabled.

It is an incredibly vocal minority who disapprove of requiring JavaScript, the majority of whom, upon encountering a site for which JavaScript is required, simply enable it. I'd speculate that, even then, only a handful ever release a defeated sigh.


This is true. I had some bad actors from the Comcast network at one point, but unfortunately also valid human users of some of my "things". So I opted not to block the Comcast ASN.


Exactly. We've all been down this rabbit hole, collectively, and that's why Anubis has taken off. It works shockingly well.


I was planning on building a Caddy module for Anubis actually. Is anyone else interested in this?


Yes, I would! I love Caddy's set-and-forget nature, and with this it wouldn't be different. Especially if it could be triggered conditionally, for example based on server load or a flood being detected.


See https://github.com/TecharoHQ/anubis/issues/16

There is going to be a pretty big refactor soon, but once that's done we plan on crushing this out.


I would be interested to hear of any other solutions that guarantee to either identify or block non-human traffic. In the "small web" and self-hosting, we typically don't want crawlers and similar software hitting our services, because the software is often buggy (example: the runaway Claude bot) or you simply don't want your sites indexed by them.


For anyone wondering, Oracle holds the trademark for "JavaScript": https://javascript.tm/


Which they arguably should let go of.


How do you know it's an LLM and not a VPN? How do you use MaxMind's database to isolate LLMs?


I don't actually distinguish. There are two things I normally do:

- Block Bad Bots. There's a simple text file called `bad_bots.txt`.
- Block Bad ASNs. There's a simple text file called `bad_asns.txt`.
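The exact format of `bad_bots.txt` isn't given here, so this Go sketch assumes a plausible one: one case-insensitive User-Agent substring per line, with blank lines and `#` comments ignored. The function names and sample entries are illustrative only.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// loadPatterns reads one pattern per line, skipping blanks and "#" comments.
// The file format is an assumption, not the author's documented format.
func loadPatterns(s string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		out = append(out, strings.ToLower(line))
	}
	return out
}

// isBadBot reports whether ua contains any pattern, case-insensitively.
func isBadBot(ua string, patterns []string) bool {
	ua = strings.ToLower(ua)
	for _, p := range patterns {
		if strings.Contains(ua, p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := loadPatterns("# example crawler tokens\nGPTBot\nClaudeBot\n\nBytespider\n")
	fmt.Println(isBadBot("Mozilla/5.0 (compatible; GPTBot/1.0)", patterns))
	fmt.Println(isBadBot("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0", patterns))
}
```

User-Agent strings are trivially spoofed, so this only stops bots that identify themselves honestly, which is why network-level blocking is layered on top.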

There's also another file for blocking IPs and IP ranges, called `bad_ips.txt`, but it's often more effective to block a much larger range of IPs (at the ASN level).

To give you a concrete idea, here are some examples:

```
$ cat etc/caddy/waf/bad_asns.txt
# CHINANET-BACKBONE No.31,Jin-rong Street, CN
# Why: DDoS
4134

# CHINA169-BACKBONE CHINA UNICOM China169 Backbone, CN
# Why: DDoS
4837

# CHINAMOBILE-CN China Mobile Communications Group Co., Ltd., CN
# Why: DDoS
9808

# FACEBOOK, US
# Why: Bad Bots
32934

# Alibaba, CN
# Why: Bad Bots
45102

# Why: Bad Bots
28573
```
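Assuming the `bad_asns.txt` listing above is one comment or ASN per line, a loader might look like the following Go sketch. It keeps the most recent `# Why:` comment as the reason for the ASN that follows; the function name and the map-based result are illustrative choices, not the author's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseBadASNs reads "#" comment lines followed by a bare ASN number.
// The last "# Why:" comment before an ASN is recorded as its reason.
func parseBadASNs(s string) (map[uint32]string, error) {
	out := make(map[uint32]string)
	reason := ""
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, "#"):
			text := strings.TrimSpace(strings.TrimPrefix(line, "#"))
			if r, ok := strings.CutPrefix(text, "Why:"); ok {
				reason = strings.TrimSpace(r)
			}
		default:
			n, err := strconv.ParseUint(line, 10, 32) // ASNs are 32-bit
			if err != nil {
				return nil, fmt.Errorf("bad ASN line %q: %w", line, err)
			}
			out[uint32(n)] = reason
			reason = "" // a reason applies only to the next ASN
		}
	}
	return out, sc.Err()
}

func main() {
	const sample = `# CHINANET-BACKBONE No.31,Jin-rong Street, CN
# Why: DDoS
4134

# FACEBOOK, US
# Why: Bad Bots
32934

# Why: Bad Bots
28573
`
	asns, err := parseBadASNs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(asns), asns[4134], asns[32934])
}
```

Keeping the reason alongside each ASN makes it easy to log why a request was rejected, which helps when revisiting decisions like the Comcast one mentioned earlier.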


Do you have a link to your own solution?


I have a pretty similar one (it works off the same concept): https://github.com/JasonLovesDoggo/caddy-defender if you're curious. Keep in mind this will not protect you against residential IP scraping.


Not yet unfortunately. But if you're interested, please reach out! I currently run it in a 3-region GeoDNS setup with my self-hosted infra.


The Twtxt/Yarn community is larger than you think. As the founder of Yarn.social[1] (which itself uses the Twtxt spec and extensions[2]) and operator of the "flagship" instance twtxt.net[3], I often interact with around 70 folks (_not including news feeds_).

[1]: https://yarn.social
[2]: https://twtxt.net
[3]: https://twtxt.net
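For context on what the base Twtxt format looks like: a feed is a plain-text file in which each line is an RFC 3339 timestamp, a tab, and the post text, with `#` lines treated as comments. This Go sketch parses only that base format and ignores the Yarn extensions; the `Twt` type name and the sample feed are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"time"
)

// Twt is one status update from a twtxt feed.
type Twt struct {
	Created time.Time
	Text    string
}

// parseFeed parses lines of the form "<RFC 3339 timestamp>\t<text>",
// skipping blank lines, "#" comments, and lines that don't parse.
func parseFeed(feed string) []Twt {
	var twts []Twt
	sc := bufio.NewScanner(strings.NewReader(feed))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" || strings.HasPrefix(line, "#") {
			continue
		}
		ts, text, ok := strings.Cut(line, "\t")
		if !ok {
			continue // no tab separator
		}
		t, err := time.Parse(time.RFC3339, strings.TrimSpace(ts))
		if err != nil {
			continue // malformed timestamp
		}
		twts = append(twts, Twt{Created: t, Text: text})
	}
	return twts
}

func main() {
	feed := "# a comment line\n" +
		"2020-08-01T12:00:00Z\tHello twtxt!\n" +
		"2020-08-02T09:30:00+10:00\tSecond post\n"
	for _, t := range parseFeed(feed) {
		fmt.Println(t.Created.UTC().Format(time.RFC3339), t.Text)
	}
}
```

Because a feed is just a static text file, anyone can publish one from any web server, which is part of why the network is hard to measure.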


Respectfully, 70 folks is not larger than I thought.


Man, I remember when I had 70 friends on Myspace; that was an incredible amount of interaction back in the day.


Yeah, sure, but as I said, the community is actually much larger. It is also very hard to measure because Twtxt/Yarn is what I call "truly decentralised". However, you are right: the search engine/crawler puts the active feeds at around 1,000. So it's orders of magnitude smaller than any "big tech" social ecosystem, but that's kind of the appeal, really.


Alright. I'll comment. -- I find it interesting to learn just how much the Go compiler can "optimize your code away". That's both good and bad.

The point on benchmarking the right thing is 100% spot on, and the same goes for testing. The optimization problem, however, is a bit too contrived IMO. When would you possibly write code (aside from very trivial things) where the compiler would optimize all your code away, thus invalidating your benchmark? I'd like to see a real-world example of someone being caught out by this.
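To make the failure mode concrete, here is a minimal Go sketch (not a real-world case) that benchmarks a small, side-effect-free bit-counting function twice: once discarding its result, and once accumulating it into a package-level `sink`. Whether the first loop is actually eliminated depends on the Go version and inlining decisions, so no particular timings are implied. `testing.Benchmark` is called after `testing.Init` so it runs outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

const (
	m1  = 0x5555555555555555
	m2  = 0x3333333333333333
	m4  = 0x0f0f0f0f0f0f0f0f
	h01 = 0x0101010101010101
)

// popcnt counts set bits with the classic SWAR technique. It is small and
// side-effect free, so the compiler can inline it and drop unused results.
func popcnt(x uint64) uint64 {
	x -= (x >> 1) & m1
	x = (x & m2) + ((x >> 2) & m2)
	x = (x + (x >> 4)) & m4
	return (x * h01) >> 56
}

// sink is package-level, so the compiler cannot prove stores to it are dead.
var sink uint64

// benchDiscard ignores the result, so the work may be optimized away.
func benchDiscard(b *testing.B) {
	for i := 0; i < b.N; i++ {
		popcnt(uint64(i))
	}
}

// benchSink accumulates the results and publishes them once via sink.
func benchSink(b *testing.B) {
	var r uint64
	for i := 0; i < b.N; i++ {
		r += popcnt(uint64(i))
	}
	sink = r
}

func main() {
	testing.Init() // registers testing flags; needed outside `go test`
	fmt.Println("discard:", testing.Benchmark(benchDiscard))
	fmt.Println("sink:   ", testing.Benchmark(benchSink))
}
```

A suspiciously low ns/op for the discard variant is the usual symptom. Newer Go releases (1.24+) also provide `b.Loop()`, which according to its documentation keeps the loop body from being optimized away.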


Yarn.social (https://yarn.social) and Salty.im (https://salty.im) are both projects of mine making $0/month (community-driven, and they will stay that way). Yarn.social is the older of the two, at around 2.5 years.


Author of the Salty IM spec here. I'm currently working on reference client and broker (`saltyd`) implementations called Salty Chat (the `salty-chat` CLI and TUI), plus a PWA (Progressive Web App).

Thanks for posting this! Happy to answer any questions! The project is in rapid development and has only been alive for ~2 weeks or so.

So far most things are working nicely, docs could be better (of course) and we finally have the Mobile/Desktop/Web (PWA) App in a working state (UX issues aside).


Anyone have any contacts in Apple or know how to file a bug report with Apple that won't just get lost in the ether?!


This is a question I had been asking for several years until I came across Twtxt (the spec/format originally created by Felix, @buckket; read about it at https://twtxt.readthedocs.org). When I came across it, I saw a bit more potential, so I created what is effectively a multi-user client, also called Twtxt, over at https://twtxt.net (launched in Aug 2020). Since then we've also created a mobile app (Goryon), available in the App Store (iOS) and Play Store (Android), and we offer free (at this time) hosting of pods (individual instances) at https://twt.social/. Today we have a dozen pods/instances and some 300 users. My own pod (twtxt.net) sees around 4M hits/month :O

