Pretty cool to see my project hit the front page of HN, but definitely a bit of a /shrug moment on the subject itself. "Facebook gonna Facebook" I think is approximately how we feel about this.
I know here on HN we're used to hearing stories about scrappy startups trying to carve a piece of the pie big enough to exit on, but that is pretty much the exact opposite of what Dreamwidth is. Our motivations are very different, so this FB block is mostly a curiosity to us.
Dreamwidth is a small, neighborhood corner store kind of site. We're run by a couple of dedicated part-time staff (who have other jobs/responsibilities in life -- I personally work for Discord!) and a cadre of amazing volunteers who donate of their time and energy to make a nice little corner of the Internet that isn't driven by the cycle of VC and growth and user monetization.
We do not have any goals around growth, we don't advertise, and we ultimately don't care that much what the other platforms do. Our goal is to give people a stable home where they don't have to worry about their data being sold, their writing being monetized. Users choose to pay us for a few more advanced features (like full text search), and we support ourselves entirely off of that.
We are home to a large group of online roleplayers, Hugo Award winning fiction writers, Linux kernel developers, parents, security researchers, artists, activists, recipe bloggers, educators, and everything in between and around the edges who would rather work with a service owned and run by people who are motivated by something other than get-big-and-exit. Large communities of online roleplayers who get together and build whole worlds on Dreamwidth, who tell stories together. I'm constantly impressed by the creativity of our community.
Anyway, it's super cool to see Dreamwidth on the home page here. It's been my side project for over a decade now, and I'm quite proud of it. Even if modernizing a 20+ year old Perl project is a hellish undertaking at the best of times... but we keep going. :)
My wife and I tried to set up a simple business page for the local store we opened less than a year ago; they flagged us as a fake/fraudulent account multiple times when we tried to create one. Neither of us has a personal/active FB account, so I guess that's the reason (and this behavior, yeah, makes me double down on NEVER getting a FB account now). I even tried to email them 'proof' as they requested because my wife was worried it would really hurt us, but nothing ever came of it. We finally decided it wasn't worth our effort and forgot about them, and our store has thrived since. I'm happy to grow our business without having to deal with them. We've been using local and other ad platforms such as NextDoor.com, which I'd never heard of but one of our older customers brought to our attention. People talk about getting rid of Facebook; to me it starts with the actions you guys take and how my wife and I are going about it.
Don't support Facebook at all, they don't deserve it.
I had a quick look through Dreamwidth's "latest" page (https://www.dreamwidth.org/latest) earlier today, and a major portion of the posts on there were blatant spam for things like credit card scams, "Work from home and make $1000/day!", and so on.
You seem to be hosting a lot of spam, and those spam posts are also far more likely to be getting linked externally on sites like Facebook, since that's the reason they're being created.
Because Dreamwidth is effectively free website hosting along with a free new subdomain for each account, blocking individual subdomains is futile, and it's difficult for external sites to distinguish between spam and legitimate blogs.
I'm sure Facebook will unblock you fairly soon, but unless you get the spam on Dreamwidth under control, this will probably happen fairly often with different sites blocking it. It would be easy to end up with an impression of Dreamwidth being a spam-hosting site, and decide to block it (either manually or automatically).
Blogspot has always been in a similar situation and would get blocked from a lot of sites due to the sheer amount of spam it hosts.
We have a very manual anti-spam process right now that relies on humans to detect it and action it. We have a couple of very dedicated folks who end up looking every few hours, but it's not automated, and we don't have full timezone coverage.
It's definitely something I'd like to see us improve, but we've been focused on other projects (like switching from mid-90s HTML to a responsive design, which is a slow rewrite of the entire site). That said, if you have any advice on reasonably scalable ways of doing this in-house that don't involve sending our user content to a third party, I'd love to take any recommendations!
Feel free to email me, email@example.com, if you would rather do that. And if not, don't worry about it, I appreciate the comment anyway :)
The nice thing about this is it's pretty computationally light and straightforward to implement in any language. I have no clue as to your stack, but if you have Python for your backend then sklearn is a good library that has a naive Bayes classifier (plus a lot of other, better options). Any post with a high probability of being spam, I'd automatically flag and by default just remove, with the option for a user to ask for manual review. The main thing you'd need for this or any fancier approach is some dataset of spam/non-spam posts. If you have an easy way of retrieving past posts that were labelled spam, that should allow you to make a fine dataset. If you don't want to train on your own user posts (although the only information kept here is word counts), you can look online for spam datasets and use one of those to train your classifier.
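To make the idea above concrete, here is a minimal sketch of a word-count naive Bayes filter using only the Python standard library. It is an illustration under assumptions, not anyone's production setup: the class name `NaiveBayesSpamFilter`, the tokenizer, and the four training posts are all made up for the example, and a real deployment would more likely use a library classifier such as sklearn's `MultinomialNB` trained on a proper labelled dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper for illustration; it only stores word counts.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labelled post ("spam" or "ham") to the counts."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Return P(spam | text). Assumes both labels have training data."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in vocab]  # skip unseen words
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior: fraction of training posts carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a normalized probability of spam.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Toy training data, invented for the example.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("work from home and make $1000 a day", "spam")
spam_filter.train("cheap credit card offer click here now", "spam")
spam_filter.train("my notes on the linux kernel scheduler", "ham")
spam_filter.train("new chapter of our roleplay story is posted", "ham")

p_spam = spam_filter.spam_probability("make money from home with this credit card offer")
p_ham = spam_filter.spam_probability("notes on the kernel story")
```

With so little training data the numbers mean very little; the point is only that the whole model is a couple of word-count tables, which is why it is cheap to run in-house.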
The nice part is that SpamBayes gives you two numbers, the spam "probability" and the ham "probability". When one of them is very close to 1 (like > .99) and the other is very close to 0 (like < .01), there is a good chance that the message is really spam or ham. And this classifies almost all the messages. But from time to time you get a message where the numbers are not so clear, or both are big or both are small, and this means the classifier is confused and you really must take a look at the message.
Then Google started doing that or something similar at scale and effectively eliminated spam in my mailbox ever since. (With the curious recent exception of some highly similar bitcoin spam.)
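The two-score rule described above (one score above .99, the other below .01, everything else goes to a human) can be sketched in a few lines. This is not SpamBayes's actual API; `triage` is a hypothetical helper, and the two scores are assumed to come from whatever classifier is in use.

```python
def triage(spam_score, ham_score, high=0.99, low=0.01):
    """Sort a message into "spam", "ham", or "unsure" from two scores.

    The thresholds default to the .99 / .01 values mentioned in the
    comment above; they are tunable assumptions, not fixed constants.
    """
    if spam_score > high and ham_score < low:
        return "spam"
    if ham_score > high and spam_score < low:
        return "ham"
    # Scores are unclear, both large, or both small: needs a human look.
    return "unsure"


clear_spam = triage(0.995, 0.002)
clear_ham = triage(0.001, 0.999)
both_high = triage(0.90, 0.90)
both_low = triage(0.10, 0.10)
```

Only the "unsure" bucket needs manual review, which is what would keep the human workload small if most messages really do fall into the two clear cases.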
Not sure what you mean here. The problem Deimorz was bringing up wasn't just about users writing something, and spammers linking to it. It was that this site was being used to host the spam payloads. By spammers, not by actual users.
And this is how a lot of the early spam fighting worked: by finding hosts that allowed sending spam and publishing their IPs on blocklists. All mail traffic from those IPs, even if legit, would then be rejected by a large proportion of mail servers that subscribed to these blocklists.
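As a rough sketch of the blocklist behaviour described above, the check a subscribing mail server performs boils down to "is the sender's IP on the list?". The entries below are reserved documentation addresses chosen for the example, and `is_blocked` is a hypothetical name; real blocklists were typically published and queried over the network rather than kept in a local list.

```python
import ipaddress

# Example entries only (documentation ranges, not real offenders).
BLOCKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),     # a whole listed network
    ipaddress.ip_network("198.51.100.7/32"),  # a single listed host
]


def is_blocked(sender_ip, blocklist=BLOCKLIST):
    """Return True if the sending IP falls inside any listed network."""
    addr = ipaddress.ip_address(sender_ip)
    return any(addr in network for network in blocklist)


listed_network_hit = is_blocked("192.0.2.55")
listed_host_hit = is_blocked("198.51.100.7")
neighbour_ok = is_blocked("198.51.100.8")
unrelated_ok = is_blocked("203.0.113.9")
```

Note that the check never looks at the message itself, which is exactly why legitimate mail from a listed host gets rejected along with the spam.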
That's where the spamming is happening.
Calling these "spam payloads" is incorrect. The spam payloads are on Faceboot's servers. These are sites that are linked to by the spam, ostensibly for the purpose of funneling to whatever the spam is trying to market. Trying to police generic web pages, rather than the spam itself, seems like an exercise in futility given the basic philosophy of the Internet.
> And this is how a lot of the early spam fighting worked: by finding hosts that allowed sending spam and publishing their IPs on blocklists
The situation has a similar shape, but there is a distinction as Dreamwidth is not actively sending spam but rather responding to requests from viewers. Still, we can look at the outcome of what happened to the email ecosystem - increased centralization of providers - for a warning of what's to come.
A typical way to deal with this is to consider domain reputation somehow, if the content contains a link. E.g. trust links to old domains more than young ones. Or trust sites with lots of backlinks more than ones with none.
So an old domain with user-created content, a good reputation, but little moderation or abuse protection turns into a great place to host this data. Eventually links to the domain get flagged one too many times, and it gets blocked.
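A toy version of that kind of reputation heuristic might look like the following. Everything in it is an assumption made for illustration: the function name `domain_trust`, the 10-year and 1000-backlink saturation points, and the equal weighting are invented, and no real platform's scoring is being described.

```python
from datetime import date


def domain_trust(registered_on, backlink_count, today):
    """Return a rough trust score in [0, 1] from domain age and backlinks."""
    age_years = (today - registered_on).days / 365.25
    age_part = min(age_years / 10.0, 1.0)          # saturates at 10 years
    link_part = min(backlink_count / 1000.0, 1.0)  # saturates at 1000 links
    return 0.5 * age_part + 0.5 * link_part


check_date = date(2020, 6, 1)
# Hypothetical long-lived blogging host with many inbound links.
old_host_score = domain_trust(date(2009, 1, 1), 5000, check_date)
# Hypothetical throwaway domain registered a month earlier.
fresh_domain_score = domain_trust(date(2020, 5, 1), 0, check_date)
```

Because the score is computed per domain, a spam page created yesterday on the old host inherits the host's high score, which is the loophole described above.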
I agree that they are not sending spam in this scenario. But neither were the open smtp relays of old. They just passed it through, while allowing the spammers to leech off of the relay’s reputation.
(Just to be clear, I have no knowledge of what happened here in reality. So I don’t know that DW is hosting spam, nor that it was linked to from Facebook. This is just an example of why a domain blocklist might be a totally reasonable option.)
These scam sites are like that - do you really think you can make $30,000 a week working 30 minutes a day from your home computer if you just send these idiots $25?
There's already a call to control political information when it has harmful effects on society. Next up is "your website was blacklisted because you allowed a user to link to Plandemic". I agree Plandemic has no redeeming purpose, but censorship is not the answer.
I've got no problem with their operation, but YOU are going down a VERY dangerous and slippery slope by saying I can't block domains that clearly host trash because they might host something else.
On my network I can block child porn, malware sites, scam sites and even entertainment sites like youtube. If you are running a service that mixes the content together, then you may be blocked by folks (like me) who don't have time to chase down every (free) subdomain you allow scammers to create.
That is my right. Period. Full stop. That is not censorship.
Folks here get censorship confused. The govt does virtually nothing to stop these scam sites - so they are certainly not being censored. I'm fine if govt does nothing, as long as communities of people can block these places.
And yes, if you run a site on the internet and don't make it slightly difficult for scammers to use your site to host crap, then other folks in the neighborhood will move the heck away from you.
It seems like you're getting confused on what censorship is. https://en.wikipedia.org/wiki/Censorship . Censorship can be done by the government, and it also can be done by sufficiently powerful private entities.
Also, nowhere have I argued that anyone shouldn't block whatever they'd like on their personal infrastructure. Although if you do it to your kids, then you are indeed censoring.
And that is what we are talking about. An obligation to link to others.
Usually people follow the network -- find one person that's interesting, see who they talk to, and go from there.
Another approach is to see what "interests" are popular and click through to see who shares those/is active: https://www.dreamwidth.org/interests?view=popular
But, TBH, there's a lot of happenstance and serendipity (or are those the same?).