
Hi all! Co-owner of Dreamwidth here.

Pretty cool to see my project hit the front page of HN, but definitely a bit of a /shrug moment on the subject itself. "Facebook gonna Facebook" I think is approximately how we feel about this.

I know here on HN we're used to hearing stories about scrappy startups trying to carve a piece of the pie big enough to exit on, but that is pretty much the exact opposite of what Dreamwidth is. Our motivations are very different, so this FB block is mostly a curiosity to us.

Dreamwidth is a small, neighborhood corner store kind of site. We're run by a couple of dedicated part-time staff (who have other jobs/responsibilities in life -- I personally work for Discord!) and a cadre of amazing volunteers who donate of their time and energy to make a nice little corner of the Internet that isn't driven by the cycle of VC and growth and user monetization.

We do not have any goals around growth, we don't advertise, and we ultimately don't care that much what the other platforms do. Our goal is to give people a stable home where they don't have to worry about their data being sold, their writing being monetized. Users choose to pay us for a few more advanced features (like full text search), and we support ourselves entirely off of that.

We are home to a large group of online roleplayers, Hugo Award winning fiction writers, Linux kernel developers, parents, security researchers, artists, activists, recipe bloggers, educators, and everything in between and around the edges who would rather work with a service owned and run by people who are motivated by something other than get-big-and-exit. We have large communities of online roleplayers who get together, build whole worlds on Dreamwidth, and tell stories together. I'm constantly impressed by the creativity of our community.

Anyway, it's super cool to see Dreamwidth on the home page here. It's been my side project for over a decade now, and I'm quite proud of it. Even if modernizing a 20+ year old Perl project is a hellish undertaking at the best of times... but we keep going. :)

Great to hear all around. I like the focus on doing your own thing and ignoring the other platforms (and likely naysayers). We could all use a healthy dose of this in our side projects! Also - I would take it as a positive that you are blocked on FB.

My wife and I tried to set up a simple business page for the local store we opened less than a year ago; they flagged us as a fake/fraudulent account multiple times when we tried to create one. Neither of us has a personal/active FB account, so I guess that's the reason (and this behavior, yeah, makes me double down on NEVER getting a FB account now). I even tried to email them 'proof' as they requested, because my wife was worried it would really hurt us, but nothing ever came of it. We finally decided it wasn't worth our effort and forgot about them, and our store has thrived since. I'm happy to grow our business without having to deal with them. We've been using local and other ad platforms such as NextDoor.com, which I'd never heard of but one of our older customers brought to our attention. People talk about getting rid of Facebook; to me it starts with the actions you guys take and how my wife and I are going about it.

Don't support Facebook at all, they don't deserve it.

This might come off as a little rude, but it's sincere advice from someone who used to work in anti-spam (not at Facebook):

I had a quick look through Dreamwidth's "latest" page (https://www.dreamwidth.org/latest) earlier today, and a major portion of the posts on there were blatant spam for things like credit card scams, "Work from home and make $1000/day!", and so on.

You seem to be hosting a lot of spam, and those spam posts are also far more likely to be getting linked externally on sites like Facebook, since that's the reason they're being created.

Because Dreamwidth is effectively free website hosting along with a free new subdomain for each account, blocking individual subdomains is futile, and it's difficult for external sites to distinguish between spam and legitimate blogs.

I'm sure Facebook will unblock you fairly soon, but unless you get the spam on Dreamwidth under control, this will probably happen fairly often with different sites blocking it. It would be easy to end up with an impression of Dreamwidth being a spam-hosting site, and decide to block it (either manually or automatically).

Blogspot has always been in a similar situation and would get blocked from a lot of sites due to the sheer amount of spam it hosts.

You're definitely right -- this is an issue. I could very well believe that we tripped some FB spam measures.

We have a very manual anti-spam process right now that relies on humans to detect it and action it. We have a couple of very dedicated folks who end up looking every few hours, but it's not automated, and we don't have full timezone coverage.

It's definitely something I'd like to see us improve, but we've been focused on other projects (like switching from mid-90s HTML to a responsive design, which is a slow rewrite of the entire site). That said, if you have any advice on reasonably scalable ways of doing this in-house that don't involve sending our user content to a third party, I'd love to take any recommendations!

Feel free to email me, mark@dreamwidth.org, if you would rather do that. And if not, don't worry about it, I appreciate the comment anyway :)

The simplest spam filtering algorithm would be a naive Bayes filter. Essentially, you keep a count of the words that appear in all posts, the words that appear in spam posts, and the words that appear in non-spam posts. Those counts plus Bayes' rule let you figure out the probability of spam given a word. It's called naive Bayes because you assume each word in the post is independent of the others, so the score for the whole post being spam is just the product of the per-word probabilities (times the overall fraction of posts that are spam), compared against the same product for non-spam.

The nice thing about this is that it's pretty computationally light and straightforward to implement in any language. I have no clue as to your stack, but if you have Python for your backend then scikit-learn (sklearn) is a good library that has a naive Bayes classifier (plus a lot of other, better options). Any post with a high probability of being spam, I'd automatically flag and by default just remove, with the option for a user to ask for manual review. The main thing you'd need for this or any fancier approach is some dataset of spam/non-spam posts. If you have an easy way of retrieving past posts that were labelled spam, that should allow you to make a fine dataset. If you don't want to train on your own user posts (although the only information kept here is word counts), you can look online for spam datasets and use one of those to train your classifier.
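To make the counting idea concrete, here is a rough sketch in plain Python (standard library only, so not the sklearn classifier mentioned above). The class name, the tokenizer, and the four training posts are all made up for illustration, and it assumes add-one smoothing and log probabilities to avoid zero counts and underflow; a real filter would need far more training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Word-count naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration; only per-label word counts are stored.
    """

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.post_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Add one labelled post to the counts."""
        self.post_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Return P(spam | words) under the independence assumption."""
        if min(self.post_counts.values()) == 0:
            raise ValueError("need at least one spam and one ham example")
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_posts = sum(self.post_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in vocab]

        log_scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)
            # Prior: fraction of training posts with this label.
            score = math.log(self.post_counts[label] / total_posts)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score

        # Normalize the two log scores into a probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Toy training data, invented for this example.
nb = NaiveBayesFilter()
nb.train("Work from home and make $1000 a day", "spam")
nb.train("Cheap credit card offer, make money fast", "spam")
nb.train("Posted a new chapter of my fanfic today", "ham")
nb.train("My recipe for lentil soup, with photos", "ham")

print(nb.spam_probability("make money from home"))
print(nb.spam_probability("new fanfic chapter"))
```

The first score should come out well above 0.5 and the second well below it; where to put the cutoff for auto-flagging is a judgment call that depends on how much manual review you can afford.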

I used SpamBayes a few years ago http://www.spambayes.org/ (Is the project dead now?) (It has a PSF licence https://en.wikipedia.org/wiki/Python_Software_Foundation_Lic... https://en.wikipedia.org/wiki/Comparison_of_free_and_open-so...)

The nice part is that SpamBayes gives you two numbers, the spam "probability" and the ham "probability". When one of them is very close to 1 (like > .99) and the other is very close to 0 (like < .01), there is a good chance that the message is really spam or ham. This classifies almost all the messages. But from time to time you get a message where the numbers are not so clear, or both are big or both are small, and this means the classifier is confused and you really must take a look at the message.
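As a sketch of that two-number triage (this is not the SpamBayes API, just a hypothetical helper using the thresholds mentioned above), the decision could look something like this:

```python
def triage(spam_score, ham_score, high=0.99, low=0.01):
    """Sort a message into 'spam', 'ham', or 'review'.

    Hypothetical helper: the scores are assumed to be two separate
    numbers in [0, 1], and the 0.99 / 0.01 cutoffs are the example
    thresholds from the comment, not tuned values.
    """
    if spam_score > high and ham_score < low:
        return "spam"
    if ham_score > high and spam_score < low:
        return "ham"
    # Unclear, both big, or both small: the classifier is confused,
    # so a human should take a look.
    return "review"


print(triage(0.995, 0.002))
print(triage(0.001, 0.999))
print(triage(0.9, 0.9))
```

Only the "review" bucket needs human attention, which is the part that would matter for a small moderation team.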

Wow, when this came out (I think this was the 'original') it felt quite groundbreaking. Perhaps early 2000s it was?

Then Google started doing that or something similar at scale, and it has effectively eliminated spam in my mailbox ever since. (With the curious recent exception of some highly similar Bitcoin spam.)

Hmm, maybe it comes in waves? I just read over the last hour or so of posts on /latest/ and most of the posts there looked legitimate.

Yeah, same. I can't tell with the Russian posts, but the ones in English were a mixture of journal updates and fanfic.

There's about a dozen Russian posts now, only one of which looks like spam.

Controlling spam used to be about stopping unwanted messages sent to users. Now it has morphed into this idea that every site has the responsibility of content-policing their own users, lest what they publish be used to facilitate spam. Your advice may be pragmatic, but it shows how far we've slid down the slippery slope.

> Now it has morphed into this idea that every site has the responsibility of content-policing their own users, lest what they publish be linked from spam.

Not sure what you mean here. The problem Deimorz was bringing up wasn't just about users writing something, and spammers linking to it. It was that this site was being used to host the spam payloads. By spammers, not by actual users.

And this is how a lot of the early spam fighting worked: by finding hosts that allowed sending spam and publishing their IPs on blocklists. All mail traffic from those IPs, even if legit, would then be rejected by a large proportion of mail servers that subscribed to these blocklists.
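For anyone who hasn't seen how those blocklists are consulted, here is a small sketch. The in-memory set is a stand-in for a real list, and "dnsbl.example.org" is a placeholder zone, not a real service; the one real convention shown is that DNS-based lists are queried by reversing the IPv4 octets and appending the list's zone.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name a mail server would look up for this IP.

    DNS blocklists use the reversed IPv4 octets followed by the zone.
    The zone passed in below is a placeholder, not a real list.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def should_reject(sender_ip, blocked_ips):
    """Reject all mail from a listed IP, legitimate or not."""
    return ipaddress.IPv4Address(sender_ip) in blocked_ips


# Hypothetical local copy of a blocklist (documentation-range addresses).
blocked = {ipaddress.IPv4Address("192.0.2.99")}

print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
print(should_reject("192.0.2.99", blocked))
print(should_reject("198.51.100.7", blocked))
```

Note that the check only sees the sending host, never the message, which is why legitimate mail from a listed IP got rejected along with the spam.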

Facebook users don't see those spam pages unless someone on Facebook sends a Facebook message to another Facebook user linking to them.

That's where the spamming is happening.

Compromised accounts trying to sell bogus Ray-Bans and tagging some friends seems to be a pretty common scam on Facebook. I see it in my feed a couple of times a year.

> this site was being used to host the spam payloads

Calling these "spam payloads" is incorrect. The spam payloads are on Facebook's servers. These are sites that are linked to by the spam, ostensibly for the purpose of funneling to whatever the spam is trying to market. Trying to police generic web pages, rather than the spam itself, seems like an exercise in futility given the basic philosophy of the Internet.

> And this is how a lot of the early spam fighting worked: by finding hosts that allowed sending spam and publishing their IPs on blocklists

The situation has a similar shape, but there is a distinction as Dreamwidth is not actively sending spam but rather responding to requests from viewers. Still, we can look at the outcome of what happened to the email ecosystem - increased centralization of providers - for a warning of what's to come.

In this hypothetical, the message that is posted on Facebook would just be a link + something innocent that makes people click through. Why? Because the easiest form of spam filtering works by looking at the content. Spamming via a link rather than directly gives this kind of content filtering little to work on.

A typical way to deal with this is to consider domain reputation somehow, if the content contains a link. E.g. trust links to old domains more than young ones. Or trust sites with lots of backlinks more than ones with none.
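A toy version of that kind of scoring, just to illustrate the shape of it (the function name, the weights, the five-year cap, and the log scaling are all invented here, not anyone's real system):

```python
import math
from datetime import date


def domain_trust(registered_on, backlink_count, today):
    """Return a rough trust score in [0, 1] for a linked domain.

    Hypothetical heuristic: older domains and domains with more
    backlinks score higher. All constants are arbitrary.
    """
    age_years = max((today - registered_on).days, 0) / 365.25
    age_score = min(age_years / 5.0, 1.0)  # saturates at 5 years
    link_score = min(math.log10(backlink_count + 1) / 4.0, 1.0)
    return 0.5 * age_score + 0.5 * link_score


today = date(2021, 1, 1)
old_site = domain_trust(date(2009, 1, 1), 50000, today)
new_site = domain_trust(date(2020, 12, 1), 0, today)
print(old_site, new_site)
```

Notice the score is per domain: every free subdomain on an old, well-linked site inherits the same high number, no matter who created it or when.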

So an old domain with user-created content, a good reputation, but little moderation or abuse protection turns into a great place to host this data. Eventually links to the domain get flagged one too many times, and it gets blocked.

I agree that they are not sending spam in this scenario. But neither were the open SMTP relays of old. They just passed it through, while allowing the spammers to leech off of the relay’s reputation.

(Just to be clear, I have no knowledge of what happened here in reality. So I don’t know that DW is hosting spam, nor that it was linked to from Facebook. This is just an example of why a domain blocklist might be a totally reasonable option.)

Malware and child porn reduction efforts also often go after the hosts of that content. I'm not sure why it's incorrect to call hosting this stuff what it is. Sure, child porn folks don't necessarily "send" child porn, they just respond to requests from viewers. But they host it.

These scam sites are like that - do you really think you can make $30,000 a week working 30 minutes a day from your home computer if you just send these idiots $25?

You're just listing the earlier stops on the slippery slope. Make hosts responsible for policing information when it's viscerally-revolting child porn. Then make hosts police content when it's directly harmful to people's computers. Then make hosts police content when it's an attempt to scam.

There's already a call to control political information when it has harmful effects on society. Next up is "your website was blacklisted because you allowed a user to link to Plandemic". I agree Plandemic has no redeeming purpose, but censorship is not the answer.

I'm explaining why sites that HOST but do not necessarily send content are blocked.

I've got no problem with their operation, but YOU are going down a VERY dangerous and slippery slope by saying I can't block domains that clearly host trash because they might host something else.

On my network I can block child porn, malware sites, scam sites and even entertainment sites like youtube. If you are running a service that mixes the content together, then you may be blocked by folks (like me) who don't have time to chase down every (free) subdomain you allow scammers to create.

That is my right. Period. Full stop. That is not censorship.

Folks here get censorship confused. The govt does virtually nothing to stop these scam sites - so they are certainly not being censored. I'm fine if govt does nothing, as long as communities of people can block these places.

And yes, if you run a site on the internet and don't make it slightly difficult for scammers to use your site to host crap, then other folks in the neighborhood will move the heck away from you.

> Folks here get censorship confused. The govt does virtually nothing to stop these scam sites - so they are certainly not being censored

It seems like you're getting confused on what censorship is. https://en.wikipedia.org/wiki/Censorship . Censorship can be done by the government, and it also can be done by sufficiently powerful private entities.

Also, nowhere have I argued that anyone shouldn't block whatever they'd like on their personal infrastructure. Although if you do it to your kids, then you are indeed censoring.

Sites have zero responsibility to monitor for spam. Other sites have no obligation to link to them.

And that is what we are talking about: an obligation to link to others.

Want to personally thank you as a user of DW for a few years that migrated over from tumblr after the NSFW ban - I can't thank you and the team enough for what y'all provide. It's a haven and a remnant of the "old web" that is honestly the one thing aside from my personal webring-esque site that I can /trust/ not to change with trends (whether it be payment trends or idiotic aesthetic changes that end up making more of a mess than not). Having a place for fandom analysis and journal posts and just to exist with some level of privacy is a rare treasure. Glad to see your thoughts on this FB ban.

Super cool to have people with a mission (other than optimizing KPI for profit). It wasn’t obvious to me how to discover quality content though; any words on how to use this site or how to think about it? Do we need to know people beforehand who publish things that we want to follow?

There are a couple of discoverability mechanisms, but it really depends on what you're looking for.

Usually people follow the network -- find one person that's interesting, see who they talk to, and go from there.

Another approach is to see what "interests" are popular and click through to see who shares those/is active: https://www.dreamwidth.org/interests?view=popular

But, TBH, there's a lot of happenstance and serendipity (or are those the same?).

I really like the product you've made, and use it regularly. It might be helpful for your user base, and the project as a whole, if you responded with a little more than a shrug. Facebook may be more willing to listen to you, if you contact them, than to random users in getting this resolved.

Yeah, that's fair. We did follow up and I'm told we're unblocked now.

As a Dreamwidth user, I would like to use this occasion to express my thanks for this decision. I've seen too many blogging platforms being swallowed by large companies, shut down, "improved" into oblivion by marketers and made completely unusable in the chase of "growth". You're doing it right. Thank you.

It’s really great to see how far Dreamwidth has come since the start.
