
Improving support for adult content on Mastodon - zach43
https://blog.joinmastodon.org/2019/05/improving-support-for-adult-content-on-mastodon/
======
lifthrasiir
There is a Ruby port [1] publicly available. From the encoding routine [2], it
seems to be an sRGB-to-linear conversion followed by a two-dimensional DCT. The
output is a string of base-83 characters: the first two characters encode the
parameters (the size of the n x m component matrix and the quantization scale
used for the ACs), the next 4 encode the DC, and then the (nm-1) quantized ACs
follow at 2 characters per AC. A comparison with Facebook's similar solution [3]
would be interesting (though I realize that Blurhash tries to _hide_ explicit
images, so technically they have different goals).

[1]
[https://github.com/Gargron/blurhash/](https://github.com/Gargron/blurhash/)

[2]
[https://github.com/Gargron/blurhash/blob/master/ext/blurhash...](https://github.com/Gargron/blurhash/blob/master/ext/blurhash/encode.c)

[3]
[https://code.fb.com/android/the-technology-behind-preview-ph...](https://code.fb.com/android/the-technology-behind-preview-photos/)
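Going by that description, a minimal Python sketch of the encoder might look like the following. It is written from the layout above plus the publicly documented BlurHash format: base-83 digits, a max-AC scale of (q+1)/166, and 19 quantization levels per AC channel. It is not the encode.c code. The function names and the `pixels` layout (a list of rows of (r, g, b) tuples) are assumptions made for illustration. Each factor is the average of the linearized pixel values weighted by a cosine basis, with the AC weights doubled.

```python
import math

# Base-83 alphabet used by BlurHash strings.
BASE83 = ("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          "abcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~")

def encode83(value: int, length: int) -> str:
    # Fixed-width base-83 number, most significant digit first.
    return "".join(BASE83[(value // 83 ** (length - i - 1)) % 83] for i in range(length))

def srgb_to_linear(c: int) -> float:
    v = c / 255
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def linear_to_srgb(v: float) -> int:
    v = max(0.0, min(1.0, v))
    if v <= 0.0031308:
        return int(v * 12.92 * 255 + 0.5)
    return int((1.055 * v ** (1 / 2.4) - 0.055) * 255 + 0.5)

def sign_pow(v: float, exp: float) -> float:
    return math.copysign(abs(v) ** exp, v)

def encode(pixels, nx: int = 4, ny: int = 3) -> str:
    """Hypothetical sketch: pixels is a list of rows of (r, g, b) ints in 0..255."""
    height, width = len(pixels), len(pixels[0])
    factors = []
    for j in range(ny):
        for i in range(nx):
            norm = 1.0 if i == 0 and j == 0 else 2.0  # DC weight 1, AC weight 2
            acc = [0.0, 0.0, 0.0]
            for y in range(height):
                for x in range(width):
                    basis = (norm * math.cos(math.pi * i * x / width)
                             * math.cos(math.pi * j * y / height))
                    for k in range(3):
                        acc[k] += basis * srgb_to_linear(pixels[y][x][k])
            factors.append([a / (width * height) for a in acc])
    dc, ac = factors[0], factors[1:]
    out = encode83((nx - 1) + (ny - 1) * 9, 1)  # component-count character
    if ac:
        actual_max = max(abs(c) for f in ac for c in f)
        q_max = max(0, min(82, math.floor(actual_max * 166 - 0.5)))
        max_value = (q_max + 1) / 166
    else:
        q_max, max_value = 0, 1.0
    out += encode83(q_max, 1)  # AC quantization-scale character
    r, g, b = (linear_to_srgb(c) for c in dc)
    out += encode83((r << 16) | (g << 8) | b, 4)  # DC as 24-bit sRGB, 4 chars
    for f in ac:  # each AC: 3 channels x 19 levels packed into 2 chars
        q = [max(0, min(18, math.floor(sign_pow(c / max_value, 0.5) * 9 + 9.5))) for c in f]
        out += encode83(q[0] * 19 * 19 + q[1] * 19 + q[2], 2)
    return out
```

With the 4x3 components used here as an example, the string is 1 + 1 + 4 + 2 x 11 = 28 characters. More components give a more detailed but longer placeholder.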

~~~
WAHa_06x36
I created the algorithm, and there is a proper release of it with
documentation and implementations in a few languages coming... "soon". It's a
work project, so trying to find time to work on it is always tricky.

If you have any questions, feel free to ask.

The main goal of it is to be a placeholder while images load, but it also has
a nice secondary use for hiding sensitive images here. Other aims of the
algorithm are to be very simple and easy to implement, and to use very small
amounts of data to make it comfortable to include in databases and responses
(Facebook uses 200 bytes of data, BlurHash uses around 30, depending on
settings.)
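The "around 30" figure follows from the layout described in the first comment. There is one size character, one AC-scale character and four DC characters, plus two characters per AC component, so the total is 4 + 2·nx·ny characters. The sketch below is a hedged Python illustration and assumes the standard BlurHash base-83 alphabet with 1 to 9 components per axis. `blurhash_length` and `parse_header` are hypothetical helper names.

```python
BASE83 = ("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          "abcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~")

def decode83(s: str) -> int:
    # Interpret s as a base-83 number, most significant digit first.
    value = 0
    for ch in s:
        value = value * 83 + BASE83.index(ch)
    return value

def blurhash_length(nx: int, ny: int) -> int:
    # size char + AC-scale char + 4 DC chars + 2 chars per AC component
    return 1 + 1 + 4 + 2 * (nx * ny - 1)

def parse_header(h: str):
    """Return (nx, ny, max_ac) from the first two characters, checking the length."""
    size_flag = decode83(h[0])
    nx, ny = size_flag % 9 + 1, size_flag // 9 + 1
    if len(h) != blurhash_length(nx, ny):
        raise ValueError(f"expected {blurhash_length(nx, ny)} chars, got {len(h)}")
    max_ac = (decode83(h[1]) + 1) / 166  # scale the AC values were quantized against
    return nx, ny, max_ac
```

For 4x3 components this gives 28 characters. The extremes are 6 characters (1x1, DC only) and 166 characters (9x9).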

~~~
jim-jim-jim
Off topic, but I did a double take when I scrolled past your name. Are you the
same guy who did some kind of imageboard related stuff years ago? I know
that's vague, but my brain is fried and I can't remember why I remember you.

~~~
WAHa_06x36
Many, many years ago, yes. Well remembered.

~~~
claudiawerner
I'd just like to say thanks for creating Wakaba and Kareha and furthering
internet culture (and its study). I can't begin to recount the good times I had
hosting a small board with a few friends, not to mention the good experiences on
all the Wakaba boards I used, and my blog, which for a while ran on an
admin-OP-only basis. Wakaba was arguably what got me into Perl.

------
BlackLotus89
Reminds me of
[https://news.ycombinator.com/item?id=15053064](https://news.ycombinator.com/item?id=15053064)

I normally hate this kind of blur, and I'm not sure it's worth such a big
release note. I usually see such filters applied (forced on users by platform
conventions) with a login required to "unlock" them, and you often find yourself
wondering why the content was blocked in the first place...

Could be good for hiding spoilers, though.

~~~
WAHa_06x36
This is used for sensitive images, but also as a placeholder while the full
image loads, as the entire blurred image can easily be contained in the JSON
data (it's around 30 characters long).

------
leshokunin
How does Mastodon handle governance for content that gets reported? It sounds
like a logistics nightmare. It's nice that the apps are getting a way to blur
things, but I would imagine that the bigger problem would be bad actors
running instances, and finding ways to keep them away.

~~~
proactivesvcs
Mastodon doesn't; it's just the software. Instance (server) administrators and
moderators choose how to handle reports.

------
WilliamEdward
I really like the idea of Mastodon, but I can't shake the feeling that its
name is kind of awful. I get that they're not trying to be exactly like
Twitter, but by comparison, "Twitter" is a much bubblier and friendlier word.

~~~
olah_1
And the language of "toot" is extremely hokey and will never be taken
seriously. It implies farting for most Americans.

"Tweet" reminds people of birds. As in "a little bird told me". Which is
appropriate for a gossip style network.

~~~
jamesrcole
[Edit: people downvoting this, care to explain what you disagree with in it?]

 _Breaking news. In a series of explosive toots...

The foreign minister was forced to resign after a number of offensive toots...

OMG, your toots are so funny!...

Everyone's talking about what J K Rowling tooted today...

I can't believe you tooted that!...

...and so on_

If you think I'm joking then that shows how bad a name "toot" is. Because
these are all phrases that people would say about tweets, and which need to
sound non-ridiculous for any such system.

~~~
jclulow
That honestly all sounded ridiculous a decade ago with "tweet", too. The
success of the platform has normalised the nomenclature.

~~~
jamesrcole
I don't agree with that at all.

It's difficult to show evidence for the absence of something, though.

It'd be possible to show evidence of it, e.g. if every time the word "tweet"
was brought up there were people saying it sounded ridiculous and embarrassing.

------
coldacid
Now if only certain instances would actually default to marking all their
content as adult content. This is great to have but useless when certain 18+
instances just bare all kinds of smut clear to the timeline. (I'm talking
mainly of the *blr instances, in case you haven't already hidden them for
yourselves.)

------
barbs
I feel like "BlurHash" is a misleading name, since what gets created isn't a
hash of the image data; it's just the image highly compressed down to some data
about the colours used. Unless I'm misunderstanding something?

~~~
viraptor
You could say that hash is a very aggressive, lossy compression.

Same image will result in same hash. Different images will often result in
different hashes. That makes it a valid hash.

~~~
piaste
> You could say that hash is a very aggressive, lossy compression.

You could, but it would be about as misleading as a sentence could possibly be.

The purpose of compression is to preserve content as much as possible: similar
inputs should give similar outputs, and the output should provide as much
information about the input as it can.

Hashing deliberately does the exact opposite: slightly different inputs
should give wildly different outputs. For crypto hashes that is the primary
purpose; for index hashes it serves as a performance optimization (which is
the primary purpose of index hashes).

~~~
viraptor
If you want specifically a cryptographic/indexing hash, where 1 bit of change
in input changes ~50% of bits in the output, then that's one possible
goal/restriction. But that's just one kind of hash function.
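The ~50% figure is easy to check for a cryptographic hash. The small Python sketch below flips a single input bit and counts how many of the 256 output bits of SHA-256 change, using the standard library's `hashlib`. The input string is arbitrary, and `differing_bits` is a hypothetical helper.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"the same image bytes"
tweaked = bytes([original[0] ^ 0x01]) + original[1:]  # flip exactly one input bit

d = differing_bits(hashlib.sha256(original).digest(),
                   hashlib.sha256(tweaked).digest())
print(f"{d} of 256 output bits changed")  # expected to land near 128
```

A similarity-preserving hash aims for the opposite behaviour, where a one-bit input change would leave the output mostly or entirely unchanged.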

But you can have hash functions with the goal of preserving similarity. For
example, Soundex is a hash function with that constraint. From:
[https://pdfs.semanticscholar.org/06d6/8587c27058dd6ab3fb8238...](https://pdfs.semanticscholar.org/06d6/8587c27058dd6ab3fb8238f8556cf452ae67.pdf)

> For example value 1 = "Damieva" and value 2 = "Dameiva." These two values
> will produce the same Soundex hash value, creating a match.
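The classic American Soundex rules are short enough to sketch. The minimal Python version below assumes those standard rules. It keeps the first letter and maps the other consonants to digits. Vowels, y, h and w are dropped, and adjacent letters with the same digit collapse into one, even across an h or w. The result is padded to four characters. This is an illustration only, not the implementation used in the linked paper.

```python
# Letter -> digit table for American Soundex; vowels, y, h and w get no digit.
CODES = {ch: digit
         for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                "4": "l", "5": "mn", "6": "r"}.items()
         for ch in letters}

def soundex(name: str) -> str:
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:  # skip repeats of the previous digit
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return result.ljust(4, "0")
```

Both "Damieva" and "Dameiva" reduce to D510, because the vowels carry no digit and only the order of d, m and v matters.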

There's also the whole class of LSH:
[https://en.wikipedia.org/wiki/Locality-sensitive_hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing)
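Random-hyperplane LSH for cosine similarity shows the similarity-preserving idea concretely. Each signature bit records which side of a random hyperplane a vector falls on. Two vectors disagree on a given bit with probability equal to the angle between them divided by π, so nearby vectors get nearby signatures. The minimal Python sketch below uses only the standard library, and its function names, dimensions and sample vectors are hypothetical.

```python
import random

def make_hyperplanes(dim: int, n_bits: int, seed: int = 0):
    # One random Gaussian normal vector per signature bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, planes):
    # Bit k is 1 if vec lies on the non-negative side of hyperplane k.
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=4, n_bits=32)
a = [0.9, 0.1, 0.4, 0.7]
b = [0.88, 0.12, 0.41, 0.69]  # nearly the same direction as a
c = [-0.2, 0.9, -0.7, 0.1]    # roughly 100 degrees away from a
sa, sb, sc = (lsh_signature(v, planes) for v in (a, b, c))
print(hamming(sa, sb), hamming(sa, sc))  # the first is expected to be much smaller
```

Unlike SHA-256, a small change in the input usually flips few or no bits here. That property is what lets LSH bucket similar items together.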

