
Dataset of classified screenshots from Tor hidden services - adulau
https://www.circl.lu/opendata/circl-ail-dataset-01/
======
mdip
I was reviewing the graph and was surprised at the rankings of things.
Frankly, there's a lot more "not illegal" things going on in Tor than I would
have expected. I expected there was a reasonable amount, but not representing
so much of the total[0].

It's a double-edged sword in a way -- more people using the project for doing
non-illegal things means perhaps the moniker "Dark" could be replaced with a
less hostile word (Private? ...no...)[1].

Bad, in that it speaks a bit to users making inconvenient choices because
they perceive their privacy is threatened enough to warrant it.

I was glad to see that they scrubbed the content -- would have been one scary
tarball to download, otherwise, and very illegal for them to post considering
some of those categories. It makes me wonder how they avoided running afoul of
the law just collecting/viewing it for the purpose of scrubbing it, but IANAL.
It'd keep me from being involved on any project like that, though!

[0] I'm eye-balling the counts for a few categories in addition to the obvious
"other-not-illegal", so it's probably not 100%, and I'm assuming US law
(specifically WRT speech)

[1] I get "Dark" is meant as "invisible" and it's perhaps even a more accurate
description since it's not completely "invisible" unless you take additional
steps (and the software contains no known flaws), but sufficiently invisible
for most adversaries. However, it's spoken in the same breath as "drug/theft
markets" and is used more like "dark alley". Not sure there's a better term
and I'm probably bike-shedding, anyway.

~~~
superkuh
I couldn't agree more. I run a number of hidden services on Tor. That's
because for every website I create it's a simple thing to make it a tor hidden
service too. My ham radio and science hobby websites are in no way, "dark". I
link to them (and vice versa) from my clearweb sites and host from my home
connection/IP.

Plus on Tor you actually _own_ your domain rather than lease it on the whim of
some company easily pressured by political and social winds.

~~~
mpfundstein
How do you own it?

~~~
Diederich
Your domain is a predictable derivative of your hidden node's key, which is
randomly generated. It's cryptographically impossible for the same domain to
be created more than once.
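
To make "predictable derivative of the key" concrete, here is a minimal sketch, assuming the version 3 onion address format (the thread does not say which version is meant). The address is the base32 encoding of the 32-byte public key, a 2-byte SHA3-256 checksum, and a version byte. The `onion_v3_address` helper and the random placeholder key are illustrative only; a real service would use its actual Ed25519 public key.

```python
import base64
import hashlib
import os


def onion_v3_address(pubkey: bytes) -> str:
    """Derive a v3-style .onion address from a 32-byte public key."""
    if len(pubkey) != 32:
        raise ValueError("expected a 32-byte public key")
    version = b"\x03"
    # The checksum is the first 2 bytes of SHA3-256 over a fixed prefix,
    # the public key, and the version byte.
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    # 35 bytes encode to exactly 56 base32 characters, so there is no padding.
    encoded = base64.b32encode(pubkey + checksum + version).decode("ascii").lower()
    return encoded + ".onion"


# Assumption: random bytes stand in for a real Ed25519 public key here.
placeholder_key = os.urandom(32)
address = onion_v3_address(placeholder_key)
print(address)
```

The same key always yields the same address, and nobody can claim the address without the matching private key, which is the sense in which the operator "owns" it.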

~~~
creatornator
Not impossible, just unlikely. Collisions can exist in theory in any hashing
algorithm (though it may be a 2^2048 large address space or something)

~~~
Dylan16807
Which is what "cryptographically impossible" means. Even with all the
computation on the planet for a million years you can't get a glimmer of a
chance.
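
A rough way to put numbers on this is the birthday bound for accidental collisions. The sketch below treats keys as uniformly random 256-bit values, which is a simplification, and the figure of 10^18 generated keys is a made-up illustration, not a measurement.

```python
from fractions import Fraction


def collision_probability_bound(num_keys: int, bits: int) -> Fraction:
    """Upper bound on the chance that any two of num_keys random values collide."""
    # Birthday bound: at most n*(n-1)/2 pairs, each colliding with chance 2^-bits.
    pairs = num_keys * (num_keys - 1) // 2
    return Fraction(pairs, 2 ** bits)


# Assumption: 10**18 keys ever generated, each a uniform 256-bit value.
p = collision_probability_bound(10**18, 256)
print(float(p))
```

Passing a smaller `bits` value shows how quickly the bound grows for shorter identifiers.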

------
rolltiide
Selling stuff on darknet markets must be lucrative just because 1 or 2
researchers always have to buy it.

~~~
VincentZ
Feel free to check how much money the bitcoin addresses you can find
online have received... and you'll figure it out :)

------
hclalpha
I'm surprised to see finance pop up so high in the list... What's going on?

~~~
neffy
Cryptocurrency pump and dump schemes for the most part.

~~~
zettacircl
Not to forget mixers, credit card sellers, PayPal-related schemes,
crypto wallets, escrows...

------
RandomBacon
> 43 dark-web:motivation="religious"

That's interesting.

Unless it's some cult requiring the sacrifice of humans, I guess maybe it's
for a religious group in an oppressive country.

~~~
rz2k
Religion seems like a topic where there are a lot of sincere beliefs and
questions, but also a lot of social signalling and pressure within a community
to adhere to specific beliefs.

Suppose you are a member of a congregation who questions a specific doctrine
but you don't want to commit to opposing it or signalling that you are
unreliable? Suppose you are a member of the clergy questioning your faith, and
not satisfied with the discussions you have had with people of higher rank
within your religious order.

I think there is significant value in anonymous forums. For example the
arguments during the drafting of the US Constitution probably would have been
far less productive if they hadn't been preceded by anonymous and pseudo-
anonymous discussions in the form of the Federalist Papers where proposals
didn't carry the benefit of signalling allegiance to interest groups, or the
disadvantage of signalling the opposite.

A great example today is the difference between the content on Quora compared
to the content found on Hacker News. Just like a cover letter to a resume, a
post on Quora may be truthful or interesting, but it is also inextricably
linked to the poster's name and always suspect of being primarily interested
in the effect it has on the poster's standing in the real world. Here, there
are varying degrees of anonymity, and posts are more likely to be motivated by
a sincere interest in exploring a topic or advocating an opinion, rather than
what making such a statement says about the individual saying it.

~~~
LeifCarrotson
> _Suppose you are a member of a congregation who questions a specific
> doctrine but you don't want to commit to opposing it or signalling that you
> are unreliable?_

For a real and current example, see the recent EFF vs. Watchtower case where a
Reddit user /u/darkspilver posted to an ex-Jehovah's Witness subreddit.
Watchtower subpoenaed Reddit for the user's IP address so they could
excommunicate the user. If they'd communicated over Tor, this would be less of
a problem.

------
sandworm101
Be very careful about such projects on Tor. There are plenty of images on
Tor whose possession can land you in prison and destroy your life. I would
hesitate to click any link to such a project without first examining how
they dealt with that issue.

~~~
VincentZ
True. But this dataset is "safe". I can assure you that you could download it
and show it in front of any audience without much concern about shocking
anyone (maybe one or two pictures may not be friendly for everyone, but
nothing as bad as you could encounter on the "true Tor"). That's precisely one
use of this dataset: showing what is on Tor without having a heart rate of 200
because you don't know which page you'll land on next.

------
ebg13
It took me a minute to realize that "classified" in the headline means
labeled, not secret.

~~~
adulau
Sorry, I wanted to change it after I pushed the submit button... it was too
late.

~~~
maxheadroom
Maybe @dang can help, here? :)

~~~
gnulinux
I disagree that there is something that needs to be changed here. "Dataset of
classified X" is commonly used to refer to this exact thing. (i.e. dataset of
data points with human-made labels)

~~~
keymone
But the word has multiple meanings and both interpretations are equally
plausible. Personally I also thought the title was about secrets. Changing
from classified to labeled will remove ambiguity completely.

~~~
pbhjpbhj
It's the ordering here:

"... classified screenshots ..." would usually be understood in en-gb to be
"screenshots given a security rating requiring restricted access".

"... screenshots classified ..." would be "screenshots given a classification
of some sort".

~~~
richardwhiuk
I disagree - the former would be used in both cases in my opinion. I think it
just depends which usage you use more.

~~~
pbhjpbhj
It is used in both situations when context makes it clear, but there's no [or
very little, at least] context in a title. Ordering disambiguates the intended
meaning when context is insufficient.

