It's a double-edged sword in a way -- more people using the project for doing non-illegal things means perhaps the moniker "Dark" could be replaced with a less hostile word (Private? ...no...).
Bad, in that it speaks a bit to users making inconvenient choices because they perceive their privacy to be threatened enough to warrant it.
I was glad to see that they scrubbed the content -- would have been one scary tarball to download, otherwise, and very illegal for them to post considering some of those categories. It makes me wonder how they avoided running afoul of the law just collecting/viewing it for the purpose of scrubbing it, but IANAL. It'd keep me from being involved on any project like that, though!
 I'm eyeballing the counts for a few categories in addition to the obvious "other-not-illegal", so it's probably not 100% accurate, and I'm assuming US law (specifically WRT speech).
 I get "Dark" is meant as "invisible" and it's perhaps even a more accurate description since it's not completely "invisible" unless you take additional steps (and the software contains no known flaws), but sufficiently invisible for most adversaries. However, it's spoken in the same breath as "drug/theft markets" and is used more like "dark alley". Not sure there's a better term and I'm probably bike-shedding, anyway.
Concerning "other-not-illegal": if a website matched none of the other labels AND was not illegal, it was usually tagged as "other-not-illegal". For instance, personal websites are tagged as "legitimate" and a Tor Wiki is tagged as "other-not-illegal" and "wiki". General information about Tor, websites that let you do calculations online (hashing tools, calculators, ...), and online games (without money involved) are labelled as "other-not-illegal".
Anything related to finance - even if it can be legal - is not labelled as "other-not-illegal", for example.
CIRCL is the Computer Emergency Response Team of Luxembourg. They ̶a̶r̶e̶ work very closely with the law.
It wasn't obvious to me from the article how, or whether, this dataset is statistically significant with regard to total Tor traffic.
I believe the Tor Project publishes anonymized data sets of Tor usage, and if I recall correctly the website handling the most traffic over Tor is Facebook.
Plus on Tor you actually own your domain rather than lease it on the whim of some company easily pressured by political and social winds.
As long as you keep the private keys private you get traffic to that public key and "own" the domain.
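To make the "traffic goes to that public key" point concrete, here is a minimal sketch of how a version 3 onion address is derived from a service's Ed25519 public key, as I understand the Tor rendezvous spec. Assumptions: this is the v3 scheme only (older v2 addresses were derived differently), `onion_v3_address` is a name I made up, and the key below is a placeholder byte string, not a real service key.

```python
import base64
import hashlib


def onion_v3_address(pubkey: bytes) -> str:
    """Derive a v3 .onion address from a 32-byte Ed25519 public key."""
    if len(pubkey) != 32:
        raise ValueError("expected a 32-byte Ed25519 public key")
    version = b"\x03"
    # The checksum is the first two bytes of SHA3-256 over a fixed prefix,
    # the public key, and the version byte.
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    # The address is base32(pubkey | checksum | version), lowercased.
    encoded = base64.b32encode(pubkey + checksum + version).decode("ascii").lower()
    return encoded + ".onion"


# Placeholder bytes for illustration only (an assumption, not a real key).
placeholder_key = bytes(range(32))
address = onion_v3_address(placeholder_key)
print(address)
```

The address is just an encoding of the public key, so nobody assigns or leases it to you; whoever holds the matching private key can answer for that name.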
Pictures can have multiple labels. So the ratio of "Forum + Drugs + Finance" vs "Market-place + Weapons" conveys more information than just the global frequency of "Finance"-related pages :)
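A rough sketch of what I mean, counting label combinations rather than single labels. The `pages` list is made-up toy data and the label spellings are my assumption, not the dataset's actual format.

```python
from collections import Counter

# Toy data (hypothetical): each entry is the set of labels on one screenshot.
pages = [
    {"forum", "drugs", "finance"},
    {"forum", "drugs", "finance"},
    {"market-place", "weapons"},
    {"finance"},
]

# Global frequency of each individual label.
label_counts = Counter(label for labels in pages for label in labels)

# Frequency of each exact combination of labels.
combo_counts = Counter(frozenset(labels) for labels in pages)

forum_drugs_finance = combo_counts[frozenset({"forum", "drugs", "finance"})]
market_weapons = combo_counts[frozenset({"market-place", "weapons"})]

print("finance overall:", label_counts["finance"])
print("forum+drugs+finance:", forum_drugs_finance)
print("market-place+weapons:", market_weapons)
```

The single-label count lumps every "finance" page together, while the combination counts show which contexts those pages appear in.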
Usually? Always. There are plenty of legitimate services that'll fence such items for you, but they'll just sell them very near retail on Amazon/eBay/whatever.
Unless it's some cult requiring the sacrifice of humans, I guess maybe it's for a religious group in an oppressive country.
Suppose you are a member of a congregation who questions a specific doctrine but you don't want to commit to opposing it or signalling that you are unreliable? Suppose you are a member of the clergy questioning your faith, and not satisfied with the discussions you have had with people of higher rank within your religious order.
I think there is significant value in anonymous forums. For example the arguments during the drafting of the US Constitution probably would have been far less productive if they hadn't been preceded by anonymous and pseudo-anonymous discussions in the form of the Federalist Papers where proposals didn't carry the benefit of signalling allegiance to interest groups, or the disadvantage of signalling the opposite.
A great example today is the difference between the content on Quora compared to the content found on Hacker News. Just like a cover letter to a resume, a post on Quora may be truthful or interesting, but it is also inextricably linked to the poster's name and always suspect of being primarily interested in the effect it has on the poster's standing in the real world. Here, there are varying degrees of anonymity, and posts are more likely to be motivated by a sincere interest in exploring a topic or advocating an opinion, rather than what making such a statement says about the individual saying it.
For a real and current example, see the recent EFF vs. Watchtower case where a Reddit user /u/darkspilver posted to an ex-Jehovah's Witness subreddit. Watchtower subpoenaed Reddit for the user's IP address so they could excommunicate the user. If they'd communicated over Tor, this would be less of a problem.
Several of the images tagged with only 'religious' are literally just hosting mirrors of the King James Bible from htmlbible.com.
"... classified screenshots ..." would usually be understood in en-gb to be "screenshots given a security rating requiring restricted access".
"... screenshots classified ..." would be "screenshots given a classification of some sort".