Careful with this. A lot of communities don't like scraping of their public content. There was a guy who got booted from archive.org I think for trying to archive an instance that had a lot of under-18 folks' content.
I'd encourage you to build a federated app instead. :)
I think you are making the assumption that I am scraping content based on the fact that I am developing a free software content scraper that anyone is invited to use.
Yep, just like I assume countries that enrich plutonium to weapons grade levels and stick it on the pointy end of an ICBM are threatening others with nuclear weapons, even if they're never launched.
If you think developing web spider software is akin to developing nuclear weapons, I think you might want to go have a talk with some larger, well-known companies who have not only half-developed not-yet-working software (like my activitypub spider, which doesn't even have a storage backend at the moment), but who have fully developed advanced web spiders that have actually downloaded and archived exabytes of data from the web, to be saved privately for all time. Frequently they even let anyone who wants search the full text of it, usually without authentication!
If you don't want second parties to have copies of your data, configure your webserver not to send it to them when they request it. You can't force someone to do something with an HTTP request.
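To make the "the server decides what it sends" point concrete, here is a minimal sketch of a server-side refusal, written as a plain WSGI app using only the Python standard library. The blocked substrings, the response bodies, and the `call` helper are all hypothetical and for illustration only; they are not taken from any real instance's configuration. Note that a `User-Agent` header is trivially spoofed, so this is a statement of policy rather than access control.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical policy: refuse clients whose User-Agent contains these strings.
BLOCKED_AGENT_SUBSTRINGS = ("spider", "crawler")


def app(environ, start_response):
    """Serve the page, unless the client identifies itself as a spider."""
    agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Not served to automated clients.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"public post\n"]


def call(wsgi_app, user_agent):
    """Invoke the app in-process (no network) and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body
```

Calling `call(app, "example-spider/0.1")` exercises the refusal path and `call(app, "Mozilla/5.0")` the normal one.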
Your first statement looks like it should be logical, but when read for soundness, the consequent ("[then] I think you might...") makes absolutely no sense following the antecedent ("if you think..."). I only mentioned nuclear weapons to try to really emphasize to you that a technology's existence is enough to cause fear in people and communities, which does have real world consequences. But I don't think you care about that.
Anyway, I work at one of those companies. You know what they have? Ways to let users opt out (e.g. robots.txt), ways to ensure they're not DoSing people when scraping (which uses material resources: compute time, spindles, electricity, etc.), ways to track the copyright of the source material (which belongs to the author, usually), and ways to respond to requests (legal and non-legal notices) from second parties who want to know how much of their data has been scraped or to exercise their rights over their material. These technological features exist because this is what human societies have found to be a decent balance between scrapers' rights and internet users' rights. Your solution lacks this due consideration and gives internet users a giant middle finger.
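As an illustration of the "not DoSing people" safeguard, here is a minimal sketch of a per-host throttle in Python. It is an assumption about how such a safeguard could look, not a description of any company's crawler or of the project under discussion; the class name `HostThrottle` and the default one-second interval are hypothetical.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed = {}  # host -> earliest time of the next request

    def wait(self, url):
        """Sleep if needed before requesting url; return the delay used."""
        host = urlsplit(url).netloc
        now = self._clock()
        delay = max(0.0, self._next_allowed.get(host, now) - now)
        if delay > 0:
            self._sleep(delay)
        # The request goes out at now + delay; the next one must wait
        # at least min_interval after that.
        self._next_allowed[host] = now + delay + self.min_interval
        return delay
```

A spider would call `throttle.wait(url)` immediately before each request, so a burst of URLs on one instance is spread out while requests to other hosts are not held up.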
In your last paragraph it is pretty clear you are doing this because of some ill-conceived "ethical" notion that "because HTTP responded with this payload, it is now mine with an 'ethical license' to do anything". There are other ways to point out security flaws in ActivityPub that are way more constructive and less asshole-ish, but it seems you're pretty keen to erase a lot of moral and legal nuance to prove "because I have a technological capability means I have the moral ought and the legal right". Sorry, but no: the world is a lot more complex than this.
Just because I have the technological capability to transmit the message "you're being a dick" from the comfort of my home doesn't automatically mean it would be ethical for me to, so of course I am not going to tell you "you're being a dick", and normally I wouldn't type this sentence at all but in this special case I am because it shouldn't be a problem with your ethical system since I'm not actually saying it despite having the technological capability, so it should have no impact on you (and if it did, it should give you pause to reconsider that maybe you need to do more self-reflection on discovering your actual reasons for doing this ill-advised project).
Because you have not cared to clarify your ethical view in the last 3 responses to me, nor in your ethics statement of your project.
Your system is designed to download and save information in an unaccountable manner on behalf of anyone. "Unaccountable" is literally a doorway to "for any further purposes", so it's a very safe assumption.
The lack of clarity also comes from ignoring the bulk of my previous message. Ball is still in your court. I am inviting you to make this exact clarification (plus far more), when all you seem interested in doing is dodging and delaying. The worst action you could possibly take is accusing me of assuming in order to fill in the very deliberate blanks you are leaving behind.
> Your system is designed to download and save information in an unaccountable manner on behalf of anyone
I think perhaps you have confused some source code that I have released with a service that performs a function on behalf of a user. I operate no such service.
All I have done is produce a tool that allows a user who downloads, builds, and runs it to download data from a website, much like a browser or any other HTTP client. There is no "on behalf of"; it's just a tool for a first party to use.
I am happy to let you keep showing off your circular reasoning to the world, and will happily repeat myself pointing out all my counterpoints you did not engage with and ignored.
For example:
- I claimed a technology's existence is enough to cause real world consequences. You ignored this point.
- I mentioned that you are not building safeties into the tool to protect its user[0] (the "first party" user of your tool) and its targets (the "second party" people the tool's users are subjecting to your tool). That makes it legally/morally unappealing to use as a tool (it puts the user in danger), and morally unappealing to be subjected to. Why build a tool this way, to be completely legally/morally unappealing, unless you want to cater specifically to users that do not have such legal/ethical concerns? You ignored this point.
- I have invited you to clarify your ethical view. You are circling back to a previous non-argument.
- You simply refuse to verbalize your implicit moral stance -- that your role as a "toolmaker" absolves you of all the moral consequences of its use[1]. If this is incorrect, I welcome clarification from you.
[1] This moral position has long been well-criticized and is not a sufficiently nuanced moral stance in this day and age. For an old example, consider Tom Lehrer's criticism of von Braun: "'Once the rockets are up, who cares where they come down? That's not my department', says Wernher von Braun." [2].
I'm going to be blunt: your ethics statement sucks. It reeks of "I don't care what you intended, I'm going to use your data in ways you didn't want because nothing is physically stopping me". At the very least, that's a terrible attitude to describe as an "ethics statement". If you were to call it "justification" instead, at least it would be internally consistent.
I see that your code makes no mention of robots.txt, so you've designed it in such a way that explicitly ignores each instance's published intention. You can't reasonably make any claims about "consent" while pretending that "User-agent: *; Disallow: /" isn't there.
At first glance it does not scrape the web UI, and uses public APIs only. So mentioning that it ignores robots.txt is not a solid argument. These APIs are there specifically for automated use.
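For reference, honoring that policy takes only a few lines. Below is a minimal sketch using Python's standard `urllib.robotparser`; it is not code from the project, and the user-agent token `example-spider` and the host `example.social` are hypothetical. In a real robots.txt file the two directives sit on separate lines rather than being joined with a semicolon.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The blanket opt-out quoted above, written one directive per line.
BLANKET_DISALLOW = "User-agent: *\nDisallow: /\n"

# Hypothetical instance; the path is a public timeline API endpoint.
API_URL = "https://example.social/api/v1/timelines/public"
```

With `BLANKET_DISALLOW`, `allowed(BLANKET_DISALLOW, "example-spider", API_URL)` is false for every path on the host, API endpoints included, so a client that checked the file would fetch nothing from that instance.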
I agree that this "ethics statement" is of no use, though. The author should have ignored these people who get upset because of their posts being copied.
Every time people get upset because of reasonable behavior of others and unreasonably attempt to control that behavior, it is an opportunity for teaching.
According to whom? Every dictator thinks it's reasonable behaviour for them to crush the opposition, while those who look on, or those who suffer, will usually believe that to be unreasonable behaviour.
Something I learned from a colleague once is this:
"No one ever thinks they are the bad guy."
So your concept of reasonable behaviour may not match mine, and you may exhibit behaviour that will upset me. That's not an opportunity for me to learn, that's an opportunity for me to seek some sort of recourse against you.
Publishing software is protected expression in the place I am writing it, so I will absolutely be surprised if others attempt that: it would be illegal under the laws in this place.
I think perhaps you are confusing morals with ethics. Morals are a subjective matter, unique to each person, and are derived from their own individual values.
Ethics were designed as a more objective framework that can be consistently applied in a society so that groups may be able to reach consensus about decisions that affect others. I have yet to see any argument that developing and publishing software that allows people to download public information from the web is unethical, especially considering the fact that you cannot download any information from a webserver that that server does not willingly provide to you. You send a request, and perhaps you receive a response—or not. It is wholly within the determination of the server what, if anything, it sends to you. My software speaks plain ol HTTP, no hax or subterfuge or fuckery of any kind.
Indeed, such HTTP client software development and distribution is widespread in our society: you're probably using some software like it right now to read these words. Other tools that perform this function are shipped with every single install of most OSes. It's some of the most common software on the planet. When you browse Mastodon or Pleroma instances, software on your computer is doing the same thing that my software would do, if you ran it.
Despite the fact that some people are irrationally upset over people's choice of HTTP clients with no justification offered, the burden of proof remains on you or anyone else who has a problem with my software to explain why, from an ethical perspective, I shouldn't be writing it or publishing it. No one has offered such an explanation to me, nor can I, in what I think to be a thorough consideration of all the possible consequences or systems of ethics which might apply, discover one myself. I do not believe that such exists, considering the circumstances of how common even advanced HTTP client software is. If you have one, please speak up.
Remember: whether something is moral or not is a personal opinion; it cannot be right or wrong. Whether something is ethical or not, however, is more or less an objective analysis within a given ethical framework. It is not an opinion.
I mean, it is legal. Stopping him wouldn't work well.
Sleazy? Arguably. Allowed? I mean, it's not like it's pulling any magic tricks. It's operating within what the protocol (and the law) permits. People could use a better protocol that doesn't have these problems, but hey, who cares, right?
Actually, the behaviour of the Mastodon author promising a "safe space" in his paid, targeted advertising campaigns without any real plan for data security is the unethical behaviour. If you don't like that people can scrape the fediverse, fix the damn security.
What behavior is that? Writing a tool that people can use to download information published on websites? Does the creator of `wget` or `curl` put people's lives at risk too? Chrome?
I think perhaps you may have misattributed the responsibility for the side effects of publishing data globally.
https://git.eeqj.de/sneak/feta