I think the most precise metaphor is: a kid walked through the front door of a public library, borrowed a couple freely available books, then the government realized those books mistakenly included sensitive information.
In order to address that error, 15 police officers raided the kid's house.
That would be an accurate analogy if these documents were linked to from a publicly accessible portion of the site. They were not. This is more like someone walking into an unlocked back room and grabbing books that hadn't been shelved.
These analogies are not helping. Here's what actually happened: the accused allegedly sent requests to a web server asking "may I please look at the document with id X?" for various values of X. Each time the web server had the option to say "no, you may not", or even "no, that document doesn't exist." Instead, it responded each time by sending the requested document.
That's all that happened: someone used HTTP in the way it's intended to be used, and inferred quite reasonably that the people who set up that web server knew what they were doing and meant to set it up that way. It turns out those people didn't know what they were doing, and they got embarrassed about it.
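For what it's worth, the exchange described above can be sketched in a few lines. This is a toy model and not the actual server: `DOCUMENTS`, `PUBLIC_IDS`, and `handle_request` are made-up names, and the IDs and contents are invented. It only illustrates that a server has the option of answering 403 or 404 instead of 200.

```python
from http import HTTPStatus

# Hypothetical document store; IDs and contents are invented for illustration.
DOCUMENTS = {1: "public report", 2: "public report", 3: "unredacted report"}
# Hypothetical access-control list: the IDs the owner means to publish.
PUBLIC_IDS = {1, 2}


def handle_request(doc_id, enforce_access_control=True):
    """Return (status, body) for a request for the document with this ID."""
    if doc_id not in DOCUMENTS:
        return HTTPStatus.NOT_FOUND, ""      # "that document doesn't exist"
    if enforce_access_control and doc_id not in PUBLIC_IDS:
        return HTTPStatus.FORBIDDEN, ""      # "no, you may not"
    return HTTPStatus.OK, DOCUMENTS[doc_id]  # sends the requested document
```

With `enforce_access_control=False`, the toy server hands over document 3 to anyone who asks for it by ID, which is roughly the behavior the comment above describes.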
The computer is not a person and what it does only matters insofar as you may infer that the owner of the property programmed it to do what the owner intended.
As you admit, the property owners did not intend those documents to be accessible. So the only relevant question is: would a reasonable person infer that documents which could only be accessed by editing a URL (by "tricking the HTTP server," if you insist on anthropomorphizing a dumb machine) were intended or not intended to be accessed?
I think most people would assume that documents that can only be accessed by editing an ID were not meant to be accessed. And that really is the end of the analysis.
I don't think you understand the web. I'm not anthropomorphizing anything. He literally sent a request for each document he wanted to look at and the server sent a response.
You keep referring to this hypothetical "reasonable person" who doesn't understand the very basic facts about technology, but the opinion you attribute to the "reasonable person" is just one you invented that happens to match your own.
> I think most people would assume that documents that can only be accessed by editing an ID were not meant to be accessed.
How would anyone know if the documents could only be accessed by editing the URL? Others in this thread have pointed out that some of those documents were indexed by Google, so actually, editing the URL is not the only way to get to them.
Computers always do what you _tell_ them to do, not what you want them to do.
The onus for keeping computerized material private is on the owner, and the owner screwed up royally by wrongly allowing sensitive material to be placed unprotected on a _public_ web site. Whether or not it was indexed is irrelevant - it was on a publicly accessible site, permissions set to publicly accessible, and the entire site was meant to be publicly accessible. One can close the analysis until the cows come home, it will not change this fact.
Accessing that material is as illegal as finding a diamond ring (or personal files) while dumpster diving. Dumpster diving may be seen as tasteless or low-class, but as far as I know, it’s not illegal.
Do we prosecute reporters for ferreting out publicly available, yet embarrassing, information?
Dumpster diving is legal (in most places but not all) because the owner has, by putting something in the trash, expressed their intent to not own the item in question anymore.
A website isn't a trash can though.
If I accidentally leave a diamond ring (or personal files) in public somewhere and you take them, that is absolutely theft.
A web server is a thing people use to make files publicly accessible - it has no other purpose. It has stronger expectations against privacy than a trash can.
As such, your analogies to situations (locked houses, unattended jewelry) with the opposite expectation just disprove your point. Assuming a file is private even though it's publicly accessible on a web server is as nonsensical as assuming an object is free for the taking even though it's an unattended diamond ring.
I mean, it's in the name: web server. It serves things to people when people ask:
- Hey, can I GET this drink?
- 200 OK, here it is pal.
- 204 Uh, the bottle appears to be empty
- 206 I have only half the ingredients for the mix
- 300 Stirred or shaken?
- 301 That drink is now called this, but here it is!
- 400 I can't understand what you say buddy, are you drunk?
- 403 I'm sorry, but I must refuse to serve you that drink
- 404 Oops, I can't seem to find the bottle
- 411 How much do you want?
- 413 That's too much drink!
- 418 I'm actually a teapot
- 503 Too busy right now!
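If anyone wants the official names behind the bartender's replies, here is a minimal sketch using the status table in Python's standard library. The `describe` helper is just an illustrative name, and 418 is left out because older Python versions don't include it.

```python
from http import HTTPStatus

# Codes from the list above, minus 418 (missing from older Python versions).
codes = [200, 204, 206, 300, 301, 400, 403, 404, 411, 413, 503]


def describe(code):
    """Return 'code phrase' using the standard library's status table."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in codes:
    print(describe(code))
```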
You can't, because GitHub hasn't made a mistake and accidentally made all private repos public.
If GitHub screwed up one day and all private repos were temporarily made public, it would be illegal for you to run a script that tried to scrape them all down to your personal hard drive.
But...you would have left the diamond ring by accident.
Files don't "accidentally" become publicly accessible via HTTP. i.e. you don't return to your computer one day to find everything is public.
Someone specifically took the steps to make this data public. The fact they didn't realize what they were doing isn't the fault of people that then view the data.
But as the person knows they are configuring a web server, I would say this is more carelessness / incompetence than an "accident" in the same way as losing a diamond ring would be.
These analogies involving valuable physical items are way off base. If I'm walking along the street and see a diamond ring lying there, I can only assume that it belongs to someone else and they've misplaced it (because it's very valuable and, crucially, there is no way for the owner to make use of its value if they've lost possession of it). I may not have any way to locate the owner, but I still recognize that it belongs to someone else and that for me to take it and keep it would deprive them of their property (probably, I should take it to the police).
If you insist on analogies involving lost rings, this situation is more like taking a picture of a ring someone lost in the street than it is like taking the ring.
No, that's a terrible analogy, because a picture of a diamond ring is worth much less than the ring. A copy of valuable information is generally worth just as much as the original. And while no-one would care if someone else had a picture of their diamond ring, they would care if someone else had a copy of their private information.
> Sure, but your carelessness doesn't absolve me of my crime.
If his carelessness meant communicating that you could take the ring without stealing it (say, by placing it in the donation basket instead of his wallet), that would absolve you of your crime.
> The onus for keeping computerized material private is on the owner
I don’t think that’s a sensible rule and at the end of the day, it’s not the one that’s going to prevail. The Internet will be sanitized and made safe for all the people who forget their passwords and write them on their monitors. The Internet is for ordinary people now, not curious teenager hackers. And ordinary people will make the rules to suit themselves.
> The Internet will be sanitized and made safe for all the people who forget their passwords and write them on their monitors.
And how, exactly, is this "sanitization" going to occur? Are you saying that having 15 police officers raid a home and confiscate multiple computers (all but one of which had nothing to do with the incident in question), arresting a completely uninvolved person on his way to school, and taking no action at all against the stupid contractor who set up the website, is an acceptable form of "sanitization"?
> The Internet is for ordinary people now, not curious teenager hackers.
That's not what the police action described in the article is saying. It's saying the Internet is for government and corporations, and God help the ordinary people who get in their way. (Btw, I include "curious teenager hackers" in "ordinary people". Perhaps the fact that you don't is part of the problem.)
This can't work, for the simple reason that the Internet has global reach. Unlike other kinds of personal property, a Web server is accessible to the entire world. There are lots of people out there who have no reason to respect US or Canadian law, and there always will be. Prosecuting this young man, or Weev, may make some "ordinary" people feel better, but doesn't begin to deter any actual criminals.
The only solution is site owners taking responsibility for securing their sites, in accordance with the sensitivity of the information on them. The sooner "ordinary" people realize that, the better.
Ordinary young people today are probably even less computer literate than ordinary people my age (mid 30s). They grew up being spoon fed the Internet through the FB and Snapchat apps on iPhones.
Your entire argument is based on this idiosyncratic theory of widespread ignorance. This theory is simply wrong, as a matter of fact. Even if it were true, no historical case ever turned on the unprovable notion that most people are too dumb to understand the truth.
Since you didn't respond when I raised it elsewhere in-thread, I would highlight again the fundamental imbalance between the rules you would impose on Facebook etc. and those you would impose on users. Firms that spend billions of dollars developing their systems only have to be as smart as the most ignorant person we can imagine. Their users, in contrast, must be geniuses to keep up with their many changes to TOS, interfaces, and functionality, while simultaneously those genius users aren't allowed to notice that numbers follow each other in sequence. This is nonsense on its face, but then again authoritarian maneuvers are their own justification, aren't they?
If we're on the topic of wishful thinking, I wish ordinary people understood basic things about the internet. The purpose of humanity as a whole shouldn't be to dumb things down for "ordinary people". It should be to better teach and educate new generations, so we won't be able to assume ordinary people are dumb.
Web scraping is a normal occurrence on the internet, not just limited to curious teenage hackers. When I grew up we didn't lock our doors because we knew and trusted our neighborhood. It's like you're asking 4chan to use polite vocabulary.
> I think most people would assume that documents that can only be accessed by editing an ID were not meant to be accessed. And that really is the end of the analysis.
You do realize HN provides an API that allows you to request any item by using an ID? [1]
> Stories, comments, jobs, Ask HNs and even polls are just items. They're identified by their ids, which are unique integers, and live under /v0/item/<id>.
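To make that concrete, here is a rough sketch of requesting an item by ID. The `/v0/item/<id>` path is the one quoted above; the base URL and the `.json` suffix are how I understand the public API to be served, and `item_url` and `fetch_item` are names I made up.

```python
import json
from urllib.request import urlopen

# Base URL of the public HN API (my understanding, not stated in the quote).
BASE_URL = "https://hacker-news.firebaseio.com"


def item_url(item_id):
    """Build the URL for one item; the ID is just an integer."""
    return f"{BASE_URL}/v0/item/{int(item_id)}.json"


def fetch_item(item_id):
    """Fetch one item as a dict (needs network access; not called here)."""
    with urlopen(item_url(item_id)) as response:
        return json.load(response)
```

Changing the integer is the whole interface: `item_url(8863)` and `item_url(8864)` differ only in the ID.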
If you really know better than everyone else who has replied to you on this story, why don't you point out the exact law that states accessing resources over HTTP is forbidden if not initiated from another resource originating from the target server? Otherwise, I'll assume your "analysis" is simply a subjective view on how you would like the web to work. A pretty limited and unrealistic view that wouldn't work in the real world.
1. I don't think you can access that story by starting from the front page, because scrolling for more stories only gets you to page 25. Does that mean the story is intended to be private?
2. You can now access it by using the DOM element generated for my comment. Does that mean it's public?
Not to mention there's more to things than HTTP. There could be plenty of other sources. Maybe I like using netcat just for kicks. Maybe I like hand-typing HTTP.
While odd, `printf "GET / HTTP/1.0\r\n\r\n" | nc 104.20.44.44 80` gets you the HN home page as well as anything.
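The same hand-typed request can be sketched without netcat. This is only an illustration: `build_request` and `send_request` are hypothetical helper names, and whether a given server answers a request with no `Host` header is up to that server.

```python
import socket


def build_request(path="/", host=None):
    """Build the raw bytes of a minimal HTTP/1.0 GET request."""
    lines = [f"GET {path} HTTP/1.0"]
    if host:
        lines.append(f"Host: {host}")  # optional in HTTP/1.0
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_request(address, port=80, path="/", host=None):
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((address, port), timeout=10) as conn:
        conn.sendall(build_request(path, host))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

`build_request()` produces exactly the bytes that the `printf` above pipes into `nc`.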
Unfortunately, the main counter argument in this thread, from what I gathered, is "ordinary people wouldn't do that". Including someone claiming that if your mom wouldn't do it (in this case), it's not legal: https://news.ycombinator.com/item?id=16854087
Sure, and the fact that HN has a note to that effect on its website is evidence that all of these items are intended to be publicly accessible. It's also obvious in any case that stories, comments, jobs, ask HNs and polls are intended to be public. In the case we're talking about here, it was far less obvious that the relevant documents were intended to be publicly available.
Of course they do. They consider whether or not to give me access. If they respond with 200, they are effectively telling me that the information is public and the request is approved. There's no law moral or legal that stops me from asking for information.
I could ask a law agent for classified information, but he's not going to prosecute me for asking questions. He could be suspicious and ask "how do you know a document with that number exists?". And I can reply "oh, I'm just asking for random numbers".
You can describe what a webserver does in anthropomorphic terms if you like, but it's not the webserver's "intentions" that are relevant. It's the intentions of the people who control the website and the intentions of the person who accesses it.
>There's no law moral or legal that stops me from asking for information.
I wouldn't be so confident of that if you haven't read up on the relevant laws. Many countries have prohibitions against unauthorized access that apply in circumstances where the access is not "unauthorized" in a technical sense relating to the details of the HTTP protocol. The law doesn't necessarily say what you would want it to say or what you would expect it to say. See e.g. the following example from the US. (I'm aware that the incident we're discussing occurred in Canada.)
> You can describe what a webserver does in anthropomorphic terms if you like, but it's not the webserver's "intentions" that are relevant. It's the intentions of the people who control the website and the intentions of the person who accesses it.
And how do you prove intent? This is a technical problem with technical protocols involved. Intent should be provided via the protocol. If the protocol says resources are public, unless otherwise stated, you can't rely on a human to answer, post factum, what resource is private.
I believe that’s something they teach you in law school. Lawyers have been working on that problem for a while! IANAL, but I don't think you are going to be able to find a concise answer to that question that goes beyond the immediately obvious.
>Intent should be provided via the protocol.
Sure, if you say so. That’s not how the law works, though.
Editing a URL is not "tricking the web server". The web server is designed to respond to URLs with the information they point to. Tricking the server would be doing something like sending malformed packets designed to cause the server to leak memory and display the contents of "hidden data" in an exposed field, i.e. causing it to behave in a way that was not intended.
You're being hugely disingenuous. The owner of these files set up their website, which includes deciding which files are and are not publicly accessible, and it is reasonable to expect that the files they made publicly accessible are the files they intended to be publicly accessible.
One can certainly make the counterargument that a lack of public links suggests the owner wanted them to be private, but you are pretending that there's no evidence whatsoever that the files were meant to be public, and that's plainly not true.
> I think most people would assume that documents that can only be accessed by editing an ID were not meant to be accessed.
I think most people don't have an intuitive understanding of this at all, which means you can get them to give any answer you want by crafting your description of the problem appropriately. That doesn't make such a procedure reasonable.
> would a reasonable person infer that documents which could only be accessed by editing a URL
Except there's no way to know whether that's the only way to access those documents. That's what access control is for. They could be linked from elsewhere for all you know, and it's perfectly reasonable to assume that if you can access the document by punching in a URL, then it is so accessible.
Just curious, not trying to trip you up: In your perspective, would it be trespassing to make a new website which has links to the original site with the edited URLs, without actually accessing those edited URLs? If such a website with links containing edited URLs already existed, would it be trespassing to follow those links?
Just curious about which one, or both, of those are trespassing in your perspective.
That's not true at all. For instance, a number of the records that were part of the download were indexed by Google.[1]
So it's more like going into your library, using the card stack, learning about a book, going to the shelf it is on, and then looking at all the books on the same shelf.
Somebody noticed that you were looking at all the books and called the cops on you. The cops break in and arrest you for looking at books. They tell you that the bookshelf is off-limits and has personal information.
Sure, the library creates its own card stack and Google is an external service; however, if you design websites for a living you expect Google to perform that functionality.
I mean, I designed a service where we wanted to make it easy to share private information, so we didn't use authorization. However, I realized that if I wanted the data to be private I should use a suitably long non-consecutive random ID for the resource. If anyone is guilty of criminal misconduct, it's the person who designed this asinine system or the executive who allowed it to be used on the internet.
Hell, I'd go so far as to say that the fact that the exact same system is still being used across the US is a sign that the company who runs the system is criminally negligent.
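The "suitably long non-consecutive random ID" idea mentioned above could look something like this. It is a minimal sketch and not the code from that service: `new_share_id` and the `example.com` URL layout are made up for illustration.

```python
import secrets


def new_share_id(num_bytes=32):
    """Return an unguessable URL-safe ID built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


# Hypothetical URL layout, for illustration only.
share_url = f"https://example.com/share/{new_share_id()}"
print(share_url)
```

Unlike sequential integers, knowing one such ID tells you nothing about the next one, so incrementing a number in the URL gets you nowhere.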
I think that's a bit harsh. The documents at that URL were understood to be freely available to the public.
As a physical analogy, I'd think about it more as one of those restaurant straw dispensers. He got tired of pressing the button each time for a new straw, and instead opened the lid and grabbed a bunch out.
> It did, however, damage the privacy of various Canadian citizens.
Did it? I understand that the stupid contractor who put this data on the website did (potentially--but note that nobody is saying that anyone has actually suffered harm because of that data being accessible). But did the teenager who got this bomb dropped on him damage anyone's privacy? As I understand it, he downloaded the data, put it on his hard drive, and left it there; it never went anywhere else.
Can you please send me a copy of your last 3 tax returns? My email address is in my HN profile.
I don't know you and don't particularly care about your financial situation, so I'm not gonna read them or share them with anyone else. I'll just keep them on my hard drive.
> Can you please send me a copy of your last 3 tax returns?
Options:
A) Sure, here you go. Oh wait! I didn't mean to send you those. You tricked me and stole my information. I'm going to send 15 police officers round to arrest you and then you're going to prison for years.
A) is not comparable to the current situation because you are the one initiating the action. I can't stop you from sending me an email so it can't be a crime on my part if you do so.
No, you are initiating the action by requesting the file from me. You did request the file, didn't you? Even though you should have known it wasn't public information?
That's not a correct counterargument. The information he got was understood to be public and there was no reason to expect or think there was any private information on there.
If the site had said "This site provides tax returns" then there would be reason to expect the files would contain private information.
The site in question gave no indication there would be private information in those files.
Also, technical nitpick, there are some countries where tax information is public so probably not the best thing to go with.
That may be so, but did he intend to damage their privacy? Probably not.
He can't be faulted for accidentally downloading some private information that was improperly mixed in with a bunch of public information that he was trying to download. He had no indication that the information he was retrieving was not supposed to be public.