Hacker News

> And I don't think I disagree with the court on schema vs. file layouts either.

I disagree that the law should prohibit disclosing "file layouts" but it's pretty clear that the law does block that, and I fundamentally agree with you that schemas are directly analogous to file layouts and thus restricted.



A SQL schema literally does not indicate the locations of data inside of a file. In fact, the whole reason schemas exist is to decouple the relationships between table rows and the pages and indexes that store that data. We had relational databases before SQL, and there are non-SQL relational (and non-relational) databases today, but you program them, at the query level, with code that is aware of what tables live where.

A schema is the opposite of a file layout. A schema is to a file layout what a Google search is to an IP address.


Let me put this differently.

If you tell me that you have a closet for your jackets and another closet for your shirts, you're telling me how clothes are laid out in your wardrobe. Specifically, you're telling me that you're laying those out separately, and able to deal with them independently, with little interference between the two. It's not the entirety of the layout information, but it sure is some of it.

If you tell me that you have a column for your first names and another column for your last names, you're telling me how names are laid out in your database('s files). Specifically, you're telling me that you're laying those out separately, and able to deal with them independently, with little interference between the two. It's not the entirety of the layout information, but it sure is some of it.

Sure -- in theory, you could be actually throwing everything together into a dumpster, then paying enough people to search it all in parallel when you want to retrieve that red jacket. If you're actually doing that, maybe you could legitimately claim that you haven't divulged anything about your closet's layout by telling me that shirts and jackets are separate. But chances are pretty darn good you're not actually doing that (and I would know this for a fact if I already somehow knew you were actually using closets built by Joe down the street), and thus actually are exposing layout information by telling me that you're storing them separately. One security implication of which is that, the moment that I get a glimpse of your closet and notice that it contains a shirt, I know it's not the one with the jackets, and I can skip it when trying to steal that expensive red jacket.


It's either a file layout or it is not a file layout. If you write an affidavit saying it's "sort of like a file layout", the conclusion will be that it is not one. Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts), and then they pulled up a definition of "schema" from Merriam-Webster, and the definition of "schema" was so abstract it could have matched anything.

If anybody on the Illinois Supreme Court had known what a schema actually was, we'd have won the case. Further, if the definition of "file layout" had been more material to the Chancery case, it would have been in the trial record that it wasn't one.


> Now, the Illinois Supreme Court found that it was a file layout (wrongly). But they didn't use any of this kind of message board logic to do it; they pulled up a definition for "file layout" from a technical dictionary (which, ironically, pretty clearly established, even more than this thread does, that schemas aren't file layouts)

"Wrongly" was exactly what I just spent an hour writing a long comment disputing, with a detailed explanation. Specifically, with a real-world analogy between “a description of the arrangement of the data in a file” and “a description of the arrangement of the clothes in your closet.”


If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk. Notably though that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types which might be interleaved. Or compression. Or indexes. Or copy on write related garbage collection. Or journaling. Or any number of other things.

Now if you wanted to argue that a schema serves the same purpose as a file layout, i.e. that it's how a programmer interfaces with the data, and that it impacts workload performance, that would be fair enough. And given that laws are all about intent, perhaps that would be relevant. (Or perhaps not. I haven't read about the case yet.)

But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.


> If I understand correctly, you're saying that you expect items in a column to tend to cluster near one another on disk.

That's one thing I'm saying would be sufficient to consider this file layout, yes. I'm not saying it's necessary. Databases can obviously be row-oriented too. Knowing that they don't cluster would also be layout information. As could any number of other things.

> Notably though that doesn't give you any sort of relative or absolute offset. Neither does it have anything to say about, for example, blocks of different types which might be interleaved. Or compression. Or indexes. Or copy on write related garbage collection. Or journaling. Or any number of other things.

It doesn't have to include offsets or any of those other things. File layout information could be as simple as "data should be aligned to a page boundary for performance" or "this field must reserve space for up to 16 characters" or even "data from different records should not be stored in an overlapping manner, to allow fast erasure"... I could go on. And notice the wardrobe layout example doesn't have offsets either, but the decision to separate jackets from shirts is absolutely one about layout nonetheless.

> But I think it's fairly reasonable to say that in typical usage an SQL schema is decidedly not a file layout in a literal sense.

It is not complete file layout information. But it certainly can be part of the file layout information.

Imagine you had a table with columns name1 VARCHAR(64) and name2 VARCHAR(64) in that order. Now imagine you modified a couple of bytes on the disk, such that you swap the 1 and the 2. You can imagine a database where that would be sufficient to confuse it into thinking the two columns had swapped contents, right? Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?
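For readers who want to try the byte-swap scenario above, here is a minimal sketch. It assumes SQLite (via Python's `sqlite3` module), which nobody in the thread named; the table name `people` and the sample values are made up for illustration. SQLite happens to store the `CREATE TABLE` text inside the database file, so editing those bytes changes which stored value a column name maps to. Other engines may behave quite differently.

```python
import os
import sqlite3
import tempfile

def swap_column_names_on_disk():
    """Create a tiny SQLite file, swap two column names by editing raw bytes,
    and return what `name1` reads as before and after the edit."""
    path = os.path.join(tempfile.mkdtemp(), "people.db")

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE people (name1 VARCHAR(64), name2 VARCHAR(64))")
    conn.execute("INSERT INTO people VALUES ('Ada', 'Lovelace')")
    conn.commit()
    before = conn.execute("SELECT name1 FROM people").fetchone()[0]
    conn.close()

    # SQLite keeps the CREATE TABLE text in the file; swap the two names there.
    old = b"name1 VARCHAR(64), name2 VARCHAR(64)"
    new = b"name2 VARCHAR(64), name1 VARCHAR(64)"
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, "wb") as f:
        f.write(raw.replace(old, new))

    # Reopen: the row bytes are untouched, only the stored schema text changed.
    conn = sqlite3.connect(path)
    after = conn.execute("SELECT name1 FROM people").fetchone()[0]
    conn.close()
    return before, after
```

Calling `swap_column_names_on_disk()` returns the value read through `name1` before and after the patch. Whether this counts as the schema "containing" layout information, or only as the engine combining schema and implementation, is exactly what the thread is disputing.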


Note that "some information related to the file layout" or "some information that has an impact on the file layout" is not "the file layout" in a literal sense. Thus it seems to me to follow that the answer to the question "is this a file layout" should be no.

Symbolically it isn't [ schema -> file layout ] it's [ schema, engine version -> file layout ]. Even if you had that additional information, neither item by itself nor even the pair together would be correctly considered a file layout. If I have a function f( foo, bar ) -> baz neither a foo nor a bar is a baz. I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.
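The [ schema, engine settings -> file layout ] idea can be sketched as follows. This assumes SQLite through Python's `sqlite3` module and uses `PRAGMA page_size` as a stand-in for "engine version"; the table name and sample row are hypothetical. The same schema and the same row are written twice, and only the engine setting differs.

```python
import os
import sqlite3
import tempfile

SCHEMA = "CREATE TABLE people (name1 VARCHAR(64), name2 VARCHAR(64))"

def build(page_size):
    """Build a database file from SCHEMA with the given page size.
    Returns (schema text as stored, rows, file size in bytes)."""
    path = os.path.join(tempfile.mkdtemp(), "people.db")
    conn = sqlite3.connect(path)
    # The page size has to be chosen before the first write to the file.
    conn.execute(f"PRAGMA page_size = {page_size}")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO people VALUES ('Ada', 'Lovelace')")
    conn.commit()
    stored = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'people'"
    ).fetchone()[0]
    rows = conn.execute("SELECT name1, name2 FROM people").fetchall()
    conn.close()
    return stored, rows, os.path.getsize(path)

small = build(1024)
large = build(8192)
```

Comparing `small` and `large` shows identical schema text and identical query results alongside different file sizes, which is one way to see the schema as an input to the layout rather than the layout itself.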

For that matter, even the [ schema -> file layout ] case isn't technically a file layout any more than a json blob is an xml blob. Being trivially translatable doesn't change the definition.

Compare that with the question (also commonly asked by courts) "is thing equivalent in intent (or use, or ...) to other thing" in which case the answer might feasibly be yes.

> Could you really claim the schema didn't contain any file layout information in that scenario, when it certainly affected which bytes are interpreted as belonging to which columns?

In that example you have made an educated guess about the file layout and then taken advantage of that (guessed) information. "You can imagine a database" tells you everything you need to know here, namely that this is entirely dependent on the implementation. So yes, I would claim that the schema did not on its own contain any file layout information though in conjunction with knowledge of the implementation it could be used to derive such.


> I can fairly trivially fix a sandwich out of bread, peanut butter, and jam; in no way does that imply that the three ingredients sitting next to each other on the counter are a sandwich.

What is "sandwich" in this analogy? Nobody is claiming the schema is a "database", or a "table". I was saying it's one component of the file layout.

Using your own analogy: if you know you put the jam near the peanut butter, you know part of the ingredient layout. You can't say "it's not ingredient layout if you haven't told me where the bread is."


The point about the sandwich was that the inputs to a function are not correctly referred to as its output. Those are distinct things.

If you wanted to further extend the analogy to apply to schemas then I guess the recipe would be the database engine and the final product that you eat would be the file layout. Knowing that the final dish will include jam does not mean that you have the final dish in your possession. The jam sitting on the counter is not the final dish.

Importantly, you don't even know how I'm going to use the jam. I could put it only on one half, or I could arrange it in stripes, or I could even use more than two pieces of bread! I might not even make a sandwich! I could even throw it all in a blender and make a (disgusting) smoothie.


I don't think "file layout" has to mean the exact location of every byte. An abstract file layout is still a file layout.


How can you literally interpret the two words "file layout" without it pertaining to the layout of a file?


We can successfully interpret the two words “guinea pig” without it pertaining to either pigs or things coming from Guinea, so I’m sure this is also possible.


I'm not sure whether 'file' necessarily has to refer to the 'Unix' view of a 'sequence of bytes'. Or could it just be 'some organisational unit of information', i.e. like the stuff you would put into a filing cabinet?

The 'sequence of bytes' view is just one specific level of abstraction. It's not what's actually on disk because of things like compression, encryption and fragmentation.

Database schemas are a different level of abstraction.


DBs can be files on disk though? Besides, they're a bit like easy hand rolling powder mix for filesystems. Filesystem entries have properties like filenames and inode numbers and file contents. Databases have columns like emails and membership IDs and their favorite cookies. I don't think "file layout" is an absurd framing.


It is in literally no sense a layout; the whole point of a schema is that it doesn't tie you down to a layout. SQL schemas make sense even in the absence of files!
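A small sketch of the "schemas make sense even in the absence of files" point, assuming SQLite's in-memory mode through Python's `sqlite3` module (the table and values are hypothetical). The schema is declared and queried as usual, while the engine reports no backing file for the database.

```python
import sqlite3

# ":memory:" asks SQLite for a database that is never backed by a file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name1 VARCHAR(64), name2 VARCHAR(64))")
conn.execute("INSERT INTO people VALUES ('Ada', 'Lovelace')")

# PRAGMA database_list reports (seq, name, file) for each attached database.
_, db_name, db_file = conn.execute("PRAGMA database_list").fetchone()

# The query names columns only; it says nothing about where bytes live.
rows = conn.execute("SELECT name2, name1 FROM people").fetchall()
```

Here `db_file` is empty while `rows` still comes back as expected. This only shows that the schema works without a file; it doesn't settle how a statute's use of the word "file" should be read.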


You suggest that we interpret "file formats" as exactly this -- no more, no less. This approach is also called "textualism". The other option is to interpret "file formats" in the context of the law that includes these words. Or: what exactly did the lawmakers have in mind when they said that (a) government needs to provide information; (b) except for several cases, of which one is (c) "file formats". What kind of information did they think it was ok for the government not to provide?

I agree with the Court's argument that "the information about how the actual information is stored and connected one piece to another" is what the lawmakers meant in this case.

- If the actual information is stored in the files, the government does not need to disclose how these files are organized ("file formats").

- If the actual information is stored in the database, the government does not need to disclose how the database is organized (database schema).

- If the actual information is stored in the block memory -- with structs and pointers -- the government does not need to disclose the structs and the pointers.

The "textualist" opponent would of course argue, as OP did, that the second and the third example aren't excepted by clause (c) because "when there is no file, there could be no file format". This however is missing the point (in my opinion), as it doesn't see the forest for the trees.


> A SQL schema literally does not indicate the locations of data inside of a file.

That's only true if you apply, e.g., the Unix definition of what a file on a file system is (a sequence of bytes or whatever).

For all we know, the law might take a broader view. Something like: a 'file' is anything that in the olden days you would have stuck into a filing cabinet.

The 'Unix' definition isn't even particularly natural: it's one specific level of abstraction. On disk, the bytes aren't necessarily laid out one after another. Especially with fragmentation, compression and encryption going on.

An SQL schema tells you how data is laid out in a different layer of abstraction than the Unix view of bytes. But that view isn't the only one that the law can mean by 'file'.


>> And I don't think I disagree with the court on schema vs. file layouts either.

> I disagree that the law should prohibit disclosing "file layouts"

Note, the court wasn't ruling what the law should say, only what the law says. At least that's my understanding of it. I certainly wasn't opining on what the law should say.


Understood. I mention that distinction only because I find many people (not you) who say that "X law doesn't apply because if it did, it would be bad" rather than directing their ire at the actual laws, which are poorly written, and at the legislators who are negligent in fixing those laws.

Courts should decide based on the law, not based on what is "good".


It seems like an unnecessarily ambiguous term.

Without additional context, I would interpret the term “file layout” to mean the file and directory structure of an application.

Such an application could potentially store data as plain files, the names of those files may contain personal or sensitive information.


> Without additional context, I would interpret the term “file layout” to mean the file and directory structure of an application.

I would interpret it to mean a description of what the file contains and where. This is information you need if you have a mysterious file and you want to parse it. It's also information you need if you have some data and you want to create a readable file that expresses it. But for the concept to apply to a database schema, (a) the database would have to be a file, and (b) the schema would have to specify where the information in the database is stored. That's difficult to do, since the schema has no knowledge of how much information there is in the database or how it might be written down.
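For contrast, here is roughly what "a description of what the file contains and where" looks like in the narrow sense used above. This is a sketch of a made-up fixed-width record format, not any real database's format: the field widths, offsets, and function names are all hypothetical.

```python
import struct

# Hypothetical fixed-width layout: two 64-byte, NUL-padded name fields.
# name1 occupies bytes 0..63 of each record, name2 occupies bytes 64..127.
RECORD = struct.Struct("<64s64s")
NAME1_OFFSET, NAME2_OFFSET = 0, 64

def write_record(name1, name2):
    """Serialize one record; struct pads short values with NUL bytes."""
    return RECORD.pack(name1.encode("utf-8"), name2.encode("utf-8"))

def read_record(blob):
    """Parse one record using only the layout description above."""
    raw1, raw2 = RECORD.unpack(blob)
    return (raw1.rstrip(b"\0").decode("utf-8"),
            raw2.rstrip(b"\0").decode("utf-8"))
```

With this description, `read_record(write_record("Ada", "Lovelace"))` round-trips, and a reader who knows only `NAME2_OFFSET` can pull the second field straight out of the bytes. A SQL `CREATE TABLE` statement with the same two columns carries the names and types but none of these offsets.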


> It seems like an unnecessarily ambiguous term.

Agreed, and I don't even understand why it's in there in the first place (it should just not be), but that's a job for the legislature to resolve, not the courts.



