
How is a database schema not a file layout?


The article describes why. 2 different db engines (or even instances) can use different file layouts for the same schema.

In many ways SQL is all about divorcing the schema from the files.


But on the other hand, in all database systems the schema is used to determine how the files are laid out. Although I suppose the same thing could be argued for any data that is stored in a file, excepting that a schema is metadata that determines the organisation of data so it's a bit of a special case.


In a Microsoft Word document, the section headings also tell Word how to lay out the Word document file.


Do you mean that section headings aren't a file layout? That's their entire purpose.

Edit: If you're talking about the byte representation only, I don't think section headings indicate the placement of the body's bytes.


Doesn't your interpretation mean that (coupled with the court ruling that file formats can't be FOIA'd) any document with sections cannot be requested via FOIA?


If this format is reused across many files, they might want to give the contents of those docs in a different format from the original.


You have found an argument that proves too much.


Yeah, coupled with the court's arguments, interpreting the sections in a document as a "file format" means no files with sections can be released via FOIA requests.


Arguably, all requests for files could be returned with all of the letters in the document but scrambled in a random order so as to obfuscate the file layout.


There's a solid chance that the schema gives away what DBMS is being used. But even if it didn't, I'd still call it a file layout in this context.


The DBMS is almost definitely going to be mentioned in the RFP or specification documentation, as it was in this lawsuit.


The gov't releasing the hardware and software licensing used in CANVAS already gives that away.


So?


So if you have the schema and the DBMS, you probably know how data is arranged in the files ("files" in the filesystem sense).


Is your argument that government agencies should also withhold the names of filing cabinet manufacturers? :)


Just that it's a file layout. Or even if you strictly define a file layout as say an ext4, NTFS, or FAT file tree, that revealing the schema is revealing the file layout.

I don't know why they don't want to reveal file layouts, but for whatever reason, they decided it was "per se" exempt regardless of the security implications.


It's obviously not a file format. The same SQL schema can generate N different files, with N different layouts, for N different databases. By the logic you're using ("schema" + "database vendor" = "file format"), a Word document outline is also a file format.
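A rough sketch of the "same schema, different files" point, using SQLite only because it ships with Python (it is an assumption for illustration, not the DBMS in the lawsuit). The `tickets` table and the `build`/`compare_layouts` helpers are hypothetical. The same CREATE TABLE statement is stored under two page sizes, and the resulting files differ on disk while the schema text stays identical:

```python
import os
import sqlite3
import tempfile

# Hypothetical schema, used only for illustration.
SCHEMA = "CREATE TABLE tickets (id INTEGER PRIMARY KEY, plate TEXT, fine_cents INTEGER)"


def build(path, page_size):
    """Create a database at `path` with the given page size; return (schema text, file bytes)."""
    conn = sqlite3.connect(path)
    # page_size only takes effect if set before the database file is first written.
    conn.execute(f"PRAGMA page_size = {page_size}")
    conn.execute(SCHEMA)
    conn.commit()
    stored = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'tickets'"
    ).fetchone()[0]
    conn.close()
    with open(path, "rb") as f:
        return stored, f.read()


def compare_layouts():
    """Return (schemas identical?, size of file A, size of file B)."""
    with tempfile.TemporaryDirectory() as d:
        schema_a, bytes_a = build(os.path.join(d, "a.db"), 512)
        schema_b, bytes_b = build(os.path.join(d, "b.db"), 4096)
    return schema_a == schema_b, len(bytes_a), len(bytes_b)


if __name__ == "__main__":
    print(compare_layouts())
```

Both files answer to the same schema, yet their bytes are arranged differently, and that is with a single engine and one tuning knob.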


The parent asks "how is it not a file layout?", not "can you guess the file layout given it?"

I am a human, you know I have a kidney, but I am not a kidney.


If you send a copy of the code, is that sending the code? If it is, what about sending a copy of the code with a Caesar Shift?


Another way to think about it is that if a SQL schema is a file, so is an Excel spreadsheet template.


It's interesting that the opening analogy in the post uses an Excel spreadsheet as a great way to explain a database. It's such an easy next step to say the way an xls/ods file is saved is a file format but the column layout in the tabs/tables are the schemas. The court (and the city) playing these games is so scary since it is so biased toward all modern government data being covered by FOIA exemptions.


An Excel spreadsheet template is an arrangement of rows/columns/cells which is encoded in an XML document which is encoded in a ZIP file archive.
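To make that nesting concrete, here is a toy sketch: column headers, wrapped in XML, wrapped in a ZIP container. The element names (`sheet`, `row`, `cell`) and the `sheet.xml` entry name are made up for illustration and are not the real Excel format; `make_container` and `read_headers` are hypothetical helpers.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def make_container(headers):
    """Encode a list of column headers as XML, then store that XML inside a ZIP archive."""
    root = ET.Element("sheet")          # made-up element names, not real Excel XML
    row = ET.SubElement(root, "row")
    for header in headers:
        ET.SubElement(row, "cell").text = header
    xml_bytes = ET.tostring(root, encoding="utf-8")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("sheet.xml", xml_bytes)
    return buf.getvalue()


def read_headers(blob):
    """Peel the layers back off: ZIP, then XML, then the column arrangement."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        root = ET.fromstring(zf.read("sheet.xml"))
    return [cell.text for cell in root.iter("cell")]


if __name__ == "__main__":
    blob = make_container(["plate", "fine_cents"])
    print(blob[:2], read_headers(blob))
```

The column arrangement survives the round trip unchanged; only the outer two layers are about bytes in a file.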


I don't follow your point.


Yes, it's a file format.

(Kinda a file format inside a file format inside a file format.)


"Excel" is a file format, but my point is that if a schema is a file format, so are the contents of an Excel spreadsheet.


File or file layout? Cause both of these are probably stored as files, .sql and .xltx respectively.


It literally does not describe a file, and does not literally describe the data layout of anything on disk (though with enough knowledge, you may be able to infer facts about probable layouts).


> does not literally describe the data layout of anything on disk

Huh? Depends on the DBMS, but each InnoDB table is a file.

And the schema determines the file structure.


Schema is an abstraction over the file structure. Different RDBMSes will use different file layouts for a given schema. The same RDBMS may even have different engines that use different file layouts, or may change file layout between major versions.

"Determines" is too weak: it must be "is". If "schema is file layout" is true, then sure, a schema is a file layout. But if it is merely "schema determines file layout", then no, a schema is not a file layout.


Abstractions are notoriously leaky in DBMSes. First off, they don't even use the same SQL spec. Give me a schema that uses anything Postgres-specific, and I can tell you what the bytes on disk look like for a given row or index.

I think it's a moot point anyway because the language is broader than just files in the filesystem sense, which is basically what the court said too.


> but each InnoDB table is a file.

A table isn't a schema, it is a component of a schema, and most databases don't use InnoDB.


> it is a component of a schema

So if you have the schema, you have the tables.


The schema describes the database layout. The file layout (if you were going to call it that) in a modern RDBMS would describe how the RDBMS implemented a particular database layout as described by the schema.


Because it doesn't describe how data is laid out on disk.


Neither does a file layout. FS will decide that... even then, not physically.


We're talking about "file layout" at the application level, not the filesystem level.

But your comment illustrates just how difficult it is to nail these things down, based on inherently imprecise language.


So you mean the filetree and file contents, as seen by a userspace program?

It's meant to be imprecise, because they didn't want some "gotcha." If they say we won't reveal the disk layout, technically you can't tell that from the filetree. If they won't reveal the filetree, but this is SQLite, it's always a single file. If it's file tree + contents, well the CPU byte endianness might matter for some DBMSes, even though you could just try both.
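A minimal sketch of the endianness point, assuming a made-up fixed-width row (a 4-byte id followed by an 8-byte amount). This is not any real DBMS's on-disk format, and `encode_row` is a hypothetical helper:

```python
import struct


def encode_row(row_id, fine_cents, byte_order):
    """Pack one hypothetical row; byte_order is "<" (little-endian) or ">" (big-endian)."""
    return struct.pack(f"{byte_order}iq", row_id, fine_cents)


little = encode_row(1, 5000, "<")
big = encode_row(1, 5000, ">")

if __name__ == "__main__":
    print(little.hex(), big.hex())
    # "Just try both": decoding with the wrong byte order gives implausible values.
    print(struct.unpack("<iq", little), struct.unpack(">iq", little))
```

Same schema and same values, but the bytes differ, and reading them back with the wrong order produces obviously wrong numbers, which is why trying both is usually enough.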


We can't FOIA details about how an xls file is laid out internally, despite that xls file being FOIA-able itself. That's the file format we're talking about.



