
Linus Torvalds: “I'm happily hacking on a new save format using ‘libgit2’” - hebz0rl
https://plus.google.com/+LinusTorvalds/posts/X2XVf9Q7MfV
======
mrcharles
On the game I'm currently working on, it's built very heavily around Lua. So
for the save system, we simply fill a large Lua table, and then write that to
disk, as Lua code. The 'save' file then simply becomes a Lua file that can be
read directly into Lua.

This is absolutely amazing for debugging purposes. Also you never have to
worry about corrupt save files or anything of its ilk. Development is easier,
diagnosing problems is easier, and using a programmatic data structure on the
backend means that you can pretty much keep things clean and forward
compatible with ease.

(Oh, also being able to debug by altering the save file in any way you want is
a godsend).
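For illustration, the same "state as code" trick translates to other dynamic languages. Here is a minimal Python sketch of the idea (function names are made up): dump the state table as a source literal, read it back by parsing literals only.

```python
import ast

def save_state(state, path):
    # Write the state dict out as Python literal source -- the
    # analogue of dumping a Lua table as Lua code.
    with open(path, "w") as f:
        f.write(repr(state))

def load_state(path):
    # literal_eval parses literals only, so a tampered save file
    # cannot execute arbitrary code on load (unlike Lua's loadfile,
    # which runs the save file as a program).
    with open(path) as f:
        return ast.literal_eval(f.read())
```

The result is exactly what the comment describes: the save file is human-readable, diffable, and hand-editable for debugging.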

~~~
Touche
That's how people are going to cheat at your game.

~~~
outworlder
The save format does not matter at all. It wouldn't matter even if it were an
obscure, made-up format. All it would do is slow down 'cheaters' by half an
hour.

The only argument against human-editable text files is parsing speed, not
security.

~~~
dinkumthinkum
Data size has a bit to do with it. Not trying to be pedantic, just adding
that.

------
bhaak
What's with all the XML hate? Of course, doing everything in XML is a stupid
idea (e.g. XSLT and Ant) and thanks heaven that hype is over.

But if I want something that is able to express data structures customized by
myself, usually with hierarchical data that can be verified for validity and
syntax (XML Schemas or old-school DTD), what other options are there?

Doing hierarchical data in SQL is a bitch and if you want to transfer it, well
good luck with a SQL dump. JSON and other lightweight markup languages fail
the verification requirement.

~~~
Sharlin
The issue is probably that 99.999% of all XML use cases don't use (or need)
the verification aspect. For all of those, XML is overkill. Besides, surely it
would be possible to design a verification layer on top of JSON, for instance
- the fact that one does not currently exist does not mean that XML (and abuse
of XML!) should not be criticized.
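Such a verification layer need not be heavyweight (JSON Schema is one standardized attempt). A toy structural validator in Python, just to show the shape of the idea:

```python
import json

def check(value, schema):
    """Minimal structural validator: a dict schema maps required keys to
    sub-schemas, a one-element list means a homogeneous array, and a bare
    type (str, int, ...) means "value must be an instance of this"."""
    if isinstance(schema, dict):
        return (isinstance(value, dict)
                and all(k in value and check(value[k], s)
                        for k, s in schema.items()))
    if isinstance(schema, list):
        return isinstance(value, list) and all(check(v, schema[0]) for v in value)
    return isinstance(value, schema)

doc = json.loads('{"name": "bob", "tags": ["admin"]}')
assert check(doc, {"name": str, "tags": [str]})
```

This is nowhere near a DTD or XML Schema in expressiveness, but it covers the common "right keys, right types" check without XML's weight.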

~~~
bananas
One of the core aspects of XML that is really important is that no typing is
inferred by the structure of the file unlike JSON. JSON is by nature tied to
the JavaScript type system which is sparse and inaccurate. For example, if you
look at the following:

    { "name": "bob", "salary": 1e999 }

Ah crap! The deserializer blew up (in most cases silently converting the
number to null).

    <person>
       <name>bob</name>
       <salary>1e999</salary>
    </person>

No problem. The consumer can throw that at their big decimal deserialiser.

And the following is _not_ acceptable as it breaks the semantics of JSON and
requires a secondary deserialisation step as strings ain't numbers...

    { "name": "bob", "salary": "1e999" }

JSON is a popular format but it's awful.
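Python's json module exhibits exactly this failure mode, and also shows the "big decimal deserialiser" escape hatch:

```python
import json
from decimal import Decimal

# Default parsing goes through float(), which overflows 1e999 to infinity.
assert json.loads('{"salary": 1e999}')["salary"] == float("inf")

# Hooking in a big-decimal deserialiser keeps the value exact.
exact = json.loads('{"salary": 1e999}', parse_float=Decimal)
assert exact["salary"] == Decimal("1e999")
```

So the information isn't necessarily destroyed by the JSON text itself; it's the default number type of most consumers that destroys it.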

~~~
halflings
JSON's semantics are that you represent numbers by their decimal
representation.

In this particular case, you're giving a different representation, so of
course you can pass it as a string.

~~~
andor
His point was that this number is too large to store in a JavaScript Number
variable (which is an IEEE 754 double).

~~~
shawnz
OK, so the provided number format is not sufficient for the kind of numbers he
is trying to deal with. So instead you would represent it as a string and
handle the encoding/decoding of that number yourself. How is that different
from the XML way where there is no provided number format to begin with, and
everything is a string?
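The string-encode-and-convert approach sketched here (helper names are made up) looks like this in Python:

```python
import json
from decimal import Decimal

def dump_record(d):
    # Encode the big number as a string so no precision is lost in transit.
    return json.dumps({"name": d["name"], "salary": str(d["salary"])})

def load_record(s):
    # The "secondary deserialisation step": string -> Decimal at the edge,
    # exactly as every XML consumer must do anyway.
    obj = json.loads(s)
    obj["salary"] = Decimal(obj["salary"])
    return obj

rec = load_record(dump_record({"name": "bob", "salary": Decimal("1e999")}))
assert rec["salary"] == Decimal("1e999")
```

Which supports the point: once you need arbitrary precision, JSON-with-strings and XML are doing the same amount of work.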

------
bananas
I think this title is wrong.

Firstly some clarification - this appears to just be about the persistence
format for his dive log. It was XML, now it's git based with plain text.

As someone who had to manage a system which worked with plain text files
structured in a filesystem for a number of years in the 1990s, this is done to
death already.

You now end up with the following problems: locking, synchronising filesystem
state with the program, inode usage, file handles to manage galore and
concurrency. All sorts.

Basically this is a "look I've discovered maildir and stuffed it in a git
repo".

Not saying there is a better solution but this isn't a magic bullet. It's just
a different set of pain.

~~~
xsace
Maybe you want to wait till he releases something. Because, you know, if it
took him months to get the big picture in mind, I doubt you can grasp what he
envisions just by reading his comment.

~~~
bsder
Yes, because his design of git was so well-formed.

Git is so well-designed that _expert users_ manage to trash their repositories
and propagate the damage.

Maybe that's not a problem of libgit. But tools are both the infrastructure
_and_ the UI.

~~~
taeric
Not sure what you are referring to. What are some common ways "expert users"
manage to "trash their repositories?"

~~~
bsder
Let's start here: [http://randyfay.com/content/avoiding-git-disasters-gory-
stor...](http://randyfay.com/content/avoiding-git-disasters-gory-story)

So, the solution to the fact that the merging UI is a pile of garbage is _HAVE
A SINGLE PERSON ALWAYS DO THE MERGE_. Excuse me? The whole point of a
distributed revision control system is so I don't have to have a single choke
point. That's the definition of _distributed_.

Then there was the KDE disaster:
[http://jefferai.org/2013/03/29/distillation/](http://jefferai.org/2013/03/29/distillation/)

Yeah, the root fault wasn't Git. However, at no point did Git flag that
something was going horribly wrong as the repository got corrupted and
deleted. Other distributed SCM systems I have used tend to squawk very loudly
if something comes off disk wrong.

Maybe the underlying git data structures are fine, but, man, the UI is a pile
of crap.

And, I won't even get into rebase, because that seems to be a religious
argument.

~~~
smharris65
The issues in the randyfay.com post are due to a misunderstanding when using
git as a "centralized" repo like SVN. Git, by design, does not enforce a
central repo even if you designate one logically. These issues can be
completely avoided if you merge the right way:

[http://tech.novapost.fr/merging-the-right-way-
en.html](http://tech.novapost.fr/merging-the-right-way-en.html)

~~~
pjc50
Well, that confirms that the "obvious" workflow of "git pull" is dangerous. At
least it explains all the spurious merges. Why on earth did it ship with this
broken design? Why doesn't git pull do the right thing by default?

~~~
judk
Yup, and Windows is broken because ctrl-c copies text instead of killing a
process.

Why doesn't Windows do the right thing by default?

Oh, it's because a different system behaves differently.

DVCS is fundamentally more complex than VCS.

------
WalterBright
Back in the bad old DOS days, instead of creating a file format for
saving/loading the configuration of the text editor, I simply wrote out the
image in memory of the executable to the executable file. (The configuration
was written to static global variables.)

Running the new executable then loaded the new configuration. This worked like
a champ, up until the Age of Antivirus Software, which always had much grief
over writing to executable files.

It's a trick I learned from the original Fortran version of ADVENT.

~~~
strictfp
Thank you for that anecdote, it made my day. Simply awesome.

~~~
WalterBright
I learned a heckuva lot from reading the ADVENT Fortran source code. I was
floored when I figured out how it was saving its configuration - such a
brilliant idea. And in DOS it could be implemented in about 5 lines of simple
C code. (Code size was critical in the old 64Kb days.)

The other huge thing I learned from ADVENT was polymorphism. The comment in
the source code "the troll is a modified dwarf" was an epiphany for me.

------
jmnicolas
From the comments (Tristan Colgate) :

"XML is what you do to a sysadmin if waterboarding him would get you fired.﻿"

Made my day :-)

~~~
Ygg2
That's just mean. Waterboarding isn't that bad...

~~~
jmnicolas
But it gets you fired ... on the other hand, nobody has ever been fired for
using XML.

------
lifeisstillgood
What I like is the "I dont start prototyping till I have a good mental
picture"

I am currently stuck on a project I want to start because I cannot get it to
fit right in my (future) head. And I am glad I am not an idiot for not being
able to knock out my next great project in between lattes.

(Ok, in direct comparison terms I am an idiot, but at least it's not
compounded)

~~~
specialist

      "A change in perspective is worth 80 IQ points."
      
      -- Alan Kay
    

My biggest hurdle solving new problems is divining a unifying, simplifying
metaphor. Once you have the right notion, that Eureka! moment, everything
falls into place, like magic.

Like how Kepler was able to fully explain Brahe's astronomical data once he
realized the planets orbit the sun.

Personal example: I used to write print production software. Placing pages
onto much larger sheets of paper that get folded and bound into a book. A task
called image positioning aka imposition. It took me years to figure out how to
model the problem. Key insight was simulating the work backwards, from binding
back to the press. Then when I showed the new solution to my coworkers, the
response was "Well, duh."

------
tzury
I just realized that Linus' posts are the only reason I ever go to Google
Plus.

~~~
cbsmith
The question nobody is asking, but actually should is: I wonder what other
good G+ content you are missing?

G+ is largely misunderstood. It is a lousy tool for interaction with people
connected to you purely socially. It's a very good way to find and interact
with people connected to you by interest.

~~~
icefox
The really sad thing is that I have tried several times to _search_ for
content that I know exists on G+, but I can't find it, even when I knew the
author. After the third time failing at this my usage of G+ dropped
significantly. Of all of the things that you would think would work search
would be at the top... :|

~~~
tmzt
Right, if I could subscribe to a Circle with all of the kernel devs in it I
would.

G+ is actually a great place to read long form messages and comments, but
doesn't really have content discovery down.

------
oneeyedpigeon
I don't quite get Linus' problem with XML for document markup (for anything
else - config files, build scripts - sure, XML is horrible). Does anyone know
any more details about what his specific gripe is? For me, asciidoc (which
looks very similar, conceptually, to markdown) suffers from one huge problem:
it's incomplete. Substituting symbols for words results in a more limited
vocabulary, if that vocabulary is to remain at all memorable.

Sure, XML _can_ be nasty, but that's very much a function of the care taken to
a) format the file sensibly b) use appropriate structure (i.e. be as specific
as necessary, and no more).

~~~
babarock
For those who missed it, here's what Linus wrote in the comments:

"+Aaron Traas no, XML isn't even good for document markup.

Use 'asciidoc' for document markup. Really. It's actually readable by humans,
and easier to parse and way more flexible than XML.

XML is crap. Really. There are no excuses. XML is nasty to parse for humans,
and it's a disaster to parse even for computers. There's just no reason for
that horrible crap to exist.

As to JSON, it's certainly a better format than XML both for humans and
computers, but it ends up sharing a lot of the same issues in the end: putting
everything in one file is just not a good idea. There's a reason people end up
using simple databases for a lot of things.

INI files are fine for simple config stuff. I still think that "git config" is
a good implementation."
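For what it's worth, git-config-style INI parses cleanly with stock tooling. A sketch using Python's configparser (the values below are made-up examples; note that real git config's subsection syntax like [remote "origin"] would be treated by configparser as an opaque section name):

```python
import configparser

ini = """
[user]
name = Linus Torvalds
email = torvalds@example.org

[core]
bare = false
"""

cfg = configparser.ConfigParser()
cfg.read_string(ini)
# Sections and keys come back as a simple nested mapping, with
# typed accessors for booleans, ints, etc.
assert cfg["user"]["name"] == "Linus Torvalds"
assert cfg["core"].getboolean("bare") is False
```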

~~~
bhaak
Linus' aversion to XML also explains why git's output is so abysmally
inconsistent to parse.

Subversion has a really good XML output for its log command which is a joy to
use (and that's saying something if you work with XML), whereas with git you
always have ugly format options that are most of the time underdocumented.

~~~
davvid
I disagree. It's actually quite simple, and _fast_.

Git's output was designed in the Unix spirit; you can parse it very quickly
without needing a parser toolchain.

It's also extensively documented: git help log, etc
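As a sketch of the Unix-spirit approach: assuming output produced with `git log --pretty=format:'%H|%an|%s'` (those are real git format placeholders: commit hash, author name, subject), a split is all the "parser" you need. The sample lines below are invented for illustration:

```python
# Sample output as from: git log --pretty=format:'%H|%an|%s'
# (hashes and subjects are made up)
sample = """\
9fceb02d0ae598e95dc970b74767f19372d61af8|Linus Torvalds|Initial commit
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0|Dirk Hohndel|Parse dive log"""

commits = [
    # maxsplit=2 keeps any '|' inside the subject line intact
    dict(zip(("sha", "author", "subject"), line.split("|", 2)))
    for line in sample.splitlines()
]
assert commits[0]["author"] == "Linus Torvalds"
```

No parser toolchain, no DOM, and it streams line by line.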

------
josephlord
[https://github.com/torvalds/subsurface](https://github.com/torvalds/subsurface)

I didn't really know what he was talking about but I think this is it.

The title does need changing though as it is definitely file formats under
discussion not file systems.

------
vfclists
What is it with HN commenters and their demented ability to send topics
completely off track? I would have thought someone might have examined the code
or what Linus is trying to implement and comment about it.

But here we have threads about Lua, why people hate XML and love JSON, and all
kinds of irrelevant issues which have been well hashed elsewhere ad nauseam.
Why not restrict it to an analysis of whatever it is Linus is developing?

HN is getting truly annoying and sucky, if it isn't so already.

------
fuzzix
> "I actually want to have a good mental picture of what I'm doing before I
> start prototyping. And while I had a high-level notion of what I wanted, I
> didn't have enough of a idea of the details to really start coding."

This I like. The race away from the waterfall straw man has also stripped us
of the advantages of BDUF.

While rigid phase-driven project management helps nobody, I think there's
still room for speccing as much as we can upfront within iterative processes.

Or you could run to the IDE and start ramming design pattern boilerplate down
its throat the second you're out of the first meeting ;)

~~~
hvidgaard
You should be speccing what you want to achieve: the goals, the why, the
impact, the external limitations, measures of success and so forth. This also
allows you to describe and plan testing up front. The "how" is best handled in
an iterative manner.

A lot of people use AGILE to avoid planning at all, which is a particularly
destructive anti-pattern, and the exact opposite of what you need.

~~~
pessimizer
>The "how" is best handled in an iterative manner.

I think that the first "how" should be planned as much as anything else. I
understand how you refactor from v0.0.1 to v5.34.2 iteratively, but I think
that getting from vNothing to v0.0.1 is qualitatively different.

If I don't have a complete idea of how my minimally functional thing will work
that is small enough that I can completely hold it in my head, and instead
just architect by agglutination and test writing, 1) my results are going to
be hacky garbage, 2) my first 50 iterations are going to be devoted to
replacing it all haphazardly to fix bugs, and 3) the code and interface will
become increasingly more complex, harder to work with, and strewn with special
cases.

When v0.0.1 is well planned, v2.5.2 may not look anything like the plan
anymore, but in my experience it becomes shorter, cleaner, and more correct
rather than a giant ball of band-aids propped up with tests.

~~~
hvidgaard
Personally, once "the goals, the why, the impact, the external limitations"
have been defined, we start to do mockups, a programmer's "differential", to
zero in on a solution that satisfies the demand.

With that on the board/paper/code, we can start to test our assumptions and
iterate on the solution. I do not know if that classifies as planning, but it
works very well.

------
splitbrain
he talks about a save file format, not a file system. or do we have different
concepts of "file system"?

~~~
sp332
I agree it's confusing, I think the submitter just meant "system for files" or
something.

~~~
Pxtl
That would be excusable if we were talking about somebody who writes higher-
level programs, but not for a kernel developer.

~~~
saraid216
The submitter isn't Linus.

------
k2enemy
I don't really understand what he's talking about here (my ignorance, not his
fault.) Is it something like
[https://camlistore.org/](https://camlistore.org/) that is a content-
addressable (the git part) datastore?

~~~
saljam
Yep, I thought it sounded like Camlistore, but as a library.

------
pcj
>>So I've been thinking about this for basically months, but the way I work, I
actually want to have a good mental picture of what I'm doing before I start
prototyping. And while I had a high-level notion of what I wanted, I didn't
have enough of a idea of the details to really start coding.

This might be a tangential discussion. Earlier, I used to have a similar
approach. Can't code until I have the complete picture. But it's tough to do
in the commercial world where you have deliverables. So, nowadays, I start with
what I know and scramble my way until I get a better picture. There are times
when that approach works. But, there have been days where I was like - "wish I
had spent some more time thinking about this".

I am curious how folks on HN handle this "coding block".

~~~
tonyarkles
I've got a few strategies that might help, depending on the circumstances.

A notebook: I'll write down some notes and just kind of free write whatever
thoughts come to mind. If there's something that I think is important to come
back to, I'll draw an empty box in the left margin (to be filled with a check
mark later)

Readme: start writing the Readme for the project, even if you're not entirely
sure of the details. Include code examples. If you don't like how the API is
coming together, change it. It's way less work to modify the API now than it
will be later.

Write a test: I don't always unit test, but when I do I test first :). This
works well on projects that already have a decent test suite. It's kind of an
executable version of the Readme.

Branch and Hack: branches are cheap. Make one and start playing. Don't like
how it's turning out? Make a new branch and try again!

Ctrl-Z: maybe the answer won't come to you right away. Let it sit and run in
the background for a while and come back to it. If I'm worried about
forgetting details, I'll write it down in a notebook first.

------
aashishkoirala
This is what Linus does. He has strong opinions and he throws them around. You
can't let that get to you. Both XML and JSON are just fine if used properly.

~~~
theandrewbailey
This is the first profanity-free Linus rant that I've read in a long time.

~~~
vacri
Almost all of Torvalds' "profanity rants" that get passed around are the
result of frustration at an existing conversation, and you can find profanity-
free comments by him simply by checking out a slightly earlier one.

------
beagle3
And the actual description is here:
[http://lists.hohndel.org/pipermail/subsurface/2014-March/010...](http://lists.hohndel.org/pipermail/subsurface/2014-March/010592.html)

------
tedchs
Why reinvent on-disk data formats when you can just make a file of protocol
buffers?
[https://code.google.com/p/protobuf/](https://code.google.com/p/protobuf/)

~~~
sparkie
Why reinvent binary serialization when you could use ASN.1, or any of the
thousand binary serialization formats that pre-date protobufs?

~~~
lern_too_spel
For that specific example, you can find a good discussion here:
[https://groups.google.com/forum/m/#!topic/protobuf/eNAZlnPKV...](https://groups.google.com/forum/m/#!topic/protobuf/eNAZlnPKVW4)

------
Gonzih
Current title that I see "Linus Torvalds on implementation of human-readable
file system" is off. It's about file formats, not file systems.

------
senthilnayagam
Why do you need to view the filesystem and make it readable for humans? You
would interact with it via commands like "ls" or some GUI.

git as the basis of a filesystem is interesting; hope we don't need to
manually make branches and commits to use it.

~~~
oneeyedpigeon
Did you read the article? It's not really about the filesystem. 1 part your
fault for seemingly not reading the article you're commenting about, 1 part
the submitter's fault for choosing such a misleading title.

------
joelhaasnoot
Worked on a project a few years ago where we needed distributed sync
capability. Using git (or bazaar or mercurial) was one of the options - store
everything in it versus a database. Interesting to see the same thought
"coming back".

~~~
fit2rule
I've also used libgit as a means to a similar end - providing versioned data
across a local filesystem. It's an idea whose time has come...

------
hardwaresofton
Why not sqlite or s-expressions? Linus states that databases can't hold
previous state but that's not really true...

I'm not sure why git is the best tool for the job in this case, even after
reading the post & some of the contents.

~~~
tmzt
They can, if you recreate the primary feature of Git on top of them.
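Git's "primary feature" here is content-addressed, immutable object storage, and that layers onto any key-value store. A sketch that reuses git's actual blob hashing rule (SHA-1 over "blob <size>\0<content>") on top of a plain dict standing in for the database:

```python
import hashlib

class BlobStore:
    """Content-addressable store using git's blob hashing scheme."""

    def __init__(self):
        self.objects = {}  # stands in for any key-value database

    def put(self, data: bytes) -> str:
        # git hashes the header "blob <size>\0" plus the content.
        sha = hashlib.sha1(b"blob %d\x00%s" % (len(data), data)).hexdigest()
        self.objects[sha] = data  # identical content is stored only once
        return sha

    def get(self, sha: str) -> bytes:
        return self.objects[sha]

store = BlobStore()
# The empty blob hashes to git's well-known e69de29... object id.
assert store.put(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```

Of course, recovering the rest (trees, commits, refs, packfiles) is where the real work is, which is rather the parent's point.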

------
signa11
Erik Naggum's most excellent XML rant:
[http://www.schnada.de/grapt/eriknaggum-
xmlrant.html](http://www.schnada.de/grapt/eriknaggum-xmlrant.html)

------
sam_bwut
At work we have a git backed document store that just saves as json -
versioning makes keeping track of audit points nice and easy.

------
twic
Title is entirely misleading. Tech support! TECH SUPPORT!!

~~~
anon4
Have you tried turning it off and on again?

------
meapix
XML haters!!! Using other formats, how can I define DTDs?

~~~
1ris
[https://news.ycombinator.com/item?id=7333354](https://news.ycombinator.com/item?id=7333354)

