
The Linux kernel has over 15 million lines of code, and people normally don't hold that against it. Judging a piece of software by its LOC count is a fallacy.

A project with rigorous error handling and testing will have more LOCs than a corresponding project without.

Some problems are just hard, and you'll want as much code as is necessary to make it secure and performant. Some parts of the code you will never run, but inactive code seldom hurts you.

MySQL has its issues, but none of them would be fixed just by having less code.




It is material as a measure of complexity when it becomes necessary for a developer to understand that complexity. Database tuning is a black art to many because databases are very complicated.

It turns out that having less code does in fact fix some issues. Consider Prevayler, for example, an object persistence layer that provides full ACID guarantees in something like 3 KLOC. It's also radically faster. It has a number of limitations (e.g., data must fit in RAM, no query language) but if you're ok with that, it's great: a Prevayler system is radically easier to reason about and optimize than something 300x as complicated.
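To make that concrete, here's a rough Python sketch of the object-prevalence pattern Prevayler is built around (this is the general idea, not Prevayler's actual Java API): all state lives in ordinary in-memory objects, and durability comes from journaling every state-changing command before applying it.

    import pickle

    class PrevalentSystem:
        # All business data lives in ordinary in-memory objects; durability
        # comes from appending every state-changing command to a journal
        # before applying it, and recovery is just replaying that journal.
        def __init__(self, journal_path):
            self.state = {}
            self.journal_path = journal_path
            self._replay()
            self.journal = open(journal_path, "ab")

        def _replay(self):
            try:
                with open(self.journal_path, "rb") as f:
                    while True:
                        pickle.load(f).apply(self.state)
            except (FileNotFoundError, EOFError):
                pass  # no journal yet, or end of journal reached

        def execute(self, command):
            pickle.dump(command, self.journal)   # write-ahead: persist first...
            self.journal.flush()                 # (a real system would fsync)
            command.apply(self.state)            # ...then apply in memory

    class Put:
        def __init__(self, key, value):
            self.key, self.value = key, value
        def apply(self, state):
            state[self.key] = self.value

    db = PrevalentSystem("journal.log")
    db.execute(Put("account:1", 100))
    print(db.state["account:1"])   # reads are plain in-memory access

Recovery is just replaying the journal from an empty state, which is where the "radically easier to reason about" part comes from.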

Also, your 15 MLOC for Linux is a bit of a red herring. Look at this (somewhat old but presumably representative) breakdown:

http://www.h-online.com/newsticker/news/item/Kernel-Log-More...

70% of the code is in drivers and arch. The kernel itself is circa 1% of the total, which at that point was 75 KLOC. I think the fact that they've been so disciplined about keeping it small is part of what has made Linux such a success.


Hi,

The way I look at it is that a database, much like Linux, is a platform. I very seldom look at the source code for either for day to day programming.

What they have in common as popular platforms is that they are very well tested through everyday use and are likely to operate as documented under ordinary configurations.

When you can rely on the correct operation of the system, the code of the underlying implementation is irrelevant. What you care about is how well the system supports your requirements, and what performance you can get by tweaking the available knobs.

Contrast this with your average 1000 line script. It has simplicity on its side, but when something breaks, that script is a suspect, and the source code of your DB probably isn't.

> Consider Prevayler, for example [...]

I'm not really sure what you're saying here. That an in-process in-memory object persistence framework without indexing can be faster than a heavy-duty relational database? That's not just "less code", that's "less features". Or "different features", at any rate; they're not the same species. I'm just going to assume what you mean to say is, "Not everyone needs a relational database".

> Also, your 15 MLOC for Linux is a bit of a red herring. [...]

All the more reason that the LOC count by itself is a meaningless metric.


Sure, databases are a platform all their own. Like any platform, as long as you are operating well within the expected envelope, they work as advertised. When you get near the edge, though, you really need to understand how they work. As we are seeing with the rise of all sorts of RDBMS alternatives, a lot of people are getting near a lot of different edges.

The ability to understand how something works is a function of complexity. LOC is correlated with complexity, so it's a good rough metric. If you have a better one, please offer it. But otherwise I'll stand by my original point, which is that the guy bitching about 1000 lines of consistency code is ignoring the much larger amount of code used in alternative approaches.

What I'm saying with the Prevayler example is that if you don't need all the features of a database, then the extra complexity is a drag on what you're trying to get done. Fewer features mean less code, which means less work to master.

> All the more reason that the LOC count by itself is a meaningless metric.

Yes, you throwing in a bullshit number is definitely proof that all numbers are bullshit. Bravo.


> The Linux kernel has over 15 million lines of code, people normally don't hold it against it.

They would if all the Linux kernel did was play Tetris for example. The point was that here is 1000 lines of what someone thinks is awkward code to deal with eventual consistency vs 1M lines of code to deal with consistency in another case. If consistency for a particular application can be dealt with in 1000 lines, you should usually go for that instead of for the millions of lines solution.

Think of it the other way. They have availability and partition tolerance from Riak, and they can handle eventual consistency with 1000 more lines. Now imagine you have MySQL and you have to make it run in a multi-master distributed mode: how many lines of code would you need to handle two of the CAP properties and then have an application-specific way to handle an incomplete (or untimely) third one? I bet it would be more than 1000 lines...
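For a sense of what that application-specific handling can look like, here's a toy Python sketch (the names and the merge rule are made up): when an eventually consistent store like Riak hands back conflicting sibling values after a partition, the application merges them with domain knowledge instead of letting the database pick an arbitrary winner.

    def merge_shopping_carts(siblings):
        # Hypothetical application-level conflict resolver: each sibling is
        # one replica's view of the cart ({item: quantity}); the domain rule
        # here is "take the union of items, keeping the highest quantity seen".
        merged = {}
        for cart in siblings:
            for item, qty in cart.items():
                merged[item] = max(merged.get(item, 0), qty)
        return merged

    # Two replicas diverged while partitioned:
    replica_a = {"book": 1, "pen": 2}
    replica_b = {"book": 1, "mug": 1}
    print(merge_shopping_carts([replica_a, replica_b]))
    # {'book': 1, 'pen': 2, 'mug': 1}

The real code is of course bigger than this, but it stays proportional to the domain rules you actually care about, not to a general-purpose consistency machine.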


> Some problems are just hard, and you'll want as much code as is necessary to make it secure and performant.

Wtf? Since when did bloat make code "secure and performant"? And it hurts you if you ever want to touch or look at that code again.


It is true that the most important thing is a good design, which will hopefully get you good performance with minimal and maintainable code.

However, in my experience, there are almost always additional optimizations that can be done after you have implemented your basic design. Things like "this part could make smarter choices with a more complicated heuristic", "we could use a faster data structure here, though it requires a lot of bookkeeping", or "we could cut a lot of computation here with an ugly hack that cuts through the abstraction".
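As a toy illustration of the second kind (hypothetical names, Python for brevity): a secondary index makes lookups fast, but every mutation now has to carry the bookkeeping that keeps it in sync.

    class UserStore:
        # The 'faster data structure, more bookkeeping' trade-off: a plain
        # list is simple, but lookups by email are O(n), so we add a dict
        # index -- and every mutation must now keep that index consistent.
        def __init__(self):
            self.users = []        # the simple, obvious structure
            self.by_email = {}     # the optimization: extra bookkeeping

        def add(self, user):
            self.users.append(user)
            self.by_email[user["email"]] = user     # bookkeeping

        def remove(self, email):
            user = self.by_email.pop(email, None)   # bookkeeping
            if user is not None:
                self.users.remove(user)

        def find(self, email):
            return self.by_email.get(email)         # O(1) instead of a scan

    store = UserStore()
    store.add({"email": "a@example.com", "name": "Ada"})
    print(store.find("a@example.com")["name"])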

Of course, more code makes it harder to change the structure of the program, so it's the classic trade-off of maintainability versus optimization.

A good example of this, besides databases, is CPUs. Modern CPUs use loads of silicon on complex optimization tricks: out-of-order execution, register renaming, prefetchers, cache snooping. And all that "bloat" is actually making it faster. You can't make a super-fast CPU by removing all the cruft to get a minimal design. (Or rather, you can make it faster for certain cases, but it would be slower at doing almost anything useful.)


> Wtf? Since when did bloat make code "secure and performant"?

WTF? Since when does one read the phrase "Some problems are just hard, and you'll want as much code as is necessary to make it secure and performant." and deduce (who knows by what logic) that the guy means _bloat_ and not _necessary_ code (error checking, code for handling corner cases, etc.)?

Not to mention that bloat is a silly term used by non-programmers to mean "this program is large" or "I don't use that feature, so it must weigh down the program needlessly".

That is, people who don't understand that features they don't make use of (e.g. the full-text search capability of MySQL) are not even loaded from disk by the OS in the first place, or that most of the size of a large program like Office is not compiled code but assets (graphics, etc.).
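If you want to see demand paging in action, here's a small Python sketch (the file name is made up); executables are mapped the same lazy way, so the code behind a feature you never call is, for practical purposes, never pulled off disk.

    import mmap

    # The OS maps executables (and any mmap'ed file) lazily: pages come off
    # disk only when they are first touched, apart from some readahead.
    with open("some_big_binary", "rb") as f:        # hypothetical file
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        first = mm[0]   # faults in only the page(s) around offset 0
        mm.close()
    # Untouched regions of the mapping generate no disk reads and don't
    # count toward the process's resident memory.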



