Ask HN: What's the largest amount of bad code you have ever seen work?
428 points by nobody271 on Nov 13, 2018 | 578 comments
I think I've broken my own record with this one ~2500 lines of incoherent JavaScript/C#. Works though.

Oracle Database 12.2.

It is close to 25 million lines of C code.

What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.

Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is riddled with mysterious macros that one cannot decipher without picking up a notebook and expanding the relevant parts of the macros by hand. It can take a day or two to really understand what a macro does.

Sometimes one needs to understand the values and the effects of 20 different flags to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.

The only reason why this product is still surviving and still works is due to literally millions of tests!

Here is how the life of an Oracle Database developer is:

- Start working on a new bug.

- Spend two weeks trying to understand the 20 different flags that interact in mysterious ways to cause this bug.

- Add one more flag to handle the new special scenario. Add a few more lines of code that check this flag, work around the problematic situation, and avoid the bug.

- Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code, build a new Oracle DB, and run the millions of tests in a distributed fashion.

- Go home. Come the next day and work on something else. The tests can take 20 hours to 30 hours to complete.

- Go home. Come the next day and check your farm test results. On a good day, there would be about 100 failing tests. On a bad day, there would be about 1000 failing tests. Pick some of these tests randomly and try to understand what went wrong with your assumptions. Maybe there are some 10 more flags to consider to truly understand the nature of the bug.

- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.

- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.

- Finally one fine day you would succeed with 0 tests failing.

- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching this new piece of code never ends up breaking your fix.

- Submit the work for one final round of testing. Then submit it for review. The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.

- After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.
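The flag-driven workaround pattern described in these steps might look something like the following sketch. This is purely illustrative: the flag names, the function, and the "fix" are all hypothetical and do not come from the actual Oracle code base (which is C, not Python).

```python
# Hypothetical illustration of the "add one more flag" bugfix pattern.
# All flag names and behaviors here are invented for the example.

def process_row(row, flags):
    """Process a row; behavior depends on an accumulated pile of flags."""
    if flags.get("USE_LEGACY_SORT") and not flags.get("PARALLEL_EXEC"):
        row = sorted(row)
    if flags.get("DEFER_COMMIT") and flags.get("USE_LEGACY_SORT"):
        # Bug 12345: legacy sort misbehaves with deferred commits in an
        # edge case. Fix: guard a special-case workaround with yet
        # another flag, rather than untangling the interaction.
        if flags.get("FIX_BUG_12345"):
            return row[::-1]  # hypothetical workaround for the edge case
    return row
```

Each new flag multiplies the number of states the next developer has to reason about, which is exactly how one ends up needing to understand 20 flags to predict a single code path.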

The above is a non-exaggerated description of the life of a programmer at Oracle fixing a bug. Now imagine what a horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say, adding a new mode of authentication, such as support for AD authentication).

The fact that this product even works is nothing short of a miracle!

I don't work for Oracle anymore. Will never work for Oracle again!

Sounds like ASML, except that Oracle has automated tests.

(ASML makes machines that make chips. They got something like 90% of the market. Intel, Samsung, TSMC etc are their customers)

ASML has 1 machine available for testing, maybe 2. These are machines that are about to be shipped, not totally done being assembled yet but done enough to run software tests on. This is where changes to their 20 million lines of C code get tested. Maybe tonight you get 15 minutes for your team's work. Then again tomorrow, if you're lucky. Oh, but not before the build is done, which takes 8 hours.

Otherwise pretty much the same story as Oracle.

Ah no wait. At ASML, when you want to fix a bug, you first describe the bugfix in a Word document. This goes to various risk assessment managers. They assess whether fixing the bug might cause a regression elsewhere. There are no tests, remember, so they make educated guesses about whether the bugfix is too risky. If they think not, then you get a go to manually apply the fix in 6+ product families. Without automated tests.

(this is a market leader through sheer technological competence, not through good salespeople like oracle. nobody in the world can make machines that can do what ASML's machines can do. they're also among the hottest tech companies on the dutch stock market. and their software engineering situation is a 1980's horror story times 10. it's quite depressing, really)

Makes you think about how exactly code quality correlates with commercial success.

That sounds like a market ripe for disruption - imagine a competitor that follows software engineering best practices.

That is absolutely insane. I can't even begin to imagine the complexity of that codebase. I thought my Rails test suite was slow because it takes 4 minutes. If I wrote it in C or C++ it would probably be 10 seconds.

I can't imagine a C/C++ application where the test suite takes 20-30 hours on a test farm with 100-200 servers. And if you can break 100-1000 tests with a single change, it doesn't sound like things are very modular and isolated.

And 30 hours between test runs! I would definitely not take that job. That sounds like hell.

It's a good exercise to imagine how the job would be sold. Things like this would definitely not come up in the interview process, instead they would sell you on "you get to work on a cutting-edge db kernel that is running most of the Fortune 100s" or sth like that, which is true (!), but doesn't describe the day to day.

The best way to guess this is to extrapolate from the interview questions. If they ask you a lot of low-level debugging/macro/etc questions..

> The best way to guess this is to extrapolate from the interview questions.

Wouldn't you just ask the developers interviewing you outright, "can you walk me through an example of your day? How long does it take you to push out code? What's testing like? Do you run tests locally, or use something like Jenkins?" etc.

Most new hires are probably not being interviewed by devs, but either by 3rd-party recruiters or internal recruiters with HR. When I was working in recruiting, the last thing either we or the client wanted was for the new hire to talk to either the person who they were replacing or any of the potential coworkers. Heck, one internal recruiter I had to interface with at a company I choose not to disclose said to me, "can we ask if they read Hacker News? There's some bad vibes about us there."

Which is when I got back on HN regularly :-)

(PS I did tell the internal person that there was no way that reading HN was related either to a BFOQ or other job requirement; and thus while it's not illegal, it'd be highly suspicious.)

> When I was working in recruiting, the last thing either we or the client wanted was for the new hire to talk to either the person who they were replacing or any of the potential coworkers.

What the fuck? Am I a spoiled tech-bro, or does that sound completely insane to anyone else? I would 100% not take a job if I didn't get a chance to talk to my coworkers and future manager during the interview process.

Perhaps you are spoiled (as am I in that regard) but i would absolutely never take a job unless I knew who I was going to be working with and had a chance to ask them honest questions.

Seems like a trap set up for fresh out of college hires. I don’t know any senior developers who would even consider a job under those circumstances.

On the contrary, the interview was an ordinary one. The screening round consisted of very basic fizzbuzz type coding ability checks: Reversing a linked list, finding duplicates in a list, etc.

Further rounds of interviews covered data structure problems (trees, hashtables, etc.), design problems, scalability problems, etc. It was just like any other interview for software engineering role.

"Well, your interviews went quite well. Now the final question: what would you do if you start losing your mind?"

"I'd like you to write a graph algorithm that traverses the abyss, the cosmic horror that consumes one's mind, that traverses twilight to the rim of morning, that sees the depths of man's fundamental inability to comprehend.

Oh ya, the markers in here are pretty run down, let me pray to the old ones for some more"

I have written the algorithm you requested - but I wish I hadn’t run it. I hit ctrl-c when I realized what it was doing but it was too late... The damage is done — we are left with only the consequences and fallout.

Forgotten dreams like snowflakes melt on hot dusty ground, soon to turn into hard dry mud beneath a bitter polluted sky.

Pretty sure I was asked that question in an Amazon interview.

Were you even given substantial time to ask the interviewers questions? In most interviews I’ve done, even later round interviews, whether at a finance company, start-up, FAANG, or companies of all sorts in between, I was given at most 5 minutes to ask questions after some dumb shit whiteboard algo trivia.

I was given 5 minutes to ask questions after each round of interview. That part was ordinary too. That's what most of the other companies do (FAANG or otherwise).

The real risk is for people who are too young to know what to ask

I'd hope they wouldn't even consider somebody for this sort of job who's too young to know what to ask.

That's kind of naive; of course you want young people who will work hard and maybe not know what they are getting into. I was offered a job at Oracle back in the day; I would have felt a lot of despair if this is what it was.

I am not sure what position you were interviewing for and to what level of interview you made it.

When I was interviewing for an SRE position with Google in Dublin, I had about 10min to ask questions in each of the 5 interviews that were conducted on-site.

In between the interviews, a sixth SRE would take me to lunch for about an hour. Anything discussed with him wouldn't be evaluated as part of the interview.

So there was plenty of time for questions, I would say.

Hell for the proactive go-getters, but paradise for people who enjoy any excuse for a bit of justifiable down time!

Q: Are you busy? A: Yes, in the middle of running tests...

That would have been fun but in reality there was no downtime. Developers like me were expected to work on two to three bugs/features at a time and context switch between them.

If I submit my test jobs today to the farm, the results would come one or two days later, so I work on another bug tomorrow, and submit that. Day after tomorrow, I return to the first bug, and so on.

How would you know that merging code from the first bugfix wouldn't break the (just tested) code from the second bugfix?? Would you assume that the first bugfix will be merged first and branch off of that?

Without knowing Oracle's approach, this sort of problem is no different from any other software, even though it reaches a larger scale.

Branch from master, and rerun tests before the final merge, like you should in any other software? (Many processes fail that criterion, see https://bors.tech/ for something that gets this right).

Ideally you work on a different enough bug that there's limited interaction, and ideally that's figured out before you fix it, but those criteria are indeed harder to satisfy in bigger software.

But if the time needed to test and deploy a change is so ludicrous, it seems like you'd rarely get a big-enough window to rerun your tests before the upstream changes again. Either people are merging in unstable code, or the release lifecycle is a slow byzantine nightmare too (probably the case here).

Usually you don't test a single change before merging, but branch from master, merge any number of changes and then run the tests. So the master branch would move forward every 20-30 hours in this case, unless the tests of a merge batch fail, in which case master would kinda stall for a bit.
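The batch-and-bisect strategy described above (which merge queues like bors automate) can be sketched as follows. This is a toy model, not any real tool's API: `tests_pass` stands in for a 20-30 hour test-farm run, and changes are just strings.

```python
# Toy sketch of a bors-style merge queue: try to land a whole batch of
# changes against master at once; if the batch fails, bisect it to find
# and reject the offending change(s).

def land_batch(master, batch, tests_pass):
    """Attempt to land `batch` (a list of changes) on top of `master`.
    `tests_pass(changes)` is the (expensive) full test run.
    Returns (new_master, rejected_changes)."""
    if not batch:
        return master, []
    if tests_pass(master + batch):
        return master + batch, []        # whole batch lands in one test run
    if len(batch) == 1:
        return master, batch             # a lone failing change is the culprit
    mid = len(batch) // 2
    master, bad_left = land_batch(master, batch[:mid], tests_pass)
    master, bad_right = land_batch(master, batch[mid:], tests_pass)
    return master, bad_left + bad_right
```

In the happy case this amortizes one 20-30 hour test run over the whole batch; only a failing batch costs extra runs, which is why master would stall for a bit exactly when a merge batch fails.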

I understand. It was partially a tongue in cheek remark :)

Tests in C/C++ run shockingly fast. I ported an application from Ruby to C++ and the tests ran in well under a second when it was taking 10+ seconds in Ruby. Granted because of C++'s type system there were fewer tests, but it was fast enough that I kept thinking something was wrong.

It's because Ruby is one of the slowest languages out there, and C/C++ is usually #1-#2 on many benchmarks.

Are you including the time to build/link the tests? This is especially true if you have a bunch of dependencies. Last time I worked on C++ tests most of my time was spent on getting the tests to link quickly. Managed to get it from 25 minutes to 1 minute. But I'd rather have spent that time actually writing more test cases, even if they took 10s to run.

Started a new job a few months ago and we’re writing Go - a bunch of the test suites I’ve built run in microseconds. Statically typed compiled languages ftw.

You've violated the terms of service of Oracle Database by insinuating the codebase quality is in any way not superior to any and all competitors. No benchmarks or comparisons may be performed on the Oracle Database Product under threat of grave bodily harm at the discretion of our very depraved CEO.

I doubt the competition (e.g. IBM or Microsoft) has any better code quality. Even PostgreSQL is 1.3M lines of code, so let's take something deliberately written for simplicity: SQLite is just 130k SLoC, another order of magnitude simpler.

And yet, even SQLite has an awful lot of test cases.


I'm sure some of the difference (25M vs. 1.3M) can be attributed to code for Oracle features missing in PostgreSQL. But a significant part of it is due to careful development process mercilessly eliminating duplicate and unnecessary code as part of the regular PostgreSQL development cycle.

It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.

> It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.

The single hardest thing about programming, I'd say.

In many (most?) ways the best edits of code are the ones where you can get rid of lines.

PostgreSQL has a lot of code but most parts of the code base have pretty high quality code. The main exceptions are some contrib modules, but those are mostly isolated from the main code base.

It's because software LOC scales linearly with the amount of man-months spent: a testament to the unique ability of our species to create beautiful, abstract designs that will stand the test of time.

This is an interesting comment, because I can't decide if you are sarcastic or making a deep insightful comment. Because I don't think the statement is true. LOC can go on forever, but it usually happens in things that aren't beautiful and abstract.

I was being sarcastic.

thanks for the reply. you said it so earnestly that i couldn't tell!

Worked on SQL Server for 10+ years. MS SQL Server is way better than that. The Sybase SQL Server code we started with and then rewrote was as bad as Oracle's.

I guess that is just because SQL as a standard is neither coherent nor something beautifully designed. SQL is a mashup of vendor-specific features all bashed together into one standard.

There's also a lot of essential complexity there. SQL provides, in essence, a generic interface for entering and analyzing data. Imagine the number of ways to structure and analyze data. Now square that number to get the number of tests for how two basic features of the language interact with each other. And that's not even near full test coverage.

Your point about essential complexity is absolutely correct, but your faux mathematical analysis is totally not a legit way to analyze the complexity of something or determine test coverage. I feel like as programmers we should be comfortable making sensible statements without making up shady pseudo-math to sound convincing.

It's abundantly clear that I'm not making a precise computation here. My argument is that tests don't scale linearly with the number of features because interactions between features need to be tested as well.

Never having to use Oracle Database is a good result.

I can hear the clamoring of lawyers eager to fundraise for Larry's next flying sailboat.

A sentiment among members of a former team was that automated tests meant you didn't need to write understandable code - let the tests do the thinking for you.

This, and stuff like your story, are why I don't trust people who promote test-driven development as the best way to write clean APIs.

TDD needs to die. It is a curse.

There should be integration tests along with some property-based tests and fuzz tests. That usually catches a lot of things. Invest in monitoring and alerts too.
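To make the property-based idea concrete, here is a minimal hand-rolled sketch. Real frameworks (e.g. Hypothesis for Python) add shrinking of failing inputs, reproducible seeds, and much smarter generators; the `dedupe` function and the properties checked are my own illustrative choices, not from the thread.

```python
import random

def dedupe(xs):
    """Remove duplicates while preserving first-seen order."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def check_dedupe_properties(trials=200):
    """Hand-rolled property-based test: instead of a few hand-picked
    examples, assert invariants over many random inputs."""
    rng = random.Random(42)  # fixed seed so failures are reproducible
    for _ in range(trials):
        xs = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        ys = dedupe(xs)
        assert len(ys) == len(set(ys))   # property: output has no duplicates
        assert set(ys) == set(xs)        # property: same set of elements
        assert all(x in xs for x in ys)  # property: nothing invented
    return True
```

The point is that each assertion states a property that must hold for *all* inputs, which tends to catch edge cases (empty lists, heavy duplication) that example-based tests miss.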

TDD is like relying on a debugger to solve your problem. Is a debugger a good tool? Yes, it is a great tool. But using it as an excuse to avoid understanding what happens under the hood is plain wrong.

The problem lies in an industry where software engineering is not given any value, but whiteboarding and solving puzzles is.

Software engineering is a craft honed over years of making mistakes and learning from them. You want code ASAP, so you kick experienced engineers out, get code monkeys in, and get an MVP.

Quality is not a clever algorithm, but clear, concise logic. Code should follow the logic, not the other way around.

Clear > clever.

And yet tests seem to have made this massive garbage heap actually work and enable a lump of spaghetti to continue to operate as a viable product. It doesn't mean you should write bad code, but it seems like if it can make even the most awful of code viable, then that's a pretty good system. The fact that modern medicine allows the most beat up and desperate to continue to live on isn't an indictment against medicine, it's a testament to it. Don't write bad code, sure. We can all agree to that. Don't prioritize testing? Why? To intentionally sabotage yourself so that you're forced to rewrite it from scratch or go out of business?

Depends on the definition of viable.

I’m sympathetic but this is too strong: what needs to die is dogma. TDD as a way of thinking about the API you’re writing is good but anything will become a problem if you see it as a holy cause rather than a tool which is good only to the extent that it delivers results.

I very much agree.

I remember when i realized that TDD shouldn't have such weight in our development as it had gotten (when it was high on the hype curve).

It was when we started using a messaging infrastructure that made everything much more reliable and robust, and through which we could start trusting the infrastructure much more (not 100%, though, of course).

It made me realize that the reason why we had this excessively large amount of tests (1800+) was the fragile nature of a request/response-based system, and we therefore "had to make sure everything worked".

What I'm trying to get at here is that TDD assumed the role of a large safety net for a problem we should have addressed in a different manner. After introducing the messaging, we could replay messages that had failed. After this huge turning point, tests were only used for what they should have been used for all along: ensuring predictable change in core functionality.

(our code also became easier to understand and more modular, but that's for another time...)

What you allude to there is pretty bad TDD. It was never intended as a replacement for good design, rather as an aid to be clear about design and requirements without writing tons of specs up-front.

And I agree, that there are lots of anti-patterns that have grown in tandem with TDD, like excessive mocking with dependency injection frameworks or testing renamed identity functions over and over just to get more coverage. However, I'd argue that is equally the fault of object-oriented programming though.

Where I disagree is this: TDD and unit tests are still a very useful tool. Their big advantage is that you can isolate issues more quickly and precisely, IF you use them correctly.

For instance, if I have some kind of algorithm in a backend service operating on a data structure that has a bug, I do not want to spend time on the UI layer, network communication, or database interactions to figure out what is going on. Testing at the right scope gets you exactly that.
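A sketch of what "testing at the right scope" can look like: the algorithm is a pure function, so a unit test can exercise it directly with no UI, network, or database in the loop. The interval-merging function here is a hypothetical stand-in for whatever backend logic has the bug.

```python
# Hypothetical backend algorithm, kept as a pure function so it can be
# tested in isolation, in milliseconds, without the rest of the stack.

def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def test_overlapping_intervals_merge():
    # A bug scenario reproduced at the unit level, with no service running.
    assert merge_intervals([[1, 3], [2, 6], [8, 10]]) == [[1, 6], [8, 10]]
    assert merge_intervals([]) == []
```

When the failing scenario can be expressed this directly, the fix-verify loop shrinks from a test-farm round trip to a keystroke.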

The problem with TDD is that the methodology wants to cover every change, no matter how internal, with some sort of external test.

Some changes are simply not testable, period.

No, you cannot always write a test which initially fails and then passes when the change is made. When this is the case, you should understand why that is, and not try.

In some cases you can, yet still should not. If a whole module is rewritten such that the new version satisfies all of the public contracts with the rest of the code, then only those contracts need to be retested; we don't need new tests targeting internals.

It's because the old version wasn't targeted by such tests in the first place that it can be rewritten without upheaval.

I think TDD is the best way to develop (yet). Obviously tests are code, and if you write crappy, highly-coupled tests you will only end up with much messier code. This is a clear example of bad testing. The greatest advantage of TDD is in design: everything should be modular and easy to unit test, so you could:

- reproduce bug and verify your bugfix in matter of ms with proper unit test

- understand what code does

- change and refactor code whenever you want

You can tell from what is written that they are not following TDD. Redesigning that codebase into a clean, easy-to-test design would require exponential effort and time compared to having done it step by step, but it would be worth it.

A unit test is the least useful kind of test. It requires your design to be "easy to unit test" instead of simple, and if you change something and have to rewrite the test, you might miss some logic in both pieces.

Plus the tests never break on their own because they're modular, and each time you run a test that was obviously going to pass, you've wasted your time.

As long as you have code coverage, better to have lots of asserts and real-world integration tests.

Integration tests are usually much slower, and you are testing tons of things at the same time. Something breaks (like in that example) and you have no idea what went wrong or why.

If you unit test properly, you are unit testing the business logic, which you have to properly divide and write in a modular fashion. If you want to test a more complex scenario, just add initial conditions or behaviors. If you can't do that or don't know how to do that, then you don't know what your code is doing, or your code is badly designed. And that may be the case we read above.

Tests rarely break because they help you avoid breaking the code and its functionality, and they are so fast and efficient at showing you this that you don't feel the pain of it.

I can't imagine any example where "easy to unit test" != simple

in my work with Python, easy to unit test usually makes things a bit harder. You want functional methods, not mega classes with 100s of class variables where each class method operates on some portion of those class variables. That makes it impossible to truly isolate functionality and test it. While coding, though, it is very easy to make a class and throw god knows what into the class variable space and access those variables whenever... However, if we have static methods, not reliant on any class, just the arguments provided, and not modifying any class state, the tests are great. We can change/refactor our models with confidence, knowing the results are all the same.

In my opinion, the only thing that is valuable about unit tests is more appropriately captured in the form of function, class, and module contracts (as in "design by contract"). Unfortunately, very few languages are adopting DbC.

Functional tests now, that's another matter. But a lot of TDD dogmatism is centered on unit tests specifically. And that results in a lot of code being written that doesn't actually contribute to the product, and that is there solely so that you can chop up the product into tiny bits and unit test them separately. Then on the test side you have tons of mocks etc. I've seen several codebases where test code far exceeded the actual product code in complexity, and that's not a healthy state of affairs.
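Since few languages support design by contract natively, it is often emulated. Here is one possible sketch in Python using a decorator; the `contract` name and its `pre`/`post` parameters are invented for this example, not any particular library's API.

```python
import functools

def contract(pre=None, post=None):
    """Illustrative design-by-contract decorator: check a precondition on
    the arguments and a postcondition on the result, via assertions."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition failed for {fn.__name__}"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result), f"postcondition failed for {fn.__name__}"
            return result
        return wrapper
    return decorate

@contract(pre=lambda xs: len(xs) > 0, post=lambda r: r is not None)
def largest(xs):
    return max(xs)
```

The appeal over unit tests is that the contract travels with the function and is checked on every real call, not just on the inputs a test author thought of.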

In more recent times I've seen some growth in interest around contract testing. Unit tests are immensely more useful when paired with contract tests, but unfortunately without them they tend to be more of a hassle. At its essence integrations are a form of a contract, but those suffer their own problems. In rspec you have 'instance_double' which is a form of a contract test as well, but not really sufficient for proper testing IMO. The current state from what I've seen is a little lackluster, but I wouldn't be surprised to see a growth in contract testing libraries for a variety of languages popping up.

I had some tests on my codebase, but eventually only documentation and integration tests remained.

So let's look at a simplified example.


My tests are in the test folder. They are actually superfluous since integration tests test for the same thing.

I cannot break up the program in a way that would unit test a smaller piece of it in more detail. The only tests I could add would test the command-line driver.

For a single person and their one-person code base, you can certainly get away without unit tests.

This is especially true if your "integration tests" are testing the same component and not actually integrating with numerous other components being developed by different teams, or if the system is so small it can run on a single workstation.

Working in teams on larger systems, the situation is different. Part of the point of unit tests is the "shift left" which allows problems to be discovered early, ideally before code leaves a developer's machine. It reduces the time until bugs are discovered significantly, and reduces the impact of one dev's bugs on other devs on the team.

TDD is yet another in a long line of "methodologies" that don't work. Tests are not a bad thing of course. The problem comes when you turn testing into an ideology and try to use it as a magic fix for all your problems. Same goes for "agile," etc.

Programming is a craft. Good programmers write good code. Bad programmers write bad code. No methodology will make bad programmers write good code, but bureaucratic bullshit can and will prevent good programmers from working at their best. The only way to improve the output of a bad programmer is to mentor them and let them gain experience.

The reality of working in teams at most companies is that there are going to be mediocre programmers, and even bad programmers, on the team. Many of the practices you refer to as bureaucratic bullshit are actually designed to protect programmers from the mistakes of other programmers.

Of course, this does require that the process itself has been set up with care, thought, and understanding of what's being achieved.

I'm probably not the best to speak on the topic as I don't use TDD (nor have I), but I think the idea is good if maybe a bit unorthodox: leveraging tests to define examples of inputs/outputs and setting "guards" to make sure the result of your code is as you expected via the tests.

I'm not keen on the "cult" of it, but if expectations of what the output should look like are available from the outset, it would appear to be of some benefit, at least.

What about TDD requires not understanding the code?

I'm confused by your comment. Your premise is that TDD should die, and your support is comparing it to a "great tool". Should TDD really die, or should people just stop treating things as a silver bullet? I personally love TDD; it helps me reason about my interfaces and reduces some of the more cumbersome parts of development. I don't expect everyone to use TDD and I don't use it all the time. Similarly, I'd never tell someone debuggers should die and they should never use a debugger if that's something that would help them do their job.

The thing is, when I spend a lot of time thinking about how to make my program type-safe all of my unit tests become either useless or no-ops

Integration tests easily survive refactoring, on the other hand

Unit tests are a side effect of TDD, they don't have to be the goal. I'd find value out of TDD even if I deleted all of my tests after. It sounds like your problems are around unit tests, and that is neither something required to TDD nor is it something limited to just TDD.

The problem with integration tests is they are slow and grow exponentially. If they aren't growing exponentially, then there are probably large chunks of untested code. Unit tests suffer their own problems; like you said, they can be useless because of a reliance on mocking, and they can also be brittle and break everywhere with small changes.

Ultimately any good suite of tests needs some of both. Unit tests to avoid exponential branching of your integration tests, and integration tests to catch errors related to how your units of code interact. I've experienced plenty of bad test suites, many of them are because of poorly written unit tests, but its often the poorly written integration tests that cause problems as well. As with most things, its all about a healthy balance.

No, like in some programs when I figure out how to do it correctly the unit tests are either complete tautologies or integration tests.

Then there are the "write once, never fail ever" tests. Okay, so the test made sense when I wrote the code. I will never touch that part ever again because it works perfectly. Why do I keep running them every time?

If the unit tests are tautologies then they aren't testing the right things, and if they are integration tests then they aren't actually unit tests.

I personally run my unit tests every time to confirm my assumption that the unit of code under test hasn't changed. I also assume all code I write will inevitably be changed in the future, because business requirements change and there's always room for improvement. Actually, I can't think of a single piece of code I've written (apart from code I've thrown out) that didn't eventually need to be rewritten. The benefit of running unit tests is less than the benefit of running integration tests, but the cost of running them is also significantly less. The current project I'm working on has 10x as many unit tests as integration tests, and they run 100x faster.

My workflow is usually to run the unit tests for the code I'm working on constantly, and when I think things are working, run the entire test suite to verify everything works well together. That's my workflow whether or not I'm doing TDD.

The code that determines truths about the data never had to be rewritten.

Like, are the two points neighbors? I mean, I'm not going to write a version of this function for a spherical board in the future. Nobody plays on a spherical board.

It's also a really boring unit test. Yes, (1,1) and (1,2) are neighbors. Do I really need to test this function until the end of time?

That's exactly the type of code that should be unit tested. The unit tests are trivially easy to write, and a very naive solution is easy to code up. The tests should take up a negligible overhead in your overall test suite runtime. Then when it comes time to optimize the code because it's becoming a bottleneck, you can be confident that the more obscure solution is correct.
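To make the point concrete, here's a minimal sketch of the kind of trivially-testable function being discussed (grid adjacency, assuming a flat board and 4-neighborhood — the function name and signature are made up for illustration):

```python
def are_neighbors(a, b):
    """True if two grid points are orthogonally adjacent (flat board, no wrapping)."""
    (ax, ay), (bx, by) = a, b
    return abs(ax - bx) + abs(ay - by) == 1

# the "boring" tests: trivial now, but they protect a future optimized rewrite
assert are_neighbors((1, 1), (1, 2))
assert not are_neighbors((1, 1), (2, 2))
assert not are_neighbors((1, 1), (1, 1))
```

The tests cost nearly nothing to run, and if the naive Manhattan-distance check is ever replaced by something cleverer, they'll catch a botched optimization immediately.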

TDD should only drive the public interface of your "module"; if you're testing your internals, you're doing it wrong. It will hinder refactoring rather than help.

TDD doesn't think for you, it merely validates your own existing understanding/mental model and forces you to come up with it upfront. This is hardly a thing to be mistrustful about, unless you work with idiots.

You are right about that, but having code that passes a given test suite doesn't say anything about its secondary qualities, such as whether it can be understood. In theory, a failing test could improve your understanding of the situation, allowing you to refactor your initial pass at a solution, but I would bet that on this particular code base, the comprehension-raising doesn't go far enough, in most cases, for this to be feasible.

That seems orthogonal to testing though. Implementation code can be hard to understand with or without a test suite; at least with tests, as you point out, you may be able to understand the behaviour at some higher abstraction.

ok but from reading a lot of the comments on HN it sounds like many posters here think that they do work with idiots.

Those idiots probably also think the same, though.

If everyone’s an idiot thinking they’re surrounded by idiots, then TDD has no hope of ever succeeding!


> TDD doesn't think for you

I totally agree, but I met several programmers who think the opposite.

Rich Hickey called it guard rail driven programming. You'll never drive where you want to go if you just get on the highway and bump off the guard rails.

Except that's a really bad analogy. It's more like you set up guard rails, and every time your vehicle hits a guard rail you change the algorithm it uses for navigation until it can do a whole run without hitting a guard rail.

I've experienced myself how the code quality of proper TDD code can be amazing. However it needs someone to still actually care about what they're doing. So it doesn't help with idiots.

It may not be a good analogy for TDD as properly practiced, but it seems to be very fitting for the situation described at the top of this thread, and that is far from being a unique case.

I don't think it's a generous analogy, but it's poking fun at being test DRIVEN, rather than driver driven. I think he'd agree with you that it's the thinking and navigating and "actually caring about what they're doing" that matters. Tests are a tool to aid that. Tests don't sit in the driver's seat.

Yeah. To me "test driven" really means that I write code under the constraints that it has to make writing my tests sensible and easy. This turns out to improve design in a large number of cases. There are lots of other constraints you can use that tend to improve design as well (method size, parameter list size, number of object attributes, etc are other well known ones). But "test driven" is a nice catch phrase.

> Except that's a really bad analogy. It's more like

The response to every analogy ever made on the internet. Can we stop using them yet?

Spot on: "Analogies: Analogies are good tools for explaining a concept to someone for the first time. But because analogies are imperfect they are the worst way to persuade. All discussions that involve analogies devolve into arguments about the quality of the analogy, not the underlying situation." - Scott Adams, creator of Dilbert (I know he's quite controversial since the election, but he's on point here) in https://blog.dilbert.com/2016/12/21/how-to-be-unpersuasive/

Scott Adams has been quite controversial long before the elections, ever since he got busted as a sock puppet "plannedchaos," posing as his own biggest fan, praising himself as a certified genius, and calling people who disagreed with him idiots, etc. Not to mention his mid to late '90s blog posts about women.

But at least he wasn't using analogies, huh?


>Dilbert creator Scott Adams came to our attention last month for the first time since the mid to late '90s when a blog post surfaced where he said, among other things, that women are "treated differently by society for exactly the same reason that children and the mentally handicapped are treated differently. It's just easier this way for everyone."

>Now, he's managed to provoke yet another internet maelstorm of derision by popping up on message boards to harangue his critics and defend himself. That's not news in and of itself, but what really makes it special is how he's doing it: by leaving comments on Metafilter and Reddit under the pseudonym PlannedChaos where he speaks about himself in the third person and attacks his critics while pretending that he is not Scott Adams, but rather just a big, big fan of the cartoonist.


>Dilbert's creator Scott Adams Compares Women Asking for Equal Pay to Children Demanding Candy

Hmm, that sounds an awful lot like another analogy to me, actually... Oops!

So maybe Scott Adams isn't the most un-hypocritical guy to quote about the problems of analogies.

Nice imagery. I like it.

The other commenters made me think of the kids game Operation. https://en.wikipedia.org/wiki/Operation_(game)

How about shock collar programming? Or electric fence programming? Or block stacking (Jenga) programming.

Good times.

Yup, in static vs dynamic conversations, I invariably see someone dismiss the value of compiler enforcement by claiming that you should be writing unit tests to cover these cases anyway. Every time I say a silent prayer that I never end up working with the person I'm talking to haha.

I don't see how this is an argument against TDD. Apparently a whole slew of things went wrong in this project but that doesn't imply that testing is the cause of them.

TDD only works in conjunction with thorough peer reviews. Case in point: at my place of work, code and tests written by an intern can go through literally dozens of iterations before the check-in gets authorized, and even the senior engineers are not exempt from peer reviews (recent interns are especially eager to volunteer).

The problem with TDD is that a dedicated developer can always make a test pass- the how is another matter.

when(mockObject.getVar()).thenReturn(1); assertEquals(1, mockObject.getVar());

test passes boss!
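The same tautology can be sketched in Python with `unittest.mock` (the `get_var` attribute is hypothetical, purely for illustration):

```python
from unittest import mock

# stub an object, then "test" the stub itself
obj = mock.Mock()
obj.get_var.return_value = 1

# passes, but only proves the mock returns what we just configured it to return
assert obj.get_var() == 1
```

Nothing about the real implementation is exercised; the test is green by construction.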

I am a current Oracle employee and blame a lot of the mistakes on the overseas development team in India. They are (not all, but enough to matter) terrible programmers, but hey, when you can throw 10 Indian programmers at a problem for the cost of one American... You can blame your bloated, mismanaged code base on their management over there. This is likely due to the attrition and a generally less talented and less autonomous engineering style.

There is a clear difference between code developed AND maintained in the US vs. code that was developed in India, or code developed in the USA and handed to Indian developers for support. Nothing against Indians, but I've been around the block and there seems to be a lesser quality of code from that part of the world, and companies justify it as cost savings.

Actually, you can blame this on Oracle's top management (especially in a company structured as Oracle is): they called the shots, from day 1.

I have not found this to be true at all. I have seen both US and Indian developers adding good code as well as ugly code to the Oracle Database product.

The actual damage was done long before I joined Oracle. It appears that somewhere in the early 2000s, the Oracle codebase went from manageable to spaghetti monster. The changelog showed more changes from US developers than Indian developers at that time. Once the damage was done, all developers, whether from the US or India, had to follow this painful process to fix bugs and add features.

Now pick almost any other category-leading software product and you will find a similar situation.

The category-leading product is probably from one of the earliest companies in the field, if not the first. They have the oldest and cruftiest code - and the manpower to somehow keep it working. It is definitely not the fastest and definitely not the most stable. But they do have the resources to make sure it supports all the third party integrations and features important for the big customers.

I have encountered exactly this same situation on several different fields and categories.

At a time when I was a complete open source fanatic in the early 2000s, it suddenly made me realize how Microsoft actually had much better quality software than most big proprietary software vendors.

Sweet-merciful Jesus. You just made me experience Vietnam-style flashbacks. I worked at Oracle for the 12.1 and 12.2 releases (not working there anymore). You just described my day to day tenure at Oracle. Thank god that's done.

You described the early part of my career in software to a T.

I worked for a mini-computer company in the 1980's that ported Oracle (I'm thinking the version stamp was 22.1 when I was there from 1986-1990). It was one GIANT mess of standard "C" with makefiles that were in some ways larger and more complex than some of the actual kernel code it was building!

Took 24 hours to build the product... lol

> The only reason why this product is still surviving and still works is due to literally millions of tests!

Lesson learned. Always write tests. Your business will depend on it.

On one hand, sure. They're still able to ship a working product despite having an abysmal code base. That's an excellent end result that must not be underestimated. Perhaps the problem that code base solves is really that difficult and there's no other way.

But on the other hand, over-reliance on tests is one of the reasons they ended up in this situation in the first place. It's like the car safety engineer's joke - How do you make cars safer? Install a knife in the middle of the steering wheel pointed at the driver.

When we're too sure that some safety feature will save us, we forget to be careful.

As I read this, I am vacationing in Hawaii for the first time. I can look out my window right now and see the island of Lanai. And that's what I'm doing as I'm read your post right now.

Reading a few sentences, looking out at Lanai. Reading a few more sentences, and looking back at Lanai...

As someone who codes an ERP app built on 12.2 this comment resonated with me in ways you can't begin to imagine.

The notion of tests that take 20 hours blows my mind.

The notions of tests written in C that take 20 hours I can't even fathom.

I'm going to guess a lot of these are integration tests, not unit tests (simply going off execution time).

At that point, for DB testing, I doubt it matters what language test are written in, it's going to be mostly about setting up and tearing down the environment over and over.

Really surprising considering that Oracle is the standard for serious enterprise databases. Not really surprising when you consider Oracle's other bug ridden offerings (probably not as thoroughly tested). Makes me fear for Oracle 18c.

Not surprising at all. Their code might not be performant, maintainable, or good-looking by developer standards, but as OP said, they have a gazillion test cases that make sure Oracle DB runs and doesn't produce weird outcomes.

Totally unsurprising if you've ever worked with Oracle. The layers upon layers of legacy cruft are plainly visible even in simple things like the GUI installers.

I remember an Oracle Forms-based product I helped develop; installing it on end users' PCs required installing several Oracle products, which meant 14 or 15 floppy disks to be used in the right order.

The fact that the first version shipped in 1979 has to contribute to this as well.

The field of software engineering has matured a lot since then.

I mean, PostgreSQL can trace its roots back to 1982's INGRES ... and UNIX started in 1969.

There are quite a few very old projects that don't have the same level of cruft as Oracle; it epitomises a Sales Division driven culture.

How many of those switches (that now need to be supported and tested) are because some functionality was promised to a large contract, and so it just had to be done? I would wager a good number.

Good lord, just reading it can cause a panic attack.

It literally gave me a sinking feeling. I’d quit that job on day 0.

But you would have to wait for day 1+ to realize

Sounds like the way a reinforcement learning algorithm would write code.

At least there are tests!

Tests that run for 30 hours are an indication that nobody bothered writing unit tests. If you need to run all tests after changing X, it means X is NOT tested. Instead you need to rely on integration tests catching X's behavior.

I beg to differ. Having to run the full test suite to catch significant errors is an indication that the software design isn't (very) modular, but it has nothing to do with unit tests. Unit tests do not replace service/integration/end to end tests, they only complement them - see the "test pyramid".

I think it's important to point this out, because one of the biggest mistakes I'm seeing developers do these days is relying too much on unit tests (especially on "behavior" tests using mocks) and not trying to catch problems at a higher level using higher level tests. Then the code gets deployed and - surprise surprise - all kinds of unforeseen errors come out.

Unit tests are useful, but they are, by definition, very limited in scope.

(terminology nazi mode)

"... is an indication that the software design isn't (very) decoupled ".

You can be modular without being properly decoupled from the other modules.

Hmmm... you have a point, but then, shouldn't it be "decoupled into modules"?

But then are your modules really modular?

In a C/C++ world, a module is usually defined as a file/.dll/.so on disk. So highly-coupled modules are still modules.

Always test "outside-in", i.e. integration tests first, then if you can afford it, unit tests.

Integration tests test those things you're going to get paid for... features & use-cases.

Having a huge library of unit tests freezes your design and hampers your ability to modify in the future.

While I see some value in the red-green unit testing approach, I've found the drawbacks to often eclipse the advantages, especially under real-world time constraints.

In my day to day programming, when I neglect writing tests, the regret is always about those that are on the side of integration testing. I'm okay with not testing individual functions or individual modules even. But the stuff that 'integrates' various modules/concerns is almost always worth writing tests for.

Love the idea of this.

In my experience it's far easier to introduce testing by focusing on unit testing complicated, stateless business logic. The setup is less complex, the feedback cycle is quick, and the value is apparent ("oh gosh, now I understand all these edge cases and can change this complicated code with more confidence"). I think it also leads to better code at the class/module/function level.

In my experience once a test (of any kind) saves a developer from a regression, they become far more amenable to writing more tests.

That said I think starting with integration tests might be a good area of growth for me.

In general, I test those things that relate to the application, not those about the implementation.

i.e. Test business logic edge-cases, don't test a linked-list implementation... that's just locking your design in.

Writing functional tests is easy when you have a clear spec. If you do, tests are basically the expression of that spec in code. Conversely, if they're hard to write, that means that your spec is underspecified or otherwise deficient (and then, as a developer, ideally, you go bug the product manager to fix that).

Right, like I said "complicated business logic". I agree completely, I have no desire to test a library or framework (unless it's proven to need it).

For those who want to know more about this approach: see Ian Cooper - TDD, Where Did It All Go Wrong https://www.youtube.com/watch?v=EZ05e7EMOLM

I post this all the time, it's like free upvotes :)

Integration tests are pretty important in huge codebases with complex interactions. Unit tests are of course useful to shorten the dev cycle, but you need to design your software to be amenable to unit testing. Bolting them onto a legacy codebase can be really hard.

In large systems you can unit test your code within an inch of its life and it can still fail in integration tests.

Exactly. I have a product that spans multiple devices and multiple architectures: micro controllers, SDKs and drivers running on PCs, third-party devices with firmware on them and FPGA code. They all evolve at their own pace, and there’s a big explosion of possible combinations in the field.

We ended up emulating the hardware, run all the software on the emulated hardware, and deploy integration tests to a thousand nodes on AWS for a few minutes it takes to test each combination. Tests finish quickly and it has been a while since we shipped something with a bug in it.

But there’s a catch: we have to unit test the test infrastructure against real hardware - I believe it’d be called test validation. Thus all the individual emulators and cosimulation setups have to have equivalent physical test benches, automated so that no human interaction is needed to compare emulated output to real output. In more than a few cases, we need cycle-accurate behavior.

The test harness unit (validation ) test has to, for example, spin up a logic analyzer and a few MSO oscilloscopes, reinitialize the test bench – e.g. load the 3rd party firmware we test against, then get it all synchronized and run the validation scenarios. Oh, and the firmware of the instrumentation is also a part of the setup: we found bugs in T&M equipment firmware/software that would break our tests. We regression test that stuff, too.

All in all, a full test suite, run sequentially, takes about 40,000 hours, and that’s when being very careful about orthogonalizing the tests so that there’s a good balance between integration aspects and unit aspects.

I totally dig why Oracle has to do something like this, but on the other hand, we have a code base that’s not very brittle, but the integration aspects make it mostly impossible to reason about what could possibly break - so we either test, or get the support people overwhelmed with angry calls. Had we had brittle code on top of it, we’d have been doomed.

If you're only writing tests at the unit level, you might as well not bother. And it's always good to run all tests after any change, it's entirely too easy for the developer to have an incomplete understanding of the implications of a change. Or for another developer to misuse other functionality, deliberately or otherwise.

It could also be that the flags are so tangled together that a change to one part of the system can break many other parts that are completely unrelated. Sure you can run a unit test for X, but what about Y? Retesting everything is all you can do when everything is so tangled you can't predict what a change could affect.

> Tests that run for 30 hours is an indication that nobody bothered writing unittests.

Yes, they were not unit tests. There was no culture of unit tests in the Oracle Database development team. A few people called it "unit tests" but they either said it loosely or they were mistaken.

Unit tests would not have been effective because every area of the code was deeply entangled with everything else. They did have the concept of layered code (like a virtual operating system layer at the bottom, a memory management layer on top of that, a querying engine on top of that, and so on) but over the years, people violated layers and wrote code that called an upper layer from a lower layer, leading to a big spaghetti mess. A change in one module could cause a very unrelated module to fail in mysterious ways.

Every test was almost always an integration test. Every test case restarted the database, connected to the database, created tables in it, inserted test data into it, ran queries, and compared the results to ensure that the observed results match the expected results. They tried to exercise every function and every branch condition in this manner with different test cases. The code coverage was remarkable though. Some areas of the code had more than 95% test coverage while some other areas had 80% or so coverage. But the work was not fun.
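For flavor, here's the shape of such a test sketched in Python against SQLite — OraTst itself isn't public, so this is only an illustration of the restart-connect-create-insert-query-compare cycle described above, not its actual syntax:

```python
import sqlite3

def run_test_case():
    # fresh database per test case: connect, create tables, insert test data,
    # run a query, and compare observed results against expected results
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,)])
    observed = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]
    conn.close()
    expected = 6
    return observed == expected

assert run_test_case()
```

Multiply that setup/teardown cost by millions of cases and the 20-30 hour farm runs stop being surprising.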

I'm amazed that it had that many tests that took that long, but ONLY had 80-95% coverage. I understand the product is huge, but that's crazy.

I do. It's about state coverage: every Boolean flag doubles the possible state of that bit of code: now you need to run everything twice to retain the coverage.
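A rough way to see the blow-up: with n independent boolean flags there are 2**n states to cover. The flag names below are made up purely for illustration:

```python
from itertools import product

# hypothetical flag names; the point is only the count
flags = ["use_cache", "parallel_exec", "legacy_mode", "force_flush"]

# every boolean flag doubles the state space: 2**4 = 16 combinations
states = list(product([False, True], repeat=len(flags)))
print(len(states))  # 16
```

With the "sometimes 100s" of interacting flags mentioned upthread, exhaustive state coverage is simply out of reach, which is why line coverage stalls despite millions of tests.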

FWIW, I know people who work on SQL processing (big data Hive/Spark, not RDBMS), and a recurrent issue is that an optimisation which benefits most people turns out to be pathologically bad for some queries for some users. Usually those with multiple tables with 8192 columns and some join which takes 4h at the best of times, now takes 6h, and so the overnight reports aren't ready in time. And locks held in the process are now blocking some other app which really matters to the business's existence. These are trouble because they still "work" in the pure 'same outputs as before' sense; it's just that the side effects can be so disruptive.

Writing tests for error handling can be a pain. You write your product code to be as robust as possible but it isn't always clear how to trigger the error conditions you can detect. This is especially true with integration tests.

How about XS Max =]]

Code like this makes me think of the famous line from the film version of Hellraiser: "I have such sights to show you..."

Contrast PostgreSQL or... uhh... virtually any other database. Oracle's mess is clearly a result of bad management, not a reflection of the intrinsic difficulty of the problem domain.

Nonsense. The problem domain you dismiss is hideously complicated. Oracle DB and PostgreSQL are entirely different classes of products. No airline runs its reservation system on PostgreSQL. That's not a coincidence.

It's not a coincidence, no, because Oracle can provide support guarantees in a way a Postgres contractor can not.

This is also a factor for independent developers (who build airline reservation systems) who need to choose a RDBMS for their product - they'll choose oracle, because ... Oracle can provide support guarantees in a way a Postgres contractor can not.

Which makes Oracle not a different class of product than Postgres, but a different class of support for the product. (which could be considered part of the product, so ... maybe you're right)

No, they use Amadeus. Amadeus is a wonderful mainframe program that perfectly and with 100% accuracy faithfully models how you'd book a train ticket in France in the fifties.

What more could we want?

> The problem domain you dismiss is hideously complicated.


Size of software reflects the number of people working on it (and for how long), not essential complexity.

Have you heard the term "marketing"?

This is interesting, could you elaborate?

The KDB+ database has been around for 20 years.

The executable is ~ 500kb.

Enterprise software is gross.

> The executable is ~ 500kb.

So... is this good or bad? A Hello World in Go is on the order of 2 MB. That doesn't say anything about code bloat, it just says that Go prefers static over dynamic linking.

That 2 MB might be messier than you suggest. It seems like people close to the project spent time trying to clean it up:


Perhaps the most valuable part of this whole thing are the tests. Perhaps with that test bank, one could start from scratch to write a new database.

I think this validates my view that testing is important, but keeping the codebase clean, readable and modular is more important!

Why not both?

Yes, both. But maybe one is more important?

I'd argue the tests are more important. For example, Oracle is still the leading commercial DB; if your product works, people will buy it.

Isn't that mostly due to momentum? But if Oracle can't keep up with the features customers need, they'll lose that momentum.

Enterprise software sales cycles are slow, but once they start turning, it's hard to turn them back.

Really interesting post. I have often worked in such a mess of code, though in codebases orders of magnitude smaller with only a few developers. I would never have imagined a project like Oracle is like this. Since there seem to be a number of Oracle employees around, I would be very interested to know if there have been any proposals to start cleaning up this shit. The man-hours wasted by this workflow are so huge that I expect even a small percentage of Oracle's developers could be assigned to rewrite it and they would catch up to the rest of the product in a reasonable time frame, so that in a few years it could be rewritten and stop wasting developers' time.

Fingers crossed that there are no merge conflicts & conflicting tests.

Everyone's on the same roughly 2-month schedules, so I guess that won't be much of a problem.

So I think the interesting question is: when you run into a large, bloated, unwieldy POS codebase, how do you fix it? You obviously need some buy-in from management, but you also need a plan that doesn't start with "stop what we are doing and get everyone to spend all of their time rewriting what we have" or "hire a ton of new devs."

I have seen smaller versions of what the OP describes. My plan was that every new piece of code that was checked in had to meet certain guidelines for clarity--like can the dev sitting next to you understand it with little to no introduction--and particularly knotty pieces of existing code deserved a bug report just to rewrite and untangle the knots.

In the end, whatever your plan, I think what you need is a cultural change, and cultures are notoriously difficult to change. Any cultural change is going to have to start high up in the organization with the realization that the codebase is an unsustainable POS.

I had a very similar experience in another enterprise storage company with a code base of ~6M loc of C/C++ and gazillion test cases. Originally, it used to take roughly about an hour to just build the system where it did a bunch of compile time checks, linting etc. Then if everything goes well, it goes to code review, then to a set of hourly and nightly integration checks before it gets merged to the main branch. It would take another cool 3-4 months of QA regression cycle before it gets to the final build.

This sounds a lot like Walmart's codebase for their POS registers. Except, the kicker is that there are zero unit tests, zero test frameworks, etc. You just have to run it through the shitty IBM debugger and hope that you don't step on anyone else's work. Up until 2016, they didn't even have a place to store documentation. Each register has ~1000 flags, some of which can be bit switched further into testing hell.

The structure must have high coupling and low cohesion; it should have been designed with high cohesion instead. More components/modules should be designed with the ability to be further divided into smaller modules/components as the requirements grow. Bug fixing in smaller components is much easier than fixing bugs in the overall project.

Not sure if Oracle is already following this or not. But this is necessary for scalable projects.

How can it have millions of tests with 25 million lines of code? How many lines of code is there including the code in the tests?

You can have automated test generation. I'd imagine with a database system, you'd have a big list of possible ingredients for a query and then go through a series of nested for loops to concatenate them together and make sure each permutation works separately. That can easily make for thousands of tests with only a few lines of code.
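A minimal sketch of that idea, using made-up query fragments (a real generator would have far more of each, and would also generate the schemas and data):

```python
from itertools import product

# hypothetical ingredient lists for illustration only
selects = ["SELECT *", "SELECT COUNT(*)"]
wheres = ["", "WHERE x > 0", "WHERE x IS NULL"]
orders = ["", "ORDER BY x"]

queries = []
for sel, wh, od in product(selects, wheres, orders):
    parts = [sel, "FROM t", wh, od]
    queries.append(" ".join(p for p in parts if p))

print(len(queries))  # 2 * 3 * 2 = 12 generated test queries
```

Each additional ingredient list multiplies the count, so a handful of lists with a few dozen entries each yields test cases by the hundreds of thousands.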

Along with what eigenspace said, check out SQLite's Testing page: https://www.sqlite.org/testing.html (the project has 711 times as much test code and scripts, as code). You can go really far... and still miss things on occasion.

The 25 million lines of code is only the source code of Oracle Database written in C.

The test cases are written in a domain specific language named OraTst which is developed and maintained only within Oracle. OraTst is not available outside Oracle. The OraTst DSL looks like a mixture of special syntax to restart database, compare results of queries, change data configuration, etc. and embedded SQL queries to populate database and retrieve results.

I don't know how many more millions of lines of code the tests add. Assuming every test takes about 25 lines of code on average (every test used to consume about half of my screen to a full screen), we can estimate that the tests themselves add another 25 to 50 million lines of code.

now that's a candidate for a Rust rewrite

I wonder what version control system do they use for that monstrosity (how long it takes to checkout/commit changes)?

Thank you. I had a really rough day caused by the project I inherited. Doesn't seem so bad in comparison now.

Wow. What is the average salary for an Oracle developer doing this sort of work?

Sounds like working on batch apps on mainframes in the 70's, one compile/run cycle a day, next morning get a 1 foot high printed stack of core dump.

This is why Java is better.

What happens if any of the tests are wrong? (got bugs themselves)

not much difference, it’s a feature now!

it reminds me of my early days at Oracle; most of the time was spent debugging what was wrong with the test case.

I love my job

I am maintaining an application in the construction industry space. That application was created 25 years ago by a construction worker who had never written a single line of code before, but because he caused a lot of problems on the construction site, they gave him a Programming 101 book and let him build it.

15 years later the app was close to half a million lines of one huge bowl of spaghetti code. The only comments in the whole codebase were timestamps. I don't know why he dated his code, but I find it fascinating: he basically never deleted anything, so you can find the different timeframes in which he discovered various concepts. There is the use-exception-instead-of-if period, there is the time when he discovered design patterns, there is the time before he learnt SQL, when all the database queries were done by iterating over whole tables, and such. I am sure I will find a commented-out Hello World somewhere in the code someday.

I have been working on this codebase for 10 years. Code quality has improved and major issues got fixed, but there is not enough budget to actually rewrite the whole system, so it is still more or less a huge spaghetti monster and I have gotten used to it.

I have to salute this construction worker for building a solution that is apparently so valuable for the business that they can’t simply replace or rewrite it. This probably means that it solves a real problem for them, and adding 30.000 lines of code per year without any formal training or much tooling is no small feat either. I understand the criticisms and laughs here from the “real” software developers, but damn it’s just impressive what people can create on their own given enough time and motivation.

It is impressive indeed. The coding started just as Windows 95 was released. There was no Stack Overflow, and they didn't even really have internet back then. The programmer (as far as I know) didn't even speak English, so he had access to a book or two in German and the code snippets in the help section of Delphi. At the same time, building applications with a UI was itself brand new, so there was very little experience available, especially in rural Austria.

The company did try to migrate to other software a few times, but the software is so specific to the industry and to the legislation of a small country that the companies who tried to create similar software usually went bankrupt soon after.

Rural Austria? Please provide a company name ... or at least first and last character :)

Somewhere out there, there's a software developer who was assigned the task of building the team's office out of 30,000 bricks, making all kinds of spaghetti patches to prevent it from falling over, and the construction workers are laughing about it on a construction worker forum.

And this is the wall where he discovered you can use cement to bind the bricks. And here he even mixes that cement with sand and water. This is a safer place to stand.


A spaghetti monster which solves a real business problem can be improved, chunked into pieces, gradually rewritten, whatever improves maintenance. If need be, there will be funds and time for doing so.

By contrast, an impeccably architected, layered, no-design-pattern-omitted product which solves no business problem... oh, the horror.

One of my first jobs in the industry was really similar. I ended up sitting down with a friend and rewriting it in C#. We didn’t have permission, but no one knew until our codebase was already in working shape (a month or so). We got away with it because the original codebase was so bad that it hadn’t shipped in 5 years. Months of 0 productivity were normal. My friend and I went on to rewrite the entire suite of products over the course of a year or two. We then started our own business. The rewrites were the most successful products that company had ever had in the modern era. Rewriting is not always a bad idea, and it can be less expensive in the medium run. Few seem to realize this, thanks to Joel Spolsky’s blog post on the matter being seen as dogma.

We had a bunch of code at work that everyone sort of begrudgingly used that I wrote almost 20 years ago, not knowing too much about what I was doing. I have recently rewritten it – about 100kLOC of messy C++ turned into 25kLOC of bliss. The test coverage we had ensured that we didn’t have to worry about anything breaking. I hate myself a bit less :)

I feel like you could make a huge chart on the wall showing the epochs and what the previous developer didn't know at that time. Like "why would it be done this way? Ah 1997, let's see on the chart... Ah right, Greg doesn't know SQL here"

Haha that would be amazing!

What kind of problems do you cause at a construction site that don't get you fired, but instead get you reassigned to Programmer with a Programming 101 book?

Probably fell off the roof a few times, or wasn't really handy with a hammer or something.

The company is quite fascinating. They started around 1945 and over the years they've become a small conglomerate. There are three or even four generations working together, and once they like you as a person they will find something for you to do.

Is the original programmer still at the company in any capacity?

No, he left before I was hired as a consultant. I never got a chance to talk to him in any form. I am the only person who has touched the source code in the past 10 years. I don't know why he left or where he went.

From my experience this means they had a strong union.

Perhaps he played too much Minecraft and tried building a Turing machine from the materials on the construction site?

C can't be that different from redstone right?

Like reading a journal you found in the abandoned house you just moved into, left behind by the kid who used to live there. Sounds like a movie.

Oh yeah, it might be a comedy where two people argue about spaces vs. tabs, and in comes our 20k-line function in a single block of code that has never heard of using either... nor of moving repeated code into its own function.

This sounds awesome and terrible at once, and he sounds like he was thrown into a very deep end. It's quite sad that he seemingly didn't have anyone to tutor or mentor him in this role. Given that it worked, and that there was a decade-long learning curve as you described, he must have had an enormous amount of determination to get things working.

I recall a civil engineering suite of programs that had been converted from Basic into Fortran IV.

The BASIC was so old it only allowed two-character (yes, two) variable names, and the Fortran code made liberal use of arithmetic IF statements!

An example of one is IF (S12 - 1.0) 13, 13, 12
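For anyone who hasn't met them: a Fortran arithmetic IF is a three-way goto that branches on the sign of an expression. A rough sketch of the semantics in TypeScript (labels become return values here, since TypeScript has no goto; all names are invented for illustration):

```typescript
// Fortran:  IF (S12 - 1.0) 13, 13, 12
// jumps to label 13 if the expression is negative, 13 if zero, 12 if positive.
// This helper just reports which label the arithmetic IF would take.
function arithmeticIf(expr: number, neg: string, zero: string, pos: string): string {
  if (expr < 0) return neg;
  if (expr === 0) return zero;
  return pos;
}

// S12 = 0.5  →  0.5 - 1.0 < 0  →  label 13
console.log(arithmeticIf(0.5 - 1.0, "13", "13", "12")); // "13"
// S12 = 2.0  →  2.0 - 1.0 > 0  →  label 12
console.log(arithmeticIf(2.0 - 1.0, "13", "13", "12")); // "12"
```

So the statement above reads: "go to label 13 if S12 <= 1.0, otherwise go to label 12" — control flow you have to reconstruct in your head at every IF.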

>> there is not enough budget to actually rewrite the whole system

As pointed out elsewhere, this is definitely solving a real problem, the longevity of the app is the proof.

Instead of rewriting, can you replace it piece by piece with newer idioms? An MVP/PoC for a newer way of solving a problem the software solves (AR may fit here), with some tangible gains — the latter being more important — can lead to approval of a mini budget for that MVP, and who knows what that can lead to.

Well, maybe. The biggest problem is a database from the '80s, used in a way where sometimes it acts like a database and sometimes files (one per table) are copied around random directories with a custom locking mechanism.

The app consists of maybe 20 different codebases that generate around 30 executables, kind of randomly, pulling source files from other codebases wherever the programmer found it fitting. That, plus random "fixes" to system modules/components, makes it all very, very hard to do much groundbreaking work.

That's fascinating and gave me a genuine laugh.

I think you won! I had a similar experience at my first professional job, a CAD/CAM tool that solved building structures for anti-seismic regulations. Every developer had his own lib, with duplicated code carrying different bugs in each lib, no comments, every screen built from copy-pasted code from another screen, and no tests at all. Undiscovered bugs sat out there for many years with no way of knowing, plus special sleep functions producing intentionally slow code.

"lava flows" anti-pattern

What is he up to now?

I don't know. The company is based in rural Austria, so it took them quite some time to find another engineer to take on the project (me). I have never met him or had any other contact with him.

Well, if you're working on a codebase for 10 years, then "no budget" is not really an excuse, sorry. As a responsible engineer, you should have either convinced management to spend some of your time on refactoring main parts, or cleaned it up yourself bit-by-bit every time you touch something. 10 man-years should be enough for a program that was created in 10 years by a single rookie dev.

Sure, if this were my full-time job I would do it, but I am just a contractor with a set number of hours devoted to the project. The first few years were spent fighting fires, as the company needed this very specific software to function; currently it is just about keeping an eye on it, occasionally fixing some report or updating data pipelines.

I would say a rewrite would cost about 2 million euros, which is a really big price tag for a company that uses this system as a back-office tool.

Some companies have back-office tools that they've spent considerably more on. Airlines, for instance, may have a tool that lets the person at the gate check who a passenger is, what their deal is, etc. Then GDPR happened, along with the bill to ensure that every rule is followed to the letter, and suddenly €2M isn't that bad after all...

Years ago as an intern at Microsoft, I had code go into the Excel, PowerPoint, Word, Outlook, and shared Office code.

Excel is an incomprehensible maze of #defines and macros, PowerPoint is a Golden Temple of overly-object-oriented insanity, and Word is so old and brittle you'd expect it to turn to dust the moment you commit. There are "don't touch this!"-style messages near the main loop that have been there _since roughly 1990_.

I had a bug in some IME code causing a popup to draw behind a window occasionally, and a Windows guru had to come in to figure it out by remote-debugging the windows draw code with no symbols.

I learned there that people can make enormous and powerful castles from, well, shit.

"Don't touch this" around the main loop can mean being able to make promises about responsiveness, reliability, etc.

Frequently there are critical code sections where it is much easier to tell people "don't touch it" rather than training people how to work on it safely.

When that is the case, would it not also be a really good place to explain why not, or provide a link to the place where such an explanation is provided?

It is important to point out that most computer systems are running non-deterministic operating systems.

For example, code running in JVMs on top of non-deterministic operating systems sometimes behaves in really odd ways. Sometimes a main loop is stable for reasons nobody understands.

In reality there often isn't time.

Getting it done > getting it done properly, as far as management is concerned.

Ever had this discussion with a coworker?

Coworker: "I hadn't enough time to do it right"

You: "Given enough time, how would you do it differently?"

Coworker: "............" (crickets)

IMHO it's not only about deadlines; the "not enough time" argument is often a comfortable fallacy, keeping us from facing the limits of our current skills.

I found this to be especially true with testing. I've lost count of how many times I've heard "we didn't have time to write (more) tests". But testing is hard. And when given enough time, these developers don't magically start doing it "right" overnight.

Knowing when something isn't right is easier than knowing how to do it right, so I would be wary of calling it a lack of skill on your coworkers' part.


I had to write a bespoke popup window launcher for a large gambling company in the UK. The games were mainly the awful slots games that you see in motorway service stations. These are basically one arm bandits on steroids.

There was a lot of logic in JavaScript that should have been in C#, and I had to design it correctly to work with a third-party proprietary CMS while managing session tokens across 3 to 4 third-party systems. Not easy.

It took me about 2 weeks of just reading the code and absorbing it, drawing lots of diagrams of how data flowed through the system and then porting that logic over to C# in a way that would work with the CMS system in a logical and OOP fashion and handling auth tokens effectively.

I quickly realized that there is never enough time.

If anyone feels that there is enough time, then (depending on their position: dev/manager/client/etc.) they start slacking, shifting focus, moving deadlines, moving staff, demanding more features/support/documentation or new requirements analysis, or getting more annoyed about smaller bugs.

>I found this to be especially true with testing. I've lost count of how many times I've heard "we didn't have time to write (more) tests". But testing is hard. And when given enough time, these developers don't magically start doing it "right" overnight.

Bingo. It's not necessarily only skills, though. There can be myriad reasons, and "no time" is just the easiest excuse to reach for. In big companies I've often seen the company process prescribe tools so bad that a good TDD testing strategy is impossible with them, but they won't move away, because somebody in purchasing already bought 10,000 licenses for the bad tool (which is often just a bad GUI that doesn't really help with anything except selling the thing).

The worst tool was a bad GUI where you couldn't even define the functions you wanted to use, and which had slow (>1h), non-deterministic test execution — for a unit test.

To write unit tests effectively, you need to write your code in a certain way to begin with.

In C# this normally means using IOC + DI.

Also, almost nobody I know does proper TDD. I know it is very convincing when one of the TDD evangelists shows you how to write something like a method that computes the nth number in the Fibonacci sequence using nothing but tests.

In reality, some 95% of the developers who even write tests write the logic first and write the test afterwards, to check that the logic does what it should.
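For the curious, the evangelist demo being alluded to usually looks something like this (a minimal sketch in TypeScript; the test cases and names are invented for illustration): the expected values are written down first, and the implementation is grown only until they pass.

```typescript
// Test-first: write the assertions before the implementation they drive.
// Step 1: the (initially failing) test cases — expected Fibonacci values.
const cases: Array<[number, number]> = [
  [0, 0], [1, 1], [2, 1], [3, 2], [10, 55],
];

// Step 2: just enough implementation to make them pass.
function fib(n: number): number {
  let [a, b] = [0, 1];
  for (let i = 0; i < n; i++) [a, b] = [b, a + b];
  return a;
}

// Step 3: run the tests.
for (const [n, expected] of cases) {
  if (fib(n) !== expected) throw new Error(`fib(${n}) !== ${expected}`);
}
console.log("all green");
```

In the demo, step 2 is reached through several red/green iterations; most working developers do steps 2 and 3 in the opposite order.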

>To write unit tests effectively, you need to write your code in a certain way to begin with.

>In C# this normally means using IOC + DI.

I've become quite partial to functional programming in the last few years. Side effect free functions with functions as interfaces for DI lend themselves perfectly to TDD and data parallel async without worrying too much.

C# is now slowly absorbing most of the good features from F#, but I think the culture won't transform so easily.
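As a minimal sketch of that idea (in TypeScript rather than F#/C#, with all names invented for illustration): when the dependency is just a pure function, a unit test injects a fixed stand-in and needs no mocking framework at all.

```typescript
// The "dependency" is just a function type — a function as the interface.
type Clock = () => Date;

// Production code takes the clock as a parameter instead of calling Date directly.
function greeting(clock: Clock): string {
  return clock().getHours() < 12 ? "Good morning" : "Good afternoon";
}

// Production call site: greeting(() => new Date())
// Test: inject a fixed clock; the result is fully deterministic.
const fixedMorning: Clock = () => new Date(2018, 10, 13, 9, 0, 0);
console.log(greeting(fixedMorning)); // "Good morning"
```

The same shape works for file systems, HTTP calls, random number generators — anything side-effecting gets pushed behind a function parameter, and the core logic stays trivially testable.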

and let’s not forget the ever popular “bug-driven testing”.

Unlike your coworker, I _always_ have a plan. Often a dozen of them. With various pros and cons for each.

But also unlike your coworker, I probably _figured out how to do it right_ in the time given. It's pretty rare that I don't have time to do it right; it does happen (especially with extreme instances of scope creep and requirements drift), but it's rare.

Which I guess is your point? The time excuse is just an excuse, and a good developer writes good code.

The one that gets me is when you've designed something as simple as possible and then to make it "simpler" people insist on making it less general and paradoxically more complex in a small way.

Related to that is the "obvious performance fix" that isn't actually faster, yet keeps burning up time for years after it was proven not to be, because the freshers never found out about it and the old-timers forgot.

This is a complete fallacy IMO. The time cost is tenfold down the line, when people are trying to reverse-engineer the code in order to maintain it.

Presumably, down the road, you'll either be a defunct company or doing well enough to afford 10 times the manpower to fix things.

Facebook was a spaghetti code mess in the beginning. I'm sure it caused them some growing pains, but moving too slowly early on would have likely been more costly.

> Presumably, down the road, you'll either be a defunct company or doing well enough to afford 10 times the manpower to fix things.

Only in startup land which is still a small fraction of our industry.

Most places will never have 10 times the manpower to fix things and are hurting themselves by not doing them properly in the first place.

> Facebook was a spaghetti code mess in the beginning. I'm sure it caused them some growing pains, but moving too slowly early on would have likely been more costly.

Survivorship bias: for every Facebook, how many potentially viable companies never got off the ground because users couldn't tolerate using their steaming pile?

Notably, MySpace is often cited as a company that failed because their codebase was terrible, which prevented them from adding features as quickly as Facebook could (despite having many more engineers at the time.)

I don't believe that for one second considering the hacks that Facebook has had to do around the limitations of PHP.

I'm completely opposed to the view that bad craftsmanship is acceptable because of time constraints. You are paying for it very dearly, very soon. It is of the utmost importance to write the best code you can from the beginning, and I don't believe it slows you down very much, if at all.

If you've ever seen a software product where something that should take a weekend takes months to get out, it's often not because the problem is more complicated than you'd think, but because of a mangled, complex codebase which prevents anyone from getting real work done.

Edit: Removed a bunch of redundancy.

Both you and this comment's parent are correct. Which doesn't say anything about the nature of software engineering or project management, but more the importance of taking context into account when considering advice.

In this case, the evidence is that excel/word etc are doing fairly well...


The long and short of it: as a contractor, I have to get it done. It will probably be me making the changes later anyway, and I make sure I put in these things called comments.

Also, developers pretending code quality is an either/or proposition is a false dichotomy. You can write 80% of it correctly while the other 20% is just hacks to get it done in time. You can't write the perfect system.

So I am sorry you are the one being fallacious.

Try telling that to a pointy-haired boss.

To be fair, if you are implementing a feature or fix in Word and think you need to edit the main loop - 99% chance you are wrong and the fix would be better placed elsewhere. And 99% chance that an edit will cause regressions or changes in behaviour elsewhere.

Yup! Code is the How... comments are the Why.

1990 you say.... maybe the programmer was just an MC Hammer fan.

Waited a bit for this question to pop up, but it didn't, to my surprise. So:

> ...remote-debugging the windows draw code with no symbols

Why, specifically, were no symbols available? I can't come up with an explanation. Surely old symbols are kept. Do checked builds take longer to iterate on (ie build), or something?

That's a great question! I didn't mean to imply that we absolutely couldn't have used symbols -- the answer is just because he was able to figure it out without them and it was less effort to try without first.

Office and Windows are different teams and units, so one dev on one team typically wouldn't have access to all of the symbol info for the codebase of the other. Setting that up takes some hoop-jumping, so he tried without and ended up figuring things out just fine over a few hours.

What I wanted to demonstrate was that in that moment all he had available was shit, and he still managed to push the castle higher.

Wow, I see.

I must admit, I do very much wonder what kind of environment Microsoft would be if teams were less segregated. I found http://blog.zorinaq.com/i-contribute-to-the-windows-kernel-w... in the comments, which seems to hint at the same sort of theme somewhat - particularly the bit about contributing to teams other than your own. There's a strong notion of isolation.

This is just thinking out loud, a response is not required. Everywhere has pros and cons. I'm (even with all this moping) actually less hesitant about MS as a whole than the rest of FAANG (except for N, which I also don't see a problem with) - not because of the whole "new MS" thing, or GH, but because everyone else seems to have fewer scruples than I consider to be a viable baseline. So there's that. :)

It's just kind of sad to see these kinds of inefficiencies, and it would be cool to eliminate them. It'd unleash organizational chaos for a while, but it would be totally worth it.

Maybe in the year 2050 some future intern/employee will be adding to the Office code and wonder about the "Don't touch this" code relics of the past that someone from long ago left for future generations.

I bet in 2050 people will puzzle over the microservice-cloud-javascript-Go-caching legacy systems that were developed in 2018 and be scared...

I mean, in 2018, I'm scared witless of all that. Anything cached in background to someone else's computer by something as unstable and slow as ECMAScript is just... a very bad idea. Keep programs local, and don't use Web coding / scripting for anything but the Web itself through a free-standing browser.

Can you guess what happened here:

https://news.ycombinator.com/item?id=15745250 ?

It appears that a bug in an Office component was fixed by manually editing the binary. Is that plausible?

and this tradition proudly continues with Windows 10!
