It is close to 25 million lines of C code.
What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.
Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is ridden with mysterious macros that one cannot decipher without picking a notebook and expanding relevant pats of the macros by hand. It can take a day to two days to really understand what a macro does.
Sometimes one needs to understand the values and the effects of 20 different flag to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.
The only reason why this product is still surviving and still works is due to literally millions of tests!
Here is how the life of an Oracle Database developer is:
- Start working on a new bug.
- Spend two weeks trying to understand the 20 different flags that interact in mysterious ways to cause this bag.
- Add one more flag to handle the new special scenario. Add a few more lines of code that checks this flag and works around the problematic situation and avoids the bug.
- Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code, build a new Oracle DB, and run the millions of tests in a distributed fashion.
- Go home. Come the next day and work on something else. The tests can take 20 hours to 30 hours to complete.
- Go home. Come the next day and check your farm test results. On a good day, there would be about 100 failing tests. On a bad day, there would be about 1000 failing tests. Pick some of these tests randomly and try to understand what went wrong with your assumptions. Maybe there are some 10 more flags to consider to truly understand the nature of the bug.
- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.
- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.
- Finally one fine day you would succeed with 0 tests failing.
- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching this new piece of code never ends up breaking your fix.
- Submit the work for one final round of testing. Then submit it for review. The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.
- After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.
The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).
The fact that this product even works is nothing short of a miracle!
I don't work for Oracle anymore. Will never work for Oracle again!
I can't imagine a C/C++ application where the test suite takes 20-30 hours on a test farm with 100-200 servers. And if you can break 100-1000 tests with a single change, it doesn't like things are very modular and isolated.
And 30 hours between test runs! I would definitely not take that job. That sounds like hell.
The best way to guess this is to extrapolate from the interview questions. If they ask you a lot of low-level debugging/macro/etc questions..
Further rounds of interviews covered data structure problems (trees, hashtables, etc.), design problems, scalability problems, etc. It was just like any other interview for software engineering role.
Oh ya, the markers in here are pretty run down, let me pray to the old ones for some more"
Forgotten dreams like snowflakes melt on hot dusty ground, soon to turn into hard dry mud beneath a bitter polluted sky.
When I was interviewing for an SRE position with Google in Dublin, I had about 10min to ask questions in each of the 5 interviews that were conducted on-site.
In between the interviews, a sixth SRE would take me to lunch for about an hour. Anything discussed with him wouldn't be evaluated as part of the interview.
So there was plenty of time for questions, I would say.
Wouldn't you just ask the developers interviewing you outright, "can you walk me through an example of your day? How long does it take you to push out code? What's testing like? Do you run tests locally, or use something like Jenkins?" etc.
Which is when I got back on HN regularly :-)
(PS I did tell the internal person that there was no way that reading HN was related either to a BFOQ or other job requirement; and thus while it's not illegal, it'd be highly suspicious.)
What the fuck? Am I a spoiled tech-bro, or does that sound completely insane to anyone else? I would 100% not take a job if I didn't get a chance to talk to my coworkers and future manager during the interview process.
Seems like a trap set up for fresh out of college hires. I don’t know any senior developers who would even consider a job under those circumstances.
Q: Are you busy?
A: Yes, in the middle of running tests...
If I submit my test jobs today to the farm, the results would come one or two days later, so I work on another bug tomorrow, and submit that. Day after tomorrow, I return to the first bug, and so on.
Branch from master, and rerun tests before the final merge, like you should in any other software?
(Many processes fail that criterion, see https://bors.tech/ for something that gets this right).
Ideally you work on a different enough bug that there's limited interaction, and ideally that's figured out before you fix it, but those criteria are indeed harder to satisfy in a bigger software.
(ASML makes machines that make chips. They got something like 90% of the market. Intel, Samsung, TSMC etc are their customers)
ASML has 1 machine available for testing, maybe 2. These are machines that are about to be shipped, but not totally done being assembled yet, but done enough to run software tests on. This is where changes to their 20 million lines of C code can be tested on. Maybe tonight, you get 15 minutes for your team's work. Then again tomorrow, if you're lucky. Oh but not before the build is done, which takes 8 hours.
Otherwise pretty much the same story as Oracle.
Ah no wait. At ASML, when you want to fix a bug, you first describe the bugfix in a Word document. This goes to various risk assessment managers. They assess whether fixing the bug might generate a regression elsewhere. There's no tests, remember, so they do educated guesses whether the bugfix is too risky or not. If they think not, then you get a go to manually apply the fix in 6+ product families. Without automated tests.
(this is a market leader through sheer technological competence, not through good salespeople like oracle. nobody in the world can make machines that can do what ASML's machines can do. they're also among the hottest tech companies on the dutch stock market. and their software engineering situation is a 1980's horror story times 10. it's quite depressing, really)
And yet, even SQLite has an awful amount of test cases.
It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts of the unnecessary pieces), but in the long run I'm grateful we do that.
The single hardest thing about programming, I'd say.
This, and stuff like your story, are why I don't trust people who promote test-driven development as the best way to write clean APIs.
There should be integration tests along with some property based tests and fuzzy tests. Usually catches a lot of things.Invest in monitoring and alerts too.
TDD is like relying on debugger to solve your problem. Is debugger a good tool? yes,it is a great tool. But using it as an excuse to avoid understanding what happens under the hood is plain wrong.
The problem lies in industry where software engineering is not given any value but whoteboarding and solving puzzles is.
Software engineering is a craft honed over years of making mistakes and learning from them. You want code asap, kick experience engineers get codemonkeys in and get a MVP.
Quality is not clever algorithm, but clear conscise logic. Code should follow the logic, not the other way around.
Clear > clever.
I remember when i realized that TDD shouldn't have such weight in our development as it had gotten (when it was high on the hype curve).
It was when we starting using a messaging infrastructure that made everything much more reliable and robust, and trough which we could start trusting the infrastructure much more (not 100% though, of course).
It made me realize that the reason why we did this excessively large amount of tests (1800+) was because the fragile nature of a request/response-based system and we therefore "had to make sure everything worked".
What I'm trying to get at here is thar TDD assumed the role of a large safety net to a problem we should have addressed in a different manner. After introducing the messaging, we could replay messages that had failed. After this huge turning point tests were only used for what they should have only been used for - ensuring predictable change in core functionality.
(our code also became easier to understand and more modular, but that's for another time...)
And I agree, that there are lots of anti-patterns that have grown in tandem with TDD, like excessive mocking with dependency injection frameworks or testing renamed identity functions over and over just to get more coverage. However, I'd argue that is equally the fault of object-oriented programming though.
Where I disagree is this: TDD and unit tests are still a very useful tool. Their big advantage is that you can isolate issues more quickly and precisely, IF you use them correctly.
For instance, if I have some kind of algorithm in a backend service operating on a data structure that has a bug, I do not want to spend time on the UI layer, network communication or database interactions to figure out, what is going on. Testing at the right scope you get exactly that.
Some changes are simply not testable, period.
No, you cannot always write a test which initially fails, and then passes when the change is made, and when this is the case. You should understand why that is, and not try.
In some cases when you can, yet still should not. If a whole module is rewritten such that the new version satisfies all of the public contracts with the rest of the code, then only those contracts need to be retested; we don't need new tests targeting internals.
It's because the old version wasn't targeted by such tests
in the first place that it can be rewritten without upheaval.
- reproduce bug and verify your bugfix in matter of ms with proper unit test
- understand what code does
- change and refactor code whenever you want
You can tell from what is written that they are not following TDD.
Redesign that codebase in an easy and clean to test design would require an exponential effort and time compared to have it done step by step, but it would be worth it
Plus the tests never break on their own because they're modular, and each time you run a test that was obviously going to pass, you've wasted your time.
As long as you have code coverage, better to have lots of asserts and real-world integration tests.
If you unit test properly you are unit testing the business logic, that you have to properly divide and write in a modular fashion. If you want to test a more complex scenario, just add initial conditions or behaviors. If you can't do that or don't know how to do that, then you don't know what your code is doing or your code is bad designed. And that may be the case we read above.
Tests rarely break because they help you not breaking the code and functionalities, and they are so fast and efficient on making you realizing that that you don't feel the pain of it.
I can't imagine any example where "easy to unit test" != simple
Functional tests now, that's another matter. But a lot of TDD dogmatism is centered on unit tests specifically. And that results in a lot of code being written that doesn't actually contribute to the product, and that is there solely that you can chop up the product into tiny bits and unit test them separately. Then on the test side you have tons of mocks etc. I've seen several codebases where test code far exceeded the actual product code in complexity - and that's not a healthy state of affairs.
So let's look at a simplified example.
My tests are in the test folder. They are actually superfluous since integration tests test for the same thing.
I cannot break up the program in a way that would unit test a smaller piece of it in more detail. They only tests I can add would be to test the command line driver
This is especially if your "integration tests" are testing the same component, and not actually integrating with numerous other components being developed by different teams - or, if the system is so small it can run on a single workstation.
Working in teams on larger systems, the situation is different. Part of the point of unit tests is the "shift left" which allows problems to be discovered early, ideally before code leaves a developer's machine. It reduces the time until bugs are discovered significantly, and reduces the impact of one dev's bugs on other devs on the team.
Programming is a craft. Good programmers write good code. Bad programmers write bad code. No methodology will make bad programmers write good code, but bureaucratic bullshit can and will prevent good programmers from working at their best. The only way to improve the output of a bad programmer is to mentor them and let them gain experience.
Of course, this does require that the process itself has been set up with care, thought, and understanding of what's being achieved.
I'm not keen on the "cult" of it, but if expectations of what the output should look like are available from the onset, it would appear to be of some benefit, at least.
Integration tests easily survive refactoring, on the other hand
The problem with integration tests is they are slow and grow exponentially. If they aren't growing exponentially then there's probably large chunks of untested code. Unit tests suffer their own problems, like you said they can be useless because of a reliance on mocking, they can also be brittle and break everywhere with small changes.
Ultimately any good suite of tests needs some of both. Unit tests to avoid exponential branching of your integration tests, and integration tests to catch errors related to how your units of code interact. I've experienced plenty of bad test suites, many of them are because of poorly written unit tests, but its often the poorly written integration tests that cause problems as well. As with most things, its all about a healthy balance.
Then there are the "write once, never fail ever" tests. Okay, so the test made sense when I wrote the code. I will never touch that part ever again because it works perfectly. Why do I keep running them every time?
I personally run my unit tests every time to confirm my assumptions that the unit of code under test hasn't changed. I also assume all code I write will inevitably be changed in the future because business requirements change and there's always room for improvement. Actually can't think of a single piece of code I've written (apart from code I've thrown out) that didn't eventually need to be rewritten. The benefit of running unit tests is less than the benefit of running integration tests, but the cost of running them is also significantly less. Current project I'm working on has 10x as many unit tests as integration tests and they run 100x faster.
My workflow is usually run my unit tests for the code I'm working on constantly, and when I think things are working run the entire test suite to verify everything works well together. Thats my workflow whether or not I'm doing TDD.
Like, are the two points neighbors? I mean, I'm not going to write a version of this function for a spherical board in the future. Nobody plays on a spherical board.
It's also a really boring unit test. Yes, (1,1) and (1,2) are neighbors. Do I really need to test this function until the end of time?
I totally agree, but I met several programmers who think the opposite.
I've experienced myself how the code quality of proper TDD code can be amazing. However it needs someone to still actually care about what they're doing. So it doesn't help with idiots.
The response to every analogy ever made on the internet. Can we stop using them yet?
But at least he wasn't using analogies, huh?
>Dilbert creator Scott Adams came to our attention last month for the first time since the mid to late '90s when a blog post surfaced where he said, among other things, that women are "treated differently by society for exactly the same reason that children and the mentally handicapped are treated differently. It's just easier this way for everyone."
>Now, he's managed to provoke yet another internet maelstorm of derision by popping up on message boards to harangue his critics and defend himself. That's not news in and of itself, but what really makes it special is how he's doing it: by leaving comments on Metafilter and Reddit under the pseudonym PlannedChaos where he speaks about himself in the third person and attacks his critics while pretending that he is not Scott Adams, but rather just a big, big fan of the cartoonist.
>Dilbert's creator Scott Adams Compares Women Asking for Equal Pay to Children Demanding Candy
Hmm, that sounds an awful lot like another analogy to me, actually... Oops!
So maybe Scott Adams isn't the most un-hypocritical guy to quote about the problems of analogies.
The other commenters made me think of the kids game Operation. https://en.wikipedia.org/wiki/Operation_(game)
How about shock collar programming? Or electric fence programming? Or block stacking (Jenga) programming.
test passes boss!
There is a clear difference between code developed AND maintained in the US vs. code that was developed in India, or code developed in USA and given to Indian developers to manage support. Nothing against Indians, but Ive been around the block and there seems to be a lesser quality of code from that part of the world and companies justifyvit in cost savings.
The actual damage was done much before I had joined Oracle. It appears that somewhere in the early 2000s, the Oracle codebase went from manageable to sphagetti monster. The changelog showed more changes from US developers than Indian developers at that time. Once the damage was done, all developers whether from the US or India now need to follow this painful process to fix bugs and add features.
The category-leading product is probably from one of the earliest companies in the field, if not the first. They have the oldest and cruftiest code - and the manpower to somehow keep it working. It is definitely not the fastest and definitely not the most stable. But they do have the resources to make sure it supports all the third party integrations and features important for the big customers.
I have encountered exactly this same situation on several different fields and categories.
At at time when I was a complete open source fanatic in the early 2000s it suddenly made me realize how Microsoft actually had much better quality software than most big proprietary software vendors.
I worked for a mini-computer company in the 1980's that ported Oracle (I'm thinking the version stamp was 22.1 when I was there from 1986-1990). It was one GIANT mess of standard "C" with makefiles that were in some ways larger and more complex than some of the actual kernel code it was building!
Took 24 hours to build the product... lol
Lesson learned. Always write tests. Your business will depend on it.
But on the other hand, over-reliance on tests is one of the reasons they ended up in this situation in the first place. It's like the car safety engineer's joke - How do you make cars safer? Install a knife in the middle of the steering wheel pointed at the driver.
When we're too sure that some safety feature will save us, we forget to be careful.
The notions of tests written in C that take 20 hours I can't even fathom.
At that point, for DB testing, I doubt it matters what language test are written in, it's going to be mostly about setting up and tearing down the environment over and over.
The field of software engineering has matured a lot since then.
There are quite a few very old projects that don't have the same level of cruft as Oracle; it epitomises a Sales Division driven culture.
How many of those switches (that now need to be supported and tested) are because some functionality was promised to a large contract, and so it just had to be done? I would wager a good number.
Reading a few sentences, looking out at Lanai. Reading a few more sentences, and looking back at Lanai...
Yes, they were not unit tests. There was no culture of unit tests in the Oracle Database development team. A few people called it "unit tests" but they either said it loosely or they were mistaken.
Unit test would not have been effective because every area of the code was deeply entangled with everything else. They did have the concept of layered code (like a virtual operting system layer at the bottom, a memory management layer on top of that, a querying engine on top of that, and so on) but over the years, people violated layers and wrote code that called an upper layer from lower layer leading to a big spaghetti mess. A change in one module could cause a very unrelated module to fail in mysterious ways.
Every test was almost always an integration test. Every test case restarted the database, connected to the database, created tables in it, inserted test data into it, ran queries, and compared the results to ensure that the observed results match the expected results. They tried to exercise every function and every branch condition in this manner with different test cases. The code coverage was remarkable though. Some areas of the code had more than 95% test coverage while some other areas had 80% or so coverage. But the work was not fun.
FWIW, I know people who work on SQL processing (big data Hive/Spark, not RDMBS), and a recurrent issue is that an optimisation which benefits most people turns out to be pathologically bad for some queries for some users. Usually those with multiple tables with 8192 columns and some join which takes 4h at the best of times, now takes 6h and so the overnight reports aren't ready in time. And locks held in the process are now blocking some other app which really matters to the businesses existence.
These are trouble because they still "work" in the pure 'same outputs as before', it's just the side effects can be be so disruptive.
I think it's important to point this out, because one of the biggest mistakes I'm seeing developers do these days is relying too much on unit tests (especially on "behavior" tests using mocks) and not trying to catch problems at a higher level using higher level tests. Then the code gets deployed and - surprise surprise - all kinds of unforeseen errors come out.
Unit tests are useful, but they are, by definition, very limited in scope.
"... is an indication that the software design isn't (very) decoupled ".
You can be modular without being properly decoupled from the other modules.
Integration tests test those things you're going to get paid for... features & use-cases.
Having a huge library of unit tests freezes your design and hampers your ability to modify in the future.
In my day to day programming, when I neglect writing tests, the regret is always about those that are on the side of integration testing. I'm okay with not testing individual functions or individual modules even. But the stuff that 'integrates' various modules/concerns is almost always worth writing tests for.
In my experience it's far easier to introduce testing by focusing on unit testing complicated, stateless business logic. The setup is less complex, the feedback cycle is quick, and the value is apparent ("oh gosh, now I understand all these edge cases and can change this complicated code with more confidence"). I think it also leads to better code at the class/module/function level.
In my experience once a test (of any kind) saves a developer from a regression, they become far more amenable to writing more tests.
That said I think starting with integration tests might be a good area of growth for me.
i.e. Test business logic edge-cases, don't test a linked-list implementation... that's just locking your design in.
I post this all the time, it's like free upvotes :)
We ended up emulating the hardware, run all the software on the emulated hardware, and deploy integration tests to a thousand nodes on AWS for a few minutes it takes to test each combination. Tests finish quickly and it has been a while since we shipped something with a bug in it.
But there’s a catch: we have to unit test the test infrastructure against real hardware - I believe it’d be called test validation. Thus all the individual emulators and cosimulation setups have to have equivalent physical test benches, automated so that no human interaction is needed to compare emulated output to real output. In more than a few cases, we need cycle-accurate behavior.
The test harness unit (validation ) test has to, for example, spin up a logic analyzer and a few MSO oscilloscopes, reinitialize the test bench – e.g. load the 3rd party firmware we test against, then get it all synchronized and run the validation scenarios.
Oh, and the firmware of the instrumentation is also a part of the setup: we found bugs in T&M equipment firmware/software that would break our tests. We regression test that stuff, too.
All in all, a full test suite, run sequentially, takes about 40,000 hours, and that’s when being very careful about orthogonalizing the tests so that there’s a good balance between integration aspects and unit aspects.
I totally dig why Oracle has to do something like this, but on the other hand, we have a code base that’s not very brittle, but the integration aspects make it mostly impossible to reason about what could possibly break - so we either test, or get the support people overwhelmed with angry calls. Had we had brittle code on top of it, we’d have been doomed.
Enterprise software sales cycles are slow, but once they start turning, it's hard to turn them back.
Contrast PostgreSQL or... uhh... virtually any other database. Oracle's mess is clearly a result of bad management, not a reflection of the intrinsic difficulty of the problem domain.
The executable is ~ 500kb.
Enterprise software is gross.
So... is this good or bad? A Hello World in Go is on the order of 2 MB. That doesn't say anything about code bloat, it just says that Go prefers static over dynamic linking.
This is also a factor for independent developers (who build airline reservation systems) who need to choose a RDBMS for their product - they'll choose oracle, because ... Oracle can provide support guarantees in a way a Postgres contractor can not.
Which makes Oracle not a different class of product than Postgres, but a different class of support for the product. (which could be considered part of the product, so ... maybe you're right)
Size of software reflects the number of people working on it (and for how long), not essential complexity.
What more could we want?
I have seen smaller versions of what the OP describes. My plan was that every new piece of code that was checked in had to meet certain guidelines for clarity--like can the dev sitting next to you understand it with little to no introduction--and particularly knotty pieces of existing code deserved a bug report just to rewrite and untangle the knots.
In the end, whatever your plan, I think what you need is a cultural change, and cultures are notoriously difficult to change. Any cultural change is going to have to start high up in the organization with the realization that the codebase is an unsustainable POS.
Not sure if Oracles is already following this or not. But this is necessary for scalable projects.
The test cases are written in a domain specific language named OraTst which is developed and maintained only within Oracle. OraTst is not available outside Oracle. The OraTst DSL looks like a mixture of special syntax to restart database, compare results of queries, change data configuration, etc. and embedded SQL queries to populate database and retrieve results.
I don't know how many more millions of lines of code the tests add. Assuming every test takes about 25 lines of code on an average (every test used to consume about half of my screen to a full screen), we can estimate that the tests themselves consume close to another additional 25 million lines of code to 50 million lines of code.