What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.
Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole codebase is riddled with mysterious macros that one cannot decipher without picking up a notebook and expanding the relevant parts of the macros by hand. It can take a day or two to really understand what a macro does.
Sometimes one needs to understand the values and the effects of 20 different flags to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.
The only reason this product still survives and still works is literally millions of tests!
Here is what the life of an Oracle Database developer looks like:
- Start working on a new bug.
- Spend two weeks trying to understand the 20 different flags that interact in mysterious ways to cause this bug.
- Add one more flag to handle the new special scenario. Add a few more lines of code that check this flag, work around the problematic situation, and avoid the bug.
- Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code, build a new Oracle DB, and run the millions of tests in a distributed fashion.
- Go home. Come back the next day and work on something else. The tests can take 20 to 30 hours to complete.
- Go home. Come back the next day and check your farm test results. On a good day, there would be about 100 failing tests. On a bad day, there would be about 1000 failing tests. Pick some of these tests at random and try to understand what went wrong with your assumptions. Maybe there are some 10 more flags to consider to truly understand the nature of the bug.
- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.
- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.
- Finally one fine day you would succeed with 0 tests failing.
- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching this new piece of code never ends up breaking your fix.
- Submit the work for one final round of testing. Then submit it for review. The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.
- After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.
The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).
The fact that this product even works is nothing short of a miracle!
I don't work for Oracle anymore. Will never work for Oracle again!
Sounds like ASML, except that Oracle has automated tests.
(ASML makes machines that make chips. They have something like 90% of the market. Intel, Samsung, TSMC, etc. are their customers.)
ASML has 1 machine available for testing, maybe 2. These are machines that are about to be shipped: not totally done being assembled yet, but done enough to run software tests on. This is where changes to their 20 million lines of C code can be tested. Maybe tonight, you get 15 minutes for your team's work. Then again tomorrow, if you're lucky. Oh, but not before the build is done, which takes 8 hours.
Otherwise pretty much the same story as Oracle.
Ah no, wait. At ASML, when you want to fix a bug, you first describe the bugfix in a Word document. This goes to various risk assessment managers. They assess whether fixing the bug might cause a regression elsewhere. There are no tests, remember, so they make educated guesses about whether the bugfix is too risky. If they think not, then you get the go-ahead to manually apply the fix in 6+ product families. Without automated tests.
(This is a market leader through sheer technological competence, not through good salespeople like Oracle. Nobody in the world can make machines that can do what ASML's machines can do. They're also among the hottest tech companies on the Dutch stock market. And their software engineering situation is a 1980s horror story times 10. It's quite depressing, really.)
That is absolutely insane. I can't even begin to imagine the complexity of that codebase. I thought my Rails test suite was slow because it takes 4 minutes. If I wrote it in C or C++ it would probably be 10 seconds.
I can't imagine a C/C++ application where the test suite takes 20-30 hours on a test farm with 100-200 servers. And if you can break 100-1000 tests with a single change, it doesn't look like things are very modular and isolated.
And 30 hours between test runs! I would definitely not take that job. That sounds like hell.
It's a good exercise to imagine how the job would be sold. Things like this would definitely not come up in the interview process; instead they would sell you on "you get to work on a cutting-edge DB kernel that is running most of the Fortune 100s" or something like that, which is true (!), but doesn't describe the day to day.
The best way to guess this is to extrapolate from the interview questions. If they ask you a lot of low-level debugging/macro/etc. questions...
> The best way to guess this is to extrapolate from the interview questions.
Wouldn't you just ask the developers interviewing you outright, "can you walk me through an example of your day? How long does it take you to push out code? What's testing like? Do you run tests locally, or use something like Jenkins?" etc.
Most new hires are probably not being interviewed by devs, but either by 3rd-party recruiters or internal recruiters with HR. When I was working in recruiting, the last thing either we or the client wanted was for the new hire to talk to either the person who they were replacing or any of the potential coworkers. Heck, one internal recruiter I had to interface with at a company I chose not to disclose said to me, "can we ask if they read Hacker News? There's some bad vibes about us there."
Which is when I got back on HN regularly :-)
(PS I did tell the internal person that there was no way that reading HN was related either to a BFOQ or other job requirement; and thus while it's not illegal, it'd be highly suspicious.)
> When I was working in recruiting, the last thing either we or the client wanted was for the new hire to talk to either the person who they were replacing or any of the potential coworkers.
What the fuck? Am I a spoiled tech-bro, or does that sound completely insane to anyone else? I would 100% not take a job if I didn't get a chance to talk to my coworkers and future manager during the interview process.
Perhaps you are spoiled (as am I in that regard), but I would absolutely never take a job unless I knew who I was going to be working with and had a chance to ask them honest questions.
Seems like a trap set up for fresh out of college hires. I don’t know any senior developers who would even consider a job under those circumstances.
On the contrary, the interview was an ordinary one. The screening round consisted of very basic FizzBuzz-type coding checks: reversing a linked list, finding duplicates in a list, etc.
Further rounds of interviews covered data structure problems (trees, hashtables, etc.), design problems, scalability problems, etc. It was just like any other interview for a software engineering role.
"I'd like you to write a graph algorithm that traverses the abyss, the cosmic horror that consumes one's mind, that traverses twilight to the rim of morning, that sees the depths of man's fundamental inability to comprehend.
Oh ya, the markers in here are pretty run down, let me pray to the old ones for some more"
I have written the algorithm you requested - but I wish I hadn’t run it. I hit ctrl-c when I realized what it was doing but it was too late... The damage is done — we are left with only the consequences and fallout.
Forgotten dreams like snowflakes melt on hot dusty ground, soon to turn into hard dry mud beneath a bitter polluted sky.
Were you even given substantial time to ask the interviewers questions? In most interviews I’ve done, even later round interviews whether it’s a finance company, start-up, FAANG, and companies of all sorts in between, I was given at most 5 minutes to ask questions after some dumb shit whiteboard algo trivia.
I was given 5 minutes to ask questions after each round of interview. That part was ordinary too. That's what most of the other companies do (FAANG or otherwise).
That's kind of naive, of course you want young people who will work hard and maybe not know what they are getting in to. I was offered a job at oracle back in the day, I would have felt a lot of despair if this is what it was.
I am not sure what position you were interviewing for and to what level of interview you made it.
When I was interviewing for an SRE position with Google in Dublin, I had about 10min to ask questions in each of the 5 interviews that were conducted on-site.
In between the interviews, a sixth SRE would take me to lunch for about an hour. Anything discussed with him wouldn't be evaluated as part of the interview.
So there was plenty of time for questions, I would say.
That would have been fun but in reality there was no downtime. Developers like me were expected to work on two to three bugs/features at a time and context switch between them.
If I submit my test jobs today to the farm, the results would come one or two days later, so I work on another bug tomorrow, and submit that. Day after tomorrow, I return to the first bug, and so on.
How would you know that merging code from the first bugfix wouldn't break the (just tested) code from the second bugfix?? Would you assume that the first bugfix will be merged first and branch off of that?
Without knowing Oracle's approach, this sort of problem is no different from any other software, even though it reaches a larger scale.
Branch from master, and rerun tests before the final merge, like you should in any other software?
(Many processes fail that criterion, see https://bors.tech/ for something that gets this right).
Ideally you work on a different enough bug that there's limited interaction, and ideally that's figured out before you fix it, but those criteria are indeed harder to satisfy in bigger software.
But if the time needed to test and deploy a change is so ludicrous, it seems like you'd rarely get a big-enough window to rerun your tests before the upstream changes again. Either people are merging in unstable code, or the release lifecycle is a slow byzantine nightmare too (probably the case here).
Usually you don't test a single change before merging, but branch from master, merge any number of changes and then run the tests. So the master branch would move forward every 20-30 hours in this case, unless the tests of a merge batch fail, in which case master would kinda stall for a bit.
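A minimal sketch of that batching strategy, in Python purely for illustration. The names `merge_batch` and `tests_pass` are hypothetical, a "change" is just a label, and splitting a failed batch in half to find the culprit is one common approach assumed here, not a description of how Oracle or bors actually do it. A clean batch costs a single slow test run; only a failure triggers extra runs.

```python
from typing import Callable, List, Sequence, Tuple

Change = str  # hypothetical: a change is identified only by a label


def merge_batch(
    master: List[Change],
    batch: Sequence[Change],
    tests_pass: Callable[[List[Change]], bool],
) -> Tuple[List[Change], List[Change]]:
    """Try to land `batch` on top of `master`; return (new_master, rejected).

    The whole batch is tested once on top of master. Only if that run fails
    is the batch split in half and each half retried, so that the change(s)
    breaking the build are isolated while the good ones still land.
    """
    if not batch:
        return master, []
    candidate = master + list(batch)
    if tests_pass(candidate):
        return candidate, []  # whole batch lands; master moves forward
    if len(batch) == 1:
        return master, list(batch)  # isolated a culprit
    mid = len(batch) // 2
    # Land the left half first, then test the right half on top of it.
    master, rejected_left = merge_batch(master, batch[:mid], tests_pass)
    master, rejected_right = merge_batch(master, batch[mid:], tests_pass)
    return master, rejected_left + rejected_right
```

With a 20 to 30 hour suite, every extra split costs roughly another day, which is why master "kinda stalls" whenever a batch fails.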
Tests in C/C++ run shockingly fast. I ported an application from Ruby to C++ and the tests ran in well under a second when it was taking 10+ seconds in Ruby. Granted because of C++'s type system there were fewer tests, but it was fast enough that I kept thinking something was wrong.
Are you including the time to build/link the tests? This is especially true if you have a bunch of dependencies. Last time I worked on C++ tests most of my time was spent on getting the tests to link quickly. Managed to get it from 25 minutes to 1 minute. But I'd rather have spent that time actually writing more test cases, even if they took 10s to run.
Started a new job a few months ago and we’re writing Go - a bunch of the test suites I’ve built run in microseconds. Statically typed compiled languages ftw.
You've violated the terms of service of Oracle Database by insinuating the codebase quality is in any way not superior to any and all competitors. No benchmarks or comparisons may be performed on the Oracle Database Product under threat of grave bodily harm at the discretion of our very depraved CEO.
I doubt the competition (e.g. IBM or Microsoft) has any better code quality. Even PostgreSQL is 1.3M lines of code, so let's get something deliberately written for simplicity. SQLite is just 130k SLoC, so another order of magnitude simpler.
And yet, even SQLite has an awful lot of test cases.
I'm sure some of the difference (25M vs. 1.3M) can be attributed to code for Oracle features missing in PostgreSQL. But a significant part of it is due to careful development process mercilessly eliminating duplicate and unnecessary code as part of the regular PostgreSQL development cycle.
It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.
> It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.
The single hardest thing about programming, I'd say.
PostgreSQL has a lot of code but most parts of the code base have pretty high quality code. The main exceptions are some contrib modules, but those are mostly isolated from the main code base.
It's because software LOC scales linearly with the amount of man-months spent: a testament to the unique ability of our species to create beautiful, abstract designs that will stand the test of time.
This is an interesting comment, because I can't decide if you are sarcastic or making a deep insightful comment. Because I don't think the statement is true. LOC can go on forever, but it usually happens in things that aren't beautiful and abstract.
Worked on SQL Server for 10+ years. MS SQL Server is way better than that. The Sybase SQL Server code we started with and then rewrote was as bad as Oracle.
I guess that is just because SQL as a standard is neither coherent nor beautifully designed. SQL is a mashup of vendor-specific features all bashed together into one standard.
There's also a lot of essential complexity there. SQL provides, in essence, a generic interface for entering and analyzing data. Imagine the number of ways to structure and analyze data. Now square that number to get the number of tests for how two basic features of the language interact with each other. And that's not even near full test coverage.
Your point about essential complexity is absolutely correct, but your faux mathematical analysis is totally not a legit way to analyze the complexity of something or determine test coverage. I feel like as programmers we should be comfortable making sensible statements without making up shady pseudo-math to sound convincing.
It's abundantly clear that I'm not making a precise computation here. My argument is that tests don't scale linearly with the number of features because interactions between features need to be tested as well.
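To make the growth argument concrete without claiming any precision: in a toy model (an assumption, not a real coverage metric) where every feature gets one test on its own and every pair of features gets one interaction test, the count is n + n(n-1)/2, which grows quadratically rather than linearly. Higher-order interactions would only make it worse.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def interaction_pairs(features: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of features whose interaction might need its own test."""
    return list(combinations(features, 2))


def estimated_test_count(n_features: int) -> int:
    """Toy model: one test per feature plus one per pair, n + n*(n-1)/2."""
    return n_features + comb(n_features, 2)


print(interaction_pairs(["JOIN", "GROUP BY", "NULL"]))
# [('JOIN', 'GROUP BY'), ('JOIN', 'NULL'), ('GROUP BY', 'NULL')]
for n in (10, 100, 1000):
    print(n, estimated_test_count(n))
# 10 55
# 100 5050
# 1000 500500
```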
I am a current Oracle employee and blame a lot of the mistakes on the overseas development team in India. They are (not all, but enough to matter) terrible programmers, but hey, when you can throw 10 Indian programmers at a problem for the cost of one American... You can blame your bloated, mismanaged code base on their management over there. This is likely due to the attrition and a generally less talented and less autonomous engineering style.
There is a clear difference between code developed AND maintained in the US vs. code that was developed in India, or code developed in the USA and handed to Indian developers to maintain. Nothing against Indians, but I've been around the block and there seems to be a lesser quality of code from that part of the world, and companies justify it with cost savings.
I have not found this to be true at all. I have seen both US and Indian developers adding good code as well as ugly code to the Oracle Database product.
The actual damage was done much before I had joined Oracle. It appears that somewhere in the early 2000s, the Oracle codebase went from manageable to spaghetti monster. The changelog showed more changes from US developers than Indian developers at that time. Once the damage was done, all developers, whether from the US or India, needed to follow this painful process to fix bugs and add features.
A sentiment among members of a former team was that automated tests meant you didn't need to write understandable code - let the tests do the thinking for you.
This, and stuff like your story, are why I don't trust people who promote test-driven development as the best way to write clean APIs.
There should be integration tests along with some property-based tests and fuzz tests. They usually catch a lot of things. Invest in monitoring and alerts too.
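For readers who haven't seen property-based testing, here is a hand-rolled sketch using only the standard library. `dedupe` is a hypothetical function under test; dedicated libraries such as Hypothesis generate inputs and shrink failing cases far better than this loop does. Instead of asserting exact outputs for a few hand-picked inputs, the test checks properties that must hold for any input.

```python
import random
from collections.abc import Hashable
from typing import List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe(items: List[T]) -> List[T]:
    """Hypothetical function under test: drop repeats, keep first-occurrence order."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def check_dedupe_properties(trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        # Small value range so random lists actually contain duplicates.
        items = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        result = dedupe(items)
        # Property 1: no element appears twice in the output.
        assert len(result) == len(set(result)), items
        # Property 2: nothing is lost and nothing is invented.
        assert set(result) == set(items), items
        # Property 3: elements keep the order of their first occurrence.
        assert result == sorted(set(items), key=items.index), items


check_dedupe_properties()
```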
TDD is like relying on a debugger to solve your problem. Is a debugger a good tool? Yes, it is a great tool. But using it as an excuse to avoid understanding what happens under the hood is plain wrong.
The problem lies in an industry where software engineering is not given any value, but whiteboarding and solving puzzles is.
Software engineering is a craft honed over years of making mistakes and learning from them. If you want code ASAP, you kick out the experienced engineers, get code monkeys in, and get an MVP.
Quality is not clever algorithms, but clear, concise logic. Code should follow the logic, not the other way around.
And yet tests seem to have made this massive garbage heap actually work and enable a lump of spaghetti to continue to operate as a viable product. It doesn't mean you should write bad code, but it seems like if it can make even the most awful of code viable, then that's a pretty good system. The fact that modern medicine allows the most beat up and desperate to continue to live on isn't an indictment against medicine, it's a testament to it. Don't write bad code, sure. We can all agree to that. Don't prioritize testing? Why? To intentionally sabotage yourself so that you're forced to rewrite it from scratch or go out of business?
I’m sympathetic but this is too strong: what needs to die is dogma. TDD as a way of thinking about the API you’re writing is good but anything will become a problem if you see it as a holy cause rather than a tool which is good only to the extent that it delivers results.
I remember when I realized that TDD shouldn't carry as much weight in our development as it had gotten (back when it was high on the hype curve).
It was when we started using a messaging infrastructure that made everything much more reliable and robust, and through which we could start trusting the infrastructure much more (not 100%, of course).
It made me realize that the reason we had this excessively large number of tests (1800+) was the fragile nature of a request/response-based system, so we "had to make sure everything worked".
What I'm trying to get at here is that TDD had assumed the role of a large safety net for a problem we should have addressed in a different manner. After introducing messaging, we could replay messages that had failed. After this huge turning point, tests were only used for what they should have been used for all along: ensuring predictable change in core functionality.
(our code also became easier to understand and more modular, but that's for another time...)
What you allude to there is pretty bad TDD. It was never intended as a replacement for good design, rather as an aid to be clear about design and requirements without writing tons of specs up-front.
And I agree that there are lots of anti-patterns that have grown in tandem with TDD, like excessive mocking with dependency injection frameworks or testing renamed identity functions over and over just to get more coverage. However, I'd argue that is equally the fault of object-oriented programming.
Where I disagree is this: TDD and unit tests are still a very useful tool. Their big advantage is that you can isolate issues more quickly and precisely, IF you use them correctly.
For instance, if I have some kind of algorithm in a backend service operating on a data structure that has a bug, I do not want to spend time on the UI layer, network communication or database interactions to figure out what is going on. Testing at the right scope gets you exactly that.
The problem with TDD is that the methodology wants to cover every change, no matter how internal, with some sort of external test.
Some changes are simply not testable, period.
No, you cannot always write a test which initially fails and then passes when the change is made. When this is the case, you should understand why, and not try.
In some cases you can, yet still should not. If a whole module is rewritten such that the new version satisfies all of the public contracts with the rest of the code, then only those contracts need to be retested; we don't need new tests targeting internals.
It's precisely because the old version wasn't targeted by such tests in the first place that it can be rewritten without upheaval.
I think TDD is the best way to develop (yet). Obviously tests are code, and if you write crappy highly-coupled tests you will end up with only much more messy code.
This is a clear example of bad testing. The greatest advantage of TDD is in design, everything should be modular and easy to unit test, so you could:
- reproduce bug and verify your bugfix in matter of ms with proper unit test
- understand what code does
- change and refactor code whenever you want
You can tell from what is written that they are not following TDD.
Redesigning that codebase into a clean, easy-to-test design would require exponentially more effort and time than having done it step by step, but it would be worth it.
A unit test is the least useful kind of test. It requires your design to be "easy to unit test" instead of simple, and if you change something and have to rewrite the test you might miss some logic in both pieces.
Plus the tests never break on their own because they're modular, and each time you run a test that was obviously going to pass, you've wasted your time.
As long as you have code coverage, better to have lots of asserts and real-world integration tests.
Integration tests are usually much slower, and you are testing tons of things at the same time. Something breaks (like in that example) and you have no idea what went wrong or why.
If you unit test properly, you are unit testing the business logic, which you have to properly divide and write in a modular fashion. If you want to test a more complex scenario, just add initial conditions or behaviors. If you can't do that or don't know how to do that, then you don't know what your code is doing or your code is badly designed. And that may be the case we read about above.
Tests rarely break because they help you avoid breaking the code and its functionality, and they are so fast and efficient at making you realize it that you don't feel the pain.
I can't imagine any example where "easy to unit test" != simple
In my work with Python, making code easy to unit test usually makes things a bit harder. You want functional methods, not mega classes with 100s of class variables where each class method operates on some portion of those class variables. That makes it impossible to truly isolate functionality and test it. While coding, though, it is very easy to make a class, throw god knows what into the class variable space, and access those variables whenever. However, if we have static methods that don't rely on any class, only on the arguments provided, and don't modify any class state, the tests are great. We can change/refactor our models with confidence, knowing the results are all the same.
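A small sketch of the contrast being described. `ReportBuilder` and `Pricing` are hypothetical names: the first class keeps state in instance attributes that its methods read and mutate, so a test has to reproduce the right sequence of calls; the static method depends only on its arguments, so a test needs no setup.

```python
from typing import List


class ReportBuilder:
    """Hard to test in isolation: results depend on the order of earlier calls."""

    def __init__(self) -> None:
        self.rows: List[float] = []
        self.discount = 0.0
        self.total = 0.0

    def set_discount(self, discount: float) -> None:
        self.discount = discount

    def add_row(self, amount: float) -> None:
        # Uses whatever `self.discount` happens to be right now,
        # not just the arguments passed in.
        self.rows.append(amount)
        self.total += amount * (1 - self.discount)


class Pricing:
    """Easy to test: output depends only on the arguments, no shared state."""

    @staticmethod
    def discounted_total(amounts: List[float], discount: float) -> float:
        if not 0.0 <= discount <= 1.0:
            raise ValueError("discount must be between 0 and 1")
        return sum(amounts) * (1 - discount)


# Same rows, same discount, different totals: only the call order differs.
a = ReportBuilder()
a.add_row(10.0)
a.set_discount(0.5)
a.add_row(20.0)

b = ReportBuilder()
b.set_discount(0.5)
b.add_row(10.0)
b.add_row(20.0)

print(a.total, b.total)  # 20.0 15.0
print(Pricing.discounted_total([10.0, 20.0], 0.5))  # 15.0
```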
In my opinion, the only thing that is valuable about unit tests is more appropriately captured in form of function, class and module contracts (as in "design by contract"). Unfortunately very few languages are adopting DbC.
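Python has no built-in design-by-contract support, so one rough approximation (an assumption here, not a DbC library) is to state preconditions and postconditions as `assert` statements at the function boundary. The hypothetical `integer_sqrt` below carries its contract in the code itself. Note that `assert` statements are skipped when Python runs with `-O`, so this is a development-time check only.

```python
def integer_sqrt(n: int) -> int:
    """Largest integer r with r*r <= n, found by binary search."""
    # Precondition: the caller must pass a non-negative integer.
    assert isinstance(n, int) and n >= 0, "precondition: n must be a non-negative int"

    lo, hi = 0, n + 1  # loop invariant: lo*lo <= n < hi*hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    r = lo

    # Postcondition: r is exactly the floor of the square root of n.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


print(integer_sqrt(15))  # 3
print(integer_sqrt(16))  # 4
```

A violated contract fails right at the function that broke it, which is roughly the localization benefit unit tests are often credited with.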
Functional tests now, that's another matter. But a lot of TDD dogmatism is centered on unit tests specifically. And that results in a lot of code being written that doesn't actually contribute to the product, and that is there solely that you can chop up the product into tiny bits and unit test them separately. Then on the test side you have tons of mocks etc. I've seen several codebases where test code far exceeded the actual product code in complexity - and that's not a healthy state of affairs.
In more recent times I've seen some growth in interest around contract testing. Unit tests are immensely more useful when paired with contract tests, but unfortunately without them they tend to be more of a hassle. At its essence integrations are a form of a contract, but those suffer their own problems. In rspec you have 'instance_double' which is a form of a contract test as well, but not really sufficient for proper testing IMO. The current state from what I've seen is a little lackluster, but I wouldn't be surprised to see a growth in contract testing libraries for a variety of languages popping up.
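For Python readers, the closest standard-library analogue of rspec's `instance_double` that I know of is `unittest.mock.create_autospec`, which builds a mock restricted to a real class's attributes and method signatures. This is a sketch with hypothetical `PaymentGateway` and `checkout` names. Like `instance_double`, it only checks the shape of the interface, not the collaborator's actual behavior, so it is not a full contract test.

```python
from unittest import mock


class PaymentGateway:
    """Hypothetical real collaborator."""

    def charge(self, account_id: str, cents: int) -> bool:
        raise NotImplementedError("would talk to a real payment service")


def checkout(gateway: PaymentGateway, account_id: str, cents: int) -> str:
    """Code under test: depends only on the gateway's interface."""
    return "paid" if gateway.charge(account_id, cents) else "declined"


def test_checkout_with_verified_double() -> None:
    # The mock mirrors PaymentGateway's attributes and method signatures.
    gateway = mock.create_autospec(PaymentGateway, instance=True)
    gateway.charge.return_value = True
    assert checkout(gateway, "acct-1", 500) == "paid"
    gateway.charge.assert_called_once_with("acct-1", 500)

    # A call that doesn't match the real signature fails instead of passing.
    try:
        gateway.charge("acct-1")  # missing `cents`
    except TypeError:
        pass
    else:
        raise AssertionError("autospec should reject a call with a missing argument")

    # Methods that don't exist on the real class are rejected too.
    try:
        gateway.refund("acct-1", 500)
    except AttributeError:
        pass
    else:
        raise AssertionError("autospec should reject unknown methods")


test_checkout_with_verified_double()
```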
My tests are in the test folder. They are actually superfluous since integration tests test for the same thing.
I cannot break up the program in a way that would unit test a smaller piece of it in more detail. The only tests I can add would be to test the command line driver.
For a single person and their one-person code base, you can certainly get away without unit tests.
This is especially if your "integration tests" are testing the same component, and not actually integrating with numerous other components being developed by different teams - or, if the system is so small it can run on a single workstation.
Working in teams on larger systems, the situation is different. Part of the point of unit tests is the "shift left" which allows problems to be discovered early, ideally before code leaves a developer's machine. It reduces the time until bugs are discovered significantly, and reduces the impact of one dev's bugs on other devs on the team.
TDD is yet another in a long line of "methodologies" that don't work. Tests are not a bad thing of course. The problem comes when you turn testing into an ideology and try to use it as a magic fix for all your problems. Same goes for "agile," etc.
Programming is a craft. Good programmers write good code. Bad programmers write bad code. No methodology will make bad programmers write good code, but bureaucratic bullshit can and will prevent good programmers from working at their best. The only way to improve the output of a bad programmer is to mentor them and let them gain experience.
The reality of working in teams at most companies is that there are going to be mediocre programmers, and even bad programmers, on the team. Many of the practices you refer to as bureaucratic bullshit are actually designed to protect programmers from the mistakes of other programmers.
Of course, this does require that the process itself has been set up with care, thought, and understanding of what's being achieved.
I'm probably not the best to speak on the topic as I don't use TDD (and never have), but I think the idea is good if maybe a bit unorthodox: leveraging tests to define examples of inputs/outputs and setting "guards" to make sure the result of your code is as you expected via the tests.
I'm not keen on the "cult" of it, but if expectations of what the output should look like are available from the onset, it would appear to be of some benefit, at least.
I'm confused by your comment. Your premise is that TDD should die, and your support is comparing it to a "great tool". Should TDD really die, or should people just stop treating things as a silver bullet? I personally love TDD, it helps me reason about my interfaces and reduces some of the more cumbersome parts of development. I don't expect everyone to use TDD and I don't use it all the time. Similarly I'd never tell someone debuggers should die and they should never use a debugger if that's something that would help them do their job.
Unit tests are a side effect of TDD, they don't have to be the goal. I'd find value out of TDD even if I deleted all of my tests after. It sounds like your problems are around unit tests, and that is neither something required to TDD nor is it something limited to just TDD.
The problem with integration tests is they are slow and grow exponentially. If they aren't growing exponentially then there's probably large chunks of untested code. Unit tests suffer their own problems, like you said they can be useless because of a reliance on mocking, they can also be brittle and break everywhere with small changes.
Ultimately any good suite of tests needs some of both. Unit tests to avoid exponential branching of your integration tests, and integration tests to catch errors related to how your units of code interact. I've experienced plenty of bad test suites, many of them because of poorly written unit tests, but it's often the poorly written integration tests that cause problems as well. As with most things, it's all about a healthy balance.
No, like in some programs when I figure out how to do it correctly the unit tests are either complete tautologies or integration tests.
Then there are the "write once, never fail ever" tests. Okay, so the test made sense when I wrote the code. I will never touch that part ever again because it works perfectly. Why do I keep running them every time?
If the unit tests are tautologies then they aren't testing the right things, and if they are integration tests then they aren't actually unit tests.
I personally run my unit tests every time to confirm my assumptions that the unit of code under test hasn't changed. I also assume all code I write will inevitably be changed in the future because business requirements change and there's always room for improvement. Actually can't think of a single piece of code I've written (apart from code I've thrown out) that didn't eventually need to be rewritten. The benefit of running unit tests is less than the benefit of running integration tests, but the cost of running them is also significantly less. Current project I'm working on has 10x as many unit tests as integration tests and they run 100x faster.
My workflow is usually to run my unit tests for the code I'm working on constantly, and when I think things are working, run the entire test suite to verify everything works well together. That's my workflow whether or not I'm doing TDD.
The code that determines truths about the data never had to be rewritten.
Like, are the two points neighbors? I mean, I'm not going to write a version of this function for a spherical board in the future. Nobody plays on a spherical board.
It's also a really boring unit test. Yes, (1,1) and (1,2) are neighbors. Do I really need to test this function until the end of time?
That's exactly the type of code that should be unit tested. The unit tests are trivially easy to write, and a very naive solution is easy to code up. The tests should add negligible overhead to your overall test suite runtime. Then, when it comes time to optimize the code because it's becoming a bottleneck, you can be confident that the more obscure solution is correct.
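A sketch of that workflow, assuming a square board with orthogonal adjacency (the comment doesn't name the game, so a 19x19 board is an assumption here). `are_neighbors_naive` is the obviously-correct reference and `are_neighbors_fast` stands in for a later optimization; the exhaustive comparison over the board is what lets you swap one for the other with confidence.

```python
from itertools import product
from typing import Tuple

Point = Tuple[int, int]


def are_neighbors_naive(a: Point, b: Point) -> bool:
    """Obviously-correct reference: b is one of a's four orthogonal neighbors."""
    x, y = a
    return b in {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


def are_neighbors_fast(a: Point, b: Point) -> bool:
    """Optimized candidate: orthogonal neighbors are at Manhattan distance 1."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def test_neighbors(size: int = 19) -> None:
    # A few cheap, boring unit tests that pin down the intended behavior...
    assert are_neighbors_fast((1, 1), (1, 2))
    assert not are_neighbors_fast((1, 1), (2, 2))  # diagonals are not neighbors
    assert not are_neighbors_fast((1, 1), (1, 1))  # a point is not its own neighbor
    # ...plus an exhaustive check that the fast version agrees with the naive one.
    points = list(product(range(size), repeat=2))
    for a in points:
        for b in points:
            assert are_neighbors_fast(a, b) == are_neighbors_naive(a, b), (a, b)


test_neighbors()
```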
TDD should only drive the public interface of your "module"; if you're testing your internals, you're doing it wrong. It will hinder refactoring rather than help.
TDD doesn't think for you, it merely validates your own existing understanding/mental model and forces you to come up with it upfront. This is hardly a thing to be mistrustful about, unless you work with idiots.
You are right about that, but having code that passes a given test suite doesn't say anything about its secondary qualities, such as whether it can be understood. In theory, a failing test could improve your understanding of the situation, allowing you to refactor your initial pass at a solution, but I would bet that on this particular code base, the comprehension-raising doesn't go far enough, in most cases, for this to be feasible.
That seems orthogonal to testing, though. Implementation code can be hard to understand with or without a test suite; at least with tests, as you point out, you may be able to understand the behaviour at a higher level of abstraction.
Rich Hickey called it guard rail driven programming. You'll never drive where you want to go if you just get on the highway and bump off the guard rails.
Except that's a really bad analogy. It's more like you set up guard rails, and every time your vehicle hits a guard rail you change the algorithm it uses for navigation until it can do a whole run without hitting a guard rail.
I've experienced myself how the code quality of proper TDD code can be amazing. However it needs someone to still actually care about what they're doing. So it doesn't help with idiots.
It may not be a good analogy for TDD as properly practiced, but it seems very fitting for the situation described at the top of this thread, and that is far from a unique case.
I don't think it's a generous analogy, but it's poking fun at being test DRIVEN, rather than driver driven. I think he'd agree with you that it's the thinking and navigating and "actually caring about what they're doing" that matters. Tests are a tool to aid that. Tests don't sit in the driver's seat.
Yeah. To me "test driven" really means that I write code under the constraints that it has to make writing my tests sensible and easy. This turns out to improve design in a large number of cases. There are lots of other constraints you can use that tend to improve design as well (method size, parameter list size, number of object attributes, etc are other well known ones). But "test driven" is a nice catch phrase.
Spot on: "Analogies: Analogies are good tools for explaining a concept to someone for the first time. But because analogies are imperfect they are the worst way to persuade. All discussions that involve analogies devolve into arguments about the quality of the analogy, not the underlying situation." - Scott Adams, creator of Dilbert (I know he's quite controversial since the election, but he's on point here) in https://blog.dilbert.com/2016/12/21/how-to-be-unpersuasive/
Scott Adams has been quite controversial long before the elections, ever since he got busted as a sock puppet "plannedchaos," posing as his own biggest fan, praising himself as a certified genius, and calling people who disagreed with him idiots, etc. Not to mention his mid to late '90s blog posts about women.
>Dilbert creator Scott Adams came to our attention last month for the first time since the mid to late '90s when a blog post surfaced where he said, among other things, that women are "treated differently by society for exactly the same reason that children and the mentally handicapped are treated differently. It's just easier this way for everyone."
>Now, he's managed to provoke yet another internet maelstrom of derision by popping up on message boards to harangue his critics and defend himself. That's not news in and of itself, but what really makes it special is how he's doing it: by leaving comments on Metafilter and Reddit under the pseudonym PlannedChaos where he speaks about himself in the third person and attacks his critics while pretending that he is not Scott Adams, but rather just a big, big fan of the cartoonist.
Yup, in static vs dynamic conversations, I invariably see someone dismiss the value of compiler enforcement by claiming that you should be writing unit tests to cover these cases anyway. Every time I say a silent prayer that I never end up working with the person I'm talking to haha.
I don't see how this is an argument against TDD. Apparently a whole slew of things went wrong in this project but that doesn't imply that testing is the cause of them.
TDD only works in conjunction with thorough peer reviews. Case in point: at my place of work, code and tests written by an intern can go through literally dozens of iterations before the check-in gets authorized, and even the senior engineers are not exempt from peer reviews (recent interns are especially eager to volunteer).
Now pick almost any other category-leading software product and you will find a similar situation.
The category-leading product is probably from one of the earliest companies in the field, if not the first. They have the oldest and cruftiest code - and the manpower to somehow keep it working. It is definitely not the fastest and definitely not the most stable. But they do have the resources to make sure it supports all the third party integrations and features important for the big customers.
I have encountered exactly the same situation in several different fields and categories.
At a time when I was a complete open source fanatic in the early 2000s, it suddenly made me realize that Microsoft actually had much better quality software than most big proprietary software vendors.
Sweet-merciful Jesus. You just made me experience Vietnam-style flashbacks. I worked at Oracle for the 12.1 and 12.2 releases (not working there anymore). You just described my day to day tenure at Oracle. Thank god that's done.
You described the early part of my career in software to a T.
I worked for a mini-computer company in the 1980's that ported Oracle (I'm thinking the version stamp was 22.1 when I was there from 1986-1990). It was one GIANT mess of standard "C" with makefiles that were in some ways larger and more complex than some of the actual kernel code it was building!
On one hand, sure. They're still able to ship a working product despite having an abysmal code base. That's an excellent end result that must not be underestimated. Perhaps the problem that code base solves is really that difficult and there's no other way.
But on the other hand, over-reliance on tests is one of the reasons they ended up in this situation in the first place. It's like the car safety engineer's joke - How do you make cars safer? Install a knife in the middle of the steering wheel pointed at the driver.
When we're too sure that some safety feature will save us, we forget to be careful.
As I read this, I am vacationing in Hawaii for the first time. I can look out my window right now and see the island of Lanai. And that's what I'm doing as I read your post right now.
Reading a few sentences, looking out at Lanai. Reading a few more sentences, and looking back at Lanai...
I'm going to guess a lot of these are integration tests, not unit tests (simply going off execution time).
At that point, for DB testing, I doubt it matters what language the tests are written in; it's going to be mostly about setting up and tearing down the environment over and over.
Really surprising considering that Oracle is the standard for serious enterprise databases.
Not really surprising when you consider Oracle's other bug ridden offerings (probably not as thoroughly tested).
Makes me fear for Oracle 18c.
Not surprising at all. Their code might not be performant, maintainable, or good-looking by developer standards, but as OP said, they have a gazillion test cases that make sure Oracle DB runs and doesn't produce weird outcomes.
Totally unsurprising if you've ever worked with Oracle. The layers upon layers of legacy cruft are plainly visible even in simple things like the GUI installers.
I remember an Oracle Forms-based product I helped develop: installing it on end users' PCs required installing several Oracle products, which meant 14 or 15 floppy disks to be used in the right order.
I mean, PostgreSQL can trace its roots back to 1982's INGRES ... and UNIX started in 1969.
There are quite a few very old projects that don't have the same level of cruft as Oracle; it epitomises a Sales Division driven culture.
How many of those switches (that now need to be supported and tested) are because some functionality was promised to a large contract, and so it just had to be done? I would wager a good number.
Tests that run for 30 hours is an indication that nobody bothered writing unittests. If you need to run all tests after changing X, it means X is NOT tested. Instead you need to rely on integration tests catching X's behavior.
I beg to differ. Having to run the full test suite to catch significant errors is an indication that the software design isn't (very) modular, but it has nothing to do with unit tests. Unit tests do not replace service/integration/end to end tests, they only complement them - see the "test pyramid".
I think it's important to point this out, because one of the biggest mistakes I'm seeing developers do these days is relying too much on unit tests (especially on "behavior" tests using mocks) and not trying to catch problems at a higher level using higher level tests. Then the code gets deployed and - surprise surprise - all kinds of unforeseen errors come out.
Unit tests are useful, but they are, by definition, very limited in scope.
While I see some value in the red-green unit testing approach, I've found the drawbacks to often eclipse the advantages, especially under real-world time constraints.
In my day to day programming, when I neglect writing tests, the regret is always about those that are on the side of integration testing. I'm okay with not testing individual functions or individual modules even. But the stuff that 'integrates' various modules/concerns is almost always worth writing tests for.
In my experience it's far easier to introduce testing by focusing on unit testing complicated, stateless business logic. The setup is less complex, the feedback cycle is quick, and the value is apparent ("oh gosh, now I understand all these edge cases and can change this complicated code with more confidence"). I think it also leads to better code at the class/module/function level.
In my experience once a test (of any kind) saves a developer from a regression, they become far more amenable to writing more tests.
That said I think starting with integration tests might be a good area of growth for me.
Writing functional tests is easy when you have a clear spec. If you do, tests are basically the expression of that spec in code. Conversely, if they're hard to write, that means that your spec is underspecified or otherwise deficient (and then, as a developer, ideally, you go bug the product manager to fix that).
Integration tests are pretty important in huge codebases with complex interactions. Unit tests are of course useful to shorten the dev cycle, but you need to design your software to be amenable to unit testing. Bolting them onto a legacy codebase can be really hard.
Exactly. I have a product that spans multiple devices and multiple architectures: micro controllers, SDKs and drivers running on PCs, third-party devices with firmware on them and FPGA code. They all evolve at their own pace, and there’s a big explosion of possible combinations in the field.
We ended up emulating the hardware, running all the software on the emulated hardware, and deploying integration tests to a thousand nodes on AWS for the few minutes it takes to test each combination. Tests finish quickly, and it has been a while since we shipped something with a bug in it.
But there’s a catch: we have to unit test the test infrastructure against real hardware - I believe it’d be called test validation. Thus all the individual emulators and cosimulation setups have to have equivalent physical test benches, automated so that no human interaction is needed to compare emulated output to real output. In more than a few cases, we need cycle-accurate behavior.
The test harness unit (validation) test has to, for example, spin up a logic analyzer and a few MSO oscilloscopes, reinitialize the test bench (e.g. load the 3rd-party firmware we test against), then get it all synchronized and run the validation scenarios.
Oh, and the firmware of the instrumentation is also a part of the setup: we found bugs in T&M equipment firmware/software that would break our tests. We regression test that stuff, too.
All in all, a full test suite, run sequentially, takes about 40,000 hours, and that’s when being very careful about orthogonalizing the tests so that there’s a good balance between integration aspects and unit aspects.
I totally dig why Oracle has to do something like this, but on the other hand, we have a code base that’s not very brittle, but the integration aspects make it mostly impossible to reason about what could possibly break - so we either test, or get the support people overwhelmed with angry calls. Had we had brittle code on top of it, we’d have been doomed.
If you're only writing tests at the unit level, you might as well not bother. And it's always good to run all tests after any change, it's entirely too easy for the developer to have an incomplete understanding of the implications of a change. Or for another developer to misuse other functionality, deliberately or otherwise.
It could also be that the flags are so tangled together that a change to one part of the system can break many other parts that are completely unrelated. Sure, you can run a unit test for X, but what about Y? Retesting everything is all you can do when everything is so tangled that you can't predict what a change could affect.
> Tests that run for 30 hours is an indication that nobody bothered writing unittests.
Yes, they were not unit tests. There was no culture of unit tests in the Oracle Database development team. A few people called it "unit tests" but they either said it loosely or they were mistaken.
Unit tests would not have been effective because every area of the code was deeply entangled with everything else. They did have the concept of layered code (like a virtual operating system layer at the bottom, a memory management layer on top of that, a querying engine on top of that, and so on), but over the years people violated layers and wrote code that called an upper layer from a lower layer, leading to a big spaghetti mess. A change in one module could cause a very unrelated module to fail in mysterious ways.
Every test was almost always an integration test. Every test case restarted the database, connected to the database, created tables in it, inserted test data into it, ran queries, and compared the results to ensure that the observed results matched the expected results. They tried to exercise every function and every branch condition in this manner with different test cases. The code coverage was remarkable, though. Some areas of the code had more than 95% test coverage, while other areas had 80% or so. But the work was not fun.
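Layering rules like these only hold if something enforces them. A minimal sketch of an automated check, assuming you can extract caller/callee pairs from a call graph: the bottom-to-top layer order (virtual OS layer, memory management, query engine) follows the comment, while `upward_calls`, the module names and the call list are hypothetical.

```python
# Bottom-to-top layer order as described in the comment (names are illustrative).
LAYERS = ["vos", "memory", "query"]
RANK = {name: i for i, name in enumerate(LAYERS)}

def upward_calls(calls, module_layer):
    """Return (caller, callee) pairs where a lower layer calls a higher one.

    calls: iterable of (caller_module, callee_module) pairs
    module_layer: dict mapping module name -> layer name
    """
    violations = []
    for caller, callee in calls:
        if RANK[module_layer[caller]] < RANK[module_layer[callee]]:
            violations.append((caller, callee))
    return violations

# Hypothetical modules and call edges, e.g. extracted from a call-graph tool.
module_layer = {
    "vos_io": "vos",
    "heap_alloc": "memory",
    "planner": "query",
}
calls = [
    ("planner", "heap_alloc"),   # query -> memory: allowed (downward)
    ("heap_alloc", "vos_io"),    # memory -> vos: allowed (downward)
    ("heap_alloc", "planner"),   # memory -> query: violation (upward)
]
print(upward_calls(calls, module_layer))  # [('heap_alloc', 'planner')]
```

Run as part of the build, a check like this fails the moment a lower layer starts reaching upward, instead of years later when the spaghetti is already in place.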
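The thread doesn't show the actual test syntax, so the following is only a rough Python analogue of the per-test flow described (fresh database, create tables, insert data, run a query, compare with the expected results), using the standard library's `sqlite3` in-memory database as a stand-in. `run_case`, the `emp` table and its data are hypothetical.

```python
import sqlite3

def run_case(setup_sql, query, expected_rows):
    """One integration-style test case: fresh database, schema, data,
    query, and comparison against the expected results."""
    # A brand-new in-memory database stands in for "restart the database".
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)  # create tables and insert test data
        actual = conn.execute(query).fetchall()
        return actual == expected_rows, actual
    finally:
        conn.close()

SETUP = """
CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
INSERT INTO emp (dept, salary) VALUES ('eng', 100), ('eng', 120), ('ops', 90);
"""

ok, rows = run_case(
    SETUP,
    "SELECT dept, SUM(salary) FROM emp GROUP BY dept ORDER BY dept",
    [("eng", 220), ("ops", 90)],
)
print(ok, rows)  # True [('eng', 220), ('ops', 90)]
```

Even in this toy form, most of the work per case is setup and teardown, which hints at why millions of such cases add up to a day of machine time.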
I do. It's about state coverage: every Boolean flag doubles the possible state of that bit of code: now you need to run everything twice to retain the coverage.
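The doubling is easy to see by enumerating the assignments. A small sketch using `itertools.product`; the flag names are made up for illustration.

```python
from itertools import product

def flag_combinations(flag_names):
    """Yield every on/off assignment for the given Boolean flags."""
    for values in product((False, True), repeat=len(flag_names)):
        yield dict(zip(flag_names, values))

flags = ["use_cache", "parallel", "compress"]  # hypothetical flag names
combos = list(flag_combinations(flags))
print(len(combos))  # 8, i.e. 2 ** 3

# Each extra flag doubles the count.
for n in (3, 10, 20):
    print(f"{n} flags -> {2 ** n} states")
# 3 flags -> 8 states, 10 flags -> 1024 states, 20 flags -> 1048576 states
```

With the 20 interacting flags mentioned at the top of the thread, full state coverage would already mean over a million runs of the affected code path.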
FWIW, I know people who work on SQL processing (big data Hive/Spark, not RDBMS), and a recurrent issue is that an optimisation which benefits most people turns out to be pathologically bad for some queries for some users. Usually it's those with multiple tables of 8192 columns and some join that takes 4h at the best of times and now takes 6h, so the overnight reports aren't ready in time. And locks held in the process now block some other app that really matters to the business's existence.
These are troublesome because they still "work" in the pure "same outputs as before" sense; it's just that the side effects can be so disruptive.
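A test that only compares outputs can't catch this class of regression. One possible mitigation, sketched here under the assumption that you have a recorded baseline runtime per query, is to check elapsed time as well as results; `check_query`, the 1.5x tolerance and the toy queries are all hypothetical.

```python
import time

def check_query(run_query, expected, baseline_seconds, tolerance=1.5):
    """Hypothetical regression check: the result must match AND the runtime
    must stay within `tolerance` times a recorded baseline."""
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    problems = []
    if result != expected:
        problems.append("wrong result")
    if elapsed > baseline_seconds * tolerance:
        problems.append(f"slow: {elapsed:.3f}s vs baseline {baseline_seconds:.3f}s")
    return problems

# Toy "queries": both return the correct answer, only the timing differs.
def fast_query():
    return sum(range(1000))

def slow_query():
    time.sleep(0.2)  # stands in for a plan that got pathologically worse
    return sum(range(1000))

print(check_query(fast_query, 499500, baseline_seconds=0.1))  # []
print(check_query(slow_query, 499500, baseline_seconds=0.1))  # reports a 'slow: ...' problem
```

Wall-clock assertions are noisy on shared machines, so in practice the tolerance has to be generous or the check has to run on dedicated hardware.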
Writing tests for error handling can be a pain. You write your product code to be as robust as possible but it isn't always clear how to trigger the error conditions you can detect. This is especially true with integration tests.
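At the unit level, one common way around this is fault injection: pass in (or patch) the dependency that would fail and make it raise on demand. A minimal sketch with the standard library's `unittest.mock`; `read_config` and its `opener` parameter are hypothetical product code written to accept an injectable dependency.

```python
import errno
from unittest import mock

def read_config(path, opener=open):
    """Hypothetical product code: fall back to defaults if the file is unreadable."""
    try:
        with opener(path) as f:
            return f.read()
    except OSError:
        return "defaults"

def test_unreadable_config_falls_back():
    # Force the error path without needing a real broken disk or bad permissions.
    failing_open = mock.Mock(side_effect=OSError(errno.EIO, "I/O error"))
    assert read_config("/etc/app.conf", opener=failing_open) == "defaults"
    failing_open.assert_called_once_with("/etc/app.conf")

def test_readable_config_is_returned():
    # mock_open builds a fake file object usable in a `with` block.
    fake_open = mock.mock_open(read_data="x=1")
    assert read_config("/etc/app.conf", opener=fake_open) == "x=1"
```

This doesn't solve the integration-test case, where the failure has to be provoked inside a running system, but it does make the error branches of individual components reachable.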
Code like this makes me think of the famous line from the film version of Hellraiser: "I have such sights to show you..."
Contrast PostgreSQL or... uhh... virtually any other database. Oracle's mess is clearly a result of bad management, not a reflection of the intrinsic difficulty of the problem domain.
Nonsense. The problem domain you dismiss is hideously complicated. Oracle DB and PostgreSQL are entirely different classes of products. No airline runs its reservation system on PostgreSQL. That's not a coincidence.
It's not a coincidence, no, because Oracle can provide support guarantees in a way a Postgres contractor can not.
This is also a factor for independent developers (who build airline reservation systems) who need to choose a RDBMS for their product - they'll choose oracle, because ... Oracle can provide support guarantees in a way a Postgres contractor can not.
Which makes Oracle not a different class of product than Postgres, but a different class of support for the product. (which could be considered part of the product, so ... maybe you're right)
No, they use Amadeus.
Amadeus is a wonderful mainframe program that perfectly and with 100% accuracy faithfully models how you'd book a train ticket in France in the fifties.
So... is this good or bad? A Hello World in Go is on the order of 2 MB. That doesn't say anything about code bloat, it just says that Go prefers static over dynamic linking.
Really interesting post. I have often worked in this kind of mess of code, albeit in codebases orders of magnitude smaller with only a few developers. I would never have imagined a project like Oracle to be like this. Since there seem to be a number of Oracle employees around, I would be very interested to know if there have been any proposals to start cleaning up this shit. The man-hours wasted by this workflow are so huge that I expect even a small percentage of Oracle's developers could be assigned to rewrite it and catch up with the rest of the product in a reasonable time frame, so that in a few years it could be rewritten and stop wasting developers' time.
So I think the interesting question is: when you run into a large, bloated, unwieldy POS codebase, how do you fix it? You obviously need some buy-in from management, but you also need a plan that doesn't start with "stop what we are doing and get everyone to spend all of their time rewriting what we have" or "hire a ton of new devs."
I have seen smaller versions of what the OP describes. My plan was that every new piece of code that was checked in had to meet certain guidelines for clarity--like can the dev sitting next to you understand it with little to no introduction--and particularly knotty pieces of existing code deserved a bug report just to rewrite and untangle the knots.
In the end, whatever your plan, I think what you need is a cultural change, and cultures are notoriously difficult to change. Any cultural change is going to have to start high up in the organization with the realization that the codebase is an unsustainable POS.
I had a very similar experience at another enterprise storage company with a code base of ~6M LOC of C/C++ and a gazillion test cases. Originally, it took roughly an hour just to build the system, which did a bunch of compile-time checks, linting, etc. Then, if everything went well, it went to code review, then to a set of hourly and nightly integration checks before getting merged to the main branch. It would take another cool 3-4 months of QA regression cycle before it got to the final build.
This sounds a lot like Walmart's codebase for their POS registers. Except, the kicker is that there are zero unit tests, zero test frameworks, etc. You just have to run it through the shitty IBM debugger and hope that you don't step on anyone else's work. Up until 2016, they didn't even have a place to store documentation. Each register has ~1000 flags, some of which can be bit switched further into testing hell.
The structure must have high coupling and low cohesion. It should have been designed with high cohesion. More components/modules should be designed so they can be further divided into smaller modules/components as the requirements grow. Fixing bugs in smaller components is much easier than fixing them in the overall project.
Not sure whether Oracle already follows this or not, but it is necessary for scalable projects.
You can have automated test generation. I'd imagine with a database system you'd have a big list of possible ingredients for a query and then go through a series of nested for loops to concatenate them together and make sure each permutation works separately. That can easily make for thousands of tests with only a few lines of code.
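A minimal sketch of that idea, with the standard library's `sqlite3` standing in for the database under test. Each query "ingredient" is paired with equivalent Python logic so every generated query has an independently computed expected result; the table, rows and fragments are made up for illustration.

```python
import itertools
import sqlite3

ROWS = [(1, "a", 10), (2, "b", 20), (3, "a", 30), (4, "c", None)]

# Query ingredients: each SQL fragment is paired with equivalent Python logic
# that acts as an independent reference (test oracle).
PREDICATES = [
    ("1 = 1", lambda r: True),
    ("tag = 'a'", lambda r: r[1] == "a"),
    ("val > 15", lambda r: r[2] is not None and r[2] > 15),  # NULL > 15 is not true in SQL
]
ORDERINGS = [
    ("id ASC", lambda rows: sorted(rows, key=lambda r: r[0])),
    ("id DESC", lambda rows: sorted(rows, key=lambda r: r[0], reverse=True)),
]
PROJECTIONS = [
    ("id", lambda r: (r[0],)),
    ("id, tag", lambda r: (r[0], r[1])),
]

def generated_cases():
    """The 'nested for loops', expressed with itertools.product."""
    for (where, keep), (order, sort), (cols, project) in itertools.product(
        PREDICATES, ORDERINGS, PROJECTIONS
    ):
        sql = f"SELECT {cols} FROM t WHERE {where} ORDER BY {order}"
        expected = [project(r) for r in sort([r for r in ROWS if keep(r)])]
        yield sql, expected

def run_all():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, tag TEXT, val INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    count, failures = 0, []
    for sql, expected in generated_cases():
        count += 1
        if conn.execute(sql).fetchall() != expected:
            failures.append(sql)
    conn.close()
    return count, failures

print(run_all())  # (12, []): 3 predicates x 2 orderings x 2 projections
```

Every extra fragment list multiplies the number of cases, which is how a few lines of generator code turn into thousands of tests. The hard part is the oracle: each fragment needs a trustworthy way to compute its expected result.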
Along with what eigenspace said, check out SQLite's Testing page: https://www.sqlite.org/testing.html (the project has 711 times as much test code and scripts, as code). You can go really far... and still miss things on occasion.
The 25 million lines of code is only the source code of Oracle Database written in C.
The test cases are written in a domain-specific language named OraTst, which is developed and maintained only within Oracle. OraTst is not available outside Oracle. The OraTst DSL looks like a mixture of special syntax to restart the database, compare results of queries, change data configuration, etc., and embedded SQL queries to populate the database and retrieve results.
I don't know how many more millions of lines of code the tests add. Assuming every test takes about 25 lines of code on average (every test used to take up about half of my screen to a full screen), we can estimate that the tests themselves add close to another 25 million to 50 million lines of code.
It is close to 25 million lines of C code.
What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.
Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is riddled with mysterious macros that one cannot decipher without picking up a notebook and expanding the relevant parts of the macros by hand. It can take one to two days to really understand what a macro does.
Sometimes one needs to understand the values and the effects of 20 different flags to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.
The only reason why this product is still surviving and still works is due to literally millions of tests!
Here is what the life of an Oracle Database developer is like:
- Start working on a new bug.
- Spend two weeks trying to understand the 20 different flags that interact in mysterious ways to cause this bug.
- Add one more flag to handle the new special scenario. Add a few more lines of code that check this flag, work around the problematic situation, and avoid the bug.
- Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code, build a new Oracle DB, and run the millions of tests in a distributed fashion.
- Go home. Come in the next day and work on something else. The tests can take 20 to 30 hours to complete.
- Go home. Come in the next day and check your farm test results. On a good day, there would be about 100 failing tests. On a bad day, there would be about 1000 failing tests. Pick some of these tests randomly and try to understand what went wrong with your assumptions. Maybe there are some 10 more flags to consider to truly understand the nature of the bug.
- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.
- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.
- Finally one fine day you would succeed with 0 tests failing.
- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching this new piece of code never ends up breaking your fix.
- Submit the work for one final round of testing. Then submit it for review. The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.
- After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.
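The post doesn't say how the farm splits the millions of tests across its 100 to 200 servers. As a generic illustration only, one simple scheme is to assign each test to a server by a stable hash of its name, so every machine can compute its own share without coordination; `shard_for`, `assign_shards` and the test names are hypothetical.

```python
import hashlib

def shard_for(test_name, num_shards):
    """Stable shard assignment: a given test always maps to the same server.
    (Python's built-in hash() of str is randomized per process, so it is not
    suitable for coordinating many machines.)"""
    digest = hashlib.sha256(test_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def assign_shards(all_tests, num_shards):
    """Split the test list into num_shards disjoint buckets."""
    shards = [[] for _ in range(num_shards)]
    for name in all_tests:
        shards[shard_for(name, num_shards)].append(name)
    return shards

# Hypothetical test names; a real farm would list millions of them.
tests = [f"test_{i:05d}" for i in range(10_000)]
shards = assign_shards(tests, num_shards=150)  # e.g. one shard per server

# Every test lands in exactly one shard.
assert sum(len(s) for s in shards) == len(tests)
print(min(map(len, shards)), max(map(len, shards)))  # smallest and largest bucket; average is about 67
```

Hashing balances the number of tests per server, not their runtime; a farm whose tests vary widely in duration would more likely balance on recorded durations.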
The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).
The fact that this product even works is nothing short of a miracle!
I don't work for Oracle anymore. Will never work for Oracle again!