Ask HN: What's the largest amount of bad code you have ever seen work?
428 points by nobody271 on Nov 13, 2018 | 578 comments
I think I've broken my own record with this one: ~2500 lines of incoherent JavaScript/C#. Works, though.

A million+ lines of Java code for a product we (at a previous company) had. It started out nice, with design patterns and so on, but after 5 years with ~200 different coders adding and changing things and no time to refactor, it became bad enough to trigger a phase-out of the product and its replacement by another product. I actually moved roles to figure out how not to make the same mistakes; those lessons still work for them today.

Two projects with 300,000 lines of Python code each at a small company handling physical access control systems with smartcards and stuff. Once I looked at the most critical part, the logic which decides whether a door should be opened or not, and it was implemented with the assumption that in Python the bitwise operators work the same way as in C (they don't at all).
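The comment does not say which operators tripped up the door logic. One well-known difference is that Python integers have arbitrary precision, so `~` and `<<` never wrap around the way fixed-width unsigned C integers do. A minimal sketch of that difference, assuming 32-bit unsigned values on the C side (the helper names are hypothetical):

```python
MASK_32 = 0xFFFFFFFF  # assumption: the C code used 32-bit unsigned integers


def c_uint32_not(value: int) -> int:
    """Emulate C's ~ on a 32-bit unsigned integer by masking the result."""
    return ~value & MASK_32


def c_uint32_shl(value: int, shift: int) -> int:
    """Emulate C's << on a 32-bit unsigned integer (high bits are dropped)."""
    return (value << shift) & MASK_32


# Plain Python: ~ yields a negative number instead of wrapping around.
print(~0x0F)                        # -16
print(c_uint32_not(0x0F))           # 4294967280, i.e. 0xFFFFFFF0

# Plain Python: << keeps growing instead of discarding the top bit.
print(0x80000000 << 1)              # 4294967296
print(c_uint32_shl(0x80000000, 1))  # 0
```

Permission checks written as `flags & ~mask` or as shifted bit fields can therefore behave differently after a straight port unless the results are masked explicitly.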

An Angular app with ~50k LOC.

Barely any tests and several hand-rolled components, some of which were merged only because the author managed to implore somebody to give them an R+ after a few sprints of the branch just sitting there.

A total of 21 people were involved in creating this system and for some insane reason we still had daily standups with almost full attendance.

What's the problem with "hand-rolled components"? Isn't that pretty much the essence of building non-trivial frontend stuff? Or does this have some special meaning in the Angular world?

Not in the case of components which could easily be replaced with a well-tested and proven solution.

Notable among the ones in this project was the date picker - there's a decent number of those available and yet somebody made the decision to hand-roll it.

The result was a mess that for some reason had a 300 LOC service as a part of it. Needless to say minimal tests.

It was a waste of man-days in a project that was already over budget.

There are a lot of date pickers; however, if you have a specific UI and UX to implement, it might not be worth trying to wrangle one of them into doing what you need it to do.

There are so many integration points (business logic, i18n/l10n, styling, keyboard navigation, accessibility, etc.) that I think it's often easier to write your own. Hopefully now that most browsers support <input type="date" />, we can just use that most of the time.

The codebase I'm working on right now. It went from a proof-of-concept demo, to a small mini game, to a bigger mini game, to a full game, to a SaaS product, with no time (budget) to rewrite it up until delivery to a customer. It's pretty frustrating, as it's my fault, but budget and tight deadlines forced my hand.

So since we are on this subject, does anyone have advice for wrangling spaghetti Qt apps into submission? We have two of them here, and of course their creators aren't here anymore, and we find it very difficult for developers who weren't involved in their development to grok all the shared state.

Server code written in C++ that called into a lower level C library. There was a 4000 line case statement that handled message replay. I am not sure why it wasn't broken out into functions, but it was "sensitive" code and refactoring it was considered a no go.

Drupal 8

I did one project in Drupal. What a heaping pile of shit unfit for any actual developer and not someone's wife who moonlights as a small-business homepage developer. Never again.

I never worked in Drupal 7, but I can't imagine it was much better.

Drupal 6 was actually a pretty consistent hooks-based system that exploited the fact that PHP has a fast and feature-rich implementation of the function lookup table.

Drupal 7 started the migration to object-oriented code and was halfway through the messy rewrite when it was released. Drupal 8 finished where Drupal 7 left. That's where the majority of the developers left as well (pun intended).

My first PHP app was an invoicing application. It is ~3500 lines of PHP in a single file, most of which is a few heredocs with JavaScript to be inlined on the client side. I haven't touched the code in 10 years. I still use it for my freelance invoicing.

As a budding programmer (hobbyist), what can I do not to fall into the mistakes others talk about in the comments? Is there any particular set of mental habits that could lessen the chance of spaghetti occurring, or is it just a matter of when rather than if?

In practice, it's more a matter of mentality and willingness than any specific knowledge regarding code maintenance. All you need is a couple of articles or a single book about writing clean, maintainable code. After that, the majority of the benefit comes from simply trying to maintain the long-term thinking and not getting into a "let's just get this frigging done and go for a beer" mentality. Yes, there's always more you can learn to better structure your code, but simply being aware and _wanting_ to write readable code consistently takes you above 75% of the code out there.

All these comments are good. Also ALWAYS watch out for 'Quick Win': When you do something quick and dirty, mark it up for 'refactoring' (i.e. Tech Debt). When your list of quick and dirty builds up, go and clean it. Do not accumulate tech debt, try to keep it low otherwise you will pay full price later (examples up here).

Another great tip that saved me from tons of refactoring: "The wrong abstraction is worse than code duplication". Meaning sometimes duplicating code is better than trying to create the wrong architecture (as long as you mark it in your quick and dirty list).

You don't just pay full price on technical debt, you pay it back with interest!

The most important thing is to be on the lookout for spaghetti code (or other big chunks of hard-to-maintain code), and do something against it.

There's lots of materials on how to refactor code (improving its structure without changing functionality), including blogs, books and video lectures.

The best way to learn how not to write spaghetti code is working on spaghetti somebody else wrote. Every big project has some corner with sub-par code quality, so you could volunteer to clean something up, e.g. in LibreOffice or Firefox.

Here are some rules I use to keep my code maintainable (to me):

- Lines should not be too long (120 characters max).

- A function should fit on one page in the viewport, even when the console and debugger take up the bottom third of the screen.

- Variable names should be pronounceable and longer than 3 letters (to prevent names like `i`, `x`, `s`).

- Files should not be too long (1000 lines max).

- Return/exit/throw as early as possible.

- Comments:

    - If an `if` takes more than 2 conditions, it's worth commenting.

    - If a function is longer than 10 lines, it's worth commenting.

    - Every file is worth commenting.

    - If you ask yourself "should I add a comment?", add a comment.
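As an illustration of the "return/exit/throw as early as possible" rule from the list above, here is a minimal sketch. The order structure and the prices are made up for the example; the point is only the shape of the two functions.

```python
from typing import Optional


def shipping_cost_nested(order: Optional[dict]) -> float:
    """Nested version: the happy path is buried three levels deep."""
    if order is not None:
        if order["items"]:
            if order["country"] == "US":
                return 5.0
            else:
                return 15.0
        else:
            raise ValueError("order has no items")
    else:
        raise ValueError("no order given")


def shipping_cost_early(order: Optional[dict]) -> float:
    """Same behaviour, but the invalid cases exit first (guard clauses)."""
    if order is None:
        raise ValueError("no order given")
    if not order["items"]:
        raise ValueError("order has no items")
    if order["country"] == "US":
        return 5.0
    return 15.0
```

Both functions return the same results for the same input. The second one keeps each error next to the check that triggers it, which tends to make long functions easier to scan.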

I think the best starting point is

Robert Martin, Clean Code.

Most people that have read the book (and it is a classic so many have) will swear by it. I most certainly do.

Windows Vista. It is practically a virus, and ran ~5x slower than XP on the same hardware.

It wasn't even consistent - a colleague had weird errors with networking on Vista and I didn't even though we had identical laptops.

Microsoft support was basically "yes it does that sometimes".

Largest amount? 13k lines in a single Java file, none of which was boilerplate. Who knows how many lines were in the whole project... a million seems about right.

C functions that span over 10k lines with 46 parameters.

Not as big as some of the examples here, but still kind of interesting. I worked for a city government, and the entire backend code for our work-bids site was a few hundred files of PHP written with some sort of Dreamweaver framework that was deprecated years ago. Old enough that the entire site had to be run on PHP 5.7. I ended up rewriting the entire thing with Laravel (the framework of choice for the other webdev that was in the IT dept). It really didn't take that long; he just had too much on his plate to worry about it before I showed up.

Has to be iTunes. What a bloated pile of crap that is.

any entertaining examples from your time working on it?

Once upon a time, there was a search product and one of the data sources that it could search was a Solr/Lucene database. This should be no problem, since search is what Solr does. It should be as simple as passing the user's query through to Solr and then reading the response. The problem was, it was important to know exactly which parts of any matched records were relevant to the search.

The Guy Before Me™ decided that the best way to implement this would be to split the user's search into individual words, perform a separate search query through Solr's HTTP API for each individual word, and then do a bunch of very clever and complex post-processing on the result sets to combine them into a single set of results.

This led to endless headaches due to horrible performance. Imagine if you wanted to implement web search this way. How would you synthesize the results for the search "boston plumbers" given the search results for "boston" and the search results for "plumbers?" You would need tens of thousands of results for each search term to find even one match that applies to both terms. Now scale this to getting hundreds of results to present to the user. Now scale this to n search terms.
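The scaling problem described above can be reproduced with a toy model. This is only a sketch of the general per-word-then-intersect approach, not the original code: the in-memory `index`, the document ids, and both function names are hypothetical.

```python
def search_term(index, term, limit):
    """Return up to `limit` document ids for one word (stand-in for one HTTP query)."""
    return index.get(term, [])[:limit]


def search_all_terms(index, terms, limit):
    """Run one search per word, then intersect the result sets in the application."""
    result_sets = [set(search_term(index, term, limit)) for term in terms]
    return set.intersection(*result_sets)


# Toy index: 10,000 documents per word, only 10 of which contain both words.
index = {
    "boston": list(range(0, 10_000)),
    "plumbers": list(range(9_990, 20_000)),
}

# With small per-word result sets, the overlap is missed entirely.
print(len(search_all_terms(index, ["boston", "plumbers"], limit=100)))     # 0

# Only by fetching every hit for every word do the 10 common documents appear.
print(len(search_all_terms(index, ["boston", "plumbers"], limit=10_000)))  # 10
```

Every extra search term adds another full result set that has to be fetched and intersected, which is why the cost grows so quickly.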

I was tasked with making this take less than 8,000ms for a simple query. I spent a while getting to understand how this code worked and building out performance tests so that we could determine how it would behave under load (we didn't have any users yet). The results were pretty grim. I presented two possible options for moving forward:

1. Move this crazy result-set-intersection logic closer to the data. I could build a custom Solr plugin to do this stuff inside the Solr server so that we didn't need to copy gigantic result sets across the network from Solr to the application server for every query.

2. Delete ALL of this nonsense because literally exactly what this whole mess of code was meant to accomplish is already implemented in Solr. They call it highlighting. It's one of the marquee features of the program. I can't stress enough that this is precisely, perfectly, unequivocally, the exact thing that all of this complexity was meant to accomplish.
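For reference, a rough sketch of what option 2 could look like from the client side. `hl` and `hl.fl` are Solr's standard highlighting parameters. The host, core name (`products`), and field name (`description`) are assumptions made up for this example, since the comment gives no details about the schema or Solr version.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url: str, user_query: str, field: str) -> str:
    """Build a Solr select URL that asks Solr itself to highlight matches."""
    params = {
        "q": user_query,   # the user's query, passed through in one piece
        "hl": "true",      # enable highlighting
        "hl.fl": field,    # field(s) to generate highlighted snippets for
        "wt": "json",      # response format
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = build_highlight_query(
    "http://localhost:8983/solr/products", "boston plumbers", "description"
)
print(url)
```

A single request like this returns the matching documents together with a highlighting section, so no per-word queries or client-side intersection are needed.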

My manager thought it would be a shame to throw away all of that very expensive code and lose the flexibility of an in-house solution. So we went with option one. I spent the next month writing a Solr plugin that reproduced the original logic. It was still slow as mud so I sharded the data across multiple Lucene servers and distributed the algorithm across them with a map/reduce sort of scheme.

In the end, it all worked great. It was fully ten times slower than the solution already built into Solr, but it worked.

The startup later ran out of runway trying to build a big-data-sized in-memory distributed database from scratch to speed up search. The founder (also the lead developer while I was there) insisted that everyone use raw C-style arrays and a custom in-house hash table implementation because he thought STL was too slow. Basically, "not invented here" was in the DNA of that company. I'm surprised we even used commodity hardware and didn't design some kind of in-house search coprocessor that would do everything in silicon.

... it all "worked great". Wow, what a mess.

Obligatory XKCD: https://xkcd.com/303/

There's an xkcd comic for everything.

There just needs to be an xkcd about the "xkcd comic for everything" meme and the holographic simulated universe will come to completion.


Maybe, if we pitch the idea to Randall, he may prove it. :)

Ha ha Turing Complete xkcd...

The number is cool.

406. Summer Glau

Nice spotting

Everybody needs a 303.


Huge ecommerce company. Code written from the 90s in .NET was ported into PHP using an auto-library... nothing else to say

Oh my goodness, that brings back bad memories... :|

I was on a project where every new developer said "I have never seen such bad code". Seriously! Three new people said it, and so did I.

- Bad testability

- No tests

- 3 bad hidden bugs surfaced in just one year

- ...

I had very little trust in that code, and it was not much fun. On the upside, cleaning it up was a great feeling.

Thank you for the question; it gave us so much knowledge in the replies.

Wish you all the best

A million lines of mostly 90s style C++ codebase - works like a charm :)

One of the startups I worked at had 3 frameworks in PHP, all homegrown, all messed up in all the wrong ways, and that was just the backend. We had a homegrown framework written in jQuery for the frontend as well. Never again.

And probably making a ton of money so biz didn't want to replace it.

Every line of code I've written over the past 30 years.

100k lines of semi-generated, semi-hand-rolled Java/C

I made scones with the wrong flour once. (¿)

Amavisd. About 11,000 lines of crufty Perl.

The most insidious cases are those where bad processes are developed around bad software, leading to a vicious cycle of co-dependency.

I once worked for a reasonably large business that ran all of their invoicing and stock management through an MS Access project. The whole thing was wacky. Half the business logic was implemented in VBA. The other half was stored procedures, but there was no obvious pattern (predictably, anything that involved a task the DB was optimised for was written in VBA). "Deployment" consisted of saving to a network drive and waiting until everyone opened it again in the morning. Data integrity seemed optional and inconsistent. The symbol naming convention could be described as cryptographic. The only documentation was a comment over each function stating the original developer's name and a timestamp. It had a proud splash screen stating "Developed by Dave Davidson" that was shown for 5 seconds on startup. This was of course completely simulated and there was no reason to have a splash screen. I could never quite fathom the magnitude of this guy's delusions of grandeur or why they would want to put their name to it.

The worst part about it all was that for the most part, it worked. The parts that didn't work were well known to the people using it and worked around. So processes were developed around it, and over time these became so deeply ingrained in the teams using it that they couldn't imagine working any other way. Most of this consisted of taking telephone orders, printing off the invoices that were generated in Access, and then rekeying that information into an accounting system (for efficiency purposes of course, these printed copies were passed to another team member with notes scribbled on the original document. Mistakes were commonplace and accepted as a CoB).

Part of my role was to implement a web-based ordering system. We did a reasonably good job but of course had plenty of our own WTFs. The biggest pain was integrating this with a particular team. They could not imagine a process that did not involve printing off orders and rekeying. After a while we realised that the reason we got so much pushback was that once fully integrated, our system would make half of the team members redundant.

With support from management we went ahead, and over time the wrongs were righted. When I left there was still a deep level of distrust in the new system. Mistakes that were daily occurrences in the old world were "proof that the new system won't work". Orders were still printed "just to make sure I don't lose it". New bugs were treated as if the sky were falling. I would spend more time managing expectations than writing code. But we got there in the end.

Buggy software can be fixed. Buggy humans are an entirely different kettle of fish.


SAS Viya. Enough said.

git ls-files | xargs wc -l

<puts face in palms, starts weeping gently>

cloc .

The Internet.

While not related to LOC, warranted nonetheless: https://xkcd.com/2030/

The entire Athenahealth stack.

Perl, right?

The whole damn thing.

if it works, it's not bad code

I don't think I want to work with you, ever :)

My own CMS. Here's the function signature of the heart of it, which depending on circumstance may call itself recursively.

    function a_nodes_list_trees(
    	$node_id = 0,
    	$tree_id = 0,
    	$nleft = 0,
    	$nright = 0,
    	$nlevel = 0,
    	$current_subtree_depth = 0,
    	$flags = 0,
    	$order = 'tree',
    	$inline = false, /* needed when loading stuff via ajax*/
    	$skipped_types = [],
    	$items_per_page = 10,
    	$min_tag_count = 2,
    	$min_author_count = 2,
    	$is_site_root = false,
    	$skip_pagination = false,
    	$ignore_404 = false,
    	$skip_display_options_and_batch_ops = false,
    	$display_bottom_pagination = true)
As you can tell from the default values, it started out having 6 arguments. And it has things like this in it:

    	$mysql_the_rest = 'FROM
    			'.DBTP.'node n
    				LEFT JOIN
    					'.DBTP.'node n2
    							n.tree_id = n2.id
    							n2.perm_view & ' . A_PERMS . '
    			n.perm_view & ' . A_PERMS . '
    			n.site_id = ' . $site_id . '
    	$total_count = $A->db->fetch_one_column('c', ' SELECT COUNT(*) c '.$mysql_the_rest);
.. you know? The whole CMS is 20k lines of PHP, with HTML, PHP, and MySQL all happily living together in the same files (it's not that I don't have templates, I just have plenty of HTML in the PHP, too).

Yet, it works like a charm, PHP updates made it faster even, and I can use it for everything I needed so far, and use its output in a variety of ways. I still want to rewrite it, but it seems a lot of work to just shave off a few ms and have nicer code, with the same result for the visitor, and also having to write something that migrates the content. I suspect with enough content, it will slow down, and then I'll think about the next iteration. But it's still a mixture of pride, plain being happy to have it, and groaning whenever I fix a bug or add a feature.

An easy refactor for your function is to use a "Builder" for your params, i.e.:

    options = NodeBuilder
        .author({ min: 2 })
        .flag({ .. })
        .pagination({ .. })

    function a_nodes_list_trees(options) { ... }

Because in the end, if your function is fast and working correctly, it's fine even though it's a little bit messy. The problem is more all the code calling this function and having to pass dozens of params in the right order, and then it's a pain to start adding/changing parameters. Also, when calling this function, you probably need a bunch of temporary variables to "build" all the params; all those temporary variables could instead live inside that builder object.

I think I'll just use arrays, one for the caller, and one globally that has the default values. Fortunately that function isn't called in that many places.
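The "one array for the caller, one global array of defaults" idea can be sketched roughly as follows. This is written in Python for brevity rather than the CMS's PHP. The option names are borrowed from the signature quoted earlier in the thread, while `resolve_options` and `list_trees` are hypothetical stand-ins.

```python
# Global defaults; only a few of the original parameters are shown here.
DEFAULT_OPTIONS = {
    "order": "tree",
    "inline": False,
    "items_per_page": 10,
    "min_tag_count": 2,
    "skip_pagination": False,
}


def resolve_options(overrides=None):
    """Merge caller overrides over the defaults, rejecting misspelled keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    return {**DEFAULT_OPTIONS, **overrides}


def list_trees(node_id, options=None):
    """Hypothetical stand-in for the real function; it only reports its settings."""
    opts = resolve_options(options)
    return f"node {node_id}: {opts['items_per_page']} per page, order={opts['order']}"


print(list_trees(7, {"items_per_page": 25}))
# node 7: 25 per page, order=tree
```

Callers then name only the settings they change, and adding a new option no longer touches every call site. In PHP, merging two associative arrays with `array_merge($defaults, $overrides)` would presumably play the same role.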

Making and then really using the CMS helped me with knowing what I would want in my next CMS, and just keeping the whole in mind from the start would probably make it a lot better by itself. My thinking is that the longer I put that off, the more languages improved and the more I hopefully learned in the meantime ^^

You're benefitting from an accurate mental model of this code (since you wrote it and work on it) and as soon as someone else has to work on this thing they are going to curse you.

So it might be job security but it's also vulnerable to the hit-by-a-bus problem

When I started out, I wanted to opensource it, it had an installer and everything.. but then I realized it's become kind of a little monster and wisely canceled that plan, I'm not that irresponsible or mean :)

My god man that is beautiful. I'm going to put that on a poster.

I feel your account's bio perfectly matches the spirit of your post's content here, haha!

The quotes in my bio are by people who thought things through, while the CMS exploded in scope while I made it, so I'm not quite sure what you mean. Can you elaborate?

The verbosity and wall of text

Maybe, but at least it's more or less clear what I mean right away :P A million words conveying one unit of meaning are still more efficient than ten words conveying no meaning.

I just thought about that when I saw a youtube comment saying "stolen" in response to a funny joke. I wondered, do they mean they intend to "steal it" because it's a good joke, or did they mean the person who posted the joke stole it from somewhere? Nobody will ever know, at least I for one won't sign in just to ask.

I notice that since mobile devices, a lot of "communication" on the web these days is kinda like the "small talk" from Kevin from The Office US.


If I hadn't asked what you meant, I and anyone else who read your comment would have had their own interpretation of it. I see a lot of comments like that on HN, where you would have to ask "what do you mean?" because it's totally unclear. A variation is stating something that is factually true but doesn't really refute anything: the commenter clearly seems to mean something by stating that triviality, but doesn't say what it is. It's like dog whistles, except not for others to hear; only the posters themselves know what they mean. Count me out.

"Maybe, but at least it's more or less clear what I mean right away :P A million words conveying one unit of meaning are still more efficient than ten words conveying no meaning."

No need for 'at least'. I'm not criticizing the length; it's funny because it's another form of life and way of expression that others use. It brings me joy when I see patterns in life, expressed in many forms, one being someone verbose who writes a hilarious god function and then inadvertently backs up that 'digital persona' with a god-function-esque bio, lol!

