Hacker News

I recently came across this article:


Although it describes the issue in terms of statistics and machine learning, this is also exactly what ends up happening with a large codebase that lacks clear requirements or test cases, with people just making incremental, piecemeal changes over time. You end up with an application that has been trained (overfitted) on historical data and use cases, but that breaks easily on slightly new variations: inputs that differ from anything the system has ever handled in some trivial way that a better designed, cleaner, more abstract system would be able to deal with.

Given how much poor coding practices resemble machine learning (albeit in slow motion), it's hard to hold too much hope about what happens when you automate the process.

I really like this extension of the concept of overfitting to codebases in general.

I especially noticed this in libraries/packages that were "community owned" in a company--instead of one team owning the package and being the authority on deciding the long term roadmap and communicating with other teams about feature requests, deprecations, documentation, bug fixes, etc, the community at large, where "community" was very broadly defined as a team that for whatever reason had an interest in using/maintaining/adding onto the package, would collectively own the package.

Naturally, the result was exactly the scenario you described. Each team hacked on their own bit of functionality for their specific purpose, while doing their best to not affect or break the increasingly precarious tightrope of backwards compatibility. There was no long term architectural vision, so there was a definite need for refactoring, and yet no team had the incentive to invest the amount of time needed to do that. The documentation was woefully incomplete as well, and few people understood how the entire thing worked, since each team would only interact with their small fraction of the code.

Two principles I live by (much to the annoyance of my bosses)

1. Don't fear the refactor.

2. If you don't want to rebuild your entire application from scratch, don't worry, a competitor will do it for you.

There's nothing wrong with creating something in increments. It's the fear of revisiting something that destroys a code base.

Your bosses might be right.

Technical debt, much like regular debt, can also be used as leverage to quickly gain a competitive advantage. While your competitors are busy refactoring and rebuilding perfect applications that create hardly any new customer value, the scrappy startup that writes piles of spaghetti code might be building exactly what customers want.

Code quality != business value.

While this is clearly true (and is exactly what was being described when "technical debt" was coined), the unfortunate reality is that we often take on huge amounts of technical debt in order to fund the equivalent of pizza parties. Having eaten all the pizzas, we then have to pay back the debt and frequently the company can't afford it.

This is one of the reasons why you must not fear the refactor. Sometimes you need to get that code out the door because the business requires it. Then you need to pay back the debt -- by refactoring that mess every time you touch it in the future.

There is no such thing as "technical inflation" to magically wipe away our debt. It's important to have good lines of communication so that the business doesn't get used to squeezing development in order to eat pizza (because, why not? It's free!)

Piles of spaghetti code will give your customers what they want today, but rob them of the features they want tomorrow and make every future feature orders of magnitude more expensive to develop than they should be. That's the interest you pay on Tech Debt.

Much like regular debt, if you don't repay it, you go out of business and end up penniless.

> Code quality != business value

I don't think that's a given: in some circumstances code quality absolutely is business value. It might be better to say code quality can be, but isn't always, business value. As ever, context is the deciding factor.

Well, I would say technical debt is similar to the classic kind of debt: It may give you short-term advantage (liquidity), but on the long term, there's interest on it. If not paid off, it grows exponentially.

So yeah, technical debt can be used as a tool, but it doesn't come for free.
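The compounding half of that analogy is easy to make concrete. A minimal sketch, assuming a hypothetical flat 10% interest per period on an unpaid debt:

```python
# Compound interest sketch: a debt left unpaid at rate r grows as (1 + r)^n.
# The numbers here are illustrative, not from the thread.
principal = 100.0
rate = 0.10  # 10% interest per period

debt = principal
for _ in range(10):
    debt *= 1 + rate  # each period, interest accrues on the whole balance

print(round(debt, 2))  # equals 100 * 1.1 ** 10
```

The point of the sketch is that the growth is multiplicative, not additive: skipping a payment doesn't just defer the cost, it enlarges the base the next period's interest is charged on.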

I really don't think so. Apart from exceptional cases (if you're selling your code to another dev, maybe), code quality is never value to the user. That's not to say good code quality is useless, of course, but its usefulness is not in direct business value.

Technical debt already has a similar business concept in "expensive" money: funds that you raise from VCs on bad terms while in distress, because there's no other way to do what needs to be done fast enough. Programmers paid with expensive money who argue that they need more time to write high quality code for the sake of 'the future' will seldom win that argument.

I think this is actually a better analogy than debt (notice that equity and debt are on the same side of the ledger); just as future valuation of the company is uncertain, so may also be the business value of the quick hack. I.e. in the same way that a certain VC investment may or may not be wise, the quick hack has the same uncertainty attached.

Add to this that the business people may have a bad grasp of the true cost of the hack, and the developers little insight into its business value, and you get the current situation.

And when you clearly see that the whole product was a dead end, you can default on your technical debt. Saving you untold man hours.

> Technical debt, much like regular debt, can also be used as leverage to quickly gain a competitive advantage.

Unlike regular debt, technical debt is extremely hard to quantify.

You can't balance a business strategy if you can't estimate how much you're going to pay.

What tends to happen in reality is that once the codebase is 7-8 years old or more, and it has always been piecemeal spaghetti code, each change becomes exponentially more difficult to make.

There is the story by Robert C Martin about the company that made a really good C debugger back in the day. Then C++ came out and the company promised to make a version for it. Well, months came and went, and eventually they went out of business. Because the first version of the debugger was awful code, changes were really hard to make, and so they couldn't adapt to the changing market.

Amen. Good enough working code gets your foot in the door. You pay later, but at least there is a later.

Business is mostly a math problem, and most programmers don't really understand why they go to work.

This view has a danger that some understand it as "you never have to pay your debt back". But if the project lives long enough to be successful, you end up painting yourself into a corner where you cannot change a single thing without breaking something.

> 1. Don't fear the refactor.

Like most things in life, there is a balance. I have argued against large refactors many times. Often wanting to do a refactor is just a thinly disguised excuse to use some new technology (I'm as guilty of this as anyone else). Anytime a refactor comes up my goal is to figure out why:

1) What will the refactor fix?

2) What will the refactor potentially break? Are there tests around critical functionality?

3) Does the group proposing the refactor really understand the ins and outs of the application? When new people come into a system they often want to change it to fit their mental model of the problem, and miss subtleties of why the system is a certain way.

That being said, I evaluate small refactors anytime I have to touch a piece of code.

I am more inclined to your sentiment. Now there is no excuse for badly formatted code and being a lazy slob, and I never use the word refactor in the sense it is used here.

I often _redesign_ old code to meet new requirements and to support new features, but I would not call it refactoring.

I always strive to leave the code better than when I found it. But I would not name it refactoring.

"It's the fear of revisiting something that destroys a code base." So true.

Yep, when fear creeps in around modifying a part of an application it is time to have a very serious conversation about fixing that. It is one of the few cases where I find the refactor vs. creating customer value argument is more clear cut -- letting that fear linger is likely to spread to other parts of the code & turns into a human problem pretty fast.

Fixing might be a presentation, tests, documentation, refactoring, rewrite, deprecation, whatever. Just don't let it languish and the fear grow.

I wholeheartedly agree! Companies that delay tackling technical debt still ship features, but development gets slower and more error prone as time passes. Because they keep shipping, they can fail to see how much faster they'd be shipping 6 months down the line if they tackled the debt that adds weeks to each feature being developed.

I've expanded on these thoughts before on my blog about technical debt inflation if anyone is interested https://scalabilitysolved.com/technical-debt-inflation/

Indeed. I used to be scared of database changes in case something went wrong. Now I realise the worst thing to do is to hack code on top of a poor database design to make up for it. That usually ends up far worse.

And one can extend that to businesses as well. How many established companies have been laid low by someone with a new process built on a more modern foundation?

It would be interesting if someone actually had data on this?

I suspect this is something "software engineering" researchers might study.

Whatsapp is a prime example in the tech vs tech space - ride-shares vs taxi services, automated freight loading, fedex vs ups in terms of automating their package sites. An old factory with 1000 workers not being able to compete on a cost basis with a new automated one is the story of the last 50 years I feel.

Agree with the other commenters, very interesting insight.

It maybe doesn't fit the metaphor quite as well, but as an operations person, I've frequently run into the "underfitting" problem. For example, we run Chef to manage our physical and virtual infrastructure. There are a ton of community-authored Chef cookbooks available. Which at first blush, sounds great. But often, they have grown over time to become these awful hydras that try to be all things to all people. PR after PR has added support for the specific use case of every organization that wants to run the cookbook in their own special way. The "Getting Started" section of the README eventually becomes a dumping ground of 900 attributes you need to set correctly, and yet somehow it still doesn't quite perform how you'd like.

In many cases, we've tried to use community cookbooks and even merge our own customizations back upstream. Only to eventually give up and write our own version that's 50 lines of Chef DSL/Ruby instead of 5,000 but does exactly what we need, the way we need, and no more. It's very possible to make a system too generic and configurable, to the point where it loses all meaning.

Found the exact same thing regarding the community cookbooks. We do use some though, it depends on the complexity and how well they work. I've either written some from scratch, taking pointers from the community ones or forked them to make them simpler and better suit our needs. Pull requests have been made where it makes sense.

Glad to hear we're not the only ones who found the community ones not perfect for every need.

> There are a ton of community-authored Chef cookbooks available. Which at first blush, sounds great.

Welcome to software development! Not as easy as it looks is it :)

EDIT you may find these articles helpful (or at the very least food for thought):

- https://blog.codinghorror.com/dependency-avoidance/

- https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-...

The problem with the analogy is that for a learning algorithm, there are clear definitions of the model complexity as it relates directly to the outcome being optimized. YAGNI applied to a model is a penalty term for parameters or various methods of regularization.
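That "YAGNI as a penalty term" idea has a standard concrete form: ridge regression, where a lambda * ||w||^2 term charges the model for every unit of complexity it uses. A minimal sketch on synthetic data (the data and lambda value are made up for illustration):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y.
    The lam * ||w||^2 penalty is the 'YAGNI term': extra weights must
    buy enough fit improvement to pay for their own magnitude."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Only the first feature actually matters; the rest is noise to be (over)fit.
y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.1, size=50)

w_free = ridge_fit(X, y, lam=0.0)   # no penalty: spare capacity chases noise
w_reg = ridge_fit(X, y, lam=10.0)   # penalised: weights shrink toward zero

print(float(np.linalg.norm(w_free)), float(np.linalg.norm(w_reg)))
```

The penalised solution always has a smaller norm than the unpenalised one, which is exactly the kind of quantified complexity trade-off the comment notes is missing for software.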

But when the “goal” of the system is just “arbitrary short term desires of management” you can easily point out the problems, but there is no agreement on what constraints you can use to trade-off against it.

Especially for extensibility, where you can get carried away easily with making a system extensible for future changes, many of which turn out to be wasted effort because you did not end up needing that flexibility anyway, and everything changed after Q2 earnings were announced, etc.

In those cases, it can actually be more effective engineering to “overfit” to just what the management wants right now, and just accept that you have to pay the pain of hacking extensibility in on a case by case basis. This definitely reduces wasted effort from a YAGNI point of view.

The closest thing I could think of to the same idea of “regularizing” software complexity would be Netflix’s ChaosMonkey [0], which is basically like Dropout [1] but for deployed service networks instead of neural networks.

Extending this idea to actual software would be quite cool. Something like the QuickCheck library for Haskell, but which somehow randomly samples extensibility needs and penalizes some notion of how hard the code would be to extend to that case. Not even sure how it would work...

[0]: < https://github.com/Netflix/chaosmonkey >

[1]: < https://en.m.wikipedia.org/wiki/Dropout_(neural_networks) >

Overfitting is a quantifiable problem. If you're not doing robust data segregation and CV you're not even engaging in elementary ML practices.

Only if the training data you got is representative of all future use cases. Good luck with that.

You can segment the validation to be data after a certain date, and train on data before that date. You get an accurate sense of how well the model will perform in the real world, as long as you make sure the data never borrows from the future.
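A minimal sketch of that time-based split, with hypothetical records (the dates and cutoff are made up):

```python
from datetime import date

# Hypothetical dataset: (observation_date, feature, label)
records = [
    (date(2015, 1, 10), 1.0, 0),
    (date(2016, 3, 5), 2.0, 0),
    (date(2017, 7, 1), 3.0, 1),
    (date(2018, 2, 20), 4.0, 1),
    (date(2019, 9, 9), 5.0, 1),
]

cutoff = date(2017, 1, 1)

# Train only on the past; validate only on the "future" relative to the
# cutoff, so the model can never borrow information it wouldn't have had.
train = [r for r in records if r[0] < cutoff]
valid = [r for r in records if r[0] >= cutoff]

print(len(train), len(valid))
```

The key property is that the two sets are disjoint and ordered in time, which is what makes the validation score an honest estimate of forward-looking performance.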

That only ensures your model is accurate assuming real world parameters remain the same, which again, is prone to overfitting.

To use a real world example, financial models on mortgage backed securities were the root cause of the financial crisis, because they were based on decades of mortgages that were fundamentally different than the ones they were actually trying to model. Even if someone was constructing a model by training on data from say, 1957-1996, and validating using 1997-2006, they would have failed to accurately predict the collapse because the underlying factors that caused the recession (the housing bubble, prevalence of adjustable rate mortgages, lack of verification in applications) were essentially unseen in the decades of data prior to that.

Validation protects against overfitting only to a certain degree, and only to the extent that the underlying data generating phenomena don't ever change, which, in the real world, is generally a terrible assumption.

I'd probably put fraud ahead of models as the root cause. The entire purpose of those securities was to obscure the weakness of their fundamentals.

That's not hard and fast, though. While no model is perfect, robust models can "handle" outliers. Worst case, you know when it happens and retrain with more a priori knowledge.

Worse case? More like best case.

It's not about outliers. Let's say you're at a startup and you fit some model to your first 30 customers. It works great for your next 10 customers, but fails dramatically for your first enterprise client. Why? Because the enterprise client was fundamentally different from your previous 40 customers. If you fit your model on a population in which the relationship looks one way, then try to apply your model to a population with a different relationship, it will fail.

Machine learning and statistics are both applications of the same principles of probability and information theory. They work (for the most part) by modeling the world, capturing the relationships between random variables. A random variable can be any natural process that we can't express in precise terms, so we express it in probabilistic terms.

This is the same principle underlying the premise that "past results do not guarantee future success." The relationships between random variables in the world that affect success in anything -- stock market performance, legal outcomes, etc. -- might not be the same tomorrow as they are today.

And that's not even a matter of overfitting. That's just your ever-present real-world threat of having all your modeling work invalidated by forces outside your control. Overfitting happens when you, the data scientist, fit your model to random noise in the training data. An overfitted model will have bad generalization performance on held-out samples, even from the same population. It's not always easy or possible to detect overfitting, especially with small training sets.
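That fitting-to-noise failure mode is easy to demonstrate. A sketch on synthetic data: the truth is a straight line, but a degree-9 polynomial has enough freedom to memorise the noise in 10 training points (all numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is linear; observations carry Gaussian noise.
x_train = rng.uniform(0, 10, size=10)
y_train = 2 * x_train + rng.normal(0, 1, size=10)
x_test = rng.uniform(0, 10, size=100)
y_test = 2 * x_test + rng.normal(0, 1, size=100)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial fit on a dataset."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=1)   # matches the true structure
overfit = np.polyfit(x_train, y_train, deg=9)  # enough freedom to memorise noise

print(mse(simple, x_train, y_train), mse(overfit, x_train, y_train))
print(mse(simple, x_test, y_test), mse(overfit, x_test, y_test))
```

The degree-9 fit threads almost exactly through every training point, so its training error looks spectacular; on held-out samples from the same population it does worse than the honest linear fit, which is the bad generalization the comment describes.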

What's the problem with that, though? Startups are usually advised to service one market, not several. If your first 40 customers were prosumers but then you have a prospective enterprise client, the logical response is say no to the enterprise client and go after another 60 (or 60,000) prosumers.

Or at least understand that you're entering a new market and budget appropriately for development. Usually, if you're switching from between prosumer -> enterprise, you are very, very lucky if the sum total of changes you need to make is training a new machine learning model. To start out with, you usually need to get used to sales cycles that take 6-18 months, hiring a dedicated sales guy to manage the relationship, and handling custom development requests.

There's no problem with it, but some very intelligent people don't seem to realize that you can't just "use machine learning" and predict whatever you want. It's gotten better over the last few years, now that it's less new and magical than it used to be, but I still see it happen now and then.

Hopefully your analysts (which in this case includes your lawyers, accountants and statisticians) will tell you that the new client is different to the others and your models may not hold up and may need revision.

Hopefully you also listen to them.

Close. Extrapolation is possible using structural theories rather than only reduced form models.

Only if your structural theory is not-wrong enough.

Even if you KNOW that your model is not-wrong in the right direction and within acceptable orders of magnitude, how do you fit the parameters for that structural model? You need some kind of data, even if you're just using anecdata to pick magic constants.

All models are wrong, some are useful.

Fortunately models like these are often testable across many contexts, amenable to metastudies, available for calibration, etc.

That's my whole point. You just asserted that you can extrapolate outside a training set with a structural model. I am asserting that those "many contexts" and "metastudies" amount to a bigger, more representative training set.

What do you mean by CV? I'm not familiar with those terms. Thank you.

As sibling points out, cross validation, which is the front-line approach to avoiding overfitting for supervised classification problems.

It means cross validation. It's essentially a way of simulating how well your model will do when it encounters real world data.

When building a model, you divide your data into two parts, the training set and the testing set. The training set is usually larger (~80% of your original data set, although this can vary), and is used to fit your model. Then, you use the remaining data you set aside for the testing set by using your model to generate predictions for that data, and comparing it to the actual values for that data.

You can then compare the accuracy of the model for the training and testing sets to get an idea if your model generalizes well to the real world. If, for example, you find that your model has an accuracy of 95% on the training data, but 60% on your testing data, that means your model is overly tuned into features of the data used to build the model that may not actually be helpful for prediction in the real world.
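The procedure above can be sketched end to end with a deliberately simple model, a 1-nearest-neighbour classifier on a toy dataset (everything here is invented for illustration):

```python
import random

random.seed(0)

# Toy labeled data: x in [0, 1), label = 1 if x > 0.5.
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]
random.shuffle(data)

# Hold out ~20% as a test set; the model only ever sees the other 80%.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

def predict(x, training_set):
    """1-nearest-neighbour: copy the label of the closest training point."""
    nearest = min(training_set, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(dataset, training_set):
    hits = sum(predict(x, training_set) == y for x, y in dataset)
    return hits / len(dataset)

# A model that memorises its training data scores perfectly on it;
# the held-out score is the honest estimate of real-world performance.
print(accuracy(train, train), accuracy(test, train))
```

The gap between the two printed numbers is exactly the diagnostic described above: a large gap (like the 95% vs 60% example) means the model is tuned to quirks of its training data.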

Never seen the acronym (not really in the space) but I assume cross validation.

Camouflaged Vacuity

I assumed Code Versioning so that if you have robust data segmentation you have less uncertainty about the impact of change. However, I'm a tourist here and hope OP comes back to share.

Cross-validation: testing model fit on non-training data

I assumed Computer Vision.

Fantastic insight, really top-notch.

Just some random thoughts in no particular order - curious what you make of them:

- On the subject of incremental piecemeal changes over time with no requirements: don't you all find that in your workflows (when you're doing something for yourself), it is hard to step back and "architect" something? It is easier to just let it evolve.

- Likewise it takes real work and thought to organize something as simple as a spice rack. (I just keep opened packages of spices in the cupboard.) The knowledge that company is coming is one of the few pushes. But it kind of feels like it's being done for show.

- It's hard to add architecture when you know there's no team that is coding against it as an API. It's just you. It feels like that extra power is kind of wasteful.

- The other thing is that it may be the case that you know there is some deeper level of architecture. In the case of my spices, for example, most of the opened spice packets I mentioned are actually mixes. (Such as grilled chicken spice mix.)

- If I had to architect my own spice rack, I should start by learning which spices I'm actually using more of. And since what I'm doing works, I don't actually care. Plus, it would be a step down: the first time I mixed my own spices, I would probably end up with a worse dish than pouring some out of a premixed packet.

- The first time you architect a "proper" framework rather than let your machine learning algorithm "overfit", the result is probably demonstrably worse.

- That adds up to a lot of pressure against architecting, in favor of just continuing to (over)fit.

This is where good logging helps.

The lifehack is to throw all your spices in a box, only pull them out when you need them, and then leave them on the rack. Then throw away any spice you haven't used in n months and add it to a blacklist. The ones you use frequently should be prominently displayed, treated with extra care, and possibly set up for autorenewal from the grocery.

Only introduce new spices when there's a recipe, and buy just the amount you need.

So too with code. Log your code paths, prune little used features, optimize the hell out of the most frequently used ones, introduce features sparingly and with purpose...
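The "log your code paths" half of the metaphor can be sketched as a tiny usage counter; the decorator and function names here are hypothetical:

```python
from collections import Counter
from functools import wraps

call_counts = Counter()

def tracked(fn):
    """Count how often each code path actually runs, like noting which
    spices leave the box."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper

@tracked
def popular_feature():
    return "used constantly: keep on the rack, optimise it"

@tracked
def dusty_feature():
    return "never called: a pruning candidate"

for _ in range(1000):
    popular_feature()

# After a while, the counter tells you which "spices" to keep displayed
# and which to blacklist.
print(call_counts.most_common())
```

In practice you'd feed this into real telemetry rather than an in-process counter, but the decision rule is the same: prune what the counts say nobody uses.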

I like this spice metaphor, thanks for it.

well-constructed != over-architected

Epicycles within epicycles eventually get replaced with a clean redesign (https://wikipedia.org/wiki/Paradigm_shift)

The tricky bit is that you need a new theory of the data before you can have a better abstraction.

Models generated by DL lack even a paradigm or theory or abstraction.

This is a brilliant insight.

One of the problems I’ve seen in research into technical debt is the lack of a good definition. This insight could form the basis of one.

"Given how much poor coding practices resemble machine learning (albeit in slow motion), it's hard to hold too much hope about what happens when you automate the process."

Your whole argument seems to be based on your personal experiences. Perhaps it is also thus vulnerable to some sort of overfitting :)

Hopefully code reviewers with institutional knowledge can advise on where to apply pruning and prevent code overfitting.

Pruning is also the common ML practice to prevent statistical overfitting.
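For instance, cost-complexity pruning of a decision tree in scikit-learn, sketched on deliberately noisy toy data (the `ccp_alpha` value is an arbitrary illustration):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, which tempts an unpruned tree into memorising
# the training set with many tiny, noise-chasing branches.
X, y = make_classification(n_samples=200, n_features=10,
                           flip_y=0.2, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)

# ccp_alpha > 0 enables cost-complexity pruning: subtrees whose accuracy
# gain doesn't justify their size get cut back, much like a reviewer
# deleting a little-used code path.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print(full.tree_.node_count, pruned.tree_.node_count)
```

The pruned tree is strictly smaller; whether its held-out accuracy improves depends on how much of what was cut was genuinely noise, which is the judgment call a reviewer makes too.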
