Evolutionary couplings between files reveal poor software design choices (ergoso.me)
242 points by armish on Dec 15, 2014 | 82 comments

A "correctly layered" app with UI view separate from UI logic separate from server-side logic etc. will show up as coupling, if commits are feature oriented.

There's certainly a hint as to where to look for bad coupling, but expected "coupling", like tests, needs to be discounted.

> A "correctly layered" app with UI view separate from UI logic separate from server-side logic etc. will show up as coupling, if commits are feature oriented.

It would show some coupling in commits, but e.g. bug fixes should/would be segregated to the relevant files, not spread across the system.

Basically bugfixes would be localized to relevant files, but features would spread across files.

The grandparent's making an important point in that you can't design a system such that all possible changes you might want to make are localized to one area of the code. Engineering is about trade-offs: if you rigorously separate view from logic from database, you make it harder to add features that must touch all three. Conversely, if you make each feature its own file and add in hooks to the view/logic/database layer so they call out to plugins, you make it easy to add new features but very difficult to understand what each layer as a whole is doing.

The best you can do is choose the ideal architecture for your particular project, in the particular point in time that you're working on it. That's why basically every software system needs to be rewritten as it grows up: the ratio of complete rewrites to new features to bugfixes to maintenance refactorings changes as the system matures and the requirements become more precisely known. It's also why we have a software industry; if there was one ideal way to design a system for all domains and all points in time, someone would go design it and be done with it, and none of us would have jobs.

> if you rigorously separate view from logic from database, you make it harder to add features that must touch all three

I've found the exact opposite of this to be true.

I agree! Perhaps the parent meant something different?

For context, I'm talking about the initial phase of a product's lifecycle, where you are changing the product definition roughly every couple days, the total codebase fits in one person's head, and you spend much more time writing code than reading it.

Systems like PHP + "SELECT * FROM database_table" or the MEAN stack, where you use the same data format for both backend storage and UI and intermingle logic with templates, are significantly faster for getting something workable on the screen that users can try out. I've done a complete MVP in 4 days with PHP; a roughly equivalent app in Django (which has some minimal model/view/database separation) took about 2-3 weeks. The fastest I could launch a feature in Google Search that touched both UI and indexing was roughly 6 months; as you'd expect, that has a very rigorous separation of front and back-ends.

Now, the PHP solution will quickly become unmaintainable - with the aforementioned 4-day project, I no longer wanted to touch the code after about 3 weeks. But this doesn't matter - it's 10 years later, the software is still serving users, and I've long since moved on to bigger and better things. And that's my general point: what's "good" code is context-sensitive. Everybody wants to work on nicely-factored code where you can understand everything, but there is no business to pay you unless you first make something that people want, and oftentimes that takes hundreds of iterations (including many fresh starts where you throw everything away and rebuild from scratch) that are invisible to anyone collecting a paycheck.

Tests should have some coupling, but good tests need to change less frequently than the target code. Bad tests need to change every time. It seems like there's still value to explore there.

> I was thinking about writing up a small application paper for this project, but I am really terrible at reading papers from the Computer Science field, let alone writing them.

Thank goodness for that. This blog post was so much easier to read than a formal paper.

Why are formal papers so tedious to read? Imagine how much time we'd waste as a group if he had written this as a paper. Many of us would give up before finding the actual information, and those of us who /did/ identify the actual information would have invested a lot more time than it took to read this excellent blog post.

I have found that the best papers are hard to read because they’re informationally dense, so you have to slow down to really process every sentence and unpack the author’s thinking in your head—but once you do, you get a lot of knowledge from just a few pages. So it ends up being worthwhile.

Average papers are hard to read because they're trying to emulate the style of the good papers, but without having enough actual content. The length and register of a blog post are definitely a better fit for the average paper.

Crappy papers are hard to read because they’re crappy, and recasting them into a different format would reveal that they contain no information at all. :)

True, but it is also true that the best papers would be even better if they were written in a less terse and cryptic style, so that the same information could be obtained more easily.

I’m not sure about that. For example, the first time I read them, each of these statements in the declarative specification of Hindley–Milner type inference took me a long time to unpack.

    x : σ ∈ Γ
    --------- [Var]
    Γ ⊢ x : σ

    Γ ⊢ e₀ : τ → τ′  Γ ⊢ e₁ : τ
    --------------------------- [App]
    Γ ⊢ e₀ e₁ : τ′

    Γ, x : τ ⊢ e : τ′
    ------------------ [Abs]
    Γ ⊢ λx. e : τ → τ′

    Γ ⊢ e₀ : σ  Γ, x : σ ⊢ e₁ : τ
    ----------------------------- [Let]
    Γ ⊢ let x = e₀ in e₁ : τ

    Γ ⊢ e : σ′  σ′ ⊑ σ
    ------------------ [Inst]
    Γ ⊢ e : σ

    Γ ⊢ e : σ  α ∉ free(Γ)
    ---------------------- [Gen]
    Γ ⊢ e : ∀α. σ
Each one takes a sentence or two of relatively dense English text to explain:

[Var]: If the context indicates that it has a particular type, a variable is inferred to have that type.

[App]: If a function is inferred to have some type, and a value is inferred to have the same type as the parameter of that function, then the application of that function to that argument value is inferred to have the type of the result of the function.

[Abs]: If the body of a function is inferred to have some type, given an arbitrary type for its parameter, then the type of such a function is a function type from that parameter type to the type inferred for the body.

[Let]: If an expression is inferred to have some polymorphic type, and another expression would be inferred to have some monomorphic type given that a particular name were bound to that polymorphic type, then a let-expression binding that name to the former expression within the latter expression would have the same type as inferred for the latter.

[Inst]: If an expression has a polymorphic type, and that type is an instance of some more polymorphic type, then the expression can be said to have the more polymorphic type.

[Gen]: The type of an expression can be generalised over its free variables into a polymorphic type.

But having learned this notation, I can now read and write specifications of type systems with ease, and do so much more quickly and compactly than I could in a more approachable notation. To me, that’s a win.

Of course, I’m the sort of person who finds Java programs hard to read because they seem to take so long to say anything. So opinions are going to vary on this!

Ah, serendipity! As it happens I'm working on a variant of HM type inference right now, so my first step was to look up the explanation on Wikipedia and have my eyes glaze over. (As the saying goes, 'Which part of [above equations] do you not understand?' Answer: all of it! And it's not even the first time I've looked at it, though I don't remember anything much from the previous time.)

So I rummaged around a bit more and found http://akgupta.ca/blog/2013/05/14/so-you-still-dont-understa... which explains the notation in much more readable English text. It's a pure win.

Yes, I find typical Java style too verbose for optimal readability; but I find typical Perl style too terse for optimal readability. The sweet spot is somewhere in the middle.

There is an editorial reason behind that (which nowadays, with electronic publishing, should not be an issue): space used to be a scarce resource, so density of content was more important than clarity of exposition.

Nowadays I guess it all boils down to a custom we (yes, I am part of the problem) have not yet been weaned off. It looks more scientific to write things densely and more or less cryptically.

Also, we scientists are a bit afraid of publicly (I mean, to the general public) explaining our way of understanding things, because we know it is somewhat blurry, informal, possibly even comical to a lot of people. And we tend to be introverts.

I guess things like arxiv.org are going to create a new, much more interesting and enlightening way of explaining scientific discoveries.

> Why are formal papers so tedious to read? Imagine how much time we'd waste as a group if he had written this as a paper.

This is a huge problem in academia. Papers are written in academese not because it's a particularly good way of distributing information, but because it's expected. I absolutely agree that less formal language is good for the discipline, for the technical non-academic audience, and for the lay audience.

Steven Pinker recently wrote a book called "The Sense of Style" which talks about academese and writing well about complex subjects (something which he, indeed, has a lot of experience in!). I haven't read it yet, but I've heard it's quite good.

There was another recent HN post that showed an analysis of IntelliJ's architecture using a source code analyzer [1]. Does anyone have more information on these types of tools? There seems to be a genre of tools that are used to inspect the architecture of a program, and I have no idea where to start learning about them.

[1] http://t.co/Ja6uOLRGkQ

Thanks for the pointer; I didn't know about that plug-in. As a long-term IntelliJ IDEA user, implementing this as an IntelliJ plug-in is one of the things I would like to do in the short term. It would be really great to have the tool warn you about possible coupling as you work on the project.

Let me know if you can get your hands on a list of related plug-ins; I am really curious to see them in action.

Huh. Could you also use this to flag a potential bug, if a historically coupled pair is not coupled in some commit?

I actually think this might be the best use case of this proto-tool.

You can do some quick and dirty analysis to find classes that change together in the same day. Often you discover that there are some faulty abstractions.
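That quick-and-dirty analysis can be sketched in a few lines of Python. This is a hypothetical illustration, not the author's pipeline: `cochange_counts` is an invented helper, and in real use the `commits` input would be parsed from something like `git log --name-only` rather than hard-coded.

```python
from collections import Counter
from itertools import combinations

def cochange_counts(commits):
    """Count how often each pair of files changes in the same commit.

    `commits` is an iterable of iterables of file paths, e.g. one list
    of changed paths per commit (hypothetical input format).
    """
    pairs = Counter()
    for files in commits:
        # Sort and dedupe so each pair has a canonical (a, b) ordering.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

commits = [
    ["foo.c", "foo.h"],
    ["foo.c", "foo.h", "bar.c"],
    ["bar.c"],
]
print(cochange_counts(commits).most_common(1))
# [(('foo.c', 'foo.h'), 2)]
```

Pairs that dominate the ranking, yet have no obvious reason to move together, are the candidates for faulty abstractions.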


I think that this sort of repository analysis is going to be standard practice within the next couple of years.

That looks really interesting, and I'd love to run it on my own projects. I'm curious though, why do you think you need to make it a web service? And why tie it to github? I would like to run it on a local git repository and output the results into a local file. That seems like a good small program.

It's not really tied to github. If you look at the source[1] it's just a `git clone` wrapped in a shell script.

[1]: https://github.com/armish/evsrc/blob/master/scripts/evSource...

I'd also like to run this on our software, but being a web service tied to GitHub kills it for me.

Right now it is not really tied to GitHub; it only expects you to give a git repository URL. Coupling it with GitHub would allow more information about the system, for example knowledge of commits that resolve issues on the tracker. That is a long-term goal of this project.

I think it is a mistake to treat coupling caused by TDD as a false positive. What this really outlines is that TDD will force you to edit two files instead of one for many changes. This is a clear indication of how TDD will slow you down.

0. It has nothing to do with TDD

1. It's a tooling artefact; testing systems certainly don't have to mandate splitting code and tests. Rust's test framework allows tests in the same file as the tested code, the testing guide recommends that unit tests live alongside the code they test[0], and the standard library follows this practice[1]. I'm reasonably sure you can also do so in e.g. py.test[2]

2. I'm not convinced editing two files slows you down; most editors and window managers will let you put both files side by side and trivially jump between them. Are Java developers slowed down by having to jump between files?

[0] http://doc.rust-lang.org/guide-testing.html#the-test-module

[1] https://github.com/rust-lang/rust/blob/4deb27e/src/libcollec...

[2] by marking all python files as "test modules"
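Point 1 holds in Python too: doctests let the tests live in the same file, right next to the code they exercise. A minimal sketch (the `slugify` function is invented for illustration):

```python
def slugify(title):
    """Turn a post title into a URL slug.

    The tests sit inline, doctest-style, in the same file as the code:

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Spaces  everywhere ")
    'spaces-everywhere'
    """
    # lower() normalizes case; split() collapses runs of whitespace.
    return "-".join(title.lower().split())

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs every example embedded in the docstrings
```

With this layout a "feature commit" still touches only one file, so the file-level co-change signal from source/test pairs disappears.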

> 0. It has nothing to do with TDD

OP is referring to this line:

> It also turns out, in both softwares, a majority of the couplings are attributable to Test Driven Design, where a source code is coupled to its test. So these are apparently false-positives I should take care of in the next version of the pipeline.

> This is a clear indication of how TDD will slow you down.

Not shown: the part where the critical production bug you introduced was caught by your test suite, thus saving you countless hours of agony, angry customers, and lost revenue.

Also not shown: the part where the feature that was exhaustively tested was removed a couple months after launch because requirements changed or the market shifted.

It works both ways, and probably one of the biggest skills to being an effective developer is understanding whether a piece of code you write is likely to be thrown away shortly or whether it's going to live forever and cause umpteen headaches for the maintainer. (This itself is surprisingly counterintuitive: I have seen high-priority code backed directly by an executive thrown away a week after being written because of shifting perspectives within the organization, and I've also seen a one-character typo in a "throwaway" migration script result in restoring a million+ users from tape backup.)

"Not doing TDD" absolutely does not mean "not doing testing". You can definitely still test your software, even if your development process is not test-driven like that. I can't quite understand why this is an issue.

But then you find yourself back to mempko's inane issue: you'll have to edit your test files alongside your code files, even if you edit the tests after the code.

Instead of writing tests in a separate file, why not express the same logic in the form of types in the lines directly above the code which implements that logic? This has the added benefit of making your code self-documenting and giving rise to powerful tools such as type-guided implementation inference and search.

Because you probably don't have a type system which comes anywhere near close to what you need to express (I doubt that's even possible).

By all means do leverage your type system as much as you can, avoid writing tests for what you know your type system handles, and (if your type system is expressive enough) use property-based testing to further leverage your type system into essentially fuzzing your functions.

But you'll still need to write tests.

How do you write a type which fully describes, for example, "given a user, some content, and various bits of metadata, this function transforms them into a blog post with all the data in the right places"?

You don't write just one type, you write many. The problem you described is pretty straightforward. There are actually lots of examples of how to do this with existing Haskell libraries. What specifically do you want to know about?

OK, let's put it this way. I have an entirely arbitrary calculation, taking input A and outputting B. I want to ensure that this calculation is implemented according to specification.

How on earth do you write a type, or set of types, for that which actually bears some resemblance to the spec and doesn't simply reflect the details of the implementation?

To simplify it to the point of near-nonsense, how would you write a type which says `append "foo" "bar"` will always result in "foobar", and never "barfoo" or "fboaor"? Or that a theoretical celsiusToFahrenheit always works correctly and implements the correct calculation? If you can't do that, how can you do it for more complex data transforms?

This is where dependent types come in. They allow you to write an implementation of append that is correct by construction. Your algorithm is in essence a formal proof of the proposition that `append foo bar = foobar`.

A simpler example is that of lists and the head operation. In most languages, if you try to take the head of an empty list you get a runtime exception. In a language with dependent types you are able to express the length of the list in its type and thus it becomes a type error (caught at compile time) to take the head of an empty list.
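Both examples can be made concrete in a dependently typed language. The sketch below is in Lean 4 (the parent mentioned Coq/Agda/Idris; any would do); `Vec` is the standard textbook length-indexed vector, not anything from the article:

```lean
-- A length-indexed vector: the length is part of the type.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : α → Vec α n → Vec α (n + 1)

-- `head` only accepts vectors whose type says they are nonempty,
-- so "head of an empty list" is a compile-time type error.
def Vec.head : Vec α (n + 1) → α
  | .cons x _ => x

-- Types as propositions: the claim that append behaves correctly on
-- these inputs is itself a type, and the proof is checked by the compiler.
example : "foo" ++ "bar" = "foobar" := rfl
```

The last line is the `assert` from upthread, except that a wrong claim (say, `= "barfoo"`) would be rejected at compile time rather than at test time.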

> Your algorithm is in essence a formal proof of the proposition that `append foo bar = foobar`.

Isn't that literally just implementing the program, though? The point of tests is that they're simple enough that you can't really fuck up, and they describe the specification, not the implementation.

If you have to write a formal proof, what's making sure the proof is actually proving what you intend? And what's the actual difference between this proof and the implementation?

> If you have to write a formal proof, what's making sure the proof is actually proving what you intend? And what's the actual difference between this proof and the implementation?

The formal proof is the implementation. It is the code you run in production. The proposition is your types. Instead of writing tests, you write types. It's the exact same process you would use with "red-green-refactor" TDD except it's the compiler checking your implementation instead of the test suite. The advantage of doing it with types is that the compiler can actually infer significant parts of the implementation for you! Types also happen to be a lot more water-tight than tests due to the way you specify a type for a top-level function and everything inside of the body can generally be inferred.

If you're interested, here is a series of lectures demoing dependently-typed programming in Agda by Conor McBride:


I will certainly watch those - but for the moment, I see no obvious way to immediately see that a proof is proving what you intended it to prove, whereas `assert (append "foo" "bar") == "foobar"` immediately shows what you expect and can be read and checked by somebody with the slightest programming knowledge.

The point of tests is that they're supposed to be simple enough that it should be near-impossible for them to contain bugs - they're incomplete, sure, but what they're testing should always be correct - whereas it seems that it would be easy for a proof to prove something subtly different from the spec without anybody being able to tell. If we could write complex programs to spec without error, we wouldn't have tests in the first place.

You don't have to see that the proof is proving what you intend; the type-checker does that for you. This is why some people sometimes refer to type checkers as small theorem provers.

As for your assert example, how do you know append will work as you expect for all possible strings (including null characters or weird unicode edge cases)? With a proof you will know.

> With a proof you will know.

But I could just as easily accidentally write a proof which proves something else, couldn't I? This is complex code expressing complex ideas - a type system would certainly help, but it can't tell me that I'm proving the wrong thing.

> But I could just as easily accidentally write a proof which proves something else, couldn't I?

Then your proof would be rejected by the compiler. Remember, the types specify your proposition: i.e. what you are intending to prove. The actual proof itself is the function you implement for that type.

As for whether you're proving the right thing or the wrong thing, a type system is no less helpful than a test suite. The advantage of a type system is that it checks whether your types are consistent within the entire program rather than in merely the specific test you're running.

> Then your proof would be rejected by the compiler.

Only for some types of mistake, surely. If that was always the case, we'd have a compiler that could read minds.

> As for whether you're proving the right thing or the wrong thing, a type system is no less helpful than a test suite.

Really? I can write down in my test suite, "assert (add 1 1) == 2". Anyone can come along and look at that and make sure it matches the spec. We can add additional tests for various bounds, and possibly use a quickcheck-like tool in addition, and for 99.99% of use cases be happy and confident that we're at least writing the right thing.

What's the type for that and does it actually have any resemblance to "the add function adds two numbers together", or do I have to read a couple of papers and have a background in university-level math to convince myself that the proof actually does what it says?

What type system do you have in mind? Haskell?

Haskell is a start but I was thinking of a system with dependent types such as Coq, Agda or Idris.

So does non-TDD. If you're fixing a bug, your tests should test for conditions that trigger that bug.

If you're modifying behavior, you will need to update the tests that break because of that, whether you write tests before or after.

> This is a clear indication of how TDD will slow you down.

This is a classic problem of externalities. You can only quantifiably judge what you are quantifiably measuring.

In this statement, you are only measuring the file couplings. I hope it's obvious that there are many other factors at play.

For example, let's look at a "typical" development iteration for a feature.


With TDD:

1. you write the automated test

2. run the test

3. if the test passes goto 6

4. edit the software

5. goto 2

6. finish


With manual testing:

1. edit the software

2. manually test the software

3. if the software does not do what you need it to do, goto 1

4. finish

TDD gives you a quick feedback loop, since the verification step is automated, at the cost of up front time spent on writing the test.

There's also automated regression testing, which is useful to prevent regressions when you change the software system.


When the software becomes complex:

* regression tests offer a quick feedback loop that scales almost linearly

* manual tests have a slower feedback loop and are often given to the QA staff, which involves communication overhead, scheduling, meetings, etc.
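The TDD loop above, written out in Python with unittest. This is a generic illustration; `celsius_to_fahrenheit` is a made-up example function, not code from the article:

```python
import unittest

# Step 1: write the automated test first. It fails ("red") until the
# function below exists and behaves as specified.
class TestCelsius(unittest.TestCase):
    def test_freezing_point(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32)

    def test_boiling_point(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212)

# Steps 4-5: edit the software and rerun the tests until they pass ("green").
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

if __name__ == "__main__":
    # exit=False so the test run doesn't terminate the interpreter.
    unittest.main(argv=["tdd_sketch"], exit=False)
```

Steps 2-5 of the loop are then just "rerun this file", which is where the quick, automated feedback comes from.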

Sure, it will slow you down compared to the idea of writing only the source file... assuming the source file is as well-structured and bug-free as it would have been by writing the test. In which case, why would anyone ever write any tests anywhere ever?

You'd probably see something similar when looking at a C/C++ project. I would expect significant coupling between header and source files.

The solution there was for future languages to combine the two. Maybe we'll see a future language combine unit tests and source into the same file.

Heh, maybe such a language would refuse to compile if public functions did not have an associated test.

No need to wait. Rust has basic unit tests inside the source files: doc.rust-lang.org/0.12.0/guide-testing.html

> refuse to compile if public functions did not have an associated test

Sounds like my worst nightmare.

Editing more files is not strictly a bad thing. Files are an organizational tool. This means they have cost (overhead), and they have payoff (structure). There is always cost, there is always payoff. The trick is to find the "best" point in the curve, and you cannot do that if you focus only on the payoff or only on the cost, as you have done here.

This is an amazing approach to making some hard-to-understand aspects of software more visible. This sort of improved information is how software process and tools evolve over time. I look forward to seeing where this project goes.

Am I the only one who disagrees with the premise that two files that tend to change a lot together indicate poor software design choices?

If you change an API, you will have to change consumers of the API. Does that mean that your code is bad?

This exists even in low-level examples: if you change a C++ class, you will need to also change the corresponding header file.

Or perhaps, am I misunderstanding the concept?

I think what's clearer is that bad coding is one reason why we often have to change too many files at the same time. We see software where implementation details surface in so many places that you have to change five files to change anything.

Now, it's affirming the consequent to say "you change two files, therefore you must have repeated yourself/coupled two things too much", but if you think those are common problems, then it will still be sound to say "you repeatedly changed these files in sync, you should look there to see if you've coupled them too tightly."

The OP does not really define "bad" software. I'm also not convinced of the concept of "good" vs "bad" software.

When reviewing code, we tend to judge how the final output looks based on aesthetics. There is almost no emphasis on the process of building the software, nor is there much emphasis on how long it takes and how reliable the software is.

While aesthetics and clarity are important, the notion of "good" or "bad" software depends on the context of the judgement. Is it good/bad for the programmer? Is it good/bad due to the costs of development? Is it good/bad based on its flexibility toward changing requirements? Is it good/bad based on the flaws in the deployed system? Is it good/bad based on the feature velocity?

Why is software productivity so difficult to measure? Software is complex & software is created in complex situations. It is tough to get an "apples to apples" comparison when comparing complex contexts. It's like comparing two people. Is one person better than another? Usually it depends on the context...

Ideally, with good encapsulation and individual files keeping to single responsibilities, you would more often get away with changing just the implementation and not the API. You could also take the size of the change into account, penalizing API changes in other files less than implementation changes in both files, and ignore header files if you didn't want to lump consumer changes into the same code smell. If the API has to change, you may have exposed too much implementation to the consumers.

Agreed; those kinds of linkages are not necessarily due to bad design but are, on the contrary, intrinsic to the design of the language you code in. Ideally I should consider these kinds of things and exclude such pairs (X.cpp <-> X.h) from the final results, but this is still pretty much a work in progress.

Good catch, though. Thanks.

The author noted there were "legitimate" linkages, such as files and their test files; header files would be another such case. Eliminating those (which should be moderately to trivially easy) will leave the files that really have no business knowing about each other's internals.

I played around with this a bit in June. This is what I managed to come up with in the time I was interested in it:


Cool! Thanks for sharing the link to your project. I am not really a Ruby expert, so I wasn't able to figure out how you calculate the couplings. I will be more than happy to compare the results across these two tools.

Perhaps off topic, but evFold's algorithm looks very close to estimating a Markov network. Can anyone comment on how their model differs from a Markov network, and how these differences arise?

For reference, given some variables, a Markov network is a parsimonious way to express arbitrary covariance matrices in terms of individual interactions between groups of variables (in this case, pairs of variables). Their approach looks very similar to estimating the Maximum Likelihood or MAP graph.
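For reference, the standard Gaussian graphical model recipe (which may be what the blog's partial-correlation output corresponds to; that is an assumption on my part) recovers the direct pairwise interactions from the inverse of the covariance matrix:

```latex
% Given a covariance matrix \Sigma over per-file change indicators,
% the precision matrix \Omega encodes direct (conditional) dependencies:
\Omega = \Sigma^{-1}, \qquad
\rho_{ij} = -\frac{\Omega_{ij}}{\sqrt{\Omega_{ii}\,\Omega_{jj}}}
```

Here \rho_{ij} is the partial correlation between files i and j given all the others, and the Markov network has an edge exactly where \Omega_{ij} is nonzero; thresholding \rho is a common way to sparsify the estimated graph.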

Are there undocumented flags that you're running this with? I wind up with a series of graphs too densely packed to make sense of when I run it on angular.

The PDFs that come out of igraph layout are not that pretty. The screenshots I used on the blog post are generated with Cytoscape 3.1.1, which gives better layout and better styling options. For Angular.js, I went with "partialCorrelations_0.3.sif".

Note to myself: put all source code into one file ;)

Hold on. I have an interface file "interface.d.ts" and a whole bunch of other files reference it. Whenever I make changes to any files that depend-on/reference that file I of course also make changes to that file. This means that every file in my project is coupled to that file. How is that indicative of good or bad design?

Tight coupling is generally considered bad practice; it leads to more accidental variance and complexity. In general, adding either polymorphism or additional methods to a class is considered safer. I'm not saying in your case it was the wrong choice, or that cleaning up design is bad.

Generally if you have to change a whole bunch of related files when you change one, it's an issue with the design.

Not sure you can know that without knowing the problem domain.

There's a tension there with once-and-only-once, otherwise known as Don't-Repeat-Yourself.

This is similar in aspects to http://google-engtools.blogspot.com/2011/12/bug-prediction-a..., which might be a fun set of ideas to link into future iterations.

I think Rails should rename has_many to couples_many

Logical coupling crops up in lots of unexpected places as well:

<%= partial :foo %>

Partials are functions but with no clear argument signature, so they may be used sloppily with no obvious way of determining what (interface, state expectations) they are coupled to.

There's some really interesting thinking here, but interpreting this naively would suggest that the perfect software project has only 1 file.

That's where the Cohesion metric comes into play for OO (and maybe other programming paradigms?). Aka: is each class a single cohesive unit, or do you effectively have two (or more) classes melded together with disjoint data and methods.

Something I think is particularly interesting about this approach is that it is language agnostic. It's probably even independent of "programming". It could also be useful for general documents: if I edit wiki page A and always edit B too, should they be the same page instead?

That is partially true: single-file apps will never show couplings in this manner. One extension to this project would be to find couplings between particular regions of a file, and then even single-file apps will start falling down. I think it is not true that if you cannot find any coupling using this tool, your software is well-designed; it would just mean the tool is not smart enough to capture those bad designs.

Couplings between different regions of files, however, are relatively harder to find and require some more thinking in terms of implementation.

That's pretty cool. There are a lot of tools, both public and private (kept as trade secrets at companies), that do static analysis of code to pinpoint potential errors, memory leaks, etc. This is the first analysis I've seen that actually looks at more than one revision of a project in source control, though; every other tool basically just analyzes one revision at a time.

Very interesting. Would like to see based on function/class/module instead of file though.

That's brilliant. Reminds me of how we did DNA sequencing back in the day: genetic linkage analysis.

The first two times I read this headline I thought it was about fruit fly evolution.

I know, right? I was really happy when I learned that this branch of CS is also referred to as evolutionary, because I was inspired by the evFold approach, which is related to evolution in multiple organisms. This is a bit confusing for people coming from the biological sciences, but it's also nice that we share some terminology between the two fields ;)

Use this on the linux source code

I'm running it now on a repository with 15 years of history, and ~15K commits. Let's see how it goes. :)

One thing I noticed is that the scripts are written to run once and always do everything. It would be better to have a Makefile and dependencies, so that you can run it multiple times and only the changes are updated.

I'll see if I can push some fixes to github.

Let me know if you find points that can be improved; I would be more than glad to pull your changes into the main repository.

Thanks for trying this out.

Thanks, just thinking about coupling made me consider some new design improvements in my code.

