I love the term 'ugly hack'. It's a very short way of saying 'a solution to a problem that assumes too much about the current context in which this component is called, and is guaranteed to break when the context changes'. It also implies 'this is intended to be temporary because I don't feel like doing the proper solution right now', which comes with the intent 'we should fix this later', although most people who write it also know that this almost never happens.
When I write it, it usually means I know it sucks, but I'm reasonably happy with it. If someone catches me out on it, that's OK, because I already criticised it myself.
It's actually too general. What I mean by 'component' here can be anything from a line of code to a class hierarchy.
In a sense, using the word 'component' here was an ugly hack. I should have gone for the proper solution and defined it properly. You can see now that it breaks down when the interpretation changes.
I guess I understand. I objected because I felt you were assuming too much of some context in your definition of “ugly hack” :) I can't imagine using the term “ugly hack” to describe some unfortunate high-level architectural decision.
Sometimes, instead of "I don't feel like doing the proper solution," it's "I haven't yet figured out the proper solution," or "A cleaner solution means adding 250 lines of code to address problems we may not have."
Some results normalized by overall language popularity. Specifically, the entry in row R and column C is 1000 * (hits for C in language R) / (hits for "the" in language R) or "---" if either the numerator or denominator was small enough not to make the top-10 list on github.
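The normalization is a simple ratio, so it is easy to reproduce. Here is a minimal Python sketch, assuming the hit counts have already been collected by hand; the numbers below are made-up placeholders, not the scraped data, and `None` stands in for a count that didn't make the top-10 list.

```python
from typing import Optional


def normalized_score(term_hits: Optional[int], the_hits: Optional[int]) -> Optional[float]:
    """1000 * (hits for the term) / (hits for "the"), or None if either count is missing."""
    if term_hits is None or the_hits is None or the_hits == 0:
        return None
    return 1000 * term_hits / the_hits


def format_cell(score: Optional[float]) -> str:
    """Render a table cell, using "---" for unavailable entries as in the table."""
    return "---" if score is None else f"{score:.2f}"


# Hypothetical counts for illustration only, NOT real GitHub search results.
hits = {
    "Python": {"the": 200_000, "ugly hack": 300},
    "XML": {"the": 50_000, "ugly hack": None},  # term not in the top-10 list
}

for lang, counts in hits.items():
    score = normalized_score(counts.get("ugly hack"), counts.get("the"))
    print(lang, format_cell(score))
```

With these placeholder counts the Python cell comes out as 1.50 and the XML cell as "---".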
All the scraping was done by hand and the numbers rounded to a limited number of places in the process, so there may very well be mistakes.
[EDIT: oops, initially I failed to paste in the actual data.]
Tentative conclusions: Python is ugly-hack-iest and (almost exactly tied with C) ugliest; HTML is most beautiful with Python a close second, XML is lolliest, C++ is WTFiest, and C is buggiest.
Tentative meta-conclusion: these numbers have no value beyond idle amusement. But they idly amused me, so that's OK.
(The weirdest result of the lot, to me, is XML coming top for "lol". If you do the search and click on "XML" on the left you'll see why it is. Lots of instances of what I think are the same file, full of "&lol;" entities. LOL, that's pretty ugly. WTF? An ugly hack, I guess.)
I read your table and immediately jumped to a different conclusion: That python programmers are more sensitive to ugly hacks and more likely to call them out. Not saying I'm right, but I don't know that the data can distinguish the hypotheses.
My intuition tells me that if your alternative hypothesis were true, then PHP programmers have higher standards because they judge more code 'ugly', and C++ programmers are universally perfectionists... or at least when they are not confused, which it appears they usually are.
Just for fun with the hypothesizing: does Python's near 50/50 split between 'ugly' and 'beautiful' suggest a large degree of random use? Or, more interestingly, does it suggest a tendency to classify middle cases as extreme cases, and is this a result of the community having 'a Pythonic Way'?
I immediately felt the same and left a comment. But what would distinguish it is if "Go" had a high proportion of such comments, where "Perl" would have a low one.
Could .py have the greatest number of "ugly hacks" because the community's standards for explicit, "beautiful" designs are higher? This would be shown if languages like Go have a much higher prevalence of "ugly hack" than a language like Perl. (Where even core language features, ahem, you can complete the thought.)
If you actually look at the details, these terms aren't used in the same context as we might imagine. If there was a way to filter out useless projects, this would get interesting.
As you said, big grains of salt... It would probably be more helpful if they were normalized by the percentage of overall GitHub usage each language accounts for.
I'd be curious to see if there's a relationship between 'ugly hack' in code commits and the languages being used (though it'd probably say more about the programmer than the language). The bar chart on the left of the page hints that it's possible, but would have to be normalised.
My hypothesis is there'd be no real difference, but it would be fun to explore nonetheless.
I would think the left bar would remain accurate. C lets you shoot yourself in the foot, and people do. PHP is terrible for a lot of well-documented reasons. Javascript is being used for things its creators never envisioned.
The hyped dynamic languages sort of surprise me, given their modern APIs. I'm guessing most of that is poor framework programming.
Finally, Java's numbers? lol. It's a brave new world. If this were 8 years ago, in the EJB v1.2 era, Java would have taken the crown.
Look for dirty hacks instead; still... C is way ahead of the rest.
Interesting that "ugly" places PHP over Javascript, but "dirty" is the other way around.
I wonder what it means. The comments and the "ugly" or "dirty" qualifiers are just part of the perception of the person writing the comment, and they help spread that perception to others.
What qualifies as "ugly", "dirty" and "clever", or even as "hack", may be quite different depending on the language and the community behind it.
There is a bias because there is more C code. The number of repositories doesn't equal the quantity of code, and if you search for general terms that are used frequently in both languages, there are always more C results, for example:
Fair enough. If there is a bias towards C I would guess it was more due to the situations in which C gets used, than C itself. Talking to a friend who has to try and get the same C/C++ codebase to compile for 6 different platforms (consoles) has been an eye opening experience. That kind of environment is like a slightly warm petri dish for ugly hacks.
I was not surprised to see TeX (even though it is not a popular language on GitHub). It seems that one has to resort to (ugly) hacks when dealing with it.
It usually means that something out of your control is behaving strangely and you don't know how to fix it properly, or can't, but you have found a weird workaround that probably shouldn't even work, or a very contrived way to achieve the result.
As one obviously only writes extremely elegant code, it is important to notify the programmer coming after that you were forced to write it this way.
All of this is expressed in just two words :)