(0) If you need to know or care about its implementation details when you use it, it's not an abstraction.
(1) If unrelated ad hoc cases are hardcoded together into a single procedure, it's not an abstraction.
(2) If it needs fifteen tunable parameters, some combinations of which don't even make sense, it's not an abstraction.
At some point in time, programmers discovered that selection (if, switch, pattern matching, whatever) could be replaced with dynamically dispatched first-class procedure calls. Since then, they haven't stopped making their code more “abstract” by making their control flow impossible to follow. Alas, this isn't real abstraction, because it doesn't achieve the intended purpose of reducing the number of things you have to keep in your head simultaneously.
So, rather than blaming “wrong abstractions”, I'd blame “not real abstractions”.
Sometimes, too great.
(0) The usual symbol for complex number addition is +.
(1) The operation isn't associated to any individual complex number (i.e., it's not an object method), but is rather an intrinsic part of the algebraic structure of the complex numbers.
(2) Complex numbers don't have a physical object identity in memory.
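For comparison, a language with operator overloading and structural equality can get most of the way there. A minimal Python sketch (a hypothetical `Complex` class; Python's built-in `complex` also qualifies). Point (1) is only approximated, since `__add__` is formally a method, though the `a + b` syntax treats both operands symmetrically:

```python
class Complex:
    """A toy value type; Python's built-in `complex` also fits the bill."""

    def __init__(self, re, im):
        self.re, self.im = re, im

    # (0) Addition is spelled '+', via operator overloading.
    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    # (2) Equality is structural: two equal numbers are indistinguishable,
    # regardless of their identity in memory.
    def __eq__(self, other):
        return (self.re, self.im) == (other.re, other.im)

assert Complex(1, 2) + Complex(3, 4) == Complex(4, 6)
```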
C++ templates allow for almost anything. I have seen incredibly strange things done to the language with templates. Since templates can be used to parse and execute entirely different programming languages, it is hard to imagine a level of abuse to which C++ templates are not amenable.
That doesn't mean anything. With enough determination, anything can be turned into “something very unrecognizable”. Heck, sometimes the results are even relatively pleasant: http://libcello.org/. But if a language doesn't allow you to define basic abstractions like complex numbers (not to mention slightly more elaborate ones, like queues that are indistinguishable from one another if they contain the same elements in the same order), then that obviously reflects very poorly on its abstraction facilities.
> C++ templates allow for almost anything. I have seen incredibly strange things done to the language with templates.
What you can't do in C++, however, is make your abstractions not leak. Every template library is one template specialization away from being broken. This is unlike languages that enforce their abstractions by more robust means, like parametricity and macro hygiene.
This needs to be given a catchy phrasing and spread all across developer communities!
WTD == Worse Than Duplication?
WAAW == Wrong Abstractions Are Worse?
Duplication is still bad, and correct abstractions are still good.
For instance, one might say: "The way x is implemented works, but it's not quite the right abstraction, here's why". That is way more informative than saying it's simply the wrong abstraction or considering it good enough and ignoring its deficiencies.
I keep having to simplify the code and it’s exhausting.
At the same time, let's not take this dogma too far in the other direction. Repeating oneself is still undesirable, but DRY-related refactoring should focus on behavior and intent, not on how the code looks. And abstractions are still good (they're how we get anything done at all!), but leaky abstractions should be avoided, indirection-via-abstraction should be minimized, and we should be vigilant against extreme, overeager application of abstraction (insert your blub joke here).
* A and B both depend on the same logic X.
* You make a shared helper function X.
* Requirements for A change, necessitating a change to X.
* You check if B requires the same change. If yes, change X. If no, fork X into X_A and X_B.
Now you're staying DRY, and DRY has saved the day by forcing you to think about whether A's change should impact B.
If you had prematurely forked X, then X_A and X_B would drift without any sanity check on whether they should.
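The fork in the last step might look like this, with hypothetical names (`validate_order` standing in for X):

```python
# Before: A and B share one helper (the "X" above).
def validate_order(order):
    return order["total"] > 0 and order["items"]

# A's requirements change and B's do not, so X is forked:

def validate_order_a(order):
    # X_A: A now also requires a customer id.
    return order["total"] > 0 and order["items"] and "customer_id" in order

def validate_order_b(order):
    # X_B: unchanged behaviour, kept for B.
    return order["total"] > 0 and order["items"]
```

The fork happens only once the deliberate check says A and B should diverge, which is the sanity check the parent comment describes.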
It also needs to take into account factors like available budget of time and money, capabilities of the team working on the code, etc.
The number of bugs is proportional to the number of lines of code; this is undeniable from empirical data. Ergo, fewer lines of code will tend to yield fewer bugs. So if your code is literally the same, there's no reason not to extract it into a function.
That said, the moment you have to start adding parameters in order to successfully factor out common code, ie. to select the correct code path taken depending on caller context, that's when you should seriously question whether the code should actually be shared between these two callers. More than likely in this case, only the common code path between two callers should be shared.
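A sketch of that smell, with hypothetical names and stubbed-out helpers:

```python
# Hypothetical stubs so the sketch is self-contained.
AUDIT, DB = [], []

def normalize(record):
    return dict(record, normalized=True)

def append_to_audit_log(r):
    AUDIT.append(r)

def write_to_db(r):
    DB.append(r)

# The smell: a parameter whose only job is to select the caller's path.
def save(record, is_audit_log):
    normalized = normalize(record)       # genuinely common code
    if is_audit_log:
        append_to_audit_log(normalized)  # caller-specific
    else:
        write_to_db(normalized)          # caller-specific

# Sharing only the truly common path instead:
def save_audit(record):
    append_to_audit_log(normalize(record))

def save_db(record):
    write_to_db(normalize(record))
```

Here only `normalize` is actually shared; the flag parameter in `save` was just two functions wearing one trenchcoat.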
If the two pieces of code are likely to change in different ways, for different reasons, that is a strong reason not to extract it into a function even if they happen to be character-for-character identical for the moment.
Less code means fewer bugs, but that doesn't mean I should be working on the gzipped representation.
The future is much more malleable than your immediate needs.
But even if your predicted future turns out to be true, the code you need to refactor is then already extracted to a function, so you can easily duplicate that function, make your localized changes, and change the callers to the new function. So this is still the best route.
> Less code means fewer bugs, but that doesn't mean I should be working on the gzipped representation.
Don't be absurd. gzipping doesn't preserve your program in human-readable form. Extracting code into a reusable function makes your program more human readable, not less.
But sometimes it can help readability, too. "DRY" as a principle was originally formulated in terms of repetition of pieces of knowledge rather than code, and I think in those terms it's far more useful. If this code represents "how we frob the widget" and that code represents "how we tweak the sprocket" and there's no reason for those to agree, they should probably be separate functions. Pulling them out into a "tweaking_sprockets_or_frobbing_widgets" function is making things less readable, because it's conflating things that shouldn't be conflated. If there is not some underlying piece of knowledge - some statement about the domain or some coherent abstraction that simplifies reasoning or some necessary feature of the implementation - combining superficially similar things is just "Huffman coding".
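To make the point concrete, a toy Python sketch (illustrative names only):

```python
from types import SimpleNamespace

# Conflated: one function for two unrelated pieces of knowledge, kept
# together only because the bodies currently happen to look alike.
def adjust(thing, amount):
    thing.level += amount  # is this frobbing or tweaking? both, for now

# Separate: each function owns one piece of domain knowledge and is
# free to change for its own reasons later.
def frob_widget(widget, amount):
    widget.level += amount

def tweak_sprocket(sprocket, amount):
    sprocket.level += amount
```

The bodies are identical today, but "how we frob" and "how we tweak" are different pieces of knowledge, so the second form is the more honest one.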
When done properly, yes. When done to the point where a five line function is created with ten inputs (yes, this is real), no. But DRY tells us that the five lines of duplication is unconditionally worse.
Hell, I've even seen things like logging/write tuples (i.e. log the error, write it to a socket) encapsulated, even though the only non-parameter code ends up being the two function calls.
Anything, taken to extremes is bad. The problem with DRY is it encourages that extremism.
I agree that's often how DRY is understood, and that it can be a problem.
It is not how DRY was originally formulated, which was "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." This differs from blind squashing of syntactic repetition in two important ways. First, as under discussion here, if things happen to be the same but mean different things, combining them is not "DRY-er". Second, there can be repetition of knowledge without repetition of code. For instance, if we are telling our HTML "there is a button here", and our JS "there is a button here", and our CSS "there is a button here", we're repeating the same piece of knowledge three times even though the syntax looks nothing alike.
I make no claim as to whether the flawed, more common understanding or the original intent is what "DRY really means", but I think the latter is more useful.
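The button example can be made concrete: keep the knowledge in one authoritative place and generate the three syntactically different repetitions from it. A toy Python sketch (hypothetical data):

```python
# One authoritative representation of the knowledge "there is a Save button".
BUTTONS = [{"id": "save", "label": "Save"}]

# The three syntactically unrelated repetitions are derived, not hand-written.
html = "".join(f'<button id="{b["id"]}">{b["label"]}</button>' for b in BUTTONS)
js = "".join(f'document.getElementById("{b["id"]}").onclick = save;\n' for b in BUTTONS)
css = "".join(f'#{b["id"]} {{ padding: 4px; }}\n' for b in BUTTONS)
```

The point isn't this particular generator; it's that the knowledge now has a single representation, even though no two of the outputs share a line of code.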
DRY as a guiding principle sometimes has a secondary beneficial effect that was not discussed. Two pieces of code that happen to be the same but "mean different things" should not automatically be deduplicated by dumb extraction. However, the fact that those two things share code may, when viewed through the lens of "prioritize-DRY-ness", hint that the two share a common underlying goal, which can be abstracted out into functionality that can be used by both.
Put another way: if the code to control a nuclear reactor circuit and the code to turn on a landing light on a plane happen to be the exact same, they shouldn't be blindly deduplicated into some library function, but the fact that they're the same may indicate a need for a more accessible, easily-usable-without-mistakes way of turning that kind of circuit on and off.
I'm not convinced by your example. There are plenty of mathematical calculations taking numerous parameters that I think should be in a distinct function.
Even for non-mathematical calculations, 5 lines of code that are used repeatedly as some sort of standard pattern in your program should also get factored out. Like your logging example, ie. you consistently log everything in your program the same way, then sure, refactor that into a shared function. Then if you suddenly find you need to log more or less information, you can update it in one place.
Of course, I understand your meaning that sometimes factoring out doesn't make sense, but if you find repetition more than twice as per DRY, refactoring seems appropriate.
In web frameworks, there is usually a little bit of boilerplate for each view.
You could refactor this completely away, but not without an almost total loss of flexibility and a good amount of readability too. Often the views will look very similar, then start diverging as the project grows.
With you on refactoring common patterns out, and yes, some people don't do this enough. But really, the important thing there is that those patterns are truly common to a large degree and should stay in sync - so it's worth it to introduce and maintain a new concept to keep them that way.
I think I've been pretty clear about the costs and when this is worth it, particularly in my first post in this subthread, which I'll quote here:
> That said, the moment you have to start adding parameters in order to successfully factor out common code, ie. to select the correct code path taken depending on caller context, that's when you should seriously question whether the code should actually be shared between these two callers. More than likely in this case, only the common code path between two callers should be shared.
Or if you want a more concise soundbite: refactor if your indirection is actually a clear and coherent abstraction.
I work with people I would describe as... junior at best (lots of boot campers) and I see this all the time. Functions that just return an anonymous function for no reason, half JSON blobs returned from functions that are called with one string instead of just repeating the blob in the code, etc.
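The first antipattern mentioned looks something like this (hypothetical names):

```python
# Needless indirection: a function whose only job is to return an
# anonymous function, adding a call layer with no behaviour of its own.
def make_greeter():
    return lambda name: f"hello, {name}"

greet = make_greeter()

# Equivalent, without the indirection:
def greet_direct(name):
    return f"hello, {name}"
```

A factory like this earns its keep only if it closes over some state or configuration; with nothing captured, it's pure ceremony.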
That’s a popular claim. I wonder how many failed projects could have it as their epitaph.
Have you ever worked on a project where the requirements changed so fundamentally from one day to the next that you truly, honestly had no idea where you were going next?
I haven’t. I’m not aware that I’ve ever met anyone else who has, either.
The claim that requirements always, or even usually, change so dramatically within such short timescales that it isn’t worth laying any groundwork a little way ahead simply doesn’t stand up to scrutiny, in my experience. Any project that was so unclear about its direction from one day to the next would have far bigger problems than how the code was designed.
Otherwise, there is always a risk that by being too literal, by ignoring all of your expectations about future development regardless of your confidence in them, you climb the mountain by climbing to one small peak, then down again and up the next slightly higher peak, and so on. This could be incredibly wasteful.
Of course requirements often change on real world projects. Of course I’m not advocating coding against some vaguely defined and mostly hypothetical future requirement five years in advance. But often you will have some sense of which requirements are going to be stable enough over the next day or week or month to base assumptions on them, and insisting on ignoring that information for dogmatic reasons just seems like a drain on your whole development process.
Way fewer than the over-ambitious projects that died because of things they didn't need, immortalized in lots of classic comp-sci literature, from Fred Brooks' books to Dreaming in Code.
There's a reason it's a popular claim. "Popular" by itself just means it's repeated by many -- but this claim (or an analogous one, e.g. the KISS principle, "Do the simplest thing that could possibly work", etc.) is one you can read from the most experienced and revered programmers, from the Bell Labs guys to the most celebrated programmers today.
>Have you ever worked on a project where the requirements changed so fundamentally from one day to the next that you truly, honestly had no idea where you were going next? I haven’t. I’m not aware that I’ve ever met anyone else who has, either.
Welcome to my life :-)
Not being snarky -- rapidly changing requirements is the number one complaint in my kind of work.
Firstly, you only actually create a function either when it is being reused, or because its functionality is a logically separable responsibility and so you factor it out for understandability.
Either way, the function should also have a meaningful name describing its purpose so you don't have to jump around to understand what's actually happening.
In practice I'm not really sure what you mean?
A good rule of thumb on large scale software projects is complexity begets complexity.
It then takes a lot of effort to return to simple code.
> Extracting code into a reusable function makes your program more human readable, not less.
Is that a given?
I've come across plenty of small functions that I couldn't understand without checking the calling functions for context.
Meaning, your future needs are ever changing and often unclear. Your present needs are immediate and usually obvious. Meet your present needs first and foremost without sacrificing flexibility to meet future needs. Factoring code into functions accomplishes this.
> I've come across plenty of small functions that I couldn't understand without checking the calling functions for context.
Sure, happens to me too when I don't assign meaningful names, or the functions don't actually encompass a small, meaningful set of responsibilities, or the functions use deep side-effects that require reasoning within larger contexts.
The problem with such programs isn't factoring into functions though. If anything, this step reveals latent structural problems.
That's a false analogy.
gzip is non-syntax-preserving
Do you have a link to this undeniable data? I haven't seen many empirical code quality studies that are not littered with possible confounders. It's very difficult to do these studies.
I do believe that bug density is roughly proportional to the size of the codebase if you average over large corpora of code. But the bug density of different types of code within those corpora varies a great deal in my experience.
So I think the important question is what sort of code is duplicated and why it is duplicated. Removing duplication means creating dependencies. If we create the right dependencies, i.e. the ones that enforce important invariants, that's a good thing. But that is a big if.
In https://vimeo.com/9270320, at around 37:00, he covers:
El Emam et al (2001): The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics
For a summary, sure. There have been loads of studies on various metrics, but none have actually been any better than simple lines of code, despite the fact that it has such a large variance as a metric.
> Removing duplication means creating dependencies. If we create the right dependencies, i.e. the ones that enforce important invariants, that's a good thing. But that is a big if.
While enforcing invariants would certainly be good, I'm not convinced that's the only reason DRY reduces bugs. Common functions get manual reviews every time the code that calls them gets reviewed and/or refactored, whether due to new features or bugfixes.
DRY increases exposure of more commonly used paths through your program.
True, but that doesn't mean SLOC is a very useful metric. Say you were to rewrite a large Java codebase in a language that eliminates all getters and setters. You will have greatly reduced the number of SLOC, but it is very unlikely that you will have reduced the number of bugs very much.
In other words, going for the low hanging fruit of programming language design wouldn't necessarily help much. Bugs are not evenly spread out over the entire codebase.
I agree strongly with your point about potential confounders (and was about to make it myself) but now you are making your own assertion that I'm uncertain about. Why are you confident that switching to a language that greatly reduces the number of lines of code would not reduce the number of bugs?
While I don't know of any hard evidence, it passes my internal plausibility test that if some of the "2 screen" functions become "1 screen" functions, bugs might be less likely. There might be some counter-force that would confound this, but I wouldn't eliminate it out of hand. So what makes you say "very unlikely" rather than "not necessarily true"?
While I agree that the bugs are not likely to be in the trivial code, I don't think it's a given that presence of the trivial code has no impact on the number of bugs elsewhere. Consider a "fatigue" based model, where the human brain is distracted by the monotony of the bug-free getters and setters and thus unable to pay sufficient attention to the logic bugs elsewhere in the program. And again, I'm not making that claim that eliminating boilerplate reduces bugs, only objecting to the assumption that it does not.
I think if our process is "1) write the software in Java, 2) remove those lines", it's clear that we've probably changed the average bug density of the project. I agree that there is much reason for concern in generalizing that result to what would have happened if we'd written in that other language to begin with.
>While I don't know of any hard evidence, it passes my internal plausibility test that if some of the "2 screen" functions become "1 screen" function, bugs might be less likely.
Yes, mine too, but only for randomly chosen pieces of code. What I don't believe is that the linear correlation between SLOC and bugs that studies have found in large codebases allows us to pick and choose the lines of code that are easy to eliminate and expect the number of bugs to drop proportionately.
I do not think that this analysis can be applied to decisions about whether to duplicate code or not.
First, if your code is the same, you're not going to have two different bugs in the two copies of it.
Second, trying to change the number of lines of code in your project without making deeper changes is the sort of thing that very directly confounds the analysis. It's like going from "Smaller companies have happier employees" to "So we should fire half our employees because empirical data shows the rest will instantly become happier."
No, but you might see two different buggy behaviours due to contextual differences in how the code is used.
> Second, trying to change the number of lines of code in your project without making deeper changes is the sort of thing that very directly confounds the analysis.
Except refactoring into smaller reusable functions is precisely a deep structural change.
If there are actual changes involved in refactoring, I have no a priori expectation of whether that reduces or increases bug count, and I can see good arguments that it's likely to increase them, since you're making the code more complex in order to satisfy the demands of multiple consumers and therefore exposing each consumer's unique complexity to the other as bug surface. (Case in point: Heartbleed resulted entirely from a little-used extension to DTLS, a variant of TLS over UDP, which 99+% of OpenSSL users never cared about.)
Firstly, any refactoring involves a code review of what you're factoring out. This has a non-zero probability of revealing bugs, so I already disagree with your claim that it wouldn't change the bug count.
Secondly, if you're having difficulty refactoring, that's a strong hint at deeper structural problems, so it yields information on what kinds of structural changes are needed.
> If there are actual changes involved in refactoring, I have no a priori expectation of whether that reduces or increases bug count, and I can see good arguments that it's likely to increase them, since you're making the code more complex in order to satisfy the demands of multiple consumers.
Are you making it more complex? Because that doesn't seem like a sound refactoring in my mind. Special cases require special code; you don't place special cases in a general function, unless the function itself only handles the special case. I already covered this in my original post where I discuss when DRY isn't appropriate.
> (Case in point: Heartbleed resulted entirely from a little-used extension to DTLS, a variant of TLS over UDP, which 99+% of OpenSSL users never cared about.)
I don't see how this is a point in your favour. It's a point that little used and little inspected paths are more likely to be vulnerable. But a reused function gets more use and more review than inline code. In other words, Heartbleed would probably still not have been found if it weren't part of a common function, and instead were littered in various places throughout a code base.
The first part is obvious (more code of course brings the potential for more bugs).
The deduction (hence more code = more bugs) is only valid if the studies controlled for the similarity of the code, cyclomatic complexity and other such factors.
Just because more code correlates with the number of bugs doesn't mean more code by itself means more bugs. Correlation != causation.
Program A having more code than program B could mean e.g.:
(1) A is inherently more complex (and really needs more lines, the same way an IDE needs more lines than a simple text editor), and thus will naturally be more prone to bugs.
(2) A and B have similar functionality as programs, but B is written by people who meander and write bloated code and needless abstractions (turning a 100 line program into a 1000 line hell of "design patterns" and factorySingletonProxy "flexibility"). Which will again bring in more bugs.
But that's not necessarily the case if A is bigger than B due to simple repetition that doesn't introduce complexity. Which is exactly what we're discussing in this thread.
For a trivial example, 10,000 lines of "print 'hello world'" repeated won't have more bugs than a 1,000 line complex C program.
So the only possible causations for the correlation we're discussing are:
1. more program code causes more bugs
2. more bugs causes more program code
3. some unknown third factor(s) simultaneously causes both more bugs and more program code
I think 1 and 3 are most often the case, where 3 could be something like developer inexperience, although some studies have shown that even experienced developers still introduce bugs at comparable rates to novices (just lower constant factors). I think 2 sometimes happens to address immediate needs, ie. hotfix for specific bug X may introduce more bugs, but I doubt it's the rule.
Regardless, my original claim still seems pretty undeniable, ie. more program code tends to yield more bugs.
> For a trivial example, 10,000 lines of "print 'hello world'" repeated won't have more bugs than a 1,000 line complex C program.
But 10,000 lines of "print 'hello wrld'" would have more bugs than a 1,000 line complex C program. Probably on the order of 9,000 more bugs in fact.
The numbers we're talking about here are averages across all programs of comparable length, not to be applied literally to any specific program, because it turns out that those specific program qualities don't really matter, ie. LOC is still a more accurate predictor of bug count than cyclomatic complexity and other metrics.
Thus I can say that a 1,000 line program probably has about X bugs, and I probably won't be off by an order of magnitude unless the program was verified by a theorem prover or something along those lines. Something like verification is really the only confounder that I've come across.
Code sharing often increases coupling, and the coupling can be at odds with naive attempts at achieving DRY. The real world is not as simple as this makes it out to be.
You are right that more code means more bugs, but more code might also be the difference between a viable product and one that hasn't been built yet because everybody is locked into DRY bureaucratic hell. As in everything, there's a balancing act that needs to be performed, and that requires judgement and experience.
To a point. Abstraction also creates indirection, which can make debugging harder, especially for people unfamiliar with your code.
Including unit tests? A code base with unit tests is likely to contain fewer bugs and more lines of code. And what about code golf? This just does not seem right, and I believe the original saying is related to languages with/without rich stdlibs and/or the NIH syndrome.
Except you've added a dependency that connects two pieces of code that were independent before.
Sometimes duplication is worth it simply because it reduces the impact of changes.
That's one of the big problems with code duplication. "You fixed a bug? Great. Did you fix it everywhere?" It's better if there's only one place to fix it, because when you've fixed it, you've fixed all of it.
But as others have said, eliminating this is not the only good thing in programming. There are limits to how far you should go to prevent or eliminate duplication.
You've got one aspect, and the other aspect is that code factored into a reused function F is now reviewed more too, ie. whenever you're reviewing the callers, you work through the functions they call as well to trace the behaviour. So you're also much more likely to find any bugs in F.
It's a double-whammy for bug squashing, which is why DRY is such an important principle. Certainly there are some misapplications, but its benefits hold up pretty well in a wide variety of scenarios.
Isn't that how we ended up with left-pad.js?
SPOT more accurately identifies the pain point of duplicated code - if there is an algorithm or configuration value that needs to be specified, then that should just be in a single place, everything consistent and easy to change.
> The DRY principle is stated as "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system".
How is SPOT different?
If the answer is an easy yes, then build your abstraction. If it's no, don't. If it's maybe, leave the duplication in place until you have a clearer understanding.
Disclaimer: I'm not making a value statement here :-)
Maybe some of our processes, language and experience from the version control space (branch, merge, rebase, upstream...) could be ported over to the reuse vs duplication problem. But when you layer that with the conventional use of those concepts in version control, the result would be quite the multidimensional mindbender. With that as a hypothetical baseline, simply guessing a good compromise between abstraction and duplication and then dealing with the consequences suddenly does not seem that bad a fate.
If you need to modify a small part of a class's function, there's typically no way in a prototype/traditional inheritance language, to say "like the function from the parent class, but with these 5 lines different".
Inheritance of any kind is a handy tool to help with the "whole class files full of pages of identical code with one different method" problem, but a) that's not the cause of a lot of duplication out there, b) you still have to know/care to implement the inheritance, and c) it can come with its own struggles, a la tight coupling to implementations/classes you may not fully control. This isn't a pro-OO or anti-OO screed; just pointing out that inheritance as a means to DRY up code is a limited solution at best.
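One partial workaround for the "same function, five lines different" limitation mentioned above is the template method pattern: the parent splits the big method into overridable steps, so a subclass replaces only the step that differs. A sketch with hypothetical names:

```python
class Report:
    # The template method: fixed skeleton, overridable steps.
    def render(self):
        return self.header() + "body\n" + self.footer()

    def header(self):
        return "=== report ===\n"

    def footer(self):
        return "=== end ===\n"

class CsvReport(Report):
    # Only the step that differs is overridden; render() is inherited.
    def header(self):
        return "col1,col2\n"
```

Of course, this only works if the parent anticipated the seam, which is exactly the tight-coupling struggle described above.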
I wonder if the man hours wasted on hunting through someone else's code to figure out what the bajillion abstracted functions are doing when you are trying to make a change outweigh the minor inconvenience of copying and pasting changes. Yes, the latter introduces more opportunities for errors, but are they more than knowing all your dependencies and ensuring your changes don't break their expectations? Unit tests are only as useful as the man who knows the future.
Good point. The problem with that is, of course, that you may not be aware of others existing.
> There is no silver bullet...
I sometimes notice but just can't stop it. The next step is usually a desperate attempt at painting that pig with documentation.
Regarding the last point: iterators, good abstraction; averaging function, good abstraction. These things never change. But bad abstractions copy things that are only similar in appearance.
"Minimize both duplication and bad abstractions" should be something everybody can get behind, even if only because it leaves all the details open.
Leaving all the details open means the slogan is content-free. Why would anyone get behind it?
On the other hand, maybe it also points to problems with our abstraction mechanisms?
I think Rust did two things right in that regard: firstly the macro invocations are always suffixed with '!' so you know it's not a regular function call right away. Secondly Rust macros are so quirky, ugly and painful to implement that you only ever use them as a last resort, so people tend not to abuse them too much.
And that, ladies and gentlemen, is called a framework.
Macros or no macros, if the problem domain is large and you stay on it a long time, you end up with a framework. Macros can make that framework easier, not harder.
In most languages you end up with large configuration files and a huge number of strings as parameters. The program semantics might be defined by interpreting JSON structures, template languages, and an ungodly amount of string parameters that are in effect reserved words. If you persist long enough, you have code generators and state machines, and you define execution semantics with petri nets.
What you want to do is use the programming language that is best suited to solving the problems you have. Growing the language towards the problem you have is a great opportunity to make the programming interface simpler to understand.
With Lisp, especially with Common Lisp it's possible to grow the language so that it stays familiar as much as possible.
>In most languages you end up with large configuration files and huge number of strings as parameter.
Not in the languages I'm familiar with at least. I don't understand what configuration files and macros have to do with each other.
>Growing the language towards the problem you have is great opportunity to make the programming interface simpler to understand.
Simpler to use, not to understand. You know what it does, you don't know how it does it or how it can be extended because it doesn't play by the language's usual rules. It's nice if you're copy/pasting stack overflow snippets or your use case is very standard, it sucks if you want to modify the code to do something a bit different and out of the box because now you have to learn the rules of this custom DSL.
>With Lisp, especially with Common Lisp it's possible to grow the language so that it stays familiar as much as possible.
So adding custom constructs to the language makes it stay familiar? That doesn't make a lot of sense to me.
There are rules and conventions for writing macros. If you follow them, their usage will be clear for anyone who knows the language - because the language itself includes a lot of macros and they all follow the same set of rules. Common Lisp, Clojure and Elixir are very good examples of this (with Racket taking it up to 11).
Of course, you can write macros which work and behave in unexpected, weird ways. In practice, though, you don't - why the heck would you?
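To make the conventions concrete, here is a minimal sketch of a "well-behaved" macro in Common Lisp (the name WITH-TIMING is hypothetical, just for illustration): body forms are evaluated exactly once, in order, and the internal temporary variable is a gensym so it cannot capture anything in the caller's code.

```lisp
;; A hypothetical WITH-TIMING macro following the usual conventions:
;; the body is evaluated exactly once, and the variable holding the
;; start time is a gensym, so it cannot capture or shadow any
;; variable in the caller's code.
(defmacro with-timing (&body body)
  (let ((start (gensym "START")))
    `(let ((,start (get-internal-real-time)))
       (multiple-value-prog1
           (progn ,@body)
         (format t "~&took ~a ticks~%"
                 (- (get-internal-real-time) ,start))))))

;; Usage: behaves like PROGN plus a timing side effect, e.g.
;; (with-timing (sleep 0.1) 42) prints the elapsed ticks, returns 42.
```

Anyone who knows the language can guess from the WITH- prefix roughly how it behaves, which is exactly the point about conventions.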
... which often includes yourself a year later.
Seen in this light, the kind of DSL you can create with a full-blown macro system isn't all that different.
Programming, in large part, is an exercise in language creation.
> It's nice if you're copy/pasting stack overflow snippets or your use case is very standard, it sucks if you want to modify the code to do something a bit different and out of the box because now you have to learn the rules of this custom DSL.
This basically describes Spring, and I'm not just being snarky. To someone from the outside, Spring's annotations seem to have all the problems you list. I have trouble imagining that heavy use of macros could be much worse, and I can certainly see the overlap between macros and frameworks.
Emacs-lisp is one example of a Lisp that has evolved into the domain of text editing.
It's different. Macros can change the language grammar (it's like inventing a new way to structure a sentence in English, instead of the usual object-verb-complement). Libraries just provide new vocabularies (new names, new verbs, etc.).
You can use the English language to express ideas on very different topics (food, law, engineering, romance, etc.) without changing its grammar.
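Concretely: a function can't implement a C-style while loop, because its arguments would be evaluated once, before the call; a macro can, because it rearranges the code itself. A minimal sketch in Common Lisp (the language already covers this with DO/LOOP, the name here is just illustrative):

```lisp
;; A function named like this would evaluate TEST once, before the
;; call. The macro defers evaluation by expanding into a loop that
;; re-evaluates the test form on every iteration.
(defmacro while (test &body body)
  `(do ()
       ((not ,test))
     ,@body))

;; (let ((i 0))
;;   (while (< i 3)
;;     (print i)
;;     (incf i)))
;; prints 0, 1, 2
```

That's a change to what counts as a valid "sentence", not just a new word.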
But that's the point. When you can't change the grammar, you instead end up writing in a different language that doesn't even have a grammar, and encoding it in the base language plus a bunch of horrible configuration.
Look at Spring for an example.
This is bad use of macros, or an ugly macro system.
Macros, at least in Lisp, make code even clearer to understand, because they let you create the constructs that map the problem domain more directly, straightforwardly, and easily onto the programming language.
So they reduce line count, and they reduce the need for kludges or workarounds. They allow more straightforward code.
But this is within the Land of Lisp, where writing a macro isn't something "advanced" nor "complex" nor "esoteric". In the Lisp world, writing a macro is 95% similar to writing a run-of-the-mill function.
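To illustrate the "95% similar" point with a toy sketch: a macro is written with an analogous defining form, argument list, and body; the only difference is that it computes with code (lists and symbols) at expansion time instead of with values at run time.

```lisp
;; A function computes with values at run time:
(defun square-fn (x)
  (* x x))

;; The corresponding macro computes with code at expansion time;
;; its body is ordinary Lisp that builds a list.
(defmacro square-mac (x)
  `(* ,x ,x))

;; (square-fn 5)                    => 25
;; (macroexpand-1 '(square-mac 5))  => (* 5 5)
```

(Note that this naive macro evaluates its argument form twice, which is exactly the kind of pitfall the macro-writing conventions mentioned elsewhere in this thread exist to prevent.)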
But really, this is a recognized problem of Lisp, and has been called the Lisp Curse.  One is never programming in "just Lisp", but rather in Lisp plus some half-baked DSL haphazardly created by whoever wrote the program in the first place.
Also, don't confuse readability with understanding. Yes, DSLs are typically easier to read, but only after you come to understand the primitives of the language. When every program has its own DSL with its own primitives, even programs that do similar things... That becomes quite a burden.
This is also true of functions. Without reading the body, you don't know if it's just going to return the sum of the two integers you passed to it, or if it's going to change some global variable, launch the missiles, and then return a random int.
Yes, macros are more powerful, and therefore you need to be more careful with them. But they are still much better than what ends up being used instead. With languages that don't have macros, you end up with complex frameworks that use runtime reflection, or code generators that run as part of the build system (which end up being an ad-hoc, messy macro system).
Or, some horrible solution where you embed a DSL by interpreting trees of objects, which effectively represent an AST. In this case, the embedded language doesn't follow the language rules, but it seems like it does, because you're looking at the implementation of the interpreter, instead of at the syntax of the embedded language.
> but rather in Lisp plus some half-baked DSL
Why does it need to be "half-baked"? Why do you assume that writing a good DSL is impossible for most Lisp users? Are you sure it's actually the case?
Note well: This is my understanding of the claim. I take no position on whether it is true.
Writing and maintaining a good DSL is like writing and maintaining bug free code. You always start with the best of intentions, but human fallibility and entropy are always pulling you in the wrong direction.
This doesn't mean that the attempt is not worthwhile. But it does mean that you should expect eventual failure.
Also, yes, I would propose that a good DSL would be difficult to write for most programmers of any type. Not because of any inherent deficiency in the programmer, but rather because we tend not to spend enough time in a single domain to understand it well enough to write a good DSL.
Quoting user quotemstr here:
"Every program is a DSL. What do you think you're doing when you define types and functions except make a little DSL of your own? (...) Programming, in large part, is an exercise in language creation."
Most lisp docs will tell you to use macros only when necessary, because as great as they are, they have inherent issues that aren't fixed just by having a good macro system.
No true Scotsman?
One also declares variables when necessary, and one also creates arrays when necessary, etc. But imagine a programming language that doesn't support arrays. It would be a nightmare if you need to do certain scientific computations.
So in the same way, yes, not having a (proper, Lisp-like) macro system surely hurts a lot, once you realize how it makes certain problems become really easy.
And, by the way, one should write macros when necessary. In Lisp, we're using macros most of the time!
>Isn't loop essentially a macro (probably special op) that a lot of people hate?
And other Lisp programmers like the LOOP macro, since it allows very readable and concise code for doing something simple that should stay simple to read.
(loop for x from 1
      for y = (* x 10)
      while (< y 100)
      do (print (* x 5))
      collect y)
;; prints 5 10 15 20 25 30 35 40 45, and returns:
(10 20 30 40 50 60 70 80 90)
(loop for x in '(a b c d e)
      for y in '(1 2 3 4 5)
      collect (list x y))
((A 1) (B 2) (C 3) (D 4) (E 5))
Source of examples: http://cl-cookbook.sourceforge.net/loop.html
Today there is a related discussion in comp.lang.lisp: https://groups.google.com/d/msg/comp.lang.lisp/U5LqiM5nKq8/M...
For the first: There's an implicit x->x+1 hiding in there? Why 'for y' if y is derived from x and not incremented on its own?
For the second: It looks like a nested (cross product) loop, but it's actually a zip (dot product) loop?
loop doesn't do cross-producting; if you know that, there is no mistaking it. All the clauses specify iteration elements for one single loop.
"for x from 1" means starting at 1, in increments of 1.
"for y = expr" means that expr is evaluated on each iteration, and y takes on that value.
y could be incremented on its own, but then that example wouldn't show the "for var = expr" syntax, how one variable can depend on a combination of others.
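A small example showing both clauses together (standard LOOP, nothing hypothetical):

```lisp
;; "for x from 1" counts 1, 2, 3, ...; "for y = (* x x)" recomputes
;; y from x on each iteration rather than stepping it independently.
(loop for x from 1
      for y = (* x x)
      while (<= y 25)
      collect (list x y))
;; => ((1 1) (2 4) (3 9) (4 16) (5 25))
```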
This is what a LOOP macro call looks like in code:
(loop for i from 0 below n
      for v across vec
      when (evenp i)
      collect (* i v))
and here is a sketch of the same iteration in a more conventionally parenthesized, Lisp-style syntax:
(loop ((for i :from 0 :below n)
       (for v :across vec))
  (when (evenp i)
    (collect (* i v))))
The LOOP macro historically comes from Interlisp (70s), where it was part of a certain language design trend called 'conversational programming'. The idea was to have more natural-language-like programming constructs, combined with tools like spell checkers and automatic syntax repair (Do What I Mean, DWIM). From there this idea and the FOR macro influenced the LOOP macro for Maclisp. The LOOP macro grew over time in capabilities and was then transferred to later Lisp dialects, like Common Lisp.
There are actually Lisp macros which are even more complicated to implement and even more powerful, but which create less resistance, since they are a bit better integrated in the usual Lisp language syntax. An example is the ITERATE macro: https://common-lisp.net/project/iterate/
Thus it is not the complexity or the functionality of the macro itself, but a particular style of macro and its implementation. I personally also prefer something like ITERATE, but LOOP works fine for me, too.
The advantage of something like ITERATE or even LOOP is that they live mostly on the developer level, not solely on the implementor level. A developer or group of developers can develop such a complex macro and integrate it seamlessly into the language, making the language more powerful and allowing us to reuse much of our knowledge of, and infrastructure for, the language.
Implementing and designing something like ITERATE or LOOP requires more than the usual dev capabilities. Generally macros require some of that, since they make it necessary for the dev to use or program on the syntactic meta-level. That's where language constructs are reused, implemented and integrated.
Lisp docs will tell you to use macros only when necessary? They tell you to WRITE macros only when necessary. Since many Lisp dialects have a lot of macros, you have to use them anyway. Most of the top-level definition constructs are already macros. If we use DEFUN to define a function, we are already using macros.
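You can see this directly at the REPL; the exact expansion of DEFUN is implementation-defined (SBCL, CCL, etc. all differ), but it expands like any user-defined macro:

```lisp
;; DEFUN is itself a macro. MACROEXPAND-1 on a DEFUN form works just
;; as it does on user-defined macros; the resulting form varies by
;; implementation, so no particular output is shown here.
(macroexpand-1 '(defun add2 (x) (+ x 2)))
```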
In my experience actual Common Lisp uses a lot of macros. I also tend to write a fair number of macros.
But generally good macro programming style is slightly underdocumented, especially when we think of various qualities: robustness, good syntax, usefulness, readability, avoiding the obvious macro bugs, ...
On Lisp from Paul Graham is useful, as are a few other sources... Read On Lisp here: http://www.paulgraham.com/onlisp.html
Macros are very useful and I use them a lot, but at the same time one needs to put a bit more care/discipline into them and some help of the development environment is useful...
Macros are a tool for creating abstractions. In the mind of the author, abstractions are always clearer. Others who have to work with those abstractions legitimately may or may not agree.
In the case of authors who think that abstractions are an unmitigated good, I seldom agree with their abstractions. And I don't care whether they are implemented as deep object hierarchies, macros, or functions with a lot of black magic. If you are unaware of the cognitive load for others that is inherent in your abstractions, then you are unlikely to find a good tradeoff between conciseness and how much of your mental state the maintainer has to understand to follow along.
>But this is within the Land of Lisp, where writing a macro isn't something "advanced" nor "complex" nor "esoteric". In the Lisp world, writing a macro is 95% similar to writing a run-of-the-mill function.
I agree with you but I think there are two aspects to code readability: A/ is it clear what the code does (the intent) and B/ is it clear how it does it (the implementation).
I think macros can help massively with A (hiding redundant code and clunky constructs) at the cost of obfuscating B. The thing is that if you want to hack into the code at some point you'll need to understand B too.
To take a very simplistic example imagine that you're reading some CL code and see something like:
(let ((var 42))
  (magicbox var (format t "side effect~%"))
  (format t "~a~%" var))
So, without running the code or looking at what "magicbox" does, if you assume that it's a function call, then you can expect that this code will print "side effect" when magicbox is called (since the parameter will be evaluated before the call), and then, whenever it returns, the 2nd format will display "42", since var is not modified between the let and there.
You run your code and you see that it only displays "magic" instead. Uh, what happened?
Well there you're lucky because there's only a single statement to consider, the magicbox invocation. You ask emacs to find the definition and you see:
(defmacro magicbox (var unused)
  (list 'setq var "magic"))
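At least the deception is mechanically discoverable: expanding the suspicious call shows exactly what it turns into, including the silently discarded second argument.

```lisp
;; Expanding the call reveals the trick: the FORMAT form is thrown
;; away entirely, and var is simply assigned.
(macroexpand-1 '(magicbox var (format t "side effect~%")))
;; => (SETQ VAR "magic")
```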
Not really. Since the call explicitly names var you could easily narrow it down to that by going through all the places in the code where var is mentioned. If all the other expressions concerning var are non-destructive, you have your smoking gun: it must be magicbox. If var is mutated all over the place in that code then you have to work out whether it is magicbox or something else.
Give a madman enough rope and he can find a way to hang himself.
Where do you draw the line? At least with functions, there's only one syntactic structure that is user-definable, making it easy to distinguish foreground from background.
I don't understand what you mean by that. When I call a function in Common Lisp I have some guarantees: I know it won't change the value of the parameters that I pass by value, I know it won't add or remove variables from the current scope, I know that the parameters to the function will be evaluated before the invocation. For macros anything goes; for all you know it could expand to a RETURN-FROM and exit your function prematurely, it could implicitly modify a local variable, etc.
> Operator overloading is another. They can all be used well or confusingly.
I completely agree and I consider that operator overloading for anything that's not "semantically" equivalent to the builtin operator behaviour is inherently evil. It makes sense to overload "*" and "+" on a matrix or vector type, it doesn't make a lot of sense to overload "<<" or ">>" to deal with input/output. But obviously nobody would be insane enough to do that.
When I read "a = b + c" in some C++ code I fully expect that "a" will be set to the value of a sum of "b" and "c", whatever makes sense for the types of "b" and "c". If adding "b" and "c" doesn't make any obvious sense semantically (adding database handles with strings) then it shouldn't be done.
I think that it's easier to draw the line and design sane guidelines for when you're allowed to implement overloaded operators. Macros have much broader use cases.
I meant that a whole bunch of specialized functions can also be a dialect, which is opaque unless one takes the trouble to look into them. You can write very confusing code unless it matches up closely to a domain (and even then, you need to be on top of the domain). Of course, the "unraveling" is easier (and less literal) than for macros.
I was just trying to say it's a difference in degree not in kind - if you measure in terms of power and usability.
I take your point that it's not just syntax, but differences in (basic) guarantees, so there's less ability to treat it as a black box (information hiding within a module). But any module (e.g. a fn) can behave differently from how you expect.
tl;dr macros vs fn is a difference in degree, in terms of outcome (i.e. confusion).
The space of things that a function won't do is tiny compared to the space of what it may. You've taken a vast universe of possible behavior and eliminated from it a few possibilities like mutation of a local variable in the caller; that still leaves a vast space. Generally, to maintain code, at some point you have to grapple with understanding what it does more than with what it doesn't do.
No. When you define a function or a class, you define a new vocabulary (a new verb, a new name), but you don't change the grammar. Macros are a different beast because they can change the grammar.
If there's a sin, it's maybe that macros make it a bit too easy to (prematurely) optimize the conciseness of your code, and that becomes an enticement to play code golf.
This exact same thing happens with functions, too. At some point, a third party will have to read the code to understand it. You don't know what a function does until you read its implementation either.
Often, having a domain specific language (even if the domain is very specific to the application) will make the application easier to understand.
If you write code in say, Go, then sure, a third party will be able to look at a loop and be able to figure out the mechanics of what it does. But that person will still not understand how the application works. They will still need to learn both the domain and the structure of the application. And with the right macros, the application will likely be smaller and better organized.
Avoiding abstraction doesn't magically make it easier to understand the solution to a complex problem.
In any case, I would like to point out that there are other answers to the problem than macros. For example, Haskell has lazy evaluation, currying and strong type system, and many use cases for macros can be implemented just as ordinary functions.
Between debugging C++ meta-template programming and Rust macros, I am undecided what to choose as best.
(This is a joke, but it's also a serious point; people keep thinking of Lisp as the "king over the water" that will one day return and save them, which keeps people from looking at the reasons why Lisp never achieved mass adoption. Or arguing that Lisp cannot fail, it can only be failed.)
Edit: I think your downvotes are a bit harsh, and as a Scot I appreciate the "king over the water" reference - people get all romantic about the Jacobites without thinking too much about what their policies might have been.
The recurring cost of maintaining and deploying a working Lisp system is perceived by you to be higher than the benefits the language begets?
I agree that lisp has issues marketing itself.
Which has some previous discussion about it:
I would imagine Martin has gone through the path of discussing macros long time ago, and for some reason or other does not refer to them here.
JS fatigue is caused by a culture that creates packages for single functions.
If the language's AST and syntax aren't iconomorphic, then you add a learning curve and need to insulate users from compiler-internal changes; quasiquoting doesn't work everywhere.
In a compiled language, first-class macros also add significant compile time.
A couple of languages do have them, though, like Nim's macros, Haskell's TemplateHaskell extension, and Rust's compiler plugins.
Do you mean "homoiconic"?
But I'm not sure that the metrics in the referenced article have much to do with language design. Most of what they are measuring (their conclusion says as much) are people copy pasting files and entire projects. JS by their stats is the worst about this but JS is a language that reduces duplication more than Java (also measured with less copying).
Seems to be a social or dev skill issue rather than language design. Honestly most languages have an excellent tool for reuse - the function/method, which isn't used enough to remove all duplication as is.
Reading the conclusion again, they're capturing the existence of node_modules/, which has nothing to do with language design.
Since .NET 3.5 I barely use them; I really only use the built-in ones for foreach and property accessors. The addition of lambdas and shorthand property accessors vastly reduced the boilerplate code in C#. I imagine the addition of async/await has also helped a lot with certain other types of code.
My round-about point is that existing languages can improve to reduce code duplication.
The other comment about Django (which would also work for Rails); a lot of times, big enough libraries/frameworks written in a language, are actually DSLs which would suggest we lean toward the 'Alan Kay' 'make everything a DSL' solution. The mathematician/CS person in me wants the 'improve the language' solution. But low hanging fruit for language design I don't see here.
Yes, that is currently why I design that kind of scenario in a (possibly blue-sky) DSL 'on paper' (if there isn't one handy) and then translate that back to libraries and structures in the language I'm working in. I just notice that this becomes annoying sometimes if the existing solutions really do not match the DSL I envisioned.
In the best case, the DSL or abstraction doesn't leak and the effect of the code matches the intent so that you don't need to peek under the hood to make sure it's doing what you want/think. In that (rare) case, cognitive load remains low because you can work close to the problem domain and stay there without worrying about the specifics of the underlying code.
Last I checked, things like AoP, reflection, Spring/Hibernate contexts, and other magic metaprogramming and annotation shenanigans were commonplace. At least with macros, you can usually just expand them and read what your code is doing.
Boilerplate is the most flexible. Language is most unifying. Things tend to move towards language, but if you move too fast, you are stuck with bad decisions, because changing a language is near impossible.
To clarify, hopefully, just a little bit: the DSL approach certainly appears the most promising so far. But there are a few issues. One is that language creation, maintenance, interoperability and comprehension become critical issues very quickly, even with amazing tools. (One could argue that amazing tools make the problem worse, because they hasten proliferation).
Then you notice that even some apparently pretty radical DSLs actually have quite a bit in common. And the things that separate them are just variants of a few simple architectural elements (dataflow, constraints, storage, data-definition).
You also remember from Guy Steele that languages should not be "a thing", but rather a "pattern for growth", and from natural languages that we don't really invent entire new languages for specific domains, at most we add some vocabulary/jargon. So DSLs should not be a thing, what should be a thing is a language that allows us to build APIs that have the benefits we want from DSLs. And so while Lisp Macros certainly point the way, I don't think they're the answer. Smalltalk's keyword syntax is closer, despite or maybe because it is less powerful, and Grace's extension of keyword syntax gets us even closer.
So we need a language that allows us to build what we would consider DSLs as APIs. For that, it will also need to abandon call/return as the dominant/only generalized abstraction mechanism.
I saw a talk by Christina Lee and Huyen Tue Dao that builds one on the type system (lambda extension functions) that feels very intuitive.
disclaimer: I won't argue about the superiority of one macro system over another, the point is that macros exist in both languages.
One Model To Learn Them All
I don't think, in Java's case, that fixing this is low-hanging fruit. Java's more reliant on code generation than comparable languages for a reason. Most of the codegen I've encountered is stuff that would be hard to implement in a more elegant way due to limitations of the platform's runtime environment (such as type erasure) that make it hard to handle a lot of the dumb verbose mapping work and the like at run time.
Someone else mentioned Clojure. I haven't tried Clojure yet, but I'm willing to believe it - dynamic typing seems like it could easily be a secret weapon on the JVM, since the platform is so halfhearted at static typing. But I think that teams that are committed to using a static, infix, non-S-expression language may be painted into a bit of a corner here.
How could a language solve this? I know F# can compile some stuff using things fetched online, but this means the build is suddenly not reproducible. And we could always generate that code only on the fly when compiling, but that always wreaks havoc on some tooling or IDEs when the code you're referencing isn't there until compile time.
The other approach is to just handle these sorts of things at run time, using the richer run-time information that your code has access to. So, e.g., if you're binding an XML file to a List<Record<Employee>>, in C# it's possible to actually express something like typeof(List<Record<Employee>>), since generic type parameters aren't erased. The mapping function can just look at that and figure out how to map data dynamically. You, the developer, then get to own your own domain objects instead of having to rely on ones that are auto-generated by some code generation process. You don't have to accumulate a bunch of XML files and XML schema files and associated obnoxious-to-maintain clutter. And you can do it without resorting to a bunch of custom weirdness like Jackson's TypeReference class.
Why do we need modules at all? http://lambda-the-ultimate.org/node/5079
C++ is the real outlier there. That could be because C++ code is much harder to generate, but I don't buy that it's that much harder than Java. Or it could be because C++ templates aren't considered "code generation". Or it could be because C++ doesn't get used for projects with that much boilerplate code. Or...
Trouble starts if there are two source files that do almost the same thing but slightly different, and then you need to change them; now you probably need to change these duplicate files in lock step, and that is a rather error-prone process. IIRC, this kind of problem was the reason for adding templates to C++. And say what you will about C++, but templates are a very powerful tool.
I fail to see the connection to language design, though. Is the author saying that one should add some feature to make duplication as unnecessary as possible (like templates in C++)? Or that the tooling around the language should be better suited to automatic code generation?
FWIW, I think code generation is a very powerful tool; in a way, code generation is meta-programming. (Is there a distinction at all?) And I think that there is a lot of potential in this area. Go has supported the "go generate" command for a while now, and I have seen a few very interesting use cases (e.g. ffjson, which generates code to serialize/parse Go data types to and from JSON more efficiently than the builtin reflection-based mechanism).
Someone once said (I forgot who), "I'd rather write programs that write programs than write programs." That sums it up pretty well, I think. ;-) Okay, okay, so now I do see the connection to language design. Sorry, my fingers were faster than my mind this time.
Of course, there is a whole web server, Kestrel, referenced somewhere, using NuGet and DLLs full of MSIL. Describing a dependency on compiled library code as "contains _ lines of code" is, especially in the context of the conversation, arguably dishonest.
The rest is ~100-150 lines code+configuration and a readme file (~180 lines).
However, I'm not sure Visual Studio gives you any way to avoid this when you use the GUI to start a new core project. It is however possible to create a non-Core ASP.NET project with literally nothing in it.
applicationhost.config (and everything in .vs) is local configuration for your editor (Visual Studio 15, it appears), not source code. Would you include .emacs.d in the weight of a sample project?
Every folder in /wwwroot/lib/ that contains a .bower.json file is a local package cache, not source code. You don't have to commit it, just run bower when building your solution (if it's not done automatically for you) to restore them. If those were .dll or .pdb files instead of .min.js and .map, would you count them?
The remaining two hundred lines of code are the contents of the "Sample" project, written to illustrate ASP.NET Core.
I really don't care why VS put that stuff in my project- but as far as I'm concerned, if it's code of any kind, and it's in my project, and I didn't write it, then it's boilerplate. If it looks like a duck, and it quacks like a duck, then it's a duck.
I mean, if I'm introduced to some other programmer's project, I now have to wade through this rat's nest of package caches and auto-generated classes and other trash trying to figure out which little bits are actually unique to the project itself. I really shouldn't have to do that.
(Oh, and /obj is one of the standard build artifact directories for .NET together with /bin...)
I have not used the tool myself, so I cannot comment on how well it works in practise. But I found the idea intriguing.
 K. Narasimhan, C. Reichenbach, Copy and Paste Redeemed http://creichen.net/papers/cpr.pdf
If you don't check in the generated code, that means that you have to run the generator every time you build (or at least every time you get a clean copy of the tree and build). For code that changes very rarely, that may not be a net win.
Also, if you don't check in the generated code, and then you upgrade to the latest version of the generator, surprising things can happen (or even if different people have different versions of the generator installed).
So there can be cases where checking in generated code can be at least a reasonable thing to do.
I imagine a Grand Refactoring... The "great compression" of '23, or something...
I interpret this as code reuse which is almost the opposite of duplication.
Reuse is not always duplication, however, and reuse without duplication is quite often better.
a) I'm the only maintainer of my own side projects (most of the time I never finish nor publish them ... including a cool wysiwyg mouse-driven Clojure POC debugger)
b) I tend to perform multiple full rewrites of my projects (up to 4 times).
c) My code tends to get obscure very quickly, since I enjoy giving it an "ontological" twist. Eventually this leads to less code, but the code gets less understandable from the static perspective of the source file it sits in; to get a good grasp of what it does, one needs to run the code and observe what it does when it is evaluated (at macro-expansion time or run time). This is why I semi-successfully attempted to write the debugger mentioned above.
Meanwhile at work, I have not performed a full rewrite yet (understandably so) and the goal is to keep the code as flat and linear as possible so that anyone in the team can grasp what it does in one glance. Obviously this leads to a lot of repetition, but this is for the greater good.
Currently I'm working on improving my dev experience with Clojure from two angles:
1) Saner macros : I've been tweaking Clojure's reader so that code generated with the backquote reader macro gets printed in its original form when using pprint. For instance `(a b c) expands to (my-ns/a my-ns/b my-ns/c) and becomes `(a b c) again when printed with pprint. I'm also thinking about expanding macros in temporary files in order to get sane stack traces for code generated by macros, but this is a surgically more complex thing to do.
2) "Macros" that expand and persist in the very file they are written in. This allows for in-file debugging and should address point c) from above. Example:
At first the content of your file looks like:
(+ 1 2))
(+ 1 2)
So there is no point in including the most common auto-generated code in the programming language itself, because the tool that generated it can keep evolving independently of the language, and it actually uses the lower-level constructs exposed by the language. By including the logic within the programming language we'll just bloat the language and lose out on the beauty of composability.