
I found the following comment very insightful in a past discussion:


I reproduce the relevant part:

Dependencies (coupling) is an important concern to address, but it's only 1 of 4 criteria that I consider and it's not the most important one. I try to optimize my code around reducing state, coupling, complexity and code, in that order. I'm willing to add increased coupling if it makes my code more stateless. I'm willing to make it more complex if it reduces coupling. And I'm willing to duplicate code if it makes the code less complex. Only if it doesn't increase state, coupling or complexity do I dedup code.

State > Coupling > Complexity > Duplication. I find that to be a very sensible ordering of concerns to keep in mind when addressing any of those.

Interesting indeed but the part that stuck with me the most is:

>> Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary.

I read the article in 2016 and that phrase has stuck with me ever since. I had never thought about it before; it's such a simple, self-evident fact, yet so easy to miss. It's a powerful concept to know, both when writing and when refactoring code.

If you're working to get a certain task done at your job, yes I can see wanting to minimally touch the code.

If something gets bad enough, I will refactor the whole damn thing, but only at jobs where there are unit tests. If there are no unit tests, this truly becomes an impossible task and it's best not to touch anything you don't need to.

You have to have good tests if you ever want to tackle technical debt.

I've heard this called "bunny code." It doesn't matter if it's good or bad, it'll reproduce.

This too is one of the few pieces of programming wisdom that I find unforgettable.

Deleted not to please downvoters.

The point is not whether it is correct and necessary, but that its existence makes the argument that it is indeed so. What you mention is actually the insight that "it may not be". But you reach that idea _against_ the argument its presence is making.

That is, you find some code. Its presence says "I'm here for a reason", your answer of "maybe it's just because..." comes as, precisely, an _answer_ to that argument.

Neither the argument itself nor your answer is necessarily and always correct. This doesn't argue that point, just that the presence of a piece of code makes such a statement.

The article explains what is meant.

> We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy"). It's as if our unconscious tell us "Goodness, that's so confusing, it must have taken ages to get right. Surely it's really, really important. It would be a sin to let all that effort go to waste."

I think it is a mistake to call this a sunk cost fallacy. The other stuff is true, but it's not the same thing as the sunk cost fallacy.

I really enjoyed your insightful comment. I can't understand all those furious downvoters. Perhaps they are a bad abstraction.

I really agree with both of you.

I would guess hamandchess and trufa are very young people who are just starting out in paid software work. This explains why they overvalue experienced programmers' blogs.

The only thing experienced programmers truly value is profiling: write it, run it, measure it.

Advice is nice and all, but at the end of the day, even that new method of writing code has to perform on your manager's profiling table. OO? Use it, run it against a procedural or functional approach, measure it, decide.

Everything else is politics, religion and that one irresponsible guy who gets high on new things and somehow gets away with it while touring companies.

There is a quote by Linus Torvalds that is relevant here:

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they’ll be obvious." -- Fred Brooks, The Mythical Man Month (1975)

Yep. And this is why our industry's decision to focus on essentially procedural abstractions for our interfaces is...problematic.

Let me offer a different interpretation: That's why it doesn't matter so much if you're doing code in a procedural or functional way. If your data structures are wrong, the code will be bad, period.


To clarify: when I wrote "procedural abstractions", that included functional.

Ok, I guess I misunderstood you then. Maybe your comment was more in the spirit of "APIs should be less secretive about the shape of data they are maintaining internally?"

Oh, it goes further than that. :-)

The assumption that data is something to be maintained internally, at best hidden behind an interface (a procedural one) and at worst "exposed" is so ingrained that it's hard to think of it any other way.

However, why can't we have "data" as the interface? The WWW and REST show that we can, and that it works quite swimmingly. With a REST interface you expose a "data" interface (resources that you can GET, PUT, DELETE, POST), but these resources may be implemented procedurally internally, the client has no way of knowing.

The interface is the data. Have a look at Data Distribution Service or Eve programming (you may have seen them before). They go further than REST in that you can react to changes in the data model (REST is only half a protocol).

Yeah, I am aware of Eve.

I've also done my bit, with In-Process REST[1] and Polymorphic Identifiers[2] for the "REST" half, and Constraint Connectors[3] for the "reacting" half.

[1] https://link.springer.com/chapter/10.1007/978-1-4614-9299-3_...

[2] https://www.hpi.uni-potsdam.de/hirschfeld/publications/media...

[3] https://www.hpi.uni-potsdam.de/hirschfeld/publications/media...

Except that functional programming completely eliminates (yet still allows) concern no. 1 in the mentioned order -- state > coupling > complexity > code.

Not to mention the better expressive power for describing data structures with algebraic data types (just + and * for types really).
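To make the "+ and *" remark concrete, here is a rough sketch in Python rather than in a functional language. The `Circle`, `Rect` and `Shape` names are made up for illustration: a dataclass plays the role of a product type (this field AND that field), and a `Union` plays the role of a sum type (this variant OR that one).

```python
import math
from dataclasses import dataclass
from typing import Union


# Product type ("*"): a Rect is a width AND a height.
@dataclass(frozen=True)
class Rect:
    width: float
    height: float


# A product type with a single field.
@dataclass(frozen=True)
class Circle:
    radius: float


# Sum type ("+"): a Shape is a Circle OR a Rect.
Shape = Union[Circle, Rect]


def area(shape: Shape) -> float:
    # One branch per variant of the sum type.
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    if isinstance(shape, Rect):
        return shape.width * shape.height
    raise TypeError(f"not a Shape: {shape!r}")
```

Languages with native algebraic data types check that every variant is handled; in this Python sketch that check only happens at runtime, through the final `raise`.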

That's just not true. Functional programming does not eliminate state. You can't do computation without state. What fp does differently is it pushes state around like a hot potato. In my eyes that is about as problematic as OO (where you cut state in a thousand pieces and cram it in dark corners and hope nobody will see it).

If you make global arrays instead you will always have a wonderful idea of what your program's state is, and you can easily use and transform it with simple table iteration.

> That's just not true. Functional programming does not eliminate state.

And yet it says so in the first sentence in the Wikipedia page for functional programming https://en.wikipedia.org/wiki/Functional_programming

>a style of building the structure and elements of computer programs—that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.

But I'll take it that you don't have much functional programming experience.

Of course one can still go with a big global array and keep updating it in-place. A good programmer can write Fortran (or C in that case) in any language.

At some point, you need to modify some state, otherwise your program/language is useless. And that's not me saying that, I am just quoting, or at least paraphrasing, Simon Peyton Jones:


And of course a lot of so-called "functional" programs just outsource their state management to some sort of relational database. And the people talking about their creation will praise the state-less-ness of their creation. Unironically.

What can you do? ¯\_(ツ)_/¯

Anyway, more practically, the vast majority of workloads do not have computation as their primary function. They store, load and move data around. Computers, generally, don't compute. Much. For those workloads, a paradigm that tries to eliminate the very thing that the problem domain is about, and can only get it back by jumping through hoops, is arguably not ideal.


> avoids changing-state and mutable data.

This doesn't mean that functional programming eliminates state. Avoiding changing-state and mutable data is different and the Wikipedia article is referring to how functional programming doesn't mutate existing data, so you avoid the stale reference problems that can occur in OO languages.

Instead, the state is the current execution state of the program. Function calls are side-effect free (except when interacting with the external world, which is a special case I'm not covering here). Because of this, the only way data can be communicated from one function to another is by passing it in when calling the function, or by returning it. This means the state is just the data local to the currently executing function, and any calling functions (though the data in that part of the state isn't available to the current function, it's still in memory until the calling function returns).

Contrast this with procedural programming languages, where state can also be maintained in global variables, or object oriented languages, where objects maintain their own state with the system state being spread around the whole system.

Again, you can't do computation without state. The only question is how honest you want to be about it. And whether you put things in global tables or not is completely orthogonal to whether you mutate data or make new data.

And please, no beaten up buzz words and selling pitches needed.

I looked up the full text of the book, but couldn't figure out what tables mean in this context.

The quote comes usually with "data" instead of "tables" and "algorithms" instead of "flowcharts".

Row/column data such as you would find on mainframe accounting software

I would assume database tables

I'm guessing s/tables/data/g

Yes, and sometimes I feel this also holds in a way for user manuals.

"Algorithms + Data Structures = Programs" is a classic book by Niklaus Wirth.

It was one of the first to emphasize data structures in addition to code.

There is a free pdf online[0] discussed on HN in 2010.[1]

(from wikipedia) "The Turbo Pascal compiler written by Anders Hejlsberg was largely inspired by the "Tiny Pascal" compiler in Niklaus Wirth's book."[2]

[0] http://www.ethoberon.ethz.ch/books.html

[1] https://news.ycombinator.com/item?id=1921125

[2] https://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures...

To add a little poetry, another quote: In fact, I'll take real toilet paper over standards any day, because at least that way I won't have splinters and ink up my arse.

From linus rants: https://www.reddit.com/r/linusrants/comments/8ou2ah/in_fact_...

I think you're just misusing that quote... At the end of the day, for any given data structure setup, an algorithm still needs to be implemented. There are many ways to slice that pie, and depending on how you do it, that can cause a lot of saved time or misery for the engineers you work with.

Data structures and their relationship cannot express everything. SQL is not Turing complete if you don’t use CTE to introduce recursion. And I think that we all agree that SQL is perfect to work on data structures and their relationship.
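For readers who have not used recursive CTEs, here is a small sketch using Python's built-in `sqlite3` module. The `edges` table and its contents are invented for illustration. A query with a fixed number of joins can only follow a fixed number of hops; `WITH RECURSIVE` is what lets the query keep going until nothing new is reachable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (5, 6)],  # hypothetical sample data
)

# Every node reachable from node 1, however many hops away.
rows = conn.execute(
    """
    WITH RECURSIVE reach(node) AS (
        SELECT 1
        UNION
        SELECT edges.dst FROM edges JOIN reach ON edges.src = reach.node
    )
    SELECT node FROM reach ORDER BY node
    """
)
reachable = [row[0] for row in rows]
```

Nodes 5 and 6 are not connected to node 1, so they do not appear in `reachable`.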

The original comment is on a completely different level of analysis. I think that if you know about Linus Torvalds, you will agree that he knows what SQL is and how it differs from a Turing-complete language. The point being made is much deeper and more philosophical, and makes a lot of sense in complex systems.

Does he know how SQL works?

I have “known” who Linus Torvalds is since I installed my first Linux distribution on my 386SX 25 MHz when I was 14 or so, in 1995/6. I think he is very smart, but sometimes I disagree with him and with his harsh way of relating to the rest of the world. Now, after a useless appeal to authority, would you mind explaining what is wrong with my opinion about data structures and relationships? I don’t really think that you can do everything just with data structures and relationships. If you think the opposite, then please explain how you can do everything with something that is not Turing complete.

> I don’t really think that you can do everything just with data structures and relationships.

No, you can't do everything with just data structures. Everyone knows this. A first-year junior programmer knows this. It's obvious. The original quote did not talk about this; you misunderstood the level of analysis it was aiming at.

The fact that SQL is not Turing complete is a meaningless truism here, because Linus obviously did not mean that we should all start using SQL instead of C. The point he is making is that getting the data structures right matters far more for the program to be good. Not just fast, or just maintainable, or just easy to understand, but all of those things and many others.

"SQL is perfect to work on data structures" if and only if relational tables are the only data structure that you know.

Try to look at it that way: what isn't a relational table? Any data structure you can make is essentially a tuple of primitive elements. It may point to further data items, but still. Now, put equally shaped tuples in common tables, and you have a database.

Trees, graphs?

Of course one can force anything into a relational database. The data analog of "Turing tarpit".

Ironically graph databases are way better for describing relations than relational databases.

> Trees, graphs?

Easily represented as a vertices-array and an edges-array. It's conventional to index the (directed) edges to optimize iteration over all edges from a given vertex. If you're being "sloppy", you can also represent edges as vector<vector<int>> (one array of destinations per source). This is more convenient but comes with the slight disadvantage that these arrays are physically separated (for example, you'll need two nested loops instead of only one to iterate over all edges).
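A minimal sketch of both layouts, written in Python instead of C++ and using a made-up four-vertex graph. The index assumes the edge list is sorted by source vertex, and names like `first_edge` are my own, not a standard API.

```python
vertices = ["a", "b", "c", "d"]
# (source, destination) pairs, sorted by source vertex.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]


def build_index(num_vertices, sorted_edges):
    """first_edge[v]..first_edge[v + 1] is the slice of edges leaving v."""
    first_edge = [0] * (num_vertices + 1)
    for src, _ in sorted_edges:
        first_edge[src + 1] += 1  # count outgoing edges per vertex
    for v in range(num_vertices):
        first_edge[v + 1] += first_edge[v]  # prefix sums give slice bounds
    return first_edge


first_edge = build_index(len(vertices), edges)


def neighbours(v):
    return [dst for _, dst in edges[first_edge[v]:first_edge[v + 1]]]


# All edges in a single flat loop.
flat = [(vertices[s], vertices[d]) for s, d in edges]

# The "sloppy" layout: one list of destinations per source vertex.
adjacency = [[1, 2], [2], [3], []]
# Visiting all edges now takes two nested loops.
nested = [(vertices[s], vertices[d])
          for s, dsts in enumerate(adjacency) for d in dsts]
```

Both layouts describe the same graph; the flat one keeps all edges in one contiguous array, which is the physical-separation point made above.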

In the same way, you can force everything into a deterministic or non-deterministic Turing machine, depending on the problem. But something that is just looking at data and relationships, akin to a relational database, while extremely powerful, can’t solve every problem in the world. There are much better tools for that. And they have something more than just data and relationships.

Of course you can force anything in a graph database. But then you have to make special collection objects to iterate over all Foo items in your process. I guess you'll also need some kind of garbage collector.

> Ironically graph databases are way better for describing relations than relational databases.

How so?

You can force pretty much every data structure that I know in a table. That doesn’t mean that you can solve everything with a non Turing complete language. So, unless I’m badly mistaken, you’ll need something more than data and relationships to solve everything.

The Linux kernel, Linus's lifetime project, is full of data structures and contains no SQL. Because it's completely inappropriate in that context.

I would say they are all related concerns but my order would probably be Complexity > Duplication > State > Coupling. In nearly all the refactorings I've done, reducing complexity and duplication will tend to automatically take care of the other concerns.

100% with State. In my personal experience, keeping track of all the states takes O(n^2) of my brain power, and the same order of magnitude of tests to cover them.

Excellent quote! Saw this comment a couple years ago, agreed strongly, and been looking for it since — thanks for resurfacing.

When referring to "state" are we talking about just mutable state? Or are we talking about both mutable and immutable state as being equally undesirable? Because in my experience immutable state is fine, and often desirable, whereas mutable state is almost always toxic. I could be convinced otherwise, for sure, but it might be worthwhile to make the distinction between the two.

All state is a problem; it's something you need to keep in mind when analyzing code, because it may be used in a given computation. You need to keep it all in your head in order to comprehend what is happening.

Local state here is better than global state, especially if you consider the advice to write shorter functions - if your functions are small, so are the scopes in them, and the local state is easy to trace and memorize. Global state is not bounded, there could be hundreds of constants, enums and global objects to keep in mind.

Immutable data structures are easier to comprehend than mutable ones, because there are fewer points where state is modified. If you take Redux as an example, you still need to know what is in the "immutable store" at any point in order to understand how the code uses it; Redux tries to minimize the pain by limiting changes to the store to actions/reducers and by giving you access to only part of the store (local state vs. global). However, you still need to understand what changes a sequence of actions performs on the store, so that's still state you need to be concerned with.
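As a rough illustration of the reducer idea, here is a Python analogy; this is not the Redux API, and the action names are invented. The store never changes in place, yet you still have to replay the actions in your head to know what it contains.

```python
from functools import reduce


def reducer(state, action):
    """Return a new state; the old one is left untouched."""
    if action["type"] == "add_todo":
        return {**state, "todos": state["todos"] + (action["text"],)}
    if action["type"] == "clear":
        return {**state, "todos": ()}
    return state  # unknown actions leave the state as it was


initial = {"todos": ()}
actions = [
    {"type": "add_todo", "text": "old item"},
    {"type": "clear"},
    {"type": "add_todo", "text": "write tests"},
    {"type": "add_todo", "text": "refactor"},
]

# The final store is the fold of the reducer over the action sequence.
final = reduce(reducer, actions, initial)
```

Nothing was mutated, but knowing that `final` holds two todos and not three still requires tracking the whole sequence, which is the point being made above.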

Not the OP, but as someone who has also found it to ring true, ideally the less state the better, immutable or mutable. But, if I had to choose which kind of state I’m managing, I’ll take a bunch of immutable structures over a few mutable ones any day.

We all know the issues that arise out of mutable state: values getting changed for seemingly no reason, race conditions literally sapping every little bit of will to live you had. Mutable state doesn’t scale well (at least from a complexity standpoint). Now you’ve gotta worry about locks, and all that fun stuff, if you try to do any kind of non-trivial concurrent programming.

Now, I’m not saying immutable data structures are literally the silver bullet, but they do almost completely solve all the above-mentioned issues with mutable state. They have their own issues, too. Working with immutable structures can be significantly slower, especially as the amount of state grows: any modification means you have to create a new structure and copy data, so you’re also going to be using a lot more memory. And that’s not to mention the conceptual differences you have to adjust to if you’ve never worked with strictly immutable structures before (“what do you mean I can’t just update this array index with arr[i] = 2?”). But, in my experience, debugging can be orders of magnitude easier, and concurrency is something that is actually enjoyable to work with rather than a chore of mutex checking and hoping some random thread isn’t going to fuck up all the data. Given the power of modern computers, the memory bloat that comes along with immutability isn’t really an issue anymore.
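For anyone who hasn't hit the "arr[i] = 2" wall yet, a small sketch with Python tuples; the helper name `set_index` is invented for illustration. The update builds a new tuple and copies the untouched elements, which is exactly the extra allocation mentioned above.

```python
arr = (1, 5, 3)  # tuples are immutable: arr[1] = 2 raises TypeError


def set_index(items: tuple, i: int, value) -> tuple:
    """Return a copy of items with position i replaced by value."""
    return items[:i] + (value,) + items[i + 1:]


updated = set_index(arr, 1, 2)
```

Anyone still holding `arr` keeps seeing the old values, so no other thread can be surprised by the change.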

But, I’m also one of those people that thinks functional programming is the one true path, so I may be a bit biased/misinformed on some crazy mutability benefits that make the bullshit worth it.

I heard that since the future is more and more cores, instead of faster ones, functional programming is going to become increasingly necessary.

Oh definitely. While I don’t think it will ever evolve into everyone using Haskell, we’re definitely going to keep seeing more and more functional concepts creep into all programming languages. Hell, even the king of OOP, Java, is breaking down and adding some functional things, last I heard: finally getting lambdas (I think), right? And I imagine that was much to the outcry of thousands of “enterprise” developers. What’s going to happen when big bad functional programming comes to town and shuts down all their abstract factories?! Think of their child classes!

But I digress, and honestly I think the answer is a nice balance between concepts, using what works best for the task at hand. However, I’m super excited for the future of functional languages. I love Elixir/Erlang: once you get the hang of OTP, the actor model and functional programming, it’s absolutely mind-blowing how easy it is to write highly concurrent programs that are super reliable and easy to restart after failures (“just let it die” is what they say, right?). Nothing like the headaches I experienced when I first learned concurrency with pthreads in C++. What’s exciting is that the Erlang VM is legendary for being able to support highly reliable, highly concurrent, complex software; however, one of its biggest dings was that it’s far slower than something like C when it comes to raw computation. This is largely because of its functional nature: since its data structures are immutable, any modification results in having to make new copies of data, while C could just do it in place. However, now that raw computing power is becoming cheaper and faster, this is becoming much less of an issue. And the Erlang VM has things like clustering servers to run processes across several computers built right in. I don’t want to imagine what it’d be like to have to set that up with our old friend C out of the box (but C also doesn’t have the overhead of things like a VM or a garbage collector, so it’s not like it doesn’t have a ton of advantages over Erlang; I just wouldn’t want to use it to build concurrent software across multiple servers).

Java had lambdas since 8 (we're on 11 now?). They are a great addition to the language and streams (essentially FP) are amazing.

Also, my FactoryFactoryProviders are alive and well :)

Jose Valim goes into this when explaining why he created Elixir: https://news.ycombinator.com/item?id=17513812

Immutability is a property of a data structure. It can help prevent some unexpected errors, mainly accidental side effects, but make no mistake, you can still use immutable structures to create bad and stateful code.

Think of a function where everything is immutable, but which is full of if/switch statements and complicated branching behavior. Even if it is deterministic, it will become intractable to reason about once it reaches a certain scale.

> When referring to "state" are we talking about just mutable state?

I don't think you should restrict yourself to thinking only about mutability and immutability in your program, but also that of the entire system. If your program is completely self-contained, that's good, but often they need to integrate with outside services and communicate over the network and write data to disk etc. Those dependencies also result in state that might affect the behaviour of your software and you need to consider it when designing and writing code.

I disagree, simpler code can be better if the library is well known. Otherwise we would never be using utility libraries. Though yes, coupling indiscriminately is problematic

I don't fully understand the state part. Could someone provide an example of what the OP is talking about? thanks!

Say you write some software that manages a shopping cart.

a) You can "store" (even if it's in-memory) just the products and their quantities. Then each time you need to display the cart you go and compute the corresponding prices, taxes, discounts, whatever.

b) You can store each cart line, whether it has discount(s) or not, as well as its taxes and the cart's global taxes and discounts and whatever else you can imagine.

Option "b)" is probably more efficient (you are not constantly recomputing stuff) but you will be better off in the long term by going with option "a)":

- Your cart management and your discount/tax computation are less coupled now (the cart doesn't really need to know anything about them)

- You have fewer opportunities for miscalculation because everything is in one "logical flow" (computeDiscounts()/computeTaxes()) instead of being scattered (some stuff is computed when you add an item or when you remove it, or when you change the quantity, or when you specify your location, etc.). The code will most probably just be simpler with option "a)".

The article argues that you should sacrifice the performance in cases like this. I agree.
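A minimal sketch of option "a)", in Python. The catalogue, the flat tax rate and the discount rule are all invented for illustration; the point is only that the cart stores products and quantities, and everything else is recomputed in one flow.

```python
from decimal import Decimal

# Hypothetical catalogue and rules, not taken from any real shop.
PRICES = {"apple": Decimal("0.50"), "book": Decimal("12.00")}
TAX_RATE = Decimal("0.10")


def compute_subtotal(cart):
    return sum((PRICES[sku] * qty for sku, qty in cart.items()), Decimal("0"))


def compute_discounts(subtotal):
    # Made-up rule: 10% off once the subtotal reaches 20.
    if subtotal >= Decimal("20"):
        return subtotal * Decimal("0.10")
    return Decimal("0")


def compute_taxes(amount):
    return amount * TAX_RATE


def cart_total(cart):
    # One logical flow: subtotal, then discounts, then taxes.
    subtotal = compute_subtotal(cart)
    discounted = subtotal - compute_discounts(subtotal)
    return discounted + compute_taxes(discounted)


# The only stored state: product -> quantity.
cart = {"apple": 4, "book": 2}
total = cart_total(cart)
```

Changing a quantity is just `cart["apple"] = 5`; no stored line totals, discounts or taxes can go stale because none are stored.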

Hah, I get where you’re going with this example, but shopping carts in particular do want to keep the line in the cart as “local state”, because the desired behaviour is that once a customer has added something to his cart, that is what he pays within a reasonable time limit, even if the price fluctuates in the meantime. So probably not the best of examples.

Anyway, I myself wholeheartedly agree with the idea of minimizing state.

Yes, it is annoying when prices change in your shopping cart at checkout time. That has happened to me more than once after keeping items there past a store's midnight.

Well, more state in code usually makes it more difficult to do things like run the code concurrently. You have to worry about managing data races when there is a lot of shared state, whereas in stateless code no complex managing is needed

Although this is true of stateful code, I think an even more fundamental, but related, reason to reduce state is this: code that is stateless always behaves the same way so it can be characterized and reused more easily than code that changes behavior depending on the state. This is the reason it is good for concurrent programming, but it also means it has a more concrete/consistent nature.
