Data, objects, and how we're railroaded into poor design (2018)

Timwi · 2025-08-21T12:07:48 1755778068

No mention of C#, which of all the languages I've seen makes the best distinction for what he wants: structs/“value types” = data, classes/“reference types” = objects. He briefly mentions that Java is in the process of adding that; C# has had it since version 1. Now, 15 versions later, C# has acquired all the modern features from all paradigms of programming, including type inference, record types, and an incredibly full-featured pattern matching system. Java is laughably lagging behind.

He also praises async/await as a revolutionary leap forward for programming languages, but without a mention of the language that invented/pioneered it (that's right, it's C#, in version 5).

Now, I'm not going to sit here and claim that C# is perfect for all purposes. It has made mistakes which have to live on due to backwards compatibility. The mutability of structs is, in my opinion, one such mistake. But even so, of all programming languages I've seen, C# fits this author’s ideas like hand in glove.

Edit: the scenario he describes of putting a large array or other static data structure directly into the code is also better supported by modern C# compilers. For small structures you still get the element-by-element-adding bytecode, but there's also a serialization format that can bake large objects directly into the assembly.

gf000 · 2025-08-21T12:15:37 1755778537

I do agree that the article could be summarized as "value types vs reference types, a language should have both".

But your idea of Java is pretty dated - it has type inference, full algebraic types (records and sealed classes), pattern matching (switch expression - though many of its more advanced features are TBD/experimental yet).

And then Java has virtual threads, which C# suddenly wants to add as well, that language is definitely on the same road as C++, and this "adding every conceivable feature" is not sustainable, it will crumble under its own way. Java is much more conscious of it, which I greatly value.

Timwi · 2025-08-21T13:17:18 1755782238

> it has type inference, full algebraic types [...]

For the record, I didn't say Java doesn't have those. The only thing I said it doesn't have is value types (and in another post, true generics). I'm aware that it has a couple features where C# is the one that's lagging behind, but on the whole, it's really no competition. I'm also aware that stuffing a language full of features willy-nilly is unwise and I'm curious where this will take C#.

arethuza · 2025-08-21T12:53:17 1755780797

It actually looks like they have decided not to add virtual threads to .Net:

https://github.com/dotnet/runtimelab/issues/2398

SkiFire13 · 2025-08-21T13:32:08 1755783128

> had it since version 1. Now, 15 versions later, C# has acquired all the modern features from all paradigms of programming, including [...] an incredibly full-featured pattern matching system. Java is laughably lagging behind.

Did C# finally implement some kind of discriminated unions with exhaustive pattern matching? Last time I checked not even plain `enum`s supported exhaustive pattern matching. For comparison, even Java supports discriminated unions in the shape of sealed classes/interfaces with exhaustive pattern matching since Java 21

stonemetal12 · 2025-08-21T13:58:39 1755784719

Sort of. They are a part of C#14 and .net10. Which officially releases in Nov, but has been out as preview for a while now.

SkiFire13 · 2025-08-21T14:13:28 1755785608

Maybe I'm missing something, but it doesn't look like C# 14 will include them nor release in November https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/cs...

valiant55 · 2025-08-21T21:31:44 1755811904

They might make it into v15 next year according to the latest presentation from Mads. They have the design hammered out just needs to be implemented. It sounds like it should be in an early preview version to let the other teams start to build features asap.

2d8a875f-39a2-4 · 2025-08-21T09:33:40 1755768820

Maybe it's just me but I find the complaint confusing and the suggested remedy absent in TFA, despite reading it twice.

Data comes from outside your application code. Your algorithms operate on the data. A complaint like "There isn’t (yet?) a format for just any kind of data in .class files" is bizarre. Maybe my problem is with his hijacking of the terms 'data' and 'object' to mean specific types of data structures that he wants to discuss.

"There is no sensible way to represent tree-like data in that [RDBMS] environment" - there is endless literature covering storing data structures in relational schemas. The complaint seems to be to just be "it's complicated".

Calling a JSON payload "actual data" but a SOAP payload somehow not is odd. Again the complaint seems to be "SOAP is hard because schemas and ws-security".

Statements like "I don’t think we have any actually good programming languages" don't lend much credibility and are the sort of thing I last heard in first year programming labs.

I'm very much about "Smart data structures and dumb code works a lot better than the other way around" and I think the author is starting there too, but I guess he's just gone off in a different direction to me.

RaftPeople · 2025-08-21T16:51:35 1755795095

> "There is no sensible way to represent tree-like data in that [RDBMS] environment" - there is endless literature covering storing data structures in relational schemas. The complaint seems to be to just be "it's complicated".

Ya, this one really confused me. Tree-like data is very easy to model in an RDBMS in the same way it is in memory, with parent+child node pointers/links/keys.

It's possible he was thinking of one of these challenges:

1-SQL doesn't easily handle the tree structure. You can make it work but it's square peg round hole.

2-He mentioned JSON so maybe he was really thinking in terms of different types of data for each node. Meaning one node might have some text data and the next node might have a list or dictionary of values. That is tougher in RDBMS.

arethuza · 2025-08-21T10:44:52 1755773092

My main complaint with SOAP was how leaky an abstraction it inevitably is/was - it can look seductively easy to use but them something goes wrong or you tried to use it across different tech stacks and then debugging would become a nightmare.

When I first encountered RESTful web services using JSON the ability to easily invoke them using curl was such a relief... (and yes, like lots of people, I went through a phase about being dogmatic about what REST actually is, HATEOAS and all the rest - but I've got over that years ago).

NB I also am puzzled as to the definition of "data" used in the article.

2d8a875f-39a2-4 · 2025-08-21T11:28:32 1755775712

Sure, SOAP was often awful, I agree with that. But I can't see any angle where one can credibly assert that a SOAP XML payload isn't equivalent to a REST JSON payload in terms of the operation of a receiving application. Both are a chunk of structured information, your application parses it and operates on the resulting data structures.

fauigerzigerk · 2025-08-21T12:55:40 1755780940

>But I can't see any angle where one can credibly assert that a SOAP XML payload isn't equivalent to a REST JSON payload in terms of the operation of a receiving application.

I guess the angle is that there was a style of SOAP where the payload was interpreted as a remote procedure call conforming to an interface described in a WSDL document.

So there would be a SOAP library or server infrastructure (BizTalk) on the receiving end that would decode the message and turn it into a function call in some programming language.

In that sense, the payload would be "data" only for the infrastructure code but not on the application level.

And now I'm going to have to spend the rest of the day trying to forget this again :)

arethuza · 2025-08-21T11:30:52 1755775852

Sorry about getting sidetracked there by my SOAP rant - I completely agree with your point.

imtringued · 2025-08-21T12:03:13 1755777793

SOAP is the 9th circle of hell.

Most implementations don't retrieve parameters by tag, they retrieve parameters by order, even if that means the tags don't match at all. This is completely unlike JSON.

Also, almost nobody uses REST, so stop calling things REST, when you are talking about HTTP APIs.

BlackFly · 2025-08-21T10:52:54 1755773574

Yeah, a leaky abstraction with abstraction inversion on top of it! So within the actual payload there was a method identifier so you had sub-resource routing on top of the URL routing just so you could have middleware handling things on the payload instead of in the headers... So you had an application protocol (SOAP) on top of an application protocol (HTTP).

imtringued · 2025-08-21T12:06:11 1755777971

>When I first encountered RESTful web services using JSON the ability to easily invoke them using curl was such a relief... (and yes, like lots of people, I went through a phase about being dogmatic about what REST actually is, HATEOAS and all the rest - but I've got over that years ago).

I'm not dogmatic about this. People don't understand what REST is. REST is some completely useless technology that absolutely nobody needs. Using the right words for things isn't dogmatism, it's being truthful. The backlash from most people comes from Fielding's superiority complex where he presents REST as superior, when it is merely different, different in ways that aren't actually useful to most people, and yet people constantly give this guy a platform by referring to his obsolete architecture, earning them the "well actually"s they deserve.

alphazard · 2025-08-21T12:50:09 1755780609

Part of the mismatch here is that people discover programming languages as tools for thinking, maybe better than any other thinking tool they've ever encountered, and with the unique ability to verify one's thinking. Then they want to do all their thinking with this tool, including all of their design.

IME this leads you to consider more and more featureful languages which are worse and worse at actually building software, and never match the flexibility of a tool like pen and paper.

Poor design is your own fault. Write more, draw more, prototype more. You may need to develop your own notation, you may need to get better at drawing or invest in a drawing tool. You may need to learn another programming language, which you use only for prototyping.

The design of your system does not need to be perfectly represented in your source code. The source code needs to produce runnable machine code, which behaves in the ways that your design dictates, and that's the only link between the design, the code, and the running system. Programming languages today are pretty good at producing working software, but not very good at doing that and designing systems and communicating designs and documenting choices, etc.

kmac_ · 2025-08-21T13:42:42 1755783762

Well said. Languages are for implementation, not design. There isn't a single language that connects both. Also, every popular language has multiple patterns that communicate code intentions. We can add to that conventions added by frameworks, companies, and even teams.

mcdeltat · 2025-08-21T07:44:26 1755762266

IMO the nice thing about Erlang and Elixir is their foundation of representing data is rock solid. Because data is fully immutable, you get a lot of nice things "for free" (no shared state, reliable serialisation, etc). And then on top of that you can add your interfacey, mutable-ish design with processes, if you want. But you will never have oddities or edge cases with the underlying data.

In contrast with languages like C++ and Java where things are shakey from the ground up. If you can't get an integer type right (looking at you, boxing, or you, implicit type conversions), the rest of the language will always be compensating. It's another layer of annoyances to deal with. You'll be having a nice day coding and then be forced to remember that int is different to Integer and have to change your design for no good reason.

Perhaps you disagree with Erlang's approach, but at least it's solid and thought-out. I'd take that over the C++ or Java mess in most cases.

jasode · 2025-08-21T09:03:08 1755766988

>IMO the nice thing about Erlang and Elixir is their foundation of representing data is rock solid. Because data is fully immutable, you get a lot of nice things "for free" (no shared state, reliable serialisation, etc). [...] >In contrast with languages like C++ and Java where things are shakey from the ground up.

Yes, immutable does provide some guarantees for "free" to prevent some types of bugs but offering it also has "costs". It's a tradeoff.

Mutating in place is useful for highest performance. E.g. C/C++, assembler language "MOV" instruction, etc. That's why performance critical loops in high-speed stock trading, video games, machine learning backpropagation, etc all depend on mutating variables in place.

That is a good justification for why Erlang BEAM itself is written in C Language with loops that mutate variables everywhere. E.g.: https://github.com/erlang/otp/blob/master/erts/emulator/beam...

There's no need to re-write BEAM in an immutable language.

Mutable data helps performance but it also has "costs" with unwanted bugs from data races in multi-threaded programs, etc. Mutable design has tradeoffs like immutable has tradeoffs.

One can "optimize" immutable data structures to reduce the performance penalty with behind-the-scenes data-sharing, etc. (Oft-cited book: https://www.amazon.com/Purely-Functional-Data-Structures-Oka...)

But those optimizations will still not match the absolute highest ceiling of performance with C/C++/asm mutation if the fastest speed with the least amount of cpu is what you need.

gf000 · 2025-08-21T12:41:53 1755780113

I think getting "value" vs "objects" right is paramount: this in itself is only a semantic difference yet, but that will allow for a lot of compiler optimizations down the line.

E.g. the value 5 can't be changed, neither in Haskell, neither in C. A place that stores that value can, which makes the place an "object".

Mutability fundamentally means identity (see Guy Steele), so I think this is a fundamental distinction.

As for immutable data, sure, on single threaded code they do carry some overhead, but this may also translate to higher performance code in a multi-core setting due to better algorithms (lockless, or finer grade locking).

Timwi · 2025-08-21T12:18:12 1755778692

> int is different to Integer

Having separate types for these is the problem; not the boxing. C#/IL/CLR handles boxing in a way that doesn't exhibit the problem. If your code is dealing with integers, they are never boxed. They are only boxed when you cast to a reference type such as object. As soon as you cast back to int, you are unboxing.

Java exhibits the problem in a big way because it doesn't have true generics, so you can't have (say) a list of integers or a dictionary with integer values, so they must always be boxed, so you need a separate “boxed integer” type to maintain type safety. In C# you can just use unboxed integers everywhere.

Towaway69 · 2025-08-21T10:39:44 1755772784

> Because data is fully immutable, you get a lot of nice things "for free"

This. I discovered this by implementing Flow Based Programming[1] (FBP) in Erlang[2] - the FBP is best suited to languages that are message based and immutable data structures. When using a mutable language, messages are constantly being clone as they are passed around. This does not need to be done using Erlang/BEAM.

My feeling is that FBP is a much under-rated programming paradigm and something really worth exploring as an alternative to the current programming paradigms.

My take on FBP is that data is being passed to functionality, as opposed to function being passed to data - the functional programming paradigm, or data and function is passed around as is the case with OOP.

IMHO it makes sense to pass data to functions, particularly in the times of SaaS and cloud computing.

[1] = https://jpaulm.github.io/fbp/index.html

[2] = https://github.com/gorenje/erlang-red

brabel · 2025-08-21T09:50:24 1755769824

That's also Clojure's approach. It's very nice to program with immutable data at a high level, but for certain things, you just need to use the computer's primitives as they actually are, with all the mess that entails. So, I would say we need languages like C++ and Java (but perhaps we should all be using Rust, which makes the mess much more manageable, despite bringing in a lot of complexity that programmers need to wrap their head around, by for example, making it really easy to represent data, and defaulting to immutability) even if it would be "nice" to avoid them where possible.

Erlang and Elixir (and Clojure), however, lack a static type system, which makes it really difficult to use them at large scale (I am happy if you can provide convincing evidence to the contrary - I just haven't seen any). There's Gleam, which is a beautiful, simple language, that has a very good type system, but unfortunately, it's a bit too simple and makes certain things harder, like serialization (e.g. https://hexdocs.pm/gleam_codec/).

Haskell and Ocaml are more usable, but for some reason are extremely niche. I don't think there's any popular language that's in the "statically typed, functional" school, which I think shows that humans just don't prefer using them (they have been hyped for decades now, yet they just never stick). Perhaps a new generation of these languages will change things, like Unison for example (which stays away from Monads but provide an arguably superior abstraction, Abilities - also known as Effects). I think I would love for that to happen, though as I said before: sometimes you still need to reach out for bare metal performance and you have to use Rust/C++/D or even Java.

Timwi · 2025-08-21T12:22:43 1755778963

A lot of functional programming patterns have made their way into newer versions of C#. That obviously doesn't make it a functional language now, but it is also no longer just obviously procedural.

gf000 · 2025-08-21T12:44:29 1755780269

Scala might be one of the earliest languages that set out to combine FP with (mutable) OOP, so it probably worth a mention here.

cbsmith · 2025-08-21T08:22:06 1755764526

I didn't like how this essay misunderstood the design principles in Java's class files. With dynamic binding to the runtime, you can't know for certain the layout of a data structure in memory (e.g. what is the ideal memory alignment?). If the class file is untrusted, you can't even be sure you have a valid data structure in the first place. So allocating the array and then assigning elements one at a time is what you do.

imtringued · 2025-08-21T12:10:01 1755778201

It also appears to be a nonsensical complaint in general. Why abuse .class files for data storage when you can choose any data format you like?

I've only seen this pattern used in contexts, where there is no filesystem to begin with.

Timwi · 2025-08-21T12:27:54 1755779274

> Why abuse .class files for data storage

The only alternative is to have the data in a separate file, which needs to be available, read in, and parsed. .NET/CLR provides a mechanism to bake large objects into the assembly and I don't see that as abusive. It's way more convenient when you can treat the object as being “just there”.

gf000 · 2025-08-21T13:39:46 1755783586

Well, jars can have resource files included - I would say this is a more-than-solved problem.

nopurpose · 2025-08-21T08:37:23 1755765443

So much written about relation between objects and data, but not a single mention of Lisp and derivatives?

mort96 · 2025-08-21T08:59:27 1755766767

An excellent opportunity for you to elaborate on the connection, since I'm not seeing it.

Timwi · 2025-08-21T12:29:16 1755779356

Same. Lisp’s selling point is that “code is data” — not objects.

lmm · 2025-08-21T07:14:13 1755760453

I've had thoughts along this line for a while. I think Scala does better than this article gives credit for; case classes are a significant step in the right direction, particularly post-Scala 3 (or with Shapeless in Scala 2) where you have many tools available to treat them as records, and you can distinguish practically between case classes (values) and objects with identity even if in theory they're only syntax sugar. It also offers an Erlang-style actor system if you want one.

In my dream language I'd push this further; case classes should not offer any object identity APIs (essentially the whole of java.lang.Object) and not be allowed to contain non-value classes or mutable fields, and maybe objects should be a bit more decoupled from their state. But for now I wouldn't let perfect be the enemy of good.

arethuza · 2025-08-21T08:35:16 1755765316

"objects should be a bit more decoupled from their state"

Do you mean allowing the "class" of an object to be changed - CLOS can do that. Mind you it's a long time since I wrote any code using CLOS and even then I'm pretty sure I never used change-class.

lmm · 2025-08-21T08:54:51 1755766491

The thing I'm envisioning is something akin to typeclass instances / trait impls, but specialised to the case where you have a service with identity rather than being for general function implementation. Just making the bridge between the "bag of state" piece and the "interface implementation accessible via a name/reference" piece a bit more of a first-class citizen.

etbebl · 2025-08-21T18:16:11 1755800171

The author dismisses C++ out of hand, but I really think it does a pretty good job of making this a non-issue. Want a structured value type? Sure, that's a struct with public fields by default, passed by value with automatic copy constructor and assignment functions. Want a mutable type that's encapsulated and needs to do something special to be cloned? Sure, that's a class passed by unique_pointer or reference, with non-default (or deleted) copy constructor and assignment functions and private fields by default.

Every language I've used since then feels like it makes this issue needlessly complicated and implicit.

jact · 2025-08-21T15:33:58 1755790438

OCaml’s object system is unfairly maligned by its users. It’s unfortunate because object types allow for some of the most useful kinds of polymorphism and people don’t reach for them often enough.

However, OCaml’s first class modules are frequently a useful and serviceable alternative. I think they’re a precise middle ground between what we here call “objects” and “data.”

chuzz · 2025-08-21T07:09:09 1755760149

Good post, for what is worth Java is slowly and painfully correcting course with features like records and project Valhalla. As with the other language improvements though we will have to live with the tech debt for decades to come…

Timwi · 2025-08-21T12:32:15 1755779535

While C# certainly has tech debt too, the specific feature the author mentions (value types) has been there since version 1. C# pioneered async/await in version 5, and has a plethora of functional features like generics, type inference and pattern matching now. Java is lagging behind like a wounded cow...

high_na_euv · 2025-08-21T08:20:07 1755764407

I'd say majority of programming languages struggle with elegant, robust and exhaustive check error handling

dzonga · 2025-08-21T10:02:20 1755770540

clojure + lisps have this approach - everything is data. I encourage everyone to dabble in clojure. it changes your mindset. that when you go back to your regular language you will write better programs.

mananaysiempre · 2025-08-21T10:19:09 1755771549

> Extensibility

> [Data] The schema gives us a fixed set of variants, over which you can of course write any function you want.

> [Objects] We have a fixed set of exposed operations, but different variants can be constructed (including an improved ability to evolve a variant without impacting clients).

No mention of the expression problem? The TL;DR is, sometimes[1] we want both. And sometimes[2] it’s an exceptionally good idea for a lot of slightly different sets of variants to coexist withn the same program, which there also isn’t really a satisfactory solution for.

[1] https://www.craftinginterpreters.stuffwithstuff.com/represen...

[2] https://nanopass.org/

emorning4 · 2025-08-21T14:24:27 1755786267

Why is the most relevant comment at the bottom of the page?

The expression problem absolutely captures the nature of the issue and a lot has been written about it.

danieltanfh95 · 2025-08-21T08:21:22 1755764482

clojure exists.