
Classes vs. Data Structures - Anchor
http://blog.cleancoder.com/uncle-bob/2019/06/16/ObjectsAndDataStructures.html
======
DeathArrow
People are starting to use data oriented design instead of OOP. Data oriented
design doesn't hide state, is generally faster, and is easier to comprehend
because it doesn't abstract too much.

[https://www.youtube.com/watch?v=QM1iUe6IofM](https://www.youtube.com/watch?v=QM1iUe6IofM)
[https://www.youtube.com/watch?v=yy8jQgmhbAU](https://www.youtube.com/watch?v=yy8jQgmhbAU)
[https://www.youtube.com/watch?v=rX0ItVEVjHc](https://www.youtube.com/watch?v=rX0ItVEVjHc)

~~~
danmaz74
Data oriented design makes sense in video games, where performance is very
important, but in most business applications, having good abstractions which
are flexible and easily maintainable is much more important than optimising
for cache usage.

~~~
eska
I keep hearing this from OOP proponents, but I just don't find it to be true
in my experience. Programs written with DOD in mind have very clear data flow
and only pass on and use data that is relevant. Programs written with OOP in
mind primarily care about some notion of beautiful code and abstractions,
which I find to be highly subjective. As a result they generally have very
muddy data flow where e.g. unrelated data is passed around that isn't even
required to implement a feature. This creates all kinds of poor
modularization, dependency hell, huge monoliths, difficult testing (mocking,
fakes, etc...), among many other problems. Whenever I have had to rewrite
large parts of a program, I have always found it to be easier to do this in a
DOD program rather than an OOP program. The _biggest_ reason why DOD is used
in video game programming to begin with is flexibility in mixing and matching
functionality of game objects (entity-component-systems etc).

------
Jach
I guess this applies for Java and C++ style "classes". This does not precisely
apply to the first ANSI-standardized OOP system, Common Lisp's. Standard
classes do not own methods, instead methods are specializations of a generic
function that stands alone and dispatches on the class types (or EQL values)
of all its arguments.
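For readers more used to mainstream languages, the flavor of a standalone generic function can be loosely sketched in Python with `functools.singledispatch`. This is only a rough analogy: unlike CLOS generic functions, it dispatches on the first argument alone, and the `Square`/`Circle` classes here are invented for illustration.

```python
import math
from functools import singledispatch

# Plain classes: they carry data but do not own the generic function.
class Square:
    def __init__(self, side):
        self.side = side

class Circle:
    def __init__(self, radius):
        self.radius = radius

# The generic function stands alone; specializations are registered onto it.
@singledispatch
def area(shape):
    raise NotImplementedError(f"no area method for {type(shape).__name__}")

@area.register
def _(shape: Square):
    return shape.side ** 2

@area.register
def _(shape: Circle):
    return math.pi * shape.radius ** 2
```

New specializations can be registered from any module, without touching the class definitions.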

I'd really like it if Uncle Bob eventually has his fill of Clojure and moves
on to explore what Common Lisp built decades earlier, then blogs about that
too.

~~~
dreamcompiler
Came here to say exactly this and you already did, so thanks. It's amazingly
liberating to use a language where generic functions are first-class, and
classes don't own any methods. Once you've written code this way, the other
way seems backward and restrictive.

~~~
StefanKarpinski
Spot on. Multiple dispatch avoids the whole issue because methods are external
and don't live inside of classes. Lisps, of course, support multimethods, which
is great. There are some downsides, though. They are opt-in (defmethod) and
tend to have a significant performance hit associated with them. Someone needs
to anticipate your need to add types and/or functions _and_ think it's worth
sacrificing performance for that ability.

Julia builds on this tradition but allows you to have your cake and eat it
too. It has multimethods/generic functions _and_ they are the only option—all
user-defined functions are multimethods. They also have excellent performance
(they're used for everything, so they have to).

Of course, there's no free lunch and you do give up traditional separate
compilation, but the degree of composability it gives to the ecosystem is hard
to comprehend without experiencing it. Simple, reusable data types are shared
across the ecosystem, with anyone adding whatever (external) methods they want.
Generic code that handles a literally exponential explosion of argument types
"just works"—and the compiler generates fast code. All without doing anything
special, since multiple dispatch is the default and only way functions work.

~~~
MaxBarraclough
> Lisps, of course support multimethods, which is great. There are some down
> sides, though. They are opt-in (defmethod) and tend to have a significant
> performance hit

Worse than faking it in (say) C++ using the visitor pattern?

------
hudon
The claim that "an object is a set of functions that operate on implied data
elements" has a strange corollary, because in modern OO languages like Java
or C# there is no syntax or popular naming convention to tell the difference
between a data structure and an Object. For example, in Java, a
LinkedList object is actually not a linked list, it is a set of functions that
operate on an implied linked list. If the system needed direct access to the
data for whatever reason, we'd need to explicitly have a LinkedList data
structure object that only contained the data (the values and their pointers),
as well as a second class, the LinkedListOperator, that contains all the
functions (add, first, etc.). Likewise, in the author's examples, there'd be a
Square class and a SquareOperator class.
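That hypothetical split can be sketched in Python; the names `SquareData` and `SquareOperator` are illustrative (following the comment's naming), not from any real library.

```python
from dataclasses import dataclass

# The data structure: only the visible data, no behaviour.
@dataclass
class SquareData:
    side: float

# The "operator" class: a set of functions over that data.
class SquareOperator:
    @staticmethod
    def area(sq: SquareData) -> float:
        return sq.side ** 2

    @staticmethod
    def perimeter(sq: SquareData) -> float:
        return 4 * sq.side
```

Other parts of the system (persistence, rendering, ...) can then take `SquareData` directly without going through the operator.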

I was going to say that Haskell addresses this by putting values in data types
and behaviors in "type classes", but then I remembered that functions can be
values... which is now making me think that the reality is probably more
abstract or complex than the author here is letting on.

~~~
danmaz74
In a purely oop language, data structures exist as an implementation detail,
but you can never access them directly (as in, bypassing the object
interface). That's by design.

~~~
hudon
I get that in your application, you may want to keep a linked list behind its
interface 90% of the time. However, considering your system as a whole, at
some point you may want to take that linked list data and write it to a
database, in which case the cleanest thing is to bypass the interface and
extract the "data structure object" so to speak and deal with it in a
database-related object, rather than encumbering your LinkedList object with
database behaviors.

~~~
jayd16
This is a violation of OOP. Instead, consider methods that produce and consume
a serialized representation of the data.
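One way to sketch that idea in Python: instead of pickling internals, the object explicitly produces and consumes a denormalized representation that need not mirror its internal layout. Everything here (`Counter`, `to_records`) is a made-up illustration.

```python
class Counter:
    """Encapsulated object; internals stay private."""

    def __init__(self):
        self._counts = {}  # implementation detail, free to change

    def add(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1

    def count(self, key):
        return self._counts.get(key, 0)

    # Explicit, denormalized representation for transfer/storage.
    def to_records(self):
        return [{"key": k, "count": n}
                for k, n in sorted(self._counts.items())]

    @classmethod
    def from_records(cls, records):
        c = cls()
        for r in records:
            c._counts[r["key"]] = r["count"]
        return c
```

Because the record format, not the internal dict, is the contract, the implementation could later switch to a trie or a database cursor without breaking consumers.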

Things like Java serialization and Python pickle attempt to do what you say
and are considered failures (or at least security risks) because they allow a
third party to act on object implementation internals.

Security aside, a denormalized representation of data could be different from
the implementation-specific representation that's encapsulated inside an
object.

~~~
Silhouette
But this is partly a self-made problem, because in this OOP model you have
decided that the internal representation of your data is to be hidden and
therefore the data is only accessible via the provided interface.

In practice, it is debatable how often that is helpful when you're
implementing generic data structures. An alternative is to specify the
representation explicitly and provide a set of functions designed to work with
it, but also to allow direct access by other functions when that is useful.

You can still build a layer of more abstract interfaces on top and write more
generic algorithms in terms of those interfaces rather than any specific
concrete representation, as for example Haskell's typeclass system does.

~~~
jayd16
> An alternative is to specify the representation explicitly and provide a set
> of functions designed to work with it, but also to allow direct access by
> other functions when that is useful.

I don't see that as an alternative. I consider this natural OOP. You don't
lose anything by strictly enforcing the encapsulation because you can always
explicitly provide low level methods into the data structures. Conversely, you
lose all safety when you open up encapsulation. The method API of an object is
the contract it provides. If you go around that contract it's much harder to
make safe implementation changes.

Because of this, low level access should be opt in, in the way you describe.

~~~
Silhouette
_You don't lose anything by strictly enforcing the encapsulation because you
can always explicitly provide low level methods into the data structures._

But then you're not really gaining anything either, unless perhaps you have
some mechanism to enforce that the low-level access should only be used in
specific circumstances when it is deliberately intended. It's like writing
classes that have a few data members, but then writing direct get and set
accessors for each of them anyway. There's no more complicated invariant that
you're enforcing at that point, and without any non-trivial invariants to be
enforced, the whole argument for encapsulation and data hiding becomes moot.

 _Conversely, you lose all safety when you open up encapsulation. The method
API of an object is the contract it provides._

Do you always need that safety, though? If your data is defined to be stored
in a certain representation, and direct access to that underlying
representation in that format is available if needed, isn't that
representation now just another part of the contract? Again, you're balancing
two competing priorities: is there some invariant to be enforced that is
sufficiently complicated for data hiding to be a useful safeguard, and is it
useful to interact efficiently with, or make safe assumptions about
performance based on, the true data representation?

Which one is the more important consideration must surely depend on how
complicated your representation and any related invariants are. Being able to
pattern match against values of some relatively simple algebraic data type can
be very useful, for example. On the other hand, it's not much fun to
accidentally corrupt the super-efficient look-up structure at the heart of
your whole system that now uses a complicated set of hash tables and the odd
Bloom filter internally after someone spent three weeks optimising it for a
50% speed boost.

~~~
closeparen
You can change the internal representation while doing transformations in the
getter to preserve compatibility.

With collections, you can export contents through an Iterable, or to Array, or
any number of other strategies, without coupling the consumer to your internal
representation.
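A minimal Python sketch of that decoupling, using a made-up `Bag` collection: consumers see only the iteration contract, so the internal list could later become a tree or a dict without breaking them.

```python
class Bag:
    """Exposes contents only through iteration, never its internal storage."""

    def __init__(self):
        self._items = []  # internal representation, free to change

    def add(self, item):
        self._items.append(item)

    def __iter__(self):
        # Consumers couple to the iteration contract (sorted order here),
        # not to the underlying list.
        return iter(sorted(self._items))
```

Note the caveat from the comment below also applies here: once sorted iteration is observable, callers may come to rely on it whether or not it was meant as part of the contract.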

~~~
Silhouette
Of course, but what I am questioning here is how often we really do change
internal representations of simple data structures. The theoretical benefit of
hiding every representation behind an interface is clear, but any abstraction
also has a potential cost if it creates a barrier to doing something useful
and/or becomes leaky.

You can still provide standardised interfaces for things like iteration along
with a data structure even if you choose to expose the specific
representation, so I am not sure how strong an argument your second point
makes. Depending on the situation, you may find your consumer is implicitly
coupled to the true representation anyway, perhaps because it inadvertently
relies on values being iterated in sorted order or insertion order or because
it assumes certain performance characteristics even if these things are not
strictly part of the documented interface. More than once in programming
history, even standard libraries of popular programming languages have been
updated to guarantee some behaviour that had been reliable in practice but was
never actually part of the original specification.

~~~
tikkabhuna
My perspective is from a heavy Java background.

By using the standardised interfaces for everything you can change the
implementation without changing how you work on it.

For example, consider a service that stores a collection in a database. I
could write my service so that it takes in a Collection, rather than an
ArrayList, because then anyone using it can pass in a Set (no duplicates), a
CopyOnWriteArrayList, or a TreeSet (ordered set).

By hiding internals and using the interface, I can pick the abstraction I need
and give more freedom to those using my classes.

Another example: once Project Valhalla and value types come in, LinkedList
might be changed to use value types for Nodes. Let's say I've tightly coupled
my code to the implementation of LinkedList. This could potentially break my
code.
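A loose Python analogue of the Java point, with invented names; `collections.abc.Iterable` stands in for Java's `Collection` interface.

```python
from collections.abc import Iterable

def store_all(items: Iterable[str]) -> list[str]:
    """Accepts any Iterable rather than a concrete list, so callers may
    pass a list, set, tuple, or generator, mirroring coding against
    Java's Collection instead of ArrayList."""
    stored = []
    for item in items:
        stored.append(item)
    return stored
```

Because only iteration is required, every concrete container the caller might choose satisfies the contract.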

~~~
Silhouette
And there is nothing wrong with writing generic code like that! Just because
you _can_ access the specific representation of the underlying data, that does
not mean you have to or would do so routinely. In languages that do tend to
expose simple data structures directly, it is still normal to provide
standardised tools for accessing and manipulating them, and most of the time
that is probably still how you would interact with them.

Regarding your second example, if you are working with an explicit
representation then you simply would not make a breaking change like that.
Instead you would create a new data structure with the new representation,
which other code can then choose to use instead if it wants to. Again, nothing
about this prevents both versions from also providing equivalent functions to
access them in the same way where that makes sense or writing other code in
terms of those functions rather than tied directly to the specific
implementation.

------
justinpombrio
The first set of points:

> Classes make functions visible while keeping data implied. Data structures
> make data visible while keeping functions implied.
>
> Classes make it easy to add types but hard to add functions. Data structures
> make it easy to add functions but hard to add types.

is known as the Expression Problem
[https://en.wikipedia.org/wiki/Expression_problem](https://en.wikipedia.org/wiki/Expression_problem).

The last point:

> Data Structures expose callers to recompilation and redeployment. Classes
> isolate callers from recompilation and redeployment.

is only somewhat true. I _suspect_ it would be more accurate to say that it's
a matter of indirection: dynamic dispatch isolates callers from recompilation;
static dispatch exposes callers to recompilation; calling a function pointer
isolates callers from recompilation; calling a function directly exposes
callers to recompilation. (All of this in statically typed languages.) Though
this isn't my area of expertise. Perhaps someone else knows more? [Edit:
sounds like these "expose" cases often don't cause recompilation either.]

~~~
charlieflowers
Regarding the claim ...

> Data Structures expose callers to recompilation and redeployment. Classes
> isolate callers from recompilation and redeployment.

... most projects (maybe all?) I've worked on in 20+ years deployed a full set
of new bits upon release, rather than trying to differentiate at the level of
which source code files were and were not touched.

So this strikes me as a carryover from many years ago when working on large
C++ projects with slow compile times was even more painful than it is today.

Any counterpoints?

~~~
kstenerud
C++ is still painfully slow to compile, and seems to get slower every year.

~~~
jcelerier
> C++ is still painfully slow to compile, and seems to get slower every year.

oh come on. I can get a full build of qt5's main libraries (core, gui,
widgets, network, xml, etc) in 15 minutes on my laptop. It took an hour a few
years ago. Compilers are getting faster all the time.

------
RcouF1uZ4gsC
> OK, OK. I get it. The functions that operate on the data structure are not
> specified by the data structure but the existence of the data structure
> implies that some operations must exist.

This reminds me of Linus's quote:

"

I'd also like to point out that unlike every single horror I've ever witnessed
when looking closer at SCM products, git actually has a simple design, with
stable and reasonably well-documented data structures. In fact, I'm a huge
proponent of designing your code around the data, rather than the other way
around, and I think it's one of the reasons git has been fairly successful (*).

(*) I will, in fact, claim that the difference between a bad programmer and a
good one is whether he considers his code or his data structures more
important. Bad programmers worry about the code. Good programmers worry about
data structures and their relationships.

"

[https://lwn.net/Articles/193245/](https://lwn.net/Articles/193245/)

~~~
dmux
I may need to reread it, but wasn't one of the key arguments in Parnas' "On
the Criteria To Be Used in Decomposing Systems into Modules" that by modeling
around data we fall into the trap of writing code that's "temporally"
dependent?

------
gugagore
This conversation reminds me of
[https://en.wikipedia.org/wiki/Expression_problem](https://en.wikipedia.org/wiki/Expression_problem)
.

I don't understand: "but the existence of the data structure implies that some
operations must exist."

Grounding it out to a specific data structure, the existence of `List` implies
that e.g. `sort` exists?

That direction makes less sense than: `sort` implies the existence of e.g.
`List` (something to be sorted).

~~~
narag
_the existence of `List` implies that e.g. `sort` exists?_

The existence of List implies that operations must exist to insert an element
in a list, access to elements in a list, find out the size of the list, etc.

Edit: BTW, 'implies' is the magic word in the text. It's what creates all
the appearance of meaning. Try to replace it with something else. Now I
remember why I disliked Plato so much.

~~~
gugagore
I thought about using `[]`or `indexOf` as examples of operations, but my
question still remains: what is implicit about it? It's part of the public
interface of `List`.

Not at all like the private members of an object, which I think was the
analogy being made.

~~~
lalaithion
Those aren't implicit because the "public interface" is the Object List, not
the Data Structure List.

    
    
        struct list {
          float node;
          struct list *next;
        };
    

Above is the data structure; it implies operations. Below is an interface
(class, in the article); it implies data.

    
    
        #ifndef LIST_H
        #define LIST_H
        
        float index(struct list *ls, int i);
        int find(struct list *ls, float x);
        void sort(struct list *ls);
        
        #endif

------
725686
Maybe a little tangential, but Alan Perlis's quote immediately came to mind:
"It is better to have 100 functions operate on one data structure than
10 functions on 10 data structures." I think I first heard this from Rich
Hickey and it made so much sense.

------
h8liu
What the author calls "objects" (or "classes") is really often just
"interfaces".

> An Object is a set of functions that operate upon implied data elements

If this is replaced with:

> An Interface is a set of functions that often operate upon some implied
> data elements (but not necessarily).

Everything in the article will probably be less confusing.

------
F_J_H
And while the discussion takes place and the debate rages between classes vs.
data structures, there's some poor analyst or data scientist who just needs
access to the damned data to load it into a pandas data frame to do things
that those designing the objects and data structures never dreamed of in the
first place...

------
atoav
Interesting read. What immediately sprang to my mind was Rust's trait system,
which sort of manages to give you the best of both worlds. With traits you can
implement common behaviour/functions for multiple data structures.

When I started using Rust I wasn't at all used to separating data and behaviour
that strictly, but it makes sense. OOP paradigms were still hardwired in my
head, so the hardest part was actually _wanting_ to do it in that decoupled
way. Something about a car object that has wheel objects and a car.drive()
function gives you a good feeling as a programmer, but sometimes it is more
effective to stay with the data structure and describe the car as a struct of
vectors which implement a _Driveable_ trait.
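The Rust idea can be sketched loosely in Python, with `typing.Protocol` standing in for a trait. All names here (`Driveable`, `Wheel`, `Car`) are illustrative, and the wheel circumference is an arbitrary constant.

```python
from dataclasses import dataclass, field
from typing import Protocol

# The "trait": behaviour defined separately from any data structure.
class Driveable(Protocol):
    def drive(self, distance: float) -> None: ...

# Plain data structures.
@dataclass
class Wheel:
    rotations: float = 0.0

@dataclass
class Car:
    wheels: list = field(default_factory=lambda: [Wheel() for _ in range(4)])
    odometer: float = 0.0

    # Implementing the trait's behaviour for this data.
    def drive(self, distance: float) -> None:
        circumference = 2.0  # arbitrary illustrative constant
        self.odometer += distance
        for w in self.wheels:
            w.rotations += distance / circumference

# Generic code depends only on the Driveable behaviour, not on Car.
def road_trip(vehicle: Driveable, distance: float) -> None:
    vehicle.drive(distance)
```

As with Rust traits, `road_trip` works with any type that provides `drive`, with no inheritance relationship required.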

------
skybrian
Thinking about client-server architecture (for example, a database) can
clarify things.

Encapsulation means you never have the canonical data. The server is the
system of record. You can get data back in response to queries, perhaps even a
full data dump, but it's a snapshot. You can also send commands to mutate data
on the server. Typical applications aren't allowed to do a complete
replacement (restoring from backup).

On the other hand, data is better thought of as what's going over the network.
Messages consist of data. Encapsulation is almost meaningless; if you want to
keep something private from the receiver, don't include it in the message in
the first place (or use encryption, maybe). Anyone reading the data has to be
able to understand the format, or at least, ignore what they don't understand.

In the degenerate case where the caller and implementation live within the
same process, the same types often get used both for message transfer (in
function arguments and return values) and storage. There is widespread
"cheating" for performance reasons, and it can get confusing. For a transient
process, it might not make sense to think in these terms at all. (Traditional
Smalltalk used _persistent_ images and client-server style encapsulation makes
somewhat more sense there.)

You can also "cheat" by using the same schema for data transfer and storage,
or having a trivial mapping between them. This can introduce unnecessary
coupling, but there are systems where it works. (Consider that you can make a
full clone of a git repo and it doesn't encapsulate any data.)

------
hotBacteria
I like the shapes problem because I actually encountered it and it made me
think.

I'm not sure about the switch approach described in the post:

    
    
      function area(shape)
        switch shape.type
          case "square": return shape.side ** 2
          case "circle": return PI * shape.radius ** 2
          case "triangle": return ... 
          case "segment": return 0
          case "polygon": return ...
          ...
          case "oval": return ...
    

You can have a lot of cases, some of them requiring non-trivial code...
Eventually you write a function for each case, and it's more work than adding
a method for each shape because you still need to write the switch...

Classes seem to work better than structures here.

But then you want to handle intersections

The switch approach doesn't seem realistic:

    
    
      function intersection(shapeA, shapeB)
        if(shapeA.type == "circle" AND shapeB.type == "circle")...
        if(shapeA.type == "circle" AND shapeB.type == "square")...
        if(shapeA.type == "square" AND shapeB.type == "circle")...
        ...//uh oh you have nShapes**2 cases to handle
    

But Java classes are not better: where do you define Circle-Square
intersection? In Circle? In Square?

Even with multiple dispatch the solution is not ideal. You now have some
things related to Circle (area, perimeter...) in the Circle.blub file, and
intersection(Circle, Circle), which only works with Circles, is now in
intersections.blub...
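One common workaround is a hand-rolled dispatch table keyed on type pairs, so all the pair-specific intersection code at least lives in one place. A Python sketch, with deliberately simplified geometry: treating the square as a bounding circle is an illustration of the dispatch pattern, not correct collision detection.

```python
import math

class Circle:
    def __init__(self, x, y, r):
        self.x, self.y, self.r = x, y, r

class Square:
    def __init__(self, x, y, side):
        self.x, self.y, self.side = x, y, side

def _circle_circle(a, b):
    return math.hypot(a.x - b.x, a.y - b.y) <= a.r + b.r

def _circle_square(c, s):
    # Simplified: approximate the square by its bounding circle.
    return math.hypot(c.x - s.x, c.y - s.y) <= c.r + s.side * math.sqrt(2) / 2

# One table entry per unordered type pair.
_INTERSECT = {
    (Circle, Circle): _circle_circle,
    (Circle, Square): _circle_square,
}

def intersects(a, b):
    fn = _INTERSECT.get((type(a), type(b)))
    if fn is None:
        # Fall back to the symmetric entry with swapped arguments.
        fn = _INTERSECT[(type(b), type(a))]
        return fn(b, a)
    return fn(a, b)
```

The quadratic number of cases does not go away, but the table makes the coverage explicit and keeps all pair algorithms together rather than scattering them across classes.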

I don't see a good solution, and sometimes I feel like the problem lies more
with our tools (code in text files) than with our programming paradigms.

~~~
eska
I have to admit I don't quite understand your issue. To me it seems like
you've used a lot of OOP and cannot befriend the idea that the data structure
(e.g. "Circle" in file GeometryTypes.blub) and operations that are performed
with it (e.g. collisions in "CollisionDetection.blub") are completely
separate. There should be no discussion whether the circle type and collision
belong in the same file, while combinations are in some other file. Think of
it like this: if you're going to add 3d rendering of circles, will you put
that in Circle.blub together with collision detection? Wouldn't you rather add
it to 3DRenderer.blub?

That said, ultimately it doesn't really matter. If you're going to implement
collision detection like this, then yes, you will have a combinatorial
explosion. This is not a language issue. Switching from Java to some other
language with a different form of dispatch will not save you from implementing
a lot of algorithms when adding bezier curves into the mix.

The practical approach is to reduce the problem to a common case, e.g. to turn
the collision shapes into sets of triangles first, and then perform triangle-
triangle collision detection.

------
fpoling
The dependency discussion is wrong. Changing the code of a function does not
lead to recompilation of callers in most static languages. So if one changes
circlePerimeter, only that has to be recompiled. But if one changes a data
structure, then the callers have to be recompiled. This is also true for
objects. In C++, changing data typically leads to recompilation of both
implicit and explicit data structures. Essentially, objects and data
structures behave the same.

~~~
dllthomas
> Changing code of a function does not lead to recompilation of callers in
> most static languages.

I don't know how we're quantifying so as to assess "most", but at least in
some popular static languages there are circumstances (most notably inline
functions) where callers are likely to be recompiled and circumstances
(dynamic loading) where they clearly won't be.

> But if one changes data structure, then the callers has to recompiled.

Only if you're changing parts of the data structure that are visible to
callers. For instance, if your API operates on opaque handles you can change
the underlying data structure however you'd like without recompiling the
caller.

------
mannykannot
_Why does every software example always involve shapes?_

Because they allow us to avoid discussing the complications that arise when
entities have lifetimes, over which, at different stages, different operations
are meaningful.

~~~
b0rsuk
Because it's one of the very few cases where class inheritance is an elegant
solution.

------
solinent
A class is fundamentally about implementing object semantics. This simply
means everything (i.e. all objects) is an instance of some class. It has
methods which can be used to operate on, or communicate with, other objects or
itself.

Data structures are, put most simply, ways of organizing data. The
organization of the data implies a specific layout--a way of representing your
data as a table of integers, i.e. in RAM.

After reading the article, I don't see a meaningful distinction between
objects and data structures. A data structure can be represented as an object,
especially in OOP languages where it _must_ be.

A general class doesn't necessarily lay out its data in any particular way--
which allows abstraction over the data representation of the class.

However, some classes are made which are designed in a manner which guarantees
a certain data layout. std::vector with its contiguous memory requirement
comes to mind.

To add to this, the C++ conception of a "concept", or Haskell's concept of a
"typeclass", or even a generic class, is really what this article is talking
about. Or even Java's or Go's interfaces. There is absolutely no way to
guarantee a specific data structure through an interface, typeclass, or
concept, since they fundamentally do not mention their data representation at
all.

------
Aromasin
I enjoy the author's "question/answer" style of writing. I often find myself
asking questions just like this when reading an article, and find that when
the author isn't "question focused" they never get answered. It seems that
when the entire writing style pivots on the idea, the author forces themselves
to consider more Q's to pad out the content and, incidentally or otherwise,
provide more A's.

~~~
hoseja
I find it smug and insufferable, like someone on tumblr lecturing you about
demisexual marxist theory or something.

------
bendbro
This doesn't make sense to me: "Right. Now consider the area function. Its
going to have a switch statement in it, isn’t it?"

Perhaps I am nitpicking, or perhaps I am reading this wrong, but I would not
design the square data structure to have a perimeter function. The square data
structure should just expose the data that describes a square (length, width).
Adding higher abstractions (perimeter, etc) on top of the data structure only
serves to create the trumped up problem later described in the dialog. The
perimeter method should be defined in the Square class, where perhaps a
"StraightLinesOnlyPolygonMixin" could define the perimeter method.
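The proposed separation can be sketched in Python, using the hypothetical `StraightLinesOnlyPolygonMixin` mentioned above (all names illustrative).

```python
from dataclasses import dataclass

# The data structure: only the data that describes a square,
# regardless of whether it came from an RDB, S3, or a hardcoded instance.
@dataclass
class SquareData:
    length: float
    width: float

# The higher abstraction lives in a mixin, not on the data itself.
class StraightLinesOnlyPolygonMixin:
    def perimeter(self):
        return 2 * (self.data.length + self.data.width)

# The class layer combines the data with the computational behaviour.
class Square(StraightLinesOnlyPolygonMixin):
    def __init__(self, data: SquareData):
        self.data = data
```

The `perimeter` logic never touches the data source, so the same mixin works no matter how `SquareData` is loaded.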

In general, I cannot see why a Data Structure would define computational
methods. You are tightly coupling logic to the underlying data source, which
is wrong when that logic obviously could apply to any underlying data source
(I don't care if my square is backed by an RDB, S3, a hardcoded instance,
etc.). The perimeter method, and probably the Square class, should be the same.

~~~
ivan_gammel
You are getting it wrong. Data structures do not own functions; instead they
are passed to functions. So you have shapes, and somewhere else you have a
perimeter function which has a switch statement to determine the algorithm of
calculation based on the type of the structure.

~~~
bendbro
Ah, and what owns these functions?

And more pressingly, why would you ever pass a data structure to a function
that had more than one algorithm to compute a result? The data structure (or
perhaps some intermediary (an adapter?)) should own the algorithm within a
function that computes only on that data structure. This ensures that all
methods associated with your data structure are obviously and explicitly
associated (in a single file, class, whatever). The alternative, as outlined
in the dialog, is to spread a bunch of switches all around your code. Given
these two possibilities, why would one choose to place switches in disparate
places throughout your code?

~~~
Silhouette
_Given these two possibilities, why would one choose to place switches in
disparate places throughout your code?_

You are touching on something called the "expression problem". You might be
interested in reading some commentary about it. There is an inherent decision
to be made any time you have many algorithms each operating on many data
types. In most programming models, you have to choose between grouping your
code based on your data types (but then any new algorithm needs to be
implemented on each data type) or based on your algorithms (but then any new
data type needs to be supported by a new case in each algorithm). Neither is
the "right answer". You fundamentally have a two-dimensional system here, and
you need to decide which axis you are going to prioritise in your design.

~~~
bendbro
Thanks. I will look into this. I would like to see examples that obviously
require algorithm-grouping or type-grouping, as in my experience algorithm-
grouping has always led to headaches.

I think I have encountered this issue in the past but was turned off by the
lack of formality in the discussion. I wish there were more academic, concrete
discussion of these issues, because I feel that what I am doing now (informal
discussion) likely has many holes.

------
steve-chavez
> Since the database schema is a compromise of all the various applications,
> that schema will not conform to the object model of any particular
> application.

Here it jumps to objects instead of database VIEWs, that can be tailored for
each application. There's no need for complex object models when you embrace
the db and you don't treat it as a dumb store.

------
mcguire
" _No, ORMs extract the data that our business objects operate upon. That data
is contained in a data structure loaded by the ORM._ "

Technically, ORMs are a set of waldos that you operate inside a glove box in
order to manipulate the data in the DB without getting DB cooties on you.

------
tydok
Classes vs Data Structures, or maybe Objects vs Data Structures, or maybe
Classes vs Objects, or maybe Data vs Data Structures, or maybe Data vs State,
or maybe Classes vs Types, or maybe OOP vs FP, or maybe I don't know what I'm
talking about...

------
micimize
Seems to me that with the definitions given, structures and objects aren't
opposites, they're corollaries.

A database table is both a data structure and a collection of functions for
accessing/manipulating it (SQL). An ORM maps between the "Object Oriented"
objects and the "Relational" objects that are tables.

It's interesting that the author thinks of the API exposed by a table as the
"data structure" itself rather than an object. Pragmatically, we tend to refer
to the "objects" at the lower level of abstraction as data structures. Is
[...] a function that defines an array, or the array itself?

------
nfrankel
Am I the only one who has trouble with the dialog form?

~~~
roelschroeven
No. The way the text is written obfuscates the point the text is trying to
make, IMO.

~~~
xvector
I personally found the dialogue form amazing; it made the article very
understandable.

------
sdegutis
This is really hard to follow and I'm not sure I'm understanding it the way he
intended.

Correct me if I'm wrong, but he seems to be saying that, to avoid breaking
consumers of your library often by changing the implementation, hide the
implementation details behind classes, right?

This seems to be a common reaction to someone who experiments with the
"freedom of functional programming" where that freedom means operating on and
returning raw data structures that OOP usually hides behind private variables.

That's still bad practice, even in code that heavily uses FP, and good code
usually mixes FP and OOP properly, so that you're given functions when you're
meant to have functions, and data when you're meant to have data. This is how
I've been writing JavaScript for a few years now, and it's not how I've seen
Java or Clojure usually written.

~~~
Twisol
I don’t think he’s exactly advising any particular action. Rather, data
structures and objects tend to be badly conflated, and there’s a lot of value
in clarifying the distinction between them. You’ll use each in different
circumstances, for different reasons, by weighing the needs of the system
against the design tools at your disposal.

In Rust, we keep the same distinction by modeling data structures as structs
and enums, and modeling the “object” side by traits (whether static- or
dynamic-dispatch). Traits decouple a consumer from the particular data and
emphasize a behavioral contract, allowing any data structure to implement the
desired behavior.
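That split can be sketched in a few lines; the type and trait names below are made up for illustration:

```rust
// A plain data structure: public fields, no hidden state.
struct Point { x: i64, y: i64 }

// A behavioral contract: consumers depend on this trait,
// not on any particular data structure.
trait Describe {
    fn describe(&self) -> String;
}

impl Describe for Point {
    fn describe(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

// Dynamic dispatch: the caller sees only the contract, so any
// data structure implementing Describe can be passed in.
fn describe_all(items: &[Box<dyn Describe>]) -> Vec<String> {
    items.iter().map(|i| i.describe()).collect()
}

fn main() {
    let items: Vec<Box<dyn Describe>> = vec![Box::new(Point { x: 1, y: 2 })];
    assert_eq!(describe_all(&items), vec!["(1, 2)".to_string()]);
}
```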

~~~
sdegutis
So basically mixins, right? Those were hard to use correctly in Ruby, because
you might have multiple mixins whose behavior clashes: they can access the
same data but were not written with each other in mind. I wonder how Rust
solves that.

~~~
Twisol
Not quite. A mixin is a piece of code written once and transcluded into
another module. Traits are more related to OOP interfaces: every type
implements one in its own way. The difference with interfaces is that traits
can be implemented separately from the definition of the underlying data type,
which clarifies the distinction between inherent operations on a specific data
structure, and derived operations that bind it to a more general contract of
use.
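A small sketch of that separation, assuming a made-up `Summarize` trait:

```rust
// A locally defined contract.
trait Summarize {
    fn summary(&self) -> String;
}

// Implemented for a type defined elsewhere (std's Vec<u8>),
// without touching that type's definition. A Java-style
// interface, by contrast, must be named where the class is
// declared.
impl Summarize for Vec<u8> {
    fn summary(&self) -> String {
        format!("{} bytes", self.len())
    }
}

fn main() {
    let data: Vec<u8> = vec![1, 2, 3];
    assert_eq!(data.summary(), "3 bytes");
}
```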

------
flakiness
I was surprised at how much better Uncle Bob has gotten at trolling these
days. He's always been provocative, but somewhere in recent years he turned
himself into a troll. (Or he's just talking to his audience, who aren't HN
readers.)

------
mistrial9
Very interesting and worthwhile! "Data has gravity" and "data dependencies
are more costly than code dependencies" are two lines that are current.

Objects may possibly have broader uses than what is described here ("business
data applications"), but within the definition given, the description of an
Object and operations on objects makes a lot of sense. This post is worth
re-reading a few times.

------
stupidcar
Whenever I hear "Socratic dialog", I reach for my revolver. Is there any other
form of teaching so irritating and patronising? You might have a brilliant
store of insight to impart, but if you insist on trying to do so via a twee,
affected and unbelievable conversation with a Mary Sue wise professor, I'm
going to write you off as insufferable before the fawning moron you have as
proxy for your audience utters their first "Oh, wow! So you mean that straw-
man you just put in my mouth _isn't_ true?"

~~~
thom
The only thing more annoying is when people ape Why's (Poignant) Guide to Ruby
and you have to follow the adventures of some tedious otter as it meets the
rabbit people who ultimately explain pointers in a way which takes a thousand
too many words.

~~~
ci5er
Have you ever come across anything that explains why pointers are hard for
some people?

I find it difficult to explain pointers to people because I don't understand
what they are missing. I could use some help understanding their lack of
understanding.

When they don't get "indirect reference to the address of a data structure or
object in memory", I'm stuck on how to proceed. Pointing people to the very
elegant treatment of this topic in the old K&R doesn't always do the trick.
Somehow.

~~~
tomrod
I don't think I understand pointers. Maybe we can help each other. My mental
model of pointers is a map: lat/long identify the object of interest (the
local coffee shop), but not what is interesting about it (their menu and
hours).

~~~
ci5er
Hmmm. Interesting. I'm going to have to roll that around in my brain for a
day-or-so to see if I can make that compute for me. (Which means I may be back
here in a day to check back in!)

Maybe it's because I started in assembly language, but to me, data exists at
an address in memory, or starting at an address in memory. That data might be
an integer, a float, a character, the beginning of a string, the head of an
object/data-struct, OR an address of data somewhere else (which may be any of
the above again). Pointers, to me, are simply the address of a data thingie
(which may itself be another address).

Why would you want to use these? One reason is that pushing large data-objects
back and forth across function calls is expensive, while pointers are usually
just 32-bit or 64-bit numbers. Another is that one can have a pool of
resources (think polygons in a large 3D scene) that are marked "used" or
"unused"; this gets a large performance boost from not having to do the whole
malloc/free dance on lots and lots of small objects. A third reason (related
to the first) is that a function can receive a data structure and modify it in
place (side effects!) without having to copy it and pass the modified copy
back.
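Two of those motivations (cheap passing and in-place mutation) can be sketched in Rust, where references play the pointer role; the `Scene` type below is made up:

```rust
// A large value we'd rather not copy on every call.
struct Scene {
    polygons: Vec<[f64; 3]>,
}

// Passing a reference hands over one word-sized address,
// not the whole polygon buffer.
fn polygon_count(scene: &Scene) -> usize {
    scene.polygons.len()
}

// A mutable reference lets the callee modify the caller's
// data in place (the side-effects case), with no copy back.
fn add_polygon(scene: &mut Scene, p: [f64; 3]) {
    scene.polygons.push(p);
}

fn main() {
    let mut scene = Scene { polygons: vec![[0.0; 3]] };
    assert_eq!(polygon_count(&scene), 1);
    add_polygon(&mut scene, [1.0, 2.0, 3.0]);
    assert_eq!(polygon_count(&scene), 2);
}
```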

I'm intrigued by your mental model. Having not thought it through, it seems
as if lat/long is appropriate, but maybe even the idea that it is a coffee
shop you will find there is undetermined. (That is: the fact that it is a
coffee shop is one of the things that is interesting about it, like the menu
and hours. Again, I need to think this through.)

Thanks!

~~~
kasey_junk
Note that your model is also an abstraction. Modern architectures move data
around between a variety of memory locations (and store it in multiple
locations) which are all addressed via the pointer ‘transparently’. It’s not
really a memory address anymore and hasn’t been for some time.

That’s not to say your model is bad, it’s the same one I have in my head. But
it’s no more ‘real’ than coordinates or library cards or P.O. Box number.

~~~
roelschroeven
Isn't it still the model that is exposed to the programmer, even though
internally all kinds of things are going on to make it all go faster? For
example, nowadays many/most processors have separate caches for instructions
and data, meaning that they strictly speaking don't conform to the Von Neumann
architecture. But the programmer doesn't see the different caches, ideally
doesn't even see the cache at all.

It's like the execution model, where processors do a lot of out-of-order
execution, but do a lot of work so they can present a model to the programmer
as if everything happens in-order.

~~~
kasey_junk
That depends on the architecture, the os, the compiler & the runtime.

Part of the confusion in teaching pointers (I think) is that we act like they
are a lower-level abstraction than they actually are (because they used to
be). On modern commodity hardware with mainstream languages, pointers have
very little to do with memory addresses: accessing memory through them will
not happen in constant time, they may or may not perform better when used,
and even if the language doesn't copy them on use, the OS or processor might,
etc.

I just find when I try to teach people the memory address model of pointers
there are so many exceptions that the model isn’t helpful. Like the OP I
struggle with an alternative.

------
bcp2384
Not every language is class-based...

------
gowld
When Uncle Bob discovers subclasses, friends, decorators, and mixins, his mind
will be blown.

