
Data Still Dominates - atomlib
https://theartofmachinery.com/2019/06/30/data_still_dominates.html
======
MikeOfAu
LISPs have the idea that code itself is data etched deeply into them -
homoiconicity.

In the Clojure community, for example, you will find that the primacy of data
is well understood and used. Data-Oriented design and all that. You'll see the
aphorism "Data is the ultimate in late binding" mentioned.

You will also find that it is immutable data that reigns supreme.

Notice also how, in a reactive system, it is the data (or the arrival of it)
that coordinates the functions/processing, not the other way around.
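
A minimal sketch of that inversion (my own illustration, with hypothetical
names): the handlers never call one another; publishing the data is what
drives them.

```python
# Illustrative sketch: data arrival, not a caller, coordinates processing.
subscribers = []

def on_data(handler):
    """Register a handler to run whenever new data arrives."""
    subscribers.append(handler)
    return handler

def publish(record):
    # The arrival of `record` is what triggers the processing.
    for handler in subscribers:
        handler(record)

@on_data
def index_record(record):
    print("indexing", record)

@on_data
def audit_record(record):
    print("auditing", record)

publish({"id": 1, "body": "hello"})
```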

I was exposed to the "data first" (and non-OO) way of thinking about 5 years
ago, after being in the OO world since the mid-80s. I haven't looked back.
When the penny drops for you on this it is a profound and liberating moment
(or at least it was for me).

~~~
hackits
There were two competing viewpoints of OO. One saw an object as nothing more
than an animated data structure; the opposing view saw objects as behaviors,
and you didn't map them explicitly to your data model.

As most people viewed objects as a simple extension over the data, it's not
surprising to see most developers opt for a more basic form of data
representation that doesn't need objects at all.

------
i_v
I once designed "The Fantastically Modular and Configurable Ball of Mud."
Scratch that. I've definitely succeeded many times even if that wasn't the
goal. I now take the approach of writing single-purpose throwaway tools until
I've gotten a firm grasp on exactly the use-cases I'm _not_ targeting. Few
open source projects list anti-features, but when they do, I'm extremely
appreciative.

------
js8
Good grief, the three examples sound like a project I am working on...

I have a feeling that microservices are the new OOP, just a layer higher. Just
like in OOP, where you keep data in lots of tiny objects, you keep the data
across a bunch of little databases instead of a big central one.

It is a very seductive idea (and, as Alan Kay explains, inspired by biology);
unfortunately, it is really not that useful for designing real-world computer
systems, where we want data to be consistent and to aggregate it in different
ways.

~~~
dkersten
> it is really not that useful for designing real-world computer systems,
> where we want data to be consistent

I believe it was _"Life Beyond Distributed Transactions"_[1] that showed an
alternative approach that often works without providing globally consistent
data. I've done a few thought experiments in the past where I tried to design
some systems without transactional consistency. It's not necessarily easy, but
it can often (although, I assume, not always) be done. Whether the extra
effort is worth it depends on your specific needs and goals, I guess.

[1]
[https://queue.acm.org/detail.cfm?id=3025012](https://queue.acm.org/detail.cfm?id=3025012)
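
For a flavor of the kind of design that paper encourages (my own sketch, not
code from the paper; all names are illustrative), a handler can be made
idempotent so the system tolerates at-least-once delivery without a global
transaction:

```python
# Illustrative sketch: idempotent message handling in place of a
# globally consistent transaction. Duplicate deliveries become no-ops.
processed = set()          # message IDs already applied, persisted with the entity
balances = {"acct-1": 0}   # per-entity state

def handle_deposit(msg_id, account, amount):
    if msg_id in processed:     # redelivered message: safe to ignore
        return
    balances[account] += amount
    processed.add(msg_id)       # recorded together with the state change

handle_deposit("m1", "acct-1", 50)
handle_deposit("m1", "acct-1", 50)  # at-least-once delivery strikes again
assert balances["acct-1"] == 50
```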

~~~
js8
I think this is a very different approach than microservices. You're not
sidestepping the problem of consistency, you're aware of the trade-off.

~~~
dkersten
Well, you're designing around the limitations of not having consistency. But
yes, it's not really related to microservices.

------
rossdavidh
I have heard it said that the ur-problem in computer programming, the source
of most other problems, is that every lesson needs to be relearned, the hard
way, every 5 years, because the average age is so young and the excuses for
ignoring the lessons of the past are so easy.

~~~
adrianratnapala
Is it because people are so young, or because the pipeline to becoming a
junior-but-still-responsible person goes through schools rather than something
like an apprenticeship?

~~~
rossdavidh
Both, I think. The field has grown in size, so that alone makes the young-to-
middle-aged ratio skewed. Then, many middle-aged programmers drop out to
become managers.

I totally agree that programming would be better taught by an apprenticeship
model than the college model, and I say this as a person with two college
degrees.

------
jackyinger
I wish I’d realized this much earlier. But perhaps the journey to
independently arriving at this idea was worth the delay.

Along similar lines, there’s so much noise about new frameworks and languages
on HN that it may inspire a sort of cargo cult in less experienced developers
(it took me a while to see through that fog).

Anyone have more suggestions for useful aphorisms?

------
sadness2
For an opposing perspective, I cite this article, entitled "Process first, not
data first":

[https://scalablenotions.wordpress.com/2015/09/30/process-first-not-data-first/](https://scalablenotions.wordpress.com/2015/09/30/process-first-not-data-first/)

~~~
TheOperator
I can tell you process-first thinking is hugely important in large businesses,
because there is usually already a process, and it is incredibly costly in
terms of political capital and job security to deviate from a long-existing
enterprise process.

~~~
lincpa
In a large business, `process first thinking` is based on the
credential(evidence). This credential is data.

Credentials (data) flow through the nodes in the process to form a dataflow.

Every node in the process is a pipe-function (pure function).

This is [The Pure Function Pipeline Data Flow](https://github.com/linpengcheng/PurefunctionPipelineDataflow).
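
As a rough illustration of that picture (my own sketch in Python, not code
from the linked project), a credential can be a plain data record flowing
through pure pipe-functions:

```python
# Illustrative sketch: a credential (plain data) flows through process
# nodes, each a pure function that returns a new record.
from functools import reduce

def submitted(c):
    return {**c, "status": "submitted"}

def reviewed(c):
    return {**c, "status": "reviewed"}

def approved(c):
    return {**c, "status": "approved"}

def flow(credential, *nodes):
    # Thread the data record through each node in order.
    return reduce(lambda c, node: node(c), nodes, credential)

print(flow({"id": 17}, submitted, reviewed, approved))
# {'id': 17, 'status': 'approved'}
```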

~~~
sadness2
Thanks for sharing this classical perspective. When I'm kicking off a system
for a complex organisation, I start by focusing on what processes/changes
occur in the organisation. I find that users can intuitively reason about what
they are trying to get done, which leads to clear pipe-functions, from which
one can infer and discover correct inputs and outputs. If you begin with a
focus on inputs/outputs/data structures, you tend to end up with a lot of
disagreement and omissions. The idea is to get to a more suitable Pure
Function Pipeline Data Flow sooner.

~~~
lincpa
In large enterprises and legal societies, procedural justice pays more
attention to the traces that procedures leave on data (auditable evidence), so
the input and output data are the most important.

In addition, I agree with the following view:

```

Even the simplest procedural logic is hard for humans to verify, but
quite complex data structures are fairly easy to model and reason
about. ... Data is more tractable than program logic. It follows that
where you see a choice between complexity in data structures and
complexity in code, choose the former. More: in evolving a design, you
should actively seek ways to shift complexity from code to data.

    ---- Eric Steven Raymond, The Art of Unix Programming,
         Basics of the Unix Philosophy

Show me your flow charts and conceal your (data) tables and I shall
continue to be mystified, show me your (data) tables and I won't
usually need your flow charts; they'll be obvious.

    ---- an early edition of The Mythical Man-Month

```
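
To make Raymond's advice concrete, here is a small Python sketch of my own
(the shipping regions and numbers are made up): the same decision, first as
branching code, then as data.

```python
# Complexity in code: every new case is another branch to read and test.
def shipping_cost_branchy(region):
    if region == "EU":
        return 10
    elif region == "US":
        return 12
    elif region == "APAC":
        return 15
    else:
        return 20

# Complexity in data: the cases become a table you can audit, diff,
# or load from configuration.
SHIPPING_COSTS = {"EU": 10, "US": 12, "APAC": 15}

def shipping_cost(region):
    return SHIPPING_COSTS.get(region, 20)

assert shipping_cost("APAC") == shipping_cost_branchy("APAC") == 15
```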

------
noobermin
I'm not a systems designer, so I may be somewhat ignorant in making this
statement, but this sounds almost like "nouns-first" is the right thing, yet
almost every single one of the people cited in the article is anti-OOP.
Moreover, OOP has generally fallen by the wayside and been blamed for
unnecessary complexity in some legacy systems from the 90s/00s. The moment you
conceptualize your system as the interaction of concrete data types, OOP
starts to seem like a logical paradigm to choose.

Maybe this is just in my head because I recently discovered this talk (ignore
the incendiary title):
[https://www.youtube.com/watch?v=QM1iUe6IofM](https://www.youtube.com/watch?v=QM1iUe6IofM)

~~~
wellpast
(The talk you link to is strongly anti-OOP. I assume you linked it as an
example demonstrating your statement that anti-OOP is on the rise these days.)

To respond to your point, though:

Traditional OOP wants to meld _behavior_ into the noun, which should make the
data-first (noun-first?) POV more apparent -- data isn't first-class when it's
coupled to behavior, as traditional OOP usually has it. So function-oriented
programming is more data-oriented, as data remains first-class (i.e.,
decoupled from behavior).

~~~
0815test
Data is _defined_ by behavior, even for "plain old" types. The point of
"objects" is that sometimes you can implement the very same behavior in ways
that are essentially isomorphic from the POV of your outside code, and want
the freedom of switching out the underlying implementation at any time, and
perhaps of validating complex invariants about the behavior your object has
been designed for -- _without_ letting implementation details dictate what
sorts of behaviors you're going to expose (the way, e.g., a "record" datatype
exposes the equivalent of getters and setters, or a "variant record" exposes
pattern matching, etc.).

This ("objects-based" programming; or programming with "abstract types")
actually works fine. The part where OOP leads to _real_ problems that make it
inimical to true modularity is all about the tacked-on features of inheritance
and polymorphism; specifically, _implementation_ inheritance. Because that
means you've started relying on the very interface you were supposed to define
in order to implement some other behaviors implied in it, and then for good
measure you're allowing that interface to change in practically arbitrary ways
as new "derived" classes are defined. It's not surprising that this fails to
work well.

~~~
wellpast
> Data is defined by behavior, even for "plain old" types.

This _is_ the POV of the OOP-ist, but it is not necessary and it is limiting.
It's actually the "debate" we are having, so asserting it isn't proving it!

Functional programming has a complete story for polymorphism, so OOP does not
win on that account, contrary to what you're implying.

I do think we have common ground in shunning class inheritance, which is
useless and a complete disaster.

However, even without inheritance, it seems to me that "objects" are provably
replaceable by functions. (E.g., promote the implicit `this` reference to a
first-class reference provided as an explicit arg to functions; see the sketch
below.)
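
A minimal Python sketch of that promotion (the `Counter` example is
hypothetical): the implicit `this`/`self` becomes an explicit argument over
plain data.

```python
# Object style: state and behavior coupled; `self` is passed implicitly.
class Counter:
    def __init__(self):
        self.n = 0

    def incr(self):
        self.n += 1
        return self.n

# Function style: the "object" is plain data, passed explicitly, and the
# function returns a new value instead of mutating in place.
def incr(counter):
    return {"n": counter["n"] + 1}

c = incr({"n": 0})
assert c["n"] == Counter().incr() == 1
```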

I see no reason why data can't have opaque parts so data encapsulation doesn't
seem to be a unique OOP claim either.

So then if objects aren't necessary for polymorphism and data encapsulation,
then what _are_ they good for?

~~~
0815test
> This is the POV of the OOP-ist, but it is not necessary and it is limiting.

How is it "limiting"? And if you want proof, look at the untyped lambda
calculus - there you find data types defined _entirely_ in terms of functions
- pure behavior! (For example, the Church natural numbers are defined by the
behavior of iterating some arbitrary function exactly _n_ times; the Church
booleans by taking two arguments and returning either the first or the second
argument (which in turn makes it possible to define _if-then-else_, a sort
of pattern matching); and so on and so forth.) It just so happens that this
behavior-focused encoding is enough to express arbitrary programs - which is
the opposite of limiting!
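
Those encodings transcribe directly into any language with closures; here is a
minimal Python rendering of the Church booleans and numerals just described:

```python
# Church booleans: a boolean is "pick the first or second argument".
true = lambda a: lambda b: a
false = lambda a: lambda b: b

def if_then_else(cond, then_val, else_val):
    return cond(then_val)(else_val)

# Church numerals: n is "apply f exactly n times to x".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

def to_int(n):
    # Interpret the numeral by iterating "+1" starting from 0.
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
assert to_int(three) == 3
assert if_then_else(true, "yes", "no") == "yes"
```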

~~~
wellpast
> there you find data types defined entirely in terms of functions - pure
> behavior!

In every programming environment that I am aware of, such data-less functions
as you describe could be happily deleted without any worry to customers &
stakeholders.

For example:

    
    
       add : Int -> Int -> Int
    

This may look nice on paper, but on a real computer there are bounds to this
purity. And anyway, my program only becomes useful when actual integers are
instantiated and appear on stacks and the heap.

The "data-first" ideas we're discussing here ask one to stop obsessing over
the functions and model the data soundly. You'll find any PL will do when
operating over sound data expressions. This approach, IME, brings clarity and
power to problem solving.

Theory divorced from practice _is_ limiting. This is not a philistine take,
btw -- theory is supremely powerful when applied successfully for outcomes.
But the "pure behavior!" you're talking about here seems too excitedly far
away from practitioner-space.

------
lincpa
The Pure Function Pipeline Data Flow is based on the philosophy of Taoism and
the Great Unification Theory. In the computer field, it realizes, for the
first time, the unification of hardware engineering and software engineering
on the logical model. It has been extended from `Lisp language-level code and
data unification` to `system engineering-level software and hardware
unification`. Whether in the appearance of the code or in the runtime
mechanism, it is highly consistent with integrated circuit systems. It has
also been widely unified with other disciplines (such as management, large
industrial assembly lines, water conservancy projects, power engineering,
etc.). It is also very simple and clear, and its support for concurrency,
parallelism, and distribution is simple and natural.

There are only five basic components:

1. Pipeline (pure function)

2. Branch

3. Reflow (feedback, whirlpool, recursion)

4. Shunt (concurrent, parallel)

5. Confluence.

The whole system consists of these five basic components. It perfectly
achieves unity and simplicity. It must be the ultimate programming
methodology.

This method has been applied to a 100,000-line pure Clojure project, which
can demonstrate its practicability.
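
For what it's worth, here is my own rough Python reading of those five
components (illustrative names and logic, not code from the linked project):

```python
# Illustrative sketch of the five components in plain Python.
from concurrent.futures import ThreadPoolExecutor

def clean(x):                # 1. Pipeline: a pure function stage
    return x.strip().lower()

def branch(x):               # 2. Branch: pick a path based on the data
    return "word" if x.isalpha() else "other"

def reflow(n, acc=1):        # 3. Reflow: recursion/feedback (factorial here)
    return acc if n <= 1 else reflow(n - 1, acc * n)

def shunt(items, stage):     # 4. Shunt: run a stage concurrently
    with ThreadPoolExecutor() as pool:
        return list(pool.map(stage, items))

def confluence(*streams):    # 5. Confluence: merge streams back together
    return [x for stream in streams for x in stream]

words = shunt(["  Foo", "Bar ", "42"], clean)
print(confluence(words, [branch(w) for w in words]), reflow(5))
# ['foo', 'bar', '42', 'word', 'word', 'other'] 120
```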

[The Pure Function Pipeline Data Flow](https://github.com/linpengcheng/PurefunctionPipelineDataflow)

~~~
wodenokoto
/s?

~~~
lincpa
In The Pure Function Pipeline Data Flow,

- programming is the process of designing a data model that is simple and
fluent to manipulate.

- data and logic are strictly separated: element-level separation of data and
logic, with data-stream processing.

- its Warehouse / Workshop Model is ideal for data programming.

~~~
heavenlyblue
If I make another comment, will you add more buzzwords to your next reply?

~~~
lincpa
I don't remember having commented on you. There are many people who like my
article, and their skills are very good. I hope that you can make a technical
comment.

------
codeisawesome
So I love the quotes at the top of the article, and the evidence bears them
out. HOWEVER, is it really that big a problem that the search index goes out
of sync with the database? I suppose it depends on the use-case, but without
that spec info we can't make a judgement. Additionally, deploying a beefier
task queue and increasing the size of the search cluster might also have
helped... I feel it's a bad example.

~~~
rossdavidh
It depends upon the process that was doing the search. For example, does some
automated process search for possible duplicates of a given field, nearly
immediately, and need to know whether the returned set includes the new record
(or not), so it can know whether "1" means "1 duplicate" or "no duplicates"?
That's just an example I made up on the spot, but there are more
possibilities. If storing a record sets off a chain of events, and one of
those might do a search, then the sync between index and database might be
critical.

Or, of course, depending on the system, it might not.

------
robohamburger
It is always worth taking a look at what you need to accomplish, the data you
have, and the tools for manipulating it, and then building (or not building) a
system.

If you just hit everything with the REST, OOP, or DB hammer, you end up with
things that are more complicated than they need to be.

