
Show HN: Applying the Unix philosophy to neural networks - cloudkj
https://github.com/cloudkj/layer
======
HALtheWise
I don't see any claims about performance, but I would be very surprised if it
were anything better than abysmal. In a modern neural network pipeline, just
sending data back to CPU memory is treated as a ridiculously expensive
operation, let alone serializing it to a delimited text string.

Come to think of it, this is also a problem with the Unix philosophy in
general: it requires trading off performance (user productivity) for
flexibility (developer productivity), and that trade-off isn't always worth
it. I would love to see a low-overhead version of this that can keep data as
packed arrays on a GPU during intermediate steps, but I'm not sure that's
possible with the Unix interfaces available today.

Maybe there's a use case for very small networks and CPU evaluation, but so
much of the power of modern neural networks comes from scale and performance
that I'm skeptical that niche is very large.

~~~
enriquto
> I don't see any claims about performance, but I would be very surprised if
> it was anything better than abysmal. In a modern neural network pipeline,
> just sending data to the CPU memory

Notice that the bulk of the data does not necessarily go through the pipeline
(and thus through the CPU). You may send only a "token" that the program
downstream uses to connect to and deal with the actual data, which never left
the GPU.

~~~
bollu
Pretty sure you can't do this since the GPU's memory is per-process isolated
as well.

~~~
tntn
Unless you share it between processes.
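
For instance, CUDA's IPC API exists for exactly this. A rough sketch of the
two ends of such a pipe, assuming CUDA, with error handling and the
producer/consumer synchronization elided (in real code the producer must
outlive the consumer's use of the buffer). The pipe carries only an opaque
~64-byte handle; the tensor never leaves the GPU:

    // producer.cu: compute a layer's output on the GPU, then write only
    // a small IPC handle (the "token") to stdout.
    #include <cstdio>
    #include <unistd.h>
    #include <cuda_runtime.h>

    int main() {
        float *dev;
        cudaMalloc(&dev, 1024 * sizeof(float));      // tensor lives in GPU memory
        // ... launch kernels that fill `dev` with this layer's output ...
        cudaIpcMemHandle_t handle;
        cudaIpcGetMemHandle(&handle, dev);           // shareable token for the buffer
        fwrite(&handle, sizeof(handle), 1, stdout);  // the pipe carries ~64 bytes
        fflush(stdout);
        pause();  // sketch only: stay alive until the consumer is done
    }

    // consumer.cu: read the token from stdin and map the same GPU buffer.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaIpcMemHandle_t handle;
        fread(&handle, sizeof(handle), 1, stdin);
        float *dev;
        cudaIpcOpenMemHandle((void **)&dev, handle,
                             cudaIpcMemLazyEnablePeerAccess);
        // ... launch this layer's kernels directly on `dev`, no host copy ...
        cudaIpcCloseMemHandle(dev);
    }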

------
peter_d_sherman
Excerpt: "layer is a program for doing neural network inference the Unix way.
Many modern neural network operations can be represented as sequential,
unidirectional streams of data processed by pipelines of filters. The
computations at each layer in these neural networks are equivalent to an
invocation of the layer program, and multiple invocations can be chained
together to represent the entirety of such networks."

Another poster commented that performance might not be that great, but I don't
care about performance, I care about the essence of the idea, and the essence
of this idea is brilliant, absolutely brilliant!

Now, that being said, there is one minor question I have, and that is, how
would backpropagation apply to this apparently one-way model?

But, that also being said... I'm sure there's a way to do it... maybe there
should be a higher-level command which can run each layer in turn, and then
backpropagate to the previous layer, if/when there is a need to do so...

But, all in all, a brilliant, brilliant idea!!!

~~~
kwaugh
> Now, that being said, there is one minor question I have, and that is, how
> would backpropagation apply to this apparently one-way model?

The author mentioned that this is only for inference of neural networks (not
training), so this does not support backpropagation.

~~~
PeterisP
This kind of misses the point of the Unix philosophy of being able to
dynamically reconfigure things. Realistically, to get decent results, you'll
need to run inference with the exact same connections you trained (or at
least fine-tuned) with, so there's no good reason to split the model into
smaller parts.

~~~
iandanforth
I think that there is room for this idea at a higher level of abstraction,
where pre-trained sub-networks are exposed as command line utilities:

cat tweets.txt | layer language-embed | layer sentiment > out.txt

~~~
PeterisP
My point was that this is _not_ possible, as the trained layers are
intrinsically tightly coupled. You _can't_ combine pre-trained sub-networks
in an arbitrary manner without retraining. In all the standard practice of
reusing pretrained networks, you take a pretrained network or part of it and
_train_ some layers around it to match what you need, optionally fine-tuning
the pretrained layers as well. If you want to use a different pre-trained
embedding model, you retrain the rest of the network.

In your example, the sentiment layer will work without retraining or
fine-tuning only if preceded by the _exact same_ language-embed layer it was
trained on. You can't swap in another layer there. Even if you get a
different layer with the exact same dimensions, the exact same structure, the
exact same training algorithm and hyperparameters, and the exact same
training data but a different random seed for initialization, it can't be a
drop-in replacement. It will generate _different_ language embeddings than
the previous one - i.e. the meaning of output neuron #42 being 1.0 will be
completely unrelated to what your sentiment layer expects in that position,
and your sentiment layer will output total nonsense. There often (but not
always!) exists a linear transformation to align them, but you'd have to
calculate it explicitly somehow, e.g. by training a transformation layer. In
the absence of that, if you want to invoke that particular version of the
sentiment layer, then you have no choice about the preceding layers: you have
to invoke the exact same versions that were used during training.

Solving that dependency problem requires strong API contracts about the
structure and meaning of the data being passed between the layers. It might
be done, but that's not how we commonly do it nowadays, and it would be a
much larger task than this project. Alternatively, what could be useful is
this: if you want to pipe the tweets to sentiment_model_v123, a system could
automatically look up in that model's metadata that it needs to transform the
text with transformation_A followed by fasttext_embeddings_french_v32, as
there's no reasonable choice anyway.

~~~
iandanforth
Yes. I understand how neural networks work. In my example, language-embed and
sentiment are provided by layer. This allows layer to provide compatible
modules. If two incompatible modules are used together, they might produce
junk output. That is true for any combination of command line utilities. If I
cat a .jpg, I'm going to have a hard time using that output with sed.

------
mempko
What's wonderful about this concept (and the Unix concept in general) is the
flexibility it gives you. You can, for example, pipe it over the network and
distribute the inference across machines. You can tee the output and save
each layer's output to a file. The possibilities are endless here.
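
For example, something like this (layer's actual per-layer arguments elided;
gpu-box is a placeholder host, with tee keeping a copy of the intermediate
and ssh running the second stage on another machine):

cat input.csv | layer ... | tee layer1.out | ssh gpu-box 'layer ...' > result.csv

Ordinary shell plumbing gives you distribution and checkpointing for free.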

------
skvj
Great concept. Would like to see more of this idea applied to neural network
processing and configuration in general (which in my experience can sometimes
be a tedious, hard-coded affair).

------
craftinator
I've been thinking about something like this for a long time, but could never
quite wrap my head around a good way to do it (especially since I kept getting
stuck on making it full featured, i.e. more than inference), so thank you for
putting it together! I love the concept, and I'll be playing with this all
day!

------
xrd
This might not be a great way to build neural networks (as other commenters
have said regarding performance). But, it could be a great way to learn about
neural networks. I always find the command line a great way to understand a
pipeline of information.

------
luminati
Great idea, but an equally great caveat: it's just for (forward) inference.
Unix pipelines are fundamentally one-way, and this approach won't work for
backpropagation.

~~~
bigred100
I don’t see any reason you couldn’t just spit out the output plus the
derivative of the layer’s output with respect to the weights, then multiply
and carry these all the way down. Then, if you have a loss function at the
end, you have the gradient. This project is probably for fun and not for
scale, so that’s fine. But then you’d need to think about how to update the
weights at every layer based on the optimization step.
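
One way to read that suggestion in chain-rule terms is forward-mode
accumulation: each stage passes along its output $y_i$ plus the running
Jacobian products for every earlier weight block, so by the time the loss $L$
is computed at the end of the pipe, each gradient is already assembled:

$$\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial y_n}\,\frac{\partial y_n}{\partial y_{n-1}}\cdots\frac{\partial y_{i+1}}{\partial y_i}\,\frac{\partial y_i}{\partial w_i}$$

The catch is the payload: stage $i$ has to carry one running product per
preceding layer's weights, which is why reverse-mode backpropagation (with a
backward channel) is what's actually used at scale.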

------
Rerarom
Sounds like the kind of thing John Carmack would enjoy hacking on.

~~~
fouc
How's that? Isn't he more of a C/game developer? Is he a unix guy?

~~~
snazz
[https://news.ycombinator.com/item?id=9810342](https://news.ycombinator.com/item?id=9810342)

 _John Carmack working on Scheme as a VR scripting language_

------
Donald
See also Trevor Darrell's group's work on neural module networks:

[https://bair.berkeley.edu/blog/2017/06/20/learning-to-reason...](https://bair.berkeley.edu/blog/2017/06/20/learning-to-reason-with-neural-module-networks/)

------
mark_l_watson
Wonderful idea and the Chicken Scheme implementation looks nice also.

I wrote some Racket Scheme code that reads Keras-trained models and does
inference, but this is much better: I used Racket’s native array/linear
algebra support, while this implementation uses BLAS, which should be a lot
faster.

------
dekhn
[https://www.jwz.org/blog/2019/01/we-are-now-closer-to-the-y2...](https://www.jwz.org/blog/2019/01/we-are-now-closer-to-the-y2038-bug-than-the-y2k-bug/#comment-194745)

~~~
bitwize
Linking to jwz directly from hackernews will show a nutsack in your browser
and not anything useful.

~~~
toxik
What an absolutely childish thing to do.

~~~
dekhn
the problem is that every time somebody posts a link to his site, a bunch of
HN folks go and say "hur hur" in the comments. Jamie doesn't have patience for
fools.

------
cr0sh
Well - I will say I like the general concept. I just wish it wasn't
implemented in Scheme (only because I am not familiar with the language;
looking at the source, I'm not sure I want to go there - it looks like a
mashup of Pascal, Lisp, and RPN).

It seems that today - and maybe I am wrong - data science and deep learning
in general have pretty much "blessed" Python and C++ as the languages for
such tasks. Had this been implemented in either, it might have received a
wider audience.

But maybe the concept itself is more important than the implementation? I can
see that as possibly being the case...

Great job in creating it; the end-tool by itself looks fun and promising!

~~~
seisvelas
Ironically, I was only mildly interested in this before I read your comment
and learned that it's in Scheme. Now I'm eager to check it out, haha.

~~~
bitwize
Me too! I was like holy smokes, it's in Scheme?! Make AI Lispy again!

------
andbberger
Reading the title, I can't help but think of the 'Unix haters handbook' and
groan: why would you want to apply the Unix philosophy to nets??

~~~
F-0X
>I can't help but think of the 'Unix haters handbook' and groan

The one true gripe they outlined is that the user interface requires some
explanation before one can bootstrap their own knowledge.

I'll defend the unix philosophy till I die, probably. Why _wouldn't_ you apply
it here?

~~~
bitwize
Because all the time spent marshalling and unmarshalling data structures into
bags of textual bytes takes up CPU time and does nothing but hasten the heat
death of the universe. We have _better_ models for that sort of thing. See:
PowerShell.

------
nihil75
I love you

