Also, it seems to use the standard method of stacking layers rather than letting you describe an arbitrary computational graph, which seems (to me) the far superior method (see CNTK's Network Description Language).
Leaf takes an imperative approach and explores a simpler API: only Layers (functions) and Solvers (optimizer algorithms). It aims for reusability through modularity, and for abstractions that keep the implementation and concepts to a minimum; or rather, abstractions that feel as familiar to a hacker as possible.
For future versions, we want to explore what is practically possible with auto-differentiation via dual numbers and differentiable programming.
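For the curious, forward-mode auto-differentiation with dual numbers is simple enough to sketch in a few lines. This is just the general technique (in Python, for brevity), not anything from Leaf's codebase: a dual number a + b·eps with eps² = 0 carries a value and its derivative through ordinary arithmetic, and the product rule falls out of the algebra.

```python
class Dual:
    """A dual number: val + deriv * eps, where eps^2 = 0."""

    def __init__(self, val, deriv=0.0):
        self.val = val      # f(x)
        self.deriv = deriv  # f'(x)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # the product rule, courtesy of eps^2 = 0
        return Dual(self.val * other.val,
                    self.val * other.deriv + self.deriv * other.val)

    __rmul__ = __mul__


def f(x):
    return 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2


x = Dual(4.0, 1.0)  # seed the input's derivative with 1
y = f(x)
print(y.val, y.deriv)  # f(4) = 57.0, f'(4) = 26.0
```

The appeal for a framework is that any function built from overloaded primitives gets exact derivatives for free, with no symbolic graph to construct.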
Of course, most people don't have the resources of Google with a layer of data scientists and another layer of software engineers [and maybe a layer of data engineers in the mix too]. So the idea of a tool tailored to a small team's needs rather than those of Google seems like an interesting niche.
> Leaf is a Machine Intelligence Framework engineered by hackers, not scientists. It has a very simple API...
That is quite a differentiator.
However, if you want to really understand how things fit together, you're probably best off reading one of the standard intro textbooks: Murphy's Machine Learning, Bishop's Pattern Recognition and Machine Learning, Hastie et al.'s The Elements of Statistical Learning, or Wasserman's All of Statistics.
Or Barber's textbook, which is freely available online and has some nice mind-maps/concept-maps/trees at the start of each section: http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...
If you actually cared about your patients, then you would use whatever method has the highest accuracy. False predictions mean injury or death. Using a suboptimal method means people die.
The best of both worlds is to use whatever model gets the best predictions, then train another, understandable model on the output of the first one. That is: generate random data and see what predictions the good model makes. The understandable model then has effectively infinite data to train on and doesn't need to worry about overfitting.
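A toy sketch of that idea: treat an accurate but opaque model as an oracle, sample as many synthetic inputs as you like, and fit a simple, inspectable surrogate to the oracle's predictions. The "black box" and the one-threshold rule below are stand-ins I made up for illustration, not any particular library's models.

```python
import random

def black_box(x):
    # pretend this is an accurate but uninterpretable model
    return 1 if x * x - 3 * x + 1 > 0 else 0

random.seed(0)
xs = [random.uniform(0, 5) for _ in range(2000)]  # "infinite" data, in effect
ys = [black_box(x) for x in xs]

# Fit the simplest interpretable surrogate: a single threshold rule.
best = None
for t in sorted(set(round(x, 1) for x in xs)):
    for sign in (1, -1):
        pred = [1 if sign * (x - t) > 0 else 0 for x in xs]
        acc = sum(p == y for p, y in zip(pred, ys)) / len(xs)
        if best is None or acc > best[0]:
            best = (acc, t, sign)

acc, t, sign = best
print(f"surrogate: predict 1 when {'x >' if sign == 1 else 'x <'} {t}, "
      f"agreement with black box = {acc:.2%}")
```

Note the surrogate only approximates the oracle (here the black box has two decision boundaries and the stump can capture only one), which is the usual price of this kind of distillation.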
But still, the utility of being able to understand the model is limited. It's just a big set of parameters, without any reasoning or explanation of why the parameters are what they are.
I like machine learning and prediction-centered approaches, but there are many factors (such as adherence, both by doctors and their patients) that are important here. In a sense, the model needs to take "model type" into account in its predictions, which could lead to a model that predicts disease treatments well but believes it should not be used!
But with Leaf it becomes very easy to create modules (Rust crates) that expose layers/networks/concepts, which can have metaphorical names.
Intuitively, I think connections that are activated for less than x% of inputs could be removed entirely; in some cases this would mean removing a whole node. It would be interesting to read something about such an approach.
There is some research on pruning networks after training, removing small weights and nodes that don't contribute much. This results in much smaller neural networks, which run better on smartphones or embedded devices.
This isn't done often, though, because on GPUs it doesn't result in higher performance. Because of the way SIMD instructions work or something, there's no way to take advantage of sparser neural networks. A synapse of weight 0 still requires a multiplication by zero and an addition of zero to the sum.
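A small illustration of the point above (a made-up weight matrix, not from any real network): magnitude pruning zeros out small weights, but a dense kernel still performs every multiply-add, zeros included, which is why unstructured sparsity alone doesn't buy speed on dense hardware.

```python
# A toy 3x4 weight matrix "after training".
weights = [
    [ 0.80, -0.02,  0.00,  0.50],
    [-0.01,  0.60,  0.03, -0.70],
    [ 0.40,  0.00, -0.90,  0.02],
]

def prune(w, threshold=0.05):
    # magnitude pruning: drop weights with small absolute value
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in w]

def matvec(w, x):
    # dense kernel: one multiply-add per weight, zero or not
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

pruned = prune(weights)
kept = sum(v != 0.0 for row in pruned for v in row)
total = sum(len(row) for row in pruned)
print(f"kept {kept}/{total} weights after pruning")

x = [1.0, 2.0, 3.0, 4.0]
print("dense output: ", matvec(weights, x))
print("pruned output:", matvec(pruned, x))  # nearly identical, same work done
```

Getting an actual speedup requires either sparse kernels/formats or structured pruning (removing whole nodes or channels), so the remaining computation stays dense.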
Otherwise, this is a bit confusing.
I write / use software to solve problems.
He isn't talking about HN readers but Joe Public. A few days ago our head of creative strategies passed my machine while I was doing a load of work in a terminal. He asked what I was doing and if I was a hacker. (I told him it depends on what you define as a hacker.)
For a better understanding of the term, see "hacker culture":