Like picking hyperparameters - time and time again I've asked experts/trainers/colleagues: "How do I know what type of model to use? How many layers? How many nodes per layer? Dropout or not?" etc. And the answer is always along the lines of "just try a load of stuff and pick the one that works best".
To me, that feels weird and worrying. It's like we don't yet understand ML well enough to say definitively, for a given data set, what sort of model we'll need.
This can lead us down the debugging black-hole TFA talks about, since we appear to have zero clue about why we chose something, so debugging ultimately might just be "oops - we chose 3 layers of 10, 15, and 11 nodes, instead of 3 layers of 10, 15 and 12 nodes! D'oh! Let's start training again!"
It really grates on me to think about this, considering how much maths and proofs and algorithms get thrown at you when being taught ML, only to be told that when it comes to actually doing something it's all down to "intuition" (guessing).
And yeah as others have said - data :-)
Yes, we have to try a load of stuff, especially in deep learning, where the old feature engineering seems to be replaced with architecture engineering.
Yet, the stuff to try is usually well-known. You should not be trying stuff completely at random (unless you do a random grid search for hyperparams).
> we chose 3 layers of 10, 15, and 11 nodes, instead of 3 layers of 10, 15 and 12 nodes!
Hinton advises to start with a wide net, ensure that the net is able to learn the function, then add dropout, smaller layers, and/or regularization to reduce the overfit. This should avoid scenarios like the above.
Wanting to have the perfect model for a specific problem mathematically calculated may be an unrealistic demand. If we view machine learning as compression (using the shortest program possible to describe a function/distribution), per Kolmogorov Complexity, we can't compute this shortest program, and if we happen to find it, we can't know we did. The maths is for theory. Intuition is for the practical application. See intuition more as guided guessing, instead of random guessing: with experience you should know what are the flaws and benefits of the different models and architectures for particular problems. You can limit your search to sane narrow ranges.
 Finally, suppose you want to train an LDNN. Rumor has it that it’s very difficult to do so, that it is “black magic” that requires years of experience. And while it is true that experience helps quite a bit, the amount of “trickery” is surprisingly limited ---- one needs be on the lookout for only a small number well-known pitfalls. - http://yyue.blogspot.com.br/2015/01/a-brief-overview-of-deep...
 http://www.jmlr.org/papers/v13/bergstra12a.html https://people.eecs.berkeley.edu/~kjamieson/hyperband.html
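A minimal sketch of the "guided guessing" idea above: random search over a deliberately narrow hyperparameter space. The search space, the names `random_search` and `toy_score`, and the scoring function are all hypothetical stand-ins for illustration; in practice `score_fn` would train a model and return a validation metric.

```python
import random

def random_search(score_fn, space, n_trials=50, seed=0):
    """Sample configurations uniformly from `space` and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently of the others.
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical search space, limited to "sane narrow ranges".
space = {
    "hidden_units": [32, 64, 128, 256],
    "dropout": [0.0, 0.25, 0.5],
    "learning_rate": [1e-3, 3e-3, 1e-2],
}

def toy_score(cfg):
    # Stand-in for "train the model and return validation accuracy".
    # By construction it peaks at hidden_units=128, dropout=0.25, lr=3e-3.
    return -(abs(cfg["hidden_units"] - 128) / 128
             + abs(cfg["dropout"] - 0.25)
             + abs(cfg["learning_rate"] - 3e-3) * 100)

best_cfg, best_score = random_search(toy_score, space, n_trials=50)
print(best_cfg, best_score)
```

The point of the sketch is that the randomness lives inside a space you chose with some prior knowledge, which is quite different from trying things completely at random.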
This is what machine learning feels like to me. We just hook this stuff together and some really smart guy with a fractional Ramanujan worth of natural intuition says it will work and it does, especially for the really complex image recognition models. Have you guys looked at Google's "Inception" architecture? It's a huge Rube Goldberg machine of many, many layers and it does work, but there's not a lot of reasoning about why it was designed the way it was.
But picking a proper model and hyperparameters requires some insight/knowledge about the properties of the data itself, which is the whole purpose of doing ML experiments in the first place (to gain knowledge/insight about the data)!
However I do agree that we don't fully understand the area yet or the reach of its applications.
Well, yeah, that's easier to say than "read this entire book to understand how machine learning models work". The truth is that different models work with different success for different kinds of data.
How many predictors do I have? How many observations vs how many predictors? How much data overall? Do I care about interpretability? Am I doing classification or regression? If doing regression, what is my tolerance for error? How much time am I willing to spend training and predicting? If doing classification, are my classes balanced or imbalanced? How are the classes spread? Do I care more about overall errors or false positives / false negatives (think of a cancer screening test - you might rather have false positives than miss something with false negatives).
All of these factors, and more, come into play when selecting a model. There are other bits too. Selecting variables: including unneeded or highly correlated variables can actually worsen the result. Feature engineering: it turns out you might be able to process your variables in such a way that the algorithm picks up on the details you need much more easily. Cross validation: you don't want to "overfit" your training data, i.e. get a model so specific to your training data that it has worse accuracy on actual data points because it's not general enough. Hyperparameter tuning: a lot of these models have tuning parameters whose right values are hard to know from the get-go, so you have to try a bunch and look at the response curve of how the accuracy changes.
So yeah, machine learning is not magic. Turns out there are different tools for different problems. We do have some models like Random Forests and SVMs that work fairly well out of the box on a wide variety of problems, and some kinds of neural networks also do well but often need more data and processing time to get decent results. It's all a tradeoff :)
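The cross-validation step mentioned above can be sketched roughly as follows. This is a plain-Python illustration, not any particular library's API: `k_fold_indices`, `cross_validate`, the threshold "model" and the toy data are all made up for the example.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

def cross_validate(xs, ys, fit, score, k=5):
    """Average the validation score over k train/validation splits."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(scores) / len(scores)

# Toy "model": a threshold halfway between the two class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, xs, ys):
    return sum((x > threshold) == bool(y) for x, y in zip(xs, ys)) / len(xs)

# Made-up, well-separated one-dimensional data.
xs = [0.1, 0.3, 0.2, 0.4, 0.35, 0.9, 0.8, 0.7, 0.95, 0.6]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cv_accuracy = cross_validate(xs, ys, fit_threshold, accuracy, k=5)
print(cv_accuracy)
```

Each sample is scored only by a model that never saw it during fitting, which is what makes the averaged score a fairer estimate of how the model behaves on "actual data points".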
Forgive the naive question, but why couldn't ML figure out its own best "stuff"?
Sounds to me like an opportunity to apply automation - Monte Carlo is your friend and machine cycles are cheap!
Most software development and ML development are diametrically opposed. That's why we have statisticians.
Most software developers like the control that programming gives them - that's why they are programmers. They prefer developing systems that can be programmed (trained) quickly, consistently and with rapid feedback using computer languages.
ML in contrast, is sort of like the claw crane arcade game at Walmart, you remember, the one where you insert 50 cents and then get to "control" the movement of a scoop for 10 seconds and then it drops onto a pile of possible prizes and, when you raise the crane, it drops the prize back onto the pile.
If you're very lucky, you get a stuffed bunny or duck for your kid. Usually your kid goes away disappointed. There's something there but it's just beyond your ability to find it in 5 minutes. That's what ML looks and feels like, but 50X worse. The ML training software has 50-100 knobs (you get to pick!) and the object of your desire lives in a multi-dimensional hyperspace. Find out how to use the knobs to reach your boss' goal. This is imo NOT a task for a software developer. And you don't even get a fuzzy bunny for your efforts.
Yes, the claw cranes are rigged:
And if you think that's bad, consider that much of AI-related ML is like trying to succeed with a claw crane designed by God (or maybe the Devil). It pays to choose your challenges carefully.
Talking from personal experience in the private sector:
You go through the whole process and present your results to a customer. They make the required changes, see a few percent improvement in the bottom line and are happy. A year later you pick up the same code for another project and discover a small error in the data collection script that completely invalidates everything.
Garbage-in, garbage-out errors can have an internal consistency that survives cross validation and gives similar results with many different algorithms and models. Random changes in the real world can produce actual gains.
Standard machine learning datasets rarely suffer from this problem because results and semantics of the problem are known beforehand. If you have an original black-box problem, it's possible to do random search and improve by accident.
Having spent a lot of time doing Monte Carlo rendering, I can say that "why is this pixel messed up" can be a painful affair (and that domain is at least visual!). Bad input geometry, NaNs or Infs because you forgot to handle some 1/r^2 term, etc. occur in all numerical computing.
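As a rough illustration of the NaN/Inf hunting described above, here is a minimal sketch of two habits that tend to help: scanning intermediate values for the first non-finite entry, and guarding a 1/r^2 term against r = 0. The function names and the `eps` floor are assumptions for the example, not taken from any renderer or framework.

```python
import math

def first_non_finite(values):
    """Return (index, value) of the first NaN/Inf in `values`, or None."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i, v
    return None

def inverse_square(r, eps=1e-12):
    """1/r^2 with a small floor on the denominator so r == 0 stays finite."""
    return 1.0 / max(r * r, eps)

# Hypothetical buffer of intermediate results with one bad entry.
samples = [0.25, 1.5, float("nan"), 3.0]
print(first_non_finite(samples))   # index 2 is the offending entry
print(inverse_square(0.0))         # large but finite
```

Reporting the index of the first bad value, rather than just noticing that the final output is wrong, narrows down which input or term introduced the problem.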
Some of this is actually obviated in TensorFlow the same way it is in much of numerical computing: frameworks that have been battle hardened through years of bug fixes. This lets you at least focus on "Did I describe my model correctly?" rather than "Did I implement gradient descent correctly too?".
Again, rendering (and all physical simulation) has the benefit of producing a picture and the inputs are effectively "physical". The challenge for machine learning is that you're trying to optimize for figuring out the underlying model rather than relying on the laws of physics for your problem. While this is just as true in any numerical modeling problem ("Should this be a 4th degree polynomial or just cubic?"), the models that ML folks put together are effectively quite complex and the input datasets are both very large and usually unstructured. The visualization of the network being trained in the TF Playground (http://playground.tensorflow.org) helps with the model but not the original space.
I got half way through Andrew Ng's ML course and felt I fully understood the concepts but spent all of my time battling with matrices and the foundation mathematics rather than building on ML understanding. If I had a better grasp of the maths then I think I'd get on better with it.
1. Start with a VANILLA model, a proven one, to establish a baseline you can fall back on. For example, in deep learning, start with fully-connected nets, then a vanilla CNN, then add BNs and ReLUs, then residual connections, etc.
2. Do not spend too much time tuning hyperparameters, especially in deep learning: once you change your algorithm, a.k.a. the network structure, everything changes.
3. Add complexity as you go. Once you have established a solid baseline, you can start adding fancier ideas to your stack, and you will find that fancy ideas are improvements over already-working ideas, and that it is not that hard to add them.
4. One important tip: as time goes on and you keep changing your algorithm, those changes might not be happy with each other. So reduction is also very important. Rethink your approach from time to time and take away stuff that doesn't fit anymore.
5. Look At Your Data. Garbage in, garbage out. Cannot be more true. Really. Look at your data, maybe a sample of it, and see whether you yourself, as the most intelligent being as of yet, can make sense of it or not. If you cannot, then you probably need to improve its quality.
Anyway, ML is a very complex field and is developing like crazy, but I don't feel the methodology for tackling it is any different from that for any other complex problem. It is an iterative process, going from simple proven solutions to something greater, piece by piece. Watch and think, then improve.
I'm convinced we lack decent tools for ML debugging: what could they be?
I worked in SEO before, which had far more elements of "black magic". Perhaps SEO helps with the transition to ML, because you are basically reverse engineering a model (Google's search engine) / crafting input to get a higher ranked output. It's feature engineering, experimentation, and debugging all-in-one.
As for the long debugging cycles in ML: John Langford coined "sub-linear debugging": output enough intermediate information to quickly know if you introduced a major bug or hit upon a significant improvement. Machine learning competitions are not so much won by skill, but by the teams iterating faster and more efficiently: those who try more (failed) experiments hit upon more successful experiments. No neural net researcher should let all nets finish training before drawing conclusions/estimates about the learning process.
Sure, the ML field is relatively new, and computer programming has a longer history of proper debugging and testing. It is difficult to do monitoring on feedback-looped models running in production, yet no more difficult than control theory ;). And proper practices are being developed as we speak. The author will probably write a randomization script to avoid mis-ordered samples automatically in the future.
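One way to read the "sub-linear debugging" idea above is as a cheap check run at every validation step, so a broken or stalled run gets killed early instead of training to completion. The sketch below is an assumption about how that could look; `should_stop`, its thresholds and the simulated loss values are invented for illustration.

```python
import math

def should_stop(losses, patience=3, min_delta=1e-3):
    """Decide from the validation losses seen so far whether to abort a run.

    Stops if the latest loss is NaN/Inf, or if the best of the last
    `patience` checks did not beat the earlier best by at least `min_delta`.
    """
    if not losses:
        return False
    if not math.isfinite(losses[-1]):
        return True
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    return recent_best > best_before - min_delta

# Simulated validation losses from a run that plateaus early.
simulated = [1.0, 0.6, 0.45, 0.4498, 0.4497, 0.4499, 0.4496]
history = []
stopped_at = None
for check, loss in enumerate(simulated):
    history.append(loss)
    if should_stop(history):
        stopped_at = check
        print(f"aborting at check {check}: loss {loss}")
        break
```

The same hook is a natural place to log whatever intermediate information (loss, gradient norms, a few sample predictions) lets you tell a major bug from a real improvement without waiting for the full run.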
Debugging and testing is also hard in all things that are somehow related to realtime or concurrency. E.g. OS development, embedded firmware, network stacks, etc. For these things you often also need to know about Math, Physics, Statistics, Electronics, Hardware and Software Architecture, etc.
Game engine development is also hard because you should also know about most of this stuff to really find the most efficient solutions.
Build a Neural Net in 4 Minutes
Build an Antivirus in 5 Min
Build a Self Driving Car in 5 Min
Everyone wants to do machine learning, but nobody seems to want to learn statistics.
I think it's because programming did not require me to learn maths (ok, I did learn how to multiply matrices in high school, but I never used it in my life). So my expectation is the same with ML.
Those platforms are abstractions, so I do not really care how they are implemented. Same way I have no idea how JS really implements objects or sorting. I did one of those crash courses on a certain platform, and while there was some stats, I totally did without it. I could probably build a classifier that, instead of classifying images of dresses, would classify pillows, curtains or cars. But I did not feel like I learned anything.
I don't think you need to know (or at least should not need to know) much stats at all to use pre-built libraries like TensorFlow.
It feels to me that a lot of the ML courses around concentrate almost entirely on the stats & maths side of ML though. This strikes me as a bit of mental-masturbation.
To teach people how to program from zero knowledge, we don't first teach them how modern compilers or the JVM work and how they do their complex optimisations, JIT, etc. Why, when teaching people ML from zero knowledge, do we start with the absolute raw nuts and bolts of the maths involved (complete with all of the mathematical proofs that something works, etc.)?
Sure eventually it would be useful to know what is going on with the maths, just like with programming it eventually can be useful to know what the compiler/JVM is really doing, but a LOT of productive stuff can be done when blissfully ignorant of what TensorFlow/the JVM is doing.
ML is easy, but the courses are often too aloof and strike me as academically focused on the maths purely for the sake of the maths itself, rather than on what ML can do. ML is not hard - any programmer can understand it, but the maths is off putting to programmers who are not mathematicians (the majority I'd say)
i think there are a fair amount of people who want to learn the stats .. even some who want to learn the analysis.
it appears that some things that can be phrased in terms of iterative numerical computation can be difficult because there are probably some properties of the limiting behavior of those computations that can't be learned because they've yet to be discovered.
nonetheless, i (maybe?) get what the parent post is generally saying -- as someone who knows nothing about tensorflow, i wonder if tensorflow users are generally interested in flatness, etc., which (exact sequences of tensor products) is the only guess that i've made about what a portmanteau of "tensor" and "flow" uses as a conceptual model.
i wonder if the difficulty of 'machine learning' is that people tend to approach it as its own thing with its own special, entirely separate bag of tricks. certainly there will be some tricks unique to these iterative statistical techniques.
however, i don't think the original article gives enough deference to the actual workaday difficulties, challenges, and [non-monetary] rewards of software development in industry: if ML, ANNs, etc. are, as some say, essentially "computer psychology," then being productive with a team of developers to ship a business product is pedal-to-the-metal human psychology.
Statistics in Plain English.
Regression Analysis by Gilman
Elements of Statistical Learning.
You need to throw some matrix algebra and calc 1 & 2 somewhere in between. Certainly before ESL. It would also require real work: you can't simply read the books and go through the examples. You will get stuck on a concept on many occasions, and you will battle it out until, after much googling and reading additional papers, you finally get it.
After those 3 books, you've got the basics.
"Data Analysis Using Regression and Multilevel/Hierarchical Models" by Andrew Gelman & Jennifer Hill
That's the book.
For example I've only recently stumbled upon an "Explaining and harnessing adversarial examples" article - and that completely changed my perception about my current work in computer vision.
It follows that there is no single "good" algorithm, and you need to have and exploit domain knowledge in order to succeed.
I'm not a practitioner, but I always thought this was the main challenge. Uses of ML are rarely "right" or "wrong" per se, but they rely on intuition to get a model that "works" in a practical sense.
There is no royal way to machine learning: you can't decide you are going to make an algorithm that detects bad comments (as determined by human consensus) and then just go make an implementation that you can reason out to be correct, the way you could prove a graph algorithm correct. Trial-and-error and hard-to-transcribe intuition are baked into the process.
(I'd love to get some insider insight on this comment!)
and those two things are very domain specific so you need to do a lot of homework first, and debugging later.
1) It's 'hard' because you need a lot of 'training data' in order to train models etc. It's hard to get.
2) 'AI' type interfaces represent a whole new kind of UI challenge. For 'predictive typing' for example, you can optimize an algorithm so that it does better for 90% of the US population, but then it gets 'worse' for the remaining 10%. So it's a paradox. This can have weird effects.
For example, if you have an app in the app-store, you may leave the settings so that it's 'broadly optimal'. You get ok stars.
If you then make it 'better' for those 90%, you might get a little boost in ratings, but you get 1 and 0 star ratings from the 10% for whom it's a sub-par experience. This can destroy your product.
Anyhow - 'there is no right answer' often in AI, and setting expectations can be extremely difficult.
And all of that has nothing even to do with CS.
Stuff like that is absolutely crucial but is often forgotten by engineers.
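The 90/10 effect described above is easy to see with a little arithmetic. The numbers below are entirely made up for illustration: an update lifts accuracy for the 90% group and hurts the 10% group, and the aggregate metric still goes up.

```python
def overall(acc_by_group, share_by_group):
    """Population-weighted average of per-group accuracy."""
    return sum(acc_by_group[g] * share_by_group[g] for g in acc_by_group)

# Hypothetical user shares and per-group accuracies.
share = {"majority": 0.9, "minority": 0.1}
before = {"majority": 0.80, "minority": 0.80}   # "broadly optimal" settings
after = {"majority": 0.85, "minority": 0.60}    # tuned for the majority

overall_before = overall(before, share)
overall_after = overall(after, share)
print(f"overall before: {overall_before:.3f}, after: {overall_after:.3f}")
print(f"minority before: {before['minority']:.2f}, after: {after['minority']:.2f}")
```

A single aggregate number hides the regression for the smaller group, which is why tracking per-group metrics alongside the overall one is worth the extra effort.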
Yes, often it is possible to determine where the user belongs in that 90/10 setting, but it can take a lot of time in order to be 'pretty sure'. You need a lot of 'user interaction' in order to make that assessment.
The 90/10 rule can broadly apply to things like culture: certain Latino Americans speak/write very differently. A lot of 'le' and 'la' (gendered) in there as well as a whole different set of proper names and colloquialisms.
But it can take some time to really establish if someone is 'latino' from their writing.
Even harder: some people type more precisely, some people type more loosely. You can actually adjust the probability spectrum of a predictive keyboard to match someone's style. But get this: people's style changes all the time! I noticed that when I'm tired, I type like I'm drunk. Or if I'm busy etc. So there's even variation in style that makes it difficult.
It's a really hard thing to do.
You get 'massively decreasing returns to complexity' in these domains.
Meaning that you can do 'pretty good' with some basic algorithms.
For the next 'bump in performance' you need some complex code.
After that - you really start to have 10x larger models, or crazy complex engineering just to move the needle.
It creates a completely different set of 'Product Management' rules. It's kind of fun, unless you're a struggling startup trying to figure this out on the fly :)
Usually, someone comes along with a new approach which changes the game.
As I understand it 'Neural Networks' i.e. 'Deep Learning' style AI has changed everything voice related quite a lot.
And also - different business approaches can change the game. Google has access to zillions of properly transcribed audio phrases. This is the 'golden asset' that can underpin a really great voice recognition engine. Google voice is even better than the old industry standard - Nuance - in many scenarios, and my hunch is that it's the size of their training data that has given them an edge - at least that.
This is a really concise expression of sentiments I've been trying to put into words, so thanks for that!
Really like your insight on Google, think it's spot on.
Re. 'Product Management' rules - would love to know more about this? Do you keep a blog?
Also, I'd imagine that the data could be bad/incomplete, e.g. data was collected in an inconsistent manner or in the wrong areas, leading to an incorrect solution that fits the data, but doesn't solve the problem.
This is the biggest concern I have in using the data that we've collected to come up with a solution using ML: no one ever intended for the data to be used for the purpose for which I would use it, and it is incomplete or incorrect.
However, I think the chance of good things coming from inadequate data outweighs not trying to make use of the data.
You HN lads are smart, you're pretty quick to figure out all the 'next problems' that one would encounter.
Yes - getting the right training data can be surprisingly hard.
Did you know how hard it is to get a 'very official' large set of words for a given language? It's hard!
There is no entity that really decides what language is - so you have to kind of determine it from what people write. But that takes a lot of writing, and frankly, you're making assumptions all the time there.
France has a body that's 'in charge' of their language so to speak, and most Western nations have entities that are 'roughly' that. Beyond the West, Japan and China ... it's a gong show.
'Filipino' is barely a language - even though many millions of people speak it, it varies in dialect from village to village and they barely resemble each other.
I think that someone will eventually come up with a 'probabilistic' OS because in the real world, nothing is certain ... some things are just more likely than others!
The French Academy provides an official dictionary and language usage, but speakers hardly restrict themselves to its contents.
Filipino mostly refers to the Manila dialect of Tagalog, whereas Tagalog is a language with many dialects existing in the Philippines. There are lots of languages in the Philippines but as far as I know they aren't referred to as Filipino.
For a lot of NLP problems you will probably have to make your own data set. It can be a lot of work.
>>After much trial and error I eventually learned that this is often the case of a training set that has not been correctly randomized and is a problem when you are using stochastic gradient algorithms that process the data in small batches.
Take this single term from the above sentence: "stochastic gradient algorithms", they represent three key areas: statistics, calculus and CS.
These three things, even when studied in isolation, are quite complex. For ML, you must be able to juggle these 3 fireballs effectively. No surprise, it's much, much more difficult than many other software engineering problems.
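For what it's worth, the failure mode in the quoted sentence (a training set that is not randomized before being fed to a mini-batch stochastic gradient method) has a small fix. The sketch below is an assumption about what such a randomization step could look like; `shuffled_minibatches` and the toy data are hypothetical, not the article author's code.

```python
import random

def shuffled_minibatches(features, labels, batch_size, seed=None):
    """Yield (features, labels) mini-batches in a random order.

    A single permutation is applied to both lists so each sample keeps its label.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        yield [features[i] for i in batch], [labels[i] for i in batch]

# Toy data sorted by class: without shuffling, each batch of 4 taken in order
# would be dominated by a single class.
features = list(range(12))
labels = [0] * 6 + [1] * 6
batches = list(shuffled_minibatches(features, labels, batch_size=4, seed=0))
for batch_x, batch_y in batches:
    print(batch_x, batch_y)
```

Re-shuffling at the start of every epoch (for example by varying `seed` per epoch) is a common refinement.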
I'm a researcher with a physics/stats PhD, and if a colleague approached me and said "stochastic gradient algorithms" entails three highly complex areas of scientific knowledge, I would have been stunned and assumed an undergrad with an English major had stumbled into our lab.
Just because you find something extremely challenging, doesn't mean it is inherently challenging. Considering what a lot of people in my field are struggling with, your example is absolutely trivial. You might want to adjust your ego downwards a bit.
I have deep and very high regard for the people who are able to apply ML to fields like DNA analysis or NLP which can take the dreaded "Turing test".
I stand nowhere in the ML arena, but I had tried once and got a good shock of my life: how hellishly difficult the ML can get and how quickly. I really feel humbled. If anything, I learnt to appreciate the width and depth of human brain capabilities. It seems entirely magical now to me how on earth does my brain process/understand such complex things like this very paragraph. Prior to some exposure to ML, I couldn't have appreciated this thing.
>>Just because you find something extremely challenging, doesn't mean it is inherently challenging.
Agreed. I never claimed it anyway. But the kind of problems, for which ML is being applied, the state of the art existing "analyzable algorithms" (like, finding approximate near-optimal solutions for TSP) are far from trivial. In addition to this, we must realize that the ML solution must "beat" these algorithms hands-down in "non-trivial" cases. All this makes ML extremely difficult.
I agree that for real world (and not necessarily state-of-the-art) ML applications, you have to handle many more fields in addition to these 3 fields. All I say is even these three things, when taken together, are very complex things to handle.
On a morning where I woke up feeling anxious and worried after the events of Monday, this was a small but appreciated little reminder that there's still hope.
I was curious about what you do, so I visited your HN profile to find more details. As a HN user - and a Muslim - I believe that it would be more useful for yourself and others if your profile listed something about you and where to contact you, rather than a wall of text arguing for why Islam is evil.
Anyways, I hope you have a nice weekend :)
The person you responded to seems like most of "us" - most of us aren't ML experts. He humbly shared his experience with dipping his toe in the ML pool and reported back that he (metaphorically speaking) had to chip the ice off the pool before he could even get in.
This was useful information to "us" (non-experts interested in learning about ML). Calling his difficulties "trivial" for an expert in the field while true, came off as condescending. Telling him (him!) to check his ego came off as egotistical.
No offense intended. ;-)
It's like you're a senior Navy pilot, and you hear a crop duster pilot saying that the Osprey is difficult to fly because you need helicopter piloting skills, plus multi-engine fixed wing piloting skills, plus experience landing on a carrier. He's not wrong, it just doesn't sound that bad to you because you just happen to specialize in the exact combination of skills required.
I am sure there are fields where you don't know stuff, so please grow up.
Well most practical problems require knowledge of different disciplines, so this is nothing special.
For instance, building a fluid dynamics solver requires knowledge of: physics, numerical mathematics/differential equations, computer science, computational geometry, and probably a few more.
Here's the cached version: