You might not need machine learning (nullprogram.com)
437 points by chmaynard 54 days ago | 193 comments

> A key feature of neural networks is that the outputs are a nonlinear function of the inputs. However, steering a 2D car is simple enough that a linear function is more than sufficient, and neural networks are unnecessary.

This depends entirely on the definition of 'steering a 2D car'. In the model used, throttle is simply proportional to the distance to the nearest wall in front of the car. This means the agent will never accelerate coming out of a corner, because it can't know it has the headroom to steer away from the wall as it's coming out.

Similarly, the model for steering inherently steers the car towards the middle of the track. I would expect the car to wobble from left to right if the road's edges are ragged, make up its own corners if the track edges describe a 'fake' turn on a straight bit, and likely crash if it were to encounter a Y junction or a pit stop. The neural network agents showed smarter behavior here because they are able to capture more complex cross-dependencies between different inputs.
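For concreteness, here is a minimal sketch of the kind of linear policy being described. The function name and coefficient values are made up for illustration; this is not the article's actual code.

```python
# Hypothetical sketch of a linear driving policy: each control output is
# a weighted sum of a few wall-distance sensor readings.
def linear_driver(d_left, d_front, d_right, c=(0.5, -0.5, 0.02)):
    """Steer toward the side with more room; throttle scales with the
    distance to the wall straight ahead."""
    steer = c[0] * d_left + c[1] * d_right   # > 0 means turn left
    throttle = c[2] * d_front                # cannot anticipate corner exits
    return steer, throttle
```

Nothing in this form lets the throttle anticipate a corner exit: the moment `d_front` is small, the car slows, regardless of what the track does next, which is exactly the limitation described above.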

On the topic of junctions, if the track were to include them, perhaps it'd be nice if the car chose the quickest route to optimize for lap times. But maybe that stretches the problem statement too much.

> Instead of doing anything fancy, my program generates the coefficients at random to explore the space. If I wanted to generate a good driver for a course, I’d run a few thousand of these and pick the coefficients that complete the course in the shortest time.

In theory this is more random and less efficient than an evolutionary algorithm, which searches the problem space in a structured way. If the author really wanted to hammer the point home, a least squares method to one-shot the coefficients would be more convincing.
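To illustrate the difference, here is a toy comparison (my own sketch, with a made-up fitness function standing in for lap time, not the article's code): blind random sampling versus a minimal (1+1) evolutionary strategy that keeps mutating the best candidate found so far.

```python
import random

def fitness(coeffs):
    # Stand-in for "negative lap time with these coefficients";
    # a toy quadratic with its optimum at (1, 2, 3).
    return -sum((c - t) ** 2 for c, t in zip(coeffs, (1.0, 2.0, 3.0)))

def random_search(n=2000):
    # The article's approach: sample blindly, keep the best.
    return max((tuple(random.uniform(-5, 5) for _ in range(3))
                for _ in range(n)), key=fitness)

def hill_climb(n=2000, sigma=0.5):
    # A (1+1) evolutionary strategy: mutate the incumbent, keep
    # the mutant only if it scores better.
    best = tuple(random.uniform(-5, 5) for _ in range(3))
    for _ in range(n):
        cand = tuple(c + random.gauss(0, sigma) for c in best)
        if fitness(cand) > fitness(best):
            best = cand
    return best
```

With the same evaluation budget, the structured search tends to home in on the optimum, while random search only ever gets lucky.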

All in all, the author doesn't make any hard claims that are false. But I would soften the point of "neural networks are unnecessary" to "simpler models will do for simpler objectives".

I see the article more as a broader metaphor for the AI hype. Take, I dunno, video recommendation.

Sure, YouTube itself probably built insane stuff in their engine you could never replicate with classic methods (ignoring whether the YT algo is any good).

However, if we are just talking about the Vlog of your real estate company, you should probably A/B-test whether your viewers prefer order by time or clicks and implement a decent title search bar. And kick the consultant hyping you up about ML out, now.

So my takeaway is that it's not about AI being useless, or about 2D steering, but about using the right tool for the right job.

And building on that, I have to give the author props for demonstrating an alternative solution to a problem which I would definitely have solved via AI.

> Sure, YouTube itself probably built insane stuff in their engine you could never replicate with classic methods

Probably. But pretty much anything they recommend is junk, so... That's where the author may have a point. If you don't understand your AI algorithm anymore, it's hard to improve it or even realize how wrong it is. AI is generally good at steering the masses into a couple of "averaged" directions. At the individual level though, it's often crap, unless you are the perfect stereotype that the algorithm assumes you to be.

What's good for you is not the same as what's good for YouTube. YouTube wants to maximize watch time at all costs. Their algorithm is probably very good at that.

Video recommendation is the quintessential machine learning killjoy. YouTube and Netflix were a lot more interesting before they achieved algorithmic homogeneity.

They kill exposure to anything fresh: you teach it a couple of things you like and then it keeps you swimming in the same pool.

Rather than discovering something new, everyone just watches The Office and Parks and Rec., again and again. Now those theme songs make my skin fucking crawl.

> ... and then it keeps you swimming in the same pool.

This is a consequence of the metrics that are being optimized, it's not a fault of the algorithm per se.

It's not a fault at all. If you're going to spend more time watching videos when you're recommended stuff YouTube knows you already like, that's what it's going to do. YouTube just wants you to watch more videos. They don't care whether you are exposed to a variety of content.

Except I think there's a convincing argument to be made that engagement will go down over time if the algorithm makes no attempt to prioritize or suggest novel content.

The rare occasions I discover a new channel, it’s almost always from some source other than the algorithm: a referral from a friend, this site, another YouTuber, etc. My viewership of the same repetitive roster of videos absolutely tails off until I find something new from elsewhere.

For example, in months of being subscribed to my mechanics [0] (who does incredibly engrossing and relaxing restorations of mechanical stuff), not once was I suggested a video from Baumgartner Restoration [1], an art conservator who produces videos with a similar attention to detail and high production value.

Thematically this should be an easy recommendation for YouTube to make, but evidently the content is just different enough that it scores as a false-negative. After finding the latter channel independently, my viewing time absolutely rose for a while.

In theory, YouTube ought to be able to detect and learn from this signal of non-algorithmic discovery of new content. Yet, here we are.

[0]: https://www.youtube.com/c/mymechanics

[1]: https://www.youtube.com/c/BaumgartnerRestoration

Stagnation is a known problem in reinforcement learning and similar methods. It's very easy to get stuck at a local maximum. My favorite fun example is https://gym.openai.com/envs/BipedalWalkerHardcore-v2/ where a standard DDPG (https://arxiv.org/abs/1509.02971) will get stuck at pits in the environment. Although it could get a higher score if it learned to jump, the penalty for falling in makes it stabilize on standing still and running out the timer. Video: https://www.youtube.com/watch?v=DEGwhjEUFoI

I suspect there is something similar going on with video/music recommendations. When a bad novel suggestion is made the penalty is likely too high to overcome (User immediately clicks off) with traditional reinforcement methods.

I agree with you. I have the same feeling about Spotify: its algorithm just doesn't work for me, and I have to search somewhere else for recommendations.

My (wild) guess is that it would be very hard to come up with a universal algorithm that doesn't exhibit this characteristic, due to some sort of effect that's comparable to the class imbalance problem, but with added feedback effects.

On the other hand, I taught Pandora to only play songs by artists that had done heroin, that's kind of cool. It has tons of variety, from Ray Charles to Johnny Cash to Alice in Chains, and it finds artists I had no idea about, like James Taylor. Also, don't try to code while listening to my horses channel... There may be variety, but there is also a common quality of alert-sedation.

You may be overrating what they do. I suspect that 90% of the recommendation weight is based on what other people clicked after watching the same video.

I've seen so many really complex real-time recommendation pipelines that could be replaced by a simple weighted click-rate style algorithm.

The application of ML and data science in this industry is quite hilariously bad, really.

Somehow so much leads to correlation. :-)

That is the point: for some problems a NN is unnecessary. That isn't an argument against the point; that is the point. Many problems in life allow simpler models and can save time and computational power if people use their brains first.

Even more interesting is to show that you can also optimize the neural network weights using a GA, or even the author's own basic method. It would be interesting to compare the results of the author's method with a neural network optimized in the same way.

There is enough delay in the feedback loops that the simple steer to the middle works - despite the jagged edge, you still track a straight line to within a few cm.

In motorcycles (and RWD cars), everyone knows you steer with the rear anyway.

Somewhat related, it's interesting how non-intuitive steering is to people for motorcycles and bicycles. They do it correctly, but it's hard to reason about.

That is, that pushing the left handgrip forward, at speed, turns left and not right. Yet, at very slow speeds, like walking it, it's the opposite.

After a few years of riding motorcycles I once went on my first snowmobile ride. The controls were so close to a bike that I kept countersteering into the side of the trail. Once I figured out what I was doing wrong, I had great fun from then on.

The other interesting thing is how hard it is to convince non-riders that counter-steering is a thing. They will just not believe you. Even people who've grown up riding bicycles and counter-steering unconsciously their entire childhood.

>> pushing the left handgrip forward, at speed, turns left and not right

This is the most counter-intuitive physics I've ever experienced.

The really weird thing is I'd been riding pushbikes all my life and had my motorbike license for almost a year before I learned this. It wasn't taught as part of the license training.

Super handy to know as a tool, it’s kept me from going for a closer look at the scenery on a couple of occasions.

You should try a sidecar. Want to go left, turn left. Want to go right, turn right, but not too fast because the chair takes to the air and now motorcycle physics are involved.

Thanks for this comment. I used to ride casually but had never heard of this. Found a quick video to explain it


That reminds me of a street hustle in London where a guy had reversed the steering mechanism and would bet you a pound you couldn't steer it straight for 5 meters.

Pretty interesting to me that we can reverse our intuition.

I love the article, but I don't agree with the premise that machine learning equals neural nets. In my understanding machine learning is a very broad term that just as well could be applied to the polynomial model if the constants were optimized algorithmically. I feel like the presented argument is more for transparent vs opaque models rather than machine learning vs something else. Also one could argue that the polynomial model is just a perceptron[0].

[0]: https://en.wikipedia.org/wiki/Perceptron
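For what it's worth, the equivalence is easy to see in code. A sketch (the names and numbers are mine, not from the article):

```python
def perceptron(x, w, b, activation=lambda z: z):
    # A single "neuron": weighted sum plus bias, passed through an
    # activation. With the identity activation this is exactly a
    # linear model of the inputs.
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)

# A hand-tuned linear steering rule and a one-unit "network" with
# identity activation compute the same thing.
x, w, b = [0.3, -0.7], [1.5, 2.0], 0.1
linear = w[0] * x[0] + w[1] * x[1] + b
assert abs(perceptron(x, w, b) - linear) < 1e-12
```

Whether you call that "a perceptron" or "a linear model with hand-picked coefficients" is exactly the terminological question being debated here.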

The machine learning course at my university starts out with polynomial regression and estimators, statistics of classification, etc. Neural networks are only one tool in a large toolbox.

But they are all the rage and it is no surprise that a lot of people want to play with them.

Cynically, neural networks are easier as you don't really have to think about your model. Give some examples with some classes and you're done. Or give examples of one class and let the neural net generate new ones. Doing away with the abstraction beforehand is an enticing prospect.

> Cynically, neural networks are easier as you don't really have to think about your model. Give some examples with some classes and you're done.

This way of thinking about it leads directly to things like statistical redlining.

It's also not specific to neural networks. I take a similar approach with logistic regression. Except that I like to replace the "and you're done" step with, "and you're ready to analyze the parameters to double check that the model is doing what you hope it is." Even when linear models need some help, and I need to do a little feature engineering first, I find that the feature transformations needed to get a good result are generally obvious enough if I actually understand what data I'm using. (Which, if you're doing this at work, is a precondition of getting started, anyway. IMNSHO, doing data science in the absence of domain expertise is professional malpractice.)

There is no, "and you're done" step, outside of Kaggle competitions or school homework. Because machine learning models in production need ongoing maintenance to ensure they're still doing what you think they're doing. See, for example, https://research.google/pubs/pub43146/

That's an excellent approach -- and how I try to introduce people to NNs.

NNs are just polynomial regression with polynomial activations, and piecewise-linear regression with ReLU activations (etc.).

A NN is just a highly parameterized regression model -- for better, or worse.
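That view is easy to demonstrate with a toy sketch (my own, not from the thread): a one-hidden-layer ReLU net with scalar input and output is exactly a piecewise-linear function, with at most one "kink" per hidden unit.

```python
def relu(z):
    return max(0.0, z)

def tiny_net(x, w1, b1, w2, b2):
    # One hidden ReLU layer, scalar in and out: the result is a
    # piecewise-linear function of x, i.e. segmented regression.
    hidden = [relu(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

# Two hidden units give a function with at most two kinks, located
# where each unit's pre-activation crosses zero (here x = 0 and x = 1).
f = lambda x: tiny_net(x, w1=[1.0, -1.0], b1=[0.0, 1.0],
                       w2=[1.0, 1.0], b2=0.0)
```

Between the kinks the function is an ordinary linear fit; more hidden units just buy more segments, which is the "highly parameterized regression" point.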

That was an eye-opener for me.

I had always thought of neural nets in terms of the massive connected graph, which in my head somehow behaved like a machine.

In the end I realized it's just a representation of a massive function, f: R^m -> R^n, which needs to be fitted to match inputs and outputs.

I know this is not precisely correct and glosses over many, many details - but this change in viewpoint is what finally allowed me to increase the depth of my understanding.

It's unclear that there is such a thing as an NN, and in any case, that it is graph-like.

What are the nodes and edges?

There is a computational graph which corresponds to any mathematical function -- but it is not the NN diagram -- and not very interesting (e.g., addition would be a node).

NNs are neither neural nor networks.

> Cynically, neural networks are easier as you don't really have to think about your model. Give some examples with some classes and you're done. Or give examples of one class and let the neural net generate new ones. Doing away with the abstraction beforehand is an enticing prospect.

If you're trying to solve a well-understood business problem, sure, but my issue with this is that you pigeonhole yourself and your solution. I'm much more interested in understanding the model than doing the implementation, because that allows you to build on top of what you get out of the box in a framework, for example. It's like learning React before learning JavaScript. It might be a good short-term solution, but long term it certainly isn't.

Oh, I was not defending neural networks. This was the cynical sales pitch for the case where you don't want to employ mathematicians or computer scientists, but just throw code and computational resources at the problem.

But isn't that an important part of the value of neural networks? Mathematicians are expensive so we'd like a computer to make a model for us, just like drivers are expensive so we want self-driving cars.

The issue with that is NNs fail in some really interesting ways, so you still need a lot of effort to get a robust solution. Remember, after some serious investments by many organizations, self-driving cars are still in development. At the same time a few people have demonstrated a basic system that seems close without nearly that much investment. Unfortunately, the difference between a demo and a working solution can be several orders of magnitude.

> It might be a good short term solution but long term it certainly isn't.

It is only a temporary solution - unless it works.


Could a person with ML experience come up with this solution? Yes! Would his ML experience help him come up with this solution compared to someone who just learned numerical methods and automatic control theory? No. This isn't an ML solution.

Just because something is taught in an ML course doesn't mean that it is ML. It is pretty common for physics classes to teach maths and for chemistry classes to teach physics for example.

So if something is taught in ML class but also in statistics class then it is statistics and not ML. If something is taught in ML class but also in a numerical methods class then it is numerical methods and not ML.

Well... I guess most people equate ML with AI and use these terms interchangeably.

If you just replace ML with AI everywhere in this article it is going to make sense.

The article has other problems, one being the main premise.

The problem isn't to drive a car around a track (which is what the polynomials did), but rather to write a program that can figure out how to drive a car without you knowing how to solve it.

Well that depends on your definition of AI. Which isn't well defined. We call AI what we perceive as "magic". Black box algorithms have a higher chance of being perceived that way (e.g. neural nets). When you get some insight into how an algorithm works (easier for transparent box algos, but same holds for black box algorithms), you start to see it less and less as "magic", and, consequently, you're less likely to refer to it as an (artificial) intelligence. Because ultimately, that's what we mean by intelligence -- magic. When we say that something is intelligent, we liken it to ourselves: it evokes a sense of identification. It all comes back to a sense of humans being fundamentally separate from "the other" (computers in this case). If we saw the mathematical models and algorithms as just that, we wouldn't call them AI. Also, if we didn't think of our intelligence as more than the behaviour of our biological computer, we wouldn't be enchanted by the concept of non-biological systems mimicking some of our behaviour.

A professor once said in class: "when it works and you don't understand why, it's called AI; when you do, it's called an algorithm"

I disagree.

We don't find these systems intelligent because, on inspection, they aren't.

We are intelligent. Not "magically", but actually nevertheless.

Our intelligence, and that of dogs (mice, etc.), consists in the ability to operate on partial models of environments; to respond dynamically to them; and to skilfully respond to changes in them.

This sort of intelligence requires the environment to physically reconstitute the animal in order to non-cognitively develop skills.

It is skillful action we are interested in, and precisely what is missing in naive rule-based models of cognition.

You provided an illustration of "magic". It's important to realise that you don't need a complex algorithm to produce complex behaviour (see Stephen Wolfram and his work on cellular automata).

In my understanding AI is an even broader term and means "any solution that imitates intelligent behavior". E.g. expert systems which are pretty much a bunch of if-then rules are also considered AI.

It's my understanding as well: many things that a modern programmer thinks of in terms of "computation" were once considered to be "AI". Lisp and Prolog were "AI", and even the A* algorithm is still considered a rudimentary form of "AI" in textbooks just because it uses heuristics. There's a joke that says "every time AI researchers figure out a piece of it, it stops being AI" [0].

It's why I use "AI" and "ML" interchangeably although I know it's technically incorrect - the formal definition doesn't match what people are currently thinking.

[0] https://en.wikipedia.org/wiki/AI_effect

There have traditionally been different approaches and definitions for AI. Some emphasize behaviour while others emphasize the logic behind the behaviour. (In some sense, while expert systems of course were an attempt at getting practical results, they might also have been an attempt to implement what was seen as human reasoning, while e.g. black box machine learning could be more about just getting the behaviour we want.) Some approaches view agents as intelligent if their action resembles humans or other beings that we consider intelligent, while other approaches are merely interested in whether they perform well at a specified task, perhaps more so than humans.

So yes, "any solution that imitates intelligent behaviour" is probably right, but with nuances with regard to what that actually means.

That's not symbolic AI though. That's only statistical methods. The statistical methods are all the rage now, but explainable AI that can reason is an important area of computer science (and research) and uses formal methods.

Edit: yeah, you can downvote this, but current AI research splits right along this line, whether it's symbolic or statistical. Some AI courses will use NNs, others will use Prolog and ASP. You can't just dismiss a whole field of research by reducing AI to statistical methods.

"Expert systems" were the hot research area in AI prior to machine learning (data driven methods, basically). Old methods and problems from that era like automated reasoning still have some research and applications going on, but aren't remotely as big an area as machine learning.

When I see "symbolic AI" I immediately think of Gary Marcus and immediately feel disdain towards the topic because of his behaviour on Twitter and other places.

I don't know the dude. I "only" know that my field of research is deductive reasoning in interactive applications and that this area falls under "Logic Programming" and LP is an area of AI.

I know that AI researchers are usually a bit dismissive about the other area. I don't like statistics either. Reducing the whole of AI research to statistical approaches (and NNs are one of those) is disingenuous and dismisses hundreds of researchers doing important work.

You may not want to have rule-based image recognition, but if your car decides to run over somebody, I feel we better have an explanation for this behaviour based on reasoning and logic.

I don’t think anyone is dismissing symbolic AI. As far as I can see, it’s just not beating current SOTA results of NNs? It’s not really about ideology, it’s about what currently has superior performance. Model interpretability is not always a requirement.

The author may have implemented ML when they optimized their polynomial constants:

> If I was developing a racing game using this as the AI, I’d not just pick constants that successfully complete the track, but the ones that do it quickly.

If they wrote code that automatically picked constants that successfully completed the track quickly, (even something as simple as sorting the results by completion time), then that's reinforcement learning.

I agree; machine learning can certainly be over transparent models, and classic models can certainly be non-transparent. I tend to think of machine learning as any method which optimizes not only the model parameters but also the model structure in a single step. Then again, the latter are just parameters of a more abstract model, so it's all optimization in the end.

> Also one could argue that the polynomial model is just a perceptron

One also can argue otherwise [1].

[1] https://matloff.wordpress.com/2018/06/20/neural-networks-are...

I came here to state the same.

I am not sure when we changed the terms, but back in the day this would happily fall under machine learning. As he mentioned, if you want a good driver you would run thousands of experiments to pick a good set of parameters.

As soon as we recognize plain old regression as machine learning, we start to see "averages" as models of systems, and how practically useful could that be?

I think you're being facetious, but on the off-chance you're not, and for the benefit of others: averages are incredibly practically useful for modeling systems. Parameter estimation (which generalizes averages and applies to other distribution features like variance) is a foundational modeling methodology. It's useful for both understanding and forecasting data. Measures of central tendency are nearly always good (if obviously imperfect) models of systems.

Here is a trivial example: one of the best ways of modeling timeseries data, both in and out of sample, is to naively take the moving average. This is a rolling mean parameter estimate on n lagged values from the current timestep. Not only is this an excellent way of understanding the data (by decomposing it into seasonality, trend and residuals), it's a competitive benchmark for future values. The first step in timeseries analysis shouldn't be to reach for a neural network or even ARIMA. It should be to naively forecast forward using the mean.

You might be surprised at how difficult it is to beat that benchmark with cross-validation and no overfitting or look-ahead bias.
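A sketch of that baseline (illustrative only; the window size and numbers are made up):

```python
def naive_ma_forecast(series, window=3):
    # Forecast the next value as the mean of the last `window`
    # observations: the benchmark any fancier model must beat.
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

history = [10.0, 12.0, 11.0, 13.0, 12.0]
print(naive_ma_forecast(history))  # → 12.0, the mean of the last three values
```

A model that can't outperform this one-liner out of sample hasn't earned its complexity.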

Thank you for your fabulous response. I hope my provocative comment wasn't in bad humor, disrespectful or trolling. I too love averages and regressions. Thank you for proudly defending these marvelously simple and powerful tools.

Well, actually working with "averages" as baselines before you start experimenting with more complex ML models is a good habit.

Sure, they are dummy regressors [1], but they can be so useful for proving that your whatever ML model you choose is at least better than a dummy baseline. If your model can't beat it, then you need to develop a better one.

They can even be used as a place-holder model so you can develop your whole architecture surrounding it, while another teammate is iterating over more complex experiments.

You could also settle in for a moving average process as a first model in a time-series [2], because they are easy to implement and simple to reason about.

Never under-estimate the power of an "average".

[1] https://scikit-learn.org/stable/modules/generated/sklearn.du... [2] https://en.wikipedia.org/wiki/Moving-average_model
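A minimal hand-rolled version of such a dummy baseline; this mimics what scikit-learn's DummyRegressor does with strategy="mean" (my sketch, not sklearn's code):

```python
class MeanBaseline:
    """Predicts the training-set mean for every input, like
    sklearn's DummyRegressor(strategy="mean")."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)

def mse(y_true, y_pred):
    # Mean squared error, the score the real model must beat.
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Any "real" model should at least beat this score on held-out data.
baseline = MeanBaseline().fit(None, [1.0, 2.0, 3.0])
```

If a carefully engineered model can't beat `baseline` on a held-out MSE, the modeling effort has not yet paid for itself.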

As a former data scientist, I feel we try to solve problems the hard way for one of two reasons (maybe both):

1. To feel smart
2. To justify our paycheck

Most of the time simple solutions like the one in the link will be more than enough, but we just can't resist the urge to implement this new paper we just found. I remember thinking about using a NN for a problem we had; after looking at it closely for two days, all I needed was a simple linear regression, and I got the extra benefit of being able to explain what was going on.

Eventually I started looking for the "elegant" solution, because now that was the thing that made me feel smart. Although sometimes using brute force is the best approach. There's a balance you find once you gain enough experience... I think.

I agree, but as a data scientist myself I've always known that part of my job is to choose the right tool out of the box. If that tool is a linear fit, or heck even adding a few columns in a spreadsheet, it's my job to realise that and choose appropriately. If it's a complex custom-built NN, then as long as the cost-benefit analysis justifies the build time, I'll choose that.

Of course, as you mention, there's always the business-political aspects - of explaining or justifying your choice to people who don't understand any of those tools, and who often want to pretend that they're part of a "smart data-driven AI" company.

> 1. To feel smart 2. To justify our paycheck

Finally someone on HN is being honest. It's also a good touch that this is downvoted somewhat while replies getting mad at essentially choice of words (of the title, nonetheless) are on top.

Doing the elegant solution was the first thing I learned in grad school, it was literally a comment a professor made. In undergraduate, you feel smart for working out a long calculation. In grad school and beyond, you should feel smart by doing the least math possible and using intuition. It's not that long calculations aren't necessary, it's just that thinking a little first is important rather than just turning the hard work crank.

Genuinely curious as to what you moved into after working as a data scientist. I'm a data scientist and desperate to get out.

I founded my own company after I had a problem that I solved for myself and thought I could actually scale. Just starting though.

As a data scientist at the start of my career: why?

The list is so long we might as well start another thread for it.

For me it was unrealistic expectations. Tons of companies hire data scientists just because everyone else is doing it and/or they think somehow they'll make everything better just by being in the company.

If you are lucky enough to be in a truly "data driven" startup, then you'll most likely have fun and learn a lot.

I've seen this pattern frequently on a few data science teams. However, I do think there is a lot of value to data science, but it more has to do with incorporating metrics and accountability rather than modeling.

For example, I've seen many mission-critical rules-based systems with no accuracy metrics. A data scientist is asked to beat this system using machine learning and fails miserably because the rules were developed by experts over years and the data scientist has a month and is only a few years out of grad school. However, in the process of building the model that data scientist built data pipelines for measuring the rules-based system's accuracy and fixed a small number of major data integrity issues. Unfortunately this isn't considered a win and the data scientist usually leaves the company soon after. :shrug:

It is a nice, albeit problematic example. Nonetheless I find many such examples in my daily work as a Data Analyst.

Regarding recommender systems, I see many companies trying neural nets and so many other fancy ML techniques for things that, in A/B tests, are always outperformed by basic rules.

I understand the fun it is to build stuff and to use the new hot stuff. And at least for many analysts and marketing people as well as shop product owners this is new hot shit.

I also understand that it is much easier to get management to hand out the big bucks for something that is the new rage, as they tend to read the respective soundbites in their manager magazines.

But as said - I see it underperforming in tests nearly all the time. Not only, but especially, if you take the costs of development and maintenance into account. These systems cost more to build, more to enhance, more to run, and bring in less real business value 70-90 percent of the times I have seen them.

But they are presented to management in shiny presentations from agencies that need to sell the new hot shit to their clients to show that they are relevant. Because, as said, management thinks they need it and are often not open to agencies telling them, that the business value could be better served otherwise. Because in the end for a manager it is often times more valuable to show a fancy state of the art project to his/her higher ups than creating real business value.

> "Regarding recommender systems I see many companies trying neural nets and so many other fancy ML stuff for things that - in AB-tests are always outperformed by basic rules."

I work on large scale recommender systems for an ecommerce company and in my career I’ve seen only the exact opposite.

Don’t get me wrong, sometimes simpler ML models, like clustering LSA vectors or nearest neighbors, work better than complex models like neural nets.

But I have never seen plain rule systems work better for any problem even remotely at scale. Rule systems give an illusory sense of control and understanding, yet are rife with complex interaction effects and edge cases that typically make them intractable to change.

From big automotive clients to small-ish fashion eCommerce, from publishing to food delivery (with upselling in the checkout process): I found that the gains from moving rules -> simple ML techniques -> complex systems like NNs in most cases did not warrant the costs.

The quality of recommendations nearly always increased from a revenue as well as perceived quality standpoint. However, it almost never had a positive impact on the profit margin that would have justified the necessary investments.

As said - from a quality standpoint most simple systems were just "good enough".

One would need to know the business case and environment. But take automotive (new cars) as an example: the goal is nearly always to get the user to request some form of contact from a physical dealership near them. For that you almost never need a perfect, fully configured recommendation.

I know of an example (a car manufacturer) where the search space of all configurable variants (including things the car owner would never even notice, such as specific screw variants) is so large that even the number of visitors to the website per year is several orders of magnitude smaller than the number of options.

The way to go here was to reduce the search space and the number of variants. It turned out that you can reach that goal quite quickly with a few specific questions (active learning) that lead the user to vehicle variants matching their interests - which led to a disproportionately high rate of contact requests.

And yes: ML techniques were used for the analysis and reduction of the search space, and the concept people then developed specific questions to get to these reduced attributes. But in the end the recommender now works rule-based.

I don't imply that this holds true for every scale of company/problem. And I know some counter examples - but most companies do not operate on that scale. If you are ebay, Zalando (Germany) and the likes: I would probably get different results from testing the revenue validity of the different approaches.

Your comment is in wild and incredulous disagreement with widely published results and my own ~10 years of industry experience doing ML professionally in ecommerce, quant finance, education technology and quant advertising.

In fact, I’ve always found that even plain cost per unit of service goes down with the introduction of more complex ML models. Their greater training complexity and compute costs are more than amortized by improved performance and by the ease of training and deploying new models (it’s much harder and more labor-intensive to adjust a rat’s nest of custom business rules than a black-box ML model, even in terms of transparency).

Just reduction of operating costs alone is usually a reason to favor ML solutions, even if they only achieve parity with rules systems (though usually they outperform them by a lot).

Your comment makes me feel your methodology for assessing business value and comparing with rule systems is deeply flawed and probably biased to go against ML solutions for preconceived reasons.

> probably biased to go against ML solutions for preconceived reasons.

Wow. Nice ad hominem. Thanks a lot for that.

> Just reduction of operating costs alone is usually a reason to favor ML solutions

I have yet to see one case, in the industries I work in and with the clients I work with, where an ML solution beats simpler systems in development and operation costs (given the current real-world environment there).

And believe me I try to sell these projects to clients, as I strongly believe that in the long run they could gain something from that.

But that would also mean getting rid of a clusterfuck of different systems, different data definitions from department a to department b as well as market x to market y. Politically motivated data mangling (we do not want "central" to know everything so we do not send all data or data in the necessary format).

When you see that markets technically use the same CRM system, for example, but rename tables, drop columns, use the same dimension names for different things and so on, integrating even one market into a central data lake becomes a daunting task - let alone 130 markets. And this is just CRM. Not sales. Not - given automotive - the data from retailer systems.

But this would nonetheless be the data you need for ML systems to learn from. And then there are legal issues. Car dealerships are separate legal entities. They are not allowed to "just" send PII data to the central brand (at least not under European GDPR). There is also a lot of stuff central just isn't legally allowed to know - discounts given, just to name one example.

After you get all of this untangled and cleaned up (and change all the business processes that depend on said structures), I strongly believe ML would probably be cheaper. And would lead to better results.

Don't think that I am telling my clients otherwise.

Neural nets really are better. Just because you, your clients, or the problem you are solving is simple doesn't mean NNs don't work. They work absurdly well.

Not what I said. It is just that in the respective environments the costs of developing and operating these don't return a higher ROI than simpler systems.

Not because they do not work, but because simpler systems can be run comparatively cheaply in environments that are very stratified and where the underlying data situation is a messed-up clusterfuck to begin with.

Believe me I really, really wonder how these companies are able to make money given what they have in terms of underlying central data quality. It is unbelievable sometimes.

It depends.

Rule based systems are great if you have people with deep domain understanding developing the rules.

Unfortunately, those people are rare, so most rule-based approaches fail to perform well.

However, most recommendation systems suck unless you get someone who knows what they are doing to build them.

In terms of business value, I would be very hesitant to make strong statements like the above (in both cases, actually).

Well - I said I have seen and tested. I would love for positive ML cases to arise. I really would. That would make it way easier to sell my Data Science colleagues to the respective clients on terms other than hype and buzzwords.

I also believe that with a good situation in underlying data quality we could be talking about massively reduced costs in getting these systems up and running - and this would tip the scale in favor of said systems.

But what I see in terms of data quality makes me sometimes just want to run as fast as I can in the other direction.

Yeah, to make this stuff work well, you normally need lots of data, so consumer tech is mostly where you see successes.

If the data isn't being logged by automated systems daily, then you probably don't have enough to make these kinds of things work.

In smaller data environments, rules are going to perform much, much better (but still require the domain expertise, which isn't cheap).

I feel like there’s something more interesting going on here than the author is giving credit to.

Most tools you can start by solving simple problems with, and gradually work up the complexity of the problem you’re addressing until it does “something useful”.

This is a good way to learn what you can, can’t and should use a tool for.

Deep learning is problematic though; solving trivial tasks is actually quite difficult, and often the “something useful” level of sophistication means copy pasting someone else’s paper and tweaking it a bit and kind of vaguely hoping something you do makes any difference.

Being able to solve trivial problems with neural networks is really important, and useful; not because it’s a good solution, but because it means you can see what happens when you try it out.

The problem with this post isn’t the conclusion; the author is quite right. You can solve this problem in many ways, maybe NN aren’t the best for this kind of trivial problem.

...but, if you want to solve harder problems with more than random trial and error tweaking parameters, solving easy problems with the same tool is a good way to learn how.

...and we have like 20 years of proof that using hand crafted models has been proven not to scale effectively.

> and we have like 20 years of proof that using hand crafted models has been proven not to scale effectively.

What do you mean by this? In what context?

Like, the twenty year old models are still running in credit risk and insurance, so I'm confused if you mean in ML/statistical modelling.

You know, driving cars, like the article was talking about?

...or, NLP, audio & image recognition, recommendations... come on. It’s not controversial.

> recommendations

A lot of this is standard statistical methods, much of which are much older than twenty years.

Really, that's the part that threw me, twenty years ago everyone was going crazy for SVM's, which is still machine learning, but the features were definitely hand crafted.

I think deep learning has been super successful with unstructured data, but for tabular data it's pretty much a wash between NN's and boosted trees or generalised additive models.

You do need some flexibility in your function approximation, but not as much as people commonly believe.

I do sorta wonder about that.

The deep learning "revolution" corresponded with an exponential growth in the time/effort/money being thrown at these problems and the amount of data available to do so.

In an alternate universe, could everyone be going crazy about kernel machines?

GPU's are definitely a big part of why this stuff has improved, as you can train much, much faster on larger datasets which is going to improve performance.

NN's are super flexible though, and I'm not sure you'd have gotten the same level of performance out of other methods.

Interesting question though.

I can only agree. It is also notable that pretty much the entire repertoire of debugging techniques from programming land is useless.

Attempting to implement a paper from scratch is an interesting experience. (1) implement what they describe (2) it doesn't work (3) ... well, that was fun. Project over.

Very different experience from trying to implement quicksort, even though the extensions to a basic neural net often aren't conceptually much more complicated.

The approach they show in the end - it's still machine learning though? Exploring a space and finding parameters to optimize for a loss function (speed around the track), just not deep learning with neural nets.

I think Arthur Samuel would agree. This approach has a loss function, parameters, and inputs that feed in to a model to optimise the parameters.

The big difference between this and the other approach mentioned in the article is the model is a simple one that's easy to understand instead of a many layered neural network which is rather opaque.

I think the article may be better titled "You might not need neural networks."

> I think the article may be better titled "You might not need neural networks."

Since a linear model is essentially a single-layer neural network with linear activation, we can't even say that. The author was using a neural network without realising it :)
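A toy sketch of that point (with made-up coefficients, not the author's actual ones): the plain linear model and a one-layer "network" with an identity activation compute exactly the same thing.

```python
import numpy as np

# Hypothetical steering coefficients, one per distance sensor.
w = np.array([0.5, -0.3, 0.1])
b = 0.05

def linear_model(x):
    """A plain linear model: w . x + b."""
    return w @ x + b

def one_layer_net(x, activation=lambda z: z):
    """The same computation, phrased as a single-layer network
    with a linear (identity) activation function."""
    return activation(w @ x + b)

sensors = np.array([1.0, 2.0, 3.0])
print(linear_model(sensors), one_layer_net(sensors))  # identical outputs
```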

I'd argue that a NN cannot have linear activation. I mean, if the activation function is linear, it is not an NN anymore.

Machine learning is a set of techniques developed to attack modeling problems that traditional algorithms couldn't solve. So if the algorithm was developed and used before computers it definitely isn't machine learning. Everything done in this article was known and used before computers existed hence not machine learning.

Doing automatically controlled systems was still possible before computers just that it required a bit more creative use of analog components.

> Machine learning is a set of techniques developed to attack modeling problems that traditional algorithms couldn't solve. So if the algorithm was developed and used before computers it definitely isn't machine learning

I think that it's important to note here that this was the set of problems that computer science researchers didn't know how to solve.

In the early days, they mostly ended up re-inventing statistical methods.

And to be fair, a neural net is just a bunch of linear models joined by a non-linearity. In that case, it's essentially stacked logistic regression, which was invented before computers.
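A minimal sketch of that stacking, with arbitrary made-up weights: each hidden unit is a logistic regression over the inputs, and the output unit is a logistic regression over the hidden activations - which is exactly a two-layer neural network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic(x, weights, bias):
    """One logistic regression: sigmoid of a linear combination."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def two_layer_net(x):
    # Hidden layer: two logistic regressions over the raw inputs.
    h1 = logistic(x, [0.3, -0.1], 0.0)
    h2 = logistic(x, [0.2, 0.4], 0.1)
    # Output layer: one more logistic regression over the hidden units.
    return logistic([h1, h2], [0.5, -0.5], 0.0)

print(two_layer_net([1.0, 0.5]))  # a probability between 0 and 1
```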

> And to be fair, a neural net is just a bunch of linear models joined by a non-linearity.

Nobody did this before computers though.

> . In that case, it's essentially stacked logistic regression, which was invented before computers.

It isn't "basically logistic regression", it is just a technique which uses logistic regressions. The full technique is ML. If you remove the ML parts it is basically just logistic regression left though.

Like, logistic regression uses a non-linearity to convert the outputs to the 0-1 scale. How does that differ from a one layer neural network?

I think this would probably be a more profitable discussion if you could define Machine Learning for me.

Is logistic regression not ml? Or maybe only if it's fit with gd?

> Machine learning is a set of techniques developed to attack modeling problems that traditional algorithms couldn't solve. So if the algorithm was developed and used before computers it definitely isn't machine learning.

How does your evidence support your claim?

An important observation. People forget that genetic algorithms and a whole swathe of classical machine learning strategies exist.

It might not be a hard boundary, but I think the perception of ML vs. optimization is how much of a model you have. If all you have is a black box, then it's ML; if you know how the system you are studying works, it's (parameter) optimization.

That’s a very unfair distinction, almost like a No True Scotsman fallacy, to say machine learning is only bad and other stuff is only good (in terms of transparency).

But machine learning has predated neural networks by hundreds of years. The core mathematical basis of all machine learning coursework is linear regression and decision trees. Other models like SVMs, Bayesian models, nearest-neighbor indexes, TF-IDF text search, the naive Bayes classifier, etc., are basically machine learning 101, and they have many different properties regarding interpretability depending on the problem to solve.

Saying that linear regression is machine learning is like saying that Newton's laws are chemistry. There was no machine learning before computers, just regular old optimization algorithms.

Yes, there was. It was just called statistical modelling.

There were no machines in the sense of ML in 1740.

This is very false. Least squares regression fitting, Chebyshev polynomial approximation, and maximum likelihood estimators all existed at the time, and those are all classic examples of standard machine learning. The term “machine learning” essentially encompasses any type of algorithm that expresses inductive statistical reasoning. Even just elementary school descriptive statistics is machine learning. “Machine learning” is a super old subfield of applied mathematics. The fact that the terminology “machine learning” didn’t exist until things like the perceptron and SVMs came along is utterly irrelevant semantic hairsplitting.

But ML is essentially just function approximation, and that definitely existed back then.

I know this is super pedantic, but it's important to remember the roots of things, and that even things which appear new have precursors that are much older than a lot of people realise.

Isn't that just searching a space? Machine learning generally refers to a fancy way of searching a space. Here the guy searched randomly so I'd say not machine learning.

But perhaps using a cost function or a loss function is enough to call it machine learning. A machine just used an algorithm to learn another algorithm after all.
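That loop - random search scored by a loss function - fits in a few lines. A sketch, with a made-up quadratic loss standing in for the lap-time simulator (the fictional optimum is at a=1.0, b=-2.0):

```python
import random

def lap_time(a, b):
    """Hypothetical stand-in for the track simulator: lower is better."""
    return (a - 1.0) ** 2 + (b + 2.0) ** 2

random.seed(0)
best, best_time = None, float("inf")
for _ in range(5000):
    # Generate coefficients at random to explore the space...
    cand = (random.uniform(-5, 5), random.uniform(-5, 5))
    t = lap_time(*cand)
    # ...and keep whichever pair completes the "course" fastest.
    if t < best_time:
        best, best_time = cand, t

print(best, best_time)
```

Whether that counts as "learning" is exactly the semantic question in this thread; mechanically it is a loss function, parameters, and a search.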

Stochastic gradient descent and other things like genetic algorithms and simulated annealing are random search techniques specifically created and taught in the context of machine learning.

Simulated annealing goes back to the seventies and was definitely not "specifically created in the context of machine learning". Many (most?) optimization techniques have their origin in Operations Research.

Simulated annealing was developed for purposes of parameter fitting in physics modeling, based on Metropolis Hastings which was likewise developed in the context of parameter inference for model fitting. Simulated annealing for eg traveling salesman problem came later.

I do agree some optimization algorithms are rooted in other fields. I wasn’t trying to say that machine learning is the only historic field from which optimization methods were developed. I just wanted to point out it is a major historic field where some highly respected search and optimization procedures were first created, since people often overlook how old machine learning is and the vast set of modeling procedures apart from neural networks that make up the core of machine learning.

Lol. Are all optimization problems specifically created and taught in the context of machine learning? Do you know any other context than ML?

Your comment is not coherent. You ask, “ Are all optimization problems specifically created and taught in the context of machine learning?” but this has no logical or semantic connection to my comment in any way. It fails to be a valid response or question.

Instead it seems you falsely believe you are writing with some sarcasm that endows rhetorical flair to undercut my comment. It’s very rude and juvenile in addition to being wholly ineffective.

Correct. I was waiting for someone to point this out. He optimized the parameters of a polynomial equation. The machine learned to effectively race around the track through this optimization.

> Instead of doing anything fancy, my program generates the coefficients at random to explore the space. If I wanted to generate a good driver for a course, I’d run a few thousand of these and pick the coefficients that complete the course in the shortest time.

It is worth pointing out that this strategy can easily overfit on the data that you have used for training: when you change the track, your car may not behave as well as before. In other words: the coefficients are only good for that specific track(s).
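A toy illustration of that overfitting risk, using two fictional one-parameter "tracks" (lower score = faster lap): coefficients tuned by random search on one track transfer poorly to the other.

```python
import random

# Two hypothetical tracks whose optimal coefficient differs.
def track_a(c):
    return (c - 3.0) ** 2

def track_b(c):
    return (c - 1.0) ** 2

random.seed(1)
candidates = [random.uniform(-10, 10) for _ in range(2000)]

# Tune on track A only: pick the fastest candidate there.
best = min(candidates, key=track_a)

# Near-optimal on the training track, noticeably worse elsewhere.
print(track_a(best), track_b(best))
```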

The author is still using Machine Learning, even if not with neural networks: the need for rigorous strategies for model selection doesn't disappear.

> It is worth pointing out that this strategy can likely overfit on the data that you have used for training: when you change the track, your car may not behave as good as before. In other words: the coefficients are only good for that specific track(s).

More generally, I think this approach is only suited to "static" courses, which only matter in contrived demos. Any real use of a steering algorithm requires reacting to conditions that can't be predicted a priori; e.g. if you have to avoid collisions with other cars, and one car is controlled by a human player, every run is effectively a different track and overfitting like this would not be an option.

I agree with the premise. But to dig into the specifics here, because it's not clear in the article: is this model generalizable to arbitrary tracks, or will the author have to generate new coefficients for each track?

If you have to generate new coefficients for each track, your polynomial regression reduces to a polynomial interpolation of two-dimensional points which represent the track path on a plane. Which is fine and still accomplishes the specific goal, but doesn't solve what would generally be considered the actual research problem.

But then again I don't know if the neural network actually achieves this. It's a little unclear in the video: I don't know whether the model is able to learn from the human guiding the vehicle over n iterations, or whether the model is generated by the human guiding the vehicle over n iterations. Presumably the research goal is to develop a model which learns tracks (in this circumstance, that would be akin to the model choosing the coefficients rather than being the coefficients).

More accurate title:

You probably don't need a neural network if your model has 5 inputs.

There's another aspect of this overhype of machine learning right now.

Manager - We need to do X, we need a ML Data Scientist.

Engineering Team - there's a simple solution we can use instead of ML.

Manager - No we need a ML Data Scientist, we need to do this right.

Time Passes, hiring a ML Data Scientist is hard, and the problem never gets solved.

It goes without saying that if you know (or guess) the model or function you are trying to implement, then you don't need machine learning.

Toy problems like this are still interesting because they demonstrate techniques that are applicable to bigger problems. I'm guessing the network in the car game doesn't need more than one layer and two outputs (acceleration vector), but that is beside the point.

The technique being demonstrated is the ability of GA (or maybe particle filters), to find "optimal" weights for a network given whatever simulator. This is always interesting, especially when done with graphics like this.

That’s really not a good example. The polynomial-driven cars behave erratically and slow down for no reason - fixable, but the key is that improving that behaviour will take many man-hours of trial-and-error work and math, whereas the ML version will just improve itself based on the goals.

The problems you mention are likely an artifact of him using a bad optimization method, not inherently a limitation of the function approximator. Ironically, it's most similar to genetic search, a method most commonly associated with machine learning.

If he used a standard optimization method instead, convergence would be fast and the result much better. A similar problem - using splines to set force inputs for a robot traveling through a maze with barriers, optimized on start/end position and force minimization - was a lab in a course I took last year.

The question becomes one of hyperparameter search, i.e. what kind of model/function approximator is sufficient. Here the problem is simple enough that it's easy to find a sufficiently simple model. The huge networks are nice for more general problems because they tend to work moderately well for everything... in the dataset.

I simply do not understand the ML hype. It's absolutely ridiculous. On top of that you have Elon Musk thinking we are a few months away from SkyNet even though we are decades and decades away from AGI.

Each and every time I see neural networks on some "techies" blog, I wanna vomit.

Your reaction seems quite extreme and vitriolic.

My job is to be the director of machine learning at a medium-sized ecommerce company. My company uses machine learning to solve lots of problems in search & customer recommendation, image & text processing, time series forecasting, and a few “backend” support models for things like phishing / fraud detection and gaining efficiencies in customer support operations.

I am happy to answer any questions I can about why machine learning has been a continued growth and investment area for my company and how we thoroughly validate business value when deciding whether to adopt ML solutions.

We use modern neural networks in probably about 10-15% of the solutions we operate.

> On top of that you have Elon Musk thinking we are a few months away from SkyNet even though we are decades and decades away from AGI.

The actual problem is that we don't know how long it'll be until we build an AGI. Experts put the range somewhere between 10 years and never.

Building an unconstrained AGI is an existential risk, so it's important to try and narrow the confidence bands on these questions. That's one of the reasons why Musk pledged $1 billion to OpenAI.

> Building an unconstrained AGI is an existential risk

But why? So far no-one has been able to explain this to me.

An AGI in and of itself is nothing but a brain in a vat. The exponential increase in scientific knowledge was based on the scientific method, which replaced ancient Greece's discussion-based epistemology with a cycle of observation → hypothesis → test → observation → ...

An isolated AGI cannot test its predictions about the world. Since the unconstrained space of possible explanations is bigger than the space of explanations constrained by observational evidence, there is no way an AGI can gain useful knowledge without access to external observations.

This means we are in full control of how "intelligent" (by whatever metric) an AGI can even get by restricting its access to information. But even unlimited access to (passive) information only gets you so far, as some models require data that cannot be obtained passively (i.e. they require deliberate controlled experiments).

The final nail in the coffin of dangerous AGI is interaction with the physical world. Yes, even a toddler is a terrifying menace to millions of people if I place a button right next to it that sets off a thermonuclear bomb in a city centre.

But how about we just don't do that? An AGI with limited or no physical interaction with the world (directly via robot body or indirectly through remote access) can't be any more harmful and menacing than the late Stephen Hawking.

There's no need to put AI on a leash, since there's a final naturally limiting factor: energy. Switch off the cooling system and your AGI has to throttle down lest it faces fiery death.

Even without physical interaction, you can do a whole lot with just an internet connection. A superintelligence could identify 0-day exploits and quickly spread across thousands of computers and thus render itself immune to your idea of just switching it off. What do we do then? Sure, we can shut down the internet, but where is that going to leave us and how much damage has been done before that happens?

> A superintelligence could identify 0-day exploits and quickly spread across thousands of computers and thus render itself immune to your idea of just switching it off. What do we do then?

How about simply pulling the plug of the computer or even just the network cable?

More to the point, how would an AI even learn about such a mysterious exploit if it doesn't have access to an external network in the first place? Even run-of-the-mill supercomputer centres aren't directly connected to the internet for security reasons, so why change that with a potentially dangerous computer program?

How exactly do you intend to prove that an AI does not have access to an external network?

Network connectivity isn't magic: no physical connection, no network. Simple as that. Hard to imagine in this "always connected" world, but WAN is completely optional.

Did you know it is perfectly possible for a computer to exfiltrate data without physical connections and without even having a networking card?

A few examples: using speakers or microphones to transmit arbitrary data via ultrasound. Making the CPU/GPU fans vibrate in a way that sends encoded bits. Blinking the screen to emit electromagnetic waves. Transferring certain data patterns between RAM and CPU so fast that they produce oscillations, effectively turning the bus into a GSM antenna that can emit arbitrary data over a regular cellular network. Turning the fans off to change the heat signature in a way that transmits information... or even simply blinking a light to send data through regular lightwaves?

Scientists discover new ways to exfiltrate data basically every other year, how can you be so certain you've thought of every possible way?

> Did you know it is perfectly possible for a computer to exfiltrate data without physical connections and without even having a networking card?

> A few examples: ...

All these examples require active physical interaction of the machine with the world, which simply isn't possible for a server to do.

This is the environment an AGI will likely "live" in: https://bit.ly/33w7ySX

There's no speakers, no bus oscillations to pick up (from where? by what?) and you might notice that these ominous boxes don't have anything that can pick up signals (optical, vibrations, or otherwise).

Exfiltrating data from machines without touching them is completely unrelated to the capabilities of a program running in a box like this https://bit.ly/3l2hPfw

There are no sensors that can measure the outside and the floors these boxes are located at are heavily shielded and isolated from the outside anyway for various reasons (EM protection, physical security, etc.) so no potential target PC in sight.

The datasheet of the A100 lists "remote intervention from an engineer's home or workstation" as a core feature of that box, using one of the dozens of hyperoptimized networking components, so... not sure what argument you are going for there.

These hyperconnected boxes are definitely (hopefully?) not where an AGI will be built.

You are aware of what a stock photo is? Also, yes, every server and supercomputer has network components, but that still doesn't mean that Lawrence Livermore National Laboratory's Sierra, with its 4 GPUs per node, is accessible from the internet, so your remark is kind of meaningless.

Incorrect. Out of the three Sierra systems hosted at LLNL that have the "4 GPUs per node" you're talking about, two are in fact accessible from the public Internet. One is in the Collaboration Zone CZ (Sierra/lassen), another is in the Restricted Zone RZ (Sierra/rzansel). Only one of the three is classified (Sierra/shark), and even then it's still sshable once one is logged into SecureNet.

The answer is "Not any time soon"; you could also worry about giant lasers on sharks. Trying to regulate a supposedly imminent AGI now is like Morse and Edison regulating the Internet.

We are doing a spectacularly lousy job of regulating what already exists.

Decades and decades? I wouldn't be so sure in such a bold prediction. One breakthrough in meta-learning and we could be well on our way. (Which is not to say that it will happen next year -- just pointing out that a development like this is nigh impossible to predict with the certainty your comment expressed.)

Well neural networks have been used in lots of genuinely useful products. I use voice dictation on my phone all the time for example, and I don't think I could ever go back to manually cataloging all of my photos (I use Google Photos, but Microsoft Photos and Apple Photos also use ML).

Were you around when every company hawked a blockchain of some type? It's a marketing fling right now; that's mostly what it's for.

I'd make two broad points against this article:

A) In most domains, a nonlinear model is far superior to a linear model. At the very least, generally you need to at least apply a transformation (e.g. logistic) to create a nonlinear model from a linear model, because most problem spaces are nonlinear. A nonlinear model doesn't have to be particularly complex though.

B) Machine learning (i.e. automatic tuning of parameters) is far simpler for the user than manual tuning of parameters. It's not a question of whether you "need" machine learning, but whether it will save you work. In fact, the author here is in denial - he actually says he would do machine learning, but doesn't realize that is what it is: "Instead of doing anything fancy, my program generates the coefficients at random to explore the space. If I wanted to generate a good driver for a course, I’d run a few thousand of these and pick the coefficients that complete the course in the shortest time."
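To make point B concrete (with synthetic data, not the author's actual setup): fitting linear steering coefficients to logged driving behaviour is a one-line least-squares call, versus hand-tuning each coefficient by trial and error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "driving log": three distance probes -> steering output,
# generated from made-up true coefficients plus a little noise.
X = rng.uniform(0.0, 10.0, size=(200, 3))
true_w = np.array([0.8, -0.5, 0.2])
y = X @ true_w + rng.normal(0.0, 0.01, size=200)

# The "machine learning": one least-squares call recovers the
# coefficients a human would otherwise tune by hand.
w_fit, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_fit)  # approximately [0.8, -0.5, 0.2]
```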

This is a farcical idea if the author actually is serious. The polynomial method is deep, dangerous overfitting. It’s the same reason why you don’t make a hand-made decision tree: you don’t know which features of the data are important or not. Presuming you do know, like basing the car’s steering on a polynomial fit of the three directional probes, will be a disaster in non-toy problems.

This example shows why you _do_ need machine learning. The toy steering problem is a place for you to work out how the car can learn to drive with as little structure or assumption baked in as possible. Putting less a priori structure on it directly means you need models with higher capacity to figure it out.

To put this all more succinctly, “never use machine learning when business logic would do” is a statement about how commonly needed machine learning is, not business logic.

If only there were some rigorous mathematical frameworks we could use for dealing with uncertainty.

It's always a disappointing state of affairs when people skip simple models. If nothing else, a nice baseline like the one described in this article should be constructed before using ML techniques to serve as a comparison point.

"You might not need neural networks" would be a more appropriate title.

The non-NN example given here is one "for epoch in epochs" loop, to take care of picking the best coefficients, away from being an ML implementation.

> If I wanted to generate a good driver for a course, I’d run a few thousand of these and pick the coefficients that complete the course in the shortest time.

I might be pedantic here, but wouldn't this then be a machine learning algorithm? The machine is learning the most appropriate coefficients based on some heuristic (best of X random tries).

Wouldn't it be better (and more honest) to say, then, that simpler ML models and learning techniques can often yield good enough results that don't justify moving to more advanced models and learning techniques?

> I might be pedantic here, but wouldn't this then be a machine learning algorithm?

I would argue no. Simply trying a bunch of inputs and choosing the most effective ones based on the output is like running a single round of training on a machine learning model. It's hardly machine learning by any stretch—there's no feedback loop, the machine doesn't "learn" anything.

If I build a compiler and test a thousand constants for the default configuration options, choosing the ones that perform best, it would be a very bold claim to call my compiler "powered by machine learning".

> If I build a compiler and test a thousand constants for default configuration options and choose the ones that perform the best

What if that was automated in the build? I had assumed the quote meant that it would be, not that they'd manually run a few random cases and pick the best by hand. I thought it would be automated: given a new course, you'd run some training routine where the machine plays the course, say, 100 times, each time choosing random coefficients, and at the end it takes the ones from the pass that resulted in the quickest playthrough. Those would then go in some config file and become the coefficients used by the AI CPU race cars for that course.

To me this is machine learning. Especially if you crank up that 100 to 1 million or 1 billion. At that point, it's still something that only a machine could do, I couldn't realistically try 1 billion random coefficients for the ones that result in the fastest playthrough.

So in effect, I see the machine is learning which coefficients perform better for a given course. If it tried 1 billion, it learned that out of 1 billion different possible coefficients, some particular set was the best.

> there's no feedback loop, the machine doesn't "learn" anything

So that's interesting, because if my prior statement is not to be considered machine learning, my next question is: what are the criteria for going from the above to machine learning proper? It seems the learning might have to involve a feedback loop, where each attempt takes something away from the previous one.

I'd be okay with this definition. And now I'm thinking what's the most minimal modification I can make to meet this new definition.

What if, as it tried random coefficients, it remembered the ones it had tried before? And what if it made sure that no new attempt used a set of coefficients that had already been attempted? This isn't super refined (there's no heuristic for which coefficients are best to try next given the ones tried so far, like, say, what linear regression would accomplish), but it still meets the definition. It starts random on the first round, and the next round is no longer truly random, since it can't pick the previous round's coefficients again. So at least it's descending through the set of possible coefficients along a path that will eventually try all combinations.

Would this be enough to be considered machine learning?

I'm also thinking this can start to sound a lot like evolutionary computation. Oh boy, in all honesty, I've always been confused about the differences between stochastic, metaheuristic, and machine learning optimization techniques.

Answering myself, I think wikipedia might have clarified my confusion:

> The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning.

> Machine learning (ML) is the study of computer algorithms that improve automatically through experience.

So it seems optimization supplies the techniques on which most ML is based, but ML is more the idea that an algorithm improves automatically from experience. Later, the Wikipedia page draws a very clear distinction:

> The difference between the two fields arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set, machine learning is concerned with minimizing the loss on unseen samples.

Neural networks are just regressors. Yes, you can learn the weights with genetic algorithms. Is this advised? Not so much: 99.99% of neural networks are trained with some variation of gradient descent on a specified loss function.

I don't even know if I agree with the statement. Polynomial regression solves basically the same problem as neural networks but performs way, way, way worse on big datasets. Nonetheless, I would say that polynomial regression is machine learning too.

This post isn't about learning NN weights with GAs.

Also, the reinforcement learning community does a significant amount of its work with neural networks that are not trained using gradients. Gradient-free training is a highly active research area. It's more like 90% of neural networks that are trained with gradient descent.

No, but this post is trying to generalize from an experiment where NNs are trained with GAs, which is, in my opinion, not where NNs shine.

It's a GA without crossover, applied to a single-layer neural network with a linear activation.

The best time series prediction model is often “Use the previous day (or week’s) value”. The best classification model is often “pretend everything is the same.” The best regression model is often “assume it’s the mean”.

This should not be surprising: a core premise of data science (empiricism in general? All logic and human knowledge?) is that intuition is often wrong. Initial hypotheses/problem definitions are just intuition.

This is incorrect and I’m not sure the point you’re trying to make.

I love love love the plain C code, making use of unix pipes to produce (real time) video. The subtext is also that you might not need fancy tools.

All he proved was you don't need a neural network for MicroMouse. We've known that for years. Some problems do require it. In my job experience, facial recognition can't be easily done with traditional signals processing. Stereo vision can be, but the algorithms are slow, expensive, and very error prone when compared to deep learning models.

The author is pretty much attacking a strawman he cherry-picked. He took one toy example of someone learning ML and used it to attack the newest trend. Instead of taking the time to dive into the topic, he's trying to dismiss it as less relevant. It's obvious he's stuck in the C/Unix era of programming, which has long since passed its heyday.

Can someone who knows C help me understand the source code?

The concepts and equations he has are pretty simple, but I'm really not seeing how they translate into code.

The code also isn't the easiest for me to understand: multiple 1-3 letter or otherwise (seemingly) poorly named variables, some C jargon I'm not familiar with (mostly the -> operator), and the fact that it's a relatively large chunk of code.

It seems like the core logic loop is here: https://gist.github.com/skeeto/da7b2ac95730aa767c8faf8ec3098...

And the position and acceleration of the car is altered each iteration through randomization?

It's hard to piece together.

The arrow operator is a field access through a pointer. That is, if you have a pointer to a struct, it's the same as dereferencing the pointer and accessing the field: "(*foo).bar" corresponds to "foo->bar".

I think the main driving logic is on line 228. It could definitely do with some more descriptive variable names, though - "a" for angle, and "s" for sense inputs seems a bit too short.

-> is a dereference operator. On line 65 (function ppm_create), f is declared as a pointer to a structure of type ppm (a reference to an object of type ppm). *f is the actual structure/object. (*f).w is the 'w' member of the structure. The parentheses are needed for operator precedence.

f->w is another way to write (*f).w

I understand it’s not needed but is it wrong? I see machine learning as a general purpose black box function approximation tool. It could be a polynomial curve or something else entirely, as long as we have enough data we can approximate it using ML. So in one way I see it as a “lazy man’s easy way out” tool.

ML is useful for domains where we don't have a good analytical model of real world behavior of systems. Of course, that is a lot of problems, including many useful things that biological intelligence appears to solve.

But it is true that ML - in particular neural networks - often gets thrown at problems that traditional analysis and modeling could serve more efficiently, and also more predictably. The other benefit of first trying traditional analysis is that it helps the implementer of the system better understand the domain first.

Pertinently, I've heard some scientists working on medical imaging describe to me how there has been a trend away from the basic science of understanding the medical phenomena behind the image being analyzed and instead relying excessively on ML based pattern recognition to make predictions/inferences.

Generally it is easy to agree with the sentiment that machine learning is being used to solve problems where it isn't really needed. But on the other hand, the same argument could have been made when computers initially came along: "you don't need a supercomputer to calculate this function, I can do it with pen and paper faster". So the real value in projects like this is not that you can now drive a 2D car game with a neural network, but that neural networks just took another tiny step forward, demonstrating that their capability to minimize error functions is in principle transferable to real-life problems no matter how trivial. One day those small steps will have accumulated, and neural nets will (and in some areas already do) surpass many limits that humans have.

I've read a wonderful paper on what information people use when driving, it is pretty similar to this driving simulation. The basic cue is "where you will be in T milliseconds", extrapolated using the current vehicle velocity, and trying to keep that point on the road by steering and throttling. Here is that paper, A unifying theory of driver perception and steering control (2019) https://www.researchgate.net/publication/337024514_A_Unifyin...

In cases like this and many other cases i feel like the purpose of the machine learning algorithm should be to find a simplified polynomial function. I remember reading a paper in which an “AI physicist” was created and it would reduce complex machine learning models into simple equations that would explain the physics of a simulation. It would reduce the complex simulation into human readable and potentially more useful and predictive “laws” of physics.

While true, I don't think this driving problem is meant as an example of something you should use ML for, but rather as an example of how you could apply it.

Nevertheless, it is impossible to say why a 'neural network' would be better than some other model function with a large number of parameters. It's true, there are some significant successes achieved with NNs, but there is not a lot of work separating necessities from contingencies. Basically, a lot of it is trial and error.

Can you "download" trained neural nets? Aren't there popular formats for those?

For example, if I want to recognize pictures of animals, with reasonable accuracy, how much space would such neural net take?

I'm not very well versed in ML, but in my mind, since ML is generally heavy on computation, it should be possible to distribute neural nets for specific applications and re-use them.

Yes, most of the large models that are not feasible to be trained by individuals are downloadable. Frameworks make it easy to import and export snapshots of model weights. See https://github.com/tensorflow/models/tree/master/research/sl...

Yes you can. The "universal" format is ONNX. You can even do crazy things like this: https://blog.owulveryck.info/2018/06/11/recurrent-neural-net...

How is this not machine learning? I guess because there is no gradient descent to pick the parameters, just random space exploration?

Gradient descent has been known for hundreds of years; it isn't machine learning. Optimizing parameters is an entire field in itself. Calling all cases of parameter optimization "machine learning" is just a fad because it sounds cooler. Before computers, people still optimized things, just by hand; doing the same calculations with a computer is no more machine learning than doing them by hand is human learning.

Yes, ML is overkill in this case. Now test it with sensor noise and track obstacles, and the simple polynomial version might break down.

This is a pointless comment that misses the point of the post completely. It says "you _might_ not".

The given example in the opening paragraph describes a scenario where you don't need it.

The parent commenter didn't say that everybody should use ML in all circumstances. Real world tasks are often more complex than invented toy examples, and quite often hand crafted policies don't work so well. And it was not elaborated in the original blog post. So I think it was a valid point.

And doing the track the fastest.

I wasted roughly a month trying to get machine learning working for a game.

Unless you seriously need it, and have a massive budget to match, basic boolean logic will probably get you where you want to be faster.

I wouldn't advise any solo developer to use machine learning unless that's the entire product. It's very easy to make an insanely difficult game without it

Well, yeah: anything you can do with a NN you can do by chaining enough functions.

What NNs allow is testing millions of permutations of thousands of these functions to find the one that works.

If your problem is intractably hard to figure out functions for, NNs provide a larger hammer to brute-force an approximate solution.

Hi, please try to implement the formula calculations here: https://youtu.be/eBaHqKVWriU Use neuromorphic networks and be happy!

Machine Learning is basically the new Snake Oil Cure All. Everyone is slapping it on everything

I'd guess that a NN could be extended to "racing" multiple cars, where if that was your goal, a polynomial wouldn't be very competitive?

You don't expect anyone to write up code to create an image classifier for some obscure category on the daily?

Repeat for pretty much every technology that is "in fashion" at the time.

I like how these cars on the last animation move like a school of fish.

s/machine learning/deep learning/

That's what the author means. There are a lot of "traditional" ML and AI techniques that can work very well for a lot of problems.

Statistical sampling has been reported to work in many cases.

What about Deep Learning? :)


Where have I heard that before?


Wow, how bizarre. Am I weird in jumping to the conspiracy theorist conclusion that this is some sort of experiment on us, like an A/B test to see what kinds of comment works on what kind of thread?

No, this is someone getting a bot some karma so it can eventually contribute to the manipulation of posts (i.e. getting things upvoted to the front page).

Then it is probably our duty to downvote.

Can probably just contact dang to ban him. But it's 4am and I'm too lazy

Does karma play a role in this on HN? I wasn't aware of that.

Yes, either accounts need a certain amount of karma to make their votes matter or higher karma gives more weight to the vote. Then when you post links that promote your own content or business you can upvote them with your army of bots

I thought HN had an intelligent and well-informed crowd. OK, I get that engineers are not scientists, but judging from the responses here, it's quite staggering to me how clueless some people are. Do yourself a favour and learn a bit about ML before forming an opinion. Massaging JavaScript for a living is cool, I guess, but maybe there's a reason FAANG is paying 5-10x your salaries to ML researchers. Maybe, just maybe, it's not just hype but billions and billions of dollars worth of innovation potential. It's quite clear to me that a lot of CS people are bitter that they are not the top-paid people anymore, but supply and demand dictates these things, and it turns out that inventing the latest transformer architecture is more valuable (and orders of magnitude fewer people can do it) than writing trivial front-end or back-end stuff.

Basic programming is quickly becoming antiquated and articles like this, and many of the commenters are stuck in the past.

The efficiency of ML is in cost, not overall computation. Throwing machine resources at a problem is cheaper than hiring some guru with the necessary math and CS background to solve the problem.

We've seen the same thing with frameworks/libraries. Before it took specialized knowledge to do basic things like networking, media creation, etc. in code. Now there are existing tools that do everything for you.

The same has happened with optimization/algorithmic knowledge, and the genie is not going back in the bottle. There will definitely be specialized cases where that particular expertise is needed, but that is no longer the norm.
