Logistic Regression for Image Classification Using OpenCV (machinelearningmastery.com)
81 points by andyjohnson0 on Dec 31, 2023 | hide | past | favorite | 20 comments


Digits/MNIST is a very, very bad dataset for CV demos because it's too easy: as done here, you can run a logistic regression on raw pixel values and get sufficiently good results. That's also why Fashion MNIST was created, to give ML demos some difficulty: https://github.com/zalandoresearch/fashion-mnist

For more typical image classification problems, you can get >90% of the way there on an arbitrary image dataset, with much less code, by using CLIPVision image embeddings as the input to your classification algorithm of choice.
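The embed-then-classify pattern is roughly: compute one embedding per image (e.g. via `CLIPModel.get_image_features` from the `transformers` library), then fit any cheap classifier on those vectors. A minimal sketch of the second half, with random vectors standing in for real CLIP embeddings so it stays self-contained:

```python
# Sketch of the embed-then-classify pattern. Real embeddings would come from
# e.g. transformers' CLIPModel.get_image_features; random vectors clustered
# around fake per-class centers stand in here so the example is self-contained.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes, dim = 3, 512                      # CLIP ViT-B/32 image embeddings are 512-d
centers = rng.normal(size=(n_classes, dim))  # fake per-class embedding centers
X = np.vstack([c + 0.5 * rng.normal(size=(100, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), 100)

# Any classifier works here; logistic regression on embeddings is a common choice.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {clf.score(X, y):.3f}")
```

With real embeddings the only extra work is a one-time pass of the images through the frozen CLIP vision encoder.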


Seconded; I’ve used MNIST with more classic algos and it’s very easy to reach 99% without any modern techniques.
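As an illustration of how easy digit classification is for classic algorithms, here's a hedged sketch using a default RBF-kernel SVM on scikit-learn's bundled 8x8 digits dataset (a smaller stand-in for full MNIST, so exact scores will differ):

```python
# Sketch: a classic kernel SVM on scikit-learn's bundled 8x8 digits dataset,
# a small stand-in for MNIST. No tuning, no modern techniques.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", gamma="scale")  # default settings
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```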


I doubt this is meant to give an example for real applications but to give some foundations on machine learning and computer vision.


The title touts OpenCV, but OpenCV isn't doing much here other than unnecessarily complicating things (which is bad for newbies), and the demo doesn't show off OpenCV's unique capabilities.

In the final code sample, OpenCV is a) loading the image, which could be done with PIL, and b) training the model, even though the demo imports sklearn, which has its own battle-tested logistic regression implementation.
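For reference, the training step could be done entirely in sklearn. A minimal sketch on sklearn's bundled 8x8 digits (a stand-in for the article's full MNIST, so the exact accuracy will differ):

```python
# Sketch: sklearn's own LogisticRegression in place of cv2.ml.LogisticRegression,
# shown on the bundled 8x8 digits rather than the article's full MNIST.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale raw pixel values to [0, 1] to help the solver converge
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=2000)  # multinomial over the 10 digit classes
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```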

There's a lot of useful things that can be done with machine learning and computer vision, but this article is a bad demo of it that won't work on any other real-world dataset and is out-of-date with more modern CV approaches. Their previous article is a good explanation of the math behind logistic regression, though: https://machinelearningmastery.com/logistic-regression-in-op...


I am not judging the article; what I am saying is that I doubt the audience is someone who needs to classify images of digits.


In fact, a neural net is “just” a stack of logistic regressions if you only use sigmoid activations.
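A tiny numpy illustration of that point: each sigmoid unit computes sigmoid(Wx + b), i.e. a logistic regression over its inputs, and stacking layers just composes them (weights here are arbitrary, not trained):

```python
# A 2-layer network with sigmoid activations is a composition of
# logistic-regression units. Weights are arbitrary, not trained.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, W, b):
    # one "logistic regression": sigmoid(Wx + b)
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)  # hidden layer
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)  # output layer

h = logistic_unit(x, W1, b1)    # each hidden unit is a logistic regression on x
out = logistic_unit(h, W2, b2)  # ...and the output is one on the hidden layer
print(out)
```

The difference in practice is that the layers are trained jointly by backpropagation, which is what lets the hidden units learn useful features.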


There are lots of datasets where applying multiclass LR makes sense. Image classification isn't it.


Pytorch includes a simple neural network example for the MNIST data: https://github.com/pytorch/examples/blob/main/mnist/main.py

It only takes a few minutes to train with default parameters and will have >99% accuracy on the MNIST test set.


Did you know that OpenCV is collecting money right now? https://www.indiegogo.com/projects/opencv-5-support-non-prof...


it always makes me sad to see that projects used by so many companies have to fight to raise what amounts to the salary of two average devs at those same companies


Indeed, it's exploitation.


This blog-post might actually do more harm than good, by giving readers the misconception that logistic regression can generally be used to classify images. In general, it's terrible at this task!


Unless it's a 2x2 image


I love when basic statistical models are used for tasks usually dominated by deep learning; image classification, stock price analysis. Makes me happy for some reason.


In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but raw pixels aren't interpretable features, so there's nothing to explain for an image.

Traditional (non-deep-learning) classification algorithms such as Support Vector Machines and Random Forests perform a lot better on MNIST, up to 97% test-set accuracy compared to the 88% from logistic regression in this post. Check the original MNIST benchmarks here: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#


even knn after dimensionality reduction does pretty well
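That combination is a two-line pipeline in sklearn. A sketch on the bundled 8x8 digits (a stand-in for MNIST; the component count is an arbitrary choice):

```python
# Sketch: PCA down to a few dozen components, then k-NN, on the bundled
# 8x8 digits as a stand-in for MNIST.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = make_pipeline(PCA(n_components=30), KNeighborsClassifier(n_neighbors=3))
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

The PCA step mostly buys speed: k-NN distance computations get cheaper in 30 dimensions than in 64 (or 784 for full MNIST), with little accuracy lost.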


But why


To achieve Machine Learning Mastery!


clicks and clout chasing


This is the answer. Unfortunately there is a plethora of dubious machine learning and AI tutorial clickbait material appearing on the internet to sell ads.

Here are some resources I've found which don't suck if you actually want to learn this stuff:

https://ai.stanford.edu/courses/ <- Stanford's AI course materials.

https://karpathy.ai/zero-to-hero.html <- Karpathy's "neural networks - zero to hero". The other ones (e.g. the transformers ones on YouTube) also seem excellent to me after an initial skim, although I haven't actually worked through them yet.

https://ocw.mit.edu/search/?q=artificial%20intelligence and https://ocw.mit.edu/search/?q=machine%20learning (MIT's relevant opencourseware)

https://probml.github.io/pml-book/book2.html (Draft book "Probabilistic machine learning: Advanced topics" by Kevin Murphy) <- looks seriously excellent, although I've really only taken a cursory dip into it so far. He's also got 2 other books at https://probml.github.io/pml-book/ which are more introductory and which I expect are probably great too.



