Logistic Regression for Image Classification Using OpenCV (machinelearningmastery.com)
81 points by andyjohnson0 on Dec 31, 2023 | hide | past | favorite | 20 comments


Digits/MNIST is a very, very bad dataset for CV demos because it's too easy: as done here, you can run a logistic regression on raw pixel values and get sufficiently good results. That's also why Fashion MNIST was created, to give ML demos some difficulty: https://github.com/zalandoresearch/fashion-mnist

For more typical image classification problems, you can get >90% of the way there on an arbitrary image dataset, with much less code, by using CLIPVision image embeddings as the input to your classification algorithm of choice.
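The embed-then-classify pattern is roughly: compute one embedding per image (e.g. via `CLIPModel.get_image_features` from the `transformers` library), then fit any cheap classifier on those vectors. A minimal sketch of the second half, with random vectors standing in for real CLIP embeddings so it stays self-contained:

```python
# Sketch of the embed-then-classify pattern. Real embeddings would come from
# e.g. transformers' CLIPModel.get_image_features; random vectors clustered
# around fake per-class centers stand in here so the example is self-contained.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes, dim = 3, 512                      # CLIP ViT-B/32 image embeddings are 512-d
centers = rng.normal(size=(n_classes, dim))  # fake per-class embedding centers
X = np.vstack([c + 0.5 * rng.normal(size=(100, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), 100)

# Any classifier works here; logistic regression on embeddings is a common choice.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {clf.score(X, y):.3f}")
```

With real embeddings the only extra work is a one-time pass of the images through the frozen CLIP vision encoder.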


Seconded; I’ve used MNIST with more classic algos and it’s very easy to reach 99% without any modern techniques.
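As an illustration of how easy digit classification is for classic algorithms, here's a hedged sketch using a default RBF-kernel SVM on scikit-learn's bundled 8x8 digits dataset (a smaller stand-in for full MNIST, so exact scores will differ):

```python
# Sketch: a classic kernel SVM on scikit-learn's bundled 8x8 digits dataset,
# a small stand-in for MNIST. No tuning, no modern techniques.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", gamma="scale")  # default settings
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```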


I doubt this is meant to give an example for real applications but to give some foundations on machine learning and computer vision.


The title touts OpenCV, but OpenCV isn't doing much here other than unnecessarily complicating things (which is bad for newbies), and the demo doesn't show off OpenCV's unique capabilities.

In the final code sample, OpenCV is a) loading the image, which could be done with PIL, and b) training the model, even though the demo imports sklearn, which has its own battle-tested logistic regression implementation.
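For reference, the training step could be done entirely in sklearn. A minimal sketch on sklearn's bundled 8x8 digits (a stand-in for the article's full MNIST, so the exact accuracy will differ):

```python
# Sketch: sklearn's own LogisticRegression in place of cv2.ml.LogisticRegression,
# shown on the bundled 8x8 digits rather than the article's full MNIST.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale raw pixel values to [0, 1] to help the solver converge
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=2000)  # multinomial over the 10 digit classes
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```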

There's a lot of useful things that can be done with machine learning and computer vision, but this article is a bad demo of it that won't work on any other real-world dataset and is out-of-date with more modern CV approaches. Their previous article is a good explanation of the math behind logistic regression, though: https://machinelearningmastery.com/logistic-regression-in-op...


I am not judging the article; what I am saying is that I doubt the audience is someone who needs to classify images of digits.


In fact, a neural net is “just” a stack of logistic regressions if you only use sigmoid activations.
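A tiny numpy illustration of that point: each sigmoid unit computes sigmoid(Wx + b), i.e. a logistic regression over its inputs, and stacking layers just composes them (weights here are arbitrary, not trained):

```python
# A 2-layer network with sigmoid activations is a composition of
# logistic-regression units. Weights are arbitrary, not trained.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, W, b):
    # one "logistic regression": sigmoid(Wx + b)
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)  # hidden layer
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)  # output layer

h = logistic_unit(x, W1, b1)    # each hidden unit is a logistic regression on x
out = logistic_unit(h, W2, b2)  # ...and the output is one on the hidden layer
print(out)
```

The difference in practice is that the layers are trained jointly by backpropagation, which is what lets the hidden units learn useful features.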


There are lots of datasets where applying multiclass LR makes sense. Image classification isn't it.


Pytorch includes a simple neural network example for the MNIST data: https://github.com/pytorch/examples/blob/main/mnist/main.py

It only takes a few minutes to train with default parameters and will have >99% accuracy on the MNIST test set.


Did you know that OpenCV is collecting money right now? https://www.indiegogo.com/projects/opencv-5-support-non-prof...


it always makes me sad to see that projects used by so many companies have to fight to raise what amounts to the salary of two average devs at those same companies


Indeed, it's exploitation.


This blog-post might actually do more harm than good, by giving readers the misconception that logistic regression can generally be used to classify images. In general, it's terrible at this task!


Unless it's a 2x2 image


I love when basic statistical models are used for tasks usually dominated by deep learning; image classification, stock price analysis. Makes me happy for some reason.


In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but raw pixels aren't interpretable features, so there's nothing to explain for an image.

Traditional (non-deep-learning) classification algorithms such as Support Vector Machines and Random Forests perform a lot better on MNIST, up to 97% test-set accuracy compared to the 88% from logistic regression in this post. Check the original MNIST benchmarks here: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#


even knn after dimensionality reduction does pretty well
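That combination is a two-line pipeline in sklearn. A sketch on the bundled 8x8 digits (a stand-in for MNIST; the component count is an arbitrary choice):

```python
# Sketch: PCA down to a few dozen components, then k-NN, on the bundled
# 8x8 digits as a stand-in for MNIST.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = make_pipeline(PCA(n_components=30), KNeighborsClassifier(n_neighbors=3))
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

The PCA step mostly buys speed: k-NN distance computations get cheaper in 30 dimensions than in 64 (or 784 for full MNIST), with little accuracy lost.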


But why


To achieve Machine Learning Mastery!


clicks and clout chasing


This is the answer. Unfortunately there is a plethora of dubious machine learning and AI tutorial clickbait material appearing on the internet to sell ads.

Here are some resources I've found which don't suck if you actually want to learn this stuff:

https://ai.stanford.edu/courses/ <- Stanford's AI course materials.

https://karpathy.ai/zero-to-hero.html <- Karpathy's "neural networks - zero to hero". The other ones (e.g. the transformers ones on YouTube) also seem excellent to me after an initial skim, although I haven't actually worked through them yet.

https://ocw.mit.edu/search/?q=artificial%20intelligence and https://ocw.mit.edu/search/?q=machine%20learning (MIT's relevant opencourseware)

https://probml.github.io/pml-book/book2.html (Draft book "Probabilistic machine learning: Advanced topics" by Kevin Murphy) <- looks seriously excellent, although I've really only taken a cursory dip into it so far. He's also got 2 other books at https://probml.github.io/pml-book/ which are more introductory and which I expect are probably great too.



