It's weird, but images often sell a paper, so it is worth spending time learning how to make good images (or having at least one person in your group who can). That said, research comes first and matters more. Just don't underestimate the power of good images. I often find this video on colors helpful
Why is a PDF reader having trouble displaying 30 MB of data?
It should be even easier than 30 MB of images, because images have to be decoded, while vector graphics are just bytes describing shapes, and you decide how to render them.
It's probably some accidentally quadratic behavior that struggles with 10000 vector objects.
Images are handled relatively well by GPUs, and with a precomputed mipmap they can be rescaled very quickly, unlike vector graphics, which need to be re-rendered every time the zoom level changes.
Why wouldn't you be able to send an array of data defining the curves, and an array defining the draw order, to the GPU and just render it simply?
TL;DR: We devise a linear SDE/ODE model to imitate the per-class feature (think: logits) dynamics of neural network training, based on local elasticity (LE). We found that the emergence of LE implies linear separation of features from different classes as training progresses.
The drift matrix of our model has a relatively simple structure; once it is estimated, we can simulate the SDE with the forward Euler method, and the results align reasonably well with the genuine dynamics.
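For readers who want to see what that simulation step looks like, here is a minimal sketch of forward Euler (Euler-Maruyama) for a linear SDE dX = AX dt + σ dW. The drift matrix A below is a toy placeholder, not the structured matrix estimated in the paper.

```python
# Minimal Euler-Maruyama sketch for a linear SDE dX = A X dt + sigma dW.
# A is a stand-in here; the paper estimates its (structured) entries from DNN runs.
import numpy as np

def euler_maruyama(A, x0, sigma=0.1, dt=1e-2, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    d = x0.shape[0]
    xs = np.empty((n_steps + 1, d))
    xs[0] = x0
    for t in range(n_steps):
        drift = A @ xs[t]                                   # linear drift term
        noise = sigma * np.sqrt(dt) * rng.standard_normal(d)  # Brownian increment
        xs[t + 1] = xs[t] + drift * dt + noise
    return xs

# Hypothetical 2-class toy drift: stronger intra-class pull than inter-class coupling.
A = np.array([[0.5, -0.1],
              [-0.1, 0.5]])
trajectory = euler_maruyama(A, x0=np.array([0.1, -0.1]))
```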
Local elasticity models a phenomenon observed in DNN training: the effect of training on a sample is greater for other samples from the same class and smaller for samples from different classes. For example, training on an image of a cat helps the model learn images of other cats, but not so much images of, say, dogs.
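One rough way to probe this (my own sketch, not the paper's procedure): take a single SGD step on one training example and compare how much the model's outputs move on a same-class probe versus a different-class probe. The tensors in the usage comment are hypothetical stand-ins.

```python
# Rough local-elasticity probe: how much do outputs on a probe change after
# one SGD step on a single training example?
import copy
import torch
import torch.nn as nn

def output_change_after_step(model, train_x, train_y, probe_x, lr=0.1):
    before = model(probe_x).detach()
    updated = copy.deepcopy(model)              # leave the original model untouched
    opt = torch.optim.SGD(updated.parameters(), lr=lr)
    loss = nn.functional.cross_entropy(updated(train_x), train_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    after = updated(probe_x).detach()
    return (after - before).norm().item()

# Hypothetical usage: cat_img, other_cat_img, dog_img are preprocessed batches.
# Local elasticity predicts the first change is larger than the second.
# same_class_change = output_change_after_step(model, cat_img, cat_label, other_cat_img)
# diff_class_change = output_change_after_step(model, cat_img, cat_label, dog_img)
```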
Any comments/thoughts/questions are most welcome!
There is lately a lot of hate against classic statistics on HN. I don't know why. Does it help to understand why and how NNs work? Not yet. But saying that it is utterly useless and won't provide any useful insights in the future sounds to me like telling the young Steve Jobs that dropping out of college and taking calligraphy classes instead is utterly useless. And yet I am writing this on an Apple product, which set the standard for digital typography...
I'm thinking of some wonderful posts describing where/when/why linear regression can offer performance very close to the best from a NN -- except that regression models train much faster on much less data AND are interpretable.
Some discussion about how a NN works well for data with a lot of (statistical) structure -- two nearby pixels in an image are likely to have very similar color/luminosity (and if not, the difference is important to the model, i.e. an 'edge'). But NNs don't do as well in domains where the features don't have such relationships, say an econometric model or many biological models or ...
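A quick, purely illustrative sanity check of that claim (not from any of the posts I'm recalling): on a small tabular dataset, a plain linear model is often within a hair of an MLP while training in a fraction of the time and staying interpretable via its coefficients.

```python
# Compare a linear model and a small MLP on a tabular regression task.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = load_diabetes(return_X_y=True)

linear = Ridge(alpha=1.0)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)

print("ridge R^2:", cross_val_score(linear, X, y, cv=5).mean())
print("mlp   R^2:", cross_val_score(mlp, X, y, cv=5).mean())
```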
You didn't, but a lot of HNers do. Maybe I should rant at them instead, that's true.
Agreed. Yet there are unknown feature spaces in which deep nets separate the data linearly; the mapping the deep net learns from the data to this space is likely highly nonlinear.
For example, binary logistic regression can be thought of as mapping the data to a 1-D space ([0, 1]) and separating the two classes linearly at, say, 1/2.
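A tiny illustration of that point (synthetic data, just to make the 1/2 threshold concrete):

```python
# Binary logistic regression maps each input to a single number in [0, 1];
# "linear separation" is just thresholding that number at 1/2.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

p = clf.predict_proba(X)[:, 1]        # the 1-D feature in [0, 1]
pred = (p > 0.5).astype(int)          # separate the two classes at 1/2
assert (pred == clf.predict(X)).all()
```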
Do you have any comments/insight into how you’d say they’re similar/different? Thanks!
Will read and be back for discussion hopefully soon.
That's a great question! Unfortunately not yet -- though we believe further studies may eventually get us there. We found that (at least for simple classification tasks) the features seem to exhibit a two-stage behavior: a de-randomization stage that identifies the best directions in the feature space, and an amplification stage where the features stretch along those directions.
We've been thinking about identifying a bound on the exit time of the first stage, and examining how it depends on different hyper-parameters, dataset properties, etc., so that one could pinpoint how to reduce the time spent in the first stage, effectively making training faster.
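A hedged sketch of how one might look for those two stages empirically, given per-epoch class-mean features logged during training (shape [n_epochs, feature_dim]): alignment with the final direction rising first (de-randomization), then the norm growing (amplification). The `class_means` array below is a synthetic stand-in.

```python
# Diagnostics for the two-stage picture: direction alignment vs. feature norm.
import numpy as np

def stage_diagnostics(class_means):
    final_dir = class_means[-1] / np.linalg.norm(class_means[-1])
    norms = np.linalg.norm(class_means, axis=1)
    cosines = class_means @ final_dir / np.maximum(norms, 1e-12)
    return cosines, norms

# Stand-in data: a random walk that drifts toward a fixed direction.
rng = np.random.default_rng(0)
target = rng.standard_normal(16)
class_means = np.cumsum(0.05 * target + 0.1 * rng.standard_normal((50, 16)), axis=0)
cosines, norms = stage_diagnostics(class_means)
```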
> i.e. can the ode be integrated faster than backprop?
Also a good question. At this stage we need to estimate the model parameters (the drift matrix) from simulations on DNNs. As future work we hope to explore whether we can pre-determine those parameters, so that a comparison with backprop would make more sense.