Hacker News | rundigen12's comments

Paraphrase: ML in Julia is for Serious Scientists Doing Very Sciencey Things. Things you wouldn’t understand. But your desire to train deep learning models on GPUs and deploy them as apps elsewhere? ...Aw, c’mon, that’s boring. You can just use PyTorch for that. Now, who's up for a new round of Neural ODE benchmarks?!


I naively expected this article to include a link to the paper somewhere in it, so we could decide for ourselves. Oh well.


All I see is a filtered woman on the left and a different unfiltered woman on the right, the whole time. At what time does the glitch occur?


If you’re talking about the YouTube video, I don’t think they captured the glitch from A to B, just the unfiltered capture.



Agreed. Is this an example of the patriarchal bias of science leaving women out of the telling of history -- finally being set straight by our friends at NPR -- or leftist revisionism trying to ascribe extremely inordinate influence to one member of a large team just because she happened to be female?

If one is assigning parentage, would Lyman Spitzer be the "father" of the Hubble telescope? Perhaps it had many "fathers" but only one "mother"?

EDIT: MichaelMoser123's added info and quotes are helpful. Merits updating the Wikipedia page to give her greater(/any) credit.


According to the article, she advocated for space-based astronomy.


> Nobody cares how long it takes to train a model.

LOTS of people care how long it takes to train a model. A few minutes, vs. a day, vs. a week, vs. a month? Yea, that matters.

Think about how long it takes to try out different hyperparameters or make other adjustments while conducting research...

If you're Google maybe you don't care as much because you can fire off a hundred different jobs at once, but if you're a resource-limited mere mortal, yea, that wait time adds up.


Yes, I agree. Most people who come to us at alpes AI do care about training time, i.e. how fast they can run experiments.

Another important aspect is training, and incremental training, on edge devices.

At a time when privacy is becoming very important and you cannot export data from mobile devices, training time on mobile is an important factor.


If you are building large-scale systems that take weeks or months to train, you are at a point where you shouldn't care about this. Throw more compute at the problem, it will pay for itself.

If we are talking days or hours: start parameter search on Friday and return best parameters on Monday.

Do research and iteration on heavily subsampled datasets.
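
For example, a minimal sketch of that subsampled-iteration workflow (assuming scikit-learn is available; the dataset, estimator, and parameter grid here are just placeholders):

    # Iterate on a small subsample, then refit the winning configuration on the full data.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=200_000, n_features=50, random_state=0)

    # Work on a ~5% subsample while trying out ideas and hyperparameters.
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X), size=len(X) // 20, replace=False)
    X_small, y_small = X[idx], y[idx]

    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        param_distributions={"C": list(np.logspace(-3, 3, 13))},
        n_iter=10, cv=3, n_jobs=-1, random_state=0,
    )
    search.fit(X_small, y_small)   # cheap: runs on the subsample only

    # Once the search settles, refit the best configuration on everything.
    final_model = search.best_estimator_.fit(X, y)
    print(search.best_params_)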

If you are building models for yourself, or for Kaggle, you may care in as much as your laptop gets uncomfortably hot.


I was really hoping to see a summary comparison of the performance(s) of the different models at the end, e.g. accuracy vs. complexity vs. execution time, etc.

Here's a summary from the end of each section...

1. TextCNN: "This kernel scored around 0.661 on the public leaderboard."

2. BiDirectional RNN: 0.671

3. Attention Models: 0.682


Thank you, that's exactly what I was scrolling through the bickering to find.


"By defining \mathbb{E}\left[\mathbf{x}\right]=\muE[x]=μ, ...and using the linearity of the expectation operator \mathbb{E}E, we easily arive [sic] to the following conclusion..."

Yikes. You don't define \mathbb{E} as an 'expectation operator', or what an expectation operator even does, or the fact that it's linear. The v's disappeared somehow from inside the square brackets -- maybe you meant \muE[v]=μ?

So far this "tutorial" isn't defining its terms very well. I'm lost and it's only the very beginning.


"E[foo]" is syntax to mean the expected value of the random variable foo, roughly speaking the mean value. (For instance the expected value of a dice roll is 3.5. The terminology is slightly suboptimal, since we will never expect a dice to come up 3.5.)

Hence the "E" itself is called an "operator". It can be applied to a random value in order to yield its expected value. You can read up on it here: https://en.wikipedia.org/wiki/Expected_value

The definition "E[x] = mu" is correct, though I would write it the other way, as "mu = E[x]", as it's the variable mu which is being defined.
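
As for linearity, which the original comment says was never defined: for constants a and b, E[a X + b Y] = a E[X] + b E[Y], and in the vector case E[A x] = A E[x] for any constant matrix (or row vector) A. That's exactly what lets you pull v^T out of the expectation, e.g. E[v^T x] = v^T E[x] = v^T mu.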

The v's disappear because of a suppressed calculation:

sigma^2 = E[ (v^T x - E[v^T x])^2 ]
        = E[ (v^T x - E[v^T x]) (v^T x - E[v^T x]) ]
        = E[ v^T x v^T x - 2 v^T x E[v^T x] + E[v^T x] E[v^T x] ]
        = E[ v^T x x^T v - 2 v^T x v^T E[x] + v^T E[x] v^T E[x] ]
        = E[ v^T x x^T v - 2 v^T x E[x]^T v + v^T E[x] E[x]^T v ]
        = E[ v^T (x x^T - 2 x E[x]^T + E[x] E[x]^T) v ]
        = v^T E[ x x^T - 2 x E[x]^T + E[x] E[x]^T ] v
        = v^T E[ (x - E[x]) (x - E[x])^T ] v
        = v^T E[ (x - mu) (x - mu)^T ] v
        = v^T Sigma v.
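
If it helps, here's a quick numerical sanity check of that identity, i.e. that Var(v^T x) = v^T Sigma v (just a sketch with NumPy; the data and v are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 3)) @ rng.normal(size=(3, 3))  # correlated samples of x
    v = rng.normal(size=3)

    lhs = np.var(X @ v)                         # Var(v^T x), estimated from the samples
    Sigma = np.cov(X, rowvar=False, bias=True)  # biased (1/N) covariance to match np.var
    rhs = v @ Sigma @ v                         # v^T Sigma v

    print(lhs, rhs)  # the two numbers agree up to floating-point error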


Also, the "E[foo]" notation is something you'd pick up in an introductory statistics course. Which, IMO, means it's perfectly appropriate to use it without further explanation in this sort of context.

It's not really reasonable to expect technical subjects like this to always be presented in a way that's easily digestible to people who lack any background in the subject area. This article is clearly aimed at people who are studying machine learning, and anyone who is studying machine learning should already have a good command of basic statistics and linear algebra.


Slightly better formatted: http://mathb.in/28658


Rewind to the start of the paragraph: "Let \mathbf{x} be a random vector with N dimensions."

You're assumed to already know what a "random vector with N dimensions" is. It's very reasonable then to also assume you know what expectations and covariances of random vectors are, and some of their basic properties, such as linearity of expectations and quadratic forms of covariance matrices, since all of these are typically taught in the same course.


That's code for KaTeX/MathJax. It should be rendered, check your script blocker.


Whoa, I was just reading about this yesterday! Did I consent to tracking when I created an account here? ;-)


There once was a documentary about a small group of British intellectuals (who also happened to be musical virtuosos) in which one of them asserted that the decision boundary between clever and stupid can have arbitrarily small width.

