CS224d: Deep Learning for Natural Language Processing (stanford.edu)
192 points by andreaespinosa on Aug 14, 2015 | 30 comments

I was fortunate to take this class the first time it was offered. I found it a great introduction to the material, but a bit over my head. Deep learning requires a strong grasp of linear algebra - and particularly at the "Stanford" level. My undergrad didn't prepare me well for visualizing outer products and matrix / tensor derivatives. Once you get over those hurdles, deep learning is quite fun. It often works like magic. I'll give you an example:

A firetruck is _____

Try typing this in Google and you'll get "red", "moving" and "made". During the course you build a network that trains next-word completions using arbitrary bodies of text. You can train it for hours, days or weeks...and it just gets better and better. Eventually you will max out the capacity of your network, but then you can fiddle with the number of nodes and other hyperparameters. In the end you're just training a "black-box" nonlinear function to best approximate an unknown function defined by training data.

That example is just a simple Markov model. Using the 'T9' method of completing text is more of a novelty than something useful. I also have trouble with 'complete the sentence' type of programs because they don't actually create new ideas, they just rehash data. (It does have use in OCR, voice recognition, and typing/texting.)
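To make the "simple Markov model" point concrete, here is a minimal sketch of a bigram next-word completer in Python. It is not the network built in the course. The toy corpus and the names `train_bigrams` and `complete` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def complete(following, prompt, k=3):
    """Return up to k of the most frequent words seen after the prompt's last word."""
    last = prompt.lower().split()[-1]
    # .get avoids creating an empty entry for words never seen in training.
    return [word for word, _ in following.get(last, Counter()).most_common(k)]


# Toy corpus, invented for this example.
corpus = (
    "a firetruck is red . the sky is blue . "
    "a firetruck is red and loud . a firetruck is moving"
)
model = train_bigrams(corpus)
# "red" is ranked first because it follows "is" most often in the corpus.
print(complete(model, "A firetruck is"))
```

Because a bigram model conditions only on the last word, "the sky is" gets exactly the same suggestions as "a firetruck is". That is the limitation a network with longer context is meant to address.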

I agree that the math can be complex, but I think it boils down to probability and the notation of presenting the ideas more than the underlying concepts. I feel like the most advanced math used in NLP is the log function, personally. Along with working with big arrays of data, or structures like Markov models and neural nets, which tend to be just arrays of numbers.

In a normal AI course, we had to form write-ups of contemporary AI articles, and one I found interesting was a model for summarizing text, including chapters, books, and other writing. The key idea was finding the most significant sentences in any given paragraph or unit and then using that verbatim.
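The "pick the most significant sentences and use them verbatim" idea can be sketched roughly as follows. This is a generic word-frequency heuristic, not the method from the article mentioned above. The stopword list, the scoring rule and the function names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small, hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "was"}


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, n=1):
    """Return the n highest-scoring sentences, verbatim, in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]


text = "The dragon guarded the gold. Knights came for the gold and the dragon. It rained."
print(summarize(text))
```

The selected sentence is copied unchanged, which is what makes this extractive rather than a rewritten summary.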

It might be interesting to take some of these simple ideas and flesh them out with some of these advanced AI methods. For example, finding a more complete meaning of a book chapter and rewriting the summary.

That's the kind of AI work that I think people expect and are looking for from the NLP field, and it's not necessarily out of reach currently.

I think a common example in the same vein is the analogies trick you always see. It's been demonstrated to death at this point, but the great thing here is that word2vec more or less learns to predict the next word using hierarchical softmax, so he's not technically "wrong", since this is the training objective. It's good to clarify it though.

Yes, and I guess that goes along with the black box idea. What function you are training for depends on your needs, and that can be achieved with deep learning or soft AI.

> and one I found interesting was a model for summarizing text, including chapters, books, and other writing.

Do you have a cite for that?

Yes, I just found it actually. The article targets short stories specifically.

A. Kazantseva and S. Szpakowicz, "Summarizing Short Stories," Computational Linguistics, vol. 36, no. 1, pp. 71-109, Mar. 2010. [Online]. Available: http://www.mitpressjournals.org/doi/abs/10.1162/coli.2010.36...

There is a PDF available. It's about 40 pages long.

Your paper is the only paper listed above mine on the reports page! Solid last name optimization

lol and my first name is "Aaron" thank my parents.

Where'd you do your undergrad?

I'd rather not bash my undergrad, but suffice to say I tutored intro linear algebra and was very comfortable with eigenvalues, eigenvectors, Gaussian elimination, and that kind of stuff. What was tricky in 224d was taking the gradients with respect to specific components of a matrix. In the end you get comfortable with what the result should look like, but if you actually write the matrix indices down, it's quite hairy (mostly tensor product(s) that can be rewritten as matrix outer products).
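A small worked case of the component-wise gradient, as a sketch: for y = Wx and a scalar loss L, writing the indices out gives dL/dW[i][j] = (dL/dy[i]) * x[j], which is the outer product of dL/dy and x. The loss L = 0.5 * sum(y[i]^2), the example numbers and the function names below are assumptions picked so the result can be checked against finite differences in plain Python.

```python
def matvec(W, x):
    """Compute y = W x for a list-of-rows matrix W and a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def loss(W, x):
    """L = 0.5 * sum_i y_i^2 with y = W x (an arbitrary example loss)."""
    return 0.5 * sum(y_i ** 2 for y_i in matvec(W, x))


def analytic_grad(W, x):
    """dL/dW as the outer product of delta = dL/dy and x."""
    delta = matvec(W, x)  # for this particular loss, dL/dy = y
    return [[d_i * x_j for x_j in x] for d_i in delta]


def numeric_grad(W, x, eps=1e-6):
    """Central finite differences, perturbing one entry W[i][j] at a time."""
    grad = []
    for i, row in enumerate(W):
        grad_row = []
        for j in range(len(row)):
            plus = [r[:] for r in W]
            minus = [r[:] for r in W]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad_row.append((loss(plus, x) - loss(minus, x)) / (2 * eps))
        grad.append(grad_row)
    return grad


W = [[1.0, 2.0], [0.5, -1.0]]
x = [3.0, -2.0]
print(analytic_grad(W, x))  # [[-3.0, 2.0], [10.5, -7.0]]
print(numeric_grad(W, x))   # should agree up to rounding error
```

The finite-difference loop is the "write the matrix indices down" version. The outer product is the compact form you learn to expect.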

oh yes. thinking in terms of numpy matrix operations while reading the equations took a lot of getting used to.

I went to school with Aaron. Zot zot is all I'll say.

I wish I had more ideas for applications using techniques like this, otherwise I would probably spend much more time researching natural language processing.

Instead, I did a simple project on searching using language processing and just read Foundations of Statistical Natural Language Processing [1], which is not too difficult, and Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition [2], which is a pretty heavy read but a great reference. I was able to find a used copy of the second book for $0.30.

I also put a bit of study into articulatory phonetics and speech recognition as part of a graduate study-abroad, which is an interesting field on its own, but I always wanted to come back to computational linguistics.

[1] http://www.amazon.com/Foundations-Statistical-Natural-Langua...

[2] http://www.amazon.com/Speech-Language-Processing-Introductio...

I have a folder to bookmark machine learning resources.

Here's another good one from the creator of Coursera (Stanford grad, I think).


Make sure you have http://www.iro.umontreal.ca/~bengioy/dlbook/

It's Bengio's (a very well-known deep learning researcher) upcoming textbook. I would highly recommend it to anyone interested in the deep learning/neural net subset of machine learning.

This course is taught by Andrew Ng. Professor Ng is not only one of the founders of Coursera, but is also a prof at Stanford and Chief Scientist at Baidu. Machine Learning (and Deep Learning in particular) is his specialty, so he is a pretty good resource on the topic :)

Great, I'll give it priority in my bookmarks then!

It's interesting that this is trending at the same time as an RNN-based, NLP-powered assistant that I've just posted on HN.

It uses a lot of the same concepts - recurrent nets and word embeddings. If you guys want to play around with it in a real life scenario, head over there to check it out. Discussion here [1]. Link here. [2]

[1] https://news.ycombinator.com/item?id=10060074

[2] http://getmyra.co

Edit: Fixed the wrong link.

I'm still not clear on the difference between deep learning and machine learning. Also are there good primer books on machine learning fundamentals?

It _is_ a subfield of machine learning, based on neural networks, where the features are usually learned rather than hand-engineered.

Could it be said that machine learning is more surface-level AI, like quality scores?

Whereas deep learning is going down the route of creating consciousness?

Not really. Deep learning is the popular name for neural nets that use many layers (deep neural nets or DNNs). They are being used to do more than just pattern recognition (the mainstay use for NNs in the past). But at present DNNs do not attempt to solve complex/compound AI problems like planning or knowledge representation or understanding natural language semantics. It's not clear yet whether DNNs can be extended to those kinds of problems.

That's a little breathless - I'd just think of it as a particular subset of machine learning that's produced some promising results, and leave talk of consciousness out of it.

Yeah I see what you mean, plus consciousness is more of a buzzword now and too arbitrary.

I found that talking about it along the lines of neural networking seems to be more accurate.

You could say it, but it wouldn't actually mean anything.

"The Nature of Code" is a book I've repeatedly heard is pretty good.

It's free to browse online if you're interested:


I've gone through much of this book. It's really good, but it's definitely not a book on machine learning fundamentals - more on complexity and simulations.

That chapter however is a nice intro to neural networks, and the previous chapter is a nice intro to genetic algorithms.

Machine learning is the name of the whole field, deep learning is one of the (currently very popular) branches of that field.

This and the convolutional neural net class were offered at Stanford both physically and online. Is anyone aware of anything similar being offered this fall quarter?

Does anyone have the .tex source of the notes? I like the style and would like to draw inspiration from it. Thanks.
