Hacker News
An Introduction to Recurrent Neural Networks (victorzhou.com)
281 points by bibyte 88 days ago | 23 comments



It's worth noting that apparently (as I learned lately) RNNs are going slightly out of fashion because they are hard to parallelize and have trouble remembering important stuff at larger distances. Transformers are proposed as a possible solution - very roughly speaking, they use attention mechanisms instead of recurrent memory and can run in parallel.
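A rough NumPy sketch of that contrast, as far as I understand it. The sizes, weight names, and random inputs are all made up for illustration, and this is single-head attention with no masking or positional encoding, so treat it as a toy rather than a real transformer layer.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 4                      # toy sizes, chosen arbitrarily
x = rng.normal(size=(seq_len, d))      # one input vector per time step

# Recurrent pass: step t needs the hidden state from step t-1,
# so this loop cannot be parallelized over time.
Wxh = rng.normal(size=(d, d)) * 0.1
Whh = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
rnn_states = []
for t in range(seq_len):
    h = np.tanh(x[t] @ Wxh + h @ Whh)
    rnn_states.append(h)
rnn_states = np.stack(rnn_states)

# Self-attention: every position looks at every other position
# through plain matrix products, with no loop over time.
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)                    # (seq_len, seq_len) similarities
scores -= scores.max(axis=1, keepdims=True)      # shift for numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
attn_out = weights @ V                           # each row mixes all positions

print(rnn_states.shape, attn_out.shape)
```

Both outputs have one vector per position. The difference is that the attention version reaches any earlier position in a single step, while the RNN has to carry information through every intermediate hidden state.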

I have to say that while I understand the problems with recurrent nets (which I've used many times), I haven't yet grokked the alternatives. Here are some decent-looking search results as starting points. Warning: these are longer, heavier reads and probably not for beginners.

https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c7... (there's some sensationalism here to be fair)

https://mchromiak.github.io/articles/2017/Sep/12/Transformer...

https://www.analyticsvidhya.com/blog/2019/06/understanding-t...

https://www.tensorflow.org/beta/tutorials/text/transformer

That being said, I think that understanding RNNs is very beneficial conceptually, and nowadays there are relatively easy-to-use implementations that should be pretty good for many use cases.


Mainly RNNs are much slower to train than transformers.


As well as stability issues, long-range dependencies, etc.


It depends on the problem domain. Transformers are useful for NLP (language modeling, machine translation), but RNNs (and CNNs) are still being used in speech-to-text. When the input/output relationships are monotonic, transformers are probably too general.


Hey, author here. Happy to answer any questions or take any suggestions.

Runnable code from the article: https://repl.it/@vzhou842/A-RNN-from-scratch


Very nicely written post. I particularly like how you attached a link to your codebase on repl.it so anyone who is interested can tinker with the code.

One thing I have been wondering for some time is whether the vanilla RNN can learn negations (i.e. 'not good' == 'bad') and valence shifts (e.g. modifier words like 'very' --- they do not carry sentiment connotations themselves, but may amplify/dampen the sentiment of the words they modify; negations like 'not' can be considered as a special-case valence shifter where it inverts the sentiment of the following word).

My suspicion is that vanilla RNNs are not capable of modelling negations and valence shifters since they infer the sentiment of a sentence by 'adding up' the sentiment connotations of its constituent words --- negations and valence shifts, however, work more like multiplications than additions.

I see you already have such examples in your dataset so I thought I'd do some experiments. I simplified your original dataset to the following:

  train_data = {
    'good': True,
    'bad': False,
    'not good': False,
    'not bad': True,
    'very good': True,
    'very bad': False,
    'not very good': False,
    'not very bad': True
  }
  
  test_data = {
    'very not bad': True,
    'very not good': False
  }

While the test cases do not reflect how people actually speak, the hope is that the model should be able to apply its learning to infer their sentiment. For me, however, it would seem the training failed to converge with the default parameter settings (hidden_size=64).
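To make the 'adding up' intuition concrete, here is a minimal sketch of a purely additive scorer on the same eight training sentences: one weight per word plus a bias, trained with a plain perceptron. This is my own toy model, not the RNN from the article. A vanilla RNN has a tanh nonlinearity and sees word order, so this says nothing definitive about what it can or cannot learn.

```python
import numpy as np

# Same eight training sentences as above (True = positive sentiment).
train_data = {
    'good': True,
    'bad': False,
    'not good': False,
    'not bad': True,
    'very good': True,
    'very bad': False,
    'not very good': False,
    'not very bad': True,
}

vocab = sorted({word for sentence in train_data for word in sentence.split()})

def featurize(sentence):
    # Bag-of-words counts plus a constant 1 that plays the role of the bias.
    counts = [sentence.split().count(word) for word in vocab]
    return np.array(counts + [1], dtype=float)

X = np.array([featurize(s) for s in train_data])
y = np.array([1 if label else -1 for label in train_data.values()])

# Plain perceptron: reaches an epoch with zero mistakes only if the
# data is linearly separable in these features.
w = np.zeros(X.shape[1])
for epoch in range(1000):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:
            w += yi * xi
            mistakes += 1
    if mistakes == 0:
        break

accuracy = np.mean(np.sign(X @ w) == y)
print(f"additive model training accuracy: {accuracy:.2f}")
```

The additive model can never fit all eight sentences: 'good' positive and 'bad' negative forces the weight of 'good' above the weight of 'bad', while 'not good' negative and 'not bad' positive forces the opposite. Any model that handles this data needs some interaction between words, not just a sum.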

It would be interesting to see how other architectures (e.g. LSTMs, Transformers) fare with negations and valence shifters.

P.S.: When calculating softmax, it is better to use the built-in functions or at least do the log-sum-exp trick to prevent overflow and underflow.
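For reference, a minimal sketch of what I mean, in plain NumPy. The function names are my own. If SciPy is available, `scipy.special.softmax` and `scipy.special.logsumexp` are the built-in versions.

```python
import numpy as np

def softmax(xs):
    # Subtracting the max leaves the result unchanged mathematically
    # but keeps np.exp from overflowing on large inputs.
    xs = np.asarray(xs, dtype=float)
    shifted = xs - np.max(xs)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def log_softmax(xs):
    # Log-sum-exp trick: compute log-probabilities directly instead of
    # taking log(softmax(xs)), which hits log(0) when a term underflows.
    xs = np.asarray(xs, dtype=float)
    shifted = xs - np.max(xs)
    return shifted - np.log(np.sum(np.exp(shifted)))

print(softmax([1000.0, 1000.0]))      # naive exp(1000) would overflow to inf
print(log_softmax([0.0, -2000.0]))    # stays finite even for tiny probabilities
```

The log version matters when the output feeds a cross-entropy loss, since the loss only needs the log-probability of the correct class.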


Thanks for the comments! Interesting experiment - I wouldn't be surprised if better RNN architectures were more effective for this example.

Appreciate the softmax tip, I'll update soon.


I tried an LSTM model on StockTwits data and it seemed reasonably good at handling negations (single and double negatives at least).


Suggestion: do the same for transformer.



Numpy only.


Machine translation using RNNs actually uses two of them, one serving as an encoder and the other as a decoder (the Seq2Seq architecture)
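A bare-bones NumPy sketch of that encoder/decoder split, with untrained random weights and made-up vocabulary sizes and token ids. It only shows the data flow (the encoder's final hidden state seeds the decoder) and leaves out training, attention, and end-of-sequence handling.

```python
import numpy as np

rng = np.random.default_rng(0)
src_vocab, tgt_vocab, hidden = 6, 5, 8   # toy sizes, assumptions for illustration

def init(*shape):
    # Small random weights; a real model would learn these.
    return rng.normal(size=shape) * 0.1

enc_Wxh, enc_Whh = init(src_vocab, hidden), init(hidden, hidden)
dec_Wxh, dec_Whh = init(tgt_vocab, hidden), init(hidden, hidden)
dec_Why = init(hidden, tgt_vocab)

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def encode(src_ids):
    # Encoder RNN: read the whole source sentence, keep only the last state.
    h = np.zeros(hidden)
    for i in src_ids:
        h = np.tanh(one_hot(i, src_vocab) @ enc_Wxh + h @ enc_Whh)
    return h

def decode(h, max_len=4, start_id=0):
    # Decoder RNN: start from the encoder state and feed each predicted
    # token back in as the next input (greedy decoding).
    out, prev = [], start_id
    for _ in range(max_len):
        h = np.tanh(one_hot(prev, tgt_vocab) @ dec_Wxh + h @ dec_Whh)
        prev = int(np.argmax(h @ dec_Why))
        out.append(prev)
    return out

translation = decode(encode([3, 1, 4]))   # hypothetical source token ids
print(translation)
```

Since the weights are random, the output tokens are meaningless. The point is only that the source and target sides have separate parameters and separate vocabularies.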


In your post you initialize your weights a certain way but you said there are better ways. Do you have any resources for better ways?


Nice! I like that the author wrote the code by hand rather than leaning on some framework. It makes it a lot easier to connect the math to the code. :)

As a meta-comment on these "Introduction to _____ neural network" articles (not just this one), I wish people would spend more time talking about when their neural net isn't the right tool for the job. SVMs, kNN, even basic regression techniques aren't any less effective than they were 20 years ago. They're easier to interpret and debug, require many fewer parameters, and are potentially (you may need to apply some tricks here or there) faster at both training and evaluation time.


This kind of article is absolutely the thing everyone new to deep learning/neural networks should read. I wish there was one for each type of algorithm.



Awesome!


Would be great if you showed the final output (e.g. sentiment analysis) result.


Why do people insist on mentioning the bias terms in expository essays? It's a detail that clutters the equations. Why not keep the transformations linear and then at the end make a note that you also need to shift using a bias term?
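One way to see that the bias is only bookkeeping: append a constant 1 to the input and fold the bias into the weight matrix, and the affine map becomes a single linear one. A quick NumPy check, with arbitrary shapes and random values chosen just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # weights
b = rng.normal(size=3)        # bias
x = rng.normal(size=4)        # input

affine = W @ x + b            # the usual "weights plus bias" form

# Fold the bias in: add b as an extra column of W and a constant 1 to x.
W_aug = np.hstack([W, b[:, None]])   # shape (3, 5)
x_aug = np.append(x, 1.0)            # shape (5,)
linear = W_aug @ x_aug               # one matrix product, no separate bias

print(np.allclose(affine, linear))
```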


I doubt Google Translate uses RNNs. They use Statistical Machine Translation. Oops, I see they switched to neural networks in 2016. https://en.wikipedia.org/wiki/Google_Translate


I feel I should point out that those two things are not mutually exclusive. RNNs are, after all, a mechanism for learning conditional probabilities.

I think the confusion comes from Google itself, who used the term "Statistical Machine Translation" (SMT) to refer to "Rule-based SMT". Both methods are statistical.


SMT as a term of art in the translation field means rule-based SMT... it's not a Google particularity; I see the same usage in both industry and academia.


Every top MT system uses pure NMT, not SMT.



