That was what I heard too, that in PyTorch you can adaptively build your graph. In Keras you could in principle emulate this in a brute-force way: you define the graph, train it, save the weights, analyze them, decide which nodes to remove and which to add, then rebuild the graph from scratch and reuse the adjusted weights from the previous round. I say "in principle" because personally I never did it. Some people hint that in PyTorch this is a breeze; I'd be curious to see an example. If you have any links, that would be much, much appreciated.
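To make the brute-force loop concrete, here's a minimal sketch of what I mean in Keras. The layer sizes, the keep-the-strongest-nodes rule, and the commented-out fit calls are all placeholders I made up; the only point is the rebuild-and-transfer-weights cycle:

```python
# Brute-force "dynamic" architecture in Keras: train, analyze, rebuild, reuse weights.
import numpy as np
from tensorflow import keras

def build_model(hidden_units):
    model = keras.Sequential([
        keras.layers.Dense(hidden_units, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# Round 1: train the original graph and save its weights.
model = build_model(hidden_units=64)
# model.fit(x_train, y_train, epochs=5)           # training data assumed
w_in, b_in = model.layers[0].get_weights()         # (20, 64), (64,)
w_out, b_out = model.layers[1].get_weights()       # (64, 1), (1,)

# "Analysis": decide which hidden nodes to keep, e.g. by outgoing weight magnitude.
keep = np.argsort(np.abs(w_out).sum(axis=1))[-32:]  # keep the 32 strongest nodes

# Round 2: rebuild a smaller graph from scratch and inject the kept weights.
pruned = build_model(hidden_units=32)
pruned.layers[0].set_weights([w_in[:, keep], b_in[keep]])
pruned.layers[1].set_weights([w_out[keep, :], b_out])
# pruned.fit(x_train, y_train, epochs=5)          # continue training
```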
In a large NN a lot of nodes end up being useless. You can remove them without degrading the performance of the NN.
To be more concrete, here's a link [1] to Google's neural network playground. I built a network with 5 layers and 37 hidden nodes. It trains quite well, but the last layer has 2 nodes that contribute very little weight to the final output. The app allows you to change their weight (you click on the corresponding line and edit). If you change the weight to zero (effectively dropping the node), the classifier, if anything, gets better. My guess is that you can easily remove about half of the nodes. Conversely, if you look at the nodes with the largest outgoing weights, you can in principle clone them and halve the outgoing weight for both the original and the clone. With this configuration, the network output is exactly the same, but if you continue training, it allows more flexibility, as the original and the clone are allowed to diverge.
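Just to illustrate the arithmetic of those two operations (this isn't playground code, just a made-up numpy toy): zeroing a tiny outgoing weight barely moves the output, while cloning a node and halving both outgoing weights leaves the output exactly unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=3)            # activations of 3 hidden nodes
w = np.array([1.5, 0.8, 0.01])    # their outgoing weights to the output

out = h @ w

# Dropping the near-useless node: zero its outgoing weight.
w_pruned = w.copy()
w_pruned[2] = 0.0
print(out, h @ w_pruned)          # output barely changes

# Cloning the strongest node: duplicate its activation, halve both outgoing weights.
h_cloned = np.append(h, h[0])
w_cloned = np.append(w, w[0] / 2)
w_cloned[0] /= 2
print(out, h_cloned @ w_cloned)   # identical output, but the two copies can now diverge
```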
These types of operations are not possible in Keras. Are they in PyTorch? If not, then what kinds of dynamic graphs are possible? What can one do with PyTorch that one can't do with Keras?
Your first example is commonly referred to as network pruning, and is typically used to compress a model (the nodes are still there, but a network with sparse weights can be stored in compressed form). It's also possible to remove the nodes themselves, rather than individual weights. For convnets this is typically done at the filter level, so that entire filters are removed.
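For what it's worth, recent PyTorch ships pruning utilities in torch.nn.utils.prune that cover both flavors; a rough sketch (the layer sizes and pruning amounts below are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)
fc = nn.Linear(128, 10)

# Unstructured pruning: zero out the 30% smallest-magnitude individual weights.
# The tensor keeps its shape, so this mainly helps compressed storage.
prune.l1_unstructured(fc, name="weight", amount=0.3)

# Structured pruning: remove whole output filters (dim=0) ranked by L2 norm,
# i.e. the filter-level removal mentioned above for convnets.
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)

print(float((fc.weight == 0).float().mean()))    # ~0.3 of the weights are now zero
print(float((conv.weight == 0).float().mean()))  # ~0.5 (half the filters zeroed)
```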
The second example (cloning the nodes) is typically done to improve network robustness, by preventing important nodes from becoming a single point of failure.
To do either one during training you need dynamic graphs, so either PyTorch or TF eager mode. Here's one filter pruning implementation: https://github.com/jacobgil/pytorch-pruning
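And here's a toy sketch of what "dynamic" buys you: because the graph is rebuilt on every forward pass, you can swap modules between training steps. The data, sizes, and the clone-the-strongest-node rule below are all made up; the point is just that the training loop keeps running across the architecture change:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(256, 10), torch.randn(256, 1)   # toy regression data

fc1, fc2 = nn.Linear(10, 8), nn.Linear(8, 1)

def make_optimizer():
    return torch.optim.SGD(list(fc1.parameters()) + list(fc2.parameters()), lr=0.01)

opt = make_optimizer()

for step in range(200):
    loss = ((fc2(torch.relu(fc1(x))) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    if step == 100:
        # Halfway through training, clone the hidden node with the largest
        # outgoing weight and halve that weight for original and clone, so the
        # output is unchanged but the two copies can diverge from here on.
        with torch.no_grad():
            i = fc2.weight.abs().sum(dim=0).argmax()
            new_fc1, new_fc2 = nn.Linear(10, 9), nn.Linear(9, 1)
            new_fc1.weight[:8] = fc1.weight
            new_fc1.weight[8] = fc1.weight[i]
            new_fc1.bias[:8] = fc1.bias
            new_fc1.bias[8] = fc1.bias[i]
            new_fc2.weight[:, :8] = fc2.weight
            new_fc2.weight[:, i] /= 2
            new_fc2.weight[:, 8] = fc2.weight[:, i] / 2
            new_fc2.bias.copy_(fc2.bias)
        fc1, fc2 = new_fc1, new_fc2
        opt = make_optimizer()  # rebuild the optimizer over the new parameter set
```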