Tensorflow sucks (nicodjimenez.github.io)
399 points by nicodjimenez on Oct 8, 2017 | 125 comments



An appropriate quote: "If you can't intelligently argue for both sides of an issue, you don't understand the issue well enough to argue for either."

There are many people for whom the declarative paradigm is a huge plus. I would say there are at least two major approaches to running fast neural networks: 1. Figure out the common big components and make fast versions of those. 2. Figure out the common small components and how to make those run fast together.

Different libraries have different strengths and weaknesses that match the abstraction level that they work at. For example, Caffe is the canonical example of approach 1, which makes writing new kinds of layers much harder than with other libraries, but makes connecting those layers quite easy as well as enabling new techniques that work layer-wise (such as new kinds of initialization). Approach 2 (TensorFlow's approach) introduces a lot of complexity, but it allows for different kinds of research. For example, because how you combine the low-level operations is decoupled from how those things are optimized together, you can more easily create efficient versions of new layers without resorting to native code.


After being exposed to several declarative tools during my career, I must say they age poorly: make, autoconf, Tensorflow, and so on. They may start out being elegant, but every successful library is eventually (ab)used for something the original authors didn't envision, and with declarative syntax it descends into madness of "So if I change A to B here does it apply before or after C becomes D?"

At least Tensorflow isn't at that level, because its "declarative" syntax is just yet another imperative language living on top of Python. But it still makes performance debugging really hard.

With PyTorch, I can just sprinkle torch.cuda.synchronize() liberally and the code will tell me exactly which CUDA kernel calls are consuming how many milliseconds. With Tensorflow, I have no idea why it is slow, or whether it can be any faster at all.
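
For concreteness, a minimal sketch of that synchronize-and-time pattern (assuming a CUDA device is available; `model` and `batch` are placeholder names):

  import time
  import torch

  def timed(label, fn):
      torch.cuda.synchronize()   # flush pending kernels before starting the clock
      start = time.time()
      result = fn()
      torch.cuda.synchronize()   # wait for the kernels this call launched
      print("%s: %.1f ms" % (label, (time.time() - start) * 1000.0))
      return result

  # usage: out = timed("forward", lambda: model(batch))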


I believe that make's declarative nature is not the cause of its problems at all - its poor syntax and lack of support for programming abstractions are what make it clunky to use.

Something like rake, which operates on the same fundamental principles (i.e. declarative dependency description) but uses Ruby syntax, has aged better.


Indeed. Getting these text-based configuration tools to work requires a lot of experience in language design.

Lots of tools become accidentally Turing complete, like Make. You need to plan these things from the start. If you want any computation possible at all, you need to be extremely vigilant, and base your language on firm foundations. See eg Dhall, a non-Turing complete configuration language (http://www.haskellforall.com/2016/12/dhall-non-turing-comple...).

If you are happy to get Turing completeness, you might want to write your tool as an embedded DSL and piggyback on an existing language, declarative or otherwise.


SBT in the Scala world would also fit this description.


I took the article to be the counterpoint to the uninhibited praise of TF. In that light, I don't think it was meant as a balanced assessment of the whole product, but had a narrow scope of simply pointing out a handful of flaws that he thinks aren't discussed enough.

It's the same feeling when you hate a movie that everyone gives five stars: you might agree with some aspects of the praise (or even most of it), but that's not what you're going to be talking about. You'll talk about how and why it sucks compared to better movies.

I'd guess he could make a strong pro-TF argument if desired, but that just wasn't the point of this post.


The assumption that there are always two intelligent sides to an issue is a pretty big assumption. If you understand both sides of an issue really deeply and you choose side B over side A, you should supposedly be able to argue intelligently for side A, otherwise your choice of side B was not made intelligently - but this falls down on further examination.

If you believe that side B is correct and side A is incorrect given your deep understanding of the issue, then an argument for side A is in some way not intelligent, because you must keep your most potent arguments for side B out of your argument for side A - you must deny their existence in your head and thus argue from a less intelligent position than you normally would.

The ability to argue both sides is only really possible when all sides are considered trivial in their differences.

on edit: improved formatting for legibility.


You should still be able to give the other side the best defense imaginable. (See 'steelmanning'.)


ok, I've seen stellmanning? https://www.google.com/search?dcr=0&source=hp&q=stellmanning...

on edit: never mind, I see you mean steelmanning. However that does not really have anything to do with what I said. You should be able to give someone the best defence imaginable, but what if the best defence imaginable is shit compared to the other side? Then you cannot argue both sides equally, but this does not mean you do not understand either side. It means one side is actually wrong, and the other is correct.


Sorry, I can't spell. It's steelmanning. (Edited the other comment.)

The idea is to beat a steelman of the idea. Because that's a greater victory than beating a strawman.


Sure. Eg you'd be hard pressed finding good arguments for 2=3 (without resorting to shenanigans around definitions).


"If you can't intelligently argue for both sides of an issue, you don't understand the issue well enough to argue for either."

please argue the opposite of this before continuing


[flagged]


Necro-cannibalism did allow some to survive severe famine.

Seriously though, the quote works well for many usual cases of discussion. People have incentives which they do believe in - you can discuss how those beliefs started and why you think they don't apply. Sure, you can find pathological edge cases where doing that didn't make sense. Doesn't mean the rule is bad for almost everything else.


https://www.scaruffi.com/news/opin0712.html

Caveat emptor: Scaruffi doesn't hesitate to let his imagination far surpass his apparent handle on the facts, but hey they are just opinions, so let's not ruin the fun too easily.

"I reached the conclusion that the meat diet started with cannibalism: we ate humans before we ate animals. And cannibalism started with women eating their own children, and children eating their grandparents.

Cats do it all the time: when they cannot afford too many kittens, the mom eats some of them. If she didn't, it would reduce the chances that the others survive. Since they are YOUR children, you don't feel too bad eating them: they are flesh of your flesh, aren't they?

You have to think like a woman of two million years ago, who was pregnant all the time, did not know how to prevent pregnancy nor how to perform abortion, and had to feed many children, often with no male partner to help out.

The tradition of eating old people is documented in many cultures: nothing was wasted, and it would have been silly to waste the flesh of a dead person.

Once you start eating your own family members, it becomes natural to start eating other humans. In prehistory (and even not too long ago) humans from other tribes were not perceived as belonging to the same species: they don't speak my language, their customs are different, they smell different, therefore they are not the same species as me. Killing them and eating them was not any more morally wrong than eating a hamburger today.

The main meal was brains: our brains need a lot of proteins, and another brain gives the highest amount of proteins in the shortest time. Hence the tradition of killing an enemy and eating his brain.

Then cost/benefit analysis made humans realize that humans were difficult and dangerous to kill, whereas many animals were easy to kill and provided almost the same proteins. That's when domestication started: feeding on defenseless cows and goats was a lot safer than feeding on humans armed with spears and axes. That's when cannibalism became a thing of the past, of primitive men. However, it was still practiced (and valued) until recently in many places of the world.

We are living in the first age free of cannibalism in the history of the human species. It's an age just a few decades old. The age of cannibalism lasted several million years. Put things in perspective and, as repugnant as it sounds today, cannibalism can be said to be the norm, and non-cannibalism the abnormality. If some day the whole world becomes vegetarian, you the meat eater will look like a savage but today you think of vegetarians as lunatic people.

There are 1500 animal species that are cannibals, and 75 of them are mammals, and one of them is the chimp, our closest relative. And now you may continue to eat your hamburger."


Interesting take. Makes you think about the apparently rather durable popularity of zombie horror, which even outside its occasional fads retains a strong base of adherents.


[flagged]


It didn't "utterly destroy" the quotation. It's pretty easy to argue the virtues of cannibalism, it's just that for modern day society they do not trump the downsides.

For example, cannibalism has successfully prevented death by starvation (e.g. the Donner Party), and it reduces the need for disposing of the dead. It's an application of reduce-reuse-recycle. Some species of other animals practice it as part of their normal lives (praying mantis as part of mating, lions as a social mechanic).

The very obvious counterpoints are that it's completely counter to modern cultural morality and it's been shown to increase likelihood of transferring brain parasites in humans.

The fact that you didn't even approach the other side and immediately concluded that he "destroyed the quotation" kinda shows that you aren't making the slightest effort to understand opposing viewpoints.


Small correction: not brain parasites, but specifically prions (misfolded proteins that, fascinatingly, self-replicate). Kuru[1] (closely related to Creutzfeldt–Jakob disease and Alzheimer's) was extensively studied last century in the Fore people of Papua New Guinea (who practiced cannibalism as a funerary rite), culminating in the awarding of the 1997 Nobel Prize[2] for the discovery of the prion, a completely new kind of pathogen.

It's a fascinating story! NPR has a great podcast on it.

And for what it's worth, I completely agree. I was given two key pieces of advice by two brilliant professors in my undergraduate philosophy career:

- Always be as charitable as possible with your opponents' arguments: always assume they are taking the best possible position.

- And this goes hand in hand with the above: always be ready for the best possible counter to your own argument, making sure you have an appropriate response. Few arguments are air-tight, especially for controversial issues.

[1] http://www.npr.org/sections/thesalt/2016/09/06/482952588/whe...

[2] https://www.nobelprize.org/nobel_prizes/medicine/laureates/1...


Thank you, that's exactly what I was referring to. I must have been remembering it wrong.


Do you get the prions only from rare meat? Would thorough cooking protect?


Kuru and CJD reside in the brain meats.

Prions are highly resistant to disinfectants, heat, ultraviolet radiation, ionizing radiation and formalin. ... Prions can be destroyed through incineration providing the incinerator can maintain a temperature of 900 F for four hours.


Not to mention that implied in the quotation is that being able to argue both sides of a contentious issue is important to making a strong case for the side you agree with. That's totally irrelevant for generally settled questions like cannibalism, so the "utter destruction" was anything but.


[flagged]


Using emacs?


Despite its shortcomings, I share the same vision as this article. Here are my reasons:

- Tensorflow has way too large an API surface area: command-line argument parsing, unit test runners, logging, help-string formatting... most of these are not as good as the counterparts already available in Python.

- The C++ and Go versions are radically different from the Python version. Limited code reuse, different APIs, not maintained or documented with the same attention.

- The technical debt in the source code is huge. For instance, there are three redundant implementations of a safe division (_safe_div), with slightly different interfaces (sometimes with default params, sometimes not).

In every way, it reminds me of the Angular.io project: a failed promise to be truly multi-language, failing to use the expressiveness of Python, with a super large API that tries to do things we didn't ask it to do, and lacking a sound overall architecture.


I think the author raises a good point about Google envy. TensorFlow is not the most intuitive or flexible library out there, and it is very over-engineered if you're not doing large-scale distributed training. The main reason why everyone talks it up so much is because Google heavily marketed it from the outset, and everyone automatically assumes Google == Virtuoso Software Design because they couldn't make it through the interview. Really it's just modern enterprise software which has five different ways to implement batch norm that they push on the community so they don't have to train new hires on how to use it.


Or maybe it is built by a company that is doing large-scale distributed training, and they open sourced it not to cater to every need, but to help others trying to do the same thing they are. Companies are under no obligation to make sure their open source is well suited for others' use cases.


That was kinda my point: it's not the be-all deep learning library, because they made it for their own use case, but its towering popularity (as in 10x the number of stars of other popular libraries) is not genuine.

Also I highly doubt that the main reason Google open sourced it was to be charitable.


There is little analytical or "detailed" about this post. The most complex model is y = 3*x, the author provides no evidence to back up any claims about adoption, difficulty of use, etc., and most of the author's complaints boil down to a lack of syntactic sugar.

I'm open to a discussion about the downsides of tensorflow, which is why I read the article in the first place, but this post doesn't provide that.


I'm probably being overly cynical, but this is (indistinguishable from) a "growth-hack" submarine article by the author to promote their tool. There is hardly any substantiation to support the assertions. Tucked right at the end:

> If you want a beautiful monitoring solution for your machine learning project that includes advanced model comparison features, check out Losswise. I developed it to allow machine learning developers such as myself to decouple tracking their model’s performance from whatever machine learning library they use


I'm actually pretty ok with these types of articles. They are generally well researched and well written, giving technical introductions to important concepts.

As always, it is important to be wary of the reasons that an author writes an article. If there is an advertisement at the end, then the author motivations (at least in part) are clear. But I often find that promoters of new systems and tools are able to present excellent critiques of established tools and practices. New things are USUALLY made to address the shortcomings of existing things. You as a reader have to parse whether their arguments are sound and maybe do some more research before you can make a sound judgement on the matter.


I don't think so. Even if it were, it would be ineffective, as the link to their tool is buried behind pages of text.

> There is hardly any substantiation to support the assertions.

How about side-by-side TensorFlow and PyTorch comparison?

...though, for experiment tracking (and experiment running) I recommend https://neptune.ml/.


There are a few categories that I think TensorFlow is notably strong in. Namely:

1. Deployment.
2. Coverage of the library / built-in functionality.
3. Device management.

For more details, I wrote a comparison of PyTorch and TensorFlow (mostly from a programmability perspective) a couple months back. Interested readers may find it helpful. https://awni.github.io/pytorch-tensorflow/


This article is not that detailed, but it's a sentiment I agree with, so I'll add one major shortcoming of Tensorflow: its memory usage is really bad.

The default behavior of TF is to allocate as much GPU memory as possible for itself from the outset. There is an option (allow_growth) to only incrementally allocate memory but when I tried it recently it was broken. This means there aren't easy ways to figure out exactly how much memory TF is using (e.g. if you want to increase the batch size). I believe you can use their undocumented profiler, but I ended up just tweaking batch sizes until TF stopped crashing (yikes).

TF does not have in-place operation support for some common operations that could use it, like dropout (other operations do have this support, I believe). Even Caffe, which I used for my research in college, had this. This can double your GPU RAM usage depending on your model, and GPU RAM is absolutely a precious resource.

Finally, I've had issues where TF runs out of GPU RAM halfway through training, which should never happen - if there's enough memory for the first epoch, there should be enough memory for every epoch. The last thing I want to do is debug a memory leak / bad memory allocation ordering in TF.


> The default behavior of TF is to allocate as much GPU memory as possible for itself from the outset. There is an option (allow_growth) to only incrementally allocate memory but when I tried it recently it was broken. This means there aren't easy ways to figure out exactly how much memory TF is using (e.g. if you want to increase the batch size).

There is also per_process_gpu_memory_fraction, which limits Tensorflow to only allocate that fraction of each visible GPUs memory. It's still not great, but has been helpful in keeping resources free for models that do not need all the GPUs memory.
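
For reference, both knobs live on the session config in the TF 1.x API; a minimal sketch (in practice you would usually pick one or the other):

  import tensorflow as tf

  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True                    # grow the allocation as needed
  config.gpu_options.per_process_gpu_memory_fraction = 0.4  # or cap it at 40% of each visible GPU

  with tf.Session(config=config) as sess:
      pass  # build and run the graph as usual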


It seems to me the reason for insisting on a verbose declarativization of everything is obvious: it guarantees you can build run/train-time environments which scale your model automatically.

Google’s mindset isn’t “train this model to multiply by three”. It’s “train this model on a 1% sample of search traffic over the last year.” That’s reflected in the design choices of tensorflow.


Wait, (s)he's arguing that the code isn't imperative enough, but the punchline is they don't like having to type session.run? I don't understand their vendetta against the graph, which is a powerful abstraction that lets you choose different backends, and lets tensorboard show you an awesome view of your computation. Session.run isn't hard to type and it takes at most a few days to grok that everything is lazily evaluated.

Would (s)he like eager apis? Does (s)he want better c++ apis? Or does the author just want to hate on tensorflow because it’s been hyped so much?


> I don't understand their vendetta against the graph, which is a powerful abstraction that lets you choose different backends [...]

You don't really need a graph to support different backends. One popular approach is to have different array implementations (e.g. CPU and GPU arrays).

> [...] and lets tensorboard show you an awesome view of your computation

At the end of the post the author shows his API that lets you do the same things as Tensorboard, but for whatever framework you like.

All in all, expression graphs like these used in TF and Theano are great for symbolic differentiation of a loss function and further expression optimization (e.g. simplification, operation fusion, etc.). But TF goes further and makes everything a node in a graph. Even things that are not algebraic expressions such as variable initialization or objective optimization.


> You don't really need a graph to support different backends. One popular approach is to have different array implementations (e.g. CPU and GPU arrays).

And now you can (waves arms) write it twice! Alternatively, you can make the interfaces between various impls be exactly the same but rename them so they're purpose-named. Then you've written Graph, for the most part.


Except your graph is symbolic, and good luck getting a breakpoint to fire when the calculation is happening ... Or if you don't like debugging, the problem manifests itself with merely printing values too.


I believe the issue is that the code only actually runs when you call session.run, which makes debugging a nightmare. Sure, TF can improve its error messages, but it's not there yet.

IMO, TF2 should make dynamic execution first class and switch easily to a static graph when you need to deploy stuff.



You don't even have to invoke session.run for frontends that can JIT the graph. TensorFlow in Julia is not declarative, for example.


I was reading your post and thinking, "Okay. Fair point. Although I don't face this problem while using Tensorflow, I can imagine it could be a problem for some people", until I came across this sentence,

> Pytorch’s interface is objectively much better than Tensorflow’s

How does your subjective opinion about Tensorflow suddenly become objective? I don't see any fact based or metrics based argument that establishes this objectively. All I see is an argument based on your needs and preference. That is far from objective.


I thought I was alone.

I think the user-unfriendliness of tf mainly comes from these two aspects.

1. The choice of python.

When I'm writing Python, I spend more time debugging (compared to something like OCaml), and sometimes it takes more time to get started because function arguments are documented rather than enforced by a contract/compiler (and that "if this, pass arg1, else pass arg2" style of documentation will never give you the same kind of confidence as a more rigorous language would), so you end up just trying things. This isn't specific to tf, but tf makes it more obvious because it's something like a language within a language.

If tf had been made by someone else, I would understand the choice, given the popularity of Python and the tons of libraries available, but since Google has unlimited resources, and AI is clearly the future, I really expected them to have the courage to choose something else.

Sometimes I wonder whether AI may go rogue some day - not because we deliberately make it so, but because of a bug somewhere in our code.

2. Lack of maintenance.

We all like shiny new ideas, and get excited implementing them, but once the fun part is done, so goes the excitement. A good library needs to be tweaked and re-tweaked; some of that is boring hard work, and smart people don't like that.

But hey, Google is doing this for free. As long as it's not deliberately made this way (to stall the community), we should be appreciative. It's an open source project, and that doesn't just mean we can use it for free, but also that we should do our part to make it better.


I'm not a comp-sci grad, and I worked at Google for a bit after a startup I was at was acquired. I didn't continue past those two years due to the need to commute and/or relocate, but I had a clear path into full-time work without much else. Although I was granted a bit of a pass, I believe anyone can work at Google given they are 1) slightly above average and have learned every base they run into in reasonable depth, and 2) motivated enough to interview at least a couple of times, including extensive "refreshing." Not many people pass the interview the first time, so it's a 6+ month play. Maybe longer. They look at progress from one interview to the next. Google is a big organization full of lots of people, and not many of them are strictly that far above average. Maybe the Dunning-Kruger effect is shifting what I think of myself and the average developer there a bit, but it's not unrealistic for any developer to think they can work there. It just takes interest and effort.


One should also ask the question: would you want to work at Google today. This is not the Google of 2002 ... or even the Google of 2010. 2017 Google is like 1999 Microsoft.

Lots of brilliant people working on heavily resourced projects ... but also significant bureaucracy and many political animals in what was formerly a pristine engineering "garden of eden".

You can see a lot of Google projects struggling now, and many startups in the same space as Google projects doing much better than more resourced teams doing the same thing at Google.

Nobody ever got fired at Google for spending all day brilliantly arguing on Google's internal newsgroups and not doing any real work. And it shows in Google's work culture. Imagine a person doing that at a startup ... or Amazon for that matter. They would not survive very long.

If you are young, and have many years of productive/earning years ahead of you, it might make sense to turn down a Google offer to try something a bit more "bloody" and hectic for a few years before you settle down in a comfy Google job.

There are many brilliant people you can learn from at Google ... but very few work very hard. And hard, productive work is a skill to learn too.


> Nobody ever got fired at Google for spending all day brilliantly arguing on Google's internal newsgroups

There was a pretty high profile example of this not all that long ago.


That was decidedly not "brilliant" arguing.


For spending all day in internal newsgroups or for posting one article that went viral?


As he tacks on at the end of the article, the reason most people care about TensorFlow is TensorBoard, which makes most of this a moot point. People would code while hanging upside down as long as they still get TensorBoard.


The serialisation story in Tensorflow is an obscene mess. There are bugs open on keras and tensorflow asking how to export a model and run it on your laptop and even better...on Android. It simply is crazy bad and cannot be done easily.

In fact, to do even a half-decent export of TF models, you have to switch to Keras.

I have a 10-email conversation with enterprise Google Cloud support trying to get an ML Engine output serialised to work on Android.

There are threads open all over the place on stackoverflow and elsewhere - and yes, we have tried all SIX ways.


Out of curiosity, what problems are you running into? I have never had serious problems with the 'save parameters -> dump graph -> freeze graph -> load up with C API' path with feed-forward networks or various RNNs. Either from Go or from Rust.

Admittedly, the documentation in this area is extremely bad and I basically had to figure out myself how to do it, though this was long before 1.0.
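
For readers unfamiliar with that path, a minimal sketch of the freeze step in Python (TF 1.x graph mode; the toy graph and the node name "y" are stand-ins for a real trained model):

  import tensorflow as tf
  from tensorflow.python.framework import graph_util

  # toy graph standing in for a trained model
  x = tf.placeholder(tf.float32, name="x")
  w = tf.Variable(3.0, name="w")
  y = tf.identity(x * w, name="y")

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      # bake variable values into constants, keeping only what "y" depends on
      frozen = graph_util.convert_variables_to_constants(
          sess, sess.graph_def, output_node_names=["y"])

  with tf.gfile.GFile("/tmp/frozen.pb", "wb") as f:
      f.write(frozen.SerializeToString())
  # the resulting .pb can then be loaded by node name from the C API (or Go/Rust bindings)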


The issues with the high-level API go slightly deeper. It looks like some of the graph operations are not available on Android (and equivalently on the desktop) by default[1]. The motivation for this is that we have stricter requirements for computation costs and application size on different platforms. So there is an approach that allows you to compile the minimal set of operations required to run the graph[2][3]. However it requires Bazel as the primary build tool for the Android app as well. The good news is that the TensorFlow team understands the issue and is working on improving the documentation and tooling[4].

That said, would you be able to share any example snippets on how you are persisting and loading these models in your code ? That would be super helpful.

Also, I'm getting the feeling that you are using the deprecated method of saving. I think they are shifting to MetaGraph now (not sure about this) https://www.tensorflow.org/versions/master/api_docs/python/t...

[1] https://github.com/tensorflow/tensorflow/issues/10254
[2] https://github.com/tensorflow/tensorflow/blob/master/tensorf...
[3] https://github.com/tensorflow/tensorflow/blob/master/tensorf...
[4] https://github.com/tensorflow/tensorflow/issues/10299


> That said, would you be able to share any example snippets on how you are persisting and loading these models in your code ?

The relevant code is (still) in private repositories, but I have a deck with some examples using Rust:

https://www.dropbox.com/s/t8r056f6wqlktqv/embedding-tensorfl...
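
For the plain Python side of that flow, a minimal save/restore round trip with the Saver/MetaGraph mechanism mentioned upthread (TF 1.x; the toy graph and paths are stand-ins):

  import tensorflow as tf

  # save: a trivial graph with one variable
  x = tf.placeholder(tf.float32, name="x")
  w = tf.Variable(3.0, name="w")
  y = tf.identity(x * w, name="y")

  saver = tf.train.Saver()
  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      saver.save(sess, "/tmp/model.ckpt")   # also writes /tmp/model.ckpt.meta (the MetaGraph)

  # restore in a fresh graph: rebuild the structure from the .meta file, then load the weights
  tf.reset_default_graph()
  saver = tf.train.import_meta_graph("/tmp/model.ckpt.meta")
  with tf.Session() as sess:
      saver.restore(sess, "/tmp/model.ckpt")
      graph = tf.get_default_graph()
      x = graph.get_tensor_by_name("x:0")
      y = graph.get_tensor_by_name("y:0")
      print(sess.run(y, feed_dict={x: 2.0}))   # 6.0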


I can find no good empirically-based argument here against the declarative style other than an appeal to taste/preference.

The only effective difference IMHO is that the declarative style does deferred evaluation, which makes it more difficult to examine "intermediate states" (aka "debugging"), but as the author stated themselves, this is simply a matter of outputting those intermediate states/setting them as outputs... this is no different from examining intermediate state in any programming paradigm ever invented. Unit-testing every step is IMHO a far better way of debugging/ensuring validity, but this is possibly much more difficult in the "black box" environment of self-weighting neural networks than it is in a more traditional programming environment.
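
A minimal illustration of "setting them as outputs": session.run can fetch any intermediate tensor alongside the final one (TF 1.x; the toy network is a stand-in):

  import tensorflow as tf

  x = tf.placeholder(tf.float32, shape=[None, 4])
  hidden = tf.layers.dense(x, 8, activation=tf.nn.relu)
  output = tf.layers.dense(hidden, 1)

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      out_val, hidden_val = sess.run([output, hidden],
                                     feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})
      print(hidden_val)   # the intermediate state, materialized for inspection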


> First, let’s look at the Tensorflow example

> Now let’s look at a Pytorch example that does the same thing

Reminds me of Escobar's theorem (Russian philosopher, not to be confused with Lopez-Escobar), which approximately translates into English as "With no alternative to a choice between two opposite entities, both will be exceptional nonsense."


Usually I'm on the hook for being open minded about programming languages in this forum, but I'm putting my foot down.

I really hope this entire universe gets liberated from Python at some point. Even R would be more palatable. Both these examples are awful, error prone, obfuscated, and beholden to Python's difficulties with large sums of data.


Can you give a sketch of how you would like these examples to look? Any language is fine; you can pretend that libraries for GPGPU, backprop, and gradient-based optimization already exist.

I tried to do so myself and couldn't come up with anything significantly better, but I've been writing Python for a long time and might just be stuck in a local minimum :)


Sure.

Another comment has already pointed out that Julia framework. I'll relink it for completeness: https://fluxml.github.io

You can also look at Haskell's Grenade examples: https://github.com/HuwCampbell/grenade

Fundamentally different because the notion of what the compiler should be doing is fundamentally different, and the notion of how data should be input is somewhat different.



Can you turn your comment into something productive by showing a code example from another language that handles these cases better?


Other people will.

But it is productive to say, "An excessively declarative style has been considered bad in programming for over thirty years."

That is reminding people this is a largely solved problem that the industry refuses to embrace because our legacy rube-goldberg contraption of software refuses to accept ever abandoning anything in favor of simplicity.

A great example: Why are both of these examples using the error prone pattern of explicit bounds iteration?



Any example of better approaches in R?


I bet you love Perl


No. Julia and Haskell are my go-to suggestions.

But I'd even take Scala at this point, and you could do worse than Clojure.

Python code isn't especially "ugly" as in line noise. It's ugly in the sense that it's a deluge of outdated and ineffective programming paradigms that maximize the difficulty of writing code.


I got fed up with Python and switched to F#. My main issue was that the lack of static typing made testing and debugging a major timesink and headache.

It's early days, but so far F# ticks all my boxes and I'm finding it an extremely nice language to write in. It is just as terse as Python (due to type inference) but statically typed, and there is a great plugin, Ionide for VSCode, which makes for a really polished development environment.

Plan is to use Microsoft's CNTK for ML/DL stuff.

For anyone frustrated with Python's duck typing, I highly recommend you check out F#.


Ummm; for me, as a total layman w.r.t. ML, the Tensorflow snippet actually looks much more readable and understandable at high level... I guess it might be harder to evaluate what's exactly happening inside (what precise algorithms are chosen, what are the runtime costs, etc.), but at least I have some kind of idea what this code might be doing. While between MSELoss, SGD, zero_grad and loss.backward(), all bets are off for me. So, what I want to say here, is that Tensorflow seems to at least win at readability; which I find a non-negligible aspect.


> Pytorch’s interface is objectively much better than Tensorflow’s

Ummm. No. 'Objectively' is utter nonsense. For an objective view we would need to define "better" first and measure both interfaces' performance. I think it is preference. I prefer the Tensorflow interface and don't mind its declarative style.

However, if one wants to criticize something, one could start with the static nature of Tensorflow (which you rightfully mentioned), which makes it hard to do stuff like LSTMs and dynamic batching. That works better in Torch. To me that is the only real attack point for Tensorflow. But keep in mind that Tensorflow is still "1.x" software; stuff like Tf-Fold addresses the problem with static graphs already, and Google plans for more dynamic graphs in Tf 2.0.

Also it would have been nice of you to measure the performance of both frameworks on common problems. But looking at the interface and shouting "bad" at Tensorflow is not really critique, just personal dissatisfaction with Tensorflow.

I think that Tf currently aims more at production code than research stuff. The claimed problem of being too low-level for simple stuff like layers is also not right. Have a look at the shipped contrib modules; you'll find common layers in there.


In my experience (computer vision, deep learning) PyTorch is substantially faster as well, especially in data augmentation where it’s not just a thin layer over cudnn.

That said, you’re right. There’s no way I’d deploy it to production.


Have you looked at ONNX? It is a neural network exchange format that, in particular, lets you deploy PyTorch models in production via Caffe2. Here is a tutorial: http://pytorch.org/docs/master/onnx.html

Disclaimer: I work on Caffe2 team (not on ONNX, though)
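
A minimal sketch of the export step described in that tutorial (the model choice and file name here are arbitrary; older PyTorch versions may need the dummy input wrapped in a Variable):

  import torch
  import torch.onnx
  from torchvision.models import resnet18

  model = resnet18()                          # any traceable nn.Module
  dummy_input = torch.randn(1, 3, 224, 224)   # example input used to trace the graph
  torch.onnx.export(model, dummy_input, "resnet18.onnx", verbose=True)
  # the resulting .onnx file can then be loaded by Caffe2 or another ONNX backend for serving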


If I want to export my own computational graph to ONNX, what is the first place I should look at? Do you know about any documentation or reference implementation of the format?


Reference specification and validator are hosted in https://github.com/onnx/onnx


Does the ONNX thing actually work?

I'm super skeptical of these interchanges, because it seems very difficult to avoid train/test skew. Any difference in detail between the two implementations is a potential problem. I can imagine a different order of operations throwing some values off by 0.1%, causing an eventual 1% loss of accuracy.


It should. If you find that a converted model doesn't work, it is a bug; please file it with PyTorch and/or Caffe2.

Implementation details of course differ between frameworks, but luckily neural networks are very robust to noise. In my experience, changes like using Winograd/Fourier for convolutions, or even running the whole thing in FP16 do not result in noticeable artifacts, and these are among the biggest differences you could have between frameworks.


We are taking a pretty hard look at ONNX right now. It doesn't support a lot of ops yet. I would add that it's very early for supporting anything "real" yet.


> There’s no way I’d deploy it to production.

I've seen this stated by a few people, but never justified. What are some reasons for not deploying PyTorch in production?


Nonsense, Pytorch is great in production. I use it for Mathpix (mathpix.com) which processes 20 million images per month.


Don't take this the wrong way, but 20M images per month is only about 20 qps or so (20M / 3600 / 730 => 7.6 qps, but that's unrealistically even). That's pretty manageable even on a single box, depending on the network and the size of the box. When people say they're concerned about a system in production, they often mean when needing to roll it out to many machines due to higher scalability requirements.

Disclosure: I work at Google, but not TensorFlow.


That's a fair point, but just as a counterpoint, for plenty of people, 20M per month is a lot of traffic, and it's good and valid to hear about experiences of people putting PyTorch into production with this scale.


"Production" tends to be an overloaded term. Arguments like this tend to lead to "My production is bigger than your production!" which is frankly silly.

You could mean large scale, real time, "small batch job with online lookups from a CRUD database",..

Let's just admit it's use case specific and move on.


Just one instance of our product has to process 39 million inferences per month (number is dictated by frame rate), with predictable latency, 24x7, on both Windows and Linux.


About 8 images per second? How does that prove that Pytorch is great in production?

I am not saying that Pytorch is bad in production but I fail to see how your metric of 8 images per second proves anything.


While you're absolutely right that vanity metrics prove nothing, it does prove that you can choose Pytorch for a production environment with a decent amount of traffic and be happy with your choice.


Why wouldn't you deploy it in production - on server side at least?


Because TF lets you run _just_ the model, with no Python stuff at all. For us there is no “server side”, our stuff is deployed on prem. The fewer variables there are the easier our lives are going to be.

Additionally, PyTorch download page warns you point blank that it’s an early version of the software and that you should “expect some adventures”. Adventures are fine for research, but inadvisable in production IMO.


I understand the appeal, but IMO wrapping the Python stuff in a docker container is fairly simple.

Making it run at scale using something like kubernetes is more advanced stuff but still within good devops practices.

Unless, of course if you want to run on exotic HW, like TPUs. But that's an issue for Googlish scales.

Edit: BTW, if by on premise you mean Windows client software, your point is totally valid; Python would suck for this.


Depending on how much extra logic you add around the model serving library, you may be exposing your source code, too. Maybe you care, maybe not.

Re: Windows, there's no TF serving for that, so TF is no better than other Python frameworks.


...and that's why you are using Keras instead.


I like Keras, but I always find myself having to write TF code whenever I need to implement something more interesting. And debugging, already hard in pure TF, is more complex due to the extra layer.

IMO, learning TF or pytorch is more effective at least in the current state of affairs.


But that is precisely how you should be using Keras!

* If you are implementing a standard model (that's 90% of industry use cases, and a large fraction of research use cases as well), Keras primitives considerably simplify your workflow and make you a lot more productive.

* When you need to implement something highly customized or unusual, you can revert back to writing pure TensorFlow code, which will integrate seamlessly with your Keras workflow (via custom layers, functions etc).

Basically, Keras increases your productivity for common use cases, without any flexibility cost for rare/custom use cases. It is meant to be used together with TF, not as a replacement for TF.
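
As a minimal sketch of that "revert back to pure TensorFlow" escape hatch (Keras on the TensorFlow backend; the clipping op is just an arbitrary example of a raw TF op):

  import tensorflow as tf
  from keras.models import Sequential
  from keras.layers import Dense, Lambda

  model = Sequential([
      Dense(64, activation="relu", input_shape=(32,)),
      Lambda(lambda t: tf.clip_by_value(t, 0.0, 6.0)),   # a plain TensorFlow op inside a Keras model
      Dense(1),
  ])
  model.compile(optimizer="sgd", loss="mse")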


You also created the backend abstraction that lets you customize a lot without real "raw TensorFlow". Thank you Mr. Chollet.


That's true. Keras excels when you have some pre-baked DNN you need to implement quickly; if you plan to do some advanced things, you get back to backend anyway.


I think the tensorflow code could be improved a bit if the author knew about the layers and loss modules.

  import tensorflow as tf
  import numpy as np

  X = tf.placeholder(tf.float32, shape=[None, 1])
  Y = tf.placeholder(tf.float32, shape=[None, 1])
  pred = tf.layers.dense(X, 1, use_bias=False)
  cost = tf.losses.mean_squared_error(labels=Y, predictions=pred)
  optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      for t in range(10000):
          x = np.random.random((1, 1))
          y = x * 3
          (_, c) = sess.run([optimizer, cost], feed_dict={X: x, Y: y})
          print(c)


Well, working with Android will tell you one thing: Google is abysmally bad at designing APIs.


yeah. one of my favorite things is the collection of FLAG_* constants on Intent: FLAG_ACTIVITY_CLEAR_TOP, FLAG_ACTIVITY_CLEAR_TASK, FLAG_ACTIVITY_CLEAR_WHEN_TASK_RESET, FLAG_ACTIVITY_FORWARD_RESULT ...

the android SDK appears to lack a coherent, overarching concept or set of guiding principles which the client programmer can internalize and rely upon. there are many special cases and inscrutable behaviors.

like, sometimes you ask for a piece of work to be done by the SDK and it calls you back on your implementation of a Listener class. but, other times, you have to register a BroadcastReceiver. still other times, you have to override OnActivityResult. but, then, there are these other times when you have to create and provide a PendingIntent.

you go through enough of this stuff and you start to wonder: "Did there really need to be soooo much variety in the way these SDK methods hand info back to the client? Couldn't they have standardized this?"

it's almost like Google turns their (very smart) programmers loose on the SDK and never looks back. google seems to just defer entirely to their opinions and judgments. smart as these programmers are, they seem to have different tastes, different approaches to API design.

and it all goes into the SDK.


I'm not an AI expert and barely understand how Deep Learning works. It was always my impression that I am the target audience for Tensorflow: mainstream tech workers from the corporate environment who want to add deep learning into their toolbox, while people actually doing research on the matter seemed to prefer Scikit. Wouldn't this justify the adoption of the declarative model and the aggressive abstraction of internal workings?


Researchers definitely do not prefer Scikit. If anything, Scikit is a better option than tensorflow for "mainstream tech workers from a corporate environment".


I'd say you have it kind of backwards. Tensorflow is best for research and developing novel ML solutions, scikit-learn is great for solving normal every day ML problems with normal everyday ML algorithms.

Scikit-learn however doesn't do deep learning. That being said, most problems faced by mainstream tech workers don't actually have huge data sets or need deep learning, and for those problems scikit-learn is great. Tensorflow only really comes into its own if you have/need some combination of very huge data sets, very deep networks, a novel or non-standard network configuration, and a large cluster of machines to run your learning on. If you want to apply a standard ML algorithm in the standard way on a 'small' dataset and aren't super constrained by performance, then scikit-learn is almost always a good choice.
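
For contrast, the "standard algorithm, small dataset" case looks like this in scikit-learn (a sketch; the dataset and model choice are arbitrary):

  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

  clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
  print(clf.score(X_test, y_test))   # held-out accuracy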


Scikit isn't deep learning.


Personally, I hate it that all these libraries are so much geared towards neural networks.

Why can't we just have compute networks that can be used for anything, from computational linear algebra to deep learning?

> Let’s be honest, when you have about half a dozen open source high-level libraries out there built on top of your already high-level library to make your library usable, you know something has gone terribly wrong

I consider that not a bug but a feature.


> Why can't we just have compute networks that can be used for anything, from computational linear algebra to deep learning?

You mean, like Tensorflow?


Have you looked at the APIs for any of these libraries? I think there is a perception that these are all for NNs only because NNs are the hot topic that everyone is jumping on now; but tf for example is a general matrix manipulation library with a bunch of amazing, extra NN stuff shipped alongside it.
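
For example, nothing in a snippet like this is NN-specific; it is just linear algebra expressed as a TF graph (a sketch, TF 1.x API):

  import numpy as np
  import tensorflow as tf

  A = tf.constant(np.random.randn(100, 3), dtype=tf.float32)
  b = tf.constant(np.random.randn(100, 1), dtype=tf.float32)
  x = tf.matrix_solve_ls(A, b)   # least-squares solution of A x ~ b

  with tf.Session() as sess:
      print(sess.run(x))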


This is a good observation! Tensorflow is a practical and flexible dataflow implementation. See for instance Greta, https://github.com/gavinsimpson/greta -- a framework for Bayesian stats built on tensorflow.


PyTorch and Chainer take this approach.

CuPy is even designed to be drop-in compatible with Numpy; I don't think there is much support for LAPACK routines at this point though.

Doing MATLAB-esque work with tensorflow would require some special tolerance for pain, or would require one to be a masochist.
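
A sketch of what "drop-in compatible" buys you: the same function runs on NumPy or CuPy arrays, with cupy.get_array_module picking the backend (assumes CuPy is installed and a GPU is available for the second call):

  import numpy as np
  import cupy

  def softmax(x):
      xp = cupy.get_array_module(x)   # numpy for CPU arrays, cupy for GPU arrays
      e = xp.exp(x - x.max(axis=-1, keepdims=True))
      return e / e.sum(axis=-1, keepdims=True)

  print(softmax(np.array([1.0, 2.0, 3.0])))     # runs on the CPU
  print(softmax(cupy.array([1.0, 2.0, 3.0])))   # runs on the GPU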


Neural Networks are a very small part of TFs API surface area. I'm writing tensorflow code every day and not doing anything remotely close to a neural network.


Don't flame me, but how does tensorflow compare with the Azure machine learning platform? For me it provides a great platform for practitioners.


I think these are not comparable. Tensorflow is a software library, Azure is a compute environment which allows one to run, among many other libraries, tensorflow implementations of ML models.


I think GP is referring to Azure Machine Learning Studio[1], which does seem like it might be comparable to TF. That said, I don't know enough about either to answer their question.

1: https://studio.azureml.net


Quick shout out for DyNet! The API is so simple and intuitive even a two year old (a prodigious one) can use it.


DyNet is also vastly faster than Tensorflow for realistic NLP models, especially on CPU.

http://dynet.readthedocs.io/en/latest/index.html

If you're doing the sort of work that would be published at ACL or EMNLP, DyNet is a really good choice.


Tensorflow does do really well when it comes to serving models in production. Tensorflow serving or TensorRT 3 are fairly throughput efficient and low latency. PyTorch, for instance, does not have a good serving solution (I guess that's where Caffe2 is useful)


Hell, being able to effortlessly switch between PyTorch and Numpy/SciPy/sklearn/skimage has been so helpful for the project I'm working on. That and I have tensors in later layers whose shapes depend on the training of the previous layers.


> I have tensors in later layers whose shapes depend on the training of the previous layers.

Rad! Do you have any examples (or literature) that explains when this is beneficial?


Not yet! I'm not using convnets or backprop or anything so I don't think it would be beneficial that way, but you could get something similar to what I'm doing by looking at Fritzke's Growing Neural Gas[1]

[1] http://papers.nips.cc/paper/893-a-growing-neural-gas-network...


Neat, thanks for the link.


A few more for your selection,

- Bloated build system that is near impossible to get working - who even uses Maven?! Pytorch/Caffe are super simple to build in comparison; with Chainer, it's even simpler: all you need is pip install (even on exotic ARM devices).

- The benefits of all that static analysis simply aren't there. In addition, PyTorch has a jit-compiler which one can argue lets one have their cake and eat it too.

- Loops are extremely limited. Okay, we know RNN/LSTMs aren't really TF's thing, but if you venture out to do something out of the ordinary even making it batch-size invariant is difficult. There isn't even a map-reduce op that works without knowing the dimension at compile time. You can hack something together by fooling one of those low level while_loop ops, but that just tells you how silly the whole thing is.
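
For reference, the while_loop workaround alluded to in the last point looks roughly like this (TF 1.x; a toy running sum over a tensor whose length is only known at runtime):

  import tensorflow as tf

  x = tf.placeholder(tf.float32, shape=[None])   # length unknown until runtime
  n = tf.shape(x)[0]

  def cond(i, acc):
      return i < n

  def body(i, acc):
      return i + 1, acc + x[i]

  _, total = tf.while_loop(cond, body, [tf.constant(0), tf.constant(0.0)])

  with tf.Session() as sess:
      print(sess.run(total, feed_dict={x: [1.0, 2.0, 3.0]}))   # 6.0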


I love that PyTorch kind of went all-in with anaconda. Building it is so much easier than TF! I'm a recent convert but it's dang good.


That's a serious point of frustration for me. Having an option to use anaconda, fine. Forcing it on your users, meh. I already have a working system using virtualenv and pip, why force another on me?


Why not pip?


For installing it, yeah pip is great too, but for building conda includes third party tools and libraries and stuff. e.g. in order to use the MPI backend for PyTorch's distributed processing you need to build it yourself and conda just makes it a bit easier. That and I had a real bad experience with trying to build Tensorflow (and Bazel) to run on an HPC cluster.


TF doesn’t use Maven. Also, its build system lets you build it for a lot of different platforms in a uniform fashion, and speeds up the development cycle due to fast incremental builds and built in incremental, parallel, multi-language test support.


Very well, Bazel then. I'm sure it's all the rage in the cube-farms, but that doesn't necessarily make one's life easier.


How do you even make a "Tensorflow Sucks" article and not mention CNTK?


So what would you include in the next-generation ML tool?


For nice looking basic statistics of your dataset the pandas-profiling library will do the trick https://github.com/JosPolfliet/pandas-profiling



