Case in point - the Azure ML Jupyter notebooks run on a farm of Linux/Docker machines:
Once a model is built, it can be deployed for production/scale to the Azure ML backend which runs entirely on a farm of Windows machines.
Both environments have matching Anaconda distros running underneath, which makes running on two different OSes practically a non-issue. In fact, 95% of our users probably have no idea their notebooks and production site run on two different OSes (in large part thanks to Python, Linux, Docker, Anaconda, ... and lots of other great open source software).
Of course, there's a time and a place for eschewing Anaconda, especially if you are trying to keep your installation and dependencies minimal. That said, for most data scientists, who are workaday *NIX users at best, Anaconda is a huge productivity boost.
EDIT: Actually, my analogy needs further qualification. You _can_ get into dependency/missing-header-file hell in Ubuntu, but it usually doesn't happen in the first few hours or days. Both Ubuntu and Anaconda have a great "first 5 minutes to 5 days" experience, and that goes a long way toward ensuring the adoption of the underlying technology.
pip install jupyter
pip install numpy
pip install scipy
pip install scikit-learn
pip install matplotlib
The only problem I had was with OpenCV, which requires a manual make-and-install if you want the contrib package. The other problem was that installing scikit-learn requires a manual pip installation of scipy first.
For example, you can't build PyYAML unless python-dev is installed on Ubuntu. I'm not sure whether a wheel would fix that, but I don't think so.
Anyway, I just noticed that PyPI only supports binary packages for Windows and Mac OS X. You could still generate wheels of the packages you use with something like this:
pip wheel -r requirements.txt
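The full round-trip looks roughly like this (flags from pip's documentation; note that wheels built this way are only portable to machines with a matching setup, since PyPI didn't accept Linux binary wheels):

```shell
# Build wheels for everything in requirements.txt into ./wheelhouse ...
pip wheel -r requirements.txt -w wheelhouse

# ... then install offline from that directory on a matching machine.
pip install --no-index --find-links=wheelhouse -r requirements.txt
```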
So yeah, Linux is not the friendliest environment for Python wheels.
I don't see the appeal of having pip as the baseline. For me, conda packaging is much more informative, and placing everything you need for multiplatform support into an info/ directory with a meta.yaml is a lot more effective than going through the steps of publishing to PyPI. conda also makes uploading and hosting on anaconda.org extremely easy.
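For reference, a conda-build recipe is just a small directory with a meta.yaml along these lines (a minimal sketch; the package name, version, and dependencies are illustrative):

```yaml
package:
  name: mypkg          # hypothetical package name
  version: "0.1.0"

source:
  path: ..

requirements:
  build:
    - python
    - setuptools
  run:
    - python
    - numpy

test:
  imports:
    - mypkg

about:
  license: MIT
```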
Normally there is the whole "gee, I don't want to learn another package manager" -- but conda / anaconda.org is extremely worth it. It really is a major engineering step forward from the existing package deployment strategies in Python.
I even configure my travis.yml CI scripts to download Miniconda, create a conda environment from a requirements.txt, and then build and test my code via conda on the continuous integration VM itself.
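A sketch of the relevant .travis.yml section (the Miniconda download URL was the one in use at the time; also note that conda's `--file` expects conda-style package specs):

```yaml
install:
  - wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
  - bash miniconda.sh -b -p "$HOME/miniconda"
  - export PATH="$HOME/miniconda/bin:$PATH"
  - conda create -q -n test-env --file requirements.txt
  - source activate test-env
script:
  - python setup.py test
```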
The only worry is how strongly tied conda and anaconda.org are to the future of Continuum. Given how much Continuum speaks of open-source work, one would hope that these projects essentially live independently (or that forks of them would) but you never know. I do admit that is a major downside.
I'm halfway hoping that I don't know what I'm talking about.
Glad to help :)
It's nice that they have an env tool, but why didn't they just use the existing virtualenv?
Why doesn't anaconda's env work for you?
Conda manages more than just Python packages; it handles other dependencies as well. For example, here's the list of packages in a new environment I created with
$ conda create -n hn python
$ source activate hn
$ conda list -c
For all the talk of vendoring in package manager tools these days, I really, really like the way conda manages this stuff.
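For example (the binary dependency names here are illustrative of what conda pulls in beyond Python itself):

```shell
# A single conda install can bring in non-Python, binary dependencies
conda install -n hn numpy
conda list -n hn   # shows numpy plus its binary deps (e.g. an MKL or OpenBLAS build)
```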
Then the Buddha gave advice of extreme importance to the group of Brahmins: 'It is not proper for a wise man who maintains (lit. protects) truth to come to the conclusion: "This alone is Truth, and everything else is false".' Asked by the young Brahmin to explain the idea of maintaining or protecting truth, the Buddha said: 'A man has a faith. If he says, "This is my faith", so far he maintains truth. But by that he cannot proceed to the absolute conclusion: "This alone is Truth, and everything else is false".' In other words, a man may believe what he likes, and he may say 'I believe this'. So far he respects truth. But because of his belief or faith, he should not say that what he believes is alone the Truth, and everything else is false.
(from "What the Buddha Taught", a really great book!)
What if what he believes is objectively wrong?
EDIT: Of course he's talking to people who swear by the truth they protect. Instead of telling them they're wrong, telling them others might be right is far more likely to get them to consider his point of view -- that other "truths" are just as valid.
1. No matter how much you ever think, as a scientist, that you "only do an analysis one time" it is false 99.99999% of the time. You will always want to run it multiple times. Other people will want help modifying and running variations of it. Employers will need you, the scientist, to "productionize" it and make it suitable for automated deployment, probably cross-platform.
2. Your analysis will have to adapt to changing data inputs, which means you invariably have to create a (well-designed, unit-tested, and best-practices compliant) tool kit for custom data cleaning, pre-processing, database I/O, file system I/O, and visualization.
3. You will inevitably need to be concerned with raw-metal performance, but generally in isolated pockets of your code, so you'll need a language like Python that supports targeted performance optimization with tools like Cython.
4. Code is read (especially by newbies who need your help) much more than it is written, so you need a language that is easy to explain and reason about, with very few syntactical tricks and complicated conceptual nuances.
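Point 3 is worth making concrete: before reaching for Cython, a quick profile confirms where the isolated pocket actually is. A minimal sketch using the stdlib profiler (function names are illustrative):

```python
import cProfile
import io
import pstats

def slow_inner(n):
    # Deliberately naive hot loop -- the "isolated pocket" worth
    # rewriting in Cython (or vectorizing) later.
    total = 0
    for i in range(n):
        total += i * i
    return total

def analysis():
    # Most of a pipeline is cheap glue code; only slow_inner matters.
    return [slow_inner(10000) for _ in range(100)]

profiler = cProfile.Profile()
profiler.enable()
analysis()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print("slow_inner" in report)  # the hot spot shows up near the top
```

Once the profile singles out one function, that's the only piece that needs Cython type annotations; the rest of the pipeline stays plain Python.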
Overall, Python suits this niche very well. It is a full-service object-oriented language with a huge and well-maintained standard library. The third-party tools for machine learning and general numeric computing are by far the best in the open source world (apart from a handful of boutique R libraries, which you can use via rpy2 anyway), and Python is a simple language that is easy to teach and explain but also supports lots of targeted optimization at the CPython layer.
From the very first line of code you write, when you still naively believe "I will only run this once and I just need to crank it out," you need to be obsessed with writing well-designed, extensible, unit-tested code that is only a short distance from already being "production ready" -- and Python is a great language choice for this.
That said, one of the things I've appreciated about R is how it "just works"...I usually go through Homebrew, but RStudio works just as well. I can see why that's a huge appeal for both beginners and people who want to do computation but not necessarily become developers.
Also, I used to hate how `<-` was used for assignment...but now, that's one of the things I miss most about using R...I've grown up with single-equals-sign assignment in every other language I've learned, but after having to teach some programming...the difference between `==` and `=` is a common and often hugely stumping error for beginners. Not only that, they have trouble remembering how assignment even works, even for basic variable assignment...I've come to realize that I've programmed so long that I immediately recognize the pattern, but that can't possibly be the case for novices, who, if they've taken general math classes, have never seen the equals sign used that way. The `<-` operator makes a lot more sense...though I would've never thought that if I hadn't read Hadley Wickham's style guide.
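For what it's worth, Python at least turns the `=`-for-`==` slip into an immediate error rather than a silent bug, which makes it explainable to a beginner on the spot. A small demonstration:

```python
# The classic beginner slip: "=" where "==" was meant. In Python this
# fails to compile at all, so the confusion surfaces as an immediate,
# explainable SyntaxError instead of a silent logic bug (as in C).
source = "if x = 1:\n    pass"
try:
    compile(source, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True: Python rejects assignment inside a condition
```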
However, when you dig into the R internals, and you learn about its generic function model of OO, and about the mangled history of S3 and S4 classes, it becomes very frustrating.
R mostly "just works" if you stick to the libraries. But if you want to really understand e.g. polymorphic dispatching and how you can design your own tools to use it, it's a deal-breaker pain in the ass in R. It just simply is not suited for real computer science situations when you need to design the software, rather than just making scripts that treat libraries as APIs.
Since the times when you need "just scripting" are about 0.00001% of real-world cases, it unfortunately means that as nice as R is, it's just not a good enough tool to standardize into a real-world workflow. You're way better off using Python, even if you have to give up easy access to certain libraries, re-write your own implementations, or kludge them on with tools like rpy2.
I would argue the opposite is true.
Most of the time it's an analyst building regression trees in the marketing department of a retailer. Or similar situations like this.
It is exactly in these fractured cases, where you're running analyses in some bespoke corner of a business, that it is most critical for the analysis tooling to be productionized and unified.
Plenty of businesses don't do it this way, but their reliance on manual scripts, manual environment and dependency management, manually tracking which versions of code produced which results, and manually calibrating on historic test data all adds up to incredible problems that eventually require re-writes focused on a systematic architecture.
Big multinational IT vendors are even worse than the vending machine example. They cater to situations where a firm has some internal power struggle over IT and the struggle is won by one side who gets the certified approval of a statusy external vendor.
For instance, most big, showy technology re-orgs around Hadoop are unequivocally bullshit. Most firms do not face data problems that are appropriate for Hadoop, not even when their primary enterprise data store approaches the raw size at which Hadoop becomes plausible, because the types of analytical workflows still tend to involve much smaller data.
These firms rarely adopt Hadoop for pragmatic engineering reasons; engineers may even resist the re-org and offer compelling evidence that it is a waste of money, yet be ignored.
Vendor offerings are much more about catering to the CTO or other manager-level employees, focusing on b.s. on-paper "milestones" that the product can help them achieve. Many times, merely the installation and integration of these tools can result in a multi-hundred-thousand-dollar bonus, or more, for people high up the IT food chain -- even before anyone has seen whether it will bear any fruit or come close to being cost effective.
Vendors will do this with anything they can, so it's not surprising they would do it with R technology and they also absolutely do it with Scala, Python, Spark, etc. Big vendors peddling all of that truly are the hotel lobby snack machines of the software world.
I'm not saying it's bad to have R as part of a professional computing environment. R is a great tool. I'm saying that for most business situations, people who tend to make the assumptions necessary to justify using R are often very wrong, and the clunky and inconvenient support for professional computer science and software design in R bites them, making Python a much more pragmatic choice.
Whatever vendors do is pretty irrelevant to this.
As for the real-world percentage, it varies depending on the job -- R is primarily designed for the people who spend most of their time looking and thinking about the data, rather than the code.
It is quite telling that a lot of people here mention packages such as ggplot2 as the advantage of R -- ggplot2 is really quite awesome, but I can see it being similarly powerful had it been done for Python (maybe with a bit more unwieldy syntax, but that stuff is really subjective).
R really shines when you are doing actual data analysis -- things like the data.frame object, first-class missing values, proper attention paid to inference (something I quite often see missing from stuff written in Python or even Matlab), model syntax, estimation objects with actually useful pretty-printed summary, huge variety of statistical plots, everything designed to be convenient to use in the REPL, etc. -- and since these things are expected, you will usually find them in CRAN packages as well. Python (and Matlab) are IMO much less consistent in that regard.
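To illustrate the missing-values point: the closest stdlib Python analogue to R's first-class NA is the float NaN, which propagates through arithmetic but needs explicit checks -- one reason pandas layers its own isna()/dropna() machinery on top. A sketch:

```python
import math

# NaN propagates through arithmetic, compares unequal even to itself,
# and requires an explicit math.isnan() check -- unlike R, where NA is
# a first-class citizen of every basic type.
values = [1.0, float("nan"), 3.0]

total = sum(values)
propagated = math.isnan(total)             # True: NaN infects the sum

self_equal = float("nan") == float("nan")  # False: can't test with ==

clean = [v for v in values if not math.isnan(v)]
clean_total = sum(clean)                   # 4.0 after dropping the NaN
```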
But if your task is primarily to implement a particular type of analysis, put it in production to be easily repeated in the future, and move on to the next problem, R indeed offers little benefit over Python, unless you rely on a specific package (which, btw, you should only do if you understand exactly what it does, because most R packages are written by statisticians and as a result have very little fool-proofing).
it is in Prolog
each of these languages has its strong point.
there is always some library you want to use in some other language.
or you want to collaborate with someone.
or next year you change your mind and Julia is finally good enough.
"Bokeh is a Python interactive visualization library that targets modern web browsers for presentation."
see the CustomJS function.
Isn't that like saying that C has no visualization libraries on Intel processors because only assembly runs there?
However both Beaker and Jupyter lag behind RStudio in raw usability and convenience, especially for plotting. There doesn't seem to be any tool in this vein that "has it all" right now.
I would really like to hear what usability and convenience advantages RStudio has for you...
1) The IDE-like environment. Although I do a lot of work "literate document" style, there's a very fluid relationship between the one-off / throw-away code that ends up in those documents and the code that ends up becoming reusable. RStudio puts the interactive environment into my hands but gives me tabs with full documents to work with, so I can take those ad hoc bits of code and seamlessly move them into reusable scripts, or even fully structured libraries and program code. Conversely, any piece of code in those scripts or libraries is trivial to throw back into my interactive console to use ad hoc as well. This fluid workflow perfectly matches my development style: it starts with experiments to figure out how to do things, which is very ad hoc, gradually matures, and eventually ends up as a static, solid thing that I want to capture as a reusable component. With Beaker and Jupyter I feel like I am always stuck in "throw-away code" mode, and there's a lot of friction in moving code between the two paradigms.
2) The plot display seems to work a lot better. With both Beaker and Jupyter I find the inline plots are too static -- always the wrong size or the wrong shape, etc. In RStudio I can drag and resize that window independently.
I feel bad because I'm not telling you solutions and I don't really have any - but these are the things I love about RStudio. Of course, the downside of RStudio is ... R.
i hear you on 1 -- agreed & some help is on the agenda: https://github.com/twosigma/beaker-notebook/issues/3688. and then editing regular flat files for inclusion via the backend's native import (even into non-beaker programs).
on 2, yes many plotting APIs have that problem. beaker's native plotting library produces a thumb so you can drag to resize: https://pub.beakernotebook.com/#/publications/568ddaed-b648-...
I know what you want to ask next: “Okay, what about turning my model into a nice and shiny web application? I bet this is something that you can’t do in R!” Sorry, but you lose this bet; have a look at Shiny by RStudio, a web application framework for R.