
A book to learn R and Python in parallel for Data Science - zelda_1
https://github.com/rnorm/book_sample
======
mh12345
R has a nice web development framework called Shiny. While it is not
comparable to, say, Django or Flask, Shiny does make it incredibly easy to share
data analysis. If one wants to share statistical analysis or create a
data-oriented dashboard, then there is definitely a reason to consider R and Shiny.
Note that Python has Dash, which is comparable to Shiny, but it is less mature
as far as I know.

While previously Shiny was primarily deployed through RStudio's solutions,
there are now open source initiatives such as ShinyProxy, introducing
Kubernetes as an option for deploying Shiny applications. The latest
iterations of Shiny-related libraries are facilitating automated testing and
deployment. These developments allow companies to use Shiny in production, but
it has to be said that the R ecosystem is not as developed as Python's from a
traditional software development perspective.

~~~
eoinmurray92
Dash by Plotly is also amazing; it's like Shiny but for Python! We were able
to whip together an app that would let you drag and drop xyyy data and get a
scatter plot instantly. You can try it here (first load takes 1-2s):

[https://dash-app-dx9g2r0la6-8000.cloud.kyso.io](https://dash-app-dx9g2r0la6-8000.cloud.kyso.io)

It was also really easy to make, maybe 250 lines of Python in total.

(guide to making this app is here: [https://kyso.io/KyleOS/creating-an-interactive-application-u...](https://kyso.io/KyleOS/creating-an-interactive-application-using-plotlys-dash))

~~~
knbknb
I thought the Python equivalent to Shiny was Bokeh; see
[https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery](https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery)?

~~~
eoinmurray92
Yeah Bokeh is also excellent

------
billfruit
I sometimes wonder whether there is any reason to learn R at all, since the
Python ecosystem has absorbed most of its advanced statistical functionality,
coupled with the fact that the Python environment is much more general, with
capabilities to fetch and decode/encode data, work with binary data and
databases, use web frameworks for presenting, etc.

~~~
minimaxir
I use both Python and R. tidyverse/ggplot2 alone are enough reason to use R,
and are _substantially_ faster for tasks that utilize those packages than the
equivalent in Python (in my opinion).

Although I haven't had as much reason to use _base_ R. For more ML-related
tasks I do go back to Python.

~~~
jwilbs
This. I’ve contributed code to popular libraries in both languages, and while
I (overall) have a preference for Python (mostly due to it being general
purpose), I find R code unparalleled when it comes to raw data
manipulation/analysis.

The overall API of the tidyverse packages is such a joy, and recent improvements
in purrr/tidyr allow me to construct nested data analysis workflows I couldn’t
even dream of in Python.

~~~
ppod
One random example I found recently is a tidyverse package called forcats that
has lots of nice functions for categorical data. For example, it has a single
function that merges all categories with a frequency of less than a certain
threshold in the table into a new category like "other" or whatever. This is a
task I often need to do, but as far as I can see it's a bit of a hack in
python or pandas. It's just lots of little things like this, especially
wrangling data tables.

[https://forcats.tidyverse.org/reference/fct_lump.html](https://forcats.tidyverse.org/reference/fct_lump.html)
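
For comparison, here is a rough sketch of that "lump the rare categories into other" step in plain Python, using only the standard library. It is not a reimplementation of forcats; the function name `lump_rare`, its `min_count` parameter, and the sample data are made up for illustration.

```python
from collections import Counter


def lump_rare(values, min_count, other="Other"):
    """Replace every category seen fewer than `min_count` times with `other`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    counts = Counter(values)  # frequency of each category
    return [v if counts[v] >= min_count else other for v in values]


# Made-up sample data: "green" and "teal" each appear only once.
colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
lumped = lump_rare(colors, min_count=2)
print(lumped)
```

It works, but it is a hand-rolled helper rather than a single library call, which is roughly the point being made above.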

There's also the data.table package for this kind of data work, which is maybe
less used but seems to have better performance.

------
cwyers
Nobody would write R code the way this book is teaching. For that matter,
nobody looking to do linear regression for data science in Python is doing
their own matrix math, either.

~~~
conjectures
Linear regression should be regarded as the statistical equivalent of
stripping down a rifle, reassembling it and checking its function. If you
_develop_ any statistical software, you're going to end up doing it at some
point.
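
For anyone curious what "doing your own matrix math" means here, a minimal sketch for the one-predictor case, assuming ordinary least squares via the normal equations (X'X) b = X'y. The function name `ols_fit` and the toy data are invented for this example; it is not code from the book.

```python
def ols_fit(xs, ys):
    """Fit y = b0 + b1 * x by solving the 2x2 normal equations (X'X) b = X'y.

    Hypothetical helper for illustration; real work would use a library.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy]; solve by Cramer's rule.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values are all identical; the slope is undefined")
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Toy data lying exactly on the line y = 1 + 2x.
intercept, slope = ols_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```

In practice you would call `lm()` in R or a library routine in Python instead, which is cwyers' point; the hand-written version is mainly useful as the stripping-down exercise.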

------
dajohnson89
This is an interesting concept. It makes perfect sense to learn both
simultaneously. On the other hand, it must be confusing at times. Imagine
learning two languages at the same time, from the same book. It's an
experiment I haven't tried, but I'm curious about the outcome.

~~~
tyingq
Also interesting to learn two languages, in parallel, where neither is
particularly good at parallelism :)

~~~
hjk05
What makes you think R or Python are bad at parallelism? My experience is that
both are very decent.

~~~
tyingq
Both have packages that can manage subprocesses. Both have inherently single
threaded interpreters.

~~~
rpier001
What technology in what language are you comparing them against?

------
photon_lines
Nice work!!!

If anyone is interested, I also made a 'Learn R by Example' project which
attempts to teach R through code comments:
[https://github.com/photonlines/Learn-R-by-Example](https://github.com/photonlines/Learn-R-by-Example)

------
samt430
Apart from the odd library, I have rarely found much benefit to using both
languages for DS, as you end up expressing the same paradigms just in different
syntax. And I think for good reason too: the basis of the tools used to do
data science isn't in the languages themselves but in the packages built for the
task, which is why there's often an R equivalent of a Python package and vice
versa. So in effect almost no one 'uses' R/Python for DS as much as they
Rube-Goldberg highly-optimised compiled libraries together using different
syntax, i.e. dplyr/pandas/scipy/ggplot etc. are the real stars of the show.

Rather than R vs Python, I hope one of two things happens. Either both languages
get replaced by a 'better' ML language, e.g. Swift / Julia, giving us users a
'turtles all the way down' experience and removing the reliance on compiled
packages. Or, second option, they get relegated even further into being
nothing but glue between some common data formats specific to the type of work
found in DS, basically giving you, the user, a choice between the syntactic sugar
of one glue language versus the other. Something like Apache Arrow springs to
mind, but I'm not sure where they are at the moment.

------
purple-again
I read a good portion of the first chapter and skimmed the rest. I am very
much enjoying this book and hope that you continue to write more chapters.

~~~
zelda_1
thanks! I'm planning to add a few more chapters.

------
Lanrei
Shouldn't '<-' be used instead of '=' for variable assignments, as they aren't
the same thing in R?

~~~
_Wintermute
It's a large source of bike-shedding in the R community but out of the 5
assignment operators in R, those two are largely the same.

There's a good explanation here:
[https://stackoverflow.com/questions/1741820/what-are-the-dif...](https://stackoverflow.com/questions/1741820/what-are-the-differences-between-and-in-r#51564252)

------
starpilot
Python and Julia might make more sense today.

~~~
demirev
Is Julia actually used that much? I've been hearing people herald it as the
next big thing for the last five years or so, but it doesn't seem like it has
taken off. I personally don't know anybody who uses it professionally (I know
plenty of people who use R professionally). The most recent SO survey also
indicates that it is rather unpopular.

~~~
ddragon
Julia just got to 1.0 last year, and there are areas where it's already
among the best options in scientific computing, such as differential
equation solving and mathematical optimization. Regardless of not being the
most popular (against the behemoths that have many times its age and
support), you shouldn't have trouble doing most stuff with it, from machine
learning to statistics. And it's a pretty fun and fairly unique language to
learn and use.

------
Y_Y
How come there's no source in the git repo? You shouldn't just throw up the
PDF and call it a day, github isn't just a trendy file host.

~~~
zelda_1
Good point. Just added the code. Thanks!

~~~
Y_Y
Thanks for adding the source for all the code snippets. I'd also be interested
in the LaTeX (if that's what you used) source for the book itself if you feel
like adding that.

------
cttet
I learnt Matlab/R/Python/JavaScript all together. It was a mess for me to grok
all the similar-but-different syntax, but it made me realize what is really
important for the domain, rather than the language details.

------
jamisteven
I feel like every book I've ever read, on any programming language, makes me
immediately want to pound my head into my desk. Nothing against the author,
it's just so clear that being good at programming and being good at teaching
it never come hand in hand. Same goes for real life: some of the best data
scientists I work with can't for the life of them explain concepts, and the
ones who are great at explaining them can rarely execute with the same
eloquence.

~~~
stevewodil
This is true for a lot of things! For example, the famous artists that perform
the music on stage (nowadays, at least) likely didn't write the song they are
singing.

Teaching and executing are two separate skills. Fun little anecdote, in high
school I had this AWFUL science teacher. He would literally just have us watch
Crash Course videos to get the concepts. Turns out he was a relatively
distinguished scientist himself.

------
master_yoda_1
But why? If you would never join NASA, then why train to be an astronaut? Do
something useful in life.

------
awestley
I picture the author as someone in their early-to-mid 20s who is great on the
data sci side but weak as a developer, and who considers "having to learn"
Python or R a hurdle to accessing their innate mathematical prowess.

~~~
purple-again
Literally in the second sentence of the book he states he was an undergrad in
2006.

