
You Can’t Do Data Science in a GUI - gk1
https://blog.dominodatalab.com/data-scientist-programmer-mutually-exclusive/?r=1
======
SkyPuncher
Sounds like the same argument people have against Wix and Squarespace - "you
can't make a website in a GUI".

Yes, you can - but you'll be pretty limited. If you're a brick and mortar or
service focused business, a website builder is great. If you rely deeply on a
customized web experience, you need to do something custom.

Same with data science. You can get pretty far with some simple data analysis
tool. If you need to go farther, then you need to build custom solutions.

~~~
joe_the_user
I thought the argument that a command line gives you reproducibility in a way
that a GUI doesn't was good.

Most of the things someone does in Photoshop don't have to be redone
repeatedly. For system administration or I'd guess data science, a lot of
things need to be redone regularly. Using the command line is good for both
doing that and getting in the mindset of doing that.

~~~
TheAceOfHearts
Doesn't Photoshop have a macro system for reproducible commands, though? I'm
not an expert user, but I've definitely seen they have some form of automation
available.

~~~
e12e
And a history api - with a few settings, and making the psd file (not a
flattened jpeg etc export) - photoshop gives similar information to a series
of high frequency vcs commits.

Granted, since Photoshop itself is closed source (and on a subscription model)
there's some very strong limits to scientific replication of a process.

But one could do something similar with gimp, additionally aided by python
scripts.

So yeah, Photoshop bad; cli good isn't as clever as all that as a blanket
statement (not implying anyone said exactly that; just making an observation).

I see the talk/article is about "data" science; but the headline reminded me
about an Alan Kay talk about teaching - where there's a clip of kids filming a
falling object and then juxtaposing the video with a rendered sequence
based on (v=at etc): whole video worth watching, but see the "16:17" link in
the transcript ("Now, what if we want to look at this more closely?"):

[https://www.ted.com/talks/alan_kay_shares_a_powerful_idea_ab...](https://www.ted.com/talks/alan_kay_shares_a_powerful_idea_about_ideas/transcript)

------
vijucat
Actually, I really like these two GUIs which build on the strengths of R by
combining the data-exploratory tools that Hadley Wickham created with the
visual appeal of ggplot2 (also Hadley!), plot.ly, etc;

1\. [https://exploratory.io/](https://exploratory.io/)

2\. Radiant: [https://radiant-rstats.github.io/docs/index.html](https://radiant-rstats.github.io/docs/index.html)

Microsoft IDEAR looks good, too:

[https://github.com/Azure/Azure-TDSP-Utilities/blob/master/Da...](https://github.com/Azure/Azure-TDSP-Utilities/blob/master/DataScienceUtilities/DataReport-Utils/R/team-data-science-process-idear-instructions.md)

In addition, the automatic insights generated by Power BI are another example
of how GUIs can help even the hardcore command-line ninja:

[https://docs.microsoft.com/en-us/power-bi/service-insights](https://docs.microsoft.com/en-us/power-bi/service-insights)

~~~
hadley
I love exploratory - it's in a really interesting place between GUI and code:
autocomplete on steroids.

------
AriaMinaei
I didn't watch the talk. So please let me know if I'm mistaken about this one
particular point:

That the author seems to be warning of limitations of certain solutions, but
generalising those limitations to GUIs as a whole.

This is wrong.

There is nothing inherent in a GUI that would make it unsuitable for coding.
Code does not equal text. Code can be represented in many ways. An AST is
code. Text, in a certain syntax, would represent the same code. And so can a
GUI.

Now, is there a GUI for general-purpose programming that I'd want to use today
for production work? No.

Will there be one in the future? I believe so.

But people discounting coding GUIs left and right, just because they haven't
seen a good example of it yet, only discourages others from exploring it further.
It's a self-fulfilling prophecy, to some degree.

Anyway, here is a text/GUI-based programming environment (single data /
multiple representations) that you might want to play with: luna-lang.org

~~~
flukus
> But people discounting coding GUIs left and right, just because they haven't
> seen a good example of it yet

It's not just because I haven't seen a good example, it's because we have 50
years of bad examples. When something has been tried and failed by so many
people for such a length of time you have to at least consider the possibility
that the idea is just a fundamentally bad one.

I think I'm more likely to see flying cars in my lifetime than a decent GUI
for general purpose programming. The problem with both is that they are
fundamentally flawed.

~~~
TuringTest
Arguably, modern IDEs are decent GUIs for general purpose programming.
Compared to programming by editing files in a bash shell, they provide lots
of visual tools (autocomplete, debugging) to track the dependencies between
objects in the code, which is what a graphical interface provides over
independent files.

------
namuol
Yes, healthy skepticism is good, but just because code (i.e. text files) is
often the most _powerful_ or _flexible_ tool, doesn't mean it's always the
best tool.

We (programmers) are notoriously bad at advancing the tools in our field.

For a brief history of this, watch "The Future of Programming" talk by Bret
Victor:

[https://www.youtube.com/watch?v=8pTEmbeENF4](https://www.youtube.com/watch?v=8pTEmbeENF4)

~~~
flukus
That guy gets an A+ for presentation but I couldn't find much to agree with
him on.

He talks about code being linear lines of text as though that's a bad thing.
We've pretty much been stuck with this as state of the art in our writing
systems for thousands of years, what would be your reaction if I suggested
everyone should watch videos instead of read books? It's a flexible and easy
way to represent a program that no other tool has come close to.

> We (programmers) are notoriously bad at advancing the tools in our field.

We've been trying to automate ourselves out of jobs for the entirety of the
history of the industry yet programmers are in more demand than ever. Everyone
wants to work on interesting problems and creating inner platforms is far more
interesting than writing boring business logic. Yet for all our efforts we've
barely progressed since the 70's, why do you think that is?

~~~
namuol
> What would be your reaction if I suggested everyone should watch videos
> instead of read books?

Videos are just another useful tool for learning; they don't obviate the need
for books, but they're better at conveying some ideas/information than books
alone.

Just like videos and books aren't mutually-exclusive tools for learning,
graphical tools and textfiles aren't mutually-exclusive tools for building
programs.

------
mirimir
OK, I could write and execute SQL in a terminal. But I'd rather use MySQL
Workbench. I get easy management of tables, views, etc. And flexible display
of results. Why would you not want the GUI?

~~~
sanxiyn
In the article, "GUI" seems to be used as an opposite to "programming".
An analogy would be SQL versus a query wizard dialogue.

~~~
gpm
I think you're right that the author is using "GUI" as the opposite of
programming. I think he's correct with respect to nearly every GUI I've ever
seen, but incorrect with respect to what a GUI could do in principle.

There's no reason that your wizard dialogue couldn't be exactly as expressive
as SQL. In principle you could make a wizard that just built an arbitrary SQL
query and ran it.
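
A minimal sketch of that idea in Python, assuming the wizard's dropdown
selections arrive as plain lists and tuples. The `build_query` helper, the
`sales` table and the operator whitelist are all hypothetical, made up for
illustration, and this covers only simple `SELECT ... WHERE` queries rather
than arbitrary SQL.

```python
import sqlite3

# Operators the hypothetical wizard offers in its dropdown.
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def quote(name):
    """Quote an SQL identifier so user-picked names cannot break the query."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table, columns, filters):
    """Turn wizard selections into (sql, params).

    filters is a list of (column, operator, value) tuples. Values are
    bound as parameters instead of being pasted into the SQL text.
    """
    sql = "SELECT " + ", ".join(quote(c) for c in columns) + " FROM " + quote(table)
    clauses, params = [], []
    for column, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{quote(column)} {op} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def run_wizard_demo():
    """Build a query from made-up selections and run it on an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("south", 25), ("north", 40)],
    )
    sql, params = build_query(
        "sales",
        ["region", "amount"],
        [("region", "=", "north"), ("amount", ">", 20)],
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sql, rows
```

Supporting joins, grouping or subqueries would mean more widgets feeding the
same builder rather than a different approach, and the generated SQL text can
be saved alongside the results.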

I said "nearly" before, the reason is graphical programming languages, for
example unreal engine's blueprints. These are a "gui", and allow general
purpose programming. One could imagine a tool with a similar style of
programming, that extends them with inline data visualization and other tools,
that would both undoubtedly be a GUI, and have all the nice features of
programming.

I think the better compromise is probably something like jupyter notebooks,
but that doesn't mean a GUI couldn't do it. And maybe a better GUI exists that
I just haven't managed to imagine.

~~~
mirimir
OK, but consider MySQL Workbench. One types SQL in a query. Or loads a saved
query. And there's an execute button. But unlike working in terminal, the SQL
is still on screen after a failed run. And there's red markup pointing to
errors.

That's the advantage of the GUI.

~~~
eindiran
The environment used as an example in the article is RStudio which by your
definition of GUI would be a GUI (i.e. if MySQL Workbench is a GUI, so is
RStudio). I think you and the author aren't in disagreement per se, you're
just using the term differently -- when he talks about a non-GUI, he means a
tool which executes code that you type into it, and you mean a terminal.

------
nl
_He also suggests leveraging a programming language for benefits that include
reproducibility, data provenance, and the ability to see how data analysis has
evolved over time._

It's true that tracking data and data provenance is very important and hard in
GUI tools.

But it's not really that much easier in code either.

To be specific about the kinds of problems here, I'm thinking of things like
when errors are found in a dataset, new labels are introduced, or you want
multiple splits on the data, but you still want the old version to check your
metrics against.

Yes, you can do things like versioned directories for different data versions
(although this tends to break when you are talking about TBs of data).

Or you can try using traditional version control tools, but that involves
switching between code and your version control tool.

Or you can try transformation orientated programming, where you keep the
original version of the data and then always transform it to get to the new
version. This is slow on large data and fails when new information is
introduced.
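
A rough sketch of what that transformation approach can look like, assuming a
tiny in-memory dataset. All names here (`RAW`, `fix_labels`, `drop_invalid`,
`materialize`, `version_id`) are hypothetical, and the version id hashes only
the step names, so it would not notice a change to a step's body.

```python
import hashlib
import json

# The original data is kept as-is and never edited in place.
RAW = (
    {"id": 1, "label": "cat", "weight": 4.0},
    {"id": 2, "label": "dgo", "weight": 30.0},  # labelling error
    {"id": 3, "label": "cat", "weight": -1.0},  # invalid measurement
)


def fix_labels(rows):
    """Correct a known typo in the labels."""
    return [dict(r, label="dog") if r["label"] == "dgo" else r for r in rows]


def drop_invalid(rows):
    """Remove rows whose weight cannot be right."""
    return [r for r in rows if r["weight"] > 0]


def materialize(raw, steps):
    """Rebuild a dataset version by replaying its steps over the raw data."""
    rows = [dict(r) for r in raw]  # copy so the raw rows stay untouched
    for step in steps:
        rows = step(rows)
    return rows


def version_id(raw, steps):
    """Short id derived from the raw data plus the ordered step names."""
    payload = json.dumps(
        {"raw": list(raw), "steps": [s.__name__ for s in steps]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Two dataset versions that can coexist: the old one stays reproducible.
STEPS_V1 = [fix_labels]
STEPS_V2 = [fix_labels, drop_invalid]
```

Every call to `materialize` replays the whole chain, which is exactly the cost
that hurts on large data, and a new labelled column that is not derivable from
`RAW` has no place in this scheme.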

Also, normal version control doesn't work well with modelling code, because
you want to use both the old and new versions of the code simultaneously.

Greg Brockman talked about this exact problem in the OpenAI/Ycombinator
podcast.

This is a hard problem to solve - not sure who is working on it.

------
goerz
I’m not quite sure what he’s thinking of as a “GUI” in this context. What is R
Studio? Are there actually people that use “GUI’s” for data science? Seems to
me everyone is using R or Jupyter notebooks, or plain scripting

~~~
mygo
exactly. the way I understood it, if you’re interfacing with a computer
monitor instead of shuffling around magnetic bits on the disk by sending
electric impulses directly to the write head, you’re using a graphical
interface are you not?

~~~
pwneduser
Pretty sure that's not how most people define GUI. You seem to be defining an
operating system.

~~~
mygo
Nah. Technically a graphical interface is an interface that is facilitated
through the use of graphics. Not all computer interfaces are graphical. Some
interfaces are audio/voice (such as Alexa and Siri, Screen readers for the
blind). Etc.

I think the point I was making is that “what is a GUI” is not objective. And
its colloquial definition may also evolve over time. The fundamental question
right now is, is text graphical? Some would say no. I’d say yes. What about
code highlighting? Code hinting? The buttons on your text editor? aren’t those
graphical? Of course they are. I’d argue that they are the best GUI for
crafting custom computer instructions.

------
projectramo
You can use a GUI for the following:

1\. Making a video game of a particular type (racing game, shoot em up, 3D
shooter etc)

2\. Do an analysis of a particular type (regression, ARMA analysis)

3\. Make a web page of a particular type (landing page, agency page, personal
blog)

What you cannot do is use a GUI to do a _new_ kind of thing that the maker of
the GUI had not considered.

And that is why you can't use it to do any kind of "science" which involves
experimenting to see if this new technique will work.

~~~
gramstrong
Phooey. This comes off as unnecessary gate-keeping. How does anyone do
science, ever? They use tools for measurement, tools for analysis, tools for
record keeping...all of these functions can be captured within a GUI. Doing
science doesn't ever require you to be doing something "new" or innovative...
in fact, most of the time you are applying age-old techniques. The only thing
new is the problem, and that's not dependent on what types of tools you are
using to solve it.

~~~
projectramo
I think you misunderstood me.

You can, of course, do science with graphical tools.

For instance, excel is a "graphical tool", and a lot of scientists use it.

But you can't do science on the field the tool is solving.

So you can use excel for biology, but data science is trying to
experiment on the data technique itself.

------
fredley
Maybe the title should have been _I_ can't do Data Science in a GUI. There is
value in making high-level tools more accessible (and in the process gutting
them of some of their power). But maybe not for this author.

~~~
lionel-
Maybe you could do a "I can do data science in a GUI" talk and amaze us all
(addressing all points the OP made). Hadley has been working for years on
making data science programming with R accessible: development of expressive
domain-specific APIs, books freely available online, etc.

------
nicodjimenez
If the GUI had a terminal built in, then yes you can. The mistake is to get
rid of the terminal altogether instead of trying to augment it.

~~~
cup-of-tea
No, like other commenters you miss the points about reproducibility and
scrutability. This isn't just about the tool being efficient and precise for
the user, it's about tracking _exactly_ what you've done in your process and
nothing beats code for that.

~~~
talltimtom
Using a GUI doesn’t mean there is no code. SAS JMP lets you work on your data
visually yet you still have code that defines your tables and plots, and you
can easily work with git or hg.
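
To illustrate the general idea (this is not how JMP itself works, just a
sketch under my own assumptions): a GUI back end can apply each click and
also log it as data, so the log can be saved as a text file, committed, and
replayed. `RecordingSession`, `replay` and the sample rows are hypothetical.

```python
import json

# Made-up sample data standing in for a loaded table.
DATA = [
    {"region": "north", "amount": 40},
    {"region": "south", "amount": 25},
    {"region": "north", "amount": 10},
]


class RecordingSession:
    """Hypothetical GUI back end: each action is applied and also logged."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.log = []  # JSON-serialisable record of every action

    def filter_greater(self, column, threshold):
        """What a 'filter' dialog would call when the user clicks OK."""
        self.rows = [r for r in self.rows if r[column] > threshold]
        self.log.append(
            {"action": "filter_greater", "column": column, "threshold": threshold}
        )

    def sort_by(self, column):
        """What clicking a column header would call."""
        self.rows = sorted(self.rows, key=lambda r: r[column])
        self.log.append({"action": "sort_by", "column": column})


def replay(rows, log):
    """Re-run a saved action log against a fresh copy of the data."""
    session = RecordingSession(rows)
    for entry in log:
        args = {k: v for k, v in entry.items() if k != "action"}
        getattr(session, entry["action"])(**args)
    return session.rows


# Simulate two clicks, save the log as text, then reproduce the result.
session = RecordingSession(DATA)
session.filter_greater("amount", 15)
session.sort_by("amount")
script = json.dumps(session.log, indent=2)  # this text is what goes into git or hg
replayed = replay(DATA, json.loads(script))
```

A real tool would want to validate action names before dispatching them, but
the point stands: the clicks and the code are the same record.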

~~~
dfmooreqqq
JMP was precisely my counterargument to this whole piece. JMP is an excellent
data science tool when used correctly and allows for reproducibility. You can
even call R from JMP if you need to do something that is easier in R.

Further, as others have commented, exploratory data analysis is much easier in
an environment like JMP than it is in R - when you're just playing around with
the data and trying to get a sense of it, it's much easier to make quick
approximate graphs in a GUI than in a command line.

------
cbcoutinho
One amazing GUI/visualization tool I use every day for investigating
computational fluid dynamics simulations is ParaView[0]. At the very least, it's
an open-source GUI built on top of VTK used for visualizing CFD results,
developed by the incredible developers at Kitware (same company that develops
CMake). Under the hood, it allows you to separate your client from a
{data,rendering} server so that you can visualize large datasets from a small
laptop utilizing a client/server model.

Besides just data investigation using the GUI, all visualization workflows can
be automated either through their python wrapper, or directly through the C++
API. The fact that you can automate the slicing/dicing and post-processing of
CFD results on a server remotely using Python still blows my mind.

I wouldn't normally associate CFD with data science, but in a way, analyzing
CFD results is starting to require the kind of scale of big data, and can
certainly be done with the help of a GUI.

[0] www.paraview.org

------
raghavsb
The talk title is quite provocative but the material discussed not so much. It
is true that core data science is iterative, needs to be reproducible and, more
recently, explainable. Can you do everything in a GUI? It really depends on where
you see it from. Early data scientists were programmers before, so they loved
to code and built tooling around it. While code is still dominant we also see
the rise of UI-centric tools - these allow you to build ML pipelines by
snapping blocks together. I feel they are chasing a "different" type of data
scientist. The term data scientist itself has become quite broad.

Code and the CLI give data scientists infinite flexibility but setup, management
etc. is a challenge. A GUI provides much less flexibility but you will have
output fast - works for simpler problems IMHO.

GUI or not data science has to move to cloud-based tools. Whether you write
code in the browser or in a CLI on a local machine is a matter of choice.

~~~
cup-of-tea
> GUI or not data science has to move to cloud-based tools.

This is a pretty odd statement. Care to explain why?

~~~
raghavsb
Smaller datasets will work on a laptop/desktop. For DL work with large datasets
you need to build a GPU workstation. Moreover, setting up the environment and
dependencies on different hardware setups is not straightforward.

Cloud provides the flexibility of choosing the hardware, many open source
projects allow you to manage your dependencies and setup better. From 0 to
something, cloud is better than custom.

~~~
PeterisP
For quite a few companies their whole "big data" dataset is small enough to be
processed by a single beefy machine. In that case it's far more cost effective
to simply plug in $1000 worth of extra ram in a workstation rather than spend
some extra engineer-hours to do it remotely.

~~~
raghavsb
Couldn't agree more on the data size. In most cases a beefy machine works. Would
on-demand (cloud) make it simple?

Also a beefy machine works for training jobs. But we need to deploy the models
too.

~~~
cup-of-tea
With the many container solutions available today it's incredibly easy to move
from dev to prod. You don't need to pay for a prod environment to do your
development just to avoid ever having to migrate.

~~~
anandology
Yes, it is incredibly easy, except when you upgrade to tensorflow 1.6 and it
fails with [a cuda error][1] and after a couple of sleepless nights you realize
nvidia has deleted the docker image of cuda version 70xx from dockerhub and
you need to find the right commit that works from their git repo and build
everything yourself.

[1]:
[https://github.com/tensorflow/tensorflow/issues/17566](https://github.com/tensorflow/tensorflow/issues/17566)

------
anynym
[https://www.youtube.com/watch?v=cpbtcsGE0OA](https://www.youtube.com/watch?v=cpbtcsGE0OA)

Here's a version of the video with better sound.

~~~
zouhair
I never understood why people record in stereo when all that is happening is people
talking. Stereo is awful for that.

------
tomrod
Hmm. I can, and do okay by it. But I do enjoy using the command line and
remain unimpressed with many of R's more "unique"* packages not found in
recent Anaconda distributions or in Julia, so perhaps I'm not this blog's
target audience.

* R packages with low use tend to have highly variable quality. YMMV

------
banku_brougham
The criticisms of the talk are valid, but overlook probably the biggest
counterexample to the (cool) tools presented here:

Tableau.

It's dominating many areas of data analysis, yet the terms of service prohibit
any scripting or API access to the formation of the XML that composes the .twb
workbooks. The user is limited only to GUI clicking, none of which can be
stored.

So I think it's fair to say "You Can't Do Data Science in Tableau," even though
it's a useful tool and well implemented as a reporting tool. To me it seems a
bit of a trap to build proficiency with Tableau and get locked into a closed
enterprise product, I'm seeing a lot of data engineers at my company getting
sucked into what is essentially a "tableau developer" role.

------
fullshark
Data Scientists need to understand that speed trumps statistical rigor for
basically 95% of the questions people have. GUIs bring speed.

------
jernfrost
Yes, there is something fundamentally limiting with GUIs that I struggle with
trying to express in a comprehensive way to people.

You see it with almost any task. I see it when working with IDEs, e.g. you have
complex project management and there is always something you need to do which
the designers of the IDE never had in mind.

However GUIs make a lot of things easier to do, so I think the ultimate
solution is always something that allows you to mix and match GUI and
programming/plain text solutions.

I think the problem today is that we make these GUI behemoth tools, when it
would have been better with a collection of much smaller tools with more
dedicated usage.

~~~
iagovar
Knime is, maybe, what you are looking for. Sad the server version isn't open
source AFAIK.

------
rollulus
Any thoughts on the R vs Python at 2/3rd of the page? I have the feeling (but
no data) that the data science community is moving from R to Python, is that
correct?

~~~
pas
Yes. R is horrible.

~~~
thousandautumns
I personally love R and think it is superior to Python for about 90% of data
science related tasks.

~~~
pas
That's the thing. R is excellent at the statistics (the science, the
hypothesis testing, and the other tools - and CRAN has real gems), but
completely unusable for data manipulation (cleaning, filtering,
discovering/exploration, tinkering), and especially hostile for anyone who is
used to real programming languages. (No data pipeline for you, no easy
scripting for regular experiments/backtesting. No great APIs, etc.)

~~~
thousandautumns
> completely unusable for data manipulation (cleaning, filtering,
> discovering/exploration, tinkering)

Surely you aren't serious. Data manipulation is one of the areas in which R is
_vastly_ superior to Python.

------
xioxox
When I started my plotting app, Veusz,
[https://veusz.github.io/](https://veusz.github.io/), I expected I would
mostly be using a command line to drive it. As I improved the GUI, however I
found I was very rarely using the command line. GUIs can be very good for
exploratory investigation and plot manipulation. Veusz is scriptable, so you
get the best of both worlds.

------
CharlesDodgson
I agree with him completely, but I also know that most 'Data Scientists' are
just people who know a bit of Excel and have a head for numbers.

I think we just need to be realistic: if your job is to collate sales results
every month and create a presentation about something interesting in the data,
you aren't really a scientist and you can probably just use Excel.

------
teirce
I have run into "Data Scientists" in the past that use exclusively Excel and
maybe some outstandingly awful SQL.

~~~
gaius
To be fair, any job involving data gets title-inflated to data scientist these
days. It may not be their fault!

------
elchief
Building prototypes in a GUI like RapidMiner is substantially faster than in R
or Python

~~~
murukesh_s
I don't understand why the down votes? It could be true for him, while it
could be the opposite way for others. It's all about your perspective. I have
said it before, but saying again:

Now imagine an alternate universe where there are no tools like
photoshop/illustrator invented (No GUI or mouse based operating environments),
we would have still created art through command line. Perhaps it would be a
sophisticated version of SVG that mostly people with development skills would
be producing with the input of designers who would be occasionally checking
the design and giving inputs. Now this process has had decades of tooling to make
things better, like the same arguments we have on repeatability through
various tools like macros etc. Now imagine someone coming up with the idea of
a basic version of photoshop, we will most probably dismiss that idea. Very
few mainstream programmers would adopt such a tool (Enterprises would, as seen
with RapidMiner). That doesn't mean one day that would evolve into the
Photoshop that we see today and would totally eliminate development effort in
producing art.

p.s. edited for typos.

------
jancsika
> GUIs deliberately obfuscate the process as you can only do what the GUI
> inventors want you to do.

One of the problems with programming languages that begin with the letter "p"
is that they can be used to create viruses.

------
blakespot
Sure you can!
[https://www.flickr.com/photos/blakespot/29031630962/](https://www.flickr.com/photos/blakespot/29031630962/)

~~~
itomato
I was expecting Quantrix

------
baxuz
Wow. And what about
[https://beta.observablehq.com/](https://beta.observablehq.com/) by Mike
Bostock?

------
sah2ed
"You Can't Do Data Science _Effectively_ in a GUI" would have been a more
honest title, as that is the crux of the speaker's talk.

------
jejones3141
Thanks for the links to videos; the very light gray very thin letters on white
skate the edge of legibility.

------
skybrian
Only having read the summary, I wonder if he considers a digital notebook
(like Jupyter) to be a GUI.

~~~
tanilama
Notebook is code.

------
kome
SPSS or Stata are GUIs that help to write reproducible code... are we really
arguing about that in 2018? Why so much élitist crap?

And btw - more general question - why did computer scientists of the 70s (SPSS
is from 1968) have much better intuitions of what users needed than computer
scientists of today?

------
jgalt212
True, but you cannot do data science without a GUI.

------
trumped
You can do anything with the right GUI... So what he means is that there is no
proper GUI available?

------
djs070
The article doesn't seem to address the question in the title. Why would they
be mutually exclusive?

~~~
LisaG
Did you watch all of Hadley's video? You might get the title more if you
saw/see the whole talk :)

~~~
LisaG
Also, the article does state what Hadley's take on the question is: "As
Wickham defines data science as “the process by which data becomes
understanding, knowledge, and insight”, he advocates using data science tools
where value is gained from iteration, surprise, reproducibility, and
scalability. In particular, he argues that being a data scientist and being
programmer are not mutually exclusive and that using a programming language
helps data scientists towards understanding the real signal within their data.
"

~~~
rockybullwinkle
thank you!

------
poster123
I bet the female/male ratio is higher for users of Excel than for R or Python.
A majority of programmers are men, but the sex ratio of white collar workers
(a large fraction of whom use spreadsheets) is much closer to parity.

There is often discussion about how to have more women in field X. Opening up
field X to non-programmers is one way to do so.

~~~
skate22
I was once told not to change my source code purely to make my unit test pass.
Changing the field to push the ratio towards 50 50 seems similar. I think the
real question is: how do we get more females to want to learn to program.

~~~
hood_syntax
100% this. If people want to push diversity initiatives, go for it, but I
think there's no way that leads to real change without starting from the
ground up i.e. cultivating childhood interest. Teaching about important women
in CS helps break down the cultural expectation of programming as a male
thing. In addition, although this is just speculation, I imagine workshops
where the participants work to build something practical rather than purely
"fun" would appeal to girls. A note-taking or organization or journal or
instant messaging application would be great. Something you could use instead
of pandering to stereotypes. Maybe I'm wrong about the workshop thing, but I
stand by the childhood interest part.

------
paradroid
This is really stupid. You can do anything in a GUI.

~~~
ktpsns
In general, you can do in a GUI what the programmer planned to enable you to
do. As was mentioned previously in this thread, programmers come up with some
sort of "graphical programming" frequently, recreating the
[https://en.wikipedia.org/wiki/Inner-platform_effect](https://en.wikipedia.org/wiki/Inner-platform_effect) \--
Which brings us back to the plain text language of programming and processing
where all this is already available.

~~~
paradroid
The deeper abstractions are lost on you, young padawan.

~~~
ktpsns
If this is criticism, I don't understand it.

------
lottin
There's no such thing as data science. It's applied statistics. Why invent a
silly name?

~~~
tomrod
Because "The person you hire into your business to be the crazy mad scientist
in the corner who helps with decision making, business development, analysis,
and occasionally reporting" is too long for most standard business cards and
even most respectable email signatures.

~~~
jsilence
I'd love to have a bcard with the official job title "mad german scientist".
that would be groceartig!

