
Banning exploration in my infovis class - mdlincoln
https://medium.com/@eytanadar/banning-exploration-in-my-infovis-class-9578676a4705
======
soyiuz
Whoa, I cannot disagree more with the premise and the conclusions of the
author.

Exploration is absolutely one of the key goals of data visualization. Tukey's
insight was two-fold: First, that statistics puts too much emphasis on
confirmation but not enough on systematic exploration of a data set. Too often
researchers do not understand their data. They bring their own biases and
preconceived notions, where they should be listening to the data.

Second, visualization is too often mistaken for confirmation. A curious
pattern or an outlier may just be an artifact of the layout algorithm. The
"find" part of visualization should happen through mathematical insight.
Graphics can at best describe the underlying mathematical reality but no more.
One cannot strictly speaking "find" anything, only form intuitions or
illustrate already proven insights.

Perhaps the author's difficulty in evaluating his students' work lies not in
the exploration part of their assignments, but in his own pedagogic emphasis
on "tools," "frameworks," and "users." None of those things are relevant to
data visualization as such. They might be goals for a business built around
data visualization (to produce tools, or to identify user needs). A university
should offer more than job training. Those interested in users and in "what
one gets paid for" would do better in an internship or at a more narrowly
technical trade school.

I don't know how "finding" is any more of a goal for data visualization than
"exploring." Data visualizations tell stories. They often support the first
and the last step in data analysis: the exploratory phase and the presentation
of findings. They are inherently subjective, evocative, concise, artful.

~~~
shawnhermans
This whole conversation is frustrating because it is boiling down to a stupid
semantic debate. The author is claiming that people don't get paid to explore
data, they get paid to find things. IMHO, this statement doesn't even make
sense. When I explore data, I almost always find something. This something
might not be useful to an "end user," but it is almost always useful and
necessary.

Sometimes, the only thing I find by doing exploration is that a particular
dataset is absolute garbage and shouldn't be used for any purpose. The only
way I find stuff like that out is if I explore the dataset.

~~~
coldtea
> _The author is claiming that people don't get paid to explore data, they
> get paid to find things. IMHO, this statement doesn't even make sense. When
> I explore data, I almost always find something._

People are not paid to find "something", they are paid to find specific
things.

Hence, the following makes even less sense than TFA:

> _This something might not be useful to an "end user," but it is almost
> always useful and necessary._

~~~
bicubic
In reasonably sized datasets, you'll typically find a lot of interesting
information and relationships that are only loosely or not at all related to
what the analyst is actually paid to do at the time.

Analysts who only find the _specific thing_ and end their work on that are a
dime a dozen, and need to be micromanaged. Good analysts will find all the
other interesting stuff on their own and inform the business about it. Those
good analysts are the explorers, and banning those people from exploring
during training seems like an effective way to take talented budding analysts
and turn them into mediocre ones.

~~~
TeMPOraL
In reasonably sized datasets, you'll also find a lot of spurious correlations
simply by chance. That's one reason in science you're supposed to write down
your hypothesis _and_ methods of analyzing data _before_ touching the data.
Otherwise you risk finding some random noise and thinking it's important.
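To make that concrete, here's a toy sketch in Python (the dataset is pure random noise I generated for illustration): scan enough noise columns for relationships and a "strong" correlation appears by chance alone.

```python
import numpy as np

# 50 columns of pure noise, 20 observations each: no real relationships exist.
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 50))

# Scan all 1225 column pairs for the strongest "pattern".
best = max(
    (abs(np.corrcoef(data[:, i], data[:, j])[0, 1]), i, j)
    for i in range(50)
    for j in range(i + 1, 50)
)
print(f"cols {best[1]} and {best[2]} correlate at |r| = {best[0]:.2f}")
# With this many pairs tested, a sizeable |r| routinely shows up by chance.
```

Pre-registering the hypothesis is exactly the guard against treating that pair as a finding.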

------
pavlov
This advice is also worth thinking about for anyone who builds -- or is
tempted to build -- open-ended software tools.

I've often made the mistake of emphasizing exploratory UIs in situations where
the reality is that >95% of users are looking for specific solutions, rather
than the chance of dicking around with the generic toolset that I personally
found captivating.

It's really hard to step back from "I want to share this exhilarating
exploration with everyone!" to "I'll narrow this down to a specific use case,
and leave the rest of the potential for another day."

~~~
deevolution
Couldn't agree more. Also, I think that defending one's project by saying it's
an exploration is often just taking the safer, easier route.

------
simonsarris
Great article and applicable to way more than just infovis.

> In denying the student the ability to frame their main task as exploration,
> they are forced to concede that what they want to find is not what their
> end-user may be looking for and then: (a) engage with their client or
> “create” a reasonable one with real tasks and decisions, (b) understand the
> data and tasks much more deeply, and (c) identify good validation strategies
> (no more insights!).

> Maybe this is obvious, but when I started teaching I thought that being more
> open-ended about what I allowed was better. That somehow it would lead to
> more diverse, cool, weird, and novel projects. In some ways that’s true, but
> as I’ve argued elsewhere, teaching infovis is itself a wicked design
> problem.

It's not just infovis; I think this is a good teaching idea in a very broad
sense. Reading other people's writing nowadays, people are apt to be very lazy
with their language, and this in turn makes them lazy with their ideas.
Infovis should ban "exploration." Architecture should ban "modern." Career
centers doing resume reviews should definitely ban "utilize." I'm sure every
field has such tropes that are maybe useful in the real world sometimes, but
make for quite lazy school projects.

In fact I think he may be going a little overboard in justifying his ban on
"exploration". He doesn't need to weigh the pros and cons here: if
before he had a paucity and now he has a multiplicity of interesting student
results, he's won and they've won. "Surprise" be damned. The kids can buy
lotto tickets if they want to be surprised by big data.

For being so simple, restricting the common and obvious in classrooms is
probably an underrated technique. It's widely done in photography classes:
make students use only film for a while, so they really have to frame photos
and can't just snap 1000 and "discover" one good one, or restrict them to
annoyingly wide prime lenses for a portrait assignment. These constraints,
even though they are constraints, greatly reduce the samey-ness of results and
make the students engage their brains.

------
peteretep
Reporting software thoughts:

a) Being pretty and interesting-looking will be enough for first-year sales,
unless you have a particularly sophisticated buyer

b) If pretty and looks interesting are the only things it does, you'll get
slammed at renewal time because no-one used it

c) You need to know _what behaviour_ your users will change based on the tool.
They may not know that yet. That's a pretty good sales pitch though.

d) If you're particularly cynical, find a way to give users a magical number
they can change based on behaviour, that doesn't actually mean anything. cf:
"Klout influencer score". The more opaque its calculation, the better. Add a
slightly random element to how it's calculated so that users build
superstitions about how it's calculated. Allow their boss to easily run
reports on your users' magical score, and include rankings.

~~~
Fiahil
Congratulations, you just described the advertising industry.

------
anigbrowl
_The line that I use on my students is that: No one is paid to explore,
they’re paid to find._

You're a teacher, not an employer. It's not your job to tell people what to be
interested in. This is exactly the wrong attitude to bring to any scientific
endeavor, but I guess you don't get tenure for letting your students dick
around on their own.

All I can say is thank heavens I didn't run into anyone like this when I was
first learning technology.

------
bane
I think this is excellent for education, because of the side benefits that
come with it...like thinking more along the lines of, and learning about,
what the users are doing with the tool. It's also somewhat contrary to the
state of lots of education, which often focuses on generalizing as a way of
understanding rather than finding specificity.

However, in real-life, figuring out all of the specific ways in which a user
might want to use your tool can be mind-bogglingly difficult. In many cases,
a few specific use-cases work, which means you can often just automate away
most of the infovis stuff and just get to the result.

But when you have more than just a handful of these use-cases, or the use-
cases are not well enumerated, the generalized approach can work better, and
that approach for infoviz is often "exploration". In fact, building the
generalized tool is often a good way of discovering the more specific use-
cases for later rethinking, simplification and capture-in-code.

In that sense, the exploration infoviz tool can act as kind of a meta-
exploration tool for figuring out what your users really need when they aren't
otherwise able to articulate it.

------
xg15
When I read the headline and first paragraph, I thought I couldn't disagree
more - though as I continued reading I actually grew sympathetic. As a
student I've actually found myself "exploring" data in the "meandering" sense
quite a few times - trying to find "interesting" patterns without a clear idea
what "interesting" means or whether what I'm seeing genuinely constitutes a
pattern. Such tasks started out kind of exciting but quickly became incredibly
frustrating. So if that guy demands a bit more rigor in order to avoid this
situation, more power to him.

That said, I think he states in his article something that I see as a core
didactic problem without identifying it as such:

> _The student is often engaging in “exploration” for the purpose of identifying
> patterns that influence their design. They are often missing background
> knowledge and develop it in this step. But this is not “exploration” for the
> analyst who may already have a mental model of interesting and uninteresting
> patterns._

So he is expecting students to make a tool suited for the mental model of an
expert even though the students have no idea what that mental model should be
(and without giving them any hints what that should be). If some motivated
students try to derive those tools for themselves on the go, he'll permit that
in a fit of generosity.

If the problem he has diagnosed is that the students don't know enough
rigorous definitions and techniques to find patterns, maybe the curriculum
should focus on teaching those instead of going a step further and asking them
to build a visualisation tool based on that non-existent knowledge.

------
randcraw
Doesn't this boil down to the classic question of whether a sensible approach
to discovery can be serendipitous as well as hypothesis driven?

I work at a big pharma company, and long ago I learned that our biologists and
chemists had absolutely no patience for inquiry that lacked a basis in the
purposeful exploration of mechanism of action (hypothesis). Without a guiding
principle, the number of possible (and meaningless) patterns tends to explode
combinatorially, and there's not enough time in a dozen lifetimes to test all
the nutty proposals your computer can generate in a microsecond.

Isn't that what the author is suggesting? If not, and exploration via random
walk _IS_ in fact a productive practice that's worthwhile, then what's the
rub?

~~~
TeMPOraL
Yes, I think that's exactly the insight that underlies this article, and
that's why I think it makes sense. Unguided "exploration" coupled with large
datasets leads to wasting time on finding nonsense.

That doesn't mean flexibility in a tool isn't warranted - it's just that such
flexibility should have a goal, and people should be discouraged from
aimlessly applying it _and then_ claiming they found something important.

(Also "exploration" is kind of a weasel word for student assignment; you can
call your half-assed project "exploratory" and spin a good story from it that
will give you a grade without any real work on your part.)

------
mifeng
This is great advice for anyone building a feature that utilizes
visualization.

I wish I had asked myself "what do users want to find?" instead of "what do
users want to explore?" the last time I built a dashboard. Perhaps people
would have actually used it.

~~~
TeMPOraL
That's the reason I consider most contemporary dashboards to be useless and
missing the point.

The goal of a dashboard should be to give users insight, not to show them
pretty pictures of moving lines and pies. You can find a lot of nice-looking
dashboards online whose authors didn't even stop to consider _why_ they were
building them in the first place. They're easy to spot: pretty but unlabeled
graphs, plots missing error bars, pie charts, and various forms of
"chartjunk".

It's hard to make a good dashboard, because to do that you need to figure out
_what questions_ the user will want to answer with it.

------
zitterbewegung
So I took an infovis class, and I think what the author means is that your
visualization should have some objective utility. Exploring a dataset is good
as part of the design process. But if you are doing a visualization for, say,
the NYTimes or FiveThirtyEight, you need the skill to communicate something
useful to the user.

Exploration is a tool but giving people utility to actually use your
visualization is more important. You should allow exploration but it shouldn't
be the only thing that people will use your visualization for.

Link to the class I took
[https://www.evl.uic.edu/aej/424/](https://www.evl.uic.edu/aej/424/)

------
rcthompson
I more or less agree with this. I do lots of "exploratory data analysis", but
almost every plot or other output that I generate is designed to help answer a
specific question or test an assumption that I have about the data.

------
latently
The word explore is actually great in a data analysis context. The notions of
exploratory vs confirmatory analysis are widely used, and exploratory means
exactly what your students think it means. Just make sure they don't explore
all of the data at once, otherwise they will have to go collect more so that
they can confirm what they found when they were exploring.
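The mechanical version of "don't explore all of the data at once" is a hold-out split; a minimal sketch in Python (the array and the `explore`/`confirm` names here are illustrative, not from any real dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_normal((1000, 5))  # stand-in for a real dataset

# Partition once, up front: explore freely on one half,
# confirm any hypothesis on the untouched other half.
idx = rng.permutation(len(data))
explore, confirm = data[idx[:500]], data[idx[500:]]

# Any pattern spotted in `explore` is only a hypothesis until it is
# re-tested, once, against `confirm`.
print(explore.shape, confirm.shape)  # (500, 5) (500, 5)
```

If the confirmation half has already been peeked at, it no longer confirms anything, which is exactly why you'd otherwise have to collect fresh data.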

------
coding123
We are going to call this place... Yosemite. But hey, everyone, don't go
exploring this place; we're just here to count how many deer are in the woods.

------
lup874
All I can say is that data issues are often discovered during the data
exploration process. But why would the author even care about data quality?

