
Agate: A Data Analysis Library for Journalists - benderbending
https://source.opennews.org/en-US/articles/introducing-agate/
======
minimaxir
It's worth noting that for journalists, analyzing data is only half the
battle.

Sites like FiveThirtyEight and The Economist usually have separate graphics
departments that use nonstatistical tools like Illustrator to annotate and
apply custom theming. Good visualization is a huge part of a persuasive
argument, so being able to do both is important (and languages like R have
good native plotting as well).

Additionally, looking at the agate code Jupyter notebook, it appears that the
processing syntax is very, very similar to pandas (despite the warning against
it) aside from the print_bars method, so I'm confused about the specific
utility of the module.

From the post comments, after someone else noted the similarities too:

> _You're right, most of my problems with pandas are not in its interfaces.
> My problems there are with the overhead of the numpy dependency, its
> confusing handling of text, nulls, etc. (inherited from numpy) and its
> documentation aimed at advanced users rather than beginners._

~~~
staringmonkey
Hello! Author of the library here. Just want to point out that I am a
journalist and very active in the data journalism community (6+ years). Both
of the sites you name-check have journalists who do production online graphics
that don't go through the traditional Illustrator workflow, and news
organizations are increasingly discarding that antiquated pattern. (I've made
a hundred graphics for NPR and I don't even have a copy of Illustrator
installed.)

To your second point, that's fine. The most common feedback I've gotten is "I
don't see what purpose this serves that X doesn't already fulfill!" Well okay
then, you don't gotta use it. But given the fact I've done this job for years,
working with the very folks who it's targeted at, I think it's probably safe
to assume I've got some reasons. (Which you will find enumerated in the blog
post and documentation.)

~~~
JPKab
hey man, good work on agate. I like it.

I actually used to live right near the NPR offices in Crystal City.

You might know a pythonista I know who used to work there and is now doing a
machine learning start-up. (his name is Greg)

------
huac
I don't think the syntax is _that_ much nicer than dplyr in R (thank you Based
Hadley). But the approach (focusing on less-technical users and reducing
headaches) is certainly good.

I do really like the graphs being printed in console. Is this common
elsewhere?
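
For anyone curious how little it takes to get bars in a terminal, here is a
minimal sketch using only the Python standard library. It is not agate's
implementation; `text_bars` and its `width` parameter are names made up for
this illustration, and the sample data is invented.

```python
def text_bars(rows, width=40):
    """Render (label, value) pairs as horizontal bars of '#' characters.

    Hypothetical helper for illustration only, not part of agate.
    """
    if not rows:
        return []
    label_width = max(len(str(label)) for label, _ in rows)
    max_value = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        # Scale each bar against the largest value; avoid dividing by zero
        # when every value is zero or negative.
        length = round(width * value / max_value) if max_value > 0 else 0
        lines.append(f"{str(label).ljust(label_width)} | {'#' * length} {value}")
    return lines


# Invented sample data, printed one bar per line.
print("\n".join(text_bars([("apples", 10), ("pears", 5), ("figs", 0)], width=20)))
```

Everything is scaled against the largest value, so the longest bar always
fills `width` characters and the rest are proportional to it.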

------
cmiles74
Where I work, we do a lot of projects where we are replacing some aging and
wacky system (e.g., FileMaker Pro, Access, an old and ignored SQL Server 7,
etc.). Our project managers might find this tool helpful, since doing the data
analysis in the wacky system is pretty specialized. Dumping that data to CSV
and looking at it through a tool like this seems like it'd be a big time
saver.
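
As a rough sketch of that "dump to CSV, then take a first look" workflow,
here is a standard-library-only version (deliberately not using agate, so no
API is being guessed at). The `summarize_csv` helper, the column names, and
the sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle):
    """Count rows and, per column, blank cells and distinct non-blank values.

    Hypothetical helper for illustration; `handle` is any open text file.
    """
    reader = csv.DictReader(handle)
    row_count = 0
    blanks = Counter()
    distinct = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        row_count += 1
        for name in distinct:
            # Short rows yield None for missing cells; treat those as blank.
            value = (row.get(name) or "").strip()
            if value:
                distinct[name].add(value)
            else:
                blanks[name] += 1
    return {
        "rows": row_count,
        "columns": {
            name: {"blank": blanks[name], "distinct": len(values)}
            for name, values in distinct.items()
        },
    }


# Invented stand-in for an export from the legacy system.
sample = io.StringIO("id,status\n1,open\n2,\n3,open\n")
print(summarize_csv(sample))
```

Blank counts and distinct-value counts per column are often enough to spot
the fields that will cause trouble in a migration.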

------
devty
Is there any reason for the emphasis on "journalism" other than the fact that
the author of the library is a journalist?

~~~
staringmonkey
Journalists have some problems that tend to be somewhat peculiar to their
jobs. Some examples:

* A mix of heterogeneous and often internally inconsistent data.

* A lot of data that is categorical, free text or otherwise non-numerical.

* A need to be robust that is not always accompanied by the time necessary to become an expert programmer.

I'm sure some other folks have these problems too, though I can't think of any
other industry where folks would touch as diverse a range of data as we do.
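
To make the "internally inconsistent data" point concrete, here is a small
sketch of the kind of cleanup it implies: deciding whether a column of raw
strings is numeric or text while counting nulls. This is an illustration
under stated assumptions, not how agate infers types; `NULL_STRINGS`,
`parse_cell`, and `describe_column` are hypothetical names, and the list of
null markers is a guess at what a messy export might contain.

```python
from decimal import Decimal, InvalidOperation

# Assumed spellings of "no value"; real data will need its own list.
NULL_STRINGS = {"", "na", "n/a", "none", "null", "-"}


def parse_cell(raw):
    """Return None for null markers, a Decimal for numbers, else the text."""
    text = raw.strip()
    if text.lower() in NULL_STRINGS:
        return None
    try:
        # Drop thousands separators before parsing (assumes "1,200" style).
        return Decimal(text.replace(",", ""))
    except InvalidOperation:
        return text


def describe_column(cells):
    """Classify a column as 'number' or 'text' and count its nulls."""
    parsed = [parse_cell(cell) for cell in cells]
    present = [value for value in parsed if value is not None]
    if present and all(isinstance(value, Decimal) for value in present):
        kind = "number"
    else:
        kind = "text"
    return kind, len(parsed) - len(present)


print(describe_column(["1,200", "N/A", " 35 "]))
print(describe_column(["12", "twelve", ""]))
```

A single stray word is enough to turn the whole column into text here, which
is the conservative choice when the data cannot be trusted.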

If it works for other niches, great! But I'm a journalist and I had journalism
problems in mind when I built it. I can't speak to the needs of folks in
science, finance or what have you.

~~~
bitdeveloper
I'd throw history into the ring for something that would be similar in variety
of sources, types of data, etc.

I'm looking forward to checking out the library!

~~~
staringmonkey
Yes, good point! There's a lot of crossover there and also with digital
humanities folks.

~~~
mcburton
+1 for digital humanities folks. Your emphasis on well-written documentation
is a strong argument for agate over more powerful, but more confusing, data
processing libraries. I'm already thinking about using agate in my digital
humanities workshops!

