How a New York Times Software Engineer Ended Up Covering Miss America (nytimes.com)
111 points by catacombs 3 months ago | 38 comments

> There was no database, but the best databases aren’t handed to reporters. They’re handmade. I opened a blank spreadsheet and started digging through years of old news clips and pageant websites. I tracked who was competing where, what titles they were winning, and how.

The more experienced I get in journalism and data work, the more convinced I am that the spreadsheet is the first and best tool for the kind of bespoke, flexible, and iterative data collection and modeling that journalists find themselves having to do -- i.e. building a dataset from scratch.

I only discovered queries in Google Sheets two days ago and it was like magic / “why didn’t I know about this all these years” https://support.google.com/docs/answer/3093343?hl=en
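For anyone who hasn't tried it: a Sheets query is roughly a filter plus a group-by over the rows of a range. Here is a minimal Python sketch of that idea. The rows, column names, and the `count_wins_by_state` helper are all made up for illustration; none of it comes from the article's data.

```python
from collections import Counter

# Hypothetical rows shaped like a hand-built spreadsheet (illustrative values only).
rows = [
    {"contestant": "A", "state": "NY", "year": 2016, "won": True},
    {"contestant": "B", "state": "NY", "year": 2017, "won": True},
    {"contestant": "C", "state": "TX", "year": 2017, "won": False},
    {"contestant": "D", "state": "TX", "year": 2015, "won": True},
    {"contestant": "E", "state": "OH", "year": 2018, "won": True},
]

def count_wins_by_state(rows, since_year):
    """Keep winning rows from since_year onward, then count them per state."""
    wins = Counter(
        r["state"] for r in rows if r["won"] and r["year"] >= since_year
    )
    return dict(wins)

print(count_wins_by_state(rows, 2016))
```

In a spreadsheet the same filter-and-count lives in a single cell formula, which is most of the appeal.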

The Sheets Database functions (DGET, DSUM, DAVG, etc) are also really nice, especially after you get familiar with the format for inlining an array of conditions.

wow, that is useful. this is the first time I have seen this type of query capability in a spreadsheet

Excel has the nasty habit of modifying what you type in the cell. If it seems date-like or number-like, it will convert the content, possibly corrupting interesting details in the process.

You can of course counteract this with a quote or with formatting, but you have to remember to do this consistently.

I understand why they do it for a spreadsheet: it is optimized for calculating. But this tendency makes Excel untrustworthy for general processing.

Surprised to see so much support for google sheets over excel.

Of course, all the aforementioned issues with excel can be avoided or changed in settings.

Has the pendulum really swung so far to the other side that sheets is the consistently better option? What about keyboard shortcuts and power users?

Sheets isn't consistently the better option, but it does shine for specific use cases. The three main reasons I'll reach for Sheets are:

- Heavy text munging. The builtin regex functions have no equivalent in Excel without dropping down into custom

- Quick and dirty data pulls and third party integrations. Especially any that involve Google services (since the auth process is less painful). I used to use both VBA and App Script pretty extensively, and could switch between the two without much issue. But I use neither as frequently as I used to, and it's significantly easier to ramp back up and ship when I use App Script rather than VBA. Excel supports JavaScript now, but it's only usable if you can ensure the workbook is going to be used exclusively in newer versions of Excel.

- Sheets I create for others (especially shared by multiple users) that I know I'll need to maintain and support later on. The access controls, change auditing & revert capabilities, and standardized/centralized execution environment all remove entire classes of support needs and the associated cognitive overhead when triaging issues.

That said, there are certain times I prefer to use Excel.

- Pivot charts are amazing and have no equivalent in Sheets.

- Pivot tables are far more powerful than their Sheets counterpart.

- PowerBI is fantastic (except for the lack of Excel for Mac support; Excel for Mac can still view the results of PowerBI, but can't do any editing).

- When connecting to internal data sources. Getting access to data sources, for business teams, is a royal pain in the ass. Legitimately so, since it's rare for a business team to have a resource with a technical enough skillset to truly be trusted with direct access to anything. Getting access that's reachable outside of the intranet (where App Script would run) is virtually impossible.

I believe Sheets does have keyboard shortcuts.

And in response to your comment, Excel is still the de-facto tool for people in analyst and non-technical positions.

For some people, Google sheets was their first spreadsheet and people tend to stick with what they know.

There's a gene called Septin-6, abbreviated Sept6. You can tell when gene expression data has been through an Excel cut/paste cycle, because Sept6 has been converted to September 6. Oh Excel, you think you're so smart.
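A quick way to spot this after the fact is to scan the gene column for values that look like dates. The sketch below assumes the mangled symbols show up as text like "6-Sep" or "Sep-6"; other locales and cell formats will look different, and `find_date_like` is just an illustrative helper name.

```python
import re

# Assumption: a mangled symbol appears as day-month or month-day text,
# e.g. "6-Sep" or "Sep-6". Other locales and formats will differ.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
_DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{_MONTHS})|(?:{_MONTHS})-\d{{1,2}})$",
    re.IGNORECASE,
)

def find_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if _DATE_LIKE.match(s.strip())]

print(find_date_like(["SEPT6", "6-Sep", "TP53", "1-Mar", "BRCA1"]))
```

It only flags suspects. Recovering the original symbol still means going back to the source file.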

My last name consistently gets autocorrected from ‘teh’ to ‘the’ and my name tags for conferences get misprinted.

I bet there is a space there for something between R and Excel. Something where everything is "as-code" in a scm friendly format but the primary interaction could be equally powerful in a text editor or a cell space.

The "workbook" path of things like Jupyter is close.

I hear good things about spreadsheet functionality in org-mode.

I use Orgmode. A lot. It's my primary organisation method, as the General Manager for a small 3 person sub-company, as well as my household organisation.

I've always, unwittingly, organised myself via two systems:

  1. Lists
  2. Matrices

I love the tables in Orgmode [0]. They are intuitive, simple, and just beautifully done.

That said, the spreadsheet functionality is a different beast altogether. Even though it's the same interface as the tables, it adds an exponential layer of complexity, and separation from how most people view a spreadsheet. It's powerful, but a bit daunting, and I don't see many journalists being able to grok it.

[0] https://www.youtube.com/watch?v=JHKrTsiz4JU&index=25&list=PL...

I just discovered that you can actually use Org's Tables as R data frames, and all sorts of other R actions from within Org. [0]

So that's quite amazing.

[0] http://ehneilsen.net/notebook/orgExamples/org-examples.html#...

I'd love a little spreadsheet widget for Jupyter that was only for data entry (i.e., a formula-less spreadsheet) and produced a Pandas dataframe.

You can do this with xlwings![0] Would highly recommend.


[0] https://www.xlwings.org

I don't believe that's what the poster you're replying to meant. xlwings is a Python library used to connect to Excel files and, e.g., change their contents programmatically. I don't think you can use it (out of the box anyway) to display and change data 'as-a-spreadsheet' in a Jupyter notebook [e.g. by showing an iframe].

Yeah, unsure about what the parent may have intended. I use it as a data IO for stuff that I want to manipulate in Python somewhat-interactively, by having an Excel sheet open on the side as I work in Python.

Jupyter lab has a spreadsheet editor. I haven't actually used it so I'm not sure if you can calculate with it, but I'd be mildly surprised if you could.

I wonder how things would be different if HyperCard were still around.

HyperCard was a fancier PowerPoint.

I was deep into HyperCard at the relevant time.

I'm using LiveCode for my one-time HyperCard stacks (http://livecode.com/) (was called Runtime Revolution for a few years).

The methods may be different (using a search engine instead of a card catalog), and the locales might have changed (using an online data store at your desk instead of going to a Library), but one thing that hasn't changed is that lots of information doesn't exist in structured form.

This reminds me of one of my favorite stories.

Mariel Padilla, who was a student at the time, won a Pulitzer for her help as an intern in creating a database that was essential in keeping track of the 24/7 opioid crisis reporting that the Cincinnati Enquirer was doing.



As I remember she heard about winning the award during a journalism class. Quite a nice story, and she did outstanding work.

She was part of a team that won the Pulitzer. She didn't win one herself.

I think of winning a Pulitzer in the non-individual categories like a film winning an Oscar in the non-individual categories. It is a grand achievement that was made possible by the efforts of many. A non-individual award at the Pulitzer level is still a remarkable achievement.

I don't know why, but I was kind of surprised when I got to the end. I was expecting more... some sort of conclusion. Other than that, yeah, use the best tools available and especially if you are getting data in an async manner, a cell-based direct edit tool like Excel is great.

You're not alone. The story the author is talking about -- https://www.nytimes.com/2018/09/10/style/miss-america-2019-p... -- doesn't include much in terms of data analysis or graphics to visualize it.

The story was interesting but not one that is data-driven, like other Times pieces.

The Insider essay was just hype: "Look at me! I'm a software engineer covering a beat! Go me!"

Yeah, felt like a Richard Linklater film to me: "What, that's it? So you entered some data into Excel. And...?"

I watch sports a lot. A well executed shot or play is a delight to watch. Can someone explain why Miss America or a beauty pageant is interesting in 2018?



Not to be rude, but I was expecting something more from this story than "I filled up a spreadsheet and talked to someone" :/ maybe I'm missing the point of this story?

No, you're right. That is literally the entire point of the story. People on Twitter showered the author with praise. I mean, she filled out a spreadsheet and talked to people, as you said, but there wasn't really that much besides that.

In addition, the main story doesn't seem to include any of her data.


Keep the low effort joke posts on Reddit. And if you had read the article first, you'd know the software engineer is a she.

Substituting uncover for cover is not even required to make a sexual joke: see sense 6a in https://www.merriam-webster.com/dictionary/cover

Nosql and a jupyter workbook?
