GitHub now supports rendering tabular data

jackmaney · on Aug 22, 2013

"git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important."

--Linus Torvalds

(source: http://programmers.stackexchange.com/questions/163185/torval...)

hoka · on Aug 22, 2013

Along the same lines, a quote from 1975(!):

"Fred Brooks, in Chapter 9 of The Mythical Man-Month, said this:

Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html

malingo · on Aug 22, 2013

I'm getting off topic, but this quote brings to mind Dijkstra's remark that "our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed."

Does anyone know whether this has indeed been shown to be true?

jmduke · on Aug 22, 2013

The money line:

Building software is about more than code.

GitHub's ably and subtly positioning themselves as attractive for a much wider audience than programmers.

toomuchtodo · on Aug 23, 2013

Programmers are data manipulators, no? Github is about streamlining data manipulation (storing your code, helping you with CSVs/GeoJSON). They're just lowering the bar to make data manipulation more accessible.

One day, everyone is going to be a programmer. Is everyone a professional driver? Not at all. But most people do know how to drive a car.

MartinCron · on Aug 22, 2013

Or making their product more attractive for programmers who do more than just write code. Either way, I think it's brilliant.

picardo · on Aug 22, 2013

Unfortunately, Git isn't a good way to collaborate on massive CSV files, for two reasons: a) git stores changes at the file level, rather than row level, b) for massive datasets, you quickly run into file size limits. That's why I am looking forward to dat[1], a collaborative DVCS that's going to blow Git out of the water for this kind of task, even if it achieves only half of its objectives.

[1] https://github.com/maxogden/dat

dchest · on Aug 22, 2013

git stores changes at the file level, rather than row level

It applies delta compression when storing blobs in pack files.
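
For readers weighing the two comments above, here is a minimal Ruby sketch of how Git names a file's contents, assuming a SHA-1 repository. The helper name `git_blob_id` is made up for illustration. The ID is a hash over the whole file, so changing a single row produces a new blob. The delta compression dchest mentions is applied later, when objects are packed, and is not shown here.

```ruby
require "digest"

# Hypothetical helper: compute the ID Git assigns to a blob in a SHA-1 repo
# by hashing the header "blob <byte size>\0" followed by the content.
def git_blob_id(content)
  data = content.b
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0".b + data)
end

before = "id,value\n1,10\n2,20\n"
after  = "id,value\n1,10\n2,21\n" # one cell changed in the last row

puts git_blob_id(before)
puts git_blob_id(after) # a different ID: the whole file is a new blob
```

If Git is installed, the result can be compared against `git hash-object <file>` for the same bytes.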

rpedela · on Aug 23, 2013

git does not handle large files well. Actually none of the open-source VCS do a good job. I think that was picardo's main point.

zhemao · on Aug 23, 2013

This isn't precisely true. Git stores changes at the "blob" level. Large plaintext files (like a CSV) are generally split into several blobs. You are correct that git works badly for binary files, like images and compiled executables, but plaintext files like CSVs should be OK.

polskibus · on Aug 22, 2013

In the next iteration of this feature, please provide an option for sticky column and row headers, i.e., scroll the table down with headers in place. Such a solution helps inspect large files with many columns. Surely you can do better than rendering a standard HTML table?

jloughry · on Aug 22, 2013

I second this motion. Pull request, anyone?

devcpp · on Aug 22, 2013

Where to? Do we have access to the source of this?

On that topic, I have never found much of Github's source code, which is extremely hypocritical, and makes me fear the day we'll want to move away from them. I'd be glad to be wrong and find out that I had just never looked in the right place.

BHSPitMonkey · on Aug 23, 2013

That's not hypocrisy in the slightest. It would be if Github made claims like "we think companies should open-source all of their product's code", but they would never make such a claim. In fact, their business model is selling features to people who want to keep their code private.

Against your wishes, maybe; Hypocritical, no.

nonchalance · on Aug 23, 2013

As discussed by a founder of github: http://tom.preston-werner.com/2011/11/22/open-source-everyth...

skeoh · on Aug 22, 2013

GitHub is proprietary. All of their open source projects are at https://github.com/github/.

j4mie · on Aug 22, 2013

It would be nice if GitHub added better support for column-oriented data in diffs, too. This article shows how it could work, with an example implementation in a fork of GitLab: http://theodi.org/blog/adapting-git-simple-data

jdorfman · on Aug 22, 2013

I love this company. They just get it.

erkose · on Aug 22, 2013

If you ignore the new UI.

meowface · on Aug 22, 2013

The UI in general isn't too bad, but the new arrangement and location of things is definitely a bit off.

(And yes, I know arrangement is like 80% of UI, but you get my point.)

hawkw · on Aug 23, 2013

The fact that I have to click on something to see language breakdowns is a problem. The fact that I have to refresh the repo page to make the language breakdowns go away is a bigger problem.

roycoding · on Aug 22, 2013

A tabular/graphical JSON explorer would be interesting as well.

kyle_martin1 · on Aug 22, 2013

I totally agree. This is the next logical step.

pit · on Aug 22, 2013

Is GitHub turning into an operating system?

drhodes · on Aug 22, 2013

Not sure, but it might be turning into an IDE. They've got a good part of it in place. Something like FP Complete: https://www.fpcomplete.com/

pearjuice · on Aug 22, 2013

Bracing myself for the Github IDE. If they do that right, any browser can become your development machine.

nonchalance · on Aug 22, 2013

Hopefully they'll add sorting soon

thedays · on Aug 23, 2013

1) Agreed - allowing users to click on a column heading and then sort it by ascending or descending values would be killer.

2) Adding embed codes for these tables would also be incredibly useful for news organisations and non-profits.

danso · on Aug 22, 2013

A massive amount of time and energy is wasted by news organizations on the question of how to show tabular data. This is an incredible feature, and one I'm assuming can be hackable/extended in various useful ways.

Anyone stress-test it with bulk data yet?

7952 · on Aug 23, 2013

It would be good if browsers had native support and rendering of SQLite in the same way that Chrome works with PDFs.

chrisseaton · on Aug 22, 2013

This is great, but their basic text editor isn't usable for me for what I think are pretty reasonably sized files (170kb of Markdown). It doesn't have to be this way, because the editor they use (Ace) works fine with it, so I don't know what they're doing to it. A tech support person said it was only there for small files, not for serious editing.

I was working on a project and wanted non-techies to edit Markdown for me on the site, being able to preview the results, but couldn't make it work on these files.

So I wish they'd fix that rather than more editors for other file types.

gjtorikian · on Aug 22, 2013

Can you link me to such a file?

As a former developer on Ace and now employee at GitHub I've honestly never run into a problem. I can't promise anything but I'd like to see what you mean.

BHSPitMonkey · on Aug 23, 2013

I just tried this with a (very superficial) test case[1] at 189KB and the editor seemed fine, both for editing source and generating previews. I even used the editor to double the file's length, saved, and reopened without problems.

[1] https://github.com/BHSPitMonkey/Large-Markdown-Test/blob/mas...

chrisseaton · on Aug 23, 2013

I've sent you an email with the exact file. Cheers.

dbond · on Aug 22, 2013

Have you tried http://prose.io/ ?

endergen · on Aug 23, 2013

Wow, works great on my iPhone. Love it!

hcarvalhoalves · on Aug 22, 2013

Cool feature. What happens on gigantic files? Does it limit, paginate, lazy-load?

gjtorikian · on Aug 22, 2013

The limit is 600kb; I'll have to publicly document that. Only 3% of GitHub's one million CSV files go over that limit.

As you might imagine, shoving that many rows into the DOM kills any browser.

jmduke · on Aug 22, 2013

I'm actually kinda surprised that the number is that high.

Part of me wants to make a stumble-upon that grabs a random csv from GitHub. I feel like there's probably some really interesting data waiting to be explored.

sillysaurus2 · on Aug 22, 2013

Please do it!

acdha · on Aug 22, 2013

I'm tempted to say you should leave an option to enable it and report it to the browser teams as a performance challenge, since it's the kind of thing which should work better than it does.

I noticed that more than half the time in Chrome isn't spent on HTML parsing but rather on layout recalculation. I suspect you'd be able to avoid much of that if you were able to set some sort of min/max width on the columns and containing table with overflow-x: scroll or hidden. Perhaps set a fixed width when the server sees the size is large and/or by measuring the rows while rendering the template?

erichurkman · on Aug 23, 2013

I believe you would have to set hard widths to avoid expensive reflows (layout recalculation). Min/max width are much more expensive, particularly in a very tall table.

acdha · on Aug 23, 2013

That's likely - I wasn't sure whether anyone implemented the optimization of stopping as soon as you've maxed out a fixed width table but you could certainly do something similar by hand when rendering the page and set fixed widths based on the ratio of column sizes.

mjs7231 · on Aug 23, 2013

In similarly large tables, I found if you shove about 1000 rows into the DOM at a time inside a setTimeout(0) loop, most browsers behave quite ok.

cdcarter · on Aug 22, 2013

I've wanted this for so long in my docs. It's so easy to build a doc ecosystem around my code on github between rendering md, and shoving YARD output into gh-pages!

anonymoushn · on Aug 23, 2013

It looks like this (Ruby's CSV library) defines CSV to be some unknown superset of RFC 4180, so just following the RFC should get this feature to work.

eranation · on Aug 22, 2013

I was wishing for this for a while, but didn't imagine they'd actually do it. This is awesome (Now all I need is to have it editable...)

minimaxir · on Aug 22, 2013

This does not appear to be retroactive, after looking at .csvs in my old repos.

gjtorikian · on Aug 22, 2013

It should be. Can you provide a link?

kasbah · on Aug 22, 2013

Mine still doesn't work: https://github.com/kasbah/nomech_mini-hw/blob/master/nomech_...

dxm · on Aug 22, 2013

I've had another look, GitHub is showing a nice little message: "Hey, did you know this file could be beautiful and searchable if this error is corrected? Missing or stray quote in line 1"

dxm · on Aug 22, 2013

That's not a CSV file; it's tab-separated (I think; I haven't actually checked whether those are tabs). You will need to have the right extension for the contained data for it to work.
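
To make the separator point concrete, here is a small sketch using Ruby's standard CSV library, which is mentioned elsewhere in the thread as the parser behind the viewer. The helper `parse_table` and the sample data are hypothetical, and how GitHub actually picks a separator is not shown in this thread. `CSV::MalformedCSVError` is the exception Ruby's CSV raises on malformed input such as a stray quote.

```ruby
require "csv"

# Hypothetical helper (not GitHub's code): parse delimited text with an
# explicit column separator and report parse failures instead of raising.
def parse_table(text, col_sep: ",")
  { rows: CSV.parse(text, col_sep: col_sep), error: nil }
rescue CSV::MalformedCSVError => e
  { rows: nil, error: e.message }
end

tsv = "part\tqty\nresistor\t10\n"

as_csv = parse_table(tsv)                 # tabs are not treated as separators
as_tsv = parse_table(tsv, col_sep: "\t")  # each line splits into two fields
```

Parsed with the default comma separator, every line comes back as a single field with the tabs still inside it; with `col_sep: "\t"` the same text yields two columns per row.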

gjtorikian · on Aug 23, 2013

Indeed. I made a PR for the original poster: https://github.com/kasbah/nomech_mini-hw/pull/1

minimaxir · on Aug 22, 2013

https://github.com/minimaxir/aggregate-data-from-likes-of-fr...

The 2 .csvs there display as plain text.

EDIT: Both work now.

coherentpony · on Aug 22, 2013

Great. Nested comments, anyone?

ape4 · on Aug 22, 2013

Is the viewer open source?

gjtorikian · on Aug 22, 2013

The viewer is just Ruby's CSV library, iterated within an ERB.
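
As a rough idea of what "Ruby's CSV library, iterated within an ERB" could look like, here is a minimal sketch. It is not GitHub's code: the template, the constant `TABLE_TEMPLATE`, and the method `render_csv_table` are assumptions made for illustration, and features such as search or the 600kb limit are left out.

```ruby
require "csv"
require "erb"

# Hypothetical template: one <tr> per parsed row, one <td> per field.
# Cell text is HTML-escaped so data cannot inject markup.
TABLE_TEMPLATE = ERB.new(<<~HTML, trim_mode: "-")
  <table>
  <%- rows.each do |row| -%>
    <tr><% row.each do |cell| %><td><%= ERB::Util.html_escape(cell) %></td><% end %></tr>
  <%- end -%>
  </table>
HTML

# Hypothetical method: parse CSV text and render it as an HTML table string.
def render_csv_table(csv_text)
  rows = CSV.parse(csv_text)
  TABLE_TEMPLATE.result_with_hash(rows: rows)
end

html = render_csv_table("name,score\n<b>x</b>,1\n")
puts html
```

Rendering on the server like this produces one DOM row per CSV row, which is consistent with the concern raised earlier in the thread about very large files.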