Github now supports rendering tabular data (github.com/blog)
200 points by makeramen on Aug 22, 2013 | 57 comments



"git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important."

--Linus Torvalds

(source: http://programmers.stackexchange.com/questions/163185/torval...)


Along the same lines, a quote from 1975(!):

"Fred Brooks, in Chapter 9 of The Mythical Man-Month, said this:

Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html


I'm getting off topic, but this quote brings to mind Dijkstra's remark that "our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed."

Does anyone know whether this has indeed been shown to be true?


The money line:

Building software is about more than code.

GitHub is ably and subtly positioning itself as attractive to a much wider audience than programmers.


Programmers are data manipulators, no? GitHub is about streamlining data manipulation (storing your code, helping you with CSVs/GeoJSON). They're just lowering the bar to make data manipulation more accessible.

One day, everyone is going to be a programmer. Is everyone a professional driver? Not at all. But most people do know how to drive a car.


Or making their product more attractive for programmers who do more than just write code. Either way, I think it's brilliant.


Unfortunately, Git isn't a good way to collaborate on massive CSV files, for two reasons: a) git stores changes at the file level rather than the row level, and b) for massive datasets you quickly run into file size limits. That's why I'm looking forward to dat[1], a collaborative DVCS that's going to blow Git out of the water for this kind of task, even if it achieves only half of its objectives.

[1] https://github.com/maxogden/dat


git stores changes at the file level, rather than row level

It applies delta compression when storing blobs in pack files.
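A minimal Python sketch of how git names file content: each version of a file is stored as one blob object whose ID is the SHA-1 of a short header plus the full file bytes, so editing a single row yields a new blob. The delta compression mentioned above happens later, when similar blobs are packed together. The function name `git_blob_id` is hypothetical; its output should match what `git hash-object` prints for the same bytes in a default (SHA-1) repository.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # Git hashes the header "blob <size in bytes>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The well-known ID of the empty blob.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Changing one row anywhere in a CSV changes this ID, which is why git sees a whole new file version rather than a row-level change.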


git does not handle large files well. Actually, none of the open-source VCSes do a good job. I think that was picardo's main point.


This isn't precisely true. Git stores changes at the "blob" level. Large plaintext files (like a CSV) are generally split into several blobs. You are correct that git works badly for binary files, like images and compiled executables, but plaintext files like CSVs should be OK.


In the next iteration of this feature, please provide an option for sticky column and row headers, i.e. the table scrolls while the headers stay in place. That helps when inspecting large files with many columns. Surely you can do better than rendering a standard HTML table?


I second this motion. Pull request, anyone?


Where to? Do we have access to the source of this?

On that topic, I have never found much of Github's source code, which is extremely hypocritical, and makes me fear the day we'll want to move away from them. I'd be glad to be wrong and find out that I had just never looked in the right place.


That's not hypocrisy in the slightest. It would be if Github made claims like "we think companies should open-source all of their product's code", but they would never make such a claim. In fact, their business model is selling features to people who want to keep their code private.

Against your wishes, maybe; hypocritical, no.



GitHub is proprietary. All of their open source projects are at https://github.com/github/.


It would be nice if GitHub added better support for column-oriented data in diffs, too. This article shows how it could work, with an example implementation in a fork of GitLab: http://theodi.org/blog/adapting-git-simple-data
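As a rough illustration of what a data-aware diff could report, here is a minimal Python sketch that matches rows between two CSV snapshots on a key column and lists added rows, removed rows, and changed cells. The function name `cell_diff` and the choice of a single key column are assumptions for illustration; this is not how the linked GitLab fork works. Columns that appear only in the new snapshot are not reported.

```python
import csv
import io


def cell_diff(old_text: str, new_text: str, key: str):
    """Diff two CSV snapshots, matching rows on the value of the `key` column."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}

    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())

    # For rows present in both snapshots, report each cell whose value changed.
    changed = []
    for k in sorted(old.keys() & new.keys()):
        for col, old_val in old[k].items():
            new_val = new[k].get(col)
            if new_val != old_val:
                changed.append((k, col, old_val, new_val))
    return added, removed, changed


if __name__ == "__main__":
    before = "id,name,pop\n1,Oslo,700\n2,Bergen,285\n"
    after = "id,name,pop\n1,Oslo,710\n3,Trondheim,205\n"
    print(cell_diff(before, after, key="id"))
```

A line-based diff would show two changed lines here; the keyed version says that one row was added, one removed, and one cell changed.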


I love this company. They just get it.


If you ignore the new UI.


The UI in general isn't too bad, but the new arrangement and location of things is definitely a bit off.

(And yes, I know arrangement is like 80% of UI, but you get my point.)


The fact that I have to click on something to see language breakdowns is a problem. The fact that I have to refresh the repo page to make the language breakdowns go away is a bigger problem.


A tabular/graphical JSON explorer would be interesting as well.


I totally agree. This is the next logical step.


Is GitHub turning into an operating system?


Not sure, but it might be turning into an IDE. They've got a good part of it in place. Something like FP Complete: https://www.fpcomplete.com/


Bracing myself for the Github IDE. If they do that right, any browser can become your development machine.


Hopefully they'll add sorting soon.


1) Agreed - allowing users to click on a column heading and then sort it by ascending or descending values would be killer.

2) Adding embed codes for these tables would also be incredibly useful for news organisations and non-profits.
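Click-to-sort needs one decision the request leaves open: how to order a column that mixes numbers and text. A minimal Python sketch, assuming numeric-looking cells compare as numbers and sort ahead of text cells in ascending order (the function name `sort_table` is hypothetical):

```python
import csv
import io


def sort_table(text: str, column: str, descending: bool = False):
    """Sort CSV rows by one column, comparing numeric cells as numbers."""
    rows = list(csv.DictReader(io.StringIO(text)))

    def sort_key(row):
        value = row[column]
        try:
            # Numeric cells: group 0, ordered by numeric value.
            return (0, float(value), "")
        except ValueError:
            # Text cells: group 1, ordered case-insensitively.
            return (1, 0.0, value.lower())

    return sorted(rows, key=sort_key, reverse=descending)


if __name__ == "__main__":
    data = "name,age\nb,10\na,9\nc,100\n"
    print([r["age"] for r in sort_table(data, "age")])
```

Comparing the strings directly would put "10" and "100" before "9"; converting to numbers avoids that. With `descending=True` the groups flip too, so text cells come first.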


A massive amount of time and energy is wasted by news organizations on the question of how to show tabular data. This is an incredible feature, and one I'm assuming can be hacked or extended in various useful ways.

Anyone stress-test it with bulk data yet?


It would be good if browsers had native support and rendering of SQLite in the same way that Chrome works with PDFs.


This is great, but their basic text editor isn't usable for me with what I think are reasonably sized files (170 KB of Markdown). It doesn't have to be this way, because the editor they use (Ace) handles these files fine, so I don't know what they're doing to it. A tech support person said it was only meant for small files, not for serious editing.

I was working on a project and wanted non-techies to edit Markdown for me on the site, being able to preview the results, but couldn't make it work on these files.

So I wish they'd fix that rather than add more editors for other file types.


Can you link me to such a file?

As a former developer on Ace and now employee at GitHub I've honestly never run into a problem. I can't promise anything but I'd like to see what you mean.


I just tried this with a (very superficial) test case[1] at 189KB and the editor seemed fine, both for editing source and generating previews. I even used the editor to double the file's length, saved, and reopened without problems.

[1] https://github.com/BHSPitMonkey/Large-Markdown-Test/blob/mas...


I've sent you an email with the exact file. Cheers.


Have you tried http://prose.io/ ?


Wow, works great on my iPhone. Love it!


Cool feature. What happens on gigantic files? Does it limit, paginate, lazy-load?


The limit is 600 KB; I'll have to document that publicly. Only 3% of GitHub's one million CSV files go over that limit.

As you might imagine, shoving that many rows into the DOM kills any browser.
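A minimal Python sketch of the size gate described above: files under the limit are parsed into rows for a table, and anything larger falls back to plain text. Reading "600kb" as 600 × 1024 bytes is an assumption, and `choose_view` is a hypothetical name, not GitHub's code.

```python
import csv
import io

# Assumption: the 600 KB limit read as 600 * 1024 bytes; the real cutoff may differ.
MAX_RENDER_BYTES = 600 * 1024


def choose_view(data: bytes, limit: int = MAX_RENDER_BYTES):
    """Return ("table", rows) for small files and ("raw", None) for large ones."""
    if len(data) > limit:
        # Too many rows to put into the DOM at once: show the file as plain text.
        return "raw", None
    rows = list(csv.reader(io.StringIO(data.decode("utf-8"), newline="")))
    return "table", rows


if __name__ == "__main__":
    print(choose_view(b"a,b\n1,2\n"))
```

Checking the byte size first is cheap and avoids parsing a file only to discover it is too big to render.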


I'm actually kinda surprised that the number is that high.

Part of me wants to make a StumbleUpon-style site that grabs a random CSV from GitHub. I feel like there's probably some really interesting data waiting to be explored.


Please do it!


I'm tempted to suggest leaving in an option to enable it and reporting it to the browser teams as a performance challenge: it's the kind of thing that should work better than it does.

I noticed that in Chrome more than half the time isn't spent on HTML parsing but on layout recalculation. I suspect you could avoid much of that by setting some sort of min/max width on the columns and on the containing table, with overflow-x: scroll or overflow: hidden. Perhaps set a fixed width when the server sees the file is large, and/or by measuring the rows while rendering the template?


I believe you would have to set hard widths to avoid expensive reflows (layout recalculation). Min/max width are much more expensive, particularly in a very tall table.


That's likely - I wasn't sure whether anyone implemented the optimization of stopping as soon as you've maxed out a fixed width table but you could certainly do something similar by hand when rendering the page and set fixed widths based on the ratio of column sizes.
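A minimal Python sketch of setting fixed widths by hand from the ratio of column sizes: sample the first rows, take the longest cell per column, and emit percentage widths in a `<colgroup>`. Pairing this with the CSS `table-layout: fixed` lets the browser size columns from the first row and the `<col>` widths instead of measuring every cell. The sample size and the function names are assumptions for illustration.

```python
import csv
import io


def column_width_percentages(text: str, sample_rows: int = 200):
    """Estimate fixed column widths (percent) from the longest cell in each column."""
    longest = []
    for i, row in enumerate(csv.reader(io.StringIO(text, newline=""))):
        if i >= sample_rows:
            break  # only a sample is measured, never the whole file
        for j, cell in enumerate(row):
            if j >= len(longest):
                longest.append(1)  # every column gets at least one character of width
            longest[j] = max(longest[j], len(cell))
    if not longest:
        return []
    total = sum(longest)
    return [round(100 * n / total, 1) for n in longest]


def colgroup_html(widths):
    """Render the widths as <col> elements for a table using table-layout: fixed."""
    cols = "".join(f'<col style="width: {w}%">' for w in widths)
    return f"<colgroup>{cols}</colgroup>"


if __name__ == "__main__":
    widths = column_width_percentages("ab,abcdef\nabc,d\n")
    print(widths)
    print(colgroup_html(widths))
```

Character counts are only a proxy for rendered width, but they give the browser fixed numbers up front, which is the point of avoiding min/max widths on a very tall table.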


In similarly large tables, I found if you shove about 1000 rows into the DOM at a time inside a setTimeout(0) loop, most browsers behave quite ok.
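The setTimeout(0) trick is cooperative chunking: do a bounded batch of work, then yield to the event loop so other tasks can run. The browser version is JavaScript; the sketch below shows the same pattern in Python with `asyncio`, where `await asyncio.sleep(0)` plays the role of `setTimeout(fn, 0)`. The `insert` callback stands in for appending a batch of rows to the page and, like the function name `insert_in_chunks`, is hypothetical.

```python
import asyncio
from typing import Callable, Iterable, List


async def insert_in_chunks(rows: Iterable[List[str]],
                           insert: Callable[[List[List[str]]], None],
                           chunk_size: int = 1000) -> int:
    """Pass rows to `insert` in batches, yielding to the event loop between batches."""
    batch: List[List[str]] = []
    batches = 0
    for row in rows:
        batch.append(row)
        if len(batch) == chunk_size:
            insert(batch)
            batches += 1
            batch = []
            # Like setTimeout(fn, 0): let other queued work run before the next batch.
            await asyncio.sleep(0)
    if batch:
        insert(batch)  # the final, partial batch
        batches += 1
    return batches


if __name__ == "__main__":
    received = []
    n = asyncio.run(insert_in_chunks([[str(i)] for i in range(2500)], received.append))
    print(n, [len(b) for b in received])
```

Batch size is a trade-off: larger batches finish sooner overall, while smaller ones keep the page more responsive between batches.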


I've wanted this for so long in my docs. It's so easy to build a doc ecosystem around my code on GitHub, between rendering Markdown and shoving YARD output into gh-pages!


It looks like this (Ruby's CSV library) defines CSV as some unspecified superset of RFC 4180, so just following the RFC should get this feature to work.
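The RFC 4180 rules that matter most in practice are quoting: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and a literal quote inside it is doubled. The viewer uses Ruby's CSV library; the sketch below uses Python's `csv` module instead, whose default dialect is close to, but more lenient than, the RFC. The function names are hypothetical.

```python
import csv
import io

# One header row and one record whose fields contain a comma, doubled quotes,
# and an embedded CRLF line break, all quoted as RFC 4180 requires.
SAMPLE = 'name,comment\r\n"Smith, J.","said ""hi""\r\ntwice"\r\n'


def parse_csv(text: str):
    # newline="" keeps line breaks inside quoted fields intact.
    return list(csv.reader(io.StringIO(text, newline="")))


def to_csv(rows) -> str:
    out = io.StringIO(newline="")
    csv.writer(out).writerows(rows)  # quotes only the fields that need it
    return out.getvalue()


if __name__ == "__main__":
    rows = parse_csv(SAMPLE)
    print(rows)
    print(parse_csv(to_csv(rows)) == rows)
```

A naive `line.split(",")` would break this record into the wrong number of fields and split it across two lines, which is why a real CSV parser is needed.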


I'd been wishing for this for a while, but didn't imagine they'd actually do it. This is awesome. (Now all I need is for it to be editable...)


This does not appear to be retroactive, after looking at .csvs in my old repos.


It should be. Can you provide a link?



I've had another look; GitHub is showing a nice little message: "Hey, did you know this file could be beautiful and searchable if this error is corrected? Missing or stray quote in line 1"


That's not a CSV file; it's tab-separated (I think, I haven't actually checked whether those are tabs). You will need the right file extension for the contained data for it to work.
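Why the extension matters: the renderer has to pick a delimiter before it can parse anything, and a tab-separated file parsed as CSV comes out as a single column. A minimal Python sketch, assuming a simple extension-to-delimiter table (the mapping and the function name are hypothetical, not GitHub's actual rules):

```python
import csv
import io
import os

# Assumption: a hypothetical mapping from file extension to field delimiter.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def parse_by_extension(filename: str, text: str):
    """Parse tabular text using the delimiter implied by the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in DELIMITERS:
        raise ValueError(f"no tabular renderer for {ext!r}")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=DELIMITERS[ext]))


if __name__ == "__main__":
    tsv = "a\tb\n1\t2\n"
    print(parse_by_extension("data.tsv", tsv))  # two columns
    print(parse_by_extension("data.csv", tsv))  # mislabeled: one column per row
```

Renaming the file (or fixing its contents) so the extension matches the delimiter is what lets the table render.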


Indeed. I made a PR for the original poster: https://github.com/kasbah/nomech_mini-hw/pull/1


https://github.com/minimaxir/aggregate-data-from-likes-of-fr...

The 2 .csvs there display as plain text.

EDIT: Both work now.


Great. Nested comments, anyone?


Is the viewer open source?


The viewer is just Ruby's CSV library, iterated within an ERB.
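For comparison, a minimal Python analogue of "a CSV parser iterated within a template": parse the rows, treat the first as the header, and HTML-escape every cell. This uses Python's `csv` and `html` modules rather than Ruby and ERB, and `csv_to_html` is a hypothetical name; it is not GitHub's code.

```python
import csv
import io
from html import escape


def csv_to_html(text: str) -> str:
    """Render CSV text as an HTML table, using the first row as the header."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    parts = ["<table>", "<thead><tr>"]
    parts += [f"<th>{escape(cell)}</th>" for cell in header]
    parts.append("</tr></thead><tbody>")
    for row in body:
        # Escape every cell so data such as "<x>" cannot inject markup.
        parts.append("<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>")
    parts.append("</tbody></table>")
    return "".join(parts)


if __name__ == "__main__":
    print(csv_to_html("a,b\n1,<x>\n"))
```

Escaping is the one step a template like this must not skip, since the cells are untrusted repository content.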



