"git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important."
"Fred Brooks, in Chapter 9 of The Mythical Man-Month, said this:
Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."
I'm getting off topic, but this quote brings to mind Dijkstra's remark that "our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed."
Does anyone know whether this has indeed been shown to be true?
Programmers are data manipulators, no? Github is about streamlining data manipulation (storing your code, helping you with CSVs/GeoJSON). They're just lowering the bar to make data manipulation more accessible.
One day, everyone is going to be a programmer. Is everyone a professional driver? Not at all. But most people do know how to drive a car.
Unfortunately, Git isn't a good way to collaborate on massive CSV files, for two reasons: a) git stores changes at the file level rather than the row level, and b) with massive datasets you quickly run into file size limits. That's why I am looking forward to dat[1], a collaborative DVCS that's going to blow Git out of the water for this kind of task, even if it achieves only half of its objectives.
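To make the file-level vs row-level distinction concrete, here is a minimal Python sketch of what a row-aware diff could report. It assumes every row has a unique key column (`id` in the toy data), and `row_diff` is a hypothetical helper; it illustrates the idea only and is not how Git or dat work internally.

```python
import csv
import io


def row_diff(old_text: str, new_text: str, key: str) -> dict:
    """Compare two CSV snapshots row by row, matching rows on a key column.

    Assumes the key column exists in both files and its values are unique.
    Returns the keys of rows that were added, removed, or changed.
    """
    old_rows = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new_rows = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}
    return {
        "added": sorted(new_rows.keys() - old_rows.keys()),
        "removed": sorted(old_rows.keys() - new_rows.keys()),
        # A shared row counts as changed if any of its fields differ.
        "changed": sorted(
            k for k in old_rows.keys() & new_rows.keys() if old_rows[k] != new_rows[k]
        ),
    }


# Toy data, made up for illustration.
old_csv = "id,name,qty\n1,apple,3\n2,pear,5\n"
new_csv = "id,name,qty\n1,apple,4\n3,plum,2\n"
print(row_diff(old_csv, new_csv, key="id"))
# {'added': ['3'], 'removed': ['2'], 'changed': ['1']}
```

A plain line diff would show the same edit as deleted and inserted lines; keying on a column is what lets a tool say "row 1 changed" instead.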
This isn't precisely true. Git stores changes at the "blob" level. Large plaintext files (like a CSV) are generally split into several blobs. You are correct that git works badly for binary files, like images and compiled executables, but plaintext files like CSVs should be OK.
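As a reference point for this exchange: in a SHA-1 repository, git names a blob by hashing a short `blob <size>` header, a NUL byte, and the file's full contents, so each version of a file is one blob object; the savings between similar versions come from delta compression in packfiles. A small Python sketch that reproduces the blob ID (`git_blob_id` is a hypothetical name; you can compare its output with `git hash-object <file>`):

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the object ID git assigns to a file's contents (SHA-1 repos).

    Git hashes a "blob <size>\\0" header followed by the entire contents,
    so the whole file version maps to a single blob ID.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_id(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```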
In the next iteration of this feature, please provide an option for sticky column and row headers, i.e. the headers stay in place as you scroll down the table. That helps when inspecting large files with many columns. Surely you can do better than rendering a standard HTML table?
Where to? Do we have access to the source of this?
On that topic, I have never found much of Github's source code published anywhere, which is extremely hypocritical and makes me fear the day we'll want to move away from them. I'd be glad to be wrong and find out that I had just never looked in the right place.
That's not hypocrisy in the slightest. It would be if Github made claims like "we think companies should open-source all of their product's code", but they would never make such a claim. In fact, their business model is selling features to people who want to keep their code private.
It would be nice if GitHub added better support for column-oriented data in diffs, too. This article shows how it could work, with an example implementation in a fork of GitLab: http://theodi.org/blog/adapting-git-simple-data
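The linked post describes its own approach; as a separate, hypothetical illustration, here is a Python sketch of what a column-aware diff could report: columns added or removed, plus individual changed cells matched by a key column and header name. The helper names and toy data are assumptions, not anything GitHub or GitLab ships.

```python
import csv
import io


def _load(text: str, key: str):
    """Parse CSV text into (header list, {key value: row dict})."""
    reader = csv.DictReader(io.StringIO(text))
    rows = {row[key]: row for row in reader}
    return reader.fieldnames or [], rows


def cell_diff(old_text: str, new_text: str, key: str) -> dict:
    """Report column-level changes between two CSV snapshots.

    Rows are matched on a unique key column and cells are compared by header
    name, so merely reordering columns does not register as a change.
    """
    old_cols, old_rows = _load(old_text, key)
    new_cols, new_rows = _load(new_text, key)
    shared = [c for c in new_cols if c in old_cols]
    changes = []
    for k in sorted(old_rows.keys() & new_rows.keys()):
        for col in shared:
            before, after = old_rows[k][col], new_rows[k][col]
            if before != after:
                changes.append((k, col, before, after))
    return {
        "added_columns": [c for c in new_cols if c not in old_cols],
        "removed_columns": [c for c in old_cols if c not in new_cols],
        "changed_cells": changes,
    }


# Toy data: columns reordered, one value edited, one column added.
old_csv = "id,name,qty\n1,apple,3\n2,pear,5\n"
new_csv = "id,qty,name,price\n1,4,apple,0.5\n2,5,pear,0.8\n"
print(cell_diff(old_csv, new_csv, key="id"))
```

Here the line-based diff would flag every row because the column order changed, while the column-aware version reports only the new `price` column and the single edited `qty` cell.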
The fact that I have to click on something to see language breakdowns is a problem. The fact that I have to refresh the repo page to make the language breakdowns go away is a bigger problem.
A massive amount of time and energy is wasted by news organizations on the question of how to show tabular data. This is an incredible feature, and one I assume can be hacked on or extended in various useful ways.
This is great, but their basic text editor isn't usable for me on what I think are pretty reasonably sized files (170 KB of Markdown). It doesn't have to be that way, because the editor they use (Ace) handles these files fine, so I don't know what they're doing to it. A tech support person told me it was only there for small files, not for serious editing.
I was working on a project and wanted non-techies to edit Markdown for me on the site and be able to preview the results, but I couldn't make it work with these files.
So I wish they'd fix that rather than add more editors for other file types.
As a former developer on Ace and now an employee at GitHub, I've honestly never run into a problem. I can't promise anything, but I'd like to see what you mean.
I just tried this with a (very superficial) test case[1] at 189KB and the editor seemed fine, both for editing source and generating previews. I even used the editor to double the file's length, saved, and reopened without problems.
I'm actually kinda surprised that the number is that high.
Part of me wants to make a StumbleUpon-style site that grabs a random CSV from GitHub. I feel like there's probably some really interesting data waiting to be explored.
I'm tempted to say you should leave in an option to enable it and report it to the browser teams as a performance challenge; it's the kind of thing that should work better than it does.
I noticed that in Chrome, more than half the time goes not to HTML parsing but to layout recalculation. I suspect you could avoid much of that if you set some sort of min/max width on the columns and the containing table, with overflow-x: scroll (or hidden). Perhaps set a fixed width when the server sees that the file is large, and/or measure the rows while rendering the template?
I believe you would have to set hard widths to avoid expensive reflows (layout recalculation). Min/max widths are much more expensive, particularly in a very tall table.
That's likely. I wasn't sure whether any browser implemented the optimization of stopping as soon as you've maxed out a fixed-width table, but you could certainly do something similar by hand when rendering the page and set fixed widths based on the ratio of column sizes.
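Here is a minimal Python sketch of that by-hand approach, under assumptions of my own: a column's weight is its longest cell in characters (a crude proxy for rendered width), only the first `sample_rows` rows are measured, and `column_widths` is a hypothetical helper. The resulting pixel widths could be emitted on `<col>` elements together with `table-layout: fixed`, so the browser doesn't have to measure every cell itself.

```python
import csv
import io
from itertools import islice
from typing import List


def column_widths(text: str, total_px: int, min_px: int = 40,
                  sample_rows: int = 1000) -> List[int]:
    """Split a fixed table width across columns in proportion to content length."""
    longest: List[int] = []
    for row in islice(csv.reader(io.StringIO(text)), sample_rows):
        # Grow the list if this row has more columns than any seen so far.
        if len(row) > len(longest):
            longest.extend([0] * (len(row) - len(longest)))
        for i, cell in enumerate(row):
            longest[i] = max(longest[i], len(cell))
    total_chars = sum(longest) or 1  # avoid dividing by zero on empty input
    # Proportional share of the table width, never narrower than min_px.
    return [max(min_px, round(total_px * n / total_chars)) for n in longest]


sample = (
    "id,name,description\n"
    "1,Oslo,Capital of Norway\n"
    "2,Bergen,City on the west coast\n"
)
print(column_widths(sample, total_px=600))
# [40, 120, 440]
```

Because of the `min_px` floor and rounding, the widths won't always sum exactly to `total_px`; a real renderer would need to absorb the difference somewhere.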
I've wanted this for so long in my docs. It's so easy to build a doc ecosystem around my code on github between rendering Markdown and shoving YARD output into gh-pages!
It looks like this (Ruby's CSV library) defines CSV as some unknown superset of RFC 4180, so just following the RFC should get this feature to work.
I've had another look; GitHub is showing a nice little message: "Hey, did you know this file could be beautiful and searchable if this error is corrected? Missing or stray quote in line 1"
That's not a CSV file; it's tab-separated (I think; I haven't actually checked whether those are tabs). You will need the right extension for the contained data for it to work.
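RFC 4180 is short, but parsers differ in how forgiving they are about quotes. As a rough Python analogue of the check behind that message (not GitHub's actual code, which the thread suggests is Ruby's CSV library), `csv.reader` with `strict=True` raises `csv.Error` on malformed quoting, and `reader.line_num` says how far it got. Python's `csv` module is not a strict RFC 4180 validator either; `first_quote_error` is a hypothetical helper.

```python
import csv
import io
from typing import Optional


def first_quote_error(text: str) -> Optional[str]:
    """Return a message for the first quoting error, or None if the text parses.

    With strict=True, csv.reader raises csv.Error on malformed quoting, e.g. a
    closing quote followed by stray characters, or a quoted field never closed.
    """
    reader = csv.reader(io.StringIO(text), strict=True)
    try:
        for _ in reader:
            pass
    except csv.Error as exc:
        # line_num counts physical lines read so far: the failing line, or the
        # last line of the input for a quote that is never closed.
        return f"line {reader.line_num}: {exc}"
    return None


# The unescaped inner quotes make this row invalid; reports an error on line 2.
print(first_quote_error('name,quote\nBob,"he said "hi" to me"\n'))
```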
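If you're not sure what a file actually contains, Python's `csv.Sniffer` can make the guess before you pick an extension. A minimal sketch, assuming only comma and tab are candidates; `suggest_extension` is a hypothetical helper, and `sniff` raises `csv.Error` when it can't settle on a delimiter.

```python
import csv


def suggest_extension(sample: str) -> str:
    """Guess whether a text sample is comma- or tab-separated.

    Restricting csv.Sniffer to ',' and '\\t' keeps it from picking some other
    character that happens to appear consistently on every line.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return ".tsv" if dialect.delimiter == "\t" else ".csv"


print(suggest_extension("name\tcity\nAda\tLondon\nLinus\tHelsinki\n"))
# .tsv
```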
--Linus Torvalds
(source: http://programmers.stackexchange.com/questions/163185/torval...)