Hacker News

People can read much faster than they can speak, and they can google new words.

You can Ctrl+F arbitrarily big text files for keywords. Good luck doing the same with a 2-hour-long video file or mp3. You will need to listen to the whole thing. That's what annoys me about the new trend of doing tutorials as video.

You can easily diff text.

Text works with version control systems.

Text works with unix command line tools.

You can trivially paste relevant fragments on wiki pages, in emails or IM discussions.

Google translate works with text.

Screen readers work with text.
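
A minimal Python sketch of the Ctrl+F and diff points above, using only the standard library. The helper names `find_keyword` and `text_diff` are made up for illustration.

```python
import difflib

def find_keyword(text, keyword):
    """Return (line_number, line) pairs for every line containing keyword."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if keyword in line]

def text_diff(old, new):
    """Return only the removed ('-') and added ('+') lines between two texts."""
    lines = list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                      lineterm="", n=0))
    # The first two lines are the '---' / '+++' file headers; '@@' lines are hunk markers.
    return [line for line in lines[2:] if not line.startswith("@@")]

old_notes = "install foo\nrun foo\n"
new_notes = "install foo\nrun bar\n"
print(find_keyword(old_notes, "run"))
print(text_diff(old_notes, new_notes))
```

The same two helpers work unchanged on any text file, whatever its size or subject.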




OK, there are so many advantages of text. I wholeheartedly agree and prefer plain text over most "smart" formats.

BUT.

I have this pet idea about source code. Text isn't the optimal format because a program is not a linear thing, but closer to a tree structure.

Why have we settled on programs written in text? Pretty much for all the reasons you wrote and the fact that we've had bad experiences with other kinds of formats in the past. Being able to fall back to plain text when things go wrong is very nice.

But it has its own sort of usability problems. There's an impedance mismatch between text and programs and sometimes it shows. Actually we don't see it more often because we tend to think that text is the way it is done, always was done and always will be done.


> Text isn't the optimal format because a program is not a linear thing, but closer to a tree structure.

I can tell you what it's like from experience. The Interlisp D environment used a structure editor rather than a text editor. I found it infuriating and clumsy. Admittedly I had come from the emacs-infused PDP-10/Maclisp & Lispm world, so I gave it several months, but in the end I adapted an Emacs someone else had started and did all my editing in that.

I figure if this would work for any language it would be Lisp, and it didn't work for me. It sounds like a great idea, since if the editor's "buffer structure" is the program structure it's easy to write lambda functions to, say, support refactoring your code. But it was rarely convenient.

The other thing that didn't work for me was that it was of course a mouse-driven interface (this was PARC after all) and I found shifting my hand off the keyboard all the time slowed me down a lot too.


> I found it infuriating and clumsy.

I've heard about earlier programs that tried to do that and had massive usability problems. Your comment seems to confirm that diagnosis.

> I figure if this would work for any language it would be Lisp

I guess it would be more of a novelty for other languages.

> I found shifting my hand off the keyboard...

That's a big no-no!

EDIT: By the way, I've been experiencing something similar recently, teaching Scratch and AppInventor to my son and a group of children at Coder-Dojo:

http://medialab-prado.es/article/coderdojo

Kids like the mouse interface, but I do find it very limiting and slightly infuriating.


Have you tried paredit mode for emacs?


It's already done in most IDEs and you probably use it :)

Ctrl+click on a method invocation - did it jump to the declaration?

Click on some method name and choose "show all invocations" or something like that. Here you have one of the possible tree views of code.

Another tree view of code is visible when you have the code folding feature enabled.

Another one - when you debug and show subfields of some variable.

Yes, it's nice to have these additional abstractions over plain text. But these abstractions are inherently leaky, and I much prefer to work with text files than with some binary format to fix the leaks.


> It's already done in most IDEs and you probably use it :)

That's a kludge. The "true" form is text and the IDEs add layers on top. The right way is to make the tree the canonical form and leave text as just an interchange format.

> But these abstractions are inherently leaky...

Inherently. That's bold and you give no justification. Anyway I like how you prove my last paragraph. It's been conventional wisdom for so long that you consider text representation as the fundamental form and using a format closer to the real structure a leaky abstraction.


The tree would just give you code-folding for free. Callgraphs, searching for references etc would still need to be recalculated by the code indexer. The only advantage is - no parsing step.

Meanwhile you would need to rewrite all the universal text-based tools from scratch specifically for your particular binary format. And this almost never happens so people are left with no way to merge files (Oracle Forms I'm looking at you).

BTW - merging and diffing files is when the abstraction often leaks, too, or at least wants to leak.

For example - when you have 2 trees with every node the same, but root node changed from <interface> to <class>. I guess your tree-based tool shows the whole tree as a difference? What about when you wrap half of the tree in <namespace>?

Textual diff would be 2 lines in both examples.
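
To illustrate the interface-to-class case, here is a rough sketch, not a real diff tool. The `(label, children)` tuple model and all function names are assumptions made for illustration. A naive top-down tree comparison gives up on the whole subtree as soon as the root label differs, while a line diff of the same tree rendered as indented text reports one removed and one added line.

```python
import difflib

def naive_tree_diff(a, b, path="root"):
    """Top-down comparison: a changed label is reported as a whole-subtree replacement."""
    if a[0] != b[0]:
        return [f"replace subtree at {path}: {a[0]} -> {b[0]}"]
    if len(a[1]) != len(b[1]):
        return [f"child count changed at {path}"]
    changes = []
    for index, (child_a, child_b) in enumerate(zip(a[1], b[1])):
        changes += naive_tree_diff(child_a, child_b, f"{path}/{index}")
    return changes

def count_nodes(node):
    """Number of nodes in the subtree, i.e. how much a whole-subtree replacement touches."""
    return 1 + sum(count_nodes(child) for child in node[1])

def render(node, depth=0):
    """Serialize the tree as indented text, one node label per line."""
    lines = ["  " * depth + node[0]]
    for child in node[1]:
        lines += render(child, depth + 1)
    return lines

def line_diff(a, b):
    """Removed and added lines between the text renderings of two trees."""
    lines = list(difflib.unified_diff(render(a), render(b), lineterm="", n=0))
    return [line for line in lines[2:] if not line.startswith("@@")]

members = [("method a", []), ("method b", [])]
old_tree = ("interface", members)
new_tree = ("class", members)
print(naive_tree_diff(old_tree, new_tree), count_nodes(old_tree))
print(line_diff(old_tree, new_tree))
```

A smarter tree differ could of course detect the relabel directly; the sketch only shows what the naive approach reports.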

There are many possible situations, and I admit that in some a tree-based approach is better, but there are many situations where you need better granularity than is possible without leaking the lower level. With a text format that lower level is human-readable, and you can automerge unsafely (but concisely) and leave fixing the result to a human. With a binary format, if you couldn't detect a concise description of the change, you just show "everything was deleted and that new file was added", which isn't particularly helpful to the person who merges the changes.

Can you merge Word documents or databases? There's certainly a market for that.


> Callgraphs, searching for references etc would still need to be recalculated by the code indexer.

The what? Sorry, I really don't know what you're talking about. If you want to introduce a reference, the logical thing to do is to check it at that same moment.

> The only advantage is - no parsing step.

There are many more, see my other reply to icebraining.

About tools, you seem to give more importance to diff than to writing programs in a more powerful environment. I can't agree. Also I don't think creating a suitable diff for trees is a hard problem. I just need to convince Linus of the idea and he will write one in a couple of days :-)


Oh, my bad, so you write references into the sourcefiles - smart. Your code isn't a tree anymore, but a graph, but other than that it will work.

Of course you still need code indexers for reverse lookups. You can't write "I am called from a,b,c,..." to source files because printf would be several gigabytes and growing each second :) So you still need a code indexer to look up the code graphs in your workspace and build the index of where each thing is called from.

Regarding the diff/merge tooling - IMHO it's much more important than a powerful editor. If my company forced me to write code in mcedit I could live with that. I would quit straight away if we had no version control, no matter the IDE we would use. I worked for a short time in Oracle Forms and I won't do that again.

Re: merging of trees - you keep references in the trees so it's not a tree anymore, but a graph, possibly with cycles - which makes it even harder to merge them properly (and you can't do unsafe merges because people won't fix them if it's a binary format).


> you keep references in the trees so it's not a tree anymore, but graph possibly with cycles

Cycles, why? Tree+links can be treated as a tree visually.

Edit: Oh, I see what you meant with indexer. That information would be in memory. In the database it's stored in just one direction. When loaded it expands bidirectionally.


Visually it's not a problem. But now you need to maintain the links at all times. So all the operations must be aware of the whole program call tree. I think this could cause problems, especially when linking across library boundaries, doing dependency injection, reflection etc, but I may be wrong.

I would like to see it implemented, out of curiosity. My main problem with merging trees could be solved by @pshc's idea, so maybe it could work?


The key to merging trees is that your editor generates the diff on the fly as you write—your editor's undo/redo stack and your version control system become one and the same. Every time you add, move, copy, switch out, or delete a node, the editor notes the operation on your delta log. When you push your code externally, you're really just shipping the log.

With these semantic deltas, merging should be highly unambiguous and automatable, even in degenerate cases.
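
A toy sketch of the delta-log idea described above. The `Tree` class, the operation tuples and `replay` are hypothetical names invented here, not an existing editor's format. Every edit is applied through `apply`, which also appends it to the log, so shipping the log is enough to rebuild the same tree elsewhere.

```python
class Tree:
    """A tiny node store: labels by node id, plus ordered child ids per node."""

    def __init__(self):
        self.labels = {"root": "module"}
        self.children = {"root": []}
        self.log = []  # doubles as undo history and as the shippable delta log

    def apply(self, op):
        kind = op[0]
        if kind == "add":
            _, parent, node_id, label = op
            self.labels[node_id] = label
            self.children[node_id] = []
            self.children[parent].append(node_id)
        elif kind == "relabel":
            _, node_id, label = op
            self.labels[node_id] = label
        elif kind == "delete":
            _, node_id = op
            # Unlink the node from its parent, then drop it and its descendants.
            for kids in self.children.values():
                if node_id in kids:
                    kids.remove(node_id)
            stack = [node_id]
            while stack:
                current = stack.pop()
                stack.extend(self.children.pop(current))
                del self.labels[current]
        else:
            raise ValueError(f"unknown operation: {kind}")
        self.log.append(op)

def replay(log):
    """Rebuild a tree from scratch by applying a shipped log in order."""
    tree = Tree()
    for op in log:
        tree.apply(op)
    return tree

editor = Tree()
editor.apply(("add", "root", "f", "def f"))
editor.apply(("add", "f", "body", "return 1"))
editor.apply(("relabel", "body", "return 2"))
clone = replay(editor.log)
print(clone.labels == editor.labels and clone.children == editor.children)
```

Merging two such logs is the hard part and is not attempted here; the sketch only shows recording and replaying.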


Hm, the history will be dirty with abandoned experiments etc, but so what - great idea. I'd like to use that system for a while, but I'm still not sure if it will work.

I remember using the jBPM graph language - it had a nice graphical editor, but we still switched to the XML view very often, because it was much faster to work with text.

Maybe it's just a matter of proper tools, but I can't imagine how you would let a graphical language do this, for example:

sed -E 's/foo\("bar", ([a-zA-Z0-9]+), "baz"\)/foo(\1); bar(\1); baz(\1);/g'

Maybe it's a problem with my imagination.


When I look at that line, my brain sees its structure, the different layers... :) It is totally representable without resorting to plaintext. With good fundamentals (ADTs) and good UX (an editor that looks like a text editor but is so much more) we can make it work.
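
As a rough illustration of expressing that rewrite on a tree instead of on characters, here is a sketch that uses Python's standard `ast` module as a stand-in for a structure editor. The class name `SplitFoo` and the helper `rewrite` are made up, and the target is Python source rather than whatever language the sed line was aimed at. Unlike the regex, it accepts any expression as the middle argument, not only an alphanumeric token. `ast.unparse` needs Python 3.9 or later.

```python
import ast

def _is_string(node, value):
    """True if node is a string literal equal to value."""
    return isinstance(node, ast.Constant) and node.value == value

class SplitFoo(ast.NodeTransformer):
    """Rewrite the statement foo("bar", X, "baz") into foo(X); bar(X); baz(X)."""

    def visit_Expr(self, node):
        call = node.value
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name) and call.func.id == "foo"
                and len(call.args) == 3 and not call.keywords
                and _is_string(call.args[0], "bar")
                and _is_string(call.args[2], "baz")):
            middle = call.args[1]
            # Returning a list replaces one statement with several.
            return [ast.Expr(value=ast.Call(func=ast.Name(id=name, ctx=ast.Load()),
                                            args=[middle], keywords=[]))
                    for name in ("foo", "bar", "baz")]
        return node

def rewrite(source):
    """Parse, transform the tree, and serialize back to text."""
    tree = SplitFoo().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(rewrite('foo("bar", x1, "baz")'))
```

Calls that do not match the pattern pass through untouched, because the match is on node types and values, not on spelling or whitespace.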


> That's a kludge. The "true" form is text and the IDEs add layers on top.

Why is text the "true" form? What's the difference between an IDE that uses text as the "true" form and one which uses a tree as the "true" form but uses text as the UI and serialization format?


> Why is text the "true" form?

It shouldn't be. It's what the current IDEs do.

The difference is that the program is currently written as text and passed to the compiler, which expects text. The IDEs build a parallel construction to offer some goodies to the programmer.

How I think it should be: the IDE would actually do the syntax checking and reference resolving work, so any program you have written is in fact pre-validated.

There would be performance enhancements, but that's not the only advantage. One example that comes quickly to mind is applying macros, writing GUI generation wizards based on DB schema and in general applying "DRY".


> How I think it should be: the IDE would actually do the syntax checking and reference resolving work, so any program you have written is in fact pre-validated.

But IDEs already do that. An example I'm familiar with is Java on Eclipse - it validates the code as you're writing it and marks compilation errors. Your other examples have been done as well.

You're right that you need an AST to do this properly, but when there's a cheap and reliable way to convert a blob of text to an AST - parsing - the distinction is somewhat irrelevant.
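
As one concrete instance of converting text to an AST by parsing, Python's standard `ast` module does it in a single call. A small sketch follows; the helper names `names_in` and `round_trip` are made up, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

def names_in(source):
    """Parse source text and return the sorted identifiers found in its AST."""
    tree = ast.parse(source)
    return sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})

def round_trip(source):
    """Text -> AST -> text, showing that the tree can be serialized back."""
    return ast.unparse(ast.parse(source))

source = "total = price * qty"
print(names_in(source))
print(round_trip(source))
```

Note that the round trip normalizes formatting and drops comments, so the AST alone is not a lossless stand-in for the original text.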


You are arguing that the fallout is manageable. I'd like to simply do it right from the start.


No, I'm saying it's an irrelevant implementation detail.


That's why my favourite language does represent source code as trees…

That is, in Lisp the source is lists of lists and atoms, which are trees of data. It's pretty cool.


All languages actually represent source code as trees. It's just that the shavings are less obvious in other languages.


It would be nice to have an IDE that surfaced something like the AST. Of course lots do to some extent, but I bet there is room for improvement here. This also seems like a place where Lisp would have an advantage since the syntax is so transparent.

For readers of yesterday's article about whether all the "easy" stuff has already been accomplished, here is an easy-to-read survey of the state of the art in diffing trees:

http://useless-factor.blogspot.com/2008/01/matching-diffing-...

That sounds like a pretty interesting topic to research! And I note that his oldest citation is from 1997, and most are from the last ten years. He also briefly mentions "operational transformation," which I agree seems related and is another area of ongoing research. Both topics seem like they would have lots of practical applications, but right now the general-purpose tooling is weak or doesn't exist. So there is room not just for research but also for folks to implement that research.


Our ‘plain text’ is unfortunately not up to the task of representing all human written text; I'm thinking specifically of traditional mathematical notation, which is also a tree structure represented in two dimensions, and ancestral to programming notation, in that we first squeezed mathematical notation down to one dimension¹ and then augmented it with notations for control flow.

¹with a few forgotten exceptions like the Klerer-May system.


TeX, runoff, others.


> prefer plain text over most "smart" formats

To be usable and easily read, plain text needs word spacing and line breaking, which is a form of "smart" formatting.

> Why have we settled on programs written in text?

Again, text with newlines, indenting, and fixed width fonts, without which it can't be read. So we're talking 2D text here.


Totally agree! Plaintext is an unnormalized form of program code, and working in it generates all sorts of artificial problems. I've started various pet projects to try to be able to edit the AST naturally, but haven't seen much success yet. The UX is really difficult.

I think the Light Table team is trying to do this now with Eve, although it sounds like they are turning it into something even more revolutionary but further from textual visualization.


> working in it generates all sorts of artificial problems.

Indeed. Escaping characters from comments or strings is one obvious example. Programmers serve the compiler instead of the other way around.

> I've started various pet projects to try to be able to edit the AST naturally, but haven't seen much success yet. The UX is really difficult.

I was also working on this some months ago, using SQLite and Lazarus. I hope I can recover the project now.

If you or someone else are interested, feel free to contact me by email, it's (for real) in the profile.


The solution for graph representation in a program is "program graph over text". The same way as web pages are described as "HTML over text".


Counterpoints to show that it is just a compromise: The complexity of unicode, collation, encoding, translation, language.

However you're 100% right!

Our application actually used text files on a network share with an indexer over the top as the database engine a long time ago. It worked really well and integrated with NT security and file locking for concurrency control, plus it was very easy to back up. A work of genius. However, NTFS doesn't scale well with lots of small files as it stores them in the MFT so it fell off a cliff eventually.


Counter-counterpoint: all of your counterpoints exist or have equivalents in every other form of communication. You swap unicode (which is nearly universally agreed on) for H.264/Theora/VP8 and AAC/Vorbis/Opus, and you still have to deal with collation and translation/language (which, without transcription to text first, is pretty hard).


Would your app have been feasible then with something like SQLite?


We used something similar in the end, i.e. SQL Server Compact Edition, and synced the data with a master SQL Server instance.


I've seen companies get around that limitation by storing multiple files in one compact file.


Yeah we do that now. We store them in a big file in pages that contain rows and an externally visible network process allows us to manipulate the things.

(sql server)


"You can easily diff text. Text works with version control systems."

Yes, yes, a thousand times yes.

It is very strange to me how hard it seems to be to convey this to almost anyone who has not experienced it in serious computer programming. E.g., a few times I talked about it with lawyers who work on complex documents, and got nowhere. And occasionally I have even run into subcultures of programmers who didn't get it (and/or, relatedly, the power of build-automation tools like "make").


Does there exist a diff tool or 'contract version control' for lawyers? Seems like it would be a lucrative market


There's no simple standard one, but there are a lot of solutions. The most basic is that MS Word can open two files and show you the differences.

Then there are various "document management systems". Sharepoint has a lot of features for revision tracking. There are multiple Salesforce based solutions.

So there are plenty of products in that market. I don't know if any of them are good. Sharepoint usually works well if you're already using Active Directory.


The one advantage of video tutorials is that they keep the viewer's attention more easily, particularly for viewers that aren't proficient with text searching and making mental summaries. Some people will get through the text, but without varied sensorial anchors getting stored in their memory they are left feeling confused, like they failed to grasp it as a whole (even if they actually did.)


> The one advantage of video tutorials is that they keep the viewer's attention more easily

Not mine. The information per unit of time is so low that my mind starts to drift, or I start doing something else and forget about the video. I much prefer text that I can skim to find the parts that are relevant to what I need to do, and simply skip the sections that seem to be mostly fluff.


Agreed. I actually run all my 'speech' (e.g. presentation, tutorial etc) videos at 1.5x or 2x. Cutting a 30 minute video down to 15 is really, really awesome. Particularly when the guy speaks slowly, when he already speaks fast I'll do 1.25x or 1.5x.

I do this on youtube and things like treehouse. The reason I tend to watch youtube on my tablet in the browser and not the app is because the app doesn't allow this and my brain goes numb.

That having been said, I really love video for some things. For example, I'd much rather listen to Greenwald's speech at Brown on civil liberties than read the equivalent article. I tend to clean my room or play Pro Evolution Soccer while doing so and somehow that works brilliantly. I can't quite keep the same concentration when I read for 30 minutes.

But it really depends. If I want to look up some code documentation, that format is a billion times better in text than video. If I'm somewhat familiar with the topic, know what I'm looking for, the ability to easily skip over introductions, side topics and history and just Ctrl+F for e.g. a piece of code, it can save an order of magnitude of time. I tend to like videos for things I'm wholly unfamiliar with and want to listen to from start to finish, which frankly is pretty limited.


Playback at 1.5x or 2x makes an even bigger difference for audio books. A 20 hour audio book suddenly becomes "just" 10 hours, yet it is still understandable.


Video seems to be popular with younger people. I believe this was discussed on HN a while ago. It might also explain the enormous abundance of frighteningly long videos going over the most simple things. My 8 year old daughter regularly makes 20-60+ minute videos about play sessions (dolls, LEGO, Play-Doh - not that LEGO isn't fantastic, but there's a limit...). YouTube seems filled with similar stuff. I've seen 10+ minute videos that are really just about how to type tracert in a Windows command prompt. Someone apparently watches this stuff.

There are also the people that like having videos running while doing something else. I find this to be a disturbing habit, especially when it's done with advertisement-laden TV nonsense. But people seem to enjoy it.


I find video to be extremely useful for learning something new. For example, when learning math, I find Khan Academy's video lectures to be much more useful than reading the exact same thing out of a textbook.

After I've learned it? Text all the way.


I guess having something running in the background saturates your mind better? I certainly do it, switching from silence to music to Let's Plays to episodes of QI to MOOC lectures depending on how mentally engaging whatever I'm actually doing is at the moment. Otherwise my mind wanders off the topic at hand and I end up browsing Hacker News for far too long.

Sitting at a computer means that there are about 10 possible distractions for me at a given moment, and if I want to do a consistent stretch of actual work, I just happen to need some background noise.


The only thing is the familiarity. If you've heard the song or watched the video a few times, it no longer interrupts focus and actually helps improve it.


Another example is music videos where they show you how to play a certain thing on an instrument. It takes 30 minutes for something that would take two sheets of music notation.

I really think many such videos are a step backward as far as carrying information goes.


I don't even start the video. They are almost always just marketing and hyperbole anyway. I assume that if they went through all the trouble of creating a video they must also have the information in much-easier-to-produce text form. This has come back to bite me only a few times when somebody points out information in an introduction video that is not covered in the "Introduction" page of the manual, but I consider the win of not having to sit through boring hour-long sales pitches much greater than the abysmal information loss.


Video can be a by-product of internal training. I often suggest we record such meetings for people who were sick and for future workers. It's better than nothing.

If only someone did ctrl+F for video.


Technically, they let you learn a lot more because it allows for mimicry.


The big thing with text is the ease of changing which piece of information you're consuming. Anything that's on the same page is an eye-movement away, which is the cheapest action a human can take. For a video, you have to interact with the controls and hope you get to the right place.


Text alone is not nice either. Good luck describing a complex design; I much prefer a diagram.

Text can also be ambiguous, and it requires more attention than a video.


> text alone is not nice either. Good luck describing a complex design, I much prefer a diagram.

If you look at a detailed diagram of a complex design, it is even worse than text. I consider myself pretty good at spatial intelligence (before discovering computers, I was leaning towards mechanical engineering and took 4 years of technical drawing at middle school level, not to mention my lifelong hobby: drawing/sketching), yet give me a call-graph with more than a hundred elements in it and my head will start aching in no time.

That high-quality diagrams exist which represent a complex entity in a relatively accurate way is only a consequence of the fact that some (probably human) intelligence has devoted a significant amount of time to synthesizing the essence of the problem at hand, abstracting the irrelevant details away, and using a highly symbolic representation to communicate the results to others.

You can do that with text (it is called summarizing), but it requires more training for both the producer and the consumers to do it effectively, which takes us to the next point.

> Text can also be ambiguous and it requires more attention than a video

Video and other graphical media help lower the threshold for communicating these summarized bits of information, which has both advantages and disadvantages. If there are social advantages to communicating some information to the general population, then significant amounts of effort should be devoted to making the message as digestible as possible (without losing too much accuracy).

However, if you rely on these methods to train professionals, you will end up with a bunch of marginally competent fools that are not capable of grasping just how much more learning they are missing. Then, they will take over the training of the next generations and knowledge loss is practically inevitable.

There are cases where precision is required, and anything that lowers the attention threshold is more a bug than a feature.


"If you look at a detailed diagram of a complex design, it is even worse than text."

Not necessarily. (Or horses for courses, or YMMV, or however you want to put it.) For some kinds of things, diagrams seem to do awfully well. Not all kinds of things, there are plenty of people who overuse diagrams, but some kinds of things.

E.g., consider the success of Feynman diagrams. They're at the edge of my expertise (I did a Ph. D. thesis on QM calculations, but not the kind that uses them) so I'm not 100% confident of this, but I have the strong impression that no one has developed a textual form that represents those relationships in a way that most people find comparably clear.

Or consider the humble "graph" (in the common usage, not the math "graph theory" usage). I do not want to deal with the text replacement of a nontrivial scatter plot in a typical experimental paper, or a diagram of a clever waveform in a radar ECM monograph.

Or consider the humble map (again in ordinary usage, not math usage) of e.g. Florida or the US interstate highway system.

Electronic circuit diagrams, medium-complexity Venn diagrams, and probabilistic inference networks also seem to be cases where diagrams can be hard to beat.


I read much slower than I speak :)


Wouldn't that mean that you are incapable of reading text out loud at a normal pace? That seems unlikely for a hackernews visitor, unless there is something impeding you like dyslexia or poor vision.


Learn to read, I guess?


I think this is a common issue when reading materials not in your native language. I can read a book written in French at least twice as fast as the same book written in English, even though I use English daily and consider myself fluent.


It depends on how you use the foreign language. I mostly read and write English, so I can read it almost as fast as Polish; on the other hand, my pronunciation is still bad.


Could be dyslexia, or a physical eye problem.


I think the more general rule is that you write slower than you speak. Speaking is a great form of communication when you are live with someone, but it is a frustrating way to get a one way conversation because as all the other posts say, it is not index-able and I definitely don't want to listen to someone drone on at me about stuff I don't consider important.


And some people can't even read at all!



