The Emacs Problem (sites.google.com)
73 points by mofey on May 24, 2010 | 40 comments

Uuugh... XML... how I hate you. Everyone should have to learn Lisp, even if they never program in it, simply to avoid developing more of these poisonous XML-flavored tools. He mentions Ant specifically. Let me tell you, as an unwilling MSBuild expert: XML should be NOWHERE NEAR your build system. He puts down Makefiles, but as someone who wrote a Makefile for the first time after years of Ant and MSBuild, I found it a breath of fresh air. The intentional lack of power is what makes it so great. If anything, Makefiles are too powerful and too close to Turing-complete.

Here is what a Makefile should look like:

1) vpath statements for various file extensions

2) Phony targets for logical build units, including "all"

3) Generic rules for mapping .abc to .xyz; these rules should have exactly one line of code which executes an external script/tool

4) A list of edges in the dependency graph

5) A few variables as necessary to eliminate redundant edges

If you put any logic in your Makefiles, you are doing it wrong.

If your builds are slow, add empty dummy (stamp) files so that timestamp checks can cull larger parts of the graph at once. If timestamps are insufficient, codify early-out logic into your tools.
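A minimal Python sketch of how a make-like tool can turn the declarative pieces listed above (edges, a phony target, one generic suffix rule) into a rebuild plan using timestamps alone. Everything here is hypothetical: the file names, the `abc2xyz` tool, and the choice to pass modification times in as a dict instead of reading them from disk. It is not GNU make's implementation, just roughly the same "missing, older than a prerequisite, or a prerequisite was rebuilt" decision rule.

```python
import os

# Dependency edges (target -> prerequisites). Pure data: no logic lives here.
EDGES = {
    "all": ["app.xyz", "lib.xyz"],
    "app.xyz": ["app.abc", "common.h"],
    "lib.xyz": ["lib.abc", "common.h"],
}
PHONY = {"all"}


def generic_rule(target):
    """Map foo.xyz <- foo.abc to exactly one external command (tool name is hypothetical)."""
    stem, ext = os.path.splitext(target)
    if ext != ".xyz":
        return None
    return ["abc2xyz", stem + ".abc", "-o", target]


def plan(goal, mtimes):
    """Return commands to run, prerequisites first, skipping targets that are up to date."""
    commands, rebuilt, visited = [], set(), set()

    def visit(target):
        if target in visited:
            return
        visited.add(target)
        deps = EDGES.get(target, [])
        for dep in deps:
            visit(dep)
        stale = (
            target in PHONY
            or target not in mtimes  # output does not exist yet
            # A prerequisite with no recorded time counts as old (a simplification).
            or any(dep in rebuilt or mtimes.get(dep, 0) > mtimes[target] for dep in deps)
        )
        # Source files have no edges and therefore no rule: nothing to run for them.
        if stale and target in EDGES:
            rebuilt.add(target)
            command = generic_rule(target)
            if command is not None:
                commands.append(command)

    visit(goal)
    return commands
```

With `common.h` newer than both outputs, `plan("all", mtimes)` returns one `abc2xyz` command per output; if only `lib.xyz` is stale, it returns just that one. A stamp file fits the same rule: an empty target whose timestamp stands in for a whole group of outputs.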

Not having logic in my Makefiles enables parallel execution and strong support for incremental builds. If I were to use Lisp as a build system, I'd create a DSL with these same properties, forbidding arbitrary logic in my dependency graph. It's about finding the right balance: injecting expressiveness without losing the desirable properties of the DSL. This is why every developer needs to understand more about programming language design. Any time you create a file format, you are designing a language. And any time you write files in someone else's format, you are reverse engineering that design. Understanding the intended separation of logic in Makefiles helps me write better Makefiles.
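Because a logic-free Makefile is just a dependency graph, a scheduler can read it as data and see which steps are independent. A minimal Python sketch (Kahn-style topological layering; the function name and example edges are made up) groups targets into waves where everything in one wave could run concurrently. GNU make's `-j` schedules jobs dynamically rather than in strict waves, so treat this as an illustration of why the property matters, not as make's algorithm.

```python
def parallel_waves(edges):
    """Group targets so that each target depends only on targets in earlier waves."""
    nodes = set(edges)
    for deps in edges.values():
        nodes.update(deps)
    # Unfinished prerequisites remaining for each node.
    pending = {node: set(edges.get(node, ())) for node in nodes}
    waves = []
    while pending:
        ready = sorted(node for node, deps in pending.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(pending)))
        waves.append(ready)
        for node in ready:
            del pending[node]
        for deps in pending.values():
            deps.difference_update(ready)
    return waves


# Hypothetical edges, same shape as a Makefile's "target: prerequisites" lines.
EDGES = {
    "all": ["app.xyz", "lib.xyz"],
    "app.xyz": ["app.abc", "common.h"],
    "lib.xyz": ["lib.abc", "common.h"],
}
print(parallel_waves(EDGES))
# [['app.abc', 'common.h', 'lib.abc'], ['app.xyz', 'lib.xyz'], ['all']]
```

If a rule could run arbitrary logic that adds or removes edges while building, this up-front analysis would no longer be possible, which is the point being made about keeping logic out of the graph.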

> If anything, Makefiles are too powerful and too close to turing complete.

"Close to"? Not a formal proof of Turing-completeness (and may only work with GNU make, not sure), but...

  [me@host: ~]% cat fibo.mk
  dec = $(patsubst .%,%,$1)

  not = $(if $1,,.)

  lteq = $(if $1,$(if $(findstring $1,$2),.,),.)
  gteq = $(if $2,$(if $(findstring $2,$1),.,),.)
  eq = $(and $(call lteq,$1,$2),$(call gteq,$1,$2))
  lt = $(and $(call lteq,$1,$2),$(call not,$(call gteq,$1,$2)))

  add = $1$2
  sub = $(if $(call not,$2),$1,$(call sub,$(call dec,$1),$(call dec,$2)))

  fibo = $(if $(call lt,$1,..),$1,$(call add,$(call fibo,$(call dec,$1)),$(call fibo,$(call sub,$1,..))))

  numeral = $(words $(subst .,. ,$1))

  go = $(or $(info $(call numeral,$(call fibo,$1))),$(call go,.$1))

  _ := $(call go,)
  [me@host: ~]% make -f fibo.mk
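For readers who don't speak GNU make, here is a Python transliteration of the functions above, written for this explanation rather than taken from the comment. Numbers are unary strings of dots, so `findstring` acts as "less than or equal", concatenation is addition, and `$(words ...)` converts back to decimal. The make version's `go` loops forever; this sketch just takes the first ten values.

```python
def dec(n: str) -> str:
    """$(patsubst .%,%,$1): drop one leading dot; zero stays zero."""
    return n[1:] if n.startswith(".") else n


def lteq(a: str, b: str) -> bool:
    """$(findstring $1,$2): for runs of dots, 'a occurs in b' means a <= b."""
    return a == "" or a in b


def gteq(a: str, b: str) -> bool:
    return lteq(b, a)


def lt(a: str, b: str) -> bool:
    return lteq(a, b) and not gteq(a, b)


def add(a: str, b: str) -> str:
    """$1$2: concatenating unary numbers adds them."""
    return a + b


def sub(a: str, b: str) -> str:
    """Decrement both until b is zero, recursing the same way the make version does."""
    return a if b == "" else sub(dec(a), dec(b))


def fibo(n: str) -> str:
    return n if lt(n, "..") else add(fibo(dec(n)), fibo(sub(n, "..")))


def numeral(n: str) -> int:
    """$(words $(subst .,. ,$1)): count the dots to get a decimal number."""
    return len(n)


print([numeral(fibo("." * i)) for i in range(10)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```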

"The language of GNU make is indeed functional, complete with combinators (map and filter), applications and anonymous abstractions. GNU make does support lambda-abstractions."


One solution that Steve didn't discuss is JSON. To be fair, JSON wasn't that popular in 2005, but it's still a great solution.

The way it works is that there are no mandatory newline characters in JSON. Whitespace between lexical elements is ignored, and newlines inside strings are always escaped (as \n), since JSON doesn't allow raw control characters in strings. So a log format that a few people are using today looks like this:

  {"kind": "foo", "id": 1, "msg": "hi"}
  {"kind": "bar", "id": 2, "msg": "there"}

Each log message takes up a single line in the file. You can trivially deserialize any line to a real data structure in your language of choice. You can (mostly) grep the lines, and they're human readable. I do this at work, and frequently have scripts like this:
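A minimal sketch of the one-record-per-line idea using Python's standard `json` module. When no `indent` is given, `json.dumps` never emits a raw newline (a newline inside a string is written as the two characters `\n`), so each record stays on one physical line. The function names are illustrative.

```python
import io
import json


def write_log(records, stream):
    """Append each record as one line of JSON."""
    for record in records:
        # With indent=None, json.dumps emits no raw newline characters;
        # a newline inside a string is written as a backslash followed by n.
        stream.write(json.dumps(record) + "\n")


def read_log(stream):
    """Yield one parsed record per non-blank line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


buf = io.StringIO()
write_log([{"kind": "foo", "id": 1, "msg": "hi"},
           {"kind": "bar", "id": 2, "msg": "two\nlines"}], buf)
print(buf.getvalue(), end="")
# {"kind": "foo", "id": 1, "msg": "hi"}
# {"kind": "bar", "id": 2, "msg": "two\nlines"}
```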

  scribereader foo | grep 'some expression' | python -c 'code here'

In this case we're storing logs in the format described above (a single JSON message per line), and scribereader is something that groks how scribe stores log files and outputs to stdout. The grep expression doesn't really understand JSON, but it catches all of the lines that I actually want to examine, and the false positive rate is very low (<0.1% typically). The final part of the pipe is some more complex python expression that actually introspects the data it's getting to do more filtering. You can of course substitute ruby, perl, etc. in place of the python expression.
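The last stage of such a pipeline might look like the sketch below. grep is only a cheap prefilter; the Python stage parses each line and applies the exact condition, which is what removes grep's occasional false positives. The field names and the predicate are hypothetical, and `scribereader` (the commenter's own tool) isn't modeled here.

```python
import json
import sys


def matching_records(lines, predicate):
    """Yield parsed records for which predicate(record) is true.

    Lines that are not JSON objects (truncated writes, stray output) are skipped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and predicate(record):
            yield record


def main(stream=None, out=None):
    """Example filter: keep 'bar' records whose id is greater than 1."""
    stream = sys.stdin if stream is None else stream
    out = sys.stdout if out is None else out
    for record in matching_records(stream, lambda r: r.get("kind") == "bar" and r.get("id", 0) > 1):
        out.write(json.dumps(record) + "\n")
```

Saved as a script that ends with a call to `main()`, this would stand in for the `python -c 'code here'` stage, e.g. `scribereader foo | grep 'bar' | python filter_log.py` (the script name is hypothetical).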

I feel like this is a pretty good compromise between greppability, human readability, and the ability to programmatically manipulate log data.

"XML is better if you have more text and fewer tags. And JSON is better if you have more tags and less text. Argh! I mean, come on, it’s that easy. But you know, there’s a big debate about it." — Steve Yegge


That's the problem with discussing old articles: information gets updated.

Not really. The arguments still hold.

JSON is great, but the thing that bugs me about this usage is that it is essentially a bloated version of a "normal log". You don't need the field names, braces, colons, quotes, or in fact most of the characters there; just columns delimited by a single character (traditionally comma, space, or tab), rows delimited by another character (traditionally newline), some rules for escaping (or not), and the first row as the field names (if they are not obvious). It's more human-readable, more machine-readable, and shorter than JSON, and it's already the unofficial standard, so you don't need to convince anyone of anything.
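For comparison, a minimal sketch of that flat format using Python's standard `csv` module: tab-delimited columns, newline-terminated rows, and a header row with the field names. The field list is illustrative. One escaping rule you still have to settle: with the module's default quoting, a field containing the delimiter, a quote, or a newline is written inside double quotes, so a message with an embedded newline spans two physical lines.

```python
import csv
import io

FIELDS = ["kind", "id", "msg"]


def write_rows(records, stream):
    """Write a header row, then one tab-separated row per record."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()  # the first row names the columns
    writer.writerows(records)


def read_rows(stream):
    """Read rows back as dicts keyed by the header; every value comes back as a string."""
    return [dict(row) for row in csv.DictReader(stream, delimiter="\t")]


buf = io.StringIO()
write_rows([{"kind": "foo", "id": 1, "msg": "hi"},
            {"kind": "bar", "id": 2, "msg": "there"}], buf)
print(buf.getvalue(), end="")
```

This prints a header line `kind id msg` followed by one line per record, with tabs between columns. Note that types are lost on the way back (`id` comes back as `"1"`), which is part of the trade-off against JSON.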

Sure, if my data is tabular I always end up with a CSV-like arrangement, usually space-separated. The original article however talks about how data always ends up hierarchical and tree-like. JSON represents tree-like data very succinctly and very readably.

Still, the real XML-killer for me is YAML. It's even more readable than JSON, and allows many documents in a single file. This makes it excellent for logs, or for any application where your files get big and you want to stream records off them without having to parse the whole file into memory at once. Sure, you can do this with XML and parser hooks, but it's so much more of a pain than just iterating over top-level YAML documents.

Another killer feature is that it's simple enough that I've been able to ask clients to provide me with information in YAML format just by giving them an example record to follow. They're non-technical, but they can read it as easily as me. That's a pretty big win.


    (:kind foo :id 1 :msg "hi") (:kind bar :id 2 :msg "there")

Err, my formatting got messed up. Pretend like there's a newline between the two log entries I described.

Indent the code two spaces. http://news.ycombinator.com/formatdoc

You could even trivially convert it to XML and use XSLT, if you were silly enough. But Lisp is directly executable, so you could simply make the tag names functions that automatically transform themselves. It'd be a lot easier than using XSLT, and less than a tenth the size.

And now anyone who can modify your log can execute arbitrary code in the reader process…

People always say that. Fortunately, someone figured that out on the order of 34 years ago:

[ http://en.wikipedia.org/wiki/Capability-based_security ]

(for a more recently active discussion, try http://en.wikipedia.org/wiki/Domain_Specific_Language)
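One way to keep the "tag names act like functions" convenience without the arbitrary-code problem raised above is to treat each record purely as data and dispatch on its `:kind` through an explicit whitelist, which is close in spirit to the capability and DSL ideas linked here. A minimal Python sketch, assuming flat property lists like the ones in this thread (no nesting; only strings, integers, and bare symbols). The handler names and messages are made up.

```python
import re

# One token: "(", ")", a double-quoted string (backslash escapes allowed), or a bare atom.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def tokenize(text):
    tokens, pos = [], 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return tokens
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unreadable input at offset {pos}")
        tokens.append(match.group())
        pos = match.end()


def atom(token):
    """Decode a string or an integer; a bare symbol stays as its name."""
    if token.startswith('"'):
        return re.sub(r"\\(.)", r"\1", token[1:-1])
    try:
        return int(token)
    except ValueError:
        return token


def parse_records(text):
    """Parse flat plists such as (:kind foo :id 1 :msg "hi") into dicts. Nothing is evaluated."""
    records, fields = [], None
    for token in tokenize(text):
        if token == "(":
            if fields is not None:
                raise ValueError("nested lists are not supported in this sketch")
            fields = []
        elif token == ")":
            if fields is None or len(fields) % 2:
                raise ValueError("unbalanced ')' or odd-length property list")
            records.append({key.lstrip(":"): atom(value)
                            for key, value in zip(fields[0::2], fields[1::2])})
            fields = None
        elif fields is None:
            raise ValueError("atom outside of a record")
        else:
            fields.append(token)
    if fields is not None:
        raise ValueError("unterminated record")
    return records


# The whitelist: only these kinds can "run", and each handler only sees its record.
HANDLERS = {
    "foo": lambda r: f"foo #{r['id']}: {r['msg']}",
    "bar": lambda r: f"bar #{r['id']}: {r['msg']}",
}


def dispatch(record):
    handler = HANDLERS.get(record.get("kind"))
    if handler is None:
        raise KeyError(f"no handler for kind {record.get('kind')!r}")
    return handler(record)
```

Parsing the two log entries from earlier in the thread and dispatching them gives `['foo #1: hi', 'bar #2: there']`; a record whose `:kind` isn't in `HANDLERS` raises `KeyError` instead of running anything.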

Did anyone else think of CL-PPCRE while reading the "Lisp does not have regular expression support" implications? That was answered by the comparison of elisp to modern Common Lisp, and I wonder if anyone has done any work to make CL-PPCRE work for elisp. Despite being a third-party library, it is much faster than Perl's built-in regex support.

There is a point to be made for the idea that you are solving the wrong problem with the log parsing. On the other hand, if you are trying to interface with other developers' popular engines, you may not have a choice.

Most of those benchmarks are against Perl 5.8, the version of Perl released in 2002. 5.10 had major regexp engine improvements, and 5.12 had minor improvements. Anything is fast when you compare it to 10 years ago.

Furthermore the 5.8 engine is pretty much the 5.6 engine, which was significantly slower than the 5.005 engine. Why the slow-down? Because 5.6 has logic to check when regular expressions are matching slowly, and then to add tracing when it is. This makes many of the exponential slowdowns that Mastering Regular Expressions describes be automatically caught and handled fairly quickly.

The CL-PPCRE and Java regular expression engines don't have those somewhat expensive checks, and so are much more likely to encounter catastrophic behavior.

Yeah, but who cares about real-world code if you have a neat benchmark page? Most of your "users" will never even get to the point of using your package -- after saying how great it is on Reddit, they'll turn on their porn and forget all about it.

OK, maybe I'm too cynical...

I actually ran my own tests to see the difference, including tests against hash speed and so forth. But it is good to know there are other benchmarks out there that correlate with what I found, even if older ones. Has anyone published new ones?

Just to point out LINQ to XML http://msdn.microsoft.com/en-us/library/bb387098.aspx.

It takes a lot of the pain out of XML processing. You don't have to remember the specifics of XPath/XQuery, but you still have to deal with the pain of multiple-namespace resolution inherent in XML.

It would be interesting to see an Emacs implementation in a language better suited for text manipulation, like Ruby or Perl.

I've written a lot of Emacs Lisp and a lot of Perl. Perl is better for bulk text processing, but that's not what editors do. The Emacs API is great for text editing. Once you are willing to treat blocks of text as buffers, everything gets very easy with Emacs Lisp. But most people are afraid to do that, because they are used to working with big opaque text objects and big batch transforms.

(Say you want to prefix every sentence with the expression "And then he said, ". The perl way: replace every empty space after a period or the beginning of the string with "And then he said, ". Emacs way: While there are sentences, go to the beginning of the next sentence and insert "And then he said, ". Same result, different way of thinking.)
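A toy Python rendering of the two styles in that parenthetical, showing they produce the same text. The `Buffer` class and its methods are hypothetical stand-ins for an Emacs buffer, point, `insert`, and `forward-sentence`; real Emacs has a much richer notion of where a sentence ends, while here a sentence simply starts at the beginning of the text or after ". ".

```python
import re

PREFIX = "And then he said, "


def batch_style(text):
    """The bulk way: one regex over the whole string, matching the start or a '. ' separator."""
    return re.sub(r"(^|\. )", lambda m: m.group(1) + PREFIX, text)


class Buffer:
    """Hypothetical stand-in for an Emacs buffer: text plus a movable point."""

    def __init__(self, text):
        self.text = text
        self.point = 0

    def insert(self, s):
        # Like Emacs `insert`: the text goes in at point and point ends up after it.
        self.text = self.text[:self.point] + s + self.text[self.point:]
        self.point += len(s)

    def forward_sentence_start(self):
        """Move point to the next sentence start; return False when there is none."""
        i = self.text.find(". ", self.point)
        if i == -1:
            return False
        self.point = i + 2
        return True


def buffer_style(text):
    """The editor way: visit each sentence start in turn and insert there."""
    buf = Buffer(text)
    buf.insert(PREFIX)  # the first sentence starts at the beginning of the buffer
    while buf.forward_sentence_start():
        buf.insert(PREFIX)
    return buf.text


story = "He ran. She hid. It rained."
print(batch_style(story))
# And then he said, He ran. And then he said, She hid. And then he said, It rained.
```

The batch version describes the whole transformation at once; the buffer version is a loop of small cursor movements and edits, which is the shape most Emacs Lisp editing commands take.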

Modify your way of thinking, and the Emacs model is wonderful. (Why do you think there are so many more Emacs extensions than Eclipse extensions, even though you can use pretty much any JVM language to customize Eclipse? It's because customizing Emacs is fast and easy.)

Hell, even Scheme would be a lot nicer than Emacs Lisp. Actually, if I could even get Yi ( http://www.haskell.org/haskellwiki/Yi ) to work, I'd use that. Elisp sucks.

So, what have you written in Elisp?

Mostly stuff for my own personal use inside Emacs; i.e., nothing important. I'm sure that disqualifies me from ever speaking of it again, but there are objective reasons I dislike it (the dead horse of "lack of lexical scoping" comes to mind).

Probably the biggest reason I don't like it, however, is this: http://www.emacswiki.org/emacs/WhyDoesElispSuck#toc4


Lexical scoping is mostly for defensive reasons; its absence shouldn't prevent you from writing code. It only prevents a very narrow category of bugs -- a type of bug that is especially uncommon in elisp, because most variables are bound with let, rather than set with ... uh ... set. That means, even if you accidentally use a variable name that the caller has, you still don't clobber the caller.

Some parts of Emacs are made quite convenient with dynamic scoping, in fact. It's nice to be able to say "(let ((inhibit-read-only t))" and run a bunch of editing commands that ignore the fact that the buffer is read-only. If you had to take the Java-style approach of passing the "inhibit-read-only" flag to every buffer modification function, writing an editing function would be quite tedious. And, you'd have to remember to include the "inhibit-read-only" option on any composite functions you would write, otherwise you'd be deleting features from Emacs (for future users), just because you're lazy. Not good.

In general, programmers are taught that everyone using and maintaining their code is dumber than them, and to make sure that nothing unanticipated is ever possible. Programming for Emacs is the opposite, though -- you have to keep all possibilities open, because the next programmer is smarter than you, and understands the ramifications of what he wants to do. (Because if something fucks up, it only takes a few keystrokes to fix it. The user of your programming tool is very different than the user of your web app. He's a programmer in an environment that's meant to be programmed -- breakage can be corrected immediately.)

Anyway, people like to repeat "guh, global variables suck" over and over again, but the reality is that lexical scoping is not that important of a feature. When it's added (which it already is), it's not going to suddenly be easier to write extensions. You will just have one less possibility to think about when debugging, and many opportunities to accidentally make Emacs harder to program, by removing opportunities for future programmers to tweak the behavior of your code.
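Python has no dynamic scoping, but `contextvars` plus a context manager gives a rough analogue of `(let ((inhibit-read-only t)) ...)`: the binding is visible to every function called inside the block and is restored on exit, so the low-level primitive can consult it without any caller passing a flag. The `Buffer` class, `let` helper, and `add_footer` command are hypothetical; only the `inhibit_read_only` name is borrowed from Emacs.

```python
import contextvars
from contextlib import contextmanager

# A "special variable": callees read its current binding instead of taking a parameter.
inhibit_read_only = contextvars.ContextVar("inhibit_read_only", default=False)


@contextmanager
def let(var, value):
    """Rebind `var` for the duration of the block, then restore the previous binding."""
    token = var.set(value)
    try:
        yield
    finally:
        var.reset(token)  # restored even if the body raises


class ReadOnlyError(Exception):
    pass


class Buffer:
    def __init__(self, text, read_only=False):
        self.text = text
        self.read_only = read_only

    def insert(self, s):
        # Only the primitive checks the flag; commands built on top never mention it.
        if self.read_only and not inhibit_read_only.get():
            raise ReadOnlyError("buffer is read-only")
        self.text += s


def add_footer(buf):
    """A composite editing command that knows nothing about read-only handling."""
    buf.insert("\n-- end --")


log = Buffer("build output", read_only=True)
with let(inhibit_read_only, True):
    add_footer(log)  # allowed: the binding is visible down the call stack
print(repr(log.text))
# 'build output\n-- end --'
```

Outside the `with` block, `add_footer(log)` raises `ReadOnlyError` again. The `try`/`finally` inside `let` is the save-and-restore step that a dynamic `let` performs for you.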

Dynamic scope is fantastic and I dislike using any language that doesn't have it. That said, I find it annoying in Emacs to write code that uses menus, because what I would like to do is build the menu and attach a closure to each of the different "buttons". Since I couldn't find a way to get callable closures using the cl package, I ended up writing some kind of global button-position table. Yuck.

As to your point about "global variables suck", people say a lot of misinformed things. The problem with global variables has always been the inability to detect their access. If a variable has a value you didn't expect, it's not lexical so you can't just read the code and it's not dynamic so you can't just check the stack for the culprit.

Ironically, the main thing our OO saviour has really brought us is the ability to limit the scope of our global (member) variables. With big classes you still have the issue of figuring out which in-scope method changed the member variable, and I've sorely missed dynamic scoping for member variables many times (e.g. any time you use a finally block to make sure the variable gets set back, you could have just used dynamic scoping).

You just have to consider callbacks to be a (function . closure) pair, instead of just a function.

Hrm... I should have thought of that. I'll blame it on time constraints!

Yeah. Emacs Lisp requires you to solve problems in a different way than you are normally used to. I guess that's bad. But once you figure it out, it's not too much of a productivity drain, which is what matters in the end.

Well, in this case I would say it requires me to do more than I would expect to be necessary. When I set up my menu, I just want to set each button to call a closure set up specifically for that button.

But as much as I think lexical scoping is just a requirement and as much as I wish elisp were closer to CL (in power, if not in size), emacs is still the best game in town for me.
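The `(function . closure)` suggestion in this exchange amounts to storing the data each callback needs next to the function and handing it over explicitly when the button fires, instead of relying on captured variables. A minimal Python sketch of that shape; the menu, labels, and handler are hypothetical. The naive loop is included as a contrast: Python closures read the loop variable when they are called, not when they are created, so every naive button reports the last label, while each explicit pair carries its own data.

```python
def make_menu(labels, on_click):
    """One (function, data) pair per button; the data travels with the callback."""
    return [(on_click, {"label": label, "position": i}) for i, label in enumerate(labels)]


def press(button):
    function, data = button  # like (funcall (car cb) (cdr cb))
    return function(data)


def on_click(data):
    return f"clicked {data['label']} at position {data['position']}"


def make_menu_naive(labels):
    """Contrast: these lambdas look up `label` when called, after the loop has finished."""
    buttons = []
    for label in labels:
        buttons.append(lambda: f"clicked {label}")
    return buttons


menu = make_menu(["Open", "Save", "Quit"], on_click)
print([press(b) for b in menu])
# ['clicked Open at position 0', 'clicked Save at position 1', 'clicked Quit at position 2']
print([b() for b in make_menu_naive(["Open", "Save", "Quit"])])
# ['clicked Quit', 'clicked Quit', 'clicked Quit']
```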

You're arguing two separate things. "Dynamic scope is an important tool" by no means implies that lexical scope is not important.

It's important, but not "throw 1 million lines of existing code under the bus" important.

As an emacs newbie with a nagging desire to learn Perl, I would gladly commit to switching to an emacs implementation in Perl.

Not that I would use it, but there is Padre:


If you think emacs lisp is hard to read now... ;)

Perfect Emacs Reimplementation Language

Another Reimplementation Conundrum?

Emacs Lisp is better suited for text manipulation than Perl or Ruby. Powerful regexes just don't cut it; you need text-related abstractions.

Emacs clones based on both Scheme and Common Lisp have been written, but they haven't taken off.

TextMate/Ruby = Emacs/Elisp

But Emacs is capable of intelligently editing source code which is not Elisp. As much as I have tried, I cannot say the same of TextMate.

(How hard can it be to select blocks on double-click? Even Terminal.app does it!)
