Hacker News new | comments | show | ask | jobs | submit login

POVRay is a compiler. It takes a specification of a scene graph (give or take some details), and emits an image.

A JSON library is two compilers. One takes text and emits some form of language-specific representation, the other takes the language-specific representation and emits text. Serialization in general is a special case of a compiler.

A syntax highlighter is a compiler. One end of it looks like a normal language compiler, but it doesn't emit machine code, it emits annotated text. There are some fun intricacies related to that that have little relevant to conventional compilers, in fact: http://blog.sigfpe.com/2009/01/fast-incremental-regular-expr...

A web browser is a compiler. (Actually it's rather a lot of compilers.) It takes a web page and compiles it into the necessary set of instructions for the local display library to correctly display the page.

It is absolutely feasible to be an "expert on compilers" who doesn't know dick all about stack layouts on real machines, because you can write 10 compilers without any of them going down to machine language, or even byte code of any type. It's not even that hard. Counting up, my score is at least 5. And while I know what a stack layout is, what its for, and various tidbits about how they work in C, I certainly couldn't stand up to a blistering question session about it; after all, what I've learned about C stack layout was in the using of C, not the programming of a compiler for it.

Yes, when I say things that compile to machine language are a special case, I really do mean that less than 1% of all compilers fit that case. Easily. Probably less than .1%. You are surrounded by compilers and only a rare set of special cases goes down to anything like assembler(/bytecode). For the vast, vast majority of compilers, there is no relevant concept of "stack layout". The ones that do are an important and interesting subset, but when I say "everyone should understand compilers", it's very much precisely because that subset is relatively small. If the only value of knowing about compilers came from being able to hack on GCC, it really would be worthless to know about in the general case, but instead we are swimming in a sea of compilers, and not understanding their principles means you are in some real trouble.

I think I understand now. You don't know what a compiler is :-)

We should be clear about the definition of a compiler. From the Dragon book, arguably the definitive text in compilers. "a compiler is a program that reads a program written in one language and tranlsates it into an equivalent program in another language". From Wikipedia, "A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code)."

What you're describing is something much more broad... something called a "program". Programs take input and generate output. That process is NOT called compilation. Now I'm being a bit of a jerk in how I'm saying this, but I think the distinction is correct. What you're describing as compilers are simply programs. Programs that take input and generate output. But they don't pass even the first test of being a compiler, they don't generate new programs.

There's an additonal test to being a compiler. Take the input, I, and output O. Input I is written in a language with semantics, call it SI. The output O is written in a source language with semantics, call it SO. The semantics of I of SI should be the same as the semantics O over SO. That is SI(I) == SO(O). None of your examples maintain this. They all result in instances of a program, not a new program..

By your reasoning Notepad is a compiler. It takes keyboard input and generates text on a screen that is readable to humans. Pacman is a compiler. Takes joystick input and results in a video and audio that will delight anyone who was a child in the 80s.

And yes, compilers are a small set of the set of all programs written. But that wasn't the debate.

Would you consider LabView, Scratch, and Logo compilers/interpreters?

I don't know enough about LabView and Scratch to say one way or another (I just learned about both from your reply).

Logo is often implemented as an interpreter, not a compiler.

Well, seeing how you're unfamiliar with Prolog, LabView and Scratch, and how you're somehow drawing a distinction between compilation and interpretation, I think it's best to chalk this up for differences in terminology and move on :-)

If I had more time I would ask you about GNU Glade and the MS Resouce Compiler; the first converts visually manipulated GUI elements to XML, which gets converted to C, and the later converts a configuration-file type syntax into a GUI. IMO, both are compilers, and neither is troubled by machine layout.

I wont even go into symbolic algebra systems, constraint programming environments, model checkers, database query executors and optimizers, and many many other applications of compiler technology, because I have a feeling you will be up to debate their purity as "real compilers".

jerf made an excellent point above that I would advice others to consider.

I did want to address this statement, "symbolic algebra systems, constraint programming environments, model checkers, database query executors and optimizers, and many many other applications of compiler technology"

You're right I would argue they are not compilers. But I'd also state they often do use compiler technology, as do many other programs. I've spent a fair bit of time working on many of the things you note, model checkers and query optimizers in particular.

But lets be clear, compiler technology is not the same thing as being a compiler. And yes, terminology matters. If you say, "I've written a compiler" and you show me HangMan... I'll scratch my head. You don't get to just say "Everything is a compiler", simply because compilers manipulate data and so do all other programs.

No one said "everything is a compiler"; jerf and I listed particular applications, by name. That's hardly "everything".

And no one if offering you hangman as an example either.

I think you're exactly the type of people I was referring to as those who mystify compilation, in your case for the sake of unspecified native code generation ideal.

If you're solving a data processing and translation problem, and find your solution fitting a parse, validate, translate model. Using well known compilation techniques. I would say you're writing a compiler. Regardless of whether the output is native code, assembly, C, XML, byte-code, display transformations, or in-memory side-effects.

I just think that compilers generate programs or program fragments. Nothing mistifying about it. I really do think my definition is the mainstream.

I just draw a distinction between using techniques common in compilers vs actual compilers. I agree those things that Jerf listed may use techniques common in compilers, call them compilation techniques, but I still don't think they're compilers. I dont' think that makes them bad programs, but they're not compilers. And frankly, I think even those who the community deems as expert would say the same thing. They may use some compilation technologies, but they're not compilers.

You guys are arguing past the article. His entire point is that "compilation techniques" are important to a whole class of programming problems, and the best way to learn them is the standard university compiler class.

Well let me ask you... what isn't a compiler?

Visual C++ 6.0.

Wow that joke had a long set-up.

The Aristocrats!

The definition of "computer language" is very broad. Study your basic computer theory again; they don't call those things "languages" for laughs. JSON is a computer language; it sure as hell isn't a human language. Bitmaps are computer languages, as are PNGs or GIFs. HTML is a computer language. And to the extent the Dragon authors didn't mean it that way, I don't really care, because your definition sucks for a fundamental reason that I outlined before but will outline again: It convinces people they only need to know compiler theory if they are outputting "object code". And that's an absolutely terrible thing for a definition to do. By that standard mine is way better, in the way that a "good" thing is way better than a "bad" thing.

A compiler takes in a stream of symbols and translates it to another stream of symbols. Common compiler techniques include intermediate tree representations, parsing of symbols, various recursion-based optimization passes on the intermediate representation and phases, and various serialization techniques. Probably the most distinctive characteristic of a compiler is the existence of a usually-tree-based (though not always) intermediate representation that is neither the input language nor the output language, though in some cases you can optimize past this. (You won't get good program-size-scaling this way, but if your task is constrained it can be a good choice.)

There exist no software engineering terms that have universally agreed-upon meanings. If you wish to insist that a compiler must maintain semantic meanings, that is a valid choice, but I reject the idea that it is the only choice, and I reject the idea that it's even a good choice. A definition of compiler that focuses on the use of the distinctive techniques of compilers, the parsing, the intermediate representation, the manipulation of the intermediate representation for some effect, the re-serialization to some format, is much more useful for almost any use outside of a classroom. (Which I do not wish to impugn; that is a valid use.) Because, to hammer it home one more time, the techniques you learn in compiler class is useful all over the place and I have concrete experience (as does Stevey) that way way too many people have never learned those techniques. You aren't going to help anybody learn more about these techniques if you insist that the only "compiler" is a radical subset of the things that use compilation techniques. You do nobody any favors by standing on your definition and perpetuating the idea that these are esoteric techniques of limited use. (Note I did not say you believe the idea that these are esoteric techniques of limited use, I said you are perpetuating it.)

Notepad isn't a compiler that I can see; that's a strawman. It's a continuum, not a binary yes/no, but it uses basically no compilation techniques, and I wouldn't be surprised that at least the initial implementation used none at all. The original was limited to 64KB, IIRC, and at that point even on the computers of the era you could just directly manipulate an array. No intermediate representation distinct from the input, no parsing, no serialization, it's not checking any of the checkboxes.

Not to wade too far into an already pedantic dispute...

"A compiler takes in a stream of symbols and translates it to another stream of symbols. Common compiler techniques include intermediate tree representations, parsing of symbols, various recursion-based optimization passes on the intermediate representation and phases, and various serialization techniques."

The second sentence is a good definition. The first sentence is probably the cause of the disagreement -- it's so vague as to possibly describe a C compiler, as well as a Chinese Room, a signal detection algorithm, a kid doing his math homework, etc. I'm not quite sure what you call something that transmutes a stream of symbols into other symbols that is precise, but i feel "things that transmute symbols" covers a very broad class of things that are not compilers as well. Someone doing their chemistry homework, for instance.

I mean, yes, a compiler will transmute symbols, but it's the second sentence is how we distinguish that from, say, a DSP. When you start asserting things like "builds an intermediate structure representing the parse tree", then the definition starts going somewhere useful.

"Necessary but not sufficient," and all that.

I think it is fair to say that our definition of compilers and even languages is different.

I think we also disagree about the point of definitions. I think they're to make communication clearer. Not to motivate people to do something. I think there are better ways of doing that than to obfuscate definitions, which generally have well understood industry and academic meaning.

But I think these are things we'll just have to agree to disagree about... I just hope your definition of "agree" and "disagree" are the same as mine. :-)

He's just going with the academic meanings, where e.g. `regular expressions' describe `regular languages'.

A compiler takes in a stream of symbols and translates it to another stream of symbols. describes a more general class of programs called "translators". Those are interesting, but compilers are required to solve additional problems, such as emitting machine (often virtual) code (sometimes in the form of assembler).

Calling bitmaps a computer language is a bit of a stretch.

I would argue that most of these are parsers and not compilers. While a compiler needs a parser, a compiler and a parser are not the same thing.

That said, even though I took a compilers class in college, I think my education was severely lacking in the value and use of parsers. I've seen so many problems that lend themselves well to some type of parser solved in some asinine, brute-force (often regex) method.

Try using Haskell's parsec. It's a combinatorial parser library. (While that may sound horrifying, it's actually a very sane way to write a parser and easy to understand.)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact