
Implementing a programming language in D: Lexical Analysis - felixangell1024
http://blog.felixangell.com/implementing-a-programming-language-in-d-part-1
======
landr0id
Wow, perfect timing! I'm currently writing my first lexer/parser in D as well
([http://github.com/landaire/hsc](http://github.com/landaire/hsc)) for a
Scheme-like language used in Halo 3. I found Rob Pike's talk on lexical
scanning to be pretty useful
([https://www.youtube.com/watch?v=HxaD_trXwRE](https://www.youtube.com/watch?v=HxaD_trXwRE))
and I've modeled mine pretty heavily after the text/template lexer:
[https://golang.org/src/text/template/parse/lex.go](https://golang.org/src/text/template/parse/lex.go).

------
dimgl
I've never seen the D language before and I have to admit, it looks very
elegant and has piqued my interest. Thanks for your post.

I'm wondering why you used D rather than Rust? Was it just that you were
curious about the D language or is there something about Rust that you don't
like?

~~~
thinkpad20
One advantage I could see of Rust over D for language stuff is sum types
(enums in Rust). So for example you can write:

    
    
        enum MyLang {
          Var(String),
          Int(i32),
          Sum(Box<MyLang>, Box<MyLang>),
          Let(String, Box<MyLang>, Box<MyLang>)
        }
    

(Apologies if I got a few things wrong; I just mean the general idea)

Seems like without a construct like this, you'd have to use subclasses or
something similar, which (to me) isn't quite as nice.
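
To make the appeal a bit more concrete, here is a rough sketch of how a sum type like this might be consumed with `match`. It restates the enum under the single name `MyLang`; the `eval` function, the `HashMap` environment, and the choice to return `None` for an unbound variable are all assumptions for illustration and are not from the article.

```rust
use std::collections::HashMap;

// The expression type from the comment above, under one consistent name.
enum MyLang {
    Var(String),
    Int(i32),
    Sum(Box<MyLang>, Box<MyLang>),
    Let(String, Box<MyLang>, Box<MyLang>),
}

// Hypothetical evaluator: returns None if a variable is unbound.
fn eval(expr: &MyLang, env: &HashMap<String, i32>) -> Option<i32> {
    match expr {
        // Look the variable up in the current environment.
        MyLang::Var(name) => env.get(name).copied(),
        MyLang::Int(n) => Some(*n),
        // Evaluate both operands; `?` propagates an unbound variable.
        MyLang::Sum(lhs, rhs) => Some(eval(lhs, env)? + eval(rhs, env)?),
        // Bind `name` to the value, then evaluate the body in the extended scope.
        MyLang::Let(name, value, body) => {
            let v = eval(value, env)?;
            let mut inner = env.clone();
            inner.insert(name.clone(), v);
            eval(body, &inner)
        }
    }
}
```

The compiler checks that every variant is handled, which is the part that is harder to get from a subclass hierarchy.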

~~~
p0nce
Like others said, this can be done as a library type in D:
[http://p0nce.github.io/d-idioms/#Recursive-Sum-Type-with-matching](http://p0nce.github.io/d-idioms/#Recursive-Sum-Type-with-matching)

edit: not that it proves anything, just that "it can eventually be done".

~~~
tomjakubowski
Can you name the variants to distinguish two variants which "carry" the same
types? It wouldn't be as useful if you can't express something like:

    
    
        enum ConnState {
            Disconnected,
            Connecting,
            Connected(net::TcpSocket),
            Transferring(net::TcpSocket),
        }
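
As a sketch of why the variant names matter, the following shows two variants carrying the same payload type but handled differently. `Socket` is a hypothetical stand-in for a real socket type, and `describe` is an illustrative helper, not part of any library.

```rust
// Hypothetical stand-in for a real socket type.
#[derive(Debug)]
struct Socket {
    peer: String,
}

#[derive(Debug)]
enum ConnState {
    Disconnected,
    Connecting,
    Connected(Socket),
    Transferring(Socket),
}

// The variant name, not the payload type, decides which arm runs.
fn describe(state: &ConnState) -> String {
    match state {
        ConnState::Disconnected => "disconnected".to_string(),
        ConnState::Connecting => "connecting".to_string(),
        ConnState::Connected(s) => format!("idle connection to {}", s.peer),
        ConnState::Transferring(s) => format!("transferring with {}", s.peer),
    }
}
```

A library sum type that is keyed only on the payload type would not be able to tell the last two cases apart without wrapper types.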

~~~
jeremiep
I would probably do it like this in D (disclaimer: I haven't tested this
code!)

    
    
      import std.typecons;
      import std.socket;
      import std.variant;
    
      struct Disconnected {}
      struct Connecting {}
    
      alias Connected = Typedef!(TcpSocket, null, "connected");
      alias Transferring = Typedef!(TcpSocket, null, "transferring");
    
      alias ConnState = Algebraic!(Disconnected, Connecting, Connected, Transferring);
    

You could use plain structs wrapping a TcpSocket instead of the Typedef
template for the same result.

------
ksherlock
Another option is to use a lexer/state machine generator. Ragel supports D and
Rust (as of the not-quite-released v7) as target languages.

------
nunull
Nice read! I'd really like to know when we can expect the next post to be
published.

~~~
felixangell1024
Thanks :) Second post is in the works, will probably be out next week when I
have time around college :)

