
TXR: A Programming Language for Convenient Data Munging - hashx
http://www.nongnu.org/txr
======
rout39574
I wish their page included something along the lines of "Why do I care?"

Maybe a few examples of "data munging" tasks which the authors view as poor
fits for [language X] and how their stuff solves the problem better.

Maybe something like "why is our language better than regexps in whatever
language environment you already know?"

~~~
kazinator
There is a page with a navigation frame giving Rosetta Code examples, syntax-colored,
with back links to Rosetta:

[http://www.nongnu.org/txr/rosetta-solutions.html](http://www.nongnu.org/txr/rosetta-solutions.html)

TXR has regexps if you need them. The regex engine is geared in a different
direction from mainstream regex: it doesn't have anchoring, register capture,
or Perl features like lookbehind assertions. On the other hand, it has
intersection and negation (without backtracking).
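Mainstream regex libraries don't expose intersection or complement directly, but their effect on whole strings can be approximated by combining ordinary matches. A minimal Python sketch of the semantics only (not of TXR's engine, which supports these operators natively):

```python
import re

def intersect(pattern_a, pattern_b, text):
    """Emulate regex intersection: text must fully match both patterns."""
    return bool(re.fullmatch(pattern_a, text) and re.fullmatch(pattern_b, text))

def negate(pattern, text):
    """Emulate regex negation: text must NOT fully match the pattern."""
    return re.fullmatch(pattern, text) is None

# Lowercase words that also contain an 'x':
print(intersect(r"[a-z]+", r".*x.*", "taxes"))   # True
print(intersect(r"[a-z]+", r".*x.*", "tames"))   # False

# Anything that is not a pure run of digits:
print(negate(r"[0-9]+", "abc123"))               # True
```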

TXR translations of Clojure, Common Lisp and Racket solutions to the same
problem:

[http://www.nongnu.org/txr/rosetta-solutions-main.html#Self-r...](http://www.nongnu.org/txr/rosetta-solutions-main.html#Self-referential%20sequence)

~~~
rout39574
I saw those; what I miss is "This is why I think this new way is better".

If it's supposed to be obvious by inspection, well... I guess I'm too
unenlightened.

------
nieve
TXR looks rather like the CRM114 language that's been used to implement some
rather amazingly accurate text classifiers (some better than most people on
their own mail), though a bit less bizarre and, I think, more accessible:
[http://crm114.sourceforge.net/docs/INTRO.txt](http://crm114.sourceforge.net/docs/INTRO.txt)

CRM114 too treats pattern matching as the fundamental construct, and has
blazing performance for it and certain kinds of number crunching (it has to),
but I don't think it's nearly as useful for the average hacker trying to munge
a couple of text files. Still, worth a look both to users and possibly to
language implementors. I'm definit

------
spullara
Reminds me of a trick I do with mustache.java. Templates can not only be used
to generate output; because of the declarative nature of the mustache
language, they can also be used to parse output back into the data that, in
combination with the template, would generate that output. Makes for pretty
intuitive parsers. In my case, all text that isn't a templating declaration is
a regex.
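The inversion idea can be sketched in Python: each placeholder becomes a named capture group, and the literal text between placeholders is kept verbatim, i.e. it is itself a regex, as described above. A toy illustration of the trick, not mustache.java's actual API:

```python
import re

def invert(template):
    """Build a parsing regex from a mustache-style template.

    {{name}} placeholders become named capture groups; everything
    else passes through unchanged, so literal text doubles as regex.
    """
    pattern = re.sub(r"\{\{(\w+)\}\}", r"(?P<\1>.+?)", template)
    return re.compile(pattern + "$")

rx = invert("Hello {{name}}, you have {{count}} new messages.")
data = rx.match("Hello Ana, you have 7 new messages.").groupdict()
print(data)  # {'name': 'Ana', 'count': '7'}
```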

~~~
vdm
[https://github.com/spullara/mustache.java/search?p=1&q=inver...](https://github.com/spullara/mustache.java/search?p=1&q=invert&utf8=%E2%9C%93)

------
danso
As I come to see more of my data-related work be consumed by data
munging/cleaning work, I'm convinced that a language/framework devoted to data
munging is at least as important as those devoted to data visualization.

------
slackstation
It looks ugly and awkward to type. It doesn't seem like it would be a
pleasure to write programs in.

~~~
AnkhMorporkian
It's a very ugly language; I don't think anyone is going to disagree there.

That being said, it has some intriguing features that I'm not going to
dismiss. I work with COBOL on a daily basis, so I'm not going to say no to a
new language just because it's ugly. There seems to be a lot of utility here.

~~~
hawkw
Purely out of curiosity: what is it that you do that forces you to work with
COBOL on a daily basis?

~~~
AnkhMorporkian
I mostly do legacy code conversion for the banking sector. It's all contract
work, so it varies, but 95% of the time that's my deal.

------
bane
Cool ideas. I really like that it has support for grammars. What's the
performance like compared to Perl on similar tasks?

------
aurelius
Kaz Kylheku is one of the kooks from comp.lang.lisp where lisp is the One True
Language. The funny thing is that TXR is written in C!

Kaz: How come you didn't write TXR in lisp?

~~~
kazinator
Because I'm also one of the kooks from comp.lang.c where C is the One True
Language.

But seriously, TXR is built on its own Lisp: an infrastructure which provides
the managed environment and data representations which also support the TXR
Lisp dialect.

This is no different from any Lisp implementation based on a C kernel, like
CLISP, GNU Emacs, ...

If you do it from scratch, you lose a lot: you don't have a mature, optimized
dynamic language implementation. But, by the same token, you can experiment in
ways that you normally wouldn't. You get to dictate things like, oh, what is a
cons cell. I have lazy conses that look like ordinary conses: they satisfy
consp, and work with car, cdr, rplaca and rplacd. You can invent new
evaluation rules. I came up with a way to have Lisp-1 and Lisp-2 in a single
dialect, seamlessly, with the conveniences of both. I have Python-like array
access. I made traditional Lisp list operations work with vectors and strings:
you can mapcar through a string and so on.

Sequences and hashes are functions. For instance, orf is a combinator that
combines functions analogously to the Lisp or operator. If hash1 and hash2 are
hash tables, you can do something like [orf hash1 hash2 func] to create a
one-argument function that will look up its argument in hash1; then if that
returns nil, it will try hash2, and if that returns nil, it will pass the key
to func and return whatever that returns. Or ["abc" 1] returns the character
#\b. [mapcar "abc" '(2 0 1)] yields "cab": the numeric indices are mapped
through "abc", as if it were an index-to-character function. Fun things like
this are good reasons to experiment with your own dialect.
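For readers who don't know TXR, the orf chaining described above can be approximated in Python, with hashes-as-functions rendered as dict.get and nil rendered as None. An illustrative sketch of the behavior, not TXR's semantics:

```python
def orf(*funcs):
    """Combine one-argument functions like Lisp's `or`: call each in
    turn, returning the first non-None result (None stands in for nil)."""
    def combined(x):
        for f in funcs:
            result = f(x)
            if result is not None:
                return result
        return None
    return combined

hash1 = {"a": 1}
hash2 = {"b": 2}
lookup = orf(hash1.get, hash2.get, lambda k: f"default:{k}")
print(lookup("a"))  # 1
print(lookup("b"))  # 2
print(lookup("z"))  # default:z

# And the string-as-function example: indices mapped through "abc"
print("".join(map("abc".__getitem__, (2, 0, 1))))  # cab
```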

I believe TXR is a great companion if you're a Lisper working in ... one of
those other environments.

Ah, one more thing. Well, two, or maybe three. Part of why I used C was to
create a project whose tidy, clean internals stand in stark contrast to some
of the popular written-in-C scripting languages. You know, to sock it to them!
See, there is a hidden agenda: the call of "I can do this better". If you use
C, then a more direct comparison is possible. Secondly, people widely
understand C. Give them a cleanly written project in C, and maybe they will
hack on it, and from there understand something about Lisp too. C also means
low dependencies from the point of view of packaging: easy porting with just a
basic shell environment, make, and a C compiler. Cross-compiling for ARM or
whatever is a piece of cake. Easy work for package maintainers, ...

~~~
aurelius
I don't buy it.

TXR is not built "on its own Lisp", it's built on C. If you believe that lisp
is so great, then why didn't you just use ANSI Common Lisp? Why is TXR even
necessary when I can do all the same data processing stuff in Perl, which is
far more versatile and ubiquitous?

And all this nonsense about writing TXR in C because it's "more widely
understood", "low dependencies", "easily packaged" - after 15-some years of
advocacy in comp.lang.lisp, it's laughable that defsystem, asdf, and
SBCL/CLISP/CMUCL aren't good enough for you.

Lisp is either as good as all the Naggums, Tiltons, and Pitmans of c.l.l.
proclaim, or it's not. By writing TXR in C, you've just proved that it's not.

~~~
lispm
Oh shit.

SBCL's runtime contains traces of C.
[https://github.com/sbcl/sbcl/tree/master/src/runtime](https://github.com/sbcl/sbcl/tree/master/src/runtime)

CLISP is written on top of C.

CMUCL's runtime contains traces of C.

Now we are fucked...

I'm so glad that at least my Lisp Machine has no C. Oh wait, it has a C
compiler...

~~~
aurelius
The point is that lisp advocates rarely seem to use any of these lisp
implementations to do anything noteworthy or useful. They always seem to fall
back on C, or some other language that's more "widely available" or "has
minimal dependencies" or "has more potential contributors" or "can be more
easily compared with other similar programs".

I find this hypocrisy to be quite intriguing.

~~~
lispm
> The point is that lisp advocates rarely seem to use any of these lisp
> implementations to do anything noteworthy or useful.

That's possible. There are many Lisp dialects and implementations which have
few applications. That's true for a lot of other language implementations,
too. There are literally thousands of implementations of various programming
languages with very few actual applications. Maybe it is fun to implement your
own language from the ground up. Nothing that interests me, but it does not
bother me.

If he wants to implement a small new Lisp dialect, it's perfectly fine to
implement it in C or similar.

> They always seem to fall back on C, or some other language that's more
> "widely available" or "has minimal dependencies" or "has more potential
> contributors" or "can be more easily compared with other similar programs".

Some new dialect is written with the help of C? That bothers you?

Wow.

Actually 95% of all Lisp systems contain traces of C and some are deeply
integrated in C or on top of C (CLISP, ECL, GCL, CLICC, MOCL, dozens of Scheme
implementations and various other Lisp dialects). There are various books
about implementing Lisp in C.

Really nobody in the Lisp community loses any sleep that somebody implements
parts of Lisp in C.

> I find this hypocrisy to be quite intriguing.

Because some random guys implement their own language in C? Why do we have
Python, Ruby, Rebol? There was already PERL or AWK or ... Somebody decided to
write their own scripting language. So what?

~~~
aurelius
> Because some random guys implement their own language in C? Why do we have
> Python, Ruby, Rebol? There was already PERL or AWK or ... Somebody decided
> to write their own scripting language. So what?

When a Python advocate wants to do some data processing, do they first write
their own Python implementation in C? No. When a Ruby advocate wants to make a
Rails website, do they first write their own implementation of Ruby in C? No.

Several fine implementations of lisp already exist that compile down to
machine code and, if the lisp community is to be believed, have performance
"close to C". So why does a lisp advocate feel the need to re-write lisp in C
for a project that didn't actually need it? The lisp community would have us
all believe that lisp is the "programmable programming language", and all the
other rhetoric about how every other language has just stolen ideas from lisp,
etc., etc.. They all truly seem to believe that lisp is something special.
That's why I find it laughable that someone like Kaz Kylheku, a 15 year
veteran of comp.lang.lisp, decided not to implement TXR by using a pre-
existing lisp implementation.

~~~
lispm
> When a Python advocate wants to do some data processing, do they first write
> their own Python implementation in C?

They write it in C. Check out the Python world sometime.

* CrossTwine Linker - a combination of CPython and an add-on library offering improved performance (currently proprietary)

* unladen-swallow - "an optimization branch of CPython, intended to be fully compatible and significantly faster", originally considered for merging with CPython

* IronPython - Python in C# for the Common Language Runtime (CLR/.NET) and the FePy project's IronPython Community Edition

* 2c-python - a static Python-to-C compiler, apparently translating CPython bytecode to C

* Nuitka - a Python-to-C++ compiler using libpython at run-time, attempting some compile-time and run-time optimisations. Interacts with CPython runtime.

* Shed Skin - a Python-to-C++ compiler, restricted to an implicitly statically typed subset of the language for which it can automatically infer efficient types through whole program analysis

* unPython - a Python to C compiler using type annotations

* Nimrod - statically typed, compiles to C, features parameterised types, macros, and so on

and so on...

> So why does a lisp advocate feel the need to re-write lisp in C for a
> project that didn't actually need it? The lisp community would have us all
> believe that lisp is the "programmable programming language"

Why don't you understand the difference between 'a lisp advocate' and 'the
lisp community'?

> And all the other rhetoric about how every other language has just stolen
> ideas from lisp, etc., etc..

Nonsense.

> That's why I find it laughable that someone like Kaz Kylheku, a 15 year
> veteran of comp.lang.lisp, decided not to implement TXR by using a pre-
> existing lisp implementation.

I find it laughable that you find it laughable...

~~~
aurelius
Every single Python project you listed simply proves my point. They are Python
compilers of some sort. TXR, on the other hand, is a data processing language
implemented in its own lisp which is implemented in C. In other words, TXR is
an application of lisp, not just a compiler or interpreter like those Python
projects you listed. So, all your examples are irrelevant.

TXR didn't need its own dialect of lisp. So, the question remains: why didn't
Kaz use SBCL or CLISP? They're good enough for c.l.l. kooks like him to
recommend to everyone else, but why're they not good enough for him to use?

~~~
kazinator
The kook here is you, and I can prove it: you have a bizarre view that
developers should be divided into political parties based on programming
language, and code strictly to the party lines. Bizarre views make the kook.

TXR _does_ need its own dialect of Lisp because Common Lisp isn't suitable for
slick data munging: not "out of the box", without layering your own tools on
top of it.

This is a separate question from what TXR is written in. Even if TXR were
written using SBCL, it would still have that dialect; it wouldn't just expose
Common Lisp.

That dialect is sufficiently incompatible that it would still require writing
a reader and printer from scratch, and a complete code walker to implement the
evaluation rules of the dialect. Not to mention a reimplementation of most of
the library. The dialect has two kinds of cons cells, so we couldn't use the
host implementation's functions that understand only one kind of cons cell.
So, whereas some things in TXR Lisp could be syntactic sugar on top of Common
Lisp, with others that is not so.

Using SBCL would have many advantages in spite of all this, but it would also
reduce many opportunities for me to do various low-level things from scratch.
I don't have to justify to anyone that I feel like making a garbage collector or
regex engine from scratch.

So, the reasons for not using "SBCL" have nothing to do with "good enough".
It's simply about "not mine".

TXR _is_ a form of Lisp advocacy.

TXR is also (modest) Lisp research; for instance I discovered a clean,
workable way to have Lisp-1 and Lisp-2 in the same dialect, so any Lispers who
are paying attention can stop squabbling over that once and for all.

It pays to read this:

[http://www.dreamsongs.com/Files/HOPL2-Uncut.pdf](http://www.dreamsongs.com/Files/HOPL2-Uncut.pdf)

Why we have Lisp today with all the features we take for granted is that there
was a golden era of experimentation involving different groups working in
different locations on their own dialects. For example, the MacLisp people
hacked on MacLisp, and it wasn't because Interlisp wasn't good enough for
them. Or vice versa.

That experimentation should continue.

~~~
aurelius
> So, the reasons for not using "SBCL" have nothing to do with "good enough".
> It's simply about "not mine".

Kaz, the C programming language isn't yours either. My point is that Common
Lisp is supposed to be a general purpose programming language with power far
greater than a primitive language like C, but you chose to implement TXR in C
simply because C makes it much easier for you to accomplish your goal than
Common Lisp. I'm just trying to point out the obvious, which nobody from
c.l.l. seems willing to admit.

~~~
lispm
Bizarre.

------
hyp0
At first I thought this was TXL, for source code transformation
[http://www.txl.ca/](http://www.txl.ca/)

------
stdbrouw
Hmm, this looks more like parsing than munging to me, but then I guess
"munging" is not exactly scientific terminology.

My own take on easy data transformations, if you'll allow me the plug:
[https://github.com/stdbrouw/refract](https://github.com/stdbrouw/refract)

