

Why Awk for AI? (1997) - mooreds
http://www.wra1th.plus.com/awk/awkfri.txt

======
patio11
I worked with Dr. Loui for a year or so on research projects after loving his
undergrad AI class. He's crazy -- like "A demo to the customer is not working?
Let's edit that in real time then reload, proving why scripting languages are
better." -- but he's a very useful kind of crazy.

I think that Ruby/Python are the spiritual descendants to this argument in
2013, by the way. e.g. A class project was to scrape eBay and predict the
winning bids on a variety of item classes. (Spoiler: "the current bid"
outperforms most algorithms.) With AWK, you scrape the HTML and do some ugly
parsing. Of course, with Ruby you'd hpricot a single CSS selector and have the
lab 80% complete in ~3 lines.

The single greatest disadvantage to using AWK for serious work is that nobody
but you and Dr. Loui does that so you get to invent everything for yourself
every time. (You probably do not appreciate how much your language ecosystem
bakes in code re-use until you've used a language that assumes essentially all
use is one-off, by the way.)

~~~
JoachimSchipper
You're not wrong about reuse, but note that shell programs make for very
reusable and performant awk libraries. ;-)

------
xaa
I don't think anyone would seriously advocate using awk for complex projects
these days, but the idea of keeping data really close to the OS/shell is a
very powerful one.

Take Python, which is supposedly a "scripting" language, but requires
relatively painful amounts of boilerplate to actually read from or write to
pipes, etc. It doesn't force you to keep everything in Python, but it
certainly nudges you that way. Without naming names, certain statically typed
languages that are obsessed with safety are even worse in this regard.
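
For concreteness, here is roughly what a pipe-friendly filter looks like in Python. This is a minimal sketch, not anyone's canonical idiom: the function name `upcase_matching` and the sample text are made up for illustration. The awk counterpart would be something like `awk '/awk/ { print toupper($0) }'`.

```python
import io
import sys


def upcase_matching(pattern, src=sys.stdin, dst=sys.stdout):
    """Copy every line of src that contains `pattern` to dst, upper-cased.

    The streams default to stdin/stdout so the function can sit in a shell
    pipeline; they are parameters only so the demo below can use in-memory
    streams instead of a real pipe.
    """
    for line in src:
        if pattern in line:
            dst.write(line.upper())


# Demo with in-memory streams (hypothetical sample input).
demo_in = io.StringIO("awk rocks\nperl too\nawk again\n")
demo_out = io.StringIO()
upcase_matching("awk", demo_in, demo_out)
print(demo_out.getvalue(), end="")
# prints:
# AWK ROCKS
# AWK AGAIN
```

In a real pipeline you would call `upcase_matching("awk")` with the default streams and run the script between two other commands.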

~~~
cturner
What would you put forward as an alternative to awk for the kinds of complex
projects the author describes?

Sam and vi are both descendants of ed, but go in very different directions. Is
there a path forward from awk that focuses on different strengths to perl's
choices? (perhaps avoiding perl's move to being multi-paradigm)

Is there something that sits close to unix in the way awk does, but which is
stronger?

~~~
UNIXgod
Ruby does awk for the most part. I can't comment on strength since awk has a
small footprint compared to both ruby and perl clocking in just under 96k.
Also both emacs and ed are descendants of TECO =)

The editors have had major impact on our languages mainly because regular
expressions, called regular sets at the time, were implemented in Thompson's
ed which was based on an earlier line editor implementation called QED used on
CTSS and Multics.

One can follow the evolution almost in dialect fairly well.

ed -> grep -> sed -> awk

ed -> em -> ex -> vi

awk -> perl -> ruby

so

ed -> grep -> sed -> awk -> perl -> ruby

The right tool for the right job when all you know is a chainsaw everything
looks like a hammer unix process are cheap and all that yada yada
philosophical paradigmy finite state automata pipelines vs pointers vs classes
vs recursive enumeration iterative parenthesis backtick mind expansions
expression logic =P

~~~
smalltalk
ed is not a descendant of TECO.

<http://web.mit.edu/kolya/misc/txt/editors>

~~~
smalltalk
Is this factually incorrect? If so, please cite the source that contradicts
the common knowledge that ed descended from QED, developed entirely separately
from the MIT environment that led to EMACS. Which explains why EMACS couldn't
be less like a good UNIX program.

------
xntrk
Does anyone have any examples of AWK vs. some other programming language for
AI? It would be interesting to take a look at.

------
stcredzero
Why [language] for [purpose]?

Universal answer: because it's workable, and I'm emotionally invested by now.

~~~
mooreds
And I need to get stuff done, and learning a new [language/framework/shiny
object] isn't always the fastest way of getting stuff done.

It is better to be an expert in a few languages rather than a dilettante in
many.

Of course, the best of all possible worlds is to be an expert in many
languages, but that often requires time that gets in the way of 'getting stuff
done'.

~~~
stcredzero
_> And I need to get stuff done_

That's the experienced version. I was describing the inexperienced one.

------
scotty79
> Jon Bentley found two pearls in GAWK: its regular expressions and its
> associative arrays.

When I encountered AWK I was amazed by associative arrays. It was the first
language I've seen where associative arrays were so accessible. Then there was
PHP (I think arrays are one of the things that strongly contribute to its
popularity).

Today pretty much every commonly used language has this feature. Often it
seems more mimicry than actual appreciation of this data structure. For
example, when other languages' creators bring this structure in, they tend to
forget about an important feature: ordering. For example, Python didn't have a
standard ordered dictionary type for a long time. Also, Ruby keeps the order of
the items in a hash only since 1.9.
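
To illustrate the ordering point, here is a small sketch in Python. It relies on two facts: plain `dict` preserves insertion order as a language guarantee from Python 3.7 on, and `collections.OrderedDict` adds explicit reordering on top of that. The word list is made-up sample data.

```python
from collections import OrderedDict

# Hypothetical sample input: count words, awk-style, in an associative array.
words = "the quick brown fox jumps over the lazy dog the end".split()

counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# Iterating a dict yields keys in first-insertion order (Python 3.7+).
first_seen = list(counts)
print(first_seen)
# prints: ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog', 'end']

# OrderedDict additionally lets you reorder entries explicitly.
ordered = OrderedDict(counts)
ordered.move_to_end("the")  # push "the" to the back
print(list(ordered)[-1])
# prints: the
```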

------
mozboz
Did some of my most enjoyable and productive work in awk and BBC Basic.

Minimise resistance of expressing a translation of a hypothesis from thought
into a computing language at all costs: get onto the highway as fast as
possible.

------
yoklov
On a vaguely related note, Darius Bacon's Lisp-in-awk has always brought a
smile to my face: <https://github.com/darius/awklisp/blob/master/awklisp>

------
sramsay
I know exactly what he means. Most people are surprised to learn that I study
direct methods in the calculus of variations (mostly with Sobolev spaces)
using bc, and then write out my results using ed.

------
reeses
Whew. By volume, most of my big data work is in awk. I hope my secret remains
safe.

------
abraininavat
Really strange that he spits out his last two _surprising philosophical
answers_ and then doesn't explain how the first one pertains to awk at all.

 _First, AI has discovered that brute-force combinatorics, as an approach to
generating intelligent behavior, does not often provide the solution ... A
language that maximizes what the programmer can attempt rather than one that
provides tremendous control over how to attempt it, will be the AI choice in
the end._

Okay. And... awk has this quality? What can I do in awk but not in C or a
lisp? In what way does programming in awk lead you toward less brute-force
solutions than any other language? He doesn't support this in any way at all.

~~~
Millennium
There's not much you can do in Awk that you can't do elsewhere: Turing-
completeness and all that. The reverse also applies, assuming you have an
appropriate set of bindings.

The thing to understand about awk is that it's basically a DSL. It's optimized
specifically for crunching data contained in line-oriented text files, and
within this niche, it is awesome. Since line-oriented text files are used for
just about everything in Unix, awk is an especially useful tool there. But
once you stray from awk's niche, things start to get awkward, and the further
you go, the tougher it gets.
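
As a rough picture of what "optimized for line-oriented text" buys you, compare the awk one-liner `awk 'NF >= 2 { total[$1] += $2 } END { for (k in total) print k, total[k] }'` with a general-purpose equivalent. The Python below is only a sketch under assumptions: the helper name `sum_by_key` and the sample records are invented, and the input is assumed to be whitespace-separated "key number" lines.

```python
from collections import defaultdict


def sum_by_key(lines):
    """Sum the second field of each line, grouped by the first field.

    Mirrors awk's implicit loop: split each record on whitespace, skip
    records with fewer than two fields, accumulate into an associative
    array. Unlike awk, float() raises on non-numeric text instead of
    silently treating it as 0.
    """
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        totals[fields[0]] += float(fields[1])
    return dict(totals)


# Hypothetical sample records.
sample = ["alice 10", "bob 2.5", "alice 5", "", "bob 1"]
print(sum_by_key(sample))
# prints: {'alice': 15.0, 'bob': 3.5}
```

Everything awk does implicitly (the record loop, the field split, the numeric coercion) has to be spelled out here, which is the trade the DSL makes.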

Perl was written to be an awk-killer, and it didn't accomplish that by being
better within awk's niche. It did it by not being a DSL: it's still "good
enough" for the sorts of work that awk really excels at, but it works much
better for just about everything else.

~~~
kenko
"There's not much you can do in Awk that you can't do elsewhere: Turing-
completeness and all that."

The fact that Awk is Turing-complete has nothing to do with the fact (if it is
a fact) that there's not much you can do in it that you can't do elsewhere.

The SKI combinator calculus is Turing-complete, but you can't read a CSV file
with it.

~~~
throwaway1980
> The SKI combinator calculus is Turing-complete, but you can't read a CSV
> file with it.

Of course you can, it's just a matter of input handling. For that matter you
can read a CSV file with any Turing machine. It's just easier elsewhere.

~~~
kenko
Well, I must be very confused, then, because I don't see where the input comes
in given just the s, k and i combinators. (The Jot programming language, which
can be translated into SKI, for instance, doesn't do input or output. You want
the Zot variant for that.) How would one write a program, using just s, k, i,
and application, that takes the name of a file on the command line, opens it
and reads it in, then prints the lines in reverse order to stdout?
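
To make the question concrete, here is a sketch of the three combinators as Python closures. It is an illustration only, not a real SKI interpreter. Note that every piece of input and output in it (the arguments, the `print` calls) is supplied by the host language, not by s, k, or i.

```python
# The three combinators, curried so that application is a plain function call.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = lambda x: x

# S K K behaves like I: S K K x = K x (K x) = x.
identity = S(K)(K)
print(identity(42))
# prints: 42

# Booleans in the usual encoding: TRUE picks the first argument, FALSE the second.
TRUE = K
FALSE = K(I)  # K I a b = I b = b
print(TRUE("first")("second"))
# prints: first
print(FALSE("first")("second"))
# prints: second
```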

~~~
Millennium
If you really wanted to do pure SKI, you'd have to do more than just a
program: you'd have to implement the whole machine and the OS running on it.
You could do that, but as you might imagine, it's quite a lot of work.

That said, keep in mind that most likely, you'd want to start implementing
levels of abstraction pretty early on. The fact that SKI is Turing-equivalent
means that you can implement a Turing machine (or anything else that is
Turing-equivalent) in it. Build your favorite abstraction, and then implement
your machine and OS the way you would using that abstraction. It's still SKI
underneath, so you're golden.

~~~
kenko
Maybe so, but that completely gives the lie to the idea that two languages are
equipotent if they're both Turing complete. The sorts of abstractions you'd
need to implement go far beyond what's required for Turing completeness.

Unlambda, for instance, is Turing complete, and moreover, it can do I/O. An
Unlambda program is nevertheless incapable of opening files or doing different
things depending on its command-line arguments. You can write cat (the version
of cat that _just_ echoes stdin to stdout) in Unlambda, but not ls.

You might be able to write an Unlambda-based operating system in which all the
various sorts of input events that an OS needs to respond to are represented
as elements in its input stream (or, even better, an OS in Lazy-K).

But when you've got that OS up and running, Unlambda programs running _on it_
still won't be able to open files. (Frankly I'd be surprised if the
"abstractions" necessary to get something like that up and running weren't
essentially an interpreter written in _another_ language dealing with the
encoding and decoding of input and output to your Unlambda/Lazy-K program,
rather than abstractions written _in_ Unlambda/Lazy-K. (Consider that the
numbers that Lazy-K outputs are Church-encoded and must be converted by the
Lazy-K _interpreter_ into C-like integers before characters can be output to
stdout.) This isn't really important, though.)
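
A sketch of that decoding step, in Python rather than in any real Lazy-K interpreter: a Church numeral is just "apply f n times", and it only becomes a printable character once the host applies it to an ordinary increment function. The helper names `church` and `to_int` are made up for this example.

```python
# Church numerals: the number n is the function that applies f n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))


def church(k):
    """Build the Church numeral for a non-negative Python int."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


def to_int(n):
    """Decode a Church numeral by counting applications of +1 from 0.

    This is the part that lives in the host language, outside the calculus.
    """
    return n(lambda v: v + 1)(0)


print(to_int(add(church(2))(church(3))))
# prints: 5
print(chr(to_int(church(65))))
# prints: A
```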

Consider also this final note from the Lazy-K page:

"Remove output entirely. You still have a Turing-complete language this way,
and it is if anything more elegant. But, as with the equally elegant SMETANA,
you can't do anything with it except stare at it in admiration, and the
novelty of that wears off after a few minutes."

That's not really true, of course: there are other things you can do, like
increase the temperature of your processor. Not many other things, though.

~~~
throwaway1980
I'm sure you've heard this before, but all you have to do is ask whether you
can build a Turing machine in whatever language. If you can do this, then you
can compute answers to the same things computed by any other Turing complete
language.

This says nothing about practicality, but nobody ever said it did. Of course
practicality is important, but a conversation about Turing completeness is a
conversation about "can compute", not "can easily compute". If you look at
venues such as POPL, ESOP, and PLDI, fairly often you will find proofs for
some abstract representation that is then implemented in a real world
language. Thus while it would be impractical to compute something in the
abstract form, it is often more elegant for proof construction, and then proof
results are transferable if you can demonstrate bisimulation between the two
forms. All this to say that "can compute" is nevertheless an important
determination with respect to equipotency.

If you had an Unlambda OS (or VM is better perhaps), then anything running on
it would be an Unlambda program, including C programs, just as anything
running on an x86 machine is an x86 program.

~~~
kenko
"I'm sure you've heard this before, but all you have to do is ask whether you
can build a Turing machine in whatever language. If you can do this, then you
can compute answers to the same things computed by any other Turing complete
language."

But you can't _do_ the same things that you can _do_ in other languages.

Computation is pure.

~~~
throwaway1980
I have never encountered this distinction between activity and computation
before. Assuming that you are okay to define (sequential) computation as
transforming an input sequence of 1's and 0's into an output sequence of 1's
and 0's, can you give me an example of something that a computer does that is
not a computation and explain why?

