Show HN: I designed a language for code golf, compiling to Common Lisp (rhoscript.com)
102 points by n_c on Aug 25, 2013 | 40 comments



Nice implementation of a concatenative language, and the relatively robust standard library it already has is impressive. Is there anything different about its execution model from languages like Factor or Joy?

As far as being purpose-built for code golf goes, there are already even more compact concatenative languages, like F[1] and XY[2], that might give this a run for its money due to K's obvious influence. Then again, it wouldn't be hard to have a rewrite layer to turn mnemonic operations into their expanded rhoScript words :)

[1]: http://nsl.com/k/f/f.htm [2]: http://nsl.com/k/xy/xy.htm


The execution model of rhoScript isn't so different from that of any other concatenative language. There are two differences I can think of at the moment:

- The laziness of lists. "1 naturals (add) map" will produce an infinite list of integers from 1. This introduces some interesting difficulties. For example, what would you expect "1 naturals (add) map 2 swap 5 take" to produce? It will clearly be either (1 2 3 4 5) or (2 3 4 5 6). I pick the first because it produces less surprising results. So when map is called, it saves the state of the stack, and therefore never ends up seeing the 2 pushed later. (A rough Lisp sketch of this idea follows this list.)

- I added lexically-scoped "arguments" to functions because I found them useful. The commands "arg-[a-d]" will "remember" the top four values on the stack, even if the laziness of the language has changed those values by the time execution reaches these commands.
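
For a rough idea of how the laziness works (this is just a sketch in Common Lisp, not the actual rhoScript internals): a lazy list can be a head plus a thunk for the tail, and a lazy map closes over the function right away instead of reading the stack later.

    ;; Sketch only, not the actual rhoScript internals.
    ;; A lazy list is the head plus a thunk that produces the rest.
    (defun lazy-cons (head tail-thunk) (cons head tail-thunk))
    (defun lazy-head (l) (car l))
    (defun lazy-tail (l) (funcall (cdr l)))

    (defun naturals-from (n)
      (lazy-cons n (lambda () (naturals-from (1+ n)))))

    ;; lazy-map captures FN when it is called, so later stack
    ;; activity can't change what gets mapped.
    (defun lazy-map (fn l)
      (lazy-cons (funcall fn (lazy-head l))
                 (lambda () (lazy-map fn (lazy-tail l)))))

    (defun take-n (n l)
      (if (zerop n)
          '()
          (cons (lazy-head l) (take-n (1- n) (lazy-tail l)))))

    ;; (take-n 5 (lazy-map (lambda (x) (+ x 1)) (naturals-from 0)))
    ;; => (1 2 3 4 5)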

I guess one other difference is that functions here are typed, but that's not really a fundamental difference with other concatenative languages.

As for F and XY, I haven't looked into them much. A cursory glance, however, leads me to believe that even when they're pushed to their limits, the arithmetic-encoded rhoScript programs will be significantly shorter. The 22 different tokens in the 8-queens program condense into 17 bytes. (And this assumes that all functions of a valid type are equally likely to be called -- I'd expect most programs to get shorter once I spend some time figuring out reasonable weights for the encoder.) That said, I definitely see some things I'm going to steal from those languages.


How are you doing the concatenation typing? Is it like Cat's type system?

http://cat-language.com/


It's very similar. There is a major difference in how functions are typed, however: namely, in Cat they are, and in this they are not. There's a good reason for that, though!

Imagine you write the program "(add print) call". Clearly that function takes two integer arguments. Right?

Well, that's only because you're looking at the high-level version. Compile it down and you get AE 1D B0 88. If you run it on two integers, you get the sum, as expected.

But what happens if you run it with a list on the stack? Then it happens to run the program "(arg-min arg-a) call". Why? The arithmetic-decoder just sees some bytes, and it decodes them to match the correct types.

Representing the actual type of functions would require more bits and make programs longer, so I haven't implemented that. I'm still pondering what to do here.
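
To make the byte-decoding idea concrete, here is a toy sketch only -- the operation names and type signatures below are made up and this is not the real arithmetic coder. The point is that the decoder looks at the types currently on the stack and picks among the operations valid for them, so the same byte can mean different things:

    ;; Toy sketch of type-directed decoding -- not the real arithmetic coder.
    ;; The operation names and type signatures below are made up.
    (defparameter *ops*
      '((add      (int int) (int))
        (subtract (int int) (int))
        (arg-a    (list)    (list))
        (arg-min  (list)    (int))))

    (defun candidates (type-stack)
      "Operations whose input types match the top of TYPE-STACK."
      (remove-if-not
       (lambda (op)
         (let ((inputs (second op)))
           (equal inputs
                  (subseq type-stack 0 (min (length inputs)
                                            (length type-stack))))))
       *ops*))

    (defun decode-byte (byte type-stack)
      "Interpret BYTE as an index into the operations valid for TYPE-STACK."
      (let ((cands (candidates type-stack)))
        (first (nth (mod byte (length cands)) cands))))

    ;; The same byte decodes to different operations depending on the types:
    ;; (decode-byte 0 '(int int)) => ADD
    ;; (decode-byte 0 '(list))    => ARG-A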


Coming from a Haskell background, I want to suggest typeclass mechanisms to enable valid polymorphic types. The compiled form can be the same, but the classes would help document that the high-level syntax has that kind of polymorphic behavior.


This is the first time I've seen dynamic typing arise from keeping a system simple, as opposed to from making the implementation more complex.

Beautiful!


IMO a typed stack language does not make much sense. If the stack effect is typed, a function always has to consume the same number of arguments and return the same number of values. IMO that takes away by far the biggest advantage of postfix languages.


Dynamic stack effects are pretty hard to reason about in nontrivial programs. Even Factor, which is largely dynamic wrt typing and dispatch, has a very limited number of words with dynamic stack effect.

Also, static types don’t necessarily preclude dynamic stack effects, as long as those effects can be characterised somehow by the type system. In my statically typed concatenative language, Kitten, I originally considered a notion of “regular” types that would let you do something like this:

    { :: r -> r End
    } :: r End a* -> r [a]

    { 1 2 3 } :: [Int]
In practice that proved to be too much of an implementation headache for not much benefit. Fixed arities also let you avoid sentinel values (as above) and parentheses (as in Lisps).


Factor's typing discipline is largely dynamic. However, they enforce the usage of annotations for higher order operations (recursive and inline) and strongly encourage the use of stack effect annotations. A stack checker, which is effectively a simple type system, proved pretty useful for debugging & optimizing.

See: http://docs.factorcode.org/content/article-inference.html


I do not have much experience with postfix languages. I mostly use RPL for simple ad-hoc programs, and I use dynamic stack effects all the time. Mostly for logging and for error messages. In the "real world" one would not do that, so maybe I'm overestimating it.


Factor supports both dynamic and statically-unbalanced stack effects, but the former risks destabilizing the runtime and the latter requires that defunctionalization (read: inlining) ultimately resolve to a static, balanced stack effect. It's all documented quite well in subpages of that article I linked you to.

In practice, all of the unbalanced stack-effects will become trivially balanced at word boundaries & dynamic stack-effects only occur when metaprogramming.


1. I am new to the concepts of code golf and stack-based languages, but I find the concept, and the implementation, quite compelling. Thank you for doing this - it is thought-provoking.

2. I was unable to get programs to run via http://www.rhoscript.com/try.html. For example:

   >>> "hello" print run
   Sending to server.
   Received response in 237ms.

   >>>
I would expect the output to be "hello" but instead get what I assume are 3 newlines.

3. There is a small error in the text of your tutorial, at rhoscript.com/tutorial.html:

    s/split-by-newlines results in a string with four elements/split-by-spaces results in a string with four elements/


re: the try.html page: don't include the "run" word. That's just a button I put on the pages to send you there automatically.

    "hello" print


Ah, ok. That answers another question I had which was why you added a run keyword at all when it was so repetitive! I'd left out the run keyword before, but was confronted with the LISP output - which I assumed was something like an "explain plan" on a database, and I didn't read the stack dump closely enough.

Another issue: I wanted to see the output of just range. So I typed in `8 range` and got no output. Then I looked through the examples for something simple that might output a list. I tried the primes example, `100 range-from-1 rest (gcd 1 neq) uniq-by` but got blank output. Thoughts?


Okay, I had an issue with www.rhoscript.com and rhoscript.com causing XHR errors.


I have always thought that a language optimized for code golf, like this one, would be ideal for evolving programs. Use genetic algorithms to modify possible program sources (make sure your ways of combining programs include concatenating snippets of them) and evaluate the result to see whether it achieves the desired goal (within the limits imposed by the halting problem, which make it impossible to be certain). It would be a way (probably quite inefficient) of writing programs for any task where we can measure whether the task is at least partially performed.

Have you ever considered anything like this?
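
A very rough sketch of the sort of loop I have in mind, in Lisp. EVALUATE-FITNESS here is hypothetical: it would run the candidate with a timeout (since halting can't be decided) and score how much of the task it gets right.

    ;; Rough sketch only. EVALUATE-FITNESS is hypothetical: run the
    ;; candidate with a timeout and score how much of the task it solves.
    (defun crossover (a b)
      "Combine two token lists: a random prefix of A plus a random suffix of B."
      (append (subseq a 0 (random (1+ (length a))))
              (subseq b (random (1+ (length b))))))

    (defun next-generation (population size)
      "Keep the SIZE fittest programs among the parents and their offspring."
      (let ((offspring
              (loop repeat size
                    collect (crossover (nth (random (length population)) population)
                                       (nth (random (length population)) population)))))
        (subseq (sort (append population offspring) #'> :key #'evaluate-fitness)
                0 size)))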


I submitted a link to this yesterday, but realized I had misconfigured nginx/sbcl and so the "try" page ended up almost crashing the server, so I took the link down a few minutes later. I'm resubmitting now.

If anyone has feedback let me know!


This isn't really substantive feedback about the language, but this is a magnificent bit of fun, even if I can't figure out any time I'd want to use it :)


Is there support for pretty-printed input?

I'm hesitant to mention this since it doesn't really take away from your project, but "concatinatitive" should be spelled "concatenative".


Not yet, that's definitely a priority though.

Well that's embarrassing! (fixed)


I'm a big fan of my HP-28 calc and its RPL. The programming language might seem odd, but on a device with (soft) keys that map to commands it makes sense.

I was thinking about building my own physical calculator (I don't like to use smartphones for that; there is not nearly enough space for keys, and the keys don't give appropriate feedback). I was looking for languages to use and I didn't find anything suitable. Forth was too raw. I didn't like Factor; it seemed unsuitable for a calculator. OpenRPL was, I don't know why, odd. This language, once it adds other number types and allows omitting the map, APL style, looks very promising.

But writing whole calculator software, with the menus, the integrated programming language, the ability to plot functions, inspect variables and so on, turned out not to be the weekend project I first thought, because the language has to be integrated deeply.


How does it compare to APL/K+/J for some common problems?


(By "common problems" I assume you mean "common codegolf problems" and not "common issues programmers encounter using it" -- if you do mean the latter let me know.)

The best compilation of code-golfing problems I've seen is Anarchy Golf [1]. Pick almost any problem you want, implement it in rhoScript naively, and you'll get a much shorter solution. (Except for those problems which need things not currently present, like, say, floating point numbers.)

If you look at some of the examples I have compared to the programs others have written, you'll get a brief sense. My eight-queens is 17 bytes, the shortest on Anarchy Golf [2] is 42. A solution to the 196 algorithm problem [3] in rhoScript takes 16 bytes compared to 24. But that's just cherrypicking problems I found interesting. I haven't yet done a real random sample to see how well it does.

Note, however, that the programs you write in rhoScript don't yet format the output to conform correctly, so they may require a few more bytes for that.

    [1] http://golf.shinh.org/
    [2] http://golf.shinh.org/p.rb?N+Queens
    [3] http://golf.shinh.org/p.rb?196+algorithm


I'd like to see a true apples-to-apples comparison (you can also save characters in J if you simplify the output requirements).


Yeah, I hope at some point to do that. I expect only a few bytes need to be added to make the outputs exactly identical. Almost all code-golf programs will produce correct output under the rule: if a 1-d list is on the top of the stack, dump it joined on '\n'; if a 2-d list is on the top of the stack, dump it joined on ' ' first and then on '\n'.
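
In Lisp, that rule might look roughly like this (a sketch, not the actual rhoScript printer):

    ;; Sketch of that output rule -- not the actual rhoScript printer.
    (defun dump-top (x)
      (cond ((and (consp x) (consp (first x)))        ; 2-d list
             (format t "~{~{~a~^ ~}~^~%~}~%" x))
            ((listp x)                                ; 1-d list
             (format t "~{~a~^~%~}~%" x))
            (t
             (format t "~a~%" x))))

    ;; (dump-top '(1 2 3))       prints 1, 2, 3 on separate lines
    ;; (dump-top '((1 2) (3 4))) prints "1 2" then "3 4"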


K notation is a bit longer: generalized n queen with board display is

qn:{[n],/{:[n=#x;,x;,/_f'f x,\:/:(!n)_dvl,/x]}'(f:0 -1 1+/:)@!n} bd:{[p]`0:".Q"p=\:!#p}

(see http://nsl.com/papers/qn.htm)

I suspect that a binary length-optimized encoding of K will be comparably short.

And take into account that K was not designed for code golf - it's just a side effect of optimizing for orthogonality.


> qn:{[n],/{:[n=#x;,x;,/_f'f x,\:/:(!n)_dvl,/x]}'(f:0 -1 1+/:)@!n} bd:{[p]`0:".Q"p=\:!#p}

One of the reasons golf never interested me is that people kept making these totally inscrutable concoctions that you couldn't even take a "squinty eyed look" at. Contrast this with the rhoScript solution:

   8 range permutations (with-index (sum) map uniq arg-a with-index (*exploding subtract) map uniq concatenate length) keep-maxes-by
Now, overall it's still not exactly straightforward, but you can at least understand pieces of it right away, and start inferring things. For example, '8 range' probably constructs a range of integers (it does). The other stuff can be broken apart and tested.

By contrast that piece of K code seems actively hostile to understanding it.


It's just notation. And much like math, it is very powerful, and not effectively verbalized. But just like math, it takes getting used to - and it's nearly impossible to go back once you grok it.


I'm a bit rusty but I think if you're going to include the board display function on the same line you'll need to separate with a semi-colon.

I'm also going to point out that whilst it looks pretty cool (and it runs very fast) it's a frikkin nightmare trying to debug someone else's K-code in production. Which is why I don't recommend that anyone learn it.


HN formatting got the best of me... It was two lines (you can follow the link I gave to see the original version)

I've found I can easily read Arthur's and Stevan's code. K takes a little more discipline (or talent) than other languages to be readable.


I made a translation visualisation example (using the given 8 queens example) at http://thinkinghard.com/correspondence/rhoscript/index.html.


This reminds me of GolfScript [1], another language designed specifically for code golf.

[1]: http://www.golfscript.com/golfscript/


>The last of the unimportant reasons is that we had never written a serious project in common lisp and wanted to get better at it.

How was your experience with coding in CL?


I found CL really pleasant. One nice feature of SBCL is that it can be incredibly fast. Sure, not as fast as raw C, but it approaches it. (This is useful when you're using CL as your IR.)

The macros were the big win for me with this. I have only a few macros, but without them the code would be significantly uglier. The entire builtin-generator is a big giant macro, and it was very nice to define my own little DSL to build commands. (Fun story: I originally started developing this as a Python interpreter, and abandoned it because (a) it was far too slow, and (b) it was difficult to build a clean, abstract method of writing builtins.)

Other than the macros, the other big win was how easy it is to generate lisp code. There's no string munging, just (cons 'add cmds) and you're done.
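
A toy example of what I mean (my own illustration of the general code-as-data pattern, not the actual compiler):

    ;; Toy example, not the actual compiler: build the code as a list,
    ;; then hand it to COMPILE.
    (let* ((cmds '())                        ; generated code, built up as data
           (cmds (cons '(print x) cmds))     ; "emit" a print
           (cmds (cons '(* x 2) cmds)))      ; "emit" a doubling; last form is the result
      (funcall (compile nil `(lambda (x) ,@(reverse cmds)))
               21))                          ; prints 21, returns 42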

There are some things I don't like about it. I think Scheme has it right with functions and variables sharing one namespace. I find the common lisp

    (defun something (a)
        (map 'list #'square (funcall a)))
much uglier than the scheme equivalent of

    (define (something fn)
        (map square (fn)))
but that's purely cosmetic.


I am not the author, but I write most of my grad-student research-project-quality code in CL. Here are some things I have experienced so far:

1. Pure dynamic typing is a headache. I use type hints quite frequently, tend to use defstruct rather than cons, etc. SBCL has a pretty good type inference system and gives very useful compile-time feedback about types. (A small example of what I mean follows this list.)

2. Macros can help improve organization and code re-use when used properly. Macros can also turn code into an unreadable mess when used improperly. The only way to learn what is "proper" is experience / practice; books like On Lisp and Let over Lambda help but you really do need to get your hands dirty.

3. Debugging across an FFI is really painful. The debugging tools for Lisp are fantastic, the debugging tools for C are fantastic, but those tools compose poorly.

4. I wish the standard allowed map, reduce, and similar constructs to be defined for arbitrary sequence types. There is more to life than lists and arrays.

5. I wish the standard came with more useful data structures. Associative lists in CL are great if you never have more than a tiny number of items in them; an associative tree type would be a great addition. It would also be nice to have a standard priority queue type, a standard random-access list type, etc.

6. Performance is easy to underestimate. SBCL does a bang-up job of optimizing code, with a few exceptions. The optimizer in SBCL could use some work, of course, but it is nowhere near as bad as people seem to expect for a high-level language.
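
To illustrate the type hints from point 1, here is a toy example of my own (the struct and function names are made up): a typed defstruct slot plus an ftype declaration gives SBCL enough to warn at compile time.

    ;; Small illustration of the kind of type hints I mean.
    (defstruct point
      (x 0.0d0 :type double-float)
      (y 0.0d0 :type double-float))

    (declaim (ftype (function (point point) double-float) squared-distance))
    (defun squared-distance (a b)
      (declare (type point a b))
      (let ((dx (- (point-x a) (point-x b)))
            (dy (- (point-y a) (point-y b))))
        (+ (* dx dx) (* dy dy))))

    ;; SBCL will flag a call like (squared-distance 1 2) at compile time.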

Overall, I think what CL is desperately in need of is an updated standard. The world has changed a lot since CLtL2; we need a CLtL3. There are a lot of non-standard libraries that are widely used and that should just be standardized -- MOP, generic sequences, an FFI, etc.


> The debugging tools for Lisp are fantastic.

What is your setup for debugging CL?


Does this go against the spirit of code golf?


No. Code golfers love bending rules. See http://esolangs.org/wiki/HQ9%2B


Rho symbol isn't loading for me in Chrome, v.29


That is probably because you have no font installed with a rho in it.



