Show HN: PDF parsing in Racket, my weekend project (racket-lang.org)
101 points by gcr 1210 days ago | 28 comments



This is excellent. I recently discovered Racket, and it has become my language of choice for personal projects. It has been a revelation to me after a lengthy false start with Haskell that ultimately led nowhere.

Racket really deserves a lot more recognition and visibility: the language builds on the traditional strengths of Scheme and adds multiple compatible and innovative language dialects, first-rate documentation, extensive libraries, a great IDE and other tools, along with an active and enthusiastic community.

I really hope Racket grows in popularity and we keep seeing more stories like this, as the developers deserve praise for what they have created. And of course, HN is built on it too!

-----


Racket really is an awesome language. When you want a LISP that you can optimize to get near-C performance, go with Common Lisp. When you want a LISP that does concurrency more concisely than Scala or Erlang, go with Clojure. But for literally anything else, you probably want Racket. Don't forget to add its ridiculously awesome continuation-based web server to your above description!

-----


These days Racket is on a similar performance level as many lisps, and it also has three kinds of concurrency for you to choose from.

-----


What kind of difficulties did you encounter when you tried Haskell?

-----


Ultimately, I never reached the point where I could build anything much beyond toy programs. Haskell is a beautiful language full of great ideas, and I certainly learned a huge amount from wrestling with it, but eventually I decided that I wasn't really progressing as I wanted, and most importantly I was getting frustrated rather than having fun (e.g. constantly fighting against the type system and the cryptic error messages it produces).

I don't have a CompSci background, which may have made things more difficult, but I know Haskellites who are computer scientists, and have stronger mathematical backgrounds than I have, and many of those guys told me it took months of effort and several attempts before they were truly comfortable with the language.

In contrast, Racket feels like it's on my side, like it will help me to grow as a programmer rather than forcing me to grow on my own. I was able to start writing useful (but simple) stuff almost immediately, but at the same time there's plenty of scope to learn more sophisticated techniques and approaches later on. There's even a typed Racket dialect (http://docs.racket-lang.org/ts-guide/index.html)!

I may look at Haskell again in the future, but at the moment I'm very happy with Racket.

-----


The phrase "fighting against the type system" usually indicates some misconceptions about the language -- the approach is to program with the type system as a tool to help you get the right code written, sooner. Approach it more as a thorough personal assistant, than a mean boss.

-----


The learning curve of Haskell can indeed be daunting.

It took me about 4-5 months to get comfortable with Haskell, and a couple of years to make it the language I'm most comfortable in.

GHC type errors can also be very frustrating.

But I agree with dons that unless you're doing very advanced "type hackery" at the edges of what it can do, you really aren't "fighting the type system", just fighting the difficult error messages.

I do find that months of investment in a crucial tool such as a programming language are worth it for anyone with a career in programming.

I can testify that with time, the number of type errors becomes much smaller, and that you start to smell GHC error messages a mile away :) That is, when you see an error message, the cause is usually immediately clear, simply because you have seen that same error message many times.

On the rare occasions when I do struggle with a type error, it turns out to be a real bug that in many cases would have been more frustrating to debug at run time.

-----


The learning curve of Haskell can indeed be daunting.

It took me about 4-5 months to get comfortable with Haskell, and a couple of years to make it the language I'm most comfortable in.

I can see your point and I know that lots of people are doing great things with Haskell, but I simply made the assessment that for me, right now, I don't want to put in hundreds more hours of effort into learning Haskell, just to see if maybe I can finally get to the point of achieving enlightenment and maybe even build something useful one day.

That's why I'm so excited about discovering Racket; at first, I simply thought of it as a stop-gap on the road back to Haskell. But as I've learned more about it, I see that Racket has got so much to offer in its own right that I don't necessarily need to go back at all; I'm achieving enlightenment already!

I do find that months of investment in a crucial tool such as a programming language are worth it for anyone with a career in programming.

I agree; I'm going to put that effort into Racket. I'm a geneticist who codes a lot at work. Perl, R, shell and a bit of C remain my workhorse languages for this, but I've gradually developed an interest in programming more generally, and I'm already building more and more new stuff in Racket while taking a more rigorous approach to the design process.

-----


Have fun with Racket! :-)

-----


As a sort of beginner to functional programming, could you explain to me why you chose a functional language to do this sort of thing? Just for fun? I love it but I can't see the benefits of functional programming.

-----


Believe it or not, this library grew out of a direct need of mine. :)

One thing that I've been using Racket for is to make research posters for conferences. Racket has an excellent library for functional picture/slideshow composition; you can read about that here (which doubles as a great intro to racket in general): http://docs.racket-lang.org/quick/index.html

It's sort of like a "LaTeX for pictures", where you can say

  (vc-append (square 10) (circle 10))
to have a 10px square sitting on top of a 10px circle (vertical, centered). Once you build your poster this way, you can save it as a PDF. This is great for having perfectly aligned blocks of text sitting in perfectly spaced columns, for example. It's much better than fiddling with the layout manually in PowerPoint.
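For anyone who wants to try the snippet above, here is a minimal self-contained sketch. It assumes the `pict` library that ships with current Racket distributions (older releases, as far as I recall, exposed the same functions through `slideshow/pict`). Note that `square` is not a pict built-in; it is defined by hand here, the same way the Quick tutorial linked above defines it.

```racket
#lang racket
(require pict)

;; `square` is a helper, not part of the pict library.
;; The Quick tutorial defines it in terms of filled-rectangle.
(define (square n)
  (filled-rectangle n n))

;; vc-append stacks picts vertically and centers them horizontally.
(define stacked
  (vc-append (square 10) (circle 10)))

;; Picts are ordinary values, so the combined size can be inspected:
;; the width is that of the widest child, the height is the sum.
(define stacked-width (pict-width stacked))
(define stacked-height (pict-height stacked))
```

Evaluating `stacked` in DrRacket's interactions window should display the picture itself, which is what makes this style convenient for laying out a poster incrementally.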

However, in designing my poster, I have to include PDF figures. Racket didn't include a way of rendering PDFs, so in my last poster, I had to use 600DPI bitmaps of my figures, which was slow and made the file terribly huge. This library binds to libpoppler, which is great because Racket's native pictures are Cairo-backed anyway, and Racket's FFI is top notch (once you can figure it out). Now I can use the usual functional composition to add these PDF figures to the rest of my poster.

-----


It's not a functional programming language. It isn't Haskell or Coq. It's Racket, a derivative of Scheme, which happily supports mutable or immutable state, monads or continuations, imperative, procedural, functional, object-oriented, or logical programming.

-----


It's not a PURE functional language, but it's certainly functional.

-----


That just becomes a debate about the definition of a functional language. The point is that unlike Haskell, SML, and OCaml (the last two aren't "pure" in the sense of syntactically enforcing referential transparency like Haskell, but they strongly encourage a functional design), Racket is first and foremost a LISP. You can use it to write and make use of a lot of functional abstractions, and it certainly is better suited to that style of programming than C or Java is, but Python and Ruby are just about equally suited for that style of programming, and they even have equivalents of map, filter, fold, lambda, etc.

Racket code doesn't look like Haskell, the MLs, or even dynamically typed "functional" languages (like Erlang). It looks like LISP. What drives your coding is not functional abstractions or object-oriented data structures (which it can do equally well) or anything like that. What drives your coding is the fact that syntax itself is a first-class data structure. You have access to the reader. You can write macros that adapt the language to anything you want. You can write a DSL in a few hundred lines that might save you tens of thousands of lines.

Now, Racket is certainly good for functional programming. In fact, some Racket developers prefer the Scheme-style tail recursion method of iteration (via the named let or letrec) to the looping constructs provided in the library, even when for loops would be just as effective. In the same way, not all Common Lisp programmers like the loop macro, and some (e.g. pg) actually use Common Lisp in a style that resembles functional programming. However, don't think of Racket as a functional language. That's as misleading as calling C++ a procedural one, even though you could write all your code C-style without ever using objects. Racket is a LISP, which means it can be adapted to fit virtually any paradigm. Racket is far closer to Common Lisp and Clojure than it is to literally any non-LISP.
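To make the named-let versus library-loop comparison concrete, here is a small sketch that sums a list both ways. The function names `sum-named-let` and `sum-for` are made up for illustration; `for/fold` and `in-list` are standard Racket forms.

```racket
#lang racket

;; Scheme-style iteration: a named let whose recursive call is in
;; tail position, so it runs in constant stack space.
(define (sum-named-let xs)
  (let loop ([xs xs] [acc 0])
    (if (null? xs)
        acc
        (loop (cdr xs) (+ acc (car xs))))))

;; The same computation with one of Racket's looping constructs.
(define (sum-for xs)
  (for/fold ([acc 0]) ([x (in-list xs)])
    (+ acc x)))
```

Both versions thread an accumulator through the iteration without mutating anything, so choosing between them is largely a matter of taste.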

-----


All of this sounds like you're one of these people who see parens and run away screaming "LISP!". Yes, the default Racket syntax uses S-expressions, but concluding that it's in some way lumped with Emacs Lisp is extremely wrong. And yes, Racket has syntax that can be tweaked using macros -- but it's a far stretch to go from this to that being the thing that drives all Racket coding. After all, OCaml now has CamlP4 as something that is an integral part of the language -- does that mean that meta-programming is now the thing that drives OCaml coding???

Another point: yes, Racket programmers know and use tail-calls, but that has nothing to do with "the looping constructs provided in the library" since those are implemented in terms of the same facility. The existence of these loops is therefore not making the language any less functional than the fact that you can implement a while loop in Haskell. The bottom line is that Racket is as functional a language as the interpretation of the term was before Haskell kidnapped it and turned it into some religious point.

(BTW, if you want to bash lisps, do yourself a favor and drop the all-caps "LISP" -- it immediately demonstrates the kind of limited knowledge you have on it.)

-----


I strongly disagree. The vast majority of Racket code I've looked at would be far easier to translate into, say, Common Lisp than Haskell (or Erlang, which is probably a more illustrative example since it has dynamic typing). Emacs Lisp is pretty different because it doesn't have lexical scoping and it's only really used in the Emacs runtime environment (although it resembles Common Lisp syntactically), but programs written in Racket, Common Lisp, and Clojure (and other LISP dialects that support lexical scoping and macros) tend to be way more similar to each other than are programs written in Racket and languages we traditionally think of as functional. If you're still not convinced, I'm sure a bit of digging online would reveal way more crossover between Racket and other LISP dialects by the same developers than you get with Haskell or ML developers crossing over to Racket.

As for your parenthetical add-on, I'm not bashing LISPs at all. I think Racket is a beautiful language, in part precisely because it has the power of a LISP dialect. As for your second point, the capitalized form is the only unambiguous word that refers to the family, because many developers in the Common Lisp community use Lisp to mean CL. The reddit-style "do yourself a favor" and ill-formed judgments about people's "limited knowledge" are way less constructive than asking "why did you use that spelling." Please keep discussions on HN objective and civil.

-----


Um, when I write code in Typed Racket (and I have a whole course using it), the code tends to be much more similar to ML than to conventional Lisps. When I write code in Lazy Racket, it is somewhat like a dynamically typed version of Haskell, and unsurprisingly not too similar to other Lisps. Same goes for a whole bunch of stuff. Another random example: there's a whole library of functional data structures that is based on Okasaki's book.

In fact, if you want to focus on macros as some kind of a driving force for code -- then in that exact aspect (a) macros in Racket can be very different from macros in other Lisps; (b) more than that, there are many kinds of macros in Racket that you cannot write in those Lisps. As for digging on-line for crossover code from other Lisps that finds its way into Racket: you'll obviously find a lot of Scheme code, but practically nothing from other Lisps. The bottom line is that the syntactic "lots of parens" similarity is an extremely shallow one.

Bottom line: Racket is roughly at the same level of a "functional programming language" as ML etc, certainly more than Python and Ruby where side-effects are embraced much quicker. Like you said, "even have equivalents of map..." -- whereas in Racket these kinds of functional/non-destructive operations are expected. (For example, the Racket GC is tuned to perform well when allocating lots of short-term objects, something that is a direct result of FP being the most dominant style.)

And yes, I know that you're not bashing Lisp -- you're just quick to lump all Lisps on the same pile, and reach the obviously bogus conclusion that first-class syntax is the thing that drives code. (That's a point that is subjectively obvious to me, as someone who has been in this part of the PL world for more than two decades.) "LISP" is, BTW, just an outdated spelling, period. It's true that in CL circles "Lisp" is taken as implicitly meaning "Common Lisp", but in the same circles "LISP" is taken implicitly as "an outdated spelling for Lisp, therefore Common Lisp" unless you're one of the old farts who's making a reference to LISP 1.5 or something as ancient.

-----


Aside from the fact that I still don't think that "a Lisp" is a thing, Racket is not bound to s-expr syntax. The racklog/prolog #langs make that very clear.

-----


I'm talking about the racket #lang, not the collection of all languages that the distribution can interpret. R5RS, for example, is included in the distribution, but the R5RS found in the Racket distribution is identical to that of any other Scheme interpreter or compiler, so discussion about its features has nothing to do with discussions specific to Racket. There are also experimental variations of the main racket language like typed racket and lazy racket, but I don't have enough familiarity with them to discuss their classification. I'd assume lazy racket is mainly used in a functional style. But anyway, the main Racket language is very much an s-expression based language (the typed and lazy varieties are as well). What do you mean by not believing in LISP as a classification of languages? Are you saying you don't think s-expression based languages are at least as similar to each other as, say, OOP languages or logic languages are to each other?

-----


What do you mean by not believing in LISP as a classification of languages? Are you saying you don't think s-expression based languages are at least as similar to each other as, say, OOP languages or logic languages are to each other?

I'm saying exactly that. Qi/Shen are basically Haskell with parentheses but because of those parens we call them Lisps. My Scheme->x86 compiler uses an s-expr assembly language as its final IR. That language has about as much in common with Lisp as C does.

-----


Racket also has an Algol mode :-P

-----


It also has a dataflow language, a typed language, a lazy language, etc. Racket is not one language, it's a language family platform (or language laboratory).

Matthias summarized this pretty well[1].

[1] http://www.ccs.neu.edu/home/matthias/Thoughts/Racket_is____....

-----


I found that essay most enjoyable. Thank you for the link, mate! :-)

-----


parsing is absolutely the sort of problem for which functional programming excels. if you think about it, there is no time dependence, no need to respond to external inputs, no concurrent access to mutable data, none of the things that would make pure functional programming a constraint rather than a help. all you are doing is writing one conceptual function of the form output = f(input), and that transformation is in turn made up of smaller transformations that can be written and tested independently, and then composed together to build up the solution.
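as a rough illustration of the output = f(input) idea, here is a toy parser for "key=value" lines, built from two small stages and composed into one function. the input format and the names `text->lines`, `line->pair` and `parse` are invented for this sketch; it has nothing to do with how the pdf library itself works.

```racket
#lang racket

;; Stage 1: raw text -> list of trimmed, non-empty lines.
(define (text->lines text)
  (filter (lambda (s) (not (string=? s "")))
          (map string-trim (string-split text "\n"))))

;; Stage 2: one "key=value" line -> a (key . value) pair.
(define (line->pair line)
  (match (string-split line "=")
    [(list k v) (cons (string-trim k) (string-trim v))]
    [_ (error 'line->pair "malformed line: ~a" line)]))

;; The whole parser is just the composition of the stages.
;; compose1 applies the rightmost function first.
(define parse
  (compose1 (lambda (lines) (map line->pair lines))
            text->lines))
```

each stage can be tested on its own (for example, `line->pair` on a single string) before the pieces are composed, and no stage depends on hidden state.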

now it might still be a little harder to do this sort of thing in haskell, which is aggressively pure, simply because some algorithms are pure from the outside but have internal steps that involve mutating data for efficiency. but racket is not a pure functional language; if you need to, say, transform an array in place rather than take an array and return a new one, it will not stand in your way. the difference between racket and, say, java, is not that it enforces functional over imperative programming, but that it makes functional programming a lot easier, and fp has a lot of powerful tools in its toolbox for tackling this class of problem.
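a tiny sketch of that last point, using `vector-map` and `vector-map!` from racket/vector (both are included in `#lang racket`). the variable names are just for illustration.

```racket
#lang racket

(define v (vector 1 2 3))

;; Functional style: build a new vector and leave `v` untouched.
(define doubled
  (vector-map (lambda (x) (* 2 x)) v))

;; Imperative style: overwrite the elements of `v` in place.
(vector-map! add1 v)
```

after this runs, `doubled` holds the doubled copy of the original elements while `v` itself has been incremented in place; the language allows either approach, and the choice is left to the programmer.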

-----


Did you try reading Why Functional Programming Matters [1] ?

[1] http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pd...

-----


Thanks for this!

-----


This is great of course, but I'd be happier to see the actual PDF renderer with a more permissive license than GPL.

-----


I would too, but

- To my knowledge, there are no sane C-based PDF renderers with a permissive BSD-like license

- Poppler renders to Cairo --- a _huge_ win because I don't actually have to do anything to convert Cairo surfaces to Racket's native drawing type

- I don't know how to implement my own PDF reader/parser from scratch, and I don't have the time to.

-----



