
Reading Files in Clojure - twampss
http://lethain.com/entry/2009/nov/15/reading-file-in-clojure/
======
yason
The difficult thing about file I/O in Clojure is that there are not many
Clojure primitives for doing it, except for slurp and duck-streams (which is
only in contrib): you actually have to learn a bit about Java and its
interfaces and libraries. For example, line-seq takes a BufferedReader.
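
A minimal sketch of what that interop looks like, assuming Clojure 1.2 or later (where spit is in clojure.core). The scratch file and the read-lines helper are made up for illustration:

```clojure
(import '(java.io BufferedReader FileReader File))

;; Hypothetical scratch file so the example is self-contained.
(def tmp (File/createTempFile "line-seq-demo" ".txt"))
(spit tmp "alpha\nbeta\ngamma\n")

(defn read-lines
  "Returns a vector of the lines in the file at path."
  [path]
  ;; line-seq wants a java.io.BufferedReader, so build one by hand.
  (with-open [rdr (BufferedReader. (FileReader. (str path)))]
    ;; line-seq is lazy; realize it before with-open closes the reader.
    (vec (line-seq rdr))))

(read-lines tmp)
```

The vec call matters: returning the lazy sequence itself would try to read from a reader that is already closed.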

Clojure deliberately embraces interop with Java rather than wrapping Java
features in Clojure functions, so it's part of the official pain.

Can't say it was as easy as in Python, but on the other hand, I can't say it
hurts to know a bit of Java, either. At least it's for a good cause :)

------
mahmud
Too much duct-tape, imo.

SPIT and SLURP are just one-argument functions without much control. If the
language were a bit more Lispy, they wouldn't even be standard functions, but
something left to the programmer. For example:

1) There is no control over input file character encoding.

2) No provision for reading from non-file streams; say you want to "slurp"
input from a socket stream, N octets or up to a certain EOF delimiter. You
have no control with Clojure in that regard, at least with SLURP/SPIT; you
will have to use Java modules with weird dot and bracket syntax.

3) What happens if the file you're attempting to read doesn't exist? What
happens if there is no input in the file yet? (Should you block, return an
error, or return EOF?)

4) What happens if the file you're writing to doesn't exist? create? err? What
if it exists? truncate? append? err?

Separate files, streams, sequences, and I/O methods from one another and you
will have a clean, user-extensible framework for I/O.

    
    
      (defgeneric slurp (stream
                         &key
                         element-type
                         external-format
                         if-exists
                         if-does-not-exist)
        (:documentation "Reads contents of element-type from stream"))
    

And now in haste, implement that for files, reading back character input in
the given external-format encoding.

    
    
      (defmethod slurp ((path pathname)
                        &key
                        (element-type 'character)
                        (external-format :default)
                        (if-exists :supersede)
                        (if-does-not-exist :create))
        (with-open-file (file path :element-type element-type
                                   :external-format external-format
                                   :if-exists if-exists
                                   :if-does-not-exist if-does-not-exist)
          ;; FILE-LENGTH may exceed the character count in multi-byte encodings,
          ;; so keep only the elements READ-SEQUENCE actually filled.
          (let ((buf (make-array (file-length file)
                                 :element-type element-type)))
            (subseq buf 0 (read-sequence buf file)))))
    
    

We have provided sane defaults for file existence conditions, and we can call
it like this:

    
    
      (slurp #p"/etc/passwd")   ;; read file in default encoding

      (slurp #p"/home/mahmud/sales/mideast2009Q2.txt" :external-format :utf-8)

      (slurp #p"/var/log/hunchentoot/access.log"
             :if-does-not-exist :error)
    
    

Then, you can just write more methods for any new stream you need to deal
with. You can refactor the GF signature to take length as an argument, then
for socket streams:

    
    
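      ;; socket-stream stands in for whatever socket stream class your
      ;; implementation or library provides; it is not a standard CL class.
      (defmethod slurp ((stream socket-stream) &key length &allow-other-keys)
        (let ((buf (make-array length :element-type '(unsigned-byte 8))))
          (read-sequence buf stream)
          buf))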
    
    

None of this is new ground, btw; the API has existed for 25+ years. Just USE
IT.

~~~
tayssir
I don't entirely agree. In CL, people pass around slurp-file snippets like:
[http://groups.google.com/group/comp.lang.lisp/msg/89501db253...](http://groups.google.com/group/comp.lang.lisp/msg/89501db25399ea73?hl=en)

Further, Clojure's slurp has an optional encoding argument. (I'm glancing at
its source code, in the same emacs buffer that I'm typing this post in.)
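
For illustration, a small sketch of that encoding argument in use. It assumes Clojure 1.2 or later, where the option is spelled :encoding (older releases accepted the encoding as a plain second argument); the scratch file and its contents are invented for the example:

```clojure
(import '(java.io File FileOutputStream))

;; Hypothetical scratch file so the example is self-contained.
(def tmp (File/createTempFile "slurp-enc-demo" ".txt"))

;; Write "café" as ISO-8859-1 bytes; é becomes the single byte 0xE9,
;; which is not valid UTF-8 on its own.
(with-open [out (FileOutputStream. tmp)]
  (.write out (.getBytes "caf\u00e9" "ISO-8859-1")))

;; Tell slurp which encoding to decode with instead of relying on the default.
(def decoded (slurp tmp :encoding "ISO-8859-1"))
```

Read back with the matching encoding, decoded is the original four-character string; what you get without the option depends on the default encoding slurp falls back to.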

If I understand your argument, slurp's too unlispy to include in the language.
But at least in the CL world, the user is constantly adding to Lisp as she
codes away. And there are many things in the CL spec which are often considered
unlispy. (Take loop and format. Loop is even criticized for being not
well-defined, as well as being incomplete in ways you wouldn't expect, as you
start pushing its limits. Though I should mention that I love both loop and
format.)

What you wrote was beautiful and more complete (I'm not being sarcastic; it
looks like the High CL style, which I consider very aesthetic and good, though
that's not the only aesthetic style I appreciate), but I think you're
comparing a Clojure feature with something which exists mostly as a CL snippet
passed around informally. (And is maybe now in some open-source library.)

