Hacker News new | past | comments | ask | show | jobs | submit | test9753's comments login

A Singly Linked List implementation in Terra (just testing out):

https://pastebin.com/xt3Wddtu

Output:

https://pastebin.com/KwSTiN8L


How does this compare to a wiki[1] ?

[1] https://en.wikipedia.org/wiki/Wiki



Beating Decades of Optimized C with 27 Lines of Ocaml:

  type t = { mutable words: int; mutable chars: int; mutable lines: int; mutable in_word: bool ; mutable finished: bool};;

  let () =
    match Core.Sys.argv with
    | [| prog_name |] -> Core.Printf.eprintf "Usage: %s file1 file2 ...\n" prog_name
    | _ -> (
      let args = Core.Array.slice Core.Sys.argv 1 @@ Core.Array.length Core.Sys.argv in
      let buf_size = 65536 in (* 64 KB -> Caml IO buffer size *)
      let buf = Core.Bytes.create buf_size in
      Core.Array.fold args ~init:() ~f:(fun _ file ->
        Core.In_channel.with_file file ~f:(fun in_ch ->
          let c = { words = 0; chars = 0; lines = 0; in_word = false; finished = false } in
          let set_words () = if c.in_word then c.words <- c.words + 1 in
          while not c.finished do
            let len = Core.In_channel.input in_ch ~buf ~pos:0 ~len:buf_size |> Core.Int.to_int in
            if len > 0 then (
              for i = 0 to (len - 1) do
                match (Core.Caml.Bytes.get buf i) with
                | ' ' | '\t' | '\r' -> (c.chars <- c.chars + 1; set_words (); c.in_word <- false)
                | '\n'              -> (c.chars <- c.chars + 1; set_words (); c.in_word <- false; c.lines <- c.lines + 1)
                | _                 -> (c.chars <- c.chars + 1; c.in_word <- true)
              done
            ) else ( c.finished <- true )
          done;
          set_words ();
          Core.Printf.printf "%s -> lines: %d, words: %d, characters: %d\n" file c.lines c.words c.chars)))
  ;;
Testing:

  $ ls -lh test_file.txt 
  -rw-r--r--  1 user  group   508M Oct 16 12:01 test_file.txt

  $ time wc test_file.txt 
  4863460 54621760 532480000 test_file.txt

  real 0m3.368s
  user 0m3.137s
  sys  0m0.195s

  #Ocaml version:
  $ time ./wcl.native test_file.txt 
  test_file.txt -> lines: 4863460, words: 54621760, characters: 532480000

  real 0m2.480s
  user 0m2.345s
  sys  0m0.123s


Oh this is fun! Beating Decades of Optimised C (best of three)

    $ time wc w.txt
     3156098 6312196 380648004 w.txt
    
    real 0m1.358s
    user 0m1.277s
    sys 0m0.066s
... with one line of q (best of three):

    q)\t g:{(sum[1<deltas where x]+sum[not x:max x],0),sum max x 0 1}0x0d0a2009=\:read1 `:w.txt
    783
    q)g
    380648004i
    6312196
    3156098i
783msec: almost twice as fast!


Won't this give an incorrect number of bytes for several consecutive whitespace characters? Why not using # to count the bytes instead?


in k, but with that same bug';

(b:#a;b++/,/a=/:" ";b+#,/a)


Here is one without the bug;

b:#a;c:,/a=/:" ";(b;b++/c@&~c&=':c;b+#,/a)

Now to see if I can check the performance.


We discussed this at length in the shakti mailing list [1]. This version by chrispsn is my favorite:

    {+/["\n"=x],+/[<':~x in "\n\r\t "],#x}
I think it's the one that more clearly express the intent. However, this other one by Attila Vrabecz performs better in the current version:

    {(+/x in"\n\n";+/0>':x in" \n\t\r";#x)}@1:"big.txt"
[1] https://groups.google.com/forum/?utm_medium=email&utm_source...


Nice, thanks for that!


That is quite beautiful. It may also have awoken Cthulhu, but that's a separate issue. :)


The system wc is multibyte friendly, the Ocaml version is not.


While I wouldn't be surpised that OCaml is fast, I'd like to be sure your disk cache was warm in both cases. I'd test it myself, but I don't have OCaml setup on my work computer.


I setup it on my computer, used it to count the lines of 250 files, it is indeed faster than my system's (ubuntu 18.04) wc, but it doesn't compute the total. I don't think this should change the timings much though.

A best of five run of the system's wc takes 0.042s, vs 0.028s for the OCaml version.


Nice.

I was going to say that it is unreadable, after spending a bit trying to understand it, (for someone that knows no ocaml) it actually is not. Only things I'm unsure are the ~ symbol and <-. What do they do?

edit: <- is just mutating assignment. I thought that ~ introduced named arguments but doesn't seem right.


~ does indeed denote named arguments. To call a function named f which has an argument named arg with arg set to the value x, the syntax is:

    f ~arg:x


But what about ~buf here:

  let len = Core.In_channel.input in_ch ~buf ~pos:0 ~len:buf_size


~buf is shorthand for ~buf:buf


Oh, nice.


A naive C version is also faster than the system wc.


Why was system wc implemented in a slower way than a naive C version?



Could somebody write this in Clojure? I'm too noob to do it myself. Will it be an order of magnitude slower?


Many languages will have counted a 1gb file before Clojure has even started up.


> Many languages will have counted a 1gb file before Clojure has even started up.

You mean before the JVM has started up. One could compile Clojure with GraalVM to eliminate startup time.


Not really, the initialization of the JVM is fast(er), but initializing the Clojure runtime took a lot of time. At least in 1.7, not sure if it got better in the meantime.

I could start and run Java programs way faster than just starting the Clojure REPL.


I wonder if most of these older standards could be replaced by new standards that use SQLite as the Application File Format [0], making life easier for everyone.

[0] https://www.sqlite.org/appfileformat.html


One approach to solve the slicing issue: https://play.rust-lang.org/?version=stable&mode=debug&editio...


Video of the penguin & the man: https://www.youtube.com/watch?v=oks2R4LqWtE


Reminded me of an old news article (2009): The dark side of Dubai [ https://www.independent.co.uk/voices/commentators/johann-har... ]


Purely functional data structures for OCaml, translated from Chris Okasaki's book "Purely Functional Data Structures":

https://github.com/mmottl/pure-fun


In Ocaml [LOC: 24] [Level: Beginner]

  open Core

  let () =
    if Array.length Sys.argv <> 5 then
      Printf.eprintf "Invalid args\nUsage: %s <input.csv> <column-name> <replacement-string> <output.csv>" Sys.argv.(0)
    else
      let input_file, column_name, replacement, output_file = Sys.argv.(1), Sys.argv.(2),Sys.argv.(3),Sys.argv.(4) in 
      In_channel.with_file input_file ~f:(fun in_ch ->
      match In_channel.input_line in_ch with 
      | None -> Printf.eprintf "Cannot read first line"
      | Some first_line ->
        let columns = String.split first_line ~on:',' in 
        match List.findi columns ~f:(fun _ col -> col = column_name) with 
        | None -> Printf.eprintf "Cannot find column: %s" column_name
        | Some (index, _) ->
          let _ = Out_channel.with_file output_file ~f:(fun out_ch ->
            Out_channel.output_string out_ch (first_line ^ "\n");
            In_channel.fold_lines in_ch ~init:() ~f:(fun _ line ->
              let in_columns = String.split line ~on:',' in 
              let out_columns = List.mapi ~f:(fun i col -> if i = index then replacement else col) in_columns in 
              let out_line = (String.concat ~sep:"," out_columns) ^ "\n" in
              Out_channel.output_string out_ch out_line
            )
          ) in ())


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: