Hacker News new | comments | show | ask | jobs | submit login
Things that Idris improves things over Haskell (deque.blog)
221 points by deque-blog 99 days ago | hide | past | web | 154 comments | favorite



Idris points the way to the future of programming.

A few examples that I find very intriguing:

1. type safe printf: https://github.com/mukeshtiwari/Idris/blob/master/Printf.idr

2. compile-time evidence that a runtime check will be performed: https://github.com/idris-lang/Idris-dev/blob/master/libs/bas...

3. Most of all, compile-time checking of state machine properties http://docs.idris-lang.org/en/latest/st/introduction.html

Think about being able to specify protocols in the type system and ensuring that your client and server meet the specifications. The example in that link is about user authentication. Imagine having a proof that the program can only get into a "LoggedIn" state by going through the authentication protocol.


> Think about being able to specify protocols in the type system and ensuring that your client and server meet the specifications. The example in that link is about user authentication. Imagine having a proof that the program can only get into a "LoggedIn" state by going through the authentication protocol.

Sounds like you are describing Session Types!

(http://groups.inf.ed.ac.uk/abcd/) (http://summerschool2016.behavioural-types.eu/programme/Dardh...)


I'm sure there's a branch of programming where that would be amazing, but in my experience, I've never had major bugs due to my client and server implementing their protocol wrong.

In fact, I question most of the value of formal verifications for non critical software. Most of the problematic bugs that make it through in my experience are either a complex combination of multuple parts of the system interacting together in a way that doesn't end up behaving the way it was intended, or programmers implementing the wrong thing on edge cases.


Have you ever tried to change the protocol used by a running production system? I agree that in greenfield development an agile / iterative approach with a loose process works reasonably well. Once you have a running production system with lots of interconnected components, change gets really hard and really scary. Tools that help manage change in those cases seem useful to me.

Re: formal verification in general, most of the research has been moving towards making it easier to use and giving programmers some of the benefits of verification for free. I don't have any problem with the compiler telling me that my code is going to crash before the crash happens in production. That's unquestionably a valuable thing. Idris (and other languages) are trying to give you those sorts of benefits without reducing productivity or breaking your brain. It might not be there yet, but the progress is exciting.

Finally, both of the problems you identified (complex interactions and edge cases) are things that formal techniques are well suited to address... so I'm not sure how they support your case. Again, the only problem seems to be that the cost outweighs the benefits. But the cost is decreasing quickly.


I have 8 years of industry experience and feel the exact opposite. When programming in Ruby I am forced to not fully express the concepts in my head. We do TDD but our tests are humorously thin; typically covering none, one, many. Idris proofs are tricky but more automated and powerful than writing tests. And I can still write tests in Idris.

To go beyond "protocols", UI form data to server response back to UI display in a web app is a source of major bugs. So many things are broken. Recently I had to add a spurious lap infant to book a flight due to JavaScript template being broken when rendering 0 children.


What happens if you try to book 9007199254740993 spurious lap infants? Just curious.

I'd love to QuickCheck the diapers out of this method.


spurious lap infant?



You seem to just be thinking about low level network protocols.

The idea of a protocol in this sense encompasses pretty much any communication between things. This includes web service calls, user input via a UI, transactions between multiple systems, or even modules within the same program.


I've encountered so many bugs because of mismatches between server and client APIs, it's not even funny. Even then it remains a persistent problem, as evidence with changing Thrift schemas forward.


For write once and forget type of code like a website implemented by a contractor or throwaway code in Jupyter notebooks, the formal verification has a negative value indeed. For anything complex under maintenance ability to see that code behaves as expected in the edge cases without running it is invaluable.


Most functional languages have a type safe printf / sprintf (i.e. OCaml, F#, Haskell) which is similar to the effect you get in Idris: the types required to apply the function are statically analyzed from the format string.

Edit (links):

OCaml: https://caml.inria.fr/pub/docs/manual-ocaml/libref/Printf.ht...

F#: https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual...

Haskell: https://hackage.haskell.org/package/base-4.9.1.0/docs/Text-P...


The Haskell version of printf does not statically analyse the format string and will raise an exception if the provided arguments are invalid e.g.

    Prelude Text.Printf> printf "%d\n" "invalid"
    *** Exception: printf: bad formatting char 'd'
The behaviour of F#'s printfn is hard-coded in the compiler and could not be written in F# iteself.


Aren't the Ocaml and F# examples relying on some dedicated compiler support for the format string type, though? The Idris example is interesting because its printf implementation can actually be written in Idris.


The OCaml implementation used to be hard-coded in the type system, but has now been rewritten (in 4.02, I think) using GADTs.

https://ocaml.org/meetings/ocaml/2013/proposals/formats-as-g...

https://caml.inria.fr/mantis/view.php?id=6017

The compiler only does syntactic desugaring of format strings now.


> the types required to apply the function are statically analyzed from the format string.

Not so, in that Haskell package:

    Prelude Text.Printf> printf "%d\n" 1
    1
    Prelude Text.Printf> printf "%d\n"
    *** Exception: printf: argument list ended prematurely
    Prelude Text.Printf> printf "%d\n" 0.5
    *** Exception: printf: bad formatting char 'd'
    Prelude Text.Printf> printf "%d\n" 1 2
    1
    *** Exception: printf: formatting string ended prematurely

It could certainly be done with TH/QQ. I'm not sure, offhand, if there is already a package to that effect.


There are several packages on hackage offering a type safe printf. Some use TH, some use variadic functions (defined with a type class) or other advanced type level features.


Variadic functions are used in the package under discussion, and don't get you compile time checking of the format string.


Sure, and I know about those. I wasn't claiming that this was only possible in Idris -- and I don't think their implementations are nearly as clean or straightforward as the one in Idris.

In Idris, the pattern used to implement this is available to anyone that knows the language. It feels a lot more natural than using a macro system or typeclasses -- it is just functions.


One could also do this in D as well. it's not that impressive.


Does anyone remember the language that wanted to be future of programming, making everything (API calls, lists on screen etc) inherit from 1 base class, and it used SQL-like searching for filtering on collections?

It had a browser based demo.


Eve? http://witheve.com/

The Eve core is based on a distributed logic programming semantics (although SQL-like is a good accessible description). It was inspired by a datalog-based language called Dedalus (amongst other things).


Yes, thank you!


No but tell me if you remember.


Fortress?


Depressing thing to Google. Seems like an interesting project that couldn't keep it's funding in volatile times. It might have been Swift or F# if developed at a better company.


they did the best job of operator associativity and precedence i've ever seen. still hoping some other languages pick up on it. http://lambda-the-ultimate.org/node/2943#comment-43411


Whitespace-implied parenthesis.

    1+2 * 3 == 9
Sick! I wish Haskell notices this.


It's Eve


A haskell version of 1) is talked about here: https://www.reddit.com/r/haskell/comments/55bvt4/typesafe_pr... It uses the "singletons" library (no relation to the singleton design pattern) that emulates some features of dependent types.


If anyone is having trouble understanding the raw code linked above, the official book[1] introduces all three of those concepts gently and intuitively. I highly recommend it!

[1] https://www.manning.com/books/type-driven-development-with-i...


The book is written by the creator of the Idris language. I highly recommend it to those interested in type level programming as well.


Re typesafe printf, I assume the compiler can produce efficient code, since the pattern matching occurs at compile time?

Also, if you have an arbitrary string that isn't a constant that can be analyzed statically — for example, a string read from a config file — how do you prove to the compiler that it's a valid format-code string?


the compiler knows the failure modes. if the file doesn't exist, the printf isn't reachable. if the file isn't at least X bytes long, the printf isnt' reachable if bytes 10-20 can't be parsed as a number the printf isn't reachable.

it's just showing if you get through these 50 checks, by the time you get to printf the bytes will have been converted to what you need, or the program will have already errored out.


Any string is a valid format string.


I mean the combination of format string + values.

Anyway, is that really true? "%-1s" should probably give you an error, for example.


Type safe printf has been around for more than a decade, going all the way back to at least Groovy.


It is weird to me that so many people are getting caught up on the fact that type safe printf has been around forever. If someone has looked into Idris they almost certainly know that it isn't a new concept.

I linked it because its implementation is amazing comparatively and it is just one clean example of what is possible.


It's been almost a decade since I used Apache Groovy, but I don't remember its printf being type safe. Its method signature is

  void Object.printf(String, Object)
and it evaluates types at runtime.


I used it in OCaml well before there was a Groovy. But what's interesting here is that you get it without compiler special casing and without macros, as a natural consequence of how the type system works - in a way that generalizes to other problems.


>without macros

Type systems are macro systems, it's just not obvious because type languages are very different from expression languages. The main "innovation" of Idris is hammering harder on this front. And in any case, the fact that most type languages are logic languages gives people with a formalist bent a warm feeling in their stomachs.


There's a valid perspective from which type systems are particularly-constrained macro systems. That doesn't mean there aren't contexts where it's useful and/or interesting to distinguish. But if you wish, pretend I had written "traditionally conceived general purpose macros" :)


I know clang will not let you make a mistake, but it must be some kluge since the C language clearly doesn't anticipate that.


and then you need to log in a VIP user or a debug user without auth but cant produce a proof so now it wont compile at all. not a problem i you own the code but in client/server a third party could be the client or server so will have to be very careful about placing proof obligations on code you dont own.


Admins; please consider editing the title, it has a redundant "things" that makes it hard to read and confusing.

The blog post's real title ("10 things Idris improved over Haskell") is better; unless that's a problem due to being a list (which are often spammy). Seems fine/serious to me, though.


I find it unfortunate how many languages treat strings as lists of chars for no clear benefit. This issue is apparent in dynamically typed languages where a function might expect either a string or a sequence, and now it all has to come down to type comparisons because both can be iterated. I think it'd be time to start treating them as atomic values instead.


The benefit is that you reuse the build-in type for lists instead of making a new one. What benefit would not treating strings as lists of chars bring?


Memory use: Unicode scalar values go up to 0x10ffff, which on most machines means a 32-bit value for each character. A UTF-8 representation can be less than 30% the size. And that's not even counting the fact that many languages (Haskell included) represent lists as a linked data structure, with the overhead of a pointer per list entry.

Correctness: you often don't want to operate on individual Unicode scalar values. Extended grapheme clusters can combine multiple scalar values to form a single human-readable character, and that's usually the unit you care about. Representing a string directly as a list of extended grapheme clusters would use even more memory.

Fundamentally, a string has more structure than a list representation gives you (encoded bytes vs. scalar values vs. grapheme clusters). I think it's better to expose this structure than it is to pretend a string is just a list of characters.


On the contrary, UTF-8 is the one that is long, up to 50% longer than UTF-32. (Unless you happen to have a disproportionate number of low code points.)

No free lunches!


That's UTF-16, not UTF-32.

UTF-8 is one to four bytes, UTF-16 is two or four bytes, and UTF-32 is always four bytes. For some code points, UTF-8 is 50% longer than UTF-16 (3 vs 2), but UTF-8 is never longer than UTF-32.


Sure, UTF-8 isn't always the shortest, but for many common strings (like JSON-encoded objects with ASCII keys) it is much shorter than UTF-32. The point is that using a list representation means you can't do any better than UTF-32, even if you wanted to.


If you have ASCII, might I recommend the ASCII character set and encoding?


Performance and efficiency. Just as many problems need a way to store a matrix of unboxed integers or floats, many problems need a way to treat strings, possibly very large strings in a way that doesn't include any language overhead/metadata for each separate character, and allows fast random indexing, which lists don't.

List of characters works fine for the string 'Hello, world!'. It doesn't work fine for the string representing, for example, a whole webpage that you're returning, of for the string that you need to pass to some external code e.g. a regex engine implemented in C (which requires to transform it to a memory-contiguous array of chars, and then transform the results back), or for a 100 megabyte plaintext/xml/json/csv/whatever file you're processing.


The challenge for language designers is to abstract away the details of representation, but still present a uniform interface. (In this case, the designers have failed.)


But iterating over the characters is a very useful feature. It's used in all string algorithms that I can remember in the time it takes to write this comment.


But iteration is not a property of a list, but of the Foldable type class. A list is just one implementation of something that's Foldable/iterable. You can easily implement Foldable for custom text types.


>But iterating over the characters is a very useful feature. It's used in all string algorithms that I can remember in the time it takes to write this comment.

And you still do that with atomic strings -- you just either add a helper method like charAt(i) that gives you access to each character (or, rather, each rune), or you have some way to turn a string of length N into a list of N strings of length one.


And those string algorithms likely break in subtle ways when they handle characters that span multiple codepoints.


> break in subtle ways when they handle characters that span multiple codepoints

Or equivalently: there is more than one way to turn a string into a list. It can e.g. be a sequence of bytes, unicode chars or grapheme clusters. Being explicit about the conversion is therefore a good idea.


Don't forget splitting on word boundaries and/or whitespace - going from a string of text to an iterable collection of words (strings).


Or for the case of (e.g.) domain names, splitting on dots. Generally, given a collection of split chars, breaking the string into a collection of substrings.


Not if the "iterating over character" function iterates over actual characters and not codepoints.


You mean grapheme clusters? Swift is the only language I know that uses that by default, and you still wouldn't want to store strings as a list of grapheme clusters.


I believe Perl 6 does so as well, see e.g. https://perl6advent.wordpress.com/2015/12/07/day-7-unicode-p...


The apple dev documentation has a nice overview of some of the concerns that need to be taken into account for this to work:

https://developer.apple.com/library/content/documentation/Co...

It's probably one of the better approaches - but it's still not clear if it (alone) allows a developer that speaks only English to develop a text indexing or editing system that works well across English, Japanese, Arabic, Hangul and Dutch for example.


Elixir as well.


The problem is “actual characters” are an ill-defined term; that could mean either code points or graphemes. See, e.g., http://unicode.org/faq/char_combmark.html


Sure, it is useful. Maybe have a "string to [Char]" function in the stdlib?


I am not a fan of `cast` (also seen in other languages as well) as it leads to reductionist thinking.

There are usually more than one way to convert from one type to another. How is Float to Int rounded? Shouldn't String to Int return conversion errors? Just looking at `cast` means I have to learn what the language decided to default to.


You should be able to write your own named implementation of Cast and explicitly use it when calling cast to have different casts with different semantics http://docs.idris-lang.org/en/latest/tutorial/interfaces.htm.... But yeah, you'd still have to see what the standard library implementation does


String to int returning 0 for an uncastable string seems like a terrible design. It should blow up. How otherwise can you tell between "0" and "ABC"?


`cast` is a lower-level function that isn't really used for parsing strings. Idris has higher level concepts that are more appropriate for that purpose.

For that specific case, you could use an Idris "view" over the string that either gives you a proof that the string is numeric, or a proof that it isn't. Then you have a "stringToInt" method that requires such a proof as one of its parameters. This allows you to write the parsing code as efficiently as possible since you know you'll only work with valid inputs. It's now a compile-time error to forget to check if a string is numeric before attempting to parse it. Also once you obtain a proof of a certain property (in this case by running a runtime check, elsewhere by proving it statically), you can avoid duplicate checks by just passing around the proof object. And of course these proofs are erased at compile time, so the run time code will be similar to an ordinary 'int.TryParse()' method in c# or java, except without the ergonomic issues of using 'out' references or the possibility of accidentally using an uninitialized variable.


> these proofs are erased at compile time

are you certain about that? my understanding was that witness terms are indeed passed at runtime especially for non-total functions


I haven't yet tried Idris (or even Haskell), so I could be misunderstanding, but surely in a pure language, a function with a type of String -> Int can't 'blow up', it can only output an Int? So if you want to tell the difference, you'd use something that outputs a (Maybe Int) rather than an Int, which'd be a wrapper around `cast` that does validation you want

Looking at the interfaces tutorial[0], it gives an example of something similar:

    readNumber : IO (Maybe Nat)
    readNumber = do
      input <- getLine
      if all isDigit (unpack input)
         then pure (Just (cast input))
         else pure Nothing
[0] http://docs.idris-lang.org/en/latest/tutorial/interfaces.htm...


In Haskell, such a function can definitely blow up.

    > :t read
    read :: Read a => String -> a
    > read "10" :: Int
    10
    > read "x" :: Int
    *** Exception: Prelude.read: no parse
Haskell functions are pure, but not necessarily total.


If it could fail, wouldn't you want it to return an 'either' type, rather than an int?


Yes, exactly. Here's the type signature for `cast`:

  cast : Cast from to => from -> to
So if the types `from` and `to` are part of the `Cast` typeclass, then cast will always convert one into the other, without "failing". The standard library defines a handful of "sensible" casts like Int to Float, but you can't even attempt to cast a function to a String, or a List to an Int, since those aren't part of the `Cast` typeclass.

For an operation that can fail, like a parse method, a more appropriate return type is `Maybe Int`, or `Either String Int` (where the `String` is an error message).

Also Idris provides a dependently-typed way to connect the check (`all isDigit (unpack input)`) to the conversion itself (`cast input`), so that you can't write an invalid check or forget to include the check at all. See my reply to vosper above.


And this is the case in Idris, where `cast` is a member of the `Cast` interface, and there's an implementation of `Cast String Integer`.

`cast` is a function `String -> Integer`, instead of the (more sensible) `String -> Either Integer CastError`.


Haskell definitely has runtime exceptions. It's only relatively pure.


Idris is a really great language and I recommend that anyone struggling with Haskell give it a try. As an OCaml programmer struggling to adjust to Haskellisms, I (ironically) ended up learning Idris before Haskell. Some of the features -- strict evaluation by default, IO evaluation vs. execution, more alternatives to do notation, better records, effects instead of monad transformers -- make Idris vastly easier to understand as a beginner despite the use of dependent types. As a language, Idris is still lacking a couple of things I'd like (and there's still plenty of bugs), but it definitely feels like it's a much refined version of Haskell.


Frustrated Scala users take note: Idris can compile to JVM bytecode (https://github.com/mmhelloworld/idris-jvm), JavaScript (displacing Scala.js), and native code (displacing Scala Native). Idris may be a very good choice for a post-Scala language.

One thing that I find strange, though, is that some of the most prolific Scala developers who are critical of the language seem to stick to languages such as Haskell/Eta and PureScript. Maybe it's the immaturity of the Idris ecosystem.


What about non-frustrated Scala users, should they drop Scala and Scala.js in favor of an effectively experimental language with zero industry adoption?

Post-Scala is already under way with the new compiler, Dotty[1], which will replace present day Scala.

[1] https://github.com/lampepfl/dotty


Non-frustrated Scala user here. I recommend you not switch in that case.

You'll have to admit though: the greatest strength and weakness of Scala is Java.


How does Idris' Java compatibility compare? Simply compiling to the JVM is nice, but unless it can cleanly interoperate with the rest of the Java ecosystem, it simply cannot take the place of something like Scala.


I recently experimented with the Idris JavaScript backend and it is not ready for prime-time. In particular, the FFI doesn't handle functions well. I was unable to call into Idris code from JavaScript which is a deal-breaker, IMO.


An interview with the creator of Idris in the Code Podcast: https://soundcloud.com/podcastcode/edwin-brady-on-dependent-...

Another cool feature of Idris is elaborator reflection https://www.youtube.com/watch?v=pqFgYCdiYz4 which I believe has no direct Haskell analogue (template Haskell perhaps?)


More generally if you like functional programming, the archives of the "Functional Geekery" podcast are a great resource:

https://www.functionalgeekery.com/

There's at least one episode that's devoted to Idris:

https://www.functionalgeekery.com/episode-54-edwin-brady/


Many of these you can work around with language extensions or a custom Prelude, but then you need to have been bitten by them to know that you need to work around them.

I hear "dependent types" and I think "theorem prover", but it seems like Idris is a cleaned up Haskell with some light proving features built in?

Haskell is good for compilers and parsers. What's a good excuse to try Idris?


> I hear "dependent types" and I think "theorem prover", but it seems like Idris is a cleaned up Haskell with some light proving features built in?

It's a full-strength dependent type theory, but is intended primarily for use as a programming language. So the design focus is on using full dependent types to make ordinary programming easier.

> What's a good excuse to try Idris?

The best excuse of all: it is super fun.


Dependent types can help a lot when writing both of those actually. Idris' documentation contains an example interpreter that uses dependent typing (outside of theorem proving) heavily[1]. Compilers are a place where formal verification is likely worth it in some parts, particularly when writing optimization passes.

I think one of the biggest advantages of dependent typing is that it lets you implement a lot of things that other languages need to add as ad-hoc language features. A good example (if you're familiar with Haskell) is typeclasses. These can be done entirely at the value level in Idris, and IIRC interfaces are just syntactic sugar over this implementation. Other examples are generics that depend on values instead of types (without resorting to tedious things like type-level integers) and cleaner replacements for a lot of macro functionality.

[1]: http://docs.idris-lang.org/en/latest/tutorial/interp.html


Many of the points touched in the article are simply the difference between designing a programming language for giving the authors a feeling of bleeding-edge cleverness and designing it for the benefit of programmers using it. It's quite more than an "excuse" to prefer Idris.


> Haskell is good for compilers and parsers.

While true, the same applies to any other language in the ML family.


The problem with working with strings in Haskell is that there are too many datatypes: Data.Text, Data.Text.Lazy, Data.ByteString Data.ByteString.Char8, Data.ByteString.Lazy, Data.ByteString.Lazy.Char8. All of them share the same function names so you have to do imports like

    import qualified Data.ByteString as BS
    import qualified Data.Text.Lazy as TL
    import qualified Data.Text as TS
and somehow the library you need always use a different ByteString variant from the one you already chose so you have to pack/unpack here and there. There should be a way to make `length`, `map` and all work on every string type. Maybe a type class or some better idea is needed.

By the way the link to the caesar cipher is broken.


I know it's only a cure for the symptom, but there's the OverloadedStrings extension


Surely the proper way to do length is to have it be, essentially:

    length :: forall f a n. Foldable f => Semiring n => (f a) -> n
`length` is not specific to any one type and it's a complete waste to think of it as something to do with strings in general. They should all just be foldable and that'll give way more.

Likewise, they could all just be functors and `map` is free.


Are you sure that Semiring shouldn't be a Monoid?

What happens if you feel the urge to define several Semirings (or Monoids) on the data type?


I'm not sure I follow, sorry. The line says that we also pledge to produce a Semiring from this Foldable structure. Given it's a length function it's about as general a specification I think you can give it without becoming something completely different.


To the first question, a Semiring has two operations, + and × conventionally (along with ...). That seems to be more than length needs. A Monoid is just an operation and its identity, which should be exactly what you need for length.

For the second question, traditional (+,×,...) is a Semiring for integers; so is (&,|,...) (bitwise boolean operations). How can you define both Semiring implementations without getting them confused?

There is another reply to my comment that I haven't digested yet. Maybe it answers the second.


"A Monoid is just an operation and its identity, which should be exactly what you need for length."

And I'm wrong about that. You need a zero and a one, right. What would that structure be?


I've been experimenting with answering that question.

An extremely heavily-commented approach to defining equality for things like Double (where there are multiple sensible ways to compare things) will probably be a better introduction to these ideas:

https://github.com/mrkgnao/noether/blob/master/library/Noeth...

https://github.com/mrkgnao/noether/blob/master/library/Noeth...

The rest of the repo implements abstract algebraic structures along similar lines.

Here is what monoids look like in this formalism at present. It's a bit involved: first I use a fine-grained separation into highly polymorphic Magma and Neutral classes:

https://github.com/mrkgnao/noether/blob/master/library/Noeth...

And there are "strategies" for combining those pieces of structure, whose use looks like

  type instance MonoidS (op :: BinaryNumeric) Rational 
    = DeriveMonoid_Semigroup_Neutral op Rational
Which means "derive the preferred monoid structure with operation op" (MonoidS op) on Rational from the preferred semigroup structure and preferred neutral element on it with that operation.

https://github.com/mrkgnao/noether/blob/master/library/Noeth...

You can always use some other monoid structure for the same combination of type and operation (!) if you like. For instance, here's a demo of using the self-action of a ring on itself (i.e. considering a ring as a module over itself) to compute something, even if your preferred choice of action has been declared to be something else (or hasn't been defined at all).

https://github.com/mrkgnao/noether/blob/master/library/Noeth...

This says "here, use the self-action derived from the multiplicative magma structure on R".


That is way more advanced Haskell than I am familiar with, but it does look like what I was wondering about. Thanks!


I'm not familiar with UTF-8, which -- I believe -- is what Data.Text is used to represent. Do all UTF-8 strings have a well-defined length?

A list of ASCI chars obviously does (the length of the list), but I'm not sure about Data.Text.Text.


Yes, I'm sure the length of a UTF-8 string is something tricky but Data.Text has a length function nonetheless. The APIs are very similar if not identical. You can see it here:

https://hackage.haskell.org/package/text-1.2.2.2/docs/Data-T...

https://hackage.haskell.org/package/bytestring-0.10.8.1/docs...


Of course, a real string type would store length along with the characters, making the length function trivial and fast.


It's hard to say what length should be because it could mean different things. How many u8 values there are? How many graphemes are present?

If it's just counting how many u8's in a slice, that's trivial. But storing the total number of variable-length graphemes isn't. I assume that's why Data.Text's length function is O(n), and other languages that have UTF-16 or UTF-8 strings also have length functions that are linear.


Graphemes? Strings can be expected to be transformed so that they contain the "right" type of characters (in particular, plain letters followed by the respective combining diacritical marks vs a smaller number of precomposed accented letters) before asking about their length, which is how many characters they contain. If you want to count "graphemes" you can have a different function and possibly a separate cached result.

This in theory. It is readily apparent from the Data.Text page at https://hackage.haskell.org/package/text-1.2.2.2/docs/Data-T... that the authors care about performance only selectively ("fusion" of buffer allocations is fun, relatively messy data structures to cache important data such as string length are not fun), and that they don't care enough about Unicode to add serious string-level abstractions over the Unicode tables exposed in the Char type.


Well, how long is the sequence "a<Backspace>". It's two characters and one, two or zero letters?


This has been covered a few times before. See this for example: https://mathiasbynens.be/notes/javascript-unicode

Data.Text seems to recognize symbols made of several codepoints (like emoji) as one but still counts diacritics and combining characters as different symbols.


Data.Text is represented as a UTF-16 vector


This is what the backpack feature landing in GHC 8.2 will allow you to do


As far as I can see, backpack isn't going to solve converting between string types, unless you want to use the same string type for everything your application does (e.g. sending/receiving bytes over the network and displaying a UI component with text in it).

Or are you suggesting that we specialize the TCP receive function to only receive UTF-8 strings, so we don't have to convert when displaying to the user?


With Backpack most of the time you would not have to think about the type of the string.

See: http://blog.ezyang.com/2016/09/the-base-of-a-string-theory-f...


Doesn't mono-traversable/classy-prelude solve this?


How does Idris compare to Agda (1) and F* (2)?

[1] http://wiki.portal.chalmers.se/agda

[2] https://fstar-lang.org/


Agda is primarily a theorem prover, Idris doesn't try to compete with that. They're rather similar in a lot of ways, but Idris targets more general purpose programming. F* is more similar in its goals, but it kind of has its own paradigm. I think a lot of the focus behind F* was based on crypto applications, so it's a bit less general purpose. It also approaches I/O without monads (to my understanding, it just uses Tot, ML, etc. to mark effects for types). The syntax is more ML-like (not super important), but in terms of the type system it's very strange compared to Idris (it focuses on refinement types versus using things like GADTs to construct a lot of the primitives -- the F* way of creating nat is to assert that an integer is >= 0 instead of defining the natural number as Zero | Succ Nat).


Indeed. Idris aims to be "PacMan complete". As in it should be perfectly feasible to write PacMan using Idris.


From using Agda it is fairly similar since both do theorem proving by programming in a unified language vs Coq which has separate languages for programming and proving. Where Idris excels is a standard library focused on systems programming, things like good console input/output support that Agda doesn't focus on.


Idris & Agda are based on calculus of constructions

F* is based on type refinements and smt solver.

in theory all DT are equivalent but type refinements are less expressive in practice than CoC.

sometimes the smt solver gives up and you are in trouble.

alternatively the smt solver can provide simple refinements with less effort than CoC when it does work.


And ATS [3]?

[3] http://www.ats-lang.org/


So, the unpack/pack solution seems a bit weird to me. Why do you need to convert a string to a list just to iterate? I'm assuming "List" is a linked list.

Why not have an Iterable/Enumerable typeclass/trait/interface which is implemented by each dataset that is listly? Seems a lot more efficient and easier to understand then having to convert between representations just to iterate or change elements.


Wow, I didn't realize that record field names have to be unique in Haskell. The following:

    data Human = Human {name :: String}
    data Dog = Dog {name :: String}
is illegal, because they can't both have a field accessor called `name`.

That's... crazy.


Yea... in theory it makes sense. In Haskell, field names are functions. If you have an instance of Human and you want it's name you do 'name human' not 'human.name()'. Polymorphism is done through type classes, not by ad-hoc overloading. So if you want the name function to take a Human or a Dog you need to define a type class with name and have both implement it (kind of like interfaces, but not really).

Pragmatically, it's really annoying. There's a solution in the space between pure (a la Haskell) and magical (a la Scala) that makes sense. I think Idris might have found it.


AFAIK -XDuplicateRecordFields and -XOverloadedRecordFields solve this now, creating a typeclass for each of the record labels. In this case, the compiler would generate a HasName class. (This is conceptually similar to classy lenses, but minus the TH.)


They only have to be unique per module. As of GHC 8.0, there is a flag that enables duplicate record names.


Wait, Idris sounds like a much improved Haskell.

Any downsides (in the core language) besides the smaller community?

Any chances for Haskell to get some of the same things?


The core is based on a different type theory. It is very unlikely that the part of GHC called "Haskell core" will make such a massive change.

In general the Idris type inference is not nearly as good as Haskells. I don't mean this in the "dependent type inference is undecidable" sense, but instead just generally.

Idris is also strict instead of lazy like Haskell. This is good or bad depending on who you ask. Very unlikely for it to change in Haskell though.

Many of the other issues are the issues with every small language. Community size and libraries.


> In general the Idris type inference is not nearly as good as Haskells. I don't mean this in the "dependent type inference is undecidable" sense, but instead just generally.

Do you know if this is just because Idris is a much less mature language than Haskell, or its it something fundamental about the design?


Probably both. Dependant type systems are still an area of active research.


Idris is one of those languages that when I learned it felt like it was offering me a glimpse into a possible future.

I love it when that happens.


Would you mind sharing what it was about Idris that made you feel like you were offered a glimpse into a possible future?


I can't speak for op but I had a similar feeling going through the free chapters of 'Type driven development with Idris'. What blew me away was once you have such a strong type system it eliminates whole classes or errors at compile time I previously considered only checkable at runtime. Furthermore it showed that doing so allows the compiler to go from being a gatekeeper to being more of a virtual assistant that helps you write your code not just check it.


> What blew me away was once you have such a strong type system it eliminates whole classes or errors at compile time I previously considered only checkable at runtime.

Can Idris turn Haskell runtime errors into compile-time errors and, if so, which ones?


The classic one is the `head` function in idris can only be called on `Vec`s of length 1 or more. More pressingly, you can actually prove lawfulness of type classes.


I'm curious how does this compare to something like Cats' NonEmptyList[1] in Scala?

[1] - https://github.com/typelevel/cats/blob/master/core/src/main/...


Checking the head of an empty list is a simple example. (Although see https://wiki.haskell.org/Non-empty_list for an approach to avoiding these runtime errors in Haskell).

Here's a more advanced example: https://news.ycombinator.com/item?id=14569605.


head [] in haskell will crash and burn, because you cannot carry vector size information in the type (at least not easily).

In idris, you can lift the non-empty list check at type level, making such operation a compilation error.


Pony language's "reference capabilities" are like this--which separates the idea of type from the access to memory, and provides another level of error-class elimination and its composition of these reference capabilities and the actor model for another reduction of runtime overhead in garbage collection (being lock-free and concurrent.)

https://tutorial.ponylang.org/capabilities/reference-capabil...

https://tutorial.ponylang.org/appendices/garbage-collection....


Idris style/semantics question: why does the last line of the first example in the article use parentheses[1] instead of another dollar sign[2]?

[1]

    caesar_cipher : Int -> String -> String
    caesar_cipher shift input =
      let cipher = chr . (+ shift) . ord
      in pack $ map cipher (unpack input)
[2]

    caesar_cipher : Int -> String -> String
    caesar_cipher shift input =
      let cipher = chr . (+ shift) . ord
      in pack $ map cipher $ unpack input
To me the second one seems more readable. Are there semantic differences? Performance differences?


I know idris (apart from improving some haskell warts) also adds dependent types. Are there any good simple examples of dependent type use, that isnot vectors-of-length-N?


The type safe printf implementation is pretty cool as another small example. https://github.com/mukeshtiwari/Idris/blob/master/Printf.idr


Yes. They're amazing for theorem proving (see Agda, Idris). They're also useful for refinement types, proving type class laws like for monads, etc.


Is there a hello world-y example of refinement types, similar to Vectors with length?


Interfaces are as powerful as modules in ML:

'Interfaces in Idris are similar to Haskell type classes, but with support for named overlapping instances. This means that we can give multiple different interpretations to an interface in different implementations, and in this sense they are similar in power to ML modules (Dreyer 2005). Interfaces in Idris are 1st class, meaning that implementations can be calculated by a function, and thus provide the same power as 1st-class modules in 1ML (Rossberg 2015).'

Quoting from https://www.idris-lang.org/drafts/sms.pdf


Can anyone explain the last point to me? How exactly is the example considered "abuse"? What would an attempt to write similar in Haskell look like?


The 'abusiveness' is the use of `>>=` as a constructor, instead of its more standard role as member of an interface.

The author is using it to take advantage of syntactic sugar that is now reliant only on names and types, instead of implementation of the relevant interface.

A similar idea is to use `::` and `Nil` as constructors, which let you use the `[a, b, c]` list syntax (desugared to `a :: b :: c :: Nil`) for things that aren't actually lists. It can be convenient, but can also result in confusion.


Presumably one of the applications of a compiler with a built-in theorem prover is mission critical code. But my understanding is that most mission critical code environments prohibit recursion. What's the main target usage here?


I've never worked on something "mission critical," but most common uses of recursion are tractable, so I'm not sure why it would be disallowed.

The simplest form of tractable recursion is where some argument in the recursive call is, by some metric, "smaller" than the outer call, and is heading towards a base case that terminates recursion when the argument reaches "zero." Two examples are: decrementing integers towards zero, and recursing on the tail of a list. You can prove termination and many other properties of such a function by induction. This concept is fairly easy to generalize to any algebraic data type, and to many other domains.

Edit: I realized I didn't really answer your question. The folks who are working on dependently typed languages (like Idris) see them as broadly useful, and they have a good point. Consider Java generics: they let you have types that depend on other types, like a "List of Integers." The compiler can make guarantees based on that type information and a lot of people consider that useful. With dependent types you can have types that depend on _values_, like "List of exactly 5 Integers that are all between 0 and 100." Again, the compiler can make guarantees based on that type information, and it could improve code safety, modularity, etc.

The challenge with dependently typed languages has been coming up with a "surface syntax" that's easy to use. Idris (and Agda) are pretty much state of the art there.


"most common uses are tractable" does not guarantee that many common uses will have been correctly analysed.

As to why it's disallowed: stack space is limited, and tail call optimisation isn't taken for granted. Here's what the Joint Strike Fighter coding standards (thataway -> http://www.stroustrup.com/JSF-AV-rules.pdf) say:

    AV Rule 119 (MISRA Rule 70)
    Functions shall not call themselves, either directly or indirectly 
    (i.e. recursion shall not be allowed).

    Rationale: Since stack space is not unlimited, stack overflows are possible.

    Exception: Recursion will be permitted under the following circumstances:

    1. development of SEAL 3 or general purpose software, or

    2. it can be proven that adequate resources exist to support the maximum level of
    recursion possible.
So yes, you're allowed it if you do that extra work, but given that you can replace tractable recursion with a loop anyway, the win you'd have to get from expressing the problem recursively has to compensate.


Interesting. Thanks for the details.

In theory a compiler can do this optimization for you and can use simple syntactic checks to determine whether recursion terminates and whether the optimization applies (at least given a language designed with these things in mind, I'm sure this is much harder or maybe impossible in C++). With something more modern than C++ I'd imagine you could adopt a rule that says "you can use recursion but we compile with --dont-allow-nonterminating-or-unoptimizable-recursion".


Why does it prohibit recursion? Tail call optimization can prevent stack overflow


Can if it's present. "Mission critical" tends to mean c/c++ where chances are you've got to do that sort of thing yourself.


What is the problem with strings being lists in Haskell?


There's a problem with strings being lists in general because it forces a representation that's not perfect for every use case. Keeping strings abstract and giving them their own interface (even if it's very similar to list's) lets you optimize the representation, or even specialize the representation for different use cases. You can intern, use vectors, use ropes, use tries, or any number of other crazy things. It also lets you expand the string interface beyond the list interface.

Practically, certain operations end up being slow when you use the standard "strings are lists" mechanism provided by Haskell's standard library. Stuff like building a list is hard or slow for stupid reasons: Haskell uses cons lists with fast append-to-head and slow concatenation, so the common case of building a list by appending characters to the end is slow. Or you have to prepend and reverse. Stuff like that. It's dumb, but it matters.

The Haskell community has reacted by creating other ways to represent strings, like `Data.Text` and `ByteString`. These representations have certain benefits, so they're widely used. This adds _another_ problem: different libraries use different representations, so you end up having to convert back and forth between them all the time. Again, this is annoying and inefficient.

So yea. I think the lesson for language designers is clear. Strings are a distinct concept. Yes, they do list-like things, but they're not lists.


Was this written in a hurry? There's lot of spelling/grammar errors in the article.




Applications are open for YC Winter 2018

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: