I wouldn't recommend following the Haskell approach. It hasn't worked well for us. (I took part in creating the Haskell Platform and the process used to add packages to it. I also used to maintain a few of our core libraries, like our containers and networking packages.)
Small vs large standard library:
A small standard library with most functionality in independent, community-maintained packages has given us API friction, as types, traits (type classes), etc. are hard to coordinate across maintainers and separate package release cycles. We ended up with lots of uncomfortable conversions at API boundaries.
Here are a number of examples of problems we currently have:
- Conversions between our 5(!) string types are very common.
- Standard library I/O modules cannot use new, de-facto standard string types (i.e. `Text` and `ByteString`) defined outside it because of a dependency cycle.
- Standard library cannot use containers, other than lists, for the same reason.
- No standard traits for containers, like maps and sets, as those are defined outside the standard library. The result is that code is written against one concrete implementation.
- Newtype wrapping to avoid orphan instances. Having traits defined in packages other than the standard library makes it harder to write non-orphan instances.
- It's too difficult to make larger changes as we cannot atomically update all the packages at once. Thus such changes don't happen.
Empirically, languages that have large standard libraries (e.g. Java, Python, Go) seem to do better than their competitors.
I don't think most of these are applicable to Rust.
> - Conversions between our 5(!) string types are very common.
> - Standard library I/O modules cannot use new, de-facto standard string types (i.e. `Text` and `ByteString`) defined outside it because of a dependency cycle.
We have one string type defined in std, and nobody is defining new ones (modulo special cases for legacy encodings which would not be worth polluting the default string type with).
> - Standard library cannot use containers, other than lists, for the same reason.
> - No standard traits for containers, like maps and sets, as those are defined outside the standard library. The result is that code is written against one concrete implementation.
Hash maps and trees are in the standard library already. Everyone uses them.
> - Newtype wrapping to avoid orphan instances. Having traits defined in packages other than the standard library makes it harder to write non-orphan instances.
This is true, but this hasn't been much of a problem in Rust thus far.
> - It's too difficult to make larger changes as we cannot atomically update all the packages at once. Thus such changes don't happen.
That only matters if you're breaking public APIs, right? That seems orthogonal to the small-versus-large-standard-library debate. Even if you have a large standard library, if you promised it's stable you still can't break APIs.
But if you have a large standard library and want to break the API, you can.
If you have 100 different libs that are basically "standard" (who doesn't have `mtl` in their applications at this point), now you have to coordinate 100 different library updates roughly at the same time. If you forget even one of them, then you've broken everything.
I think the argument for a large Prelude/standard lib is similar to Google's "single repo" argument: Easy to catch usages and fix them all at once. Plus you're making the language more useful out of the box. People coming from Python can understand this feeling of opening a Python shell and being productive super quickly from the get-go.
Arguments for small std lib exist, of course. But giant standard libraries are more useful than not.
EDIT: I think the failure of the Haskell Platform has a lot more to do with how Haskell deals with dependencies, and the difficulties it entails, than with the "batteries included" approach itself.
Standard libraries - types, in particular - are the lingua franca between unrelated libraries. The more that's in your standard library, the easier it is to integrate different libraries.
The higher level the library (e.g. containing content specific to an application domain), the more magical-seeming libraries can be added to the ecosystem. The counter-risk is the standard library growing in undesirable directions that you can never change because you can't remove stuff.
The interstitial glue that lets third party libraries integrate with one another and be usable by your app: that's the single biggest reason for having a bigger standard library than a smaller one. It has very little to do with including the batteries in the box.
If you think it has something to do with including the batteries in the box, you'll be lured into the trap of making it easy to fetch the batteries from across the internet (that's almost the same, right?). The trouble is, the internet has 100 different batteries to choose from, and not only have you offloaded the choice onto the user, but the batteries use mutually incompatible terminals and you have to jerry-rig interfaces between them. Let a thousand flowers bloom, say some people: trouble is, waiting for the biggest flower can take years, and people pick different ones in the early days. A bad choice is better than indecision.
Low effort updates are even less what large standard libraries are about. Large standard libraries are much harder to update, not easier: there's much more surface area, so it's far easier to break an application - and since every application uses the standard library, you could potentially break them all. Easier versioning and updates are a strong argument for extracting out things into third-party libraries.
But even then, languages that have great, thriving, easy-to-use dependency systems and package managers with small standard libraries still run into problems (see: JavaScript).
The issue with comments written this way is that there are no details to support the claim.
Writing "see: JavaScript" doesn't really help without context. Without context, one does no know if you meant "JavaScript in browser" or "JavaScript via Node.JS" or "I simply don't like npm".
I'm not claiming there aren't any problems; however, "problems" are situational and one person's "problem" is another person's meh.
I just think it's irresponsible to not provide detail when making such claims.
The standard library also includes Path/PathBuf and OsStr/OsString. And third-party libraries also use [u8] for bytestrings.
It'd be nice to improve handling for user-supplied text where you can't assume UTF-8. For instance, git2-rs provides the contents of diffs as [u8], because it can't assume the diffed files use UTF-8. That led to this commit today: https://github.com/ogham/rust-ansi-term/pull/19/commits/a0da...
That felt like a lot of boilerplate to abstract between str and [u8]. Is there a better way to solve that problem?
(As much as I'd love to just say "use UTF-8", that would break on many git repositories, including git.git and linux.git.)
I think Rust needs to slow down in this regard. I have been with Python since 1999 and the stdlib has held it back. I have also used Scala and Haskell and have witnessed the mess that platform libs on each have caused.
What Rust has right now is pretty amazing. What needs to happen is a way for devs to easily break the dependency cycle and include multiple versions of the same crate. Something that has plagued Haskell. I dunno what the answer is, trait only crates, struct only crates?
If people want to 'curate' (shop) a set of packages, they can make a meta package that exports its deps.
There is literally no reason to ship libs with the compiler aside from the basic verbs and nouns.
With versioned and properly name-spaced imports, one could use different curated libs.
If you can, could you elaborate more on python's stdlib holding it back? I think batteries-included experience is one of the reasons why so many people (including myself) use python.
It's also one of the features I sorely miss when using Rust. Luckily, Rust's stdlib is starting to tend towards being more practical with recent additions like system time.
The saying 'the std lib is where libraries go to die' was invented by Python. The libs are shallow, don't break backwards compat, and provide a substandard experience. Things that continue to improve are provided out of tree under an alternative package name. Python codebases that are resilient don't use much of "core": arrow for time, requests for http, simplejson for JSON, etc. Using core is an antipattern that will get you stuck on a version of the language, which is ridiculous.
Linking the language and the libraries together is a mistake.
In the enterprise space it is quite common that we only get to use what is on the computer, and access to anything else is strictly controlled by IT.
So if it isn't in the standard library or some internal library mirror, we don't get to use it, as simple as that.
I think it would be terrible for Rust design/evolution/policy to be constrained by that kind of enterprise badness that basically bans crates.io, and crates.io is an awesome aspect of the Rust ecosystem.
I can tell it is lots of "fun" when you can only use a Maven mirror, with approved jars.
To get a jar into that mirror, a request needs to be sent to the legal team describing the license and business case use, after approval the IT team will add the said jar to the mirror.
The same applies to version upgrades of already approved jars.
This is a typical scenario I had already in a couple of projects.
I agree that this sucks, but not doing it that way is dangerous for the company because developers might not care enough about license compliance when they include some stuff into their project.
So maybe there's value in shipping a "standard bundle" that includes popular libraries or some such. But it's not worth distorting the whole language design to accommodate bad policies.
I see where you're coming from, but I feel like it would be a mistake to expect the language or std lib to try to solve problems that are effectively organizational/cultural issues.
That's a failure at the moment of inclusion. I'm guessing it was done for convenience and to increase adoption (getting decent libraries in the standard library faster).
Just as a data point... I like and heavily use the core libs... And not once have I used arrow, requests, or simplejson, while knowing them, because I didn't feel the need.
Arrow seems particularly useless as it just wraps stdlib datetime and its awful 10 byte size rather than moving to an 8 byte representation like np.datetime64 uses.
Just because you haven't found a use for it doesn't mean it's useless.
The stdlib datetime class is terrible and desperately needs to be wrapped. Arrow is a good wrapper. I don't know what you're on about with counting bytes.
I've wrapped datetime for company work (pre pandas, pre datetime64) to make sure it follows the rules of the data analysis platform we developed (adding functions for moving to next month of year based on various financial calendar rules for example). I wish I hadn't done it and had just wrapped a boost_datetime since the performance of datetime is slow when you have a large timeseries of them. The performance is especially unacceptable if you also have timezones attached to your datetimes.
Now we have pandas, yay. But I don't see why one would use arrow. If you're patient enough, could you explain why you would use it? The website doesn't seem to be very convincing.
The thing I like about Python is it gives tools for library writers to build things without going too low level.
Application writers will always write with better libs, but don't have to worry about third party lib compatibility on platforms because of the stdlib serving as a virtual machine (most of the time).
Many libraries in the stdlib have much better alternatives, because libraries with their own release cycle can evolve much quicker. But people get stuck on the "standard" version because it's what's in the stdlib. Worse, people write for compatibility with whatever was in stdlib 2.4 because that's what RHEL6 ships.
Which I guess is normal since it does not create any dependency cycle. A new version might as well be thought of as a completely different package (of perhaps similar functionality).
One of the things I love above all about Python and Ruby are the kitchen-sink standard libraries. The node ecosystem is deeply frustrating in this respect.
It has been a while since I did anything with Python, but I did like its standard library. It was reasonably comprehensive without feeling bloated, and the documentation was pretty good (mostly).
Having a good standard library also makes deployment easier.
(In Go, OTOH, I tend to care less, even though its standard library is quite good, because thanks to static linking, deployment is always easy, no matter how many third-party libraries I use.)
> We have one string type defined in std, and nobody is defining new ones (modulo special cases for legacy encodings which would not be worth polluting the default string type with).
There's also `inlinable_string`, `string_cache`, `tendril`, `intern` if you need inlining for performance.
The bigger problem is with other things like 2D/3D points which can be (f32, f32), [f32; 2] or a custom struct.
I really really would advise having a word with Snoyberg about this. The Haskell Platform has been a pretty deadly experience. It's also ridiculously beginner-hostile (it sounds like it won't be, but it is in practice).
Hash maps and trees: fine. What about database interfaces (e.g. a JDBC/ODBC/whatever equivalent)? What about HTTP servers - even the minimal declaration for what a synchronous request handler might look like? How about threadpooling - if you have multiple libraries that have parallelizable work, you certainly don't want multiple threadpools each thinking they have X many cores to work with, and you don't want the user to have to partition these things either - that's not a happy problem.
All things you can delegate to third parties, but not without lots of cross-talk and confusion until things settle down to winners and losers, which may be a long time in the future. Indecision can be costly.
Consider standard library profiles, with progressively higher levels of abstraction supported. It's the right decision for creating a good ecosystem. C and C++ took decades to build consensus on the more complicated libraries, and C++ eventually grew a pseudo-standard library in the form of boost to centralize efforts, simply because it is more efficient that way.
You were being downvoted, maybe for perceived snark, but I think you raise an interesting point.
To me, C did have a standard library: Unix. It's a runtime system too! Due to the nature of the original C bootstrapping process it just happens to be possible to remove this standard library, and Windows was evidence of this.
There is another interesting potential counterexample: Lua. Its minimalistic standard library is part of what makes it so attractive for embedding, e.g. in game engines. However, Lua's embedding API is so good, you could almost say that it comes with a large standard library too: your existing C code!
I guess my larger point is that languages rarely are able to stand completely on their own. They need some sort of valuable body of code to justify people to choose the language and libraries together. It might have been the case 40 years ago that you'd reasonably choose to build something "from scratch", but today, if you start on an island, you need to build a bridge, lest you remain on an island forever. Better to start on the mainland.
It's one thing to build a layered system with a small core. It's another thing to completely ignore the fact that the libraries and community _are_ the language, in the only ways that actually matter.
Lua's lack of a stdlib is also a curse. I can't imagine how many incompatible versions of string.trim and OOP libraries are out there in the wild right now...
Things have been getting better lately because of Luarocks, but it's still an uphill battle.
Lua OOP always boils down to something like `setmetatable(obj, mt)`, where mt.__index has all the methods. How you assign to mt.__index can vary across modules according to style, but that's a _purely_ aesthetic issue. The mechanics are identical. Using a module to accomplish it creates a useless dependency.
There are many criticisms one could make of Lua, but I don't think those two particular criticisms are legit. They're classic bikeshedding.
The function you presented that trims to the right has quadratic runtime behavior if your string has a long sequence of spaces that is not at the end of the string, for example "a" followed by thousands of spaces and then "b": the pattern is retried at every position inside the run, and each attempt consumes to the end of the run before `$` fails. A similar performance bug was behind a 30 minute downtime at stackoverflow.com, because a code snippet with 20 thousand spaces inside a comment showed up on their frontpage.
Anyway, I wasn't trying to say bad things about Lua with my examples. It's just that if you go to any large Lua project out there, there is a very good chance you will find some "utils" module in there with yet another reimplementation of a lot of these common functions. Ideally we should have people reusing more stuff from Luarocks than they are right now.
If you're reading a pile of string processing code, seeing
s.rstrip()
helps make code self-documenting, compared to
s:gsub("%s*$", "")
I don't want to argue for a massive standard library (for instance, I don't think Python should have shipped modules for dbm, bdb, sqlite, or XML-RPC), but simple string processing seems like a good thing to standardize.
String processing is never simple. Simply identifying "what is whitespace?" is a big undertaking in Unicode.
Lua's philosophy seems to be to include the absolute minimum that is unacceptably painful to omit. This is a perfectly reasonable tradeoff for Lua's primary use case: embedding.
With respect to strings in particular, most systems that Lua is embedded in have their own string type, or inherit one from a framework. This is an unfortunate reality of the C/C++ world.
Returning to my point about language standard libraries: The lack of a traditional "standard library" is a feature for Lua, but only because Lua has a strong FFI and C API that acts as a "bring your own standard library" mechanism. It's less about needing a standard library, and more about admitting a language is only one piece of the puzzle. For a language to flourish, you need to have some story for interfacing with the rest of the world in a rich way.
I'm not sure I'd classify them as a standard library; they're essentially just pervasive global variables. For a comparison, think of Java; the standard library is things like `java.util` and `javax.swing`, which goes far beyond having the `System` and `Math` classes available in `java.lang`.
Well, JS was originally competing with Java applets in the browser, but, like you said, fitness for purpose is pretty significant!
My point (or rather, the point of the parent comment that I'm agreeing with) is that there's a lot more than just the presence and characteristics of a standard library that determines how widespread a language becomes.
HTML doesn't do anything for JS other than provide a way to create visual interfaces. It might be comparable to the role that `tkinter` plays for Python's stdlib, but HTML alone is emphatically not a standard library.
My point is that most languages are totally useless on their own. JavaScript the _language_ doesn't offer any FFI or other mechanism to call outside services. Without a browser or something like Node's libuv, JavaScript wouldn't be useful at all. The capabilities provided in the box are part of the language in terms of what actually matters in motivating people to choose to use the language, no matter what form those capabilities come in.
> You seem to be overlooking the ultimate counterexample: C. :P
I think one reason (of many) that C++ has replaced C almost completely for new development is the STL. Of course, the STL fundamentally depends on the language feature of templates, which you can only approximate in C, but considering that Java and Objective-C, among other languages, lasted pretty long with no generics and only non-type-safe containers, I think C could have benefitted greatly from basic things like resizable arrays, hash tables, trees, better strings, etc. in the standard library. Now it is probably too late for it to matter (which most people consider a good thing).
It had: Modula-2 and Pascal dialects usually had richer libraries.
For example check Turbo Pascal libraries, including Turbo Vision, already on MS-DOS.
C took off thanks to UNIX's adoption, like JavaScript on browsers nowadays; it became the language to use for anyone working in the enterprise on those shiny new UNIX boxes.
In Europe it was just another systems language to choose from, back when CP/M and other 8 / 16 bit systems were common.
Both C compilers and Turbo Pascal already existed in CP/M, which preceded MS-DOS.
Also there were C, Pascal and Modula-2 compilers available for ZX Spectrum.
And in my tiny part of the globe I can guarantee that everyone only cared about x86 Assembly, Turbo Basic and Turbo Pascal, with Clipper for business stuff.
I only got to learn C in 1993, after having been a Turbo Pascal 3, 5.5 and 6.0 user.
Being able to compile stuff on CP/M wasn't much help if you wanted to develop MS-DOS applications.
I first used C in 1983 on MS-DOS, I didn't use UNIX until a couple of years later. I bought Turbo Pascal 1.0 when it was released but already had a C compiler at that point.
We only got to buy the compilers that were available at the local computer store (not always 100% original), or find some magazine and order internationally via post.
BBS access was only available to the fortunate few capable of paying the high connection rates, and the modem in the first place.
We had to make do with what was available to us and what we could afford to pay.
Some of my first Assemblers were taken from the Input magazines and typed in, because there was nothing else.
This is exactly the main problem with Haskell. A stunning language with a lousy standard library. In my opinion, Haskell should offer arrays and maps as built-ins (like Go) and ship with crypto, networking, and serialization in the standard library (I know serialization is already there, but everyone seems to prefer Cereal, so...)
> (I know serialization is already there, but everyone seems to prefer Cereal, so...)
This is precisely why shipping things in the standard library is a bad idea. It ends up full of cruft that no-one uses because there are better alternatives.
>Haskell should offer arrays and maps as built-ins
Why? What does that gain?
The standard platform provides Data.Map for maps, Data.Vector for arrays, and Data.Sequence for fast-edit sequences.
It's not even clear what a "built-in" array or map in Haskell would even look like, or what semantics it should have. Especially in a pure functional language, you need to be clearer about what your intentions are. A regular mutable packed array won't work most of the time.
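For anyone who hasn't used them, a quick sketch of the three structures mentioned above (Data.Map and Data.Sequence come from the containers package, Data.Vector from vector):

    import qualified Data.Map      as M
    import qualified Data.Sequence as Seq
    import qualified Data.Vector   as V

    -- persistent balanced-tree map
    ages :: M.Map String Int
    ages = M.fromList [("ada", 36), ("grace", 85)]

    -- immutable boxed array
    squares :: V.Vector Int
    squares = V.generate 10 (\i -> i * i)

    -- finger-tree sequence with cheap edits at both ends
    queue :: Seq.Seq Char
    queue = Seq.fromList "abc" Seq.|> 'd'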
> This is exactly the main problem with Haskell.
> A stunning language with a lousy standard library.
I dream of the day where we can say that the main problem of Haskell is which libraries are included in the standard library. To me, we would already have reached programming nirvana at that point.
Haskell's actual problem isn't the lack of a comprehensive standard library, but rather the presence of core language features that actively hinder large-scale modular programming. Type classes, type families, orphan instances and flexible instances all conspire to make it as difficult as possible to determine whether two modules can be safely linked. Making things worse, whenever two alternatives are available for achieving roughly the same thing (say, type families and functional dependencies), the Haskell community consistently picks the worse one (in this case, type families, because, you know, why not punch a big hole in parametricity and free theorems?).
Thanks to GHC's extensions, Haskell has become a ridiculously powerful language in exactly the same way C++ has: by sacrificing elegance. The principled approach would've been to admit that, while type classes are good for a few use cases, (say, overloading numeric literals, string literals and sequences), they have unacceptable limitations as a large-scale program structuring construct. And instead use an ML-style module system for that purpose. But it's already too late to do that.
How are type families worse than fundeps? That's a pretty ridiculous assertion; the things you can do with fundeps are strictly fewer than the things you can do with type families.
> The principled approach
You're dead wrong. The principled approach here is dependent types and full-featured type-level functions. Fundeps are a hack that let you implement a small subset of such functions (while type families gets us a bit closer to the ideal).
> they have unacceptable limitations as a large-scale program structuring construct.
Such as?
> And instead use an ML-style module system for that purpose.
How about we just use C macros for parametricity?
ML-style modules have their uses, but they aren't nearly as elegant as a clean type-level solution.
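For readers following along, a minimal sketch of the two mechanisms under discussion (class and instance names are made up for illustration), written against GHC with the relevant extensions:

    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
                 FlexibleInstances, TypeFamilies #-}

    -- Functional dependency: "c determines e" is a constraint-solving hint.
    class Container c e | c -> e where
      empty  :: c
      insert :: e -> c -> c

    instance Container [a] a where
      empty  = []
      insert = (:)

    -- Associated type family: the same relationship as a type-level function.
    class Container' c where
      type Elem c
      empty'  :: c
      insert' :: Elem c -> c -> c

    instance Container' [a] where
      type Elem [a] = a
      empty'  = []
      insert' = (:)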
> How are type families worse than fundeps? That's a pretty ridiculous assertion; the things you can do with fundeps are strictly fewer than the things you can do with type families.
It's not about how much you can do (otherwise, just use a dynamic language, you can do everything, even shoot yourself in the foot!), it's about whether the result makes sense, and how much effort it takes to make sense of it.
> You're dead wrong. The principled approach here is dependent types and full-featured type-level functions. Fundeps are a hack that let you implement a small subset of such functions (while type families gets us a bit closer to the ideal).
You wanna play the dependent type theory card? Type families as provided in Haskell are incompatible with univalence.
type family Foo a
type instance Foo Bool = Int
type instance Foo YesNo = String   -- where YesNo is some type isomorphic to Bool (say, data YesNo = Yes | No)
Please kindly provide the isomorphism between `Int` and `String`.
Case analysis only makes sense when performed on the cases of an inductive type, which the kind of all types is not.
> Such as?
The insistence on globally unique instances?
> How about we just use C macros for parametricity?
What does this even mean?
> ML-style modules have their uses, but they aren't nearly as elegant as a clean type-level solution.
> You wanna play the dependent type theory card? Type families as provided in Haskell are incompatible with univalence.
Hi. As someone that knows type theory and knows homotopy type theory and also knows Haskell well I would pose the following question to you: what purpose on god's green earth would be served by introducing univalence directly to haskell?
(Oh, and furthermore, you realize that fundeps have precisely the same issues in this setting?)
Contrariwise, don't you find it _useful_ that we can have two monoids, say And and Or, which have different `mappend` behaviour?
Now, can you imagine having that feature and _also_ respecting the idea that set-isomorphic things should be indistinguishable? How?
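For concreteness, a sketch of what that looks like with newtypes (this is essentially what `All` and `Any` in `Data.Monoid` do), written for a current GHC where `Semigroup` is a superclass of `Monoid`:

    -- two monoids over the same carrier (Bool), told apart by newtypes
    newtype And = And Bool
    newtype Or  = Or  Bool

    instance Semigroup And where And a <> And b = And (a && b)
    instance Monoid    And where mempty         = And True

    instance Semigroup Or  where Or a <> Or b   = Or (a || b)
    instance Monoid    Or  where mempty         = Or False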
> what purpose on god's green earth would be served by introducing univalence directly to haskell?
Generally, when I want to reason about tricky data structures, what I do is:
(0) Define a set-isomorphic auxiliary type that's easier to analyze, and whose operations are easier to implement, but have worse asymptotic performance.
(1) Prove that transporting the operations on the auxiliary type along the isomorphism yield the operations on the original tricky type.
I need univalence for this argument to hold water.
> (Oh, and furthermore, you realize that fundeps have precisely the same issues in this setting?)
Type classes are already Haskell's controlled mechanism for adding ad-hoc polymorphism “without hurting parametricity too much”. I consider it healthier to reuse and extend this mechanism (which is what functional dependencies do) rather than add a second one for exactly the same purpose (type families).
> Contrariwise, don't you find it _useful_ that we can have two monoids, say And and Or, which have different `mappend` behaviour?
Sure. In ML, I'd just make two structures having the MONOID signature. Haskellers have this wrong idea that the monoid is just the type - it's not! A monoid is a type plus two operations. Same carrier, different operations - different monoids.
> Now, can you imagine having that feature and _also_ respecting the idea that set-isomorphic things should be indistinguishable? How?
Yes. Acknowledging that an algebraic structure is more than its carrier set.
> I need univalence for this argument to hold water.
No, you don't. Univalence is the axiom that transporting operations across such equivalences _always_ works. If you're doing equational reasoning directly it doesn't arise.
Furthermore, all you need to do is to establish that the _type operations_ regarding one type respect the equivalence to the other type as an additional step.
As you say "a monoid is a type plus two operations" -- so fine, we can treat the monoid And as the type bool and the dictionary of operations on it, and all this still works out.
> No, you don't. Univalence is the axiom that transporting operations across such equivalences _always_ works.
Sure, but the strategy I outlined is risky (as in “may lead to getting stuck and having to undo work”) in a language where this isn't guaranteed to work.
> As you say "a monoid is a type plus two operations" -- so fine, we can treat the monoid And as the type bool and the dictionary of operations on it, and all this still works out.
Yup, but Haskell doesn't let you define types parameterized by entire algebraic structures. It only lets you define types parameterized by the carriers of algebraic structures.
> otherwise, just use a dynamic language, you can do everything, even shoot yourself in the foot!
Type classes allow huge flexibility while maintaining type safety, to a much greater degree than fundeps allow.
> it's about whether the result makes sense
Which they do. Perhaps you have some examples of when type families confused you or made you perform an error?
> Type families as provided in Haskell are incompatible with univalence.
TFs aren't dependent types. However, they are on the right track. Fundeps are farther away from the right idea. Could you explain to me what's wrong with your example? I'm not up to date on HoTT, but it seems like there's nothing in principle wrong with pattern matching on elements of *. That seems like an important feature of type-level functions.
>The insistence on globally unique instances?
Why is this a problem? It makes sense from a theoretical perspective (we don't associate multiple ordering properties with the things we call "the integers"), and it's very easy to use newtype wrappers to create new instances if needed.
> What does this even mean?
ML modules are flexible, but backwards from a theoretical perspective. Parametricity is something that should be embedded in the type system, not the module system.
> See here
Interesting example. However, I doubt that the syntactic cost of using such a system is less than the syntactic cost of enforcing global instance uniqueness and using newtype wrappers.
> TFs aren't dependent types. However, they are on the right track.
Dependent types are a good idea. The way Haskell attempts to approximate them is not. Parametricity is too good to give up. With the minor exception of reference cells (`IORef`, `STRef`, etc.), if two types are isomorphic, applying the same type constructor to them should yield isomorphic types.
You know what type families actually resemble? What C++ calls “traits”: ad-hoc specialized template classes containing type members.
> Fundeps are farther away from the right idea.
Functional dependencies are a consistent extension to type classes, which don't introduce a second source of ad-hoc polymorphism, unlike type families.
> Why is this a problem? It makes sense from a theoretical perspective (we don't associate multiple ordering properties with the things we call "the integers"),
What if I want to order them as Gray-coded numbers? In any case, the integers are far from the only type that can be given an order structure, and many types don't have a clear “bestest” order structure to be preferred over other possible ones.
> and it's very easy to use newtype wrappers to create new instances if needed.
Creating `newtype` wrappers is easy at the type level, but using them is super cumbersome at the term level.
> ML modules are flexible, but backwards from a theoretical perspective.
> Parametricity is something that should be embedded in the type system, not the module system.
It's type families, as done in Haskell, that violate parametricity! Standard ML has parametric polymorphism, uncompromised by questionable type system extensions.
> Interesting example. However, I doubt that the syntactic cost of using such a system is less than the syntactic cost of enforcing global instance uniqueness and using newtype wrappers.
I can't imagine it being more cumbersome than wrapping lots of terms in newtype wrappers just to satisfy the type class instance resolution system.
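To make the trade-off concrete, a small sketch (the names, and the exact ordering, are made up for illustration): the alternative instance costs only a few lines, but every call site pays the wrap/unwrap tax.

    import           Data.Bits (shiftR, xor)
    import           Data.Word (Word64)
    import qualified Data.Set  as Set

    -- compare Ints by their binary-reflected Gray codes (computed on the
    -- unsigned representation so the mapping stays one-to-one)
    newtype GrayCoded = GrayCoded Int deriving (Eq, Show)

    instance Ord GrayCoded where
      compare (GrayCoded a) (GrayCoded b) = compare (gray a) (gray b)
        where
          gray :: Int -> Word64
          gray n = let w = fromIntegral n in w `xor` (w `shiftR` 1)

    -- sort (and deduplicate) by Gray-code order; note the wrapping noise
    -- at both boundaries
    sortByGrayCode :: [Int] -> [Int]
    sortByGrayCode =
      map (\(GrayCoded n) -> n) . Set.toAscList . Set.fromList . map GrayCoded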
>Um, aren't functional dependencies an add-on to multiparameter type classes?
You're right, I meant "type families".
> I defined two type instances that violate the principle of not doing evil:
We're not doing abstract category theory; we're writing computer programs (well, I am). Have you ever run into a problem with type families in that capacity?
>if two types are isomorphic, applying the same type constructor to them should yield isomorphic types.
Agreed, but there's a difference between type functions and type constructors. TFs are (a limited form of) type functions. Value-level constructors admit lots of nice properties that value-level functions do not, and I see no reason to be uncomfortable with this being reflected at the type level.
> What if I want to order them as Gray-coded numbers
Use a newtype wrapper. Even if a language allowed ad-hoc instances, I would consider it messy practice to apply some weird non-intuitive ordering like this without specifically making a new type for it.
> Creating `newtype` wrappers is easy at the type level, but using them is super cumbersome at the term level.
And using ML-style modules is easy at the term level, but cumbersome at the type level.
It's a tradeoff, and I suspect that newtypes are usually the cleaner/easier solution.
> ML modules are plain System F-omega
I hadn't seen the 1ML project. That's pretty cool.
> It's type families, as done in Haskell, that violate parametricity!
How so? I really don't understand your argument here, if you just take TFs to be a limited form of type function.
> We're not doing abstract category theory; we're writing computer programs (well, I am). Have you ever run into a problem with type families in that capacity?
I like being able to reason about my programs. For that to be a smooth process, the language has to be mathematically civilized.
> Agreed, but there's a difference between type functions and type constructors. TFs are (a limited form of) type functions.
By “type families”, I meant both data families and type families. Case-analyzing types is the problem, see below.
> And using ML-style modules is easy at the term level, but cumbersome at the type level.
Actually, ML-style modules are more convenient at the type level too! If I want to make a type constructor parameterized by 15 type arguments, rather than a normal type constructor in the core language, I make a ML-style functor parameterized by a structure containing 15 abstract type members.
> How so? I really don't understand your argument here, if you just take TFs to be a limited form of type function.
“In programming language theory, parametricity is an abstract uniformity property enjoyed by parametrically polymorphic functions, which captures the intuition that all instances of a polymorphic function act the same way.”
> I make a ML-style functor parameterized by a structure containing 15 abstract type members.
You can do this in Haskell with DataKinds (you just pass around a type of the correct kind which contains all the parameters). Admittedly, it is quite clunky at the moment. I did this to pass around CPU configuration objects for hardware synthesis a la Clash, as CPU designs are often parametrized over quite a few Nats.
> parametricity is an abstract uniformity property enjoyed by parametrically polymorphic functions
Whenever one introduces a typeclass constraint to a function, one can only assume that the function exhibits uniform behavior up to the differences introduced by different instances of the typeclass. There is no particular reason to assume that (+) has the same behavior for Int and Word, except insofar as we have some traditional understanding of how addition should work and which laws it should respect. The same is true for type families. It is not a problem that they introduce non-uniform behavior; we can only ask that they respect some specified rules with respect to their argument and result types.
Case-analyzing types in type families is no worse than writing a typeclass instance for a concrete type. Would you say that the fact that "instance Ord Word" and "instance Ord Int" are non-isomorphic is a problem? After all, the types themselves are isomorphic!
> Whenever one introduces a typeclass constraint to a function, one can only assume that the function exhibits uniform behavior up to the differences introduced by different instances of the typeclass.
Of course.
> Would you say that the fact that "instance Ord Word" and "instance Ord Int" are non-isomorphic is a problem? After all, the types themselves are isomorphic!
It's already bad enough, but at least the existence of non-uniform behavior is evident in a type signature containing type class constraints. OTOH, type families are sneaky, because they don't look any different from normal type constructors or synonyms.
>OTOH, type families are sneaky, because they don't look any different from normal type constructors or synonyms.
That is fair.
I think we're on the same page at this point. You have made me realize that ML-style modules are useful in ways I did not realize before, so thanks for that.
Question: How would you feel if the tradition was to do something like
insert :: Ord a f => a -> Set f a -> Set f a
That is, "f" is some type that indicates a particular ordering among "a"s. Then, "Set"s are parametrized over both "f" and "a", and one cannot accidentally mix up Sets that use a different Ord instance.
Seems a lot more cumbersome than the direct ML solution:
signature ORD =
sig
type t
val <= : t * t -> bool
end
functor RedBlackSet (E : ORD) :> SET =
struct
type elem = E.t
datatype set
= Empty
| Red of set * elem * set
| Black of set * elem * set
(* ... *)
end
structure Foo = RedBlackSet (Int)
structure Bar = RedBlackSet (Backwards (Int))
(* Foo.set and Bar.set are different abstract types! *)
> Parametricity is too good to give up. With the minor exception of reference cells (`IORef`, `STRef`, etc.), if two types are isomorphic, applying the same type constructor to them should yield isomorphic types.
You know that's not what parametricity means, right? Like, at all?
Here's a challenge.
`foo :: forall a. a -> a`
Now, by parametricity that should have only one inhabitant (upto iso). Use your claimed break in parametricity from type families and provide me two distinct inhabitants.
I should have specified "modulo bottom" because I somehow didn't cotton I was talking to someone more interested in pedantry than actual discussion.
That said, constructing an inhabitant of false a _different_ way (when we can already write "someFalse = someFalse") is not particularly interesting, and again doesn't speak to parametricity in any direct way.
The lack of a standard library can be fixed relatively easily: write libraries! OTOH, the existence of anti-modular language features that are extensively used in several major libraries, is a more serious problem, because:
(0) It means that libraries in general won't play nicely with each other, unless they're explicitly designed to do so.
One of the common mantras I've heard among Rust core devs is "std is where code goes to die". Where do you feel the line should be drawn between standard lib and external libraries?
The counterpoint being Python simplejson vs json. Most working Python developers I know try simplejson first and fall back to stdlib json (when they are not controlling dependencies in the environment), because simplejson got much faster as it evolved outside of the standard library[0]. Most who don't know this go the other way[1].
There are a number of counterpoints in Python, in fact, which epitomizes the "standard library is where code goes to die" thing. Adding modules to the standard library in Python is, more often than not, overall a bad thing for the module. Python has not historically been awesome with standard library quality, either; see Java-style logging and unittest (I mean naming, not "Java idiomatic," which I think is fine for both).
This comes down to release cycles for the language, mostly. So I think API stability is a bit of a red herring when discussing Python, at least.
I tend to appreciate languages where I can remove the entire standard library and "start over," like C. (Yes, you can.) This can be good for a number of things: porting, embedding, frameworks, and so on.
Case by case. If you absolutely need the speed, go with what gives you the boost. Otherwise, I always encourage people to use json and not have to have an additional external dependency. One of the reasons some people were using simplejson isn't really speed IMO, but because the json module was not in the stdlib until what, late 2.6?
I try to keep my dependency list as tiny as possible, and use what makes sense for my development and for future maintenance. Also, look at the result: in Python 3, the json module beats simplejson.
It wasn't "late" 2.6 (that's not how Python releases work for changes like that), it was 2.6, which was October 2008. Nearly eight years ago. Most distributions are even on Python 2.6 now.
Anyway, my point isn't the specific example. That you and I even have this discussion at all and that there are hundreds of thousands of caught ImportErrors on that specific example on GitHub is my point regarding standard library stability; folks seem to think the standard library is the end-all (wherein we wouldn't be having this conversation at all), but Python has shown it is anything but when not carefully maintained. I think Rust is wise to approach this with caution.
Honestly, I'm not extremely familiar with Rust, but it seems it elected the C approach where you can gut the language. A+. Good. How it should be for a systems language like that, because now it can be ported, embedded, and so on.
> That you and I even have this discussion at all and that there are hundreds of thousands of caught ImportErrors on that specific example on GitHub is my point regarding standard library stability;
The fact that thousands of files catch ImportError does not necessarily imply folks are questioning the stdlib's stability. That merely means some people are deliberately choosing to prefer simplejson over json. The benchmarks demonstrated the json module before Python 3 could be slower than simplejson, but the json module since Python 3 has beaten simplejson in terms of speed of execution. Furthermore, there are old Stack Overflow threads on ujson vs simplejson vs json regarding performance. All the above would naturally suggest folks who choose to prefer simplejson over json do so due to the concern of speed, rather than opinion on stability.
Also, stability is the wrong term for the problem you are describing. Agility is probably the better word. Python releases tend to be backward compatible (of course except Python 2 vs Python 3 and a few other modules like asyncio). Python core developers try not to break applications. If anything, non-core libraries will break compatibility more frequently without having to face larger opposition; I could break simplejson if I were the maintainer of simplejson. The consequence is maybe a couple of angry GitHub issues and a few blog posts, unlike Python 3 which still gets a lot of angry media coverage to this day.
The problem with the stdlib is absolutely about agility. The core community is extremely small. It can take many weeks and sometimes months to get your commit merged. The reason I like to keep the stdlib around is good citizenship. I would love to have requests in the stdlib, but with a more agile and more frequent release. Python isn't the only player. OS distros are also responsible for the slowness. There's been discussion on python-dev regarding more frequent releases, and even potentially breaking up the stdlib could be an option for the Python community.
The way I see it, the packages that support both do so because they know over 90% of users are satisfied with the performance of the standard library package and don't want to install extra dependencies to get the library or utility to work.
Even more code just uses the standard json package without any fallback. The ease of development or deployment is clearly worth more to them than what small speed advantage they can get from going with the external dependency.
The calculus will be different for Rust, of course, with different build and deployment system.
In the Ruby world, very few people use the standard library because it's got so many flaws, and they can't be fixed. So you end up with Nokogiri rather than REXML, all the various HTTP libs rather than net/*, etc. So it just ends up being bytes sent over the wire, wasting disk and bandwidth...
I wonder if identifying the atomic aspects of what you intend your language to be used for ultimately helps in narrowing down what should be in std lib.
Go prioritizes network programming and bundles the necessary components, like http & rpc servers and json.
The http libraries are extensible enough to allow for customization where it's wanted (like http mux) while still creating a canonical implementation that's still viable.
Has Rust identified the core demographics of who they're targeting in order to provide the most applicable platform? Is the target everyone and all application types, therefore there is no default platform?
Edit: To put it another way, is there a set of packages that is either necessary for Rust, Rust development, or most development in Rust? If the std lib includes everything necessary, then who are you targeting with the default platform?
> Has Rust identified the core demographics of who they're targeting in order to provide the most applicable platform? Is the target everyone and all application types, therefore there is no default platform?
Our target audience is still a bit too broad; "systems programming" can mean a lot of things. Application developers build a _lot_ of different applications, those who embed Rust in other languages have a different set of requirements, OS/embedded devs have another. There's a lot of stuff in common, but there's also significant differences.
Well, the trick is to actually get it right before standardizing it - much easier said than done. Keeping the standard library small helps with that since the bar is higher.
Go isn't immune to the problem either. See the `flag` package, which is something that new users are encouraged to avoid in favor of e.g. https://github.com/jessevdk/go-flags .
If you want to write command line apps that conform to the GNU flags convention you can't use the "flag" library. I wrote my own simple getopt implementation (github.com/timtadh/getopt) years ago so I could just get some work done. It works fine and has no dependencies. I write a lot of complicated command line applications and having a small simple getopt implementation makes it a lot easier. Sometimes a higher level tool would be nice but I have never found one I actually like.
Being able to tightly control how the sub-commands chain together is important to me. Support for both short (-s) and long (--long) options make it easy to write both one off commands and self documenting commands in scripts and makefiles.
I write programs in more languages than just Go, and the programs in Go need to work the same way the programs in other languages work. That means GNU option syntax, which is the superior syntax for my needs in any case.
I think that Go can pull off a good standard library because there's a big corporate sponsor behind it, whereas Ruby may have had difficulty with its standard library for the lack of a sponsor.
Standard doesn't mean completely done. Standard should be able to accommodate things like HTTP2, as Go has done, whether that means expanding the API or whatever.
The "big corporate sponsor" argument often comes up when discussing language success. Google doesn't really put more than a few people's time into Go, the rest is open source. Other languages like Python didn't have any real backing until way after success.
It's too early to draw conclusions about Go's standard library. Python's standard library seemed like a good idea at the time too. Come back in 15 years and let's see how good it looks then.
Sql? It does all the wrong things; singletons, no testability, cgo for implementations, side effects and you have to use every database differently based on their individual semantics.
Virtually everyone I've ever spoken to either uses a high level wrapper around the sql library or a no-sql solution.
That's the definition of 'stdlib is where packages go to die'.
It's not that the API is unusable, it's just basically not used by the community because there are other better things out there...but you're stuck with it forever, because it's there and some people do use it, and changing or removing it would be a breaking change.
Anyhow, we're just speculating. Does anyone actually collect metrics about the usage of different parts of the stdlib for any language?
Without hard data to back it up, you couldn't really make a strong argument either way.
I didn't see anyone actually mention sql so I'll just assume your first line is to be interpreted as "sql is the counterexample of why Go's standard library is not as great as it may seem."
>Virtually everyone I've ever spoken to either uses a high level wrapper around the sql library or a no-sql solution.
How does that reflect the quality of the std lib implementation? All the high-level wrappers I've seen still utilize database/sql, they just provide convenience methods on top of the existing functionality. Are people using NoSQL databases because database/sql is so bad or merely because that technology fits their project's requirements?
>That's the definition of 'stdlib is where packages go to die'.
steveklabnik's example of Ruby XML parsing libraries is a better example of this, if only because the std lib implementations are almost completely ignored by all other gems. Go's database/sql is actively used outside of the std lib to great effect, whether in wrappers and ORMs or in implementing other SQL databases (like Postgres).
> "Sql?
It does all the wrong things; singletons, no testability, cgo for implementations, side effects and you have to use every database differently based on their individual semantics."
SQL has its flaws, but it is testable. The testing approaches available vary depending on the implementation. For example, you can write unit tests for SQL Server (using tSQLt, to give one example: http://tsqlt.org/).
String: Linked list of Char. Nice for teaching, horrible in every other aspect.
Text and lazy Text: modern strings, with unicode handling and so on.
ByteString and lazy ByteString: these are actually arrays of bytes. Used to represent binary data.
Because Haskell is lazy by default, and sometimes you want strictness (mostly for performance), there are two variants of Text and ByteString, and going from one flavor to the other requires manual conversion.
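To make the conversion overhead concrete, here is a small made-up boundary function, assuming the text and bytestring packages: a legacy API hands over a String, a template step produces lazy Text, and the socket layer wants a strict ByteString.

    import qualified Data.ByteString    as BS
    import qualified Data.Text          as T
    import qualified Data.Text.Encoding as TE
    import qualified Data.Text.Lazy     as TL

    sendGreeting :: (BS.ByteString -> IO ()) -> String -> IO ()
    sendGreeting send name =
      let greeting :: TL.Text
          greeting = TL.fromStrict (T.pack ("hello, " ++ name))  -- String -> strict Text -> lazy Text
      in  send (TE.encodeUtf8 (TL.toStrict greeting))            -- lazy Text -> strict Text -> ByteString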
At the risk of going a bit off-topic, I think the lazy versions of Text and ByteString wouldn't have been needed if we had nice abstractions for streams (lists are not one, they cause allocation we cannot get rid of), so that you don't need to implement a concrete stream type (e.g. lazy Text and lazy ByteString) for every data type.
The problem is that streams actually have very complicated semantics when they interact with the real world. What does it mean to traverse an effectful stream multiple times? Can you even do that?
Data.Vector provides a very efficient stream implementation for vector operation fusion, but it's unsuitable for iterators/streams that interact with the real world. Pipes, on the other hand, combined with FreeT, provides good, reasonable semantics for effectful streams.
As with many other things, Haskell forces you to be honest with what your code is actually doing (e.g. streaming things from a network) and this means that there's no one-size-fits-all implementation we can stuff everything into.
Just sticking with the pure types there's currently no generic stream model that works well. No stream fusion system fuses all cases (even in theory) and they also fail to fuse the cases they're supposed to handle too often in practice.
I haven't looked at pipes, but I'm guessing it doesn't all fuse away either.
You're right, I believe Haskell's fusion framework could be greatly improved (although it is the best production solution I'm aware of). However, how would you go about solving this? I don't think there's any generalized solution to the problem of creating no-overhead iteration from higher-level iterative combinators.
> Haskell's fusion framework could be greatly improved (although it is the best production solution I'm aware of). However, how would you go about solving this?
Given that we're in a rust thread... are you familiar with rust's iterator fusion [0]? Basically there are three components: iterators (something like a source), iterator adapters (where all manipulations happen), and consumers (something like a sink). LLVM will compile all the iterator adapters into a single manipulation such that the underlying stream/vector/whatever only goes through it once.
I personally like it much better than Haskell's. With rust the fusion is guaranteed to happen, although it makes the types a little verbose and tricky to work with, but with, e.g., Haskell's Text's stream fusion I was never really sure that it was working, or if I could do something to prevent it. It seems like in Haskell it's more of a behind the scenes optimization that you hope kicks in, rather than designed into the types. Or do I misunderstand? I only dabbled in Haskell.
Yes, I have used Rust a bit. Basically the primary difference is (and correct me if I'm wrong) you can't re-use an iterator in Rust without cloning it. On the other hand, you can use a Haskell pure stream object as many times as you want (without explicit cloning, because "draining" an iterator is stateful), so fusion becomes a bit of a more complicated problem.
If I had some Haskell code that was like
map f . map g . filter x . map y $ stream
It would almost certainly get fused into a single low-level loop without extraneous allocations. However, I can also do something like
foo = map y $ stream
bar = map f . map g . filter x $ foo
baz = map z $ foo
And now what do you do?
Haskell's fusion is also more general, because it allows you to do pretty much arbitrary syntactic transformations.
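The "arbitrary syntactic transformations" here are GHC rewrite rules; the classic map/map rule from the GHC user's guide gives the flavour:

    module Rules where

    -- tell GHC that two consecutive maps may be rewritten into one pass
    -- (it fires only when the optimizer can see both maps)
    {-# RULES
    "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
      #-}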
Unfortunately, this means it's somewhat fragile and is easy to prevent from functioning. Rust can guarantee fusion because you're restricted in the kinds of things you can do with iterators.
On the other hand, Haskell's Pipes restrict you from doing things like re-using an iterator, and I'm not sure what the optimization story is there.
And they work! It's not stream fusion, but the composed functions are applied per value to whatever container or stream of values, so (map (comp xf1 xf2)) applied to [1 2 3] applies (xf2 (xf1 1)), (xf2 (xf1 2)), and so on, with similar allocation savings to stream fusion.
>Conversions between our 5(!) string types are very common.
All five of those string types do different things. This isn't a problem; we just have increased expressivity. We couldn't fix this by having a more coordinated standard library. 5 is also a very manageable number IMO.
>It's too difficult to make larger changes as we cannot atomically update all the packages at once.
If a language is to play the long game, it must be conservative about what it adds. Even a minimal runtime like Node is still wounded by the addition of a few broken interfaces into the core platform (event emitters, streams, domains to name a few). These cannot be stripped out of the platform because they're "blessed" and now everyone will forever have a bad time. I suggest we don't do that for Rust.
For a language to remain relevant in the long term, it must be capable of evolving. Going out there and blessing potentially conceptually incorrect packages such as Mio is therefore not a good idea. The notion of "platforms" best resides in userland, where collections can compete and evolve.
By keeping platforms / package collections in userland we can get organic collections of stuff that make sense to bundle together. Imagine a "server platform", an "event loop platform", a "calculus platform", a "programming language platform". It would be detrimental to creativity to have all these collections live in the shadow of the "rust committee blessed platform".
But so yeah, I'm not opposed to the idea of platforms - I just don't think blessing userland stuff from the top down is the right play in the long term.
Tl;Dr: package collections sound like a cool idea, but if you care about the long term, imposing them from the top down is bad.
> These cannot be stripped out of the platform because they're "blessed" and now everyone will forever have a bad time. I suggest we don't do that for Rust.
It sounds like the proposal in the OP avoids this problem by having versioned platforms that are independent of the Rust version. So if something turns out to be a bad idea, it can be stripped out of later platform versions and replaced with something better without disrupting users of the older platforms.
Yeah, I really liked how DirectX did versioning in that respect. It lets you improve the API while preserving backwards compatibility for the old APIs and signatures. If you can give Microsoft credit for one thing, it's that they do backwards compatibility well.
I don't see why that couldn't be worked into this idea.
> By keeping platforms / package collections in userland we can get organic collections of stuff that make sense to bundle together. Imagine a "server platform", an "event loop platform", a "calculus platform", a "programming language platform". It would be detrimental to creativity to have all these collections live in the shadow of the "rust committee blessed platform".
That's exactly what we want to do. We don't have the domain expertise in many of these areas, but we want to enable those ecosystems to help develop their own platforms. It's one of our goals to make sure any infrastructure we develop for the "Rust Platform" is usable throughout our community.
I'm a bit worried when a language develops a "platform" and an "ecosystem". This usually means you need to bring in a large amount of vaguely relevant stuff to do anything. It adds another layer of cruft, and more dependencies.
Write standalone tools, but don't create a "platform". Don't make the use of the language dependent on your tools.
C++ does not have a "platform". Nor does it need one.
C++ has innumerable de facto "platforms" and "ecosystems". You have to choose one to get anything done, whether that be various Boost libraries, POSIX, Win32, Cocoa, Qt, GTK(mm), even stuff like XPCOM…
Helpfully, many of these platforms reinvent basic things like strings [1] and reference counted smart pointers [2] in incompatible ways.
Wouldn't it be better if there were just one platform?
And notably, C++11 actually moves in this direction, standardizing things like smart pointers [0][1]. Doing the same early on is a very smart move for Rust: core or near-core library wars in the early days of a language's adoption lead to duplication of effort, and for those invested in seeing Rust gain a set of libraries to rival other languages', this is a great thing.
Actually, C++'s STL is in a weird situation compared to the standard library in other languages, because the STL is a spec, not an implementation, and there are as many implementations of the STL as there are compilers. This might arguably happen if there were multiple Rust compilers, though. Anyway, the result for the C++ STL is that in many cases the same types have different performance characteristics on different platforms, or worse, different behavior/bugs.
I work on multiple medium-sized projects that disagree. If you're not writing GUI code, it's quite possible to write 99% platform-agnostic code without the help of a 3rd-party standard library supplement, especially with C++11.
It's possible, yes. But (a) large pre-existing industry codebases make use of their legacy libraries that predate "modern C++"; (b) that directly contradicts the parent poster's point, in that you're saying that having a standard platform is a benefit.
Some languages have also seen great success with having a standard platform while still allowing users to replace it as needed. The many Haskell Preludes and Jane Street's OCaml Core are two such examples.
I think the ability to opt out of the rust-platform metapackage is a great feature.
We already have `std` for that. The point of a "Rust platform" is that you can have confidence that the libraries you are using are of decent quality, reasonable popularity, and will be maintained.
I get the feeling that you didn't really read the post. Nothing in the platform is required to use Rust, and you can trivially write Rust packages that don't use the platform. The point of the platform is for convenience. In most cases it will make sense to use it because it provides a convenient set of libraries that are known to work well together, but you could also choose to just ignore the Rust platform entirely and continue to use Rust the same way we've been using it up to now.
"In general, rustup is intended to be the primary mechanism for distribution; it’s expected that it will soon replace the guts of our official installers, becoming the primary way to acquire the Rust Platform and all that comes with it."
Then, of course, the other installers will gradually break and be abandoned. The effect is that users must run the "Rust Platform", unless they have the resources to build their own distro.
Is there a monetization scheme behind this? Does someone aspire to be the Canonical of the Rust ecosystem?
Rustup being the mechanism to install the Rust Platform does not mean that installing the Rust Platform is required to use Rustup. Rustup is an existing tool that installs Rust for you, and it's highly likely that you'll be able to use it in the future to install just Rust or to install the whole Rust Platform at your discretion.
> Is there a monetization scheme behind this? Does someone aspire to be the Canonical of the Rust ecosystem?
This seems like a complete non-sequitur. I have no idea what you're trying to suggest here.
I don't see anything in this proposal which indicates that rustup would by default install any of this, and it's already the primary target of development efforts, Rust Platform or not.
What other installers are you referring to, exactly? The old rustup.sh which couldn't support multiple toolchains installed alongside each other? multirust which didn't work on Windows? rustup is a massive improvement over both, IMO.
I'm sorry, but this is incoherent FUD. Do you expect the Rust project to maintain multiple installers? Why? What does this have to do with monetization?
Go watch recent talks by Herb Sutter and Bjarne Stroustrup: they both lament the fact that C++ never developed as strong a standard library as Python, etc. With C++14 and beyond, the C++ working committee is actively trying to make the language and libraries more complete and comparable to the larger libraries out there.
In a low-level language like C there's _never_ a right solution out of the box. Instead, you use the language because it permits you to tailor the solution to the problem.
In a high-level language like Python there's always a right solution. It's just rarely well-tailored to the problem.
I'm not at all surprised C++ is still muddling around in the middle somewhere.
Qt / Boost / .net are C++ platforms. The difference is you can choose one or none, and the OP actually explicitly talks about how important it is not to have one absolute blessed platform like Java has.
Haskell Platform is the last thing you should take inspiration from. Many of us have been doing our best to kill it off. Maybe the downsides involved wouldn't affect Rust in the same ways.
My suggestion, look at how Stack (the tool) and Stackage (the platform) work:
Most of your arguments linked in a comment below are unrelated to the application of metapackages to Cargo. Cargo already includes a good portion of the behaviors found in Stack and Stackage. The ideological battle of Stack vs Haskell Platform is irrelevant to this proposal.
Haskell Platform is a perfectly adequate example from which to take high-level inspiration for the core of this idea: use Cargo (which is like Stack/Stackage for Rust) to help bootstrap Rust libraries when using rustup to install and upgrade (mostly equivalent to `stack setup`).
The one thing Stack does do nicely that this proposal doesn't is allow curated, compatible versions without forcing coarse-grained dependencies. In plainer English, Stack can pick the versions while you still opt in package by package. I believe this is crucial to not slowing down the evolution of the ecosystem.
In Cargo jargon, a solution would be for metapackages to double as sources: `foo = { metapackage = bar }` to use foo from bar.
I like the general theme of having the platform be just a set of known-compatible versions, but on the other hand this feels like it loses out on many of the ease-of-use advantages of just specifying a platform version and knowing that you have all the crates inside of it.
I don't think we need to pick one way exclusively. My specific use case is making PRs for packages to work in kernelspace / in unikernels. The library might initially be packaged the easy way, but then I'd use this. I don't want the PR recipient to also worry they might get out of sync with the platform as a side effect.
If I'm understanding your concern correctly, that's totally a part of the proposal:
> But we can do even better. In practice, while code will continue working with an old metapackage version, people are going to want to upgrade. We can smooth that process by allowing metapackage dependencies to be overridden if they appear explicitly in the Cargo.toml file.
I actually cross-posted this to /r/haskell to get explicit feedback. Someone else mentioned stack/stackage. In my understanding, Cargo already does this specific behavior. Can anyone who's more familiar with both confirm this?
I checked that other thread, and I agree completely with jeremyjh's summary.
The problem with Haskell Platform came down to a set of intersecting issues:
- A culture of setting very strict & narrow version bounds. (Based on known-good rather than based on avoiding known-broken)
- Tools (Cabal) that enforce dependency version bounds. If there's a mutual/transitive incompatibility in version bounds, the build fails, period. You had to figure out the problem and fix it yourself if there was a truly irreconcilable issue.
- The recommended installer on the website (the Platform) unnecessarily installed packages into the global package database, making you "stuck" with those versions for _all_ packages you attempted to build: Cabal was restricted to finding build plans whose dependency versions matched the ones provided by the global package database.
These problems led to beginners being confused by seemingly spurious build failures because Platform would fall out of date with the rest of the ecosystem. Cabal would be unable to find sets of compatible dependencies and say it couldn't build the package.
What non-beginners were doing to avoid these problems was:
1. Install the bare compiler, no Platform
2. Use package database sandboxes for each project
All of these (UX and technical) problems were solved by Stackage and Stack, without compromising dependency conflict enforcement.
Speaking hypothetically, if Cargo behaves like Maven or Ivy and pulls in two dependencies that want conflicting versions (1.1 and 1.2, say) of a particular library and just picks a winner, then you'll never see something like this.
From my perspective, this is one of the biggest issues in Cargo right now. I know it's not the same as the HP problem, but current Cargo is definitely not good at solving this.
My email is what finally moved the committee and GHC devs on including Stack with the Platform and on some other decisions concerning the website.
Like I said, not all the downsides may be applicable to how Cargo works or what aturon has in mind, but please don't cite it as an exemplar of anything.
That's quite the email! Without a TLDR, I'm not sure how to assess how susceptible Cargo is to those problems. My personal experience with Rust has been that there are very few instances where versioning or dependency issues cause me problems. The specific instances I have dealt with would be resolved by the proposal in the blog post (trying to use a version of serde which differs from what another core-ish library wants to use, etc.).
IMO the root problem is that the Haskell Platform is both a set of curated packages and a bandaid for the fact that so many things Haskell are hard to build/install for no-good reason.
On the first front I guess it's alright (but Stackage is better), and on the second front it's totally inadequate, maybe even harmful, in that it probably made the problem just painless enough to encourage procrastination.
I think most of this lesson doesn't apply to Rust, but it's still good to be aware of. It certainly gave me a strong aversion to coarse-grained dependency management as a bandaid.
>so many things Haskell are hard to build/install for no-good reason.
Were. Were hard. It's quite fine now with Stack. I know users of all kinds of languages that all miss Stack when they're working on their non-Haskell projects.
Not sure why you're being downvoted. Haskell tooling definitely started turning around with Stack. I've used a fair amount of package managers (including cutting edge stuff like Nixpkgs) and Stack is by far my favorite.
EDIT: To elaborate on why I like Stack:
+ Fully declarative. I don't run commands to edit my Stack environment; instead I modify the relevant stack.yaml file. This means that the current environment can always be easily examined, committed to Git, etc.
+ Easy NixOS integration which means I can also describe the non-haskell dependencies for a project, and enforce that _absolutely_ nothing else on my system gets used. This is amazing.
+ I like that it uses Stackage by default, and that I can pin projects to LTS Stackage releases. This means that many users of my projects will probably already have most dependencies installed -- especially cool for small things like tutorials where users won't have the patience for long build times.
+ It reuses the already existing `.cabal` file format so it's easy to make libraries compatible with cabal-install, the other package manager in the Haskell ecosystem.
Hey! So I'm deeply appreciative towards cabal-install for historical reasons. The situation before cabal-install was . . . not the best (https://www.gwern.net/Resilient%20Haskell%20Software) so I think we can credit cabal-install with a lot of the success the ecosystem has had over the last decade.
That said, nix-style builds don't actually address my issues with cabal-install. My (very personal) preference is to always have the current dependencies being used reflected exactly in a local file. I picked up this attitude from NixOS, but it's an extremely useful way of doing things -- the knowledge that as long as I don't lose that file I can always rebuild my current environment exactly is just too awesome.
I think that if you need to track version bounds as well (say you're developing a library), they should be kept in a separate file, and you should always have a master file that describes what you're currently using locally. Happily, this is exactly what Stack does with my-project.cabal and stack.yaml.
That said, I appreciate that others like cabal-install, so I do what I can to make my code easy to use with both package managers (the main thing is keeping aggressive upper bounds on the few libraries I maintain, which especially helps cabal-install users since they do dependency solving more).
Having used Java and having experienced how you learn to replace the JDK URL parser with something else, the JDK HTTP client with something else, the JDK encoding libraries with something else, etc., I'm worried about entrenching first-published libraries beyond their natural first-mover advantage and making it even harder for better later-comers to be adopted.
OTOH, compared to e.g. C++, where even strings aren't standard across multiple real large code bases (e.g. Gecko and Qt) having standard types especially for foundational stuff can be very useful if the foundation is right or right enough. Java having a single (though not optimal) foundational notion for Unicode (UTF-16 even though UTF-8 would be optimal) and a common efficient and spec-wise correct (though not design-wise optimal) XML API (SAX) was a key reason to develop Validator.nu in Java as opposed to e.g. Python (whose notion of Unicode varied between UTF-16 and UTF-32 based on how the interpreter was compiled!).
Still, at this early stage, I'd point to crates.io in general as the "batteries" instead of creating a layer of official approval on top. But I'm biased, because lately, I've been writing a second-to-market crate for a topic for which a crate already exists.
Good work! The idea of dropping extern crate is worrying, however. Most of the ways that could be done would add more irregularity, complexity, and implied environment to the language (the Rust Platform is always there, whether or not you want it), all of which are opportunities for bugs to creep into code.
You have modules, use, extern, Cargo.toml, Cargo.lock. It looks quite verbose and redundant to me. I expected it to get simpler, but moving more into config files seems a backwards step. I may be biased, but I find Go's import/packaging vastly superior in terms of usability (i.e. using a single import statement, you make the import explicit and ready to use). I think the trick is to build the tools on top of the source code, not on top of config files (i.e. à la C/C++).
Would it still work correctly when invoking rustc on its own or from another tool? I've seen a couple of people on IRC asking about driving rustc from a non-Cargo tool.
rustc already requires passing an --extern flag for each 'extern crate' in the source, so in some sense, yes. You'd be passing that stuff along. You already have to know where those deps are on disk; the difference is that you wouldn't have a list of the deps in the source code. But if you did, it would work.
> rustc already requires passing an --extern flag for each 'extern crate' in the source
Only if it can't find the crate otherwise. Generally it just searches your library path and any directories you specify with -L; e.g., most cargo executables can be recompiled (after a first compilation via cargo to build the dependencies) via `rustc src/main.rs -L target/debug/deps`.
I mean really, why is this necessary?
The reasons for adding this much bloat are tiny and irrelevant. You get a big download full of packages you don't all need and don't know if you need, and all for what?
We all have Google, if I want an HTTP library for Rust I'll google whatever the best one is and make a judgement call myself.
This is all optional. You can just ... not add the one line to the cargo.toml and manually specify your favorite http library. Folks using many of these libraries or people new to the language can just specify rust-platform deps. This has the added benefit of bringing in versions that are known to work well together -- because of semver this isn't usually a problem, but people aren't perfect so sometimes things break. An added guarantee against that is nice to have.
You can still have a tiny install if you want to. But most people not knowing what to do will be able to have batteries included setup.
Basically: people knowing will still have freedom, people not knowing will have an easier life.
And in the end, in 2016, you generally don't care if you download an extra few hundred MB. If you do, you'll just spend the time to set up something smaller.
So if you don't know what to do, the solution is to download all the possible packages? Is there a user story / use case solved this way? The documentation for each package is available online, so wouldn't it make sense to actually read it before downloading a package?
One of the great things about Python is the stdlib, with a lot of things provided for you. And one of the most terrible things in JS is the total lack of one. Rust chose a middle ground: provide a platform by default, yet let you choose not to use it. Win-win.
People who don't know what to do exactly? Downloading a package? If they can't download a package, then they have an entirely different problem to solve, which they'll have to solve anyway if they ever want to productively program in Rust.
If the aim of the Rust Platform is to provide a 'blessed set' of 3rd-party libraries, why not have something akin to an official, curated 'Awesome Rust', where, depending on what you want to achieve, you can get a streamlined list of well-maintained libraries, perhaps with user-submitted usage examples, comments, alternatives and the like? That way you're not constraining Rust itself to be in sync with all the 3rd-party libs (and thus by necessity stagnating to a certain degree), nor are you making authors of libraries that are not in the Platform feel essentially invisible.
How is that different than the proposal? The only part that's maybe different is "feel essentially invisible", which is something that might happen, I'll grant you that. But I don't think that people feel invisible when some package is in a standard library, which is roughly equivalent here.
I just don't think that going beyond a simple website with 'blessed' crates has much benefit, but it certainly has a lot of disadvantages as outlined in this whole thread. I'm simply saying that if the goal is to have a central place to point to when somebody asks, 'how do I do X', then perhaps making a website/improving crates.io in this regard is the better approach for the continued growth of the still very much evolving Rust ecosystem, rather than creating a Rust "distribution" as such.
P.S. Thanks for all your work Steve, it's really appreciated.
Combine this with my stdlib deps RFC[1], and I think we could shrink the number of standard library crates versioned with the language! The standard library metapackage would pull crates from crates.io as needed.
Can someone summarize what's going on here for lay people? I know this "Rust" thing is pretty popular on HN nowadays but not sure what the difference is between a language and a platform. I was trying to read the article, but I don't have enough knowledge about rust itself to understand what it's talking about...
Rust is the language itself. The platform in this context is the standard libraries you use for things like string manipulation, network connections, etc.
Pedantically, the standard library (strings, basic networking, containers/collections, etc.) is already well defined and quite small. The Rust Platform idea seems more aimed at providing access to curated, stabilized versions of community-developed libraries for common higher-level tasks (async I/O, serialization/deserialization, etc.).
Personally, I think that letting users (especially beginners) opt in to a larger set of starting libraries could be very beneficial for adoption.
Wouldn't it be enough to have a list of curated libraries documented somewhere in the official Rust documentation? It seems to me that "platform" support would encourage monolithic designs (i.e. packages that only work within a specific "platform").
The relationship between Boost and the C++ standard library might provide a good example (especially in recent years, as there's been sustained focus on expanding stdlib). Boost has a stringent peer review and comment period. Sometimes I think they let in designs that are too clever, but they rarely let in the clunkers that are seen in Python's standard library.
People then gain experience with it and sometimes subtle issues emerge. Those issues can be fixed, either in Boost itself or when libraries move from Boost to stdlib (and sometimes the issues inform language features).
While C++'s standard library is not "batteries included" like Python's, it's been very gratifying to see it expand slowly and surely over the last few years.
Couldn't much of the function of such a "platform" be automated?
One of the main constraints that would guide selection of crates for this platform metapackage is that they must have compatible dependencies. So package A depending on Bv1 and package C depending on Bv2 wouldn't work, because Bv1 and Bv2 would both have to be included, leading to a conflict.
But this information is (in theory) encoded in semantic versioning. Assuming proper semantic versions for crates, a target set of crates to be included could be specified, and then the various sets of crate versions that do not have conflicting dependencies could be calculated automatically.
These compatible crate/version sets could be automatically generated and published as metacrates.
Consider the following crate/version dependencies:
By imposing an ordering on the crate versions, these compatible sets could be automatically identified: compatible_A_C_0 is {'Av1','Cv1'}, compatible_A_C_1 is {'Av1','Cv2'}, and so on.
Obviously the semver could be wrong and unexpected incompatibilities could crop up. But couldn't these just be autogenerated and then voted on? Then the best compatible sets will filter to the top and, de facto, the Rust Platform has been autogenerated.
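As a toy sketch of that enumeration, with entirely made-up crates and a deliberately simplified rule (two crate versions are "compatible" only if they pin the same major version of a shared dependency B; real constraints would come from semver ranges on crates.io):

    use std::collections::BTreeMap;

    fn main() {
        // Hypothetical data: (crate version, major version of B it requires).
        let a_versions = [("Av1", 1), ("Av2", 2)];
        let c_versions = [("Cv1", 1), ("Cv2", 1), ("Cv3", 2)];

        let mut sets = BTreeMap::new();
        let mut n = 0;
        for (a, a_needs_b) in &a_versions {
            for (c, c_needs_b) in &c_versions {
                // Compatible only if both agree on which major version of B gets pulled in.
                if a_needs_b == c_needs_b {
                    sets.insert(format!("compatible_A_C_{}", n), [*a, *c]);
                    n += 1;
                }
            }
        }

        for (name, members) in &sets {
            println!("{} = {:?}", name, members); // e.g. compatible_A_C_0 = ["Av1", "Cv1"]
        }
    }

A real version of this would need full semver resolution and integration testing of each candidate set rather than this toy rule, but the enumeration itself is mechanical.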
> So package A depending on Bv1 and package C depending on Bv2 wouldn't work because Bv1 and Bv2 would both have to be included, leading to a conflict.
As mentioned in this thread, this already works just fine. Rust can handle both versions.
Furthermore, it's more than just a constraint problem; there's also integration issues, testing the whole thing together, etc.
I love seeing Aaron's work within the Rust community. I had the pleasure of studying under his father and his family's gifts are clear in both their work.
Are metapackages going to be available for others to utilize? If so, how will conflicts be resolved if packages require two different versions of the same package?
Yes, metapackages are intended to be a general cargo feature, available to all. I suspect the design has not progressed far enough to definitively answer your question about conflict resolution, but I'd imagine you have to override that dep explicitly to fix it.
The TeXLive of the Rust world? Could be helpful. But the small std probably implies a constantly revolving cast of "best practice" libraries that's hard to keep up with. I know in TeXLive there are no stability guarantees regarding the collection as a whole. It's more of a collection than a platform.
Historically, there have been a number of proposals by the Rust team that were received badly. We've always benefited from outside eyeballs on things; usually the final proposals end up much stronger. We've only sought to _increase_ this kind of thing over time.
For contrast, with .NET Core, Microsoft decided to cut up .NET's quite extensive stdlib (BCL + FCL) into packages. On paper, it looks pretty good, but it remains to be seen how well the versioning will work in the long run.