A post was recently on the front page of HN about using Haskell in production that divided the common documentation experience between "hard" and "soft" docs. Far too often with Haskell you only get the "hard" docs: descriptions of functionality and functions, but nothing on why (and, cohesively, how) you would want to use them.
This makes a strong assumption that you are already deeply familiar with the use case and the implementation concept.
This may apply to Rust as well. Rust will likely attract experienced developers, much like Haskell, so in most cases a decent level of code quality can be anticipated. But one of the hardest things to get right as an OSS developer is documentation. You're often so busy with the burden of maintenance that the explanatory side gets sidelined, especially as a library and the underlying language evolve. So I hope this is a priority focus during their reviews.
There was actually a thread on /r/haskell a few months ago in response to an HN comment. In that thread, a few libraries were mentioned as examples of having amazing docs.
For instance, Gabriel Gonzalez writes a Tutorial module for his libraries, which introduces the basics of the library and shows you the big picture. The tutorial walks you through different pieces of the library, how to use them, and how they come together.
It's hands down one of the most effective ways of documenting I've ever seen and I'd love to see other communities' take on it.
To point out a specific example, you should check out Turtle's tutorial.
It feels like the real issue is that anyone who considers themselves a writer, as well as a programmer, tends to have a blog; and that people who have blogs are incentivized to write "soft docs" as tutorial blog posts for their blogs, instead of as encyclopedia-style or cookbook-style additions to the given library's docs.
I wonder if a programming language could adopt a Code of Conduct discouraging tutorial blog-posts in favor of soft-doc PRs...
There's only so much you can put in docs, and docs can only really encompass one way of learning things.
Blog posts help teach things in other ways, and having that diversity of approaches is great.
This is why, for example, I love Julia Evans' (http://jvns.ca/) work. Most of the blog posts don't really uncover undocumented stuff. They just explore the topic in a novel way, which some people may find more accessible.
Rust is open to importing blog posts into the docs; this has happened a few times (once to one of my own posts!). So I'm happy that there are blog posts out there.
That said, if you're writing a blog post, it's good to see if there are bits that can be put into the docs too.
Instead, they should work to change the incentives -- likely by paying more attention to them as an organization, talking about people who contribute them on the project's page and blog the same as they do for major technical work, and possibly even bringing on maintainers focused on (soft) docs.
Trading attention and prestige for content might work; shaming the people who produce content for not giving it to you for free is unlikely to.
I don't think projects give soft docs the credit they deserve, so why are people so surprised that people develop that content elsewhere, where they're recognized for it?
I'll only start doing the tutorial doc and/or blog post if others have interest in the library. There's just not enough time :/
I don't think there's a dearth of people willing to write blog posts. There's just a dearth of people willing to do the equivalent job to what Wikipedia editors do for general-purpose articles: collating those "primary sources" into a single, coherent, up-to-date overview.
See for instance the "image" crate docs: https://docs.rs/image/0.13.0/image/
A simple API doc, I don't know where to start. But then if I go to the repo I find a nice intro with some example code: https://github.com/PistonDevelopers/image/blob/master/README...
It's language design the way it should be: incorporating the cutting-edge ideas from academia while still striving to cater to beginners; drawing on the strengths of other languages' communities to build out good libraries and solutions to package management; designing everything in the open, and constantly seeking feedback from their users.
It's a great blend of theoretical CS, HCI, computer systems, and application development, and it's always fun to hear about what they're up to.
Yeah, me as well. It's a perfect marriage of academic know-how and pragmatism. Very exciting year for Rust.
- "No, I stopped using Rust"
- "No, I've never used Rust"
When making a survey the alternatives for the answers are very important. I feel that none of these alternatives apply to me. That's bad. Unfortunate because I would have liked to participate in the survey.
I've done a little bit of beginner programming in Rust in order to try and learn the language. However, I haven't yet used it to implement anything actually useful, so I wouldn't say "yes, I use Rust". It's been a while since I last did something in Rust, but I wouldn't say "no, I stopped using Rust" either, because to me that implies that I have decided that Rust is not for me, which is not how I feel. I want to use it; I just keep pushing it down the list of things to do because other more immediate desires and problems keep popping up.
*"roll a die" for those who prefer more atavistic conventions - see research on Latin plurals rendered in English over time and with regard to frequency of use.
The Clojure survey had something along these lines.
This can be addressed in a language with sufficient annotation and good parser tools. In some future language, there should be a unification between the version control, the de-facto codesharing site, language/library versions, and syntax-driven tools to automatically rewrite code.
It should be possible to "publish" a language and its libraries such that any breaking changes will automatically be updated when you switch library versions. (This should also be applicable to Entity-Relation diagrams and Object-Relational mappings -- those can be treated as a versioned library.)
To avoid that, you need to standardize on a blessed set of versions to be tested together, much like assembling a release of a Linux distro.
People will still swap in alternate versions of libraries occasionally, but keeping things mostly standard and a few cherry-picks is still better than everyone choosing differently.
The point is not to let people hang out in whatever obscure snowflake version-set they want to. The point is to make migration going forward as painless as possible. However, that expectation is not so much about the tooling as it is about the developer/language community.
Yes, there should be this! However, you will still have some stragglers and outliers -- this is what the historical reality shows us. The point of such tooling is precisely to minimize the pool of stragglers, not to maximize them!
That's pretty much the Stackage/LTS Haskell model. Seems to be working pretty well there.
You actually get a lot of that in Perl 6, which identifies modules not only by name, but by version and authority (so you can choose to load author Bar's version of Widget instead of author Foo's version).
Is there a plan to make these things statically checkable by rustc/rustfmt/rust-tidy or something of the sort?
It's not about the crate being non checkable, it's about the check being something that needs a human to look at.
There's a rather computationally intensive technique called mutation testing where you introduce bugs into code (e.g. change + to -, flip conditionals, etc.) and check that tests fail.
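To make the idea concrete, here is a minimal hand-rolled sketch. Real mutation-testing tools generate the mutants automatically; here the mutant is written out by hand, and all the names (`total`, `weak_suite`, and so on) are made up for illustration.

```rust
// The function under test.
fn total(a: i32, b: i32) -> i32 {
    a + b
}

// A hand-written "mutant": the same function with `+` flipped to `-`.
fn total_mutant(a: i32, b: i32) -> i32 {
    a - b
}

// A weak test suite: every case uses b = 0, where `+` and `-` agree.
fn weak_suite(f: fn(i32, i32) -> i32) -> bool {
    f(0, 0) == 0 && f(5, 0) == 5
}

// A stronger suite: a non-zero second operand tells `+` and `-` apart.
fn strong_suite(f: fn(i32, i32) -> i32) -> bool {
    f(2, 3) == 5 && f(5, 0) == 5
}

fn main() {
    // Both suites pass on the original code.
    assert!(weak_suite(total) && strong_suite(total));

    // A suite "kills" a mutant when it fails on the mutated code.
    println!("weak suite kills mutant: {}", !weak_suite(total_mutant));
    println!("strong suite kills mutant: {}", !strong_suite(total_mutant));
}
```

The weak suite reaches full line coverage of `total` and still lets the mutant survive, which is the gap mutation testing is meant to expose.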
You don't need to completely dismiss other testing techniques. Mutation testing can be useful at the same time that code coverage is useful.
This could be used, sure.
Maybe the checklist should get a more prominent link or notice in the blog post. It took me a while to find it.
In short, the blog post makes very little mention of the role of the primary author(s) of the libraries in question, beyond "Every two weeks, hold a library team meeting [...] with the author in attendance." While I imagine the reality will be quite different, this sounds an awful lot like "oi, you, code review in my office now!"
One of the attractions of working on open source is that it offers more scope for autonomy and individual recognition than the typical commercial software job. It's slightly alarming that this doesn't seem to be recognised here.
As I say, I'm sure the reality will be fine (and I remain very keen to give Rust a serious try when time permits), but the rather collectivist presentation here is a tiny bit off-putting.
Nobody is forcing stuff on random crate authors; if they didn't want to participate, then that's 100% okay.
I still feel there's a slightly-unfortunate ambiguity in the linked announcement post, though.
Words are hard.
As a prospective crate author - and every programmer is a prospective crate author - I interpreted this as saying that if I publish a crate which people find useful, I might be getting an unexpected visit from the Rust Police.
Must say that I agree with you that the authors do not seem to take a very central role in the process as described. But I see that more as an easy fix in the announcement text.
It's not just this, either, a lot of the top-down communication in the Rust ecosystem has a feel like this. I'm not sure what to make of it, except that I think their attitude is probably too authoritative.
"Vec" currently needs unsafe code, because Rust doesn't have the expressive power to talk about a partially initialized array. Everything else with unsafe code is an optimization. Often a premature one. Maps should be built on "Vec", for example.
When you start looking through Rust libraries, "unsafe" turns up way too often.
> When you start looking through Rust libraries, "unsafe" turns up way too often.
You keep making this claim without substantiation. Yes, there is some level of unnecessary unsafe, but certainly not "way too often". I recall going through all the crates in my .cargo and finding very little unnecessary unsafe, and showing you the audit: https://news.ycombinator.com/item?id=13280347
Please stop throwing around this claim without substantiation.
I think that would be a pretty easy way to put this convo to bed yea?
Have there been talks about this before? Is it worth me opening an issue/RFC on Cargo?
There are good reasons for having an unsafe-code flag on a crate. This is not one of those.
This has been discussed in the past and the main concern IIRC is that this may make people afraid to use crates that contain unsafe code even if it is well audited. There's a finer balance you want to strike here.
C is winning that benchmark, around 2x faster than Rust.
(And I was only really talking of the comparison of rust v rust, not rust v c, i.e. that ordermap is faster than the built in hashmap despite not using unsafe)
The previous discussions made a few things clear:
- The big design-level problems with data structure safety are 1) partially initialized arrays, and 2) backlinks. The first is needed for growing arrays in Vec, and the second is needed for doubly-linked lists and some kinds of trees. It's very hard to handle either of those in safe Rust. A very small number of packages need unsafe code for those functions, and those should be tightly controlled.
- Foreign code remains a problem, and is inherently unsafe when calling unsafe languages. The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?
pub fn strcpy(dst: *mut c_char, src: *const c_char) -> *mut c_char;
I could give many more examples.
Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged.
It only takes one hole to create a security vulnerability.
Denial is not a river in Egypt.
Maybe there's a reason it's in a repository labeled "deprecated", namely that it's deprecated. The replacement serialization framework, serde, has exactly one appearance of "unsafe", in a function that is only compiled when you explicitly enable the "unstable" feature flag and that in fact appears to be a reasonably safe use of unsafe.
Also, the number of CVEs in FOSS projects shows that even the process of code review for patches isn't enough.
Those patches also come in small blocks.
That's ... exactly what I did? Talk about denial. I went through the libraries in my .cargo folder (which filters for libraries that actually get used, not just random libraries out there). I linked you to that audit in the comment.
Yes, libstd contains a lot more unsafe, but that's kind of the raison d'etre of libstd -- to contain OS-abstractions and very common internally-unsafe abstractions. It's better to have one unsafe implementation of Vec (in a library with a lot of eyeballs on it) than to have ten that the ecosystem relies on.
> which then turns up in the JSON decoder at
rustc-serialize is deprecated. It was deprecated before 1.0, and went into maintenance mode. It was still kinda-maintained because of the difficulty of using serde on stable, but now it's basically going to be full maintenance mode.
Also, that line is trivially safe to execute on untrusted input; `char` is utf8. Yes, a comment would be nice, but like I said, maintenance mode.
It is possible to do what that function does in safe Rust today, but it needs an API that probably didn't exist ~two years ago when that library was actually relevant. Fixing.
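The comment doesn't name the API; my assumption is that it is something like `char::encode_utf8`, which writes a char into a caller-supplied buffer and hands back a `str` borrowed from it. A small sketch under that assumption (`char_as_str` is a made-up helper name):

```rust
// Encode a char into a caller-provided buffer and get back a &str,
// with no `unsafe` and no unchecked UTF-8 conversion.
fn char_as_str(c: char, buf: &mut [u8; 4]) -> &str {
    // `encode_utf8` returns a `&mut str` borrowing from `buf`;
    // a char never needs more than 4 bytes in UTF-8.
    c.encode_utf8(buf)
}

fn main() {
    let mut buf = [0u8; 4];
    let s = char_as_str('é', &mut buf);
    assert_eq!(s, "é");
    assert_eq!(s.len(), 2); // 'é' takes two bytes in UTF-8
}
```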
> The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?
libc is just bindings (to everything in libc). You still need to use unsafe to call those functions (everything in `extern "C"` is unsafe to call even if not marked explicitly as such). This is not a valid example.
> A very small number of packages need unsafe code for those functions, and those should be tightly controlled.
which is pretty true already? The "partially initialized arrays" problem is generally handled by just using Vec. Yes, you need unsafe to write vec, but then you can just build things out of it.
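For a feel of what the "partially initialized array" part involves, here is a minimal sketch of a fixed-capacity vector. It is not how std's Vec is written; `FixedVec` is a made-up type, and it leans on `MaybeUninit` and const generics, which are newer than this discussion. The point is only that the unsafe parts are small and sit behind a safe API.

```rust
use std::mem::MaybeUninit;

// Hypothetical fixed-capacity vector: slots 0..len hold initialized values,
// slots len..N are uninitialized memory.
struct FixedVec<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedVec<T, N> {
    fn new() -> Self {
        FixedVec {
            // An array of MaybeUninit is allowed to be uninitialized,
            // so this assume_init is sound.
            slots: unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() },
            len: 0,
        }
    }

    // Hands the value back if the vector is full.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        // Overwriting a MaybeUninit never drops the old (garbage) contents.
        self.slots[self.len] = MaybeUninit::new(value);
        self.len += 1;
        Ok(())
    }

    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // Sound because slot `len` was initialized by an earlier push and,
        // after the decrement, is no longer counted as initialized.
        Some(unsafe { self.slots[self.len].as_ptr().read() })
    }
}

impl<T, const N: usize> Drop for FixedVec<T, N> {
    fn drop(&mut self) {
        // Drop only the initialized prefix.
        while self.pop().is_some() {}
    }
}

fn main() {
    let mut v: FixedVec<String, 2> = FixedVec::new();
    assert!(v.push("a".to_string()).is_ok());
    assert!(v.push("b".to_string()).is_ok());
    assert!(v.push("c".to_string()).is_err()); // full
    assert_eq!(v.pop(), Some("b".to_string()));
}
```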
Backlinks turn up rarely -- if perf isn't involved folks just use Weak safely, but otherwise people use petgraph or something.
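The safe `Weak` approach looks roughly like this: a tree whose children are owned through `Rc` and whose parent pointer is a `Weak`. The `Node` type and helper names are made up for illustration.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    name: String,
    // Back-link to the parent: Weak, so it does not keep the parent alive.
    parent: RefCell<Weak<Node>>,
    // Forward links own the children.
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node(name: &str) -> Rc<Node> {
    Rc::new(Node {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

fn parent_name(node: &Node) -> Option<String> {
    // upgrade() returns None once the parent has been dropped.
    node.parent.borrow().upgrade().map(|p| p.name.clone())
}

fn main() {
    let root = new_node("root");
    let leaf = new_node("leaf");
    add_child(&root, Rc::clone(&leaf));
    assert_eq!(parent_name(&leaf), Some("root".to_string()));

    // Dropping the parent leaves the back-link dangling-free: it just
    // stops upgrading.
    drop(root);
    assert_eq!(parent_name(&leaf), None);
}
```

The cost is reference counting and a runtime check on every upgrade, which is why performance-sensitive code tends to reach for something else.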
> I could give many more examples.
Go ahead then. You literally never have, and none of these examples are valid "unnecessary unsafe" for reasons I gave above. I did substantiate with an audit. I'm genuinely interested in finding places where we have too much unsafe code, because I'd like to avoid depending on those crates and/or fix them.
> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility.
Sure. And folks are always looking to improve this. But this doesn't mean that there's some widespread problem of unsafe being used too much in Rust. I'm not denying that this should be improved; I'm denying your allegation that `"unsafe" turns up way too often`.
It's the fourth most downloaded crate. Somebody may have "deprecated" it, but the users aren't paying attention. It's not listed as "deprecated" on its own Cargo page. There's a weak note about deprecation on its Github page, but users don't look there. It has 2,950,353 downloads, probably because other crates are pulling it in.
Will it be in the new set of "approved" packages? Or will all the crates that use it be fixed?
Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.
The most downloaded stat appears to be for the most downloaded of all time. Considering the difficulty of using serde on stable until recently, it's not surprising that it has been downloaded so much and would continue to be as projects move away from it.
> It's not listed as "deprecated" on its own Cargo page.
This is true, however as you said the GitHub page and the docs page both mention that. Unless one already knows how to use the crate, they are probably going to check the docs and see the deprecation. Although having a deprecation notice on crates.io does seem prudent.
> Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.
You kind of skipped over the part where the parent poster explained why it wasn't even a big deal in the first place. It really feels like you're grasping at straws to make a point and I don't see why. No one seems to be denying that a lot of unsafe code is potentially bad, but there is really no solid proof that there is an unreasonable amount of unsafe usage. In fact, the parent has mentioned and linked to a list of such usages. If you're going to make such an extraordinary claim I think it behooves you to provide more than just a handful of examples.
And what would you prefer a tool like Corrode use? Or programmers manually doing an initial "no refactoring" conversion pass from C or C++ to Rust? Should they all use their own ad-hoc libc bindings when converting to Rust? That sounds like harder-to-audit busywork with no upside. Or perhaps it should be impossible to incrementally convert a C codebase, with people instead being forced to do an up-front monolithic rewrite? My experience has been that this leads to more debugging, bugs, pain, and abandoned rewrites.
This isn't to say that improvements cannot or should not be made, but a reminder that perfect is the enemy of the good. Yes, you should really really really not use strcpy in new code if you can possibly avoid it - and you can. And you should refactor it away in existing code ASAP. But that argument probably extends to libc in Rust code in general. Being able to test-compile if your codebase builds yet when you remove libc sounds way better than trying to hunt down and remove all your ad-hoc defines.
And as a stepping stone, I'd suspect the good outweighs the bad here. Not because there will never be a security vulnerability in Rust code because of an unrefactored strcpy in Rust (there very well may be one if there hasn't already!) - but because it will encourage conversion, and thus subsequent cleanup and refactoring, that might not have otherwise happened.
> There are no comments on the safety of that. Is there some way to create bad JSON, get bad UTF-8 into a string, and cause trouble further upstream? I don't know, but somebody "optimized" there, and created a potential problem.
A quick look at blame shows this originally was applied to a buffer that was encode_utf8(...)ed. The refactoring (to remove a libunicode dependency) has made the safety less clear, yes. I could see the use for a method:
let buf = v.to_str_with(&mut buf);
> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged. It only takes one hole to create a security vulnerability.
I encourage you to submit pull requests (at least for non-deprecated projects) if you haven't already.
I feel the need to restate the obvious, although I'm sure you're aware already: There will be one hole. In fact, there will be two holes. I dare say there will be no less than three security vulnerabilities, even - for mistakes are inherent to programming. Perfect safety is not an achievable goal - only better safety. Worse - some of the people using Rust are going to have a higher tolerance for unsafe than you. But at least they're no longer using C though, right?
(I'm afraid I'm still using C and C++.)
You need unsafe code to invoke a syscall. Doing I/O is not an "optimization".
Rust tries to help you as much as possible but recognizes that it sometimes gets in the way and provides an escape hatch. The idea is to minimize and abstract the use of the escape hatch so it's easily auditable.
I wish they hadn't named the escape hatch "unsafe". I think I heard the name came from PL theory but it makes it sound scarier than it is.
I remember hearing something about wanting to improve documentation around unsafe for how to best use it without running into problems.
http://doc.rust-lang.org/doc/stable/nomicon/ is the documentation for unsafe. I do plan on improving it heavily, but I'm very busy right now, and I'd like to wait for some of the unsafe semantics stuff to be pinned down so that I can go in full depth when I write this.
> I'd like to wait for some of the unsafe semantics stuff to be pinned down so that I can go in full depth when I write this.
That's the work I was thinking of.
I believe the stdlib one could be refactored, but I'm not sure. There are a lot of optimizations in that one.
IMO "no unsafe code" is a stretch. "no unnecessary unsafe code" is what it should be.
It imports std::collections::hash_map, which has unsafe code, but only for RandomState, which does not. If RandomState were pulled out of hash_map and moved to a crate with no unsafe code, that dependency could be removed and it would be safe crates all the way down.
> IMO "no unsafe code" is a stretch. "no unnecessary unsafe code" is what it should be.
The author of the code doesn't get to determine "unnecessary". That should require extensive and hard to get justification.
No, it is more specialized for certain kinds of loads.
> It imports std::collections::hash_map, which has unsafe code, but only for RandomState, which does not. If RandomState were pulled out of hash_map and moved to a crate with no unsafe code, that dependency could be removed and it would be safe crates all the way down.
That's a very vacuous distinction.
This is all in the same crate (std) anyway, it's just a different module.
> The author of the code doesn't get to determine "unnecessary".
The auditors (proposed in this blog post) do.
I've audited unsafe in the past to ensure our dependencies are fine, there are some pretty reasonable ways to define "unnecessary" here.
That unsafe code is filling a want or need, so otherwise people are going to use crates that haven't gone through the process, or even worse, roll it themselves.
> Everything else with unsafe code is an optimization. Often a premature one.
Rc, Arc and Box require unsafe code for the obvious reasons that are not premature optimisations. Any attempt to use a syscall (look at the nix crate which wraps libc with safe APIs) requires unsafe code.
Unsafe doesn't mean "this code is bad". It is true that unsafe code should be treated very carefully, but its purpose is so that a sufficiently clever human can implement safe code that the insufficiently clever compiler cannot verify.
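The textbook case is splitting one mutable slice into two non-overlapping halves: the borrow checker cannot see that the halves are disjoint, so the body needs unsafe while the signature stays safe. A sketch along the lines of std's `split_at_mut` (`split_halves` is a made-up name, not the std implementation):

```rust
use std::slice;

// Safe signature, unsafe body: the human vouches that the two halves
// never overlap, which the compiler cannot check on its own.
fn split_halves(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();
    // This check is what makes the unsafe block below sound.
    assert!(mid <= len);
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    let (left, right) = split_halves(&mut data, 2);
    left[0] = 10;
    right[0] = 30;
    assert_eq!(data, [10, 2, 30, 4, 5]);
}
```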
Really it should've been called trustme rather than unsafe. ;)
It is very unfortunately named.
The most recent CERT advisory reporting a buffer overflow exploit was April 17th, 2017. About one per week is reported, year after year.
Others not reported are probably being exploited. Rust can stop that, but one "unsafe" declaration can break Rust's safety.
Hash maps need to be 100% safe code. They're complicated, and involve elaborate calculations that output subscripts.
It is not easy to do the same with a C library where it's basically prone to overflow issues everywhere.
These are vastly different issues. Yes, we should totally be strict on unsafe code. No, it is not the end of the world when the stdlib hashmap uses unsafe code. Unsafe is designed exactly for this purpose, dealing with the innards of safe abstractions. It does it well, and the hashmap code is pretty ok here.
(Sure, if it has a design which could be done in safe Rust, it should, but I suspect with the way the robin hood stuff works it might not. this has been on my list of things to do when I get more time, primarily to just learn about the hashmap impl, but also to improve it if possible)
To the extent I've described it, that might be possible to accomplish in Rust without unsafe code, using a safe abstraction around NonZero (an unsafe type which the compiler assumes will never be zero, allowing it to automatically use a zero value as a marker if the type is found in an enum, such as Option). However, HashMap goes a step further: it actually stores all the hashes contiguously in one array, and the (K, V) pairs in another. The nth index in the hashes array and in the (K, V) array together represent a single bucket, but storing them separately makes HashMap's typical access patterns somewhat nicer on the CPU's cache. However, this means that slots in the (K, V) array can contain either valid data (of arbitrary types, which might contain pointers etc.) or garbage, depending on a value stored somewhere completely different. There's no good 'automatic' way to make that safe.
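To illustrate the layout (not the real implementation), here is a safe approximation with one array of hashes and a parallel array of pairs. Everything here is an assumption for the sake of the sketch: `TwoArrayMap` is a made-up type, it uses plain linear probing rather than whatever std does, it has no removal or resizing, and it pays for safety by wrapping each pair in an `Option` instead of leaving unused slots as garbage.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::NonZeroU64;

struct TwoArrayMap<K, V> {
    // Option<NonZeroU64> is the same size as u64: zero encodes "empty".
    hashes: Vec<Option<NonZeroU64>>,
    // Safe stand-in for the "valid data or garbage" slots: each slot
    // tracks its own validity, at the cost of extra space.
    pairs: Vec<Option<(K, V)>>,
}

impl<K: Hash + Eq, V> TwoArrayMap<K, V> {
    fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0);
        TwoArrayMap {
            hashes: (0..cap).map(|_| None).collect(),
            pairs: (0..cap).map(|_| None).collect(),
        }
    }

    fn hash_of(key: &K) -> NonZeroU64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        // Force the top bit so a real hash is never zero.
        NonZeroU64::new(h.finish() | (1u64 << 63)).unwrap()
    }

    // Linear probing; returns false if the table is full.
    fn insert(&mut self, key: K, value: V) -> bool {
        let hash = Self::hash_of(&key);
        let cap = self.hashes.len();
        let start = (hash.get() % cap as u64) as usize;
        for i in 0..cap {
            let idx = (start + i) % cap;
            let slot = self.hashes[idx];
            match slot {
                None => {
                    self.hashes[idx] = Some(hash);
                    self.pairs[idx] = Some((key, value));
                    return true;
                }
                Some(h) if h == hash => {
                    // Only touch the pairs array when the hashes match.
                    let same_key = match &self.pairs[idx] {
                        Some((k, _)) => *k == key,
                        None => false,
                    };
                    if same_key {
                        self.pairs[idx] = Some((key, value));
                        return true;
                    }
                }
                Some(_) => {}
            }
        }
        false
    }

    fn get(&self, key: &K) -> Option<&V> {
        let hash = Self::hash_of(key);
        let cap = self.hashes.len();
        let start = (hash.get() % cap as u64) as usize;
        for i in 0..cap {
            let idx = (start + i) % cap;
            match self.hashes[idx] {
                // An empty hash slot ends the probe sequence.
                None => return None,
                Some(h) if h == hash => {
                    if let Some((k, v)) = &self.pairs[idx] {
                        if k == key {
                            return Some(v);
                        }
                    }
                }
                Some(_) => {}
            }
        }
        None
    }
}

fn main() {
    let mut m = TwoArrayMap::with_capacity(8);
    m.insert("a", 1);
    m.insert("b", 2);
    m.insert("a", 3); // overwrites the earlier value for "a"
    assert_eq!(m.get(&"a"), Some(&3));
    assert_eq!(m.get(&"c"), None);
    // The zero value doubles as the "empty" marker, so no extra tag is stored.
    assert_eq!(std::mem::size_of::<Option<NonZeroU64>>(), 8);
}
```

Probing walks only the compact hashes array until a hash matches, which is the cache-friendliness argument; the per-slot `Option` in `pairs` is exactly the overhead the unsafe version avoids.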
I think the use of unsafe code here is fine - it's not that hard to audit, and if you don't audit you're in trouble anyway (hash table DoS is also a security risk) - but in the long term, it would be very interesting if Rust could integrate an optional theorem prover, with the ability to prove arbitrarily complex code safe, given sufficient annotations...
You have to audit more than just the lines within the unsafe block. For example, I discovered a buffer overflow in a Rust library this week that was caused by an integer overflow outside the unsafe block. The unsafe code itself was written correctly.
Scope your unsafe however you want, but check the invariants when auditing, and make sure they don't escape the module.
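A small sketch of what that looks like in practice. The `Window` type is made up for illustration and is not the library mentioned above: the arithmetic that needs auditing lives in safe code, outside the unsafe block, and the private field keeps the invariant inside the module.

```rust
pub struct Window {
    // Private, so code outside this module cannot break the invariants.
    data: Vec<u8>,
}

impl Window {
    pub fn new(data: Vec<u8>) -> Self {
        Window { data }
    }

    // Returns `len` bytes starting at `start`, or None if out of range.
    pub fn get(&self, start: usize, len: usize) -> Option<&[u8]> {
        // Safe code, but it guards the unsafe block: a plain `start + len`
        // could wrap around in release builds and slip past the bounds check.
        let end = start.checked_add(len)?;
        if end > self.data.len() {
            return None;
        }
        // Sound only because of the two checks above.
        Some(unsafe { self.data.get_unchecked(start..end) })
    }
}

fn main() {
    let w = Window::new(vec![1, 2, 3, 4]);
    assert_eq!(w.get(1, 2), Some(&[2u8, 3][..]));
    assert_eq!(w.get(3, 5), None);
    // Would overflow with unchecked addition; checked_add rejects it.
    assert_eq!(w.get(2, usize::MAX), None);
}
```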
This is still not that hard.
Can you share the ecosystem wide analysis that you did that led to this conclusion?
Coroutines are kinda more like a language feature that people have hacked libraries to do codegen for instead, so IMO it's pretty understandable that it needs nightly.
Many of these libraries are prototyping designs that will eventually be proposed as part of the language.
andy@xps ~> python3
>>> "Libz" == "libz"
>>> title_case("Libz") == title_case("libz")
We detached this subthread from https://news.ycombinator.com/item?id=14275990 and marked it off-topic.