Rust: Not So Great For Codec Implementing (multimedia.cx)
215 points by lossolo on Aug 1, 2017 | 278 comments



Personally, I've been happy with Rust for writing an MPEG-2 subtitle decoder. I posted about this earlier here: https://news.ycombinator.com/item?id=14753201

For me, the biggest advantages over C/C++ were:

1. Rust's compile-time checks, run-time checks and fuzzing tools make it much easier to write secure code. Since decoders are a notorious source of bad security bugs, this is a big plus in my book (see the fuzz-target sketch after this list).

2. Rust's dependency management makes it rather pleasant to rely on 3rd-party libraries. The combination of an immutable package repository and semver makes it easy to trust that things won't break. And in my experience, the number of 3rd-party libraries is relatively small compared to more popular languages, but the quality tends to be fairly high (especially relative to npm).

3. The tooling is surprisingly nice for a young language. cargo has solid defaults, Visual Studio Code provides auto-completion and tool tips, and there are good libraries for basic logging, argument parsing, etc.

4. Rust makes it relatively easy to write fast code, as long as you use references and slices when appropriate.
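
To make point 1 concrete: the fuzzing side is mostly cargo-fuzz/libFuzzer glue. A minimal sketch of a fuzz target, where `my_decoder::decode` is a stand-in for whatever your crate's real entry point is:

    // fuzz/fuzz_targets/decode.rs -- run with `cargo fuzz run decode`
    #![no_main]
    use libfuzzer_sys::fuzz_target;

    fuzz_target!(|data: &[u8]| {
        // `my_decoder::decode` is hypothetical; the point is that malformed
        // input must be rejected without panicking, and the sanitizer
        // catches any bad memory accesses along the way.
        let _ = my_decoder::decode(data);
    });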

Downsides include the learning curve (about a week or two for a C++ programmer to start feeling semi-comfortable), and slower compile times for large modules or ones using lots of parameterized types. My coworkers will write Rust, but they tend to say things like, "Rust is intense."


> And that’s why C is still the best language for systems programming—it still lets you to do what you mean (the problem is that most programmers don’t really know what they mean)

Honestly, after ~1 year of embedded C co-op/internship experience, I'm familiar enough, but not too entrenched, to say that C is not that great for embedded/systems. When you're dealing at the hardware architecture level you need more detail than C (or any PL) can reasonably provide.

C doesn't "let you do what you mean", it has no knowledge of special registers, interrupts, timers, DMA, etc. Companies have coped with a slew of macros that are just ugly to write with and make testing much more difficult than need be. If the language had actual support for embedded you'd see support for architecture strictly as libraries (which may be possible in C but certainly not ergonomic or supported by the culture around the language). Library architectures would make writing simulators and embedded unit tests __MUCH__ simpler.

Not to mention the minefield that is the undefined sections of the C spec. A lot of people "mean" for an integer to roll over, but that's undefined and the C spec doesn't care about what you "mean". [1]

Just today I meant for a constant lookup table not to be overwritten by the stack (I had plenty of unused RAM), but unfortunately C doesn't care what I meant. I had to dig around and find an odd macro to jam that data into program space--effectively hiring some muscle (gcc) to break the spec so it behaved the way I wanted. [2] I run into this sort of thing all the time. I know the C spec. I know my hardware. C doesn't care and needs to be beaten up. The only real benefit of C in these situations is that it's a huge sissy and people are really good at beating the piss out of it now.

I'm not very familiar with Rust, but if it makes an honest effort to be ergonomic for embedded (it might need to be forked), it will eventually crush C.

[1]: https://blog.regehr.org/archives/213 [2]: https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Named-Address-S...


> And that’s why C is still the best language for systems programming

I think the only reason C is very popular for systems programming is that it allows you to do most of what you could do in assembler, but in an easier-to-work-with way. So basically C is an "acceptable, easier-to-use assembler", but far from ideal, because of the reasons you point out.

Honestly, the popularity of C itself is due to the success of UNIX, not due to any particular quality. On the other hand, the K&R "The C Programming Language" book is a classic, a well written book which surely does a great job of introducing the language.

But back in the late 1960s a new language called ALGOL-68 was specified, a very advanced language for its time with powerful features even by 2017 standards. The problem is that nobody dared to implement the full language. CPL was a stripped-down implementation of Algol-68; this was then stripped down even more to BCPL; then it was stripped down to the extreme in B (only one data type); then Dennis Ritchie kept most of the syntax, added other data types, and C was born. C was never designed from the ground up to be the best systems programming language possible; only to be able to work for Dennis Ritchie & friends' "toy" operating system, UNIX, a very stripped-down MULTICS.

I think the "UNIX-Haters Handbook" has a section devoted to C criticism.


> Honestly, the popularity of C itself is due to the success of UNIX,

I beg to differ. C was a fairly obscure language until MS-DOS came out. C turned out to be ideal for programming on DOS, and DOS was far and away the most programmed-for system in the world for a decade and a half.

MS-DOS also made C++ into a major language (via Zortech C++). When ZTC++ came out the penetration and popularity of C++ went through the roof.

(Yes, I'm tooting my own horn a bit here. But I honestly believe it is ZTC++ that got C++ its critical mass.)

MS-DOS made C and C++ into the juggernauts they became.


> I beg to differ. C was a fairly obscure language until MS-DOS came out.

In the early days of DOS, what was used the most was IBM-BASIC (and GW-BASIC) for users, assembler for games like Alley Cat, and IBM LOGO for teaching; some games were even made in FIG-FORTH.

Then Borland brought out Turbo Pascal, which was hugely popular together with MS QuickBasic.

Meanwhile, and we're at 1986-87, a good amount of software was already being made for the Unices, in C. I don't think anyone was doing C for DOS back then. My grandpa, a PC nerd, had a huge (200+) collection of diskettes from around that era, which I now own; there are all sorts of compilers for the above languages, plus Fortran, Clipper, etc., but no C compiler, even though he owned the K&R book.

The first time I saw C used in the DOS world was with Borland Turbo C++.


There were probably 30 C compilers for the IBM PC. Turbo C didn't appear until 1987; it was quite a latecomer. C was immensely popular on the PC before then.

Borland produced Turbo C not to introduce C to the PC, but because C was so dominant on that platform. After all, why else would they have done that after being so successful with Turbo Pascal?


The Aug 1986 issue of Dr. Dobb's lists:

* Aztec C

* C86

* Datalight C (mine!)

* DeSmet C

* Eco-C88

* Hot C

* IBM C

* Lattice C

* C Prog Sys

* Let's C

* High C

* Microsoft C

* Mix X

* Toolworks C

* Whitesmiths C

* Wizard C

PC Tech Journal reviewed 8 C compilers for DOS in their Nov-Dec 1983 issue, though I don't have a copy of it.


The majority of them were unknown or impossible to buy in the 1983 reality of Portugal and Spain, as I mention in another comment.


Sorry but on my little corner of the planet, in the Iberian Peninsula, everyone was using Turbo Basic and Turbo Pascal, when not coding in Assembly for MS-DOS.

On the PC, C and C++ only started to be widely adopted as we started to code for Windows 3.x, due to the APIs being available as C instead of interrupts.

At the technical school I was attending, I was the outlier by having had the opportunity to get hold of Borland's Turbo C++ 1.0 for MS-DOS, everyone else couldn't care less.

Most Portuguese and Spanish magazines of the 80's and early 90's were full of all imaginable programming languages for C64, ZX Spectrum, Amiga, Atari and MS-DOS. C didn't have a better spot in those articles than the other languages, quite the contrary.

Fellow Iberian HNers are free to correct my experience of those years.


In my part of the world, we used the Borland compilers for MS-DOS, Turbo C and Turbo C++. Turbo Assembler and Turbo Pascal were also popular. I'd never heard of Zortech C++ before.


Borland decided to develop TC++ because of the success of ZTC++. (I know some of the people involved.) Before ZTC++, C++ was a niche language, and Borland was having great success with Turbo Pascal. ZTC++ came out in 1987, and TC++ in 1990.

After the success of ZTC++ and TC++, Microsoft changed direction and decided to develop a C++ compiler, too. I heard (but was never able to confirm) that Microsoft had earlier been developing their own object oriented extensions to C called C*.


Coming from Turbo Pascal, TC++ was a bit underwhelming. When I tried it, it didn't colour syntax, there were those weird #includes, and compiling felt so slow. So my first contact with C was pretty negative :\


ZTC was a fantastic compiler, that is until it was purchased by Symantec, renamed, and became slow and bloated.

I could even say something similar about turbo c vs turbo c++, but that was more due to the language issues with c++ vs c.


Turbo Pascal was much better for programming on DOS IMHO.


TP was indeed a fine project. But it wasn't OOP (which was very hot at the time) and TP was very customized to the PC, meaning it had no penetration outside of DOS. TP died when DOS died.


TP 5.5 (1989) had OOP. https://en.wikipedia.org/wiki/Turbo_Pascal#Object-oriented_p...

I only started using TP with version 6, so all I know is OOP TP...


TP's OOP was modelled after Apple's Object Pascal. It was an awkward ugly implementation that had e.g. slicing problems and weird initialization syntax, and was fairly quickly deprecated when Delphi came around.


I loved Object Pascal, and it was my introduction to OOP, literally.

As I had to give a class on OOP to fellow students at the technical school in exchange for having access to Turbo Pascal 5.5.

Never was a big fan of some of the Object Pascal changes made by Delphi's class model, especially the fact that it kind of lost the opportunity to introduce RC alongside those changes.

Which eventually led to the schizophrenic model where RC would only exist for COM-based classes, but not others.


RC only for strings originally, later dynamic arrays, variants and interfaces; I no longer recall if the legacy OLE automation classes did automatic RC, but I don't think they did (beyond explicit RC in constructor / destructor implementations).

I maintained the Delphi compiler front end for 6 years or so, adding closures, enhanced RTTI, a bunch of work on generics and 64-bit porting, and some other things that ultimately didn't see the light of day. I have a long list of things I don't like about the Delphi language, corner cases you only really become fully aware of when living with the workarounds they force on the codebase.

Overload resolution is an almost wholly unspecified mess, for example. It started out Java-style, but the definition of more specific isn't locked down, and everything from strong typedefs (type TFoo = type TBar;) to closures (MyFunc(methodRef) - do you mean to pass method ref or result of calling method ref), to ranges (are smaller ranges more specific? what about overlapping ranges?)... and don't get me started on all the different string types.


I followed some of your blog entries, hence my earlier comment. :)

The first time I went through the Turbo Pascal for Windows 1.5 manual I wasn't that happy to see PChar, let alone the other variants that came later.

Although it seems that in today's world, most languages end up with a jungle of string types, for every possible variation of Unicode, ANSI and C ABIs.

I eventually did the full transition from Object Pascal to C++, so I only used the very first versions of Delphi.

So as far as Windows development was concerned, I ended up moving from Borland to Microsoft compilers when Visual C++ 6.0 was released, which means I lack experience with how later Delphi versions evolved.


I was VERY surprised to read about the linguistic traits of [B]CPL .. C was a pragmatic regression in that regard.


For big computers, C is good for systems programming because everyone else uses it and because of the titanic effort that has been put into building really good optimizing C compilers. UNIX came along, but once C killed Pascal, IBM, Microsoft, everybody was on C's dick.

On small embedded computers, I think C is popular because it's so easy to build something that resembles a C compiler. The fact is, undergrad compiler classes almost build C compilers (some universities might actually go all the way.) You can almost build a yacc grammar that will read C and emit assembly. The small computer embedded vendors have a difficult problem, they need to provide tooling and they don't usually have giant piles of developers to sell to. So they dust off some abandoned "C compiler" they found in the gutter, add their hardware to it and call it good. At least that's my theory and I've seen some rough "compiler work" from some pretty substantial hardware vendors. In fact, if you're running a 16bit or smaller part or a weird part, I have yet to see a decent C compiler from a hardware vendor.


> You can almost build a yacc grammar that will read C and emit assembly.

I have looked at a lot of C parsers and compilers and have written a C to Lisp transpiler and this statement is ludicrously wrong. You cannot even parse C code with yacc because of typedefs. The grammar of C is context-sensitive. And this is after the C code has been preprocessed, something that also cannot be done with yacc.

C compilers are easy to port to different machines, but that is because C is a very poor language in terms of features and control flow constructs. For the limited amount of things that C gives you as a language it also comes with a huge amount of complicated baggage when it comes to implementation and corner cases.


True about the preprocessor. But the typedef issue is usually (trivially) solved with a link between the symbol table and the scanner.


This is a very important point. The hardware world itself suffers from this effect even more than software with synthesis and PNR for verilog/VHDL.

I'm hoping the compilation world progresses enough with efforts such as LLVM to break our chains to crufty languages.


"do what I want" usually means "simple ABI that is trivial to interface with" which means you "can layout structures the way you want" and "calling conventions are simple so you can call C from assembly and any HLL FFIs and vice-versa".

Really, that's all. If it was possible to build C++ classes in C or assembly, it would be done. But it's not, because C++ doesn't have the kind of ABI (or any) that would allow one to do that.

Of course, in practice C only gets you 95% of the structure layout functionality you need, as the rules for packing structures and bitfields(!) aren't sufficiently nailed down. Still, 95% out of the box is pretty good, and with some care you can get 100%. This matters when writing, e.g., drivers, but also codecs for another example.

Any systems programming language aiming to replace C has to provide support for a C-like ABI.
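
As one illustration of what that support looks like in a would-be replacement: Rust, for instance, opts into C layout and the C calling convention explicitly. A minimal sketch with a made-up struct (not anyone's real codec type):

    // C-compatible layout: fields kept in declaration order, C padding rules.
    #[repr(C)]
    pub struct Frame {
        pub width: u32,
        pub height: u32,
        pub stride: u32,
    }

    // C calling convention plus an unmangled symbol, so C, assembly,
    // or any FFI can call this directly.
    #[no_mangle]
    pub extern "C" fn frame_pixels(f: &Frame) -> u64 {
        f.width as u64 * f.height as u64
    }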

I'm not praising C here, mind you. It can't be replaced (with Rust or similar) quickly enough.


> If it was possible to build C++ classes in C or assembly

You would be surprised, but Object-oriented programming has been done in C many times... and even before C++ existed.


I'm not surprised. But you'll note I said "C++ classes", and I did that for a reason: it's not possible to write C++ classes in C because there's no stable, public ABI for one to write to. (Well, for some compilers there may be a public ABI, and some might even be stable, but even so, it's a mess.)


On particular platforms they have nailed down the C++ layout enough to make this possible. I think windows did it for COM (or DCOM?). But in general you are right.

One of the most infuriating things about fancy languages is the way they all have their own ABIs. I mean it's nice that they support special features through those ABIs, but standardized vtables and dictionaries would take us a long way.

And Kudos to Microsoft for having a second crack at getting that right with .Net.


Yes, that's right, and multiple compilers support the Windows C++ ABI. You still couldn't write portable C++ in C or assembly (or anything else not C++). Does anyone write C or assembly to that ABI? Probably not, though to be fair that's probably in large part because if you're going to do that you might as well write in C++ :^)

Still, I think a public and stable ABI with fine-grained control of binary elements is a critical aspect of a systems programming language.

Also, the linker contributes a lot to making a systems language. E.g., being able to map globals to specific physical addresses. And, of course, either an ABI has to be dead trivial from a linker's perspective, or the linker has to be very intimate with it (think of symbol mangling). The linkers we use all evolved in a C world, and it shows. C w/ a really good link-editor and RTLD (I'm thinking of Illumos') is a much more advanced beast than C with plain old static linking. Even the ABI aspects of dynamic linking are simple, public, and stable.

A great systems programming language has to have a great linker story as well.


I would like a declarative language for laying out the heap, stack, objects, etc. Why have one ABI and not an ABI configuration protocol? Would love to share data structures between languages w/o having to write serializers.


You can't just wake up and decide to be Don Box like that!


I am going to out Box, Don Box!


As a counterpoint, PowerPC has at least 3 mutually incompatible ABIs for C.


But generally there's only one per-OS, so it's not that bad. C++ ABIs are per-compiler, or worse, per-{compiler, version}, with few exceptions (Windows).


> And Kudos to Microsoft for having a second crack at getting that right with .Net.

IBM did it first with TIMI for AS/400.

Also Microsoft is now on their third crack at getting it right with UWP (based on COM).


Objective-C used to be a precompiler to C. Hence the weird syntax, they needed some character patterns that weren't used yet in C to be able to preprocess the Objective-C parts.


> Really, that's all. If it was possible to build C++ classes in C or assembly, it would be done.

IBM's mainframe OS/400, nowadays known as IBM i supports OO Assembly.

MASM also used to have OO Macros.


Just a tiny nitpick: only signed integer overflow is undefined; unsigned integer overflow follows modulo arithmetic. It's pretty much impossible to remember all those details.


These things originate in hardware variations. Apparently all architectures use binary the same way so unsigned overflow just drops the MSB. But signed integers aren't always two's complement, so there is variation.

See, no tedious rules to remember, you just have to understand how computers work. But the C standards call such things "undefined behaviour" rather than "platform specific" behaviour, and then try to pretend that the compiler can abstract the underlying machine away. That is they pretend programmers can understand the standard, and don't need to know how computers work.

The result is a maze of arbitrary - but historically rational - rules about when the compiler has to do something sane, and when it is allowed to do whatever the hell it likes to squeeze some micro-improvement from a benchmark.


> The result is a maze of arbitrary - but historically rational - rules about when the compiler has to do something sane, and when it is allowed to do whatever the hell it likes to squeeze some micro-improvement from a benchmark.

Which is a nice way of saying that sometimes they'll decide to elide chunks of code because they would only be reachable because of undefined behavior, and if that elided code happened to specifically be checking for and handling that undefined case as an error, too bad.[1] :/

1: https://news.ycombinator.com/item?id=14163111


The C standard actually has many alternatives to undefined behavior: implementation-defined behavior like the propagation of the high-order bit when a signed integer is shifted right, unspecified behavior like the order in which the arguments to a function are evaluated. Related to those are the implementation-defined, unspecified and indeterminate (unspecified or trap) values.

Now unspecified values are tricky again. They can propagate their unspecifiedness and can be different each time you look at them. x == x can be both true and false and is actually unspecified again. If your compiler is using this, things can get insane. I think here the definitions of the C standard would need to be changed a bit.

Also I don't see why signed integer overflow cannot be implementation-defined behavior.



Unless you are paid by the line, that should either work as "expected" (i.e. wrap) or produce an error about a meaningless comparison.

Is it worth a few developer days worth of work to track down a hard to repro bug that only happens with hard to debug optimizations enabled?


No, which is one reason why I almost always use unsigned types in my C code, particularly in the context of data structure management where negative values are unnecessary and usually non-sensical.

GCC supports -fwrapv and -fno-strict-overflow; and I think clang supports both, too. I've never cared to use them because I only rarely use signed types. But some projects and programmers use those options habitually.

AFAIU, Rust panics by default on signed overflow. And even if it wraps, that's not unequivocally better. Unlike with enforced buffer boundary constraints, neither is clearly better than what C does. Arithmetic overflow is a common and serious issue in just about every language. Short of a compile-time constraint or diagnostic that triggers if the compiler cannot prove overflow is either explicitly checked or benign (that is, a negative number is no worse than a large positive number in the context of how the value is used), there's no obvious solution that really forecloses most exploit opportunities across the board.

Because so much code, regardless of language, has some unchecked signed integer overflow bug, if you panic you make it easy to DoS an application. And a DoS can sometimes turn into an exploit when you're dealing with cooperating processes. For example, you occasionally see bugs where an authentication routine fails open instead of failing closed when the authenticator is unreachable.

If you silently wrap signed overflow, all of a sudden the value is in a set (negative numbers) that might be completely unexpected. Even in so-called memory safe languages negative indices can leak sensitive information or erroneously select privileged state. For example, in some languages -1 selects the last element of an array. You can check for negative values explicitly, but multiplicative overflow can wrap around to positive numbers, which is no better than using an unsigned type; a check for a negative values is typically redundant work which adds unnecessary complexity--and unnecessary opportunity for mistakes--relative to sticking to unsigned types.

IMO, signed overflow is the worst option. I just don't see the point. The only three options I like for avoiding arithmetic overflow bugs, depending on language and context, are

1) Check for overflow explicitly (independently from array boundary constraints) and bubble up an error;

2) Carefully rely on unsigned modulo arithmetic;

3) Carefully rely on saturation arithmetic.

IMO the C standard's fault isn't in its refusal to make signed overflow defined or implementation-defined, but in providing neither a standard API for overflow detection, a construct for saturation semantics of integer types, nor a compilation mode to warn about unchecked signed overflow (e.g. something at least as useful as -Wstrict-overflow in GCC).

Fortunately both GCC and clang have agreed on a standard API for overflow detection. That's something. But unfortunately it'll be years before you can consistently rely on those APIs without worrying about backward compatibility.


> AFAIU, Rust panics by default on signed overflow

Overflow of any integer type is considered a "program error", not undefined behavior. In debug builds, this is required to panic. In builds where it doesn't panic, it's well-defined as two's complement wrapping.

You can also request explicit wrapping, saturating, etc behavior.
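
A minimal sketch of those APIs on the primitive integer types:

    fn main() {
        let x: u8 = 250;

        assert_eq!(x.checked_add(10), None);          // detect overflow
        assert_eq!(x.wrapping_add(10), 4);            // two's complement wrap
        assert_eq!(x.saturating_add(10), 255);        // clamp at the bound
        assert_eq!(x.overflowing_add(10), (4, true)); // wrapped value + flag

        // A plain `x + 10` panics in debug builds; where overflow checks
        // are disabled it wraps, as described above.
    }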


Thinking about hardware is definitely the move when writing C.

This is the major struggle with abstraction. We want to remove the burden of knowing the ins and outs of the target architecture. Inevitably, we create trouble and fall on our faces when it turns out that the hardware is still in fact there and doesn't like when we ignore it.

It's really an impossible problem. One can't account for every architecture when designing a language. Likewise, one can't feasibly remember the details of every architecture while programming. Honestly I'd be interested to see some tools that approach the problem from a direction other than maximum portability. Not that I think they'd be popular or "good".


Easy: check the family of Algol, Xerox PARC and Wirth languages.

Where safety is more relevant than maximum portability.

Everything that isn't really portable is marked as explicit language extension or unsafe construct.

One might complain that it leads to language dialects, but the same is true for C, where certain semantics depend on the compiler and even change between versions.


At least one extant (or recently extant) system has to emulate unsigned, modulo arithmetic. This can be handled by the C compiler transparently, however. From the C compiler documentation:

  | Type          | Bits | sizeof | Range                                  |
  +---------------+------+--------+----------------------------------------+
  | ...                                                                    |
  | unsigned long | 36   | 4      | 0 to (2^36)-2 (see the following note) |

  ...
  Note: If the CONFORMANCE/TWOSARITH or CONFORMANCE/FULL
  compiler keywords are used, the range will be 0 to (2^36)-1.
  See the C Compiler Programming Reference Manual Volume 2 for
  more information.

  -- Section 4.5. Size and Range of C Variables of the Unisys
  C Compiler Programming Reference Manual Volume 1.
  https://public.support.unisys.com/2200/docs/cp16.0/pdf/78310422-012.pdf


A range of 0 to (2^36)-2 implies that there's one bit combination not mentioned here (that range has only 2 ^ 36 - 1 values; 36 bits can store 2 ^ 36). What's the last combination used for?


I don't know off-hand. AFAIU the Unisys machines use ones' complement representation. My guess is that the native unsigned set of values includes the representation for both positive and negative 0. Or there could be a trap representation that is hidden in unsigned mode, which presumably would also make these machines examples of hardware that traps on signed overflow.


Wow. 9-bit words with 1's-complement arithmetic.

I wonder if it still does end-around carry...


I wonder how many processors there are in use these days that don't use 2s complement. I don't think I've ever seen one.


Certainly not general CPUs, but there are probably domain specific processors out there that use something else. Why would you want to design a domain specific processor and still use C? Beats me.


Even if you don't really want C on such a processor, there will be an emergent and unholy alliance between (1) a pointy-haired impulse within the manufacturer to have "programmable in C" on the feature list and (2) an empire-building impulse within the C standards-writing ecosystem that wants to encompass every chip under the sun.


Those forces are so strong people are still pushing for C on FPGAs.


The bigger reason it's still undefined is to enable this type of optimization: https://news.ycombinator.com/item?id=14857316


The comment thread seems to suggest that even if you define the behavior you can still optimize that case, and in fact Clang does.


> you just have to understand how computers work.

rude.


Yup. You'd probably like the first footnote if you haven't already read it.


> C doesn't "let you do what you mean", it has no knowledge of special registers, interrupts, timers, DMA, etc.

Special registers are just slots at specific memory addresses.

As far as the rest is concerned, not a single language in the world will provide you with primitives for that and make them portable across a thousand architectures with a million different peripherals. That's the role of libraries and OSes with their drivers.


> Special registers are just slots at specific memory addresses.

> As far as the rest is concerned, not a single language in the world will provide you with primitives for that and make them portable across a thousand architectures with a million different peripherals.

I'm well aware; that's why I said:

> When you're dealing with the hardware architecture level you need more detail than C (or any PL) can reasonably provide.

A language does not need to provide every hardware primitive, just realize that it can't and implement a sane way to support hardware features that don't require abusing a cumbersome macro system and heavy compiler modification.

> That's the role of libraries and OSes with their drivers.

A ridiculous amount of vulnerabilities and wasted effort are sunk into operating systems and their drivers because of the garbage state of tooling for low level work. It's easy to make statements such as yours until it comes time to write/maintain a multi million line kernel and a only the lord knows how many line driver/subsystem fleet.

Kernel/driver devs do their best to make do, but the situation is far from ideal.


> A language does not need to provide every hardware primitive, just realize that it can't and implement a sane way to support hardware features that don't require abusing a cumbersome macro system and heavy compiler modification.

The information has to somehow be passed to the compiler. Your options are: compiler-specific features (see gcc attributes, etc.), language-level support (see the fun that ensues with explicit_memset vs. memset), or a really fancy IR that can hold these types of requirements.


You're right, most of these problems stem from companies extending the Garbage C Compiler that encourages abuse of C's awful macro system to cover up its poor decisions.


> hardware features that don't require abusing a cumbersome macro system and heavy compiler modification.

I'm confused in that, it's true, I've seen the macro system abused. But for working with things like memory-mapped registers and interrupts you don't _need_ to abuse it. It actually looks really nice with memory-mapped structures that overlay the register layout. Maybe I'm just missing something.


> but if it makes an honest effort to be ergonomic for embedded

There are constant improvements but I wouldn't say we're there yet. I think we will be though, embedded is something a lot of folks care about.


Good to hear. I'm sure there is a large group of developers sitting on the sidelines that check in from time to time hoping to be able to leave C behind.

Has anyone on the Rust team tried reaching out to get input/contributions from companies who are heavily invested in the embedded world? I know Xilinx pours a lot of effort into language research, though they do seem like a very proprietary company.


We reach out and talk with companies regularly. We are both happy to organize a meeting with companies who approach us, and often (~yearly) organize calls with multiple companies using or evaluating Rust at once whom we reach out to. I believe there have been some of these focused on embedded stuff, but I wasn't on these calls.


The latest on this front: http://blog.japaric.io/rtfm-v2/


Found this paper useful: "The Case for Writing a Kernel in Rust" (https://www.tockos.org/assets/papers/rust-kernel-apsys2017.p...) linked from the "tock os" blog post: https://www.tockos.org/blog/2017/apsys-paper/


Yes, this is a good paper, but was posted after my comment :)


It sounds like you're looking for Forth. It doesn't have everything on your wish list, but it's a good bit more predictable when you have to get down into the weeds.


Thumbs up for driving more nails into C's coffin. What any pretender to the throne needs most is C interoperability, because we're stuck with legacy code. You need to be able to just code in C when necessary, including the datatype zoo, the unsafety and the ugly macros. If it takes even 5 minutes to wrap a C function for your new language, that's years of work for some codebases.


Whilst unrelated to the article, my complaint about codecs in Rust is that they seem to be slow. Whilst the reason for this might be the immaturity of the libraries that I've used, they've always been slower than their C counterparts. The native JPEG decoder spins up 4 threads to decode the same number of frames in 3x the time that libjpeg-turbo takes. There's a similar story for the FLAC decoder.

I don't think that any of this has anything to do with the language itself, it's just that it takes time for things to mature. As for the issues outlined in the article - the language is different enough from C and C++ that one just has to accept the fact that you cannot write idiomatic C and C++ in Rust and expect it to be comfortable, performant or safe. However, with Rust, I'd say, one can achieve 99% of what one can achieve in C today. The only thing that Rust is currently missing is the ability to arbitrarily jump around the call stack, due to the way destructors are implemented, but there's a way to mitigate this and it's being worked on as far as I'm aware.


Decoding in libjpeg-turbo is mostly implemented in ASM. Rust, like C, is slower than hand-optimized SIMD ASM.


If anyone's interested, this is the (epic!) discussion thread for "Getting explicit SIMD on stable Rust":

https://internals.rust-lang.org/t/getting-explicit-simd-on-s...

It still wouldn't compete with really good hand-tuned asm, but it might help reduce the perf/usability tradeoff a bit.


I'd just like to see 2-, 3-, and 4-element vectors as first-class citizens in C, C++, or Rust. These are incredibly common for so many things that it's hard for me to understand this omission. I want to be able to pass them by value and as return values from functions. I want to do operations like a=b+c with vectors without creating classes or overloading operators. For a lot of people the SIMD instructions are about parallel computation, but for me they represent vector primitives.


What would baking them into the language add that a library solution couldn't? I'm not keeping track of Rust very closely these days, but it supports arithmetic operator overloading and the Copy trait. What's missing?

I agree that this needs to be standardized for vector-ey crates to be able to talk to each other seamlessly. Otherwise we'll end up with a rerun of the C++ strings fiasco, with char* and wchar_t* and std::string and std::wstring and BSTR and _bstr_t and CString and CComBSTR and QString and GString and...


>> What would baking them into the language add that a library solution couldn't?

Standardization. Optimal performance. Better Syntax.


OOP often involves overhead. If someone defined a really clean vector type/object in a library that let me write expressions the natural way, that could be added to the language I suppose. And that's what I want.


Rust, C and C++ all possess ways (it's even the default/only way in C and Rust, and almost so in C++) to write types like this that don't involve the typical pointer-soup/dynamic-call overhead of typical OOP.

People can and do write vector in libraries right now, usually using existing compiler support that gives them guarantees about SIMD.


What I find odd about your reply is that you downgrade C++, yet it's really the only language that has sufficient operator overloading to do what the poster was requesting (transparent support without OO overhead).

AKA, it's completely possible, and there are quite a number of C++ libraries that make vectors look like native types, complete with long lists of global operator overloads for interaction with other base types. Generally these libraries are just thin wrappers around inline assembly or Intel intrinsics for SSE/AVX and generally don't bring any form of OO syntax into the picture.

That said, even with C++ the libraries tend to fall down a bit when it comes to individual element operation/masking because the closest method is generally using array syntax for the elements which limits the operations to a single element in the vector at a time. Which means you end up with OO method calling syntax for element masking (or creative solutions with operator() )


I didn't denigrate C++, I'm specifically talking about the behaviour of the data types/mention of OOP which is orthogonal to the surface syntax like using symbolic operators instead of textual names. It is factual that the default data declaration in C++ has a little of the "pointer soup" due to methods and `this`. This can often get inlined away and so is usually irrelevant (hence "almost so"), but the poster did emphasise their desire for pass-by-value. This can be avoided by e.g. using friends more than methods, but this isn't the default way a lot of people write C++.

(Also, Rust has pretty transparent support in the same manner also without OO overhead, but differs slightly because methods can and often do use pass-by-value. This is the distinction I was drawing.)


You could also wrap a reference to a plain old struct and operate on it using a class. That's often how I write C++ when I need to be explicit about how the data is structured. Not everything in C++ requires an OO approach.

OP was specifically recommending operator overloading on a plain old struct, which can be done without declaring a class. Indeed, operator overloads can be declared as global functions in C++. The this pointer doesn't enter in to it at all in that case.


I think you're also missing my point. Techniques like what you describe are exactly what I was referring when I said "C++ possess[es] ways" and "using friends rather than methods", although you don't even need a wrapper for a reference (which is actually a pointer, and so also part of what I mean by "pointer soup"!) for this sort of code: just the struct works fine (although a `class` does too, the only difference is the `struct` keyword has different privacy defaults).

The only "downgrade" I made was saying that it is only "almost" the default in C++, versus the other two where they are completely the default.

To be clear, like C and Rust, C++ has great semantic attributes for this sort of thing:

- classes/structs that don't require allocation/pointers

- precise control over pass-by-value (for everything except the `this` pointer of methods)

- pervasive static dispatch of methods/functions (including operators, which can be considered to be method/function with an unusual name and call syntax)

The only downside, and the reason I said "almost" (which is what the C++ETF (C++ Evangelism Task Force) seems to be up in arms about), is methods are what most people reach for by default and so the `this` pointer comes into play. But as you point out, and as I implied in my original comment, this isn't required, just the default.


Look into the Clang vector extensions. They're very similar to GLSL/OpenCL vectors. They implement some very basic operator overloading too, so they're much less annoying than calling a function for everything. I made a simple linear algebra library using them, if you want to see an example: https://github.com/GavinHigham/glla


Have you tried Halide? http://halide-lang.org/

I've seen it used for various video/photo processing operations.


Off-topic: the "Halide Talk" video in that page was very good: https://www.youtube.com/watch?v=3uiEyEKji0M


Do you mean something like this[0] (probably the worst possible implementation but I was in a rush)? I suppose you still have to create a "class" but you could set it up to use syntax like `V4(1, 2, 3, 4)` if you prefer.

[0]: https://play.rust-lang.org/?gist=989fe1c05e3e68df89642032743...



What if your target architecture has a different vector width, and hence you're using 4-element vectors but wasting 12 lanes?

If you want to use vectors explicitly, you can do so using intrinsics for most vector architectures.


The sizes I mentioned are extremely common in 2d and 3d geometry. This is due to the number of dimensions visible in our world. While someone may want to run 11 dimensional calculations in string theory, there are a large number of common real-world applications of the lengths 2,3,4. In C and C++ you can often use intrinsics but the big 3 - ARM, Intel, PPC - all define them differently. I want this common stuff to be part of the language. Sure go ahead and support general vectors via class definitions and such, but give me direct support for the common sizes.


Your issue with intrinsics is that the different ISAs have different specs. Fine. But if that was your only issue, then your use case is that you're manually vectorising hot loops, correct?

Assuming that's true, you want to maximise performance by using as much of the parallelism that vectors give you. So if you're dealing with [4 x int32], on a 128 bit vector ISA you would be fully utilising your vector registers, but moving to say AVX-512 you're now only using 1/4 of your potential parallelism.

Your architecture independent vector types would have to target the lowest common denominator, and completely defeat the purpose of vectorisation.


I want to do math with 2,3, or 4-element vectors. These will typically represent 2d or 3d coordinates or velocities. 4-element vectors may be used for homogeneous coordinates or similar. My point is that these are very common mathematical entities and should be explicitly supported by the language.

How these map to any particular processors resources is not my problem - though the three major vector extensions today all have 4-element vectors. Some support more, but that's not terribly relevant to the math I want to do. A smart compiler could pack multiple small vectors into a wide vector register just like they try to pack multiple scalars in there today.

I am not interested in vectorizing loops. I want to write code like this:

    Vec3double position = {5.8,3.9,2.1};
    Vec3double velocity = {1.0,0.0,0.0};
    double timestep = 0.01;

    position += timestep*velocity;

Any paint program or graphics library (including font rendering) does a ton of this stuff. So does every 2d or 3d physics engine. Ray tracing. FEA software. CAD. The list of uses for these vector sizes is long and has nothing to do with auto-vectorizing loops. Of course there are plenty of applications where loop vectorization is valuable and I don't want to take anything away from that. I just want built-in support for these common mathematical entities in the base language.


In another comment you say that you want these to turn into SIMD instructions. FYI vectorizing vectors of elements in an array of structs fashion is usually less performant than structures of arrays.

On ARM you have specialised load and store instructions that can de-interleave into vector registers such that register a contains VecType.x and register b has VecType.y etc, but are a bit slower.

If you don't care about SIMD performance then fair enough, but if you care enough about this issue to want the compiler to generate SIMD instructions, you'd better be willing to change your code to be performant on your particular target, because even small changes can impact whether or not it's worth vectorising vs leaving it as scalar code.



I'm familiar with that. The problem is that it's not standard. Having the actual types v4float or v4double as part of the language would also make it easier to mix and match code from different places/libraries.


Fortran?


It doesn't really make sense in C as any modern optimizing compiler will turn it into SIMD. IIRC rust needs explicit SIMD due to bounds checking.


Bounds checking gets in the way for simple cases but anything even slightly more complicated needs to be written extremely carefully in any language, for autovectorisation to work. It is like, essentially, writing explicit SIMD without using intrinsics and without guarantees it will work as desired.

And, that is assuming that the autovectoriser is able to synthesise the desired instructions, e.g. I believe SSE2's packssdw & packusdw ("pack with signed/unsigned saturation") and pmaddwd ("multiply and add packed integers") are useful in a JPEG codec but I find it extremely unlikely that any compiler will autovectorise to them.


There are thousands of vendor intrinsics and no compiler that I'm aware of is able to just automatically use all of them in a reliable way. The idea that "Rust needs explicit SIMD due to bounds checking" is very wrong.


Because SIMD instruction throughput is highly processor-specific? Rust will also not 'automatically use all of them'; there is no magic abstraction that would make any compiler use some of the really fancy and useful SIMD instructions.


I don't know what you're talking about unfortunately. My statement about compilers and SIMD isn't Rust-specific. My point was that "rust needs explicit SIMD due to bounds checking" is factually wrong.


No it isn't, it is one of the reasons that Rust is getting SIMD: if it cannot elide the bounds checking then obviously it will not vectorize the code in question.


I'm one of the people working on adding SIMD to Rust, so I'm telling you, you're wrong. If you want better vectorization and bounds checking is standing in your way, then you can elide the bounds checks explicitly. That doesn't require explicit SIMD.


How do you safely elide bounds for something the compiler cannot reason about? How would Rust handle SIMD differences when trying to generate specific code as you would in C?


> How do you safely elide bounds for something the compiler cannot reason about?

Who said anything about doing it safely? You can elide the bounds checks explicitly with calls to get_unchecked (or whatever) using unsafe.

> How would Rust handle SIMD differences when trying to generate specific code as you would in C?

Please be more specific. This question is so broad that it's impossible to answer. At some levels, this is the responsibility of the code generator (i.e., LLVM). At other levels, it's the responsibility of the programmer to write code that checks what the current CPU supports, and then call the correct code. Both Clang and gcc have support for the former using conditional compilation, and both Clang and gcc have support for the latter by annotating specific function definitions with specific target features. In the case of the latter, it can be UB to call those functions on CPUs that don't support those features. (Most often the worst that will happen is a SIGILL, but if you somehow muck up the ABIs between functions, then you're in for some pain.) The plan for Rust is to basically do what Clang does.

The question of safety in Rust and SIMD is a completely different story from auto-vectorization. Figuring out how to make calling arbitrary vendor intrinsics safe is an open question that we probably won't be able to solve in the immediate future, so we'll make it unsafe to call them.

And even that is all completely orthogonal to a nice platform independent SIMD API (like you might find in Javascript's support for SIMD[1]), since most of that surface area is handled by LLVM and we should be able to enable using SIMD at that level in safe Rust.

And all of that is still completely and utterly orthogonal to whether bounds checks are elided. Even with the cross platform abstractions, you still might want to write unsafe code to elide bounds checks when copying data from a slice into a vector in a tight loop.

[1] - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
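
For concreteness, this is the kind of explicit bounds-check elision I mean, with no SIMD involved at all (a minimal sketch; the caller upholds the length invariant):

    /// Sums the first `len` bytes of `src`. Caller must pass `len <= src.len()`.
    fn sum_prefix(src: &[u8], len: usize) -> u64 {
        assert!(len <= src.len()); // checked once, up front
        let mut total = 0u64;
        for i in 0..len {
            // SAFETY: `i < len <= src.len()` by the assert above, so skipping
            // the per-iteration bounds check is sound.
            total += unsafe { *src.get_unchecked(i) } as u64;
        }
        total
    }

    fn main() {
        assert_eq!(sum_prefix(&[1, 2, 3, 4], 3), 6);
    }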


Optimizing compilers are rarely successful at turning non-trivial code into SIMD.


> IIRC rust needs explicit SIMD due to bounds checking.

Bounds checks can be eliminated and code can be vectorized if the optimizer can prove it; explicit SIMD is useful for the cases where it can't.


I want it to turn it into SIMD instructions. What I don't want is to write classes, functions, or loops that have to be automatically converted to SIMD. I want a simple built-in type for these three size vectors. I also mentioned that they should be passed by value and be able to be returned by value in a (SIMD) register. This is the most efficient way to write and execute vector math.


Can't a library just add that? Make some types, implement some functions and/or overload some ops. If you're defining the special type anyway, I'm not sure why it has to be built in.
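
To make that concrete, here's a minimal, non-SIMD sketch of the earlier position/velocity example as a plain library type with overloaded operators (whether it lowers to SIMD is then up to the compiler backend):

    use std::ops::{Add, AddAssign, Mul};

    #[derive(Clone, Copy, Debug)]
    struct Vec3 {
        x: f64,
        y: f64,
        z: f64,
    }

    impl Add for Vec3 {
        type Output = Vec3;
        fn add(self, o: Vec3) -> Vec3 {
            Vec3 { x: self.x + o.x, y: self.y + o.y, z: self.z + o.z }
        }
    }

    impl AddAssign for Vec3 {
        fn add_assign(&mut self, o: Vec3) {
            *self = *self + o;
        }
    }

    // Scalar * vector, so `timestep * velocity` reads naturally.
    impl Mul<Vec3> for f64 {
        type Output = Vec3;
        fn mul(self, v: Vec3) -> Vec3 {
            Vec3 { x: self * v.x, y: self * v.y, z: self * v.z }
        }
    }

    fn main() {
        let mut position = Vec3 { x: 5.8, y: 3.9, z: 2.1 };
        let velocity = Vec3 { x: 1.0, y: 0.0, z: 0.0 };
        let timestep = 0.01;
        position += timestep * velocity;
        assert!((position.x - 5.81).abs() < 1e-9);
    }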


I'm the author of https://github.com/dropbox/rust-brotli and I certainly ran into the issues mentioned in the article, and the initial version of my brotli decoder and encoder were each almost 10x slower. I also worked around each and every one of them, and the result is something that performs at 80-90% of the speed of the original on average, and some files compress faster with rust than with the original brotli codec.

allocator: I ran into the same situation and abstracted it with a generic allocator: https://github.com/dropbox/rust-alloc-no-stdlib it's an ugly solution, but it works and can allow for significant perf improvements down the line by bundling all same types together

benchmarks: I simply made a test that printed the time

primitive types: it's easy to write generic functions that do this

macro system: I found to be amazing, but I never used it to compact data types

borrow checker: split_at_mut was super helpful... also core::mem::replace was useful for taking pieces out of a structure, manipulating them, and then putting them back
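
A minimal sketch of that replace pattern, with made-up type and field names rather than the actual brotli code:

    use core::mem;

    struct Decoder {
        window: Vec<u8>,
        pos: usize,
    }

    impl Decoder {
        fn refill(&mut self) {
            // Take the buffer out of `self` so it can be passed by value
            // elsewhere without fighting the borrow checker...
            let window = mem::replace(&mut self.window, Vec::new());
            let window = process(window, self.pos);
            // ...then put it back.
            self.window = window;
        }
    }

    fn process(mut window: Vec<u8>, pos: usize) -> Vec<u8> {
        window.truncate(pos);
        window
    }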

Though it wasn't all roses: to try to get SSE vectorization I had to convert this 6 line short string matching function https://github.com/dropbox/rust-brotli/blob/238c9c539b446d7d...

into this multi hundred line monster with macros and so forth https://github.com/dropbox/rust-brotli/blob/238c9c539b446d7d...

but in the end it was as fast as hand-tuned intrinsics in C

I also found myself scared by dependencies, including onto the std library, so I tried to get everything to remain within the core library. This should allow something as low-level as a codec to be used in kernel space or in another place where a custom allocator is needed.

I also found that "rewriting in safe rust from C" was made easy by corrode https://github.com/jameysharp/corrode since C and rust can be interface compatible, it's easy to go one file at a time and turn it into working rust, then safe rust.


How can they allow jumping around the call stack while also destructing/freeing objects?


[flagged]


While hijacking a thread for idle pedantry is something of a sport around here your comment is unusually pointless for the form. Go eat something, you'll feel better.


It's still widely used and not considered archaic in British and Australian English.


https://www.merriam-webster.com/dictionary/whilst

The definition is singularly, "while", with no extra connotation, other than it is as you said chiefly British.

https://trends.google.com/trends/explore?q=while,whilst

"While" is, not surprisingly, much more widely used, "whilst" being more affected.


Since the link is a bit slow, here's an IPFS version:

https://www.eternum.io/ipfs/QmYrdKpWbHCuNPUrBd6MPFrcTqPBW55w...


Nice to see IPFS links popping up


I agree! Are you running a local node? I was thinking that a browser extension that automatically redirected URLs like ".*/ip[fn]s/\w+" to the local daemon address would be pretty useful.


Check out ipfs station if you use chrome.

https://chrome.google.com/webstore/detail/ipfs-station/kckhg...


Fantastic, that's exactly what I wanted, thank you. It doesn't seem to like my remote node, but I'll figure it out.


No, I just like the idea behind it. Yeah, even a bookmarklet might be useful. Or a single-purpose site like http://isup.me


The only legitimate complaint there is that it's tough in Rust to get two arbitrary mutable slices into the same array. Rust wants to be sure they're disjoint, to prevent aliasing. For some matrix manipulation, this is inconvenient.

It would be easier if Rust had real multidimensional arrays. If the compiler knows about multidimensional arrays, some additional optimizations are possible. For example, if you want to borrow two different rows, that's fine as long as the row subscript is different.


As the article notes in an edit, there's a method for this: split_at_mut(). It's very useful, and I reach for it quite a bit in low-level array code.
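
A minimal sketch of it in use, for anyone who hasn't reached for it yet:

    fn main() {
        let mut buf = [0u8; 8];

        // Two non-overlapping mutable slices into the same array; the borrow
        // checker accepts this because split_at_mut guarantees disjointness.
        let (head, tail) = buf.split_at_mut(4);
        head.copy_from_slice(&[1, 2, 3, 4]);
        tail[0] = 9;

        assert_eq!(buf, [1, 2, 3, 4, 9, 0, 0, 0]);
    }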

It might be nice to have some sort of pattern matching syntax for it, I suppose:

    let [ref mut a..16, ref mut b..] = *c;
Meh. Sure is ugly. I'm not sure adding syntax would be worth it.


I feel like split_at_mut is a bit of a red herring - sure, it's exactly what the author was looking for and unable to find, but it's masking a bigger problem/opportunity here.

Why does the author have to worry about aliasing in the first place? If they're dealing with POD types, there's no memory safety issue implicated by allowing overlapping slices (except possibly when accessing from multiple threads). No-alias guarantees can improve performance in many cases (though rustc currently isn't doing the best job conveying them to LLVM), but not always, and especially in something like a video codec, the author may prefer to manually optimize memory access patterns.

In my own code, I worked around this by using &[Cell<u8>] as an aliasable mutable buffer type. I had a function to cast to it from &mut [u8], as well as functions to read and write integers of various sizes by directly casting the pointer (although this may not be necessary, as LLVM can probably optimize the naive approach of calling .get()/.set() on each Cell individually). Simple enough, and I'm pretty sure it's safe, but I've never seen anyone else use this approach, and I don't know of any crates.io crates that implement it. Maybe I should write one...
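
For what it's worth, a minimal sketch of the trick, using the Cell::from_mut / as_slice_of_cells conversion that now exists in std (the hand-rolled cast I describe has the same effect):

    use std::cell::Cell;

    // View a &mut [u8] as an aliasable, mutable buffer: &[Cell<u8>].
    // Overlapping views are fine because Cell only allows whole-value
    // get/set, so there is nothing for aliasing to invalidate.
    fn as_cells(buf: &mut [u8]) -> &[Cell<u8>] {
        Cell::from_mut(buf).as_slice_of_cells()
    }

    fn main() {
        let mut buf = [0u8; 8];
        let cells = as_cells(&mut buf);

        // Two deliberately overlapping views of the same bytes.
        let a = &cells[0..6];
        let b = &cells[4..8];
        a[5].set(7);               // writes buf[5]
        assert_eq!(b[1].get(), 7); // reads buf[5]
    }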


It is safe, and was even modeled explicitly as part of https://www.ralfj.de/blog/2017/07/08/rustbelt.html .

There's the external https://crates.io/crates/alias and even an approved RFC for adopting it into std: https://github.com/rust-lang/rust/issues/43038 .


Doh, way ahead of me then. Though it seems there are more steps needed, e.g. adding support to the ‘bytes’ crate.


Well, if we're having some bike-shedding fun:

    let (a, b) = c[..16..];


.split_at_mut is a method on a slice ... one is free to implement their own abstractions that ultimately terminate in an unsafe { block } that enables multiple mutable borrows.

    fn split_at_mut(&mut self, mid: usize) -> (&mut str, &mut str) {
        // is_char_boundary checks that the index is in [0, .len()]
        if self.is_char_boundary(mid) {
            let len = self.len();
            let ptr = self.as_ptr() as *mut u8;
            unsafe {
                (from_raw_parts_mut(ptr, mid),
                 from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
            }
        } else {
            slice_error_fail(self, 0, mid)
        }
    }
This is literally the source of .split_at_mut. Make your own version. Rust's power isn't what it enables, it is what it removes: undefined behavior, false sharing, etc. Frankly, I am tired of C/C++ programmers complaining about Rust. This is a non-starter. I am happy to convert the F#, Haskell, OCaml, Scala and Clojure devs to Rust. Systems programming isn't magic.


What does Rust have Ocaml doesn't? If Ocaml solves the problem why Rust? Rust was made to solve the problem Ada solves.


> What does Rust have Ocaml doesn't?

Safe memory management without a garbage collector.

> If Ocaml solves the problem why Rust?

Ocaml doesn't solve every problem any more than Rust does. Rust may solve a subset of the problems people use the listed languages for better than those languages, so it might be beneficial for people to use it in those cases.

> Rust was made to solve the problem Ada solves.

Not all solutions have the same efficiency, or are as easy to understand. Rust aims to enforce safety at compile time to eliminate runtime safety checks where possible. Ada performs many safety checks at runtime, which introduces a performance cost. This does allow Ada to provide some additional safety checks though (such as limiting a type/subtype to a range of values).


It has nothing where a garbage collector would be. ML is a beautiful language. I am not trying to get anyone to stop speaking their language of choice, but I am trying to get people who need to do systems-y things to use Rust. There are more similarities between the Rust programmer and the OCaml programmer than there are differences.


> The only legitimate complaint there is that it's tough in Rust to get two arbitrary mutable slices into the same array. Rust wants to be sure they're disjoint, to prevent aliasing. For some matrix manipulation, this is inconvenient.

I've only looked into Rust briefly, but I thought this was the point of unsafe blocks? You can roll your own multi-dimensional array data structure using unsafe blocks if you're certain what you're doing is actually safe.


Yes, unsafe blocks are one way in which Rust doesn't privilege built-ins and the standard library (much) over external code: people can write high-performance abstractions without them needing to be baked into the language. This is exactly what has happened with multidimensional arrays, via external libraries like ndarray.


Unsafe allows you to dereference raw pointers and make foreign-function calls, but you cannot go around the borrow checker, which is what's causing the annoyance.

That said, you can create and dereference raw pointers, which lets you do the above.


If you turn a reference into a pointer, that lets you work around the borrow checker (and I've bitten myself just about every time I've done it, now that I'm used to the borrow checker).


Using unsafe blocks and pointers can silence the borrow checker, but that doesn't mean that code is well defined. For example, using swap (mentioned in the article) with overlapping references would be undefined behaviour.


Multidimensional arrays are... completely orthogonal: if the compiler could determine row indices were different, it could determine that slicing indices were too.


Is that true? Consider variables a=3, b=4, c=7.

With a large single array of 100 elements, divided into ten 10-element pieces by convention, a slice of elements might be accessed by getting the elements array[10a+b] through array[10a+c]. If we also want a slice from the next segment, that would be array[10(a+1)+b] through array[10(a+1)+c].

With a multidimensional array of 10 bounds-checked segments of 10 elements, we could access array[a][b] through array[a][c] and array[a+1][b] through array[a+1][c]. If access is bounds checked, or the compiler can prove the values are within the sizes, then the multidimensional version provably has no overlap.

Or am I missing something here?
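
(For concreteness, a minimal sketch of the flat-buffer case in today's Rust, mine rather than anything from the thread: you can still get two provably disjoint row slices out of the single 100-element array, but only by splitting at the row boundary first.)

    fn two_row_slices(buf: &mut [u8], a: usize, b: usize, c: usize) -> (&mut [u8], &mut [u8]) {
        // Split between row `a` and row `a + 1`; the two halves cannot overlap.
        let (lo, hi) = buf.split_at_mut((a + 1) * 10);
        (&mut lo[10 * a + b..10 * a + c], &mut hi[b..c])
    }

    fn main() {
        let mut buf = vec![0u8; 100];
        {
            // a = 3, b = 4, c = 7: slices into rows 3 and 4, as in the example above.
            let (s1, s2) = two_row_slices(&mut buf, 3, 4, 7);
            s1[0] = 1;
            s2[0] = 2;
        }
        assert_eq!((buf[34], buf[44]), (1, 2));
    }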


That's true: it probably does make some (probably common) things easier because you get to disjoint slices with simpler indexing conditions, but the problems are equally hard in the general case: the type checker/borrow checker has to understand arithmetic rules like `a != a+1`, and be flow sensitive for things like `if a != b { ... }` and other ways to imply the inequality. (And, it's only a subset of the cases when one wants to slice an array multiple times.)


> the type checker/borrow checker has to understand arithmetic rules like `a != a+1`

I was actually thinking of this when I wrote that, but the opposite case. A lot of compilers already know about that, and exploit it to their benefit (and sometimes to the programmer's dismay[1]). Overflow semantics matter here.

That said, it looks like an interesting area for Rust. I'm sure it's already been discussed to death somewhere in the community. :)

1: https://news.ycombinator.com/item?id=14163111


Optimisers reason about this, yes, but they don't define the user-facing language model. They take valid code and transform it into (hopefully) faster valid code[1], and they don't generally emit diagnostics or fail compilation. This means they can rely much more on guesswork and heuristics rather than having predictable, reliable or definable behaviour, all of which are useful for humans writing code (having a predictable optimiser is useful too, but difficulty writing any code at all is worse than difficulty optimising the spots where the optimiser isn't hitting performance targets).

Integrating this sort of thing into the type system "properly" (as in, actually part of the language and doing all of what I think Animats wants) basically means going a long way towards full dependent typing.

[1]: behaviour on invalid code, such as C relying on signed integer overflow, is unspecified/undefined. Languages like, say, Rust and Haskell generally try to guarantee the optimiser never gets invalid code by flagging such instances at compile time, unlike most/all C and C++ compilers.


Is it hard to implement such primitives for distinguishing ownership of subsections of an array? Or to extend Rust to do that?


It's pretty easy to do it in unsafe code, or to use an existing safe function (implemented internally with unsafe code) like the standard library function split_at_mut to do it.

Here's a doc on how to write such a function yourself, and includes the (actual) source code for split_at_mut:

https://doc.rust-lang.org/nomicon/borrow-splitting.html

As others have mentioned, there's nothing special about the standard library in this regard; you can just write the method yourself and it'll work equally well. (The standard library is moderately special in that it gets to use nightly features because dealing with breaking changes is easy for code that literally lives inside Rust's own source repository, but it's not special in terms of its ability to use unsafe code or anything.)


I meant primitives that preserve Rust's (safe) ownership model.


Well, you can use the existing split_at_mut if you're worried about getting your hand-rolled unsafe implementation wrong. It's definitely reasonable if you're splitting on rows; it may or may not be what you want if you're trying to get a slice-like thing that splits on columns.

(There's no reason to avoid safe functions that happen to have unsafe implementations. For instance, the standard-library implementation of indexing a slice, as in b"hello"[3], is just a bounds check plus an unsafe dereference.)


> And don’t tell me about Bytes crate—it should not be a separate crate

I'd be interested to hear the author's reasoning behind this, if it does what they want then why not use it? It's small and well written, so I don't think vetting it should be a problem.

The rest of the article seems quite sensible, that comment just strikes me as a little odd.


There is a real change in mentality you have to go through if you transition from a fairly strict C/C++/ even Java background to trying out Rust. In the former languages, adding dependencies rapidly becomes a painful experience, whereas Rust does much better dependency management and automatic building than even Python (where you need a requirements file or something similar to go pull down all the deps).

With Rust, you really should just use crates. The std is meant to be limited to just the most used code and that which should not change for the sake of keeping the ecosystem stable.


> even Python

Off topic, but it shouldn't be better than "even Python", because Python has a really, really broken dependency system. Far more so than Java, which has Maven and Gradle, both of which are infinitely better than the pip/virtualenv disaster.

People complain about things like shading in Maven being complicated. What they might not realize is that pip doesn't even try to address conflicting dependencies, it will just silently give you the wrong version! You ask for A==0.2 and it will give you A==0.1 if another dependency asked for A==0.1 first, without even a warning, even though it's straight-up broken behavior. Virtualenv makes packaging annoying since it's almost vendoring but not quite. To totally understand the packaging system forces you into the world of eggs, wheels, distutils, conflicting versions, condas, etc.

Sorry for tangent, just thought it was funny you would hold Python up as a standard of dependency excellence when it's probably the worst overall ecosystem of major languages.


I find it super weird how Python is still in this state while even PHP has managed to get something good going.

To me, this is a clear case of trying to fix a broken system whereas starting from scratch would be much better.


Yeah, as someone who's done a deep dive into Python dependency management (mostly because virtualenv has some insane ideas about how to make shell tools) that bit made me laugh.

There are a lot of great things about Python, but dependency management is right there under the GIL on the list of things that are very painful.


My language development went from C -> C++ -> Java -> Python. So when I got there and figured out pip was a thing (or easy_install back in the day) it was a major innovation at the time.

Additionally, for anyone coming from almost all compiled native languages, a native environment like Rust with better dependency management than Python (which, as you say, and in retrospect, is pretty broken) is a bit of a mind screw.


The Python devs I worked with always envied me for npm. I asked them whether they had something similar with pip, but it seems npm is on a whole different level.

Cargo should even be better than npm.

On the other hand I always asked myself if they couldn't simply use Nix?


I can't simply use Nix; it doesn't support Windows.


It doesn't run on the Linux subsystem?


I hear it sorta works, but nix-shell doesn't.

Regardless, while it's cool and useful, it's not really actual Windows support.


true.

Didn't take you for a Windows guy, hehe


Historically not! If you managed to dig up my old /. account you'd see "M$" and all that. Lots of growth and change since then, ha!

It's due to video games... but I'm actually enjoying Windows 10.


Haven't used Rust yet; is it kind of similar to Go, where any file can simply import and then Go knows how to fetch and build when needed? And then when you remove the calls (e.g. during a refactor) the compiler forces you to remove the imports too, stopping the infinite bloat caused by "no one really knows if we still need this" that can be common in other languages.

I really liked that "dependency as part of the language and build tool".

Of course, Go still had its own dependency issues (versioning, availability, ...) that they are now working on, but that was a step in a direction I quite liked.


> is it kind of similar to Go where any file can simply import

You declare your crate dependencies in a Cargo.toml file. Rust does proper versioning of dependencies so having a separate manifest is desirable. Within your code you declare the existence of a crate via `extern crate` and then you can use it wherever.
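
For instance (crate name and version picked purely for illustration), with byteorder = "1.1" listed under [dependencies] in Cargo.toml, the Rust side looks roughly like this:

    // Assumes `byteorder = "1.1"` under [dependencies] in Cargo.toml
    // (crate chosen only as an example).
    extern crate byteorder;

    use byteorder::{BigEndian, ReadBytesExt};
    use std::io::Cursor;

    fn main() {
        let mut rdr = Cursor::new(vec![0x00u8, 0x2A]);
        // Read a big-endian u16 from the byte buffer via the external crate.
        let n = rdr.read_u16::<BigEndian>().unwrap();
        assert_eq!(n, 42);
    }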

> the compiler force you to remove the imports too

it warns.


Do you know if there are plans for rustfmt to auto-import like gofmt does? In the sense that if a crate is available (in the .toml file) and you reference it, rustfmt will automatically insert the required "import" and "use".


If the compiler gives you an error with a suggestion "You probably need to add `extern crate foo;`" (and it already does in some cases), RLS (i.e., your editor plugin) will be able to automatically add it for you (soon™).


There's been talk of it, but it hasn't been built yet.

It's more likely to be a part of RLS than rustfmt.


This would probably be an RLS thing.

There's a separate tool called rustfix in development that can apply the suggestions given by the compiler, so it could theoretically prompt for these.


It's kind of the reverse. You declare versioned dependencies in the toml file, and (soon) Cargo will add them to rustc so they just show up for your source files. Right now you declare them twice, once in the toml and once in the crate root; i.e., you add url = "1.5" in the toml, and extern crate url in src/main.rs. There is an RFC coming that will make the extern crate part optional.


A current problem with "just use crates" is that cargo only works with source code dependencies, not binary libraries.

So each new crate added to your project ends up increasing the overall build time, which gets especially bad if you happen to add a dependency on a crate that happens to have a big dependency list.

I have a very basic word counting application with a GUI written in Gtk-rs, a fresh build straight out of "git clone" takes a few minutes, mainly thanks to Pango.


> So each new crate added to your project ends up increasing the overall build time,

That's only a problem the first time you compile though, isn't it?


A problem that I usually don't have with C++[0], Java[1] or .NET[2], thanks to the build systems support for binary libraries.

[0] - I eschew header-only libraries, using them only if there is no technical way around them (e.g. templates).

[1] - I can AOT compile with Excelsior JET, IBM J9, JamaicaVM, PTC, Oracle Java 9 (Linux x64 for the time being), ...

[2] - I can AOT compile with Mono, NGEN, .NET Native, CoreRT, IL2CPP, ...


I'm not saying it isn't a problem, I just wanted to know if I was correct in thinking that it is only a problem the first time you compile each crate (or rather each version of each crate, I think).


I see.

Yes, it is a problem when you do a clean build, or when you have common crates across projects, because cargo doesn't have a concept of build cache.

You can try to work around it by setting all target directories to the same one via target-dir in your .cargo/config file, but there is no guarantee that the crate won't get rebuilt.


Every dependency can run code on every computer your project runs on. That means you have to trust its author to:

1. not be malicious

2. not write a vulnerability by accident

3. not get their computer infected, their email account hijacked, etc.

4. be wise in transferring ownership

5. not add a dependency with a license incompatible with your project

All the above concerns apply recursively to the dependencies of the dependency.


A little copying is better than a little dependency. https://go-proverbs.github.io/

As a developer at a large software company, every dependency that is not part of the language runtime itself is a pain because legal paperwork and evaluation has to be done for each individual component before I can use it / ship it.

NOTE: I am not referring to copying I would be doing, but to the crates that have little dependencies, which should consider including a copy of their little dependency instead, via whatever method is appropriate for licensing.

This makes languages like Python, Go, Perl, etc. preferable over languages/projects like node.js, rust, etc. because (at least I) don't generally end up with tens/hundreds of dependencies since the standard library is rich enough for most work.

Additionally, I personally despise having many small dependencies because it means I have more to think about and manage. Instead of just being able to think about the version of the compiler/standard library I'm using, I have to consider every individual crate.

I'm well aware of why rust chose to make certain tradeoffs with crates, but having numerous pieces of what many of us consider "basic functionality" as a third-party dependency is frustrating. Languages with richer standard libraries have spoiled us all.


> A little copying is better than a little dependency. https://go-proverbs.github.io/

I really disagree with that quote. Copying is how you get bugs sticking around in software for all time (for example, doing binary searches in a way that avoids overflow is surprisingly tricky, and the endless copying of naive binary search code is why this bug is so difficult to eradicate). Honestly, that quote is just an excuse to avoid the hard work of making the language ecosystem handle dependencies properly.
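
(As an aside, a minimal sketch of that particular bug and its fix, not code from the thread:)

    // The classic copy-pasted bug: with fixed-width indices and a big enough
    // array, the naive midpoint `(lo + hi) / 2` overflows. `lo + (hi - lo) / 2`
    // never does.
    fn binary_search(data: &[u64], target: u64) -> Option<usize> {
        let (mut lo, mut hi) = (0usize, data.len());
        while lo < hi {
            let mid = lo + (hi - lo) / 2; // overflow-free midpoint
            if data[mid] == target {
                return Some(mid);
            } else if data[mid] < target {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        None
    }

    fn main() {
        let v: Vec<u64> = (0..100).collect();
        assert_eq!(binary_search(&v, 42), Some(42));
        assert_eq!(binary_search(&v, 1000), None);
    }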

> As a developer at a large software company, every dependency that is not part of the language runtime itself is a pain because legal paperwork and evaluation has to be done for each individual component before I can use it / ship it.

How is copying better than dependencies in this regard? You presumably need legal signoff either way.

> Additionally, I personally despise have many, small dependencies because it means I have more to think about and manage. Instead of just being able to think about the version of the compiler/standard library I'm using, I have to consider every individual crate.

This is what the "Rust platform" is designed to address. It is nice to be able to refer to a specific version of the Rust platform, but that doesn't mean you have to give up on the massive ergonomic benefit of the Cargo ecosystem relative to copying and pasting code.


I believe they mean that each time source code containing a LICENSE file is downloaded and used, a legal review must occur. So if it's bundled into one licensed work "Rust With Lots Of Crates Bundled", that's one form to fill out, but if it's "Rust" and then "Download And Use Crates", that's one form per addon to fill out.


But you're bound by the license even if you copy and paste the code into your project instead of using Cargo.


I think this is where reality meets theory. In reality, the developers are probably just taking the code as if they had written it, and the people that may know, such as immediate supervisors, don't care to point it out for the same reason the developers are stealing it: it's much easier than the alternative. The code vetting team is just left in the dark.

Employees take shortcuts around bureaucracy all the time. Sometimes (often?) that bureaucracy is for legal reasons.


I'm not going to endorse copying over package managers on the grounds that copying makes it easy to get away with violating big companies' legal procedures on the use of third-party code.


I wasn't endorsing, just providing an explanation of why while in theory copying and package inclusion are the same from a license standpoint, they likely often aren't in reality. That doesn't mean it's a good thing.

Your argument is the correct moral and legal one. Unfortunately that doesn't always matter. For another example, see the cognitive dissonance many express regarding ad blocking (not to come down entirely on one side of that issue, it's complicated).


> and the people that may know, such as immediate supervisors, don't care to point it out for the same reason the developers are stealing it, it's much easier than the alternative.

If your company gets acquired one day, the code will probably be audited during the due diligence process. If licence violations related to copy-pasting are found, your team will be asked to remove the infringing code and your supervisor may be fired. This happened to my team at the first company I worked for (not the firing part, though): we had a lot of code which was just copy-pasted from lodash and the audit found it.


Yes, anytime source code is retrieved that isn't an existing, approved version, legal review of some sort must occur. This includes even referencing it despite what the other poster mistakenly believed I was implying.


I see what you mean, would something like cargo-vendor[0] help with this?

[0]: https://github.com/alexcrichton/cargo-vendor


I think the argument is where that line is. Maybe copy/pasting binary search code is too much, but do you need a dependency for left pad? There's a line somewhere.


Left pad was a problem for a number of reasons, none of which apply to Cargo (cargo yank never breaks code, by design, while the npm equivalent did). It's not relevant at all.


Sure it is. We're talking about what should and shouldn't be a dependency. The acute problem with left pad was npm's design, but the cultural problem (if you consider it a problem) was that anything depended upon something so small in the first place.


The circumstances that led to the left-pad fiasco were because of Javascript's uniquely anemic standard library (at least until very recently). Rust's stdlib is not small in the same way that Javascript's historic stdlib was. Rust's stdlib is narrow, yet deep: a relatively small number of modules that themselves provide a very large number of operations and convenience functions. Rust dependency graphs can get pretty big, but in practice they're nowhere near as big as the dependency graphs you'll see in big Node apps because the stdlib is so much more fleshed out. That order of magnitude difference is crucial; one might call it "microdependencies versus minidependencies".


Agree 100%. I think Rust's stdlib is useful and coherent, and shows the way for other libraries. It's a great piece of engineering.


Rust has the capability to, and I believe the developers have expressed they are amenable to, internalizing crates that become the best solutions for a problem.

Would you rather a flawed, or later deemed incomplete internal solution be implemented and then the language is forced to support it in perpetuity, or would you rather one or more solutions get tried and the best implementation and syntax eventually accepted into core?

EcmaScript can do the same, and finally has[1], but it moves so slowly and has so many competing interests that it seems to take forever for that to happen.

1: https://www.ecma-international.org/ecma-262/8.0/index.html#s...


Oh God no, haha. I started out in Python and I'm pretty sure all of us have fallen out of love with batteries included.

You can see what happens when you go too far the other way though. C has effectively no basic data structures like strings, lists, hash tables, etc., so anything you interface with has its own idea on how to handle that stuff. Library X might return an array of Things that's NULL terminated. Library Y might return an array of Thangs and a size_t output param. Or like you pointed out in JS, its standard library is full of holes so you get tiny projects that attempt to plug them, or larger projects that try to make JS into a specific kind of language (Underscore), or full on programming languages that transpile to it.

I just don't think the problem is definitively solved though. Personally I think Bytes should be in Rust's stdlib. I think bit and byte manipulation is a fundamental part of a language and there should be a standard way of doing it, especially if there are things like TCP/UDP and hash tables in there. I understand the arguments against; I really like the design of Rust's stdlib, but I feel like there's room for disagreement. That's all I'm saying :)


> the cultural problem (if you consider it a problem) was that anything depended upon something so small in the first place.

It's not a problem, in my view.


> How is copying better than dependencies in this regard? You presumably need legal signoff either way.

I wasn't referring to copying that I would do myself, but copying that other crates would do.

That is, if a crate only needs a little bit from a little dependency, then copying it into their crate can make everyone else's life easier (obviously taking licensing into consideration when doing so).

In short, the context here was the bytes crate, which is fairly tiny. If Rust is going to insist on not including the bytes crate, or a copy of it, in the standard library, then I would hope others that consume it would consider embedding a snapshot of it into their own crate for their own, private use so that I don't have to worry about it.

I'm well aware there's a fine line here, hence my reference to the Go proverb.


> so that I don't have to worry about it.

What would you be worrying about?


The short version is that a component distributed with an embedded copy of its dependencies means a single legal review since it's a snapshot in time of a particular version of that component and its dependencies.

A component that instead references its dependencies and that have their own release schedule/versions, etc. requires a legal review for that component and each of its dependencies.

This has been true at multiple employers I've worked for, so it seems unlikely to be a consideration unique to my current employer.


Again, that's what the Rust Platform is for. It's a better solution than copying code, because it doesn't throw away all of the benefits of Cargo just to make some legal policies at some large companies a little easier.


We're going to have to agree to disagree.

This is where I actually prefer Go's "vendor" approach to dependencies. It would be great if rust / cargo eventually had the same and more authors adopted it or simply copied their little dependencies instead of having external dependencies on them.

Something like this proposed command, except for crate maintenance instead of distribution:

https://users.rust-lang.org/t/cargo-cook-subcommand/10288


I sincerely hope that people never start copying code into their packages. I see virtually no upsides, except for making it easier to dodge bureaucratic hurdles at some big companies, and a huge number of downsides (basically forgoing all the benefits of Cargo).


FWIW `cargo vendor` already exists, just not part of Cargo itself, but rather a tool by one of the core devs.

It's even used for releasing the official Rust tarballs as we now employ crates.io dependencies in the standard library and the compiler.


Does it do more than using relative paths in a Cargo.toml would do?

I think this thread is about copying and pasting code versus using a small library in the Go case, which might be a philosophical difference with Rust.

It might help to point out that vendored crates are compiled from source, making the required review process referenced by that poster just as possible as with crates pulled from the server.


Gotcha, thanks.


"Dependency" doesn't imply "third-party library." There are plenty of crates that are maintained by the Rust organization itself. You could think of them as a "non-standard library."

(This isn't uncommon; it's also true in, say, Elixir: there are a few useful Hex packages owned by the elixir-lang GitHub org itself. And I believe it's true in Haskell as well.)


It usually does for legal review purposes, in my experience. If those things aren't part of the "standard distribution", they have to be evaluated separately. Especially if they have a different release schedule.


Hmm. I guess this might justify the Erlang/OTP approach: shipping a "platform" or "distribution" release that contains your core packages/stdlib—along with a bunch of other, seemingly "extraneous" packages that you also take responsibility for—bundled together as your language's SDK.

Unlike a huge stdlib, a "distro"-style SDK is still factored into packages (in Erlang terms, "applications"), that can be included or excluded from any given release of your project. But it's all released monolithically, and comes as one big package. Probably helps a lot with getting legal sign-off for using the relevant packages. I wonder if that's why they (still) do it?


I personally agree with this philosophy - more the "use the standard library" than "copy stackoverflow".

I can't count the number of times we've had problems with the requests library, either because of the huge tree of dependencies (both explicit and implicit) that requests has, or because of some of the assumptions made by requests.

On the other hand, when a bit more time is taken (yes, this means a few lines of boilerplate) and the code uses urllib2, it rarely has to be touched again.


Some C & C++ programmers are averse to third-party libraries (moreso than other languages in my experience). It's a valid position, but if you really value that then perhaps Rust is not for you.


I can't speak for the author of course, but probably their argument is that a systems language should be able to directly manipulate bits and bytes without outside dependencies. I don't know that I agree. A reasonable counter example is that you need library support to allocate memory in C. The argument is that it's a feature to not require C implementations to include dynamic memory allocation because not all projects allow it. My point being that what "should" be in a language usually depends on what you're using it for, and for a general purpose language keeping that very small is at least a consistent design.


You don't need a library to allocate memory in C. For example, on Linux, use the sbrk() system call to grow the heap, then start using it.

But your larger point still stands.


I think that the point was more that heap allocation is not a standard language primitive in C. And indeed, it would be ludicrous to require dynamic allocation support from freestanding implementations (no standard library because typically your platform doesn't even have an OS).


For what it's worth, C++ can have freestanding implementations and it provides dynamic allocation support at a syntactic level. Freestanding applications, however, need to provide an "operator new" function (with the appropriate memory allocation code) if they want to use it.


Ah sorry camgunz, I interpreted the counter example as claiming malloc() was a dependency. I was referencing the wrong subject!


Aha yeah, fair point :)


I agree with the author.

Bytes is basically just C++'s `std::string` (kind of; don't bite my head off, people who've memorized the C++17 standard). It's an ARC-backed array [1]. This suggests it should be a fairly fundamental abstraction.

Edit: ^^^My C++ is wrong sorry :(

Really I disagree with its purpose. In its immutable, non-threaded safe form you can create the same structure by just borrowing a value. This ofc requires making your peace with the borrow checker and `Cow<'a, T>` copy on write types.

By and large, Bytes' advertised purpose is network code. And for networking code it's really only _super_ useful if you're using a Packet Ring in Linux, as jemalloc will not return regularly re-allocated buffers.

Really this is all performance theater. How you manage/architect your socket reads/writes will have an order of magnitude larger effect than what abstraction you _store_ those bytes in once read.

[1] https://carllerche.github.io/bytes/bytes/struct.Bytes.htm


Bytes is intended for use with the tokio ecosystem, where you often can't use references because borrowing across a yield point in a future would be a lifetime error.


Can't `Vec<u8>` serve the same purpose?


Then you have to do a deep clone every time. Arc<Vec<u8>> is also not sufficient, because Bytes lets you share a reference count among slices at different offsets into the buffer.
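
To illustrate the difference, here's a hand-rolled sketch of the idea (not the bytes crate's actual implementation): pair the shared buffer with a range, and sub-slices become refcount bumps instead of copies.

    use std::ops::Range;
    use std::sync::Arc;

    // Many handles can point at different windows of the same allocation, all
    // sharing one reference count and requiring no copies.
    #[derive(Clone)]
    struct SharedBytes {
        buf: Arc<Vec<u8>>,
        range: Range<usize>,
    }

    impl SharedBytes {
        fn new(data: Vec<u8>) -> Self {
            let len = data.len();
            SharedBytes { buf: Arc::new(data), range: 0..len }
        }

        // Cheap sub-slice: clones the Arc (a refcount bump), not the bytes.
        fn slice(&self, start: usize, end: usize) -> Self {
            SharedBytes {
                buf: self.buf.clone(),
                range: self.range.start + start..self.range.start + end,
            }
        }

        fn as_slice(&self) -> &[u8] {
            &self.buf[self.range.clone()]
        }
    }

    fn main() {
        let whole = SharedBytes::new(b"header:payload".to_vec());
        let header = whole.slice(0, 6);
        let payload = whole.slice(7, 14);
        assert_eq!(header.as_slice(), b"header");
        assert_eq!(payload.as_slice(), b"payload");
    }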


Being an ARC backed array isn't like std::string, and I don't see why having that copy-on-write behaviour implies it is fundamental. Could you expand?


Pre-C++11, it was quite typical for std::string to be implemented with COW semantics. Since the C++11 standard it is no longer permitted, though that is not necessarily reflected in all stdlib implementations.


This is my mistake. I assumed shared pointer semantics and applied it to the wrong type :\


Codecs are actually a wonderful test-case for Rust: they're a frequent source of vulnerabilities because they have to deal with arbitrary input, they usually have to perform fiddly sub-byte and/or array manipulation, and the pressure for maximum performance means they have to make as efficient use of the hardware and memory as possible (and any mistake leads back to the first point).

So even if you don't agree with some of the author's points, his use-case is a good test for the suitability of Rust as a Systems Programming Language.


Yes, let us never forget how many exploits there are in trivial codecs like ICO and BMP, because they're written in C(++): https://bugzilla.mozilla.org/show_bug.cgi?id=775794#c0


BMP isn't so trivial, largely because it's been extended multiple times while never having a proper spec.

http://searchfox.org/mozilla-central/rev/bbc1c59e460a27b2092... has some of the gory details.


And .ico is just .bmp with some extras.


A lot of good comments on Reddit, with answers/solutions/discussion of the points brought up here: https://www.reddit.com/r/rust/comments/6qv2s5/rust_not_so_gr...

These kinds of experience reports are so valuable. (I've been following the whole series and they're very interesting.) Even if solutions to these issues do exist, if people can't find them, well that's a problem too.


Seems like (from that Reddit thread) a lot of the things are in progress in the nightlies. I wonder if that means the nightlies are holding Rust back (because it relieves the pressure to actually ship features for use by normal developers).


Trust me, being available only on nightly does not reduce pressure :)

Most of our users use stable, and want to stick to stable. There's just some use-cases that need features we haven't finished designing yet, such is life.


Might very well be; I have the luxury of not looking at nightlies at all :) it's just been frustrating how often problem reports get a response that the solutions are in the nightlies. I suppose it's possible that it's more meant as "it's being worked on" rather than "this is not a real issue, stop complaining", but the acknowledgement that it's actually an issue for people using release builds seems to be missing whenever nightlies are brought up in these sorts of responses.

Contrast this with other languages (C++, Python, Ruby, Go) where I have no idea what features are under development until they hit a stable release. Compare to JavaScript, where stable releases are so slow people built transpilers ;)

This could also be because I'm interested enough in Rust to occasionally land on the first-party discussion forums, of course (probably helped by how open the language development is). I'd say it's also young, except that it now has a release number much larger than 1.0 and doesn't really get to claim that anymore.


> I suppose it's possible that it's more meant as "it's being worked on" rather than "this is not a real issue, stop complaining", but the acknowledgement that it's actually an issue for people using release builds seem to be missing whenever nightlies are brought up in these sorts of responses.

I think that's the right assumption, because it's an acknowledgement that the issue is actively being worked on, to the degree that they have at least a partial solution implemented, as opposed to the "someone is working on that" response language developers sometimes give, which usually predates that and could mean someone is reading literature and surveying the field and no code has actually been written.

As you note, because development is so open, nightlies are easily accessible, so people might incorrectly infer that they are being recommended to use them, or that the problem is solved. I think what the devs are attempting to communicate is that it's close, and if you want you can test it out to see what it's like and give feedback. This might be a case of similar groups with slightly different context (lang devs / lang users) interpreting a statement differently. From what I've seen, the Rust devs seem fairly responsive to communication issues when pointed out, so maybe this will result in some change (or perhaps they are acutely aware of the issue already and instituted changes have not shown benefit, at least yet).


Yeah, I appreciate that nightlies are an option. It's just frustrating how often that option is trotted out without the caveat that it isn't a real solution. (This is again basically reacting to a particular Reddit comment linked above, but those sorts of comments are things that are more memorable.)

Actually interacting with rust-related people, whatever their capacity (developer, user, etc.) tend to go pretty well; it's just that once in a while a random comment shows up that seems dismissive of actual problems people are reporting.

You're probably right though, it's just that people have different contexts and I'm reading more into things than intended. Again, rust development is extremely open (which is great) and these things just tend to not be visible for other languages, rather than not happening.


> I suppose it's possible that it's more meant as "it's being worked on" rather than "this is not a real issue, stop complaining",

Yup, that's absolutely it. I'm not sure of a good, succinct way to imply the former rather than the latter...

> it now has a release number much larger than 1.0

We do releases every six weeks, so while it is high in some sense, it's also just going to be higher than other languages'.


> Yup, that's absolutely it. I'm not sure of a good, succinct way to imply the former rather than the later...

Personally, a sentence saying something to that effect (i.e. showing that it's understood that it doesn't immediately help just yet) would be sufficient. It would probably get tiring every time nightlies are mentioned, though, but it means people (like me) who randomly land on a thread would know that the problems are acknowledged.

> We do releases every six weeks, so while it is in some sense, it's also just going to be higher than other languages.

Right, but the problem is that the rest of the world has expectations on those numbers (e.g. x.0 is probably buggy, >0.x is a thing external people are expected to be able to use). The particular minor version number doesn't matter as much. That ship has sailed (and probably circumnavigated the world a few times) though.

The biggest problem is probably that I'm comparing it to golang, which was the last 1.x language I learned, and that had a more complete standard library by 1.0. Again, likely the effect of the much more open development model (no baking behind closed doors).


I think that was a minority of the points; the majority of the points were in the space of the author not realizing something was possible in Rust.

Out of these points they were all for features which are steadily on their way to stable (or future things which are actively being worked on right now).

Very few folks use nightly -- many use it locally for improved developer tooling, but mostly everyone wants their crate to work on stable.

So I wouldn't say it relieves pressure that way.


The link doesn't seem to work for me at the moment. Here is Google Cache: http://webcache.googleusercontent.com/search?q=cache:qOUhWkj...


Not a Rust expert, but some thoughts on the negatives.

> Compilation time is too large

Can you try compiling incrementally? https://blog.rust-lang.org/2016/09/08/incremental.html. Might still only be on nightly.

> And, on the similar note, benchmarks.

I agree, profiling as well isn't as full featured as in more mature languages. Clojure, incidentally has great benchmarking due to being on the JVM.

> Also the tuple assignments.

Can't you just do

    fn main() {
        let (a, b) = (5, 2);
        println!("{}, {}", b, a);
    }
> There are many cases where the compiler could do the stuff automatically.

I think this will be solved with the new non-lexical lifetimes RFC. Also a problem I had when starting, I generally assume referential transparency.


> Clojure, incidentally has great benchmarking due to being on the JVM.

Eh? Benchmarking on the JVM is notoriously difficult, bordering on impossible. There are things like Google Caliper, but test runs take forever due to attempting to force JIT warmup and doing GCs after every run. And the project's own wiki tells you the results are basically meaningless for a variety of reasons.

Benchmarking things like C++ or Rust is trivial by comparison, since when you call method foo() it's gonna do pretty much the same instructions every time. Highly consistent, highly repeatable, highly benchmarkable. Call method foo() in a JVM language and there's not a single person on the planet that can reliably tell you what's going to get executed on the metal.

That's why you typically profile JVM languages rather than benchmarking them.


My bad, I meant profiling, not benchmarking.


Benchmarking is pretty meaning-free in the best cases. The JVM isn't any worse in this regard.

The JVM is slow in many regimes due to things like JIT warmup. Rather than accepting that, many JVM people are hypersensitive: thou must only benchmark in the JVM's best-case scenario.


> Benchmarking is pretty meaningfree in the best cases.

That's completely false. Benchmarking is a core staple of building & evaluating performance-sensitive libraries or other routines. It doesn't work (well) in non-deterministic languages which reduces its usefulness scope, but in things like C++ it's highly useful and reliable for evaluation of libraries and monitoring for regressions.

> The JVM is slow in many regimes due to things like JIT warmup. Rather than accepting that many JVM people are hypersensitive: thou must only benchmark in the JVM's best case scenario.

Well it's not just the JIT that's a problem. It's also things like GC passes. Does that get included in the results or not? Do you force GC passes between runs? How about finalizers? The answers to those questions depends on the state of the rest of the system and the expected workload, it's not something you can just trivially answer or even accommodate in a framework since most of the behavior is up to the particular implementation, which can then further vary based off of command line flags.


> Can't you just do

That's a declaration, not an assignment; TFA is (somewhat oddly) using the language precisely and fittingly: https://doc.rust-lang.org/reference/expressions.html#assignm...

And there are cases where shadowing is not acceptable e.g. when updating bindings within a loop, you usually want to see the updates from outside the loop.
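
A minimal sketch of that loop case (my own example, not TFA's code):

    fn main() {
        let (mut sum, mut count) = (0, 0);
        for &x in [3, 1, 4].iter() {
            // let (sum, count) = (sum + x, count + 1); // would only shadow: the
            //                                          // outer bindings stay (0, 0)
            sum += x;   // plain assignments update the outer bindings
            count += 1;
        }
        println!("{} items, total {}", count, sum); // 3 items, total 8
    }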


I've found that using Linux tools like perf work well for profiling Rust, since it's compatible with C.


Yup! perf and kcachegrind actually work nearly as well for Rust as they do for C/C++. It's all just another LLVM compiler that emits debug info, as far as Linux tools are concerned.


None of these seem related to codecs as opposed to the author's preference for how s/he likes to code.

Not that this isn't super valuable feedback, but the title is misleading.


I tried Rust for the first time a few months ago. One of the things that really impressed me was that it was very easy to just get started. Even though the language is very different than I'm used to, I was able to "just jump in" without jumping through a lot of hoops. I was working with 3rd party crates very quickly.

One thing that I think does need improvement is the error handling. Far too many idiomatic examples just panic for error handling. There's no good way to just group errors by a higher-level classification and fall through to a generic error handler. As a result, the choice is between panic on error, or very detailed and verbose error handling. There's no middle ground.


> There's no middle ground.

Not at all. This is what the ? operator is for (was try! before).

The idiomatic Rust way is to have your own Error type that implements From for the other error types you may encounter (io::Error, Utf8Error, etc.).

Then you can simply write let result = dosomething()?; for any operation that may fail. If an error happens, the function will return with the error.

I don't think you can do it any cleaner, given the constraints Rust is operating under.
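
Not from the thread, but a minimal sketch of that pattern (one app-level error enum plus From impls, so ? converts automatically; the file name and parsing are just for illustration):

    use std::{io, num};

    #[derive(Debug)]
    enum MyError {
        Io(io::Error),
        Parse(num::ParseIntError),
    }

    impl From<io::Error> for MyError {
        fn from(e: io::Error) -> Self { MyError::Io(e) }
    }

    impl From<num::ParseIntError> for MyError {
        fn from(e: num::ParseIntError) -> Self { MyError::Parse(e) }
    }

    fn read_number(path: &str) -> Result<u32, MyError> {
        // Both failure types below are converted into MyError by `?` via From.
        let text = std::fs::read_to_string(path)?; // io::Error -> MyError
        let n = text.trim().parse::<u32>()?;       // ParseIntError -> MyError
        Ok(n)
    }

    fn main() {
        match read_number("number.txt") {
            Ok(n) => println!("got {}", n),
            Err(e) => println!("failed: {:?}", e),
        }
    }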


The From is The Right Thing, but it can be hard to convince people used to unchecked exceptions and costless casting of this. (I say costless, you pay for it every time you don't cast...)


For error handling, I'd encourage you to look at error-chain. It makes it pretty simple to propagate all your errors with try!/? so long as all your functions return a Result. And the quick_main macro can handle the panicking if you let errors propagate all the way to main. Hopefully once the ?-in-main RFC gets released, quick_main won't even be necessary.

There's a couple of annoyances to it. The documentation is really sparse for something as integral as it aims to be. And there's a frustratingly large number of third-party crates that don't use standard error types and prevent you from calling chain_err to convert from their error types to yours.

But once you get the gist of it, a few lines of boilerplate gives you the ability to handle errors concisely and correctly throughout your program.


There actually is a way to do that, which is to return Results, with error types that implement From so the nested errors can be "wrapped" in the generic ones, and then you can just use the ? postfix operator as an alternative to unwrapping in order to bubble up the errors.

If you're designing your own errors then there's some boilerplate you need to allow for the error wrapping, though you can always take the cheap way out and use Box<Error> as your error type (as any error can be wrapped in that).

And there's an RFC that's been accepted (https://github.com/rust-lang/rfcs/blob/master/text/1937-ques...) to allow main to return a Result, so you can use the ? operator in main, which will let you avoid even more unwraps, especially in example code.


There's certainly middle ground. I spent some time explaining it my CSV tutorial: https://docs.rs/csv/1.0.0-beta.4/csv/tutorial/index.html#bas...


There's a whole chapter in the second edition of the book devoted to error handling https://doc.rust-lang.org/book/second-edition/ch09-00-error-...


Yes, that chapter painfully explains my point.

Most of the replies pointed out the new ? operator, which I'll have to check out. (One of the things that I like about the Rust community is that they constantly improve.)

The thing is, if you've only handled errors via return codes in C, then Rust's system is a major improvement. The challenge comes once you've programmed with exceptions that have inheritance. It's pretty easy to set filters higher up in the stack and "not care" lower in the stack. This allows you to effectively ignore error handling for functions that only fail in obscure corner cases, because a filter higher up in the stack can handle so many different errors. (Operations that require cleanup can also be written in a way that cleanup code always runs, as some languages support try-finally.)

That's probably the hardest thing for me to adjust to in Rust. Lifetimes take care of the try-finally part, but I still can't figure out how to have generic error filters higher in my stack.


At the point you want to handle them, you use match instead of ?, and then handle the cases you care about.


Existing error handling facilities in Rust may not suit every use-case. For me personally, the difficulty with the `?` operator (and try! macro) is that the File + Line info about the source of error is lost. Also, I want something simpler than the approach taken by the `error-chain` crate. My current solution is to use some custom macros to check for errors and display an execution trace (example here: https://play.rust-lang.org/?gist=a7fb903ce2bbb37914ab380d342... )
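
For what it's worth, a bare-bones sketch of that kind of macro (mine, not the linked playground), built on the std file!() and line!() macros:

    // Like `?`, but records where the error was propagated from.
    macro_rules! try_trace {
        ($expr:expr) => {
            match $expr {
                Ok(val) => val,
                Err(e) => return Err(format!("{}:{}: {}", file!(), line!(), e)),
            }
        };
    }

    fn parse(input: &str) -> Result<u32, String> {
        let n = try_trace!(input.trim().parse::<u32>());
        Ok(n)
    }

    fn main() {
        println!("{:?}", parse("42"));    // Ok(42)
        println!("{:?}", parse("nope"));  // Err("<file>:<line>: invalid digit ...")
    }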


Does Rust have __FILE__ etc.? Special constants?



I was thinking the same as you until I learned about Result<(), Box<Error>>. It's super quick and not really that far off from the detailed hand-rolled solution.
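
A minimal sketch of that shortcut (Box<Error> is spelled Box<dyn Error> in current Rust; the file name and parsing are just illustrative):

    use std::error::Error;
    use std::fs;

    // Any error type implementing std::error::Error can be boxed and bubbled up
    // with `?`, so no custom error enum is needed.
    fn run() -> Result<(), Box<dyn Error>> {
        let text = fs::read_to_string("config.txt")?; // io::Error -> Box<dyn Error>
        let n: u32 = text.trim().parse()?;            // ParseIntError -> Box<dyn Error>
        println!("parsed {}", n);
        Ok(())
    }

    fn main() {
        if let Err(e) = run() {
            eprintln!("error: {}", e);
        }
    }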

