Rust lifetimes: Getting away with things that would be reckless in C++

missblit · on Sept 20, 2014

In C++, if the string is an rvalue reference, you could std::move it into part of the return value. Think of a signature like

    template<typename T>
    std::pair<std::string, std::vector<std::string_view>>
    tokenize_string(T &&str);

This would be efficient when the user passes a temporary, and it would be safe.

Which isn't to say the Rust solution isn't totally cool. Being able to easily check this class of errors at compile time is probably a lot nicer than needing to learn all the relatively complicated parts that would go into an easy-to-use, safe, efficient, and still slightly weird C++ solution.

lilyball · on Sept 20, 2014

Sure, and then somewhere along the way you throw away the first element of the pair, because you're not using it, but you're still using the views, and oops you just reintroduced the bug you tried to fix.

Which is to say, yes, you can obviously write C++ code that works. But you run the risk that one tiny mistake, or a change weeks, months, or years later, causes memory issues. Being able to completely rule out this class of error at compile-time is really amazingly useful.

missblit · on Sept 20, 2014

Yes, I agree completely. I was thinking about that after I wrote my comment: the code is only safe relative to the original sample, and it's only a band-aid for the lack of more comprehensive lifetime guarantees.

Of course you can jump through more hoops by encapsulating the string and string_view data in a proper class. But when there are tradeoffs between safety and convenience, there's a point where people will understandably choose convenience.

ajross · on Sept 21, 2014

That's a forest for the trees argument. The ability to introduce bugs ("make a tiny mistake") in future changes is an inherent property of software, and you can't fix this in the general case. Rust just fixes this for the case of free-memory-read bugs. That has value, but it's a much more limited scope than you're implying.

Really I think this is the biggest problem with Rust. The stuff broken about C++ isn't really its memory model, it's simply its complexity. And every time I look at what's happening in Rust I see the same kind of complexity being introduced at every turn. I mean seriously: "named lifetime parameters"? And the syntax is a bare single quote? That's a joke, right? No one but Rust nerds is going to grok this, and real code needs to be maintained by mediocrities.

Frankly Go seems to have this "don't introduce new nonsense" problem under much better control, so if I need to pick a horse in the race, it's that one, not Rust.

dbaupp · on Sept 21, 2014

Rust prevents significantly more than reading freed memory; it is designed to make all memory safety problems (dereferencing an invalid pointer, reading freed memory, data races, ...) impossible without explicitly opting in.

Memory safety bugs are particularly bad bugs, as they represent a program that is completely off the rails, allowing an attacker to take control of the program/computer by inserting shell code or to read private keys directly out of memory (for example). Even if you're not in a situation with aggressors, memory safety problems can lead to the program scribbling all over memory, resulting in random crashes and misbehaviour, and these are so hard to diagnose since there's spooky action at a distance, and they can easily be timing sensitive/heisenbugs.

The memory model of C/C++ is definitely broken, look at all the security flaws in programs in those languages.

> Frankly Go seems to have this "don't introduce new nonsense" problem under much better control, so if I need to pick a horse in the race, it's that one, not Rust.

If you're doing something where Go is the appropriate tool for the job, that's sensible... but if you're trying to write a library to be called from other languages, or run on embedded devices, or have precise control over memory layout (including the heap), languages like C, C++ and Rust will likely be much easier.

lilyball · on Sept 21, 2014

This comment is a joke, right? I find it hard to believe that any programmer would say "the compiler prevents me from writing broken code? That's too complex!"

Yes, a proper typing system involves more visible moving parts than a broken typing system. But a proper typing system is vastly more powerful. C++ may not have lifetimes, but every time I touch C++ I have to be really careful that I don't accidentally introduce memory or threading bugs, because they're both really easy to do. Someone who's programmed C++ for a long time might not even realize it, but then again, the vast majority of C++ code in this world almost certainly contains numerous memory and threading bugs.

And why the comparison to Go? Go doesn't live in the same world as Rust. It's not a systems programming language. At this point it's mostly a glorified web server application language. And if you're going to pick Go because of the lack of memory lifetime issues, you may as well pick any other garbage-collected language out there.

ajross · on Sept 21, 2014

Again, forest for the trees. Your laser focus on one particular type of bug blinds you to the fact that complex systems lead to bugs in general. I'm not interested in defending C++ (or Go, or anything else). I'm saying I see Rust falling down the same "cool new ideas" rabbit hole as C++, and Haskell, and Erlang, and Common Lisp, and... Of those languages only one has achieved any notable success.

And btw: the idea of calling something a "systems programming language" that can't even make a system call is laughable. Can I mmap a buffer to pass to a hardware video codec or make an ioctl to an audio driver in Rust? Guess which language I can do that in? If "Go doesn't live in the same world as Rust", then Rust isn't in the same galaxy as C/C++.

dbaupp · on Sept 21, 2014

Rust has inline asm support, so system calls are easy and efficient, e.g. http://mainisusuallyafunction.blogspot.com/2014/09/raw-syste... .

If you quibble that it's not in the standard library, note that we are talking about the actual languages here, not their standard libraries (that crate could easily be pulled into the official distribution if it was deemed sensible to do so).

mercurial · on Sept 21, 2014

> Your laser focus on one particular type of bug blinds you to the fact that complex systems lead to bugs in general.

Therefore it is clearly in your interest to use languages that remove various classes of bugs. No language claims to be able to prevent bugs. But if your compiler can prevent you from making silly mistakes, that's a win.

> I'm saying I see Rust falling down the same "cool new ideas" rabbit hole as C++, and Haskell, and Erlang, and Common Lisp, and... Of those languages only one has achieved any notable success.

What a bizarre argument that is. Haskell is a research language; what else could it be about but "cool new ideas"? And it does effectively remove different classes of bugs, though like most functional languages, it's a managed language, and thus not terribly suitable for low-level programming. Erlang was designed to solve a certain class of problems, and it clearly solves these issues better than C++. Not everything and everybody needs to solve low-level issues or use inline assembly.

Rust can use unsafe code (guarded by unsafe blocks), but this code has to expose a sane interface. So you can do pointer trickery that is otherwise impossible, but instead of having an entirely unsafe program, locations with dangerous code are clearly outlined.
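A minimal sketch in present-day Rust of the pattern described above: the dangerous operation (unchecked slice indexing) sits inside a single `unsafe` block, and the function around it checks the precondition first, so callers get an ordinary safe signature. `first_and_rest` is a hypothetical name; the standard library already offers `split_first`, so this only illustrates the shape of a safe wrapper.

```rust
/// Hypothetical safe wrapper around unchecked indexing: callers never write `unsafe`.
fn first_and_rest(v: &[i32]) -> Option<(i32, &[i32])> {
    // The check that makes the unsafe block below sound.
    if v.is_empty() {
        return None;
    }
    // SAFETY: `v` is non-empty, so index 0 and the range `1..` are in bounds.
    unsafe { Some((*v.get_unchecked(0), v.get_unchecked(1..))) }
}
```

If the emptiness check were ever removed, the bug would be confined to the one block marked `unsafe`, which is where a reviewer would look first.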

pcwalton · on Sept 21, 2014

I don't know where you got the idea that you can't mmap a buffer or call an ioctl in Rust. You certainly can do those things.

ajross · on Sept 21, 2014

I got the idea from the docs, honestly, which don't talk about system interfacing at all. Though I do now see the "unsafe" page, which at least has assembly hooks.

Serious question though: has anyone ever done this? I mean, are there kernel header import utilities I can use to get the flag definitions and parameter structs, etc...? You can sneer and downvote me all you want, but it seems that the clear truth is that Rust has not been used as a "systems programming language" in the only regime where IMHO that term has any meaning.

Basically: drilling a hole through the compiler to the assembler (which I'll state again is done in Rust as yet more complicated syntax users need to learn and not as, say, a native library interface) is a valuable tool, but it does not a syscall interface make.

pcwalton · on Sept 21, 2014

There is rust-bindgen, which you can use to convert C headers to Rust FFI declarations. This should be able to convert the kernel userland headers to Rust declarations, so that you can call directly to the kernel.

(I didn't downvote you.)

Ygg2 · on Sept 21, 2014

> Rust has not been used as a "systems programming language" in the only regime where IMHO that term has any meaning.

Rust has been used to make a toy kernel (as in press key, get message printed and/or change the color of the message). And this was several months ago. I'm pretty sure that makes it a decent systems programming language.

dbaupp · on Sept 21, 2014

The Rust language provides the tools required for people to write a system call interface (which can be in the standard distribution or outside it; both have the same power). I don't think building such a thing into the language (rather than in a library) is at all sensible.

As others have pointed out people have written linux kernel modules, kernels and all sorts of other low-level things in Rust.

(Rust's inline assembly interface is essentially "write in ASM"... If you're truly needing to go that far down yourself, then knowing assembly seems very reasonable. However, this shouldn't be needed much, just use libraries that do it internally.)

lilyball · on Sept 21, 2014

People are writing kernels in Rust. If that doesn't qualify as systems programming language, I don't know what does.

pjmlp · on Sept 21, 2014

> And btw: the idea of calling something a "systems programming language" that can't even make a system call is laughable.

So C and C++ are laughable for systems programming, because the language standard doesn't cover system calls.

The APIs for system calls and support for inline assembly are language extensions.

pcwalton · on Sept 21, 2014

That we can't fix all bugs doesn't mean we shouldn't take steps to fix as many as we reasonably can.

Your argument applies equally well to sandboxing, for example--why go to all the trouble to lock down a complicated Windows sandbox if installing malicious software is only one kind of security vulnerability? The amount of subtlety in a Windows sandbox is incredible (all it takes is one leaked privileged file handle), and there are plenty of ways that security vulnerabilities can happen that aren't thwarted by a sandbox. But nobody is arguing that sandboxing is pointless, because it massively improves security.

dllthomas · on Sept 21, 2014

"No one but Rust nerds is going to grok this"

This is the first rust code I've looked at, and I grok this. As someone currently doing C dev in my day job, and who's done a lot of C++ historically, ability to enforce lifetimes sounds outstanding - particularly as it probably combines marvelously with RAII.

lilyball · on Sept 21, 2014

It does combine marvelously with RAII, you're absolutely right! For example, the Rust Mutex class contains the data it's protecting, and it returns a RAII object when you take the lock. This RAII object provides mutable access to the contained data, and it uses lifetimes to a) prevent you from keeping a reference to the data after unlocking, and b) prevent the RAII object itself from outliving the Mutex.

This works wonderfully, and it means it's impossible to bypass the lock, which is something I really wish I had in other languages; just a few days ago I finally fixed a subtle intermittent crash in a C++ program that was caused by taking the wrong lock when touching shared state.
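A small sketch of that Mutex pattern, written against the present-day `std::sync::Mutex` API (where `lock()` returns a `Result` wrapping the guard). `push_and_len` is a hypothetical helper; the commented-out lines show the kind of escape the borrow checker rejects.

```rust
use std::sync::Mutex;

/// Hypothetical helper: append to the protected Vec and report its new length.
fn push_and_len(shared: &Mutex<Vec<i32>>, x: i32) -> usize {
    // `lock` returns an RAII guard; `unwrap` panics only if another thread
    // panicked while holding the lock (poisoning).
    let mut guard = shared.lock().unwrap();
    // The guard derefs to the protected Vec: it is the only way to reach the data.
    guard.push(x);
    guard.len()
    // `guard` is dropped when the function returns, which releases the lock.
}

// Rejected at compile time, because the reference would outlive the guard:
// let escaped: &mut Vec<i32> = {
//     let mut g = shared.lock().unwrap();
//     &mut *g
// };
```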

Ygg2 · on Sept 21, 2014

As a mediocrity, I think you are making a big fuss about it.

Yes. Lifetimes are hard. Yes, they require lots of compiler wrestling. However, the Rust compiler gives pretty good hints about what lifetimes should be added, and it deduces lifetimes in a few simple cases. Additionally, you can opt out if you are willing to allocate more memory.

pfultz2 · on Sept 21, 2014

Actually, it should be written more like this:

    template<typename T>
    std::pair<T, std::vector<std::string_view>>
    tokenize_string(T &&str);

So then it uses `std::forward` to conditionally move the string for rvalues, and `T` will be a reference to the original string for lvalues.

svalorzen · on Sept 20, 2014

Or, you know, instead of returning two C pointers which in modern C++ makes no sense, return a vector of `std::pair<size_t,size_t>` with position and length of each substring, and if needed use `std::string::substr` to extract the parts you need.
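A sketch of the index-pair idea in Rust, assuming tokens are separated by single ASCII spaces; `token_positions` is a hypothetical name. Because the `(start, length)` pairs carry no lifetime, nothing ties them to the string, so keeping the right string alive is back on the caller: applying a pair to the wrong string gives a wrong slice or an out-of-bounds panic rather than a read of freed memory.

```rust
/// Hypothetical tokenizer: (start, length) byte pairs for each space-separated token.
/// Splitting on the ASCII byte b' ' is UTF-8 safe: it never occurs inside a multibyte char.
fn token_positions(s: &str) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, b) in s.bytes().enumerate() {
        match (b == b' ', start) {
            // A space ends the current token, if any.
            (true, Some(st)) => {
                out.push((st, i - st));
                start = None;
            }
            // A non-space byte outside a token starts a new one.
            (false, None) => start = Some(i),
            _ => {}
        }
    }
    if let Some(st) = start {
        out.push((st, s.len() - st));
    }
    out
}
```

Extracting a token is then the counterpart of `substr`: `&text[pos..pos + len]`.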

ekidd · on Sept 20, 2014

(Original poster here.)

Yes, returning indexes is also an excellent solution in this particular case, because it forces the programmer to keep the underlying string around. And you probably can save memory by using a pair<uint32,uint32>.

Still, as several other people have pointed out, the truly idiomatic C++ solutions here are things like sub-ranges and string views. And these all use pointers internally. I deliberately chose to use a pair of pointers in place of these idiomatic types because it was easier to explain what was happening, not because I think C++ programmers should mess around with bare pointers.

What I like about the Rust version is that I can work directly with low-level types in high-performance code, and the compiler still has my back.

zyngaro · on Sept 20, 2014

If you want to force the programmer to keep the string around, remove the 'const' from the function signature. The compiler will rightly complain when you call the function with an r-value as argument and will only accept l-values. l-values and r-values are the C++ way of dealing with this kind of situation. C++ also has some means to prevent you from doing weird stuff.

expr- · on Sept 20, 2014

I would say returning regular pointers is more C++-esque than your handmade range implementation. Pointers are, after all, a kind of iterator, which is an essential C++ concept. (std::string::iterator is a handful.)

(Furthermore, there is nothing "C" or wrong with regular pointers. Their only "flaw" is that they can't manage an object, but they're still the semantical way to refer to one.)

svalorzen · on Sept 20, 2014

Returning regular pointers which point to memory which is not managed by you and that you by default know could disappear at any time is not the C++ way of doing anything. Thing is, a class has a public interface for a reason, and if you as an outsider want to start playing pointer games you know from the start what you are in for.

I never implied that pointers are bad, just that this particular usage of pointers is bad, and as such should not be used as an example of why we would want something else over C++. If I ever saw a function with a signature like the one proposed I'd be extremely suspicious of what was going on.

Note that this would happen in the same way if the function returned pairs of iterators. It is a choice that depends on the documented and intended usage of a function. If you feel your tokenizer will be used on temporaries (I fail to see how that would ever be useful though) you could overload the function to also take universal references or change the return parameters so that nobody can get hurt. Otherwise you return whatever delimiters you are comfortable with, with the assumption that they will not be used in some way. All code can be broken if you actively try to, so it makes no sense to protect against everything.

repsilat · on Sept 20, 2014

> Returning regular pointers which point to memory which is not managed by you and that you by default know could disappear at any time is not the C++ way of doing anything. Thing is, a class has a public interface for a reason, and if you as an outsider want to start playing pointer games you know from the start what you are in for.

Maybe not "regular pointers", but returning iterators is very common (all throughout the STL, for one thing.) Those iterators are part of the public interface, and they provide no real guarantees over bare pointers.

I don't have a horse in this race, though. I think the "nicest" solution in abstract is for `tokenize` to be a token-generator* that takes a character generator argument. If you want to store the tokens yourself you can copy them, if you're happy to stream-process them then you can do that, and if you want a reference into the original string storage you can go fly a kite.
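A rough Rust rendering of the token-generator idea, assuming a token is a maximal run of alphabetic or of non-alphabetic characters (the same split as the `Word`/`Other` tokens in this thread's test case). To keep it short, each token is collected into an owned `String` rather than being another lazy character stream; `tokens` is a hypothetical name.

```rust
/// Hypothetical lazy tokenizer: consumes any char iterator and yields owned tokens.
/// Nothing it returns borrows from the caller's storage, so nothing can dangle.
fn tokens<I>(chars: I) -> impl Iterator<Item = String>
where
    I: IntoIterator<Item = char>,
{
    let mut chars = chars.into_iter().peekable();
    std::iter::from_fn(move || {
        // End of input ends the token stream.
        let first = chars.next()?;
        let alpha = first.is_alphabetic();
        let mut tok = first.to_string();
        // Extend the token while the next char is in the same class.
        while let Some(&c) = chars.peek() {
            if c.is_alphabetic() != alpha {
                break;
            }
            tok.push(c);
            chars.next();
        }
        Some(tok)
    })
}
```

Callers who want to store tokens can `collect()`; callers who only stream-process them never allocate a token vector.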

*: Where a "token" is probably another character generator.

pcwalton · on Sept 20, 2014

> All code can be broken if you actively try to, so it makes no sense to protect against everything.

That's not Rust's philosophy. Rust is memory-safe, period.

steveklabnik · on Sept 20, 2014

I guess technically 'if you try to' means also abusing unsafe, so in theory he's still sorta right. Every time I see someone ask a question about transmute on IRC I shiver a little.

But yes, Rust is way, way, way better in this regard.

seabee · on Sept 20, 2014

The difference is safe-by-default languages force you to turn off the safeties before they will let you blow your own foot off. C++ doesn't believe in safeties since (at the time the language was designed) it introduces an unacceptable delay between you wanting to pull the trigger and your target and/or feet blowing up.

darksaints · on Sept 20, 2014

> I never implied that pointers are bad, just that this particular usage of pointers is bad, and as such should not be used as an example of why we would want something else over C++. If I ever saw a function with a signature like the one proposed I'd be extremely suspicious of what was going on.

You would. I wouldn't because I don't know any better. How many others are there out there that don't know any better? If you code by yourself, fine, but if you work with a team you are gonna need a higher level of training, so you could teach people like me to catch the problem. Are you always going to be disciplined enough to do code reviews that are in depth enough to capture fishiness in the type signature?

With Rust, if it compiles and passes unit tests, the code review is already 90% done.

yohanatan · on Sept 20, 2014

Compilation and unit testing will not catch suboptimal expression of logic/thought (which in my experience count for way more than 10% of code reviews).

EpicEng · on Sept 20, 2014

>Their only "flaw" is that they can't manage an object, but they're still the semantical way to refer to one.)

Yes, that's their only flaw. A flaw which has caused countless bugs and will continue to do so in the future. This is why C++ provides so many utilities designed to keep you from using them in 99% of cases. But yeah, that's their only flaw.

kybernetyk · on Sept 20, 2014

> I would say returning regular pointers is more C++-esque

In C++98 maybe ...

dbaupp · on Sept 20, 2014

I would think that std::string_view (for now std::experimental::string_view, from the Library Fundamentals TS rather than C++14 proper) would be more appropriate than both?

http://en.cppreference.com/w/cpp/header/experimental/string_...

seabee · on Sept 20, 2014

Using string_view in this example doesn't solve the use-after-free bug.

If you are concerned about memory safety, the only appropriate choices are to copy the string and have the string_view objects share ownership, or to use the primitive index range idea and punt the problem back to the programmer.

dbaupp · on Sept 20, 2014

Yep, definitely, I was just responding to the question of which view-into-a-string type is the most appropriate.

robmccoll · on Sept 20, 2014

If you really wanted the pointers, you could also just make your own class that contains the pointer vector and a copy of the entire string which would be slightly less costly than new strings for each token.
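A sketch of this suggestion in Rust. Safe Rust won't let a struct hold pointers into its own `String`, so this hypothetical `Tokenized` type stores byte ranges instead and hands out `&str` views whose lifetime is tied to `&self`; splitting on whitespace is an assumption.

```rust
use std::ops::Range;

/// Hypothetical type: owns its copy of the text plus the byte range of each token.
struct Tokenized {
    text: String,
    spans: Vec<Range<usize>>,
}

impl Tokenized {
    /// Takes (or copies) the text and records where each whitespace-separated token sits.
    fn new(text: impl Into<String>) -> Self {
        let text = text.into();
        let mut spans = Vec::new();
        let mut start: Option<usize> = None;
        for (i, c) in text.char_indices() {
            match (c.is_whitespace(), start) {
                (true, Some(st)) => {
                    spans.push(st..i);
                    start = None;
                }
                (false, None) => start = Some(i),
                _ => {}
            }
        }
        if let Some(st) = start {
            spans.push(st..text.len());
        }
        Tokenized { text, spans }
    }

    /// Borrowed views; the `'_` ties them to `&self`, so they cannot outlive the owner.
    fn tokens(&self) -> impl Iterator<Item = &str> + '_ {
        self.spans.iter().map(move |r| &self.text[r.clone()])
    }
}
```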

akavel · on Sept 20, 2014

This would have an advantage in case you only needed a few results; but in case you needed most of them, substr would practically result in duplicating the memory anyway. Plus, this is surely more cumbersome than the Rust version.

bsaul · on Sept 20, 2014

Which makes me wonder :

1/ could you build the same unsafe behavior in Rust if you wanted to by not specifying lifetime constraints?

2/ If yes, shouldn't lifetime constraints be mandatory ?

dbaupp · on Sept 20, 2014

The compiler will not compile code it cannot verify as safe (outside an `unsafe` block), meaning it will complain about lifetime constraints that are missing, e.g. it is a compile-time error to not propagate the lifetime information out of the enum.

http://play.rust-lang.org/?run=1&code=enum%20Token%20{%0A%20...

nikic · on Sept 20, 2014

The lifetime constraint in the enum is mandatory, the code won't compile without it.

The tokenization function itself doesn't have to explicitly specify lifetimes like it does in the blog post. It would also work without them and Rust would still ensure that you don't end up with a use-after-free.
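A sketch in present-day Rust (the thread's snippets use 2014-era APIs such as `as_slice()`). `Token<'a>` needs its lifetime parameter, but `tokenize` names none: elision ties the returned tokens to `text`. The alphabetic/non-alphabetic split mirrors the `Word`/`Other` test case in this thread and is otherwise an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Token<'a> {
    // Dropping the `'a` here is a compile error: the enum must say what it borrows from.
    Word(&'a str),
    Other(&'a str),
}

// No named lifetimes: `'_` and elision tie the output to `text`.
fn tokenize(text: &str) -> Vec<Token<'_>> {
    let mut out = Vec::new();
    let mut rest = text;
    while let Some(first) = rest.chars().next() {
        let alpha = first.is_alphabetic();
        // Byte length of the maximal run in the same class as `first` (always > 0).
        let len = rest
            .find(|c: char| c.is_alphabetic() != alpha)
            .unwrap_or(rest.len());
        let (tok, tail) = rest.split_at(len);
        out.push(if alpha { Token::Word(tok) } else { Token::Other(tok) });
        rest = tail;
    }
    out
}

// Still rejected, because the tokens would outlive the String:
// let v = { let text = String::from("The cat"); tokenize(&text) };
```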

asuffield · on Sept 20, 2014

There's an obvious extension here for lifetime inference - the example given doesn't need to be an error, it could compile correctly by increasing the object lifetime to the outer block. I don't know offhand whether there is a universally correct inference algorithm for that (if every other language feature was static then unification would solve it easily, but the other language features are not static and I don't know how it would interact with rust type inference).

Drakim · on Sept 20, 2014

In general I prefer the compiler to yell at me rather than to secretly help me behind the scenes. Otherwise, I don't learn anything and I get burned later on when there is a situation where the safeguard can't help me.

asuffield · on Sept 20, 2014

This appears to be a general objection to inference algorithms in languages? Personally I find type inference to be immensely valuable.

rwmj · on Sept 20, 2014

OCaml recently added a bunch of "help" -- for example, if you reference a field in a struct, but the name of the field could be from one of several structs, then it will guess which struct you mean.

TBH I don't find this to be that useful -- it covers up potential mistakes, and of course means that code breaks when compiled with the older version of OCaml which didn't do this.

Edit: I should note that this doesn't break type safety.

zem · on Sept 20, 2014

I like clang's general approach, where if it can guess what you mean to do, it says "did you mean ..." in the error message, and carries on trying to compile the rest of the code on an "assume we did make this change" basis, but fails overall.

hamstergene · on Sept 20, 2014

If the object has a destructor with side effects, I believe that would massively complicate reasoning about the program's behavior for a human reading it.

fanf2 · on Sept 21, 2014

That is essentially what region inference in the MLKit compiler does. http://www.elsman.com/mlkit/

It has some unexpected properties: it can sometimes produce dangling pointers because it knows the target data will never be used, even though a GC would treat it as live. And on the other hand it will often promote data to an excessively long-lived region (because region lifetimes have to be nested). So current versions of the MLKit use GC as well as region inference.

riffraff · on Sept 20, 2014

Wouldn't extending the lifetime in these situations lead to subtle, hard-to-trace memory leaks?

keeperofdakeys · on Sept 20, 2014

No. Currently a reference in a struct or enum has no inferred lifetime, and must be explicitly stated. Having it default to the lifetime of that containing struct or enum would simply mean you don't need to specify it. The danger is it could infer wrongly, leading to lifetime errors that might be cryptic.

But at the end of the day, the Rust compiler will never allow a reference to outlive the original object.

dbaupp · on Sept 20, 2014

You've misinterpreted: it's not a question about reducing the annotation in a struct/enum definition, but about postponing the destruction of the String so that the later references are valid, i.e. currently we have

  fn test_parse_unsafe() {
      let v = {
          let text = "The cat".to_string();
          tokenize_string3(text.as_slice())
      }; // `text` destroyed here
      assert_eq!(vec![Word("The"), Other(" "), Word("cat")], v);
  }

but the suggestion/question is about changing this to

  fn test_parse_unsafe() {
      let v = {
          let text = "The cat".to_string();
          tokenize_string3(text.as_slice())
      }; 
      assert_eq!(vec![Word("The"), Other(" "), Word("cat")], v);
  } // `text` destroyed here

so that the references in `v` are valid.

This could lead to "memory leaks", where a destructor is implicitly postponed to a higher scope, but I don't think it would be much of a problem in practice (the promotion would only be through simple scopes, not through loops, and maybe not through `if`s). In fact, there's a yet-to-be-implemented accepted RFC covering this[1] (there's no guarantee that it will be implemented though, just that the idea is mostly sound).

[1]: https://github.com/rust-lang/rfcs/blob/master/active/0031-be...

dllthomas · on Sept 21, 2014

I like this - you can make sure something isn't named in subsequent code while still allowing it to be used. Ideally there could be some mechanism to statically assert that it is destroyed by some particular point, though I'm not sure what that should look like (I guess that's just making the lifetime explicit?).

asuffield · on Sept 20, 2014

Ah yes, that RFC is roughly what I had in mind, thanks.

I believe that it's safe to promote through an if, although obviously not through a general loop.

dbaupp · on Sept 20, 2014

Yes, I agree that it should be safe; I've softened my original text. However, it would require dynamically tracking whether the destructor needs to be run, and there's currently discussion[1] about Rust possibly moving to a static model, for the highest performance.

[1]: https://github.com/rust-lang/rfcs/pull/210

asuffield · on Sept 20, 2014

Consider this:

On a two-way if statement, then a given storage location is either set on zero, one or both branches. If it is set on neither branch then the if statement is irrelevant and can be ignored. If it is set on one branch, then either it had an original value and hence can be treated as being set on both branches, or it must be destroyed within the branch of the if (no null pointers - think about it until it is clear that the type system guarantees this). Hence we are only interested in cases which are isomorphic with the location being set on both branches.

We can treat this as a phi node following the if: there is one output value, which has been created in one of two different ways. In this case we don't know statically which value has been constructed, but we do know statically how and when to destroy it regardless of which one we get, because both branches have the same type and storage location. We don't actually need to know where it came from.

Any obvious problems? I think it works...

dbaupp · on Sept 20, 2014

It doesn't work: the &str could come from completely different types in the two branches.

I.e. one branch could be created by .as_slice() on a String, the other could be created by referencing a global, e.g.

  let s = if cond {
      let some_string = create_it();
      some_string.as_slice()
  } else {
      "literals are always-valid &str's"
  };

`some_string`'s destructor should only be run if the first branch was taken.

asuffield · on Sept 20, 2014

I'm not sure that this particular example can ever use some_string outside the if without hitting a type error, but I see what you mean.

That seems like a reasonable case to raise a type error. That defines the cases quite neatly: if it's temporary on both branches then it can work, and if it has different lifetimes then the values can't be merged and should be rejected. If the programmer really meant for this to work then they need to copy the global, and copies should be written explicitly.

dbaupp · on Sept 20, 2014

There's no type error at all; we're talking about delaying the destruction of some_string so that `s` (which is a &str) is valid outside the if. The string literal is a &str with an infinite ('static) lifetime, and so can of course be safely restricted to have the same lifetime as the other branch (done implicitly).

However, it's easily possible to have the &str come from temporaries of different types in the two branches. This would restrict the static destruction case to only working through an `if` when the "parent" values have exactly the same types, which doesn't seem nearly as valuable and possibly isn't worth the effort.

TheLoneWolfling · on Sept 20, 2014

Couldn't the compiler get around that by introducing a boolean variable that is set depending on the branch of the if statement taken, that it checks before running the destructor?

Although this starts getting really messy.

dbaupp · on Sept 20, 2014

Yes, that's exactly how it would be handled, but it's then dynamic destruction, not static.
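For comparison, a sketch in present-day Rust of the boolean-flag approach applied by hand to dbaupp's example: declare the owner before the `if` and initialize it on only one branch. The compiler then keeps a hidden runtime drop flag and runs the destructor only if that branch ran. This is the manual version, not the automatic scope promotion proposed in the RFC; `Tracked` and `pick_len` are hypothetical names, and the counter exists only to make the drop observable.

```rust
use std::cell::Cell;

/// Hypothetical owner whose destructor bumps a counter, so the drop is observable.
struct Tracked<'c> {
    text: String,
    drops: &'c Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn pick_len(build: bool, drops: &Cell<u32>) -> usize {
    // Declared in the outer scope, but initialized on only one branch.
    let owned: Tracked<'_>;
    let s: &str = if build {
        owned = Tracked { text: String::from("built at runtime"), drops };
        &owned.text
    } else {
        // A &'static str, implicitly shortened to match the other branch.
        "literal"
    };
    s.len()
    // `owned` is dropped here only if it was initialized (runtime drop flag).
}
```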

TheLoneWolfling · on Sept 21, 2014

Can you elaborate? I fail to see why this is dynamic - it still determines at compile time if/when to run destructors. I was under the impression that dynamic destruction was when you determine when to run destructors at runtime. Garbage collectors or reference counting, in other words.

dbaupp · on Sept 21, 2014

It is dynamic because whether the destructor call executes is not known at compile time. I'm using the terminology from RFC PR #210 that I linked above.

TheLoneWolfling · on Sept 22, 2014

Oh, ok.

On a related note, the approach I mentioned is actually mentioned in that RFC PR:

> Store drop-flags for fragments of state on stack out-of-band

dbaupp · on Sept 22, 2014

Yes, correct, that's what "dynamic destruction" is referring to.

asuffield · on Sept 20, 2014

The obviously "safe" thing to do is to push it out as far as the containing function, and no further. That would be sufficient for this scenario and probably all the ones where you would ever want this to happen.

enjoy-your-stay · on Sept 21, 2014

In C++, the best way to hand out pointers to anything where the creator may not necessarily be the last one referencing that object or chunk of RAM is to use reference counting, which would have solved the poster's problem.

It would mean that you would have to wrap the incoming string in a class, and probably add the tokenize_string method to that class. Then you would also have to wrap the results vector in a class that then addrefs the original string wrapper class.

But after that, handing out pointers to the contents of the string would be no problem as the results class would addref the string class and then release it when done ensuring that the string wrapper class remains alive as long as the results object has not gone out of scope.
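A rough Rust counterpart to this reference-counting scheme, assuming single-threaded use (`Rc` is non-atomic; `Arc` is the atomic, thread-safe variant) and tokens separated by single spaces. `SharedToken` and `tokenize_shared` are hypothetical names. Each token keeps the whole string alive, and `as_str()` returns a `&str` borrowed from the token handle, so the borrow checker won't let that view outlive the handle.

```rust
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical token handle: a counted reference to the whole text plus a byte range.
#[derive(Clone, Debug)]
struct SharedToken {
    text: Rc<str>,
    span: Range<usize>,
}

impl SharedToken {
    /// The returned &str borrows from this handle, so it cannot outlive it.
    fn as_str(&self) -> &str {
        &self.text[self.span.clone()]
    }
}

/// Split on single ASCII spaces; every token bumps the reference count.
fn tokenize_shared(text: Rc<str>) -> Vec<SharedToken> {
    let mut out = Vec::new();
    let mut offset = 0;
    for piece in text.split(' ') {
        if !piece.is_empty() {
            out.push(SharedToken { text: Rc::clone(&text), span: offset..offset + piece.len() });
        }
        offset += piece.len() + 1; // +1 skips the one-byte separator
    }
    out
}
```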

Of course Rust's approach of alerting you when your code path causes dangling pointers is also interesting, but I wonder how that would work if you were to link against a static library that handed out references to internal objects like that - could the compiler see the scoping problem?

keeperofdakeys · on Sept 20, 2014

Just as an aside, the &str is not stored as two pointers, but a pointer and a length.
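A quick check of that layout, as a sketch: `raw_parts` is a hypothetical helper returning the two halves of a `&str`. A subslice shares the parent string's buffer, so its pointer is just an offset into it.

```rust
use std::mem::size_of;

/// Hypothetical helper: the (data pointer, byte length) pair a `&str` consists of.
fn raw_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}
```

On typical targets `size_of::<&str>()` is two machine words, one for each half.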

shmerl · on Sept 21, 2014

> The function get_input_string returns a temporary string, and tokenize_string2 builds an array of pointers into that string. Unfortunately, the temporary string only lives until the end of the current expression, and then the underlying memory is released. And so all our pointers in v now point into oblivion

So what stops you from returning a shared pointer from get_input_string? Then take over that ownership and use it. It's still a potential problem that v is logically disconnected from the lifetime of that pointer, but at least you could avoid the problem you described.
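Here is a minimal sketch of that suggestion. It assumes a hypothetical `get_input_string` that returns `std::shared_ptr<const std::string>` and a hypothetical space-splitting `tokenize_string2`. The caller holds the owner for as long as it uses the views. The comment marks the disconnect the poster concedes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical variant of get_input_string(): returns shared ownership of the
// string instead of a temporary that is destroyed at the end of the expression.
std::shared_ptr<const std::string> get_input_string() {
    return std::make_shared<std::string>("alpha beta gamma");
}

// Hypothetical tokenizer: returns views into `str` and does not extend its lifetime.
std::vector<std::string_view> tokenize_string2(std::string_view str) {
    std::vector<std::string_view> out;
    std::size_t start = 0;
    while (start < str.size()) {
        std::size_t end = str.find(' ', start);
        if (end == std::string_view::npos) end = str.size();
        if (end > start) out.push_back(str.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

std::string middle_token() {
    // Take over ownership: the string now lives as long as `input` does.
    std::shared_ptr<const std::string> input = get_input_string();
    std::vector<std::string_view> v = tokenize_string2(*input);
    // The remaining weak spot: nothing ties `v` to `input`. Resetting `input`
    // here would leave every view dangling, and the compiler would not object.
    return std::string(v[1]);
}
```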

overgard · on Sept 20, 2014

This seems like the kind of place where std::shared_ptr would really shine. The author's point on the danger of pointers is well taken, but some of the new pointer types get around a lot of these issues. You couldn't use it to point into the middle of the string, but if you paired it with some offsets you wouldn't have to worry about the ownership of the pointer anymore.
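One reading of "paired it with some offsets" is sketched below with a hypothetical `Token` type. Each token holds shared ownership of the whole string plus an offset and a length, and it only produces text on demand. The trade-off is one reference-count increment per token.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical token type: shared ownership of the whole string plus offsets,
// instead of a raw pointer into the middle of it.
struct Token {
    std::shared_ptr<const std::string> owner;  // keeps the string alive
    std::size_t offset;                        // start of the token within *owner
    std::size_t length;                        // token length in bytes

    // Produce an owned copy; safe because this Token holds `owner`.
    std::string text() const { return owner->substr(offset, length); }
};

// Hypothetical tokenizer: splits on spaces, giving every token its own owner.
std::vector<Token> tokenize_shared(std::shared_ptr<const std::string> s) {
    std::vector<Token> out;
    std::size_t start = 0;
    while (start < s->size()) {
        std::size_t end = s->find(' ', start);
        if (end == std::string::npos) end = s->size();
        if (end > start) out.push_back(Token{s, start, end - start});
        start = end + 1;
    }
    return out;
}
```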

dbaupp · on Sept 20, 2014

shared_ptr doesn't fix this: as soon as you access the string via any standard type (char* or string_view), you can copy that type around freely, disconnected from the shared_ptr. Hence, you can drop all the shared_ptrs and leave the string view dangling.
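This objection can be demonstrated without invoking undefined behavior. A `std::weak_ptr` observes that the string has been destroyed while views into it still exist. The function name is hypothetical, and the dangling views are deliberately never read.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>

// Returns true once the owning shared_ptr is gone while views into its string
// still exist. The views are never read after that point: doing so would be
// undefined behavior, which is precisely the hazard being demonstrated.
bool view_outlives_owner() {
    auto owner = std::make_shared<std::string>("some input text");
    std::weak_ptr<std::string> watcher = owner;        // observes lifetime without owning
    std::string_view view = *owner;                    // plain view: no reference count taken
    [[maybe_unused]] std::string_view escaped = view;  // views copy freely, still uncounted
    assert(owner.use_count() == 1);                    // the views keep nothing alive
    owner.reset();                                     // last owner gone: string destroyed
    // `view` and `escaped` now dangle, and nothing in the type system says so.
    return watcher.expired();
}
```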

pcwalton · on Sept 20, 2014

std::shared_ptr has (atomic!) reference counting overhead.
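The cost appears whenever a `std::shared_ptr` is copied, because every copy and every destruction updates the shared count. Implementations generally use atomic operations for this, since the count may be shared across threads. The small sketch below uses hypothetical function names and relies on `use_count()` to observe when a copy happens. Passing by `const` reference, or passing the pointee itself, avoids that count traffic.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// By value: the argument is copied, so the count is bumped on entry and
// dropped again on exit. use_count() lets us observe the extra owner.
long count_when_copied(std::shared_ptr<const std::string> p) { return p.use_count(); }

// By const reference: the caller's pointer is borrowed, with no count update.
long count_when_borrowed(const std::shared_ptr<const std::string>& p) { return p.use_count(); }

// Borrowing the pointee directly keeps shared_ptr out of the signature entirely.
std::size_t length_of(const std::string& s) { return s.size(); }
```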

monocasa · on Sept 20, 2014

Not to mention the issues with reference counting in general. Having to make sure you don't have (even indirect) circular references means you have to explicitly think about ownership anyway.
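Here is a minimal sketch of the cycle problem with `std::shared_ptr`. It uses hypothetical `StrongNode` and `Node` types and a global destructor counter. Two nodes that own each other are never freed. If the back edge is a `std::weak_ptr` instead, both nodes are destroyed. Deciding which edge should be weak is exactly the ownership decision the comment describes.

```cpp
#include <cassert>
#include <memory>

// Counts destructor runs so that a leak, or its absence, is observable.
int destroyed = 0;

struct StrongNode {
    std::shared_ptr<StrongNode> other;  // owning in both directions forms a cycle
    ~StrongNode() { ++destroyed; }
};

struct Node {
    std::shared_ptr<Node> next;  // owning forward edge
    std::weak_ptr<Node> prev;    // non-owning back edge breaks the cycle
    ~Node() { ++destroyed; }
};

// Two nodes that own each other: when the locals go away, each count is still 1.
// A weak_ptr is returned only so the caller can observe (and clean up) the leak.
std::weak_ptr<StrongNode> make_strong_cycle() {
    auto a = std::make_shared<StrongNode>();
    auto b = std::make_shared<StrongNode>();
    a->other = b;
    b->other = a;
    return a;
}

// Same shape, but with a weak back edge: both nodes die when the locals do.
void make_weak_back_edge() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->prev = a;
}
```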

GoGolli · on Sept 20, 2014

Rust is the best complicated language I have seen!!!!!!

linguafranca · on Sept 20, 2014

I'm hearing an awful lot about Rust on HN, even though afaict it still doesn't have a basic http package yet, limiting the main types of apps I would build with it. Maybe I'm in the minority, but perhaps we can slow down on Rust news until it's a little closer to usable?

Iftheshoefits · on Sept 20, 2014

I understand there are a lot of people on HN with "web goggles" and a severe case of "all development is web development" myopia, but seriously this is just over the top. I'm a "C++ guy" (that is, I like writing programs using C++ and probably always will, even if I use others from time to time), but I would never object to a language like Rust on the basis that it lacks an http package. HTTP is a high level communication protocol. It isn't the only such, certainly not the most efficient, and definitely not even the best. To bash on a language for lack of "native" support for http is just a bit ridiculous.

Jweb_Guru · on Sept 20, 2014

While I'm certainly not going to say Rust is ready for use in production or anything like that, I'll have to dispute your points here:

(1) It has at least three HTTP packages that I'm aware of (not finished ones, admittedly).

(2) While perhaps for you lacking an HTTP package is a dealbreaker, many of the areas Rust is targeting (such as embedded) don't require it at all.

(3) Why should people stop talking about a language because it hasn't been released yet, anyway? I'm not sure I follow the logic here.

wismer · on Sept 20, 2014

I like reading this stuff. Sure, I may not use Rust for any real application yet, but it's been teaching me a whole lot about paradigms different from the ones I am used to. I think these articles are great because they push boundaries in understanding. And this is a science we are talking about; experimentation is a lot of fun!

jey · on Sept 20, 2014

Maybe there are just enough people interested in types of software that don't involve making HTTP requests. Programs that make REST requests are only one tributary within the taxonomy of software.

nicklaforge · on Sept 20, 2014

> slow down on Rust news until it's a little closer to usable?

"Usable" is an ironic way to spell "hackable".

You mentioned the words "news" and "use". Does the H in HN still stand for something? Maybe HN stands for "HTTP News"?

adamnemecek · on Sept 20, 2014

One way a new FOSS platform can become more usable is by attracting new contributors. If a new platform is talked about, it will attract new contributors. Ergo by being talked about, a new platform becomes more usable.

Yardlink · on Sept 20, 2014

Is there a reason this language exists? They're solving a problem that's been solved many times over for at least 2 decades in the form of managed languages.

mattdw · on Sept 20, 2014

Because Rust's "managed" benefits are compiled away, they have zero runtime impact. That means there's no reason it couldn't be as fast as C/C++ for all workloads. So you get a safe language with bare-metal performance. (Bare metal as in: people have written basic OS kernels in it.)

ludamad · on Sept 20, 2014

Actually, many people have been waiting long for such a language to come out, and are very excited about Rust for good reasons.

Managed languages solve a large set of problems, but introduce two big problems as I see it:

- Cause nondeterministic overhead which is particularly problematic for kernels, games, and software with heaps over, say, 10 gigabytes.

- Limit ability to safely pass objects outside their ecosystem. How many libraries written in managed code are worth using outside of the language they are written in?

pjmlp · on Sept 21, 2014

Actually it depends on what one understands by managed.

Strongly typed languages with automatic memory management, with native compilers and used for writing OSes, have existed since Mesa/Cedar (the 1970s at Xerox PARC).

Back when C was UNIX only, no one talked about C ABI. The only ABI that mattered was the OS ABI.

Amoeba, OS/400, Lisp Machines, and Oberon are a few examples of multiple languages interoperating via the OS ABI.

ludamad · on Sept 21, 2014

I was just alluding to the fact that garbage-collected runtimes are far from ideal if another system calls into them sporadically. They just don't fit well with software libraries intended to be language-agnostic.

pjmlp · on Sept 21, 2014

I just jumped the gun, because many tend to think stronger typing than what C offers is a synonym for managed.

Regarding interoperability, it really depends on the level at which GC or RC services are available: the OS or the language runtime.

If they are only available in the language runtime, then yes, it complicates the distribution of libraries, as it raises the issues of how many copies of the runtime one gets and of version compatibility.

If the OS offers the services, then any language on that specific OS can expose libraries that interoperate with GC or RC.

COM (now WinRT) is a possible example of such OS services.

wofo · on Sept 20, 2014

> They're solving a problem that's been solved many times over for at least 2 decades in the form of managed languages.

With the overhead of GC...

pjmlp · on Sept 21, 2014

Because strongly typed alternatives to C and C++, like Modula-2, Modula-3, Ada, Turbo Pascal, Delphi, ..., faded away while UNIX conquered the enterprise.

Now to get back what was lost in terms of safety, new languages are required, as the old ones are considered legacy by the average Joe/Jane coder.