
Rust lifetimes: Getting away with things that would be reckless in C++ - dbaupp
http://www.randomhacks.net/2014/09/19/rust-lifetimes-reckless-cxx/
======
missblit
In C++, if the string is an rvalue reference, you could std::move it into part
of the return value. Think of a signature like

    
    
        template<typename T>
        std::pair<std::string, std::vector<std::string_view>>
        tokenize_string(T &&str);
    

This would be efficient when the user passes a temporary, and it would be
safe.

Which isn't to say the Rust solution isn't totally cool. Being able to easily
check this class of errors at compile time is probably a lot nicer than
needing to learn all the relatively complicated parts that would go into an
easy-to-use / safe / efficient / still slightly weird C++ solution.

~~~
eridius
Sure, and then somewhere along the way you throw away the first element of the
pair, because you're not using it, but you're still using the views, and oops
you just reintroduced the bug you tried to fix.

Which is to say, yes, you can obviously write C++ code that _works_. But you
run the risk that one tiny mistake, or a change weeks, months, or years later,
causes memory issues. Being able to completely rule out this class of error at
compile-time is really amazingly useful.

~~~
ajross
That's a forest for the trees argument. The ability to introduce bugs ("make a
tiny mistake") in future changes is an _inherent property of software_ , and
you can't fix this in the general case. Rust just fixes this for the case of
free-memory-read bugs. That has value, but it's a much more limited scope than
you're implying.

Really I think this is the biggest problem with Rust. The stuff broken about
C++ isn't really its memory model, it's simply its complexity. And every time
I look at what's happening in Rust I see _the same kind of complexity being
introduced at every turn_. I mean seriously: "named lifetime parameters"? And
the syntax is a bare single quote? That's a joke, right? No one but Rust nerds
is going to grok this, and real code needs to be maintained by mediocrities.

Frankly Go seems to have this "don't introduce new nonsense" problem under
much better control, so if I need to pick a horse in the race, it's that one,
not Rust.

~~~
eridius
This comment is a joke, right? I find it hard to believe that any programmer
would say "the compiler prevents me from writing broken code? That's too
complex!"

Yes, a proper typing system involves more visible moving parts than a broken
typing system. But a proper typing system is vastly more powerful. C++ may not
have lifetimes, but every time I touch C++ I have to be really careful that I
don't accidentally introduce memory or threading bugs, because they're both
_really easy_ to do. Someone who's programmed C++ for a long time might not
even realize it, but then again, the vast majority of C++ code in this world
almost certainly contains numerous memory and threading bugs.

And why the comparison to Go? Go doesn't live in the same world as Rust. It's
not a systems programming language. At this point it's mostly a glorified web
server application language. And if you're going to pick Go because of the
lack of memory lifetime issues, you may as well pick any other garbage-
collected language out there.

~~~
ajross
Again, forest for the trees. Your laser focus on one particular type of bug
blinds you to the fact that complex systems lead to bugs in general. I'm not
interested in defending C++ (or Go, or anything else). I'm saying I see Rust
falling down the same "cool new ideas" rabbit hole as C++, and Haskell, and
Erlang, and Common Lisp, and... Of those languages only one has achieved any
notable success.

And btw: the idea of calling something a "systems programming language" that
can't even make a system call is laughable. Can I mmap a buffer to pass to a
hardware video codec or make an ioctl to an audio driver in rust? Guess which
language I _can_ do that in? If "Go doesn't live in the same world as Rust",
then Rust isn't in the same galaxy as C/C++.

~~~
pcwalton
I don't know where you got the idea that you can't mmap a buffer or call an
ioctl in Rust. You certainly can do those things.

~~~
ajross
I got the idea from the docs, honestly, which don't talk about system
interfacing at all. Though I do now see the "unsafe" page, which at least has
assembly hooks.

Serious question though: has anyone ever done this? I mean, are there kernel
header import utilities I can use to get the flag definitions and parameter
structs, etc...? You can sneer and downvote me all you want, but it seems that
the clear truth is that Rust has _not_ been used as a "systems programming
language" in the only regime where IMHO that term has any meaning.

Basically: drilling a hole through the compiler to the assembler (which I'll
state again is done in Rust as yet more complicated syntax users need to learn
and not as, say, a native library interface) is a valuable tool, but it does
not a syscall interface make.

~~~
pcwalton
There is rust-bindgen, which you can use to convert C headers to Rust FFI
declarations. This should be able to convert the kernel userland headers to
Rust declarations, so that you can call directly to the kernel.

(I didn't downvote you.)

------
svalorzen
Or, you know, instead of returning two C pointers, which makes no sense in
modern C++, return a vector of `std::pair<size_t,size_t>` with the position and length
of each substring, and if needed use `std::string::substr` to extract the
parts you need.

~~~
expr-
I would say returning regular pointers is more C++-esque than your handmade
range implementation. Pointers are, after all, a kind of iterator, which is an
essential C++ concept. (std::string::iterator is a handful.)

(Furthermore, there is nothing "C" or wrong with regular pointers. Their only
"flaw" is that they can't manage an object, but they're still _the_ semantic
way to refer to one.)

~~~
svalorzen
Returning regular pointers that point to memory which is not managed by you,
and which for all you know could disappear at any time, is not the C++ way of
doing anything. Thing is, a class has a public interface for a reason, and if
you as an outsider want to start playing pointer games you know from the start
what you are in for.

I never implied that pointers are bad, just that this particular usage of
pointers is bad, and as such should not be used as an example of why we would
want something else over C++. If I ever saw a function with a signature like
the one proposed I'd be extremely suspicious of what was going on.

Note that this would happen in the same way if the function returned pairs of
iterators. It is a choice that depends on the documented and intended usage of
a function. If you feel your tokenizer will be used on temporaries (I fail to
see how that would ever be useful though) you could overload the function to
also take universal references or change the return parameters so that nobody
can get hurt. Otherwise you return whatever delimiters you are comfortable
with, with the assumption that they will not be misused. All code can
be broken if you actively try to, so it makes no sense to protect against
everything.

~~~
pcwalton
> All code can be broken if you actively try to, so it makes no sense to
> protect against everything.

That's not Rust's philosophy. Rust is memory-safe, period.

~~~
steveklabnik
I guess technically 'if you try to' means also abusing unsafe, so in theory
he's still sorta right. Every time I see someone ask a question about
transmute on IRC I shiver a little.

But yes, Rust is way, way, way better in this regard.

~~~
seabee
The difference is safe-by-default languages force you to turn off the safeties
before they will let you blow your own foot off. C++ doesn't believe in
safeties, since (at the time the language was designed) they introduced an
unacceptable delay between you wanting to pull the trigger and your target
and/or feet blowing up.

------
bsaul
Which makes me wonder:

1/ could you build the same unsafe behavior in Rust if you wanted to, by not
specifying lifetime constraints?

2/ If yes, shouldn't lifetime constraints be mandatory?

~~~
dbaupp
The compiler will not compile code it cannot verify as safe (outside an
`unsafe` block), meaning it will complain about lifetime constraints that are
missing, e.g. it is a compile-time error to not propagate the lifetime
information out of the enum.

[http://play.rust-
lang.org/?run=1&code=enum%20Token%20{%0A%20...](http://play.rust-
lang.org/?run=1&code=enum%20Token%20{%0A%20%20%20%20Word%28%26str%29%2C%0A%20%20%20%20Other%28%26str%29%0A}%0A%0Afn%20main%28%29%20{})

------
asuffield
There's an obvious extension here for lifetime inference - the example given
doesn't need to be an error, it could compile correctly by increasing the
object lifetime to the outer block. I don't know offhand whether there is a
universally correct inference algorithm for that (if every other language
feature was static then unification would solve it easily, but the other
language features are not static and I don't know how it would interact with
Rust's type inference).

~~~
riffraff
wouldn't extending the lifetime in these situations lead to subtle,
hard-to-trace memory leaks?

~~~
keeperofdakeys
No. Currently a reference in a struct or enum has no inferred lifetime, and
must be explicitly stated. Having it default to the lifetime of that
containing struct or enum would simply mean you don't need to specify it. The
danger is it could infer wrongly, leading to lifetime errors that might be
cryptic.

But at the end of the day, the Rust compiler will never allow a reference to
outlive the original object.

~~~
dbaupp
You've misinterpreted, it's not a question about reducing the annotation in a
struct/enum definition, but about postponing the destruction of the String so
that the later references are valid, i.e. currently we have

    
    
      fn test_parse_unsafe() {
          let v = {
              let text = "The cat".to_string();
              tokenize_string3(text.as_slice())
          }; // `text` destroyed here
          assert_eq!(vec![Word("The"), Other(" "), Word("cat")], v);
      }
    

but the suggestion/question is about changing this to

    
    
      fn test_parse_unsafe() {
          let v = {
              let text = "The cat".to_string();
              tokenize_string3(text.as_slice())
          }; 
          assert_eq!(vec![Word("The"), Other(" "), Word("cat")], v);
      } // `text` destroyed here
    

so that the references in `v` are valid.

This could lead to "memory leaks", where a destructor is implicitly postponed
to a higher scope, but I don't think it would be much of a problem in practice
(the promotion would only be through simple scopes, not through loops, and
maybe not through `if`s). In fact, there's a yet-to-be-implemented accepted
RFC covering this[1] (there's no guarantee that it will be implemented though,
just that the idea is mostly sound).

[1]: [https://github.com/rust-
lang/rfcs/blob/master/active/0031-be...](https://github.com/rust-
lang/rfcs/blob/master/active/0031-better-temporary-lifetimes.md)

~~~
asuffield
Ah yes, that RFC is roughly what I had in mind, thanks.

I believe that it's safe to promote through an if, although obviously not
through a general loop.

~~~
dbaupp
Yes, I agree that it should be safe, I've softened my original text. However,
it would require dynamically tracking whether the destructor needs to be run, and
there's currently discussion[1] about Rust possibly moving to a static model,
for the highest performance.

[1]: [https://github.com/rust-lang/rfcs/pull/210](https://github.com/rust-
lang/rfcs/pull/210)

~~~
asuffield
Consider this:

On a two-way if statement, a given storage location is either set on zero, one,
or both branches. If it is set on neither branch then the if
statement is irrelevant and can be ignored. If it is set on one branch, then
either it had an original value and hence can be treated as being set on both
branches, or it must be destroyed within the branch of the if (no null
pointers - think about it until it is clear that the type system guarantees
this). Hence we are only interested in cases which are isomorphic with the
location being set on both branches.

We can treat this as a phi node following the if: there is one output value,
which has been created in one of two different ways. In this case we don't
know statically which value has been constructed, but we do know statically
how and when to destroy it regardless of which one we get, because both
branches have the same type and storage location. We don't actually need to
know where it came from.

Any obvious problems? I think it works...

~~~
dbaupp
It doesn't work, the &str could come from completely different types in the
two branches.

For instance, one branch could be created by .as_slice() on a String, and the
other could be created by referencing a global, e.g.

    
    
      let s = if cond {
          let some_string = create_it();
          some_string.as_slice()
      } else {
          "literals are always-valid &str's"
      };
    

`some_string`'s destructor should only be run if the first branch was taken.

~~~
asuffield
I'm not sure that this particular example can ever use some_string outside the
if without hitting a type error, but I see what you mean.

That seems like a reasonable case to raise a type error. That defines the
cases quite neatly: if it's temporary on both branches then it can work, and
if it has different lifetimes then the values can't be merged and should be
rejected. If the programmer really meant for this to work then they need to
copy the global, and copies should be written explicitly.

~~~
dbaupp
There's no type error at all, we're talking about delaying the destruction of
some_string so that the `s` (which is a &str) is valid outside the if. The
string literal is a &str with an infinite lifetime, and so can of course be
safely restricted to have the same lifetime as the other branch (done
implicitly).

However, it's easily possible to have the &str come from temporaries of
different types in the two branches. This would restrict the static
destruction case to only working through an `if` when the "parent" values have
exactly the same types, which doesn't seem nearly as valuable and possibly not
worth the effort.

------
enjoy-your-stay
In C++, the best way to hand out pointers to anything where the creator may
not necessarily be the last one referencing that object or chunk of RAM is to
use reference counting, which would have solved the poster's problem.

It would mean that you would have to wrap the incoming string in a class, and
probably add the tokenize_string method to that class. Then you would also
have to wrap the results vector in a class that then addrefs the original
string wrapper class.

But after that, handing out pointers to the contents of the string would be no
problem as the results class would addref the string class and then release it
when done ensuring that the string wrapper class remains alive as long as the
results object has not gone out of scope.

Of course Rust's approach of alerting you when your code path causes dangling
pointers is also interesting, but I wonder how that would work if you were to
link against a static library that handed out references to internal objects
like that - could the compiler see the scoping problem?

------
keeperofdakeys
Just as an aside, the &str is not stored as two pointers, but a pointer and a
length.

------
shmerl
_> The function get_input_string returns a temporary string, and
tokenize_string2 builds an array of pointers into that string. Unfortunately,
the temporary string only lives until the end of the current expression, and
then the underlying memory is released. And so all our pointers in v now point
into oblivion_

So what stops you from returning a shared pointer in the case of get_input_string?
Then take over that ownership and use it. It's still a potential problem that
v is logically disconnected from lifetime of that pointer, but at least you
could avoid the problem you described.

------
overgard
This seems like the kind of place where std::shared_ptr would really shine.
The author's point on the danger of pointers is well taken, but some of the
new pointer types get around a lot of these issues. You couldn't use it to
point into the middle of the string, but if you paired it with some offsets
you wouldn't have to worry about the ownership of the pointer anymore.

~~~
pcwalton
std::shared_ptr has (atomic!) reference counting overhead.

~~~
monocasa
Not to mention the issues with reference counting in general. Having to make
sure you don't have (even indirect) circular references means you have to
explicitly think about ownership anyway.

------
GoGolli
Rust is the best complicated language I have seen!!!!!!

------
linguafranca
I'm hearing an awful lot about Rust on HN, even though afaict it still doesn't
have a basic http package yet, limiting the main types of apps I would build
with it. Maybe I'm in the minority, but perhaps we can slow down on Rust news
until it's a little closer to usable?

~~~
Iftheshoefits
I understand there are a lot of people on HN with "web goggles" and a severe
case of "all development is web development" myopia, but seriously this is
just over the top. I'm a "C++ guy" (that is, I like writing programs using C++
and probably always will, even if I use others from time to time), but I would
never object to a language like Rust on the basis that it lacks an http
package. HTTP is a high-level communication protocol. It isn't the only such protocol,
certainly not the most efficient, and definitely not even the best. To bash on
a language for lack of "native" support for http is just a bit ridiculous.

------
Yardlink
Is there a reason this language exists? They're solving a problem that's
been solved many times over for at least 2 decades in the form of managed
languages.

~~~
ludamad
Actually, many people have been waiting a long time for such a language to come
out, and they are very excited about Rust for good reasons.

Managed languages solve a large set of problems, but introduce two big
problems as I see it:

- They cause nondeterministic overhead, which is particularly problematic for
kernels, games, and software with heaps over, say, 10 gigabytes.

- They limit the ability to safely pass objects outside their ecosystem. How
many libraries written in managed code are worth using outside of the language
they are written in?

~~~
pjmlp
Actually it depends on what one understands by managed.

Strongly typed languages with automatic memory management, having native
compilers and used for writing OSes, have existed since Mesa/Cedar (the 70's at
Xerox PARC).
Back when C was UNIX only, no one talked about C ABI. The only ABI that
mattered was the OS ABI.

Amoeba, OS/400, Lisp Machines, and Oberon are a few examples of multiple
languages interoperating via the OS ABI.

~~~
ludamad
I was just alluding to the fact that garbage-collection runtimes are rather
less than ideal if another system calls into them sporadically. They just don't
fit well for software libraries intended to be language-agnostic.

~~~
pjmlp
I just jumped the gun, because many tend to think stronger typing than what C
offers is a synonym for managed.

Regarding the interoperability, it really depends at what level OS, GC or RC
services are available.

If just on the language runtime, yes, it complicates the distribution of
libraries, as it raises the question of how many copies of the runtime one
gets, plus version-compatibility issues.

If the OS offers the services, then any language on that specific OS, can
enjoy exposing libraries that interoperate with GC or RC.

COM (now WinRT) is a possible example of such OS services.

