Using {blocks} in Rust and Go for fun and profit (taylor.town)
124 points by surprisetalk on Jan 25, 2023 | 61 comments



Well, yeah, you can do this in pretty much any language that has delineated scopes like that.

Of course, blocks-are-expressions is necessary if you want to return the result of your computations from the block and store it in a variable outside the block. (You can of course declare the storage variable outside the block and assign to it inside, but that's less nice.)

GNU C even has an extension called "statement expressions" where blocks can return values; the syntax looks like this:

    int foo = ({
        int bar = 4 * 3;
        bar;
    });
Clang implements it in addition to GCC, as do a few other compilers. (Notably, IIRC, MSVC does not.)


This is one of my favorite features of Rust, especially since you can return a value from a block because the language is 'expression oriented'.
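
For anyone who hasn't seen it, a minimal sketch (the names are purely illustrative):

    fn main() {
        // The block is an expression: its final expression (no trailing
        // semicolon) becomes the value assigned to `label`.
        let label = {
            let count = 3;
            format!("{count} items")
        };
        println!("{label}"); // prints "3 items"
    }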


Yes, not having everything-is-expression in js hurts.


There's a dormant proposal about this: https://github.com/tc39/proposal-do-expressions


One caution on the use of blocks is that, at least for C++, you can end up with a large amount of stack usage. While destructors follow strict lexical scoping, stack allocations are only guaranteed to be released at the end of a (non-inline) function. The compiler can reuse stack memory for multiple variables whose lifetimes don't overlap, but this isn't guaranteed. For an example of where reuse doesn't help:

    void someProc() {
        char buffer1[256];

        {
            char buffer2[256];
            // string ops
        }

        // we still have 512 bytes on the stack whereas it would only be
        // 256 if we used a non-inline function instead of a block
        someDeepCall();
    }
I'm not sure what Rust or Go do in cases like this.


Rust does the same thing as C++ here. Note that rustc is pretty aggressive about using the LLVM lifetime intrinsics to allow for stack coloring (i.e. reuse of stack slots as necessary).


C# has the same issue for stack-allocated spans. Doing that in a loop is often a stack overflow.


A somewhat related technique which I often find useful is something that’s known as “immediately invoked function expressions” (IIFE) in JavaScript. That also creates a sub-scope in place, but it lets you return values to the enclosing scope. E.g.:

   result := func() string {
       helperVar1 := //...
       helperVar2 := //...
       return helperVar1 + helperVar2
   }()
So it’s basically an anonymous function that is invoked right away. You could achieve the same scope separation by pulling it out as a named function, but sometimes I like it better to keep things closer together.


This is really helpful in Go for `defer`. For example, if I'm manipulating files in a loop, I don't want to do:

    for _, fileName := range fileNames {
        f, err := os.Open(fileName)
        if err != nil {
            return err
        }
        defer f.Close()
        doSomething(f)
    }
... because I might run out of quota for open file handles. I want the defer to trigger at the end of the loop rather than at the end of the function, so I'll often put a closure in the loop body:

    for _, fileName := range fileNames {
        if err := func() error {
            f, err := os.Open(fileName)
            if err != nil {
                return err
            }
            defer f.Close()
            doSomething(f)
            return nil
        }(); err != nil {
            return err
        }
    }
That said, I don't like the ergonomics and if I'm doing a lot of file things, I'll write a `func withFile(fileName string, callback func(*os.File) error) error` function which often composes more nicely.


I use this for package globals especially:

   var pkgGlobal = func() string {
      ... 
   }()

Much better than:

   var pkgGlobal string

   func init() {
      pkgGlobal = ...
   }


Yet again the heavy initialism that looks like it came right out of a C++ standards document is just a long name for a simple thing.


Conceptually yes, but IIUC there should be worlds of difference at the compilation output level (i.e. unlike calling a function, the compiler's not obligated to set up or tear down a context every time it enters or exits a block; it can just handle the static scoping without any impact on the runtime semantics).

ETA: except for go's `defer`, and off the top of my head I don't actually know if Go is obliged to run the defer immediately upon exiting the block or can choose to run it at some other point in the function.


In Go, `defer` runs at the end of the enclosing function, not the enclosing block.


> unlike calling a function, the compiler's not obligated to set up or tear down a context

I guess it depends on what you mean by "context" but the spec is very clear that a block creates scope, and the end removes scope.

https://go.dev/ref/spec#Declarations_and_scope

> The scope of a constant or variable identifier declared inside a function begins at the end of the ConstSpec or VarSpec (ShortVarDecl for short variable declarations) and ends at the end of the innermost containing block.


That shouldn’t be the case for C++ and Rust lambdas that are immediately invoked. The compiler should see through it.


Not necessarily for C++. At least with MSVC, you can use `__declspec(noinline)` on a lambda. This can be handy for complex macros that would otherwise allocate a bunch of memory.


I find this useful in unit tests where I often find myself duplicating blocks of assertions multiple times. The {block} helps to prevent accidentally reusing variables which are defined by previous blocks of assertions.

I rarely use a free-standing {block} in actual code. I think it's because if something is worthy enough to be logically grouped into a {block}, then it is probably worthwhile to pull it out into its own function, method or lambda expression.


I use blocks in rust in normal code as a way to control how long a lock is held. Not sure if that's the best practice


That makes sense; you could instead explicitly drop the lock guard, but the block scope ending will do that for you.

If you aren't already, it's perhaps better to identify an object that's being locked, which you can have the Mutex wrap. So e.g. you could have a Mutex<Goose>, and then functions (even methods) can take a reference to a Goose to ensure you can't call them by mistake without locking the Mutex, as you otherwise don't have a Goose. If the Goose doesn't need any actual data this is free at runtime: the compiler's type check ensures you took the lock as needed, but since Goose is a zero-sized type, no machine code is emitted to deal with the Goose variable.

Probably your application has a better name for what is being protected than Goose, that's just an example, but having some object that is locked can help ensure you get the locking right and that your mental model of what is being "locked" is coherent.

Of course sometimes there really is no specific thing being locked, even an imaginary one, it's just lock #4 or whatever but in my experience that's rare.
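
A rough sketch of what that pattern looks like (Goose/honk are placeholder names, of course):

    use std::sync::Mutex;

    struct Goose { honk_count: u32 }

    // Takes &mut Goose, so it can only be called by whoever currently
    // holds the MutexGuard (deref coercion gives us the &mut Goose).
    fn honk(goose: &mut Goose) {
        goose.honk_count += 1;
    }

    fn main() {
        let shared = Mutex::new(Goose { honk_count: 0 });

        {
            // The guard (and thus the lock) lives only until the end
            // of this block.
            let mut guard = shared.lock().unwrap();
            honk(&mut guard);
        } // lock released here

        // ... work that shouldn't hold the lock ...

        println!("honks: {}", shared.lock().unwrap().honk_count);
    }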


Other people in the comments have mentioned you can use functions to achieve some of the benefits of the {blocks}.

I wanted to point out how q/kdb handles this, because I think it's quite nice.

In kdb, blocks are how you define functions.

    {x+1};
That is a function which takes your argument, adds one to it and returns it. (x is the default name for the function argument, another lovely piece of design).

If you want to have a named function, just assign it to a variable:

    my_inc: {x+1};
Now you can call my_inc(1) and get back 2.

Light, consistent and reusable language design, very nice. No need to have two ways of defining functions (e.g. the needless separation of def and lambdas in Python).

The upshot of this is that you get these block constructs for free:

    myvar_a: 1;
    myvar_b: 2;
    
    my_top_level_var: {
      //less important work
    }[];
(The one downside is that you need to call the function with the [] brackets)


Wow, super cool design decision!

Can you give multiple named variables? How would you write a function like this?

    const f = (a,b,c) => { return a * b + c }
Also, is this just q/kdb, or does this also apply to K?


K4 is the implementation language of q, kdb is the database part of the language. Most q is just named utility functions written in K4. There isn't really much difference in what the machine does with them under the covers, but when stuck on a problem talking about q is more likely to get help than k.

You can provide up to 8 named inputs in a function definition.

q example for running sum of 8 inputs:

  q)f:{[a;b;c;d;e;f;g;h]sums a,b,c,d,e,f,g,h}
same in K4:

  q)\
  f:{[a;b;c;d;e;f;g;h]+\a,b,c,d,e,f,g,h}


The default variable names go all the way up to z. So I would write your function as:

    f:{z+y*x}
(k is evaluated strictly right to left, another design decision that I quite like, so I had to move the variables around in the expression)

If you wish, you can provide your own variable names as follows:

    f:{[a;b;c]c+b*a}
This works in k and q (q is largely k with some nice-to-have functions defined on top).


This feature is also there in C and C++ and probably other languages. The catch is that js seems to have inherited the syntax but not the scoping rules.


Has that not been fixed since ES2015?

  const a = 5;
  console.log(a)
  {
    const a = 10;
    console.log(a)
  }
  console.log(a)

  5 
  10 
  5
What we are missing in js, compared to say Rust, is blocks being expressions, though there is a proposal ("do expressions") to allow this:

  const a = do {
    if (b) { 5 } else { 10 }
  };
https://github.com/tc39/proposal-do-expressions


Try it with var


s/var/let/g and you're good. This is the entire reason "let" was added in the first place, to correct this historical mistake. It's supported pretty much everywhere even by extremely conservative standards, and it's even the same number of letters. Literally no reason not to use it.

I personally dislike JavaScript. Even modern JavaScript. But this is a bullshit reason to hate on JavaScript.


Fortunately ES2015 was 7+ years ago, so no I won't. :)


Every few years I'll jump into JavaScript for one thing or another, and recently I jumped back into TypeScript and holy cow it's shaping up to be a really nice language. I really like `const` as well as TypeScript's concept of type widening/narrowing (`"foo"` is its own type as a specific string but can be widened to `string`), which allows the compiler to know that `document.createElement("table")` returns an `HTMLTableElement` rather than `HTMLElement`--this could have been avoided by having a `document.createTableElement()` method with its own signature, but given that it has to work with older, dynamically typed APIs, this is a pretty elegant solution.

Similarly, if I have a discriminated union `type FooBar = "foo" | "bar";`, TypeScript seems to know that `if (["foo", "bar"].includes(x)) {...}` exhaustively handles all permutations of `FooBar` (no need for a `switch` statement with explicit arms).

The static typing really helps me avoid a bunch of "undefined is not a function" stuff that I would waste time with in JavaScript.

Pretty cool stuff!


TypeScript's type language is extraordinarily powerful. It's completely reframed the way I do web development; I tend to do much more functional and less method-based semantics these days because interfaces and generic types make that feasible without going mad from losing track of what functions can be applied to what data (and receiving no help from the very lax type semantics and runtime of regular JavaScript).


It's in Java and D as well, which likely inherited it from C. I'm pretty sure C inherited it from ALGOL's "BEGIN ... END" blocks.

The catch with JS is that variable scoping rules are different when you use 'var', if you use 'let' and 'const' the scoping rules work like most programmers would expect for block statements.


`var` has weird scoping rules, but you can get the "right" behaviour in modern javascript, using `let`.


I do this quite a bit in C and C++; I find it's a great way to reduce the mental effort required to understand a long function without having to jump around the source like when it is broken up into multiple functions.

In C++ you can really (ab)use it to do things like scoped mutex locks and "stopwatches" that start a timer on construction and print the elapsed time on destruction.

Some people find it a bit bizarre though, to each his/her own I guess.


If your C++ codebase uses RAII to manage locks and other expensive resources, I think it's fine to use blocks to define their scope. The alternative is to encapsulate the scope in a new function or method, which is great if it improves readability, but not if it reduces readability.


In Rust you can use these almost everywhere an expression is expected (if statements, while loops, function arguments, etc). It can sometimes make code a bit unreadable to some people, but it's still a cool feature imo.
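
For example, something like this compiles (purely illustrative; IIRC clippy has a lint that discourages the block-in-condition form):

    fn describe(len: usize) -> &'static str {
        if len > 3 { "long" } else { "short" }
    }

    fn main() {
        let words = vec!["a", "bb", "ccc"];

        // A block as a function argument:
        let verdict = describe({
            let total: usize = words.iter().map(|w| w.len()).sum();
            total
        });

        // A block as an `if` condition:
        if { let limit = 2; words.len() > limit } {
            println!("more than 2 words, verdict: {verdict}");
        }
    }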


Conveniently, because a block is an expression in Rust, it lets you "convert" a statement (or sequence thereof) into an expression.

To be used sparingly, but very useful when it applies e.g. in the precise capture clause pattern, or for non-trivial object initialisation (as Rust doesn't have many literals or literal-ish macros).
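
A sketch of the initialisation case (names made up):

    use std::collections::HashMap;

    struct Config {
        name: String,
        defaults: HashMap<String, i32>,
    }

    fn main() {
        let config = Config {
            name: "example".to_string(),
            // The block builds the map inline, right where it's used,
            // without a throwaway helper or a mutable binding that
            // outlives the initialisation.
            defaults: {
                let mut m = HashMap::new();
                m.insert("retries".to_string(), 3);
                m.insert("timeout_ms".to_string(), 500);
                m
            },
        };
        println!("{} has {} defaults", config.name, config.defaults.len());
    }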


Here's one notable use that I doubt many are familiar with:

    let iter = some_iterator_expression;
    {iter}.nth(2)
For some reason, nth takes &mut self instead of self (note that T: Iterator => &mut T: Iterator<Item=T::Item> so there was no need for this in the API design to support having access to the iterator after exhaustion; it was a mistake). So if you tried to use it like iter.nth(2) with the non-mut binding, that would fail. But {iter} forces iter to be moved - unlike (iter) which would not. Then iter becomes a temporary and we're free to call a &mut self method on it.

In general: {expression} turns a place expression into a value expression, i.e. forcing the place to be moved from.


Interesting. I don't see any use for it (since adding `mut` to the `let iter` declaration sounds less annoying than dropping `iter` altogether) but I learned something nonetheless, thank you.


I love this feature. I use it most of the time in large functions where I can group some lines as one logical piece that isn't generic enough to be a function. It helps with reading the code back and prevents leaking vars into the rest of the function scope. Such blocks tend to become scattered single-use functions when extracted out, which isn't ideal.


Some of the error handling examples in Go are unnecessary, as Go allows you to access variables defined in your if-statement in other branches. For instance:

    if result, err := something(); err == nil {
        if result.RowsAffected() == 0 {
            return nil
        }
    } else if err != nil {
        return err
    }


What's the point of the “if err != nil” after the “else”?


I guess nothing, other than to illustrate that you can read from err (and result)


One of the most common usages I find is to restrict the scope of RAII locks such as Mutexes.


Functions do the same thing, but are more readable due to explicit naming.

It can be great though as an intermediate step to extracting functions.


These blocks have other advantages, such as having access to the outer scope. Yes, those could be passed in as arguments to a function, but I think there are many cases where that extra overhead feels unreasonable given you have already gathered all of the values here for this purpose anyway.


Yep. And they also keep single-use code local to the only place it's being used at the moment, which I think can help the readability and maintainability of said code :)

Functions do have a place too of course; even single-use ones. Especially when you can give them a clear purpose and name.


A sort of reverse or corollary to DRY, “don’t be unintentionally repeatable.”


It's more of an extension to "Don't use globals", but at a more fine-grained level. Code using this can still be DRY, if you understand DRY to be "don't repeat the same information excessively". Reusing the same variable name in multiple (similar or dissimilar) contexts is not a DRY violation in that sense if the names are more a coincidence than actual shared information.

Most programmers have no problem, for instance, with seeing multiple loops declared like:

  for(int i = ...; ...; ...) { ... }
`i` is "repeated", but there's no objection to having multiple declarations since each has a meaning dependent on its local context.


Functions fulfill the same function, but they move the code to a different place. Sometimes that's desirable for readability, sometimes it's more readable to keep things inline. It's nice to have both options


I like this for some things where the logic is somewhat large-ish, but also strongly coupled. A good example is setting up state for integration tests; I'm not going to re-use any functions I create for that. It's not bad to use functions, I just find it more convenient to keep it all in one function, but split out a little bit in these faux-subfunctions.

I don't use it often, but when I do, I find it convenient.


"A good example is setting up state for integration tests; I'm not going to re-use any functions I create for that."

And if it turns out I'm wrong about that, the braces provide a very nice, guaranteed cutting point in the future that anyone can understand without having to load all the context of the function.

This isn't the sort of thing I want splattered all over a code base, but it is a nice niche tool.


I’ve tried doing this a few times in a legacy codebase to contain the definition of `err` when it changes type (this team did not believe in the error interface for a while).

The problem is that it doesn’t stand out visually, and it’s uncommon, so less-experienced team members would have difficulty comprehending what’s going on. In the end we just opted to use two different variables for the two types of error.


This is a thing in Lua as well, with do-end blocks:

    local foo
    do
        local bar = 42
        -- Same as `foo = function() ... end`, so this sets the local foo variable.
        function foo()
            return bar
        end
    end
Because Lua has no significant whitespace, it can be made to look like some kind of specific syntax:

    local foo do
        local bar = 42
        function foo()
            return bar
        end
    end
Though I think this is too clever, so I like to insert a semi-colon to make it clear what is happening:

    local foo; do
        local bar = 42
        function foo()
            return bar
        end
    end


I have thought about this, but sometimes I reuse variables on purpose to reuse the allocated memory, especially in a language like Go where it is sometimes not that clear whether something will be allocated on the stack or the heap. I guess we should avoid doing this in hot paths, right?


The best use I've found for this is to make sure a borrowed Rc is dropped when you want it to be. Otherwise I've run into "this thing is already borrowed" errors when the runtime decided the thing should still be borrowed by something else.
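
Presumably an Rc<RefCell<...>>? A rough sketch of the pattern as I understand it:

    use std::cell::RefCell;
    use std::rc::Rc;

    fn main() {
        let shared = Rc::new(RefCell::new(vec![1, 2, 3]));

        {
            // This immutable borrow lives only for the block.
            let items = shared.borrow();
            println!("len = {}", items.len());
        } // borrow released here

        // Without the block ending the borrow above, `items` would still
        // be alive here and borrow_mut() would panic with "already borrowed".
        shared.borrow_mut().push(4);
    }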


Rust also supports (forward-jumping) GOTOs with blocks!

    'asdf: {
        do_tasks();

        if condition {
            break 'asdf;
        }

        do_more_tasks();
        // ...
    }

    tasks_after_jump();


Blocks have been one of Ruby’s main language features and are heavily used, but I wonder, where did they originate?


Ruby blocks are distinct from the blocks this article is talking about. They are equivalent to “closures” in most other languages. For Ruby, its blocks and their ubiquity come from Smalltalk.


I'm not sure if this is the best advice for Rust, since you will then be playing with variable lifetimes.


In the context of Rust, using a block can only strictly reduce the lifetime of a local, so if that would lead to a problem it would just not compile. :P


This used to be standard practice in Rust before NLL (non-lexical lifetimes) were implemented. As a trivial example, here's some code that works on Rust today: https://play.rust-lang.org/?version=stable&mode=debug&editio...

However, try to compile this with Rust 1.0 (which you can get by running `rustup update 1.0.0` and then using `cargo +1.0.0 run` or `rustc +1.0.0`) and you'll get an error saying that you can't push another element onto the vector due to it already being borrowed by the slice. This is because the borrow checker previously assumed that any borrow would remain in use for the remainder of the scope. The "fix" was to manually put in a block to "end" the borrow early. However, a lot of work was done to make the borrow checker more sophisticated, and in Rust 1.31 (near the end of 2018: https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-201...) that work landed, allowing the compiler to recognize that a borrow was no longer used before the end of a scope and therefore to allow subsequent borrows that it previously would have considered conflicting.
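
Roughly the kind of code in question (a sketch, not the exact playground snippet):

    fn main() {
        let mut v = vec![1, 2, 3];

        {
            // Pre-NLL, the borrow taken by `first` was assumed to last
            // until the end of the enclosing scope, so this block was
            // needed to end it before the push below.
            let first = &v[0];
            println!("first = {first}");
        }

        // With NLL (Rust 1.31+) the block is unnecessary: the compiler
        // sees the borrow is no longer used and allows the push anyway.
        v.push(4);
        println!("{v:?}");
    }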

All that being said, there's still a useful feature of blocks in Rust that I don't see mentioned in this blog post: blocks in Rust are actually expressions! By default, the last value in a block is yielded, but you can also `break` out of a labeled block earlier with a value (similar to how the return value of a Rust function is its last expression, but you can also explicitly `return` earlier). As an added bonus, this also works for `loop`: since a `loop` only ever terminates if `break` is explicitly invoked (unlike `while` or `for`), whatever value you pass to `break` becomes the value of the loop.
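
A couple of illustrative sketches (the labeled-block form is stable since Rust 1.65, IIRC):

    fn main() {
        // A labeled block can `break` early with a value.
        let parsed = 'parse: {
            let raw = "3";
            if raw.is_empty() {
                break 'parse 0;
            }
            raw.parse::<i32>().unwrap_or(0)
        };

        // `loop` is also an expression; the value given to `break`
        // becomes the value of the whole loop.
        let mut tries = 0;
        let found = loop {
            tries += 1;
            if tries == parsed {
                break tries * 2;
            }
        };

        println!("{parsed} {found}"); // 3 6
    }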



