Well, yeah, you can do this in pretty much any language that has delineated scopes like that.
Of course, blocks-are-expressions is necessary if you want to return the result of your computations from the block and store it in a variable outside the block. (You can of course declare the storage variable outside the block and assign to it inside, but that's less nice.)
GNU C even has an extension called "statement expressions" where blocks can return values; the syntax looks like this:
    int foo = ({
        int bar = 4 * 3;
        bar;
    });
Clang implements it in addition to GCC, as do a few other compilers. (Notably, IIRC, MSVC does not.)
One caution on the use of blocks is that, at least for C++, you can end up with a lot of stack usage. While destructors follow strict lexical scoping, stack allocations are only guaranteed to be released at the end of a (non-inline) function. The compiler can reuse stack memory for multiple variables whose lifetimes don't overlap, but this isn't guaranteed. For an example of when reuse doesn't help:
    void someProc() {
        char buffer1[256];
        {
            char buffer2[256];
            // string ops
        }
        // we still have 512 bytes on the stack whereas it would only be
        // 256 if we used a non-inline function instead of a block
        someDeepCall();
    }
I'm not sure what Rust or Go do in cases like this.
Rust does the same thing as C++ here. Note that rustc is pretty aggressive about using the LLVM lifetime intrinsics to allow for stack coloring (i.e. reuse of stack slots as necessary).
A somewhat related technique which I often find useful is what’s known as an “immediately invoked function expression” (IIFE) in JavaScript. That also creates a sub-scope in place, but it lets you return values to the enclosing scope.
So it’s basically an anonymous function that is invoked right away. You could achieve the same scope separation by pulling it out as a named function, but sometimes I prefer keeping things closer together.
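FWIW the same trick translates to Rust, where an immediately invoked closure gives you a sub-scope that returns a value (and lets you use `?` locally without bailing out of the whole enclosing function). A rough sketch:

```rust
fn main() {
    // Immediately invoked closure: a sub-scope that yields a value.
    // `?` inside only exits the closure, not the enclosing function.
    let sum: Result<i32, std::num::ParseIntError> = (|| {
        let a: i32 = "40".parse()?;
        let b: i32 = "2".parse()?;
        Ok(a + b)
    })();
    assert_eq!(sum, Ok(42));
}
```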
This is really helpful in Go for `defer`. For example, if I'm manipulating files in a loop, I don't want to do:
    for _, fileName := range fileNames {
        f, err := os.Open(fileName)
        if err != nil {
            return err
        }
        defer f.Close()
        doSomething(f)
    }
... because I might run out of quota for open file handles. I want the defer to trigger at the end of the loop rather than at the end of the function, so I'll often put a closure in the loop body:
    for _, fileName := range fileNames {
        if err := func() error {
            f, err := os.Open(fileName)
            if err != nil {
                return err
            }
            defer f.Close()
            doSomething(f)
            return nil
        }(); err != nil {
            return err
        }
    }
That said, I don't like the ergonomics and if I'm doing a lot of file things, I'll write a `func withFile(fileName string, callback func(*os.File) error) error` function which often composes more nicely.
Conceptually yes, but IIUC there should be worlds of difference at the level of the compiled output (i.e. unlike calling a function, the compiler isn't obligated to set up or tear down a stack frame every time it enters or exits a block; it can apply the static scoping rules without any impact on the runtime semantics).
ETA: except for Go's `defer`, and off the top of my head I don't actually know if Go is obliged to run the defer immediately upon exiting the block or can choose to run it at some other point in the function.
> The scope of a constant or variable identifier declared inside a function begins at the end of the ConstSpec or VarSpec (ShortVarDecl for short variable declarations) and ends at the end of the innermost containing block.
Not necessarily for C++. At least with MSVC, you can use `__declspec(noinline)` on a lambda. This can be handy for complex macros that would otherwise allocate a bunch of memory.
I find this useful in unit tests where I often find myself duplicating blocks of assertions multiple times. The {block} helps to prevent accidentally reusing variables which are defined by previous blocks of assertions.
I rarely use a free-standing {block} in actual code. I think it's because if something is worthy enough to be logically grouped into a {block}, then it is probably worthwhile to pull it out into its own function, method or lambda expression.
That makes sense. You can explicitly `drop()` the lock guard instead, but the end of the block scope will do that for you.
If you aren't already, it's perhaps better to identify an object that's being locked, and have the Mutex wrap it. So e.g. you could have a Mutex<Goose>, and then functions, even methods, can take a reference to a Goose to ensure you can't call them by mistake without locking the Mutex - as you otherwise don't have a Goose. If the Goose doesn't need any actual data, this will be free at runtime: the compiler's type check ensures you took the lock as needed, but since Goose is a zero-sized type, no machine code is emitted to deal with a Goose variable.
Probably your application has a better name for what is being protected than Goose, that's just an example, but having some object that is locked can help ensure you get the locking right and that your mental model of what is being "locked" is coherent.
Of course sometimes there really is no specific thing being locked, even an imaginary one, it's just lock #4 or whatever but in my experience that's rare.
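A rough sketch of that pattern (the names are placeholders, of course):

```rust
use std::sync::Mutex;

// Placeholder type standing in for whatever the lock protects.
struct Goose {
    honks: u32,
}

// Can only be called with the lock held: the only way to get a
// `&mut Goose` is through the Mutex guard.
fn honk(goose: &mut Goose) {
    goose.honks += 1;
}

fn main() {
    let shared = Mutex::new(Goose { honks: 0 });
    {
        // The lock is held for exactly this block.
        let mut guard = shared.lock().unwrap();
        honk(&mut guard); // deref-coerces the guard to &mut Goose
    } // guard dropped here, lock released
    assert_eq!(shared.lock().unwrap().honks, 1);
}
```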
Other people in the comments have mentioned you can use functions to achieve some of the benefits of the {blocks}.
I wanted to point out how q/kdb handles this, because I think it's quite nice.
In kdb, blocks are how you define functions.
    {x+1};
That is a function which takes your argument, adds one to it and returns it. (x is the default name for the function argument, another lovely piece of design).
If you want to have a named function, just assign it to a variable:
    my_inc: {x+1};
Now you can call my_inc(1) and get back 2.
Light, consistent and reusable language design, very nice. No need to have two ways of defining functions (e.g. the needless separation of def and lambdas in Python).
The upshot of this is that you get these block constructs for free:
    myvar_a: 1;
    myvar_b: 2;
    my_top_level_var: {
        // less important work
    }[];
(The one downside is that you need to call the function with the [] brackets)
K4 is the implementation language of q; kdb is the database part of the stack. Most q is just named utility functions written in K4. There isn't really much difference in what the machine does with them under the covers, but when stuck on a problem, talking about q is more likely to get help than k.
You can provide up to 8 named inputs in a function definition.
This feature is also there in C and C++ and probably other languages. The catch is that JS seems to have inherited the syntax but not the scoping rules.
s/var/let/g and you're good. This is the entire reason "let" was added in the first place, to correct this historical mistake. It's supported pretty much everywhere even by extremely conservative standards, and it's even the same number of letters. Literally no reason not to use it.
I personally dislike JavaScript. Even modern JavaScript. But this is a bullshit reason to hate on JavaScript.
Every few years I'll jump into JavaScript for one thing or another, and recently I jumped back into TypeScript and holy cow it's shaping up to be a really nice language. I really like `const` as well as TypeScript's concept of type widening/narrowing (`"foo"` is its own type as a specific string but can be widened to `string`), which allows the compiler to know that `document.createElement("table")` returns an `HTMLTableElement` rather than `HTMLElement`--this could have been avoided by having a `document.createTableElement()` method with its own signature, but given that it has to work with older, dynamically typed APIs, this is a pretty elegant solution.
Similarly, if I have a discriminated union `type FooBar = "foo" | "bar";`, TypeScript seems to know that `if (["foo", "bar"].includes(x)) {...}` exhaustively handles all permutations of `FooBar` (no need for a `switch` statement with explicit arms).
The static typing really helps me avoid a bunch of "undefined is not a function" stuff that I would waste time with in JavaScript.
TypeScript's type language is extraordinarily powerful. It's completely reframed the way I do web development; I tend to do much more functional and less method-based semantics these days because interfaces and generic types make that feasible without going mad from losing track of what functions can be applied to what data (and receiving no help from the very lax type semantics and runtime of regular JavaScript).
It's in Java and D as well, which likely inherited it from C. I'm pretty sure C inherited it from ALGOL's "BEGIN ... END" blocks.
The catch with JS is that variable scoping rules are different when you use 'var', if you use 'let' and 'const' the scoping rules work like most programmers would expect for block statements.
I do this quite a bit in C and C++; I find it's a great way to reduce the mental effort required to understand a long function without having to jump around the source like when it is broken up into multiple functions.
In C++ you can really (ab)use it to do things like scoped mutex locks and "stopwatches" that start a timer on construction and print the elapsed time on destruction.
Some people find it a bit bizarre though, to each his/her own I guess.
If your C++ codebase uses RAII to manage locks and other expensive resources, I think it's fine to use blocks to define their scope. The alternative is to encapsulate the scope in a new function or method, which is great if it improves readability, but not if it reduces readability.
In Rust you can use these almost anywhere an expression is expected (if statements, while loops, function arguments, etc.). It can sometimes make code a bit unreadable to some people, but it's still a cool feature imo.
Conveniently in Rust because a block is an expression it lets you "convert" a statement (or sequence thereof) into an expression.
To be used sparingly, but very useful when it applies e.g. in the precise capture clause pattern, or for non-trivial object initialisation (as Rust doesn't have many literals or literal-ish macros).
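For instance, non-trivial initialisation without leaking the setup temporaries (a small sketch):

```rust
use std::collections::HashMap;

fn main() {
    // The block is an expression: `m` is visible only inside it,
    // and the block's final expression becomes the value of `lookup`.
    let lookup = {
        let mut m = HashMap::new();
        m.insert("one", 1);
        m.insert("two", 2);
        m
    };
    assert_eq!(lookup["two"], 2);
    assert_eq!(lookup.len(), 2);
}
```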
Here's one notable use that I doubt many are familiar with:
    let iter = some_iterator_expression;
    {iter}.nth(2)
For some reason, nth takes &mut self instead of self (note that T: Iterator => &mut T: Iterator<Item=T::Item> so there was no need for this in the API design to support having access to the iterator after exhaustion; it was a mistake). So if you tried to use it like iter.nth(2) with the non-mut binding, that would fail. But {iter} forces iter to be moved - unlike (iter) which would not. Then iter becomes a temporary and we're free to call a &mut self method on it.
In general: {expression} turns a place expression into a value expression, i.e. forcing the place to be moved from.
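A self-contained sketch of the trick:

```rust
fn main() {
    let iter = vec![1, 2, 3, 4].into_iter();
    // `iter.nth(2)` would not compile here: `nth` takes `&mut self`,
    // but `iter` is an immutable binding. `{iter}` moves the iterator
    // into a temporary value expression, which the compiler is free
    // to borrow mutably for the method call.
    let third = {iter}.nth(2);
    assert_eq!(third, Some(3));
}
```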
Interesting. I don't see any use for it (since adding `mut` to the `let iter` declaration sounds less annoying than dropping `iter` altogether) but I learned something nonetheless, thank you.
I love this feature. I use it mostly in large functions where I can group some lines as one logical piece that isn't generic enough to be a function. It helps when reading the code back and prevents leaking vars into the rest of the function scope. Such blocks end up becoming scattered single-use functions when extracted out, which isn't ideal.
Some of the error handling examples in Go are unnecessary, as Go allows you to access a variable defined in your if-statement in other branches. For instance:
    if result, err := something(); err == nil {
        if result.RowsAffected() == 0 {
            return nil
        }
    } else {
        return err
    }
These blocks have other advantages, such as having access to the outer scope. Yes, those values could be passed in as arguments to a function, but I think there are many cases where that extra overhead feels unreasonable given you have already gathered all of the values here for this purpose anyway.
Yep. And they also keep single-use code local to the only place it's being used at the moment, which i think can help the readability and maintainability of said code :)
Functions do have a place too of course; even single-use ones. Especially when you can give them a clear purpose and name.
It's more of an extension to "Don't use globals", but at a more fine-grained level. Code using this can still be DRY, if you understand DRY to be "don't repeat the same information excessively". Reusing the same variable name in multiple (similar or dissimilar) contexts is not a DRY violation in that sense if the names are more a coincidence than actual shared information.
Most programmers have no problem, for instance, with seeing multiple loops declared like:
    for (int i = ...; ...; ...) { ... }
`i` is "repeated", but there's no objection in having multiple declarations since each has a meaning dependent on its local context.
Functions fulfill the same function, but they move the code to a different place. Sometimes that's desirable for readability, sometimes it's more readable to keep things inline. It's nice to have both options
I like this for some things where the logic is somewhat large-ish, but also strongly coupled. A good example is setting up state for integration tests; I'm not going to re-use any functions I create for that. It's not bad to use functions, I just find it more convenient to keep it all in one function, but split out a little bit in these faux-subfunctions.
I don't use it often, but when I do, I find it convenient.
"A good example is setting up state for integration tests; I'm not going to re-use any functions I create for that."
And if it turns out I'm wrong about that, the braces provide a very nice, guaranteed cutting point in the future that anyone can understand without having to load all the context of the function.
This isn't the sort of thing I want splattered all over a code base, but it is a nice niche tool.
I’ve tried doing this a few times in a legacy codebase to contain the definition of `err` when it changes type (this team did not believe in the error interface for a while).
The problem is that it doesn’t stand out visually, and it’s uncommon, so less-experienced team members would have difficulty comprehending what’s going on. In the end we just opted to use two different variables for the two types of error.
I have thought about this, but sometimes I reuse variables on purpose to reuse the allocated memory, especially in a language like Go where it is not always clear whether something will be allocated on the stack or the heap. I guess we should avoid doing this in hot paths, right?
Best use I've found for this is to make sure a borrow of an Rc<RefCell<...>> is dropped when you want it to be. Otherwise I've run into "already borrowed" errors at runtime, because the borrow guard stays alive until the end of the scope while something else tries to borrow the same thing.
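A sketch of what I mean, assuming the usual Rc<RefCell<T>> setup (names made up):

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    {
        // The borrow is confined to this block; the RefCell guard
        // is dropped at the closing brace.
        let items = shared.borrow();
        assert_eq!(items.len(), 3);
    }
    // Without the block above ending the borrow, this borrow_mut()
    // would panic at runtime with "already borrowed".
    shared.borrow_mut().push(4);
    assert_eq!(shared.borrow().len(), 4);
}
```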
Ruby blocks are distinct from the blocks this article is talking about. They are equivalent to “closures” in most other languages. Ruby’s blocks and their ubiquity come from Smalltalk.
In the context of Rust, using a block can only strictly reduce the lifetime of a local, so if that would lead to a problem it would just not compile. :P
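The classic example (a sketch along these lines; the article's exact code may differ) is borrowing from a Vec and then pushing to it. On a modern toolchain this compiles fine:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // immutable borrow of v
    println!("first = {first}");
    // With non-lexical lifetimes, the compiler sees that `first`
    // is no longer used, so this mutable borrow is allowed.
    v.push(4);
    assert_eq!(v.len(), 4);
}
```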
However, try to compile this with Rust 1.0 (which you can get by running `rustup update 1.0.0` and then using `cargo +1.0.0 run` or `rustc +1.0.0`) and you'll get an error saying that you can't push another element onto the vector due to it already being borrowed by the slice. This is because the borrow checker previously assumed that any borrow would remain in use for the remainder of the scope. The "fix" for this was to manually put in a block to "end" the borrow early. However, a lot of work was done to make the borrow checker more sophisticated, and in Rust 1.31 (near the end of 2018: https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-201...), this work landed, allowing the compiler to recognize that a borrow is no longer used before the end of a scope and therefore to allow subsequent borrows that it previously would have considered conflicting.
All that being said, there's still a useful feature of blocks in Rust that I don't see mentioned in this blog post: blocks in Rust are actually expressions! The last expression in a Rust block becomes its value, and with a labeled block you can also `break` out early with a value (similar to how the return value of a Rust function is its last expression, but you can also explicitly `return` earlier). As an added bonus, this also works for `loop`; since a `loop` only terminates if `break` is explicitly invoked (unlike `while` or `for`), whatever value is given to `break` is yielded from the loop.
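A quick sketch of these forms (the labeled-block break is stable since Rust 1.65):

```rust
fn main() {
    // A block is an expression; its last expression is its value.
    let x = {
        let a = 2;
        let b = 3;
        a * b
    };
    assert_eq!(x, 6);

    // A labeled block can be exited early with a value.
    let y = 'found: {
        for n in 1.. {
            if n * n > 50 {
                break 'found n;
            }
        }
        0 // fallback value if the loop never breaks
    };
    assert_eq!(y, 8);

    // `loop` is also an expression; `break` supplies its value.
    let mut n = 1;
    let p = loop {
        n *= 2;
        if n > 100 {
            break n;
        }
    };
    assert_eq!(p, 128);
}
```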