About the only time that you have to be really careful with stuff like this is if you're writing something that's super sensitive to GC pauses, such as an XNA game. In those cases, yield return, LINQ, lambdas, and some foreach loops will all generate short-lived objects that cause the GC to kick in more often. So if you're doing that in every update loop you could end up with performance issues. And even that is only on some platforms, as the desktop GC does a great job of dealing with short-term garbage.
But in most cases such as ASP.NET MVC actions, or doing an API call in a desktop app, the maintenance and high level code simplification you can get by using these techniques far outweighs the low level "cost" of the code generated by the compilers.
It's good to know what's going on under the covers though.
The C# compiler one-ups F# here and will cache the delegate for lambdas, so long as they don't capture any locals (i.e., nothing gets "lifted" into a closure) - so lambdas don't necessarily mean an extra object. (Although the LINQ methods still need enumerables and enumerators.)
When did they introduce caching for lambdas? I've been caching them by hand since I used to see them pop up in CLR Profiler all the time. Is it unable to cache lambdas constructed in member functions because it can't be sure they don't close over 'this'?
It's been years, at least - maybe from the beginning? I just tried a small program with a lambda inside a member function. Works fine, generates IL like:
    IL_0001: ldarg.0
    IL_0002: ldsfld class [mscorlib]System.Func`2<int32,int32> test.Program::'CS$<>9__CachedAnonymousMethodDelegate1'
    IL_0007: brtrue.s IL_001c
             // create the delegate and store it in the cache field
    IL_001c: // load the delegate from the field and call it
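You can also observe the caching from C# rather than IL. A minimal sketch (the method names are made up for illustration): comparing by reference shows the non-capturing lambda is reused, while the capturing one is allocated per call.

    using System;

    class Program
    {
        // Non-capturing lambda: the compiler can cache the delegate in a
        // static field and hand back the same instance on every call.
        static Func<int, int> MakeNonCapturing() { return x => x * 2; }

        // Capturing lambda: closes over a local, so a closure object (and
        // a fresh delegate) gets allocated each time this runs.
        static Func<int, int> MakeCapturing(int n) { return x => x + n; }

        static void Main()
        {
            Console.WriteLine(ReferenceEquals(MakeNonCapturing(), MakeNonCapturing())); // True
            Console.WriteLine(ReferenceEquals(MakeCapturing(1), MakeCapturing(1)));     // False
        }
    }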
F# will do neat stuff like completely eliminate the lambda, if you're only using it locally. It even does some constant detection across functions, so it can calculate constant functions at compile time. But if not, F# will create a new closure object every time.
Reminds me of an experience I had. I kind of naively wrote something as a method that would "yield return" bytes. After all, everyone is familiar with the attitude that you should write what looks nicest and worry about bottlenecks later. I'm personally not usually too big on that attitude (I think it's often an overused excuse for obviously bad code), but "yield return" does let you write some very natural-looking stuff.
So later I compared it to a very simple for loop operating against a byte[], and had both versions work against a 10MB buffer (not unrealistic input in my use case). The throughput of the "yield return" version was something like a fifth of the loop's. I didn't try too hard to track down the precise cause at the time (maybe it was GC from lots of temporaries, as you say? I was pretty CPU-bound, and thinking it could have had more to do with a straightforward loop generating code with fewer jumps after JIT); I just took the faster version.
A for loop will have bounds checking removed, for one. Also, the JIT does better inside single method bodies. Compare:
    1: for (var i = 0; i < arr.Length; i++) sum += arr[i];
    2: foreach (var x in generate(count)) sum += x;
In the first case, you end up with a small, relatively tight loop (the machine code has a lot of extra stuff I don't quite understand).
In the second case, you're literally doing a virtual call (and I don't think the CLR inlines interface calls) to MoveNext() on every iteration, followed by one to get_Current(). So there's the overhead of two function calls per iteration, not to mention the actual code inside MoveNext and get_Current.
Here's the sample program[1]. I get about 600% runtime for the enumerable version versus the array. Given all the extra work, I'm sorta impressed it's only 6x. The 32-bit JIT is a bit slower doing the array method than the 64-bit, which surprises me. Here's the code for the loops[2].
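For reference, a rough reconstruction of that kind of benchmark (a sketch only; Generate is a stand-in for the linked sample, and the numbers vary a lot by JIT and platform):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Bench
    {
        static IEnumerable<int> Generate(int count)
        {
            for (var i = 0; i < count; i++)
                yield return i;
        }

        static void Main()
        {
            const int count = 10000000;
            var arr = new int[count];
            for (var i = 0; i < count; i++) arr[i] = i;

            long sum = 0;
            var sw = Stopwatch.StartNew();
            for (var i = 0; i < arr.Length; i++) sum += arr[i];
            Console.WriteLine("array:      {0} ms (sum={1})", sw.ElapsedMilliseconds, sum);

            sum = 0;
            sw.Restart();
            foreach (var x in Generate(count)) sum += x;
            Console.WriteLine("enumerator: {0} ms (sum={1})", sw.ElapsedMilliseconds, sum);
        }
    }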
A "yield return enumerable" is "delay executed" so evaluating it multiple times causes the yield-return execution to occur at every evaluation (if you haven't taken care to ToArray() or ToList() it). Do something like:
    var myBytes = GetMyBytesItr();
    for (var i = 0; i < myBytes.Count(); ++i)
    {
        ProcessByte(myBytes.ElementAt(i));
    }
and you'll be in a world of hurt, especially if GetMyBytesItr() allocates a memory buffer. Count() causes the "get the bytes" work to run, as does every call to ElementAt(). Now I'm not saying this is definitely what you were experiencing, but it's a common pitfall. Also, using ElementAt() on each iteration is, of course, completely contrived for this example (you'd want to foreach instead, which causes only one execution of the "get the bytes" work).
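For contrast, the non-pathological version runs the generator exactly once (same hypothetical names as above):

    // foreach pulls each byte once; the iterator body runs a single time
    foreach (var b in GetMyBytesItr())
    {
        ProcessByte(b);
    }

    // or materialize up front if you need random access:
    var myBytes = GetMyBytesItr().ToArray();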
That wasn't the issue in my case. It was foreach (var b in f()). f did not do any allocations, just a loop with a bunch of yield return statements.
This is why I suspected that it was simply worse machine code after JIT. But I wasn't sure of all the details of the code that was inserted on my behalf.
The compiler transforms a foreach over an array into a normal for-loop that accesses the array directly. So that's already an advantage for the array over your enumerator.
The other difference that will have a major performance impact is the way the IEnumerator pattern works, even with generics. For:
    foreach (var b in f())
The generated code looks roughly like this:
    using (var enumerator = f().GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            var b = enumerator.Current; // a get_Current() call under the hood
        }
    }
As a result, you've gone from 0 method calls per iteration (direct array access) to 2 virtual method calls per iteration (MoveNext and get_Current). An incredibly smart JIT might be able to figure out that the virtual methods are always the same and turn them into static invocations, or even inline them, but I don't think the CLR can do this for you reliably.
The ILSpy team's hard work is part of what made it possible for me to write my .NET -> JS compiler (http://jsil.org/). My ~120k LoC wouldn't work without their ~450k LoC (well, I don't consume all 450k...)
ILSpy is a pretty interesting application/library to look at under the hood. The decompilation logic that transforms .NET bytecode (MSIL) into higher-level data structures is split into a bunch of well-defined transform phases that run in a pipeline, which means you can actually step through the pipeline and watch it improve the readability and semantic clarity of the IL one step at a time. It's an incredibly valuable debugging tool and really useful for understanding how this kind of decompiler works, and it was a big influence on how I ended up designing the similar parts of my compiler.
As a whole, I think ILSpy demonstrates just how valuable it is to have a really well-specified instruction set sitting beneath your compiler and runtime. MSDN's documentation for the instruction set is clear and understandable, and libraries like Cecil and ILSpy make it easy to load, manipulate, and save compiled code for whatever your purposes might be: runtime code generation, machine transforms of compiled code, obfuscation, deobfuscation, or outright cross-compilation.
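As a small taste of how low the barrier is, here's a minimal sketch using Cecil that loads an assembly and dumps the IL of every method body (assumes the Mono.Cecil library is referenced):

    using System;
    using Mono.Cecil;

    class DumpIL
    {
        static void Main(string[] args)
        {
            // load the assembly from the path given on the command line
            var assembly = AssemblyDefinition.ReadAssembly(args[0]);
            foreach (var type in assembly.MainModule.Types)
            foreach (var method in type.Methods)
            {
                if (!method.HasBody) continue; // skip abstract/extern methods
                Console.WriteLine(method.FullName);
                foreach (var instruction in method.Body.Instructions)
                    Console.WriteLine("  " + instruction); // e.g. "IL_0000: ldarg.0"
            }
        }
    }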
Yeah. I consume the munged IL that comes out of their transform pipeline (though for complex reasons, I don't use all of it - some of their transforms are destructive in ways that aren't helpful, or I'd have to undo them) which saves me the trouble of reimplementing things they already figured out, like how to transform most branch/jump patterns into if statements and while loops.
I could generate JS from raw IL (and other projects like Volta did just that) but ILSpy gives me a huge head start in terms of producing JS that actually looks like what you'd write by hand. For loops instead of while loops, switch statements instead of cascading ifs, etc.
Volta, from the demo I used a long time ago, seemed horrendously slow, too. JSIL feels far faster. I guess you gain a bit of performance by making the code higher level so the JS engines can tell if an optimization is safe.
Is JSIL limited to a subset of IL? Can you compile C++ (in pure mode) to it? Opcodes like cpblk, and others?
Volta's approach seems to be essentially implementing a low level .NET runtime and representing the bytecode as JS. It gives you some cool stuff for free (for example, that low level approach means their type system works almost exactly like .NET's universally) but it does indeed mean that you have to work harder to give JS engines an opportunity to optimize your code. It makes integration tougher, too. In comparison, the best way to describe JSIL's approach is trying to express .NET concepts on top of JS, so it uses JS standard libraries and types wherever possible.
JSIL is theoretically limited in that there are things expressible in IL that you simply can't do in a browser. However, out of all the executables I've run the compiler on so far, very little of the IL they contain is actually impossible to translate - the tricky patterns and opcodes seem to get used only occasionally in one or two methods.
Some parts are definitely harder than others; I've only recently gotten support for pointers and the 'unsafe' class of C# features working: http://jsil.org/try/#5055026 and that's only covering a subset of all the different opcodes defined for doing interesting things with pointers and references. For example, function pointers will probably never work, and IIRC there are a few opcodes dedicated to interacting with those.
Inside every language there is a minimal core language: all the syntactic sugar can be translated into a subset of the language.
e.g. the C# compiler turns "var a = 3;" into "int a = 3;". It turns anonymous lambdas into methods with generated names (i.e. it turns one line of code into 3-4 lines), and generates classes whose constructors do the environment capture when needed. It turns "yield return" generator methods into objects that carry state.
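A hedged sketch of the environment-capture transform (the real generated names are unspeakable things like '<>c__DisplayClass1'; these are simplified):

    // Source:
    //     int n = 5;
    //     Func<int, int> f = x => x + n;

    // Roughly what the compiler generates:
    sealed class DisplayClass
    {
        public int n;                                // captured local, lifted to a field
        public int Lambda(int x) { return x + n; }   // the lambda body as a method
    }

    // ...and the original method body becomes:
    //     var closure = new DisplayClass();
    //     closure.n = 5;
    //     Func<int, int> f = closure.Lambda;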
While async/await is a cool feature and is worth using, it is worth noting the upward trend in the complexity of generated code: lowering async/await into the simpler language without this feature produces significantly more code than previous new features did.
How much code does an await statement give rise to? It looks like about 40-50 lines to me.
You need to compare it to the alternatives. For instance, how much IL would be produced by creating Tasks with continuations?
Async/await makes it easier to write asynchronous code in a traditional linear manner. It's easy to shoot yourself in the foot, but it's just as easy with the alternatives.
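A rough source-level comparison of the two shapes (the IL gap is much larger, since await also emits a whole state-machine type; the continuation version glosses over error handling and context capture):

    using System.Net.Http;
    using System.Threading.Tasks;

    class Demo
    {
        static readonly HttpClient client = new HttpClient();

        // With async/await: the compiler builds the state machine for you.
        static async Task<int> LengthAsync(string url)
        {
            var body = await client.GetStringAsync(url);
            return body.Length;
        }

        // Roughly the pre-await equivalent, wiring the continuation by hand.
        static Task<int> LengthWithContinuation(string url)
        {
            return client.GetStringAsync(url)
                         .ContinueWith(t => t.Result.Length);
        }
    }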
Personally I think the async/await model is too limiting. E.g. let's talk about this:
    var result = await client.GetStringAsync("http://msdn.microsoft.com");
From the perspective of the caller, that code is still blocking. It solves the problem of threads being blocked, but that's only one of the problems you have.
The real and more complete alternative would be to work with a well-designed Future/Promise framework, like the one in Scala [1], which is also usable from Java [2]. Doing concurrent computations by means of Futures/Promises is like working with Lego blocks.
Let me illustrate with a piece of code that's similar to what I'm using in production. Let's say that you want to make 2 requests to 2 different endpoints that serve similar results. You want the first one that completes to win (like an auction), but in case of a returned error, you want to fall back on the other one (type annotations and extra verbosity added for clarity):
    val url1 = "http://some.url.com"
    val url2 = "http://other.url.com"

    // initiating first concurrent request
    val fr1: Future[Response] = client.get(url1)
    // initiating second concurrent request
    val fr2: Future[Response] = client.get(url2)

    // the first falls back on the second in case of error
    val firstWithFallback: Future[Response] =
      fr1.fallbackTo(fr2)
    // the second falls back on the first in case of error
    val secondWithFallback: Future[Response] =
      fr2.fallbackTo(fr1)

    // pick a winner
    val finalResponse: Future[Response] =
      Future.firstCompletedOf(firstWithFallback :: secondWithFallback :: Nil)

    // process the result
    val string: Future[String] = finalResponse.map(r => r.body)

    // in case both endpoints failed, log the error and
    // fall back on a local copy
    val stringWithFallback: Future[String] = string.recover {
      case ex: Exception =>
        logger.error(ex)
        File.read("/something.txt")
    }
Given an HTTP client that's based on NIO, the above code is totally non-blocking. You can do many other crazy things. Like you can wait for several requests to complete and get a list of results in return. Or you can try the first url and if it fails, try the second and so on, until one of them succeeds.
In the context of web apps, you can prepare Future responses this way and return them when they are ready. Works great with the async response support in Java Servlet 3.0, or with a framework such as Play Framework 2.x, where you can simply return a Future[Response] straight from your controllers [3].
I suspect that you could do something very similar in C# - GetStringAsync returns a Task<string>, and you don't have to await it right away or one by one. So you can do:
    var urlTask1 = client.GetStringAsync(url1);
    var urlTask2 = client.GetStringAsync(url2);
    // the generic WhenAny overload preserves the Task<string> type
    var firstDone = await Task.WhenAny(urlTask1, urlTask2);
These also return Task<T> objects that you can work with further or await. I'd be very surprised if you can't do the equivalent, also in a totally non-blocking way.
I haven't used tasks heavily yet so I'm not sure how to implement everything in your example. But I believe it's all possible.
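For what it's worth, here's a hedged sketch of roughly the same pipeline with tasks (the FallbackTo helper is hand-rolled, and the error handling is simplified compared to the Scala version; "something.txt" stands in for the local copy):

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Threading.Tasks;

    static class Fallbacks
    {
        static readonly HttpClient client = new HttpClient();

        // Try 'primary'; if it faults, fall back on 'secondary'.
        static async Task<string> FallbackTo(Task<string> primary, Task<string> secondary)
        {
            try { return await primary; }
            catch { return await secondary; }
        }

        public static async Task<string> GetWithFallbacks(string url1, string url2)
        {
            // initiate both concurrent requests
            var fr1 = client.GetStringAsync(url1);
            var fr2 = client.GetStringAsync(url2);

            var firstWithFallback = FallbackTo(fr1, fr2);
            var secondWithFallback = FallbackTo(fr2, fr1);

            try
            {
                // pick whichever composed task completes first
                var winner = await Task.WhenAny(firstWithFallback, secondWithFallback);
                return await winner;
            }
            catch (Exception ex)
            {
                // both endpoints failed: log and fall back on a local copy
                Console.Error.WriteLine(ex);
                return File.ReadAllText("something.txt");
            }
        }
    }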
Async/await is "just" syntactic sugar for tasks. So I'm not sure what you mean by "still blocking from the perspective of the caller". When you call an async method, you get back a Task. When you use await on a task, the compiler rewrites your code to introduce continuations. If you need some more powerful Task features hidden by the syntactic sugar, you always still have the option of using tasks explicitly.
The interesting thing about async / await is that for the coder, the code looks linear, but it doesn't execute that way. So a gap is opened up between "the perspective of the coder" and the perspective of the machine.
I think that bad_user means that if you await one client.GetStringAsync and then await a second client.GetStringAsync, the second GetStringAsync only starts after the first one completes, so the first request "blocks" the second "from the perspective of the caller". I.e. no parallelism.
Of course, you don't have to do that. Code elsewhere in this thread.
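Concretely, the difference between the two shapes (inside an async method, with client and the urls as in the snippets above):

    // sequential: the second request doesn't start until the first completes
    var a = await client.GetStringAsync(url1);
    var b = await client.GetStringAsync(url2);

    // concurrent: kick both off first, then await
    var ta = client.GetStringAsync(url1);
    var tb = client.GetStringAsync(url2);
    var results = await Task.WhenAll(ta, tb);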
It is neat that C# has some standard macros defined. It would be nicer if it were more explicit about what is going on behind the scenes. I would love to see the equivalent of c-macro-expand. [0]
Also, although it is possible to extend C# using stuff like custom linq providers, [1] I still prefer lisp style macros.
Yep. The C# design attitude seems to be that the compiler-writers have the ability to define such macros, but the C# compiler users don't. Compare with F#.
The technique exists now [0]. It is kind of a pain to use and seems kind of dangerous in a web app. But it is there. I messed with this a few years ago.
"While async/await is a cool feature and is worth using, it is worth noting the up trend in complexity of generated code – and ansyc/await generates significantly more code in the simpler language without this feature than previous new features did."
This is the case with all high-level language features. Full lexical scoping with first-class functions forces the PL implementation either to generate a ⟨code, environment⟩ pair for every function value, or to perform aggressive closure conversion (up to Stalin levels) on top of that. Virtual method dispatch gives rise to inline caches, often polymorphic ones. Lazy evaluation forces you to generate thunks. Pattern matching forces you to generate a decision tree or a similar structure. The list goes on. You already have such things in C#, even without async calls.
"yield return" also makes a state machine, but a simpler one. I like async/await, but I am aware of the steep upward trend in internal complexity of these features.
IMHO this means that the fruit of these features is no longer so low-hanging.
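To make the "simpler state machine" concrete, here's roughly what a trivial yield-return method becomes (names simplified; the real compiler-generated type also juggles thread IDs and hands out fresh enumerator instances):

    using System;
    using System.Collections;
    using System.Collections.Generic;

    // Roughly what the compiler generates for:
    //     static IEnumerable<int> UpTo(int n) {
    //         for (var i = 0; i < n; i++) yield return i;
    //     }
    sealed class UpToStateMachine : IEnumerable<int>, IEnumerator<int>
    {
        private readonly int n;
        private int state;    // which yield point we're paused at
        private int current;  // backing field for Current
        private int i;

        public UpToStateMachine(int n) { this.n = n; }

        public bool MoveNext()
        {
            switch (state)
            {
                case 0: i = 0; goto check;   // first call: initialize the loop
                case 1: i++; goto check;     // resumed after a yield: advance
                default: return false;       // finished
            }
            check:
            if (i < n) { current = i; state = 1; return true; }
            state = -1;
            return false;
        }

        public int Current { get { return current; } }
        object IEnumerator.Current { get { return current; } }
        public IEnumerator<int> GetEnumerator() { return this; }
        IEnumerator IEnumerable.GetEnumerator() { return this; }
        public void Reset() { throw new NotSupportedException(); }
        public void Dispose() { }
    }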
ILSpy is probably one of the best utility programs I have ever used. It has saved me countless times from poorly documented APIs and helped me work around countless bugs in third-party libraries.