Part of the reason most languages obscure this is that it's a moving target.

If a language lets you say, "this chunk of code here should run in 7 cycles", what happens when a new optimization finds a way to reduce that, or a new architecture comes along where that operation gets slower but lots of others get faster?

I'm not arguing against your desire, just explaining that it's not unreasonable that we're where we are now. We've gotten so used to language portability that it's good to remember how painful it can be to lose it. It's no fun having to rewrite all your code every time a new chip comes out.




This could only ever be doable with extremely simple architectures anyway. Off the top of my head, add in just one of branch prediction, micro-op fusion, out-of-order execution, pipelines and pipeline stalls, or cache misses, and this becomes impossible. Of course, this assumes you even know which CPU you are targeting and its specific instruction latencies.

That's already an extremely niche set of processors. Further, for the few bits of code where you care about this kind of extremely precise timing, you'll either examine the emitted assembly or just hand-write the ASM yourself.

It seems like a huge amount of effort for an extremely niche scenario. Remember, the ISA is still just an abstraction, after all.
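
For illustration, here's roughly what the "just hand-write the ASM" route looks like in C with GCC extended inline asm. This is a sketch assuming an ARM Cortex-M-class core; the per-iteration cycle count in the comment is an assumption about that core, and nothing in the toolchain verifies it:

    #include <stdint.h>

    /* Busy-wait for roughly 3*n cycles on an assumed simple in-order
       Cortex-M-style core (subs + bne per iteration). The count is only
       an estimate: flash wait states, branch behavior, etc. change it. */
    static inline void delay_cycles_approx(uint32_t n)
    {
        __asm__ volatile(
            "1: subs %0, %0, #1 \n"
            "   bne  1b         \n"
            : "+r"(n)
            :
            : "cc");
    }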


To add to that, there's also the difference between the cycles an instruction takes to execute (latency) and how many of those instructions can be in flight at once in the pipeline (throughput). So there is a difference between executing a set of instructions once versus executing them millions of times.
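
As a rough illustration (plain C, compiled with optimization, e.g. -O2; numbers are not guaranteed on any particular CPU): the two loops below execute the same number of multiply-adds, but the first is one long dependency chain bounded by multiply latency, while the second keeps four independent chains in flight so the pipeline can overlap them. Timing both typically shows a noticeable gap even though the instruction mix is nearly identical.

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    #define N 100000000ULL  /* number of multiply-adds in each version */

    /* Every iteration needs the previous result: throughput is limited by
       the multiply's latency, not by how many multipliers the core has. */
    static uint64_t dependent_chain(void)
    {
        uint64_t x = 1;
        for (uint64_t i = 0; i < N; i++)
            x = x * 3 + i;
        return x;
    }

    /* Same total work split across four independent chains, so the pipeline
       can overlap the multiplies instead of waiting on each result. */
    static uint64_t independent_chains(void)
    {
        uint64_t a = 1, b = 1, c = 1, d = 1;
        for (uint64_t i = 0; i < N; i += 4) {
            a = a * 3 + i;
            b = b * 3 + i + 1;
            c = c * 3 + i + 2;
            d = d * 3 + i + 3;
        }
        return a ^ b ^ c ^ d;
    }

    int main(void)
    {
        clock_t t0 = clock();
        uint64_t r1 = dependent_chain();
        clock_t t1 = clock();
        uint64_t r2 = independent_chains();
        clock_t t2 = clock();
        printf("dependent:   %.3fs (%llu)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC, (unsigned long long)r1);
        printf("independent: %.3fs (%llu)\n",
               (double)(t2 - t1) / CLOCKS_PER_SEC, (unsigned long long)r2);
        return 0;
    }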


In these times of dynamic frequency scaling, even the temperature of the room the computer is sitting in is going to affect the performance.

In practice I think hard real-time systems use extremely conservative estimates of CPU performance.


This is actually why you want the compiler to track it.

You write an algorithm that seems reasonable and encode timing constraints into it. Now you re-target to a different machine, and the compiler re-checks those constraints. That's a much cheaper way of dealing with weird timing bugs than doing a post-mortem of why the fuel valves didn't quite close on the day of the rocket test.

But this only works if the CPU is knowable to the compiler. Which is why it is only even remotely feasible in the embedded world (where it is also the most useful).
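
Here's a minimal sketch of what that might look like, with everything invented for illustration: WCET_CYCLES is a hypothetical annotation (no real compiler checks it, so it expands to nothing here), and the register address and cycle budget are made up. The idea is that a timing-aware toolchain for a known, simple in-order CPU could verify the bound at build time and fail the build when a re-target blows the budget:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical annotation: a timing-aware compiler for a known in-order
       CPU would check the bound; an ordinary C compiler just ignores it. */
    #define WCET_CYCLES(n)

    /* Made-up memory-mapped register for a fuel-valve controller. */
    #define VALVE_CTRL (*(volatile uint32_t *)0x40001000u)

    WCET_CYCLES(250)  /* constraint: worst case must stay under 250 cycles */
    void close_fuel_valve(bool emergency)
    {
        /* Both paths are trivially bounded; the point is that re-targeting
           to a different CPU would re-run the check at build time rather
           than silently changing the worst case. */
        if (emergency)
            VALVE_CTRL = 0x0;  /* hard shutoff */
        else
            VALVE_CTRL = 0x1;  /* staged close */
    }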


The compiler can only help so much, as many of the timing parameters are variable and runtime-dependent.



