All of this makes a lot of sense. But to add: a semi-professional piano player I know once gave me a heuristic that can help sometimes, namely “the lowest fifth contributes the most to what the chord feels like”. But you can still get into discussions about whether something is, say, a Dm/F or F6 (and I see a lot of tabs where I disagree with the author's choice).
(semi-professional piano player here) That's interesting. I've never heard that...not even sure what "lowest fifth" would mean. I would think that the root and the third contribute the most, right? The fifth is sometimes even omitted. I'm very curious how this might work. Do you have any examples?
I guess it means in the soundscape as a whole, potentially including multiple instruments. But to keep it to a single piano: e.g. if you play B3 D4 G4, it won't sound like a B chord (neither minor nor major) because there's no fifth above B. Instead, you'll hear D5 as an overtone of D4, and G4-D5 will be the lowest fifth that you hear, so it is readily interpreted as G/B. If the third were the most important, this would have been a Bm, which it pretty clearly isn't unless you add an F# (or some other instrument in the mix provides one).
An important point here is that for certain languages, using the original grammar is pretty much impossible. In particular, for C, you want to do diffing and merging on the un-preprocessed source, but the language's grammar very much assumes the source has gone through the preprocessor.
Of course, the existence of the preprocessor means there are situations where it's completely impossible to know what the correct parse is; it will necessarily be heuristic in some cases.
Using clangd for this very much assumes that you have all include files and build flags available at the point of merge resolution, which is often pretty much impossible.
Agreed, it's not very robust the way tree-sitter is. I just thought that libclang trick was very useful for extracting semantic information from C source.
You can render the entire string before upload, but then you're essentially doing a CPU render, which will be slower than having the GPU do the same thing.
FWIW, this method is also a texture despite being called “texture-less”; the texture is just stored in a different format and a different place. True textureless font rendering evaluates the vector curves on the fly.
Does TPC-H SF1 really take _one and a half hours_ for you on regular Postgres? Last time I tried (in the form of DBT-3), it was 22 queries and most of them ran in a couple seconds.
Interesting. I haven't used the DBT-3 kit; does it add any indexes? I manually added these Postgres indexes https://github.com/BemiHQ/BemiDB/blob/main/benchmark/data/cr... to address the main bottlenecks; on SF0.1 that brought the total time down from 1h23m13s to 1.5s. But SF1 still took more than 1h.
It adds a bunch of indexes, yes. I don't think anyone really runs TPC-H unindexed unless they are using a database that plain doesn't support it; it wouldn't really give much meaningful information.
Edit: I seemingly don't have these benchmarks anymore, and I'm not going to re-run them now, but I found a very (_very_) roughly similar SF10 run clocking in around seven minutes total. So that's the order of magnitude I would be expecting, given ten times as much data.
I've tried --call-graph lbr a bunch of times, but often, it… just returns junk? I don't fully understand why, it sometimes returns wild pointers even if you don't have deep stacks.
I often get junk when sampling without lbr. Which kernel are you running? The quality of perf and the associated perf_events varies wildly across kernel versions.
> - 5% of system-wide cycles spent in function prologues/epilogues?

That is wild, it can't be right.
TBH I wouldn't be surprised on x86. There are so many registers to be pushed and popped due to the ABI that every time I profile stuff I get depressed… AArch64 seems to be better; the prologues are generally shorter when I look at those. (There's probably a reason why Intel APX introduces push2/pop2 instructions.)
This sounds to me more like an inlining problem than an ABI problem. If the calls take as much time as the running, perhaps you just need a better language that doesn't arbitrarily prevent inlining due to compilation boundaries (e.g. basically any modern language that isn't in the C/C++ family, before LTO).
I see this in LTO/PGO binaries as well. If a function is 20 instructions long, it's not like you can inline it uncritically, yet a five-cycle prologue and a five-cycle epilogue will hurt. (Also, recursive functions etc.)