Hacker News | nwlieb's comments

There's some ambiguity in argument destruction order, for example: https://stackoverflow.com/a/36992250

Similarly, the construction/destruction order for std::tuple elements is not well defined.

Granted, that's implementation-defined behavior, which is technically deterministic on a single compiler.


This isn't really about constructors/destructors. Expressions like function calls with multiple arguments have always been "unsequenced" with respect to each other. In other words, the order is left for the compiler to decide. It's always been like that, going back to C (and probably other languages). If you call f(x++, x++), the values that get passed to f are unspecified (and in C, and C++ before C++17, two unsequenced modifications of the same variable like that are outright undefined behavior).

I suppose the destruction of whatever the expressions constructed still happens in reverse order of construction.

But either way I might not even care. I'm aware that at the level of a single statement the execution sequence isn't fully defined, so I rarely put more than one mutating thing per expression -- or otherwise I make sure that I don't care about the order. I could live with any sequence, as well as with totally parallel execution.

Example: buf[i++] = 5 has two mutating sub-expressions, but I know it isn't messing anything up. I don't care whether i gets incremented before 5 is assigned or the other way around.


Say I wanted to rank my own personal collection of songs by retention/engagement -- are there any open source libraries or crisp descriptions of algorithms/statistical models that one could use?


The runtime is quadratic in the context length, although it seems like there is some progress on this front: https://gwern.net/note/attention


Is `(x - 1)` not a runtime cost if `x` is a runtime variable?


Not on many instruction set architectures -- addressing modes often support adding/subtracting a constant.


Is this a 1:1 comparison? If the ARM compiler is emitting ARM binaries, there might be less work / fewer optimizations, since it is a newer architecture. It seems like a test where two variables changed. It would be interesting to see them both cross-compile to their respective opposite architectures.


Maybe not, but A) it's close -- most of the work of compiling is not microarchitecture-level optimization or emitting code -- and B) if you're a developer, even if some of the advantage is being on an architecture that's easier to emit code for, that's still a benefit you realize.

It's worth noting that cross-compiling is definitely harder in many ways: you can't always evaluate constant expressions at compile time in the same way your runtime code will, and you have to jump through hoops.


As someone who knows relatively little about this, I'm very curious why this is downvoted. It seems like a rebuttal would be enlightening.


Hmm, my experience was that compiling C on ARM was always super fast compared to x86, because the latter had much more work to do.


This doesn't align with my experience. Clang is about the same, but GCC often seems much slower when cross-compiling to ARM.

  jar% time x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   0.97s user 0.02s system 99% cpu 0.992 total
  jar% time x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   0.93s user 0.03s system 99% cpu 0.965 total
  jar% time x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   0.94s user 0.01s system 99% cpu 0.947 total
  jar% time x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  x86_64-linux-gnu-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   0.92s user 0.04s system 99% cpu 0.955 total

  jar% time arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   1.43s user 0.03s system 99% cpu 1.458 total
  jar% time arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   1.46s user 0.03s system 99% cpu 1.486 total
  jar% time arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   1.55s user 0.04s system 99% cpu 1.587 total
  jar% time arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I ../../shared/api
  arm-linux-gnueabihf-gcc --std=c99 -O3 -c insgps14state.c -I inc -I   1.44s user 0.03s system 99% cpu 1.471 total


That’s interesting. I was not cross-compiling, so maybe the ARM system I was using was just faster.


So cross-compiling to RISC-V, POWER, or something else would be fair?


Apple has been optimizing the compiler for a decade for iOS.


If everything else is the same, that seems like a solid reason to prefer the ARM architecture even setting aside 1:1 comparisons. Isn't faster compilation and execution the whole point of a faster processor?


The assertion is that compilation might be faster because fewer optimizations are performed -- and that the generated code would therefore be slower at runtime.


Could you describe what makes the Google Fibers so nice?

I'm also really curious why they require modifications to the Linux kernel. My first guess would be stronger integration with the IO model at the syscall boundary (similar to io_uring).

Edit: is this the talk you're referring to? https://www.youtube.com/watch?v=KXuZi9aeGTw


You know how, the first time you learned about TCP sockets, you made a server that spawned a new thread to handle each incoming connection (or maybe not -- people learn differently nowadays)?

With the fibers implementation you can just do that. It doesn't kill your performance, and you don't need to go to a painful async model just for performance reasons.


Pretty much. You get to pretend inside your fibers that you're actually running threads. IIRC (it's been a while) you also get a proper stack trace when something barfs, the importance of which cannot be overstated.


What are they though? Is this a library for an existing language? A runtime scheduler like the one that does goroutines in Go? If it were open sourced, how would I use it?


It’s just a library that allows easier development of C++ servers in the synchronous, thread-per-request style, similar to working in Go but a bajillion times better because it’s not in Go.


All of the above, and more -- kernel enhancements. See the linked paper, they detail what they do for the kernel side at least.


> or maybe not, people learn differently nowadays

If by “nowadays” you mean ~2000 when I first learned socket programming (using select!)? ;-)


Yes, that's the one. Unfortunately it doesn't show any of the API details that a developer would be exposed to.


Related: is it possible to reliably maintain physical disk space quotas in Linux (similar to cgroups)?

Furthermore, is it possible to say how much "space" you would use if you were to create a file of a given size, accounting for block size, fragmentation, and metadata? Those factors, plus inode usage, seem to make this very difficult even with special integration in the userspace application, for example via stat or statfs. This could help prevent quota overruns.

These seem like hard problems unfortunately, and I suspect the best solution is to just create separate disk partitions for each quota group.


To the first question: quotas have been supported on Linux for a very long time. All major (and native) file systems support them.

To the second: disk-usage accounting that includes metadata as well as regular file data may or may not be tricky. ZFS always tells you how much data+metadata is used by a file, helped by metadata itself being dynamically allocated on ZFS like everything else. File systems like ext4 that have fixed metadata locations on disk don't report metadata allocation with the file; it wouldn't really be useful to see this information, since removing the file doesn't free any metadata in the ext4 case.


Project quotas appear to be more similar to cgroups. They are available in xfs and ext4 https://lwn.net/Articles/623835/


The traditional 4.2BSD-style quotas (1983!) on Linux also support quotas on Unix groups. Not sure if that's what you had in mind, but anyway.

I suppose project quotas as outlined here would allow multi-group support though.

Another option could be thinly provisioned COW LVM volumes.


In my experience this is only true for C/C++ (with a decent amount of work to set up CROSSTOOL properly). As soon as you get into Python, and Python<->C++ interop, it becomes very leaky.

I heard that Google has some tools internally that build the Python interpreter with Bazel and use that in order to guarantee hermeticity, but that doesn't seem to be possible with public tooling (at least not without some major hacks, for example https://github.com/bazelbuild/bazel/issues/4286 )

It would be interesting to see how Google manages languages such as Python at scale (and other languages that have similarly leaky package management).


What is an example of a 3-space embedding, or some interesting literature on it? I'm having difficulty googling the term.


A 3-space embedding is a representation optimized for efficient decomposition and computational geometry, ideal for scale-out analytics. This is an interesting design problem in that you can't achieve both with a single surface and they are mathematically incompatible (one requires a discrete surface, the other requires a real surface). A 3-space embedding is a dual surface representation engineered to make it easy to move between the surfaces as required by code. As the name implies, you are logically embedding a standard 2-spheroid in a synthetic discrete 3-space and both coordinate systems can be used simultaneously. Presentation requires computing a projection of some sort.

Unlike single-surface representations, these have the advantage of being essentially free of computational edge cases if you design them correctly. They are also amenable to implementations that are extremely computationally efficient to use, which is a bit of an afterthought for most presentation-optimized designs but important for high-scale geospatial analytics.

A common reflexive criticism of these representations is that they use equal volume sharding, which means that sharding them is not a good approximation of equal area on the embedded surface. An equal area decomposition only makes sense in the context of presentation (e.g. tiling) because the underlying data distribution is naturally extremely and unpredictably skewed, leading to non-uniform cell loading no matter how you decompose it. The assumption that equal area decomposition helps to ensure uniform cell loading is trivially false in practice, making it a non-optimization. Therefore, any competent implementation always requires a separate mechanism for ensuring uniform loading independent of the decomposition model.

The term of art for all of this is discrete global grid systems (commonly "DGGS"). The vast majority of the literature is focused on presentation optimized systems, and the design of single-surface representations, but other types of representations are discussed. It has a very rich taxonomy. I have an article I've been sporadically writing which I should probably finish that steps through the design of a state-of-the-art 3-space embedding representation system for scale-out analytics, based on a (currently stalled) effort to produce a formal standard for industry. A good 3-space embedding has a relatively simple description and implementation but there is much technical subtlety as to why it is designed a specific way.


I'm guessing he means stuff like Voronoi tessellation, which isn't limited to 3-space. Look at the books of Hanan Samet for more on this: http://www.cs.umd.edu/~hjs/



ROS (Robot Operating System) is not an RTOS (Real-Time Operating System). ROS itself is very non-deterministic and makes no guarantees about scheduling.

