nathanrf's comments

nathanrf · 2025-03-11T17:55:54 1741715754

I'm not involved in this rewrite, but I made some minor contributions a few years ago.

TSC doesn't use many union types; it's mostly OOP-ish down-casting or chains of if-statements.

One reason for this, I think, is performance: most objects are tagged by bitsets in order to pack more info about the object without needing additional allocations. But TypeScript can't really (ergonomically) represent this in the type system, so you don't get any really useful unions.

A lot of the objects are also secretly mutable (for caching/performance) which can make precise union types not very useful, since they can be easily invalidated by those mutations.

nathanrf · 2024-06-25T16:28:22 1719332902

The quoted paragraph is correct as written. E.g. if

f: ∀A, B. A -> B -> A

Then we have poly type π = "∀A, B. A -> B -> A".

The monotype μ = Str -> Int -> Str is a specialization of π, so we are also permitted to judge that f has type μ.

The type specialization operator "looks backwards", where we write sorta "π ≤ μ" because there's a kind of "subtyping" relationship (though most HM type systems don't actually contain subtyping) - anywhere that a term of type μ is used, a term of type π can be used instead because it can be specialized to μ.
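As a rough illustration only (C++ templates are not Hindley-Milner, and `f`, `Mu` and `specialize_f` are names made up for this sketch), the template below plays the role of π, and choosing A = std::string, B = int produces something of type μ. The function is uncurried because C++ functions take all their arguments at once.

```cpp
#include <string>

// Loose analogue of the polytype  π = ∀A, B. A -> B -> A.
template <typename A, typename B>
A f(A a, B /*ignored*/) {
    return a;
}

// The monotype  μ = Str -> Int -> Str, written as a function-pointer type.
using Mu = std::string (*)(std::string, int);

// Specializing π to μ by picking A = std::string and B = int.
inline Mu specialize_f() {
    return &f<std::string, int>;
}
```

Anywhere a `Mu` is expected, `f` can be supplied after specialization, which is the "π ≤ μ" direction described above.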

I think this is a good complementary article: https://boxbase.org/entries/2018/mar/5/hindley-milner/ (it also links to three more academic sources for more context/detail)

nathanrf · 2024-04-07T02:34:12 1712457252

The paper defines them as programs in a process calculus (which is fairly standard as far as theory for protocols is involved):

  Definition 1 (Asserted protocols) Asserted protocols, or just protocols for short, are
  ranged over by S and are defined as the following syntax rules:
  S ::=
      | p.S                  action prefix
      | +{ l_i : S_i }_i∈I   branching
      | µt.S                 fixed-point
      | t                    recursive variable
      | end                  end
      | assert(n).S          assert (produce)
      | require(n).S         require
      | consume(n).S         consume

Process calculi are "fundamental" descriptions of computation, analogous to lambda calculus but oriented around communication instead of function calls. (As far as paper structure goes, I find that the important "basic" definitions in programming language research papers are usually in Section 2, since Section 1 serves as a high-level overview.)

Basically, a protocol consists of a sequence of sends/receives on a particular channel, mixed with some explicit logic and loops/branches until you reach the end. There are some examples in Section 2.1 which are too complicated to reproduce here.

As a general note on reading protocols- for (good, but industry-programmer-unfriendly) technical reasons they're defined and written as "action1.action2.action3.rest_of_program" but mentally you can just rewrite this into

  {
    action1();
    action2();
    action3();
    ... rest_of_program ...
  }

(in particular, making "the rest of the program" part of each statement makes specifying scope much easier and clearer, which is why they don't just use semicolons in the first place)
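To show the "rest of the program is part of each statement" shape, here is a small sketch in C++. `Prefix`, `prefix` and `flatten` are hypothetical names invented for this example and are not from the paper; a null pointer stands in for `end`, and branching, recursion and assertions are left out.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One "action prefix" node p.S: an action plus the rest of the protocol.
struct Prefix {
    std::string action;
    std::unique_ptr<Prefix> rest;  // nullptr plays the role of `end`
};

// Builds action.rest
inline std::unique_ptr<Prefix> prefix(std::string action,
                                      std::unique_ptr<Prefix> rest) {
    auto node = std::make_unique<Prefix>();
    node->action = std::move(action);
    node->rest = std::move(rest);
    return node;
}

// Rewrites action1.action2.action3.end into a flat list of statements.
inline std::vector<std::string> flatten(const Prefix* p) {
    std::vector<std::string> statements;
    for (; p != nullptr; p = p->rest.get()) {
        statements.push_back(p->action + "();");
    }
    return statements;
}
```

Because each node owns everything that follows it, the scope of anything introduced by an action is simply "the `rest` pointer", with no extra bookkeeping.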

hyperthesis · 2024-04-07T09:37:59 1712482679

Thanks for your guidance! I now see they're (now-obviously!) things like TCP and http. I had missed their informal definition:

> Here we use the term protocol to denote a specification of the interaction patterns between different system components. [...] To give a more concrete intuition, an informal specification of a protocol for an e-banking system may be as follows: The banking server repeatedly offers a menu with three options: (1) request a banking statement, which is sent back by the server, (2) request a payment, after which the client will send payment data, or (3) terminate the session.

I would say it's like how you use an API.

nathanrf · 2024-04-01T15:31:53 1711985513

Feral != Native. There are no honeybee species native to North America; all honeybees, including the feral ones, are descended from colonies imported from Europe (e.g. the "Western honeybee", Apis mellifera).

There are a lot of non-honeybee bee species native to North America though, and they now face competition from feral domesticated honeybees. It's unclear exactly how much impact they have- some research does treat honeybees as a harmful invasive species (similar to many other human-introduced species).

Native bee species have a harder time getting good PR because they don't directly work for us, even though they are important pollinators for some native plants.

giantg2 · 2024-04-01T15:40:17 1711986017

I think there at least used to be honey producing bee species in the southern US. I believe some are extinct now and others are only present in South America now.

Baeocystin · 2024-04-01T21:26:35 1712006795

Mayan honeybees are native to the Yucatan, FWIW. They've also spread to Cuba.

mattmaroon · 2024-04-01T21:11:27 1712005887

South America has the stingless variety and I’m very jealous.

nathanrf · 2024-04-01T01:07:08 1711933628

It is a good thing that cppfront lets you do that, then!

Cppfront generates #line directives which tell the compiler which source lines to "blame" for each piece of code in the generated .cpp file. This isn't something new and fancy for cppfront; it's a bog-standard preprocessor directive that your debugger already understands. So it will work exactly the same as your current debugging workflow, even if you mix cpp and cpp2 source files.
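A minimal sketch of the mechanism, assuming a made-up file name `example.cpp2` and line number 42 (neither is taken from real cppfront output): after a `#line`, the compiler attributes the following lines to the named file and line, which is what ends up in diagnostics and debug info.

```cpp
#include <string>

// Hypothetical generated code: the directive below claims that the next
// line is line 42 of "example.cpp2".
#line 42 "example.cpp2"
inline int reported_line() { return __LINE__; }
inline std::string reported_file() { return __FILE__; }
```

Here `reported_line()` yields 42 and `reported_file()` yields "example.cpp2", regardless of where the code physically sits in the generated .cpp file.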

wheybags · 2024-04-01T09:27:32 1711963652

I'm working on a hobby project language that generates plain C as output, and debugger integration has been one of my big worries. If that works, then this is awesome, thank you!

nathanrf · 2024-04-01T01:01:48 1711933308

Unfortunately, C++ uses ++ and -- for iterators, many of which cannot reasonably implement += or -=. This distinction is baked into the type system to tell whether or not an iterator supports efficient "multiple advance" (e.g. a linked list iterator doesn't have += but a pointer into a contiguous vector does).
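A short sketch of how that distinction shows up in code, using the standard iterator category tags. `advance_by` and `advance_impl` are hypothetical helpers written for this example; the standard library's `std::advance` does the same kind of dispatch.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Random-access iterators (e.g. into a vector) support `it += n` in O(1).
template <typename It>
It advance_impl(It it, int n, std::random_access_iterator_tag) {
    it += n;
    return it;
}

// Weaker iterators (e.g. into a linked list) only have ++, so step n times.
template <typename It>
It advance_impl(It it, int n, std::input_iterator_tag) {
    while (n-- > 0) ++it;
    return it;
}

// Picks the overload from the iterator's category, which is part of its type.
template <typename It>
It advance_by(It it, int n) {
    using Category = typename std::iterator_traits<It>::iterator_category;
    return advance_impl(it, n, Category{});
}
```

Writing `it += 2` directly on a `std::list<int>::iterator` does not compile, which is exactly the signal generic code relies on.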

There's no way to fix this in a backward-compatible way for existing code (which is one of the constraints of cpp2: it must work with all existing C++ so that it is possible for existing projects to migrate regardless of size).

Rexxar · 2024-04-01T02:35:24 1711938924

Making ++ or -- a statement that increments the target without returning a value should probably be enough for forward iterators.

layer8 · 2024-04-01T16:18:28 1711988308

A noncopying ++ could be spelled `+= 1` though. So some iterators would support `+= 1`, but not `+= 2`. This would be vaguely similar to how the null pointer constant was defined as an integer constant expression that evaluates to zero: Define `+= <integer constant expression that evaluates to one>` as the increment operator.

nathanrf · on Oct 17, 2023

The distortion comes from trying to map R^2 onto the surface of the mesh, since they have different curvature, so they only "match" near the origin.

But the random-walk algorithm doesn't actually need to happen in R^2; I think it should be relatively straightforward to adapt to walk directly on the mesh.

Instead of tracking an "angle", you just need to track the forward tangent vector. And then walk in very small steps using the geodesic walk algorithm, turning the tangent at each point proportional to the derivative and normal at the new location.

It will probably be slower, but it means it will be distortion-free, since you're no longer trying to force flat coordinates onto a non-flat surface.

Dealing with intersection is more complicated, though - I think you could probably just use an octree structure for acceleration (like the spatial tree in 2D). Since the segments can be each placed (almost) exactly on planar triangle faces, you can just project onto one/both of the segments' containing faces to check for intersection.
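Once two segments have been projected into the plane of a shared face, the check is the usual 2D orientation test. A sketch, with `Vec2`, `orient` and `segments_cross` as hypothetical names, and deliberately ignoring touching or collinear segments (the projection step and the octree lookup are not shown):

```cpp
struct Vec2 {
    double x;
    double y;
};

// Signed area of triangle (a, b, c): positive if c is to the left of a->b.
inline double orient(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True when segment p1-p2 properly crosses segment q1-q2, i.e. each
// segment's endpoints lie strictly on opposite sides of the other segment.
inline bool segments_cross(Vec2 p1, Vec2 p2, Vec2 q1, Vec2 q2) {
    double d1 = orient(q1, q2, p1);
    double d2 = orient(q1, q2, p2);
    double d3 = orient(p1, p2, q1);
    double d4 = orient(p1, p2, q2);
    return d1 * d2 < 0.0 && d3 * d4 < 0.0;
}
```

Since the walk takes very small steps, a real implementation would probably also want a tolerance for near-misses, which this sketch omits.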

Ameo · on Oct 17, 2023

Ah, that's an excellent idea!!

I hadn't considered walking on the mesh directly like that, but now that you say it that makes a ton of sense. I will have to try that out

nathanrf · on Oct 14, 2023

It is unsound to transmute `&'a T` into `&'static T`, but it is not UB - as long as all of the subsequent uses of the transmuted reference obey the "real" lifetime of the original reference:

    fn example<'a>(r: &'a mut i32) -> &'static mut i32 {
        unsafe { std::mem::transmute(r) }
    }

    fn main() {
        let mut x: i32 = 5;
        let ptr: &'static mut i32 = example(&mut x);
        *ptr = 6;
        println!("{x}");
    }

(because it's unsound, it's considered wrong to do this - you should not intentionally write functions whose types are lies, and this one definitely lies, so it should be marked `unsafe` - but this is not automatic UB)

https://play.rust-lang.org/?version=stable&mode=debug&editio...

You can run it through Miri and confirm there's no UB even though we're modifying through `ptr`, whose lifetime has been extended beyond the length of the function.

However, Rust does have extra guarantees here which make this irrelevant to the pessimization problem in the linked article - you cannot ever legally convert a `&T` into a `&mut T` - this is always UB. This means that Rust guarantees that `example` does not modify `x` (unless e.g. it contains an `UnsafeCell`, like a `Mutex`'s contents), and so it does not need to defensively reload its value.

That is to say: Rust, just like C++, makes it legal (but frowned upon) to "leak" a reference beyond the stated lifetime it's provided as. But unlike C++, it is (always!) illegal to "upgrade" a `&T` into a `&mut T`, and thus the fact that it escapes does not hinder other optimizations.

kaba0 · on Oct 14, 2023

Could you please expand on this last point? Would it not be the same in case of C++’s `const`s?

nathanrf · on Oct 14, 2023

In C and C++, `const` on pointers/references is basically just a comment to programmers - it is part of the type, but doesn't "mean" anything to the abstract machine; the rules don't treat const / regular references/pointers differently, they just say that the types only let you mutate through a mutable pointer.

Obviously, good code should treat it as more than just a comment - using `const` correctly clarifies intent and makes it possible to stay sane as a C++ developer, but the abstract machine doesn't care.

In C++, you can basically always `const_cast` a `const T&` into a `T&` and then modify it without causing UB. A function that accepts a `const T&` is just pinky promising that it will be polite and probably not do that.

It is only UB if the underlying object is "actually const", and even then, it doesn't cause UB until you actually perform the mutation; creating the mutable reference itself is perfectly fine.

For example, the following is perfectly legal:

    int& upgrade_to_mut(const int& x) {
      return *const_cast<int*>(&x);
    }

    int main() {
      int x = 5;                  // x itself is not a const object
      const int& x_ref = x;
      int& x_ref_mut = upgrade_to_mut(x_ref);
      x_ref_mut = 6;              // fine: x is now 6
    }

it's only invalid if the object that is pointed at is const, as in:

    int& upgrade_to_mut(const int& x) {
      return *const_cast<int*>(&x);
    }

    int main() {
      const int y = 5;
      const int& y_ref = y;

      // it is actually legal to produce y_ref_mut, but we cannot modify it
      int& y_ref_mut = upgrade_to_mut(y_ref);

      y_ref_mut = 6; // this is UB: cannot modify a const object 'y'
    }

The difference is that in Rust, "mutation capabilities" are part of references, and so you cannot create them out of nowhere, that would be UB. But in C++, mutation capabilities are part of the object being pointed at, so as long as they happen to be there when you perform the mutation (e.g. you're not modifying a string literal or a variable declared `const`) then there's no problem.

ryukoposting · on Oct 15, 2023

It's not entirely true that "const" is "just a comment," depending on the use case. In machines with super-limited RAM you can use const on globals to tell the compiler "put this in .rodata"

In other words, "const" (in a global context) can tell the compiler "you don't have to copy this to RAM, just read it directly from non-volatile storage." Obviously, that would be undesirable on a desktop computer, but if you're dealing with a wee little microcontroller, it's very helpful.

nathanrf · on Oct 15, 2023

`const`-as-comment is specifically limited to pointers and references - `const` on objects definitely does change semantics (it is always UB to attempt to modify a `const` object).

Another good example is string literals (except when initializing a non-const `char[]` variable), which are often allocated in read-only data in the same way, since they are const objects too.
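A small sketch of that exception (`mutable_copy` is a name made up for this example): the literal itself is an array of const char, but initializing a non-const `char[]` from it creates a separate, modifiable object.

```cpp
#include <string>
#include <type_traits>

// A string literal is an lvalue of type const char[N], i.e. a const object.
static_assert(std::is_same<decltype("hi"), const char (&)[3]>::value,
              "string literals have type const char[N]");

// Initializing a non-const char array copies the literal into a fresh,
// non-const object, so writing to the copy is fine.
inline std::string mutable_copy() {
    char buf[] = "hello";
    buf[0] = 'j';  // OK: buf is its own array, not the literal
    return buf;
}
```

Writing through a pointer to the literal itself (after casting away const) would be the UB case, since that object really is const.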

Kranar · on Oct 15, 2023

The comment specifically referred to references and pointers whereas your example does not.

j16sdiz · on Oct 14, 2023

I don't agree it is just a comment to programmer. CppReference says,

> Modifying a const object through a non-const access path and referring to a volatile object through a non-volatile glvalue results in undefined behavior.

and both compilers have tried to take advantage of this in the past.

nathanrf · on Oct 14, 2023

That is saying exactly the same thing that I said; the qualification of the pointer is irrelevant; you get UB when modifying a const object. In the code snippets above,

    int x = 5;       // not a const object, so can be freely modified
    const int y = 5; // is a const object, so cannot be modified

C/C++ pointer provenance doesn't include information about constness, so it doesn't matter "how" you got a mutable pointer to `x`: you're always allowed to modify `x` through that pointer, even if the pointer "came from" a const reference.

The reason for mentioning a "non-const access path" is that the type system forbids you from modifying through a const access path in the first place, so the program would already be rejected if you tried that.

I'm not saying it's a good idea to go around dropping const qualifications; `const_cast` is mostly evil and should be avoided. But at the level of the abstract machine, it's a no-op, even when going from const -> non-const, other than changing the type of the provided pointer.

The benefit of `const` is that if you don't use C-style casts that discard constness, and you don't use `const_cast`, and you don't use `mutable` or other type-unsafe or const-unsafe features, it's not possible to accidentally obtain a non-const pointer to a const object. Thus C++ actually helps you avoid this UB pretty well. But the fact that general conversion from const to non-const is permitted reduces the kinds of optimizations that can be performed.
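To make the optimization cost concrete, here is a sketch with two hypothetical functions, `sneaky` and `caller`. Because `x` is not a const object, the write inside `sneaky` is well-defined, so a compiler that cannot see the callee's body has to assume `x` may have changed and reload it after the call.

```cpp
// Hypothetical callee: takes a const reference but writes through it anyway.
// Rude, but well-defined as long as the referenced object is not const.
inline void sneaky(const int& r) {
    const_cast<int&>(r) = 6;
}

inline int caller() {
    int x = 5;   // not a const object
    sneaky(x);   // may legally modify x despite the const& parameter
    return x;    // so this cannot simply be folded to 5
}
```

If `x` were declared `const int x = 5;` instead, the same call would be UB, and the compiler would be free to assume the value is still 5.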

sapiogram · on Oct 14, 2023

Thanks! Do you know if the Rust compiler is able to pass this information to LLVM somehow?

hkalbasi · on Oct 14, 2023

It seems it isn't yet: https://github.com/rust-lang/rust/issues/116744

afdbcreid · on Oct 15, 2023

It does pass this information to LLVM, in the form of the `readonly` attribute. This seems to be a bug in LLVM, which does not optimize the function properly; I don't know why.