Native Reflection in Rust (wrenn.fyi)
277 points by jswrenn on Dec 15, 2022 | 65 comments



Great writeup! The defmt logging crate uses a linker script to extract debug symbols so that you get nicely formatted stack traces on embedded systems. It works on Linux, macOS, and Windows. I wonder if the same technique can be applied to this project. It needs a runner, though, so it may not be the right approach.

https://github.com/knurling-rs/defmt


I've used a very similar method at work to provide C++ "reflection" between my own system and a system from another team.

Basically, the other system is a dynamic library which sends and receives C structures from my application. Those structures are then mapped into a buffer that is supposed to have the same size, with metadata pointers into the buffer that are supposed to line up exactly with the struct elements. Those structures can have arbitrary complexity, and are passed around through type erasure (essentially char*).

I wrote a "reflection" code for the other team, which runs when they register the struct instance to be sent, checks if there's a matching PDB [0] around, reads it, and outputs a json including the metadata needed, which can then be used to define the structures' metadata on our side correctly.

This is all in C/C++ since in some contexts we have soft real-time requirements; otherwise I would have used one of the many RPC frameworks available.

This has been working for several years now.

This is not a generic solution, but it's good enough for in-house communication between two systems that are maintained by different parts of the organization, where the API between them, which as I said is based on passing around char* buffers, was more or less set in stone a long time ago. Conway's law [1] and all that. Sigh.

[0] We are a Windows shop, although the same thing should work with DWARF info, just as the OP's library does. In fact he says "It may never work on Windows, which does not use DWARF to encode debug info", but I can say that the same approach does work on Windows, for C++ at least. The PDB format might be a tad underdocumented, but its documentation has improved over the decade or so since I started working on my library. Writing some small test programs is enough to understand how to access it, if all you need is meta info on C-style structures. Other stuff is more... challenging. But it wasn't necessary for my use case.

[1] https://en.wikipedia.org/wiki/Conway%27s_law


Was the other team completely unwilling to provide a header?


They are willing; that wasn't the problem. The problem is that the consuming app on my side is historically metadata-driven and tries to avoid having to recompile when the interface changes. We do that by keeping the code generic and by reading the interface from a database. This leads to faster iterations. The problem arises when we have to interface with any other system which is not generic and has its interface defined in .h files.

Yeah, I know, it's our problem, not theirs. It's something I cannot fix on my own without a huge effort. I've tried pushing for it for more than a decade, and at some point my wish was sort of hijacked by my boss's boss as an excuse to create a DSL [0]. This did solve some huge problems but also created many others. It didn't solve that char* / .h file problem, since it doesn't really have an FFI.

[0] Domain-specific language, custom-made for our own internal users. I've come to hate DSLs since I have to support that one, which I never wanted.


Does using DWARF info imply that this will break when you strip the resulting executable? I often strip my Rust binaries because it practically halves the application size, which can become quite a lot in a language where you're statically linking everything.

Regardless, it's quite an ingenious use of standard ELF features; I didn't think this would be possible in Rust without adding some kind of VM around the reflection code.


Yes, unfortunately that's a tradeoff here. Rust does support splitting debug info into other files, but Deflect doesn't support loading split debuginfo yet.


C# has similar issues: they have to be conservative about what they trim from binaries for AoT in case it is used for reflection, so I imagine you'd run into the same issues for almost any compiled language you want to implement reflection for.


"When you call .reflect on a dyn Reflect value, deflect figures out its concrete type in four steps:"

* invokes local_type_id to get the memory address of your value’s static implementation of local_type_id

* maps that memory address to an offset in your application’s binary

* searches your application’s debug info for the entry describing the function at that offset

* parses that debugging information entry (DIE) to determine the type of local_type_id’s &self parameter.
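
To make the first step concrete, here's a minimal sketch of the address trick (my own reconstruction, not deflect's actual source): a blanket impl gives every concrete type its own monomorphized copy of local_type_id, and that copy's address is what the remaining steps map back to a DWARF entry.

    // Sketch of the per-type address trick. Each concrete type gets its own
    // monomorphized copy of the default body below, so the copy's address
    // uniquely identifies the erased type (barring aggressive identical-code
    // folding by the linker). Mapping the address to a DWARF DIE is the part
    // deflect layers on top; this only shows the address part.
    trait LocalTypeId {
        fn local_type_id(&self) -> usize {
            // Coerce this impl's own monomorphized method to a fn pointer
            // and return its address.
            let f: fn(&Self) -> usize = <Self as LocalTypeId>::local_type_id;
            f as usize
        }
    }

    impl<T: ?Sized> LocalTypeId for T {}

    fn main() {
        let a: &dyn LocalTypeId = &42u8;
        let b: &dyn LocalTypeId = &"hello";
        // The vtable call lands in each concrete type's own copy,
        // so the two addresses differ.
        println!("{:#x} {:#x}", a.local_type_id(), b.local_type_id());
    }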

This is a rather strange thing to bolt onto a language. I could see this as an external tool. The use case seems to be programs which used "async" so much they can't figure out the resulting state machine. External debug tools to view and examine the async state machine might be helpful.

My experience with Rust has been that debugging of safe code is just not a problem. Print statements and logging are enough.


> This is a rather strange thing to bolt onto a language. I could see this as an external tool.

It is an external tool. This is a crate, not a part of the compiler.


Are you saying there aren’t any legitimate uses for runtime reflection? Because I think Java and .NET, even Go, have proved that wrong over the years.

This seems like a valuable library. It’s impressive that it can be so powerful in a compiled language. C and C++ are much older but don’t have anything quite like this.


A lot of what you'd use reflection for in GC languages is done with macros/code generation at compile time in Rust. For example, rather than using reflection to map objects to something like JSON for serialization, Rust has a library called serde (https://serde.rs/) that lets you annotate structs and enums and generate the conversions at compile time.

I wouldn't go so far as saying that there's no possible legitimate use of reflection, but I do wonder how much could be happening in Java and C# and Go that's so dynamic that you couldn't reason about it in advance. I think most of what reflection is used for in those languages _could_ be done at compile time, but it would both require a way to express it (via macros, codegen, or something like that) and have to be worth the extra compile time in order to save runtime. Rust's ethos is to optimize as much as possible for runtime efficiency even at the expense of compile time, and while there can be (and often are!) ways to opt out of this for a given feature, it's almost never the default.
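
For illustration, a serde round trip looks roughly like this (assuming serde with the "derive" feature and serde_json as dependencies); the Serialize/Deserialize impls are generated by a derive macro at compile time, so no runtime reflection is involved:

    use serde::{Deserialize, Serialize};

    #[derive(Serialize, Deserialize, Debug)]
    struct User {
        name: String,
        age: u32,
    }

    fn main() -> Result<(), serde_json::Error> {
        let user = User { name: "Ferris".to_string(), age: 13 };
        // Both conversions call code generated at compile time for `User`.
        let json = serde_json::to_string(&user)?;
        let back: User = serde_json::from_str(&json)?;
        println!("{json} -> {back:?}");
        Ok(())
    }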


I’ve used Rust extensively the last couple years. I understand that. A lot of what people do with reflection in Go could be done more efficiently with code generation - but more easily with reflection. I’m sure the same is true in Rust, to a lesser extent. There are times when runtime reflection would be really nice to have.


> This is a rather strange thing to bolt onto a language.

It can just be an extremely fun and cute demo, without practical application.


It can also be something that looks cool and doesn't necessarily ever get past "kinda works", but piques the interest of the core dev team and they take steps to make it work even better, resulting in the ultimate "deprecation" of this sort of thing by virtue of it being even better integrated into the core.

I don't have the context to judge the probability of that in this specific case (lots of technical nitty-gritty comes into this sort of thing), but I've certainly seen similar things happen in other communities.


How about adding this to debuggers for better object views? (Could it provide a near-JS/Python/Java level of object inspection?)


This is already using DWARF debug info. Using it for debugging would be a long way around to arrive where you started.

You can already script gdb to provide rich views of any data structure.


DWARF is a standard for data to support debuggers, so this crate does effectively the opposite: it uses info normally only available during debugging to provide reflection.


This is a beautiful (hacky) demo of something that I didn't think was possible in Rust (yet). I hope other applications don't accidentally start using it just to discover that it doesn't work in release mode.

Very impressive work!


Oh, I should add a note about that. Fortunately, it's quite easy to tell Rust to generate debuginfo even in release mode.


Can't we rely more on Rust's pattern matching and its strong type system?

Reflection seems more helpful when the programming language's type system is a bit less sound.


Absolutely! That's the approach that frunk [0] takes. Frunk (and other reflection libraries like it) is suitable for most use cases and makes better use of Rust's affordances.

My crate is suitable for cases where you cannot know (or control) in advance the set of types you might need to reflect on. Its primary use cases are related to debugging.

[0]: https://docs.rs/frunk


Is Frunk Rust's Shapeless (from Scala)?


Yep!


Today I learned that Rust does not have reflection.


Reflection is usually not available in AoT compiled languages. The prevalent Rust coding styles rely heavily on monomorphic data types and functions, meaning there's nothing left to reflect at runtime. But if you want to deal with trait objects and need to access the underlying type, you need to use Any::downcast or rely on annotations on every type you want to reflect on. Or now, leverage DWARF info on Linux with deflect.
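
For comparison, a minimal sketch of the Any route: downcasting only works for concrete types you already name in the code, and there's no way to enumerate fields or discover unknown types at runtime.

    use std::any::Any;

    fn describe(value: &dyn Any) {
        // Each downcast checks against one statically named type.
        if let Some(n) = value.downcast_ref::<i32>() {
            println!("an i32: {n}");
        } else if let Some(s) = value.downcast_ref::<String>() {
            println!("a String: {s}");
        } else {
            println!("something else");
        }
    }

    fn main() {
        describe(&5i32);
        describe(&String::from("hello"));
        describe(&3.14f64);
    }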


That's runtime reflection.

Compile time reflection AFAIK is available in D and Zig, and is planned for C++.


"Compile time reflection" is an inconsistent and nonstandard concept; originally it seemed to just mean typeclasses for people who hadn't heard of typeclasses.


For weirdos who only have ad-hoc constraints instead of knowing what typeclasses are, it means you can first say "I only have ad-hoc constraints" and then say "wait, wait, I need to make decisions based on the specifics of the type", which may be useful for e.g. generating serializers and deserializers at compile time instead of using code generators like protobuf.


Yup. I consider runtime reflection an antifeature, which has negative performance effects, is unsafe (see e.g. log4j) and leads to fragile code.

I would however welcome static reflection with open arms. In Rust in particular, I’d prefer it if derive was implemented using static reflection, rather than proc macros.


Derive or equivalent ought to be something you can implement on top of frunk (so you're ultimately still depending on a proc macro, but the whole ecosystem only needs to depend on that one macro, and tools etc. can build in support for it) - that's how it's usually done in Scala.


That's right. Nim does as well. It's amazing. Once you get used to having CTTI and being able to use it, it's hard to program without it. Bonus points if you can do basic dependent types too.

With SFINAE you can effectively do CTTI-style programming in C++. C++ has also long had runtime type reflection (RTTI), though it needs to be compiled in. Looks like there's a Boost library for CTTI.


The C++ reflection improves a lot in C++20, but it's still very limited compared to that aspect of Nim, or even Zig. The std::meta::info and "splices" based on Haskell for C++26 are incredibly exciting to me. I have many use cases in mind. Splices in combination with std::embed will make C++ basically just a bad Racket (but one with inline assembly!).


What are monomorphic data types? What should be my first read on the subject?


It's a fancy way of saying "every time this type is used, replace all the generic type params with what was used and generate code for it". It's how generics are implemented in Rust. If you have

    struct Foo<T>(T);
And you create Foo(42i32) and Foo(0.0f64), the compiler will create the equivalent of

    struct Fooi32(i32);
    struct Foof64(f64);
In other languages like Java, generics are implemented the way that Rust does "trait objects" (&dyn Trait).

Rust is not the only language that does this, to be clear.

If you're interested in a quick intro on the compiler side of this, you can read https://rustc-dev-guide.rust-lang.org/backend/monomorph.html


Expanding on trait objects: these are implemented as "V-Tables", structs holding pointers to the trait's methods and to the underlying type. This means that if you need to know what the underlying type is, you have to do something fancy, usually referred to as "reflection". Also, invocation of generic functions through V-Tables requires "chasing pointers", which makes cache locality worse (because the data might not be in the same cache line as the v-table itself), but makes the generated binary smaller (because if you have something like Foo<T> used with 1000 types, with monomorphization you end up with 2000 generated types in the binary, instead of 1001 with trait objects).
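
A small illustration of the tradeoff (the trait and types are made up for the example): the dyn version compiles to one shared body that goes through the v-table, while the generic version is monomorphized once per concrete type.

    trait Shape {
        fn area(&self) -> f64;
    }

    struct Circle { r: f64 }
    struct Square { side: f64 }

    impl Shape for Circle {
        fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
    }
    impl Shape for Square {
        fn area(&self) -> f64 { self.side * self.side }
    }

    // Dynamic dispatch: &dyn Shape / Box<dyn Shape> is a fat pointer
    // (data pointer + v-table pointer); one compiled body serves all types.
    fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
        shapes.iter().map(|s| s.area()).sum()
    }

    // Static dispatch: a separate copy is generated for each concrete T.
    fn area_generic<T: Shape>(shape: &T) -> f64 {
        shape.area()
    }

    fn main() {
        let shapes: Vec<Box<dyn Shape>> = vec![
            Box::new(Circle { r: 1.0 }),
            Box::new(Square { side: 2.0 }),
        ];
        println!("{}", total_area_dyn(&shapes));
        println!("{}", area_generic(&Circle { r: 1.0 }));
    }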


To add to this, even the Foo-wrapper is gone, just the i32 remains. Rust values are amorphous data blobs at runtime.


Yes, that's true but that is an implementation detail that only comes into play when dealing with ABI, and then you should be using #[repr(transparent)] to ensure that the compiler won't do something else :)


Sure, it’s good to point out the difference between “the behavior of a typical optimizing compiler” and “things actually guaranteed by the language”. The context of the discussion was the former, I think. I’m not even that certain that monomorphization is actually required in theory.


Yes, monomorphization isn't needed in theory, as long as the user-visible behavior remains the same, and in practice the team is exploring options[1] to identify cases where the currently manual practice of writing

    pub fn foo<T: AsRef<X>>(x: T) {
        inner_foo(x.as_ref());
    }
    fn inner_foo(_: &X) { todo!() }
can instead be done by the compiler automatically (turning monomorphized code back into polymorphic code, hence the name "polymorphization").

[1]: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html...


ABI-wise that is not true, though. Structs have struct ABI; even a newtype struct around an integer will not use the integer's ABI unless annotated with #[repr(transparent)].
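
For example (the sizes match either way, but only the attribute pins down the calling convention):

    use std::mem::size_of;

    // Same in-memory size as i32 in both cases, but only #[repr(transparent)]
    // guarantees the wrapper is passed with exactly i32's ABI, e.g. across FFI.
    #[allow(dead_code)]
    struct Plain(i32);

    #[allow(dead_code)]
    #[repr(transparent)]
    struct Transparent(i32);

    fn main() {
        assert_eq!(size_of::<Plain>(), size_of::<i32>());
        assert_eq!(size_of::<Transparent>(), size_of::<i32>());
    }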


Pretty sure that some usage patterns of polymorphic types cannot be completely monomorphized. Here's an example in Go:

    package main

    import (
        "fmt"
    )

    type wrapper[T any] struct {
        Value T
    }

    func (w wrapper[T]) String() string {
        return fmt.Sprintf("{%v}", w.Value)
    }

    func stringWrapped[T any](n int, v T) string {
        if n == 0 {
            return fmt.Sprintf("%v", v)
        }
        return stringWrapped(n-1, wrapper[T]{Value: v})
    }

    func main() {
        n := 0
        fmt.Scanf("%d", &n)
        result := stringWrapped(n, "test")
        fmt.Println(result)
    }
Go refuses to compile this because it can't possibly generate all the instances of wrapper[T] that this program may use: wrapper[string], wrapper[wrapper[string]], wrapper[wrapper[wrapper[string]]], etc.


Rust will complain about a recursion limit being reached during instantiation[1]. The solution in Rust is to use &dyn Trait or Box<dyn Trait> instead.[2]

[1]: https://play.rust-lang.org/?version=stable&mode=debug&editio...

[2]: https://play.rust-lang.org/?version=stable&mode=debug&editio...

^ This blows the stack because it keeps calling itself with no break condition, but it shows that the type system accepts the code.
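
For reference, here's a terminating sketch of that Box<dyn Trait> workaround, mirroring the Go program above (the names are mine): the nesting depth is erased behind a trait object, so the compiler only ever instantiates Wrapper<Box<dyn Display>>.

    use std::fmt::{self, Display};

    struct Wrapper<T>(T);

    impl<T: Display> Display for Wrapper<T> {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{{{}}}", self.0)
        }
    }

    // Only one instantiation, Wrapper<Box<dyn Display>>, is ever generated,
    // at the cost of one heap allocation per layer of wrapping.
    fn string_wrapped(n: usize, v: Box<dyn Display>) -> String {
        if n == 0 {
            return v.to_string();
        }
        string_wrapped(n - 1, Box::new(Wrapper(v)))
    }

    fn main() {
        println!("{}", string_wrapped(3, Box::new("test"))); // prints {{{test}}}
    }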


I think this is called polymorphic recursion in Haskell circles.

In C++ you can monomorphize as long as you can somehow prove the recursion terminates at compile time (for example by threading a static recursion counter).


Nice examples - you can also have languages (like SML) where monomorphization is simply an implementation detail. Some compilers (e.g., MLton) perform monomorphization and others don't.


I only recently realized that certain type system features, like polymorphic recursion, make monomorphization impossible in the general case. In Haskell for example, it’s by necessity only an optimization that’s used where applicable, and not the general strategy.


That depends on what you mean. SML has "polymorphism" boiling down to being able to plug an arbitrary type in some places, which is denoted like 'a. But when people talk about generics, they more often talk about C++ templates, Java generics, Rust traits, etc. whose SML equivalent are signatures, structs and functors. Signatures are a bit like Rust traits, structs are a bit like Rust implementations of traits, whereas functors are like Rust's "templates", i.e. wherever you swap angle brackets to parametrise something with types constrained by traits, or values constrained by types. Except in Rust this parametrisation can be slapped on a bunch of things. It can be on structs, on functions, on traits, on implementations of traits etc. In SML you need to group all the "parametrised" things into a struct (and a corresponding signature), which is going to be returned by a functor.

And now the thing is: with transparent signature ascriptions, functors are monomorphised in SML, instead of everything being hidden behind signatures (as is the case in Rust when you use dyn with traits), which has semantic consequences. E.g. a struct returned by a functor may contain a type. You can't perform proper type-checking without monomorphising, because you don't know what the exact type is. E.g. in the following program, the final line couldn't be type-checked without monomorphisation:

   signature ITERABLE = sig
       type ElemT
       type SrcT
   
       val new_iter: SrcT -> unit -> ElemT option
   end
   
   signature LIST_ELEM_TYPE = sig
       type T
   end
   
   functor ListIterFun (ListElemType: LIST_ELEM_TYPE): ITERABLE = struct
       type ElemT = ListElemType.T
       type SrcT = ElemT list
   
       fun new_iter l = let val lr = ref l
                        in
                          fn () => case !lr of
                                     nil => NONE
                                   | (x::xs) => (lr := xs; SOME x)
                        end
   
   end
   
   structure IntElemType: LIST_ELEM_TYPE = struct
       type T = int
   end
   
   structure IntListIter = ListIterFun(IntElemType)
   
   val next = IntListIter.new_iter [1, 2, 3, 4, 5]
If I change the signature ascription on ListIterFun to an opaque ascription (:> ITERABLE), the final line won't type-check, because it's not obvious from the signature that ElemT is int. So transparent signature ascriptions require monomorphisation (Rust traits without dyn), and opaque signature ascriptions free the compiler from having to do monomorphisation (Rust traits with dyn).

There was a lot of discussion of this issue when Go was settling on a design for its generics, under the phrase "reified generics".


Not exactly the same thing but JITs can turn dynamic objects into structs if the structure is consistent. JS runtimes and Julia do this as far as I know.


Firefox's JS runtime also does tricks like generating multiple copies of an optimized function when the function has multiple call sites, instead of making one copy with lots of if/else. So it no longer suffers from the problem where a function that frequently receives different parameter types from different call sites performs poorly.

It's probably exactly how templates work, except the details are invisible to users.

https://hacks.mozilla.org/2020/11/warp-improved-js-performan...


Yes! Java as well. And this is how those languages can show impressive benchmarks for consistent workloads. In theory they can even surpass AoT languages. In practice it depends on the specifics.


Julia doesn't do this. It just has structs in the first place.


I think C++ does this too.


It indeed does. The only difference is that Rust has traits (similar to C++'s concepts), which require explicitly declaring what interface the type parameters must provide inside the function, whereas a C++ template will only produce a compile error after instantiation, if you pass something that doesn't meet the expected contract. This is closer to how Rust's macros operate.

Given

    fn foo<T>(a: T, b: T) -> T { a + b }
The compiler will complain that you should have been explicit about how T is going to be used:

    error[E0369]: cannot add `T` to `T`
     --> src/lib.rs:1:32
      |
    1 | fn foo<T>(a: T, b: T) -> T { a + b }
      |                              - ^ - T
      |                              |
      |                              T
      |
    help: consider restricting type parameter `T`
      |
    1 | fn foo<T: Add<Output = T>>(a: T, b: T) -> T { a + b }
      |         +++++++++++++++++
whereas in C++ this would have been accepted until you called foo with two things that couldn't be added together, like a Rust macro[1].

[1]: https://play.rust-lang.org/?version=nightly&mode=debug&editi...


Reflection is typically provided by a runtime, and languages that don't have runtimes usually don't have it. You shouldn't expect a low-level systems language to have reflection. There is no zero-cost way of implementing it.


Except Rust has a runtime: [0]. And so, usually, does C (in hosted implementations).

[0] https://doc.rust-lang.org/reference/runtime.html


These are a couple of functions that executables can call at run time, but they're more like an extra standard library. It's not a runtime in the sense of dynamic or GC languages, where the runtime manages all objects and is able to know the types of arbitrary objects and inspect/trace them.

Rust has no run-time type information except limited downcasts via `dyn Any` or explicitly derived traits on a per-type basis, and these features compile to type-specific monomorphic code rather than calling into some run-time reflection facility.


Pretty sure you don’t need a runtime to track runtime type info. What we think of as a “runtime” in GC languages is usually several distinct things (a scheduler, a GC, and maybe some other stuff in the case of Java/.Net).


This is of course only true for runtime reflection. And which language does not have a runtime?


Rust has very little influence from reflection-heavy languages like Java and C#. On their list of influences (https://doc.rust-lang.org/reference/influences.html), Java is not even mentioned, and C# is only mentioned for its attributes. There is very little overlap between the design philosophies that influenced Rust and Java/C#.

Rust does not support inheritance either. But I have never missed either feature in a Rust program.


The usual argument is that between having macros and focusing on a strong type system, there are very few legitimate use cases for reflection left in Rust.


My version of Greenspun's tenth rule [1] is that any sufficiently complex static language contains an ad-hoc, informally specified, bug-ridden and slow version of a dynamic "any" type.

Thx OP for providing an example.

[1] https://en.wikipedia.org/wiki/Greenspun's_tenth_rule


Rust has a dynamic any type, `std::any::Any`.


The entire purpose of OP's thing is to give you a semblance of workable reflection so you can actually operate on said type. It requires byzantine hacks to read debug info and doesn't work on macOS.

I don't think you understand how people in dynamic languages use any types at all.


It would be really cool if it were possible to natively inspect the state of a Rust generator in a type-safe way.


Does this still work if the application is compiled in release mode or with optimizations?

Even if not, this is still very useful for debugging.


It only works if DWARF is generated. By default, the `release` profile of Cargo sets `debug = false` [0]. But it's quite easy to override this setting and get a build that is both optimized and includes debuginfo.

[0]: https://doc.rust-lang.org/cargo/reference/profiles.html#rele...
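
For example, adding this override to Cargo.toml keeps full debuginfo in optimized release builds (at the cost of a larger binary):

    [profile.release]
    debug = true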



