Units might seem simple, but they have a ton of edge cases. Do you want to be able to add inches and feet? Be careful about potential precision/rounding issues. What is the unit for a temperature delta? You can't simply keep the original unit (e.g., C or F), because converting F to C is a different rule than converting ΔF to ΔC. Etc.
Units do prevent bugs in programs, so they have an important role to play. But they also need to be designed very carefully.
This probably doesn't have to be too complicated; the usual answer for Rust is "no". Rust doesn't even let you "just" add two unsigned integers of different sizes. Following that design, I would imagine units would require an explicit "turn feet into inches" or the other way around.
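For illustration, a minimal sketch of what that explicit-conversion style could look like with newtypes (Inches, Feet, and to_inches are made-up names here, not from any existing crate):

    // Minimal sketch of explicit unit conversion via newtypes
    // (all names hypothetical, not from an existing library).
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Inches(f64);
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Feet(f64);

    impl Feet {
        fn to_inches(self) -> Inches {
            Inches(self.0 * 12.0)
        }
    }

    fn main() {
        let a = Inches(3.0);
        let b = Feet(2.0);
        // let sum = a.0 + b.0;                  // compiles, but is a unit bug
        let sum = Inches(a.0 + b.to_inches().0); // the conversion is explicit
        assert_eq!(sum, Inches(27.0));
        // Just as `1u8 + 1u16` is rejected until you convert with `as`/`From`,
        // Inches + Feet is rejected until you pick a direction to convert.
    }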
Rust is heavily inspired by Scala, but I guess achieving something like the examples in my post is difficult. I really hope Rust finds one way or another to make it work, because simply forbidding everything all the time isn't even the safest approach: it drives many people to avoid the feature altogether and use unsafe code.
Might be annoying if you're working with floating point, but given that each conversion introduces error, it's probably good to be explicit and to recommend staying internally consistent.
No, that's not the point. If you let Rust automagically decide when and where to apply conversions, you could easily wind up with more operations than you need, which increases numerical error, and the extra conversions can also be a pain to uncover or refactor away.
Anybody correct me if I'm wrong -- in Pascal these subranges can specify the size of the variable in memory and enable compile-time checks of assignment from a literal value, but don't (can't) do much else.
There's the concept of dependent types in e.g. Idris, which lets us correctly do this sort of range check throughout the program (not just on literal assignment), but it comes with strict requirements on what the compiler can allow, such as no unbounded recursion, because checking dependent types is roughly equivalent to running the program.
Nim has Natural and Positive (signed int) types that are checked at compile time and at runtime. You can also define arbitrary range types for any ordinal type, for example enums:
    type
      Month = enum Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
      SpringMonth = range[Mar..May]

    var m: SpringMonth = Mar
It will raise an under/overflow exception if the value falls out of range.
It should be easier now with const generics. I know the popular C++ library looks like Value<Unit, T, L, M, ...>, where the letters are numbers representing the dimension of time, length, mass, etc., so m/s^2 would always be Value<f64, -2, 1, 0, ...>. By keeping it normalized you don't need Equivalent or whatever.
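A rough sketch of that shape in Rust, assuming the unstable generic_const_exprs feature on nightly (all names here are made up, not a real crate):

    // Dimension exponents (time, length, mass) as const generics;
    // the f64 inside is always stored in normalized (SI) units.
    #![feature(generic_const_exprs)] // nightly only
    use std::ops::Mul;

    #[derive(Clone, Copy, Debug)]
    struct Value<const T: i32, const L: i32, const M: i32>(f64);

    impl<const T1: i32, const L1: i32, const M1: i32,
         const T2: i32, const L2: i32, const M2: i32>
        Mul<Value<T2, L2, M2>> for Value<T1, L1, M1>
    where
        Value<{ T1 + T2 }, { L1 + L2 }, { M1 + M2 }>: Sized,
    {
        // Multiplying quantities adds the dimension exponents.
        type Output = Value<{ T1 + T2 }, { L1 + L2 }, { M1 + M2 }>;
        fn mul(self, rhs: Value<T2, L2, M2>) -> Self::Output {
            Value(self.0 * rhs.0)
        }
    }

    type Seconds = Value<1, 0, 0>;
    type Meters = Value<0, 1, 0>;
    type Accel = Value<{ -2 }, 1, 0>; // m/s^2, however it was computed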
As a related concept, we've talked about adding "pattern types" to Rust: `Result<T> is Err(_)` (the type of a Result that you've just confirmed is an `Err`, so you don't have to handle the `Ok` case), or `u32 is 1..` (equivalent to `NonZero<u32>` but more flexible).
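Until then, the NonZero types cover the numeric case; a small example of what `u32 is 1..` would subsume (assuming a recent Rust where the generic std::num::NonZero is stable):

    use std::num::NonZero;

    // Today's spelling of "u32 is 1..": zero is unrepresentable,
    // so callers must prove non-zero-ness before calling.
    fn chunk_size(requested: NonZero<u32>) -> u32 {
        requested.get().min(4096)
    }

    fn main() {
        if let Some(n) = NonZero::new(128u32) {
            println!("{}", chunk_size(n));
        }
    }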
The author didn't mention this, but maybe there could be a type (unit) canonicalization step that always produces the same reduced/simplified type for any equivalent set of Mul/Divs? So that you don't need the later equivalency check.
E.g. with the article's example, where 'a' is Meters and 'b' is Seconds, 'a/(b*b)' and 'a/b/b' both have type 'Div<Meters, Mul<Seconds, Seconds>>' instead of one having the type 'Div<Div<Meters, Seconds>, Seconds>'.
For only Muls and Divs you can basically just keep a histogram mapping units to powers (e.g., m/s^2 => m: 1, s: -2), which uniquely represents equivalent types.
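A runtime sketch of that canonicalization (names are made up; a type-level version would do the same bookkeeping in the trait system):

    use std::collections::HashMap;

    // Fold a product of base units (name, exponent) into a canonical
    // histogram, so that a/(b*b) and a/b/b compare equal.
    fn canonical(factors: &[(&str, i32)]) -> HashMap<String, i32> {
        let mut dims: HashMap<String, i32> = HashMap::new();
        for &(name, pow) in factors {
            *dims.entry(name.to_string()).or_insert(0) += pow;
        }
        dims.retain(|_, p| *p != 0); // dimensions that cancel out disappear
        dims
    }

    fn main() {
        let a_div_b_div_b = canonical(&[("m", 1), ("s", -1), ("s", -1)]);
        let a_div_b_sq = canonical(&[("m", 1), ("s", -2)]);
        assert_eq!(a_div_b_div_b, a_div_b_sq); // both are {m: 1, s: -2}
    }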
James Gosling looked at this and wrote about it at the time. It can be a bit mind-bending, especially with units named after people.
If you're looking at statistics for current or power, the type system might try to convert it to joules even though you wanted to look at average wattage.
I do not understand: average wattage/power would be the integral of wattage over time (joules) divided by the elapsed time (so back to watts); it's still watts. Under what implementation would you end up with joules that isn't just oddly broken in general?
I guess I could see a naive implementation (confusing integration/sampling with discrete summation) going the other way, erroneously ending up with a nonsensical W/s.
Also, I don't understand what it has to do with eponyms, which are just shorthands for combinations of base units; either your dimensional analysis works or it doesn't, no?
Average wattage is kg⋅m²⋅s⁻³, not kg⋅m²⋅s⁻² (joules) or kg⋅m²⋅s⁻⁴.
For example, the sievert is an official SI unit despite being just J/kg. This is because confusing equivalent dose and absorbed dose, which also has the unit J/kg, could be very dangerous.
Note that this is different from J sometimes being written as Ws. While there are informal conventions for when we use J and when we use Ws, using the unconventional one would not be technically wrong, because 1 J is simply 1 Ws, whereas 1 Sv is not necessarily 1 J/kg when the latter is an absorbed dose.
I think one could reasonably disagree with these decisions but that is how the SI people see it.
I'd make a stronger statement here: this is a specific example of when having units is most powerful. When even though two things are expressed in some common form, they nevertheless represent something different.
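As a minimal sketch of that situation (made-up types, not a real dosimetry library): absorbed dose and equivalent dose are both J/kg, yet mixing them is exactly the bug units should catch.

    // Both are J/kg dimensionally, but they must not be interchangeable.
    struct Gray(f64);    // absorbed dose
    struct Sievert(f64); // equivalent dose

    // Equivalent dose = absorbed dose * radiation weighting factor.
    fn equivalent_dose(d: Gray, weighting_factor: f64) -> Sievert {
        Sievert(d.0 * weighting_factor)
    }

    fn main() {
        let dose = equivalent_dose(Gray(0.5), 2.0); // factor depends on radiation type
        println!("{} Sv", dose.0);
    }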
> Yeah, but equivalent units with different names is sort of only a display/formatting issue, right?
> When even though two things are expressed in some common form, they nevertheless represent something different.
This is where people go wrong trying to DRY and other refactors. Slightly forced example, but:
    function averagePerClassroom(total) { return total / 30; } // 30 kids per class
    function averagePerMonth(total) { return total / 30; }     // assume 30 days
"Oh, the function body is the same therefore lets refactor this into an "averagePer" function" expect its two completely different concepts and once the code calculates the actual days per month or once classes are no longer 30 people suddenly things need to be un-refactored, or what I see more often, is just branching off inside the new single function based on an argument flag. Horrible.
A simple way to do this is to store a vector of the powers.
For example, momentum is kilogram * meter / second, which is MASS^1 * LENGTH^1 * TIME^-1.
As a vector, this can be represented as (1,1,-1) where the positions are M, L, T respectively.
In that format velocity is represented as (0,1,-1), acceleration is (0,1,-2), etc...
This is automatically canonicalised and much easier to manipulate than a tree of operations.
Of course, this assumes uniform units such as CGS or MKS in something sane like the metric system. Conversion back and forth is generally straightforward, as long as the types encode the system used. E.g.: CGS<1,1,-1> and MKS<1,1,-1> both represent momentum, but at different scales.
Imperial also works, and other base units can be added to extend the system. This can include things like current, temperature, moles, etc...
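A sketch of that conversion, with the exponent vector stored at runtime for brevity (made-up names; a real library would keep the vector at the type level):

    use std::marker::PhantomData;

    struct Cgs;
    struct Mks;

    // Exponents for (mass, length, time); e.g. momentum is [1, 1, -1].
    struct Quantity<System> {
        value: f64,
        dims: [i32; 3],
        _sys: PhantomData<System>,
    }

    // CGS base units are gram and centimetre, so converting to MKS
    // scales by 1e-3 per mass exponent and 1e-2 per length exponent.
    fn cgs_to_mks(q: Quantity<Cgs>) -> Quantity<Mks> {
        let scale = (1.0e-3_f64).powi(q.dims[0]) * (1.0e-2_f64).powi(q.dims[1]);
        Quantity { value: q.value * scale, dims: q.dims, _sys: PhantomData }
    }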
Type refinements are a great concept and I'd love to see them in Rust. And double-refinement types are great for helping with conversions, such as with Rust's From/Into, and potentially a dynamic converter function.
Examples of double-refinements that I'd like (the first is sketched after the list):
- Common units like Length:Meter and Length:Foot.
- Color bits like Color:RGB24 and Color:CMYK24.
- Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.
- Human languages like String:English vs String:Cymraeg with a converter function that knows translations.
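A minimal sketch of the first bullet in today's Rust, using a phantom-tagged newtype (Meter, Foot, and Length are hypothetical names):

    use std::marker::PhantomData;

    struct Meter;
    struct Foot;

    // "Length:Meter" / "Length:Foot" as a tagged wrapper.
    struct Length<Unit> {
        value: f64,
        _unit: PhantomData<Unit>,
    }

    impl From<Length<Foot>> for Length<Meter> {
        fn from(f: Length<Foot>) -> Self {
            Length { value: f.value * 0.3048, _unit: PhantomData }
        }
    }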
> - Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.
Exchange rates vary over time, so you'd arguably need a type which includes a timestamp (e.g. "USD 1000 at 2024-12-25").
And that's ignoring all the other complexities, such as the spread, different currency converters offering differing rates, unofficial and multiple official rates in countries with currency controls (e.g. Argentina), hedging, etc.
Plus, perhaps the biggest complexity of all is that currency rates are often not free, particularly if you want "live pricing" (updated every few seconds), and particularly if it's for commercial use. The fact that they may or may not be free also illustrates that there are no definitive rates, only "rates according to X".
I'd really like to have them. If I'm writing an algorithm that maps the interval (0, 1) of probabilities onto different intervals of u32 integers, and the probabilities are themselves represented as integers (the integer part of p*2^n), I want different units for those integer probabilities and their mapped values. I don't want to mix them accidentally and add a probability to a mapped probability. I probably also don't want to add a cumulative probability and a probability, though I'm not really sure about that, because sometimes I do want to add them (for example when calculating cumulative probabilities), and I do want to compare them. Though maybe it would work with some exotic rules, like CumProb - CumProb -> Prob while CumProb + CumProb is forbidden?
And there are more examples like that: I can have probabilities from different distributions, which I don't want to add accidentally, though multiplying them happens all over the place, so let that be allowed. Of course I have no hope that any language could deal with the explosion of types needed to represent all this, but if the compiler just gave me f64 as the result of multiplying probabilities, I'd be happy.
It can be done in Rust on a case-by-case basis, but it is a lot of boilerplate. The issue is that `type IntProb = u32;` treats IntProb as an alias of u32, and rustc converts these types into each other silently, like a C compiler converts int to char. One can write `struct IntProb(u32);`, but then you need to implement traits like Add, Mul, Cmp and so on, which is possible but too much work. If it were possible to force rustc to treat `type MyType = NumericType;` as a distinct numeric type that requires explicit conversion into NumericType, while retaining all the traits of NumericType (just substituting MyType for NumericType in them), it would be great.
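For reference, a minimal sketch of the boilerplate in question (comparison derives are free; each arithmetic trait is hand-written, or generated with a macro or a crate such as derive_more):

    use std::ops::Add;

    #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
    struct IntProb(u32);

    impl Add for IntProb {
        type Output = IntProb;
        fn add(self, rhs: IntProb) -> IntProb {
            IntProb(self.0 + rhs.0)
        }
    }
    // ...and so on for Sub, Mul, etc., which is the "too much work" part.

    fn main() {
        let p = IntProb(1 << 20);
        let q = IntProb(1 << 19);
        let _sum = p + q;       // fine: both sides are IntProb
        // let bad = p + 3u32;  // rejected: no silent aliasing to u32
        let _raw: u32 = p.0;    // conversion back is explicit
    }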
I think that all the complexities described in the article stem from the attempt to create a universal instrument that can do everything and keep bees. The real difficulty is picking a small subset of wants that covers 80% of needs while being really simple. I see no issue with an occasional .into::<Length<f64>>(), just as I see no issue with (my_struct.index as usize) if I keep index as u8 to spare memory but use it as usize. I see no need for different units for the same quantity, because in any case I'd want to convert everything into the same uniform units before I start adding and multiplying. But I'd like restrictions on the operations available between different types, and I'd like the possibility to add optional dynamic checks on a type's value that I could turn on for a debug build and off for a release one. For example, I'd like to check that any probability p fits into [0, 1].
If I understand this properly, I've needed this in TypeScript a lot.
I can make `type uuid = string` for self-documentation, but a lot of plugins will just label it "string", and developers can (and have) mistakenly put some other identifier there, like the robot's hostname.
Of course we validate at the API, but it'd be more skookum if we could prevent accidentally wiring together front-end components that make this error.
String literals help a ton. Gosh, it's wonderful to be able to care about the shape of a string in the type system. But sometimes I really want to say "strict uuid", as in "I don't care if it quacks like a duck, it's not called duck."
As spockz pointed out, you're looking for the newtype pattern. Rust supports this explicitly, but you've got to do some workarounds to get it in TypeScript.
Yes, it's the standard way if you want implicit compatibility with the base type, so you can pass a value of type `A` to a function expecting a `string`. Another name for this pattern is "branded types".
Ah yeah, I remember poking at that. I should review it again. I think at the time it felt like a bit of a hack, but maybe some features from the past few years of TS updates have helped.
I assume the idea is to lie to the type system about the existence of the symbol, and at runtime it is just a string.
To me it feels like less of a hack if I can make the type declaration not a lie, e.g.

    type FooID = string & { typeName?: "FooID" };
Read it as: "of course this thing doesn't have a typeName[1] property, since it's a string, but if it did have the property, the value would be 'FooID'". You can then cast between FooID and string, but not between FooID and some other type that declares a different typeName property.
[1] I actually tend to use 'classRef' with an RDFish long name for the type, but that makes examples longer and isn't the point.
Refinements embed a predicate that elements of the new type must pass.
    def Nat = Int: (|x| -> x >= 0)
Dependent types allow types to be computed from functions (and to depend on arguments; otherwise it seems they become just weird constants):
    def Five(as_type: String) -> NumericType(as_type):
        match as_type:
            "string" => "five"
            "int"    => 5
            "float"  => 5.0f
            "double" => 5.0d
            _ => panic() // Unnecessary if you refine `as_type` from a String to an enum or a fixed set of strings.
Dependent types seem weird, but they help make types first-class (https://www.youtube.com/watch?v=mOtKD7ml0NU&t=325s) and give you types like `Array<T, N>` that let you ensure things are the right length and define append/extend properly.
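Rust's const generics already give a restricted taste of that `Array<T, N>`: the length lives in the type, no dependent types required (a small sketch):

    // Lengths are part of the type, so mismatched vectors don't compile.
    fn dot<const N: usize>(a: [f64; N], b: [f64; N]) -> f64 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    fn main() {
        println!("{}", dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]));
        // dot([1.0], [4.0, 5.0, 6.0]); // rejected: [f64; 1] vs [f64; 3]
    }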
"Dependent types" typically refers to type systems where a type can depend on a term. The canonical example is "Vector n", where n is some expression that evaluates to a natural number.
"Refinement types" typically(1) refers to type systems that let you create a subtype of a type by refining (qualifying) it with a predicate or constraint on its shape. Examples: {x ∈ Int | is_even x} or {x ∈ List | len(x) = 1}.
Refinement types can be very powerful, but that may well make type checking undecidable (think of a type of Turing machines, and the refinement that keeps only the ones that halt). By being careful about the logic used in the refinements, one may retain decidability.
(1) The article seems to have a different idea of what a refinement type is: quote "a type system that does its work after another type system has already done its work".
I am not going to play orthodox guardian of type-theory terminology here, yet to me personally it does seem unfortunate to use that term. The author seems to really want a form of type-level computation, which could be interesting if it could be rigorously specified and its relation to the existing type-level reduction clarified.
Dependent types can make type constructors generic over _values_ (instead of only types). Refinement types keep the separation between types and values, but they let you wrap an existing type to enforce extra constraints or semantics. Another example of refinement is pattern types, where you can attach a match pattern that is enforced to pass.
Well sure they matter, but "yaoioum" wasn't really the main point of the post. It was more of a threat: "add refinement types to Rust, or yoric will continue building poorly named static analyzers."
Java adopted units via JSR 385 (https://belief-driven-design.com/java-measurement-jsr-385-21...)