Units might seem simple, but they have a ton of edge cases. Do you want to be able to add inches and feet? Be careful about potential precision/rounding issues. What is the unit for a temperature delta? You can't simply keep the original unit (e.g., C or F), because converting F to C is a different rule than converting ΔF to ΔC. Etc.
Units do prevent bugs in programs, so they have an important role to play. But they also need to be designed very carefully.
This probably doesn't have to be too complicated; the usual answer for Rust is "no". Rust doesn't even let you "just" add two unsigned integers of different sizes. Following that design, I would imagine units would require an explicit "turn feet into inches" or the other way around.
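For illustration, a minimal sketch of what that explicit-conversion style could look like with newtypes (Inches, Feet, and to_inches are made-up names here, not from any existing crate):

    // Minimal sketch of explicit unit conversion via newtypes
    // (all names hypothetical, not from an existing library).
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Inches(f64);
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Feet(f64);

    impl Feet {
        fn to_inches(self) -> Inches {
            Inches(self.0 * 12.0)
        }
    }

    fn main() {
        let a = Inches(3.0);
        let b = Feet(2.0);
        // let sum = a.0 + b.0;                  // compiles, but is a unit bug
        let sum = Inches(a.0 + b.to_inches().0); // the conversion is explicit
        assert_eq!(sum, Inches(27.0));
        // Just as `1u8 + 1u16` is rejected until you convert with `as`/`From`,
        // Inches + Feet is rejected until you pick a direction to convert.
    }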
Rust is heavily inspired by Scala, but I guess achieving something like the examples in my post is difficult. I really hope Rust finds one way or another to make it work, because simply forbidding everything all the time isn't even the safest approach: it drives many people to avoid the feature altogether and use unsafe code.
Might be annoying if you're working with floating point, but given that each conversion introduces error, it's probably good to be explicit and to recommend staying internally consistent.
No, that's not the point. If you let Rust automagically decide when and where to apply conversions, you could easily wind up with more operations than you need, which increases numerical error, and the extra conversions can also be a pain to uncover or refactor away.
Anybody correct me if I'm wrong -- in Pascal these subranges can specify the size of the variable in memory and enable compile-time checks of assignment from a literal value, but don't (can't) do much else.
There's the concept of dependent types in e.g. Idris, which lets us correctly do this sort of range check throughout the program (not just on literal assignment), but it comes with strict requirements on what the compiler can allow, such as no unbounded recursion, because checking dependent types is roughly equivalent to running the program.
Nim has Natural and Positive (signed int) types that are checked at compile time and at runtime. You can also define arbitrary range types for any ordinal type, for example enums:
    type
      Month = enum Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
      SpringMonth = range[Mar..May]

    var m: SpringMonth = Mar
It will raise an under/overflow exception if the value falls out of range.
It should be easier now with const generics. I know the popular C++ library looks like Value<Unit, T, L, M, ...>, where the letters are numbers representing the dimension of time, length, mass, etc., so m/s^2 would always be Value<f64, -2, 1, 0, ...>. By keeping it normalized you don't need Equivalent or whatever.
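A rough sketch of that shape in Rust, assuming the unstable generic_const_exprs feature on nightly (all names here are made up, not a real crate):

    // Dimension exponents (time, length, mass) as const generics;
    // the f64 inside is always stored in normalized (SI) units.
    #![feature(generic_const_exprs)] // nightly only
    use std::ops::Mul;

    #[derive(Clone, Copy, Debug)]
    struct Value<const T: i32, const L: i32, const M: i32>(f64);

    impl<const T1: i32, const L1: i32, const M1: i32,
         const T2: i32, const L2: i32, const M2: i32>
        Mul<Value<T2, L2, M2>> for Value<T1, L1, M1>
    where
        Value<{ T1 + T2 }, { L1 + L2 }, { M1 + M2 }>: Sized,
    {
        // Multiplying quantities adds the dimension exponents.
        type Output = Value<{ T1 + T2 }, { L1 + L2 }, { M1 + M2 }>;
        fn mul(self, rhs: Value<T2, L2, M2>) -> Self::Output {
            Value(self.0 * rhs.0)
        }
    }

    type Seconds = Value<1, 0, 0>;
    type Meters = Value<0, 1, 0>;
    type Accel = Value<{ -2 }, 1, 0>; // m/s^2, however it was computed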
As a related concept, we've talked about adding "pattern types" to Rust: `Result<T> is Err(_)` (the type of a Result that you've just confirmed is an `Err`, so you don't have to handle the `Ok` case), or `u32 is 1..` (equivalent to `NonZero<u32>` but more flexible).
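Until then, the NonZero types cover the numeric case; a small example of what `u32 is 1..` would subsume (assuming a recent Rust where the generic std::num::NonZero is stable):

    use std::num::NonZero;

    // Today's spelling of "u32 is 1..": zero is unrepresentable,
    // so callers must prove non-zero-ness before calling.
    fn chunk_size(requested: NonZero<u32>) -> u32 {
        requested.get().min(4096)
    }

    fn main() {
        if let Some(n) = NonZero::new(128u32) {
            println!("{}", chunk_size(n));
        }
    }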
The author didn't mention this, but maybe there could be a type (unit) canonicalization step that always produces the same reduced/simplified type for any equivalent set of Mul/Divs? So that you don't need the later equivalency check.
E.g. with the article's example, where 'a' is Meters and 'b' is Seconds, 'a/(b*b)' and 'a/b/b' both have type 'Div<Meters, Mul<Seconds, Seconds>>' instead of one having the type 'Div<Div<Meters, Seconds>, Seconds>'.
For only Muls and Divs you can basically just keep a histogram mapping units to powers (e.g., m/s^2 => m: 1, s: -2), which uniquely represents equivalent types.
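A runtime sketch of that canonicalization (names are made up; a type-level version would do the same bookkeeping in the trait system):

    use std::collections::HashMap;

    // Fold a product of base units (name, exponent) into a canonical
    // histogram, so that a/(b*b) and a/b/b compare equal.
    fn canonical(factors: &[(&str, i32)]) -> HashMap<String, i32> {
        let mut dims: HashMap<String, i32> = HashMap::new();
        for &(name, pow) in factors {
            *dims.entry(name.to_string()).or_insert(0) += pow;
        }
        dims.retain(|_, p| *p != 0); // dimensions that cancel out disappear
        dims
    }

    fn main() {
        let a_div_b_div_b = canonical(&[("m", 1), ("s", -1), ("s", -1)]);
        let a_div_b_sq = canonical(&[("m", 1), ("s", -2)]);
        assert_eq!(a_div_b_div_b, a_div_b_sq); // both are {m: 1, s: -2}
    }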
James Gosling looked at this and wrote about it at the time. It can be a bit mind-bending, especially with units named after people.
If you're looking at statistics for current or power, the type system might try to convert it to joules even though you wanted to look at average wattage.
I do not understand: average wattage/power would be the integral of wattage over time (joules) divided by the elapsed time (so back to watts); it's still watts. Under what implementation would you end up with joules that isn't just oddly broken in general?
I guess I could see a naive implementation (confusing integration/sampling with discrete summation) going the other way, erroneously ending up with a nonsensical W/s.
Also, I don't understand what it has to do with eponyms, which are just shorthands for combinations of base units; either your dimensional analysis works or it doesn't, no?
Average wattage is kg⋅m²⋅s⁻³, not kg⋅m²⋅s⁻² (joules) or kg⋅m²⋅s⁻⁴.
For example, the sievert is an official SI unit despite being just J/kg. This is because confusing equivalent dose and absorbed dose, which also has the unit J/kg, could be very dangerous.
Note that this is different from J sometimes being written as Ws. While there are informal conventions for when we use J and when we use Ws, using the unconventional one would not be technically wrong, because 1 J is simply 1 Ws, whereas 1 Sv is not necessarily 1 J/kg when the latter is an absorbed dose.
I think one could reasonably disagree with these decisions but that is how the SI people see it.
I'd make a stronger statement here: this is a specific example of when having units is most powerful. When even though two things are expressed in some common form, they nevertheless represent something different.
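As a minimal sketch of that situation (made-up types, not a real dosimetry library): absorbed dose and equivalent dose are both J/kg, yet mixing them is exactly the bug units should catch.

    // Both are J/kg dimensionally, but they must not be interchangeable.
    struct Gray(f64);    // absorbed dose
    struct Sievert(f64); // equivalent dose

    // Equivalent dose = absorbed dose * radiation weighting factor.
    fn equivalent_dose(d: Gray, weighting_factor: f64) -> Sievert {
        Sievert(d.0 * weighting_factor)
    }

    fn main() {
        let dose = equivalent_dose(Gray(0.5), 2.0); // factor depends on radiation type
        println!("{} Sv", dose.0);
    }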
> Yeah, but equivalent units with different names is sort of only a display/formatting issue, right?
> When even though two things are expressed in some common form, they nevertheless represent something different.
This is where people go wrong trying to DRY and other refactors. Slightly forced example, but:
    function averagePerClassroom(total) { return total / 30; } // 30 kids per class
    function averagePerMonth(total) { return total / 30; }     // assume 30 days
"Oh, the function body is the same therefore lets refactor this into an "averagePer" function" expect its two completely different concepts and once the code calculates the actual days per month or once classes are no longer 30 people suddenly things need to be un-refactored, or what I see more often, is just branching off inside the new single function based on an argument flag. Horrible.
A simple way to do this is to store a vector of the powers.
For example, momentum is kilogram * meter / second, which is MASS^1 * LENGTH^1 * TIME^-1.
As a vector, this can be represented as (1,1,-1) where the positions are M, L, T respectively.
In that format velocity is represented as (0,1,-1), acceleration is (0,1,-2), etc...
This is automatically canonicalised and much easier to manipulate than a tree of operations.
Of course, this assumes uniform units such as CGS or MKS in something sane like the metric system. Conversion back and forth is generally straightforward, as long as the types encode the system used. E.g.: CGS<1,1,-1> and MKS<1,1,-1> both represent momentum, but at different scales.
Imperial also works, and other base units can be added to extend the system. This can include things like current, temperature, moles, etc...
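A sketch of that conversion, with the exponent vector stored at runtime for brevity (made-up names; a real library would keep the vector at the type level):

    use std::marker::PhantomData;

    struct Cgs;
    struct Mks;

    // Exponents for (mass, length, time); e.g. momentum is [1, 1, -1].
    struct Quantity<System> {
        value: f64,
        dims: [i32; 3],
        _sys: PhantomData<System>,
    }

    // CGS base units are gram and centimetre, so converting to MKS
    // scales by 1e-3 per mass exponent and 1e-2 per length exponent.
    fn cgs_to_mks(q: Quantity<Cgs>) -> Quantity<Mks> {
        let scale = (1.0e-3_f64).powi(q.dims[0]) * (1.0e-2_f64).powi(q.dims[1]);
        Quantity { value: q.value * scale, dims: q.dims, _sys: PhantomData }
    }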
Type refinements are a great concept and I'd love to see them in Rust. And double-refinement types are great for helping with conversions, such as with Rust's From/Into, and potentially a dynamic converter function.
Examples of double-refinements that I'd like (the first is sketched after the list):
- Common units like Length:Meter and Length:Foot.
- Color bits like Color:RGB24 and Color:CMYK24.
- Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.
- Human languages like String:English vs String:Cymraeg with a converter function that knows translations.
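A minimal sketch of the first bullet in today's Rust, using a phantom-tagged newtype (Meter, Foot, and Length are hypothetical names):

    use std::marker::PhantomData;

    struct Meter;
    struct Foot;

    // "Length:Meter" / "Length:Foot" as a tagged wrapper.
    struct Length<Unit> {
        value: f64,
        _unit: PhantomData<Unit>,
    }

    impl From<Length<Foot>> for Length<Meter> {
        fn from(f: Length<Foot>) -> Self {
            Length { value: f.value * 0.3048, _unit: PhantomData }
        }
    }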
> - Worldwide currency like Money:USD and Money:GBP with a converter function that knows exchange rates.
Exchange rates vary over time, so you'd arguably need a type which includes a timestamp (e.g. "USD 1000 at 2024-12-25").
And that's ignoring all the other complexities, such as the spread, different currency converters offering differing rates, unofficial and multiple official rates in countries with currency controls (e.g. Argentina), hedging, etc.
Plus, perhaps the biggest complexity of all is that currency rates are often not free, particularly if you want "live pricing" (updated every few seconds), and particularly if it's for commercial use. The fact that they may or may not be free also illustrates that there are no definitive rates, only "rates according to X".
I'd really like to have them. If I'm writing an algorithm that maps the interval (0, 1) of probabilities onto different intervals of u32 integers, and the probabilities are themselves represented as integers (the integer part of p*2^n), I want different units for those integer probabilities and their mapped values. I don't want to mix them accidentally and add a probability to a mapped probability. I probably also don't want to add a cumulative probability and a probability, though I'm not really sure about that, because sometimes I do want to add them (for example when calculating cumulative probabilities), and I do want to compare them. Though maybe it would work with some exotic rules, like CumProb - CumProb -> Prob while CumProb + CumProb is forbidden?
And there are more examples like that: I can have probabilities from different distributions, which I don't want to add accidentally, though multiplying them happens all over the place, so let that be allowed. Of course I have no hope that any language could deal with the explosion of types needed to represent all this, but if the compiler just gave me f64 as the result of multiplying probabilities, I'd be happy.
It can be done in Rust on a case-by-case basis, but it is a lot of boilerplate. The issue is that `type IntProb = u32;` treats IntProb as an alias of u32, and rustc converts these types into each other silently, like a C compiler converts int to char. One can write `struct IntProb(u32);`, but then you need to implement traits like Add, Mul, Cmp and so on, which is possible but too much work. If it were possible to force rustc to treat `type MyType = NumericType;` as a distinct numeric type that requires explicit conversion into NumericType, while retaining all the traits of NumericType (just substituting MyType for NumericType in them), it would be great.
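For reference, a minimal sketch of the boilerplate in question (comparison derives are free; each arithmetic trait is hand-written, or generated with a macro or a crate such as derive_more):

    use std::ops::Add;

    #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
    struct IntProb(u32);

    impl Add for IntProb {
        type Output = IntProb;
        fn add(self, rhs: IntProb) -> IntProb {
            IntProb(self.0 + rhs.0)
        }
    }
    // ...and so on for Sub, Mul, etc., which is the "too much work" part.

    fn main() {
        let p = IntProb(1 << 20);
        let q = IntProb(1 << 19);
        let _sum = p + q;       // fine: both sides are IntProb
        // let bad = p + 3u32;  // rejected: no silent aliasing to u32
        let _raw: u32 = p.0;    // conversion back is explicit
    }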
I think that all the complexities described in the article stem from the attempt to create a universal instrument that can do everything and keep bees. The real difficulty is picking a small subset of wants that covers 80% of needs while being really simple. I see no issue with an occasional .into::<Length<f64>>(), just as I see no issue with (my_struct.index as usize) if I keep index as u8 to spare memory but use it as usize. I see no need for different units for the same quantity, because in any case I'd want to convert everything into the same uniform units before I start adding and multiplying. But I'd like restrictions on the operations available between different types, and I'd like the possibility to add optional dynamic checks on a type's value that I could turn on for a debug build and off for a release one. For example, I'd like to check that any probability p fits into [0, 1].
If I understand this properly, I've needed this in TypeScript a lot.
I can make `type uuid = string` for self-documentation, but a lot of plugins will just label it "string", and developers can (and have) mistakenly put some other identifier there, like the robot's hostname.
Of course we validate at the API, but it'd be more skookum if we could prevent accidentally wiring together front-end components that make this error.
String literals help a ton. Gosh, it's wonderful to be able to care about the shape of a string in the type system. But sometimes I really want to say "strict uuid", as in "I don't care if it quacks like a duck, it's not called duck."
As spockz pointed out, you're looking for the newtype pattern. Rust supports this explicitly, but you've got to do some workarounds to get it in TypeScript.
Yes, it's the standard way if you want implicit compatibility with the base type, so you can pass a value of type `A` to a function expecting a `string`. Another name for this pattern is "branded types".
Ah yeah, I remember poking at that. I should review it again. I think at the time it felt like a bit of a hack, but maybe some features from the past few years of TS updates have helped.
I assume the idea is to lie to the type system about the existence of the symbol, and at runtime it is just a string.
To me it feels like less of a hack if I can make the type declaration not a lie, e.g.

    type FooID = string & { typeName?: "FooID" };
Read it as: "of course this thing doesn't have a typeName[1] property, since it's a string, but if it did have the property, the value would be 'FooID'". You can then cast between FooID and string, but not between FooID and some other type that declares a different typeName property.
[1] I actually tend to use 'classRef' with an RDFish long name for the type, but that makes examples longer and isn't the point.
Refinements embed a predicate that elements of the new type must pass.
    def Nat = Int: (|x| -> x >= 0)
Dependent types allow types to be computed from functions (and to depend on arguments; otherwise it seems they become just weird constants):
    def Five(as_type: String) -> NumericType(as_type):
        match as_type:
            "string" => "five"
            "int"    => 5
            "float"  => 5.0f
            "double" => 5.0d
            _ => panic() // Unnecessary if you refine `as_type` from a String to an enum or a fixed set of strings.
Dependent types seem weird, but they help make types first-class (https://www.youtube.com/watch?v=mOtKD7ml0NU&t=325s) and give you types like `Array<T, N>` that let you ensure things are the right length and define append/extend properly.
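Rust's const generics already give a restricted taste of that `Array<T, N>`: the length lives in the type, no dependent types required (a small sketch):

    // Lengths are part of the type, so mismatched vectors don't compile.
    fn dot<const N: usize>(a: [f64; N], b: [f64; N]) -> f64 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    fn main() {
        println!("{}", dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]));
        // dot([1.0], [4.0, 5.0, 6.0]); // rejected: [f64; 1] vs [f64; 3]
    }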
"Dependent types" typically refers to type systems where a type can depend on a term. The canonical example is "Vector n", where n is some expression that evaluates to a natural number.
"Refinement types" typically(1) refers to type systems that let you create a subtype of a type by refining (qualifying) it with a predicate or constraint on its shape. Examples: {x ∈ Int | is_even x} or {x ∈ List | len(x) = 1}.
Refinement types can be very powerful, but that may well make type checking undecidable (think of a type of Turing machines, and the refinement that keeps only the ones that halt). By being careful about the logic used in the refinements, one may retain decidability.
(1) The article seems to have a different idea of what a refinement type is: quote "a type system that does its work after another type system has already done its work".
I am not going to play orthodox guardian of type-theory terminology here, yet to me personally it does seem unfortunate to use that term. The author seems to really want a form of type-level computation, which could be interesting if it could be rigorously specified and its relation to the existing type-level reduction clarified.
Dependent types can make type constructors generic over _values_ (instead of only types). Refinement types keep the separation between types and values, but they let you wrap an existing type to enforce extra constraints or semantics. Another example of refinement is pattern types, where you can attach a match pattern that is enforced to pass.
Well sure they matter, but "yaoioum" wasn't really the main point of the post. It was more of a threat: "add refinement types to Rust, or yoric will continue building poorly named static analyzers."
Java adopted units via JSR 385 (https://belief-driven-design.com/java-measurement-jsr-385-21...)