> Frink has special syntax for date values. You can write # 2001-08-12 # to mean the date 2001-08-12
Frink? Frink?
This Visual Basic erasure can not stand. The #-delimited date literal syntax has been found all over the VB family: Visual Basic, VBA, VBScript, and it persists to this day in Visual Basic .NET.
Of course, being a VB syntax, it's completely cursed. You can put
# 01/05/2023 #
in a VB file, and what date it represents will depend on what locale it's.. compiled? executed? evaluated? in? Maybe? Depending on which VB dialect you're in? Good luck. Some of those languages would also accept
# 01/05/23 #
And they might even agree about what century it's in.
A VB date literal is and always has been locale-independent. Originally, they did make the mistake of making it always US-style, M/D/Y. VB.NET added more sane formats such as YYYY-MM-DD.
Now if you tried to convert a Date value to String at runtime, yeah, that would use the current locale. That was a constant source of bugs in VB6 apps running on non-US locales (including, famously, at least one Microsoft installer).
Are you sure? Maybe for Visual Basic proper, but I thought the VBA date literal basically followed Excel date parsing rules, so it was localized depending on your Office install.
Yeah, but this is Windows we're talking about. Even the SCHTASKS program (kind of like /usr/bin/at) takes a date which is locale sensitive, making it absolutely useless for scripting. Check out this answer to a question about how to use it: https://stackoverflow.com/a/18730884
I think the argument here is that VB is cursed because Windows is cursed; it inherited the bias toward cursedness. Of course other languages can choose to do better.
Or you could just use the standard '2001-08-12' with no special syntax needed and avoid the problem altogether. ISO-8601 or it's invalid. There's no reason to allow other formats or need other syntaxes when we have a 35-year-old standard to use.
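For what it's worth, most standard libraries already handle this format unambiguously with no special syntax. A quick Python illustration:

```python
from datetime import date

# ISO-8601 calendar dates parse directly, no locale involved
d = date.fromisoformat("2001-08-12")
print(d.year, d.month, d.day)  # 2001 8 12
```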
>> Frink has special syntax for date values. You can write # 2001-08-12 # to mean the date 2001-08-12
> But is that December 8th or August 12?
I didn't even realize YYYY-DD-MM was a possibility. Maybe I can grant that 08/12/2022 is ambiguous, and so is 08-12-2022, but can we please agree YYYY-MM-DD can't have variations?
Anyone who packs multiple values into a single string hates the world and possibly themselves.
Nowhere is it written that we must use the same field separator between the ambiguous fields. We could fix this problem by using the separators as type signifiers.
We could also stop using day <= 12 in our examples, which would also help a ton.
My favorite is uniform function call syntax. In several languages (Nim, Koka, D, …), you can always write bar.foo(baz) instead of foo(bar, baz) and vice-versa.
Another one from Nim is the implicit result variable. Instead of having to do this:
func sum(nums: seq[int]): int =
  var total = 0
  for num in nums:
    total += num
  return total
you just do this:
func sum(nums: seq[int]): int =
  for num in nums:
    result += num
It saves so much time and I'm disappointed that more languages don't have it.
> My favorite is uniform function call syntax. In several languages (Nim, Koka, D, …), you can always write bar.foo(baz) instead of foo(bar, baz) and vice-versa.
To me, these read as "Tell bar's foo to do something with baz" and "Tell foo to do something with bar and baz". So being 'able' to flip-flop the syntax is at least temporarily semantically confusing.
Instead, the code should describe the interaction between the two units of state:
onCollide(Player, Pickup)
If we structure our code like in the last example, it makes sense to weaken the `a.b` vs `b(a)` distinction, and instead use the dot as a kind of pipe-operator.
It shouldn't be that confusing. Ideally, a method should be defined on an object only if it needs to access the encapsulated state of that object. Otherwise, it should probably be a free function.
So, in your case, depending on other modeling decisions, I could argue either for
onCollide(Player, Pickup) - if Player and Pickup are both plain data, and don't need to guarantee any invariants
Player.collect(Pickup) - if Player actually has to ensure some invariants, such as health < 100
I don't see any good arguments for pickup.boost(Player), in typical games. Of course, if both the Pickup and the Player have some invariants that need to be maintained on a collision, then arguably the design has to be changed at a deeper level.
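A minimal Python sketch of that second option, with hypothetical Player/Pickup classes, where the method exists precisely to guard the health invariant:

```python
class Pickup:
    def __init__(self, boost):
        self.boost = boost

class Player:
    MAX_HEALTH = 100

    def __init__(self, health):
        self.health = health

    def collect(self, pickup):
        # the method guards the invariant: health never exceeds MAX_HEALTH
        self.health = min(self.health + pickup.boost, self.MAX_HEALTH)

p = Player(health=90)
p.collect(Pickup(boost=30))
print(p.health)  # capped at 100
```

If Player and Pickup were plain data with no invariant, a free onCollide(player, pickup) would do just as well.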
But this isn't a discussion about pickups specifically. Attacking an example is pointless if the thing you take issue with is incidental and not fundamental to the argument. I bet you could think of a case where you pick something else and OP's example meets your standards.
My point was that there is a meaningful, and I believe relatively simple, distinction to be made between free functions and methods bound to an object - a distinction which UFCS doesn't really help with. For any given example, I believe there is a reason to prefer one over the other, and I showed what reasoning I would use for the particular example raised by OP.
Although I rarely see Objects used this way. Often, methods are used to implement all related functionality. Unity even strongly encourages this. (...at the moment. They are working on Entity Component Systems which will work more similar to my third example)
I concede that languages shouldn't use the dot as a syntactic tool, be it through Extensions[1] or UFCS, but rather offer a pipe-operator. If they don't, I'd still prefer UFCS rather than no way of chaining at all.
[1] Extensions for interface/protocol conformance are fine of course.
That arguably depends on your POV. Thinking like python where a method always declares 'self' as the first argument, then a function is just 1 thing (there's no such thing as a method). Then dot syntax is just syntactic sugar for passing the first argument, and there's nothing special about functions. You can manually pass the first argument.
In other words, to me it's simpler and therefore less confusing.
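Python makes this concrete: a method looked up on the class is just a function whose first argument is the instance, so both call forms already exist.

```python
class Vec:
    def __init__(self, x):
        self.x = x

    def scaled(self, k):
        return Vec(self.x * k)

v = Vec(2)
# dot syntax and plain function call are the same thing
assert v.scaled(3).x == Vec.scaled(v, 3).x == 6

# builtins work the same way
assert "hello".upper() == str.upper("hello")
```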
The use case for this sort of thing I like best is extending objects without touching the object.
E.g. in D to convert between different types you can use std.conv:
import std.conv;
"123".to!int;
123.to!string;
Ints don't need to understand string building and strings don't need to understand int parsing. The conversion code just needs to declare a couple functions taking the right arguments and it just works. To me the above is much more readable than
to!int("123")
to!string(123)
in any case. It's also quite nice when dealing with C APIs since it allows you to pretend they are OOP in quite a lot of cases. e.g. with SDL:
SDL_CreateRenderer(window, -1, 0);
turns into
window.SDL_CreateRenderer(-1, 0);
and say I'd like to have a function to initialize all the renderer stuff in one go? I can simply declare
I find the distinction between “foo” and “bar's foo” unnecessarily confusing. For example in C++, why is getting the last element of a “vector” something that belongs to it, but reversing a “vector” is something external?
Doesn't the reverse pollute the function namespace? If every obj.fun() can be written as fun(obj), doesn't that cause ambiguity with a previously imported global function fun()?
You can also `use Object::*` to be able to do this without prefixing. Also, you can throw use statements into functions, so this is pretty useful in specific cases.
In a language without function overloading or namespaces/modules, yes, it would be a major limitation.
But since most languages have both, I don't think it's a serious concern. I don't know of any language where the only namespace support is classes, so that all functions go in a global namespace unless they are methods on a class - maybe you could argue C works like this (where "methods" are function pointer members of a struct)?
Yah I ran into that issue more in Julia. There's a built-in function to help find clashing functions.
In Nim however I've only had it happen a handful of times in a few years. Then you just need to use the module name to qualify it, or change your imports.
Clearly these should do different things. I suppose "computeArea(shape)" does dynamic dispatch based on the type of shape? But you're still putting every function defined on every type in your entire codebase in a global namespace. It's not obviously awful but I'd definitely be a bit nervous about it.
You only need dynamic dispatch if dynamic types are involved. If you can always resolve to concrete types, then you can statically resolve the required function.
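In Python terms, `functools.singledispatch` gives roughly this behavior: one free function name, dispatched on the runtime type of the first argument. (The shape names here are made up for illustration.)

```python
from dataclasses import dataclass
from functools import singledispatch
import math

@dataclass
class Circle:
    r: float

@dataclass
class Square:
    s: float

@singledispatch
def area(shape):
    raise TypeError(f"no area() overload for {type(shape).__name__}")

@area.register
def _(shape: Circle):
    return math.pi * shape.r ** 2

@area.register
def _(shape: Square):
    return shape.s ** 2

assert area(Square(3.0)) == 9.0
```

In a statically typed language the same call could resolve at compile time whenever the concrete type is known.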
Implicit result var is also in Delphi, as Result. The original Pascal is to assign to a variable with the same name as the function, which looks kind of odd, and is overloaded in recursive scenarios, since functions with no args don't need parens to invoke.
The last time I used Matlab, it also required you to assign your return value to a variable with the same name as the function. Presumably GNU Octave has the same feature.
I've only seen this proposed as a feature to help with template meta-programming (though it could also make similar sense in any dynamic language).
There, it helps if a template can say `t.foo()`, and, with UFCS, can use that template with any t for which either `foo(t)` or `t.foo()` exist. In contrast, in C++ today, a template using `t.foo()` limits its own use only to types that have a foo() method, probably unnecessarily (note that writing `foo(t)` in a C++ template is less limiting, as someone who controls neither the template nor the type of t can still define that function).
However, outside of this use case, I think conflating these two is more of a negative than a positive. It means that there are twice as many places where I may need to lookup the definition of foo(), at the very least. So I wouldn't add this to any static language that doesn't support templates or macros.
Without UFCS, if I see obj.foo(), I know I can lookup the definition of foo() in the definition of obj's type, or in its supertypes. Even for foo(obj), there is often a canonical place where such functions are defined. With UFCS, I need to look in both of these places until I can find the right definition.
And sure, the IDE/language server/other tooling can often help, but not always (e.g. if I'm browsing some code on Github). Either way, more ambiguity for no gains is typically not a good idea, even if the downsides are minor (again, I am very much in favor of UFCS where it's directly useful, such as C++ or D).
D certainly has classes, and a member function has access to private members, while a free-floating function does not. Nim indeed doesn't seem to have this distinction at all, and only seems to support encapsulation at the module level, not the class level (as far as I could tell from very brief searching - I have never programmed in it).
FWIW the implicit result variable is as old as FORTRAN and ALGOL, although the common practice then was to name it the same as the function. Delphi is one language that inherited that (via Pascal) but renamed it to Result, although I don't know whether it originated there, or whether Nim picked it up from Delphi.
It's kinda fine if it's a short function like that[1]. When you get above 10-15 lines in a function though, it's easy to lose track of what's a return variable and what isn't.
[1] and everything in the codebase uses that style otherwise it's annoying to have to context-switch every 5 minutes.
Lots of languages will implicitly return the final expression— I feel like that's a decent compromise. Not quite as magical as an actual named variable that just exists, but not as clunky as needing as explicit `return` every time.
I always disliked implicit returns, and over the years and having dealt with many more codebases, some quite large, I've learned to dislike any implicitness.
I would much prefer that you must explicitly return a value (even if it's through an implicitly declared 'Result' variable) rather than just 'try to guess what happened here, in this long function with lots of expressions'.
There are a few exceptions, like Forth, where you really have to keep the current state of the stack in mind at all times anyway. Those exceptions naturally tend toward very small functions. Most languages don't, and the result is inevitably difficult to understand bugs.
Sure, definitely, and in a case like that (say, in rust), I would just put an explicit `return` in. But there are lots of other scenarios where the result is naturally being returned by the final expression and it's quite convenient to elide the extra keyword.
I think missing `return` works if the language is designed around everything being expressions, so the function is just written as a single expression. I agree that in procedural type paradigms I like having the `return` keyword over things like "return the result of the last expression".
Any language that supports overriding the index operation should support this. You should be able to do this in C# with a struct with a backing array, for instance. If you're going to do this, use the word "Circular" in it, and I would also insist that if a has 4 elements, then a[0] == a[4] == a[8]. In other words, you always just take the (positive) index modulo the size of the array. Then a[-1] is the same as a[N-1] for an array of size N. This could be useful in a lot of contexts, but should be made explicit.
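A sketch of that contract in Python (the class name is hypothetical); Python's `%` already returns a value in [0, n) for positive n, so the negative-index case falls out for free:

```python
class CircularArray:
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, i):
        # index modulo the size, so every integer maps to some element
        return self._items[i % len(self._items)]

a = CircularArray([10, 20, 30, 40])
assert a[0] == a[4] == a[8] == 10
assert a[-1] == a[3] == 40
```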
You'd have to care about the difference between indexing with a literal and indexing with variables of different types/widths, and how indexing with a variable interacts with the size of the array.
For instance if you have an int array that contains the numbers 1-250 and you index with a uint8 variable i,
for (uint8 i = 247; i != 0; i++) {
    // print circ_arr[i]
}
for the values of i near the overflow points of the circular array and of the uint8 it gets weird:
Right, the calling code needs to handle its own integer overflows, of course. And if your circular array has a size other than a power of 2 you can get only a partial enumeration in the cycle that includes overflow. Sure. But are you really indexing an array of unknown size with a uint8? It's really impossible that there might be more than 255 things you ever care about? No. Everyone who is using a uint8 to index arrays is either doing something extremely low-level and fiddly where abstractions like this simply don't apply, or they're idiots who are doing cargo-cult shotgun "optimization" because they don't know how to write code that works.
If you're indexing with a [u]int32 you need to worry about this once every 4 billion increments, and if an incomplete cycle is a show-stopper for you, you can compute a safe modulo yourself based on the size(s) of your circular array(s), but more likely you just need something else. But really, you don't care if your cache hiccups a little once every 4 billion caches.
You make a good point, of course, I'm just allergic to people poking holes in back-of-the-napkin explanations of things with the trite "but integers can overflow!" It's one of those most common well actuallys written on this site. Of course integers can overflow. They almost never do though, do they? And if they do, a test fails and you add a single line somewhere to fix it.
I really think the vast majority of programmers are too often thinking about bits when they should be thinking about math.
Um, maybe, but then his example is sixteen million times worse than reality! If I argued against some technology by showing how bad it would be if it were sixteen million times worse, that's just not a very good argument, is it?
In a language where arrays are fixed-size, I think the proper solution is to have arrays not indexed by integers, but by a custom modular type that depends on the array with values in [0,n) that allows literals in the range [-n, n-1], with literal ‘-1’ being a different way to write ‘n-1’, etc.
You’d need a way to get that type, for example as
float a[10,20] // two-dimensional array of floats
typeof(a.dims(0)) i = 0 // modular type with values in [0,9]
typeof(a.dims(1)) j = 0 // modular type with values in [0,19]
or, slightly neater:
auto i = a.indextype(0)
auto j = a.indextype(1)
Ugly syntax, but in a modern language, most code would probably do something like
for (i,j,value) in a
where the types are inferred.
Having those modular types means the compiler would do the arithmetic correct for the array, while the negative literals allow programmers to specify “last” and “next to last” correctly.
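A rough Python model of such a modular index type (names invented for illustration): arithmetic always wraps into [0, n), and `__index__` lets the value be used directly anywhere an integer index is expected.

```python
class ModIndex:
    def __init__(self, n, v=0):
        self.n = n
        self.v = v % n  # values always land in [0, n)

    def __add__(self, k):
        return ModIndex(self.n, self.v + k)

    def __sub__(self, k):
        return ModIndex(self.n, self.v - k)

    def __index__(self):
        return self.v  # usable wherever an integer index is expected

a = list(range(10))
i = ModIndex(len(a))
assert a[i - 1] == 9   # '-1' is just another way to write n-1
assert a[i + 13] == 3  # arithmetic wraps, so out-of-bounds is impossible
```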
> you always just take the (positive) index modulo the size of the area.
That's something I'd like in a bunch of languages - a real modulo operator that always returns between 0 and n, even for negative inputs, rather than a remainder operator that's advertised as a modulo operator. Grrrrr!!!!!
It's a little verbose and can probably be reduced, but "((x % m) + m) % m" always works. Although it's probably not better than a check if x < 0 since branch prediction will get that right almost always.
I do think it's quite odd and frustrating that modulo can return negative numbers and I don't really get the reasoning there, but there's probably a good reason I don't know about.
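To make the distinction concrete, here's a small Python sketch where `rem` imitates C's truncating `%`, and the double-mod trick normalizes it. (Python's own `%` is already a true modulo, which is what the outer `%` relies on.)

```python
def rem(x, m):
    # truncating remainder: what C's % gives for negative x
    return x - int(x / m) * m

def true_mod(x, m):
    # the "((x % m) + m) % m" trick, applied to a truncating remainder
    return (rem(x, m) + m) % m

assert rem(-7, 3) == -1       # remainder keeps the sign of x
assert true_mod(-7, 3) == 2   # modulo is always in [0, m)
assert (-7) % 3 == 2          # Python's % already behaves this way
```

The usual reason given for remainder semantics is that it matches what the hardware division instruction produces, so it's what C standardized.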
Ecstasy uses % for modulo, and /% for divrem (division and remainder). So "a = b % c" for calculating the modulo, but "(a, r) = b /% c" to get the quotient and remainder.
It's frequently used in signal processing to the point where it's considered one of the defining features of DSPs. One common case is filtering over a fixed size buffer of samples. If you have circular indexing, you can simply overwrite the earliest sample and increment the base reference to the next element.
I'm not sure I'd want it for every list, but there are certain places it's nice.
First of all, it handles the "get me the 2nd to last element" case automatically, but in a way that doesn't feel like a weird edge case: it's more "mathematically sound", basically. I always want mathematical soundness if possible because it leads to serendipity, the opposite of technical debt. Where technical debt is "dammit, this is going to take so much longer than it should!"; serendipity is "oh wow I can implement this cool new feature just by combining these other two things in a new way, in like 2 lines. This is going to be way faster than I thought." Mathematical soundness / purity leads to serendipity.
Directly, it supports caches very well. You just increment the number of things you've ever cached and that's where your next cached value goes; you don't care when it overwrites an old value.
There are other cases where you just need some variant of a thing, but you don't actually care that much about which variant you get. You might want to vary your wording in auto-generated text, for instance, by rotating synonyms. Or rotating the tiles you use in a 2D game. In this case I'd define an interface where you pass in a "seed" integer and it gives you back some deterministic example; a circular array is the simplest implementation of this interface (but there are others).
You could also do simple load balancing by sending work to Worker[workCount++]. While usually you want to track each workers' existing workload (because the work takes unpredictable time), this simple approach could be sufficient if all your work completes in about the same time.
If you're doing fancy math or science computing, you may be working with finite groups or fields, whose elements you could stick in an N-dimensional circular array (based on the characteristics of the field).
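The load-balancing idea above, sketched in Python with a monotonically increasing counter (names are illustrative):

```python
from itertools import count

workers = ["w0", "w1", "w2"]
ticket = count()  # ever-increasing work counter

def dispatch(job):
    # Worker[workCount++]: the counter modulo the pool size picks the worker
    return workers[next(ticket) % len(workers)]

assigned = [dispatch(j) for j in range(7)]
assert assigned == ["w0", "w1", "w2", "w0", "w1", "w2", "w0"]
```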
Sonic Pi, the live music coding environment, has a circular list structure type called a 'ring'. This proves curiously helpful for a bunch of musical scenarios.
Like:
- you can put a short chord sequence into a ring, and it now functions as a list of as many repetitions of that chord sequence as you like. You can just loop over it forever (which is kind of the essence of how sonic pi live-loop play works)
- you can put the notes that make up a scale into a ring, and use it to extract specific chords - like, take the 1st, 3rd, 5th, 7th and 9th note - from just a seven note scale.
- you can use rings of booleans to capture drum patterns and rings of notes to capture melodies, and loop them forever
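A toy version of such a ring in Python (indexing only, nothing like Sonic Pi's full API), showing the chord-extraction trick:

```python
class Ring:
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, i):
        return self._items[i % len(self._items)]

# C major scale as MIDI note numbers
scale = Ring([60, 62, 64, 65, 67, 69, 71])

# 1st, 3rd, 5th, 7th and 9th degrees from a seven-note scale
# (this toy ring wraps within the octave rather than transposing up)
chord = [scale[i] for i in (0, 2, 4, 6, 8)]
assert chord == [60, 64, 67, 71, 62]
```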
As a choice, then, perhaps; but as a default and unalterable behaviour it can be a bloody timewaster when you'd want negative subscripts to be a runtime error in your work. I've hit that in Python and didn't enjoy it.
A nice alternative I've seen is that negative index is an error, but there is special syntax for indexing from the back like array[end], array[end-1], array[end-n], where n is a (positive) variable. Likewise, end can be used in range definitions like array[5:end]. Julia and Matlab both have this.
C# has a very nice approach to this: indices aren't simple numbers, but values of type Index [1], which store both the offset and the direction, and can be implicitly created for plain ints. When you do want to index from the end, you use the unary ^ operator to create a reverse index. Thus, you can write things like a[^1] or a[0..^1].
But, more importantly, it means that any custom collection type can define an indexer that can handle reverse indices in the manner that is appropriate for that particular collection; it's not just for arrays.
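The idea translates to other languages too. Here's a hypothetical Python sketch where a marker type carries the "from the end" direction, rather than treating negative ints as magic:

```python
class FromEnd:
    # marker index type, loosely modeled on C#'s Index with IsFromEnd=true
    def __init__(self, offset):
        self.offset = offset  # FromEnd(1) ~ C#'s ^1, the last element

class SafeList:
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, i):
        if isinstance(i, FromEnd):
            return self._items[len(self._items) - i.offset]
        if i < 0:
            raise IndexError("negative index; use FromEnd to index from the end")
        return self._items[i]

xs = SafeList([1, 2, 3, 4])
assert xs[FromEnd(1)] == 4   # like a[^1]
assert xs[0] == 1
```

An accidentally computed negative index now fails loudly instead of silently grabbing an element from the back.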
That's a lot of machinery which I feel is going to benefit relatively few people and cases. I suppose I should learn it just in case but my suspicion is that MS is adding extra stuff which they hope people will use which will act as a lock-in to C#. Ergo the benefit of this is to MS not to the end user ISTM.
And it might be possible to add a static method to array to index backwards yourself (can't remember what they're called, but they look and act like methods on the object but aren't).
All standard .NET collections with defined order and O(1) indexing support indexing backwards.
And no, it's not possible to do this using an extension method, unfortunately - there are no extension properties or indexers in C# (yet; it's something that keeps coming up). But then again, if and when they add extension indexers, this arrangement with a custom type is what'd allow you to write one that does backwards indexing on a collection type that doesn't support it out of the box.
Nim, which is not controlled by a corporation, does it the exact same way. The unary ^ operator applied to an integer creates a value of type BackwardIndex.
@int_19h, @xigoi perhaps you're right, but how many times have you ever indexed backwards? Other, I grant, than to get the last item in a list. If it's more general then reversing the list would be better; alternatively you might have
lst.reverse()[x]
which the compiler could guarantee to recognise and simply implement as a calculation.
Well, I write plenty of Python code, so it actually comes up quite often. The annoyance with Python is that it just treats negative values as magic, so if you accidentally end up with a computed negative index, it silently does the wrong thing. But the alternative approach with explicit index-from-end syntax - whether like in C# and Nim, or like Julia and Matlab - doesn't have that problem; it's pure convenience.
And yes, of course, you can always do the same in some other, more verbose way. But why should we tolerate that verbosity when there's a solution that makes code both shorter and more readable? I rather hope that more languages will adopt one of these techniques.
(per your other posts, extension methods is their name. And they aren't supported here, got it).
> it silently does the wrong thing
yeah, my original complaint was this
> why should we tolerate that verbosity when there's a solution that makes code both shorter and more readable?
Because it's a balance. How much it benefits how many users to what degree vs. extra cost of implementation and maintenance. If you're not careful you go down the kitchen sink road and end up with bloat. Be careful when adding stuff cos you have to support it forever.
`$` refers to the length of the array, so `d[$]` is an array-bounds error. `d[$-1]` is needed for the last element, but you can’t do that blindly, you have to check that the length is nonzero or you’ll get unsigned underflow to ulong.max
Yeah, but $#list is better used for writing loops:
foreach my $i (0..$#list) {
    say "$i: $list[$i]";
}
For getting the last element from a list you can just use -1 (and of course further negative numbers work like you would expect, -2 is second to last and so on):
I'm fine with an array type that supports this, but not as the default. I've been bitten in the past by code that ran error-free while giving incorrect output due to this feature suddenly making the indexing valid. I'd prefer it to be opt-in somehow so that the default behavior for negatives is invalid, not silently wrong yet valid behavior.
Only downside I can see is the performance hit. You need to check if it's negative, and then you need to calculate the length first… which is a bit of overhead considering how often you use arrays in a tight loop… I mean, unless you can optimize out the check because you can detect that the index is always positive…
That's for languages that can't define arrays with custom start/stop indexes. But those that have custom indexes can very easily implement this as a helper (for example array.indexFromLast(1), which means array[Length(array)]). This way you can have the best of both worlds.
Surely if your language has custom indexes / ranges, `Length(array)` is completely broken and the language provides something like `Index'Last` you can hook on?
Because an array with indexes [3, 7) has length 4, but 4 is not the index of the last element.
That's what Ada does, yes. You'd let the array (or whatever collection) do the work for you:
for I in A'Range loop
   A(I) := A(I) + A(I);
end loop;
Whatever the range is, this will work. If you really need the first and last elements or want to be explicit:
Start := A'First;
Stop  := A'Last;
And if the type of the range (since any discrete type can be used) doesn't support simple incrementing with +1 or similar, you can use 'Succ to step through:
Index := A'First;
Index := Whatever_Type'Succ(Index);
Also 'Pred to work backwards. Those can be wrapped up in a simpler function if desired.
And with its array slice mechanisms, Ada is one of the easiest/most productive languages for handling arrays.
Being able to hand subarrays to a procedure, preventing buffer overruns and reducing the scope of screw-ups everywhere, is a superpower I didn't know I needed before I started writing proven parsers.
Yup, correct. What I meant above with array[Length(array)] is for the languages that don't have it. Let me be more clear.
C/C++ doesn't have custom array indexes and as such <array[std::size(array) - 1]> is returning the last element of said array.
Delphi has custom array indexes, so taking your example and defining an array as <example_array : array[3..7] of integer>, I would not get the last element with <example_array[Length(example_array) - 1]>. In this case I'd have two options. Option 1 is the <High> function, as in <example_array[High(example_array)]>, to access the example_array[7] element. Delphi also has a <Low> function, so you can iterate through a custom-defined array with the <for> keyword using both. Option 2 is to build my own helper (the most wanted case when you're dealing with multi-dimensional arrays that also have custom indexes), giving me something like <example_array.FromLastIndex(0)> to access the example_array[7] element.
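For languages without built-in custom bounds, a Delphi-style array[3..7] can be emulated by mapping indexes down to a zero-based backing list. A rough Python sketch (class and method names invented):

```python
class BoundedArray:
    # emulates a Delphi-style array[low..high]
    def __init__(self, low, high, fill=0):
        self.low, self.high = low, high
        self._data = [fill] * (high - low + 1)

    def __getitem__(self, i):
        if not self.low <= i <= self.high:
            raise IndexError(i)
        return self._data[i - self.low]

    def __setitem__(self, i, v):
        if not self.low <= i <= self.high:
            raise IndexError(i)
        self._data[i - self.low] = v

    def from_last_index(self, k):
        # from_last_index(0) is the last element, like High() in Delphi
        return self[self.high - k]

a = BoundedArray(3, 7)
a[7] = 42
assert a.from_last_index(0) == 42
```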
As long as there exists a bijection between whatever you choose as an index and the natural numbers starting from 0 it is fine. (I.e. the range of valid indices must be a countable set)
In your example that bijection could be:
3 -> 0
4 -> 1
5 -> 2
6 -> 3
This works for vectors as well, so why not have a range from (0,0) to (5,5) to index into an array arr?
You could write the function that does the mapping manually:
This is such a weird take to me. You're saying: I want to add a rule, where this structure responds to a request in a certain way, based on how the programmer wrote the request in the calling code. Layer upon layer upon layer of weird, janky, edge-case, pseudo-rules, with no consistency, no clear mental model; an absolute nightmare of a programming language. No longer can you possibly intuit what a[-1] really means, nor can you intuit the rules of indexing. You've broken TWO mental models in one fell swoop. No longer can I look at your programming language and assume that anywhere I see a 7, I can replace it with a variable whose value is 7. That is no longer true in your language! Variables no longer work intuitively in your language. Think about that! What an absolute nightmare!
This is exactly the difference between a language like PHP and a pure functional language. PHP says: usually we want to do X, but sometimes Y, so we'll make Z which does X unless Q is true in which case T1 will be set and Y will happen most of the time when you want it assuming you called it the write way and put an @ in the right spot otherwise P will happen because I hadn't had lunch when I wrote that and it seemed like P was pretty likely to be the case when T1 was set but an @ was not written but lately I've been feeling like maybe T2 should also be set sometimes so if you call Z and you want X but T1 is written and you don't want to write an @ then you can just set CONSTANT_FOO_BAR_WITHOUT_X_SET_AT to 17 because the other 16 codes are already used for other things.
Functional languages say: what if everything was just math?
I disagree about purity. At some point, "math purity" or "math correctness" may not be desirable.
In this case, everything is about intention.
In general, accessing an index out of range (above or below) is not desirable; in almost all cases it's a bug.
And so, in my opinion, `array[-1]` where `-1` is hardcoded would say, with full intention, that the last index is desired.
Basically it would be translated to `arr[arr.Length - 1]`.
You don't write `array[-1]` by accident, because that's clearly wrong (when there's no wrap-around behaviour).
Meanwhile, when the index is calculated, it should result in an error.
The rules are pretty simple, I'd say: if you want to use the "reverse syntax" then you can, but when you use variables which may be calculated wrongly, you'll get an error.
TLDR: Not that weird. If it is something that is almost certainly going to fail code-review, then may as well let the compiler fail it.
Long:
Just because I want only literals allowed someplace, or only values allowed in other places is not even close to weird.
Most places, code review won't let a function call like `foo(true, false, true, false, true)` through, because the potential for errors is so high and the readability is low.
With this take I can see code review easily getting into the weeds for each `bar[x]` to determine if x will wrap around, while letting `bar[4]` through because it is clear it will not.
Right now, with most languages, we simply let `bar[x]` through because if it is out of bounds it will throw an error/panic/etc. I think it can only silently return wrong data in C and C++.
If you intended to calculate an index and you accidentally get len(arr), you get a runtime error. But if you accidentally get -1 you silently get the last element instead. Similar to the argument about signed/unsigned indices in low level languages.
Elixir's sigils are amazing. There are date sigils that allow you to do what the OP does:
~N[2023-01-01 12:00:00]
But you can also define your own sigils to create new "custom syntax" for almost any struct. Kind of a special case of reader macros, I guess. Very convenient.
Swift has the ExpressibleBy(Type)Literal series of protocols for this.
For example you could write an extension on Date to add initialization from a string:
extension Date: ExpressibleByStringLiteral {
    public init(stringLiteral value: String) {
        // parse the string here.
    }
}
You can then do things like:
let happyNewYear: Date = "2023-01-01 12:00:00"
There are protocols for all the literal types. For example, you could implement ExpressibleByIntegerLiteral and have it init the Date object from a Unix timestamp. There is even an ExpressibleByNilLiteral.
It's "funny" that something considered an anti-feature to be avoided at all costs in one place is considered a great feature in another. This points strongly in the direction that there is no logic behind such "considerations".
What you just showed was an implicit conversion from String to Date. Something you would get beaten up for in Scala land.
C++ has the same with implicit constructors, generally considered to be a footgun that should be disabled with the `explicit` keyword unless such a conversion makes sense; implicit constructors are otherwise the default. For example, vector has a constructor which takes an integer size argument; if it weren't explicit you could accidentally do `vector v = {10}`, which would construct a vector with 10 empty elements, instead of one element with value 10. This also has to do with the ambiguous curly brace syntax in C++.
Eg: ~w(foo bar bat) is a word list. `~ letter bracketed-text letters-as-modifiers` desugars as sigil_<letter>(text, modifiers). Similar to foo_str() of Julia[3], but for one-letter names and more brackets. But not the unicode brackets of Raku.
The most recent addition to the sigil family being Phoenix's new ~p"/health", an HTTP route sigil that automatically verifies whether the route exists, and returns compile-time warnings when you link to a path that doesn't. It's fantastic, and really surprising it took this long to be added to any web framework.
Maybe more obvious, it also works the other way around. Saving you from doing [head] + body + [tail] or keeping copies of temporary arrays manipulated with append and extend.
arr = [head, *body, tail]
Same with dictionaries, often very useful when you need to include **os.environ with modifications into a subprocess.
Python has keyword arguments to functions. In that context, a single star packs or unpacks a list of positional arguments and a double star packs or unpacks a dictionary of keyword arguments.
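A quick Python illustration of both directions (packing in a signature, unpacking at a literal or call site):

```python
head, tail = 1, 5
body = [2, 3, 4]

# Single star unpacks an iterable into a list literal...
arr = [head, *body, tail]

# ...and a double star unpacks a mapping into a dict literal,
# e.g. copying an environment with modifications.
base_env = {"PATH": "/usr/bin"}
env = {**base_env, "DEBUG": "1"}

# In a signature the stars pack instead: positional args into a
# tuple, keyword args into a dict.
def capture(*args, **kwargs):
    return args, kwargs

print(arr)                 # → [1, 2, 3, 4, 5]
print(capture(1, 2, x=3))  # → ((1, 2), {'x': 3})
```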
Technically, `...` acts as either the spread or the rest operator, depending on the location. I recall there was an initial impression of intimidation among people who were familiar with ES5 syntax when trying to adopt to these new features because of this.
On the other hand, I'm fairly certain that having to visually disambiguate between `*` and `**` and remembering which did what would have gotten a similar reaction.
One of the best truly micro features I've seen recently (can't remember which language unfortunately - it wasn't a mainstream one), is general binary literal syntax of the form:
0x[de ad be ef 00]
So much nicer than the usual condensed format. And I think it'd be valid syntax in any language that allows binary integer literals.
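For what it's worth, Python's `bytes.fromhex` accepts the grouped form at the value level, since it ignores ASCII whitespace between byte pairs:

```python
# Whitespace between bytes is ignored, so you can group for readability.
data = bytes.fromhex("de ad be ef 00")

print(data == b"\xde\xad\xbe\xef\x00")  # → True
```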
A lot of languages allow underscores in numeric literals. Something like 10_000. You can put them anywhere and they get ignored. I don’t know if they also allow it for hex numbers.
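In Python at least (PEP 515), underscores are legal in every numeric literal form, including hex and binary:

```python
# Underscores are purely visual; the values are identical.
assert 10_000 == 10000
assert 0xDEAD_BEEF == 0xDEADBEEF
assert 0b1010_0101 == 0b10100101
assert 1_000.000_1 == 1000.0001
```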
There are also real human languages where a space is used as a thousands separator, and an underscore is kind of the programming equivalent of a space.
TL;DR is that it's because C++ uses quote. The reason C++ uses quote (they considered underscore) is because of a very obscure feature of C++ called custom literal suffixes which I'd never even heard of, but numbers can be suffixed with a custom identifier, and since single underscore is a valid identifier you can't use that. (https://en.cppreference.com/w/cpp/language/user_literal)
Swiss German for example. But space, half space and dot are also common in Europe. Just not a comma, because that is used as a decimal separator (fractions) in most countries.
I think you are missing the point - there is no endianness in normal written text, (or code), but there is endianness when that is translated to an actual number - what order are those bytes intended to be used in?
(Given they have been listed separately rather than as a single number).
What order are the bytes in 0xabcd intended to be used in? I don't see how making it 0x[ab, cd] would be ambiguous, it's the same assumption of reading left to right.
Absolutely damned is the combination of several Kotlin features:
1. Lambda functions can be defined with `{}`.
2. `foo(bar, somefunc)` is the same as `foo(bar) somefunc`. In other words, if the last parameter is a function, it can be provided AFTER the closing parenthesis.
3. Interfaces that require only one method can be implemented inline with a lambda function (i.e. `{}` syntax for a no-param function).
Combining those three features, the code may look like this:
I don't know why teenagers insist on taking a negative word and making it positive, especially when teenagers don't like adding emphasis to words, so you have no idea whether they think it's good or bad.
Not that I was completely innocent of this at that age.
It reminds me of Scala too, but I have no idea which started when. And of course both could have come up with this fairly independently. (After all it is the norm in functional style.)
Kotlin even started as just a poor man's Scala clone. (Because JetBrains didn't manage to get a working Scala plugin for their IDE, they thought it would be simpler to create their own "simpler" version of the language.)
Kotlin's features are amazing for designing DSLs but I think your code sample uses more than just the 3 points you mentioned. Specifically the `call.respondText(..)` part. I assume that in this example the second argument to the `get` function is actually a "lambda with receiver", which means that the lambda executes with another object as the receiver (and that object is bound to "this" inside the lambda block), which makes the `call` object available.
> 2.foo(bar, somefunc) is the same as foo(bar) somefunc. In other words, if the last parameter is a function, it can be provided AFTER closing parenthesis.
Just an irregular syntax quirk that tries to get around the fact that Kotlin does not support multiple parameter lists, unlike the language where most Kotlin features come from, Scala.
> 3. Interfaces that require only one method can be implemented on-side with a lambda function (i.e. {} syntax for no-param function).
That doesn't have anything to do with Kotlin. That's Java's SAM (Single Abstract Method) feature.
> I'm surprised author didn't mention it.
The author seems not to know any Scala. Otherwise the lists would show mostly only Scala features… ;-)
With how old pytest is, I assume that's where they got it from. Does it perform recursive value printing, or bespoke comparisons?
e.g. in pytest it won't just print out the values of "a" and "b", it will recursively document intermediate values until it's reached the toplevel expression:
assert f() == g()
assert 42 == 43
where 42 = <function TestFailing.test_simple.<locals>.f at 0xdeadbeef0002>()
and 43 = <function TestFailing.test_simple.<locals>.g at 0xdeadbeef0003>()
and it's possible to customise the report so you can report as a diff:
assert "foo 1 bar" == "foo 2 bar"
- foo 2 bar
? ^
+ foo 1 bar
? ^
This is one that I like a lot. Years ago (1997 timeframe) I had implemented it in a Java compiler, and a few years later in a Java library (https://github.com/oracle/coherence/blob/4e6e343e1ffd9bbfea3...) that would create an exception on the assertion failure and parse its stack trace to find the source code file name, and read it to find the text of the assertion that failed, etc. so it could build the error message ...
In Ecstasy, we built the support directly into the compiler again:
With regard to strings, they give a good example in Lua, but oh boy wait until this person hears about Perl (:
There's a whole section in the manual [1] for string quoting operators (qq, qw, qx, ...)
In general, I feel like Perl is one of those languages that has a high amount of these "quality of life" syntactic features, and helps make it enjoyable to write, once you get over the learning curve.
usernames = '''
foo bar baz
hello world
'''.split()
# instead of this, which needs too many keystrokes
usernames = ["foo", "bar", "baz", "hello", "world"]
Interestingly, Python named tuples have a similar interface for fields:
# all of these are equivalent
EmployeeRecord = namedtuple('EmployeeRecord', ['name', 'age', 'title'])
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title')
EmployeeRecord = namedtuple('EmployeeRecord', 'name age title')
Raku (once Perl6) generalizes quoting as a Q , followed by optional "how should this behave" adverbs, and text bracketed by anyish unicode bracket pair. So Q:w <foo bar> is a list of two words. And has Perl-like qw/foo bar/ as sugar. Heredocs are Q:to/THEEND/ ... \nTHEEND . I'm unclear on whether you extend this without defining your own Q-like thing?
Julia allows[2] defining your own non-standard string literals. foo"bar"hee and qux`...` desugar as macro calls foo_str("bar", "hee") and qux_cmd("..."). But they lack the bracket flexibility.
That's actually a great point. Perl has so many syntactic sugars that it is a poster child for too much variety in ways you can write things. Making it much harder to read someone else's code.
Yeah, it's a trade-off that's not discussed enough, IMO. Most general programming advice is directed at "programming in the large", involving many people, over a longer period of time, and the maintainability issues that go with that.
But you give up something to get benefits in those areas. Making use of the expressive power of something like Perl is a wonderful sensation. The barriers between thought and making it happen are lower, and so you can be remarkably productive. It is also just more fun, I find, which has subtle and under-valued long-term benefits.
But yeah, agreed that comprehending someone else's Perl-fueled vision quest can be ... rough (:
For rust, it is probably the try (?) operator. Fundamentally, it's just syntax sugar for a match statement with an early return in the Error or None cases, but it really improves the ergonomics of dealing with Result and Option types.
Interestingly, there was once a solid effort to add a try operator to Go. While the proposal was quite well received, upon closer inspection it was realized it would be essentially useless in the real world as, given how the rest of the language works, you almost never would want to simply early return with the value received. The data revealed that the vast majority of the code in the wild that the syntax sugar would replace returned something else.
So while it is indeed a nifty feature if the rest of the language is also designed for it, it's not something that is easily tacked on to a language that is not.
In Rust, when you use `?`, it includes a step to convert the error type of the expression it was used on into the (possibly different) error type that the current function returns. So if you need to map low-level errors to high-level ones in a consistent way, you'd just do it once when defining that error type.
Which too requires the language to have a 'the producer is always right' over a 'the customer is always right' design, which Rust does. It all works well when the language is designed for it, but it has to be there at the macro level. Definitely not a 'microfeature'.
Is there a link to this discussion? This seems interesting. If I can’t make a call to a downstream service, or a file I’m trying to read doesn’t exist, or a s3 bucket 404’s, or almost any other “real world” error I can think of, the only (sane) way I can think of handling this is propagating the error down to the caller (and perhaps logging?)
Do you mean to say that it is idiomatic in Go to handle errors by… doing something else?
On mobile so I can’t put in a code block, but here’s how I thought Go was written:
value, err := some_fn()
if err != nil { return err }
(The ? operator works here because you could do `value := some_fn()?` and remove the if-statement boilerplate.)
Do you mean that instead of “return err” Go idiomatically does something else?
> the only (sane) way I can think of handling this is propagating the error down to the caller
A big problem, among many, with doing that is that you leak implementation details out of the abstraction. If, to stick with your example, you have a function that helps you with reading files, the caller shouldn't care where the data is stored. Today it might be the local filesystem, tomorrow S3, and when you make that change nothing about the rest of the program should break.
But you can't count on the lower level functions using the same errors. As you suggest, a "a file I’m trying to read doesn’t exist" isn't represented as a "bucket 404", even though at a higher level they are the exact same thing. If you straight returned the "a file I’m trying to read doesn’t exist" error as you got it from the file API, now the caller is going to depend on that, and when you replace it with the S3 function that returns a "bucket 404" error, everything starts to break.
What you typically want to do is return a more generalized "not found" error that can remain stable regardless of specific implementation details. There are rare cases where you can get away with simply returning the value up the stack, but in the majority of cases you need to handle the error, either by doing something with it or returning a new error that is more useful to the caller. And, so, try becomes essentially unusable without the language taking a larger macro take on supporting such a feature.
Like the sibling comment points out, Rust does "from" conversion when using try (?) to try and avoid encountering the same fate.
> A big problem, among many, with doing that is that you leak implementation details out of the abstraction. If, to stick with your example, you have a function that helps you with reading files, the caller shouldn't care where the data is stored. Today it might be the local filesystem, tomorrow S3, and when you make that change nothing about the rest of the program should break.
This would lead to a situation where a local call is treated the same as a network call, which is known to be very bad design.
> Like the sibling comment points out, Rust does "from" conversion when using try (?) to try and avoid encountering the same fate.
Yeah, and it avoids all the hassle.
Why couldn't any language (and especially Go) just do the same?
If your abstraction leaks that the implementation is a local call, and then you try and change that later, unquestionably. Again, you need to avoid leaking implementation details, which is too why you can't just add a try operator and make it automatically useful. Any leak of any kind in your abstraction will make life miserable later. Don't let your abstraction leak.
> Yeah, and it avoids all the hassle.
All it does is move where the code is located, placing the onus on the producer "the producer is always right" instead of the caller "the customer is always right". You don't actually avoid anything, just change the perspective.
> Why couldn't any language (and especially Go) just do the same?
Perhaps it could, but it requires that the language take a more macro look at the problem. It is not a microfeature.
You have things a bit backwards. The try operator was not conceived until after sum types and traits were already stable parts of the language. The addition was simple, didn't really alter the language in a meaningful way and is almost purely syntactic sugar meant as a quality-of-life improvement for users. (ie. it hits pretty every single point in the article's definition of microfeatures)
Lua also allows you to choose the string delimiter. If your string contains "]]" you can delimit it with [=[ or [==[ instead. Any number of "=" so long as the opening and closing delimiters match.
And that's why all modern languages implement streams/string helpers/string builders. You do not want to actually write strings/manipulate them using "+" (concatenation symbol) in code directly because, in modern Unicode world, it tends to become a point of failure for obscure bugs / a maintenance horror show.
String builders originated in languages with immutable strings making code using something like "foo += bar" in a loop very expensive due to the need to allocate a new string on every iteration. A string builder is basically a mutable string that can be built in-place efficiently and converted to a proper immutable string at the end. It is purely a performance thing, and there are no Unicode issues when concatenating valid Unicode strings (i.e. sequences of codepoints).
Note that some kind of string builders are necessary for efficient repeated concatenation both in languages with immutable strings and in languages with 0-terminated strings.
Using strcat() repeatedly in C for example will mean that the string is being read over and over again to find the end, making an O(n) loop actually O(n²).
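The same idea shows up as the usual Python idiom: collect the parts in a mutable list and join once at the end, rather than repeatedly copying an immutable string (CPython sometimes optimizes `+=` on strings, but the list-and-join form is the portable builder):

```python
words = ["foo", "bar", "baz"] * 3

# Quadratic in principle: each += may copy the whole prefix again.
slow = ""
for w in words:
    slow += w

# The builder idiom: append is amortized O(1), join is one pass.
parts = []
for w in words:
    parts.append(w)
fast = "".join(parts)

assert slow == fast
```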
I don't understand the link with the parent, nor why + would be bad, besides (a) the language being naive about concatenation/allocation, and (b) the language not differentiating between a char 'x' and an integer, because that would result in an integer addition instead of a concatenation.
If you manipulate strings built from user input using "+" (the string concatenation symbol), you'd be in a world of hurt: either you do a lot of regex (which is ugly and unmaintainable) or you limit the user input to known characters only (which is a bad user experience, and later on your manager will ask you to lift that constraint anyway because they want to support a new feature). Therefore you use, for example, a stream: simply dump the user input into the stream buffer as it comes, and work with that stream from that point on. This way you're also future-proof if your manager wants a new feature, such as supporting Chinese and/or Japanese keyboards.
> [In Chapel] there’s the config keyword. If you write config var n=1, the compiler will automatically add a --n flag to the binary. As someone who 1) loves having configurable program, and 2) hates wrangling CLI libraries, a quick-and-dirty way to add single-variable flags seems like an obvious win.
Letting people define configurable variables at their call site is incredibly valuable, even if you don't have compile-time support, and even if you're working on something not meant to be an isolated binary.
At my startup, one of our most beloved innovations is that you can write `resolve_config("foo", default="bar", request=request)` pretty much anywhere you'd normally hardcode a value or feature flag... and that's it.
The first time it's seen in any environment, it thread-safely inserts-if-not-present the default value into a key-value storage that's periodically replicated into in-memory dictionaries that live on each of our app servers. Any subsequent time it's accessed, it's a synchronous key-value lookup in memory, with barely any overhead. But we can also configure it in a UI without needing a code redeploy, and have feature flags and overrides set on a per-user or per-tenant basis.
Sometimes, you don't need language support if you have some clever distributed-systems thinking :)
I would want that system to log those changes to whatever monitoring system is being used, or integrate with the deployment system as a "deploy", so that when some oncall person is trying to figure out why the entire fleet is pegging their CPU, they can trace it back to the flag change.
> You'd just need to have a mutex lock on the values.
Oh no they're saying that it's thread-safe, that's not an issue. Rather that depending on the order of initialisation, possibly of different systems entirely, you can have different initial states because different systems or subsystem decided of the default value.
> Sometimes, you don't need language support if you have some clever distributed-systems thinking :)
I think you may have outwitted yourself here; I know what that looks like because I've done it so many times in the past :-)
I'm afraid your solution is not distributed-system safe, as a different bootup order of the nodes[1] in your system would result in a different config value for that key. And, at some point, your nodes are going to come up in a different order.
I'm partial to for-else-loops. They fit perfectly into languages with compound expression (produce a value from a break in the loop or from the else block).
They don't come up that often, but when they do they're really the best solution.
I hate the Python syntax using "else" for this, but I love the feature.
I've often wanted both a "then" and an "else" from both for and while loops. The "then" would be for a successful completion (no break), and the "else" would be for when the loop doesn't even run a single iteration.
But that didn't make it into our "language budget", unfortunately. It's easy to implement, but hard to argue for when it doesn't get used often
They also seem to be unknown to a lot of Python programmers unfortunately; I’ve had PRs rejected because a for-else loop was “unusual syntax” and therefore considered hard to maintain.
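For reference, the Python construct in question: the `else` block runs only when the loop finishes without hitting `break`:

```python
def first_even(nums):
    for n in nums:
        if n % 2 == 0:
            found = n
            break
    else:
        found = None  # loop completed with no break
    return found

print(first_even([1, 3, 4, 5]))  # → 4
print(first_even([1, 3, 5]))     # → None
```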
> Most languages have multiline literals, but what makes the Lua version great is that the beginning and ending marks are different characters. This solves the infuriating “unnestable quotes” problem string literals have, and you don’t have to escape all your literal \s.
Nestable comment syntax is also nice. At least some MLs (eg SML) has it, that I know of.
Indentation-sensitivity can also solve similar problems. (Indentation does not have to exclude requiring graphic termination. A formal language can require both. Or just a helpful tool.)
(Also agree with 'kebab-case', although the name is new to me and a bit weird.)
I like how functions in js can be `arg => result`. In F# I have to do `fun arg -> result` with the `fun` keyword. It makes sense since `MyArgType -> MyResType` is a type signature in f#, but I feel like the compiler can just check if the arguments are references to types or are argument bindings.
# Multiline Lists/Arrays
I like how F# doesn't require delimiters for multiline lists.
So I can do `let myList = [1; 2; 3]` or
let myList = [
1
2
3
]
# Regex Literal
I like how in Crystal instead of doing `/my[regex]/` i can do `%r(my[regex])` where the parenthesis can be any brace type (like "(", "{", "<", "[") so I don't have to escape any characters.
# Argument Accessor Shorthand
In Crystal, you can use an ampersand to bind and access a property on an object, instead of writing the verbose form with a function.
So this
["a", "b"].join(",") { |s| s.upcase }
can be written as
["a", "b"].join(",", &.upcase)
If this were available in F#, for example, instead of
> I feel like the compiler can just check if the arguments are references to types or are argument bindings.
1. this is absolutely terrible because now you need feedback from the type checker to know how to parse the program
2. it is furthermore also ambiguous with function application, requiring arbitrary lookahead to disambiguate, also not a fun thing to do
JS gets away with it because the sigil was not previously used and it only requires a single lookahead to parse, as only single-parameter anonymous functions can have "bare" parameter lists.
In haskell (and elm) it's `\` which is quite OK (and also a nod to the lambda symbol λ).
But yes "fn" is quite nice (taking over "f" is a bit much). And a few characters can definitely degrade the experience, especially as "u" and "n" are typed with the exact same finger.
Anonymous functions were definitely one of my least favorite features in Erlang, not because they don't work well but because their leading keyword is "fun" and there's an arrow between the (parenthesised) parameters and body and they also have a closing keyword "end":
map(fun(X) -> 2 * X end, [1,2,3,4,5]).
That's a bit much.
But HoFs in general are quite awkward, as referring to a named function also requires the `fun` leading keyword, and requires specifying the arity, so
map(fun double/1, [1,2,3,4,5]).
after having defined the function as
double(X) -> 2 * X.
(as you can see Erlang would really rather you defined named functions).
It doesn't need feedback from the type-checker. The type is not required, only whether a symbol is a type symbol. So it is enough to have feedback from the lexical scope, which can be tracked during parsing without any type analysis. It's a compromise, but a useful and simple one. C has been doing it since the 70s ("typedef").
The function syntax I like even more is one with implicit arguments so you don’t have to name them, e.g.
waiting = sum workers #(%.in_queue + %.in_flight)
Clojure has some syntax like this though it isn’t needed for the most obvious use-case of functions to extract fields because keywords, which are usually used for map keys, are implicitly functions that look themselves up in their arg, e.g.
It's very common already: try-with-resources (Java), using (C#), bracket (Haskell), unwind-protect (Common Lisp), ... though in the latter two it's more of a building block.
Also building block: languages with a convenient and "unrestricted" syntax for anonymous function can just use that e.g. Smalltalk, Ruby, ... in Ruby a "with" is usually just passing a block to the corresponding object's constructor:
# python
with open(...) as f:
...
# ruby
File::open(...) do |f|
...
end
'with' is cool, but it is annoying that it creates a new scope. And then you have ExitStack if your lifetimes do not neatly map to scopes. And AsyncExitStack and async with if you happen to be in an async function.
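A small sketch of the `ExitStack` escape hatch, using in-memory buffers as stand-in resources:

```python
import io
from contextlib import ExitStack

with ExitStack() as stack:
    # Lifetimes no longer map to one lexical scope per resource:
    # everything registered closes together when the stack exits.
    bufs = [stack.enter_context(io.StringIO()) for _ in range(3)]
    assert not any(b.closed for b in bufs)

# Leaving the with-block closed every registered resource at once.
assert all(b.closed for b in bufs)
```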
Call/Pipe operators are just sugar for normal function calls. It's nice because adding or removing a call doesn't require balancing parentheses. It's helpful for writing stream or sequence/iterator based code and throwing debug utilities in the middle.
It's less of an issue if your language has UFCS or other postfix function call syntax like mentioned in this thread, but if you don't this is nice to have.
It also allows you to invert the order of the calls so they are written in the order they occur. Instead of paint(sand(cut(measure(wood)))), you can write wood | measure | cut | sand | paint, which is easier to read, especially if it splits over multiple lines or has additional arguments:
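The pipeline style can be mimicked even without language support; here's a toy Python sketch (the woodworking functions are made up for illustration):

```python
class Pipe:
    """Wrap a value so `|` feeds it through the next function."""
    def __init__(self, value):
        self.value = value
    def __or__(self, fn):
        return Pipe(fn(self.value))

measure = lambda w: w + ["measured"]
cut     = lambda w: w + ["cut"]
sand    = lambda w: w + ["sanded"]
paint   = lambda w: w + ["painted"]

wood = ["wood"]
piped  = (Pipe(wood) | measure | cut | sand | paint).value
nested = paint(sand(cut(measure(wood))))
assert piped == nested  # same result, written in execution order
```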
IIRC you can do that in Haskell as well, but I forget the name of the feature. Many OOP libraries have started to adopt a chained method call style similar to this, but it is nice to be able to do with any function.
I'm not a Haskell user, but my experience with this in the Nix language is a bit mixed. It definitely works sometimes, but then you get a pileup of parenthesis nesting anyway, because the default is greedy and you have to control which functions get which arguments.
For instance Racket and Clojure have threading macros, which are more flexible as they're just macros (Clojure's `->` is equivalent to Elixir's pipe operator, but `->>` will fill in the last parameter rather than the first, and `as->` lets you name the threaded value to define where it is inserted in each call).
Haskell let anyone who wants define their own pipe operator, historically you had to BYO, which wasn't exactly hard:
(|>) = flip ($)
or
x |> f = f x
would do (modulo fixity), but today it's provided by default as "(&)".
Some from Raku (formerly Perl 6) that I really like:
* sub MAIN:
sub MAIN(Int $x, :$verbose) { }
generates a command line parser that expects an Integer plus an optional named switch --verbose
* It has named params (as seen above), and there are abbreviations: instead of thing => $thing you can write :$thing to avoid duplicating the name (:thing also exists, though it creates a pair "thing" => True, so ruby lovers need to be careful :D )
* junctions for quick conditionals/validation: 0 <= all($x, $y, $z) <= 2 * pi
* this is probably debatable, but: if you use a * as a term, it will create a lambda for you, so *+2 is similar to sub ($x) { $x + 2 }
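For comparison, the closest Python gets to `sub MAIN` is wiring up argparse by hand, which is exactly the boilerplate Raku generates for you:

```python
import argparse

# Hand-rolled equivalent of Raku's `sub MAIN(Int $x, :$verbose)`.
parser = argparse.ArgumentParser()
parser.add_argument("x", type=int)
parser.add_argument("--verbose", action="store_true")

args = parser.parse_args(["42", "--verbose"])
print(args.x, args.verbose)  # → 42 True
```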
> If you look at something like numpy functions, so many of them share the exact same parameter definitions. What if you could write def log(standard-exp-params) instead of having to write them out every single time?
They're not actually written out every time, the issue is mostly documentary (and it would be nice if Python or Sphinx ever had a good solution). And numpy actually has a bunch of generators for that e.g. https://github.com/numpy/numpy/blob/45bc13e6d922690eea43b9d8... handles filling in the common bits of documentation for the ufuncs.
Yes but also that lacks most of the documentation so it's not great.
If you have multiple callables taking these parameters documenting them is awkward, by default help/pydoc and sphinx will tell you that the parameters are `default_args` and `default_kwargs`, but that's not actually true, those are just intended as shortcuts / helpers .
The goal of Project Coin is to determine what set of small language changes should be added to JDK 7. That list is:
* Strings in switch
* Binary integral literals and underscores in numeric literals
* Multi-catch and more precise rethrow
* Improved type inference for generic instance creation (diamond)
* try-with-resources statement
* Simplified varargs method invocation
I see Mathematica as a shell for math. You can write long programs or modules in it, sure, but very often you're simply typing up a couple of lines to check some computation or visualize an expression which you won't even save, in which case readability is secondary.
You're writing something and you decide you want to apply 'f' to it, so you type '// f' (instead of backspacing like a caveman). It's actually rather convenient.
I don't see how, except for manually rolling a struct for every new combination of units. The C# generic system is not flexible enough to define things like "m / s / s = (m/s^2)".
Frink (which OP mentions for its datetime syntax) has the concept of units and unit conversions built into the core language - but that's about all it does! I'd love to have this feature in a general-purpose language.
Keyword and optional arguments (seen in e.g. Python, Bash with --flags, and OCaml) are my favorite language superpower. They make code more self-documenting and let you add optional behaviors to functions. This makes it really easy to make concise, highly usable APIs.
if the parameter has the same name on both sides (caller, callee) there's a syntactic shortcut. It's a small but noticeable force that pushes you towards more consistent naming.
I love Common Lisp arguments support, where you can also get a parameter that tells you whether an optional or keyword argument was supplied by the caller, instead of relying on the default value, for when you want to know. It's very useful for something like a patch function.
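Python has no supplied-p, but the common sentinel idiom approximates it, and shows why it matters for patch-style functions where `None` is itself a legitimate value (the `patch` function here is a made-up illustration):

```python
_MISSING = object()  # private sentinel: no caller can pass it by accident

def patch(record, value=_MISSING):
    if value is _MISSING:
        return "field left untouched"
    return f"field set to {value!r}"   # even None counts as supplied

print(patch({}))        # → field left untouched
print(patch({}, None))  # → field set to None
```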
WebGL's swizzling vector selectors.[1] Where v.x desugars as v[0], v.y as v[1]. Similarly for z and w. Also r,g,b,a. And they swizzle: v.rgb, v.xz, v.zx . So `v1.xy = v0.yx` reflects.
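The lookup trick is simple enough to sketch in a few lines of Python (a read-only toy, unlike GLSL's writable swizzles):

```python
class Vec:
    _axes = "xyzw"
    def __init__(self, *vals):
        self.vals = list(vals)
    def __getattr__(self, name):
        # v.zx -> [vals[2], vals[0]]: each letter selects a component
        return [self.vals[self._axes.index(c)] for c in name]

v = Vec(1.0, 2.0, 3.0)
print(v.xz)  # → [1.0, 3.0]
print(v.zx)  # → [3.0, 1.0]
```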
Which leads to the question why you would use offset / index syntax instead of dot syntax in the first place.
Therefore I would prefer something like this to be the usual array access syntax:
val chars = Array("a", "b", "c")
val secondChar = chars.2 // as a shorthand for `chars.atIndex(2)`, or equivalently `chars.atOffset(1)`, maybe also with `chars..1` for the offset case
(Also we should stop calling the offset "index", and get a proper "atIndex" method.)
I might add kebab-case to my current language project. From all the code I've written, I only found a handful of - operators not surrounded by spaces, so that ambiguity wouldn't bite me often.
Also, I wish the unary negation operator was more visually salient. `foo * -bar` is very different from `foo * bar`, but it's only a handful of pixels on the screen. I've thought about trying to render it as an em-dash or something. Didn't NASA lose a rocket over a spurious - sign?
Some languages use ~ for negation; spending it (and other common chars) on bitwise ops is a waste in most languages that don't specifically target bit twiddling.
That said, have you considered making this outright illegal without explicit parentheses? I actually wish that more languages would require that any sequence of operators has the same precedence throughout; i.e. a+b*c would also be illegal. It's always a pain to remember the exact precedence rules, especially since they're not consistent across PLs, so I'd prefer any expression that is ambiguous to be explicitly disambiguated.
Code is symbolic and not like a (western) written language.
Kebab-case and snake_case may seem to read better when you look at code as if it were written text, but they read worse than camelCase when you look at it symbolically.
What I'd like to have brought back is basically the extended version of number separators/kebab-case: ignore whitespace in constants and identifiers where possible, like ALGOL 68 did.
That means you can write your number as "1 000 000" and put it in the variable "one million", and then feed that to your function called "withdraw money".
Yes, sure, makes it harder to grep. Here's a nickel, get a better grep tool (or wrapper thereof).
Clojure’s loop expression hits this spot for me. It sets a recursion point to which you can jump using any logic inside the body you want, as long as it is from tail position. It’s like a while loop turned into an expression. I haven’t encountered any other way to write iterative expressions whose number of iterations isn’t known at the top (like map and reduce).
Tail call optimization can get you that, too. If you've written Scheme and/or gone through SICP you might be familiar with this: you write a recursive function, with the recursive function call as the last thing the function does ('tail-recursion'), and the compiler/runtime is able to optimize those recursive calls out rather than consuming one stack frame of space per call ('tail call optimization'). Clojure has loop/recur at least partially because it doesn't support tail-call optimization.
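A minimal Python sketch of what that transformation amounts to: the tail call becomes a rebinding of the parameters plus a jump back to the top (Python itself doesn't do TCO, which is why the loop version is the one that scales):

```python
def sum_to_rec(n, acc=0):
    """Tail-recursive sum 1..n; each call's result IS the recursive call."""
    if n == 0:
        return acc
    return sum_to_rec(n - 1, acc + n)  # tail position: nothing left to do after

def sum_to_loop(n, acc=0):
    """The same function after the TCO-style rewrite."""
    while True:                  # the loop head is the "recursion point"
        if n == 0:
            return acc
        n, acc = n - 1, acc + n  # like Clojure's (recur (dec n) (+ acc n))
```

`sum_to_rec(100_000)` would blow Python's stack, while `sum_to_loop` runs in constant stack space, which is exactly the guarantee TCO (or loop/recur) gives you.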
Interestingly, I almost prefer Clojure's `recur` semantically. Means you don't have to change the function name twice if you rename it, and it's hard to miss that you're recursing.
Those are cool properties. Another one is that you get a compilation error if your recursive call isn't in the tail position (and thus would actually grow the stack when you thought it didn't).
One thing I don't think you can do with loop/recur, though, is optimize more complicated bits of recursion than a single function that calls itself. I.e. imagine a recursive call pattern that goes like f -> g -> f -> g -> ...
It's much older, it's lambda calculus stuff. It's a way to implement recursion in a language which doesn't have recursive functions (but for some reason does have first-class functions).
However it allows making anonymous functions recurse as well.
TCO also, unlike special syntax for direct tail recursion, works when the last call is not (directly) recursive (which supports indirect/mutual recursion, and structures with deep call hierarchies that aren't necessarily recursive).
Oh I didn’t know it was kind of a workaround. I do like the fact that loop is not a function though but an expression like if or case.
FWIW I think in Clojure you can use “recur” inside functions too to specifically indicate tail call recursion without relying on automatic optimization
> roughly three classes of language features [...] 3. Quality-of-life features that aren’t too hard to add
I'd regrettably add another class, quality-of-life features which you'd have hoped weren't too hard to add, but because of past choices, now are.
Examples: Adding JavaScript-like dots a.b.c for Julia Dict's a[:b][:c] would conflict with "wasn't intended to be public but has been" Dict implementation fields, like .count. Adding { a,b | ... } instead of the clunkier { |a,b| ... } for Ruby blocks fails because of a yacc grammar conflict.
You can build method access in Ruby trivially enough, but you will forever be explaining to users why node.class isn't node[:class]. And now every method added to that Hash/Dict-like object is a breaking change, because someone somewhere could already have used that name to access a key.
The { a,b | } syntax is also still ambiguous with hash literals, unless you require that | to be there, and that looks like it gives the parser a whole lot of look-ahead work to do in order to distinguish hashes from lambdas.
I agree that kebab variables aren't to my taste either, but I am partial to the notion of kebab-case keywords that I encountered in a JEP draft [0]. It suggests expanding the keyword vocabulary with a form that is otherwise invalid syntax, similar to how java treats module-info.java and package-info.java as valid files, but rejects any other hyphenated java class filename.
You can also put a newline in a variable name if you really want. Or a 0 byte.
Here's a demo. I've used the debugger because its "X" command can print the true name of the variable:
$ perl -d -e 1
Loading DB routines from perl5db.pl version 1.60
Editor support available.
Enter h or 'h h' for help, or 'man perldebug' for more help.
main::(-e:1): 1
DB<1> ${"variable-name"} = 123;
DB<2> ${"variable\nname"} = 456;
DB<3> ${"variable\0name"} = 789;
DB<4> X ~variable
$variable^@name = 789
$variable^Jname = 456
$variable-name = 123
Agda does, but it also supports a plain ascii hyphen in identifiers. It allows operator characters inside identifiers and requires spaces around operators otherwise (as proposed in the article). So you can use x-y as an identifier:
x-y : ℤ → ℤ → ℤ
x-y x y = x - y
The Agda community also heavily uses unicode characters. I've even seen a unicode colon used for a custom syntax because the ascii colon was unavailable.
Because in most languages they're not useful. Symbols are solutions to problems, some of which are:
1. mutable strings (ruby)
2. and / or expensive strings (erlang, also non-global)
If you have immutable "dense" strings and interning, and you automatically intern program symbols (identifiers, string literals, etc...) then symbols give you very little.
And then there's the slightly brain damaged like javascript, where symbols are basically a way to get some level of namespacing to work around the dark years of ubiquitous ad-hoc expansions so you're completely stuck unable to add new program symbols to existing types because you could break any page out there doing something stupid.
As the article covers, they are nice syntactically, regardless of those performance considerations. They fill a niche that in my experience actually turns out to be more common than string literals (though less common than strings as actual textual data).
I haven't written ruby (or any lisps) for awhile, and I miss symbols.
They exist in K/Q. A single-word identifier-shaped symbol begins with a backtick, or a multi-word symbol can be created with a backtick and double quotes. A sequence of symbols is a vector literal, and is stored compactly. For example:
`apple
`"cherry pie"
`one`two`three
Many languages will intern string literals implicitly, or allow a programmer to explicitly intern a string; for example Java's "String.intern()".
The problem with string interning, especially for strings constructed at runtime, is that for the interning pool to be efficient it is very desirable for it to be append-only, and non-relocatable. A long-running program which generates new interned strings on the fly risks exhausting this pool or system memory.
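For illustration, Python exposes explicit interning via sys.intern; once two equal strings are interned, equality collapses to identity (the strings here are built at runtime so the compiler can't pre-fold them):

```python
import sys

# Two equal strings constructed at runtime are normally distinct objects;
# interning routes both through one shared pool entry.
a = sys.intern("".join(["employee", "_id"]))
b = sys.intern("".join(["employee", "_id"]))

assert a is b             # identity check, effectively a pointer compare
assert a == "employee_id"
```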
Personally, I found Ruby's symbols to be a source of bugs because they can easily get mixed up with strings. The article gives the example of dict[:employee_id]. But what happens if you serialize "dict" as JSON, then parse it again? The symbol :employee_id will be silently converted to "employee_id", which is treated as a different dict key from :employee_id. I found it was easy to lose track of whether a given dictionary is using the "keys are symbols" or the "keys are strings" convention, especially in larger codebases.
Yeah, symbols are terrible, and they lead to using Mashes or "Hashes with indifferent access" to try to allow both syntaxes. This helps with round-tripping to JSON and back and getting consistent access either way, but values are still not converted. And values shouldn't be symbolized from JSON, which means round-tripping through JSON typically converts symbols into strings.
It would be a lot easier if symbols had been just syntactic sugar for immutable frozen strings so that :foo == "foo".freeze == "foo" would be true.
And under the covers these days there is very little difference. It used to be that symbols were immutable and not garbage collected and fast. And that strings were mutable and garbage collected and slow.
These days symbols are immutable and garbage collected and fast and frozen strings are immutable and garbage collected and fast (and short mutable strings are even pretty fast).
Symbols as a totally different universe from Strings I would consider to be an antipattern in language design. They should just be syntactic sugar for frozen strings if your language doesn't already have frozen strings by default.
Symbols in Ruby are meant to be more performant than strings, IIRC. If I have the symbol :a, it's allocated once regardless of how many times it appears. As opposed to "a", which is reallocated every time.
I guess it's similar to Python having a single instance of small integers. PlayStation also experimented with caching small floats which gave them some perf improvements too, but I think wasn't as performant in all cases.
Lua (and some other languages) intern strings, so all strings that are the same point to the same string instance. This gives the same benefits (plus string equality is just pointer equality) without a different type.
There is a caveat in older Ruby versions that they aren't garbage collected, so they shouldn't be used for things like user input. Not a problem since 2.2 though.
Symbols can even improve performance. Replace them with integers at compile time like a global enum, and so the runtime only needs to compare integers instead of potentially lengthy (especially if UTF-16) strings.
> Replace them with integers at compile time like a global enum, and so the runtime only needs to compare integers instead of potentially lengthy (especially if UTF-16) strings.
All of those strings will be interned, and can thus be compared by identity. Which is an integer comparison.
You can do all sorts of things with them. Use them like symbols in Scheme, say to name fields `get #name user` or to access database tables, etc.
But what's even more interesting is that the name is reflected up into the type. `get #name user` won't fail at runtime. Your database table name can be checked at compile time.
Ah, OverloadedLabels – I've only seen them in the `get #name user` use-case. I feel like Scheme/Lisp symbols are used quite a bit more generally, but maybe it's just not caught on yet in Haskell, also other features fill the same roles (e.g. in many lisps you can unquote a symbol and use it as the function of that name; people also often use them similarly to data constructors for pattern matching).
Balanced string literals (with some extra QoL) recently landed in C#, haven’t got the chance to use them yet, but they sound really nice in the right situation.
In Next Generation Shell I've experimented by adding
section "arbitrary comment" {
code here
}
and this is staying in the language. It looks good. That's instead of
# blah section - start
code here
# blah section - end
Later, since NGS knows about sections, I can potentially add section info to stack traces (also maybe logging and debugging messages). At the moment, it's just an aesthetic comment and (I think) an easily skippable code section when reading.
Symbols
I've decided not to have symbols in NGS. My opinion (not a popular one, I assume) is that all symbols together form one big enum, instead of multiple enums which would convey which values are acceptable at each point.
var foo string
{ // Do stuff
// ...
foo = "..."
}
{ // Other section...
}
You can also split stuff up into sections, but for some kinds of functions where you know the functions will never be re-used and are intimately related, I find this clearer.
Of course, your language will need to have block scope, or at least blocks.
If it's just for comments, IIRC Lisp has docstrings - the very first expression in a Lisp function can be a string literal which gets compiled into the final executable as a docstring which can be retrieved at runtime.
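Python works the same way; a minimal example of a docstring being retrievable at runtime:

```python
def greet(name):
    """Return a greeting for *name*.

    The first expression in the body is the docstring; it is kept on
    the function object and can be read back at runtime.
    """
    return f"Hello, {name}!"

print(greet.__doc__.splitlines()[0])  # -> Return a greeting for *name*.
```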
Ruby's Enumerable module is incredible and jam packed with features like .tally, .any?, .take, .partition, etc.
But one of the things I love most is how seriously they take having their Hash be enumerable. I love being able to loop through any hash as easily as you would with an array
The "Expanded Parameters blocks" is kind of supported in Kotlin. You can write a function like this:
/**
* @param arg the string to garble
*/
fun doThing(@NotEmpty arg: String = "default")
In this example, "arg" is non-nullable; otherwise it would have the type "String?". It obviously has a default value. It has an annotation that performs some validation, though admittedly that is a library and not a language feature. And it has its own documentation. I find this more concise than the PowerShell example.
Sure, comptime is great, but I've also found it hard to reason about code with it. I prefer my comptime stuff separated out into its own section/file/whatever. With that small change, it becomes so much easier.
And error unions with corresponding semantics. And explicit casting requirements. And @TypeInfo. And no hidden allocations. And probably like 4 or 5 other things I'm not thinking of right now.
JSON compatibility, to be able to copy and paste from JSON to valid nested dynamic arrays.
a = {
"a": "a",
"b": {"b": 2}
}
Optional commas at the end of lines, so this is also valid
a = {
"a": "a"
"b": {"b": 2}
}
and we are able to swap or append lines without editing the commas or forgetting to do it and get a syntax error. Mandatory commas on all lines would do but it gets in the way of JSON compatibility.
This must also be legal code and equivalent to the previous one
a1 = {
a: "a",
b: {"b": 2}
}
a == a1 # true
The developer decides when saving typing time is more important than JSON compatibility.
PS: a big yes to kebab-case too. That's in part CSS compatibility because CSS class names are often kebab cased.
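For what it's worth, Python's dict literals already get close to this: trailing commas are legal and a JSON paste usually round-trips (though true/false/null still differ from True/False/None, and the commas between entries remain mandatory). A small sketch:

```python
import json

# Trailing comma is legal, so appending or reordering lines
# doesn't require touching the previous line:
a = {
    "a": "a",
    "b": {"b": 2},
}

# And a JSON paste round-trips to an equal value:
assert a == json.loads('{"a": "a", "b": {"b": 2}}')
```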
I want more implicit typing in typed languages. Quite often the compiler knows exactly what type a function will return, but I still need to write it there. Sometimes it's easy ("int"), but what about HashSet<Immutable<Tuple<int,string>>>?
Typescript does it well. F# (completely statically typed) too…
For Java, I used to write `list = yourFunctionReturningThatSet();` and then have the IDE fill in the type when it complains about the undefined variable.
Easter egg: you can use -> _ to ask the compiler what type it thinks the return type should be, given your body. Because it is only used for diagnostics it isn't fully featured and there are things it doesn't cope well with, but it is there and works most of the time.
That defines a method `getDate` with the type `() => String` in Scala.
The type is statically know, of course.
But it's recommended to use explicit return types for public methods. This helps prevent breaking a public API by refactoring the implementation of a method.
I mean interface not as a language construct, but as a declaration of how module (class, component etc) can be used. You want to see the type in such declaration, not to infer it based on implementation details.
These use dash, en dash and em dash, respectively. Most languages that allow you to use a decent amount of Unicode in variable names probably will accept that kind of kebab–case.
Those dashes look pretty similar to each other in monospaced fonts but not indistinguishable, so it’s readable and not super confusing. Might work. Why not?
If unit testing makes the cut, then I think standardized documentation and doctests should be included also. Python (string literal at the beginning of a class/function), C# (structured comments) and Rust (doc attributes) are three different, valid ways of adding this to the language.
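A minimal Python doctest sketch (the tally function here is made up for illustration; the example in the docstring is both documentation and a test):

```python
def tally(items):
    """Count occurrences of each item.

    >>> tally(["a", "b", "a"])
    {'a': 2, 'b': 1}
    """
    counts = {}
    for it in items:
        counts[it] = counts.get(it, 0) + 1
    return counts

import doctest
doctest.testmod()  # runs the example embedded in the docstring
```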
Looks like these fall into the "structured comments" category, based on seeing /** */ and /// in a repository I found. Does it also have a story for doc tests?
Prolog has a weird third thing going on: arithmetic only happens in specific contexts. E.g.:
A is 4 - 2, % arithmetic, technically is(A,-(4,2)), A is 2
A #= 4 - 2, % arithmetic, A is 2
A = 4-2, % non arithmetic, A is unified with the term -(4,2) which pretty prints as 4-2
A = -2, % non arithmetic but A is the number -2 not a term -(2).
A = 4-2, B is 4 + A, % a weird one A is the term -(4,2) but when it gets called in the context is(B,+(4,A)) it gets treated as the arithmetic '-' and B is 6
you can also kebab-case predicate names so
l-h-t(L,H,T) :- L = [H|T].
?- l-h-t([1,2,3,4],H,T). %works as desired
H = 1,
T = [2, 3, 4].
Good observations. Actually more important than it looks, I think. All those little helpful things.
Also worth discussing: micro-misfeatures to be avoided when designing new languages. Maybe non-micro-misfeatures, ie the lack thereof, can be considered a microfeature. Like, for example, uniformity.
And I just have to trot out my favorite example: Java import statements do not allow keywords and numbers in package names. So we can't put our Java source code in folders named 'import', 'long', or in paths like '2023/01/'. Great. For no good-enough reason - the syntax would actually be cleaner with a separate package name syntax. (BTW, this could be fixed, I think.)
Ruby defines anonymous functions like { |args| <whatever> }, and for no arguments you can drop the || entirely. Rust's |args| { <whatever> }, with || { <whatever> } mandatory for no args, removes the ambiguity in parsing.
class Foo {
@Max(10) int bar;
@NonNull String name();
}
var field = @Foo::bar;
var max = @Foo::bar.Max;
var method = @Foo::name;
var foo = new Foo();
var name = method(foo);
var bar = foo.field;
This is usually a style thing, not an enforced syntax, and maybe a hot take, but I actually really like leading commas in comma-delimited lists (like Elm and Haskell). Makes changing the order of things really convenient.
I wrote a small DSL for easy compilation to SQL that included some similar features. It included datetime literals, but they were specified as just a string prefixed with "d".
d'2020-02-20'
I also tried to make it so that every comparison had both english and symbol representations, a range syntax, and an approximation/match comparison (e.g. "=~", "!~") which could work with both floating point numbers and strings properly.
I found this useful, and wish it was in more languages.
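In a language without such a literal, a tiny helper gets close; a hedged Python sketch (the name d just mimics the DSL's prefix and is an assumption):

```python
from datetime import date

def d(s):
    """Parse a strict ISO-8601 date string, locale-independently."""
    return date.fromisoformat(s)

assert d("2020-02-20") == date(2020, 2, 20)
```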
(which is allowing nice postfix ones with currying)
Scala does it pretty well. And nowadays the function names finally don't require a PhD in ancient Egyptian hieroglyph decoding.
So requiring alphanumeric names for functions is pretty important, IMHO. (Sure, it's okay if it's just a stern lint warning and the developer has to opt in. But searching for iteratee is easier than searching for \∆>> or whatever.)
I think it's okay for a language to help guide its users toward "better code" (of course Rust is the big one for this).
But I'm also a firm believer of providing escape hatches. So it should be just a toggle in the project/file/directory to enable whatever behavior. And a very good language would require a human readable explanation for these, so when the developer says
allowEmojisInCode = true "we decided to allow emojis to make our happy DSL, see emoji reference at https://..../...."
downstream users/readers of the code are in a much better position than with just 30000 lines of emojis :)
Scala has this escape hatch. But it's annoying. Why can I use arbitrary symbols (even with spaces and such) by adding backticks, but not regularly? It wouldn't make any difference, besides not looking bad and needing extra keystrokes for absolutely no gain (as it does not prevent bad code anyway!). I hate that kind of hand-holding! Like I said: it's not the business of a language to judge what kind of code is "good".
Scala does not even let most people in the world express code in their native language, and that in the age of Unicode! Sorry, but that's a little too much of a "we know better than you what's good for you" kind of thing.
Of course I know where this comes from: Scala is used broadly in education. There it's good to not allow the students to do all kinds of "madness".
But Scala is also mostly used by seasoned professionals in real-world settings. (Just have a look at the latest survey, found on the Scala website.) For a professional, it's just extremely annoying when a language tries hard to know better than they do how "good code" should look. The main thing about an expert programmer is that they know when it's OK to break the "rules". Needing to jump through arbitrary but completely useless hoops just to do that — when you know exactly what you're doing(!) — makes me mad sometimes.
I've considered forking the compiler more than once because of this. I hate that kind of "but we know better" behavior.
The backticks are more of an anti-feature than anything, and it's absolutely not a file or module/package level setting. (I think a lint-exclude at declaration site would also be okay.)
> It's not the business of a language to judge what kind of code is "good".
Well, yes, but no. Language design is just inseparably infused with judgement calls. Making things easy leads to them being used (as the backtick illustrates, making things hard reduces their usage, even if it sounds nice that it allows for special cases).
But as you imply the language has to be flexible and thus powerful enough to provide the option of a seriously different design trade off. (Because backticks are just a bad compromise. It's not really switching to a different design choice after all.)
> I hate such kind of "but we know better" behavior.
I think that implies too much intent on the Scala core team, unfortunately the reality is - probably - that historically someone wanted something, it got done somehow, and that's it. (In the particular case of backticks maybe Martin really had a strong opinion. Dunno. Probably you have looked into this at least a few times if you considered a fork :) )
... related to this, the recent discussion about Rust's GATs (generic associated types) feature is a very interesting case study in the intersection of language design and project governance/management (the reality of pragmatic compromises). A small team spent at least a year developing GAT support for the compiler, and then a bunch of people were asked to decide whether to merge it. It's a very unenviable position, because of course the work was not perfect. So what to do? In the end, I think, the narrative that was comfortable for everyone involved was that "this is the best version we can have realistically, and yes, it provides net positive value in its current state".
Nested json dictionary lookups with default fallback, for scenarios where any of the sub-dictionaries could be null or missing a key. Write parseDateTime(config["a"]["b"]["c"]) safely as a oneliner. Haven't seen any language solve this in a neat way, without catching indexerror exceptions or ugly chaining of null-conditionals and empty dictionaries as fallback-values.
It's not language-level, but I keep a Python class handy that wraps objects like that and has that behavior. You can also access items with config.a.b.c if desired (and if they're valid identifiers).
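A minimal sketch of such a wrapper (the class name and exact behavior here are assumptions, not the commenter's actual code; note that a key named "get" would shadow the accessor in the attribute style):

```python
class Safe:
    """Wraps nested dicts so lookups never raise on missing keys."""

    def __init__(self, obj):
        self._obj = obj

    def __getitem__(self, key):
        if isinstance(self._obj, dict) and key in self._obj:
            return Safe(self._obj[key])
        return Safe(None)          # missing key: keep chaining safely

    __getattr__ = __getitem__      # allow config.a.b.c for identifier keys

    def get(self, default=None):
        """Unwrap the value, or return *default* if the chain fell off."""
        return self._obj if self._obj is not None else default

config = Safe({"a": {"b": {"c": "2020-02-20"}}})
config["a"]["b"]["c"].get()                   # -> '2020-02-20'
config["a"]["missing"]["c"].get("fallback")   # -> 'fallback'
```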
Ruby’s symbols are undersold in terms of how useful they are and how they enable meta programming and DSL development. You can basically invent new language keywords with them.
private def foo
  "bar"
end
In this case, `def foo ... end` returns `:foo`, and `private` is just another method that takes it as an argument and decorates the provided method. It's not a special language keyword.
What I always wanted is proper spaces! Designing syntax where identifiers could have spaces (without backticks or something like that) might be tricky, of course. But maybe it's not impossible.
All those space imitations, whether they're dashes, underscores or camels - they're just imitations. Nothing compares to real spaces.
If anything, underscores are closest ones, if you ask me.
It's pretty easy, actually, and was common in early PL designs such as ALGOL. All you need to do is make a special syntax for keywords - https://en.wikipedia.org/wiki/Stropping_(syntax) - and then whitespace becomes completely redundant for tokenization purposes, and can be ignored altogether, or treated as part of the identifier in which it occurs.
> What I always wanted is proper spaces! Designing syntax where identifier could have spaces (without backticks or something like that) might be tricky of course. But may be it's not impossible.
What would this look like, though?[1] How would you solve the problem of adjacent identifiers vs a single identifier which has spaces?
Maybe having identifiers, and only identifiers, start with an uppercase letter, with no uppercase in the rest of the identifier? Then a line like `if MySubRoutine()` is easily parsed as `if My Sub Routine`, and `MyVarType MyVarName;` becomes an easily parsed `My Var Type My Var Name;`.
Looks harder to read than the usual camelCase, kebab-case and PascalCase identifiers though.
[1] I ask because I'd like to do something like this when writing a program that comes with its own language for the end user to use to enhance the program.
Well, I would imagine it requires designing a language which ordinarily does not allow adjacent identifiers.
Of course, keywords must not be allowed as part of identifiers (or there should be no keywords at all, as in Lisp).
Just an example off the top of my head that I didn't think much about:
function print person (p: person) {
var full name = p.first name + p.last name;
if p.middle name != "" {
full name += p.middle name
}
print line(full name)
for c : p.subordinate person list {
print person(c)
}
}
I think this syntax should be parseable with few restrictions (like: your identifier can't start with a keyword).
I am not that familiar with Kotlin, but these seem better than one-off syntax primitives from a language design perspective (I greatly recommend the "Growing a Language" talk by Guy Steele): these are ordinary functions that are well known from other parts of the language, not an added "hack" with a one-off use. If you were to use a concurrent hashmap implementation you could no longer use the syntactic sugar, and writing against an interface is quite common in Java (which plays quite a big role in the design of Kotlin), e.g. having a List in the interface instead of ArrayList.
Coming to Kotlin from Python, these always get on my nerves. My brain hasn't adjusted to needing to call a function. And 'X to Y' just doesn't click for me. I guess I'll get used to it eventually
Also don't forget about year 0 (which doesn't exist and is a single point of failure for many programs that calculate the time between now and a BC date).
Local const variables like JavaScript has them. You can declare variables with "let" (mutable) or "const" (immutable). This is really great when reading code, because you never have to check whether some code may change the variable at some point. And you usually declare most variables as const.
A lot of languages provide immutable variables only for class members or statics, but not for local variables.
Syntax highlighting can help with distinctions like that.
You might prefer Nim's visually distinct "let" for immutable, "var" for mutable. "const" is also available and means resolved to constant at compile time, similar to Zig's "comptime".
Actually not, as firstly you almost never use `var`s, and secondly if you do, most syntax highlighting rules will make them shine in bright color as something very exceptional.
This means you can’t reference the function itself by name without some kind of quoting construct or other circumlocution; also, while the no-arg case might make sense as a special case, if you allow the one arg case, you might as well admit the general case.
The general case adds ambiguity which makes it more of a design choice, rather than an obviously good generalisation.
In the general case of 2 or more comma-separated arguments, is "func a, b" a call with two arguments, or a tuple containing a call with one argument? Feel the same about "x = (func a, b)"? What about when this appears in list syntax like "[x, y, func a, b]", or the argument list of another function like "obj.method(x, y, func a, b)"?
They are easily solvable with a design decision about precedence, but arguably the syntax is a little confusing or worth a warning in all but simple cases.
The single-argument case works fine for Lua, but maybe the no-argument case wouldn't. Having to supply only the empty argument list seems counterintuitive, though.
Variable sigils would work but that would be even more annoying.
Kotlin has it, but only if the function accepts the function argument, e.g.
`foo({println("hello")})` is the same as `foo {println("hello")}` (larger example in my other comment above).
You can also do it for a no-args function, if it's a method (attached to a type), utilizing @get(), like this: `3.mph`, with the getter defined somewhere else.
If you have constants repeated throughout the codebase, you can pull them into a constants file, and then you can go to that file, navigate to the definition of interest, and use your IDE of choice to find usages.
You'll also be able to give these constants semantically significant names, and comment next to them providing derivations or citations. And of course, if it's a mistaken or outdated value, you can change it one place and apply it everywhere.
Consider that, if you were debugging a problem with this constant, and the problem was caused by someone having made a typo in one of its usages (e.g. having typed 1000500 instead of 10000500, a mistake that's more difficult to make if you have better ways to format numbers [did you have to look back and forth to find the mistake? I did]), your regex would fail to find it, even if there were no ambiguity about the format it was written in.
I'm in the same boat as GP. Typically, I grep such things when I am not familiar with the codebase, so I can't change where constants are defined, and I do not know where to find such a file.
For what it's worth when I dive into a new codebase, the first thing I do is try to guess what files exist and then find them and get a feel for structure. Constants are high in the list.
The problem is not finding where the constants are defined, it's about finding what the code does based on a hardware datasheet, the constants could be in a file, or could be scattered throughout, it doesn't matter at all.
You can also use the magic of base 16 to search for lexical subsets of a value to find where code uses things with the same mask, which are probably related. Extremely effective in reverse engineering a hardware device.
You haven't understood the problem at all. No wonder, few people do any sort of bare metal programming. It's not about defining constants, or even writing code at all, it's about figuring out what this arbitrary piece of code is doing based on hardware datasheets. You search for constants defined in the datasheet in the piece of code you are analyzing to determine what it does, or where does it do specific things...
In my experience, most (embedded) code is not possible to grep for such specific numbers anyway, because the assignments use bit-shift operators, set-macros, bitfields, binary literals, hex literals, non-hex numbers, splitting a 16-bit number into a 2-element 8-bit array, mixing up the endianness, etc. An IDE that can find all assignments where the right side has a numeric value of choice would cover more of these variants.
I'd gently suggest that if you wanted me to have that context when interpreting your statement, you could have provided it in your original statement or provided it now blamelessly, rather than framing this as a deficiency on my part that I failed to use my crystal ball to determine you were an embedded programmer. (I do very similar things at the application level, for what it's worth.)
I believe the solution I proposed remains viable in that context or for that usage. If I defined a constant for the magic memory address one writes to in order to configure the MMU, and you wanted to understand how I implemented context switching, you could navigate to my constant and find usages.
If that solution doesn't work for you, no worries, it was just a suggestion/observation.
You could also use CScope or grep for the constant's variable name. I don't consider an IDE to be particularly specialized, but you do you, I hate it when people tell me I'm using the wrong IDE, so I'm not going to tell you to use an IDE.
To be clear, I grep for things all the time, even though I use an IDE.
It's not about searching for specific values either. Sometimes you find some unknown value that's not reflected in the out of date or simply wrong datasheet. What does it do? Well, it's quite likely it does something related to other values that use the same mask, so you can try to figure out the mask by searching for lexical subsets of the value, which because of the magic of base 16 is an extremely effective strategy in finding related things. Much harder now with underscores.
And it's also about being able to search Google for a found constant, which indexes other people's code. There have been plenty of occasions where I found a newer version of proprietary driver code (code the hardware manufacturer claimed was lost or didn't exist and wouldn't provide to us) simply by searching for constants on Google...
But more often it's just finding drivers from other operating systems that already support the device, or mailing lists or forums where other people try to reverse engineer it.
Depends on the size of your codebase and how many people work on it. Once you have 20 years of code written full time by 200 developers in your monolith, finding the right constants file for a subsystem you've never seen before, out of 400 different subsystems' constants files, can become a legitimate and challenging pain.
Totally true. I don't know if IDEs commonly have a feature like, "find this value, regardless of how it's expressed" (even better yet, fuzzily, to catch typos), but I think that's the proper general solution. It's a good idea to consider a constants file earlyish, while everything still fits in your head.
I'd say in the case of such a sprawling system, make a constants module, and it can have different files for different topics. But keep them all together. Code style is an engineering tool you can use to prevent problems.
But I do understand this is cold comfort for those working on systems where the decisions around this were made 15 years ago, and there's no possibility of refactoring the constants. That's quite annoying.
Then you have a problem with having to update both the constants module and the service that depends on it to change the service, and you need to handle packaging and distributing the module. Maybe if you have a monorepo where those issues are moot...
I didn't mean to suggest the constants module was a separate, reusable module, but a component of the same piece of software. By "module" here I meant "directory which can contain multiple importable files," so you could namespace your constants (constants/rfc_abcd.xyz, constants/customer_limits.xyz, etc).
I'd rather copy-paste any reused constants to different projects to avoid coupling, unless there was some kind of compelling domain/project specific reason.
When your codebase reaches a certain size, you cease using the IDE to find code and instead start using specialized tools, such as Lucene indexes, so that grepping through code takes seconds rather than minutes. One downside of this is that a regex search over an index is O(n), compared to the O(log n) of a normal index lookup.
This so hard. A constant that replaces an ungreppable magic number like 5, or a constant for a precise number like the physical constants - sure. A constant that replaces a number like 404 or 500 that you want to be able to easily grep for across heterogeneous code bases? Pass.
I'll take the momentary ambiguity in some cases (usually quickly resolved by line context and file name) over having to manually hunt down 5 different projects' inconsistently named constants to do 5 different greps any day of the week.
Out of interest, how often do you do this, and what are the semantics of the numbers you're grepping for? I literally can't remember a time I've ever tried to grep for a number.
When I was doing bare metal programming I was doing this all the time; depending on what I was doing, maybe even tens of times an hour. And it's not just source code either: these days even debugging tools print values in this way, making it a total PITA to reverse engineer things because you can't easily match values coming from different tools, or from a tool and the source code, etc.
The solution is clearly to have language tooling with a find-constant tool which you give an expression and it parses all declarations in the source code looking for one which evaluates at compile time to the provided expression.
This is why I like the underscores, it's easier for me to see the typo in:
quota = 1000_000_000
than it is in:
quota = 1000000000
And also when I'm typing the number, it's easier for me to be sure I got it right when I can count the zeros in groups of three. It's rare that I've needed this, but I've used it in Java a few times in my career.
That's not enough: the underscores can appear anywhere, so you need to cater for 500000_0 and 5_0_0_0_0 etc., basically an optional underscore between each digit.
That's kind of a handful to type every time :)
I'd just do that for the first few digits and then scan for the number I'm interested in. If you have more than a page full of constants..
> Second, what if parameter blocks were abstractable?
Sounds like a great way to make unreadable code
#define STANDARD_EXP_PARAMS ...
// I hit gotodef on func(...) and got here. What are the arguments?
void func (STANDARD_EXP_PARAMS) {
// ... long function body ...
x = y; // What is the type of x? Is it an argument or a local?
}
I'd really be irritated if someone ever used this feature; optimizing for lines written is shaky territory to begin with, and when it's at an interface boundary it's not excusable. Inb4 "use an IDE" - requiring an IDE to make code legible is dumb, and I'm an IDE shill!
You could say the same about a function call or a function that takes an object as an argument - who knows what code lies behind the impenetrable barrier of structured or object oriented programming?
That's actually a feature of encapsulation. The object (ideally) is used to abstract away those details so I don't need to know about them, only how to create the object. Encapsulation is a different feature than reuse.
The exception is in languages that use that syntax as a way to implement keyword arguments, but I would still ask people to destructure it in the parameter list or at the top of the function so I can see what the arguments are.
The point is that when I am looking at a function definition I need to know how to call it. As described all I see is obfuscation to save some keystrokes, not clean code.
But how would you know how to call it if it takes an object as an argument? How different would the process be for looking at a named standard set of arguments?
Frink? Frink?
This Visual Basic erasure cannot stand. The #-delimited date literal syntax has been found all over the VB family: Visual Basic, VBA, VBScript, and it persists to this day in Visual Basic .NET.
Of course, being a VB syntax, it's completely cursed. You can put
# 01/05/2023 #
in a VB file, and what date it represents will depend on what locale it's.. compiled? executed? evaluated? in? Maybe? Depending on which VB dialect you're in? Good luck. Some of those languages would also accept
# 01/05/23 #
And they might even agree about what century it's in.