
A Reply to “Let’s stop copying C” - ScottFree
https://www.gingerbill.org/article/2020/01/25/a-reply-to-lets-stop-copying-c/
======
AgentME
> NULL/nil is just one of many invalid memory addresses, and in practice most
> invalid memory addresses are not NULL.

I want a language that can check at compile/check time that none of my
pointers will have invalid addresses. Recognizing null as just another invalid
value makes it more obvious to me that I want a language to handle it
differently than how C does it.

In memory-safe languages, it's already unthinkable (or at least very rare
outside of situations where you're deliberately doing unsafe/native code
integrations) to get a non-null invalid pointer that points to a different
type of thing than you want. When was the last time you had a Java program
crash because a typed reference actually held a reference of a different type
at runtime? Isn't it great that that basically never happens? But if you
consider null a different type of reference, then it does still happen
sometimes. It would be great if we could close off that issue too.

I'm a huge fan of languages with non-nullable references as the default, like
Kotlin, TypeScript, and Rust. In my experience, it's much easier as a
programmer to understand how a codebase uses nulls when the codebase is in a
language with non-nullable references by default, and null-related issues,
where a future programmer passes a nullable value somewhere it shouldn't go,
happen much less often.

~~~
enriquto
> I want a language that can check at compile/check time that none of my
> pointers will have invalid addresses.

This is incompatible with pointer arithmetic, as it would require solving the
halting problem. You could still verify it at run time.

~~~
AgentME
Right, but another option is to eschew pointer arithmetic. Iterators in many
languages cover most uses of pointer arithmetic, and can be designed to
compile to the same sort of code that pointer-arithmetic-heavy C compiles to.

C wouldn't ever be able to take out pointer arithmetic, so I worry anyone
envisioning a new language as some diff from C is probably going to get stuck
on that sort of thing too. I'm a big fan of the original referenced article by
Eevee for bringing this sort of thing up.

~~~
jstimpfle
Pointers and pointer arithmetic are a physical reality. If you forbid them you
are necessarily limiting yourself. It's a tradeoff.

Iterators may work only very locally, or require garbage collection, for
example.

Static option types add complexity to the type system and require many
typecasts in practice (which can be made safe with runtime checks and panics,
but they are a hassle).

~~~
int_19h
The problem with pointer arithmetic in C is that it's the _default_. Every
pointer implicitly supports it, even though the vast majority of them point to
a single object of a given type, and so it doesn't really make sense for them.
So it's much easier to get an invalid pointer than it ought to be.

The obvious fix, as seen in e.g. Zig, is to have different types for pointers
to objects and pointers to arrays. But once you have that, you can relegate
the latter to the "unsafe" opt-in subset of the language.

~~~
jstimpfle
I've worked on a hobby programming language where I've made that design
decision, too. But it makes pointers less general. Single "Objects" are arrays
of length 1, but not in the type system. And in practice it often happens that
I want to treat an object as an array of length 1.

On the other hand I can't remember that I've ever given a pointer to an object
that was treated as an array. I figure it happens about as often, and is about
as easy or hard to debug, as swapped function call arguments. Which is pretty
rare in my experience. And my philosophy is that making separate types for
things that are structurally the same is usually a bad idea. Because it splits
your world in two.

------
jlebar
TIL the C variable declaration syntax is meant to evoke how you would use the
variable. For instance,

      int *x[3]

means that to get an int, I type

      *x[3]

(well ok, _3_ would be a poor choice). Whereas if I have

      int (*x)[3]

I'd dereference it as

      (*x)[0]

meaning, it's a pointer to array of ints, while the first is an array of int
pointers.

This is mildly life-changing.

~~~
ensiferum
You got an off by one bug in your first example ;-)

~~~
wruza
If we used languages with 1-based arrays, then the first element would be 1,
the last would be n, and find(x) could return 0 as an “item not found” marker
instead of returning -1 or, worse, UINT_MAX. Reasoning about counting from the
end would be less off-by-one prone: n+1-i instead of n-i-1, n-1-i or n-(i+1),
whichever nonsense you like better. Looping: i=1, i<=n. Appending: a[n+1]=x,
where the capacity allows.

But when you mention one of these languages, you get a bunch of “oh, 1-based
arrays, so uncool, not an option”.

~~~
cyphar
The downside of 1-based indexing is that pointer arithmetic stops being
consistent with array indexing. Both approaches have downsides.

~~~
wruza
And one may see that as a nice way of relegating pointer arithmetic entirely
to the machine level, where it belongs.

~~~
jstimpfle
Fact is, array indexing _is_ pointer arithmetic.

And how do you for example index an array of 256 elements with one byte, with
1-based indexing?

~~~
wruza
257, 300, 100k elements? You do not index it with one byte, that simple. If
that is a hard requirement (8-bit chip, low ram, etc), leave it to C _or_ a
special syntax. __ptr_offset(p, n), p+n, p: array[0..n] of x, option base 0,
you name it.

I mean, we can ask ourselves tens of such in-the-box questions for any feature
that doesn’t fit it, but out-of-box world doesn’t really crave all of that by
default.

~~~
jstimpfle
No, what I'm trying to show is that offset-indexing is the right way to do it:
because it makes sense, mathematically. If you need more and better arguments,
have a look at Dijkstra's "Why numbering should start at zero" (it also argues
why we should have left-inclusive and right-exclusive bounds).

~~~
wruza
>...unnatural by the time the sequence has shrunk to the empty one. That is
ugly, ...

I reread his article for clarity. He first makes the “ugly” argument, which
originates from 1) natural number domain definition issues, and 2) only then
does he make the range definition argument, which is based on 1). It may be
valid when you’re working in your mind on math problems over natural numbers
starting* with 0, which is a self-referential argument btw. But it doesn’t
have to be applied to software arrays of items. Dijkstra’s argument is
repeated as a mantra, but the general consensus is that the representation of
natural numbers (unsigned ints) brings more issues than it ought to solve, so
we don’t use them today, which defeats 1), and then 2) loses its causality.

>>find(x) could return 0 as an “item not found” marker instead of returning -1
or worse uint_max

Let’s test that against Dijkstra’s statement on range unnaturalness. Which is
more unnatural: 0, -1, or NSNotFound?

Programmatically it doesn’t make any sense except in salary gains from
mastering off-by-ones.

* Edit: natural numbers start differently in different countries, but you may s/0/1/ there and it still holds, because he speaks about domain borderlines. The same holds true for e.g. [321..500].

~~~
jstimpfle
It's not about number definition issues. The argument is convincing that the
difference of upper and lower range bounds should equal the number of
contained elements, and that it's ugly to represent empty ranges with an upper
bound that is lower than the lower bound. (I agree and I held that opinion
before even reading this article).

Note that I rewrote his argument a little here, because it is not really about
natural vs unnatural, but more about looking at the difference between upper
and lower bound. Now, if I may presume that we can agree that the difference
of the bounds should equal the number of elements, leading to left-inclusive
and right-exclusive bounds, the question is would you rather have a lower
bound of 0 (which is an entirely "natural" number that is already in the game
for subsequences of length 0), or would you introduce an entirely new number
for the upper bound, (size + 1), which also needs an addition operation to
compute?

I don't care about definitions of natural numbers. As demonstrated, numbering
elements as offsets makes a lot of sense for purposes of indexing, and if you
insist on starting with 1, then you need to either lower your base pointer by
1 or subtract 1 at every indexing operation (neither is exactly simple), and
you frequently need to add 1 to the size.

I have no issues with zero-based indexing, and I don't think I've ever had to
write quirky code. The most "ugly" thing is that the last element must be
indexed as (size - 1) (which also makes some sense, since the last element
need not necessarily exist; the subtraction makes clear that this is
dangerous).

> Which is more unnatural, 0, -1, NSNotFound?

I normally handle that as -1 which is absolutely ok, especially given that
this value _is_ special. I concede that you might prefer 0 since that aligns
well with evaluating truthiness of integers. It also makes a lot of sense
(mathematically/programmatically) to use "size" (i.e. one-past-last index) as
not-found, but that breaks when arrays are resized.

------
dang
The referenced article was discussed last year:
[https://news.ycombinator.com/item?id=18977460](https://news.ycombinator.com/item?id=18977460)

and a bit at the time:
[https://news.ycombinator.com/item?id=13079341](https://news.ycombinator.com/item?id=13079341)

------
eximius
Hadn't read the original essay, but Eevee is always a great blog.

Honestly, Rust hits a lot of good points for me. My only concern so far has
been `.await` and a minor concern that they'll keep adding junk and end up
like Perl with too many features.

~~~
zozbot234
Rust has a standardized 'edition' system, so they can deprecate
superseded/junk features from newer editions of the language while still
playing nice with legacy code targeting an older 'edition'.

~~~
eximius
That's true and I hadn't really thought of it being used for removing
features!

But, also, the features can only be removed from the language. The compiler
must support them forever. :/

------
zzo38computer
I do not agree with all of them. I think assignment expressions is good, and
textual inclusion is good (but should not be the only kind of inclusion), and
increment/decrement operators is good, and macros is good (although the way C
does it is not good enough; there are many things it doesn't do), and pointer
arithmetic is good, and goto is good. But I agree with them that the C syntax
for octal numbers is bad, and the C syntax for types is bad. Identifiers
should not have hyphens if the same sign is also used as an operator. Strings
should be byte strings which have no "default" encoding other than ASCII
(although UTF-8 and other encodings are still usable too, if they are
compatible with ASCII). Another problem with C is that it does not allow null
declarations, and does not allow duplicate declarations even if they are the
same. There are many other problems with C as well; some kind of low-level
stuff is too difficult with C.

~~~
jstimpfle
> Another problem with C is that it does not allow null declarations, and does
> not allow duplicate declarations even if they are the same.

What do you mean here?

~~~
zzo38computer
Well, it works now (so that problem no longer exists); but on the computer I
used before this one, it didn't work.

------
axaxs
I'm having trouble following the negative modulo wording or formula. It says %
and %% are identical for unsigned integers, then below defines it as

a %% b == ((a % b) + a) % b

If that's the case, I can't figure out how they are identical for unsigned
integers. Or is that example only a hint for how it would work with negatives?

~~~
klyrs
I don't precisely follow the text either; it looks like that formula computes
±(2a)%b.

Perhaps it's a typo for

a %% b == ((a % b) + b) % b

Which brings a negative modulus into the range [0, b)

~~~
mjevans
I think you are probably correct. Let's try a set of simple cases.

        ( 4  % 3) == 1
        ( 4 %% 3) == 1

Are these two the correct answers? (C99 pins % to truncated division, so the
remainder takes the sign of the dividend.)

        (-4  % 3) == -1 ??
        (-4 %% 3) == 2 ??

Let's assume I got that correct and plug in some numbers.

         a %% b == ((a % b) + b) % b
        -4 %% 3 == ((-4 % 3) + 3) % 3
        -4 %% 3 == (-1 + 3) % 3
        -4 %% 3 == 2

However I think it would be clearer for maintenance if the extra operator
wasn't used and instead some 'math.absolute()' function were used.

PS: Hopefully those are short enough to not die on mobile

~~~
Vassvik
It's definitely a typo.

Conceptually I prefer looking at them as

        a % b  = a - trunc(a / b) * b
        a %% b = a - floor(a / b) * b

The latter is even used as % in languages such as Python, which further
compounds any syntactical confusion. :D

------
mckinney
I hadn't given Odin a look before I read this post. As a fellow
general-purpose language author (Gosu) I applaud the author's pragmatic,
albeit at times unpopular, point of view. For instance, his position regarding
null is spot on re the "streetlight effect" reference. Others include:

* pascal-style declarations

* type system

* multiple returns

* strings

* switch

Also I would not downplay the advantages of the operator tokens && and ||.
More than just familiarity for C-family developers, in my experience they are
more generally effective as expression delimiters. They stand out better than
'and' and 'or', which is better for readability.

------
kazinator
The direction of modulo with negative operands (and of division) was
implementation-defined in C90.

At least it is pinned down now.

Common Lisp has a number of different modulo operators that go every which
way.

Firstly there are the single-valued _mod_ and _rem_:

[http://clhs.lisp.se/Body/f_mod_r.htm](http://clhs.lisp.se/Body/f_mod_r.htm)

Secondly, the floor, ceiling, truncate and round functions also return a
remainder:

[http://clhs.lisp.se/Body/f_floorc.htm](http://clhs.lisp.se/Body/f_floorc.htm)

------
kazinator
> foo * bar; // Is this an expression or a declaration? You need to check the
> symbol table to find out

Checking the symbol table is a simple one-liner.

Pushing all semantic information into the syntax so that the meaning of
everything is known without any lookup is not realistic and will result in a
bad language.

Lisp:

      (a b) ;; what is this?

You need the full surrounding context to know whether it's b applied to a,
value of b being bound to variable a, or a list of base classes a and b in a
defclass or what, ... and it's good that way.

------
einpoklum
Most of what the author suggests is available in (modern) C++ - while
maintaining backwards compatibility with C code:

* _Textual inclusion_ : C++20 has modules, where you include semantically, not textually.

* Bitwise operator precedence: Not soundly supported, but you can sort-of have custom infix operators in C++, see: [https://stackoverflow.com/questions/15632217/defining-new-in...](https://stackoverflow.com/questions/15632217/defining-new-infix-operators) so, specifically, one could implement the alternative modulo.

* Leading zero for octal: C++ has custom literals, so you could define an _octal or _hex etc. and get the value determined at compile time.

* No power operator: Same thing in Odin and C and C++, IIANM.

* Switch with default fallthrough: Since C++17, there's an official [[fallthrough]] annotation. You can make your compiler fail in cases where you implicitly fall through. So, not as elegant, but you can ensure you don't mess up and get the wrong behavior.

* (Type syntax: Nope, C++ has the contrived syntax of C.)

* Weak typing: With a library, you can have strong aliases which are not interchangeable with the aliased type. See this blog post: [https://foonathan.net/2016/10/strong-typedefs/](https://foonathan.net/2016/10/strong-typedefs/) and the library it links to. Things would be better, though, if the C++ standard committee allowed for an operator.

* Bytestrings: This is a library issue really. And both C and C++ have libraries which deal with wider characters, with UTF-8 and what-not.

* ++ and -- : You don't _have_ to use them... and it's possible to have static tools which forbid them in source files.

* ! Operator: C++ has !, && and || , but also "not", "and" and "or". I like the latter.

* Multiple returns: It's easy to return a tuple in C++, and with C++17 you can even construct and initialize multiple variables like that, e.g. `auto [index, name] = get_index_and_name(whatever)`.

* Errors: C++ is multi-paradigmatic here, supporting the traditional status return, an expected<T> type (either an actual T or an error; not yet standardized but available via widely-used libraries), and exceptions. Monadic-style programming is not yet supported and will likely not be in C++20, but there are discussions about this.

* Nulls: With `std::optional` and `gsl::not_null`, and especially with no raw-pointer return types necessary, you can comfortably avoid using `nullptr` in C++ code.

So - with some choices and a little discipline (really not much!), you can
have all of this "Odinish" behavior you like.

Of course - the price of C backwards compatibility and supporting multiple
programming paradigms is complexity of the formal language definition,
grammatic ambiguity, and a syntax that is not always pleasant.

~~~
the_why_of_y
> * Textual inclusion: C++20 has modules, where you include sematically, not
> textually.

... provided that the libraries that you want to use are available as C++20
modules, not header files.

> * Leading zero for octal: C++ has custom literals,

... but no way to redefine the built-in "0" prefixed octal literal.

> * Weak typing: With a library, you can have strong aliases which are not
> interchangeable with the aliased type.

... you can have that in theory, I agree it's very useful, but can you name
one widely used C++ library that e.g. defines custom integer or string types
purely for the purpose of type safety?

> * Bytestrings: This is a library issue really.

... indeed, and the standard library's std::string is quite unhelpful, there's
no way provided to iterate code points if you put UTF-8 in std::string.

> * Errors: C++ is multi-paradigmatic here,

... but none of the paradigms available with just the language and standard
library have any static checks that the caller of a function actually handles
an error, whether return value or exception.

> * Nulls: With `std::optional` and `gsl::not_null`, and especially with no
> raw-pointer return types necessary, you can comfortably avoid using
> `nullptr` in C++ code.

... provided you never use any library from the C++ ecosystem, none of which
currently use std::optional or gsl::not_null.

One of the problems I have with C++ is that re-using existing libraries vs.
structuring your code to statically avoid sources of bugs is a trade-off; it
shouldn't be.

~~~
einpoklum
> ... provided that the libraries that you want to use are available as C++20
> modules, not header files.

Yes. This will take time. But if you're writing most of the world from scratch
(like with a new programming language), then you can write libraries in
modules.

> ... but no way to redefine the built-in "0" prefixed octal literal.

True, but why use that? It's confusing if you don't know the convention.

> ... can you name one widely used C++ library that e.g. defines custom
> integer or string types purely for the purpose of type safety?

Well, custom string types? QString is used in tons of apps; but custom string
types are rarely about safety. Integers - there's Boost's safe_numerics
library. Granted, I'm not sure it's very popular or just mildly popular, but
still.

> ... indeed, and the standard library's std::string is quite unhelpful,
> there's no way provided to iterate code points if you put UTF-8 in
> std::string.

Yes, Unicode support in the C++ standard library is lacking, or where it
isn't, it might as well be. But again - it's not the language. So you can
write your own string library (or port ICU) in Odin or in C++.

> ... but none of the paradigms available with just the language and standard
> library have any static checks that the caller of a function actually
> handles an error, whether return value or exception.

Static checks are a compiler/IDE thing. But - there _is_ a [[nodiscard]]
annotation in C++17, so that you can't just ignore a returned status
altogether.

> ... provided you never use any library from the C++ ecosystem, none of which
> currently use std::optional or gsl::non_null.

Touché - the backwards compatibility is a bitch here. But:

1\. At those libraries' boundary, you have either silent or explicit
conversions, which are safe as long as you act safely on the outside.

2\. Most templated standard library code will use optional just fine...

> One of the problems I have with C++ is that re-using existing libraries vs.
> structuring your code to statically avoid sources of bugs is a trade-off; it
> shouldn't be.

Agreed. Having said that - writing lightweight wrappers for an existing
library is a middle-of-the-road solution which is a lot less bug prone than
just writing unsafe, old-style C++ all over.

------
discreteevent
They are right about null. To say it is a billion-dollar mistake is completely
overblown. Maybe it was at the time, but now a null pointer exception is
extremely easy to find and fix. It doesn't compare to the other complex bugs
one has to deal with most of the time.

~~~
iainmerrick
If when you say “null pointer exception” you’re thinking of Java: do you find
any value in nullable / non-null annotations? Or if you’ve used Kotlin, its
non-nullable references?

I find them very useful. I don’t just want null pointer exceptions to be easy
to debug, I want to avoid them completely in the first place.

~~~
dehrmann
> do you find any value in nullable / non-null annotations

Hah! Annotations. They're tales told by idiots, full of sound and fury,
signifying nothing.

But seriously, they're about as meaningful as reading the javadoc. At least
Optional enforces the contract; it's just clunky.

~~~
iainmerrick
You can configure your IDE to warn of incorrect usage, and you can configure
your compiler to treat warnings as errors. That will catch a lot of potential
problems at compile time. It's not perfect, but it can be a lot better than
nothing.

Java annotations are better than Optional when you’re interoperating with
Kotlin code, such as when you’re partway through converting a large codebase
to Kotlin.

