
Fat pointers in C using libcello - dgellow
http://libcello.org/learn/a-fat-pointer-library
======
eqvinox
In my personal experience, "a bit more than a pointer" works best as a pair of
(start, end) pointers (where "end" points to just beyond the last element.)
The most obvious reasons for this are:

- slices become a total non-issue since a pair of (start, end) already is a
slice and you can just move start and end.

- comparing against an end pointer is generally easier than adding up a
length value first, particularly if you're slicing at the same time.

- the end pointer value is independent of the array element type, so if you
e.g. cast to uint8_t * (which arguably you shouldn't in most cases) it stays
exactly the same. If you store a count you need to adjust a multiplier. If you
store a byte length, you need to do a lot of divides or casts to deal with
pointer arithmetic.
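A minimal sketch of the (start, end) representation described above, for `int` elements only; the `int_slice` type and its helper functions are hypothetical names, not part of libcello:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (start, end) slice: end points one past the last element. */
typedef struct {
  int *start;
  int *end;
} int_slice;

/* Slicing just moves the two pointers; there is no length to recompute. */
static int_slice slice_sub(int_slice s, size_t from, size_t to) {
  assert(from <= to && to <= (size_t)(s.end - s.start));
  int_slice r = { s.start + from, s.start + to };
  return r;
}

/* Iteration compares against end instead of computing start + len. */
static long slice_sum(int_slice s) {
  long total = 0;
  for (const int *p = s.start; p != s.end; ++p)
    total += *p;
  return total;
}

/* Viewing the same range as bytes: the end address does not change,
   whereas a stored element count would need rescaling by sizeof(int). */
static size_t slice_bytes(int_slice s) {
  const uint8_t *b = (const uint8_t *)s.start;
  const uint8_t *e = (const uint8_t *)s.end;
  return (size_t)(e - b);
}
```

Note that `slice_bytes` never rescales anything: the same two addresses work whether they are viewed as `int *` or `uint8_t *`.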

Also, this is a huge red flag to me:

[https://github.com/orangeduck/Cello/blob/master/include/Cell...](https://github.com/orangeduck/Cello/blob/master/include/Cello.h#L135-L140)

    
    
      #define is ==
      #define isnt !=
      #define not !
      #define and &&
      #define or ||
      #define in ,
    
    

P.S.: This also is a "try to invent a new programming language without
inventing a new programming language" thing. Have your cake and eat it...
either it's C or it isn't, and this library is leaving the space of "normal"
C.

~~~
kortex
This is how Go does slices, though they are 3 fields:

    
    
        type slice struct {
            array unsafe.Pointer // points at the zeroth element
            len   int
            cap   int
        }
    

Though I suspected that under the hood len and cap would be ordered first, the
Go runtime's own definition (runtime/slice.go) puts the data pointer first.

If it's good enough for C greybeard Ken Thompson and Unix hacker Rob Pike, it's
good enough for me.

In fact I've looked for a port of Go-style slices to C and haven't found one.
Maybe people think sds is good enough?
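For illustration only, here is a rough C sketch of the Go-style `{ptr, len, cap}` header with an append that grows by doubling. The names `go_slice` and `go_append` are hypothetical, the doubling policy is an assumption (Go's real growth policy differs), and since C has no garbage collector the sketch assumes each slice owns its backing array:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Go-style slice of ints: data pointer, length, capacity
   (the same field order as Go's runtime slice header). */
typedef struct {
  int *ptr;
  size_t len;
  size_t cap;
} go_slice;

/* Like Go's s = append(s, v): returns the possibly reallocated slice by value.
   Assumes s owns its backing array (no sub-slices alias it), because realloc
   needs the original base pointer and invalidates other views; in Go the
   garbage collector keeps the old array alive instead. Aborts on failure. */
static go_slice go_append(go_slice s, int v) {
  assert(s.len <= s.cap);
  if (s.len == s.cap) {
    size_t new_cap = s.cap ? s.cap * 2 : 4; /* assumed doubling policy */
    if (new_cap > SIZE_MAX / sizeof(int))
      abort();
    int *p = realloc(s.ptr, new_cap * sizeof(int));
    if (p == NULL)
      abort();
    s.ptr = p;
    s.cap = new_cap;
  }
  s.ptr[s.len++] = v;
  return s;
}
```

Usage mirrors Go's `s = append(s, v)`, starting from a zeroed `go_slice`; the caller releases the array with `free(s.ptr)`.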

~~~
int_19h
Go has a rather idiosyncratic take on arrays and how they're used, which is
reflected in its slices. I can't think of any other language or framework that
did it this way.

~~~
saati
Rust Vec has the same representation.

------
joejev
Interesting idea, but this implementation has UB:

    
    
        typedef void* var;
    
        struct Header {
          var type;
        };
    
        // ...
    
        #define alloc_stack(T) header_init( \
          (char[sizeof(struct Header) + sizeof(struct T)]){0}, T)
    
        var header_init(var head, var type) {
          struct Header* self = head;
          self->type = type;
          return ((char*)self) + sizeof(struct Header);
        }
    

The line "struct Header* self = head" is UB. The alignment requirement of
the local char array is 1, but the alignment requirement of struct Header is
that of void*, which is probably 8.
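One possible fix, sketched under the assumption of C11 (`alignas`, `max_align_t`): make the compound literal a struct that contains the header instead of a bare `char` array, and pad the header to `max_align_t` so the payload after it is aligned too. The two-argument `alloc_stack` and the `type_of` helper are hypothetical, not Cello's actual API:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

typedef void *var;

/* Hypothetical variant of the header: aligning it to max_align_t makes
   sizeof(struct Header) a multiple of alignof(max_align_t), so a payload
   placed right after it is aligned for any fundamentally aligned type. */
struct Header {
  alignas(max_align_t) var type;
};

static_assert(sizeof(struct Header) % alignof(max_align_t) == 0,
              "payload after the header must stay max-aligned");

static var header_init(struct Header *self, var type) {
  self->type = type;
  return (char *)self + sizeof(struct Header);
}

/* Recovers the type tag stored in front of a payload pointer. */
static var type_of(var self) {
  struct Header *h = (struct Header *)((char *)self - sizeof(struct Header));
  return h->type;
}

/* The compound literal is a struct holding the header, so the compiler
   aligns it properly; it lives until the end of the enclosing block.
   T is a full type name here (e.g. struct Point), unlike the original. */
#define alloc_stack(T, type) \
  header_init(&(struct { struct Header head; T data; }){0}.head, (type))
```

Because the header's size is a multiple of `alignof(max_align_t)`, the `data` member starts exactly `sizeof(struct Header)` bytes in, which is the address `header_init` returns.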

~~~
asveikau
Not only that but you have a pointer to a parameter returned back and used
outside its scope ...

~~~
nitrogen
It's not a pointer to a parameter, it is just the parameter itself.

var is a typedef for void* and no & appears in the function.

~~~
asveikau
> no & appears in the function.

It's an array, so you don't need & to take its address, it decays into a
pointer without &.

Imagine:

    
    
        char buf[sizeof(struct Header) + sizeof(struct T)];
        char *p = buf;
    

Then take away the names so that you are effectively passing p as a
parameter... Then returning p.

As in, an anonymous temporary being given to the function, and the function
returns its address back.

It's assuming that this temporary parameter buffer will exist after it is used
and the function has returned. I'm not sure what the standard says for that
but it is crazy sketchy. [Edit: Googling around, it seems like maybe this is
illegal in C99 but possibly legal in C11? Or that C11 changed the rules for
this. Does not seem like a great thing to rely upon.]

~~~
eMSF
There is no issue here (except the one highlighted by joejev).

Many standard library functions return a pointer which they got as a parameter
(or another pointer offset from it, as is the case here). The compound literal
is no more "temporary" than a variable that was introduced right before the
function call. Lifetimes of compound literals are specified in section 6.5.2.5
of both C99 and C11.

    
    
      func((int[256]){ 0 });
      // is mostly equivalent to 
      int __a__[256] = { 0 };
      func(__a__);

~~~
asveikau
> Many standard library functions return a pointer which they got as a
> parameter

Obviously. But this does not extend the lifetime of the buffer they are
passed. Namely you can't use this as a technique to extend the life of
automatic storage falling out of scope.

> Lifetimes of compound literals are specified in section 6.5.2.5 of both C99
> and C11.

This is what I was missing. So it is valid by the standard. Which is good if
you have looked it up. It remains _not obvious_ when reading source without a
copy of the standard on hand, or prior knowledge of that section. Passing an
expression of that sort and keeping a pointer to it visually looks like the
intent is to retain a pointer of more limited scope. Intuitively it would make
just as much sense if the lifetime were shorter. If you are seeking clarity of
intent this is not a great thing to rely upon.
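A small sketch of the rule eMSF cites: a block-scope compound literal lives until the end of its enclosing block, so a callee may hand back a pointer into it as long as the caller only uses that pointer inside the same block. `skip` and `sum_tail` are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the pointer it was given, offset by n elements, much like the
   standard functions that return their destination argument. */
static int *skip(int *a, size_t n) {
  assert(a != NULL);
  return a + n;
}

static int sum_tail(void) {
  /* The compound literal has automatic storage duration tied to the
     enclosing block (6.5.2.5), just like a named local array, so q is
     still valid after skip() returns, until this block ends. */
  int *q = skip((int[4]){1, 2, 3, 4}, 2);
  return q[0] + q[1]; /* 3 + 4 */
}
```

Returning `q` itself from `sum_tail` would be a bug, exactly as it would be for a named local array.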

------
alkonaut
Perhaps a stupid question: Why isn't a vector type similar to { ptr, count } a
normal thing to pass around in C? It's what you reach for in any other
language, why did it become idiomatic to pass pointers and lengths separately
in C?

A C standard library has a header file for _complex math_ but it doesn't
define a simple fixed size array struct? Why is that? Is it because they
become pointless when there are no generics to deal with the stride?
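A common workaround for the lack of generics is a macro that stamps out one `{ptr, len}` struct per element type, so the element type fixes the stride and indexing stays ordinary pointer arithmetic. This is a sketch under that assumption; `DEFINE_SPAN`, `int_span` and `double_span` are hypothetical names, not part of any standard header:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one span struct per element type. */
#define DEFINE_SPAN(T, Name) \
  typedef struct {           \
    T *ptr;                  \
    size_t len;              \
  } Name;

DEFINE_SPAN(int, int_span)
DEFINE_SPAN(double, double_span)

/* Spans are two words, small enough to pass and return by value. */
static int_span int_span_drop(int_span s, size_t n) {
  assert(n <= s.len);
  return (int_span){ s.ptr + n, s.len - n };
}

static double double_span_max(double_span s) {
  assert(s.len > 0);
  double m = s.ptr[0];
  for (size_t i = 1; i < s.len; i++)
    if (s.ptr[i] > m)
      m = s.ptr[i];
  return m;
}
```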

~~~
matheusmoreira
> why did it become idiomatic to pass pointers and lengths separately in C?

I've read that it's because there used to be binary interface issues with
structures. They can be returned from functions and passed as parameters but
it isn't immediately clear how that happens: is it on the stack, in one
register or in several registers? Even today there are compiler options that
affect the generated code in those cases:

    
    
      -fpcc-struct-return
    
      Return “short” struct and union values in memory like
      longer ones, rather than in registers.
    
      -freg-struct-return
    
      Return struct and union values in registers when possible.
    

[https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#ind...](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-freg-struct-return)

[https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#ind...](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fpcc-struct-return)

[https://gcc.gnu.org/onlinedocs/gcc/Incompatibilities.html#in...](https://gcc.gnu.org/onlinedocs/gcc/Incompatibilities.html#index-fpcc-struct-return-1)

~~~
andrepd
> They can be returned from functions and passed as parameters but it isn't
> immediately clear how that happens: is it on the stack, in one register or in
> several registers?

Why does it have to be clear? It can be unspecified and the compiler will do
what it thinks is best given the struct, e.g. return `struct {int x, y;}` in
registers, and return `struct {int x[80];}` as a pointer to memory or write it
in place to the caller's stack via RVO.

~~~
matheusmoreira
> Why does it have to be clear?

Because it's part of the binary interface. Changing the binary interface
prevents existing programs from interoperating. Everything breaks until the
software is recompiled and that can be a major nuisance at best and impossible
in the worst case.

Simple, well-defined and stable binary interfaces are a major reason why C is
still widely used. Uncertainty in this area is never a good thing so people
will actively avoid language features that introduce it. Looks like enums
aren't favored for similar reasons: what is the underlying type?

------
pjc50
On the "weird pointer solutions" tangent, there's the ARM authenticated
pointers: [https://lwn.net/Articles/718888/](https://lwn.net/Articles/718888/)

Given that years ago we added a mandatory piece of hardware to most systems to
implement virtual memory, I'm now starting to wonder what security and/or
performance benefits could be achieved by delegating memory allocation to (or
through) hardware.

~~~
arethuza
That's a fascinating approach - how widely used is it? Would love to know
whether it causes any problems for 'existing' code.

~~~
saagarjha
Every new (post-2018) iPhone ships with this. iOS developers can build code
for the architecture but I believe Apple currently strips it out before
distribution, so its use is limited to the OS for now. I would assume at some
point they’ll flip the switch to allow it; until then developers can use the
toolchain to test if their code still works (generally it does, but messing
with function pointers in ways unspecified by the standard can occasionally
cause problems). ‘pjmlp is fairly interested in this topic so they might be
able to share some more examples of it being used if they drop by the thread.

------
dgellow
The concept of "fat pointer" the article is about has been described by Walter
Bright (D creator) as "C's Biggest Mistake":
[https://www.drdobbs.com/architecture-and-design/cs-biggest-m...](https://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625). It's also an interesting read.

The summary version (from Walter Bright's article) is:

> C can still be fixed. All it needs is a little new syntax:

> void foo(char a[..])

> meaning an array is passed as a so-called "fat pointer", i.e. a pair
> consisting of a pointer to the start of the array, and a size_t of the array
> dimension.
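Without the new syntax, roughly the same thing can be approximated in today's C by passing a small struct by value. This is a hypothetical sketch (`char_array`, `AS_CHAR_ARRAY` and the helpers are made-up names), and the macro only works on true arrays, since `sizeof` on a pointer yields the pointer's size rather than the array dimension:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for `char a[..]`: pointer to the first element
   plus a size_t dimension, passed by value. */
typedef struct {
  char *ptr;
  size_t dim;
} char_array;

/* Only for real arrays: sizeof gives the dimension at the call site. */
#define AS_CHAR_ARRAY(arr) \
  ((char_array){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* Bounds-checked element access; returns NULL instead of overrunning. */
static char *char_array_at(char_array a, size_t i) {
  return i < a.dim ? &a.ptr[i] : NULL;
}

/* The callee can respect bounds without a separate length parameter. */
static size_t count_char(char_array a, char c) {
  assert(a.ptr != NULL || a.dim == 0);
  size_t n = 0;
  for (size_t i = 0; i < a.dim; i++)
    if (a.ptr[i] == c)
      n++;
  return n;
}
```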

~~~
tom_mellior
It's worth noting that fat pointers didn't originate with Walter Bright or
that 2009 article. The oldest C-with-fat-pointers I can think of off the top
of my head is CCured from 2002:
[https://people.eecs.berkeley.edu/~necula/Papers/ccured_popl0...](https://people.eecs.berkeley.edu/~necula/Papers/ccured_popl02.pdf)

The paper mentions fat pointers in passing, not putting the term in quotes,
not defining it, and not giving a citation -- which makes it clear that the
term was already well established at the time.

~~~
caspper69
Fat pointers were part of Pascal (and derivatives), although I'm sure the
concept has existed in one form or another going back to the beginning.

edit: Pascal pointers were just a location and size, not a slice-type fat
pointer. Still, I have always heard _any_ pointer containing more information
than a memory address referred to as a fat pointer (except tagged pointers).
YMMV.

~~~
int_19h
I don't think Pascal pointers had to be implemented as location+size, since
most operations that required checking the size were undefined or
implementation-defined anyway. Some implementations might have used them to
provide runtime checks, but it was certainly not the case in Turbo Pascal, for
example.

------
clarry
Is this blog post confused or am I confused? It keeps talking about fat
pointers but the description looks much more like "arrays with their length
stored before their first element," which is a massive difference.

~~~
mehrdadn
I think the latter. Fat pointers were supposed to be 2 pointers wide, not 1...

~~~
eqvinox
It's just using "fat pointer" to refer to the concept of passing around a
pointer with extra information concerning the data it points to. I agree that
generally people would expect "fat pointer" to imply a larger pointer itself,
but I don't think the label is misused egregiously enough to warrant picking
at this.

~~~
clarry
> It's just using "fat pointer" to refer to the concept of passing around a
> pointer with extra information concerning the data it points to.

Is it actually? Here's a quote: "The trick is to place the value representing
the number of items in the array in memory just before the pointer we actually
pass to functions. This pointer is fully compatible with normal pointers,"

So suppose I pass this "fully compatible with normal pointers" pointer to a
function.. and it ends up being stored in a register. Now where is this
location "just before the pointer"? In the previous register?

I don't think this post is describing fat pointers at all. I think it is
describing arrays with a prepended length header. There is no data in or
before the pointer, there is metadata before the pointee assuming you're
pointing at something with metadata prepended to it.. No extra information is
passed with a pointer, it's just assumed to be there at the pointee. Call it a
fat pointee if you want a fancy name; at least that acknowledges the onus is
on the pointee to have the right metadata in the right place. There's nothing
in the pointer here.

~~~
saagarjha
> Is it actually?

Yes, the extra data being the length that is located in memory before the
pointer.

> So suppose I pass this "fully compatible with normal pointers" pointer to a
> function.. and it ends up being stored in a register. Now where is this
> location "just before the pointer"? In the previous register?

You’re at the wrong level of indirection; the header is in the memory before
what the pointer points to, as in p - 1 rather than &p - 1.

> I think it is describing arrays with a prepended length header.

Correct.

> Call it a fat pointee if you want a fancy name; at least that acknowledges
> the onus is on the pointee to have the right metadata in the right place.

I would prefer that this was not called a “fat pointer”, but the claim made
above is that a pointer with any implicit data associated with it is “fat”.

> There's nothing in the pointer here.

No data, but there is an implicit guarantee that it points to the data portion
of a length-prepended array.
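To make the "before the pointee" distinction concrete, here is a minimal sketch of a length-prepended array in the style the article describes. The names are hypothetical and this is not libcello's implementation; the header is aligned to `max_align_t` (C11) so the elements after it stay aligned:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical length header stored in front of the elements. */
struct lp_header {
  alignas(max_align_t) size_t len;
};

/* Allocates header + zeroed elements and returns a pointer to the first
   element. That pointer is ordinary; the metadata sits before the pointee. */
static void *lp_alloc(size_t count, size_t elem_size) {
  if (elem_size != 0 &&
      count > (SIZE_MAX - sizeof(struct lp_header)) / elem_size)
    return NULL; /* total size would overflow */
  struct lp_header *h = calloc(1, sizeof *h + count * elem_size);
  if (h == NULL)
    return NULL;
  h->len = count;
  return h + 1;
}

/* One header back from the pointee (p - 1), never from the pointer (&p - 1). */
static size_t lp_len(const void *p) {
  assert(p != NULL);
  return ((const struct lp_header *)p - 1)->len;
}

static void lp_free(void *p) {
  if (p != NULL)
    free((struct lp_header *)p - 1);
}
```

The caller holds a plain `int *` (or whatever element type); only functions that know the convention look behind it.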

~~~
clarry
> Yes, the extra data being the length that is located in memory before the
> pointer.

> You’re at the wrong level of indirection; the header is in the memory before
> what the pointer points to, as in p - 1 rather than &p - 1.

You are contradicting yourself. &p - 1 is before the pointer. p - 1 is before
the pointee.

> I would prefer that this was not called a “fat pointer”, but the claim made
> above is that a pointer with any implicit data associated with it is “fat”.

> No data, but there is an implicit guarantee that it points to the data
> portion of a length-prepended array.

I think that's a borderline useless definition. In my C, there's an implicit
guarantee that any pointer, in a context where it may be dereferenced, points
to a valid object (as long as it's not NULL and there's no programming error
causing it to point at nothing well defined). Usually it points at the start
of a struct, sometimes it points at list node (or whatever) embedded in a
struct and I might have to work my way back with offsetof. Either virtually
all of my pointers are fat or none of them are. I go by that none of them are,
because whatever data I have (implicit or explicit) is not a property of the
pointers I use but of the data I point them to.

------
gumby
I understand their desire to use a library, but there's a faster and safer way
to do this that's more C-like if you have access to the compiler:

Just locate anything declared as an array in a particular linker section so
the pointer manipulation can be done with two comparisons (or one if it's at the
top of memory), possibly even against a constant.

If you do this you can even forbid pointer arithmetic except in actual
[]-declared memory, and can do transparent bounds checking (&array-1 can hold
the array length or, possibly faster, the address of the location after the
end of the array).

An advantage of this over the library route is that you can prevent
pointer/array punning but otherwise allow any C program to work fine. And apart
from a few corner cases (there are legitimate non-array uses of pointer
arithmetic, though very few), any noncompliant program can be changed to use []
and still work perfectly fine without this option being used.

------
wyldfire
"This proposal wasn't accepted into the C standard..."

Walter often shows up on HN, so I'll ask: was this proposal merely on the Dr
Dobbs article or did it actually go to a committee for review? If the latter,
why wasn't it accepted?

Should C reconsider this? Especially now that C++ has std::span<> and
std::string_view?

~~~
pjmlp
Currently Checked C seems to be the only attempt left, along with a mentality
shift to at the very least use the static analysis tools that come with the
compilers.

Contrary to common HN wisdom, most C and C++ related surveys show that only up
to 50% actually use some kind of analysis tooling.

------
mehrdadn
Also see: BSTR

[https://docs.microsoft.com/en-us/previous-versions/windows/d...](https://docs.microsoft.com/en-us/previous-versions/windows/desktop/automat/bstr)

~~~
tonyedgecombe
Because you can never have enough string types in your Windows projects.

~~~
72deluxe
Or enough wrappers to safely acquire and free those things in C++ without
having to write many extra lines or spend a long time looking for leaks

------
saagarjha
Previous discussion:
[https://news.ycombinator.com/item?id=10526159](https://news.ycombinator.com/item?id=10526159)

(Fixed to actually point at discussion of fat pointers; thanks, ‘dgellow.)

~~~
dgellow
The link I posted is the documentation about fat pointers, not the homepage of
libcello.

~~~
saagarjha
Fair enough; I’ll replace the links above with
[https://news.ycombinator.com/item?id=10526159](https://news.ycombinator.com/item?id=10526159),
which as far as I can tell points to the same page. Thanks!

------
seventh-chord
This seems to assume you want to actually allocate individual objects using
malloc/calloc, when in reality you usually want to pool your allocations
somehow, both for performance and for your own sanity.
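As a sketch of what pooling can look like, here is a minimal bump (arena) allocator. It is not part of libcello; the `arena` type and functions are hypothetical, and the sketch assumes requested alignments never exceed `alignof(max_align_t)`, since only then does aligning the offset also align the address relative to a `malloc`ed base:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator: one malloc up front, many cheap
   allocations carved out of it, and a single free at the end. */
typedef struct {
  unsigned char *base;
  size_t size;
  size_t used;
} arena;

/* Returns 1 on success, 0 if the backing buffer could not be allocated. */
static int arena_init(arena *a, size_t size) {
  a->base = malloc(size);
  a->size = a->base != NULL ? size : 0;
  a->used = 0;
  return a->base != NULL;
}

/* Rounds the current offset up to `align` and hands out n bytes there.
   Returns NULL when the arena is full. */
static void *arena_alloc(arena *a, size_t n, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
  assert(align <= alignof(max_align_t));            /* base is only max-aligned */
  size_t off = (a->used + align - 1) & ~(align - 1);
  if (off > a->size || n > a->size - off)
    return NULL;
  a->used = off + n;
  return a->base + off;
}

/* Drops every allocation at once; the buffer is reused. */
static void arena_reset(arena *a) { a->used = 0; }

static void arena_destroy(arena *a) {
  free(a->base);
  a->base = NULL;
  a->size = a->used = 0;
}
```

Individual objects are never freed; `arena_reset` or `arena_destroy` releases everything in one step.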

------
arcticbull
Might be a nice safety feature to tag the first few bits of the size with a
magic sequence so the "free" method can sometimes catch an attempt to free a
non-fat pointer passed into it.
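A sketch of that idea, assuming a 64-bit size field whose top 16 bits hold a made-up magic value (so sizes are capped at 48 bits). All names and the magic constant are hypothetical. The check is only best-effort: reading the header in front of a pointer that did not come from `fp_alloc` is itself undefined behaviour, which is why it can catch the mistake only "sometimes":

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: top 16 bits = magic tag, low 48 bits = size. */
#define FP_MAGIC       UINT64_C(0xFA7B)
#define FP_MAGIC_SHIFT 48
#define FP_SIZE_MASK   ((UINT64_C(1) << FP_MAGIC_SHIFT) - 1)

struct fp_header {
  alignas(max_align_t) uint64_t tagged_size;
};

static void *fp_alloc(size_t size) {
  if ((uint64_t)size > FP_SIZE_MASK ||
      size > SIZE_MAX - sizeof(struct fp_header))
    return NULL;
  struct fp_header *h = malloc(sizeof *h + size);
  if (h == NULL)
    return NULL;
  h->tagged_size = (FP_MAGIC << FP_MAGIC_SHIFT) | (uint64_t)size;
  return h + 1;
}

static size_t fp_size(const void *p) {
  return (size_t)(((const struct fp_header *)p - 1)->tagged_size & FP_SIZE_MASK);
}

/* Debugging aid only: a matching tag can still be a coincidence. */
static int fp_is_tagged(const void *p) {
  return (((const struct fp_header *)p - 1)->tagged_size >> FP_MAGIC_SHIFT) ==
         FP_MAGIC;
}

/* Returns 1 if freed (or NULL), 0 if the pointer was refused as foreign. */
static int fp_free(void *p) {
  if (p == NULL)
    return 1;
  if (!fp_is_tagged(p))
    return 0;
  free((struct fp_header *)p - 1);
  return 1;
}
```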

------
projektfu
Another question for language lawyers: if you are given a pointer to char, is
it defined behavior to somehow cast the data before that into a correct
integer? Assuming you have

    
    
       typedef struct MyStr {
          size_t mystr_length;
          char mystr_data[1];
       } MyStr;
       
       typedef char *PMyStr;
    
       size_t MyStrLen(const PMyStr p) {
          const MyStr *pmystr = ?;
          const size_t *psize_t = ?;
    
          return *psize_t;
       }
    

How do you make a legal cast to get either pmystr or psize_t?

~~~
saagarjha
That depends on what p points to. If it’s the address of the char[1] array
inside a valid MyStr, then as far as I’m aware you can subtract the correct
number of bytes (offsetof is your friend), cast the resulting pointer to a
MyStr *, and then get the size from there.

~~~
projektfu
Ah, with your information I'm looking into the container_of macro which I'm
not entirely certain is legitimate but hasn't run into problems.
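A sketch of the offsetof approach (the same idea behind `container_of`), assuming the unnamed `char [1]` member is called `mystr_data`, a hypothetical name. It is only valid when `p` really points at that member of a live `MyStr`; for strings longer than one byte, the C99 flexible array member `char mystr_data[];` is the standard replacement for the `[1]` trick:

```c
#include <assert.h>
#include <stddef.h>

typedef struct MyStr {
  size_t mystr_length;
  char mystr_data[1]; /* hypothetical name; the question leaves it unnamed */
} MyStr;

typedef char *PMyStr;

/* Step back from the member to the start of the enclosing struct. */
#define MYSTR_FROM_DATA(p) \
  ((const MyStr *)((const char *)(p) - offsetof(MyStr, mystr_data)))

static size_t MyStrLen(const PMyStr p) {
  assert(p != NULL);
  const MyStr *pmystr = MYSTR_FROM_DATA(p);
  return pmystr->mystr_length;
}
```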

------
projektfu
Many string functions are intended to operate on a substring. These functions
would appear to need the original string passed in every time to find the
length.

~~~
glxxyz
Every pointer into an array could pass the triple _{current, start, len}_, or
even maybe the quadruple _{current, start, len, allocated}_, which is the
kind of minimal efficiency C programmers are looking for.

------
MichaelMoser123
libstdc++'s original (copy-on-write) std::string used a similar trick. The
string object has a pointer to the null terminated character string, and right
before that string is a structure that contains the reference count. (I think
libcello could add reference counted objects that way)
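A minimal sketch of that layout in C: a reference count stored in a header just before the characters, so callers still hold a plain `char *`. This is not libstdc++'s or libcello's code; the `rc_*` names are hypothetical, and the counting is not thread-safe (a real implementation would need atomics):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header in front of the characters. */
struct rc_header {
  alignas(max_align_t) size_t refs;
};

static char *rc_strdup(const char *s) {
  size_t n = strlen(s) + 1;
  struct rc_header *h = malloc(sizeof *h + n);
  if (h == NULL)
    return NULL;
  h->refs = 1;
  char *data = (char *)(h + 1);
  memcpy(data, s, n);
  return data;
}

static char *rc_retain(char *p) {
  ((struct rc_header *)p - 1)->refs++;
  return p;
}

/* Returns 1 if this release dropped the last reference and freed the block. */
static int rc_release(char *p) {
  struct rc_header *h = (struct rc_header *)p - 1;
  assert(h->refs > 0);
  if (--h->refs == 0) {
    free(h);
    return 1;
  }
  return 0;
}

static size_t rc_count(const char *p) {
  return ((const struct rc_header *)p - 1)->refs;
}
```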

------
krilovsky
From a cursory look, this is one of the most unsafe pieces of code I have
seen, with complete disregard to memory alignment requirements and the
lifetime of temporary objects passed as arguments to functions.

Definitely don't use this in production code.

~~~
jedimastert
From the FAQ[0]

> Can it be used in Production?

> It might be better to try Cello out on a hobby project first. Cello does aim
> to be production ready, but because it is a hack it has its fair share of
> oddities and pitfalls, and if you are working in a team, or to a deadline,
> there is much better tooling, support and community for languages such as
> C++.

[0]: [http://libcello.org/home](http://libcello.org/home)

~~~
asveikau
The author has already made it clear they consider "memory allocation by
unaligned offset into temporary char[] cast into a struct pointer" to be a
valid strategy, so frankly I'm not very interested in their opinions on
whether it's production ready.

I've seen it on HN before, whenever this project gets mentioned. People who
don't know much about C confuse it for a really cool thing you can do with C,
as if it's just another legit library that you can pick up and use. It's a lot
of undefined behavior. People have enough problems writing safe C as it is,
and on top of this complaint about alignment and misuse of temporaries, this
thing makes the problem worse in other ways too, removing the few safeguards
that exist by treating everything as void* for instance.

