I imagined a separate pass for the preprocessor, where the state would only be #defines and the current stack of #if blocks. The compiler would then only have to keep track of type declarations and globals (including functions). With some effort, it should be possible to encode this quite efficiently in memory, especially if strings are interned (or, better yet, if the arch can do mmap, sliced directly from the mapped source). Looking at tcc, it's somewhat profligate in that compound types are described using pointer-based trees, so e.g. function declarations can blow up in size pretty fast.
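To make that concrete, here's a rough sketch of what that preprocessor state might look like; the type and field names are made up for illustration (this isn't tcc's layout), and the fixed-size arrays are just to keep it short:

    /* Sketch of the preprocessor state: just the #define table and the
     * stack of open #if blocks. */
    #include <stddef.h>
    #include <stdio.h>

    enum if_state { IF_ACTIVE, IF_SKIPPING, IF_TAKEN };  /* TAKEN: a branch already emitted */

    struct macro {
        const char *name;   /* interned, or a slice of the mmapped source */
        size_t name_len;
        const char *body;   /* replacement text, ideally also a source slice */
        size_t body_len;
        int nparams;        /* -1 for object-like macros */
    };

    struct cpp_state {
        struct macro macros[1024];   /* the #define table */
        size_t nmacros;
        enum if_state if_stack[64];  /* nesting of #if/#ifdef/#else blocks */
        int if_depth;
    };

    int main(void) {
        static struct cpp_state st;
        /* #define PI 3.14159 */
        st.macros[st.nmacros++] = (struct macro){ "PI", 2, "3.14159", 7, -1 };
        /* entering an #ifdef whose condition held */
        st.if_stack[st.if_depth++] = IF_ACTIVE;
        printf("%zu macros, #if depth %d\n", st.nmacros, st.if_depth);
        return 0;
    }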
Yeah, running the preprocessor as a separate process definitely eases the memory pressure; I think Unix's pipeline structure was really key to getting so much functionality into a PDP-11, where each process was limited to a 16-bit address space.
Pointer-based trees seem like a natural way to handle compound types to me, but it's true that they can be bulky. Hash consing might keep that manageable. An alternative would be to represent types as some kind of stack bytecode: something like T_INT32 T_PTR T_ARRAY 32 T_PTR T_INT32 T_FN for the type of int f(int *(*)[32]) (assuming int is int32_t). That would be only 10 bytes, assuming the array length takes 4 bytes, but kind of a pain in the ass to compute with.
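For example, with made-up opcode names and a little-endian 4-byte array length (none of this is tcc's actual representation):

    #include <stdint.h>
    #include <stdio.h>

    enum { T_INT32, T_PTR, T_ARRAY, T_FN };  /* one byte per opcode */

    /* int f(int *(*)[32]): parameter built bottom-up, then the return
     * type, then T_FN combines them. */
    static const uint8_t type_of_f[] = {
        T_INT32, T_PTR,        /* int*                                */
        T_ARRAY, 32, 0, 0, 0,  /* array of 32 of that (length as u32) */
        T_PTR,                 /* pointer to that array: int *(*)[32] */
        T_INT32,               /* return type                         */
        T_FN,                  /* function taking one argument        */
    };

    int main(void) {
        printf("%zu bytes\n", sizeof type_of_f);  /* prints "10 bytes" */
        return 0;
    }

Walking such an encoding linearly is cheap; the painful part is anything that wants random access into the middle of a type, like unifying two function types during type checking.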
Interned strings, whether pointers to a symbol object or indexes into a symbol array, can be bigger than the underlying byte data, and slices of an mmap can be bigger still. Worrying about that is of course silly when you have one of them in isolation (you need a pointer to it anyway), but it starts to add up when you have a bunch of them concatenated, as in a macro definition that mixes literal text with parameter references.
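To put rough numbers on that, here's a hypothetical fragment encoding for a macro body, with literal text stored as (offset, length) slices into the mapped source and parameters as small indices; the struct and the ADD example are purely illustrative, not from any real preprocessor:

    #include <stdint.h>
    #include <stdio.h>

    enum frag_kind { FRAG_TEXT, FRAG_PARAM };

    struct frag {
        uint8_t  kind;   /* FRAG_TEXT or FRAG_PARAM           */
        uint8_t  param;  /* argument index, if FRAG_PARAM     */
        uint16_t len;    /* slice length, if FRAG_TEXT        */
        uint32_t off;    /* slice offset into the mapped file */
    };

    int main(void) {
        /* #define ADD(a, b) ((a) + (b)) would be five fragments:
         * "((", a, ") + (", b, "))" -- 40 bytes of fragments for an
         * 11-character replacement list. */
        printf("sizeof(struct frag) = %zu\n", sizeof(struct frag));  /* 8 */
        return 0;
    }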