I wonder, and this is speaking from virtually no experience in the area, if it would be possible to use machine learning to infer all the myriad layout rules (etc.), instead of actually writing to spec
I'm fairly familiar with ML, and I'd say that's a definitive no.
Implementing a layout spec is exactly the kind of thing that is "easy for computers, hard for humans". ML is for things that are "hard for computers, easy for humans" (like telling dogs apart from cats, transcribing speech, etc.).
I'd bet even that would only work with some or all of: (1) a LOT of training data, (2) a LOT of preprocessing, and (3) less famous architectures (perhaps recursive neural nets).
I say so because (among other reasons), in general, current popular ML architectures (like transformers) count processing hierarchical data as one of their weaknesses. For example, there's a theoretical paper proving that self-attention blocks (which are central to transformers) cannot resolve arbitrarily nested negation (i.e., reduce not(not(not(true))) to false). In practice as well, language models often have trouble dealing with naturally occurring double negation in language. But CSS/HTML is, I think, very hierarchical.
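To make the contrast concrete: the nested-negation problem that's hard for a fixed-depth self-attention stack is trivial for ordinary recursive code. A toy Python sketch of my own (not from the paper), representing a nested expression as ("not", inner) tuples:

    # Toy illustration: arbitrarily deep nesting is easy for a
    # recursive program, regardless of depth.
    def resolve(expr):
        # expr is either a bool, or a ("not", inner_expr) tuple
        if isinstance(expr, bool):
            return expr
        op, inner = expr
        assert op == "not"
        return not resolve(inner)

    print(resolve(("not", ("not", ("not", True)))))  # -> False

The nesting depth is no obstacle here, which is the "easy for computers, hard for humans" point above.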