While TLD registries will probably provide you with files in a sane subset of the master file format specified in RFC 1035, there are a number of things that will NOT work in general:
- Splitting the file into lines (paren-blocks and quoted strings can span lines, strings can contain ';', etc.).
- Splitting the file on whitespace (it's significant in column 1 and inside strings).
- Applying a regex (you'll need lookahead for conditional matching and it'll get ugly fast)
Don't go down the road of assuming it's a simple delimited file.
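To make the pitfalls above concrete, here is a minimal sketch of a character-level tokenizer that groups tokens into logical records. It is not a full RFC 1035 parser: the function name logical_records and the sample zone are made up for illustration, escapes are kept raw rather than decoded, and directives such as $ORIGIN and $TTL are returned as ordinary records rather than interpreted.

```python
def logical_records(text):
    """Yield (leading_blank, tokens) for each logical record in zone text.

    leading_blank is True when the record starts with whitespace in
    column 1, which in a zone file means "reuse the previous owner name".
    """
    tokens, cur, depth = [], None, 0
    leading_blank, at_start = False, True
    i, n = 0, len(text)

    def flush():
        # Finish the token being built, if any (an empty "" string counts).
        nonlocal cur
        if cur is not None:
            tokens.append(cur)
            cur = None

    while i < n:
        ch = text[i]
        if at_start:
            leading_blank = ch in " \t"
            at_start = False
        if ch == '"':
            # Quoted string: may contain ';', whitespace and newlines.
            cur = cur or ""
            i += 1
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    cur += text[i]  # keep the escape as written
                    i += 1
                cur += text[i]
                i += 1
        elif ch == "\\" and i + 1 < n:
            cur = (cur or "") + ch + text[i + 1]
            i += 1
        elif ch == ";":
            # Comment: skip to end of line, then let the newline branch run.
            flush()
            while i < n and text[i] != "\n":
                i += 1
            continue
        elif ch == "(":
            flush()
            depth += 1
        elif ch == ")":
            flush()
            depth = max(0, depth - 1)
        elif ch == "\n":
            flush()
            if depth == 0:  # inside parentheses a newline is just whitespace
                if tokens:
                    yield leading_blank, tokens
                tokens = []
                at_start = True
        elif ch in " \t\r":
            flush()
        else:
            cur = (cur or "") + ch
        i += 1
    flush()
    if tokens:
        yield leading_blank, tokens


# Hypothetical sample exercising each pitfall from the list above.
SAMPLE_ZONE = (
    "$ORIGIN example.com.\n"
    "@ 3600 IN SOA ns1 admin (\n"
    "      2024010101 ; serial\n"
    "      7200 3600 1209600 3600 )\n"
    '  IN TXT "v=spf1; not a comment" "multi\nline"\n'
    "www IN A 192.0.2.1 ; trailing comment\n"
)

for blank, toks in logical_records(SAMPLE_ZONE):
    print(blank, toks)
```

Even this sketch needs explicit state for quotes, parenthesis depth and column 1, which is the reason a line split or a single regex falls over on raw zone files.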
A few references:
 See page 9 of https://archive.icann.org/en/topics/new-gtlds/zfa-strategy-p...
I use the BIND tool named-compilezone to canonicalize zone files, which allows me to apply simple regex parsing, because I can then assume one record per line, all fields present, and no abbreviated names. The main disadvantage is that it is not very fast.
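A rough sketch of that workflow, under some assumptions: named-compilezone is on PATH, it is invoked as "named-compilezone -o OUTPUT ZONENAME INPUT", and its default text output has the shape "name ttl class type rdata" on each line. The helper names canonicalize and parse_canonical and the sample text are hypothetical, not part of BIND.

```python
import os
import re
import subprocess
import tempfile

# One canonical record per line: owner, TTL, class, type, then the rest as rdata.
RECORD_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ttl>\d+)\s+(?P<rclass>\S+)\s+(?P<rtype>\S+)\s+(?P<rdata>.*)$"
)


def canonicalize(origin, zone_path):
    """Run named-compilezone on zone_path and return the canonical text.

    Assumes the BIND utilities are installed; raises CalledProcessError
    if the tool rejects the zone.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "canonical.zone")
        subprocess.run(
            ["named-compilezone", "-o", out_path, origin, zone_path],
            check=True,
            capture_output=True,
            text=True,
        )
        with open(out_path, encoding="utf-8") as fh:
            return fh.read()


def parse_canonical(text):
    """Yield one dict per record; lines that do not match are skipped."""
    for line in text.splitlines():
        match = RECORD_RE.match(line)
        if match:
            record = match.groupdict()
            record["ttl"] = int(record["ttl"])
            yield record


# Hypothetical canonical output, used here so the parser can be tried
# without BIND installed.
SAMPLE_CANONICAL = (
    "example.com.\t3600 IN SOA\tns1.example.com. admin.example.com. "
    "2024010101 7200 3600 1209600 3600\n"
    'example.com.\t3600 IN TXT\t"v=spf1; not a comment"\n'
    "www.example.com.\t3600 IN A\t192.0.2.1\n"
)

records = list(parse_canonical(SAMPLE_CANONICAL))
```

With a real file this would be called as parse_canonical(canonicalize("example.com", "example.com.zone")). The rdata field is deliberately left as one string, since its internal layout depends on the record type.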