
Parsing Excel Spreadsheets with Swift's Codable Protocols - maxdesiatov
https://desiatov.com/swift-codable-xlsx/#title
======
sheetjs
First off: awesome work!

Correctly reading the data from XLSX is a lot more complex than described or
implemented here, mostly because Excel is so robust in reading files and there
are many sloppy writers. If you're interested, there's a many-thousand page
ECMA-376 specification: [https://www.ecma-
international.org/publications/standards/Ec...](https://www.ecma-
international.org/publications/standards/Ecma-376.htm)

\- to correctly get the first worksheet, you actually need to parse the
workbook.xml file and look into the sheets array to find the corresponding
relationship IDs. This is explained in section 18.2.20 (page 1579 in the part
1 PDF). iOS Numbers used to write worksheets in the opposite order, which
messes up the naive attempt to read the relationships file in order.

\- the attribute "s" in a cell is an index into the styles table, while the
cell type "s" corresponds to the shared string table. If you're curious, its
in section 18.1.3.4 (page 1604 in the part 1 PDF)

PS: We build and maintain parsers and writers for spreadsheets in JavaScript
([https://github.com/SheetJS/js-xlsx/](https://github.com/SheetJS/js-xlsx/) is
our most popular project), including a CLI script to convert files to CSV.
Some of our users use JSC in the context of Swift applications, ingesting data
in JS and returning a CSV for further processing in Swift.

~~~
ergothus
> there's a many-thousand page ECMA-376

Honestly, I'd love to see if there are organizational tips on managing (and
using!) a document that large.

I feel like technical writing is the closest to coding we get in plain
languages, but there are still critical differences. (technical writing is
trying to give instruction to a human, while coding is giving instructions to
code while giving a lot more context and description to a human). Specs cross
this line a bit more - it's giving descriptions to a human with the intention
of giving instructions)

Unfortunately, while I can find good code and bad code, I tend to find bad
specs and WORSE specs. I can see progress (the various HTML5 and related specs
are vastly better than previous versions, for example), but anytime I go in
with a question (which is admittedly rare) I spend a lot of time finding the
salient part compared to related-but-missing-the-vital-piece part, which is
actually the exact same problem that I think the most common problem in
maintainable code: making it easy to not only know how, but WHERE.

Are there lessons from specs we can learn? Do the good ones have some sort of
"concept" section that makes the reading of it easier? Does each subsection do
that?

~~~
zimablue
The advice is that it should only exist in machine-readable form and the
reason why Microsoft does it this way is to ensure lock-in. See also MSSQL,
for which there exists no complete machine-readable spec ANYWHERE (as far as I
could tell when I spent 2 days looking a few years ago).

~~~
13of40
>lock-in

A company that spends millions of dollars employing technical writers to
publicly document a format probably isn't conspiring to keep that format
secret. Maybe the macaronis aren't the shape you wanted but you got the
macaronis.

~~~
scarejunba
IIRC it was in response to many government sources requiring open formats (a
good instinct) so just because they did it doesn’t mean they wanted to. They
may have been forced to, and done the minimum as a result.

~~~
userbinator
I believe it was the antitrust lawsuits that forced them too, and it shows; if
you look at all the MS protocol/format documents and compare them to something
like RFCs and ITU/IEEE/ANSI standards, the MS docs are noticeably harder to
read with their verbosity, weird syntax notations and conventions, and almost
look as if they were deliberately obfuscated.

------
conradev
The one thing I don't love about Swift's Codable is the lack of
customizability in the "magic" part: the part where the compiler generates the
Encodable/Decodable implementations. Most notably, the compiler can't generate
implementations for enums. The only thing that Swift supports customizing
without fully implementing the methods for Encodable and Decodable is the name
of the keys, using a custom CodingKeys type.

Serde, an equivalent third-party crate in Rust, supports a lot of
customization which I find invaluable. It can (de)serialize values like this
with ease:

    
    
        {"type": "location", "value": {"latitude": 0, "longitude": 0}}
    

in a very small amount of code:

    
    
        #[derive(Serialize, Deserialize)]
        #[serde(tag = "type", content = "value")]
        #[serde(rename_all = "lowercase")]
        enum Value {
            Location { latitude: f64, longitude: f64 },
            String(String),
            ...
        }
    

Serde also supports customizing serialization on a per-field basis without
having to implement the entire protocol, which is nice:

    
    
        #[derive(Deserialize, Serialize)]
        struct Record {
            #[serde(with = "chrono::serde::ts_seconds")]
            updated: DateTime<Utc>
        }
    

I really hope that Swift has better ways (like the above) to customize Codable
in the future. I find myself implementing the protocol myself in 90% of cases,
whereas I very rarely have to do that for Serde.

~~~
maxdesiatov
The main reason for that is lack of hygienic macros in Swift. Currently you
can use code generation tools like Sourcery and SwiftGen, but I expect 1st-
class meta-programming support to come after Swift 5.0 release. After ABI
stability I imagine macros are pretty high on the priority list of the core
team.

~~~
slavapestov
I have a hard time envisioning full-fledged macros coming to Swift, but moving
Equatable/Hashable/Codable synthesis to a library should be possible with more
restricted meta-programming capabilites. See Joe Groff's talk for details:
[https://www.skilled.io/u/swiftsummit/swift-s-reflective-
unde...](https://www.skilled.io/u/swiftsummit/swift-s-reflective-
underpinnings-joe-groff)

~~~
saagarjha
> I have a hard time envisioning full-fledged macros coming to Swift, but
> moving Equatable/Hashable/Codable synthesis to a library should be possible
> with more restricted meta-programming capabilites.

I've heard many people on the mailing lists talk about wanting to add hygienic
macros–so when you're talking about "full-fledged macros", are you talking
about unhygienic or hygienic ones?

> See Joe Groff's talk for details

I'm actually curious how much is possible, in terms of reflection capability,
both currently and in the future. How close can I get to the dynamism of say
the Objective-C runtime? Can I grab a function pointer knowing its mangled
name? Can I list every class in a binary by consulting the runtime metadata?
Can I "swizzle"?

~~~
slavapestov
> How close can I get to the dynamism of say the Objective-C runtime?

Well, right now the compiler emits a lot of metadata that is mostly used for
runtime generics, dynamic casting, and the (somewhat limited) Mirror type. But
these features have enough generality that a surprising amount of stuff has to
be encoded. However there’s no nice API for looking at it yet.

> Can I grab a function pointer knowing its mangled name?

That’s just dlsym().

> Can I list every class in a binary by consulting the runtime metadata?

The metadata is there, but there’s no exposed API for doing this.

> Can I "swizzle"?

There’s an experimental thing for this now:
[https://github.com/apple/swift/pull/20333](https://github.com/apple/swift/pull/20333)

~~~
saagarjha
> > Can I grab a function pointer knowing its mangled name?

> That’s just dlsym().

Sorry, I should have been more clear: can I grab a function pointer to an
_unexported_ function?

> > Can I "swizzle"?

> There’s an experimental thing for this now:
> [https://github.com/apple/swift/pull/20333](https://github.com/apple/swift/pull/20333)

This is interesting; I haven't been really been following the lists recently
but I took the time to read the linked thread and the pull request. While the
functionality contained in the pull request is interesting, it's not quite the
same as swizzling since it's done unconditionally at load time rather than
runtime (so it's much more similar to DYLD_INTERPOSE in that sense). While
this covers many of the cases when swizzling is necessary, it leaves out a
rather important one where the correct method override is selected at runtime,
generally conditionally.

Also, as a sidenote, it seems that there is some sort of motivation to have
compiler type checks for the replacement, i.e. @_dynamicReplacement(for:
bar())–how will this actually work in practice? If I compile a bundle without
access to the source of the application I'm going to be loaded into, how would
bar() be accessible to the compiler at all?

------
oflannabhra
The Codable protocol in Swift is game-changing. Max Howell (homebrew author)
recently started a series of articles[1] in which he used Codable structs as
the shared data model in both the backend and frontend of his new app Canopy.
Even though communication is through HTTP and JSON, he never even has to touch
it.

[1] - [https://medium.com/@mxcl/server-side-swift-making-
canopy-2ed...](https://medium.com/@mxcl/server-side-swift-making-
canopy-2ed586b7f5a9)

~~~
mpweiher
> Codable protocol in Swift is game-changing

Which is really weird, considering Objective-C had automatic
"activation/passivation" from the beginning (early 80s), and it was pretty
trivial to adapt similar mechanisms later.

~~~
maxdesiatov
Except that in Objective-C it's a runtime feature, which is by definition
slower. Swift's Codable implementation is generated by the compiler or is
hand-written with an obvious benefit of a stronger type system.

~~~
mpweiher
Have you measured this? (I have)

