This is really great stuff, as usual from Ink & Switch.
One initial thought that I'd offer is that in a simple running system, these 'lenses' are arranged into a series[1] that can take a datastructure from an origin 'base' version all the way through to the latest-known representation. This may be more familiar to most developers as an analogy to database migration scripts, as mentioned in the post.
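To make that concrete, here is a minimal sketch of the series arrangement, using a simplified Lens interface of my own invention rather than Cambria's actual API:

```typescript
// A simplified, hypothetical Lens interface -- not Cambria's actual API.
interface Lens {
  forward(doc: Record<string, unknown>): Record<string, unknown>; // older -> newer schema
  reverse(doc: Record<string, unknown>): Record<string, unknown>; // newer -> older schema
}

// Example lens in the spirit of a migration script: rename one field.
const renameTitleToName: Lens = {
  forward: ({ title, ...rest }) => ({ name: title, ...rest }),
  reverse: ({ name, ...rest }) => ({ title: name, ...rest }),
};

// Run a document from its 'base' version through each lens in order,
// arriving at the latest-known representation.
function toLatest(doc: Record<string, unknown>, series: Lens[]): Record<string, unknown> {
  return series.reduce((d, lens) => lens.forward(d), doc);
}
```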
When a new lens is created, the developers may want to distribute it to a subset of users, ensure that it works correctly, and potentially adjust it based on feedback before issuing the final lens to the application's population[2].
If this is the intended release workflow, then the arrangement of lenses becomes a graph rather than a simple series. There may be times when it's necessary to backtrack briefly.
It's possible (although hacky, I will admit) to create an NPM JavaScript module that has another version of itself as a dependency.
I mention that because this provides a way to distribute some code -- a lens, for example -- alongside a dependency graph using an ecosystem that is relatively well-evolved (in terms of release management, client upgrade support, etc) for versioned code distribution.
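Concretely, npm's package aliases (supported since npm 6.9) make the self-dependency trick possible; a hypothetical my-lens package could depend on its own previous major version like this:

```json
{
  "name": "my-lens",
  "version": "2.0.0",
  "dependencies": {
    "my-lens-v1": "npm:my-lens@^1.0.0"
  }
}
```

Each release then reaches back to its predecessor, so installing the newest version transitively pulls in the whole chain of earlier lenses.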
[1] It's nice that the analogy with light passing through a series of lenses fits
[2] Think of an optician testing your eyesight and asking questions about various sample lenses
I just have to say, it's my name, it's my name! A cool sciency thing has my name! (I'm not sciency at all. Social worker by training, policy and budget wonk as my one-and-only non-human/non-canine love).
I really want to use lenses. The idea of a definition that allows lossless translation from one data representation to another is so elegant. But I wonder if it's a mathematical ideal that doesn't match up with real-world problems? I tried applying the concept of a lens in my problem domain, converting engineering files to machine-readable config files, but in reality the process is more like combining multiple inputs to form an output, which may not contain all of the input data. So I'm left applying some tricks (including unnecessary data in the output/input) to make the process act like a lens. And it's pretty error-prone.
Are there applications for lenses that really allow lossless translation in both directions? Or am I misunderstanding the concept in some fundamental way?
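For what it's worth, the "include unnecessary data" trick described above resembles what the literature calls a lens with a complement: the forward direction emits the lossy output plus whatever leftover data is needed to invert it. A rough TypeScript sketch, with all names hypothetical:

```typescript
// Sketch of a lens-with-complement: forward returns the (lossy) output
// plus a 'complement' holding whatever the output leaves out, so the
// backward direction can reconstruct the original. Names are hypothetical.
interface EngineeringFile { partId: string; tolerance: number; notes: string }
interface MachineConfig  { partId: string; tolerance: number }

function forward(file: EngineeringFile): { config: MachineConfig; complement: { notes: string } } {
  const { notes, ...config } = file;
  return { config, complement: { notes } };
}

function backward(config: MachineConfig, complement: { notes: string }): EngineeringFile {
  return { ...config, ...complement };
}
```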
This was my take-away as well. This is cool and all, but if your mapping problem can be entirely solved declaratively like this, I don't know that you actually had all that tough of a problem in the first place. Almost every data translation nightmare I've been sucked into involved some pretty sophisticated business logic, and any declarative language with sufficient power to handle it inevitably becomes more complicated to write and maintain than a Turing-complete, plain old programming language.
I'm not too knowledgeable on the topic, but with a little math background: you'd need a bijective function for that. A quick search for bijective lenses turns up https://arxiv.org/abs/1710.03248
Close! In math lingo, a bijective function can only exist between sets of equal cardinality.
In other words, you couldn't have a bijective function between the integers and the booleans, because there are fewer possible boolean values than integer values.
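In symbols, with the pigeonhole step spelled out:

```latex
% A bijection must be injective, but the booleans form a two-element set,
% so any map from the integers must send two inputs to the same output.
\[
  |\{\text{true},\text{false}\}| = 2 < |\mathbb{Z}|
  \;\Longrightarrow\;
  \nexists\, \text{bijective } f : \mathbb{Z} \to \{\text{true},\text{false}\}
\]
```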
Demanding bijective functions is probably more restrictive than you want to be when building real-world systems. We discuss this a bit in the paper -- try searching for "convert".
As I was reading, I was reminded of schema-migration definitions which some ORMs/DB tools give you (I currently use Knexjs). There might be a useful intuitive correlation there [edit: I see jka and the post already pointed that out]. Of course in the context that I&S is discussing, the migration is at “runtime” (so to speak) because we’re dealing with systems that can’t do all-at-once migrations. Like intrepidhero commented, I’m curious to know if declarative lenses will be sufficient for the task, but I’m glad that’s what they explored. I’ll be reading this post again more closely in the future.
Yes, this is definitely a useful point of comparison!
We hesitated to mention database migrations too much in the essay for the reason you mention: in some sense, the whole point of Cambria is that you never do a one-time "migration", but rather you just continuously translate the data.
Still, using the system does feel pretty similar to doing database migrations, and we intentionally modeled some of the dev workflow after ActiveRecord migrations.
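For flavor, a lens in that migration-like style reads something like the following -- an illustrative sketch, not necessarily the exact operation vocabulary:

```yaml
# Illustrative only -- approximate, migration-style lens operations,
# not the authoritative Cambria syntax.
lens:
  - rename:
      source: title
      destination: name
  - add:
      name: complete
      type: boolean
      default: false
```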
I love the analogy, but no! First, XSLT is unidirectional; and although I have built some tree-like XSLT processing systems, Cambria is explicitly graph-based and bidirectional.
In other words, with Cambria you can build a graph of lenses that describe various data schemas and then translate from or to any of them.
That said, XSLT has different strengths (syntax aside) in that it is a general-purpose templating language with strong functional programming underpinnings.
Databases should add support for lenses, along with type conversions, column relocations, new default values, and everything else a user might need.
YAML is a convenient stand-in for a better syntax and was chosen for being less tedious to type than JSON itself, while still supporting JSON Schema to make our editor integration work.
Long run, I expect neither JSON nor YAML are really the best solutions.
The problem with these data transformations is that YAML's limited descriptive capabilities will eventually become a bottleneck, and then they'll switch to some hybrid monstrosity of YAML+TypeScript, some ad-hoc language, or some new template syntax.
This always ends up happening; this will happen here, without a doubt.