
Swapping from EF6 to Dapper was one of the best choices we ever made with our project stack. It is such a relief to be able to hand-tune queries and transactions now. Initially, we were sold on the apparent simplicity of EF6, but as with many things there is a cost to an abstraction like this. In our case, the performance penalties and opaqueness were dealbreakers after a while. We saw an average speedup of 10x across our SQL operations after rewriting them as direct SQL, with Dapper handling the mapping of result sets into models.
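
For anyone who hasn't used it, the Dapper side is basically this: you write the SQL yourself and it maps the result set onto a POCO by column name. A minimal sketch (the Order class, table, and connection string are invented for illustration):

    using System.Collections.Generic;
    using Dapper;
    using Microsoft.Data.SqlClient;

    // Hypothetical POCO; Dapper maps result-set columns to properties by name.
    public class Order
    {
        public int Id { get; set; }
        public string CustomerName { get; set; } = "";
        public decimal Total { get; set; }
    }

    public static class OrderQueries
    {
        public static IEnumerable<Order> LargeOrders(string connectionString, decimal min)
        {
            using var conn = new SqlConnection(connectionString);
            // Hand-written SQL; Dapper only does parameter binding and mapping.
            return conn.Query<Order>(
                "SELECT Id, CustomerName, Total FROM Orders WHERE Total > @min",
                new { min });
        }
    }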

Writing your own SQL migrations is actually incredibly straightforward if you just think for a few minutes about how you would do it if ORMs didn't exist. Many database systems have ways to store metadata like schema versions outside the scope of any table structure, so you can leverage these really easily in your scripts/migration logic. EF6 uses an explicit migration table, which I was never really a huge fan of.
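
As a concrete example of the "metadata outside any table" idea: SQLite keeps a user_version integer in the database header (readable via PRAGMA), and SQL Server has extended properties. A minimal sketch of a migration runner keyed off the SQLite pragma (table names and scripts are placeholders; a real runner would load the scripts from files):

    using Microsoft.Data.Sqlite;

    // Placeholder migration scripts, ordered by version.
    var migrations = new (long Version, string Sql)[]
    {
        (1, "CREATE TABLE ProcessInstance (Id INTEGER PRIMARY KEY, Name TEXT);"),
        (2, "ALTER TABLE ProcessInstance ADD COLUMN Json TEXT;"),
    };

    using var conn = new SqliteConnection("Data Source=app.db");
    conn.Open();

    // Read the current schema version from the database header (no table needed).
    long current;
    using (var check = conn.CreateCommand())
    {
        check.CommandText = "PRAGMA user_version;";
        current = (long)check.ExecuteScalar()!;
    }

    foreach (var (version, sql) in migrations)
    {
        if (version <= current) continue;
        using var tx = conn.BeginTransaction();
        using var cmd = conn.CreateCommand();
        cmd.Transaction = tx;
        // Apply the script and bump the stored version together.
        cmd.CommandText = sql + $" PRAGMA user_version = {version};";
        cmd.ExecuteNonQuery();
        tx.Commit();
    }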

One thing we did that dramatically eased the pain of writing SQL was to use JSON serialization for most of our complex, rapidly-shifting business models, storing the serialized blob alongside a metadata row for each instance. Our project would be absolutely infeasible for us today if it weren't for this one trick. Serializing and deserializing a JSON column to and from a model containing 1000+ properties in complex nested hierarchies is far faster than trying to build up a query that would accomplish the same thing if explicit database columns existed for each property across the many tables.
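
In case it helps picture it, a minimal sketch of the read/write path under that scheme, assuming System.Text.Json and a hypothetical ProcessInstance table with metadata columns plus one Json column (all names invented):

    using System.Collections.Generic;
    using System.Text.Json;
    using Dapper;
    using Microsoft.Data.SqlClient;

    // Stand-in for the real models, which are large nested hierarchies.
    public record ProcessState(string Name, Dictionary<string, decimal> Figures);

    public static class StateStore
    {
        public static void Save(SqlConnection conn, int id, ProcessState state)
        {
            // Metadata lives in real columns; the model itself is one JSON blob.
            conn.Execute(
                "UPDATE ProcessInstance SET Json = @json, UpdatedUtc = SYSUTCDATETIME() WHERE Id = @id",
                new { id, json = JsonSerializer.Serialize(state) });
        }

        public static ProcessState? Load(SqlConnection conn, int id)
        {
            var json = conn.ExecuteScalar<string?>(
                "SELECT Json FROM ProcessInstance WHERE Id = @id", new { id });
            return json is null ? null : JsonSerializer.Deserialize<ProcessState>(json);
        }
    }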



If I was starting from scratch I would probably use a micro ORM like Dapper, or in my case Petapoco, which I have experience with.

There are so many "gotchas" to using EF that I'm not sure it was worth using to begin with (for a complex project, at least). Eight years into developing and maintaining a huge ecommerce platform built on EF, I think that if I started over I would probably not use EF.


This is just another take on EAV (https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...).

I'm unsure whether it's better than the typical way of modeling EAV, though; I'd have to think it through.


I am seriously glad I don't work on this project!


Care to elaborate on which aspect noted above is a problem from your perspective? I would be happy to provide more context and explain some of the reasoning behind our decisions.


> Deserializing a JSON blob to/from a column into/from a model containing 1000+ properties in complex nested hierarchies

> complex, rapidly-shifting business models

Your business logic classes have 1000+ properties. And you plan to not migrate them when the schema changes but leave many instances with old versions of the schema sitting in the datastore. Your application logic is going to get nasty!


The other aspect here is that the lifetime of these objects is very brief for our application. Typically 10-60 minutes. Schema changes, while breaking, are tolerable along business cycle boundaries.


Why use SQL at all? It sounds like you needed a key value store?


Some commenters have asked what kind of business would deploy this kind of technique.

I've done it. For MLS syncing software, there are lots of properties: not thousands, but it can be hundreds. And each MLS RETS feed has its own schema, so for any kind of logic portability this is necessary.

Actually, storing the raw data as a blob is a flexibility technique and is a separate concern from the number of fields. Since I can't predict the future set of optimized queries I'll need, and I don't want to constantly sync and resync (some MLSes will rate-limit you), this way I can store the raw data once and then parse it and update my tables/indexes very quickly.
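
To make the reprojection idea concrete, a small sketch: pull a couple of fields back out of the stored raw payloads and refresh a query table, without touching the rate-limited feed again (table and field names are invented; real RETS payloads and mappings vary per MLS):

    using System.Text.Json;
    using Dapper;
    using Microsoft.Data.SqlClient;

    public static class ListingProjector
    {
        // Rebuild a queryable index from raw payloads already stored locally.
        public static void Rebuild(SqlConnection conn)
        {
            var rows = conn.Query<(long Id, string Raw)>(
                "SELECT Id, RawJson FROM ListingRaw");

            foreach (var (id, raw) in rows)
            {
                using var doc = JsonDocument.Parse(raw);
                var price = doc.RootElement.GetProperty("ListPrice").GetDecimal();
                var status = doc.RootElement.GetProperty("StandardStatus").GetString();

                conn.Execute(
                    "UPDATE ListingIndex SET Price = @price, Status = @status WHERE ListingId = @id",
                    new { id, price, status });
            }
        }
    }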


I didn't quite understand how your solution worked in the end. Are you storing the entire object graph as a JSON blob alongside the relational data in the table, or are you simply storing the JSON blob instead of using relational data?

It's difficult for me to picture how Dapper even comes into play when you're doing this trick with the JSON blob.

Why not use NoSQL?

Also, 1000+ properties on an object? I know some domains occasionally surface these kinds of extreme cases, but is there not an alternative to having 1000 properties in one object?


For certain abstractions, we are looking to maintain the state of a business process over time:

The relational concern is the storage of metadata sufficient to locate & retrieve the state. E.g.: integer primary key, name of process, current transition in process, some datetime info, active session id, last user, etc.

The 'non-relational' portion is simply a final 'Json' column per row that contains the actual serialized state. This state model can vary wildly depending on the particular process in play, so by having it serialized we can get a lot of reuse potential out of our solution.

In terms of the models, it's not one gigantic class with 1000 properties. It's more like a set of 100+ related models with 5-30 properties each.
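
Roughly, a row looks something like this (a minimal illustrative DDL string; the column names here are invented for the example, not our actual schema):

    public static class Schema
    {
        // Relational metadata for locating/retrieving state, plus one JSON
        // column holding the serialized model graph for the process.
        public const string CreateProcessInstance = @"
            CREATE TABLE ProcessInstance (
                Id           INT IDENTITY PRIMARY KEY,
                ProcessName  NVARCHAR(200)    NOT NULL,
                CurrentStep  NVARCHAR(200)    NOT NULL,
                SessionId    UNIQUEIDENTIFIER NULL,
                LastUser     NVARCHAR(100)    NULL,
                CreatedUtc   DATETIME2        NOT NULL,
                UpdatedUtc   DATETIME2        NOT NULL,
                Json         NVARCHAR(MAX)    NOT NULL
            );";
    }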


Ah yup, that makes a lot more sense. I can imagine how "fun" that would be to try to do through EF.

It sounds like a workflow engine. I'm picturing one very generic table that tracks "this job id, this workflow type, this stage in the workflow, this entity, this state of the entity", where a single job has multiple of those entries over the lifetime of that job execution and the JSON blob is the current state of things, so that you don't have to go and recompute it.

Yeah, it seems like a reasonable choice.

What sticks out to me in a scenario like that is a good CQRS implementation. The write side of it pumps in the history of the job execution, then denormalizers run to project that into a shape amenable to being read by the application.
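
A toy sketch of what I mean by the denormalizer side, assuming an append-only JobHistory table on the write side and a one-row-per-job read model (all table and column names are made up):

    using Dapper;
    using Microsoft.Data.SqlClient;

    public static class JobProjection
    {
        // Project the latest state in a job's history into a read-optimized row.
        public static void Project(SqlConnection conn, long jobId)
        {
            var latest = conn.QuerySingleOrDefault<string?>(
                @"SELECT TOP 1 StateJson
                  FROM JobHistory
                  WHERE JobId = @jobId
                  ORDER BY Sequence DESC",
                new { jobId });
            if (latest is null) return;

            conn.Execute(
                @"MERGE JobReadModel AS t
                  USING (SELECT @jobId AS JobId) AS s ON t.JobId = s.JobId
                  WHEN MATCHED THEN UPDATE SET CurrentStateJson = @latest
                  WHEN NOT MATCHED THEN INSERT (JobId, CurrentStateJson) VALUES (@jobId, @latest);",
                new { jobId, latest });
        }
    }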




