> Robin: The COO can’t waste time trying to track down some engineer and talk to them about… uh… God knows what.
At my first job (infrastructure monitoring, 24/7 shifts) the COO appeared one morning at 7:30 am in the office because he had seen a ticket opened a few minutes before from a particularly annoying but important client.
He proceeded to give us some additional context on that client, added some clues about their infra and we went on our way to handle the case. In all this he was knowledgeable and kind. I have fond memories of that man.
> The whole company was dependent on the internal content management system (the CMS). It was built from scratch using the Ruby on Rails framework. Robin had overseen its development. Various teams had been working on it for 7 years.
Hahaha, oof. Tale as old as time, and I mean specifically this exact thing: Rails, passing through the hands of multiple teams, some or all outsourced. Anyone who's seen this before knows exactly what that codebase was like to work on.
I have to confess I don’t know what a data lake is. Halfway between a data puddle and a data ocean? There are a lot of terms in this industry that you can avoid if you don’t work on that specific subset of functionality.
I do wonder if the marketing data from a 150-person org wouldn’t fit on a good old-fashioned database server. No need to get your feet wet.
Based on my own personal experience, a data lake is more of a data dumpster fire: a bunch of CSV, XML, or JSON files in random directories (typically in cloud object storage like S3 or a clone) plus some sort of system that processes that data (Amazon Athena, Spark, Hadoop, DuckDB, whatever) to generate reports. In some projects I've seen, the size of the "lake" was vastly overstated, and the whole thing could have been stored in RAM on any modern system.
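To make the "it fits in RAM" point concrete, here is a minimal sketch, assuming two tiny made-up sources (an ad-spend CSV and a sales JSON file) loaded into an in-memory SQLite database. The file contents, table names, and the `build_report` helper are all hypothetical; a real setup would read from object storage and might use one of the engines named above instead.

```python
import csv
import io
import json
import sqlite3

# Hypothetical stand-ins for files that would normally sit in object storage.
ADS_CSV = "campaign,spend\nspring,120.50\nsummer,80.00\n"
SALES_JSON = (
    '[{"campaign": "spring", "revenue": 400.0},'
    ' {"campaign": "summer", "revenue": 150.0},'
    ' {"campaign": "spring", "revenue": 100.0}]'
)


def build_report(ads_csv, sales_json):
    """Load both sources into an in-memory database and join them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ads (campaign TEXT, spend REAL)")
    conn.execute("CREATE TABLE sales (campaign TEXT, revenue REAL)")

    # CSV values arrive as strings, so spend is converted explicitly.
    ads_rows = [
        (row["campaign"], float(row["spend"]))
        for row in csv.DictReader(io.StringIO(ads_csv))
    ]
    conn.executemany("INSERT INTO ads VALUES (?, ?)", ads_rows)

    # JSON values already carry numeric types.
    sales_rows = [(row["campaign"], row["revenue"]) for row in json.loads(sales_json)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales_rows)

    # Spend next to total revenue, one row per campaign.
    report = conn.execute(
        """
        SELECT a.campaign, a.spend, SUM(s.revenue) AS revenue
        FROM ads AS a
        JOIN sales AS s ON s.campaign = a.campaign
        GROUP BY a.campaign, a.spend
        ORDER BY a.campaign
        """
    ).fetchall()
    conn.close()
    return report


if __name__ == "__main__":
    for campaign, spend, revenue in build_report(ADS_CSV, SALES_JSON):
        print(campaign, spend, revenue)
```

The query itself is the short part; most of the code is getting two differently shaped files into tables at all.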
I think it works the other way, once a manager is threatening to fire someone, they are already fired. It's really hard to work well for a person who doesn't like you. Even if you swallow your pride and work hard, you're going to get all the shit shoveling tasks, and you can never work fast enough.
That's what I'm getting at - if you've threatened to fire someone, they've clocked out; they know they're gonna get shit on so they're going to basically wait around until you actually fire them.
> Sadly, I was unable to convince Robin that she should change her habits. In 2018 the Board Of Directors fired her. She had wasted 9 years and millions of dollars building a CMS that still lacked crucial features that the staff needed, and so she had done great harm to the organization.
This reminds me of the systems engineering maxim "User rejection is the number one cause of system failure."
I don't know that that is 100% correct, but I think one of the strengths of iterative processes is regular customer input and fast feedback loops.
The good thing about anecdotes is that they convey an idea in a concise manner. The bad thing about anecdotes is that they convey someone's idea of a real situation we know nothing about, so we have to rely on their storytelling, which is usually self-serving and rather subjective.
We don't know so many things about the project that all the solutions I see in the comments could be wrong for one reason or another.
Lots of comments here about how to implement the solution, but IMHO the key takeaway was at the end:
> Respectful leadership pays dividends in all circumstances. But it especially pays dividends when we are talking about the relationship between top leadership and those teams on whom they are utterly dependent. Respectful Leadership does not mean that you have to spend your time listening to everyone in the organization — in a large enough organization that is not even possible. But it does mean that when you are dependent on someone’s work, you commit to working with them in good faith, and you make time to hear their concerns and their suggestions.
Well, reality often works differently, and the bulk of the work is preparing the environment where that SQL query can run. You have various data sources, and the SQL query won't tell you exactly which ones; you'll usually have to find that out by some other means. The data sources have various interfaces and formats, and sometimes different protocols too. You might decide to collect the data together in that proverbial data lake; then, to run the SQL query, you'll have to design that data lake, tables, columns and all, and convert the data into it. Or (I was wondering if this applies to the described problem) for one-off calculations you may choose to work with streams of data and skip building a data lake. This can work, but how to correctly map SQL to streaming data could be an interesting problem: if you need some data several times for several different intermediate functions, will you have to read the streams many times, or do something else?
So, yes, the SQL query. Perhaps one of the small and easy parts of the problem.
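One possible answer to the "read the streams many times?" question, sketched under assumptions: if every intermediate function is a simple aggregate, they can all be updated in a single pass over the stream. The event shape, `stream_events`, and `summarize` below are made up for illustration, and this does not cover joins or anything that needs to look back at earlier records.

```python
from collections import defaultdict


def stream_events():
    """Hypothetical event stream; a generator can only be consumed once."""
    yield {"channel": "facebook", "spend": 10.0}
    yield {"channel": "google", "spend": 25.0}
    yield {"channel": "facebook", "spend": 5.0}


def summarize(events):
    """Update several aggregates in one pass instead of re-reading the stream."""
    total = 0.0
    count = 0
    per_channel = defaultdict(float)
    for event in events:
        total += event["spend"]                           # overall SUM
        count += 1                                        # overall COUNT
        per_channel[event["channel"]] += event["spend"]   # SUM ... GROUP BY channel
    return {"total": total, "count": count, "per_channel": dict(per_channel)}


if __name__ == "__main__":
    print(summarize(stream_events()))
```

Once a calculation needs the same records in a second, dependent step, something has to be buffered or stored, which is roughly where the data lake comes back in.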
Yes, but getting all the raw data into a form that can be queried with SQL, or finding a tool that can query JSON, CSV … via SQL, is the nontrivial part. Not to mention reconciling date formats etc. from multiple sources into one format.
80% of the job is data cleaning.
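As a small illustration of the date-format chore, here is a sketch that tries a list of known formats and normalizes to ISO 8601. The format list and the `to_iso_date` helper are assumptions, not anything from the article; note that a string like `03/07/2018` is ambiguous between US and European order, so the order of the list is a per-source decision.

```python
from datetime import datetime

# Assumed formats, tried in order: ISO, US-style slashes, dotted day-first.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_iso_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    return None


if __name__ == "__main__":
    for value in ("2018-03-07", "03/07/2018", "07.03.2018", "not a date"):
        print(repr(value), "->", to_iso_date(value))
```

Returning `None` rather than raising keeps unparseable rows visible, so they can be counted and inspected instead of silently dropped.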
This was a bizarre thing to read. I felt like it was straight out of the uncanny valley of human interaction. “No one talks this way!” I thought. Except for that little voice in the back of my head that reminded me that, yeah, some people do actually talk this way, and reading this feels like I’ve crash-landed on a planet full of them.
Regardless, “data lake” is a fucking stupid and nonsensical metaphor.
I would interpret the first conversation as a giant red flag and get the hell out of there. Unless you're getting paid a stupid amount of consulting dollars, it isn't worth the time dealing with COOs with inflated egos.
Milk it if you need the money, but avoid putting your reputation or long-term goals on the line for people like that.
Oh, so this is how my PII winds up in any number of jointly shared "data lakes"...
Amelia: Okay, but what does this project involve? I mean, really? Have you thought about the details? You want us to get all of your Facebook data, all of your Google Ad Words data, all of your Amazon data, all of your print data? And your entire catalog? And all of your sales data? That’s an enormous amount of data.
Me: Oh, I see. You need a place to put it.
Amelia: We need a place to put it. So we are going to build our own, internal data lake, and we will put your data there. But I’m thinking, wait, does that make sense? We build a private data lake full of your data? Won’t you want access to that data lake?
Me: Yes, of course. If it’s our data, then we would want access to the data lake.
Amelia: That’s what I’m thinking.
Me: Since you have to do it anyway.
Amelia: Exactly. There is no other way we can do it.
What I'm thinking is that there is another way to do it.
"Please note, this is not a rant about out-sourcing."
This should have been a rant about out-sourcing. If understanding what they are spending money on is not a core requirement for a capitalist organization, I'm not sure what could count as one.
The business needs a database, the database needs someone competent in SQL along with some basic accounting and finance education.
… or he could have just made a simple report like she asked. A very cringey takedown of someone just asking for basic information a COO would logically need to do her job. A good developer would have gotten that work done without being so obtuse.
At my first job (infrastructure monitoring, 24/7 shifts) the COO appeared one morning at 7:30 am in the office because he had seen a ticket opened a few minutes before from a particularly annoying but important client.
He proceeded to give us some additional context on that client, added some clues about their infra and we went on our way to handle the case. In all this he was knowledgeable and kind. I have fond memories of that man.
Holger if you’re reading this, you’re great.