I've worked with very large amounts of data and high-performance computing for most of my career; I've mostly had finance-related jobs in the last decade or so. I have most of the skills you want, including some you don't know you want. However, when salary comes up, that is where we start to part ways. If you are really serious about a shortage, you should be really serious about making competitive offers, but I keep seeing the same $150k offers. That isn't a "shortage" kind of offer.
I've been looking for work in data engineering and databases for 9 months, and while I'm certainly not as qualified and experienced as you are, I consider myself capable. I've definitely passed the take-home and whiteboard tests I've been given, etc.
When I read about a "shortage," I wonder if this is more indicative of unicorn searching than anything else.
Once a resume gets to me, and I'm only speaking for myself here, I'm looking for the challenges you've faced and the problems you've solved. I actually care very little about what tech you used because odds are we'll have something different, but we'll need to solve problems. If someone is solid in some related technical skillset, can think critically, and communicate the details of what they've tackled in the past, learning our specific tech stack is going to be the easy part.
Let me put it another way - when I look for interns or entry-level hires, the number who can do more than spell SAS or Teradata approaches zero very quickly. But if they've solved challenges of the magnitude they'd initially be expected to solve with us, the tech is secondary to process and problem solving. Even as we look at more experienced candidates, if I insisted on checking those boxes at the outset, I'd be limiting myself to people from a set of "legacy" industries that prefer these sorts of tools. If I had my way, I'd rather teach a really smart person the things they don't know yet.
That was when 17 top-of-the-line Prime 750 superminis made a huge cluster (we were the largest non-bank user in the UK) - probably about the same as a 10-20k core Hadoop setup would be today.
I don't fancy it up too much, either. I build teams that make the data move and land it clean so that your PhDs can do the smaaaht stuff with it. I can stack BI and Analytics on top, but a lot of people can do that starting from clean data - and clean data is what I do. But I do get the impression that we're viewed as janitors and plumbers - who you'd be thrilled to see at 3am when your shit(ter) broke, right?
This is already generic.
Yes, you can always, always, always find somebody to do a job if you're willing to pay 10 million dollars. By that logic, "shortages" are impossible: you can never have a shortage in any situation, because you can always pay 10 million dollars for a single visit to the doctor.
But this line of logic isn't very useful when talking about "shortages".
If you had to pay a million dollars for a loaf of bread, would there be a shortage of bread? I.e., billions of people would starve to death within a week because they couldn't afford to buy food.
Most people would say "Yes, there is a shortage of bread".
When people talk about shortages, they are obviously talking about a shortage at a certain price point. There is no other definition of the word shortage that makes sense.
A good definition that I use for the term shortage is: "If the government could snap its fingers and instantly produce large amounts of X overnight, would the world be a better place?"
If the answer is "Yes, the world would be in a much, much better place," then there is a shortage of X. If the answer is "No, the world would only be a little better," then there is NOT a shortage of X.
A company found something it could profit from more if it paid less than the current market value. That is all. They are not saying there are no qualified applicants. They are not saying they want 10 million dollars.
What is actually the case is that a business finds a resource (the perfect hire) that it wishes to profit from but does not want to pay market value for, because that would reduce profits. Rather than accept what would be an erosion of profit (or an admission of an unworkable business model), they post articles demanding government action to push wages downward.
If you want a bread analogy, it's as if I found a cheap source of bread I can sell elsewhere at a profit but then complain there's a shortage solely because the cheap stuff isn't even cheaper.
Wal-Mart greeters can be wonderful people and I'm not saying they aren't valuable as humans. But in labor market terms, there is clearly not a shortage of them.
If there were 1 gallon of water left on earth, Bill Gates would buy that gallon for $50 billion, and everyone else would die of dehydration.
There has always been a shortage of maids willing to do all my house work for $10.
And there is a shortage of data engineers at $x, but there wouldn't be a shortage at $1M/year (because fewer companies would want one, and more people would be willing to do the work).
Really? Who would sell the last gallon of water on earth?
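A toy way to see the "shortage at a price point" argument: with made-up linear supply and demand curves (the numbers below are purely hypothetical, not market data), the shortage at a given salary is just the excess of openings over willing workers at that salary, and it shrinks to zero as the salary rises toward the point where the market clears.

```python
# Hypothetical linear curves; salaries in $k per year. Not real market data.
def demand(salary_k: float) -> float:
    """Openings companies want to fill at this salary (falls as salary rises)."""
    return max(0.0, 10_000 - 20 * salary_k)


def supply(salary_k: float) -> float:
    """People willing to take the job at this salary (rises as salary rises)."""
    return max(0.0, 40 * salary_k - 2_000)


def shortage(salary_k: float) -> float:
    """Unfilled openings at a given salary: excess demand, never negative."""
    return max(0.0, demand(salary_k) - supply(salary_k))


for salary in (100, 150, 200, 1_000):
    print(f"${salary}k: {shortage(salary):,.0f} unfilled openings")
```

With these invented curves the market clears at $200k, so a company offering $150k "sees" a shortage of 3,000 people that vanishes at $200k; at $1M nobody is hiring at all, which is the point made above.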
1. I have student debt from my law degree, and I have a lower risk tolerance until that's paid off.
2. My daughter is 4, it's nice to be around for the early years, and the corporate gig is quite comfortable in terms of hours.
3. I'm in Maine. Most clients would require me to travel, which impacts #2.
I do have a former colleague here that started a data consultancy. I should grab a beer with him and see if we have common ground in the short term. It's not quite starting your own thing, but it might be fun.
Thought experiment: If 100 companies had openings for a skill set that only one person could deliver, all 100 companies could eventually fill their openings by sequentially outbidding each other for the services of that one person.
So how would we know if a talent shortage really exists for a certain job? I can think of a couple potential hints: if starting salaries are going up much faster than the national average, or if the unemployment rate for that job is much lower than the national unemployment rate. Either would seem to indicate that, relative to the job market as a whole, there was a greater demand than supply for that particular job.
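The two hints above could be written as a rough check like the following. The `margin` threshold standing in for "much faster" and "much lower" is a hypothetical choice, and the growth and unemployment rates are inputs you'd have to source yourself.

```python
def shortage_hints(role_salary_growth: float, national_salary_growth: float,
                   role_unemployment: float, national_unemployment: float,
                   margin: float = 1.5) -> list[str]:
    """Return which of the two shortage hints are present for a job.

    `margin` is a hypothetical threshold: "much faster" means salary growth above
    margin times the national figure; "much lower" means the job's unemployment
    rate is at least margin times below the national rate.
    """
    hints = []
    if role_salary_growth > margin * national_salary_growth:
        hints.append("starting salaries rising much faster than average")
    if role_unemployment * margin < national_unemployment:
        hints.append("unemployment much lower than the national rate")
    return hints


# Made-up example: 8% salary growth vs 3% nationally, 2% unemployment vs 5%.
print(shortage_hints(0.08, 0.03, 0.02, 0.05))
```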
2. Wall St. quants
3. game programmers
4. PhD statisticians
So, the problem is not that there aren't 6,600 people in the US that can do it, it's that the companies can't pay or don't want to pay the $200,000 + that would be required to hire them.
Instead of working as a data engineer, I'm working at a non-profit doing pretty much everything involving data for them, as well as running their appeals, and doing almost all of the analysis. I'll lead off by saying the biggest downside of working for this particular non-profit is the salary. However, there are a lot of things I like about this job:
1) Location: I want to be located in Chicago. I have 0 interest in moving out to the West Coast. I'm up in the air about working remotely, because I feel like there is a lot of value in working with people in person.
2) The role is very broad. I get to do a lot of exciting things with data, but it is also a marketing and communication role as well. I am included in nearly every strategic discussion, not just those pertaining to data or technology.
3) Work life balance is very good. I am never expected to work more than 40 hours a week. My boss makes sure that everyone is focused on their lives, to the point where he basically kicked me out of the office for a week because I was waffling about taking a vacation. He makes sure that people know they aren't expected to check their email or do work on the VPN during off-hours.
4) The work I do makes a difference. Not in a "I make something people use" difference, but in a "my work has rescued people from being homeless and fed starving kids" difference. My first couple of jobs out of college were totally lacking this aspect, and I didn't realize how much it meant to me until I started working at a place like this.
I've been here a few years now, so it's approaching the time when I should start looking for a new job if I want to continue to grow, but I'm having trouble visualizing what that would be. From my perspective, the problem with hiring is that job listings really focus on titles rather than roles, even in smaller organizations. I think my best bet for finding an organization that matches the first two points, if not all four, is through my network rather than through job postings. So, to your point, the only way I see myself in a narrow-title role like "data engineer" is if I really need money.
I'd just be happy with that. Most of the work I've done professionally hasn't gone anywhere; it's always "we missed the market window" or "upper management decided on a new strategy". I can't point to that many things I got to work on that actually made it into the market and were used by people for long. One place (a semiconductor company) had a successful though buggy product and large customers in place, with the product already deployed into the field, and the software I wrote got used by some customers, but then suddenly the company decided they weren't making a big enough profit margin on this part (even though the profits were guaranteed and extremely low-risk as the customers had the part designed-in), so they simply quit the market and laid off our entire team.
Making something people use would be a step up. Rescuing people and feeding starving kids is a pipe dream, but then again I work on embedded devices, not big data or analytics or anything like that so that's not exactly a position that'd be easy for me to find if I really wanted it.
Just wanted to comment on this part - sadly it's difficult to make more than a pittance even with a PhD.
If you want to talk about a shortage of labor where it would matter, biology as a field is probably hurting way more for talented software engineers than any company that needs a data engineer. There are so many great applications for programming in biology, and unlike in other sciences, say physics, researchers don't tend to pick up much programming skill on their way to their PhD.
I've tried getting involved in bioinformatics on the side, but it's really difficult to keep up with the field if you don't have thousands of dollars to drop on journal subscriptions. It's also really hard to get access to the data researchers use in general (in any field), and it's even harder for research involving people, because of privacy concerns. I don't think a focus on privacy is a bad thing, but a lot of publicly available data is sanitized to the point where your sample size would need to be in the billions to draw any inferences. You can request access to less general data, but good luck doing that without the support of a research organization.
Anyways, unless you have a martyr complex, there really isn't any reason to go into bioinformatics.
And I would bet across all of the tech workers in the USA there are well more than 6k that could do this.
But the GP was particularly amusing to me because of its assertion that 'smart, quantitative people, regardless of industry, can build data infrastructures for startups.' I guess we could also say, there's little incentive to pay to train them (or for them to pay to train) to become a data engineer.
Source: Am physicist who'd love to find sustainable part-time work at market rates.
Put it this way: the company isn't going to pay the employee more than the value they provide. That is the ceiling on salary. So until that ceiling is reached, it is indeed a case of highest bidder takes all, as your thought experiment demonstrated. But once that ceiling is neared, the company will decide not to bid higher, which reduces demand.
Thus, there is no shortage, just a shortage of people willing to work at the lower salaries offered by companies with lower ceilings, because those companies can't leverage the employee's talents well enough to afford to draw people in from fields with related skill sets.
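Here's a minimal simulation of the bidding-war thought experiment, assuming an ascending (English-style) auction where the offer rises in fixed steps and each company drops out once the price exceeds its ceiling, i.e. the value it thinks the hire provides. The company names, ceilings, and step size are all hypothetical.

```python
from typing import Optional


def bidding_war(ceilings: dict[str, int], start_k: int,
                step_k: int = 5) -> tuple[Optional[str], int]:
    """Ascending auction for a single hire.

    ceilings: company -> the most the hire is worth to it ($k/yr, hypothetical).
    The offer rises by step_k until only one company is still willing to pay.
    Returns (winning company or None, final salary in $k).
    """
    price = start_k
    # Companies whose ceiling is below the opening offer never enter.
    active = {name: cap for name, cap in ceilings.items() if cap >= price}
    while len(active) > 1:
        still_in = {name: cap for name, cap in active.items() if cap >= price + step_k}
        if not still_in:
            break  # everyone left tops out at the current price, so it stands
        active = still_in
        price += step_k
    winner = max(active, key=active.get) if active else None
    return winner, price


print(bidding_war({"A": 140, "B": 180, "C": 230}, start_k=100))
```

With ceilings of $140k, $180k and $230k and a $100k opening offer, the company with the highest ceiling wins at $185k, one step past the runner-up's ceiling, while the $140k company simply drops out. From where that company sits, it looks like a shortage.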
However if people start liking kale, and the price goes up 20% and you start telling people about the massive kale shortage people will think you're being a little histrionic.
Alternatively, you can just join a large tech org. Netflix etc. have no problem paying good DEs north of $200k in total comp.
I am a believer in inherent talent but Data Engineering is a skill set.
Most of the reskilling programs I have heard of failed miserably precisely because the skill alone isn't enough.
But the bright side is that talented people will find a way to "upskill" themselves in whatever environment they find themselves in. It is then up to the candidates to sell themselves and for the potential employers to be flexible about considering different backgrounds and nurturing the development of cross-functional skills that are needed for so-called data engineers.
The skills listed in the article are all fairly common, but it's hard to find enough of them within individuals. For example, it's not hard to find folks who can do the care and feeding of SQL Server databases, or skilled programmers, or analysts who understand the business domain intimately. The problem is getting all of these together in one individual at a "know-enough-to-be-dangerous" level.
But if you used to work as a plumber and want to up-skill to data analyst (or vice versa) it's not that simple.
Someone with a natural talent for picking up new development skills will still learn data engineering far faster when provided with proper resources and strong internal mentorship.
I can see how you might make this observation after observing a poorly conducted training program.
Also, this is not just one poorly conducted training program. Denmark spent billions up-skilling parts of its workforce, and the results were simply not there: something like 6 out of every 1,000 people.
Also, finance requires proper education and training; app development, not so much. So for everyone who complains about getting $150K offers, there are 100,000 people right here in the US applying for $60K technical analyst jobs.
And they don't have finance/Google/Facebook level needs for data engineers. They can't reasonably claim to need top-level skills and then beggar out on the cost.
That's true for just about anything.
"there is no EpiPen crisis, only a crisis at what you are willing to pay"
"There is no poverty, only poverty at a given income level"
"there is no crime problem, only a crime problem at a given crime level"
What you are saying is self-contradictory. If you (or others) are able to turn down 150K offers... you know what you are.
Poverty is simply a description of wealth and is always comparative. We can define poverty as any level we so desire.
One might argue that any crime is a problem, as long as it causes an issue for society or victims.
I have been thrown these projects at work before, where I'm the frontend engineer and I need to make some cool D3 visualization, but lo and behold the data is shit, and I have to help the backend team make the data usable. It's a mind-numbing job that nobody wants, because it sounds like a one-month task to get a good REST API up and working, but it usually takes three months: you have to go back and forth making sure the data is right, and there are always 10 tricky edge cases that you have to work some magic on. Not only that, but you need smart people cleaning the data, so that you don't make some big mistake down the line or end up with a super slow REST API and have to add another couple of weeks or a month to rework the data again. So that one month becomes three months, and most likely a year, because somebody will say "that looks great, but can we also add this," and it goes on and on. It's literally a mind-numbing job that almost nobody wants. I have found that products like Tableau are the best for this; you still have to clean the data, but it helps speed up the process.
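The back-and-forth over edge cases usually boils down to rules like the ones below. This is a hypothetical sketch, assuming records arrive as dicts with `id`, `date` (ISO format) and `amount` fields; the idea is to quarantine bad rows with a reason attached rather than silently dropping them or letting them reach the API.

```python
from datetime import date


def validate(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means it is clean."""
    problems = []
    raw_id = record.get("id")
    if raw_id is None or not str(raw_id).strip():
        problems.append("missing id")
    try:
        date.fromisoformat(str(record.get("date")))
    except ValueError:
        problems.append(f"bad date: {record.get('date')!r}")
    try:
        amount = float(record.get("amount"))
    except (TypeError, ValueError):
        problems.append(f"non-numeric amount: {record.get('amount')!r}")
    else:
        if amount < 0:
            problems.append(f"negative amount: {amount}")
    return problems


def split_clean(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Separate clean rows from rejected rows, keeping the reasons for each rejection."""
    clean, rejected = [], []
    for record in records:
        issues = validate(record)
        if issues:
            rejected.append((record, issues))
        else:
            clean.append(record)
    return clean, rejected
```

Keeping the reasons alongside the rejected rows makes it easier to take them back to whoever owns the source data instead of patching around them downstream.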
Data cleaning is a super golden problem to solve.
Give me emacs and a command line, and I have all the truth I need, which is far more honest, in my mind, than anything that can be created with D3 or Tableau. Beauty is in the eye of the beholder, and it doesn't really do anyone service to look down on the work others find enjoyable. If doing D3 makes you happy, that is awesome, and I can only congratulate you for your passion and your ability to look forward to work I don't "get," and I wish the feelings would be mutual.
I just enjoy working with raw data and raw code more than I enjoy writing something that launches a graphic. I enjoy writing a script that finds a bad piece of data, or a script that fixes up everything, or taking something that was once unable to run at all and converting it into something that runs in 500ms. Perhaps it is that journey of constant discovery, and seeing that every situation is a unique little puzzle. It is seeing the world as it is, with no one reinterpreting what the data means for me. I can explore it and discover what it really means. It is hollow truth, a mess of ideas converted to sets of ideas layered on sets of ideas, and when it is finally drawn down, converted, and passing all tests, it is self-evident, self-reflecting, and true. Hard to explain, but I suppose I like all the things people hate about it.
The tools matter about as much as it matters what CSS framework you are using. You have the ability to logic through UI and UX, whereas I do not. I have zero hope of ever doing well at what you do, since I simply don't have the foundation, but if it matters, I know most jobs I've applied to and worked at tend to be more ad hoc, using PL, Python, Ruby, etc.
I know this isn't Reddit, but I'll point you to Reddit anyway. Check out /r/datascience, where folks talk about what it takes to be a data scientist. Some are honest about data engineering, but most handwave past it or talk about it like it's beneath them, rather than treating it as a complementary and equally important discipline. Their role would not be possible without solid data engineering. Good luck doing "data science" or "analytics" or "machine learning" or any other buzzword without clean data, and for us data engineers, good luck ever demonstrating value without the analytics folks working with us.
Don't sell yourself short or select yourself out of an opportunity (within reason). That's someone else's job!
sed -i 's/emacs/sublime-text/g' what_u_said.txt
Which are difficult to find when you think of them as "janitors", and treat them accordingly.
Once upon a time I managed (and, frankly, also wrote a lot of the code for) a project integrating half a dozen sources each managing a block of our business (billing, coverage, claims). The data was awful coming in and we managed to get a bunch of business processes changed in addition to some pretty heavy cleansing steps that we wrote. In any case, this big fragmented mess of monthly and weekly stacked data became my integrated, clean warehouse. For the first time ever at this organization, I had coverage and claims records tying up at a rate of 100% without any manual intervention. We did this so that we could implement a modern finance ops process on top (being intentionally vague) that would allow us to manage this block more efficiently, save time, and even let us better invest - it was a 2 year project including my data work. A handful of actuaries and analysts got promoted out of this as it was a BFD to the company. Yet, at the end of the year, when I got my review I got our equivalent of the average rating, 3 of 5, etc, and like a 3% raise, and a shitty budget for my people too. From then on, I spent almost as much time out there promoting our team's work as we did doing the work. We did considerably better the next year, and that's been the way I've operated ever since. I market the work.
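Claims "tying up" to coverage can be framed as a reconciliation check: every claim should fall inside a coverage period for the same member. The sketch below assumes a hypothetical, heavily simplified layout (tuples of member ID and inclusive date ranges); real billing, coverage, and claims feeds are far messier, which is where the cleansing work goes.

```python
from datetime import date


def tie_out(coverage: list[tuple[str, date, date]],
            claims: list[tuple[str, str, date]]) -> tuple[float, list[str]]:
    """Check that each claim falls inside a coverage period for the same member.

    coverage: (member_id, start, end) with inclusive dates (hypothetical layout)
    claims:   (claim_id, member_id, service_date)
    Returns (fraction of claims that tie up, IDs of claims that don't).
    """
    periods: dict[str, list[tuple[date, date]]] = {}
    for member, start, end in coverage:
        periods.setdefault(member, []).append((start, end))
    unmatched = [
        claim_id
        for claim_id, member, served in claims
        if not any(start <= served <= end for start, end in periods.get(member, []))
    ]
    rate = 1.0 - len(unmatched) / len(claims) if claims else 1.0
    return rate, unmatched
```

A 100% tie-out rate simply means the unmatched list comes back empty without anyone patching records by hand.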
This kind of work requires a manager who will actively market it within the organization.
Hm, I wonder why he's having problems hiring janitors.
I guess this means that the entire profession consists of janitors and plumbers.
I've read, but not confirmed for myself, that in the US the biggest gains in health came in the post-Civil War period, when "plumbers and janitors" made the difference. Of course, that really starts, after the science, with the civil engineers who designed the public works systems that supplied clean water and took away sewage, and let's not forget the politicians and the like who found it worthwhile to buy votes that way (now they take our infrastructure for granted and buy votes more directly...).
There was a lot going on. Germ theory, of course, was part of it. But public health measures, especially sewerage systems, clean drinking water, and municipal waste removal, were all massive contributors. Note that the decline in mortality occurs well in advance of antibiotics and even most vaccinations.
For all the recent debate on vaccinations, it's interesting to note that the peak period of their impact (roughly 1930-1960) saw relatively little reduction in mortality, though there was a large decrease in disease incidence. It turns out that with septic control, antibiotics, food quality, and nutrition, many viral diseases weren't killers, but did present quality-of-life issues. And yes, often quite severe -- polio was no joke, and I know people who've suffered lameness from it myself. Measles and smallpox are similarly scarring and have long-term impacts.
But the major impacts of virtually all medicine are front-loaded to the period before 1950, with much of the gains since attributable either to greater access (especially for the disadvantaged) or to the removal of environmental agonists (lead, tobacco, alcohol, asbestos, miscellaneous poisons, safety hazards).
The job didn't involve too many "pipelines" but the knowledge and creativity required to make them work was well above what I see from most software developers.
"Plumber" is not the put-down that poster thought it was.
In a boldface font, no less. The cockiness behind that language is really quite astounding.
Favorite paper on the topic: http://research.google.com/pubs/pub43146.html
Plumbers are entirely different. They have to get their hands dirty working on some awful systems, but they actually have to know what they're doing, get specialized training, etc. Soldering a proper joint with copper pipes isn't that easy, and if you screw it up, it'll leak later and cause a lot of property damage. Knowing which pipes and fittings to use where is specialized knowledge. It's not something you can just grab someone off the street and train them to do in 30 minutes. Of course, plumbers also cost a lot too, and the ones who are self-employed (rather than their assistants) generally do pretty well financially.
Here's the original paragraph for reference:
Data engineers are the janitors who keep your data clean and flowing. Insights are great, and you need them. But to deliver insights at scale, you need data infrastructure. That’s delivered by data engineering. It’s not as fun to talk about as D3 visualizations and business intelligence dashboards, but it’s every bit as important.
There has long been hype around new technology and labels for business intelligence, data warehousing, big data, and now data engineering/science. I'm not saying there aren't some roles in this space that return huge value to organizations, but these opportunities are much rarer than the buzz indicates.
I wonder if the perceived shortage is mainly hype as the shift to new cloud technologies makes many of the older ideas a little less useful - if you are plowing data into BigQuery, you probably aren't so worried about your star schema data model for reporting.
I would strongly advise people who look at these types of articles to look at the roles in question and ask "Is this role on the critical path to customers paying us?" My experience has been that the answer is often "No." This is bad. I have also seen businesses that rely on smart data integration, and can show they are selling dollar bills for ten cents, still have trouble getting customers on board with spending that ten cents. Business is weird.
"Data Scientist" is a bad title in its own way, in the sense that "Computer Science" is bad, but worse. To a lot of people there is a Brahmin kind of attitude associated with "Scientist", i.e. an aversion to getting your hands dirty. Real-world data is pretty dirty, and you aren't going to get far in getting value out of it unless you spend 80-90% of your time dealing with the dirt.
Frankly speaking, if your company doesn't need a data engineer, it won't hire one or move you into that role. They likely don't, either, if you're experiencing this pushback -- data engineers often develop ETL pipelines or data warehouses, both of which are very useful if your company has a data team and very useless if it does not.
That said, you may want to move closer to my role. There's actually a shortage of data-savvy people who can also write production software, and you would nicely complement a more research-inclined data scientist or analyst -- someone with far more experience with research/analysis than development.
I experience the same problem with shortage-at-price-X in the field you describe. I'm a machine learning engineer with experience in MCMC methods, but I also have a lot of low-level Python and Cython experience, some intermediate experience with database internals, and lots of experience writing well-crafted code for production systems.
There are basically zero companies willing to pay what I'm seeking (which is a salary based on my previous job and a few offers I got around the time I took that job). In fact, in some of the more expensive cities, the real wage offered is far lower than other markets.
I've seen reputable, multi-billion dollar companies offering in the $140k range for this type of role in New York. That's wildly below anything reasonable for this sort of thing in New York. I've seen companies in Minneapolis offering $130k for the same kind of job -- and even that is still too low for Minneapolis! The same has been true in San Francisco as well.
Because these companies value you more for simply looking good on paper and looking good as a piece of office ornamentation when investors stroll through, and they view you as an arbitrary work receptacle closer to a software janitor than a statistical specialist, their whole mindset is about how to drive wage down.
Frankly, given the stresses of the job and the risk of burnout, I think it's actually a terrible time to be in the machine learning / computational stats employment field, despite all of the interesting new work and advances being made. The intellectual side is good, but the quality of jobs is through the floor.
Man, do I ever agree. This is where the "shortage" argument falls apart.
This is why I'm so uninterested in the abstract arguments happening elsewhere on this topic about whether markets are failing and basic laws of supply and demand no longer apply at theoretical salary levels (10 million was offered as an example).
Why are we bothering with this debate when it's so far from reality? Say you're trying to hire a very highly skilled and critical tech worker in SF, you just can't find one no matter how hard you try, and then I find out you're only offering 140k a year?
In San Francisco and New York (and anywhere else in the US, really), that's nowhere close to the kind of pay where we should start scratching our heads about a shortage and start wondering why the usual laws of supply and demand aren't working anymore.
It sounds like you're a principal/lead/post-senior ML engineer; at that level, you can easily command more than $140k but you have fewer options to apply those skills at companies that really need them (because few companies actually need them).
I don't know. It's tough. I agree that it might be a terrible time to work in ML/computational stats because of stuff like this.
From there I went full time as something of an ML engineer at a company with a strong tech culture, and learned as much as I could in both tech and ML/statistics. The rest is history (although I'm by no means a rockstar or whatever).
My path is hard to reproduce -- it starts with being in NYC or SF at a specific point in time, before the labor market became saturated with data science bootcamps and PhDs furiously learning Python while working on their dissertations.
Your best bet at this point is to produce a few data-related projects (maybe work on open source like scikit-learn and pandas?) and network like crazy. Someone somewhere will have a need for someone like you.
Well no kidding, that's one person doing two jobs. That's easily a 5-10 year training time depending on how high a quality you demand from their production software.
I'll look at the board again, but I didn't see anything there before that wanted software engineering skills (which I have, with industry experience) and not a graduate degree (which I don't), and that happened to be commutable from my place just south of the bay. But I will keep looking!
As others have pointed out Data Engineering is more about building data pipelines, making architecture decisions for your ML stack, things like that. Less about model building, prototyping and training, which is what I think of when somebody says they 'do' ML.
It seems people in this industry refuse to understand that some people are not perfect. I never graduated college because I hated it with the very fiber of my being, so I am not particularly great at white boarding answers to algorithm questions off the top of my head in a high pressure environment. If I need them during my job, I look up answers and learn from people who are much smarter than I am.
My personal identity has been shattered, as I thought my ~5-10 year history of success in the industry indicated I was in demand and talented. I saw posts like this and thought that if the worst happened I'd still be able to find a job. The idea that there is a talent shortage is a lie, or candidates like me wouldn't be treated as I have been. I'm not asking for a free job, or a handout. I have had a successful career so far and am capable of doing good work. But I'm not a specialist in Big Data Machine Learning Neural Networks.
I have struggled with bipolar disorder and suicidal ideation most of my life. I've dealt with the death of my beloved grandmother and my father who was instrumental in my choosing to be an engineer with only minor lapses in control. Nothing has caused me to consider taking my own life as much as the past 6 months. It seems there is no future for me in the only career I have any skill in and which is a huge part of my identity. And to constantly be told that there is such a shortage of engineers only salts the wound.
The fact that you pulled through 25 of them is already commendable. Unfortunately as a labor provider you'll be subjected to all kinds of crap for the privilege of working.
Every single person on here needs to have a secondary business going on right now. Doesn't have to be a highly skilled industry either, selling hand made stuff on Etsy can be a lifeline in these situations.
I had always had an easy time getting a job before, but this time it was different. Granted, I knew it'd be tougher, since for remote jobs the world is the competition. But it was a summer of endless shitty timed HackerRank-style tests (virtual whiteboard hazing). I would tell my co-workers about them and they'd laugh in bewilderment at the questions asked in what should be a technical screener, and these are extremely smart and productive software guys who have started companies, written books, and given conference talks. One funny question I got for a frontend React job: write a function that takes a sequence of bits representing a negative-binary number (not a base-2 number that is negative, but a base-(-2) number) and return its negated value in base-2. For a frontend job. It was one of 4 questions to be answered in 90 minutes. gtfo.
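For the curious: base -2 (negabinary) works like binary except the place values are 1, -2, 4, -8, ..., so every integer, positive or negative, has a representation with no sign bit. The question's wording is ambiguous; the sketch below takes one common reading (return the base -2 digits of the negated value), assumes digits are listed most significant first, and simply round-trips through a Python integer rather than using a bit-twiddling trick.

```python
def from_negabinary(bits: list[int]) -> int:
    """Decode base -2 digits (most significant first) into an integer."""
    value = 0
    for bit in bits:
        value = value * -2 + bit  # Horner's rule with base -2
    return value


def to_negabinary(n: int) -> list[int]:
    """Encode an integer as base -2 digits, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -2)  # with a negative divisor, Python gives r in {0, -1}
        if r < 0:
            n, r = n + 1, r + 2  # shift so the digit is 0 or 1
        digits.append(r)
    return digits[::-1]


def negate_negabinary(bits: list[int]) -> list[int]:
    """Return the base -2 digits of -x, where bits encode x in base -2."""
    return to_negabinary(-from_negabinary(bits))


print(negate_negabinary([1, 1, 1]))  # 111 is 4 - 2 + 1 = 3; prints [1, 1, 0, 1], i.e. -8 + 4 + 1 = -3
```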
A few companies would reply, and most strung me along while (I realize now) they kept me as a backup(-backup) in case their "A-player" turned them down. Countless interviews, hours on take-home projects; it was tough. I learned to cut bait if the company was slow to move forward, went weeks without communicating, etc.
I (just very recently) found it's easier to land small contract gigs, because the barrier to entry seems to be lower, then demonstrate value and keep getting work from those guys after the initial project is done. It is different, but so far I actually like the freedom that comes with contracting. I haven't been at it long enough to experience the downsides.
There's definitely not a shortage of talent. It's that every company thinks they need "A-players", when the vast, vast majority are doing a damn basic CRUD app.
Just wanted to say I hear you, brother, and share my story in some solidarity. You will find something; just keep plugging away. Each "failed" attempt makes you better, no matter how many attempts it takes. Cliché, of course, but it is true. I am very lucky in that I don't face the mental demons you do, and even so this job search hit me pretty hard. Please be proactive and take care of yourself, body and mind (body goes a long way toward mind also).
I've been dealing with large data since before "big data" was a term, but I don't call myself a "data scientist" or "data engineer". I am still a software engineer working on whatever benefits my organization.
"Serial Entrepreneur" is the same these days, claimed by anyone who had a lemonade stand as a kid.
But if you saw a nearby local maximum that's higher than your current local maximum, wouldn't you change what you call yourself, if it means being paid more but doing the same work?
This is similar to how the average "software engineer" makes about $30k/year more than the average "programmer".
I really enjoy that kind of work but it is difficult to articulate your business value in that environment. The best thing is working closely with a data scientist/front-end dev who can deliver products to the analysts and executives that need the data and make sure that you get the credit for enabling new streams of data. But most of the time you are putting out someone else's dumpster fire.
One advantage of data engineering: unlike front-end work, there are few non-technical people who will have an opinion on how you are doing things and burden you with bikeshedding.
 - http://www.avclub.com/tvclub/its-always-sunny-philadelphia-c...
It's very analogous to front-office and back-office work in investment banking. "Data scientists" are the front office, with all the prestige, and "data engineers" are the back office, doing a lot of the heavy lifting without nearly as much recognition.
In my opinion there shouldn't be a delineation. You shouldn't be a data scientist if you can't gather, process, and clean up your own data.
Even if you require your data scientists to be able to do engineering work, it's probably way more efficient to have some good generalist Software Engineers doing all the "pre-math" work and freeing your statisticians up for what they're (hopefully) good at.
Plus as a side effect, your software will probably be better.
* How many aren't on LinkedIn?
* Since the whole article is about how the job title is poorly defined and growing in prevalence, why would you assume that people who don't already have such a job would use the term?
* The "growth" charts in the full study are just as bad: how much of that is just from renaming existing generic developer positions, since "data engineer" is clearly a relatively new term?
So why not change your LinkedIn job title to "data plumber", which is sure to get you some serious recruiter attention ;)
Looks like we need more English engineers too.
I'm starting to think the message is that if HR is going to work from checklists, then developers should make sure they mostly take contracts that use popular checklist items.
I can think of Facebook, Google, Microsoft, IBM (which locations and groups within these companies / where?). I can also think of Confluent, CitusDB, Databricks, etc.
Which is what the poster was asking for.
Before going out to the market and discovering what talent exists, and consequently what salary it will take to get that talent to join (i.e. negotiate), most organisations decide on a salary range, usually reflecting their current internal structure rather than the current external market.
The longer an organisation has existed, the more out of whack with the market its internal setup is.
As such companies decide on their price point first, then go looking. Which is of course backwards.
If you choose to locate your company in one of the highest cost of living regions in the world, then you are complicit in the "shortage". Supply and demand - pay up. Or don't.
And how many companies are still interviewing with fizzbuzz?
From my experience working in various contexts (applied machine learning, analytics, policy research, academia, etc.), several factors contribute to this shortage: (1) "data engineering" often requires a lot of breadth and knowledge; (2) "data engineering" is often (derisively and naively) referred to as the "janitorial work" of data science; (3) the roles and requirements within the "data engineering" domain, in terms of job descriptions, span a wide spectrum: database systems administration, ETL, data warehousing, curation of data services / APIs, business intelligence, and the design/deployment/operation of pipelines and distributed data processing and storage systems (these aren't mutually exclusive, but job descriptions often fall into one of these stovepipes).
Some of my quick thoughts and anecdata:
Companies have made large investments in creating 'data science' teams, and many of those companies have trouble realizing value from those investments.
A part of this stems from investments and teams with no tangible vision of how that team will generate value. And there are several other contributing factors…
"Dirty work." People haven't learned how to do it, and more often don't want to. There's a vast number of tutorials and boot camps out there that teach newcomers how to "learn data science" with clean datasets. This is ideal for learning the basics, but the real world usually does not have clean or ideal datasets (the dataset may not even exist), and there are a number of non-ideal constraints.
There are people who wish to call themselves “data scientists” but “don’t want to write code” and would “prefer to do the analysis and storytelling”.
Engineering as the application of science with real world constraints: there are a number of factors that we take into account, often acquired through painful experience, that aren’t part of these tutorials, bootcamps, or academic environments.
Many “data scientists” I’ve met have a hard time adapting to and working with these constraints (e.g. we believe that the application of data science would solve/address __ problem, but: how do we know and show that it works and is useful? what are the dependencies, and costs of developing and applying that solution? is it a one-time solution, or is it going to be a recurring application? does the solution require people? who will use it? what are the assumptions or expectations of those operators and users? is it suitable? is it maintainable? is it sustainable? how long will it take? what are the risks involved and how do we manage them? is it re-usable, and can we amortize its costs over time? is it worth doing? This is part of a methodology that comes from experience, versus what is taught in data science)
Larger teams with more people/financial/political resources can specialize and take advantage of these divisions of labor, which helps them recognize the process aspects of applying data science and address some of the above.
Short story: if you view data engineering as "janitorial work" you're missing the big picture
Anyone else notice that the attributes of a 'unicorn' data scientist include the traits of a 'data engineer'?
Someone with enough smarts to build/lead a team, sell to executive management, and have an actual business application is just too rare compared to the prevalence of engineering talent.
- The project 'data engineer', in today's world, most likely will be a software developer responsible for ETL, etc. The data design will be more or less up to the software developer.
- An enterprise 'data engineer' is more concerned with data that affects the enterprise. This typically involves some sort of data integration. For example, how to integrate relevant data from N projects (e.g. A, B, C .. Z) where each project has its own idea of how to represent similar concepts (e.g. person, user, customer), with different provenance, truth assertions, access rules, data retention periods, granularity of metadata (e.g. at the attribute level vs entity level), etc. The enterprise is interested in questions like 'What did we know, and when did we know it?', etc. The enterprise 'data engineer' will probably levy requirements on the project 'data engineer' to meet the enterprise's needs.
I'm not even sure if I'm being sarcastic.
But only 1 out of 100 are qualified :(