But to me that was more of an advanced (one hopes) end-user. Someone who could take a bunch of large, mostly-complete logical components that somebody else engineered and stitch them together into a solution, integrating existing frameworks that already provide the first 80+% of the technical solution and carrying the last ~20% toward a domain-specific use case.
What I was looking for wasn't somebody who knew how to use something like HDFS. I was looking for someone who could build something as good or better than HDFS from nothing if they had to. A lot of what passes for "engineering" today, at least by marketplace label, tends to resemble the former rather than the latter.
There's definitely all kinds of space for both kinds of builders/creators depending on the needs and the project, but it certainly doesn't help that the English language and its colloquial application to the problem space have grossly blurred the distinction.
I don't find this distinction very useful. We're all end-users at different layers in the stack. Building HDFS from scratch is also mostly taking others' components and ideas and stitching them together. That's what progress and innovation look like. I think you're looking for engineers at a lower level in the stack than the applicants you received.
Additionally, if you're building the next distributed filesystem, you'll be much more successful if you're also an end-user of existing distributed filesystems, so you know the strengths, weaknesses, user preferences, etc. of the existing products. If you're building something without knowing how it's going to be used...well...you're probably not going to build the right thing.
The attribution for me had a lot more to do with the balance of optimism and skepticism. In my head the "developer" sees HDFS and goes, "Sweet, somebody solved this problem, now let me go use that thing and it will give me all these wonderful solved-problem qualities I don't have to think about anymore. This is going to save me a ton of time." The "engineer" looks at HDFS and goes, "Hmmm. This thing seems interesting, but this feature over here must be an incredibly painful one to use despite the fact that it seems super useful and is plastered all over their docs as being awesome. Because there's no free lunch in this problem space. So what possible methods are there to have implemented this kind of thing and how exactly can I test and exploit just how weak these floorboards are before I decide to start building on it?"
Again, not a very useful distinction. Agreed.
* or substitute whatever other enterprise framework you want
...because that's all that most businesses require, 99% of the time. To, you know, get things done and make money and stuff.
Which may not fit your needs, but why be "depressed" about it? It's just the way things are in the commercial world.
If you want someone with more fine-grained skills, try articulating that in your job postings. What we see, all the time, are ads mentioning platform X with no articulation whatsoever of where, even on some approximate logarithmic scale, they'd like the skill level and comfort with platform X to be.
And in the few firms that actually do have real engineering trade-offs that favor the use of those types of frameworks, they tend to hire people who are well-suited for the role, and then create job functions around them that respect the aptitudes and skills of the people they hire.
In most firms that adopt these frameworks (for status effects), they are just desperate to fill seats and increase engineering headcount. They don't respect your skill set or even care if it matches the business need. They just need to get you in the door, and then find a way to deal with the inevitable dissatisfaction later.
Welcome to the real world.
For those positions, I am sure that smart full-stack developers could easily pick up the statistics for data cleaning and quickly gain a passable understanding of models consumed from APIs in a black-box way. In fact, full-stack devs may be happier in these jobs due to the visualization and database components.
A much smaller subset of data science is actually focused on solving novel business problems and may centrally focus on deeper knowledge of a given technique, like MCMC methods, deep learning, real-time classifier systems, etc. For these, you do tend to need more significant experience with the specific machine learning tools being used (or enough general skill in statistics to pick them up quickly). Smart people of all stripes could still learn that stuff, but it's a lot harder to see them being able to convince a firm to hire them in that capacity.
The second type of these jobs is really, really rare though.
This seems like a strange objection, considering that a) getting technology to fit the use case is all virtually everybody wants, and b) having to do 20% of the solution from scratch--rather than, say, the last 0.10%--would be an _enormous_ undertaking. Or don't you consider the silicon, microcode, network, servers, physical protocol, wire protocol, operating system, standards, tools, language, and compiler in that equation? If not then where do you draw the line?
Well, for 90 percent of job offers, what a company claims to need is not what it actually needs (i.e. the typical "10 years experience with 5 year old technology" bullshit). If your company is not a very unique snowflake or in an academic setting, believing that it makes zero economic sense to completely reinvent the wheel from first principles is a valid assumption for applicants to make.
That said, in my particular case it had a lot less to do with necessarily trying to rebuild HDFS from nothing, and more to do with the mindset of rigor and principle needed to do so. Being able to work all the way through that problem domain, both in broad strokes and in meticulous detail, would hopefully carry over into validating and attesting to the correctness not only of things like HDFS (rather than treating it as a solved problem, ready-built for use), but also of the stuff we actually do have to build from scratch, with the same level of rigor and principle.
Though to your point... a non-trivial amount of this concern and necessity is born of the market and regulatory regimes this stuff has to serve and abide by. That it's not necessary for huge swaths of the marketplace is evidenced by the fact that things like property-based testing, mutation testing, chaos testing, and formal verification are fringe skills (at best) out there... yet the tech world keeps turning out awesome new stuff without any of that overhead, and that stuff still transforms all manner of life.
I actually think that "developers" and "engineers" are mostly pretty transparent about what sort they are. Or at the very least it's trivially easy to assess within just a few minutes with the right questions and conversation space. The harder part is getting non-technical people to understand that there's a distinction and that technical people aren't all just a fungible commodity. The weirdest part is that they get that on some level, especially when suddenly they're hit by a bus-factor problem, but that realization hasn't seemed to make a big impact in business/hiring process, practices, etc.
We can keep reclassifying what it takes to earn that label until we've eliminated all but the geniuses of the software world. The titles (and seniority) are, frankly, useless: they aren't legally enforced, because no one has a good and popular way to test for competency. If someone did, that test would become the technical interview, and market forces could once again weed out the people who don't make the cut.
There really isn't any pushback when you have an opinion on what a software engineer ought to be when you're hiring, so naturally, a lot of people have their own opinion. Figuring out which direction to go if you are one of those supposed engineers is pretty much a crapshoot, but still better than not learning anything new.
Did you try articulating that distinction in your job ads?
Or if not, can you really blame people for applying when your ads read like 99% of help wanted ads in the fin-de-boom era; like, you know:
We've got the coolest office in the Mission, with a climbing wall and jamming room, we do beer bongs every Thursday and play lawn bowling together on weekends! And of course you can bring your dog in every day, too! Keywords: Node, Python, D3, Spark, Hadoop, HDFS
There was some consternation over it because the recruiting staff didn't know how to find people based on these new buzzword-free criteria, so I helped them identify where to look, and persuaded them not to be the ones to contact candidates, but to let that be the responsibility of one of the people already on my team.
It eventually worked.
Also, be fully aware I'm not holding the discrepancy against the applicants. It's not their fault. They're getting signals from the marketplace that they should call themselves a "Distributed Systems Engineer". I'm holding this problem against the institutional forces that are giving these people the signals to describe themselves this way to begin with. Because it makes it much harder to find one when you actually need one.