
I am sympathetic to the distinction. I had a bunch of applicants for a position claiming to be "Distributed Systems Engineers" mostly because they'd stood up, maintained, and/or used a Hadoop or Spark cluster in their current or prior job.

But to me that was more of an advanced (one hopes) end-user. Someone who could take a bunch of large, mostly-complete logical components that somebody else engineered and then use them to stitch together a solution by integrating these existing frameworks that already provide the first 80+% of the technical solution to carry the last ~20% toward a domain-specific use-case.

What I was looking for wasn't somebody who knew how to use something like HDFS. I was looking for someone who could build something as good or better than HDFS from nothing if they had to. A lot of what passes for "engineering" today, at least by marketplace label, tends to resemble the former rather than the latter.

There's definitely all kinds of space for both kinds of builders/creators depending on the needs and the project, but it certainly doesn't help that the English language and its colloquial application to the problem space have grossly blurred the distinction.




> But to me that was more of an advanced (one hopes) end-user. Someone who could take a bunch of large, mostly-complete logical components that somebody else engineered and then use them to stitch together a solution by integrating these existing frameworks...

I don't find this distinction very useful. We're all end-users at different layers in the stack. Building HDFS from scratch is also mostly taking others' components and ideas and stitching them together. That's what progress and innovation look like. I think you're looking for engineers at a lower level in the stack than the applicants you received.

Additionally, if you're building the next distributed filesystem, you'll be much more successful if you're also an end-user of existing distributed filesystems, so you know the strengths, weaknesses, user preferences, etc. of the existing products. If you're building something without knowing how it's going to be used...well...you're probably not going to build the right thing.


I agree with you. On all counts. I also don't find my own distinction very useful, and I also agree that part of it is "level in the stack" related problems.

The attribution for me had a lot more to do with the balance of optimism and skepticism. In my head the "developer" sees HDFS and goes, "Sweet, somebody solved this problem, now let me go use that thing and it will give me all these wonderful solved-problem qualities I don't have to think about anymore. This is going to save me a ton of time." The "engineer" looks at HDFS and goes, "Hmmm. This thing seems interesting, but this feature over here must be an incredibly painful one to use despite the fact that it seems super useful and is plastered all over their docs as being awesome. Because there's no free lunch in this problem space. So what possible methods are there to have implemented this kind of thing and how exactly can I test and exploit just how weak these floorboards are before I decide to start building on it?"

Again, not a very useful distinction. Agreed.


I've literally seen the opposite when interviewing as a distributed computing person: most people ask for a distributed engineer and then want someone to stitch together Hadoop and Spark or explain how to effectively leverage HDFS. When you start talking about concurrency issues, Lamport clocks, consensus algorithms, etc., their eyes glaze over and they ask how to efficiently rotate an array. Clearly something is broken, but I'm not sure it is how we use language.


This a thousand times. For many firms, the roles of "data scientist," "data engineer," "distributed system engineer," and "platform engineer" are all fully synonymous, and all of them really mean "Hadoop* babysitter with a dash of full-stack whenever we arbitrarily feel like asking you to do other stuff too." It's depressing.

* or substitute whatever other enterprise framework you want


> all of them really mean "Hadoop* babysitter with a dash of full-stack..."

...because that's all that most businesses require, 99% of the time. To, you know, get things done and make money and stuff.

Which may not fit your needs, but why be "depressed" about it? It's just the way things are in the commercial world.

If you want someone with more fine-grained skills, try articulating that in your job postings. What we see, all the time, are ads mentioning platform X, with no articulation whatsoever as to where, even on some approximate logarithmic scale, they'd like the skill level and comfort with platform X to be.


That's rarely why these enterprise frameworks are bought or operated. More often it's for various permutations of the "no one got fired for buying IBM" excuse. Big showy re-orgs around Hadoop are mostly for status effects, hardly ever related to engineering realities.

And in the few firms that actually do have real engineering trade-offs that favor the use of those types of frameworks, they tend to hire people who are well-suited for the role, and then create job functions surrounding them that are respectful of aptitudes and skills of the people they hire.

In most firms that adopt these frameworks (for status effects), they are just desperate to fill seats and increase engineering headcount. They don't respect your skill set or even care if it matches the business need. They just need to get you in the door, and then find a way to deal with inevitable dissatisfaction later.


You didn't get picked for the special snowflake job that you trained for, and are now made to do grunt work instead?

Welcome to the real world.


Do you think a decent full stack engineer would have a problem picking up "proper" data science on the job? Both are fairly intelligent jobs with a fair bit of overlap.


What most firms mean by "data science" is basically just using tools for data cleaning, visualization, and using APIs like black boxes for a few different kinds of models. You are often not even allowed to take a software design approach to these tasks, and often you're just writing scripts in a hurry to address "business intelligence" fire drills.

For those positions, I am sure that smart full-stack developers could easily pick up the statistics for data cleaning and quickly gain a passable understanding of the models consumed from APIs in a black box way. In fact, full-stack devs may be happier in these jobs due to the visualization and database components.

A much smaller subset of data science is actually focused on solving novel business problems and may centrally focus on deeper knowledge of a given technique, like MCMC methods, deep learning, real-time classifier systems, etc. For these, you do tend to need more significant experience with the specific machine learning tools being used (or enough general skill in statistics to pick them up quickly). Smart people of all stripes could still learn that stuff, but it's a lot harder to see them being able to convince a firm to hire them in that capacity.

The second type of these jobs is really, really rare though.


Clearly this is a failure of LinkedIn's ML team to properly align people with usefully overlapping needs. The dark patterns have failed us both. :-)


> take a bunch of large, mostly-complete logical components that somebody else engineered and then use them to stitch together a solution by integrating these existing frameworks that already provide the first 80+% of the technical solution to carry the last ~20% toward a domain-specific use-case

This seems like a strange objection, considering that a) getting technology to fit the use case is all virtually everybody wants, and b) having to do 20% of the solution from scratch--rather than, say, the last 0.10%--would be an _enormous_ undertaking. Or don't you consider the silicon, microcode, network, servers, physical protocol, wire protocol, operating system, standards, tools, language, and compiler in that equation? If not then where do you draw the line?


I don't think the skills/abilities you're looking for would be denoted merely by looking for an "engineer" of some kind. In fact, I think looking for an "engineer" likely obfuscates your actual needs since it has become such a catchall term in computing fields.


Completely agreed.


> I was looking for someone who could build something as good or better than HDFS from nothing if they had to.

Well, for 90 percent of job offers, what a company claims to need is not what it actually needs (i.e. the typical "10 years experience with 5 year old technology" bullshit). If your company is not a very unique snowflake or in an academic setting, believing that it makes zero economic sense to completely reinvent the wheel from first principles is a valid assumption for applicants to make.


Your characterization of the typical job marketplace in tech jibes pretty closely with my experience. I've often said, in public speaking forums no less, that to me, "The overwhelming majority of the 'Big Data' marketplace seems predicated on selling to the hubris that software engineers want to believe they have problems bigger than they actually are." Despite being a denizen of the whole "NoSQL" thing personally.

That said, in my particular case it had a lot less to do with trying to necessarily rebuild HDFS from nothing, and more to do with a mindset of rigor and principles necessary to do so. Because being able to work all the way through that problem domain in both broad strokes and in meticulous detail would hopefully lend itself toward also considering ways to validate and attest the correctness of not only things like HDFS (rather than treating it like a solved problem ready built for use), but also applying that same level of rigor and principle to the stuff we actually do have to build from scratch.

Though to your point... a non-trivial amount of this concern and necessity is born out of the market and regulatory regimes this stuff has to service and abide. The fact that it's not necessary for huge swaths of the marketplace is evidenced by the fact that things like property-based testing, mutation testing, chaos testing, and formal verification are fringe skills (at best) out there... yet the tech world continues to turn out totally awesome cool new stuff, with none of that overhead, all the time, and that stuff still transforms all manner of life.

I actually think that "developers" and "engineers" are mostly pretty transparent about what sort they are. Or at the very least it's trivially easy to assess within just a few minutes with the right questions and conversation space. The harder part is getting non-technical people to understand that there's a distinction and that technical people aren't all just a fungible commodity. The weirdest part is that they get that on some level, especially when suddenly they're hit by a bus-factor problem, but that realization hasn't seemed to make a big impact in business/hiring process, practices, etc.


I upvoted you for the quality of your prose. Congrats!


Always a solid second stand-in for the veracity of my claims. ;-)


> A lot of what passes for "engineering" today, at least by marketplace label, tends to resemble the former rather than the latter.

We can keep reclassifying what it takes to earn that label until we've eliminated all but the geniuses of the software world. The titles (and seniority) are, frankly, useless because they aren't legally enforced because no one has a good and popular way to test for competency. If they did, that would be the technical interview and then market forces could once again weed out people who don't make it.

There really isn't any pushback when you have an opinion on what a software engineer ought to be when you're hiring, so naturally, a lot of people have their own opinion. Figuring out which direction to go if you are one of those supposed engineers is pretty much a crapshoot, but still better than just not learning anything new.


> What I was looking for wasn't somebody who knew how to use something like HDFS. I was looking for someone who could build something as good or better than HDFS from nothing if they had to.

Did you try articulating that distinction in your job ads?

Or if not, can you really blame people for applying when your ads read like 99% of help wanted ads in the fin-de-boom era; like, you know:

We've got the coolest office in the Mission, with a climbing wall and jamming room, we do beer bongs every Thursday and play lawn bowling together on weekends! And of course you can bring your dog in every day, too! Keywords: Node, Python, D3, Spark, Hadoop, HDFS


I did. To the point that I, after lobbying for the permission to do so, removed every single mention of any technology or stack-a-mabob buzzword and phrase from every single open listing my team had out.

There was some consternation over it because the recruiting staff didn't know how to go find people based on these new buzzword-free criteria, so I helped them identify where to look, and persuaded them not to be the ones to contact candidates, and to let that be the responsibility of one of the people already on my team.

It eventually worked.

Also, be fully aware I'm not holding the discrepancy against the applicants. It's not their fault. They're getting signals from the marketplace that they should call themselves a "Distributed Systems Engineer". I'm holding this problem against the institutional forces that are giving these people the signals to describe themselves this way to begin with. Because it makes it much harder to find one when you actually need one.


Do you really engineer something like HDFS just like that? Or does it take a lot of time, several people, many failed attempts,...?



