But a computer science university program is not about scikit-learn or TensorFlow! It's about long-lasting principles, underlying mathematics, mental models and ways of thinking.
None of my computer science lectures were about how to apply that particular part of CS knowledge in some hot new Python library. It's expected that there will be some amount of time required to adjust to a company's software setup. That's not a big hurdle usually.
I'm not saying it should be only theory, though. University courses often have accompanying assignments or projects. Depending on the country in question, they often offer more hands-on, practical courses ("lab courses") as well, where you actually go through the steps of making the theory work in real life. I had such courses where we played with microcontrollers and FPGAs to understand CPU instructions, assembly and low-level C concepts. But even there the goal wasn't to learn exactly the thing that you will use on the job: most CS graduates will never need to program FPGAs in their day job.
But sure, there is a place for even more data engineering training, but I don't think it's computer science university programs. Where do people like network engineers learn how to configure Cisco routers and use whatever config software they use? Where do sysadmins learn Bash, Unix, backup management etc? Not at university courses. Wherever they learn those skills, that's where data cleaning, parallelization engineering etc aspects of machine learning should be taught as well.
It sounds really silly, but some of the best instruction I gave was “Tab” to autocomplete a command and “Up Arrow” to re-run the last command. Whenever I would do a class demo on the projector someone would always stop me to ask how I was running commands so quickly and fluidly and how on Earth I could remember them all.
Does CS require uniquely high intelligence / problem solving ability compared to other fields? I don't think so. Then what do those 2% have that others don't? Some have suggested frustration tolerance; we could perhaps add attention to detail. But these are also needed in engineering. Perhaps the abstract, intangible nature of CS makes it harder?
You might be misunderstanding the focus on data cleaning and feature engineering as being less specialized than say PyTorch coding but it’s exactly the opposite.
The most critical aspects of ML engineering for production are all about advanced statistics: understanding multicollinearity, overfitting, dimensionality reduction, convergence, and time series issues like assumptions of stationarity or conditional independence effects.
Any engineer can crank out neural network software - that has pretty much zero value.
Value lies in realizing some stratification error in the data and following that lead to use a multi-level model to control for it. Value lies in realizing several key feature inputs are correlated on a seasonal basis - leading to multicollinearity - and then setting up some adaptive feature aggregation to mitigate it and dashboards with things like variance inflation factor to be able to raise alerts on it across time.
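To make the variance inflation factor mentioned above concrete, here is a minimal sketch in Python with NumPy. It uses the textbook definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. The data is synthetic and every name in it (`variance_inflation_factors`, `season`, `x1`, `x2`, `x3`) is made up for illustration; it is not the setup described in the comment.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic example (an assumption, not real data): two features share a
# seasonal component, a third is independent noise.
rng = np.random.default_rng(0)
season = np.sin(np.linspace(0, 8 * np.pi, 400))
x1 = season + 0.1 * rng.normal(size=400)
x2 = season + 0.1 * rng.normal(size=400)
x3 = rng.normal(size=400)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)  # the two seasonal features should stand out from the third
```

A common rule of thumb treats a VIF above roughly 5 or 10 as a warning sign, but the threshold to alert on is a judgment call for the data at hand.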
Value lies in working on small data problems and using literature review to determine the best prior to use for a Bayesian model, and doing robust posterior predictive checks to validate it.
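As a rough illustration of what a prior plus a posterior predictive check can look like on a small data set, here is a sketch using a conjugate Beta-Binomial model in NumPy. The counts, the Beta(2, 6) prior (standing in for one you might take from the literature) and the choice of test statistic are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small data set: successes out of trials in 8 groups.
trials = np.array([12, 9, 15, 10, 11, 14, 8, 13])
successes = np.array([3, 2, 5, 2, 4, 3, 1, 4])

# Assumed informative prior on the shared success rate: Beta(2, 6), mean 0.25.
prior_a, prior_b = 2.0, 6.0

# Conjugate update: posterior is Beta(a + successes, b + failures).
post_a = prior_a + successes.sum()
post_b = prior_b + (trials - successes).sum()

# Posterior predictive check: draw rates from the posterior, simulate
# replicated data sets, and compare a test statistic with the observed one.
# Here the statistic is the spread of per-group success proportions, which
# would reveal over-dispersion that a single shared rate cannot explain.
draws = rng.beta(post_a, post_b, size=4000)
rep = rng.binomial(trials, draws[:, None])   # shape (4000, 8)
rep_stat = (rep / trials).std(axis=1)
obs_stat = (successes / trials).std()

# Fraction of replicated data sets at least as spread out as the observed one.
ppp = (rep_stat >= obs_stat).mean()
print(f"posterior: Beta({post_a:.0f}, {post_b:.0f}), predictive p-value: {ppp:.2f}")
```

A predictive p-value very close to 0 or 1 would suggest the model is missing something about the data; values in between are not proof of a good model, only an absence of evidence against it on this particular statistic.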
These things require many years of education and experience dealing with statistical irregularities, understanding confounders and causal inference, understanding missing data treatments, understanding time series forecasting.
You cannot learn that in 101 courses that overly focus on the mechanics of how to type TensorFlow or sklearn code - that part can be picked up by anyone in a month or two. And a mere intro to data cleaning and plotting distributions or proportions of missing data is not a substitute for actual statistical knowledge.
When we complain about universities not preparing students better for jobs, what we really mean is that universities are not doing the bare minimum that they should be doing - in case of CS, students should at least know how to program well, and be well versed in the practicalities of computing. That does not exclude learning the fundamentals (which is often denigrated as "theory").
It is just that students often have neither the theory nor the practice, and at a minimum, we're asking, they should know the practice so they can at least be useful in their jobs.
My big picture point is that the complaint is really general and isn't specific to machine learning (ML is more of a click magnet here). The same could be said about other parts of CS and about the general computer-handling skills of CS graduates.
My university did not teach source control, or the basics of good programming practices.
There were plenty of practical courses, with plenty of programming assignments among them, but the only thing that you were evaluated on was whether or not the resulting code worked.
In fact, a proper theoretical foundation makes this really easy. Graph theory and algebra will have taught them about DAGs and partial order, which is what git branches are. A crypto class will have taught them about hashes and signatures. Distributed systems class will have taught them about issues with synchronisation. With all that background it doesn't matter whether it's git or whatever system will be en vogue in 10 years.
Imagine a student having learnt CVS 20 years ago at university. Completely useless knowledge today. But the same student with the above fundamentals will pick up git in no time. That's what universities are for.
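The point about DAGs, partial orders and hashes can be sketched in a few lines of Python. This is a toy model, not Git's actual object format: the `Repo` class, `commit_id` function and the commit messages are invented for illustration. Each commit is identified by a hash of its content and its parents' ids, and "is an ancestor of" is the partial order on the resulting graph.

```python
import hashlib

def commit_id(message, parents):
    """Content-address a commit: hash its message together with its parent ids."""
    payload = "\n".join(parents) + "\n\n" + message
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

class Repo:
    def __init__(self):
        self.parents = {}  # commit id -> tuple of parent ids

    def commit(self, message, *parents):
        cid = commit_id(message, parents)
        self.parents[cid] = tuple(parents)
        return cid

    def is_ancestor(self, a, b):
        """True if a is reachable from b via parent links (a <= b in the partial order)."""
        stack, seen = [b], set()
        while stack:
            c = stack.pop()
            if c == a:
                return True
            if c in seen:
                continue
            seen.add(c)
            stack.extend(self.parents[c])
        return False

repo = Repo()
root = repo.commit("initial")
feature = repo.commit("add feature", root)   # one branch
fix = repo.commit("fix bug", root)           # a second, diverging branch
merge = repo.commit("merge feature into fix", fix, feature)

print(repo.is_ancestor(root, merge))    # root precedes everything
print(repo.is_ancestor(feature, fix))   # diverged branches are incomparable
```

Because a commit's id depends on its parents' ids, a commit can only refer to commits that already exist, so the parent links cannot form a cycle, and two diverged branches are simply two elements that are incomparable in the order until a merge commit sits above both.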
If the university expects to produce graduate students who use computers to solve research problems, teaching them how to code well helps.
> A reasonably smart student picks up how to work git in a few evenings with some online tutorial and some open source project. Or just when doing their homework.
The same can be said for writing comprehensible English, yet you will lose marks on your essays for poor writing.
Those courses typically require you to submit essays, which will be evaluated for both their form, and their function.
Given the generally poor grasp of civics, ethics, and ability to relate to other cultures that I have seen, if the job of a university is the creation of educated people (as opposed to vocational training), more liberal arts education can't go that amiss.
You're also missing the main thrust of my argument. My university offered CS majors a number of practical, vocational courses, and required us to take them, with many programming assignments. But it never actually took the time to train us in, or grade us on, the quality of our solutions to those assignments. This is not how real engineers, chemists, physicists, or folks studying bioscience get trained.
I see. That's rather a peculiarity of the American system. In Europe, higher ed is more specialized, and broad, general liberal arts education ends at the high school level.
> But it never actually took the time to train, or grade us on the quality of our solutions to those assignments.
I've worked as a teaching assistant, and it's often due to a lack of staffing and time. TAs are most of the time also PhD students; they have projects to work on and papers to write, and giving detailed feedback on assignments is infeasible beyond a certain number of submissions. The same goes for exams: they are optimized to be easy to grade unambiguously in the shortest time possible. There's a conflict between two roles of the university: research and teaching. Research is more important for one's advancement, so teaching gets neglected.
The number of times I have to walk through why linear memory access matters, how caches and branch predictors work is staggeringly high. In every single case they all knew the theory but never made the connection to how it applied to the task at hand.
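For readers who have only met this in theory, a small experiment shows the connection. The sketch below, in Python with NumPy, sums the same array once along contiguous rows and once along strided columns. The array size and function names are arbitrary choices for the example, and the timings it prints depend heavily on hardware and on Python overhead, so treat it as something to run and measure rather than as a benchmark.

```python
import time
import numpy as np

def sum_by_rows(a):
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i, :].sum()   # contiguous slice in a C-ordered (row-major) array
    return total

def sum_by_columns(a):
    total = 0.0
    for j in range(a.shape[1]):
        total += a[:, j].sum()   # strided slice: consecutive elements are a full row apart
    return total

a = np.ones((2000, 2000), dtype=np.float64)  # NumPy uses C order by default

t0 = time.perf_counter()
r = sum_by_rows(a)
t_rows = time.perf_counter() - t0

t0 = time.perf_counter()
c = sum_by_columns(a)
t_cols = time.perf_counter() - t0

print(f"rows: {t_rows:.4f}s  columns: {t_cols:.4f}s  same result: {r == c}")
```

Both functions do the same arithmetic and return the same total; only the order in which memory is touched differs. On typical hardware the column version tends to be slower because each access lands on a different cache line, but the size of the gap varies from machine to machine.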
This almost feels like an instance of Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
IMHO they should (or, more accurately, they should have both 'true and proper' computer science and software engineering - but the expectation is that the latter will be the thing most in demand), but there's no consensus about that, people have different opinions.
At the end of the day you will end up with people who have excellent theoretical knowledge but no good practical skills.
Are students cheating? Is curriculum group based? Is the content not hard enough?
If people need to do coding interviews, I see no reason why something similar can't be done in college as a checkpoint at the 200 and 300 level.
Programming/logic is easy if you understand. It doesn't need to be directly tested often.
Those people you say “can’t code” actually can code very well - it’s just that the question “can you pass this timed hazing trivia test in coderpad or on a whiteboard?” has no relationship to “can you code?”
I got turned down from Codementor's full-time platform because my "tic tac toe" in React wasn't complete. I didn't handle the diagonal case (which I was explicitly aware of), nor did it have unit tests.
Did I mention I had to do it from scratch, live with the interviewer, within 1 hour? That's 1 hour to plan and implement from a blank slate, starting from "hey, write me tic tac toe with React".
Never mind that it was functional, had a hook for me to handle the diagonal case if I had time and, aside from that, worked!
Never mind that I had completed a few projects on their platform already and have a great rating. That I have experience working in a few startups in Silicon Valley under my belt. Never mind that I have open source contributions and was a key speaker at a JS conference.
I can't write a complete tic tac toe from scratch WITH unit tests from a cold start and a blank slate in less than an hour.
Sometimes I feel like it's becoming a race to the bottom. As CTO of a startup right now, I spent days deliberating and planning out the db schema we use to optimize for our main use cases while allowing some deeper queries for analytics. I write stored procedures and convert business rules into working software that scales. Under our last load test, 90% of the reqs were served in less than 250 ms. I do code reviews in JS, Elixir and SQL on the regular, and have trained other engineers on obscure SQL like nested joins and unnest()-ing JSON arrays for analytics.
Yet somehow my value can be arbitrarily broken down to "he's not good enough because he can't scope, plan and implement a full app with unit tests within an hour".
It will show you if someone knows basic programming.
It’s truly, truly nothing at all similar to the actual activity of “basic programming.” Highly skilled people will get flustered, forget basic facts, put their foot in their mouth, none of which has anything to do with whether they are skilled or unskilled at coding.
I was going to make sure they understood loops and maybe classes/methods.
You can ask them data questions without needing to code on the spot.
Which one of us is terrible at coming up with relevant interview questions? The guy who considers tree manipulation basic or the one who considers loops basic?
Meanwhile asking about data structures and algorithms is an extremely ubiquitous industry standard for interviewing new grads and entry level programmers up through experienced veterans - and is practiced by nearly every tech company and tech recruiting agency under the sun.
Your expression of surprise about this suggests you are not very aware of the tech industry or software hiring practices.
But regardless, all the same issues would apply to asking about loops or classes that apply to asking about linked lists or sorting. In professional work, none of that is ever addressed in a time crunch with someone actively surveilling you and assessing intermediate explanations - so concluding someone is bad at coding from a poor interview performance is just flat wrong.
Both of my programming interviews had some tech questions but no white board coding. That said neither were exactly entry level.
I believe that has since changed, but I am not sure to what extent.
You wouldn't hire a metallurgist as a welder, so you shouldn't be hiring a computer scientist as a programmer.
Programming is a tool and they should not let anyone graduate with a CS degree if they don't know how to use that tool.
Heck it almost seems like Electrical Engineers in your fantasy would make better programmers.
CS isn't programming, software engineering, or software development. There are some places that offer degrees in the last two, and some places where you can focus on CS degree on those more practical aspects, but CS by nature is more theoretical and abstract, bearing roughly the relationship to those other things that Physics has to Mechanical Engineering or Automotive Maintenance.
Of course, you don't have places looking at Physics degrees as entry stakes for auto mechanics, but if they started doing that the problem wouldn't be with the Physics programs.
I once submitted code that did not compile as I ran out of time. I got 100% on that assignment.
Whether you get a good grade on the programming portions is almost random.
It has been many, many years since I was in school. I think it’s fine that your computer science education focuses on fundamental CS concepts and the mathematics so you can easily pick up areas that require that math (ML, for example). I do think universities can do better. At my school, we had mandatory “block” classes in arts and humanities, which in my opinion, offered no value.
To be clear, I’m not saying these subjects aren’t valuable. I am saying, however, that the quality of these courses was very poor, and they could have been substituted entirely with classes related more to my discipline.
I remember sitting in some political science classes that were part of this required pool. I have no idea what we learned in there. As far as I can remember, we read political science papers that were very poorly written/extremely inaccessible. It was impossible to differentiate the author's personal opinion from objective truth of any kind. It was more or less a checkbox - I had to have so many credit hours from this pool in their curriculum to graduate. Did it force me to think critically in some manner? Not at all. It actually gave me a false sense of how “intelligent people” write.
Yes - all these years later I understand that it wasn’t me who just couldn’t understand those papers - it was rambling, pretentious nonsense. It was the opposite of effective communication, and it provided no value.
Is there any "entrance point" where people don't complain about this? Companies complain universities don't prepare students. Universities complain that first-year students come out of high school without necessary background. High schools complain that elementary school does not prepare their entrants etc. Elementary schools complain kindergarten doesn't prepare kids to be mature enough. Kindergarten probably complains that parents don't prepare the kids enough.
Broadly speaking, I think there is more demand for highly skilled, highly intelligent people than can be produced by any given cross-section of the population born in a given year. Sure, universities could do better, but beyond a certain point teaching doesn't work. There are people who are intrinsically motivated and soak up knowledge and seek it out in books and online (so much high quality content can be found online, especially for CS!), and there are those who just coast and do the bare minimum. I don't think you can radically improve the outcomes by changing the curriculum.
"Based on the current state of machine learning courses it is clear that AI courses will get you through the door in your effort to perform cutting edge research or landing a machine learning job, but they won’t teach you everything you need to know. To fill in the knowledge gaps that remain you will have to put in outside effort on your own. "
I guess the question is whether the outside effort needs to be addressed by universities, or by other resources.
The thing is, data cleaning is no less fundamental than backpropagation. Maybe more so - learning algorithms come and go, but real-world data is always going to be inherently messy. The difference is in that we have a beautiful mathematical theory for backpropagation but not for data cleaning. So the courses that teach the former but not the latter are akin to the proverbial drunkard that searches for the lost keys under the street light - beautiful mathematical theories are easier to lecture on so they teach them instead of messier (but not less fundamental or useful) topics such as data cleaning.
Just as universities offer study programs and degrees in software engineering, there are also programs and degrees for network engineering, which would include not only the theoretical basis of networking but university courses for applied networking where they would learn all that you describe and much more; a university teaches a network engineer to configure routers and manage backups just as they teach a first year electronics engineer to solder stuff. Sure, a generic computer science or software engineering program will not include these courses, that's usually a separate specialization, but universities definitely do offer engineering programs.
I had C#, .NET Core, Docker, MongoDb, MSSQL, Postgres, GraphQL, OData, Neo4j, Redis, WebAssembly (Blazor), React, Vuejs and stuff like Git.
That was covered in "Web apps", "Databases", "Non relational databases" and, along the way, some bigger/smaller programming projects.
Public school, studying at weekends.