What you've really missed is things like best practices, design patterns and concepts like SOLID, but a lot of people with CS degrees missed some of those as well.
If the book covers this, excellent, but why wouldn't it sell itself on valid points?
A CS degree can prevent you from making a lot of obvious (if you have a CS degree) and costly mistakes. It sort of gives you a crystal ball. You can see that some code isn't going to work when a db table gets to 100,000 records, or that some code is making the wrong space/time tradeoff, or that some code is using the wrong data structure from the standard library.
When your performance monitoring tool is telling you something is slow or leaking memory you have the foundational knowledge to understand why and fix it, rather than spending money on 10 more dynos or whatever.
Computers are so fast and cheap today that a lot of this doesn't matter most of the time. The naive solution does just fine. But when the core dump hits the fan you'd better have one or two CS grads on staff.
I couldn't disagree more. I do not have a CS degree and have led many teams of folks, some with CS degrees and some without. It's a huge mixed bag and I'm not confident you can make a general statement in either direction.
Yes CS can prepare you by knowing some of the basics but I've run into countless people with CS degrees who don't understand how much of anything works. I've also run into many without CS degrees who understand how the damn storage implementation in Postgres works.
Anecdotally, to me with my small bag of data points from the teams I've led and the people I've interviewed, a CS degree is only what you make of it. If you were a good student, studied and understood the content then you have an edge. If you were an okay student who just memorized things for tests and never actually applied the knowledge then you're in no better state than someone without a CS degree (perhaps even in a worse state, as most of the people I know, including myself, were told by multiple leads that without a CS degree everything is going to be a struggle and that to advance your career you must get one, so of course I had to work even harder to prove them wrong).
> A CS degree can prevent you from making a lot of obvious (if you have a CS degree) and costly mistakes.
I think most people can agree that it doesn't provide any guarantee, but it definitely gives you a boost in the right direction.
(For the record, I don't have a CS degree)
I once had an intern who was a CS master's degree student, and while he was tackling neural networks in school, I showed him how to link to a DLL in C++ three times and he still couldn't figure it out on his own. It also shows you that having a CS degree doesn't mean anything.
I think CS though will tell you how it works.
That is just wrong. A degree is just a piece of paper. How is a piece of paper going to prevent you from making mistakes? It won't.
What you meant to say was this:
> Any intelligent person with reasonable CS knowledge will be able to work without making a lot of obvious and costly mistakes
I hope I've taught you something. :)
Either you know or you don't. A CS degree just means you learned it in a more standardized/formalized setting and have some proof to show you did it, but ultimately it's the knowledge itself that makes the difference, not how you gained it.
I have a CS degree and have worked with brilliant engineers that were HS drop-outs. It has everything to do with a passion for learning. It still requires the time focused on the study of CS, but the setting is secondary.
Anecdote: I have peers all over the CS knowledge spectrum while taking the exact same class from the same instructors. Some peers have taken that knowledge and written kernels. Others are struggling to write an array sorting method. The former have been served well by their undergrad studies. The latter would have likely fared better in a 12 week bootcamp where it's more training and less theory.
But so does writing software outside of college for 4 years.
Which one is better? That's an empirical question.
I also think the same holds true for self taught programmers. I am self taught. Early in my career (decades ago) I was using Perl to process some large text files. I was building a string of relevant information like $x = $x . "some value".
So this was wrong on so many fronts. After 25 hours of running I figured something was wrong. Okay, so I'm a slow learner...
I preallocated the string and the program ran in less than 20 minutes. Now of course a string was an inappropriate data type as well. I learned a lot at that point and started thinking about internal representations of data structures and other concepts.
This. Many years ago, our code was shitting the bed, a month before a major milestone deadline. Turns out that someone wrote an N^2 algorithm and only tested with N=5.
I don't have a CS degree--just a few semesters of combinatorics and graph theory. When I was programming, I always felt that was a huge liability. I'd confront a problem, and I knew just enough to know it could probably be reduced to some graph problem and solved using a known algorithm, but I didn't know what that was.
Indeed, yet since it's emphasized so much in college, people prioritize this over cleanliness and architectural elegance, which matter much more.
We would often spend class time writing cleaner, simpler code after arriving at a correct answer.
The CS grad usually understands the whole stack from UI through CPU, I/O, and memory. They don't get the distant stare when they see code with a red-black tree or a graph algorithm. They may not know about skip lists and bloom filters, but they can figure it out quickly. They understand reference vs. object equality. They understand multi-threading and concurrency strategies. They understand how to implement a hash so that there are few map collisions.
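To make "reference vs. object equality" concrete, here is a minimal Python sketch. The `Point` class is a hypothetical example type, not something from the thread; it only shows that two objects can be equal by value while being distinct objects, and that equal objects need equal hashes to behave sensibly in sets and dict keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical value type; a frozen dataclass gets __eq__ and __hash__."""
    x: int
    y: int


a = Point(1, 2)
b = Point(1, 2)

# Object (value) equality: the fields match.
values_equal = a == b

# Reference equality: these are two distinct objects in memory.
same_object = a is b

# Equal objects hash equally, so they collapse into a single set entry.
unique_points = {a, b}
```

Here `values_equal` is true while `same_object` is false, which is the distinction that tends to bite when a custom class is used as a dictionary key.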
That said, a lot of IT work, web development work, database work, API work, etc. doesn't require all of that. A lot of my work does, but if it doesn't, I hire based on passion, productivity, resourcefulness, and craftsmanship.
As an analogy, many small businesses are successfully run by self-taught entrepreneurs. But, running a $100M company requires different knowledge.
Of course I may be environmentally damaged from never having worked with someone who was self taught.
The things that they have missed by not taking a Computer Science degree are precisely those things that don't tend to come up, such as big-O and the behaviour of various sorting algorithms.
Anyway, here are two paragraphs on the linked page that you might like to see:
"More than just theory, this book covers many practical areas of the industry as well, such as: Database design, SOLID, How a compiler works, sorting and searching algorithms, Big-O notation, Lambda Calculus, TDD and BDD."
"One of the more subjective parts of the book, but I was asked by many people to write about these things. Specifically: SOLID, structural design, TDD, BDD, and design patterns."
How do you know you are doing stupid shit if you don't know about complexity and don't know your algorithms?
Really, I do work on CRUD applications from time to time, and I often have to select algorithms based on complexity. Yeah, didn't have to implement one of them for ages¹, but I do have to tell coworkers things like "here you use a set", "here you use a list", "this sorting algorithm isn't stable", or "nah, just use brute force and get done" once in a while.
1 - Or, better, did implement a B-tree for a side project just a couple of months ago. Ended up just throwing it away, but I didn't know I wouldn't use it at the beginning.
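As an illustration of the "this sorting algorithm isn't stable" remark, here is a small sketch of what stability buys you. The employee records are made up for the example. Python's built-in `sorted` is documented as stable, so sorting by a secondary key first and a primary key second gives a correct two-level ordering.

```python
# Hypothetical records: (name, department).
employees = [
    ("Dana", "ops"),
    ("Alex", "dev"),
    ("Casey", "ops"),
    ("Blake", "dev"),
]

# Sort by the secondary key first...
by_name = sorted(employees, key=lambda e: e[0])

# ...then by the primary key. Because the sort is stable, records with the
# same department keep the name order established by the first pass.
by_dept_then_name = sorted(by_name, key=lambda e: e[1])
```

With an unstable sort the second pass would be free to scramble the names within each department, which is exactly the kind of thing worth telling a coworker about.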
I do know that some algorithms are more efficient than others, and some sorts depend on whether you expect the list to be random, sorted or partially sorted. If I find myself in a situation needing to choose between libraries, I would guess that half an hour's reading would refresh what I need to know.
My CS degree (which I got late in my career) was invaluable. I got it precisely because I had no idea what I was missing and if I hadn't I wouldn't have investigated half the options I did and would not have had a good framework for thinking about them. I would have gotten the job done without it, but it wouldn't have been as good a job and I wouldn't have known what I was missing.
So, one cannot understand complexity without Big O calculus? I only learned the notation after years of calculating memory and runtime complexities for real-time code, and that was only for interviews. I don't find comparisons at the Big O level to be useful in day-to-day work.
> Really, I do work on CRUD applications from time to time, and I often have to select algorithms based on complexity.
Really, you often have to select algorithms at that level of granularity? I never have, in almost ten years of full-time development. I frequently have to select or tune algorithms based on the factors that Big O throws out, though. Then again, I do very little with searching and sorting.
So, in a way, the time and space complexity of each operation is part of the interface: maybe you don't really need to know how the internals are implemented, but at the very least you need to know the time and space complexity of the data structures you're using, and how to analyse the complexity of your code.
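A small sketch of "complexity as part of the interface", using two standard-library containers. Both functions below produce the same result; the difference is only in the documented cost of the operation used. The function names are made up for the example.

```python
from collections import deque


def drain_list(items):
    """Consume a queue stored in a list, front first."""
    queue = list(items)
    out = []
    while queue:
        # list.pop(0) shifts every remaining element: O(n) per call.
        out.append(queue.pop(0))
    return out


def drain_deque(items):
    """Consume a queue stored in a deque, front first."""
    queue = deque(items)
    out = []
    while queue:
        # deque.popleft() is O(1) per call.
        out.append(queue.popleft())
    return out
```

Nothing about the internals of `list` or `deque` needs to be known here, only the cost of `pop(0)` versus `popleft()`; draining n items is quadratic in the first version and linear in the second.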
Look, I'm not advocating illiterate programming, nor am I a proponent of unprofessional work. I took my share of CS classes, and I can talk about algorithms and how to analyze them. The number of times I have had to do so on the job, however, can be counted on my bodily digits, without taking off my shoes.
This is probably not an original observation, but a big part of why these unproductive conversations keep on happening here has to do with how young the HN demographic skews. When the biggest achievement in your life is getting that new degree you spent $BIGNUMBER $$/years on, you're going to want to belittle workmanlike programmers who don't measure up to your own standards. It takes a while for people to move on and grow up.
Depends very much on your projects. Web developers generally have it easy, as it's cheap to add more servers and free to burn CPU in the browser. Even so you can get into trouble with anything that's O(N) in the number of users. I do wonder how much use of nosql is from people who think SQL is slow because they've not got their indexes set up properly.
Game developers, of course, live and breathe performance. As do embedded and similar low-level environments. Or people working with this trendy "big" data.
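On the earlier point about indexes that are not set up properly, here is a hedged sketch using SQLite through Python's standard `sqlite3` module. The `users` table, the `email` column and the index name are invented for the example, and `EXPLAIN QUERY PLAN` is SQLite-specific; other databases have their own equivalents. The exact wording of the plan text varies between SQLite versions, so the code only stores it rather than assuming a particular string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT id FROM users WHERE email = ?"
args = ("user500@example.com",)

# Without an index on email, SQLite has to scan the whole table.
plan_without_index = query_plan(conn, lookup, args)

# With an index, the same query becomes an index search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = query_plan(conn, lookup, args)
```

Printing the two plans before blaming SQL for being slow is usually a cheaper first step than switching data stores.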
We recently replaced a system whose online store took about 900ms to respond, and whose back end admin section took around 1.7 seconds on average. It obviously wasn't deal breaking since they lived with it for 3 years, but it was still ridiculous, and definitely had an impact on both their productivity and bottom line.
Also, if you used a list in Python (or other similar dynamic languages) swapping a list out for a set is a trivial operation...
    receipts = []
    receipts = set()
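A slightly fuller sketch of that swap, with made-up receipt IDs. The membership test reads the same in both versions; the only code change is `append` becoming `add`.

```python
# Hypothetical data with one duplicate.
receipt_ids = ["r-001", "r-002", "r-003", "r-002"]

# List version: each `in` check scans the list element by element.
seen_list = []
for rid in receipt_ids:
    if rid not in seen_list:
        seen_list.append(rid)

# Set version: same logic, but lookups are hash-based.
seen_set = set()
for rid in receipt_ids:
    if rid not in seen_set:
        seen_set.add(rid)
```

The swap is only trivial when the elements are hashable and the code does not rely on insertion order or on duplicates being kept, since a set gives up both.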
If you had said "small enough", I would agree, but linear walks of data that can fit in memory can easily be noticed by users, for large enough lists. (Very large lists can fit in memory.)
But, at a higher level, yes, this is always true when people talk about performance: make sure what you're optimizing actually matters. I think a charitable reading of what ThePawnBreak said should assume one has already determined this particular operation matters for performance.
The answer turned out to be "because they're not adjacent and on this embedded system every single cache miss costs you hundreds of nanoseconds".
I strongly agree that, like woodworkers, performance improvers should measure before they take out the power tools.
That's completely untrue if you're running this check often enough.
Or they've written the query on the wrong side of the ORM so it has to pull in all the objects, check the value, and discard them.
Which may or may not be an accurate depiction (as we read personal accounts and thoughts of the commenters) of a quite marginal subset of real-life IT-professionals.
I wouldn't worry too much about what's being said or not said on HN. There are great ideas and topics to be covered here for sure, but they're sprinkled on top of a giant cake made with 1-part self-loathing, 2-parts day-dreaming, and 1-part regular huff-and-puffing.
It makes for good entertainment and procrastination.
> No idea what they mean, this books sounds great to me
That being said, not knowing Big-O while doing CS or IT work seems worrying. Sure, it's not absolutely necessary for most of the grunt work. But you should definitely have the same understanding of performance issues without knowing the fancy notation and terminology. Big-O is just a notation and a formalization of these concepts, and it helps with communication. I'd say it's still better to know it.
So, indeed the book probably doesn't hurt.
So, indeed indeed the book probably doesn't hurt ;)
Edit: Just glanced over the link in Practicality's comment about big-O and sure enough I think it may actually be useful as my ever increasing pile of data increases even further! I have to admit, as the parameters increase I find myself doing overnight calculations more and more.
Here's a pattern I've noticed with code written for processing a data file by a lot of people (python-esque, using a function (match) that's "left as an exercise for the reader" to implement):
    def search(filename, value):
        with open(filename, "r") as f:
            for line in f:
                if match(line, value):
                    return line
                # we don't care about not matching

    for v in [search1, search2, search3, ...]:
        search("data.dat", v)
The data file is read every time something is searched. If you've got an SSD, it's not really noticeable. If you've got a spinning disk, it becomes a problem. If you're hitting network storage, you're downloading that file n times. The main issue being that each read (each iteration of the inner for) hits the hard drive, network, or similar. A simple performance hack is to move the read into main, put the whole thing into one list of lines and pass that list to search instead of the filename (modifying search appropriately):
    def search(data, value):
        for line in data:
            if match(line, value):
                return line
            # we don't care about not matching

    with open("data.dat", "r") as f:
        data = f.read().splitlines()
    for v in [search1, search2, search3, ...]:
        search(data, v)
For very large files and very large search parameter lists, this will still take a long time, but it's much faster than the previous version when you're dealing with large files.
Shortest code I can think of to get the actual worst case that I've had a few coworkers pull off:
    def search(filename, value):
        with open(filename, "r") as f:
            data = f.read().splitlines()
        for line in data:
            if match(line, value):
                return line
            # we don't care about not matching

    for v in [search1, search2, search3, ...]:
        search("data.dat", v)
I'm trying to recall the structure of another case where this happened with a more complex internal algorithm. The solution was far less obvious, but required similar refactorings. In that case it was both reading the file multiple times and running a several-deep loop where one level (which was by far the longest running) could be refactored to only happen once instead of 100 or so times. We flipped some of the loops around (moved it to be the outer loop, similar to the idea of moving the loop over all lines to be the outer loop in my other example). Big-O wasn't essential (for me), because I'd internalized that sort of thinking. But that explanation was essential for my colleagues (EEs, a couple of years out of school) who hadn't been exposed to that construct before (at least not enough to stick).
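The idea of making the loop over all lines the outer loop can be sketched roughly like this. It is not the original code from either case: `match` is a placeholder substring test, `search_all` is a made-up name, and the temporary file exists only so the example runs on its own.

```python
import os
import tempfile


def match(line, value):
    """Placeholder matcher (an assumption): plain substring test."""
    return value in line


def search_all(filename, values):
    """Read the file once; the loop over lines is the outer loop."""
    found = {}
    remaining = set(values)
    with open(filename, "r") as f:
        for line in f:
            line = line.rstrip("\n")
            for v in list(remaining):
                if match(line, v):
                    found[v] = line          # keep the first matching line
                    remaining.discard(v)
            if not remaining:
                break                        # every value already matched
    return found


# Small self-contained demo with a throwaway data file.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("alpha 1\nbeta 2\ngamma 3\n")
results = search_all(tmp.name, ["beta", "gamma", "delta"])
os.remove(tmp.name)
```

The file is now read once no matter how many search values there are, and it never has to fit in memory, at the cost of checking every remaining value against each line.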
I don't see how Big O calculus helps here. If you have enough understanding to run that analysis you have enough understanding to see it is trivially a dumb idea.
> I'm trying to recall the structure of another case where this happened with a more complex internal algorithm. The solution was far less obvious, but required similar refactorings. In that case it was both reading the file multiple times and running a several-deep loop where one level (which was by far the longest running) could be refactored to only happen once instead of 100 or so times. We flipped some of the loops around (moved it to be the outer loop, similar to the idea of moving the loop over all lines to be the outer loop in my other example). Big-O wasn't essential (for me), because I'd internalized that sort of thinking. But that explanation was essential for my colleagues (EEs, a couple of years out of school) who hadn't been exposed to that construct before (at least not enough to stick).
What's funny is that I am an EE, and the way I internalized complexity analysis and optimization was to look at the number of operations being performed, along with the cost of those operations, and design the code such that it used the fewest resources. Only later did I learn this "Big O" thing and it seemed stupid because it seemed overly complex and was telling me to throw out significant factors that I spent my career worrying about. I still don't really see the value of it over more detailed methods that seem trivially easy to me, like simply deriving an approximation of the complete polynomial describing the runtime, memory usage, or what have you. I am a systems engineer and have a bias towards modeling things, though.
I don't understand your argument regarding why you would not pay much attention to what is being said on HN. Could you explain?
But also because a significant amount of even the day's top popular topics have in the end very little relevance and impact to most businesses.
Which doesn't mean it's not interesting, though.
> There are great ideas and topics to be covered here for sure, but they're sprinkled on top of a giant cake made with 1-part self-loathing, 2-parts day-dreaming, and 1-part regular huff-and-puffing.
If you have a lot of experience, you probably already understand the concepts. The notation just gives a clear way to describe them: https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-nota...,
(Why does it take 100 ms for 10,000 records and 50 minutes for 100,000? ... hmm)
Learning these concepts through the theory is definitely more efficient.
I've never asked, but I'm guessing they don't know about any of the other things you mentioned either.
There's an ocean of small/medium businesses who just need to get shit done and don't need it optimized so it can scale to serve 7 billion people every day while running on a 1997 microwave.
HN seems to overlook that market entirely. Not hip/trendy enough, I suppose. Consultants are billing $250/hr just to write CRUD apps or reports, and they're completely booked.
Just... asking. You know, for reasons.
And when I explain what I do (mostly automating clerical or data entry/retrieval processes) to friends in other fields, a lot of them have said "Wow, we could use a lot of that at our company. We do X and Y and Z over and over and it's a huge waste of time." Small companies often force highly-skilled workers to complete their own repetitive clerical tasks, and medium companies seem to hire teams of $10/hr drones. They never think "At what point is this worth automating?" or they don't know where to look for a dev who can do it for them.
I think the small/medium business CRUD app market is extremely neglected because it's not as glamorous as machine learning or whatever else all the MIT grads are doing these days.
So I google them and read about them.
The point of a book like this is that someone who knows what you don't know from being asked questions about things he didn't know by people who did know, he now knows what you don't know and can tell you what you need to know.
The only knowledge you have now is of known unknowns without knowing anything at all about the unknown unknowns.
Know what I mean?
I now have a son going through the proper steps of receiving a degree, and I have a book to supplement him for every class (except for the higher math).
The best example is taking a few options someone has for their function interfaces and reframing it in terms of type algebras which helps expose where it's overly complicated. The two best examples I have are making IO and mutations of input explicit and localized in a single location so the user doesn't have to guess which functions are changing things and which ones aren't. Once you have a good framework for describing these abstract boundaries a lot of other principles of design fall out from keeping it simple and readable.
Not saying novel use or innovative stuff requires a CS degree, merely that improved understanding of the basics of computing will likely improve the way you fully consider engineering problems. This one's definitely on my reading list for that reason (also MIT / Stanford open courses on the topic of algorithms, etc are super helpful for this kind of background)
Humans have an in built ability to think about saving the effort required to achieve any task X.
This type of book can help with the cultural roadblocks non CS degree wielding programmers may have. There are many cultural code words that CS degree holders use and being up on those can make an "impostor's" life easier.
Personally, I wouldn't want to work at a place where I felt like I needed to "fit in".
I find that the more CS fundamentals I learn, the higher the bar on my definition of "obviously stupid shit" becomes. It turns out lots of things become obviously stupid as you learn more about algorithm analysis, more about how compilers work, more about how CPU caching affects performance, etc.
Worst-case list comparison for two lists of length m and n respectively, is O(mn). For 10 million in each list, that's a bigger number than you can reasonably expect to iterate through using brute force. So brute force won't do: it is imperative that you understand how everything that touches all the data scales.
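To put numbers on that: two lists of 10 million elements each give 10^7 × 10^7 = 10^14 comparisons in the worst case. Assuming, purely for illustration, a billion comparisons per second, that is about 10^5 seconds, or roughly 28 hours. The sketch below counts comparisons on a much smaller input and shows a hash-based alternative that is O(m + n) on average; the function names are made up.

```python
def common_nested(xs, ys):
    """Brute force: up to len(xs) * len(ys) comparisons."""
    comparisons = 0
    common = []
    for x in xs:
        for y in ys:
            comparisons += 1
            if x == y:
                common.append(x)
                break
    return common, comparisons


def common_hashed(xs, ys):
    """Build a set once, then do one average-O(1) lookup per element."""
    lookup = set(ys)
    return [x for x in xs if x in lookup]


xs = list(range(0, 2000, 2))   # 1000 even numbers
ys = list(range(1, 2000, 2))   # 1000 odd numbers: nothing in common (worst case)
_, worst_case_comparisons = common_nested(xs, ys)
```

On this input the brute-force version performs exactly 1000 × 1000 comparisons, and the count grows a hundredfold every time both lists grow tenfold.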
"Not doing obviously stupid shit" means understanding the implementation of data structures and how they scale, understanding how nested loops multiply runtime, how recursion levels multiply runtime. It turns out there's actually a 1:1 correspondence between knowing big-Oh and not doing obviously stupid shit.
That's why it's one of our in-person coding questions: after the candidate has written up a solution to the problem, they need to be able to analyze what they wrote and understand how it scales to different n. If they don't have an intuitive understanding of the performance of code, the chances are they'll do obviously stupid shit.
(Personally, I learned about big-Oh long before I went to college, and I believe it's probably the single most useful thing to learn in CS, right up there with compiler theory (my previous job was a compiler engineer, so I may be biased). With best practices / design patterns etc., it's extremely hard to substitute for experience; I see juniors misapplying software design concepts on a weekly basis.)
It just seems to me that not doing stupid shit can really depend on your intuitive grasp of theoretical CS. For instance, trying to find the optimal solution to an NP-hard problem might be a really bad idea, depending on the instance. But recognizing NP-hard problems doesn't really explicitly come up that often outside of theoretical CS.
Isn't that exactly the point? Based on one's knowledge, something that is obviously stupid to you could be a completely new concept to someone without that same knowledge or experience. Describing anything as "obviously stupid shit" is exactly the kind of thing that makes new self-taught programmers afraid to engage with others. It is better to give people a break, help them learn, and give them a positive environment in which to do so. This book sure seems like a good step in that direction, whether or not its topic comes up in everyday conversation.
CS is no substitute for that, but it's still an important part of getting it. It gives you a set of tools and a lexicon for understanding what's going on, and understanding some kinds of tradeoffs. As somebody who is just trying to figure all this stuff out, this is invaluable.
Calling it a "knack" kind of downplays the years (decades) of study, work, and effort people put into refining their craft. You could also say Michael Phelps has a "knack" for swimming, but he also did nothing but work out, train, practice and compete for years.
CS is no substitute for experience, but it helps.
How about in job interviews? Obviously never on the actual job, but how about interviews? They ask all sorts of crazy crap. I suspect because they've got no idea what to ask.
At my current work, the problem isn't the algorithms making the thing run slowly; it's that the people who wrote it don't know how to use Django efficiently. Lots of loops in the application making database calls each time (our main page is making over 1000 database calls). A bit more thought in the database design, knowing a bit more about how Django's ORM works and you could do all that work in the database in less than 10 calls and make it a lot faster. A couple of extra database indexes in the right places, and a bit of caching and it should run even faster.
That's the sort of knowledge that is relevant to (most of) our jobs, not big-O.
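The "loop making a database call each time" pattern is often called the N+1 query problem. The sketch below reproduces it with Python's standard `sqlite3` module instead of Django, so the tables, the data and the `run` helper are all assumptions made for illustration. In Django itself the usual tools for this are `select_related` and `prefetch_related`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (id, name) VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts (author_id, title) VALUES (1, 'a1'), (1, 'a2'), (2, 'b1');
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, standing in for one database round trip."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the posts, then one more per post for its author.
counter["queries"] = 0
slow = [
    (title, run("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0])
    for author_id, title in run("SELECT author_id, title FROM posts ORDER BY id")
]
n_plus_one_queries = counter["queries"]

# Same result with the join done in the database: a single query.
counter["queries"] = 0
fast = run(
    "SELECT p.title, a.name FROM posts p "
    "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
)
join_queries = counter["queries"]
```

With three posts the loop issues four queries against one for the join; with a thousand rows on a page it would issue a thousand and one, which matches the kind of page described above.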
I've spent a semester doing all sorts of silly algorithms, including * sort, so it's not like I couldn't handle them if it was ever required. The thing is, time to market and code consistency will almost always leave you using the standard libraries which came with your environment.
There are obviously a few jobs where it's relevant, but I'm into things like architecture, digitizations and business development, so I'll never work one of those jobs, and neither will 95% of you.
As a person who has been programming for a long time without a CS degree, my guess to why it doesn't have "best practices, design patterns and concepts like SOLID" is because working programmers without CS degrees know that stuff already. I sure do. Things like "best practices" and "design patterns" are distillations of experience. (And I'll note that he explicitly mentions covering SOLID, so maybe double-check your complaints before posting them.)
The parts easiest to miss, though, are the most theoretical ones. I had an intuitive grasp of algorithmic complexity from my teens, but going back and learning big-O notation was helpful in talking with theory-oriented approaches. I still don't really get lambda calculus because I've never had a practical problem where learning it would help me ship things. I wouldn't mind learning it, but it's just never gotten to the top of the to-read pile. I love the idea of a book like this because it strips out the 90% of a CS curriculum that I know and gives me only the pieces that I don't.
Not related directly but from the good old days: The Ars Digita Systems Journal. http://www.eveandersson.com/arsdigita/asj/
Had to look up SOLID. Makes sense, follows the principles I've soaked up from books and a couple O-O courses. That acronym was apparently invented 20 years after I dropped out of college :-)
Just keep learning. No drama. Just improve.
So in short, the idea of SOLID is to write units that have a single, small responsibility, split large units into smaller ones, make the interface abstracted from the implementation, and ensure that it is easy to write alternate implementations of a given interface as necessary.
Those aren't OO ideas, those are just good design ideas.
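A minimal sketch of that summary, with invented names throughout: a small unit with one responsibility, an interface kept separate from its implementation, and an alternate implementation that is easy to drop in. It uses `typing.Protocol`, so no inheritance is required.

```python
from typing import Protocol


class Notifier(Protocol):
    """Hypothetical interface: anything with a send(message) method."""

    def send(self, message: str) -> None: ...


class InMemoryNotifier:
    """One small implementation; it only records messages."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class SignupService:
    """Single responsibility: register users. Delivery is delegated."""

    def __init__(self, notifier: Notifier) -> None:
        self.notifier = notifier

    def register(self, username: str) -> None:
        self.notifier.send(f"welcome, {username}")


notifier = InMemoryNotifier()
SignupService(notifier).register("alice")
```

Swapping in an email or SMS notifier would mean writing another class with a `send` method and touching nothing in `SignupService`, which is the practical payoff being described.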
There's lots of good stuff in the front-end and web dev space as well. I know enough about UX to understand there is a lot of depth there. If you understand functional programming you'll go a long way with React and friends (and perhaps be able to innovate in this area as well.) Not all of this is in the typical CS curriculum but that doesn't invalidate the usefulness of what is.
Dynamic Programming also requires you to have a clue about it.
See https://www.youtube.com/watch?v=OQ5jsbhAv_M for some cool explanations.
Yeah everyone knows stuff like that isn't helpful in your everyday work.
But that is what they ask you in the interviews.
The tragedy is you have to do this ritual every now and then just to get a job, and that knowledge is largely useless everywhere else.
> The basic concept that people have figured out, so far, is that a number of NP-complete problems can likely be solved if we crack the Boolean Satisfiability problem.
> If NP-Complete problems get resolved, it is likely (though nobody knows for sure) that we'll crack every NP-Problem
Isn't the definition of an NP-complete problem exactly that it is in NP _and_ every other problem in NP can be reduced to it in polynomial time? So we know _for sure_ ([Cook71]) that as soon as we have a polynomial algorithm for SAT, _every_ problem in NP can be solved in polynomial time, and not just some of them as the excerpt claims.
Am I missing something? Because this seems like a very confusing, if not downright wrong, way to explain NP-completeness and its link to SAT.
[Cook71] Cook, S.A. (1971). "The complexity of theorem proving procedures". Proceedings, Third Annual ACM Symposium on the Theory of Computing, ACM, New York. pp. 151–158.
I have no idea what's going on in the lambda calculus excerpt further down the page, in particular substituting (λx.x x) with (x x)? There seems to be a fairly big misunderstanding here. And lambda calculus isn't reduced in any particular order -- there are many ways to reduce the same term.
I think books like this are a good idea, and having self-taught people write them is also a good idea. BUT it looks like this particular book is in serious need of quality control.
(Also it should be split up into several parts. Any book that teaches both the Y combinator and how to configure zsh is... weird.)
> Lambda Calculus is reduced from left to right, which is very important
No, you can do feasible reductions in any order. The problem is that you need to define β-conversion and normal forms (β normal form is obtained by repeatedly applying β-conversion to the leftmost redex).
Also, the example reduction is just wrong. You cannot transform (λx. x x) (λx. x x) into x x (λx. x x) - that's just not how β-reduction works. Applying a β-reduction to that term yields the same term, because you substitute the second expression for x in the first, thus obtaining the input all over.
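This is easy to check mechanically. Below is a tiny, hedged sketch of β-reduction at the root of a term; the `Var`/`Lam`/`App` classes and function names are made up for the example, and the substitution skips variable renaming because it assumes the argument being substituted is a closed term, which holds here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def substitute(term, name, value):
    """Replace free occurrences of `name` in `term` with `value`.

    Assumes `value` is closed (no free variables), so no renaming is needed.
    """
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:  # `name` is shadowed inside this lambda
            return term
        return Lam(term.param, substitute(term.body, name, value))
    return App(substitute(term.fn, name, value), substitute(term.arg, name, value))


def beta_step(term):
    """Contract the redex at the root of the term, if there is one."""
    if isinstance(term, App) and isinstance(term.fn, Lam):
        return substitute(term.fn.body, term.fn.param, term.arg)
    return term


self_apply = Lam("x", App(Var("x"), Var("x")))  # λx. x x
omega = App(self_apply, self_apply)             # (λx. x x) (λx. x x)
reduced = beta_step(omega)
```

Comparing `reduced` with `omega` shows they are structurally identical, so one β-step on this term hands back the term you started with.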
> That "undecidable" part is what makes this problem [the Halting Problem] NP-hard.
I mean, that's true - but it's very misleading. The most common context in which NP-hardness is discussed is when talking about NP-complete problems - those that are NP-hard and in NP. The way you go about showing that a problem X is NP-complete is showing that it's in NP (most of the time, this is the easy bit) and then showing that it's at least as hard as some NP-complete problem Y. This is usually done by transforming an arbitrary instance of Y into an instance of X in (deterministic) polynomial time, and showing that the X-instance is satisfiable iff (<=>) the Y-instance is.
Then there are problems for which NP-hardness has been shown but it's unclear whether they're in NP. Those are often of a continuous nature. I think deciding whether a level of Super Mario World is doable falls into this category.
I am a CS dropout who has been working in startups for a few years. About once a year, I see another programmer making a common mistake, and I draw on my CS knowledge to help them out.
The mistake is parsing HTML with regular expressions. It is so tempting to write a good ole' regex to grab that attribute value off of that element. And it works on the 5-6 samples you write your unit tests with. If you have run into this before, you may know that zalgo tends to appear in this situation. Parsing HTML with a regex will always fail eventually.
Of course the reason is that HTML is not a regular language. Its nested structure needs at least a context-free grammar, and thus requires a parser to parse it.
The funny part is, I failed CS Theory once, and was in the middle of taking it again when I decided to drop out. But this tidbit of knowledge has always stuck with me, and I've used it again and again to fix or prevent bugs in real software.
Takeaway: CS knowledge does have real-world value to everyday programming. You just have to know what you're looking for. And of course, always use an HTML parser to parse HTML.
(I'll add one more note to anticipate a common response. If you look at any HTML parser, you will see regular expressions in the code. These regex are used for chunking the HTML, and from that point the chunks are parsed.)
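For anyone who wants to see what "use a parser" looks like for the grab-an-attribute case, here is a minimal sketch using `html.parser` from the Python standard library. `AttrCollector`, `attr_values`, and the sample markup are made up for illustration; a third-party parser may well cope better with truly broken HTML from the wild.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the values of one attribute on every occurrence of one tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == self.tag:
            for name, value in attrs:
                if name == self.attr:
                    self.values.append(value)


def attr_values(html, tag, attr):
    collector = AttrCollector(tag, attr)
    collector.feed(html)
    collector.close()
    return collector.values


# Hypothetical input: mixed quoting, mixed case, and a commented-out link
# that a quick regex would happily match.
sample = """<a class="x" href='/one'>1</a>
<A HREF="/two">2</A>
<!-- <a href="/commented-out">no</a> -->"""

print(attr_values(sample, "a", "href"))  # ['/one', '/two']
```

The parser deals with quoting, case, and comments, which are the cases that tend to be missing from the 5-6 samples in the unit tests.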
Unfortunately, the blog software had a bug and it was possible for markup to leak out of the comments. Sometimes a spurious </div> would close the comments div and the scraper would miss comments that came after it. However, the HTML did contain helpful HTML comments, something like "comments start here" and "comments end here". The reliable solution was to use a literal string search on these HTML comments to pull out the entire comments section, and then to use regexes to pull out the comments' content.
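Roughly, that two-stage approach could be sketched like this. The exact marker text, the `<p class="comment">` markup, and the function names are assumptions made up for the example, since the real page structure isn't given.

```python
import re

# Hypothetical marker text; the real blog used "something like" these.
START = "<!-- comments start here -->"
END = "<!-- comments end here -->"

# Hypothetical markup for one comment's content.
COMMENT_RE = re.compile(r'<p class="comment">(.*?)</p>', re.DOTALL)


def extract_comment_section(page):
    """Literal string search: immune to stray closing tags in between."""
    start = page.find(START)
    if start == -1:
        return None
    start += len(START)
    end = page.find(END, start)
    if end == -1:
        return None
    return page[start:end]


def extract_comments(page):
    section = extract_comment_section(page)
    if section is None:
        return []
    return [text.strip() for text in COMMENT_RE.findall(section)]


page = (
    '<div id="comments">' + START
    + '<p class="comment">first</p></div>'  # spurious </div> leaked from a comment
    + '<p class="comment">second</p>' + END
    + '</div><p class="comment">footer</p>'
)

print(extract_comments(page))  # ['first', 'second']
```

The stray `</div>` no longer matters because nothing here depends on the tag nesting, only on the two literal markers.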
The only unbreakable rule is that there are no unbreakable rules.
The real problem with regexes is that they are hard to maintain and are extremely hard to get right to neither have false positives nor false negatives.
In my experience, parser-based code has to change about as often as regexes do when the HTML changes (only slightly less), and it can take more time to develop and test on each update, especially if you don't have access to a very durable parser that can withstand broken HTML, which most XML parsers can't. So if a regex does the job, use it IMO.
In my experience (again about once a year), I have always seen this done with HTML that is coming in from the wild. After all, if your HTML is coming from a source you control, you most often have a means to provide the data in a format other than HTML.
(That said, the moral of this story is still correct, i.e. use a parser, but mostly because html-in-the-wild is uniformly awful and it's better to let someone else worry about that).
Never once found a CS degree a worthwhile indicator of ability.
It may be a superb book, but not even giving a sample chapter out to judge writing style, quality of explanations, depth and so on?
The problem arises when you need to make bricks. You don't necessarily need a CS degree to make bricks, but you need to learn most things taught with a CS degree.
The last paragraph is the money quote:
Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we've created over the years do allow us to deal with new orders of complexity in software development that we didn't have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks. And when you need to hire a programmer to do mostly VB programming, it's not good enough to hire a VB programmer, because they will get completely stuck in tar every time the VB abstraction leaks.
If your sensible resume with 10 years of experience doing work in an in-demand field doesn't even get a recruiter call back, it might be because you did it for an enterprise company the recruiting team doesn't really know anything about. Chances are that no engineering manager ever got to read it.
I for one have worked with people coming from all kinds of backgrounds, and I don't think that the filtering makes any sense. My favorite software engineer hire had a Physics degree from Missouri-Rolla, and her career highlight was work at a company that makes billing software for telcos. She would not have been given the time of day in a lot of big tech companies.
If I was leading a company's recruiting strategy today, I'd aim at those kinds of candidates: If I aim at pedigree, I am competing, both economically and prestige-wise, with all the big names in tech. If instead I aim for the kinds of people that the market undervalues, I'll close more candidates and keep them longer.
But I can say I do have a good resume. I have a little bit of full time experience (before I decided to go to school full time for cs) and an internship at a company they recruit from.
So it's technically not an immediate disqualifier. But I did get a rejection quickly enough that I think it might have been automatic, and my school is still listed as "other - please specify" on their internship application.
- You've bought into the anti-intellectualism wave that's going on in the US.
- The universities near you are really poor.
Is it really the case that you learn so little at these universities that you literally have no advantage over those who didn't attend?
In interviews, and across my 25-year career, I've met some excellent degree holders who brought some great skills to proceedings and a roughly similar number of excellent developers who didn't have the paper.
I've also come across occasional degree holders who I'd barely trust to make coffee let alone put near code.
In short, people are people.
Similar to other commenters, I've found no correlation between a degree on a CV and later ability in employment, nor have I found it to be a useful indicator for prospective employees. In consequence I do find requiring a degree for applications silly.
Sure, everything is a bell curve, but spending the majority of your time thinking of little else beyond software concepts for 4ish years will fundamentally change how you think and reason.
And then you will start working there and almost never use those skills again, because you'll be using a vast library of datastructures and algorithms in the language of your choice, and the skill you will need the most is the ability to analyze why you might need one vs the other.
Unless you're on the team which is writing said libraries, which is rare.
Most code most programmers write won't ever run at sufficient scale or be sufficiently critical that CPU or memory bounds will matter, and using e.g. one sorting algorithm vs another will be largely irrelevant. It's recognising when you need to bother optimising that is the sign of a competent developer.
And asking someone to write actual (not pseudo) code on paper/whiteboards - as I've seen in interviews - is like assessing someone's driving ability by asking them to mime driving a car.
But it's worth pointing out that a Google style interview also includes some portion which is "system design" oriented which would touch more on that kind of thing.
But they're more interested in avoiding false positives than false negatives. So the process is designed to weed out most people.
Someone with a CS degree but little experience is going to have a hell of a time writing a Dockerfile and the scripts necessary to safely and securely deploy their application. They may be great at writing web applications but "that's not the hard part." The hard part is getting the architecture right so that the app will stay up despite being deployed, rebuilt, and re-deployed constantly.
Someone who's self-taught probably picked up Linux experience along the way (because how else are you going to teach yourself web development these days?) and probably has experience with things like AWS and the realities of hosting a web application (as opposed to just writing a web application). They will have had to set up their own development environments, do their own Linux installs, and probably got used to figuring things out on their own.
The way I see it computer science is science. The point of a CS degree is to give someone a career path as a scientist. These are the sorts of people that should be figuring out the next great algorithm (for whatever), finding uses for quantum computing, and finding solutions to similar fundamental problems.
Software development isn't computer science. It's like the difference between the structural engineers that invented that hurricane-proof nail and the architects that decided to use it in their build.
(No, not a perfect heuristic, but I think it may not hurt)
edit: It's also hard to find the author's name to find out who they are. It's not in text anywhere on the page, only in the image of the book cover.
As a self-taught programmer (English/Classics major), I am (or was) the target audience for this book, but with deliberation have returned to school (ultimately pursuing a master's in Computer Engineering).
From the OP link. The author spent a year "filling holes" in his knowledge.
An example from today: we have some very slow pattern-matching code that is starting to be a bottleneck for the application. What can you consider? Well, you can dive into sexy Bloom filters, or experiment with some trie-based structures. But then when you analyse the problem it turns out that simple word lookup with a simple hashtable is the fastest solution for the given constraints. No big deal, no rocket science.
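As a rough illustration of the "simple hashtable" option, here is a sketch in Python. It assumes the matching boils down to exact, case-insensitive whole-word lookups after splitting on whitespace, which is a guess at the constraints; `find_known_words` and the sample vocabulary are made up.

```python
def find_known_words(text, vocabulary):
    """Return the words of `text` that appear in `vocabulary`, in order."""
    known = set(vocabulary)  # hash-based: average O(1) membership test
    return [word for word in text.lower().split() if word in known]


print(find_known_words("The quick brown fox", ["quick", "fox", "dog"]))
# ['quick', 'fox']
```

Building the set is a one-off cost, and each lookup after that is a hash plus a comparison, which is hard to beat when the constraints really are this simple.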
Probably the same goes for rocket scientists, but one level higher ;)
For the first five years of my career I never had to touch any of the stuff I learned in school, and I was particularly happy that I had mostly mailed it in in those classes. Eventually my career evolved into dealing with data at massive scale and working on some of the biggest services on the planet, and the way I had taught myself to program completely changed. It was no longer possible to just sling code and hope that it would scale to millions of users. All the stuff that I slept through in class was relevant again, and I had to go back to Coursera and take those classes all over again. So the moral of the story: if you will be slinging webapps for the rest of your life, you probably don't need to know Big O, different search algorithms, linear algebra, statistics, etc. But if you think you will be working on upcoming stuff like autonomous cars, IoT, augmented reality, etc., you should definitely read up on it.
Pipelining is another concept where I feel self-taught programmers have weaker foundations. Many of those I have worked with write code that waits for all results to become available, even though an operation being blocked on I/O doesn't mean we can't do work on the CPU while we're waiting for the next batch of I/O to come through.
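A minimal sketch of the idea in Python, using a one-worker thread pool to fetch the next batch while the current one is being processed. `fetch_batch` and `process` are hypothetical stand-ins (a `sleep` plays the role of the blocking I/O), so treat this as an illustration of the shape rather than a tuned implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_batch(i):
    """Stand-in for a blocking I/O call (hypothetical)."""
    time.sleep(0.05)
    return list(range(i * 3, i * 3 + 3))


def process(batch):
    """Stand-in for CPU work on one batch."""
    return sum(x * x for x in batch)


def run_pipelined(num_batches):
    results = []
    if num_batches <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_batch, 0)
        for i in range(num_batches):
            batch = pending.result()  # wait only for the batch we need now
            if i + 1 < num_batches:
                # Start the next fetch before doing any CPU work.
                pending = pool.submit(fetch_batch, i + 1)
            results.append(process(batch))
    return results


print(run_pipelined(3))  # [5, 50, 149]
```

The non-pipelined version would fetch every batch first and only then start processing, so the CPU sits idle for the whole I/O phase.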
I have also seen self-taught programmers accidentally write O(n!) or O(2^n) functions and not realize it. I think data structures is definitely a good chapter to have. Especially when writing queries to a data store.
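The classic way to end up with exponential time by accident is a recursive function that recomputes the same subproblems. A small Python sketch (the `counter` argument is only there to make the blow-up visible, and the function names are made up):

```python
from functools import lru_cache


def fib_naive(n, counter):
    """Exponential time: every call below n is recomputed again and again."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Linear time: each value of n is computed once and then cached."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


calls = [0]
print(fib_naive(20, calls), fib_memo(20))  # 6765 6765
print(calls[0] > 20_000)                   # True: tens of thousands of calls for n = 20
```

Both functions look equally innocent in a code review, which is exactly how this kind of thing slips through.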
I think explaining how a hash table works would be excellent since it is such a useful and fast data structure. A lot of self-taught programmers sort of treat them like magical black boxes when it's not a very complex data structure, yet it's practically O(1) for most insertions/reads/deletions.
Memory management is fortunately something we don't really need to worry about as much. With languages and interpreters that do a very good job of cleaning up after our code, and now that memory is relatively cheap, we can afford to ignore it until we need to scale.
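To show there is no magic inside, here is a bare-bones hash table with separate chaining in Python. It is a teaching sketch, not how any particular language's built-in dict is implemented; the class name, the initial capacity, and the resize threshold are arbitrary choices.

```python
class ChainedHashTable:
    """Hash table with separate chaining: a list of buckets of (key, value)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # The hash picks the bucket; only that bucket is ever scanned.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()  # keep buckets short so lookups stay near O(1)

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _resize(self):
        items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in items:
            self._bucket(k).append((k, v))

    def __len__(self):
        return self._size


table = ChainedHashTable()
table.put("apple", 1)
table.put("pear", 2)
print(table.get("apple"), table.get("kiwi"), len(table))  # 1 None 2
```

The "practically O(1)" part comes from the resize step: as long as the buckets stay short on average, each operation touches only a handful of entries.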
Of course, if you have a good product, you can get away with inefficiency and hire CS guys when you have built a unicorn. ;)
I'm just describing what I often see the self-taught "millennial" programmer is missing with a minimal number of data points. Doesn't mean all of them do, but hopefully by me listing it here a book such as the above can tackle these topics. I hope somebody reading this doesn't take offense at my laundry list; it's just intended to show what I think would be useful to cover in such a book given their importance to app development.
I'm in no way knocking self-taught programmers. They learn a lot faster than those who went the formally trained route. I showed one of my interns, who was trying to get a master's degree in CS, how to link to a library three times, and he still couldn't get it.
Discussions surrounding what happens when two programs try to write to the same file. How to detect when a file changes. Stuff like that. These things just don't seem to come up in high school education and I can't help but wonder why. It'd go a long way to giving people explanations as to what's wrong with their computer when it's running slow or a basic means of interpreting error messages/conditions.
> I've learned more in this last year since I started programming over 25 years ago.
Should be '... this last year than since ...'
Personally, I think '... the past year than since ...' reads better also.
Do I get a free copy of the book for pointing that out? ;)
This is unusual. With both Amazon and LeanPub you can at least gauge the writer's style or get a feel for writing quality by looking at a sample chapter and a table of contents.
I'm skeptical that all of those people praising the book bought the book sight unseen.
Seems like pretty simple stuff; couldn't all the information be found one internet search away?
I should instead resort to doing a separate Google search for the TOC?
But that's not what Y is at all. It's called the fix point combinator, yes, but with the assumption you're going to use it in some curried lazy evaluation scheme with higher order functions.
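The evaluation-order point is easy to see in a strict language. A quick Python sketch (the names `Y`, `Z`, `fact_step`, and `plain_y_diverges` are just for this example): the textbook Y blows the stack because `x(x)` is evaluated eagerly, while the eta-expanded variant, usually called Z, delays that self-application until it is actually called.

```python
# Textbook Y: fine under lazy evaluation, diverges under strict evaluation.
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

# Z: same idea, but x(x) is wrapped in a lambda so it is only run on demand.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# One "step" of factorial, with the recursive call passed in as `self`.
fact_step = lambda self: lambda n: 1 if n == 0 else n * self(n - 1)


def plain_y_diverges():
    """Applying Y in Python recurses until the interpreter gives up."""
    try:
        Y(fact_step)
    except RecursionError:
        return True
    return False


fact = Z(fact_step)
print(fact(5))             # 120
print(plain_y_diverges())  # True
```

Either way, the combinator hands `fact_step` a reference to its own fixed point, which is all "recursion without naming yourself" amounts to.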
All this on ycombinator.com, too!
Will there be a print version?
That's my goal, yes. I want to be sure all the edits are made and, technically-speaking, the book is completed. If I do end up with a paper edition, I'll send out a note to the mailing list in late September, early October.
- Didn't learn anything new from Linux chapter.
- Data Structures and Algorithms chapter is too basic. There is not even any implementation provided. I thought it didn't offer anything more than you could find on Wikipedia if you add some illustration done with Paper by 53 app. I'd recommend "Grokking Algorithms" by Aditya Bhargava for this topic if you want illustrated explanations with brilliant examples.
- Didn't learn anything new from Databases chapter.
- Didn't learn anything new from Programming Languages chapter. Inclusion of useless things like TIOBE Index made me furious, honestly.
- Didn't learn anything new from Software Design chapter.
I won't recommend this book to anyone working in software engineering for more or less 5 years with or without CS degree. This book merely serves as an index of what you'll encounter in the field, nothing more than that. Not even any good elaborations on those topics. Pretty meh.
Good database references here, guaranteed to learn something new.
This'll be worth the $30 to me.
The author also will give you your money back if you go through the book and think you haven't learned anything.
For some reason the combination of "imposter" and the colors evoked memories of an old paperback.
I hope the author has permission to use it?
"Unless otherwise noted, images and video on JPL public web sites (public sites ending with a jpl.nasa.gov address) may be used for any purpose without prior permission, subject to the special cases noted below."
The "special cases" mostly deal with using the NASA or JPL logos, or using photos with real humans in them commercially.