Ask HN: What are useful CS theories you actually use at work?
100 points by muzani on June 8, 2018 | 43 comments
For me, I think it was database design that was most useful - things like normalization especially.

I've used binary search a few times for an e-commerce cart. And I remember to avoid putting loops inside loops.

I use Don't Repeat Yourself (DRY) and SOLID principles for code design.

Besides that, I haven't used them all that much. Which do you find most useful?

Amdahl's law, by far. I find myself explaining it on almost a daily basis. I work on big enterprise software and our junior engineers tend to enjoy optimizing a piece of some system, without realizing that the speedup of the system as a whole will be negligible.
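The formula is worth having at your fingertips. A quick sketch (the 10% component below is a made-up example, not from any real system):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of total runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Optimizing a component that accounts for 10% of runtime, even by 1000x,
# can never speed the whole system up by more than ~1.11x.
print(amdahl_speedup(0.10, 1000))
```

The limit as s goes to infinity is 1/(1-p), which is the number worth quoting to the junior engineer before they start.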

I find performance / architecture stuff in general to be useful to know, along with having an intuition for orders of magnitude. Too much needless complexity exists in enterprise CRUD web apps for a 1.01x speedup.

I've found the speed of the framework generally matters more than what you do with it for the same reason. The size of the framework often dwarfs any application built with it these days.

We have a huge project in Spring Boot that crawls along, and a newer one in Dropwizard that's so fast we could support 1000x our current traffic. Funny because they both use Hibernate and JAX-RS to do roughly the same thing.

I can relate to that. I just spent 2 weeks going down a rabbit hole trying to optimize a recursive algorithm to be able to calculate some values quickly.

I finally gave up and returned to the slow method but cached the results. Now we just warm the cache at night when usage is low.

Put it together in a day. Wish I had just gone this route from the beginning.

The term for that is memoization, for googling purposes.
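In Python the stdlib does it for you; the classic naive-Fibonacci demo shows the effect (any pure function of hashable arguments works the same way):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # caches results keyed by the arguments
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # instant; the uncached recursion would never finish
```

Warming a cache overnight, as above, is the same idea with the "table of memoized results" made persistent.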

Graph theory, all the time.

Our product basically allows users to build a whole bunch of assets (specifically, LOB wrappers, workflows and forms). These may depend on each other (e.g. a workflow displays a form, which displays data from LOB). If you want to redeploy those to a new environment in the correct order, that's a graph problem: topological sort (and you need to find cyclic dependencies, i.e. strongly-connected-components).
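Python's stdlib covers exactly this since 3.9; a sketch with hypothetical asset names standing in for the real dependency graph:

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical deployment dependencies: each asset maps to what it depends on.
deps = {"workflow": {"form"}, "form": {"lob_wrapper"}}

try:
    order = list(TopologicalSorter(deps).static_order())
    print(order)  # dependencies come first
except CycleError as e:
    print("cyclic dependency:", e.args[1])
```

`CycleError` gives you the offending cycle for free, which handles the strongly-connected-components detection case for deployment purposes.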

You cache users from upstream identity providers (such as Google or Azure AD) primarily because those providers can be horribly slow. You want to determine if a user is [recursively] part of a group, keeping in mind that some providers (Azure AD) allow for cyclic group memberships. Another graph problem: reachability. You want to take that same graph and cache it on nodes that query frequently. There's a paper for that[1] (pre-order and post-order numbering).
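The reachability check with cycle tolerance is just a visited set; a minimal sketch (the group names are invented, and real providers paginate these lookups):

```python
def is_member(user, group, memberships):
    """True if `user` is transitively a member of `group`.
    `memberships` maps a principal to the groups it directly belongs to.
    The `seen` set makes this safe for cyclic group memberships."""
    seen, stack = set(), [user]
    while stack:
        node = stack.pop()
        if node == group:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(memberships.get(node, ()))
    return False

# Cyclic example: "ops" and "admins" contain each other, and it still terminates.
memberships = {"alice": {"ops"}, "ops": {"admins"}, "admins": {"ops"}}
print(is_member("alice", "admins", memberships))  # True
```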

Lately, everything seems to be a graph problem - but I may just be wearing graph-tinted lenses. If you've never bumped into graph theory, there's probably a system somewhere that's a lot more complex than it needs to be.

[1]: https://eprint.iacr.org/2012/352.pdf

>Graph theory, all the time.

Seconded. Having at least a working knowledge of graph theory is essential for anyone, even CRUD app developers. You don't need to know how to implement a DFS blindfolded with one eye closed. But you do need to be able to recognize when you're facing a graph problem and know which libraries to apply. The alternative is ending up with some really nasty O(n^2) or even O(n!) code.

I've worked in several different fields: financial analytics, network monitoring, enterprise storage, and most recently electronic design automation. Each of them has involved different sets of formal theories. Sometimes combinatorics has come in handy, sometimes relational algebra, sometimes binary arithmetic, and so on.

Off the top of my head, I can only think of two that have been universally useful in all of them, and they have been very useful too. Graph theory comes straight to mind, and the only thing that might rival it is automata theory (including formal languages).

Graph theory and automata theory often prove to be useful even in problems that aren't conventionally thought of as graph problems or automata problems, but once you get a good grasp on them, you start to recognize patterns in problems (or sub-problems) that naturally lend themselves to elegant solutions if the problem is re-framed from the perspective of the appropriate theory. I've wound up on several projects where the requirements seemed really complex and gnarly at first glance, but we would be able to create successful, maintainable, extensible systems whose architectures essentially consisted of a chain of ETL ⇒ {graph,automaton}+ ⇔ UI — basically the same shape as the conventional compiler pipeline but applied to data, and sometimes interactive.

Graphs are extremely versatile as data structures, with well-studied properties, subtypes, and algorithms. Automata are similarly versatile as models of processes, and similarly well-studied.

The two are easily combined. One project in particular would take in a set of rules from the user, construct an automaton out of them, and feed data through it, which would generate yet another automaton that doubled as a digraph. This autographaton modeled the control flow of the former automaton during execution; it was amenable to graph algorithms (search, flow, sort, pattern matching, etc.), arbitrarily modifiable by the user, and executable to generate an arbitrary amount of mock data with the same characteristics (as defined by the input rules) as the data it characterized. It was designed to aid in the analysis and simulation of network traffic, but it wound up being general enough that people in other parts of the company started using it for other things, too (that I knew of at the time I left the company: I/O loads, software testing, and automated bug report triage using core dumps).

You could probably write such a program without any understanding of the theories involved, but I think that knowing the theories had some definite benefits. The theories led the program to be structured in a clean, elegant, modular, and straightforward way. They helped the program be general yet powerful and useful. Importantly, they helped the program to operate in such a way (and to present such an interface) that was intuitive for the tasks—and the people—it was designed for. I absolutely credit the solid theoretical foundations with the success of the program, and I've gone on to design programs with similar foundations yet intended for different sorts of tasks—and people—that have also been rather successful, though, admittedly, perhaps not so generically applicable.

May I ask what's LOB?

Line of business (applications)?

Exactly. Things like SAP, Sharepoint and so forth.

I'm guessing Large OBject - a data type commonly found in databases that stores a large chunk of unstructured/semistructured data. Usually a large chunk of text (CLOB) or binary data (BLOB).

Data structures class - understanding the variations of even a List and which to use based on how I read and grow it. Queues, Maps, Indexes, Trees... all the time.

Software Engineering - hated it while in school, thought it was overkill. Then spent time on a hundred man year project - super important.

Programming Languages - having recently moved from Java to Ruby, this class was the bridge to make that transition smooth, having understood the theory of languages first.

Others have said database and data modeling.

Scientific Computing - was implementing some crazy PhD modeling from Matlab and actually had to think about float vs double, summation/multiplication techniques of large sets and their compounding errors, solving techniques (like least squares).
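The compounding-error point is easy to demonstrate; Kahan (compensated) summation is the classic fix, sketched here on a toy input rather than anything from that Matlab model:

```python
def kahan_sum(xs):
    """Compensated (Kahan) summation: carries the lost low-order bits forward."""
    total, c = 0.0, 0.0
    for x in xs:
        y = x - c
        t = total + y
        c = (t - total) - y  # recovers the rounding error of total + y
        total = t
    return total

xs = [0.1] * 1_000_000
print(sum(xs))        # naive summation drifts away from 100000.0
print(kahan_sum(xs))  # compensated result is essentially exact
```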

Basically, every software job that I've had that wasn't building a website required my degree.

A lot of the "rules" you learn in school are treated like Rules when they are often more like "It Depends". Take DRY for example: It's all well and good to share code but you need to balance that against having to rebuild (possibly breaking) multiple projects that depend on the library.

That said, the Single Responsibility Principle and Separation of Concerns (the "S" in SOLID) are things that I find important to adhere to. The trick with many CS ideas is to apply them from first principles.

Example: Instead of some rule like "Your functions should never be more than N lines long," ask yourself if the function is trying to do too many things, i.e. deviating from the Single Responsibility Principle.

Understanding algorithmic complexity was highly useful for me recently when moving a system from inefficient prototype to full-scale version. Complexity theory helped me out with determining the exact pain points in my design and what needed to be altered how (which was most of it). Similarly, graph theory heavily influenced that initial design, which was far more elegant. When moving that system from a single-threaded program that ran on one core into a distributed system running on dozens of VM's with an 'eventually consistent' datastore between them, I tapped into classes on network theory, distributed computing, and operating systems.

Also, during the original design, I tapped into theory of functional programming in coming up with a highly functional architecture (even though the 'functions' are actually abstract processes on a virtual machine).

I'm a big proponent of theoretical CS!

Knowing CS is extremely important. That's the difference between being a modern day factory worker and a high value expert.

That's bull. Being an expert does not simply follow from attending a few CS classes and getting a degree. Coming out of college you know very little. The last thing my company needs is overinflated egos.

I also bristled at the factory worker reference you replied to, as it refers to every relative I have (I was the first one to get a degree). However, your reply still isn't logical.

1) A CS degree is a lot more than a few classes of CS, not even counting math and other content.

2) I think when people here are saying "degree" it's really just a shortcut for saying "taking some concentrated time to study important theoretical concepts that are useful to know as you go about real-life software development". In other words, you could do the same thing without college. It's just harder to juggle multiple things without the structured path. Even this is not perfect because a degree is still, unfortunately, a resume signal, but that's a separate problem.

3) Coming out of college with little professional or practical experience has nothing to do with the value of leveraging what was learned in college as you go about obtaining that experience. Your company may not want people without practical experience, but that doesn't mean they're against having a theoretical grounding as well; these are two separate criteria for them to desire or not desire.

4) This has nothing to do with ego. As I said, the earlier reference was an unfortunate choice, but it doesn't generalize to all people who take this path.

How can I demonstrate knowledge of CS in my resume?

I don't know. It shows up easily in interviews, or via a degree.

And there’s a scary part to this. How much more are we losing out on because of what we don’t know? Can you justify going back for a masters in CS, not simply for any career reason, but just because of the theoretical power it could add to the foundation of your ability to solve problems and approach a broader set of them?

I find myself agreeing with most of the commenters here. Even for simple things like algorithmic complexity, I’ve watched colleagues choose data structures and algorithms with no regard for simple attributes that would make a big difference as soon as code gets out of the toy stage.

Why is it so hard to appreciate the importance of this?

The idea of seeing a doctor who never went to medical school would send us running scared. Hiring a developer with no formal computer science training? No problem, they’re smart, they’ll just pick it up as they go.

Is the job a trade or a profession? People who see the job as a trade don't care about formal training.

I find the distinction not very relevant to this question.

Of course, there's more justification in more rigorously vetting someone's credentials and abilities who is building a bridge or performing surgery (although software development work has absolutely caused loss of life).

However the question posed here is the value added by a solid theoretical grounding, and I believe it's not too different in terms of pure skills benefit.

I bet I could wing it in a lot of professions whether it be developer or surgeon, the main difference being more people would die in the latter example. Considering only the benefit to a person's abilities of more rigorous training, I see no reason why skills would not increase similarly.

Would you mind giving a couple examples? (Other than complexity, I suppose.)

Still learning.

Could go on all day, you can ping me privately if you like.

There are other simple things: logic and circuits get you familiar with a lot. How to take any logical expression and invert each term, or manipulate parts of it quickly, knowing exactly what will happen.

Compiler theory is helpful, not so much because many people end up designing languages, but for other reasons. For example, DSLs (small domain-specific languages) are used all the time for different purposes.

Learning about formal proofs of correctness. Again even though it's not widely done just being aware of it helps you understand the limitations and what's possible with the reliability of software.

Some grasp of discrete math is required to work with lots of things like 3d graphics.

Physics is a supplement: without some level of knowledge, you'd be excluded from a fair number of jobs.

No one is mastering all these things at an undergrad level. I'm sure I've forgotten tons of it. But just the exposure, knowing it's there, helps, and when you need to actually use it, it's easier to jump back in and ramp up or go deeper.

Little's Law - https://en.wikipedia.org/wiki/Little%27s_law. Queues are everywhere.
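The law itself (L = λW: mean items in the system equals arrival rate times mean time in the system) takes one line; the numbers below are a made-up service, not a benchmark:

```python
# Little's law: L = lambda * W
arrival_rate = 200.0   # requests per second (hypothetical service)
mean_latency = 0.05    # mean seconds a request spends in the system
in_flight = arrival_rate * mean_latency
print(in_flight)  # average number of concurrent in-flight requests
```

Handy in both directions: measure any two of throughput, latency, and concurrency, and the third is determined.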

Distributed Systems theory. Especially the complexity of failure detection in distributed systems. Slow vs failed.

Networking. 99 out of 100 developers do not have a sufficient understanding of networking.

Many operating systems concepts. Virtual Memory Management. Scheduling. Intel x86 architecture and how CPUs actually work.

Relational Algebra.

Multiversion concurrency control.

Defense in depth. Information Assurance in general.

AST. Parsing in general.

https://en.m.wikipedia.org/wiki/De_Morgan's_laws are incredibly useful when trying to analyze or simplify conditionals.

I always found this to be really interesting despite it being rather simple. Definitely got some mild satisfaction simplifying IF statements. Professionally haven't used it too much (explicitly) given 99% of branches use at most 2 conditions. Likely use it this upcoming semester for DSP w/ FPGAs.

Mind explaining how exactly de Morgan’s law comes up in your day to day?

Whenever I'm trying to understand a complex if statement that someone else (or past me) wrote, transforming the conditional to contain only 'or' or 'and' but not both often makes the logic clearer. I also find myself transforming new conditions for understandability before committing.
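The transformation is mechanical and easy to sanity-check; a toy illustration (the flag names are invented):

```python
from itertools import product

# De Morgan: not (A and B) == (not A) or (not B)
#            not (A or B)  == (not A) and (not B)
def messy(is_admin, is_owner):
    return not (not is_admin and not is_owner)

def clearer(is_admin, is_owner):
    return is_admin or is_owner

# The two are equivalent for every input:
assert all(messy(a, b) == clearer(a, b)
           for a, b in product([True, False], repeat=2))
```

When refactoring a real conditional, an exhaustive truth-table check like the `assert` above is cheap insurance that the rewrite didn't change behavior.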

For really tricky cases, I'll use a Karnaugh map, but those don't come up very often.

Understood. (Sorry, commented before your edit.)

Interesting that you've actually seen complex enough cases to actually need a Karnaugh map, definitely one of the things I learned in class and proceeded to never use again.

I see dependency injection as writing functions as combinators and I frequently use category theory constructs. I work mainly with functional programming. Also lambda calculus helps me think about abstractions.

Understanding compilers and interpreters gives a lot of insight into speeding things up and making programs easier to analyse statically.

Automata and state machines are useful. Algorithmic complexity helps you avoid hard problems.

I see model theory directly when modeling domains.

Hard CS is rarely needed in my projects, but there are times when it's infeasible to make things work without it.

My last project involved calculations, clustering, and specific spatial queries over a set of 4D points (GPS + time). There is no way to build competitive, complex applications if no one on your team understands O-notation, dynamic programming, heuristics, graph theory, etcetera.

I assume this is where a whiteboard comes into play and you teach them about O-notation, etc., right?

Does computer architecture count? I find myself using that knowledge all the time to optimize hot code for physical simulations.

I once had a temp job where myself and a few other temps were assigned to alphabetize a stack of about a thousand folders. I managed to get everyone to execute a merge sort, and it ended up being pretty efficient.

When I was building a simple parser, I realized I was making a state machine and used a regex instead. Generally, having some estimates of feasibility helps; I was once asked to solve an NP-complete problem.
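The state-machine-to-regex collapse is a direct payoff of the theory: regexes recognize exactly the regular languages. A toy illustration with an invented "identifier = number" line format:

```python
import re

# A hand-rolled state machine for "identifier = number" lines collapses
# into a single regex with two capture groups.
assignment = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*$")

m = assignment.match("  retries = 3 ")
print(m.groups())  # ('retries', '3')
```

The theory also tells you the limit: the moment the grammar needs nesting (matched brackets, recursion), you're out of regular-language territory and back to a real parser.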


Building services as utilities with generalized input/output interfaces that know as little as possible about your application almost always results in better code. If I can write something that could easily be pulled out into a generic package and used on projects with totally different data models and business logic, then I've built something right.

Queueing theory. Computer systems have plenty of queues at different layers, and queueing theory helps understanding their behavior.

Speed of light, about 1 foot (30cm) per nanosecond. Always a lower bound on how fast information travels in a system.

Minor nit: Upper bound on how fast information travels, lower bound on how long it takes.

KISS. Keep It Simple Stupid.

Remember, you or others of our ilk will have to touch or replace that code some day. Getting fancy or sneaky is fun for competitions or bragging rights, but in practical coding, simple is ALWAYS best.

Data structures - knowing your basic data structures will help you choose the right path for the use case. Even in SQL it is helpful to understand B-trees.

Yep. I worked with programmers who compared all users to all users to check for pairs of users close to one another; they had no knowledge of spatial search trees. The job that calculated all those pairs dropped from hours to seconds.
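Even without a tree, a uniform grid gets you most of the win; a sketch under simplifying assumptions (2D points, fixed search radius, made-up coordinates):

```python
from collections import defaultdict
from math import dist

def close_pairs(points, radius):
    """Find index pairs within `radius` by bucketing points into grid cells
    of side `radius`, then checking only neighboring cells: near O(n)
    on typical data instead of the O(n^2) all-pairs comparison."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[(int(p[0] // radius), int(p[1] // radius))].append(i)
    pairs = []
    for (cx, cy), cell in grid.items():
        # A close pair can only span this cell and its 8 neighbors.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in cell:
                        if i < j and dist(points[i], points[j]) <= radius:
                            pairs.append((i, j))
    return pairs

pts = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
print(close_pairs(pts, 1.0))  # only the nearby pair (0, 1) is reported
```

A k-d tree or R-tree is the general-purpose version, but the grid is often all a "users near users" job needs.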
