
  > a ‘fix it later’ approach
Oh man, I hate how often this is used. Everyone knows there's nothing more permanent than a temporary fix lol.

But what I think people don't realize is that this is exactly what tech debt is. You're moving fast, but doing so makes you slow once you're no longer working on a very short timeline. That's because these issues compound: not only do you repeat the same mistake, you're building on top of shaky ground. Going back to fix things later ends up requiring far more effort than fixing them early would have. Whereas if you fix things early, your efforts compound too, but this time in your favor.

I think a good example of this is when you see people rewrite a codebase. You'll see headlines like "by switching to Rust we got a 500% improvement!" Most of that isn't Rust; most of it is better algorithms and design.

Of course, you can't always write your best code. There are practical constraints, and no code is perfect. But I think Knuth's advice still fits today, despite a very different audience. He was talking to people who were too obsessed with optimization, while today we're overly obsessed with quickly getting to some checkpoint. But the advice is the same: "use a fucking profiler". That's how you find the balance and know what actually can be put off till later. It's the only way to make that call in an informed way. Yet when was the last time you saw someone pull out a profiler? I'm betting the vast majority of HN users can't remember, and I'd wager a good number never have.
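And for anyone who's forgotten how cheap pulling one out is: here's a minimal sketch in Python, stdlib only (the handler function is just a made-up stand-in for whatever your hot path is):

    import cProfile
    import pstats

    def handler():  # hypothetical hot path
        total = 0
        for i in range(10_000):
            total += sum(range(i % 100))
        return total

    cProfile.run("handler()", "profile.out")         # profile and dump stats
    stats = pstats.Stats("profile.out")
    stats.sort_stats("cumulative").print_stats(10)   # top 10 by cumulative time

That's it. A few lines and you know where the time actually goes instead of guessing.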



I completely agree with most of what you've said, but personally I rarely use a profiler. I don't need it; I just think about what I'm doing and design things to be fast. I consider the time complexity of the code I'm writing. I consider the amount of data I'm working with. I try to set up the database in a way that allows me to send efficient queries. I try to avoid fetching more data than I need. I try to avoid excessive processing.

I realize this is a very personal preference and it obviously can't be applied to everyone. Someone with less understanding might find a profiler very useful and I think those people will learn the same things I'm talking about - as you find the slow code and learn how to make it fast you'll stop making the same mistakes.

A profiler might be useful if I was specifically working to optimize some code, especially code I hadn't written myself. But for my daily work it's almost always good enough to keep performance in mind and design the system to be fast enough from the bottom up.

Most code doesn't have to be anywhere near optimal; it just has to be reasonably fast so that users don't have to sit and stare at loading spinners for seconds at a time. Sometimes that's unavoidable, sometimes you're crunching huge amounts of data or something like that. But most of the time, slow systems are slow because the people who designed and implemented them didn't understand what they were doing.


  > I consider the time complexity of the code I'm writing
First, I applaud you for doing this. This is very good practice and I want to encourage everyone to do it.

Second, it's important to remember that big O is very naïve, especially when you drop the constants. Your size of n can make a big difference. O(n) is worse than O(n^2) for small n. Once you account for constants, it's even possible for an O(n^3) algo to beat an O(n) one! This throws a wrench in the analysis, making it more time consuming.
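To make that concrete, here's a quick sanity check you can run in Python. Exact numbers vary by machine, but when I've tried things like this the "worse" O(n) scan usually beats the O(log n) binary search at tiny sizes, purely because its constants are smaller:

    from bisect import bisect_left
    import timeit

    xs = sorted(range(8))      # tiny sorted input
    target = 7                 # worst case for the linear scan

    linear = timeit.timeit(lambda: target in xs, number=1_000_000)
    binary = timeit.timeit(lambda: xs[bisect_left(xs, target)] == target,
                           number=1_000_000)
    print(f"O(n) linear scan:       {linear:.3f}s")
    print(f"O(log n) binary search: {binary:.3f}s")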

But where the profiler really shines is that *you don't write all your code from scratch*. It tells you when to use a library and when to rewrite. The only other option is to painstakingly go through every library you use.

So the profiler is a big time saver. Big O for quick and dirty first pass and profiler for moving from alpha to beta and beyond.

Don't get me wrong, I'm not full of best habits either! I definitely don't do this for most personal projects or most research code (which is most of what I write, though I profile a different thing...), but a profiler is the most cost effective way to write *production* code. It becomes exponentially more important as code size and team size grow. After all, not everyone is using your best practices, so you need practices that scale and keep working as you relinquish control.

Unfortunately, you need to profile routinely. Luckily you can automate this and attach it to the CI pipeline, so the cost isn't high.


> O(n) is worse than O(n^2) for small n.

It can be, but it isn't necessarily. And I don't care about small values of n. If I'm spinning through two lists of size 5, it doesn't matter if option A is slightly faster than option B; both will run on the order of nanoseconds. The lower time complexity solution will continue to be reasonably fast as the input grows, and pretty soon the difference will be measured in seconds, minutes, hours, days...

Using a worse complexity solution for small inputs is a micro optimization you can pull out sometimes, but it is a micro optimization. Using the best time complexity is what I like to call a macro optimization: it's the default solution. You're much better off letting complexity guide your decisions than not. Once you know what you're doing you can deviate when appropriate, since sometimes worse complexity is better in specific cases. But 99.9% of the time, the best time complexity is good enough either way.

I usually don't need the code I write to be optimal. It doesn't matter if it's a bit slower than it technically could be - as long as it's fast enough and will continue to be fast when the input grows.

Sometimes you may want to squeeze out the absolute maximum performance, and in that case you may be able to micro optimize by choosing a worse complexity solution for cases where you know the inputs will always be small. If that assumption ever breaks, your code is now a bottleneck. That can be a useful thing to do in certain niche situations, but for the vast majority of code you're better off just using the best complexity solution you can. Otherwise you may come back to find that the code that ran in a millisecond when you profiled it now takes 20 minutes because there's more data.
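That failure mode is easy to demo. A toy sketch in Python (the duplicate check is made up for illustration; push n higher and watch the quadratic one fall off a cliff while the linear one barely moves):

    import time

    def has_dup_quadratic(xs):   # O(n^2): fine right up until it isn't
        return any(xs[i] == xs[j]
                   for i in range(len(xs))
                   for j in range(i + 1, len(xs)))

    def has_dup_linear(xs):      # O(n): scales with the data
        return len(set(xs)) != len(xs)

    for n in (100, 1_000, 5_000):
        xs = list(range(n))      # worst case: no duplicates at all
        for fn in (has_dup_quadratic, has_dup_linear):
            start = time.perf_counter()
            fn(xs)
            print(f"n={n:>5}  {fn.__name__}: {time.perf_counter() - start:.4f}s")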

How do you run a profiler in CI? Do you hook it up to your tests? You need some code that runs with actual input so I guess you either profile the test suite or write tests specifically for profiling? Or maybe you write benchmarks and hook a profiler up to that?

This sounds like it could be a useful technique but very time consuming if you have to write the profiler tests/benchmarks specifically, and kind of useless if you just hook it up to your general test suite. I want my tests to run fast so I don't use big data for testing, and you won't really see where the bottlenecks are unless you're testing with a lot of data.


You're right, "can" is the right word. But you got hyperfixated on that and ignored the rest of what I said. It's not just small n as in 5. It can be hundreds or thousands and this can depend on the constants that people usually drop when doing complexity analysis.

Essentially, I'm saying use the profiler to double check your estimates. Because that's what (typical) complexity analysis is: an estimate. But a profiler gives you so much more. You can't always trust the libraries, and even when you can, you need to remember libraries have different goals than you. So just grab a profiler and check lol. It isn't that hard.

As for connecting to a CI, you're overthinking it.

How do you normally profile your code? Great! Can you do that programmatically? I bet you can. Because you're profiling routines.

You're already writing the test cases, right? RIGHT?

Btw, you can get the CI to work in nontrivial ways. It doesn't have to profile every push. You could, idk, profile every time you merge into a staging branch? You can also profile differently on different branches. There's a lot of options between "every commit" and "fuck it, do it in prod". I'm sure you're more than smart enough to figure out a solution that works for your case. Frankly, there's no one-size-fits-all solution here, so you gotta be flexible.
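To make "do it programmatically" concrete, here's roughly what it can look like in Python. Everything app-specific (generate_report, the input size, the budget) is made up; swap in your own routine:

    import cProfile, io, pstats, time
    from myapp.reports import generate_report   # hypothetical routine under test

    BUDGET_SECONDS = 2.0                         # made-up threshold for this job

    def test_report_profile():
        profiler = cProfile.Profile()
        start = time.perf_counter()
        profiler.enable()
        generate_report(sample_size=10_000)      # representative input, not toy data
        profiler.disable()
        elapsed = time.perf_counter() - start

        out = io.StringIO()
        pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(15)
        print(out.getvalue())                    # ends up in the CI log either way

        assert elapsed < BUDGET_SECONDS, "hot path regressed past its budget"

Run it on merges to staging, nightly, whatever cadence fits your team.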


Even if it is hundreds or thousands the difference will generally be imperceptible to humans. And that's all I care about, I care about how humans experience my software. If some background job that runs nightly takes 2 hours that's fine by me. I'll write it to be efficient and I've never actually written a job that takes anywhere near that long to run, but if I did I probably wouldn't waste time and energy optimizing it unless asked to. I've seen jobs written by others that take hours to run, and I haven't done anything about it because nobody's asked me to - because nobody cares.

I mostly work on web apps. If I was working on an application with a large user base and it was struggling to keep up with a large number of requests during peak hours, I might try to optimize the most frequently used or the most performance sensitive endpoints to ease the load. Then I might profile a call to these endpoints to see where I should focus my efforts. But if the app is responding quickly during peak traffic and generally just working perfectly, I see no reason to spend extra time profiling things just for the sake of it. I'm not paying the cloud bills and the people who are aren't asking me to reduce them. Realistically it would probably take years or decades to recoup the cost of having me investigate these things anyway, it's not worth it.

Yes, I already write tests. But like I said in one of my earlier responses, those tests exist to test functionality, not performance. For example, the test dataset might run super fast with your O(n^2) algorithm while the prod dataset is much larger and takes hours to run. A profiler won't tell you that unless you have a test with a huge dataset to provoke the issue. So now you're writing specialized tests just for profiling, which in my opinion falls under premature optimization. It also makes your test suite take longer to run, which makes you less likely to run the tests as often as you otherwise would.
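For what it's worth, if you did go down that road, the usual trick is to fence the big-input checks off from the fast suite with a marker, so plain test runs stay quick and CI opts in explicitly. A hypothetical pytest sketch (the function and sizes are made up):

    import pytest
    from myapp.dedupe import find_duplicates     # hypothetical function under test

    @pytest.mark.perf                            # register "perf" in pytest.ini
    def test_find_duplicates_prod_sized():
        xs = list(range(1_000_000))              # prod-sized, not unit-test-sized
        assert find_duplicates(xs) == []

    # fast local runs:  pytest -m "not perf"
    # CI perf stage:    pytest -m perf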

I'd rather just go with the low complexity option and revisit it later if necessary. I very rarely have any issues with this approach, in fact I'm not sure I have ever had a performance problem caused by my code. If there's code that stalls the application it's always someone else's work. My code isn't perfect, but it's generally fast enough. In my mind that's pragmatic programming - make it work, make it clean, make it fast enough. Most code doesn't need to be anywhere near optimal.


IFF you understand computing fundamentals, and IFF you have a solid understanding of DS&A, then yes, this is a reasonable approach. At that point, a profiler will likely show you the last 1% of optimizations you could make.

Most web devs I’ve met don’t meet those criteria, but sadly they’ve also rarely used a profiler. They’re not opposed to doing so, they’ve just never been asked to.


I think you're missing "IFF you write all the code and use no libraries."

The strategy works well for a single person, as you point out, but it doesn't scale, and it can't be applied to library code without doing a deep dive into the library. At that point, a profiler is far more time effective.

  > They’re not opposed to doing so, they’ve just never been asked to do so.
I think the best option is to integrate profiling into the CI framework. Like a git pre-commit hook, or even better, offloaded so that your VMs profile while you're testing different environments.

I think mostly it's a matter of habit. I mean, let's be honest, we probably don't care about profiling our for-fun projects. Where profiling really matters is in production. But I think the best way to normalize it is just to automate it. Which, honestly, I've never seen done, and I'm really not sure why.


I use lots of libraries and it works fine. Popular libraries are usually written by competent developers; sometimes they will even tell you the time complexity of a function in the documentation. Most of the time, library code won't be the problem.

It really seems to me like you don't understand the advice about premature optimization. This is it, this is what Knuth is talking about. Profiling everything, optimizing everything. That's premature optimization.

Mature optimization is when you have an application and you see that part of it is slow. You investigate, for example with a profiler. Maybe you write benchmarks. Then you try a different approach and benchmark that as well (either with an actual benchmark library or just by running the code locally) and see if the new solution is faster.
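Something like this, where the profiler has already pointed you at a function and you race the old implementation against a candidate (both implementations below are made up for illustration):

    import timeit

    def dedupe_slow(rows):                   # what the profiler flagged
        out = []
        for r in rows:
            if r not in out:                 # O(n) membership check per row
                out.append(r)
        return out

    def dedupe_fast(rows):                   # the candidate replacement
        return list(dict.fromkeys(rows))     # order-preserving, O(n) overall

    rows = list(range(2_000)) * 2
    for fn in (dedupe_slow, dedupe_fast):
        t = timeit.timeit(lambda: fn(rows), number=5)
        print(f"{fn.__name__}: {t:.3f}s for 5 runs")

If the candidate wins on realistic input, ship it. If it doesn't, you just saved yourself a pointless rewrite.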

I like to refer to what I do as macro optimization. I don't spend significant time worrying about performance; I just write code while keeping complexity, IO and such in mind, avoiding badly scaling algorithms if I can. I don't care if the badly scaling algorithm might be slightly better than the one that scales, because that difference is nearly always negligible.

Like you said in your other comment, sometimes the constants can make the O(n^2) algorithm faster than the O(n) one for small inputs. But that difference is nothing; it's gonna be like one runs in 5 nanoseconds and the other runs in 10. So you saved 5 nanoseconds and nobody noticed or cared. Then the input grows, and now your badly scaling algorithm runs in 2 hours whereas the scaling one runs in a few milliseconds. That's why you use the best scaling algorithm by default. Even if you think the input will never grow you might be wrong, and the consequence of being wrong is bad. High risk, low reward.


  > It really seems to me like you don't understand the advice about premature optimization. This is it, this is what Knuth is talking about. Profiling everything, optimizing everything. That's premature optimization
You greatly misunderstand what I'm saying

  1) profile your code
  2) optimize what needs to be optimized 
  3) there is no step 3
Why did you suddenly assume I said you should optimize everything? We don't have infinite time on our hands


Because one does not just profile their code. It branches, different parts do different things. You need to run the code with sensible input, multiple different inputs to hit the different branches etc.

If you have a good workflow that includes profiling your code, that's cool; it's probably a good way to improve the performance of your code. But personally I think it's enough to test the code and make sure it works and is reasonably fast. I'm not going to spend time shaving off nanoseconds and single-digit milliseconds.

If something actually keeps users waiting I'll look into it. As long as it's near instant it's good enough for me.



