

Ask HN: Premature optimization – when to use? - nekopa

I know the old adage, but I think I have a use case where it doesn't hold, so I want to see what HN has to say about it.

First of all, this is a private project that I am working on with me as the only user, so the whole market-fit issue shouldn't be relevant*.

I am working on a web-scraping project, mainly as a way to apply what I have learned from the Learn Python the Hard Way course. So far so good; everything is working well. But when I deployed my first spider, it took an hour and returned ~20 million lines for one year (I want to get data going back at least 5 years).

So it's working the way I want, but should I start looking at my code to see if there are ways I can speed it up?

Either way, it's not a problem: I have spare computers I can set to run the program for hours on end.

But should I waste electricity running a hack, or should I start optimizing now?
======
TheCams
I think what could be considered an early optimization (but not premature) is
the architecture design of your software.

I don't know how you built your application, but in this example you could have
made the design choice of a heavily multithreaded application to fetch and
process multiple pages at the same time.

It could be an early optimization that saves you a lot of refactoring later.
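A minimal sketch of that design choice, assuming one page per day as in the original question: Python's standard `concurrent.futures.ThreadPoolExecutor` fetches several pages at once, which helps because most of a scraper's time is spent waiting on the network. The URL pattern, `fetch`, `days_of_year`, and the worker count of 8 are hypothetical choices for illustration, not the original poster's code.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable
from urllib.request import urlopen

# Hypothetical URL pattern: one results page per day.
BASE_URL = "https://example.com/results?date={}"


def fetch(url: str) -> str:
    """Download one page. Threads help here because the time is spent waiting on I/O."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def days_of_year(year: int) -> list[date]:
    """Every calendar day in `year` (365 or 366 dates)."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    return [start + timedelta(days=i) for i in range((end - start).days)]


def fetch_all(urls: list[str],
              fetch_fn: Callable[[str], str] = fetch,
              workers: int = 8) -> list[str]:
    """Fetch pages concurrently; pool.map returns results in the same order as `urls`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_fn, urls))
```

Usage would look like `fetch_all([BASE_URL.format(d.isoformat()) for d in days_of_year(2015)])`. Keeping the worker count modest is kinder to the site being scraped and less likely to get the scraper blocked.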

------
aburan28
Find your bottleneck: see which function calls are the slowest and find the
root cause of any perceived lack of performance. Just run
python -m cProfile yourscript.py and you'll get that information.
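From the command line, `python -m cProfile -s cumtime yourscript.py` sorts the report by cumulative time. The same profiler can also be driven from inside a script with the standard `cProfile` and `pstats` modules, which is handy for profiling just one suspect function. In this sketch, `slow_parse` is a hypothetical stand-in for real work.

```python
import cProfile
import io
import pstats


def slow_parse(n: int) -> int:
    # Hypothetical stand-in for real work such as parsing a page.
    return sum(len(str(i)) for i in range(n))


def profile_call(func, *args, top: int = 10) -> str:
    """Run func(*args) under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


print(profile_call(slow_parse, 100_000))
```

The functions at the top of the cumulative listing are where the time goes, and they are the only ones worth optimizing first.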

~~~
nekopa
Thanks for this. I am very new to Python, so I will try this out.

But I think at the moment my bottleneck is the scraping part. The current
program hits 365 pages (one for each day of the year) and pulls down the 20
million results.

I am now trying to figure out how to pull more info from each day, and that's
why I am thinking about optimizing. Does cProfile measure things like time
spent on a webpage (via bs4)?
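One rough way to answer the "network or parsing?" question, independent of any profiler, is to time the two phases separately with `time.perf_counter`. This is a sketch: `fetch` and `parse` are hypothetical callables passed in (in a real scraper, `parse` might wrap BeautifulSoup), so the timing logic does not depend on any particular library.

```python
import time
from typing import Callable


def timed_phases(urls: list[str],
                 fetch: Callable[[str], str],
                 parse: Callable[[str], list]) -> tuple[list, float, float]:
    """Return (all parsed rows, seconds spent fetching, seconds spent parsing)."""
    fetch_time = 0.0
    parse_time = 0.0
    rows: list = []
    for url in urls:
        t0 = time.perf_counter()
        html = fetch(url)           # network: mostly waiting on the remote server
        t1 = time.perf_counter()
        rows.extend(parse(html))    # CPU: e.g. HTML parsing
        t2 = time.perf_counter()
        fetch_time += t1 - t0
        parse_time += t2 - t1
    return rows, fetch_time, parse_time
```

If fetching dominates, concurrency (more pages in flight) is the likely win; if parsing dominates, the parsing code is what to profile and tune.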

Do you have any links on optimization "thinking"? As in, I don't know whether
it is my approach or my code that needs optimizing.

------
dudul
In your case this is not premature optimization. You have written working code
and used it to tackle your problem; you saw (and proved) that it wasn't fast
enough, so now you can optimize.

Premature optimization is when you try to write "clever" and "fast" code
without first exercising it in a real-world scenario to see whether it is
actually too slow.
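A minimal "measure first" sketch using the standard `timeit` module: time the simple version, then project the cost to the ~20 million lines mentioned in the question before deciding whether anything needs to be cleverer. `parse_line` and the sample line format are hypothetical.

```python
import timeit


def parse_line(line: str) -> list[str]:
    # Hypothetical parser for one scraped line: the simple, obvious version first.
    return [field.strip() for field in line.split(",")]


def seconds_per_call(func, arg, number: int = 10_000) -> float:
    """Average seconds per call, taking the best of 5 repeats to reduce noise."""
    best = min(timeit.repeat(lambda: func(arg), number=number, repeat=5))
    return best / number


sample = "2015-01-01, widget, 42, 3.14"
per_call = seconds_per_call(parse_line, sample)
projected = per_call * 20_000_000  # scale to the ~20M lines from the question
print(f"{per_call * 1e6:.2f} us per line, about {projected:.0f} s for 20M lines")
```

If the projected total is small next to the hour the scrape already takes, optimizing this step would be premature; if it is large, there is now a number to beat.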

~~~
nekopa
Cheers for this idea.

I was just wondering when I should stop the prototyping 'hacks' and sit down
to try to do things smarter.

I have tackled the "MVP" of my problem, and I am still hacking away to get
exactly what I need, but I am starting to think about the whole idea of
'technical debt'. I have enough code now that it no longer quite _fits_ in
my head.

I am wondering if this is the inflection point for starting to optimize -
maybe abstract the code enough that I can keep building and keep it all in my
head at the same time...
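One common way to get there is to split the scraper into small, separately testable stages (fetch, parse, write) connected by generators, so each stage fits in your head on its own and can be optimized later without touching the others. This is a sketch under assumptions: `Row`, `parse_page`, `scrape`, and `rows_to_csv` are hypothetical names, and the "name,value per line" page format is a placeholder for whatever the real pages contain.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Row:
    # Hypothetical record shape: adjust to the fields the real pages contain.
    day: date
    name: str
    value: str


def parse_page(day: date, text: str) -> Iterator[Row]:
    """Placeholder parser: assumes one 'name,value' pair per line of text."""
    for line in text.splitlines():
        if "," in line:
            name, value = line.split(",", 1)
            yield Row(day, name.strip(), value.strip())


def scrape(days: Iterable[date], fetch_page: Callable[[date], str]) -> Iterator[Row]:
    """Fetch then parse each day's page, yielding rows lazily, one page at a time."""
    for day in days:
        yield from parse_page(day, fetch_page(day))


def rows_to_csv(rows: Iterable[Row]) -> str:
    """Serialize rows as CSV text; a real run would write to a file instead."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([row.day.isoformat(), row.name, row.value])
    return buf.getvalue()
```

Because `scrape` yields rows instead of building a list, 20 million rows never need to sit in memory at once, and swapping in a concurrent or faster `fetch_page` later changes only that one function.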

