Hacker News

And, just because I'm in a golfing mood:

    import time
    from collections import defaultdict

    def gen_stats_tech2(dataset_python):
        start = time.time()

        # (num_orders, total_quantity, total_price) per product
        totals = defaultdict(lambda: (0, 0, 0.0))

        for product_id, order_id, quantity, price in dataset_python:
            o_count, o_quant, o_price = totals[product_id]
            totals[product_id] = (o_count + 1, o_quant + quantity, o_price + price)

        product_stats = [
            [product_id, num_orders, total_quantity,
             round(total_price / num_orders, 2)]
            for product_id, (num_orders, total_quantity, total_price)
            in totals.items()
        ]
        end = time.time()
        working_time = end - start
        return product_stats, working_time
950x improvement. I think the former reads more pleasantly, though, and you're likely to do better with something like PyPy rather than dragging the code through the muck.

Most of the improvement comes from using builtins like the dictionary's .items() method rather than making a __getitem__ call per loop iteration, removing references to globals (the int constructor) since the stored values are already integers, removing N calls to .append() by letting the list comprehension build the list, and removing a store-then-unpack.
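A minimal sketch of the two patterns side by side (hypothetical toy data, not the benchmark dataset): per-key indexing plus .append() versus a single list comprehension over .items().

```python
# Toy totals dict in the same shape as the function above:
# product_id -> (num_orders, total_quantity, total_price)
totals = {"a": (2, 5, 10.0), "b": (1, 3, 4.5)}

# Slower shape: one __getitem__ per iteration, plus an .append()
# attribute lookup and call for every row.
stats_slow = []
for product_id in totals:
    num_orders, total_quantity, total_price = totals[product_id]
    stats_slow.append([product_id, num_orders, total_quantity,
                       round(total_price / num_orders, 2)])

# Faster shape: .items() yields key and value together, and the
# list comprehension builds the list without N .append() calls.
stats_fast = [
    [product_id, num_orders, total_quantity,
     round(total_price / num_orders, 2)]
    for product_id, (num_orders, total_quantity, total_price)
    in totals.items()
]

assert stats_slow == stats_fast
```

Both produce the same rows; the second just does less per-iteration bytecode work.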

Sadly, I don't think the top loop can be reduced in a similar manner because it's self-referential. Still, it made for a fun hour or so; I was really hoping to hit a 1,000x improvement :/




Nice work!



