Spot the bug in this Python code (2023) (dwrodri.gitlab.io)
8 points by dwrodri 12 months ago | 14 comments



Pandas has its use cases, but I see it overused sooo much for no reason.

People automatically think "csv? I gotta install Pandas!" even if the CSV is like 20 lines long. Then follow 15 lines of Pandas DSL that I have to decipher to modify anything.

I hate it


I'm waking a bit of a zombie thread because unfortunately I had to step away when this was re-uploaded, but I have to resoundingly agree with you. The overuse of pandas is quite painful.

For what it's worth, here's my defense: The data in this neat CSV actually came from simulations that were dumping Parquet files on the order of 100s of GiBs, which was the reason I was using pandas in the first place. This code snippet was lifted from a processing pipeline where pandas was used for other reasons, but I admit, it probably would have been best to break it out into its own little file and not include the module at all.


On the other hand, if I need to parse a CSV with Python, why not load pandas if I can bang it out super quickly?


An aversion to using something beyond the stdlib for no good reason probably. Personally I've seen people reach for pandas only to accidentally coerce data in non-obvious ways. Clearly it depends on what you're planning on doing with the data afterwards though.

    import csv
    with open(filename, 'r') as f:
        r = csv.reader(f)
        for row in r:
            tsc, _, pc, *_ = row
            print(int(tsc, 16), int(pc, 16))


The Python interpreter is acting totally sanely.

The author is asking it to loop over a list (or any iterable) and destructure each of the elements, but is then providing a list containing some strings, none of which can be meaningfully destructured that way.

Reducing to just the second part of the loop:

    [(x, z) for x, _, z, *_ in ["0x2d41854", "3", "0x80001a14", "(0xbffd)"]]
I.e., in non-list-comprehension form:

    for x, _, z, *_ in ["0x2d41854", "3", "0x80001a14", "(0xbffd)"]:
I.e., the first element is expanded as:

    x, _, z, *_ = "0x2d41854"
Which is clearly nonsensical.
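
To make that concrete, here is a small sketch of what the unpacking does with those same values. The variable names are made up for illustration. Note that a string is itself iterable, so a long enough string unpacks character by character without complaint, and only a too-short one raises. Wrapping the row in a one-element list is one way to get the intended behaviour.

```python
fields = ["0x2d41854", "3", "0x80001a14", "(0xbffd)"]

# Unpacking a string iterates over its characters, so this "works"
# but yields single characters rather than whole fields.
first, _, third, *_ = fields[0]

# A string shorter than three characters cannot fill the three
# required targets, so unpacking raises ValueError.
short_failed = False
try:
    a, _, b, *_ = fields[1]
except ValueError:
    short_failed = True

# Wrapping the row in a one-element list makes the loop see the
# whole row as a single item to destructure.
pairs = [(x, z) for x, _, z, *_ in [fields]]
```

With the wrapper, `pairs` holds one tuple containing the first and third fields of the row.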


Is a generator already too much overhead? Just a small function with a double for loop that yields the data. Easier to read and also bug-free.
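
For illustration, a minimal sketch of such a generator, assuming rows shaped like the one quoted elsewhere in this thread (hex timestamp, a count, hex PC, then extras). The name `iter_tsc_pc` and the in-memory sample are hypothetical, not taken from the article.

```python
import csv
import io


def iter_tsc_pc(lines):
    """Yield (tsc, pc) integer pairs from CSV lines with hex fields."""
    for row in csv.reader(lines):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        tsc, _, pc, *_ = row
        yield int(tsc, 16), int(pc, 16)


# Hypothetical sample standing in for an open file object.
sample = io.StringIO("0x2d41854,3,0x80001a14,(0xbffd)\n\n")
pairs = list(iter_tsc_pc(sample))
```

Any iterable of lines works as input, so an open file can be passed in place of the `StringIO` object.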


If you need to explain how your clever clever Python code works, you're doing it wrong.


My programming background is that I started with Java in high school, then moved to a C and C++ workload in my college courses, and then picked up Python for research work in my later college years. The importance of this trajectory is that throughout those first 6-7 years, there weren't a lot of people telling me to be less clever, or to always focus on doing it the most "obvious" way. In particular, jumping from an environment where I was constrained to C++03 with little access to a standard library to Python 3.X made me feel drunk on the power of clever one-liners. More specifically, it was the sense that I could get SO much more done with a screenful of code. It was also this regular exposure to Python that introduced me to functional programming, which didn't help with my propensity towards a code-golfy style.

It wasn't until I started writing code in Go years later that I really felt I had encountered a programming experience that TRULY made me pay for my subconscious, emotionally driven need to get a dopamine hit after cramming a bunch of logic into a few lines of text. I have to confess, when I work on Go codebases, there is still a lack of "spark" that comes from having code that feels so straightforward. But learning to deal with the reality that it's quite frankly emotionally immature of me to shy away from programming stuff just because it doesn't tickle my brain has probably been one of my most significant growth moments as a programmer.

If I'm writing code that pays my bills, I'm much better in the long run setting aside my emotions and keeping it as straightforward as possible, even if that occasionally means that simple stuff like walking through a CSV takes up an entire screen of text.

All of these words to say, you're right, and I thought it was worthwhile to discuss how I ended up realizing you were right.


Well yes and no. Let's steer around the usual car analogy:

I own a drill driver that delivers enough torque that it could cause a nasty wrist injury. Most of the time all is fine.

===== feel free to ignore this waffle =====

Screws will generally cam out first when the driver is misapplied, i.e. the driver jumps out of the screw and probably destroys the tracks in the screw head. You eventually learn the correct torque and speed settings etc. Drilling: hammer or not (concrete/brick or wood/plastic), speed etc. Again, you eventually learn how to drill efficiently.

Then you put something like a 50cm long 16mm auger bit in. An auger bit is basically a sharp edged corkscrew shaped drill bit that is designed to drill a fairly wide and very deep hole in wood. There are also paddle bits which are flat paddle shaped (quelle surprise) with a pointy part to start the hole and guide the main part of the tool. These generally are used for wider and shallower holes than augers.

So you stab your long auger into say a sleeper (150mm thick, hardwood). The sleeper is part of a wall. Friction soon becomes an issue - around 30cm down or earlier for slightly built people. If you know what you are doing, you position a leg in such a way that the handle of the drill/driver hits it (in the horizontal plane) and your wrist is safe.

===== /feel free to ignore this waffle =====

I generally find that <statement> followed by ... "you're doing it wrong" is unhelpful. Why not critique what is on offer instead of a put-down with no working?


I'll stick to DIY analogies, since it appears to speak to us.

In this case the author is using a tool for operating on lists on a thing that's not a list.

It's like they're trying to use a track / plunge saw for cross-cutting a single 2x4. You can do it, but you'll need to rig up some scaffolding (a list) around it in order for the track to have something to balance on. Meanwhile there's a circular saw and a speed square next to you on the table.


The bug is the DIY CSV loader. Use the csv module or its equivalent instead. My experiences parsing CSVs (how hard can it be?) often end with nested lists of iterators, complex logic to deal with edge cases, and other things that fail on painful-to-diagnose corner cases, exactly like what the author describes.
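
One example of such a corner case, as a small sketch: a quoted field that contains a comma. The sample row is made up for illustration. A plain `split(",")` cuts the quoted field in two, while `csv.reader` keeps it whole.

```python
import csv
import io

# Made-up sample row: the middle field is quoted and contains a comma.
line = '0x2d41854,"hello, world",0x80001a14\n'

# Naive splitting breaks the quoted field apart and keeps the quotes.
naive = line.strip().split(",")

# The csv module understands the quoting and returns three fields.
proper = next(csv.reader(io.StringIO(line)))
```

Here `naive` ends up with four pieces and `proper` with the three fields that were intended.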


I found the mistake obvious, but I have some experience with Python.

The use of the `for ... in [...]` to destructure the list is quite clever.

I would have simply indexed the list.


I don't think it's clever, it's quite unintuitive and ugly. To me anyway. If the author really wanted to cram everything into a list comprehension, the logical way to think about it is:

    (
        (int(tsc, 16), int(pc, 16))
        for tsc, _, pc, *_ in (line.strip().split(",") for line in fp)
    ),
Which is, of course, almost as ugly. Just use a for loop and yield.


Right, I should have said clever code golf.

I think I would have written it like this:

    def parse(line):
      p = line.split(",")
      return (int(p[0], 16), int(p[2], 16))

    (parse(line) for line in fp)
And when it was still cool: map(parse, fp)



