Some of the variations just had me in tears and I can't really explain why.
Edit: I guess it's the same formula as most jokes: there is some logic in the setup, and the punchline replaces it with a different logic or pattern. Only in this case, the original logic isn't common sense, but the expected pattern of each type of joke. A meta-joke in that regard.
It really depends on you being blindsided, but immediately recognizing the new "logic" at the punchline.
A joke breaks expectations. Setup: "What kind of bear has no teeth?" Punchline: "A gummy bear" breaks the expectation you had while trying to think of an actual species of bear.
Non-jokes or meta-jokes break the expectation, but in an unusual way. The "Why did the chicken cross the road?" punchline breaks your expectation of a broken expectation. You thought it was a joke, but it turned out to be a statement. The aristocrats joke turns out to be just a dirty story; the story is the "joke".
Memes are not non-jokes or meta jokes. Memes are refillable jokes in a known container. You see the picture, you know what the joke is. Just like a TV sitcom is storytelling/joketelling in a refillable container. You see the kooky friend character, you know how he/she is going to react.
The aristocrats joke happens to also be a refillable container (you can tell/retell it however you want), but that's not the part that makes it a non-joke.
I bought a lot of bitcoins that were locked up in Mt. Gox. The lawsuit has been ongoing in Japan for many years now. Every 6 months or so they say they are just about to pay out the claims.
OP here. I went ahead and tried using slicing instead of startswith() in a few places, and moving the regexes out to module variables. (Thanks for the suggestions!) It did speed up, but just a little bit (from ~32 secs to ~31 secs).
I may try changing the "not doneParsingEvent" checks into something that uses "continue", I'd imagine that would probably help. (although the Rust code is structured the same way) Another thing I've considered is looking at the data and figuring out which cases are more common and checking for those first...
I think the double caching is why "some more cleanup to cache compiled regular expressions that I thought would also speed things up a little" didn't actually speed things up - it was already cached. (See Python's re.py in def _compile(): where it uses '_cache'.)
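If you want to see that caching for yourself: as a CPython implementation detail, compiling the same pattern with the same flags twice hands back the very same object from the internal cache (a quick sketch, not from the OP's code):

```python
import re

# CPython's re module memoizes compiled patterns (see _compile/_cache
# in Lib/re), so a repeated compile of an identical pattern and flags
# returns the cached pattern object instead of recompiling.
p1 = re.compile(r"\d+")
p2 = re.compile(r"\d+")
print(p1 is p2)  # True on CPython, thanks to the internal cache
```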
Did you post something where you profiled your code? I didn't see it. The cProfile module gives a decent first pass at figuring out bottlenecks, and the bottleneck is sometimes not where you think it is.
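For anyone following along, a minimal cProfile run looks something like this (parse_events is just a stand-in for the OP's actual parsing loop):

```python
import cProfile
import io
import pstats

def parse_events(n):
    # Stand-in workload for the real event parser.
    total = 0
    for i in range(n):
        total += i % 7
    return total

pr = cProfile.Profile()
pr.enable()
parse_events(200_000)
pr.disable()

# Print the five functions with the most time spent in them directly.
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("tottime").print_stats(5)
print(out.getvalue())
```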
If I find that the time is spent in one or a handful of functions, and the slow-down still isn't obvious, I'll then use line_profiler for more fine-grained profiling.
There are other profiling tools, but I haven't needed to use them for many years for my current projects.
Yup, that makes a lot of sense - I didn't realize Python cached so many previous regexes, which is nice!
I did use cprofile a while back and I think at the time it showed a lot of time in the regexp matching. (I don't think I posted about it though) Honestly, once I implemented the parallelism I got less motivated to make it faster. But now I am kinda curious what the speed difference is between Rust and Python if I spend time trying to optimize both.
Rust should be a lot faster than Python if your time is mostly spent parsing the contents of those lines. Bear in mind that each Python op-code executes many extra machine instructions just to handle the virtual machine overhead.
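You can get a feel for that overhead by disassembling even a trivial function; every one of the op-codes below is a trip through the interpreter's dispatch loop (a sketch, not from the OP's code):

```python
import dis

def bump(outs):
    return outs + 1

# Each opname here is one round through CPython's eval loop, on top of
# the actual add that a compiled language would do in a few instructions.
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(ops)
```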
I see a number of micro-optimizations that may give you a few percent more in Python.
For example, you use GameSituation as a mutable way to maintain parse state. You modify it with things like "gameSituation.outs += 1".
Mutating instance attributes has a much higher overhead in CPython than in C/C++ (and presumably Rust). You can reduce some of that overhead by telling the class which slots to have.
Consider "spam.py" containing the following:

    class Foo:
        def __init__(self):
            self.a = 0

    class Bar(Foo):
        __slots__ = ("a",)

    % python -m timeit -s 'import spam; x=spam.Foo()' 'x.a = 3'
    5000000 loops, best of 5: 41 nsec per loop
    % python -m timeit -s 'import spam; x=spam.Bar()' 'x.a = 3'
    10000000 loops, best of 5: 32.4 nsec per loop
If you replace your GameSituation with a dict then you can get a little faster, but not enough to worry about.
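A rough way to measure that dict-vs-attribute difference yourself (GameSituation here is a hypothetical stand-in, and the numbers will vary by machine and CPython version):

```python
import timeit

class GameSituation:
    __slots__ = ("outs",)
    def __init__(self):
        self.outs = 0

gs = GameSituation()
d = {"outs": 0}

# Same mutation, expressed as an attribute store vs. a dict item store.
t_attr = timeit.timeit("gs.outs += 1", globals={"gs": gs}, number=500_000)
t_dict = timeit.timeit("d['outs'] += 1", globals={"d": d}, number=500_000)
print(f"attribute: {t_attr:.3f}s  dict: {t_dict:.3f}s")
```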
Another micro-optimization is to reduce the number of temporary strings. Consider:
    if (batterEvent.startswith('W+') or batterEvent.startswith('IW+') or batterEvent.startswith('I+')):
        tempEvent = batterEvent[2:]
If you track the current offset in the string, then you can do things like:
    if batterEvent[i:i+2] in ("W+", "I+") or batterEvent[i:i+3] == "IW+":
        i += 2
and use the start position parameter in the re.match() calls.
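Concretely, a precompiled pattern's match() accepts a start position, so you can keep advancing an index instead of slicing off new temporary strings (the event string and pattern below are made up for illustration, not taken from real Retrosheet data; note this version advances 3 for the 3-character "IW+" prefix):

```python
import re

# Hypothetical play-by-play fragment; the pattern is illustrative only.
event = "IW+SB2"
steal = re.compile(r"SB[23H]")

i = 0
if event[i:i+3] == "IW+":          # intentional-walk prefix, 3 chars
    i += 3
elif event[i:i+2] in ("W+", "I+"):  # 2-char prefixes
    i += 2

# match() at position i avoids creating the temporary event[i:] slice.
m = steal.match(event, i)
print(m.group(0) if m else None)  # SB2
```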
(BTW, there appears to be a bug in your original code: "IW+" is 3 letters long, but the slice batterEvent[2:] only strips 2.)
Another BTW, you might change "for line in f.readlines()" to "for line in f". Shouldn't affect performance but should reduce your overall memory use.
In closing, character-level string processing in CPython is slow so I doubt you'll get all that much faster.
You might try pypy, but with the number of temporary strings you create, my guess is pypy still won't be that much faster. Should be easy to test though.
The Tech Solidarity guide at https://techsolidarity.org/resources/security_key_gmail.htm has detailed instructions on how to set up 2FA with U2F and then remove SMS 2FA. (I've been holding off because I use Firefox - hopefully U2F will get more support soon!)