Would the data loading be where you're getting the most benefit?

I'm curious because most of the big libraries are already just CUDA calls anyway, but I'm always interested in anything that speeds up the full process.

I can't speak for the parent commenter, but there is often code processing the input/output of machine learning models that benefits from high-performance implementations. To give two examples:

1. We recently implemented an edit tree lemmatizer for spaCy. The machine learning model predicts labels that map to edit trees. However, in order to lemmatize tokens, the trees need to be applied. I implemented all the tree wrangling in Cython to speed up processing and save memory (trees are encoded as compact C unions; a rough sketch of the idea is below, after the second example):

https://github.com/explosion/spaCy/blob/master/spacy/pipelin...

2. I am working on a biaffine parser for spaCy. Most implementations of biaffine parsing use a Python implementation of MST decoding, which is unfortunately quite slow. Some people have reported that decoding dominates parsing time (rather than applying an expensive transformer + biaffine layer). I have implemented MST decoding in Cython and it barely shows up in profiles (a sketch of the kind of inner loop is below):

https://github.com/explosion/spacy-experimental/blob/master/...
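
For anyone curious what example 1 looks like concretely, here is a rough sketch of the union idea in Cython. This is not spaCy's actual layout, just an illustration of the technique: every node is either a "match" node that splits the form and points at two child trees, or a "substitution" node that swaps one stored string for another, and the two variants share the same storage. The index fields are assumed to point into flat arrays owned by the lemmatizer.

    # cython: language_level=3

    cdef struct MatchNodeC:
        int prefix_len
        int suffix_len
        int prefix_tree   # index of the child tree applied to the prefix, -1 if none
        int suffix_tree   # index of the child tree applied to the suffix, -1 if none

    cdef struct SubstNodeC:
        int orig          # index into a string table: substring to replace
        int subst         # index into a string table: replacement

    cdef union NodeC:
        MatchNodeC match
        SubstNodeC subst

    cdef struct EditTreeC:
        bint is_match_node
        NodeC inner

    def node_size():
        # Both variants share the union's storage, so every node costs
        # sizeof(EditTreeC) bytes regardless of which variant it holds.
        return sizeof(EditTreeC)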
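
And for example 2, a hedged sketch of the kind of inner loop involved. This is not spaCy's decoder and not the full Chu-Liu/Edmonds algorithm, only the greedy head selection and cycle check that its contraction loop runs over and over, to show the sort of code Cython turns into plain C array indexing once the scores live in a typed memoryview:

    # cython: language_level=3, boundscheck=False, wraparound=False
    import numpy as np

    def greedy_heads(double[:, :] scores):
        # scores[h, d] is the score of attaching dependent d to head h;
        # token 0 is the artificial root and keeps itself as head.
        cdef Py_ssize_t n = scores.shape[0]
        cdef Py_ssize_t h, d, best
        heads = np.zeros(n, dtype=np.intp)
        cdef Py_ssize_t[:] heads_v = heads
        for d in range(1, n):
            best = 0
            for h in range(n):
                if h != d and scores[h, d] > scores[best, d]:
                    best = h
            heads_v[d] = best
        return heads

    def has_cycle(Py_ssize_t[:] heads):
        # Walk up the head pointers from every token; in a tree each
        # walk reaches the root within n steps, so a longer walk means
        # we are going around a cycle that the decoder must contract.
        cdef Py_ssize_t n = heads.shape[0]
        cdef Py_ssize_t i, cur, steps
        for i in range(1, n):
            cur = i
            steps = 0
            while cur != 0 and steps <= n:
                cur = heads[cur]
                steps += 1
            if steps > n:
                return True
        return False

These O(n^2) scans are the part that is painful as interpreted Python but negligible as compiled C, which is why the decoder stops showing up next to the transformer.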


In this case it was multicore computation without the GIL, if I remember correctly.
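
That pattern in Cython usually looks something like the sketch below: once the loop only touches C-level data, the GIL can be released and cython.parallel.prange hands the iterations to OpenMP threads. The row-sum example is made up rather than the actual workload, and the module has to be compiled with OpenMP flags (e.g. -fopenmp); without them prange falls back to a serial loop.

    # cython: language_level=3, boundscheck=False, wraparound=False
    import numpy as np
    from cython.parallel import prange

    def row_sums(double[:, :] data):
        # Sum each row of a float64 matrix. prange releases the GIL for
        # the loop body, so the rows are processed on several cores at
        # once when OpenMP is enabled at build time.
        cdef Py_ssize_t n = data.shape[0]
        cdef Py_ssize_t m = data.shape[1]
        cdef Py_ssize_t i, j
        out = np.zeros(n)
        cdef double[:] out_v = out
        for i in prange(n, nogil=True):
            out_v[i] = 0.0
            for j in range(m):
                out_v[i] += data[i, j]
        return out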
