Hacker News | p0's comments

How does this compare to the BPE crate [1]? Its main selling point is support for incrementally re-tokenising text, but it's also faster than tiktoken.

[1] https://crates.io/crates/bpe


I'm working on incremental re-tokenizing next. Then I'll run some benchmarks against this crate too.


Check out https://cs.github.com/about/syntax -- indeed, by default terms are searched in both content and paths. You can restrict to one or the other with `content:` or `path:`.


Then their documentation is wrong. I learned the hard way that GitHub code search didn't search file names in my case. I searched for a short bare string made up of a few letters and one underscore, and it failed to find the file with that exact string in its name, which cost me a lot of time.

Unfortunately I can't reproduce the problem publicly because it happened while searching a private repo.


You could do this with `-path:docs/` (or `NOT path:docs/`).

