
Or you could, you know, clone the repository into a local working tree.


Have you ever read this HN comment from 2007 before? [0]

> I have a few qualms with this app...

Different paths towards the same outcome should multiply, so we can increase the surface area of the bandages on the different pain points along the way.

[0] https://news.ycombinator.com/item?id=9224


Snarky drive-by comments like this are the worst part of HN.


That's why lobste.rs was created.


Any chance you have an invite and are feeling generous?


That works well for a small repo, or a few repos, but if you want to find all .cc files, on all release branches, across your entire company and check them for some exploit, it is helpful to have a VFS. It also means you could support N SCMs through one API: you just need to write a new VFS backend.
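A stdlib-only sketch of that kind of company-wide scan, assuming the VFS is mounted at some local path (the mount layout and the pattern being searched for are hypothetical, not from any particular tool):

```python
from pathlib import Path

def scan_for_pattern(mount_point, pattern, suffix=".cc"):
    """Grep every *.cc file under a mounted VFS (all repos, all release
    branches the mount exposes) for a suspect pattern, without cloning."""
    hits = []
    for path in Path(mount_point).rglob(f"*{suffix}"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # a virtual file may fail to materialize; skip it
        if pattern in text:
            hits.append(str(path))
    return sorted(hits)
```

The scanner itself never knows which SCM backs the mount; that's the point of putting the abstraction at the filesystem layer.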


Isn't there already a good way to push computation closer to the data?

GmailFS, pyfilesystem (a Python filesystem abstraction layer), and rclone are neat as well.

https://stackoverflow.com/questions/1960799/how-to-use-git-a... explains the `git push` workflow that git-remote-dropbox enables: https://github.com/anishathalye/git-remote-dropbox


GitHub also has a code search now: https://cs.github.com


Needing to tie into a specific API (like code search) couples you to a specific storage backend (GitHub). If you build your software to operate on a POSIX-y file system, you can support anything that shows up as a file system: a local working tree, an NFS share, or now a remote git repository.
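A minimal stdlib-only illustration of that portability argument: this function mentions no backend at all, so it runs unchanged over a local checkout, an NFS mount, or a FUSE-mounted remote repo (whatever happens to be at `root`):

```python
import os

def tree_size(root):
    """Sum the size of every regular file under root using only
    generic directory traversal; the storage backend is irrelevant."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # broken symlink, permission error, etc.
    return total
```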


Running the code where the data already is saves network transfer: with data locality, you don't need to download each file before grepping.

The "Matrix multiplication" section of Wikipedia's locality-of-reference article explains how the cache-miss penalty motivates optimizations like loop reordering: https://en.wikipedia.org/wiki/Locality_of_reference#Matrix_m...
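The loop-reordering idea from that article can be sketched in plain Python: swapping the two inner loops makes both operands stream through memory row by row instead of walking `B` column-wise. (In CPython the speedup is muted by object boxing, but the access-pattern difference is exactly the one the article describes.)

```python
def matmul_ijk(A, B, n):
    """Naive order: the inner loop reads B[k][j] down a column of B,
    touching a different row object on every iteration (poor locality)."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_ikj(A, B, n):
    """Reordered: the inner loop scans one row of B and one row of C
    sequentially, so accesses are contiguous (good locality)."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_c = C[i]
        for k in range(n):
            a_ik = A[i][k]
            row_b = B[k]
            for j in range(n):
                row_c[j] += a_ik * row_b[j]
    return C
```

Both orders compute the same product; only the memory-access pattern differs.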


Stop for a moment and consider what the tradeoffs of that could be and why it might not work well in some situations.


- Higher latency

- Less efficient use of bandwidth, as the git protocol is optimised for bulk transfers

- Not resilient against unreliable connectivity

- No support for repositories not hosted on GitHub

Yes, I can think of some.



