Hacker News
Show HN: structured-ripgrep – Ripgrep over structured data (github.com/orf)
15 points by orf on March 10, 2023 | 2 comments



While this is cool, I don't see the relation to ripgrep.

Ripgrep is about searching very quickly, and about extracting fixed strings from the pattern so it can quickly skip over data. Looking at the JSON format, it literally parses every line as JSON into an in-memory structure, extracts the selected value, and then runs a regex on that.

To be more ripgrep-like, it should turn the pattern into something that can be searched over the whole file (the example searching for LIMITED$ in bank_name could become the regex "bank_name".*LIMITED"), and only once a potential match is found should it actually parse the JSON and validate it. Extracting the pattern is going to be relatively complex, as you need to consider every way a JSON string can be escaped or not escaped, so your pattern has to accept every variant (at least two for every character, since any character can also be written as a \uXXXX hex escape).
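A minimal Python sketch of that prefilter-then-validate idea, assuming JSON Lines input (one object per line) and a pattern that is a literal suffix such as LIMITED$. The names `search_suffix`, `json_literal_pattern` and the other helpers are hypothetical, not structured-ripgrep's code. Each character of the key and the suffix is expanded into every form a JSON string allows: written literally, as a short escape (`\n`, `\"`, ...), or as `\uXXXX` with either hex case (a UTF-16 surrogate pair for characters outside the BMP).

```python
import json
import re
from typing import Iterable, Iterator

# Short escapes JSON allows in addition to \uXXXX (RFC 8259, section 7).
SHORT_ESCAPES = {
    '"': '\\"', '\\': '\\\\', '/': '\\/',
    '\b': '\\b', '\f': '\\f', '\n': '\\n', '\r': '\\r', '\t': '\\t',
}


def hex_escape_pattern(code: int) -> str:
    """Regex for one \\uXXXX escape, accepting upper- or lower-case hex digits."""
    digits = f"{code:04x}"
    return r"\\u" + "".join(f"[{d}{d.upper()}]" if d.isalpha() else d for d in digits)


def json_char_pattern(ch: str) -> str:
    """Regex matching every way a JSON string may encode the character `ch`."""
    code = ord(ch)
    options = []
    if ch not in '"\\' and code >= 0x20:
        options.append(re.escape(ch))                  # written literally
    if ch in SHORT_ESCAPES:
        options.append(re.escape(SHORT_ESCAPES[ch]))   # e.g. \n or \"
    if code <= 0xFFFF:
        options.append(hex_escape_pattern(code))       # e.g. \u004c
    else:
        # Characters outside the BMP are escaped as a UTF-16 surrogate pair.
        offset = code - 0x10000
        options.append(hex_escape_pattern(0xD800 + (offset >> 10))
                       + hex_escape_pattern(0xDC00 + (offset & 0x3FF)))
    return "(?:" + "|".join(options) + ")"


def json_literal_pattern(text: str) -> str:
    """Regex matching `text` as it may appear inside a raw JSON string."""
    return "".join(json_char_pattern(ch) for ch in text)


def search_suffix(lines: Iterable[str], key: str, suffix: str) -> Iterator[dict]:
    """Yield records whose top-level string field `key` ends with `suffix`."""
    # Cheap scan over the raw text: "bank_name".*LIMITED" with every escape variant.
    prefilter = re.compile(
        '"' + json_literal_pattern(key) + '".*' + json_literal_pattern(suffix) + '"'
    )
    for line in lines:
        if not prefilter.search(line):
            continue  # this line cannot match, so skip the JSON parse entirely
        record = json.loads(line)  # only candidate lines are parsed
        value = record.get(key) if isinstance(record, dict) else None
        # Validate on the decoded value; endswith is the literal form of SUFFIX$.
        if isinstance(value, str) and value.endswith(suffix):
            yield record
```

The raw-text scan is allowed to produce false positives (for example LIMITED appearing in a different field), which the JSON parse then rejects; what it must never do is skip a line that would have matched, which is why every escape variant has to be included.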

Doing this for tar is going to be much harder, as you don't have a clear way to seek back to the start of the file. However, it is probably less important: as long as you are using a streaming parser, parsing the tar metadata will likely be a small part of the cost, especially if the files aren't tiny.
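A sketch of the streaming side using Python's standard `tarfile` module in `"r|*"` mode, which reads members strictly in order and never seeks back. `grep_tar_stream` is a hypothetical helper and makes no attempt at a raw-byte prefilter; it only illustrates that each member costs one header parse, after which the time goes to scanning that member's contents.

```python
import io
import re
import tarfile
from typing import BinaryIO, Iterator, Tuple


def grep_tar_stream(stream: BinaryIO, pattern: bytes) -> Iterator[Tuple[str, int]]:
    """Yield (member name, line number) for every line matching `pattern`."""
    regex = re.compile(pattern)
    # "r|*": streaming read with transparent decompression; no random access.
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        for member in tar:  # parses the next 512-byte header block(s)
            if not member.isfile():
                continue
            f = tar.extractfile(member)  # only readable until the next member
            if f is None:
                continue
            # Content scanning is where the time goes for non-tiny files.
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    yield member.name, lineno
```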


While I think it's great to explicitly cite ripgrep as an inspiration, and even to allude to it in the executable name, it doesn't feel right to me to call this structured-ripgrep without a close relationship or shared authorship with the original ripgrep.



