Greppability is an underrated code metric (morizbuesing.com)
1217 points by thunderbong 9 months ago | 613 comments



Grepping for symbols like function names and class names feels so anemic compared to using a tool that has a syntactic understanding of the code. Just "go to definition" and "find usages" alone reduce the need for text search enormously.

For the past decade-plus I have mostly searched only for user-facing strings. Those have the advantage of being longer, so they're easier to search for.

Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.


Scenarios where an IDE with full syntactic understanding is better:

- It's your day to day project and you expect to be working in it for a long time.

Scenarios where grepping is more useful:

- Your language has #ifdef or equivalent syntax that does conditional compilation, making syntactic tools incomplete.

- You just opened the project for the first time.

- It's in a language you don't daily drive (you write backend but have to delve into frontend code, it's a 3rd party library, it's configuration files, random json/xml files or data)

- You're editing or searching through documentation.

- You haven't even downloaded the project and are checking things out in github (or some similar site for your project).

- You're providing remote assistance to someone and you are not at your main development machine.

- You're remoting via SSH and have access to code there (say it's a python server).

Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other use cases.


Further important (to me) scenarios that also argue for greppability:

- greppability does not preclude IDE or language server tooling; there are often special cases where only certain e.g. context-dependent usages matter, and sometimes grep is the easiest way to find those.

- projects that include multiple languages, such as the fairly common setup of HTML, JS, CSS, SQL, and some server-side language.

- performance in scenarios with huge amounts of code, or where you're searching very often (e.g. in each git commit for some amount of history)

- ease of use across repositories (e.g. a client app, a spec, and a server app in separate repos).

I treat greppability as an almost universal default. I'd much rather have code in a "weird" naming style in some language but with consistent identifiers across languages, than have normal-style-guide default identifiers in each language but differing identifiers across languages. If code "looks weird", that's if anything often actually a _benefit_ in such cases, not a downside - most serialization libraries I use for this kind of stuff tend to do a lot of automagic mapping that can break in ways that are sometimes hard to detect at compile time if somebody renames something, or sometimes even just for a casing change or type change. Having a hint as to this fragility immediately at a glance, even in dynamically typed languages, is sometimes a nice side-effect. Very speculatively, I wouldn't be surprised if AI coding tools can deal with consistent names better than context-dependent ones too; greppability is likely not merely about the tool grep itself.
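As a made-up sketch of what I mean (the class and field name are invented): keeping the Python attribute spelled exactly like the JSON key means one grep for that string finds both sides, even if the attribute looks non-idiomatic for Python.

    # Invented example: the attribute and the JSON key (and, by assumption, the
    # DB column) all use the literal string "shippingAddressId", so
    # `grep -r shippingAddressId` hits every layer at once.
    import dataclasses
    import json

    @dataclasses.dataclass
    class Order:
        shippingAddressId: int  # deliberately not snake_case, to match the other layers

    print(json.dumps(dataclasses.asdict(Order(shippingAddressId=42))))
    # -> {"shippingAddressId": 42}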

And the best part is that there's almost no downside; it's not like you need to pick either a language server, IDE or grep - just use whatever is most convenient for each task.


I'd say that writing this example from the article is a pretty big downside, especially if you had e.g. 20 different cases and not just 2.

```ts
const getTableName = (addressType: 'shipping' | 'billing') => {
  if (addressType === 'shipping') {
    return 'shipping_addresses'
  }
  if (addressType === 'billing') {
    return 'billing_addresses'
  }
  throw new TypeError('addressType must be billing or shipping')
}
```


Grep is also useful when IDE indexing isn't feasible for the entire project. At past employers I worked in monorepos where the sheer size of the index caused multiple seconds of delay in intellisense and UI stuttering; our devex team's preferred approach was to better integrate our IDE experience with the build system such that only symbols in scope of the module you were working on would be loaded. This was usually fine, and it works especially well for product teams, but it's a headache when you're doing cross-cutting work (e.g. for infrastructure projects/overhauls).

We also had a livegrep instance that we could use to grep any corporate repo, regardless of where it was hosted. That was extremely useful for investigating failures in build scripts that spanned multiple repositories (e.g. building a Go sidecar that relies on a service config in the Java monorepo).


If you run into this, make sure to enable 64-bit IntelliSense and increase the RAM limit; by default it is 4 GB.


As someone who runs into that daily, I'm surprised I never heard of this before.

I seem to have found the 64-bit mode under "Tools > Options" then "Text Editor > C/C++ > IntelliSense". The top option is [] Enable 64-bit IntelliSense.

But I can't seem to find the ram limit you mentioned and searching for it just keeps bringing up stuff related to vscode. Do you know where it is off the top of your head or a page that might describe it?


The RAM limit comes from 32-bit IntelliSense: 2^32 is 4 GiB.

Edit: I take that back, this was a first-principles comment. There's a setting 'C_Cpp: Intelli Sense Memory Limit' (space included).


Thanks for that; searching Google for it only led me to VS Code's IntelliSense settings. Searching for the "Intelli Sense Memory Limit" setting in Visual Studio didn't lead me right to the result, but it did give me a whole settings page that "matched". I found that the setting in Visual Studio is "IntelliSense Process Memory Limit", which is under "Text Editor > C/C++ > Advanced", then under the "IntelliSense" header towards the bottom of the section.


> It's your day to day project and you expect to be working in it for a long time.

I don't think we need to restrict the benefits quite that much—if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump to definition and friends than to try to grep and hope that the developers made it grepable.

Going further, I'd equally rather have plugins ready to go for every language my company works in and use them for exploring a foreign codebase. The navigation tools all work more or less the same, so it's not like I need to invest effort learning a new tool in order to benefit from navigation.

> Yes, an IDE will save you time daily driving. But there's no reason to sabotage all the other usecases.

Certainly don't sabotage, but some of these suggestions are bad for other reasons that aren't about grep.

For example: breaking the naming conventions of your language in order to avoid remapping is questionable at best. Operating like that binds your business logic way too tightly to the database representation, and while "just return the db object" sounds like a good optimization in theory, I've never not regretted having frontend code that assumes it's operating directly on database objects.


> if it's a project that isn't my day-to-day but is in a language I already have set up in my IDE, I'd much prefer to open it up in my IDE and use jump to definition and friends than to try to grep and hope that the developers made it grepable.

It's funny, because my preference and actual use is the exact opposite: for a project that isn't my day-to-day, I'm much more likely to try to grep through it rather than open it in an IDE.


> if it's a project that isn't my day-to-day

Another overlooked advantage of greppability is being able to do a fuzzy search, or discover related code that wasn't directly linked to what you were looking for.

For instance if you were hunting for the method updating a `foo_bar` instance, grepping for it will also give you instances of `generic_foo_bar` and `shim_foo_bar`. It can be noise, but it can also be stuff you wouldn't have seen otherwise and save your bacon. If you're not familiar with a project I think it's quite an advantage.

> hope that the developers made it grepable

hopefully it's enforced at an organization level.


- You're fully aware that it would be better to be able to use tooling for $THING, but tooling doesn't exist yet or is immature.


you would not believe the amount of time i spent pretty-printing python dicts by hand last week



yeah, pprint is why i was doing it by hand ;)


I used to pipe things through black for that. (a script that imported black, not just black on the command line.)

I also had `j2p` and `p2j` that would convert between python (formatted via black) and json (formatted via jq), and the `j2p_clip`/`p2j_clip` versions that would pipe from clipboard and back into clipboards.

It's worth taking the time to build a few simple scripts for things you do a lot. I used to open up the repl and import json to convert between json and python dicts multiple times a day, so spending a few minutes throwing together a simple script to do it was well worth the effort.
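A minimal version of that kind of script can be just a few lines. This is a sketch of the idea, not the exact scripts (the function names and the use of black's API here are my own guesses; clipboard handling is omitted):

    # Convert between a Python literal and JSON on stdin/stdout.
    # Requires `pip install black`; everything else is stdlib.
    import ast
    import json
    import sys

    import black

    def python_to_json(text: str) -> str:
        # Safely evaluate a Python literal, then pretty-print it as JSON (jq-ish).
        return json.dumps(ast.literal_eval(text), indent=2, sort_keys=True)

    def json_to_python(text: str) -> str:
        # Parse JSON, then let black format the repr of the resulting object.
        return black.format_str(repr(json.loads(text)), mode=black.FileMode())

    if __name__ == "__main__":
        data = sys.stdin.read()
        print(json_to_python(data) if "j2p" in sys.argv else python_to_json(data))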


part of what i ended up with was this:

    {'country': ['25', '32', '6', '37', '72', '22', '17', '39', '14', '10',
                 '35', '43', '56', '36', '110', '11', '26', '12', '4', '5'],
     'timeZone': '8', 'dateFrom': '2024-05-01', 'dateTo': '2024-05-30',

black is the opposite extreme from what i wanted; https://black.readthedocs.io/en/stable/the_black_code_style/... explains:

> If a data structure literal (tuple, list, set, dict) or a line of “from” imports cannot fit in the allotted length, it’s always split into one element per line.

i'm not interested in minimizing diffs. i'm interested in being able to see all the fields of one record on one screen—moreover, i'd like to be able to see more than one record at a time so i can compare what's the same and what's different

black seems to be designed for the kind of person who always eats at mcdonald's when they travel because they value predictability over quality


My understanding of black is that it solves bikeshedding by making everyone a little unhappy.

For aligned column readability and other scenarios, # fmt: off and # fmt: on become crucial. The problem is that like # type: ignore, those start spreading if you're not careful.


My only complaint with black is that it only splits long definitions into one-per-line if they exceed a limit. That’s probably configurable, now that I write it down.

Other than that, I actually quite like its formatting choices.


Line length is definitely configurable. All it takes is adding the following in pyproject.toml[1]:

  [tool.black]
  line-length = 100

Aside from matrix-like or column-aligned data, the only truly awful thing I've encountered has been broken f-string handling[2].

[1]: Example from https://github.com/pythonarcade/arcade/blob/808e1dafcf1da30f...

[2]: https://github.com/psf/black/issues/4389


yeah; unless your coworkers are hindu, you can solve 'bikeshedding' about which restaurant to go to by going to mcdonald's, too


Fair. I spent some time trying to figure out how to make it do roughly that before giving up.


i kind of get the vibe from the black documentation that it's written by the kind of person who thinks we're bad people for wanting that, and perhaps that everyone should wear the same uniform because vanity is sinful and aesthetics are frivolous


we keep having similar problems, lol.


You forgot massive codebases. Language servers really struggle with anything on the order of the Linux kernel, FreeBSD, or Chromium.


I honestly suspect that the amount of time spent dealing with the issues monorepos cause is net-larger than the gains most get from what a monorepo offers. It's just harder to measure because it tends to degrade slowly, happen to things you didn't realize you were relying on (until you need them), and without clear ways to point fingers at the cause.

Plus it means your engs don't learn how to deal with open source code concerns, e.g. libraries, forking, dependency management. Which gradually screws over the whole ecosystem.

If you're willing to put Google-scale effort into building your tooling, sure. Every problem is solvable. Only Google does that though, everyone else is getting by with a tiny fraction of the resources and doesn't already have a solid foundation to reduce those maintenance costs.


The projects mentioned were all single projects with single repos


Sure. But those are far from the only massive codebases out there, and many of the biggest are monorepos because sorta by definition they are the size of multiple projects.


IMO the biggest problem with big projects is finding the right file and finding the right lines of code, so greppability is even more helpful with big repos.


clangd works fine for me with the linux kernel. For best results build the kernel with clang by setting LLVM=1 and KERNEL_LLVM=1 in the build environment and run ./scripts/clang-tools/gen_compile_commands.py after building.


Ok, but now every time you switch commits you have to wait for clangd to reindex. Grepping in the kernel is just as fast and you can do it without running a 4GB+ process that takes 10+ minutes to index


Sure. I wasn't intending to claim that there is no reason to care about greppability. Just providing some tips about getting clangd to work with linux for those who might find that useful.


>It's your day to day project and you expect to be working in it for a long time.

Bold of everyone here to assume that everyone has a day to day project. If you're a consultant, or for other reasons you're switching projects on a month to month basis, greppability is probably the top metric, second only to UT coverage.


They said the scenario in which that would be useful was IF: "It's your day to day project and you expect to be working in it for a long time". The implication being that if neither of those hold then skip to the next section.

I don't think anyone is assuming anything here. I've contracted for most of my career and this didn't seem like an outlandish statement.

Also, if you're working in a project for a month, odds are you could set up an IDE in the first few hours. Not sure how any of this rises to the level of being "bold".


> - Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

You need a better IDE.

> - You just opened the project for the first time.

Go grab a coffee

> - It's in a language you don't daily drive

Jetbrains all products pack, baby.

> - You haven't even downloaded the project and are checking things out in github (or some similar site for your project).

On GitHub, press `.` to open it in a web-based vscode. Download it & open it in your IDE while you are doing this.

> - You're remoting via SSH and have access to code there (say it's a python server).

Don't do this. Check the git hash that was deployed and check out the code locally.


> You need a better IDE.

No IDE will resolve this if the same code is preprocessed more than once to produce different files; or if you often build with different and conflicting preprocessing values; or if your build uses a tool your IDE doesn't know about; or if some of the preprocessing and compilation occurs at runtime.

> Go grab a coffee

So, you're saying "wait".

> Jetbrains all products pack, baby.

JetBrains CLion won't even try to properly index C/C++ files which aren't officially part of the project.

Plus, if you have errors in some places - which happens while you're editing your code - then that breaks very badly with JetBrains IDEs, e.g. a missing closing paren or #endif.


> - You're remoting via SSH and have access to code there (say it's a python server).

VSCode SSH Extension for the win.


"I've not personally found the things you mentioned to ever be a problem, therefore they aren't actually real problems and you just need to get good"


> - Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

LSP-based tools are generally fine with this, as long as compile_commands.json or equivalent is available. A syntactic understanding is an incomplete solution; I suspect GP meant LSP.

Many of those other caveats are non-issues once LSPs are widespread. Even Github has lsp-like go-to-def/go-to-ref, though it's not perfect.


"Go to definition" often doesn't work in dynamic languages like Python without type hints; it might not work when the code is dynamically generated.


It always works in VSCode if your environment is correctly configured.


No; Python is a dynamic language and when you see a method call like

    x.do_something()

you might not know what the type of x is and what method do_something refers to.


> Your language has #ifdef or equivalent syntax which does conditional compilation making syntactic tools incomplete.

Your other points make sense, but in this case, at least for C/C++, you can generate a compile_commands.json that will let clangd interpret your code accurately.

If building with make just do `bear -- make` instead of `make`. If building with cmake pass `-DCMAKE_EXPORT_COMPILE_COMMANDS=1`.


Does it evaluate macros? Because macros allow for arbitrary computation.


The macros I see in the real world seem to usually work fine. I’m sure it’s not perfect and you can construct a macro that would confuse it, but it’s a lot better than not having a compilation db at all.


- you just switched branch/rebased and the index is not up to date.

- the project is large enough that the IDE can't cope.

- you want to also match comments, commented out code or in-project documentation

- you want fuzzy search and match similarly named functions

I use clangd integration in my IDE all the time, but often brute force is the right solution.


I abandoned VSCode and went back to vim + ctags + ripgrep after a year with the most popular IDE. I miss some features but it didn’t give me a 10x or even 1.5x improvement in my own work along any dimension.

I attribute that mostly to my several decades of experience with vi(m) and command line tools, not to anything inherently bad about VSCode.

What counts as “better” tools has a lot of subjectivity and circumstances implied. No one set of tools works for everyone. I very often have to work over ssh on servers that don’t allow installing anything, much less Node and npm for VSCode, so I invest my time in the tools that always work everywhere, for the work I do.

The main project I’ve worked on for the last few years has a little less than 500,000 lines of code. VSCode’s LSP takes a few seconds fairly often to maintain the LSP indexes. Running ctags over the same code takes about a second and I can control when that happens. vim has no delays at all, and ripgrep can search all of the files in a second or two.


I have similar feelings... I still use IntelliJ IDEA for JVM languages, but for C, Rust, Go, Python, etc., I've been using vim for years (decades?), and that's just how I prefer to write code in those languages. I do have LSP plugins installed in vim for the languages I work in, and do have a key sequence mapped for jump-to-definition... but I still find myself (rip)grepping through the source at least as often as I j-t-d, maybe more often.


Did you consider Neovim? You get the benefit of vim while also being able to mix in as much LSP tooling as you like. The tradeoff is that it takes some time to set up, although that is getting easier.

That won’t make LSP go any faster though. There’s still something interesting in the fact that a ripgrep of every line in the codebase can still be faster than a dedicated tool.


Considered it and have tried repeatedly to get it to work with mixed success. As you wrote, it takes "some time" to set up. In my case it would only offer marginal improvements over plain vim, since I'm not that interested in the LSP integration (and vim has that too, through a plugin).

In the environments I often work in I can't install anything or run processes like node. I ssh into a server and have to use whatever came with the Linux distro, which means sticking with the tools I will find everywhere. I can't copy the code from the server either. If I get lucky they used version control. I know not everyone works with those constraints. I specialize in working on abandoned and legacy code.


Yes ok. And legacy code might be a good example where grep works well, if it's fair to argue a greater propensity for things like preprocessors, older languages and custom builds that may not play as well with semantic-level tools, let alone be written with modern tooling in mind.


Lol, I'm not working with COBOL or Fortran. Legacy code in my world means the original developers have left, not that it dates from the 1970s. Mostly I work with PHP, shell scripts, various flavors of SQL, Python, sometimes Rails or other stuff. All things modern LSPs can handle.


can you not upload executables over ssh, say for policy reasons or disk-space reasons? how about shell scripts?

i mean, i doubt i'm going to come up with some brilliant breakthrough that makes your life easier that you've somehow overlooked, but i'd like to understand what kinds of constraints people like you often confront

i'm just glad you don't have to use teamviewer


I don't have to use TeamViewer, though I very occasionally have to use Windows RDP.

You can transfer any kind of file over ssh. scp, sftp, rsync will all copy binaries. Mainly the issues come down to policy and billable time. Many of my customers simply don't allow installing anything on their servers without a tedious approval process. Even if I can install things I might spin my wheels trying to get it to work in an environment I don't have root privileges on, with no one willing to help, and I can't bill for that time. I don't work for free to get an editor installed. I use the tools I know I can find on any Linux/BSD server.

With some customers I have root privileges and manage the server for them. With others their IT dept has rules I have to follow (I freelance) if I want to keep a good relationship. Since I juggle multiple customers and environments I find it simpler not having to manage different editors and environments, so I mostly stick with the defaults. I do have a .profile and .vimrc I copy around if allowed to, that's about it.

I can't lose time/money and possibly goodwill whining about not having everything just-so for me. I recently worked on a server over ssh that didn't have tmux installed. Fortunately it did have screen, and I can use that too, no big deal. I spent less than 60 seconds figuring that out and getting to work rather than wasting hours of non-billable time annoying someone about how I needed tmux installed.


i see, thanks!

wrt rdp, i feel like rdp is actually better than vnc or x11-over-ssh, but for cases where regular ssh works, i'd rather use ssh

i wasn't thinking in terms of installing tmux, more like a self-contained binary that doesn't require any kind of 'installation'


I used the word "install" but the usual rule says I can't install, upload, or execute any non-approved software. Usually that just gets stated as a policy, but I have seen Linux home directories on noexec partitions -- government agencies and big corporations can get very strict about that. So copying a self-contained binary up and running it would violate the policy.

I pretty much live in ssh. Remote Desktop means a lot of clicking and watching a GUI visibly repaint. Not efficient. Every so often I have customers using applications that only run on Windows, no API, no command line, so they will enable RDP to that, usually through a VPN.


i see! but i guess your .profile and .vimrc don't count?


They aren't executables.


my cousin wrote a vt52 emulator in bash, and i was looking at a macro assembler written in bash the other day: https://github.com/jhswartz/mle-amd64/blob/master/amd64. i haven't seen a cscope written in bash, but you probably remember how the first versions of ctags were written in sh (or csh?) and ed. so there's not much limit to how far shell functions can go in augmenting your programming environment

if awk, python, or perl is accepted, the possibilities expand further


Sure, but this is taking things to a bit of an absurd extreme. If I worked in a restrictive environment where I couldn't install my own tools, I don't think I would be in a position to burn a ton of my employer's time building sophisticated development tools in bash.

(One-off small scripts for things, sure. But I'm not going to implement something like ctags or cscope or a LSP server in bash.)


certainly it's absurd! nobody would deny that. on the other hand, the problem to solve is also an absurd problem

and i wasn't suggesting trying to bill for doing it, but rather, if you were frequently in this situation, it might be reasonable to spend non-billable time between clients doing it


I guess I don’t see the problem as absurd. As a freelancer I need to focus on the problems the customer will pay for. I don’t write code for free or in my spare time anymore; I used to years ago. I feel comfortable working with the constraints imposed, I think of that as a valuable skill, not a handicap.


i see. thank you very much for being willing to share your invaluable experience and hard-won wisdom


There's also helix now, which requires next to no setup, but requires learning new motions (subject is before the verb in helix)


I looked at Helix but since I dream in vim motions at this point (vi user since it came out) I'd have to see a 10x improvement to switch. VSCode didn't give me a 10X improvement, I doubt Helix would.


Helix certainly won't give you a 10x improvement. It tends to convert a lot of people moving "up" from VS Code, and still a decent chunk, though certainly fewer, of neovim users moving "down".

Advantages of Helix are pretty straightforward:

1. Very little configuration bullshit to deal with. There's not even a plugin system yet! You just paste your favorite config file and language/LSP config file and you're good to go. For anything else, submit a pull request.

2. Built in LSP support for basically anything an LSP exists for.

3. There's a bit of a new generation command line IDE forming itself around zellij (tmux that doesn't suck) + helix + yazi (basically nnn or mc on crack, highly recommended).

That whole zellij+helix+yazi environment is frankly a joy to work in, and might be the 2-3x improvement over neovim that makes the switch worth it.


Like I wrote, I looked at Helix. Seems cool but not enough for me to switch. And I would have to install it on the machines I work on, which very often I can't do because of company policies, or can't waste the non-billable time on.

I only recently moved from screen to tmux, and I still have to fall back to screen sometimes because tmux doesn't come with every Linux distro. I expect I will retire before I think tmux (or screen, for that matter) "sucks" to the point I would look at something else. And again I very often can't install things on customer servers anyway.


Tmux does suck pretty bad though?

It conflicts with the clipboard and a bunch of hotkeys, and configuring it never works because they have breaking changes in how their config file works every 6 months or so.

These days I only use it to launch a long running job in ssh to detach the session it's on and leave.


That’s more or less what I use it for — keeping sessions alive. I don’t use 90% of the features. vim does splits, and there’s ctrl-Z to background it and get a shell.

I know I could get more out of tmux but haven’t really needed to. I use it with the default config. I have learned from experience that the less I try to customize my environment the less non-billable time I waste trying to get that working and maintaining it.


You can use a tool like ALE (the Asynchronous Lint Engine) to run LSPs in normal Vim; I've been doing it for years and have no complaints! It's rapid.


VSCode is not an IDE, it's an extensible text editor. IDEs are integrated (it's in the name) and get developed as a whole. I'm 99% certain that if you were forced to spend a couple of months in a real IDE (like IDEA or Rider), you would not want to go back to vim, or any other text editor. Speaking as a long time user of both.


I get your point, but VSCode does far more than text editing. The line between an advanced editor and an IDE gets blurry. If you look at the Wikipedia page about IDEs[1] you see that VSCode ticks off more boxes than not. It has integration with source code control, refactoring, a debugger, etc. With the right combination of extensions it gets really close to an IDE as strictly defined. These days advanced text editor vs. "real" IDE seems more like a distinction without much of a difference.

You may feel 99% certain, but you got it wrong. I have quite a bit of experience with IDEs, you shouldn't assume I use vim out of ignorance. I have worked as a programmer for 40+ years, with development tools (integrated or not) that I have forgotten the names of. That includes "real" IDEs like Visual Studio, Metrowerks CodeWarrior, Symantec Think C, MPW, Oracle SQL Developer, Turbo Pascal, XCode, etc. and so on. When I started programming every mainframe and minicomputer came with an IDE for the platform. Unix came along with the tools broken out after I had worked for several years. In high school I learned programming on an HP-2000 BASIC minicomputer -- an IDE.

So I have spent more than "a couple of months in real IDEs" and I still use vim day to day. If I went back to C++ or C# for Windows I would use Visual Studio, but I don't do that anymore. For the kind of work I do now vim + ctags + ripgrep (and awk, sed, bash, etc.) get my work done. At my very first real job I used PWB/Unix[2] -- PWB means Programmer's Work Bench -- an IDE of sorts. I still use the same tools (on Linux) because they work and I can always count on finding a large subset of them on any server I have to work with.

I don't dislike or mean to crap on IDEs. I have used my share of IDEs and would again if the work called for that. I get what I need from the tools I've chosen, other people make different choices, no perfect language, editor, IDE, what have you exists.

[1] https://en.wikipedia.org/wiki/Integrated_development_environ...

[2] https://en.wikipedia.org/wiki/PWB/UNIX


What IDE existed for the HP 2000? (Where I learned, too. In Portland :)


Me too -- in Portland, Cleveland HS, mid-70s.

The HP 2000 [1] had a timeshared BASIC system that the school district made available to schools, over ASR-33 teletypes with dial-up modems. The BASIC system could edit, run (translate to byte code and execute), and manage files. No version control or debuggers back then. The HP 2000 had another layer of the operating system accessible to administrators (the A000 account if I remember right) but it was the same timeshared BASIC system with some additional commands for managing user accounts and files.

No one familiar with modern IDEs would recognize the HP 2000 BASIC system as an IDE, but it was self-contained and fully integrated around writing BASIC programs. HP also offered FORTRAN for it but not under the timeshared BASIC system. A friend wrote an assembler (in BASIC!) and taking advantage of a glitch in the bytecode interpreter we could load and run programs written in assembly language.

After high school I got a job as night computer operator with the Multnomah County ESD (school district) so I had admin access to the HP 2000, and their two HP 3000 systems, and an IBM computer they used for crunching class registrations. Good times.

Someone had an emulator online for a while, accessible over telnet, but I can't find it now.

[1] https://en.wikipedia.org/wiki/HP_Time-Shared_BASIC


i think it's very reasonable to describe time-shared basic systems like that as ides. the paradigmatic example of an 'ide' is probably turbo pascal 1.0, and of the features that separated turbo pascal from 'unintegrated' editor/compiler/assembler/linker/debugger setups, i think the dartmouth timesharing system falls mostly on the 'ide' side of the line. you could stop your program at any point and inspect its variables, change them, evaluate expressions, change the source code, and continue execution. runtime errors would also pop you into the interactive basic prompt where you could do all those things. i believe the hp 2000 timesharing basic had all these features, too


At the time, in the context of other software development environments (like submitting decks of punch cards) the HP 2000 timeshared BASIC environment would count as an IDE. Compared to Turbo Pascal or any modern IDE it falls short.

HP TSB did not have a REPL. If your program crashed or stopped you could not examine variables from the terminal. You could not peek or poke memory locations as you could with microcomputer BASICs (which didn't support multiple users, so didn't have the security concern). You had to insert PRINT statements to debug the code. TSB BASIC didn't have compile/link steps; it tokenized the code as you entered the lines, and the interpreter amounted to a big switch statement on the tokens. P. J. Brown's book Writing Interactive Compilers and Interpreters (1981) describes how TSB works. Eventually I got the source code to TSB (written in assembler) and figured it out for myself.

Other BASIC implementations that popped up around the same time had richer feature sets. In my senior year at high school I got (unauthorized) access to a couple of Unix systems in Portland, ordered the Bell Labs Technical Journal issues that described Unix and C, and taught myself from those. I didn't get paid to work on a Unix system until several years later (detours into RSTS-11, TOPS-20, VMS, Microdata, Pr1me, others) but I caught the Unix and C bugs young and I still work with those tools every day.

Some programmer friends and more than a few colleagues over the years have made fun of my continued use of what they call obsolete and arcane tools. I don't mind, I have never felt like I did less or sloppier work than my co-workers, and my freelance customers don't care what I use as long as I can solve their problems. Most of the work in programming doesn't happen at the keyboard anyway. I do pay attention and experiment with all kinds of tools but I usually end up going back to the Unix tools I have long familiarity with. That said I did spend many years in Visual Studio, MPW, and CodeWarrior writing C and C++ code, and I do think those tools (well, maybe not MPW) offered a lot of benefits over coding with vim and grep, for the projects I did back then.

Maybe ironically I use an iPad Pro, I don't have a desktop or laptop anymore. So I have the most modern hardware and a touch-based (soon) AI-assisted operating system that runs a terminal emulator during my work time.


thank you! i didn't realize that it lacked the features of the microcomputer basics; i have the impression that they were copying the same dartmouth timesharing system that hp was copying, but of course i've never used dtss myself

what kind of obsolete and arcane tools do you use? vim seems to be pretty popular among the youngsters these days. a friend of mine a bit younger than you prefers ex


My usual programming toolbox includes vim, ctags, bash, rg (or grep if I have to), sed, awk, tmux (or screen), git, and usually a CLI for one of MySQL, PostgreSQL, SQLite, Redis. The only vim plugins I like to have: ctrlp (file fuzzy finder) and advanced text motions.

On the iPad I use Blink shell, the Github client, Apple Notes. I always have a paper notebook and pen, my short-term memory gets less reliable every year.

I have noticed a lot of younger people using or at least learning vim and CLI tools, maybe retreating from the complexity of modern software development setups. Maybe just a retro fad, I don’t know.

I also see terminal setups described online that try to reproduce the GUI experience — lots of colors and planes and widgets. Neovim and zellij both look like that to me — a lot of extraneous functionality and visual clutter that mimics VSCode. I prefer a more minimalist environment. I don’t even use syntax coloring.

Everyone has to find the tools that work for them, that takes time. I think most programmers figure out at some point in their career that continually experimenting and polishing their tools doesn’t always help get the work done, and when you freelance you get more conscious of billable vs. non-billable time.


i think you're mostly more modern than i am, despite being much older. i generally use grep rather than rg, screen rather than tmux, emacs rather than vim for programming (though i use vim on my phone and for writing email), no redis, no fuzzy find, and no ipad. the only differences in the opposite direction are that i usually use python or perl rather than sed and awk, and i do use syntax coloring—for me, terminal colors were one of the big pluses of moving from sunos 4 to linux

i don't think younger people using vim is a fad, but we'll see. vim to me, especially neovim, seems mostly like emacs in vi clothing


I think you're arguing semantics here in a way that's not particularly productive. VSCode can be set up in a way that is nearly as featureful as an IDE like IntelliJ IDEA or Eclipse, and the default configuration and OOB experience pushes you hard in that direction. VSCode is designed for software development, not as a general text editor; I would never open up VSCode to edit a configuration file or type up a text file of notes, for example.

Something like vim is designed as a general text-editing tool. Sure, you can load it up with plugins and scripts that give you a bunch of features you'd find in an IDE, but the experience is not the same, and the "integrated" bit of "IDE" is still just not there.

(And I say this as someone who does most of his coding in vim, with LSP plugins installed, only reaching for a "proper" IDE for Java and Scala.)

One metric I would use: if I can sit down at a random co-worker's desk and feel more or less at home in their editor of choice, then it's probably an IDE that has reasonable defaults and is geared for software development. IDEA and VSCode would qualify... vim would certainly not.


What do you use as a general text editor then? I use vscode; having shortcuts like Ctrl+X, Ctrl+D is super useful.


A good IDE can be so much better iff it understands the code. However this requires the IDE to be able to understand the project structure, dependencies etc., which can be considerable effort. In a codebase with many projects employing several different languages, it becomes hard to reach and maintain the "IDE understands everything" state.


And an IDE would also fail to find references for most of the cases described in the article: name composition/manipulation, naming consistency across language barriers, and flat namespaces in serialization. And file/path folder naming seems to be irrelevant to the smart IDE argument. "Naming things is hard"


And especially in large monorepos anything that understands the code can become quite sluggish. While ripgrep remains fast.

A kind of in-between I've found for some search and replace action is comby (https://comby.dev/). Having a matching braces feature is a godsend for doing some kind of replacements properly.


I think the author's first sentence counters your comment. What you described works best in a familiar codebase where the organizing principles have been maintained well and are familiar to the reader, and the tools are just an extension of those organizing principles. Even then, a deviation from those rules might produce gaps in understanding of what the codebase does.

And grep cuts right through that in a pretty universal way. What the post describes are just ways to not work against grep to optimize for something ephemeral.


Agree. Not just because it's unfamiliar code; you also get a feel for how the program/programmer(s) structured the whole thing.


Go to definition and find usages only work one symbol at a time. I use both, but I still use global find/replace for groups of symbols sharing the same concept.

For example if I want to rename all “Dog” (DogModel, DogView, DogController) symbols to “Wolf”, find/replace is much better at that because it will tell me about symbols I had forgotten about.


Jetbrains ReSharper (and Rider) is smart enough to handle these things. It’ll suggest renames across other symbols, even ones that have related names.


And when it fails to find all the related names, you don't know what it missed.


For that use case I think you can use treesitter[1]: you can find Dog.* but only if it is a variable name, for example, avoiding replacement inside of, say, literals.

[1] https://www.youtube.com/watch?v=MZPR_SC9LzE


There's no reason they have to work one symbol at a time - that's just a missing feature in your language server implementation.

Some language servers support modifying the symbols in contexts like docstrings as well.


I’ve never seen an LSP server that lets you rename “Dog” to “Wolf” where your actual class names are “Dog[A-Za-z]*”?

Do you have an example?


Neither have I; and no, I don't - I misinterpreted what you said.

But I don't see why LSP servers shouldn't support this, still. I'm not sure if the LSP specification allows for this as of current, though.


I would actually love a regexp search-and-replace assisted by either TreeSitter or LSP.

Something that lets me say that I want to replace “Dog\(.*\)” with “Wolf\1”, but where each substitution is performed only within single “symbols” as identified by TS or LSP.
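A rough approximation is possible today for Python source, with the stdlib tokenize module standing in for TreeSitter/LSP. This is just a sketch of the idea, not a polished tool:

    # Apply a regex rename, but only inside identifier (NAME) tokens, so string
    # literals and comments are never touched. Whitespace may shift slightly
    # because untokenize is given (type, string) pairs.
    import io
    import re
    import tokenize

    PATTERN = re.compile(r"^Dog(\w*)$")

    def rename_symbols(source: str) -> str:
        out = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            text = tok.string
            if tok.type == tokenize.NAME:
                text = PATTERN.sub(r"Wolf\1", text)
            out.append((tok.type, text))
        return tokenize.untokenize(out)

    print(rename_symbols('view = DogView()  # "DogView" in this comment stays untouched\n'))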



IntelliJ's refactor tool?


IntelliJ doesn't use LSP as far as I know.

It does usually make that kind of DogModel -> WolfModel refactoring.


I am familiar with the situation you describe, and it's a good point.

However, it does suggest that there is an opportunity for factoring "Dog" out in the code, at least by name spacing (e.g. Dog.Model).


That gets to the core of the issue, doesn’t it? There are two cultures: do you prefer to refactor DogView into Dog.View, or do you prefer to refactor Dog.View into DogView?

Personally I value uniqueness/canonicalness over conciseness. I would rather have DogView because then there is one name for the symbol regardless of where I am in the codebase. If the same symbol is used with differently qualified names it is confusing - I want the least qualified name to be more descriptive than “View”.

The other culture is to lean heavily on namespaces and to not worry about uniqueness. In this case you have View and Dog.View that may be used interchangeably in different files. This is the dominant culture in Java and C#.


The second culture that you describe happens also to be how OCaml structures things in modules. It's quite a turnoff for me.


That really depends on the context, and specific situation.


Not everything you need to look for is a language identifier. I often grep for configuration option names in the code to see what the option actually does - sometimes it is easy to grep, sometimes there are too many matches, and sometimes they cannot be found at all because the option name is composed in the code from separate parts that are each too generic to grep for (too many matches). It's not hard to make config options greppable but some coders just don't care about this property.
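A made-up illustration of the composed-name case (the option name and functions here are invented):

    # Grepping the code for the full option name "db_timeout_seconds" finds the
    # first lookup, but misses the second one, where the name is assembled at
    # runtime from parts that are too generic to grep for on their own.
    def greppable_lookup(config: dict) -> int:
        return config["db_timeout_seconds"]

    def composed_lookup(config: dict, component: str) -> int:
        return config[component + "_timeout_seconds"]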


strongly disagree here. This works if:

- your IDE/language server is performant

- all the tools are fully set up

- you know how to query the specific semantic entity you're looking for (remembering shortcuts)

- you are only interested in a single specific semantic entity - mixing entities is rarely supported

I don't map out projects in terms of semantics, I map out projects in files and code - that makes querying intuitive, and I can easily compose queries that match the specificity of what I care about (e.g. I might want to find a `Server` but I want to show both classes, interfaces and abstract classes).

For the specific toolchain I'm using - typescript - the symbol search is also unusable once it hits a certain project size; it's just way too slow to be part of my core workflow.


Unfortunately in larger codebases or dynamic languages these tools are just not good enough today. At least not those I and my employers have tried.

They're either incomplete (you don't get ALL references or you get false references) or way too slow (>10 seconds when rg takes 1-2).

Recommendations are most welcome.


Only thing I can recommend is using C# (obviously not always possible). Never had an issue with these functions in Visual Studio proper no matter how big the project.


I can't use an IDE on my entire git history, but git can grep.


On the flipside, IDEs can turn you into lazy, inefficient programmers by doing all the hand-holding for you.

If your feelings are anemic when tasked with doing a grep, it's because you have lost a very valuable skill by delegating it to a computer. There are some things the IDE is never going to be able to find - lest it become the development environment - so keeping your grep fu sharpened is wise beyond the decades.

(Disclaimer: 40 years of software development, and vim+cscope+grep/silversearcher are all I really need, next to my compiler..)


    > lazy... programmers
Since when was that a bad thing? Since time immemorial, it has been hailed as a universal good for programmers to be lazy. I'm pretty sure Larry Wall has lots of jokes about this on Usenet.

Also, I can clearly remember switching from vim/emacs to Microsoft Visual Studio (please, don't throw your tomatoes just yet!). I was blown away by IntelliSense. Suddenly, I was focusing more on writing business logic, and less time searching for APIs.


This is the wrong type of lazy.

Command line tools like grep are force multipliers for programmers. GUIs come with the risk of not being able to learn how to leverage this power. In the end, that often leads to more manual work.

And today, bash is a lingua franca that you can bring with you almost everywhere. Even Windows "speaks" bash these days, with WSL.

In itself, there's nothing wrong with using the built-in features of a GUI. Right-clicking a method (or using a keyboard shortcut) to find the definition in a given code base IS nice for that particular operation.

But by knowing grep/awk/find/git command line and so on, combined with bash scripting and advanced regular expressions, you open up a new world of possibilities.

All those things CAN be done using Python/C#/Java or whatever your language is. But a 1-liner in bash can be 10-100 lines of C#.


Where does this stupid notion come from that using powerful tools means you can't handle the less powerful ones anymore? Did your skills with a hand screwdriver atrophy when you learned how to use a powered screwdriver? Come on.

I use grep multiple times a day. I write bash scripts quite often. I'm not speaking from a position of ignorance of these tools. They have their place as a lowest common denominator of programming tools. But settling for the lowest common denominator is not a path to productivity.

Doesn't mean you should forget your skills, but it does mean you should investigate better tools. And leverage them. A lot.

> But a 1-liner in bash can be 10-100 lines of C#.

Yes. And the reverse is also true. bash is fast and easy if there's an existing tool you can leverage, and slow and hard when there's not.


Yes? That's a pretty basic phenomenon. When I started using calculators, my mental math declined. When I started typing, my handwriting got way worse.

The OP was saying "use IDEs but don't stop using lower tech tools, because they are powerful"


Precisely. Though there is a caveat.

Every person, including developers, has some constraints on what they're able to learn and use effectively. Those limits vary a lot from person to person, though.

For developers who learn technology a bit slowly (compared to some other developers, not the general population), some of these tools may not be worth the effort.

Also, these developers aren't necessarily low tier in terms of business value. They may have talents when it comes to understanding and communicating business requirements with other stakeholders in their organization, and their technical skills may be secondary to those skills and abilities.

BUT: For the general audience at HN, technical capability is central to their identity. Most people here have some capacity to learn technologies that go somewhat beyond the minimum skills required for a tech job. And for this audience, being confident on the linux/unix command line is generally worth the effort.


I count the IDE and stuff like LSP as natural extensions of the compiler. For sure I grep (or equivalent) for stuff, but I highly prefer statically typed languages/ecosystems.

At the end of the day, I'm here to solve problems, and there's no end to them -- might as well get a head start.


> If your feelings are anemic

I'm not feeling anemic. The tool is anemic, as in, underpowered. It returns crap you don't want, and doesn't return stuff you do want.

My grep-fu is fine. It's a perfectly good tool if you have nothing better. But usually you do have something better.

Using the wrong tool to make yourself feel cool is stupid. Using the wrong tool because a good tool could make you lazy shows a lack of respect for the end result.


Leveraging technology is a good thing


Huh? I have an old hand-powered drill from my Grandpa in my workshop. I used it once for fun. For all other tasks I use a powered drill. Same for IDEs. They help you refactor and reason about code - both properties I value. Sure, I could print it and use a textmarker, but I'm not Grandpa


Knowing the bash ecosystem translates better to how you use the knife in the kitchen.

Sure you can replace most uses of a knife with power tools, but there is a reason why most top chefs still rely on that knife for most of those tasks.

A hand powered drill is more like a hand powered meatgrinder. It has the same limitation as the powered versions, and is simply a more primitive version.


This breaks down at scale and across languages. All the FAANGs make heavy use of the equivalent of grepping in their code base


True, but IDEs are fragile tools. Sometimes you want to fall back to simpler tools that will always work, and grep is not fragile.


The basis of this article (and its forebear "Too DRY - The Grep Test"[1]) is that grep is fragile. It's just fragile in a way that's different from the way that IDEs are fragile.

1. <http://jamie-wong.com/2013/07/12/grep-test/>


Even with IDEs, I find that I grep through source trees fairly often.

Sometimes it's because I don't completely trust the IDE to find everything I'm interested in (justifiably; sometimes it doesn't). Sometimes it's because I'm not looking to dive into the code and do serious work on it; I'm just doing a quick drive-by check/lookup for something. Sometimes it's because I'm ssh'd into another machine and I don't have the ability to easily open the sources in an IDE.


I've come to really like language servers for big personal and work projects where I already have my tools configured and tuned for efficiently working with it.

But being able to grep is really nice when trying to figure out something out about a source tree that I don't yet have set up to compile, nor am I a developer of. I.e., I've downloaded the source for a tool I've been using pre-built binaries of and am now trying to trace why I might be getting a particular error.


posts like this sound like the author routinely solves harder problems than you do, because the solutions you suggest don't work in the cases the post is about. we've had 'go to definition' since 01978 and 'find usages' since 01980, and you should definitely use them for the cases where they work


From the article,

- dynamically built identifiers: 100% correct, never do this. It breaks both text search and symbol search and results in complete garbage code. I had to deal with bugs in early versions of docker-compose because of this.

- same name for things across the stack? Shouldn't matter, just use find usages on `getAddressById`. Also easy way to bait yourself because database fields aren't 1:1 with front-end fields in anything but the simplest of CRUD webshit.

- translation example: the fundamental problem is using strings as keys when they should be symbols. Flat vs nested is irrelevant here because you should be using neither.

- react component example: As I mentioned in another comment, trivially managed with Find Usages.

Nothing in here strikes me as "routinely solves harder problems," it's just standard web dev.


yes, i agree that standard web dev is full of these problems, which can't be solved with go-to-definition and find-usages. it's a mess. i wasn't claiming that these messy, hard problems where grep is more helpful than etags are exotic; they are in fact very common. they are harder than the problems lucumo is evidently accustomed to dealing with because they don't have correct, complete solutions, so we have to make do with heuristics

advice to the effect of 'you should not make a mess' is obviously correct but also, in many situations, unhelpful. sometimes i'm not smart enough to figure out how to solve a problem without making a mess, and sometimes i inherit other people's messes. in those situations that advice decays into 'you should not try to solve hard problems'


> they are harder than the problems lucumo is evidently accustomed to dealing with because they don't have correct, complete solutions, so we have to make do with heuristics

Funny.

But since you asked. The hardest problems I've solved haven't been technical problems for years. Not that I stopped solving technical problems, or that I started solving only the easier problems. I just learned to solve people problems more.

People problems are much harder than technical problems.

The author showed a simple people problem: someone who needs to know about better tooling. If we were working together, showing them some tricks wouldn't take much time and would improve their productivity.

An example of a harder problem is when someone tries to play aggressive little word games with you. For example, trying to put you down by loudly making assumptions about your career and skills. One way to deal with that is to just laugh it off. Maybe even make a self-deprecating joke. And then continuing as if nothing happened.

But that assumes you want or have to continue working productively with them. If you don't, it can be quite enjoyable to just laugh in their face. After all, it's never the sharpest tool in the shed, or the brightest light that does that. In fact, it's usually the least useful person around, who is just trying to hide that fact. Of course, once you realize that, it becomes hard to laugh, because it's no longer funny. Just sad and pitiful.


> look! i already told you! i deal with the god damned customers so the engineers don't have to! i have people skills! i am good at dealing with people! can't you understand that? what the hell is wrong with you people?

(office space, https://www.youtube.com/watch?v=hNuu9CpdjIo)

look, lucumo, i'm sure you have excellent people skills. which is why you're writing five-paragraph power-trip-fantasy comments on hn about laughing in people's faces as you demonstrate your undeniable dominance over them, then take pity on them. but i'm not sure those comments really represent a contribution to the conversation about code greppability; they're just ego defense. you probably should not have posted them


(edited to remove things that could be interpreted as a personal attack, since i've gotten feedback that my previous phrasing was too inflammatory)

you aren't the first person i've seen expressing the transparently nonsensical sentiment that 'people problems are much harder than technical problems'. i've seen it over and over again for decades, but i've never seen a clear and convincing explanation of why it's nonsense; i think this is worth discussing in some depth

an obvious thing about both people problems and technical problems is that they both cover a full spectrum of difficulty from trivial to impossible. a trivial people problem is buying a soft drink at a convenience store†. a trivial technical problem is tying your shoes. an impossible people problem is ending poverty. an impossible technical problem might be finding a polynomial-time decision procedure for an np-complete problem, or perhaps building a perpetual-motion machine, or a black-hole generator. both kinds of problems have every degree of difficulty in between, too. stable blue leds seemed like an impossible technical problem until shuji nakamura figured out how to make them. conquering asia seemed like an impossible people problem until genghis khan did it

even within the ambit of modifying a software system, figuring out what parts of the code are affected by a possible change, there are trivial technical problems and problems that far exceed current human capacities. nobody knows how to write a bug-free web browser or how to maintain the linux kernel without introducing new bugs

given this obvious fact, what are we to make of someone saying, 'people problems are much harder than technical problems'? obviously it isn't the case that all people problems are much harder than all technical problems, given that some people problems are easy, and some technical problems are impossible. and if we interpret it as meaning that some people problems are much harder than some technical problems, it's a trivial tautology which would be just as true if we reversed the terms to say '[some] technical problems are much harder than [some] people problems'. so nobody would bother making the effort to say it unless they thought someone was asserting the equally ridiculous position that all people problems were easier than technical problems

the most plausible interpretation is that it means that the people problems the speaker is most familiar with, and therefore considers typical, are much harder than the technical problems the speaker is most familiar with. it's not a statement about the world; it's a statement about the author and the environment they're familiar with

we can immediately deduce from this that you are not andrew wiles, who spent six years working alone on a technical problem which had eluded the world's leading mathematicians for some 350 years, for the solution of which he was appointed a knight commander of the order of the british empire and awarded the abel prize, along with a long list of other prizes. you give the appearance of being so unfamiliar with such difficult technical problems that you cannot imagine that they even exist, though surely with a little thought you can see that they do. in any case, for a very long time, you have not been working on any technical problems that seem impossible to you. i believe you that it's not that you started solving only the easier problems; that means that all the problems you ever solved were the easier problems

or, more briefly, you aren't accustomed to dealing with difficult technical problems

perhaps we can also infer that you frequently handle very difficult people problems—perhaps you are a politician or a clinical psychologist in a mental institution, or you have family members with serious mental illness. however, other aspects of your comment make that seem relatively unlikely

______

† if you have no money or don't speak the local language, this people problem becomes less trivial


"you aren't the first person i've seen expressing the transparently nonsensical sentiment that 'people problems are much harder than technical problems'."

No, it's not "transparently nonsensical" -- it expresses a common human experience that techies who are at least somewhat self-aware (obviously that excludes you) have had at work. Their education gave them a toolbox for approaching technical problems, but no training in the people problems.

It's not remotely saying that all technical problems are easy.


[flagged]


I don’t know what you expected. You did the HN equivalent of a model calling another model “fat”.


what do you mean? people problems are not my strong point


with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

with a legacy codebase, or a fork of a dependency that had to be patched which uses an incompatible buildsystem, or any C/C++/obj-c/etc that heavily uses the preprocessor or nonstandard build practices, or codebases that mix lots of different languages over awkward FFI boundaries and so on and so forth -- there are so many situations where sometimes an IDE just can't get you 100% of the way there and you have to revert to grepping to do any real work

that being said, I don't fully support the idea of handcuffing your code in the name of greppability, but I think dismissing it as a metric under the premise that IDEs make grepping "obsolete" is a little bit hasty


> with all due respect, it sounds like you have the privilege of working in some relatively tidy codebases (and I'm jealous!)

I wish, but no. I've found people will make a mess of everything. Which is why I don't trust solutions that rely on humans having more discipline, like what this article advocates.

In any situation where grep is your last saviour, you cannot rely on the greppability of the code. You'll have to check and double check everything, and still accept the risk of errors.


Working on a 32MLOC project, text search is still the quickest way to find a hook that gets you to the deeper investigation. From there, finding definitions/usage definitely matters.

You can maybe skip the greppability if the code base is of a size that you can hold the rough shape and names in your head, but a "get a list of things that sound like they might be related to my problem" operation is still extremely helpful. And it's also worth keeping in mind that greppability matters to onboarding.

Does that mean it should be an overriding design concern? No. But it does mean that if it's cheap to build greppable, you probably should, because it's a net positive.


Sure, if you have the luxury of having a functional IDE for all of your code.

You can't imagine how much faster I was than everybody else at answering questions about a large codebase just because I knew how to use ripgrep (on Windows). "Knowing how to grep" is a superpower.


A bit on the other side of the argument, I use grep plus find plus some shell work to do source code analysis for security reviews. grep doesn't really understand the syntax of languages, and that is mostly OK.

I've used this technique on auditing many code bases including the C family, perl, Visual Basic, C# and SQL.

With this sort of tool, I don't need to look for language-particular parsers--so long as the source is in a text file, this works well.


IDEs are cool and all, but there is no way I'm gonna let VSCode index my 80GB yocto tmp directory. Ctags can crunch the whole thing in a few minutes, and so can grep.

Plus there are cases where grep is really what you need, for example after updating a particular command line tool whose output changed, I was able to find all scripts which grepped the output of the tool in a way that was broken.


It seems like the law of diminishing returns; while I'm sure in a few cases this characteristic of a code writing style is extremely useful, it cuts into other things such as readability and conciseness. Fewer lines can mean fewer bugs, within reason. If you aren't in Lisp and are using more than 3 parentheses, you might want to split it up, because the compiler/JIT/interpreter is going to anyway.


Honestly, in my 18 years of software development, I haven't "grepped" code once.

I only use grep to filter the output of CLI tools.

For code, I use my IDE or repository features.


Do you use the find feature in your IDE? I.e. not find-by-reference, just text matching? That's the same as greppability.


Interface-heavy languages break IDEs. In .NET at least, "go to definition" jumps you to the interface definition which you probably aren't interested in (vs. the specific implementation you are trying to dig into). Also with .NET specifically XAML breaks IDE traceability as well.


I can run rg over my project faster than I can do anything in my IDE. Both tools have their places.


Definitely true when you can use static typing.

Unfortunately sometimes you can't, and sometimes you can but people can't be arsed, so this is still a consideration.


"A good IDE"

I am also waiting for world peace! ; )


I tried a good IDE recently: Jetbrains IntelliJ and Webstorm, considered the top dog of IDEs. I was working on a TypeScript project which uses npm link to symlink another local project into the node_modules of the current project.

The great IDEs IntelliJ and Webstorm stopped autosuggesting completions from the symlinked project.

Open up Sublime Text again. Worked perfectly. That is why Jetbrains and their behemoth IDEs are utter shite.

Write your code to have symmetry and make it easy to grep.


>I tried a good IDE recently: Jetbrains IntelliJ

Having dealt with IntelliJ for 3 years due to education stuff - I laughed out loud here. Even VS is better than IDEA.


Your observation does not help with the majority of the points in the article. How do you find all usages of a parameter value literal?


By not using literals everywhere. All literals are defined somewhere (start of function, class etc) as enums or vars and used.

Just because I have 20 usage of 'shipping_address' doesn't mean I'll have this string 20 times in different places.

Grep has its place and I often need to grep code bases which have been written without much thought towards DX. But writing it nicely allows LSP to take over.


This is what the article starts with: "Even in projects exclusively written by myself, I have to search a lot: function names, error messages, class names, that kind of thing."

All of that is trivial to search for with a tool that understands the language.


> All of that is trivial to search for with a tool that understands the language.

Isn't string search, or grepping for patterns, even more trivial? So what is your argument? You found an alternative method, good, but how is it any better?

In my own case, I wrote a library that we used in many projects, and I often wanted to know where and how functions from my lib were used in those projects. For example, to be able to tell how much of an effort it would be for the users to refactor when I changed something. However, your method of choice at least with my IDE (Webstorm) only worked locally within the project. Only string search would let me reliably and easily search all projects.

I actually experimented with creating a "meta" project of all projects, but while it worked, that led to too many problems, and the main method to find anything was still string search (CTRL-SHIFT-F Find dialog in IDEA IDEs is string search and it's a wonderful dialog in that IDE family). I also had to open that meta project. Instead, I created a gitignored folder with symlinks to the sources of all the other projects and created a search scope for that folder, in which the search dialog let me string-search all projects' sources at once right from within the library project while still being able to use the excellent Find dialog.

In addition, I found that sometimes the IDE would not find a usage even within the project. I only noticed because I used both methods, and string search showed me one or two places more than the method that relied on the underlying code-parsing. Unfortunately IDEs have bugs, and the method you suggest relies on much more work by the IDE in parsing and indexing compared to the much more mundane string or string pattern search.


> Isn't string search, or grepping for patterns, even more trivial?

It's not trivial when you're looking for symbols in context.

> the method you suggests relies on much more work of the IDE in parsing and indexing compared to

...compared to parsing and indexing you have to do manually because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

Funnily enough I also have a personal anecdote. We had a huge PHP code base based on Symfony. We were in the middle of a huge refactoring spree. I saw my colleagues switch from vim/emacs to Idea/WebStorm looking at how I easily found symbols in the code base, found their usages, refactored them etc. compared to the full-text search they were always stuck with.

This was 5-6 years ago, before LSP became ubiquitous.


> It's not trivial

Did you miss the comparison? The "more trivial"? The context of my response? Please read the parent comment I responded to; treating my comment as standalone and adding some new meaning makes no sense.

String search is more trivial than a search that involves an interpretation of the code structure and meaning. I have no idea why you wish to start a discussion about such a trivial statement.

> because a full-text search (especially in a large codebase) will return a lot of irrelevant info?

It doesn't do that for me but instead works very well. I don't know what you do with your symbol names, but I have barely any generic function names, the vast majority of them are pretty unique.

No idea how you use search, but I'm never looking for "doSomething(", it's always "doSomethingVerySpecific()", or some equally specific string constant.

I don't have the problems you tell me I should have, and my use case was the subject of my comment, as should be clear, as well as my comment being a response to a specific point made by the parent comment.


> All of that is trivial to search for with a tool that understands the language.

Some literal in a log message may come from the code or it might be remapped in some config file outside the language the LSP is looking at, or an environment variable etc.. I just go back and forth with grep and IDE tools, both have different tradeoffs.


The thing is, so many people are weirdly obsessed with never using any other tools besides full-text search. As if using useful tools somehow makes them a lesser programmer or something :)


I actually don't think there's a tool that handles usages when using PHP variable variables ($$var) or when using example number one there, which parametrically chooses a table name.

When you string interpolate to build the name you lose searchability.


Yes, full-text search is a great fallback when everything else fails. But in the use cases listed at the beginning of the article it's usually not needed if you have proper tools


> Honestly, posts like this sound like the author needs to invest some time in learning about better tools for his language. A good IDE alone will save you so much time.

Completely agreed. The React component example in the article is trivially solvable with any modern IDE; right click on the class name, "Find Usages" (or use the appropriate hotkey, of course). Trying to grep for a class name when you could just do that is insane.

I mainly see this from juniors who don't know any better, but as seen in this thread and the article, there are also experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.


I worked on codebases large enough where enabling autocomplete/indexing would lock the IDE and cause the workstation to swap hard.


That's a problem of code organisation though. Large codebases should be split into multiple repos. At the end of the day code structure is not something to be decided only by compilation strategy, but by developer ergonomics as well. A massive repo is a massive burden on productivity.


> experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.

Before calling people stubborn or assuming they got left behind out of ignorance, consider your assumptions. 40+ years experience, senior in both experience and age at this point. Long-term vim + command line tools user.

Do you have any evidence that shows "A good IDE alone will save you so much time?" Have you seen studies comparing productivity or code quality or any metric written by people using IDEs vs those using a plain editor with grep?

By "so much faster" what do you mean exactly? I have decades of experience with vim + ctags + grep (rg these days, because I don't want to get called a stubborn stick in the mud). I can find and change things in large codebases pretty fast. I used VSCode for a year on the same codebases and I didn't feel "so much faster," and I committed to it and watched numerous how-to videos and learned the tool well enough to train other programmers on it. No 10x improvement, not even 1.5x. For most tasks I would call it close to the same in terms of time taken to write code. After getting burned a couple times with "Replace symbol" in VSCode I stopped trusting it. After noticing the LSP failed to find some references I trusted it less. I know grep/ack/rg/ctags aren't perfect, but I also know their weaknesses and how to work with them to get them to do what I want. After a year I went back to vim + ctags + rg.

We might have more productive (and friendly) interactions as programmers if we remembered that not everyone works the same way, or on the same kind of code and projects. What we call "best practices" or "modern tools" largely come down to familiarity, received wisdom, opinion, and fashion -- almost never from rigorous metrics and testing. You like your IDE? Great! I like my tools too. Would either of us get "so much faster" using a different set of tools? Probably not. Trying to find the silver bullet that reduces accidental complexity in software development presents an ongoing challenge, but history shows that editors and IDEs don't do much because if they did programmers today would outperform old guys like me by 10x in a measurable way.

At the last full-time job I had, at an educational software company with 30+ programmers, everyone used Eclipse. My first day I got a new desktop with two big monitors, Eclipse installed, ready to go. I installed vim and the CLI subversion client and some other stuff and worked from the command line, as I usually do. I left one of the monitors off, I don't need that much screen space, and I don't have Twitter and Facebook and other junk running on a second monitor all day like most of the other people did. I got made fun of, old man using old tools. Then once a week, like clockwork, Eclipse would auto-install some updates and everyone came to a halt trying to resolve plugin version conflicts, getting the team in sync. Hours and hours wasted regularly just getting the IDE to work. That didn't affect me, I never opened Eclipse. Watching the other programmers it seemed really slow. So just maybe Eclipse could jump to a definition faster than vim + ctags (I doubt it), but amortized over a month Eclipse all by itself wasted more time than anyone possibly saved with the more powerful tool. Anecdote, I know, but I've seen this play out in similar ways at more than one shop.

Just last year a new hire at a place I freelance for spent days trying to get Jetbrains PHPStorm working on a shared remote dev server. Like VSCode it runs a heavy process on the server (including the LSP). Unlike VSCode, PHPStorm can actually kill the whole server, wasting everyone's time and maybe losing work. I have never seen vim or grep bring a whole server down. I could add up how much "faster" PHPStorm might turn out compared to vim, but it will have to recoup the days lost trying to get it to work at all first.


I'm not arguing that a modern IDE is superior to vim + ctags, I'm arguing that working with strings rather than symbols is only done by either the naive or the stubborn. I basically have a checklist that I work with new people on:

1. do you have the application working locally (for some context-specific meaning of locally, as this heavily depends on what you're doing);

2. can you run the tests, or at least some meaningful subset of tests;

3. does the source code not report errors (it is INSANE how many juniors will ignore big red squiggly lines from imported libraries which haven't been set up properly);

4. can you attach a debugger;

5. can you navigate the source code (i.e. symbols, go to definition, find usages).

If you can do all this, it doesn't matter if you're using vim or VSCode or JetBrains. The OP of the article failed at least number 5 and probably others as well because they haven't set up their development environment properly and are resorting to string-based tooling to get around that. Trying to enforce nonstandard naming conventions to get around this (per the JavaScript example) shows lack of experience.


I read the original article differently. Writing code in a way that makes it easier to search doesn’t mean not using an IDE, tests, etc. It doesn’t mean not using an LSP or ctags to find symbols. It just adds to the overall consistency and ease of reading and searching the code.

I work with legacy code that very often doesn’t have a development environment or unit tests, often not even version control. Searching the code for strings and symbols (a subset of strings) gets more important in an unfamiliar code base. So does jumping to symbols and debugging, but nothing the OP wrote implies they only use grep


> I read the original article differently. Writing code in a way that makes it easier to search doesn’t mean not using an IDE, tests, etc. It doesn’t mean not using an LSP or ctags to find symbols. It just adds to the overall consistency and ease of reading and searching the code.

Consistent and searchable code is great, but the article author picked awful ways to achieve that. Adding verbosity and nonstandard field names (which will require a bunch of custom linter rules...) to support someone who doesn't have a proper development environment set up is silly. It reminds me of the arguments from 00s Java developers who insisted that having to type out class names on the left, i.e. `EnterpriseBeanFactoryServerImpl foo = new EnterpriseBeanFactoryServerImpl()` made code more readable, but it's really because they didn't have a dev environment with working type inspection set up. It's a real problem, but other than their point about not using dynamically generated identifiers, the author presents totally the wrong solution.

> I work with legacy code that very often doesn’t have a development environment or unit tests, often not even version control. Searching the code for strings and symbols (a subset of strings) gets more important in an unfamiliar code base. So does jumping to symbols and debugging, but nothing the OP wrote implies they only use grep

Very true, but IMO if you're working with legacy code where tests don't run, there's no source control, and you can't get a LSP working, worrying about whether a class is called `SpecificAttributeAlertDialog` or just `Dialog` is plugging holes in the Titanic at that point.


We can agree that some code bases and developer environments seem better than others. We don’t always have control over that.

The OP described naming things consistently so you don’t have Invoice and customer_inv etc. scattered around referring to the same thing. I think most programmers understand that as a good practice. And the OP mentioned the problem with constructing symbols dynamically. I agree with that. Your examples address different problems.

I took issue with your statement about “experienced engineers who are stubborn and refuse to use tools made after 1990 for some reason.” If the tools made after 1990 have clear and measurable advantages, then I would agree that not using them might come from stubbornness. But you didn’t offer any examples of that, or metrics and experiments, just your opinions and preferences. I’m skeptical of such opinions and prescriptions: the “clean code” type rules presented as scripture without supporting evidence. OP wrote about some practical considerations that make sense to me.


The second point here made me realize that it'd be super useful for a grep tool to have a "super case insensitive" mode which expands a search for, say, "FooBar|first_name" to something like /foo[-_]?bar|first[-_]?name/i, so that any camel/snake/pascal/kebab/etc case will match. In fact, I struggle to come up with situations where that wouldn't be a great default.
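A minimal sketch of what I mean, in Python (the word-splitting heuristic and the exact regex it emits are assumptions on my part, not any existing tool's behavior):

  import re

  def super_insensitive(query: str) -> re.Pattern:
      # Split "FooBar", "foo_bar", "foo-bar", ... into its component words.
      words = re.findall(r"[A-Za-z][a-z0-9]*|[0-9]+", query)
      # Allow an optional '-' or '_' between consecutive words, ignore case.
      return re.compile(r"[-_]?".join(re.escape(w) for w in words), re.IGNORECASE)

  pat = super_insensitive("first_name")
  for s in ["firstName", "FirstName", "first-name", "FIRST_NAME", "firstname"]:
      print(s, bool(pat.search(s)))   # all True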


Hey, I just created a new tool called Super Grep that does exactly what you described.

I implemented a format-agnostic search that can match patterns across various naming conventions like camelCase, snake_case, PascalCase, kebab-case. If needed, I'll integrate in space-separated words.

I've just published the tool to PyPI, so you can easily install it using pip (`pip install super-grep`), and then you just run it from the command line with `super-grep`. You can let me know if you think there's a smarter name for it.

Source: https://www.github.com/msmolkin/super-grep


You should post this as a Show HN! But maybe wait a while (like a couple weeks or something) for the current thread to get flushed out of the hivemind cache.

If you do, email a link to hn@ycombinator.com and we'll put it in the second-chance pool (https://news.ycombinator.com/pool, explained at https://news.ycombinator.com/item?id=26998308), so it will get a random placement on HN's front page.


Wow, thanks so much for the encouragement and advice, dang! I'm honored to receive a personal response from you and so soon after posting. I really appreciate the suggestion to post this as a Show HN. If I end up doing it, I'll definitely wait a bit–thanks for that suggestion, as I would have thought to do the opposite otherwise. Nice of you to offer to put it in the second-chance pool as well.


pretty cool and to me a better approach than the prescriptive advice from the OP. to me the crux of the argument is to make the code more readable from a popular tool. but if this can be well-integrated into common ide (or even grep perhaps), it would take away most of the argument down to personal preference.


wow this is so cool!! it feels super amazing to dump a random idea on HN and then somebody makes it! i'm installing python as we speak just so i can use this.


Adding to that, I'm often bitten trying to search for user strings because they're split across lines to adhere to 80 characters.

So if I'm trying to locate the error message "because the disk is full" but it's in the code as:

  ... + " because the " + 
    "disk is full")
then it will fail.

So really, combining both our use cases, what would be great is to simply search for a given case-insensitive alphanumeric string in files that skips all non-alphanumeric characters.

So if I search for:

  Foobar2
it would match all of:

  FooBar2
  foo_bar[2]
  "Foo " + \
    ("bar 2")
  foo.bar.2
And then in the search results, even if you get some accidental hits, you can be happy knowing that you didn't miss anything.
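Something like this would do it, I think; a rough Python sketch (how hits get reported is my own assumption, the point is the normalization):

  import re, sys

  def find_squashed(needle: str, path: str) -> None:
      # Keep only alphanumerics, lowercased, but remember the line each came from.
      squashed, line_of, line = [], [], 1
      for ch in open(path, encoding="utf-8", errors="replace").read():
          if ch == "\n":
              line += 1
          elif ch.isalnum():
              squashed.append(ch.lower())
              line_of.append(line)
      haystack = "".join(squashed)
      target = re.sub(r"[^0-9a-z]", "", needle.lower())
      start = 0
      while (i := haystack.find(target, start)) != -1:
          print(f"{path}:{line_of[i]}")   # line where the match starts
          start = i + 1

  find_squashed("Foobar2", sys.argv[1])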


These are both problems I regularly have. The first one I immediately saw when reading the title of this submission was the "super case insensitive" need that I often hit when working on Go codebases, particularly when using a combination of Go structs and YAML or JSON. It also happens with command line arguments being converted to variables.

But the string split thing you mentioned happens a lot when searching for OpenStack error messages in Python that are often split across lines like you showed. My current solution is to randomly shift what I'm searching for, or to try to pick the most unique line.


fwiw I pretty frequently use `first.?name` - the odds of it matching something like "FirstSname" are low enough that it's not an issue, and it finds all cases and all common separators in one shot.

(`first\S?name` is usually better, by ignoring whitespace -> better ignores comments describing a thing, but `.` is easier to remember and type so I usually just do that)


> "super case insensitive"

let's say someone were to make a plugin for their favorite IDE for this kind of search. How would the details look?

To keep it simple, let's assume we just do the super-case-insensitivity, without the other regex condition. Let's say the user searches for "first_name" and wants to find "FirstName".

one simple solution would be to have a convention where a word starts or ends, e.g. with " ". So the user would enter "first name" into the plugin's search field. The plugin turns it into "/first[-_]?name/i" and gives this regexp to the normal search of the IDE.

another simple solution would be to ignore all word boundaries. So when the user enters "first name", the regexp would become "/f[-_]?i[-_]?r[-_]?s[-_]?t[-_]?n[-_]?a[-_]?m[-_]?e[-_]?/i". Then the search would not only be super-case-insensitive, but super-duper-case-insensitive. I guess the biggest downside would be, that this could get very slow.

I think implementing a plugin like this would be trivial for most IDEs, that support plugins.

Am I missing something?


Hm I'd go even simpler than that. Notably, I'd not do this:

> So the user would enter "first name" into the plugin's search field.

Why wouldn't the user just enter "first_name" or "firstName" or something like that? I'm thinking about situations like, you're looking at backend code that's snake_cased, but you also want it to catch frontend code that's camelCased. So when you search for "first_name" you automagically also match "firstName" (and "FirstName" and "first-name" and so on). I wouldn't personally introduce some convention that adds spaces into the mix, I'd simply convert anything that looks snake/kebab/pascal/camel-cased into a regex that matches all 4 forms.

Could even be as stupid as converting "first_name" or "firstName", or "FirstName" etc. into "first_name|firstname|first-name", no character classes needed. That catches pretty much every naming convention, right? (assuming it's searched for with case insensitivity)
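The whole conversion fits in a few lines; here's a throwaway Python sketch (the word-splitting heuristic is just an assumption):

  import re

  def naming_variants(query: str) -> str:
      # "firstName" / "first_name" / "FirstName" -> "first_name|firstname|first-name"
      words = [w.lower() for w in re.findall(r"[A-Za-z][a-z0-9]*|[0-9]+", query)]
      return "|".join(["_".join(words), "".join(words), "-".join(words)])

  print(naming_variants("firstName"))   # first_name|firstname|first-name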


> "first_name" or "firstName"

Ya. Query tokenizer would emit "first" and "name" for both. That'd be neat.


Shame on me for jumping past the simple solutions, but...

If you're going that far, and you're in a context which probably has a parser for the underlying language ready at hand, you might as well just convert all tokens to a common format and do the same with the queries. So searches for foo-bar find strings like FooBar because they both normalize to foo_bar.

Then you can index by more than just line number. For instance you might find "foo" and "bar" even when "foo = 6" shows up in a file called "bar.py" or when they show up on separate lines but still in the same function.


IIUC, you're not missing anything though your interpretation is off from mine*. He wasn't saying it'd be hard, he was saying it should be done.

* my understanding was simply that the regex would (A) recognize `[a-z][A-Z]` and inject optional _'s and -'s between... and (B) notice mid-word hyphens or underscores and switch them to search for both.


The best way would be to make an escape code that matches zero or one punctuation.

So you'd search for "/first\_name/i".


That already exists as "?" and was used in their example:

  /first[-_]?name/i
Or to use your example, just checking for underscores and not also dashes:

  /first_?name/i
Backslash is already used to change special characters like "?" from these meanings into just "use this character without interpreting it" (or the reverse, in some dialects).


It would be a mistake to try to solve this problem with regexes.


This reminds me of the substitution mode of Tim Pope's amazing vim plugin [abolish](https://github.com/tpope/vim-abolish?tab=readme-ov-file#subs...)

Basically in vim to substitute text you'd usually do something with :substitute (or :s), like:

:%s/textToSubstitute/replacementText/g

...and have to add a pattern for each differently-cased version of the text.

With the :Subvert command (or :S) you can do all three at once, while maintaining the casing for each replacement. So this:

textToSubstitute

TextToSubstitute

texttosubstitute

:%S/textToSubstitute/replacementText/g

...results in:

replacementText

ReplacementText

replacementtext


The Emacs replace command[1] defaults to preserving UPCASE, Capitalized, and lowercase too.

[1] https://www.gnu.org/software/emacs/manual/html_node/emacs/Re...


Of course it does. Or it wouldn’t be Emacs


Also just realised while looking at the docs it works for search as well as replacement, with:

:S/textToFind

matching all of textToFind TextToFind texttofind TEXTTOFIND

But not TeXttOfFiND.

Golly!


In vim, I believe there's a setting that you can flip to make search case sensitive.

In my setup, `/foo` will match `FoO` and so on, but `/Foo` will only match `Foo`


There's undoubtedly a setting but if you don't want it on all the time you can always add \c at the end of the search term, like /foo\c to denote case insensitivity


I think Nim has this?


Nim comes bundled with a `nimgrep` tool [0], that is essentially grep on steroids. It has `-y` flag for style insensitive matching, so "fooBar", "foo_bar" and even "Foo__Ba_R" can be matched with a simple "foobar" pattern.

The other killer feature of nimgrep is that instead of regex, you can use PEG grammar [1]

  [0] - https://nim-lang.github.io/Nim/nimgrep.html
  [1] - https://nim-lang.org/docs/pegs.html


Fzf?


Fuzzy search is not the same. For instance, it might by default match not only “FooBar” and “foo_bar” but also e.g. “FooQux(BarQuux)”, which in a large code base might mean hundreds of false positives.


Ideally there'd be some sort of ranking or scoring to sort by. FooQux(BarQuux) would presumably rank much lower than FooBar when searching for FooBar or "Foo Bar", but might still be useful in results if ranked and displayed lower.


Indeed, that's a good solution – and I believe e.g. fzf does some sort of ranking by default. The devil is however in the details:

One minor inconvenience is that the scoring should ideally be different per filetype. For instance, Python would count "foo-bar" as two symbols ("foo minus bar") whereas Lisp would count it as one symbol, and that should ideally result in different scores when searching for "foobar" in both. Similarly, foo(bar) should ideally score lower than "foo_bar" for symbol search, even though the keywords are separated by the same number of characters.

I think this can be accommodated by keeping a per-language list of symbols and associated "penalties", which can be used to calculate "how far" keywords are from each other in the search results weighted by language semantics :)


Let's say you have a FilterModal component and you're using it like this: x-filter-modal

Improving the IDE to find one or the other by searching for one or the other is missing the point of the article, that consistency is important.

I'd rather have a simple IDE and a good codebase than the opposite. In the example that I gave the worst thing is that it's the framework which forces you do use these two names for the same thing.


My point is that if grep tools were more powerful we wouldn't need this very particular kind of consistency, which gives us the very big benefit of being allowed to keep every part of the codebase in its idiomatic naming convention.

I didn't miss the point, I disagreed with the point because I think it's a tool problem, not a code problem. I agree with most other points in the article.


I advocate for greppability as well – and in Swedish it becomes extra fun – as the equivalent phrase in Swedish becomes "grep-bar" or "grep-barhet" and those are actual words in Swedish – "greppbar" roughly means "understandable", "greppbarhet" roughly means "the possibility to understand"


How many other UNIX commands did the Swedes adopt into their language?

I know that they invented "curl". Do you tar xfz?


We do tar, for xfz I think you have to look to the Slavic languages :)

Anyway, to answer your question:

  $ grep -Fxf <(ls -1 /bin) /usr/share/dict/swedish 
  ack
  ar
  as
  black
  dialog
  dig
  du
  ebb
  ed
  editor
  finger
  flock
  gem
  glade
  grep
  id
  import
  last
  less
  make
  man
  montage
  pager
  pass
  pc
  plog
  red
  reset
  rev
  sed
  sort
  sorter
  split
  stat
  tar
  test
  transform
  vi
:)

[edit]: Ironically, grep in that list is not the same word as the one OP is talking about. That one is actually based on grepp, with the double p. grep means pitchfork.


Pitchfork? As in something that might be used to search a haystack?? How delightful.


Yeah, that’s one type.

Another is for turning soil at a small scale by hand (also called a cultivator, I think).

But they all have somewhat long prongs.


I learned from bash.org that "tar -xzvf" is in German accent for "xtract ze vucking files".


Party pooper checking in: easier to remember is that v is the verbose option in most tools, x and f you already know, z is auto-detected for as long as I remember so you don't need to pass that. Add c for creating an archive and, congratulations, you can now do 90% of the tasks you'll ever want to do with tar, especially defusing xkcd bombs!

(To go for 99%, add t for testing an archive to your repertoire. This is all I ever use; anything else I do with the relevant tools that I already know, like compression settings `tar c . | zstd -19 > my.tar.zstd` or extracting to a folder `cd /to/here && tar x ~/Downloads/ar.tar`. I'm sure tar has options for all this but that's not the one thing it should do and do well.)

I hadn't heard of the German option but I love it, shame really that z is obsolete :(


I mean, you're not wrong. Learning what stuff means is good :) But there's also the part where making up a ridiculous story, pun or such enables it being a very strong mnemonic.

I know v is just the verbose option, though I didn't know z was autodetected.

Way back (~15y or so?) I was reading bash.org just for the jokes cause I was on IRC, I knew what a tar/tar.gz file is, but I had never needed to extract one from the command line (might've been on Windows back then). However, because I remembered the funny joke, the first time I was on a Linux system confronted with a tgz, I knew exactly what to type :)

Honestly to this day, I've never needed to create a tar archive, only to unpack them (when I need to archive+compress files it's usually to send to other people, and I pick zip cause everyone can deal with it). But `tar --help` and `man tar` are there in case I ever might.


A classic


As far as I understood, it was part of the language before.

The German equivalent of the word would probably be "greifbar": being able to hold something, usually used metaphorically.


Which leads to "begreifbar", which I would explain/translate (badly) with "something is begreifbar if it can be understood".


> able to hold

Would "grasp" work?


Yes. "Grasping for straws."


It's closer to grip


"zu greifen" may best translate to "to grip", but "grip" has different mental connotations in English (it refers to mental stability, not intellectual insight).

The best dual purpose translation of "zu greifen"/"gripe" (German/Scandinavian) meaning "zu begreifen"/"begripe"/"understand" would be "to grasp", which covers both physically grabbing into something and also to understand it intellectually.

All these words stem back to the Proto-Indo-European gʰrebʰ, which more or less completes the circle back to "grep".


related to "grok"?


grok /ɡrɒk/

Origin 1960s: a word invented by Robert Heinlein (1907–88), American author.


I've always related grep to grab


Could I suggest that greppbarhet is more precisely translated as “the ability of being understood”?

(Norwegian here. Our languages are similar, but we miss this one.)


Norwegian still translates grep as "grip"/"grab". I always thought of grepping as reaching in with a hand into the text and grabbing lines. That association is close at hand (insert lame chuckle) for German and English speakers too.


In English that association is going to depend a lot on one's accent; until now I've never associated grep-ing with anything other than using grep! (But, equally, that might just be a me thing.)


It doesn’t sound anything like grip in my accent but for some reason the association has always been there for me. Grabbing or ripping parts from the file.


What about groping? Groping around for text.


So, at the extreme opposite of the esoteric "general regular expression print" that grep stands for, with few ever knowing it?


s/general/global


Begreppelijk (begrijpelijk) in Dutch


or "Grijpbaar" (grabbable)


So Dutch/German make "begreif" a verb, for Swedish it is just a noun (that means "concept").

But "begrijpelijk" has a clone: "begriplig". An adverb based on a verb in a foreign dictionary. There is no verb that goes "begreppa", it's just "greppa".


The term concept itself suggests grasping or holding/taking hold of, see the latin verb concipio or adjective conceptus.


Dutch also has a noun ("begrip") meaning "notion" or "understanding".


"Jag kan inte begripa svenska."


Oh, you're right.


And we also have "begrepp", which is also a spin on content and understanding it's content.


Oh, that's like German "begreifen", no? (Which means "to grok".)


Grok is right! I'd translate Swedish "greppbar" directly as "grokkable"; "att greppa" as "to grok".


Which is ironic, given that the article is about making it easier to use grep in order to avoid having to understand anything.


Nah, you've got it backwards. The article isn't about dodging understanding - it's about making it way easier to spot patterns in your code. And that's exactly how you start to really get what's going on under the hood. Better searching = faster learning. It's like having a good map when you're exploring a new city


The article advocates making code harder to understand for the sake of better search. It's like forcing a city to conform to a nice, clean, readable map: it'll make exploring easier for you, at the cost of making the city stop working.


Which of his suggestions would "make the city stop working"?


Graspability. ;)

More customarily: intelligibility.


greppbarhet

Grijpbaarheid

I never saw grep as grijp

I guess I do now

(Dutch btw)


I've seen some pretty wild conditional string interpolation where there were like 3-4 separate phrases that each had a number of different options, something akin to `${a ? 'You' : 'we'} ${b ? 'did' : 'will do'} ${c ? 'thing' : 'things'}`.

When I was first onboarding to this project, I was tasked with updating a component and simply tried to find three of the words I saw in the UI, and this was before we implemented a straightforward path-based routing system. It took me far too long just to find what I was going to be working on, and that's the day I distinctly remember learning this lesson. I was pretty junior, but I later returned to this code and threw it all away in favor of a number of easily greppable strings.


Tangential: I love it when UIs say "1 object" and "2 objects". Shows attention to detail.

As opposed to "1 objects" or "1 object(s)". A UI filled with "(s)", ughh


I like the more robotic "Objects: 1" or "Objects: 2", since it avoids the pluralization problems entirely (e.g., in French 0 is singular, but in English it's plural; some words have special forms when pluralized, such as child -> children or attorney general -> attorneys general). And related to this article, it's more greppable/awkable, e.g. `awk '/^Objects:/ && $2 > 10'`.


Fun fact - I had to localize this kind of logic to my language (Polish). I realized quickly it's fucked up.

This is roughly the logic:

    function strFromNumOfObjects(n) {
      if (n === 1) {
          return "obiekt";
      }
      let last_digit = (n%10);
      let penultimate_digit = Math.trunc((n%100)/10);
      if ((penultimate_digit == 0 || penultimate_digit >= 2) && last_digit > 1 && last_digit <= 4) {
          return "obiekty";
      }
      return "obiektów";
    }
Basically pluralizing words in Polish is a fizz-buzz problem :) In other Slavic languages it should be similar BTW


Yeah, Slavic languages are hard but there is an ICU standard that's somewhat usable

https://devpal.co/icu-message-editor/?data=Zarezerwowa%C5%82...


More so when it's not tripped up by "1 sheeps" or "1 diagnoses".


Sounds like you're going to have a bad time

https://www.foo.be/docs/tpj/issues/vol4_1/tpj0401-0013.html


This is the reason many coding styles and tools (including the Linux kernel coding style and the default Rust style as implemented in rustfmt) do not break string constants across lines even if they're longer than the desired line length: you might see the string in the program's output, and want to search for the same string in the code to find where it gets shown.


My team drives me bonkers with this. They hear the general principle "really long lines of code are bad", but extrapolate it to "no characters shall pass the soft gutter no matter what".

Even if you have, say, 5 sequential related structs that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, they "fix" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines. Now when you see that list of structs, you wonder "why is this one different?" and you have to read carefully to determine, nope, it just contained one longer string. Or god forbid they reformat all the structs to match, turning a 1-page file into 3 pages, and making it so you have to read and understand each element of each struct just to see what's going on.

If I could have written the rule of thumb, I would have said "No logic or control shall happen after the end of the gutter." But if there's a paragraph-long string on one line- who cares?? We all have a single keystroke that can toggle soft-wrap, and the odds that you're going to need to know anything about that string other than "it's a long string" are virtually nil.

Sorry. I got triggered. :-)


Yep this triggers the fuck out of me too. It drives me absolutely insane when I'm taking the time and effort to write good test cases that use inline per test data that I've taken the time to format so it's nice and readable for the next person, then the next person comes along, spends 30 seconds writing some 2 line rubbish to hit a code coverage metric, then spends another 60 seconds adding a linter rule that blows all the test data out to 400 lines of unreadable dogshit that uses only the left 15% of screen real estate.


I routinely spot 3-line prints with the string on its own line in our code. Even for cases where the string + print don't even reach the 80 character "limit"


My team also had a similar thing in place. I am saving this article in my pocket saves, so that I can give "proofs" of why this is better

From the Zen of Python: "Special cases aren't special enough to break the rules. Although practicality beats purity." https://peps.python.org/pep-0020/


This is why autoformatters that frob with line endings are just terrible and fundamentally broken.

I'm fairly firmly in the "wrap at 80" camp by the way; but sometimes a tad longer just makes sense. Or shorter for that matter: forced removal of line breaks is just as bad.


80 feels really impractically narrow. A project I work on uses 110 because it's approximately the widest you can comfortably compare two revisions on the same monitor, or was for some person at some time, and I can live with it, but any less would just feel so cramped. A few indentation levels deep and I'd be writing newspaper columns.


80 characters wide is the width we had back in the late 90s. Displays are nothing like that anymore. I managed to talk my team into setting the linter to something more reasonable, but individuals still feel like they're being virtuous if they stick to 80 and reformat any line they touch that goes over. It's just dogma!


There is usually a way to restructure the code so that it doesn't have multiple levels of nested indentation, which is a good practice IMO because it makes the code easier to read.


80 _is_ impractically narrow. Even 120 is overly strict. SLoC line length isn't something that can or should be enforced by linters, or re-formatted by formatters.


This is the world autoformatters have wrought. The central dogma of the autoformatter is that "formatting" is based on dumb syntactic rules with no inflow of imprecise human judgements.


Most autoformatters do not reformat string constants as GP has said, and even if they did, this is something that can be much more accurately and correctly specified with an AF than with a human.

Autoformatting collectively saves probably close to millions of work hours per year in our industry, and that’s at the current adoption. Do you think it’s productive to manually space things out, clean up missing trailing commas and what not? Machines do it better.


> Even if you have, say, 5 sequential related structs, that are all virtually identical, all written on one line so that the similarities and differences are obvious at a mere glance... Then someone comes through and touches my file, and while they're at it, "fix" the line that went 2 characters past the 80 mark by reformatting the 4th struct to span several lines.

Autoformatters absolutely do this. They do not understand considerations like symmetry.

I am doubtful as to the costs of "somewhere in the codebase there is a missing trailing comma".


The wins of autoformatting are 1) never having to have a dispute over formatting or have formatting depend on who last touched code, 2) never manually formatting code or depending on someone's editor configuration, 3) having CI verify formatting, and 4) not having someone (intentionally or unintentionally) make unrelated formatting changes in a commit.

Also, autoformatters can be remarkably good. For instance, rustfmt will do things like:

    x.func(Some(MyStruct {
        field: big + long + expr,
        field2,
    }));
rather than mindlessly introducing three levels of indentation.


I have been places where we allow long strings, but other things aren’t allowed and generally 80 to 100 char limits otherwise. I like 100 for c++/java and 80 for C. If it gets much longer than that (not being strings) then it’s time for a rethink in most cases, grouping/scoping symbols are getting too deep. I’m sure other languages may or may not have that as a reasonable argument. It is just a rule of thumb though.


If I recall, rustfmt had a bug where long string literals (say, over 120 chars or so — or maybe if it was that the string was long enough to extend beyond the gutter when properly indented?) would prevent formatting of the entire file they were in. Has this been fixed?


Not the whole file, but sufficiently long un-line-breakable code in a complex statement can cause rustfmt to give up on trying to format that statement. That's a known issue that needs fixing.


Rust and Javascript and Lisp all get extra points because they put a keyword in front of every function definition. Searching for “fn doTheThing” or “defun do-the-thing” ensures that you find the actual definition. Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in. Some C coding conventions have you split the definition into two lines, first the return type on a line followed by a second line that starts with the function name. It looks ugly, but at least you can search for “^doTheThing” to find just the definition(s).


Golang has a similar property as a side-effect of the following design decision.

  ... the language has been designed to be easy to analyze and can be parsed without a symbol table
Taken from https://go.dev/doc/faq

The "top-level declarations" in source files are exactly: package, import, const, var, type, func. Nothing else. If you're searching for a function, it's always going to start with "func", even if it's an anonymous function. Searching for methods implemented by a struct similarly only needs one to know the "func" keyword and the name of the struct.

Coming from a background of mostly Clojure, Common Lisp, and TypeScript, the "greppability" of Go code is by far the best I have seen.

Of course, in any language, Go included, it's always better to rely on static analysis tools (like the IDE or LSP server) to find references, definitions, etc. But when searching code of some open source library, I always resort to ripgrep rather than setting up a development environment, unless I found something that I want to patch (which in case I set up the devlopment environment and rely on LSP instead of grep to discover definitions and references).


I'm not so sure about greppability in the context of Go. At least at Google (where Go originates, and whose style guide presumably has strong influence on other organizations' use of the language), we discourage "stuttering":

> A piece of Go source code should avoid unnecessary repetition. One common source of this is repetitive names, which often include unnecessary words or repeat their context or type. Code itself can also be unnecessarily repetitive if the same or a similar code segment appears multiple times in close proximity.

https://google.github.io/styleguide/go/decisions#repetitive-...

(see also https://google.github.io/styleguide/go/best-practices#avoid-...)

This is the style rule that motivates the sibling comment about method names being split between method and receiver, for what it's worth.

I don't think this use case has received much attention internally, since it's fairly rare at Google to use grep directly to navigate code. As you suggest, it's much more common to either use your IDE with LSP integration, or Code Search (which you can get a sense of via Chromium's public repository, e.g. https://source.chromium.org/search?q=v8&sq=&ss=chromium%2Fch...).


The thing about stuttering is that the first part of the name is fixed anyway, MOST of the time.

If you want to search for `url.Parse`, you can find most of the usages just by searching for `url.Parse`, because the package will generally be imported as `url` (and you won’t import Parse into your namespace).

It’s not as good as find references via LSP but it is like 99% accurate and works with just grep.


That works somewhat for finding uses of a native Go module, although I've seen lots of cases where the module name is autogenerated and so you need to import with an alias (protobufs, I'm looking at you). It also doesn't work quite so well in reverse -- you need to find files with `package url;`, then find instances of '\bfunc Parse\('.

Lastly, one of the bigger issues is (as aforementioned sibling commenter mentioned) the application of this principle to methods. This is especially bad with method names for established interfaces, like Write or String. Even with a LSP server, you then need to trace up the call stack to figure out what concrete types the function is being called with, then look at the definitions of those types. I can't imagine wanting to do that with only (rip)grep at my disposal.


Wild that the people who made the language that forces me to type

  if err != nil {
      return nil, err
  }
after every second statement are advising against code repetition.


Wild that you don't distinguish between error handling and name repetition.


The issue is about "repetition" - a concept that exists on the syntactic level, and is unavoidable in Go.

To put it in a way you may understand, it's when you have to press keys in the same order a lot of times and you get sad.


Repetition at the syntactic level, as you describe here, is, statistically, zero percent of the time spent in any meaningful software project. It, quite literally, doesn't matter.

Time spent by authors typing source code characters into their editor is almost entirely irrelevant. The only thing that matters is the time spent by readers parsing and understanding the source code in the VCS.


The culture of single letter variables in golang, at least in the codebases I've seen, undoes this.


Single letter variables in Golang are to be used in small, local contexts. Akin to the throwaway i var in for loops. You only grep the struct methods, the same way no one greps 'this' or 'self'.

The code bases you've been reading, and even some of the native libraries, don't do it properly. Probably due to legacy reasons that wouldn't pass readability approvals nowadays.


> The culture of single letter variables in golang, at least in the codebases I've seen, undoes this.

The convention, not just in Go, is that the smaller the scope, the smaller the variable reference.

So, sure, you're going to see single-letter variables in short functions, inside short block scopes, etc, but that is true of almost any language.

I haven't seen single-letter variables in Go that are in a scope that isn't short.

Of course, this could just mean that I haven't seen enough of other peoples Go source.


Zipf's law, right - these rules are a formalization of our brain's functionality with language.

Of course, with enough code, someone does everything.


I like using l for logger and db for database client/pool/handle even if there's a wider scope. And if the bulk of a file is interacting with a single client I might call that c.


> that is true of almost any language

You'd be surprised how often language-local cultures break that rule on either side. And a few times it's even an improvement.


E.g. food and art are very important in Japan, so stomach is i and a drawing/painting is e.


Food is very very important in France so we call it nourriture :)


The way I have seen this is that single letter variables are mostly used when declaration and (all) usages are very close together.

If I see a loop with i or k, v then I can be fairly confident that those are an Index or a Key Value pair. Also I probably don't need to grep them since everything interacting with these variables is probably already on my screen.

Everything that has a wider scope or which would be unclear with a single letter is named with a more descriptive name.

Of course this is highly dependent on the people you work with, but this is the way it works on projects I have worked on.


Golang gets zero points from me because function receivers are declared between func and the name of the function. God ai hate this design choice and boy am I glad I can use golsp.


I search ") myFunc" to find member functions. It would be nice to search "c myFunc", but a parentheses works


Yea, that was my workaround for Go as well. Go both gains and loses points here.


I very rarely use literal grep at all. Perl grep is my standard go-to.

For functions: grep -P '^func funcName\('

For methods: grep -P '^func [^)]+\) methodName\('


Is it just hard to get used to, or does it fundamentally make something more difficult?


Can’t say I’ve ever had an issue with it, but it does get a bit wild when you have a function signature that takes a function and returns one, unless you clear it up with some types.

  func (s *Recv) foo(fn func(x any) error) func(y any) (*Recv, error)
As an exaggerated example. Easy to parse but not always easy to read at a glance.


this thread is about using `grep` to find things, and this subthread is specifically about how the `func` keyword in golang makes it easy to distinguish the definition of a function from its uses, so yes, because `grep 'func lart('` will not find definitions of `lart` as a method. you might end up with something like `grep 'func .*) *lart('` which is both imprecise and enough noise that you will not want to type it; you'll have to can it in a script, with the associated losses of flexibility and transparency


That's fair, I see many examples in this thread where people pass an exact string directly to grep, as you do. I'm an avid grepper, but my grep tool [1] translates spaces to ".*?", so I would just type "func lart(" in that example and it would work.

An incremental grep tool with just this one transformation rule gets you a lot more mileage out of grep.

[1] https://github.com/minad/consult/blob/screenshots/consult-li...

EDIT: Better demo https://jumpshare.com/s/zMENBSr2LwwauJVjo1wS


that's going to find all the functions that take an argument named lart or of a lart type too, but it also sounds like a thing i really want to try


Also, anything that contains "func" and "lart" as a substring, e.g. foobar(function), blart(baz).

It's not far off from my manually-constructed patterns when I want to make sure I find a function definition (and am willing to tolerate some false positives), but I personally prefer fine-grained control over when it's in use.


Mmh, I type "func\ lart(" when I need the literal string. But it's less often, so it's fair that it's slightly more to type.


yeah!


For functions: grep -P '^func funcName\('

For methods: grep -P '^func [^)]+\) methodName\('

Hope that helps.


thanks, that definitely looks better! pcre is kind of working against you here tho; i assume you're invoking it out of habit


I am actually unaware of downsides of PCRE. Could you explain? I hardly ever use literal grep nowadays.


oh, i mean that instead of

    grep -P '^func [^)]+\) methodName\('
you could say

    grep 'func [^)]*) methodName('
which is a bit less typing

however, i have to admit that i sort of ensnared myself in my own noose here by being too clever! i forgot that grep's regexp dialect only supports + if you \ it, and it took me six tries to figure out why i wasn't getting any grep hits. there's a lot to be said for the predictability and consistency of pcre!


No need for any func, you can just

    grep '\) methodName\('


    grep: Unmatched ) or \)
but your main point might be right; the few non-method matches to (pcre) '\)\s*\w+\s*\(' in /usr/share/go-1.19 seem to be uncommon things like this:

    static void __attribute__ ((constructor)) sigsetup(void) {
    void poison() __attribute__ ((weak));
    C3 = -(R + I) // ADD(5,6) NEG(-5,-6)


I have to always add wildcards between func and the function name, because I can never know how the other developer has decided to specify the name of the receiver. This will always be a problem as far as grepping with primitive tools that don't parse the language.


FYI, many people use thin wrappers like this, it's still a primitive tool that doesn't parse the language, but it can handle that problem: https://jumpshare.com/s/zMENBSr2LwwauJVjo1wS (GIF)


On machines where I control the tooling, this is not an issue. But I can’t take my config to my colleagues machine.


If only AFS had succeeded. What would a modern version of this look like?


What about piped grep statements?

grep -r func . | grep functionName


Receivers are utterly idiotic. Like how could anyone with two working brain cells sign off on something like that?

If you don't want OOP in the language, but want people to be able to write thing.function(arg), you just make function(thing, arg) and thing.function(arg) equivalent syntax.


C# did this for extension methods and it Just Works. You just add the "this" keyword to a function in a pure-static class and you get method-like calling on the first param of that function.


If the function has to be modified in any way in order to grant permission to be used that way, then it is not quite "did this".

Equivalent means that there is no difference at the AST level between o.f(a) and f(o, a), like there is no difference in C among (a + i), a[i], i[a] and (i + a).

However, a this keyword is way better than making the programmers fraction off a parameter and move it to the other side of the function name.


A search term here is "Uniform Function Call Syntax", as present in (e.g.) D:

https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax


I completely agree with that.


How many God AI's have expressed their hate for this design? /s


Go is horrible due to the absence of specific "interface implementation" markers. Gets pretty hard to find where or how a type implements an interface.


Interfaces are consumer contracts, they say "I need this" not "I provide this".

Regardless, any reasonable LSP will let you "Find implementations" of any interface definition.


Yup, but we're talking about grep, not lsp. And that's precisely why I don't write go without an lsp (the rare times I do write go).


In golang you get `func (someName someType) funcname`, so it's much less greppable than languages using `func funcname`


JavaScript has multiple ways to define a function so you sort of lose that getting the actual definition benefit.

on edit: I see someone discussed that you can grep for both arrow functions and named function at the same time and I suppose you can also construct a query that handles a function constructor as well - but this does not really handle curried functions or similar patterns - I guess at that point one is letting the perfect become the enemy of the good.

Most people grepping know the code base and the patterns in use, so they probably only need to grep for one type of function declaration.


C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {. In order to get a complete list of symbols you need to preprocess it, this requires knowing what the compiler flags are. Nobody really knows what their compiler flags are because they are hidden between multiple levels of indirection and a variety of build systems.


> C is so much worse than that. Many people declare symbols using macros for various reasons, so you end up with things like DEFINE_FUNCTION(foo) {.

That’s not really C; that’s a C-based DSL. The same problem exists with Lisp, except even worse, since its preprocessor is much more powerful, and hence encourages DSL-creation much more than C does. But in fact, it can happen with any language - even if a language lacks any built-in processor or macro facility, you can always build a custom one, or use a general purpose macro processor such as M4.

If you are creating a DSL, you need to create custom tooling to go along with it - ideal scenario, your tools are so customisable that supporting a DSL is more about configuration than coding something from scratch.


If your Lisp macro starts with a symbol whose name begins with def, and the next symbol is a name, then good old Exuberant Ctags will index it, and you get jump to definition.

Not so with DEFINE_FUNCTION(foo) {, I think.

  $ cat > foo.lisp
  (define-musical-scale g)
  $ ctags foo.lisp
  $ grep scale tags
  g       foo.lisp        /^(define-musical-scale g)$/;"  f
Exuberant Ctags is not even a tool from the Lisp culture. I suspect it is mostly shunned by Lisp programmers. Except maybe for the Emacs one, which is different. (Same ctags command name, completely different software and tag file format.)


the issue is that the c preprocessor is always available and usually used


Other languages have preprocessors or macro facilities too.

C's is very weak. Languages with more powerful preprocessors/macros than C's include many Lisp dialects, Rust, and PL/I. If you think everyone using a weak preprocessor is bad, wait until you see what people will do when you give them a powerful one.

Microfocus COBOL has an API for writing custom COBOL preprocessors in COBOL (the Integrated Preprocessor Interface). (Or some other language, if you insist.) I bet there are some bizarre abominations hidden in the bowels of various enterprises based on that ("our business doesn't just run on COBOL, it runs on our own custom dialect of COBOL!")


c's macro system is weak on purpose, based on, i suspect, bad experiences with m6 and m4. i think they thought it was easier to debug things like ratfor, tmg, lex, and (much later) protoc, which generate code in a more imperative paradigm for which their existing debugging approaches worked

i can't say i think they were wholly wrong; paging through compiler error messages is not my favorite part of c++ templates. but i have a certain amount of affection for what used to be called gasp, the gas macro system, which i've programmed for example to compute jump offsets for compiling a custom bytecode. and i think m4 is really a pathological case; most hairy macro systems aren't even 10% as bad as m4, due to a combination of several tempting but wrong design decisions. lots of trauma resulted

so when they got a do-over they eliminated the preprocessor entirely in golang, and compensated with reflection, which makes debugging easier rather than harder

probably old hat to you, but i just learned last month how to use x-macros in the c preprocessor to automatically generate serialization and deserialization code for record types (speaking of cobol): http://canonical.org/~kragen/sw/dev3/binmsg_cpp.c (aha, i see you're linking to a page that documents it)


C's is weak yet not weak – you can do various advanced things (like conditional expansion or iteration), but using esoteric voodoo with extreme performance cost. Whereas other preprocessors let you do that using builtins which are fast and easy to grok.

See for example https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,...

Poor C preprocessor performance has a negative real world impact, for example recently with the Linux kernel – https://lwn.net/Articles/983965/ – a more powerful preprocessor would enable people to do those things they are doing anyway much more cheaply


I've always suspected the powerful macro facilities in Lisp are why it's never been very common - the ability to do proper macros means all the very smart programmers create code that has to be read like a maths paper. It's too bespoke to the problem domain and too tempting to make it short rather than understandable.

I like Rust (tho I have not yet programmed in it) but I think if people get too into macro generated code, there is a risk there to its uptake.

It's hard for smart programmers to really believe this, but the old "if you write your code as cleverly as possible, you will not be able to debug it" is a useful warning.


Yes, the usefulness of macros always has to be balanced against their cost. I know of only one codebase that does this particular thing though, Emacs. It is used to define Lisp functions that are implemented in C.


It's a common pattern for just about any binding of C-implementation to a higher-level language. Python has a similar pattern, and I once had to re-invent it from scratch (not knowing any of this) for a game engine.


Not JavaScript. Cool kids never write "function" any more, it's all arrow functions. You can search for const, which will typically work, but not always (could be a let, var, or multi-const initializer).


Yes but that's an anti-pattern. Arrow functions aren't there to look cool, they're how you define lambdas / anonymous functions.

Other than that, functions should be defined by the keyword.


How is that an anti-pattern?

> Other than that, functions should be defined by the keyword.

Says who?


Anonymous functions don't have names. This makes it much harder to do things like profiling (just try to find that one specific arrow function in your performance profile flame graph) and tracing. Tools like Sentry that automatically log stack traces when errors occur become much less useful if every function is anonymous.


    const foo = () => {}
This function is not anonymous, it's called foo.


Interesting, it seems that the javascript runtime is smart enough to detect this pattern and actually create a named function (I tried Chrome and Node.js)

    const foo = () => {}
    console.log( foo.name );
actually outputs 'foo', and not the empty string that I was expecting.

   const test = () => ( () => {} );
   const foo = test();
   console.log( foo.name );
outputs the empty string.

Is this behavior required by the standard ?



You're probably remembering how it used to work. This is the example I remember from way back that we shouldn't use because (aside from being unnecessary and weird) this function wouldn't have a name in stack traces:

  var foo = function() {};
Except nowadays it too does have the name "foo".


But to call foo in bar you must define foo before bar.

function foo(){} is also callable if bar is defined before foo.


Not true at the top-level.


Not sure what you find not true about it. All named “function”s get hoisted just like “var”s, I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere. You’re probably thinking about

  const foo = function (){}
without its own name before (). These behave like expressions and cannot be hoisted.
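
A minimal sketch of the difference, run top to bottom:

  foo();                // works: function declarations are hoisted with their body
  function foo() {}

  bar();                // ReferenceError: `bar` exists but is not initialized yet
  const bar = () => {};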


> I use post-definitions of utility functions all the time in file scopes, function scopes, after return statements, everywhere

I haven't figured out if people consider this a best practice, but I love doing it. To me the list of called functions is a high-level explanation of the code, and listing all the definitions first just buries the high-level logic "below the fold". Immediately diving into function contents outside of their broader context is confusing to me.


I don’t monitor “best” practices, so beware. But in languages like C and Pascal I also had a habit of simply declaring all interfaces at the top and then grouping implementations reasonably. It also created a nice “index” of what’s in the file.

Hoisting also enables cross-imports without helper unit extraction headaches. Many hate js/ts at the “kids hate == and null” level but in reality these languages have a very practical design that wins so many rounds irl.


Does the function know it’s called foo for tracing/error logging/etc?


Not really, it's an anonymous function stored in a variable foo


To me, arrow functions behave more like I would expect functions to behave. They don’t include all the magic bindings that the function keyword imparts. Feels more “pure” to me. Anonymous functions can be either function () {} or () => {}


As of a few years ago (not sure about now) the backtrace frame info for anonymous functions was far worse than for ones defined via the function keyword with a name.


All the wise ones. Well, except for you maybe.

Serious arguments would be:

- readability

- greppability


(It wasn't an insult, but a joke on the username)


Functions and arrow functions have an important difference: arrow functions do not create their own `this`. If you're in a context where a nested function needs to maintain access to the outer function’s `this`, and you don't want to muck with `bind` or `call`, then you need an arrow function.
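
A contrived sketch of that difference (the counter object and the interval are made up):

  const counter = {
    count: 0,
    startBroken() {
      setInterval(function () {
        this.count++; // `this` is NOT counter here, so counter.count never changes
      }, 1000);
    },
    startWorking() {
      setInterval(() => {
        this.count++; // the arrow inherits `this` from startWorking, i.e. counter
      }, 1000);
    },
  };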


Am I the only one who hates arrow functions?


I did, until I used them enough where I saw where they were useful.

The bad examples of arrow functions I saw initially were of:

1. Devs trying to mix them in with OOP code as a bandaid over OOP headaches (e.g. bind/this) instead of just not using OOP in the first place.

2. Devs trying to stick functional programming everywhere because they had seen a trivial example where a `.map()` made more semantic sense than a for/for-in/for-of loop. Despite the fact that for/for-in/for-of loops were easier to read for anything non-trivial and also had better performance because you had access to the `break`, `continue` and `return` keywords.


    > also had better performance because you had access to the `break`, `continue` and `return` keywords.
This is a great point.

One more: Debugging `.map()` is also much harder than a for loop.


I feel there are a few ways to invoke .map() in a readable way and many ways that make the code flow needlessly indirect.

Should be a judgment call, and the author needs to be used to doing both looping and mapping constructs, so that they are unafraid of the bit of extra typing needed for the loop.


Another benefit of using for instead of array fns is that it is easy to add await keyword should the fn become async.

But many teams will have it as a rule to always use array fns.


That gives you the option of making it serially async but not parallel, which can be achieved easily using Promise.all in either scenario.
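
Rough sketch of both shapes, with a hypothetical fetchItem standing in for the real async operation:

  // hypothetical fetchItem; in real code this would hit the network
  const fetchItem = (id) => Promise.resolve({ id });

  async function loadAll(ids) {
    // serial: each iteration waits for the previous request to finish
    const serial = [];
    for (const id of ids) {
      serial.push(await fetchItem(id));
    }

    // parallel: start every request, then wait for all of them at once
    const parallel = await Promise.all(ids.map((id) => fetchItem(id)));

    return { serial, parallel };
  }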


As an aside: It’s way less ergonomic, but you likely want `Promise.allSettled` rather than `Promise.all` as the first promise that throws aborts the rest.


It doesn’t really abort the rest, it just prioritizes the selection of a first catch-path as a current continuation. The rest is still thenable, and there’s no “abort promise” operation in general. There are abort signals, but it’s up to an async process to accept a signal and decide when/whether to check it.


Admittedly, I was being a bit hand-wavy and describing a bit more of how it feels rather than the way it is (I'm perpetually annoyed that promises can't be cancelled), but I was thinking of the code I've seen many times across many code bases:

    let results;
    try {
      results = await Promise.all(vals.map(someAsyncOp))
    } catch (err) {
      console.error(err)
    }
While you could pull that promises mapping into a variable and keep it thenable, 99% of the time I see the above instead. Promises have some rough edges because they are stateful, so I think it might be easier to recommend swapping that Promise.all for an Promise.allSettled, and using a shared utility for parsing the promise result.

I consider this issue akin to the relationship between `sort`, `reverse`, `splice`, the mutating operation APIs, and their non mutating counterparts `toSorted`, `toReversed`, `toSpliced`. Promise.all is kind of the mutating version of allSettled.
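
For what it's worth, the allSettled version of that snippet (reusing the same vals/someAsyncOp names, inside the same async context) might look something like:

  // allSettled never rejects, so every outcome has to be inspected afterwards
  const settled = await Promise.allSettled(vals.map(someAsyncOp));
  const results = [];
  for (const r of settled) {
    if (r.status === 'fulfilled') {
      results.push(r.value);
    } else {
      console.error(r.reason); // or collect the failures for later handling
    }
  }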


I don't like using them everywhere, but they're very handy for inline anonymous functions.

But it really pains me when I see

export const foo = () => {}

instead of

export function foo() {}


Thank you, that's something I also never have understood myself. For inline anonymous functions like callbacks they make perfect sense. As long as you don't need `this`.

But everywhere else they reduce readability of the code with no tangible benefit I am aware of.


But do they make much of a difference? You have always been able to write:

    myArray.sort(function(a,b){return a-b})
People for some reason treat this syntactic sugar like it gives them some new fundamental ability.


Oh Javascript would be much better if it could only be syntactic sugar...

`function(a,b){return a-b;}` is different from `(a,b) => a - b`

And `function diff(a,b) {return a-b;}` is different from `const diff = (a,b) => a - b;`.
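
For instance (a small sketch of a couple of the differences, not an exhaustive list):

  function f() { return arguments.length; }  // classic function: own `this` and `arguments`, can be called with `new`
  const g = (...args) => args.length;        // arrow: lexical `this`, no `arguments`, cannot be called with `new`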


I wish javascript had a built-in or at least de facto default linter. Like gofmt or rustfmt. Or clippy even.

One that could enforce these styles. Because not only is the export const foo = () => {}

painful in itself, it will quite certainly get intermixed with the

function foo() {}

and then in the next library a

const foo = function() {}

and so on. I'd rather have a consistently irritating style than this willy-nilly yolo style that the JS community seems to embrace.


ESLint and Prettier are the de facto default linter/formatter combo in JS. There are rules you can enable to enforce your preferred style of function [1][2].

[1] https://eslint.org/docs/latest/rules/func-style

[2] https://eslint.org/docs/latest/rules/prefer-arrow-callback


They are miles away from `gofmt` or `rustfmt` or `cargo clippy` and so on.

It's not opinionated; instead it requires you to form your own opinion, or at least choose from a palette of opinions.

It requires effort to opt-in rather than effort to opt-out.

The community doesn't frown on code that's not adhering to the common standard or code that doesn't pass the "out of the box" linter.

So, if I have a typescript project with a tree of some 20 dependencies (which is, unfortunately, a tiny project), I'll have at least five styles of code when I browse through it. Some JS, some TS, some strictly linted with "no-bikeshedding", some linted with configs that are bigger than the codebase itself. Some linted with outdated configs. Many not linted at all. It's really a mess. Even if each of the 20 dependencies is clean and beautiful on its own, the whole is an inconsistent mess.


I personally believe that Prettier strikes a decent balance between being configurable vs being opinionated. I've used formatters that are much less opinionated than Prettier (e.g. the last time I used XCode it only handled indentation by default). ESLint also teeters on the edge between configuration vs convention quite well in my opinion, especially given the fact that browser JS and server side JS have vastly different linting requirements. I also love the extensibility of ESLint. Being able to write your own linting rules is a boon on productivity. Having access to custom framework specific linting rules is quite nice (I couldn't use React without the react-hooks/rules-of-hooks third party ESLint rules).

> Some JS, some TS

I think the JS community has done remarkably well amongst dynamically typed languages in settling on one form of gradual typing and adopting it fervently (Flow no longer has any market share at all). Whereas the last time I checked Python still had the Mypy/Pyright divide, and Ruby had the Sorbet/RBS dichotomy.

Ultimately though, most of your critique boils down to the fact that JS (unlike Rust and Go) isn't maintained by a single monolithic entity, and therefore there's no one to dictate the standards you're looking for. If Deno were the sole caretaker of JS for example, we'd have a standard linter and formatter devoid of complex configuration, but Deno doesn't control JS.

This is a consequence of JS being a collaborative product of the various browser vendors, TC39, and the server side JS runtimes that have adapted JS to run on servers. The advantage of this of course though is that JS can run natively in the browser. I think that's a decent tradeoff to make in exchange for having to wade through dependencies with different ideas about when it's appropriate to use arrow functions.


These aren't equivalent as function foo will be hoisted but const foo will not be.


Sure the food that this restaurant serves is pricey, but you have to remember that it also tastes terrible.


Yep, and that usually doesn't matter at the top level.


I like them because it reinforces the idea that functions are just values like any other - having a separate keyword feels like it is inconsistent.


Moreover the binding and lexical scope aspects supported by classic functions are amongst the worst aspects of the language.

Arrow functions are also far more concise and ergonomic when working with higher order functions or simple expressions

The main thing to be wary of with arrow functions is when they are used anonymously inline without it being clear what the function is doing at a glance. That and Error stack traces but the latter is exacerbated by there being no actual standard regarding Error.prototype.stack


Why do you want to reinforce that idea?

To me arrow functions mostly just decrease readability and make them blend in too much, when there should be an important distinction between what is a function and what is not.


Not to be dismissive, but because I like it - it just sits right with me.


I'm not a javascript programmer, but I really like the arrow pattern from a distance exactly because it enforces that idea.

My experience is that newcomers are often thrown off and confused by higher order functions. I think partly because, well let's be honest they just are more confusing than normal functions, but I think it's also because languages often bind functions differently from everything else.

`const cool = () => 5`

Makes it obvious and transparent that `cool` is just a variable, whereas:

`function cool() {return 5}`

looks very different from other variable bindings.


Since we're on the topic of higher order functions, arrow functions allow you to express function currying very succinctly (which some people prefer). This is a contrived example to illustrate the syntactical differences:

  const arrow = (a) => (b) => `${a}-${b}`
  
  function verbose(a) {
    return function (b) {
      return `${a}-${b}`
    }
  }
  
  function uncurried(a, b) {
    return `${a}-${b}`
  }
  
  const values = ['foo', 'bar', 'baz']
  values.map(arrow('qux'))
  values.map(verbose('qux'))
  values.map(uncurried.bind(null, 'qux'))
  values.map((b) => uncurried('qux', b))


> there should be an important distinction between what is a function and what is not

code is to express logic clearly to the reader. We should assess it for that purpose, before assessing it for any derivative, secondary concern such as whether categories of things in code (function etc) visually pop out when you use some specific tool like vim, or grep. There are syntax highlighters for a reason. And maybe if grep sucks with code then build the proper tool for code searching, instead of writing code to suit the tool.


I very much prefer the way scoping is handled in arrow functions.


A simple heuristic I use is to use arrow functions for inline function arguments, and named "function" functions for all others.

One reason is exactly what the subject of discussion is here: it's easier to string-search with that keyword in front of the name, but I don't need that for trivial inline functions (whenever I do I make it an actual function that I declare normally and not inline).

Then there's the different handling of "this"; depending on how you write your code, this may be an important reason to use an arrow function in some places.
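
E.g. something like this (slugify and titles are made-up names):

  // named helper: declared with the keyword, so "function slugify" is greppable
  function slugify(title) {
    return title.toLowerCase().replace(/\s+/g, '-');
  }

  // trivial inline callback: an arrow is fine, nobody will ever grep for it
  const titles = ['Hello World', 'Greppability Matters'];
  const slugs = titles.map((t) => slugify(t));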


I'm of the opinion that giving a global name to an anonymous function should result in a compilation error.


why the need to pronounce arbitrary preferences, who cares?


I want to talk to the developer who considers greppability when deciding whether to use the "function" keyword but requires his definitions to be greppable by distancing them from their call locations. I just have a few questions for him.


Yes JavaScript.

You can search for both: "function" and "=>" to find all function expressions and arrow function expressions.

All named functions are easily searchable.

All anonymous functions are throw away functions that are only called in one place so you don't need to search for them in the first place.

As soon as an anonymous function becomes important enough to receive a label (i.e. assigning it to a variable, being assigned to a parameter, converting to function expression), it has also become searchable by that label too.
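
E.g. (a small sketch with made-up names):

  const items = ['a', 'b'];
  const render = (item) => console.log(item);

  // anonymous: used once, right here, nothing to grep for
  items.forEach((item) => render(item));

  // once it matters enough to reuse, it gets a label, and the label is what you grep for
  const renderItem = (item) => render(item);
  items.forEach(renderItem);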


The => is after the param spec, so you’re searching for foo.*=> or something more complex, but then still missing multiline signatures. This is very easy to get caught by in TypeScript, and also happens when dealing with higher-order functions (quite common in React).


Why are you searching for foo.*=>

Are you searching through every function, or functions that have a very specific parameter?

And whatever you picked, why?

---------------------------------------------------------------

- If you're searching for every function, then there's no need to search for foo.*=>, you only need to search for function and =>.

- If you're searching for a specific parameter, then just search for the parameter. Searching for functions is redundant.

---------------------------------------------------------------

Arrow function expressions and function expressions can both be named or anonymous.

Introducing arrow functions didn't suddenly make JavaScript unsearchable.

JavaScript supported anonymous functions before arrow function expressions were introduced.

Anonymous functions can only ever be:

- run on the spot

- thrown away

- or passed around after they've been given a label

Which means, whenever you actually want to search for something, it's going to be labelled.

So search for the label.


You can still look for `(funcname)\s*=` can't you? I mean it's not like functions get re-declared a lot.


You can still search for `<keyword> = \(.*\) => `, albeit it's a bit cumbersome.


All you need is a tool that actually understands the language.

It's 2024 and HN still suggests using regular expressions to search through a code base.


Regex is a universal tool.

Your special tool might not work on platform X, fails for edge cases - and you generally don't know how it works. With regex or simple string search - I am in control. And can understand why results show up, or investigate when they don't, but should.


> Your special tool might not work on platform X

As always, people come out with the weirdest of excuses to not use actual tools in the 99.9999% of the cases when they are available, and work.

When that tools doesn't work, or isn't sufficient, use another one like fuzzy text search or regexps.

> and you generally don't know how it works.

Do you know how your stove works? Or do you truly understand how the device you're typing this comment on works?

Only in programming do I see people deliberately avoid useful tools because <some fringe edge case that comes up once in a millennium in their daily work>


if you think anything works in 99.9999% of cases, you’ve never programmed a computer


It’s you who sees it as excuses. If I have a screwdriver multitool, I don’t need another one which is for d10 only. It simply creates unnecessary clutter in a toolbox. The difference between definition and mention search for a function is:

  gr<bs><bs>ion name<cr>
  vs
  grname<cr>
or for the current identifier, simply

  gr<m-w><cr>
I could even make my own useful tools like “\[fvm]gr” for function, variable or field search and brag about it watching miserable ide guys from the high balcony, but ain’t that unnecessary as well.


> It simply creates unnecessary clutter in a toolbox.

And then you proceed to... invent several pale imitations of a symbol/usages search.

More here: https://news.ycombinator.com/item?id=41435862 so as not to repeat myself


Doesn’t really apply, ignores things just said.


When you specialize in one thing only, do what you want.

But I prefer tools that I can use wherever I go. To not be dependent on and chained to that environment.

"Do you know how your stove works? Or do you truly understand what the device you're typing this comment on truly works?"

Also yes, I do.

" people deliberately avoid useful tools because <some fringe edge case that comes up once in a millenium in their daily work>"

Well, I already changed tools often enough to be fed up with it, and would rather invest in tech that does not lose its value in the next iteration of the innovation cycle.


> When you specialize in one thing only, do what you want.

I specialize in one thing only: programming

> But I prefer tools, that I can use wherever I go.

Do you always walk everywhere, or do you use a tool available at the time, like cars, planes, bicycles, public transport?

> rather invest in tech that does not loose its value in the next iteration of the innovation cycle.

Things like "fund symbol", "find usages", "find implementation" have been available in actual tools for close to two decades now.


I did not say I do not use what is available, but this debate is about, in general, having your code in a shape where simply searching for strings works.


Simply searching for strings rarely works well as the codebase grows larger. Because besides knowing where all things named X are, you want to actually see where X is used, or where it's called from, or where it is defined.

With search you end up grepping the code twice:

- first grepping for the name

We're literally in a thread where people invent regexes for how to search the same thing (a function) defined in two different ways (as a function or as a const)

- secondly, manually grepping through search results deducing if it's relevant to what you're looking for

It becomes significantly worse if you want to include third-party libs in your search.

There are countless times when I would just Cmd+B/Cmd+Click a symbol in IDEA and continue my exploration down to Java's own libraries. There are next to zero cases when IDEA would fail to recognise a function and find its usages if it was defined as a const, not as a function. Why would I willingly deny myself these tools as so many in this thread do?


I both agree and disagree with this.

I was very careful about how I phrased my comment. Languages can gain points by being greppable, and they can gain points by having a well–implemented mode for my favorite editor, and they can also gain points by having a well–implemented language server. They can do all three to gain the most points (and of course there are many other ways to gain or lose points; this is just one tiny aspect of language design), but they don’t have to. And nobody should try to force every language designer to shave the yak of writing an Emacs mode or a Language Server.

And sometimes you just want to improve some random piece of software written in a language that you’ve never used before, without having to shave a bunch of yaks to get the language server installed. Sometimes all you have is grep and you’ll want to be able to use it on this weird new language.


It's current year and IDEs still can't remember how I just transformed the snippet of code and suggest to transform the rest of the files in the same way. All they can do in the "refactor" menu is only "rename" and then some extract/etc nonsense which no one uses irl.

By using regexps I have an experience that opens many doors, and the fact that they aren’t automatic could make me sad, if only these doors weren’t completely shut without that experience.


No one is stopping you from using regexps in IDEs.

And you somehow manage to undersell the rename functionality in an IDE. And I've used move/extract functionality multiple times.

I do however agree that applicable transformations (like upgrading to new syntaxes, or ways of doing stuff as languages evolve) could be applied wholesale to large chunks of code.


The thing is: in a large codebase the tool may become slow or crash, and in a new language you may not have such a tool. Grep is far more robust!


When tools don't work or are unsuitable, you use different tools.

And yet people are obsessed with never using useful tools in the first place because they can invent scenarios when this tool doesn't work. Even if these scenarios might never actually come up in their daily work.


Not to move the goal posts too much, but when I am searching a huge Python or Java code base from IntelliJ, I use a mixture of symbol and text search. One good thing about text search, you get hits from comments.


Yup. I do, too.

I'm mostly ranting against this weird "we will never use great tools because full-text search" obsession


All you need is `<keyword> =`

Really, all you need is `<keyword>` and if the first result is a call to that function, just jump to its definition.


Exactly.

Just search the definition.

Any time that a function doesn't have a definition, it's never the target of a search anyway.


I used to define functions as `funcname (arglist)`

And always call the function as `funcname(args)`

So definitions have a space between the name and arg parentheses, while calls do not. Seemed to work well, even in languages with extraneous keywords before definitions since space + paren is shorter than most keywords.

Nowadays I don't bother since it really isn't that useful, especially with tags or LSP.

I still put the return type on a line of its own, not for search/grep, but because it is cleaner and looks nice to me—overly long lines are the ugliest of coding IMO. Well that and excessive nesting.


Meanwhile C lacks any such keyword, so the best you can do is search for the name. That gets you a sea of callers with the declarations and definitions mixed in

That’s why in my personal projects I follow classic “type\nname” and grep with “^name\>”.

> looks ugly

Single line definitions with long, irregular type names and unaligned function names look ugly. Col 1 names are not only greppable but skimmable. I can speedscroll through code and still see where I am.


Yet you reply to an article that defines functions as variables, which I've seen a lot of developers do usually for no good reason at all.

To me, that's a much more common and worse practice with regards to greppability than splitting identifiers across strings, which I haven't seen much in the wild.


Although in rust, function-like macros make it super hard to trace code. I like them when I am writing the code and hate them when I have to read others' macros.


In the bygone days of ctags, C function definitions included a space before opening parenthesis, while function calls never had that space. I have a hard time remembering that modern coding styles never have that space and my IDE complains about it. (AFAIK, the modern gtags doesn't rely on that space to determine definitions.) Even without *tags, the convention made it easy to grep for definitions.


space after builtin was recommended instead:

  if (x == 0) { ...
  sizeof (buf);
  return (-1);
  exit(0);


In terms of C, that's one reason I prefer the BSD coding style:

  int
  foo(void) { }

vs the Linux coding style:

  int foo(void) { }

The BSD style allows me to find function definitions using git grep ^foo.


Those also make your language easier to parse, and to read.

Many people insist that IDEs make the entire point moot, but that's the kind of thing that makes IDEs easier to write and debug, so I disagree.


One thing which works for C is to search something like `[a-z] foo\(.+\) \{` assuming that spacing matches the coding style, often the shorter form `[a-z] foo\(` works well, which tries to ensure there is a type definition and bin assignment or something before name. Then there is only a handful false positives.


For most functions ^\S.*name( will find declarations and definitions.

Most of us use exuberant ctags to allow jumping to definitions.


Not sure this is very true for Common Lisp. A classic example is accessor functions, where the generic function is created by whichever class is defined first and the method is created where the class is defined. Other macros will construct new symbols for function names (or take them from the macro arguments).


That’s true, but I regard it as fairly minor. Accessor functions don't have any logic in them, so in practice you don’t have to grep for them. But it can be confusing for new players, since they don't know ahead of time which ones are accessors and which are not.


Still, you can extend the concept without a lot of work, can't you?


> so the best you can do is search for the name

This is why in C projects libs go in "lib/" and sources go in "src/". If your header files have the same directory structure as libs, then "include/" is also a decent way to find definitions.


C has "classical" tooling like Cscope and Exuberant Ctags. The stuff works very well, except on the odd weird code that does idiotic things that should not be done with preprocessing.

Even for Lisp, you don't want to be grepping, or at least not all the time for basic things.

For TXR Lisp, I provide a program that will scan code and build (or add to) your tags file (either a Vim or Emacs compatible one).

Given

  (defstruct point ()
    x
    y)
it will let your editor jump to the definition of point, x and y.


> Meanwhile C lacks any such keyword

It's a hassle. But not the end of the world.

I usually search for "doTheThing\(.+?\) \{" first.

If I don't get a hit, or too many hits I move to "doTheThing\([^\)]*?\) \{" and so on.


Javascript is a bit trickier I think nowadays with the fat arrow notation: const myFunc = () => console.log("can't find me :p");


C, starting with K&R, has all declarations and definitions on lines at the left margin, and little else. this is easy to grep for.


Doesn't cscope fit this use case?


Though glob imports in rust can hide a source, so those should be avoided.


Exactly. I wrote an entire blog post about that: https://corrode.dev/blog/dont-use-preludes-and-globs/


python also!


Python is the only one mentioned that "actually works" without endless exceptions to the rule in the normal case. The ones mentioned (Rust/Javascript/Lisp/Go) all have specific syntax that is used commonly enough to make searching harder. Possible, absolutely, but still harder.


I'd say Python works well at greppability because community conventions generally discourage concealing certain kinds of definitions (e.g. function definitions are usually "def whatever").

However, that's just convention. Lots of modules do metaprogramming tricks that obscure greppability, which can be a pain. This is particularly acute when searching for code that is "import-time polymorphic"--that is, code which picks one of several implementations for a piece of functionality at import time at the module scope. That frequently ends up with some hanky-panky a la "exported_function_name = _implementation1 if platform_supported else _implementation2" at the module scope.

While sometimes annoying, that type of thing is usually done for understandable reasons (picking an optimized/platform-supported implementation of an interface--think select or selectors in the stdlib, or any pypi implementation of filesystem monitoring using fsnotify/fanotify/kqueue/fsevents/ReadDirectoryChangesW). Additionally, good type annotations help with greppability, though they can't fully mitigate this issue.

Much less defensible in Python is code that abuses locals/globals to indirect symbol access, or code that abuses star imports to provide interfaces/implementation switching.

Those, fortunately, are rare, but the elephant in the "no greppability ever" room is not: getattr bullshit in OO code is so often utterly obscure, unnecessary and terrible. And it's distressingly common on PyPi. At first I thought this was Ruby's encouragement of method_missing in the bad old days bleeding into the Python community, but the number of programmers for whom getattr magic is catnip seems to be disproportionate to the number of folks with Ruby experience, and, more concerningly, seems to me to be growing over time.


Do people really use text search for this rather than an IDE that parses all of the code and knows exactly where each declaration is, able to instantly jump to them from a key press on any usage...? Wild.


Yes. Not everyone uses or likes an IDE. Also, when you lean on an IDE for navigation, there is a tendency to write more complicated code, since it feels easy to navigate, you don't feel the pain.


Looks fine (subjective) and there is also ctags


There is arrow syntax with js


People don't use LSP?


That’s right, not everyone uses an LSP. Nothing wrong with LSPs, very useful tools. I use ripgrep, or plain grep if I have to, far more often than an LSP.

Working with legacy code — the scenario the author describes — I often can’t install anything on the server.


Fun fact: VS Code uses ripgrep by default.

https://github.com/microsoft/vscode-ripgrep


LSP doesn't always work without issues with large C and C++ codebases which is why one needs to fallback to grep techniques.


> Meanwhile C lacks any such keyword, so the best you can do is...

...use source code tagging or LSP.


ctags.


Rust though does lose some of those points by more or less forcing[1] snake_case. It's really annoying to navigate bindings which are converted from camelCase.

I don't care which case is used. It's a trivial superficial thing, and tribal zealotry about such doesn't reflect well on the language and community.

[1] The warnings can be turned off, but in some cases it requires ugly hacks, and the community seems to be actively hostile to making it easier.


The Rust community is no more zealous about naming conventions than any other language which has naming conventions. Perhaps you're arguing against the concept of naming conventions in general, but that's not a Rust thing, every language of the past 20 years suggests naming conventions if for no other reason than every language provides a standard library which needs to follow some sort of naming conventions itself. Turning off the warnings emitted by the Rust compiler takes two lines of code, either at the root of the crate or in the crate manifest.


I've yet to encounter another compiler that warns about naming conventions, by default at least. So at least it's the most enforced zealotry I've encountered.

Yes, it can be turned off. But for e.g. bindgen generated code it was not trivial to find out.


The Rust compiler doesn't produce warnings out of zealotry, but rather as a consequence of pre-1.0 historical decisions. Note that Rust doesn't use any syntax in pattern matching contexts to distinguish between bindings and enum variants. In pre-1.0 versions of Rust, this created footguns where an author might think they were matching on an enum, but the compiler was actually parsing it as a catch-all binding that would cause any following match arms to never be executed. This was exacerbated by the prevailing naming conventions of the time (which you can see in this 2012 blog post: https://pcwalton.github.io/_posts/2012-06-03-maximally-minim... (note the lower-cased enum variants)). So at some point the naming conventions were changed in an attempt to prevent this footgun, and the lint was implemented to nudge people over to the new conventions. However, as time went on the footgun was otherwise fixed by instead causing the compiler to prioritize parsing enum variants rather than bindings, in conjunction with other errors and warnings about non-exhaustive patterns and dead code (which are all desirable in their own right). At this point it's mostly just vestigial, and I highly doubt that anybody really cares about it beyond "our users are accustomed to this warning-by-default, so they might be surprised if we stopped doing this".


Ah, thanks for the info! I do think this default does have some ramifications, especially in that binding casings are typically changed due to it even for "non-native" wrappers which I find materially makes things more difficult.


Hard agree with the idea of greppability, but hard disagree about keeping names the same across boundaries.

I think the benefit of having one symbol exist in only one domain (e.g. “user_request” only showing up in the database-handling code, where it’s used 3 times, and not in the UI code, where it might’ve been used 30 times) reduces more cognitive load than is added by searching for 2 symbols instead of 1 common one.


I’ve also found that I sometimes really like when I grep for a symbol and hit some mapping code. Just knowing that some value goes through a specific mapping layer and then is never mentioned again until the spot where it’s read often answers the question I had by itself, while without the mapping code there’d just be no occurrences of the symbol in the current code base and I’d have no clue which external source it’s coming from.


Probably depends on how your system is structured. If you know you only want to look in the DB code, hopefully it is either all together or there is something about the folder naming pattern you can take advantage of when saying where to search to limit it.

The upside to doing it this way is that it makes your grepping more flexible: you can either search just one part of the codebase to see, say, the DB code, or see all the DB and UI things using the concept.


I have mixed thoughts on this too. Fortunately grep (rg in my case) easily handles it:

rg -i 'foo.?bar' finds all of foo_bar, fooBar, and FooBar.


Not to mention the readability hit from identifiers like foo.user_request in JavaScript, which triggers both linters and my own sense of language convention.


Both of those are easy to fix. You'll adapt quickly if you pick a different convention.

Additionally, I find that in practice such "unusual" code is actually beneficial - it often makes it easy to see at a glance that the code is somehow in sync with some external spec. Especially when it comes to implicit usages such as in (de)serialization, noticing that quickly is quite valuable.

I'd much rather trash every languages' coding conventions than use subtly different names for objects serialized and shared across languages. It's just a pain.


I agree that code searchability is a good thing but I disagree with those examples. They intentionally increase the chance of errors.

Maybe there's an alternative way to achieve what the author set out to do, but increasing searchability at the cost of increasing brittleness isn't it for me.

In this example:

  const getTableName = (addressType: 'shipping' | 'billing') => {
    return `${addressType}_addresses`
  }

The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.

  const getTableName = (addressType: 'shipping' | 'billing') => {
    if (addressType === 'shipping') {
      return 'shipping_addresses'
    }
    if (addressType === 'billing') {
      return 'billing_addresses'
    }
    throw new TypeError('addressType must be billing or shipping')
  }

Similarly, flattening dictionaries for readability introduces the chance of a random typo making our lives hell. A single typo in the repetitions below will be awful.

{ "auth.login.title": "Login", "auth.login.emailLabel": "Email", "auth.login.passwordLabel": "Password", "auth.register.title": "Login", "auth.register.emailLabel": "Email", "auth.register.passwordLabel": "Password", }

Typos aren’t unlikely. In a codebase I work with, we have a perpetually open ticket about how ARTISTS is mistyped as ATRISTS in a similarly flat enum.

The issue can’t be solved easily because the enum is now copied across several codebases. But the ticket has a counter for the number of developers that independently discovered the bug and it’s in the mid two digits.


Typos are find-and-fix-once, while unsearchability is a maintenance burden forever.

I don't think coupling variable names by making sure they contain the same strings is the best way to show they're related, compared to an actual map from address type to table name. There might be a lot of things called 'shipping' in my app, only some of which are coupled to `shipping_addresses`.
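
Something like this, reusing the shipping/billing example from the thread (just a sketch of one possible shape):

  // an explicit lookup: both full table names stay greppable, and the coupling lives in one place
  const ADDRESS_TABLES = {
    shipping: 'shipping_addresses',
    billing: 'billing_addresses',
  };

  const getTableName = (addressType) => {
    const table = ADDRESS_TABLES[addressType];
    if (!table) throw new TypeError(`unknown addressType: ${addressType}`);
    return table;
  };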

Shouldn't a linter be able to catch that there is no enum member called MyEnum.ATRISTS, or is it not an actual enum?


> The input string and output are coupled. If you add string conditionals as the author did, you introduce the chance of a mismatch between the input and output.

I think it depends on whether the repetition is accidental or intrinsic. Does the table name happen to contain the address type as a prefix, or does it intrinsically have to? Greppability aside, when things are incidentally related, it's often better to repeat yourself to not give the wrong impression that they're intrinsically related. Conversely, if they are intrinsically related (i.e. it's an invariant of the system that the table name starts with the address type as a prefix) then it's better for the code to align with that.


Agree with you.

What happens when translation files get too big and you want to split and send only relevant parts? Like send only auth keys when user is unauthenticated?

`return translations[auth][login]` is no longer possible.

Or just imagine you want to iterate through `auth` keys. _shudders_


Entrenched typos like ATRISTS are actually a greppability goldmine. Chances are there are more occurrences of pluralized people who are making art in the codebase, but only ATRISTS is the one from that enum.

I certainly would not suggest deliberately mistyping, but there are places where the benefit is approaching the cost. Certain log messages can absolutely benefit from subtle letter garbling that retains readability while adding uniqueness.


REFERER moment.


Yes, and 'greppable' is an underused word/concept in its own right.

I've used this as an organizing principle since forever (https://news.ycombinator.com/item?id=1535916). It's one of the best ways to factor code that I know of.


It's good, but I'm snagging on the examples here; for instance, the insistence on consistent use of camel or snake case. That seems to me to be purely a tooling limitation; a more code-aware grep could just automatically synthesize the covering regexp for both. It feels like there's a sweet spot in between LSP-ism and grep-ism that isn't being captured.


That's why D has a cast keyword:

    ubyte c = cast(ubyte)i;
instead of:

    unsigned char c = (unsigned char)i;
Casts are a blunt instrument that subvert the type system, and so they need to be greppable.

Having the cast keyword also removes the grammatical ambiguities in the expression syntax.


Do you often grep for casts? I never do that.


I regard every cast as a bug in my own code and try to refactor it so there aren't any. I can't get rid of all of them, but they're always worth a second look.

I don't normally grep for them, but others have told me they did.

P.S. one thing about D is you can do things like this:

    ubyte b = i;            // error, losing bits
    ubyte b = cast(ubyte)i; // ugly cast
    ubyte b = i & 0xFF;     // no cast, no error!
It's just one of the nice little details that make programming in D a pleasure.


These things are better checked automatically with a static linter, that presumably already has a full parser and AST representation of the language and knows what is a cast and not.

Nobody should keep a checklist of "100 things to grep for when doing a code review".


Try to think about why you might want to do that. It makes a lot of sense, but if you're not doing it, that might be enlightening...


When doing appsec review in C or C++, yes!


Honestly I know I don’t do that either. I mean if there was some special case where I remembered “oh yeah I had to cast that variable in this special case”. In general I avoid casting as much as I can in C/C++, but especially in C.


Working in Spring, which accepts I don't know how many formats from ENV_VARS to yaml, this very much resonates with me, because as a general rule, if one can use a certain option, someone will do it.

Also the reason why I try to avoid Gradle when possible:

The possibilities are endless. At one place I think I found 21 wildly different Gradle configs out of 24 that I checked.

(For anyone that wonders, it was combinations of:

- placeholders vs straightforward dependency declarations (this is a thing in maven too)

- for loops doing things based on lists or maps instead of just calmly declaring them one after another, maybe to save some characters

- helper functions so you could declare dependencies like azure(<something>(<version>))

- order of declarations

- Kotlin vs Groovy syntax

I have probably forgotten a couple more but this is thankfully already a few years ago.)


As someone who almost exclusively uses grep for finding what I need in codebases that are new to me and old to me, you can make whatever arbitrary rules you want, as long as you're consistent, I'll be pretty happy with it. If syntax is loose in some area (single vs double quotes, parens or braces or none), just do the same thing every time. Whitespace consistency isn't crucial, but it can't hurt (between function name and parens, for example).


I'm also thinking long-context LLMs are going to make this advice seem pretty archaic in a few years. They're so good at reading code and extremely useful for asking questions of a code base.

That said, I completely agree with the author on not using clever string tricks to compose identifiers. That makes code both harder to search and to read.


Agreed. So long as the code hits performance and business goals, there doesn't need to be an emphasis put on "newness" or any other sort of vanity metric - make the code obvious, searchable, and understandable so that in a time crunch or during an outage it's easy to search and find the culprit.


I am firmly against the suggested changes. I love grepping through code too (often using -A -B -C), but I also like browsing the code, with tools where you can just click on a function and see its definition.

However, changing how the code should be written so that grepping becomes easier is optimizing for the wrong target. It is much more important that the code is easily readable and maintainable.

In addition, some tools are designed explicitly for grepping through code (from the top of my head ack is an example). If grep doesn't work, one should try a more sophisticated tool instead of using different coding styles.


Nothing the author wrote would necessarily make code harder to read or maintain. Consistent naming of the same thing throughout, not constructing variables or table names dynamically, etc. benefit both readers/maintainers and searching.

I understood “grepping” to mean ripgrep (rg) or ack, not just plain grep. I think programmers who use command line tools or vim know about those. VSCode uses rg.


Greppability is really a proxy metric here - these changes all have other benefits even if you never grep (mostly readability tbh).

    const getTableName = (addressType: 'shipping' | 'billing') => {
        return `${addressType}_addresses`
    }
This is a simplified example but in a longer function, readability of the `return` lines would be improved as the reader wouldn't have to reference the union type (which may or may not be defined in the signature). The rewrite is also safer as it errors out if a runtime `addressType` value doesn't match the union type (above code would not throw an error, just return an indeterminate value which would cause undefined behaviour).

"Flat is better than nested" also greatly improves readability in both examples: either reading the i18n line, or reading the classname at definition / call will be more readable when the name contains full context of function.


one of the strangest and most grep-hostile approaches to identifiers that I have ever observed is Nim ignoring both case and underscores in an effort to allow everyone to write code in their preferred style:

https://nim-lang.org/docs/manual.html#partial-caseminusinsen...


Nim even provides a dedicated grep-like tool to search for identifiers regardless of the style https://nim-lang.org/docs/nimgrep.html


And it works pretty well, coming from 6+ years of experience. It's not that strange if you consider case insensitive filesystems and email addresses. But on the internet you only hear the opinion of the loudest minority.


This sounds like the advice to prefer the variable name 'ii' over 'i' because you can easily search for it. I loathe such advice because it causes the code to become ugly. Similarly, there are 'Yoda conditions', which make code hard to comprehend in order to solve an insignificant error that is easily caught with tooling. The problem with advice like these is you will encounter deranged developers that become obsessive about such things and make the code base ugly trying to implement dozens of style rules. Code should look good. Making a piece of text look good for other humans to comprehend I consider to be job #1 or #2 for a good developer.


> 'ii' over 'i'

You don't need to search for local variables, nobody names global variables "i" - so the "ii" advice is pointless.

You often do need to search for places where global stuff is referenced, and while IDEs can help with that - the same things that break greppability often break "find references" in an IDE. For example if you dynamically construct function names to call, play with reflection, the preprocessor, macros, etc.

So it's good advice to avoid these things.
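
A minimal Python sketch of the kind of dynamic construction that defeats both grep and "find references" (names invented):

    class Exporter:
        def export_csv(self, rows): ...
        def export_json(self, rows): ...

    def run_export(exporter, fmt, rows):
        # Grepping for "export_csv" (or asking the IDE for references)
        # will never point here: the full name only exists at runtime.
        handler = getattr(exporter, f"export_{fmt}")
        return handler(rows)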

> you will encounter deranged developers that become obsessive about such things and make the code base ugly

You can abuse any rule, including

> Code should look good.

and I'd argue the more general a rule is - the more likely it is to be abused. So I prefer specific rules like "don't construct identifiers dynamically" to general "be good" rules.


> The problem with advice like these is you will encounter deranged developers that become obsessive about such things and make the code base ugly trying to implement dozens of style rules

That's more of a "deranged developer" problem than a problem with the guidelines themselves. E.g. I think his `getTableName` example is quite sensible, but also one which some dogmatic engineers would flag and code-golf down to the one-liner.


> prefer the variable name 'ii' over 'i'

Vim users don't have this issue, since you can * a variable name you're looking for and that'll enforce word boundaries.

PHP developers also don't have this issue: the $ before is a pretty sure indicator you're working with a variable named i and not a random i in a word or comment somewhere

Y'all should just use proper tools. No newfangled Rust and Netbeans that the kids love these days, the 80s had it all!

(Note this is all in jest :) I do use vim and php but it's obviously not a reason to use a certain language or editor; I just wondered to myself why I don't have this problem and realised there's two reasons.)


> the advice to prefer the variable name 'ii' over 'i' because you can easily search for it

\bi\b is the easy way to search for i.
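
Or let grep enforce the word boundary itself (the directory is a placeholder):

    grep -rnw 'i' src/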


Those things only make the codebase "ugly" until you learn how to read it.


> This sounds like the advice to prefer the variable name 'ii' over 'i' because you can easily search for it

I've never heard of that advice. I honestly like algebraic names (singular digits) as long as they're well documented in a comment or aliasing another longer-name.

> there are 'Yoda conditions', which make code hard to comprehend in order to solve an insignificant error that is easily caught with tooling

Yoda conditions [0] are a useful defensive programming technique and do not reduce readability except for someone new to them. I argue they improve readability, particularly for myself.

As for tooling... it doesn't catch every case for every language.

> I loathe such advice because it causes the code to become ugly.

Beauty is in the eye of the beholder. While I appreciate your opinion, I also reject it out of hand for professional developers. Instead of deciding whether code is "ugly" perhaps you should decide whether the code is useful. Feel free to keep your pretty code in your personal projects (and show them off so you can highlight how your style really comes together for that one really cool thing you're doing).

> you will encounter deranged developers that become obsessive about such things

I don't like being called deranged but I am definitely obsessed with eliminating whole classes of bugs just by the coding design and style not allowing them to happen. If safe code is "ugly" to you... well then I consider myself to be a better developer than you. I'd rather have ugly code that's easily testable instead of pretty code that's difficult to test in isolation, which most developers end up writing.

> Code should look good. Making a piece of text look good for other humans to comprehend I consider to be job #1 or #2 for a good developer.

It depends on the project. Just remember that what looks good to you isn't what looks good to me. So if it's your personal project, then make it look good! If it's something we're both working on... then expect to defend your stylistic choices with numbers and logic instead of arguments about "pretty".

Then, from the article:

> Flat is better than nested

If I'm searching for something in JSON I'm going to use jq [1] instead of grep. Use the right tools for the right job after all. I definitely prefer much richer structured data instead of a flat list of key-value pairs.

[0] https://en.wikipedia.org/wiki/Yoda_conditions

[1] https://en.wikipedia.org/wiki/Jq_(programming_language)


One other thing I'd like to add is greppable comments! In the same vein as TODO and FIXME, I use hashtags in comments to drop hints to future me reading the code. #learning is a universal one:

// #learning: transparent color using color.new(color.white, 100). This is GREAT for hiding plot() lines during inapplicable periods (such as when no trade is on)

But project-specific hashtags are quite useful, too.

// #60within600: bunch API calls to not hit the 60 calls within 10 minutes limit

// This memoizes fn call results to prevent #60within600

The hashtagging was inspired long ago by del.icio.us, if you remember that. https://en.wikipedia.org/wiki/Delicious_(website)


Seems like you're trying to implement a ticketing system in your code to me.

If you need to prevent 60 within 600, write a test.


I'm documenting the (scattered) implementation changes related to the feature / bug-fix, in both code and test cases, using the hashtag. You could just use the JIRA ticket number, too, but it's a bit dry.


astgrep is a very useful tool when grep fails: https://ast-grep.github.io/

It's not as easy to use as grep but I think one can script it to be nearly so. It has huge power but without learning it all one can do searches that grep finds difficult. e.g. finding all the locations where a method is called and showing the parameters even if they are on multiple lines. You can then use rewrite rules to do CLI code refactoring.

I think it also has potential in a build toolchain e.g. to look for patterns you want to discourage as a pre-commit hook.

ultragrep - https://github.com/zendesk/ultragrep - I don't love this quite as much but it does have a way to build indexes so you can do fast greps across a big codebase. It also has a text mode UI if you want it and I find that almost worthwhile.

I use ripgrep most of the time but while I like it, there is a limit to how many grep tools I can remember and I should probably cut down to using ultragrep and astgrep.

plain gnu grep itself is something one has to know when one is on an unfamiliar machine.


I wonder - why isn't this talked about more? We have had tens of thousands of software companies, each with probably a dozen people focused on hyperoptimizing everything. Why hasn't this point been talked about more on the internet to the point where it's obvious today? And it's not specifically about this, it's more in general. Do people just learn this on their own, and not say anything? Or is the discussion related to this topic buried in some old forum somewhere?


It's talked about, just in the opposite direction.

I've left hardcoded strings (think Kafka event type names) in my source for this very reason, but after a round of code review they get squirreled away as constants in separate files because string repetition is bad or something.


Without constants, it's too easy to let a typo sneak in, or to run into trouble later replacing one "event" but not an unrelated "event". I'll only use a bare string if it appears two times at most, but usually I'll make a constant the first time and it doesn't feel like any loss.


Yes, this is exactly what I was fighting against.

If I have three classes that interact with "MyTable", then I can grep for places that interact with "MyTable" and I get back three classes.

After refactoring, the class which now knows about "MyTable" is Constants.java, which has no business knowing about "MyTable". Grepping it now turns up a false-positive and finds 0 of the actual usage sites (3 false-negatives).


It's not exactly a false positive. It's just a level of indirection, 1 more search by the constant name to find usages. What you sacrifice there you gain by having the compiler help find typos and the IDE help with autocompletion.


Sure, but now you have the string constant as a symbol, which you can either grep for (in which case you're delayed by one search, not the end of the world if you were going to unwind callstacks anyway) or, if you have an LSP, you can jump directly from it to users...


`Constants.java` is a massive code-smell (which I have in many projects, but it's still a smell).

The file name is awful.

At worst it should be 'DbConstants' but probably they should be defined elsewhere.


One situation that comes to mind is configuration of applications on containers using environment variables.

It's extremely valuable to be able to just `grep -r PREFIX_` on a codebase and be able to visualize all possible configuration values for that application.

This is encouraged by some frameworks like Django, where you are expected to list all the configuration values in a `settings` module, but is not standard for `viper`, `click` and `pydantic-settings`, which try to be too smart and auto-generate the variable names for you. It's one of these cases where "modern" frameworks and applications try to save a minuscule amount of work by automating some task, but end up reducing the maintainability of the code over time.
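
A small Python illustration of the difference (the MYAPP_ prefix and setting names are made up):

    import os

    # Greppable: `grep -r MYAPP_` lists every setting the app reads.
    DATABASE_URL = os.environ.get("MYAPP_DATABASE_URL", "sqlite:///dev.db")
    CACHE_TTL = int(os.environ.get("MYAPP_CACHE_TTL", "300"))

    # Not greppable: the full variable name never appears in the source.
    def setting(name, default=None):
        return os.environ.get("MYAPP_" + name.upper(), default)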


This is a great point. One of my pet peeves is seeing an error in the logs which I cannot find in the code for various reasons. Sometimes the error message is constructed in a complicated way with variables concatenated together or the error message is extremely generic and I get matches in 100 different places.

I'm an advocate for the idea that any aspect of a system which communicates either with end users or with sysadmins should be given high exposure in the code base. Typically, this means constructing abstractions in such a way that higher-level business logic and log messages are easily traceable from a single file. I make it so that the business layer sits above all other layers, as close to the program's entry point as possible.


There are of course cases of dynamic data in every language (The table name is an apt example) but usually when I look in code I just expect to be able to follow definitions. If the language doesn't reliably allow me to find "usages of this type" without risking finding another type with the exact same name then I'm already starting up my static type system compiler for the rewrite.

There are exceptions of course: when searching git logs, comments, etc., what the language or IDE does doesn't help.

And when searching for an unknown symbol (type, function, variable) you don't know the name of, but you know it _should_ look like "DogOrder" or "OrderDog", is a common task too. In this case I'd probably search for " Dog.Order\(" or " Order.Dog\(" if I'm looking for a function. The language trait that enables this is that method names are Pascal Case and always have an opening ( at the end. But my IDE at least lets me search for members (variables, functions) separately from type names. There should be an index in the IDE though that lets you query this data. E.g. looking for types starting with foo could be done with a search like t:Foo, instead of having to grep for "(struct|class) Foo" or similar. Tooling is the key.


The author uses JavaScript and Python as examples. So I presume they have (most?) experience with dynamic languages.

In static languages, greppability is hardly as much as a factor. Especially with the availability of LSPs and other such tools nowadays.

When I write rust, or Java, I hardly grep, I "go to usages" or "go to definition", "rename symbol" and so on. Similar, but not to that extent, with typescript. But when coding in Javascript, Ruby or Python, no matter how fancy or language-focused an IDE is, I'll be grepping a lot. Decades of Ruby and Rails "black magic" taught me to grep for partial patterns like the author shows, too. Or to just run the code-path entirely (through tests) because the table-definition of the database will change the available methods and behaviour of the code. Yes. I know.

An LSP (or linter, or checker) can only do so much when the available code, methods, classes, behaviour can be changed or added at runtime.


Then again, in languages like Java that go all-in on object orientation and dynamic dispatch, "go to definition" can become a real chore in sufficiently large codebases. I must have wasted hours of my life trying to find which class is the one implementing some interface or abstract method at a certain point. Bonus points if the implementing class changes based on ordinary runtime conditions.


I'm happy to use dynamic languages occasionally too (Bash, Javascript, Python, ..) but I have a rule of thumb that says if I can't see the entire codebase on one screen, then it's too large for dynamic.


Would be great if the wider industry shared that view


This also applies to dependency injection. While it has significant benefits, it hurts clarity of the code. It becomes more difficult to see where each object is coming from.


Magical dependency injection frameworks, that is.

Plain old putting-dependencies-in-the-constructor-instead-of-newing-them is great.

If you 'wire' it yourself, you see the top-level structure of the project in main, e.g.

  cache          <- createCache "./cache"
  workQueue      <- createWorkQueue parallelism
  projectFinder  <- createProjectFinder basePath
  gradleBuilder  <- createGradleBuilder cache
  normaliser     <- createNormaliser
  gradleParser   <- createGradleParser normaliser
  relationFinder <- createRelationFinder cache normaliser
At a glance I can see what uses normaliser, and what is used by normaliser.


We've ditched "magical dependency injection frameworks" for manual injections in constructors and the net result is that development is actually faster. Yes, you write slightly more code, but at the same time, it's much easier to figure out what's going on. Before, if something went wrong, we would get some arcane errors from deep inside the DI framework and had to decypher what it means and what changes we need to apply to the configs. Now, it's just regular code (great with refactorings and greppability, too).


A lot of this reads like code search tools could and should be a lot better. They probably will be with AI finding its way into everything. In the old days, people would Hungarian prefix types, but now the IDE mitigates that with color codes.


Do you have some ideas for how to make code search better?

Right now, code search is basically just text search. If you think code search tools “could and should” be a lot better, what kind of improvements are you thinking about? How would those improvements work?


Not OP, but we wouldn't need to worry so much about picking out distinct greppable names if (big if) there were tools that parsed the code to draw out concepts for us, ex:

1. The popular "Find Usages" which varies widely in accuracy and reliability by language, IDE, and codebase meta-quirks.

2. Tools that show Callee/Caller trees, and sometimes possible data-flows between variables.

3. DSLs to search hierarchies, like how XPath lets you find XML elements based on nesting, rather than relying on a distinctly greppable single tag-name for the leaf you're interested in. (e.g. `<Product><Name>` vs `<ProductName>`)
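
As a taste of the third point, Python's ElementTree already supports a small XPath subset for structural queries, so you don't need a unique leaf tag to find things:

    import xml.etree.ElementTree as ET

    doc = ET.fromstring(
        "<Catalog><Product><Name>Gift card</Name><Price>25</Price></Product></Catalog>"
    )
    # Query by nesting (Name under Product) instead of relying on a
    # single greppable tag like <ProductName>.
    for name in doc.findall(".//Product/Name"):
        print(name.text)  # Gift card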

When things go well, the actual variable name no longer needs to restate certain aspects and relationships that can instead be found through metadata.

For example, `GiftCard.purchaser_customer_uuid` is nicely greppable, but you could relax that to `GiftCard.purchaser` if it had a static type of `UUID<Customer>`. Or perhaps you could go to the `Customer.uuid` definition and say "Show me all variables that can populate or be-populated-by this one, up to X steps out, and excluding ones that are function scoped."

That said, I do advocate for "greppability" as a general practice, since I seldom trust that languages, tools, or institutions will come together in a way that makes it unnecessary.


I guess I wasn’t thinking of “find usages”, but as the article points out, it’s hard to find usages if the usages are dynamic.

The solution—to write code which is less dynamic—helps code search and features like find usages.


Regarding your third point, I put together a tool capable of that to some degree.

It allows you to grep inside source code, but limit the search to e.g. “only docstrings inside class definitions”, among other things. That is, it allows nesting and is syntax aware. That example is for Python, but the tool speaks more languages (thanks to treesitter).

https://github.com/alexpovel/srgn/blob/main/README.md#multip...


> Right now, code search is basically just text search.

We have lots of code search that is much more syntax-aware than just text search, but it tends to be behind very limited UI, because we have all the tech to do much better code search, but no one has come up with a generally-usable UI for it, so we just have very specific instances -- like "go to definition", "find references" , etc.

That takes all the same technological bits that would be need for, say, "find all definitions of functions visible in the current scope whose name starts with 'ban'" or "find all definitions of int8 constants visible in the current scope"...but what's the UI that makes that kind of searching outside of the kind of special cases now behind their own IDE menu items usable?


Vector embeddings.


Unless you have syntax-aware grep support, I don't see how searching nested JSON keys could be better. But grep is what's installed by default. Not to mention ad-hoc languages that don't have any IDE support.


If you put a lot of arbitrary constraints to not allow it to be better, sure. Enjoy.


There is no conflict between improving tools and learning how to express your code in such a way that as many tools as possible work better OOTB.


gron makes nested JSON greppable https://github.com/tomnomnom/gron


`gron` is so underrated. Usually when I try to show people how useful it is they don't seem to understand how powerful it is. One common use is showing how to customize only one part of a helm chart by checking the values of an already installed chart:

    $ helm get values -n $NS $DEPLOYMENT -o json | gron | grep resources | gron -u | json-to-yaml.py
    elasticsearch:
      client:
        resources:
          limits:
            cpu: 3
            memory: 4Gi
          requests:
            cpu: 1
            memory: 2Gi
      data:
        resources:
          limits:
            cpu: 6
            memory: 6Gi
          requests:
            cpu: 200m
            memory: 2Gi
    fluentd:
      resources:
        limits:
          memory: 768Mi
        requests:
          memory: 384Mi
That snip could be provided to another team or a customer as a yaml file that could be included with `helm upgrade -f whatever.yaml`. This is soooo much easier than digging that limited set of data out of the much more detailed data.


> ad-hoc languages

This is self-inflicted.


Code grepping at build time can be useful.

Grepping at runtime, if you can call it that, is also very powerful. If you have a binary, either your company's or a third party one, but don't have the source code easily available, I have used the `strings` program from GNU binutils, which shows tokens in binary code, e.g. hardcoded URLs, credentials and so on. It can also be useful for analyzing certain things in memory.
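
For example, to pull hardcoded URLs out of a binary (name invented):

    strings ./legacy-service | grep -Ei 'https?://'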


Especially in untyped languages, working with an old or unfamiliar codebase, sometimes the only way to know "was anything else using this code" is just to search for the name of a function or whatever.


I'm ardently in favor of making code human readable as practically as possible. Personally it follows on my personal Rule #1: Be kind (i.e., in this case, to others and your future self.)

However, searchable =/= greppable.

> Flat is better than nested

Context matters but, generally speaking, I would say that flatter is anti-grep.


Conceptually it is akin to having file names that sort well.

Grep is a simple tool, not too different from a simple string sort. It is better than no tool, but is it better than a tool that understands the notation? A strong side of grep is that it is universal and is not tied to a particular notation. Yet if you could easily define a specific notation and have a tool to immediately understand it, would you still prefer grep?

We tend to organize the code according to the tools we have. E.g. if a tool gives us a list of entities in alphabetic order, we will try to name the entities so that they form “logical” groups. This may pass as a local organizational principle and may be useful but it is always intimately coupled with the underlying tool.


Some good recommendations in the article.

Greppability is also helpful when you start scripting your editor. Vim has `includeexpr` and co. to implement some "intelligence" when trying to find declarations etc. This enabled me to write a couple line snippet that immediately could resolve Bazel starlark symbols even in "imported" (`load()`) files. At one point I realized I have better code navigation than any of my colleagues using IDEs.

This, and tools like ripgrep, really help a lot. This is something the VS Code developers also realized when they included ripgrep as the "backend" for searching in files.


I read parts of the Linux kernel source code pretty often, and getting the definition of a function is often pretty involved:

- I don't always know the return type, as the calling code assigns it to a field whose definition I don't know where to find either

- I don't know if it's a C function or a preprocessor macro

This often results in me searching for the exact function name, and combing through the uses in the drivers. You then need to re-start all that recursively to understand the function you just read.

I could use clangd for that, but I don't have the resources on my laptop to compile a kernel


You might find this site useful: https://elixir.bootlin.com


Why not simply hold Ctrl and click on the name of the function?


> I don't have the resources on my laptop to compile a kernel


ctags?


The example for "Don't split up identifiers" is impractical when you need to construct a string on the fly. Author's example might work for the simple case of two potential values, but as soon as addressType can take on more than a trivial number of values, the code becomes very non-DRY.

Also, the example completely glosses over the egregious use of magic constants. Wouldn't it be better to declare addressType an enum?
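
The article's example is TypeScript, but roughly the same idea in Python: an enum keeps the full table names as literal, greppable strings while still rejecting unknown values.

    from enum import Enum

    class AddressType(Enum):
        SHIPPING = "shipping_addresses"
        BILLING = "billing_addresses"

    def get_table_name(address_type):
        # Table names stay greppable as string literals above; anything
        # that isn't a valid AddressType fails loudly instead of silently
        # building a bogus name.
        return AddressType(address_type).value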

The example for "Use the same names for things across the stack" only works if you control the whole stack. Sometimes you have to get fetch data from a 3rd party. And even if you do own the whole stack, sometimes it's nice to be able to search for 'streetName' when you're looking for the JS symbol, and 'street_name' when you're looking for the DB column. So I don't think this rule is always beneficial.

At the end of the day, if you really want to make your code grepable, add some verbose comments that include useful keywords. (I've found it useful to include keywords that aren't even in the code but which devs may search for.) This lets you grep regardless of how the code is written or formatted.


Absolutely.

It's what I like about Rails when it comes to file names too. Having controllers/users_controller.rb as a path might sound wasteful because "you're already in the controllers directory, you don't need _controllers in the path".

But when you want to fuzzy find that file, it's really nice to type "users con" and get that file instead of also picking up views, models and other user related files with just a "users" search.


I work on a project where people decided to refer to translations by doing the equivalent of:

  :label="$translate(getProductSectionLabel('title'))"
where the logic is a bit like:

  const getProductSectionLabel = (code) => `myapp.sales.sections.products.${code}`
and then the actual values are in a nested structure, like:

  myapp: {
    sales: {
      sections: {
       products: {
         title: "Products"
         ...
       }
       ...
     }
     ...
    }
    ...
  }
People seem to have gone for that because writing that first part is simpler within the component, but I couldn't get across that this makes the codebase harder to navigate.

Meanwhile, my personal codebases are more like:

  :label"$translate('myapp-sales-products-title')"
and the translation file also has the equivalent of:

  myapp-sales-products-title: "Products"
which is way simpler at the expense of some more duplication (easily mitigated by compressing the translations).


I'm a big proponent of visual scripting (where it makes sense) but you really do miss the text-based tooling like grep.

One trade-off you can make is using text-based serialization so you're at least able to grep the yaml or JSON or whatever and get to the right file at least. This of course costs you some editor load time.

On the flip side you're basically always using an IDE to edit the visual script. In theory semantic search should be possible and built in, although reality usually falls short.

Someone in a previous HN thread mentioned the idea of a standard graph syntax. Something that game engines and tools could store their graph-based assets in. If there was a standard syntax then standard tools could be made and we could end up seeing something like a graph grep. One could imagine a Visual Studio-style graph editor app with plug-in support. Even a standard merge tool would be a huge step up for non-text-based code assets.

A man can dream!


There is a standard graph syntax: Graphviz DOT notation.


I'm not familiar with this but it seems more geared towards visually representing flow charts but it doesn't have the necessary verbosity for visual scripting. I might be wrong but I think this is only part of the puzzle. Can rules be defined such that, for example, a node port must be pointed to, or that a default value is available?

Besides the declaration of a graph, there also needs to be a way to define semantics, ie something akin to a Language Server Protocol. That way tooling can enforce validity while editing a graph.


DOT is not a schema language, no.

It sounds like you're trying to make something new, in which case: I'd forego standardisation concerns for now, and focus on getting something that works, and where future prototyping can be backwards-compatible.


A small but underappreciated benefit of grammar changes like moving from the form `mytype myfun()` to `keyword myfun() sigil mytype` is that function definitions become trivially greppable: searching for `keyword myfun` lands straight on the definition.


Nice article. Two notes:

First, some of these suggestions will make it harder to introduce bugs when updating the code. That's good! Particularly tricky is when somebody splits up identifiers or function names. These types of things often occur at boundaries (calls between servers or to the db) which can make them tricky to test. Even if all your identifier combining is initially done in a single file, it's easy for someone to see the final shape of the identifier and accidentally hard code it somewhere else.

Second, In the spirit of Titus Winters' "software engineering is programming over time", a codebase should be greppable over time.

That means that if you rename a function, you might consider saving the old name of the function in a comment.


those who said using an IDE is better for tasks like the article describes, I would say, not really

I have experienced a number of cases where IDEs were opened but they were useless, and I had to grep+find+vim+tmux for everything in front of ex-colleagues who didn't know better

1. a python codebase written for web scraping, this project can single-handedly be used to teach people how to not write python

2. production service in linux bare-metal servers where nginx + lua files were used for load balancing, backend written in go a few years ago, and there were php files as well

3. php and typescript large codebases, debuggers were used as well, and no, no one could reason about the problematic areas, particularly multiple inheritance diamond problems. I was never content with my successful attempts at adding code to them

I don't know how useful IDEs can be in these cases, but no one other than me in those ex-companies could handle them. I even lent my laptop with IDE and full environment to some supposedly-very-senior colleagues, they could only avoid the problems at best

edit: and for a project like chrome, [Fixing a bug in Google Chrome as a first-time contributor](https://news.ycombinator.com/item?id=41355303)


Greppability and debuggability are two things that I look for in code reviews. If you ask, "How would you debug that?" and the answer starts with, "I'd rewrite it to..." — maybe, just maybe, you should write it that way.


Try to keep code reference indirections to a tasteful minimum. If a split is needed that’s one more indirection for whoever will be maintaining it in the future. This weight needs to be on the table.

Keeping things referentially transparent helps a lot here.


Along similar lines: I highly recommend making every metric and log in your system spelled out completely somewhere.

    Don't:
      base      = "abc"
      something = base + ".some.suffix"
    Do:
      something = "abc.some.suffix"
I've also had some luck with hard-coded UUIDs at call sites, e.g.:

    log.Info("something", "callsite", "DECAFBAD-000...")
because it makes it absolutely trivial to find a log, and unlike caller-lines (which are great! use them too!) it doesn't change when you refactor code.


As an avid grepper, I disagree with most of these specific recommendations. Use a tool that actual understands references. Don't make the code harder to read for humans just to please grep.

As for identifiers, use 'foo.?bar' case-insensitively.


Which of the examples are harder to read for humans?


The "Flat is better than nested" ones, particularly the second there.


For the React components? I prefer the flat, curiously.

For the JSON keys, I agree somewhat, although it would be easy to compensate by vertically aligning the values.


Related—“Too DRY - The Grep Test” by Jamie Wong: http://jamie-wong.com/2013/07/12/grep-test/


There's a new kind of language where the practice is to use whitespace, and only whitespace, as your syntax. Newlines separate blocks and spaces separate words.

One of the unexpected extremely powerful things this allows is finding function usage extremely easy in any text editor that supports regex. You just search for ^[functionName] . Since you know that function pretty much will only be used at the beginning of lines. You can thus make edits against the AST with regexes and without parsing the AST at all.

It's pretty amazing, and leads to quite faster development, and allows one to tackle bigger and more complex problems.


Example of such a language?


The ability to search the code base is one reason I insist on correct spelling even in comments when doing code reviews, and also keep adding to my IDE's dictionary so it will catch my spelling errors.


CSS preprocessors are often used in a way that hurts "greppability". For example, SCSS developers often break identifiers like

    .product {
        &-promo { ... }  /* compiles to `.product-promo`, which a search of the source won't find */
    }
I always felt it was a bad practice but nevertheless it is often used. Glad to find out that I am not the only one who noticed this.

Of course, one might say that I should have used a specialized tool for this. But it would be more convenient if I could search anything with text search. Also, there is no tool that lets you search over identifiers that are computed at runtime in a program.


This reminds me of the good practices and guidelines in coding when I was learning to code, which also includes "proper naming" so you can easily find what you are looking for throughout the codebase.


Me too.

But that's also what makes me uncomfortable when reading this article. Proper Naming is truly an "art" of balancing trade-offs.

It takes domain expertise (Ubiquitous language), understanding of the users of the code (other devs, not end-users), and a lifetime of coding f*ups where naming something wrong turned out painful to balance these.

The author gives a nice example of dynamic table naming. But their refactoring didn't keep the behaviour the same (the else/catch). So it's hard to argue the first is better. And in this case, even without the else/catch, I'd say the latter is better. But there will be cases where greppability is to be balanced with readability, testability or refactorability. And in those cases, for me, greppability comes last.


If you really do want your code to be searchable, here’s a couple of practices I’ve adopted:

1) Eliminate spelling mistakes. Eliminate alternative spellings. UK vs US English? Pick a side and stick to it.

2) Eliminate contractions. Or keep a very short list of allowable ones (We permit “info” for instance.)

The point of this is to increase the predictability of the names you use. If you've got "tradeable" and "tradable" in your code base, searching for it is going to be a pain. You can supplement these rules with common coding standards like "We call these things providers." but just getting the spelling consistent is huge.


This is why I always recommend avoiding kebab-case as much as possible. You'll eventually need to convert it to snake_case and now you have broken grep. (Nobody is going to remember to use a regex every time.)
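
When both spellings are already in play, a character class at least keeps the bridge pattern short (identifier and path invented):

    grep -rE 'user[-_]id' src/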


Greppability is the reason why I feel it's really quite unfortunate that HTML's dataset attribute went with "canonicalizing" to camel case; if you're using any non-camel-case data attribute names in your project, then they immediately become not fully greppable when making use of `dataset`.

https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement...


Yep. When I was designing https://github.com/dagworks-inc/hamilton part of the idea was to make it easy to understand what and where. That is, enable one to grep for function definitions and their downstream use easily, and where people can't screw this up. You'd be surprised how easy it is to make a code base where grep doesn't help you all that much (at least in the python data transform world) ...


sure. the reason i put a line break between return type and function name in c-likes is `grep ^fname`. but i seriously wish greppability wasn't important. the extensive line-orientedness of unix tools really puts a damper on the whole hose-of-bytes concept, and it's no wonder by the time of plan 9, there was a strong desire to do away with it—cf. "structural regular expressions", as deployed in sam(1), which, of all the places to put them, certainly has historical irony, as sam's (decidedly not line-oriented) editing language nonetheless descends from ed, the definitive line editor, and gave us such hits as "stream ed" and "simulate typing `g/regex/p` into ed".

just the other week i noticed a change in recommended formatting style in a project i contribute to regularly, and the result was source files got about 20% taller, 20% more of a pain in the ass to edit without some sort of syntax folding. the rationale? diff. making you reach for a syntax-aware editor to compensate for a deficiency in the syntax-awareness of a version control frontend is certainly a choice.

the business end of git as seen by most programmers is in fact diff city, sure, but deep down git is a bunch of snapshots. even deltas behave nothing like diffs. pull up the spec for the pack format and look for the word "line". you will not find it.

things could be so much better, but for now we live in a world where the headline is true.


Heh, this was very much the design philosophy behind Hamilton (github.com/dagworks-inc/hamilton).

The basic idea was that if you have a data artifact (columns for dataframes initially), you should be able to ctrl-f and find it in your codebase. 1:1 mapping of data -> function.

People take a long time to figure out that the readability gains from greppability are worth whatever verbosity comes with it, largely because they think of code too much as a craft (make it as small/neat as possible) and not documentation for a live process...


Just spent an hour trying to figure out how a Hugo theme was picking up a shortcode definition. Grep did not help.

Turned out the shortcode name is based on the file name rather than file contents.


The examples are pretty silly, especially the first one.

If you know that you need either a shipping or billing address and the user has specified which one they need, just query based on that.

There’s no need to introduce a function (getTableName) to detemplate a string or match on a case.

Instead just create a function that gets the item you want from the DB and has the table name as input.

On your UI make sure when the users specifies billing or shipping address the correct parameter is passed to the API.


Funny, I started work on a legacy code base a couple months ago and yes, it has all the problems described in the article and that hinders our understanding of it.


> No AI tooling was used in the creation of this article.

That was refreshing.


It just doesn't have that genuine artisan smell to it when someone uses ̶a̶ ̶p̶r̶i̶n̶t̶i̶n̶g̶ ̶p̶r̶e̶s̶s̶ a̶ ̶c̶o̶m̶p̶u̶t̶e̶r̶ ̶w̶i̶t̶h̶ ̶a̶u̶t̶o̶m̶a̶t̶i̶c̶ ̶t̶y̶p̶e̶s̶e̶t̶t̶i̶n̶g̶ s̶p̶e̶l̶l̶c̶h̶e̶c̶k̶ W̶i̶k̶i̶p̶e̶d̶i̶a̶ AI to help write their article.


Setting a variable by split identifier is surprisingly common in CMake (because functions can't return a value):

> set(${VAR}_VERSION ${VERSION})

This is the main reason I don't like CMake.


I don't know how to validate this but this seems to be a specific case of "avoiding magic" where there's a lot of dynamically generated variables and things. Having the static text of the program more or less show its intent helps readability and searchability quite a bit.

I suppose the other extreme is to have a program generator with an input spec and you being left to read through the generated code without access to the input spec.


These are all extremely good suggestions. Especially the flattening bit - yes, it's verbose as hell, but it just makes so much sense whenever you have to deal with the code any time after writing it. Helm charts, please take note, the docs even say that "In most cases, flat should be favored over nested.", yet almost every time I have to deal with a Helm chart, it's a mess of nested structures.


An editor feature I'd like is that as I'm typing an identifier, a hover popup shows me how many other instances appear in my codebase. It should be easy to build a map of identifier->count for instant lookups. I generally know if I want 0 (a new unique identifier), 1, a small number, or a large number. For a few pixels, this would prevent a lot of dumb mistakes and ambiguous names.


I use greppable strings explicitly, like

  requests.get(f'http://a.b.c.d/wol?device={wol_computer}&grep-id=wake-on-lan', timeout=3)
This way I find `grep-id` in the server logs as a reminder of what to grep for, then `grep-id=wake-on-lan` in the entire codebase to find the actual source of the call.

Or I add comments with a grepable token to the code.


I fully understand the point the author is making. However, I am not going to sacrifice good JSON and make it flat just so someone can search for it more easily. With the example they give, it is still readable because it is a simple data structure. But with more complex data, their flat structure is not easy to parse for me, and it makes mistakes easier as well.


It's often a matter of having the right tool for the job. In your case, https://github.com/tomnomnom/gron might be useful.


Well, I'd say that in the author's case it might be more useful. ;) I never really had the inclination to grep for data like the author does.

I generally work from an IDE anyway, where it is clear that I am working with a value that is part of a JSON object and I can follow it back to the proper structure anyway. In fact, the more I think about it, the more I feel like the article is written for a very specific use case and perspective. Almost to the point where the saying "if all you have is a hammer, everything looks like a nail" is applicable. Where if it doesn't look enough like a nail it should be adjusted to look more like one instead of expanding your toolbox a bit.


Grep is nice, but I would much prefer better tools for searching through code. Something that knows how to parse multiple languages and can infer the types of things. Not to mention indexing, for large code bases, grep'ing through possibly millions of lines of code can be awfully slow.

IDEs do a decent job but are typically lacking compared to the raw power of grep.


I mean, I prefer faster symbol, type, etc. based navigation too, but it doesn't work in all scenarios so grep is an extremely handy fallback.


Not to pick on Black, as it happens with all code formatters, but max line length rules kill greppability.

> Black defaults to 88 characters per line

https://black.readthedocs.io/en/stable/the_black_code_style/...
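
The usual failure mode is a long message getting split across lines to satisfy the limit (by hand or by a formatter), so the full text never exists on one line. Message invented for illustration:

    # grep -r "could not connect to upstream billing service" finds nothing,
    # because the literal is split via implicit string concatenation.
    raise RuntimeError(
        "could not connect to "
        "upstream billing service"
    )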


Greppable commit messages and descriptions are also important, for a similar reason. If you want to learn where a feature exists in the codebase, searching the commits for where it was added is often easier than trying to grep through the codebase to find it. Once you've found the commit or even a nearby commit, it's much easier to find the rest.


I encourage my teams to write logs / output with interpolation with the variables at the end for searchability

For example:

  Added %d users
Vs:

  Added users (%d)

Then it is much easier to track down where things come from without needing wildcards in the search or to care too much about what might be dynamic in cases where its not obvious.


I’ve basically landed on the following form: ‘Short description. [foo={}, bar={}]’

Which will give you grepability and in theory parsability so you can automatically bisect for a value change or something along those lines.


I like it, but it may be painful in languages where long variable names are common.


Indeed. That is basically what the logging library from charm bracelet does.


In my very first job I wrote a spell checker/corrector for code comments. This was specifically to make greppability possible because some of my colleagues were appalling at spelling and it meant that the incredibly detailed comments we used to write were hard to search for key details.


I built the command line tool flatito just for the Rails i18n translations keys.

I am unsure if I like the author's approach because there are other cons, but it's a good point.

* https://github.com/ceritium/flatito


From a true e-commerce story: if you have to somehow deal with product conditions, don't use the condition "new" for new vs. mint, used, damaged products. That has nightmarish greppability… I'll never make that mistake again.


Totally agree! I wrote this a few years ago: https://blog.cobaltrobotics.com/you-cant-read-code-that-you-...


This can even be as simple as not using multi-line error strings, or expanding variables in them.


This is why I’ve always fought against BEM in CSS. Tends to drive greppability to zero.


I will always remember my professor explaining that greppability is the reason C++ casting operators use a long keyword: static_cast<...> const_cast<...>, etc as you can easily grep for "_cast" or the whole keyword.


Good point. I would refer to another (similar) metric, which could be called "IDE-search-ability": it extends greppability by adding some more conventions which work well with your (or your company's) IDE.


For my purposes, it's among the most important metrics.

Even a major refactor is relatively easy if you can find stuff in your codebase. Even a small bugfix can get complicated if there's a ton of ambiguity


Only read the title but if it's what I guess it is, then I finally met someone who'll understand that I always declare functions in js using the keyword function.


One very simple way to make code less greppable is to use only single letter variables, or other short names that are very likely to be contained in a ton of other words.


Imo this is another big selling point of Tailwind CSS. Those long stacks of classes become almost like UUIDs for markup that's discoverable from dev tools.


I just shared this in the work Slack and everyone resoundingly agreed with the sentiment. Definitely going to pay more attention to this now, thanks for sharing!


A good IDE makes up for this if the syntax of the language doesn’t lend itself to easy greppage. I lean heavily on JetBrains search with their editors


I had been bitten by ruby's metaprogramming on this.


All of the apostrophes in this article are wrong. The correct character is ’, but this article uses ‘ (open single quote) throughout.


Thanks, fixed.


This is why I try to avoid code reflection in cases where there's an alternative, even when that alternative is less elegant


None of this matters if 99% of companies use managers who don't write code to evaluate software engineers.


Python people should think twice before implementing a `__call__` method if they want to improve greppability.
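
A quick sketch of why (class name invented): once the object is callable, the call sites carry neither the method name nor anything else distinctive to grep for.

    class RateLimiter:
        def __call__(self, key):
            return True  # stub

    limiter = RateLimiter()

    # Looks like a plain function call; grepping for "__call__" or
    # "RateLimiter" will not find this line.
    allowed = limiter("user:42")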


I use grep and git grep all the time.

This post is very welcome, it sums up my own ideas about grep in a better way.


Code that is not written for others in the future may have a limited future


This approach along with ack, instead of grep, has been a godsend to me.


This is one of the top reasons why I prefer explicit over implicit.


is there something like a universal "semantic grep" for code? I think rating code based on some (limitation of a) tool might not be the best way.


Digital marketers have known this for a long time.


In other words: don't try to be clever?


Observation:

The more broad understanding here is that in computer programming there is a difference between a symbol and a referent (https://open.maricopa.edu/com110/chapter/3-1-language-and-me....) AKA variable and its value.

(Phrased colloquially: "The finger pointing at the Moon is not the Moon.")

Information about a given computer program may be found in its symbols (variable names) and its referents (values that variables are set to).

The referents may be set in source code (i.e. a string or other constant) and be visible/greppable at design-time -- and/or they may be set at runtime (and be non-greppable!)

The article is good; I have nothing against it, it makes an excellent point.

But if we look at what I've outlined above as three possible places for information (source symbol, source referent, runtime referent), then we observe that one of these possibilities is not covered by grep -- that is, runtime referent.

This leads to the question:

"What if grep could be used to search for runtime referents, aka runtime values of different variables?"

Well, in traditional compiled languages, we can't do that without significant monkeying with the language...

In interpreted and Lisp-like languages, yes, the above is possible -- but without being able to be very specific about the what/how/when of the runtime strings (or more broadly, values) to be grepped, doing the above could generate huge amounts of data and speed degradation, particularly if a value is set and reset multiple times in a long loop!

But could it be done? Definitely! Efficiently? Well, that's one of the questions! Deterministically? If random values are in play, probably not. With full code coverage, so that values from code paths not taken can be known? That scenario would be challenging to say the least.

Point is, it may be an interesting future language feature to consider in various scenarios, for future language designers...

"The ability to grep through all runtime referents, aka all runtime values, of all variables, for a given runtime of a program..."

Hmmm... now that I think about it, what if a language was created such that you could pass it a special flag, if you passed it this special flag, then variables would maintain their histories in memory (at great expense to performance -- this would be for debugging only!), and then you could grep that memory?

Yes, I know... given the pace of technology, there probably is some system or systems that might implement some aspect or aspects of this...

rr comes to mind:

https://rr-project.org/

https://news.ycombinator.com/item?id=31617600

(Maybe the rr maintainers could be persuaded to implement some interface with grep, either pre or post program run, or maybe implement a timer where the user can grep their program for various strings at various time intervals...)

Anyway, good article!


In C, the word referent is used for the thing a pointer points to. That's your main ungreppable run-time referent.

> "The ability to grep through all runtime referents, aka all runtime values, of all variables, for a given runtime of a program..."

It sounds entirely doable in a garbage-collected run-time. We "stop the world" and then run something similar to the marking pass of a full garbage collection cycle, which looks for matches for a pattern in the entire space of reachable objects. The routine could maintain a path from root pointer to object (easily done on the recursion stack) so that it's reported along with matches.
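
For a taste of how far you can get without special runtime support, here is a rough Python approximation of "grep the live heap" (point-in-time only, no history, no root paths, and it misses objects CPython's GC doesn't track):

    import gc

    def heap_grep(needle):
        # CPython's GC tracks containers (dicts, lists, instances), so we
        # look for the needle inside their reprs rather than in bare strings.
        hits = []
        for obj in gc.get_objects():
            try:
                if needle in repr(obj):
                    hits.append(obj)
            except Exception:
                pass
        return hits

    config = {"api_token": "DECAFBAD-000"}
    print(len(heap_grep("DECAFBAD")))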


Sounds good!

>"The routine could maintain a path from root pointer to object (easily done on the recursion stack) so that it's reported along with matches."

Yes, it definitely could!

For extra points (if the additional functionality is desired!), also create an API interface for this functionality for coding AI's, present and future, such that they can also perform this form of grepping automatically when needed, for example, when running/debugging a program they wrote, or are assisting a programmer with making changes to...

(Observation: There seem to emerge two different main patterns with respect to variable inspection... One is to set a watchpoint on a variable and stop the program when the variable changes.

Another would be to globally scan all variables (i.e. grep) at specific intervals (a form of a batch pattern) for specific strings/values, and all of that would be/could be -- settable by the programmer and/or an AI...)

Anyway, your idea is a good one!


Objects in heaps can be reached in more than one way. We don't want to traverse any object twice, because we will loop, so we must use the GC marked bit.

Yet! There is a way to report all the paths by which an grepped-out object is reachable.

Say we encounter an interesting object O for the first time, at path A B C. So we have A.B.C.O.

We keep objects A, B and C in a hash table. The hash table represents nodes we have visited, which are on the path to a hit.

Then say we encounter object B again, at path X Y Z B. B is a hit in our hash table, and so we report X.Y.Z.B.C.O as another hit for object O, without descending into B again.


Grep is indeed a critical tool for navigating and understanding an unfamiliar codebase, but greppability should not be a goal unto itself. The article seems to be making that mistake - it's basically advocating improving greppability at the cost of making the codebase even larger, messier, and harder to read: i.e. reinforcing the problem that makes you reach for grep in the first place. It's a false economy. It's asking you to optimize your code for one specific scenario - trying to figure out where an unfamiliar string comes from; but that isn't the most important or most frequent thing people need to do with code anyway.

(If it is for you, congratulations, you're the janitor in the codebase. It sucks, but that's what you're being paid for. Maintenance is a means, not an end.)

In particular, one of the most important and most frequent things you do with code is read it in order to understand it (locally, at the abstraction level of interest), and the advice from this article compromises that badly - almost as if hoping that, on a greppable enough codebase, you could use grep to avoid reading or thinking entirely.

1. Don't split up identifiers

Don't split them up for the sake of splitting, sure. That's not helping anything. But in the example given, there's likely a good reason for it - for example, it codifies the intended coupling between tables. `billing_address` isn't an independent term in this code, nor are the other `_address` table names. There's a naming pattern there, encoded directly in the initial example. The proposed refactor obscures it, triples the amount of code in the process (all of which is low-value noise), and introduces the possibility of errors (typos, copy-paste) of the kind that isn't picked up by compilers (hope you have good tests!).
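
A made-up illustration (not the article's actual code) of the trade-off:

    # The pattern version encodes the coupling once, but "billing_address"
    # never appears literally, so it is invisible to grep:
    ADDRESS_KINDS = ("billing", "shipping", "delivery")
    tables = [f"{kind}_address" for kind in ADDRESS_KINDS]

    # The "greppable" refactor spells every name out, at the cost of
    # repetition and of typos that no compiler will catch:
    tables = ["billing_address", "shipping_address", "delivery_address"]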

FWIW, the author's refactor may be eventually required - if and when the naming pattern in the original code no longer holds. But not before then.

2. Use the same names for things across the stack

Excessive data repackaging is bad, but that tends to be a symptom of having too many layers. A good layer has specific semantics that distinguish it from the layers above and below it. This may necessitate renaming some things, in which case even if such renaming is as trivial as in the example, it should be spelled out explicitly; you can't just return a Layer 1 Address object instead of a Layer 3 Address object if the two layers mean something different by "Address"; the triviality of the mapping is incidental and may not hold over time. If it really feels trivial, chances are one of the layers isn't necessary in the first place, so go fix that.

3. Flat is better than nested

Now that's just screwing with people, especially wrt. nesting namespaces. It's asking to reintroduce the visual noise that the person reading the code will then have to filter out again mentally.

The way I see it, if you grep for some log message or unrolled identifier and can't find it, you're supposed to keep grepping for parts of the string, until you hit a match. You then go look, and it's usually apparent that you're dealing with a compound identifier or an interpolated string - congratulations, you just learned something important about that part of the legacy codebase, which is the real job you're supposed to be doing.


Also import traceability


Sounds a little erotic…


.* is your friend :)


why two 'p's - grep only has one


That’s usually how it’s done with words that end in a single consonant when adding a suffix that starts with a vowel, so as not to change the pronunciation of the short vowel in the root word, due to English’s rules around long and short vowels. See also map->mapped, bat->batted, tap->tappable, etc.


English inserts an additional ‘p’ in some cases; for precedent consider “stoppable”, “unflappable”, “skippable”.

See https://english.stackexchange.com/questions/30001/why-is-shi...



Yeah, this is also a benefit of e.g. C identifiers vs C++, where the namespace, class, and method/variable name can all be declared in separate places, breaking the ability to locate non-unique method/variable names with grep.


if only there was a language where code is data so that... hold on a sec! #LISP-languages



