
Building a Search Engine for Programmers - vdthatte
Hey HN, I've recently started working on a side project. It's basically a vertical search engine for programmers. You'll be able to quickly search through documentation, GitHub repos and Stack Overflow. It'll know what language you're using and what project you're working on and tailor results accordingly.

What other features would you want to see in this tool?
======
phaus
Here's a use case you may or may not be interested in. Security Research.

As a security analyst, when I'm trying to figure out what a malicious script /
executable does, it often involves searching for weird strings I find in files
that seem like they would be pretty unique. Or even specific sets of strings,
even if it isn't very easy.

Google used to be awesome for this. Now it is still the best that I'm aware
of, but it has gotten gradually worse over the years. So basically the best
tool for a job is one I would describe as infuriatingly bad.

I think the problem is that it does too much to try to protect a person from
malicious stuff on the web. It also does too much to guess what you might
actually want instead of giving you what you asked for.

Probably the biggest single thing that Google has done to screw it up is that
it no longer respects quotes. Maybe there's a workaround but I haven't figured
it out. 10-15 years ago if I wrapped something in quotes Google would give me
exactly what I wanted. Now it's very finicky and 70% of the time it gives me
what it thinks I want. These guesses are almost always wrong.

That being said, there is also a usefulness to being able to search for a code
snippet or another string and get things that are very similar, even when they
aren't an exact match. I think having multiple modes would be useful.
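A rough sketch of what those two modes could look like, assuming a hypothetical `search` helper over a list of candidate strings (the function name, the `mode` flag and the 0.8 threshold are all made up for illustration; similarity here is just Python's `difflib` ratio, not anything a real engine necessarily uses):

```python
from difflib import SequenceMatcher


def search(query, candidates, mode="exact", threshold=0.8):
    """Hypothetical two-mode lookup over a list of strings.

    mode="exact":   keep candidates that contain the query verbatim.
    mode="similar": keep candidates whose difflib similarity ratio to the
                    query is at least `threshold` (an arbitrary cutoff).
    """
    if mode == "exact":
        return [c for c in candidates if query in c]
    if mode == "similar":
        return [
            c for c in candidates
            if SequenceMatcher(None, query, c).ratio() >= threshold
        ]
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up sample strings standing in for an index of script fragments.
samples = [
    'eval(base64_decode("aGVsbG8="))',
    'eval(base64_decode("aGVsbG8h"))',
    'print("hello")',
]
exact_hits = search('eval(base64_decode("aGVsbG8="))', samples, mode="exact")
similar_hits = search('eval(base64_decode("aGVsbG8="))', samples, mode="similar")
```

The point is only that the user picks the mode explicitly, so the exact mode never silently falls back to guessing.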

I think there might be some overlap between what I described above being
useful for security researchers and what might be useful for programmers.

~~~
pyuser583
Doesn’t Bing have much more detailed query options? Or is that gone too?

------
mlthoughts2018
My request would be semantic reverse code lookup within a language and across
languages.

For example if I search something related to numpy mmap in Python, in some
cases only core numpy mmap answers make sense as results. In other situations,
the question is about mmap generally and info from other languages or the core
mmap definition could answer my question.

I don’t want to be checking boxes or toggles, or retrying the query with
different text over & over to get the engine to understand these differences
or when one class of answers is appropriate vs the other.

------
shakkhar
Should be "Ask HN" or "Tell HN".

~~~
vulcan01
"Ask HN" would fit better; they're asking a question about something to HN.

------
Syzygies
Literal search, including punctuation and spaces. Ideally regex searches.
Mainstream engines are case-insensitive-on-steroids. Technical searches are
literal.
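To illustrate the difference, here is a minimal sketch with three hypothetical helpers: `normalized_search` mimics the lowercase-and-strip-punctuation behaviour described above (an assumption about how mainstream engines roughly behave, not their actual pipeline), while `literal_search` and `regex_search` treat every character as significant:

```python
import re


def normalized_search(query, documents):
    """Lowercase everything and drop punctuation before matching."""
    def tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    wanted = tokens(query)
    return [d for d in documents if all(t in tokens(d) for t in wanted)]


def literal_search(needle, documents):
    """Case-sensitive substring match; punctuation and spaces count."""
    return [d for d in documents if needle in d]


def regex_search(pattern, documents):
    """Match documents against a user-supplied regular expression."""
    compiled = re.compile(pattern)
    return [d for d in documents if compiled.search(d)]


docs = ["if x != y:", "if x == y:", "IF X = Y"]
loose = normalized_search("x != y", docs)      # operator is thrown away
strict = literal_search("x != y", docs)        # operator is kept
either = regex_search(r"x\s*[!=]=\s*y", docs)  # "!=" or "==" only
```

With the normalizing version, a query like `!=` carries no information at all once the punctuation is stripped, which is exactly the failure mode for technical searches.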

~~~
karmakaze
Hound search does this [0]. I provide hosting for it [1].

[0] [https://github.com/hound-search/hound](https://github.com/hound-search/hound)

[1] [https://gitgrep.com](https://gitgrep.com)

~~~
vdthatte
oh wow this is amazing, will play around with it!

------
hyperpape
Tools surrounding search history. Things like selecting results as your
personal answers to a query, trying to determine when you're asking something
you've asked before.

This is based on something Hillel Wayne (@hillelogram) wrote on Twitter
not that long ago. The gist of it was it's ok that we're all using Google as
part of our programming workflows, but why on Earth should we ever need to ask
the same question twice? If you can find those threads, there might be more
there than what I just said.

Obviously privacy is a concern here, but while I'm leery of Google knowing
everything I do, I'm a lot happier with a technical search engine knowing
which pieces of syntax I can't remember.

~~~
mlthoughts2018
> “ why on Earth should we ever need to ask the same question twice?”

Isn’t it simple? I don’t want another entity to have data about my personal
curated search results.

It’s fine for me if Google sees some essentially anonymous query for “apple”
from me, and sees what I selected, and learns in aggregate what searches for
“apple” means, maybe even with extra context that helps distinguish different
modes in the results.

It starts being a huge problem if the next time I visit, Google first detects
it’s me, looks up a special index of my selected results, and modifies the
results to my query to be those.

~~~
quickthrower2
That horse has bolted

------
gitgud
\- It would be good to detect the system you're using instead of writing
"_Ubuntu 19.04 64bit_" before queries.

\- Would be even cooler to detect the IDE, or even the error message itself
(of course be careful not to leak sensitive information)
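A minimal sketch of the system-detection part, assuming a hypothetical client that attaches coarse environment details to each query (the `environment_context` and `build_request` names and the request shape are invented here; only the standard-library `platform` calls are real). Hostnames, usernames and paths are deliberately left out to avoid leaking anything sensitive:

```python
import platform


def environment_context():
    """Collect coarse, non-identifying details about the local system."""
    return {
        "os": platform.system(),           # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),  # kernel / OS release string
        "arch": platform.machine(),        # e.g. "x86_64", "arm64"
    }


def build_request(query, context=None):
    """Bundle the user's query with the detected context (hypothetical shape)."""
    if context is None:
        context = environment_context()
    return {"q": query, "context": context}


request = build_request("apt install fails with unmet dependencies")
```

Detecting the IDE or the current error message would need an editor plugin, which this sketch does not attempt.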

------
O_H_E
I sometimes find that obscure blog posts often have more comprehensive answers
to very niche problems than SO.

Also, the market definitely exists. A lot of the time, if I am not sure how to
formulate my question yet, Google really sucks. It also regularly fails to
find projects/tricks that I know exist and am able to find through my GitHub
stars or browser history after some tedious browsing.

~~~
vdthatte
I feel this! I use GitHub stars more than I admit to, especially when it comes
to unique libraries that I'll never find via google.

------
miccah
If including code snippets, it would be great to easily export it to an online
playground or other sandbox.

Oftentimes I find an example or documentation, and I copy it to a playground
to tweak it / experiment with whatever feature I am implementing.

~~~
vdthatte
interesting! something like JSFiddle but more integrated to your project?

------
claxo
Besides date ranking, which has been requested, maybe ranking code samples and
GH issues on some proxy for code quality? Meaning, I would prefer to see a
snippet of django or numpy before some aadwark repo

------
riedel
Best possible preview snippets would be essential when looking for trivial
stuff: I can't remember exactly how to do something but knew it once (Google
does that for the best SO match, I think).

Exception messages could be an important thing to focus on. That is the second
case where search engines often matter to me (support fuzzy search here:
abstract away the overly concrete details but keep the actual message).
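One way to read "abstract away the concrete stuff", sketched under the assumption that the concrete parts are mostly memory addresses, file paths and line numbers (the `normalize_exception` name, the placeholder tokens and the regexes are illustrative guesses and fairly crude):

```python
import re

# Each pair is (pattern, placeholder). Order matters: addresses first,
# then paths, then line numbers.
_REPLACEMENTS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<ADDR>"),
    (re.compile(r"(?:[A-Za-z]:)?(?:[\\/][\w.\-]+)+"), "<PATH>"),
    (re.compile(r"\bline \d+\b"), "line <N>"),
]


def normalize_exception(message):
    """Replace machine-specific details with placeholders, keep the rest."""
    for pattern, placeholder in _REPLACEMENTS:
        message = pattern.sub(placeholder, message)
    return message


normalized = normalize_exception(
    "FileNotFoundError: [Errno 2] No such file or directory: "
    "'/home/alice/app/config.yml'"
)
```

Two users hitting the same error in different directories would then produce the same search key, while the error type and wording stay intact.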

It would be great if you would understand versioning of documentation: it
always takes me a while to understand if the docs apply to the version I am
actually using.

~~~
vdthatte
exception messages are a pretty solid starting point, totally agree.

searching for the most relevant version is painful! I spend a lot of time in
Swift where Xcode automatically suggests the newer version/syntax but wish VS
Code & others did this too.

------
wizzerking
Sort by Date Freshest->Oldest and vice versa. Patterns ??

~~~
vdthatte
oh yeah absolutely. Sorting via language or framework version would be helpful
too!

~~~
hyperpape
Google and DuckDuckGo both seem pretty bad at surfacing more recent versions of
source code and documentation, so there's a lot of room for improvement there.

------
andersco
I would want to be able to right click on an error in my IDE (I use vscode)
and then run a search on that (filtered for the current language, env etc)

~~~
vdthatte
wow totally! Xcode kind of does this but this would save so much time if it
worked properly!

------
sneeuwpopsneeuw
Some information is very hard to find. So on my local machine I have many
books about C and C++. I translate those using a shell command to readable
text so that I can search it. So maybe you can help with the trend that some
programmers seem to notice that certain information is disappearing.

So a combination of easy search of the wayback machine or a search in all
online books.

~~~
vdthatte
wow! never considered books as a source of information for this. This could be
super useful, something that google doesn't currently do.

------
ewired
A VS Code extension that contributes a command to search it which opens the
search results in the right pane. I assume you may already have a VS Code
extension in the works, as that would be the best way to find out what
project/language is being worked on.

~~~
vdthatte
yep my solution was more sign in with GitHub and choose your current project
but a VS Code extension is a no-brainer!

------
andersco
I would want google searches that have been filtered for the current language,
tools etc.

~~~
vdthatte
yeah! the search engine should know what project you're working on. Seems like
such a pretty basic thing that hasn't been built yet haha.

------
vdthatte
I think programmers spend a lot of time searching for solutions online and
making that process more intuitive will be a huge win for everybody haha.

------
maps7
I could see this as being very useful. If you're looking for help let me know
- I would like to spend time on something like this.

~~~
vdthatte
will do! what's your twitter?

~~~
maps7
@dev_ste

------
asicsp
See also quickref.dev [0] [1]

[0]
[https://news.ycombinator.com/item?id=23263918](https://news.ycombinator.com/item?id=23263918)

[1]
[https://lobste.rs/s/dji0it/experimental_search_engine_for](https://lobste.rs/s/dji0it/experimental_search_engine_for)

