
> Search is a very different story, you wouldn't want to have to do a full directory scan for text based search. So some level of indexing would be useful for a client mail service.

I don't know; my ~/code directory has tons of stuff and searching with ripgrep doesn't seem too slow:

  % time rg HelloWorld | wc -l
  4
  rg HelloWorld > /dev/null  0.13s user 0.12s system 99% cpu 0.251 total

  % time rg string | wc -l
  57813
  rg string > /dev/null  0.20s user 0.14s system 99% cpu 0.339 total

Rough estimate of files that rg will search:

  % scc
  ───────────────────────────────────────────────────────────────────────────────
  Language                     Files       Lines     Blanks    Comments      Code
  …
  Total                        11024     1864982     175565      208777   1480640
  ───────────────────────────────────────────────────────────────────────────────

Finding close to 60k matches in 11k files/~1.9M lines in about 0.3 seconds isn't too bad.

It should be said I ran a few commands on that directory before the above results, so there's probably some filesystem caching going on, but I can't be bothered to reboot.

For many (not all, obviously) cases I think you may be able to get away without an index. Most people aren't subscribed to tons of mailing lists and get maybe a few emails a day at the most.

I'd consider anything below ~3 seconds to be fine for search, so this scales to about 100k files/emails. At 10 emails/day on average that's about 27 years (100k emails / 10 per day = 10,000 days). Most people do not get 10 emails/day on average.
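To make that extrapolation explicit, here is a rough sketch of the arithmetic. It assumes search time grows linearly with the number of files, which is only an approximation (file sizes, caching and disk speed all matter); the function names are made up for illustration.

```python
def files_within_budget(measured_files, measured_seconds, budget_seconds):
    """Estimate how many files fit in a time budget, assuming linear scaling."""
    return int(measured_files * budget_seconds / measured_seconds)


def years_to_accumulate(total_emails, emails_per_day):
    """Years needed to collect total_emails at a steady daily rate."""
    return total_emails / (emails_per_day * 365.25)


# Numbers from the rg/scc runs above: 11024 files searched in 0.339s total.
capacity = files_within_budget(11024, 0.339, budget_seconds=3.0)
years = years_to_accumulate(100_000, emails_per_day=10)

print(f"files searchable in 3s: ~{capacity}")        # a bit under 100k
print(f"years to reach 100k at 10/day: {years:.1f}")
```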

And you can even do some "poor man's indexing" by just making a new directory every five or ten years. Most of the time you just want emails from the last year or so.
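A minimal sketch of what that could look like, assuming a hypothetical layout where each period gets its own directory named after its starting year (e.g. mail/2015, mail/2020) so that names sort chronologically. It does a plain substring scan rather than calling rg, and `search_recent_first` is an invented name, not part of any mail tool.

```python
from pathlib import Path


def search_recent_first(root, needle, max_dirs=1):
    """Scan only the newest `max_dirs` period directories under root."""
    # Directory names are assumed to sort chronologically (e.g. "2015", "2020").
    periods = sorted((p for p in Path(root).iterdir() if p.is_dir()), reverse=True)
    hits = []
    for period in periods[:max_dirs]:
        for path in sorted(period.rglob("*")):
            # errors="replace" keeps odd encodings from aborting the scan.
            if path.is_file() and needle in path.read_text(errors="replace"):
                hits.append(path)
    return hits
```

Raising `max_dirs` widens the search to older periods only when the recent ones come up empty, which keeps the common case fast.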




> Most people do not get 10 emails/day on average.

I'd like to see the stats, but I seem to average more than 40 emails a day (most of them unactionable), and I've always considered my email load quite light. For people like my wife, who do much of their work communication over email, it appears to be much higher.


Ran some stats on my mails of the last 4 years, here are the daily characteristics:

N = 24k Min = 1 Max = 211 Median = 11 Avg = 16.023907 Stddev = 18.312062

A lot of them are actually chat messages through DeltaChat so not representative of usual mail activity. When I remove them I get this:

N = 16k Min = 1 Max = 56 Median = 10 Avg = 11.378933 Stddev = 8.1572529
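For anyone who wants to run the same kind of numbers on their own mailbox, here is a rough sketch of how such per-day figures could be computed. It is not the script used above: the input is assumed to be one `date` per message, days without any mail are simply not counted (which would explain a minimum of 1), and the sample standard deviation is an arbitrary choice.

```python
from collections import Counter
from datetime import date
import statistics


def daily_stats(dates):
    """Summarise messages per day from an iterable of per-message dates."""
    counts = list(Counter(dates).values())  # one count per day that had mail
    return {
        "N": sum(counts),
        "Min": min(counts),
        "Max": max(counts),
        "Median": statistics.median(counts),
        "Avg": statistics.mean(counts),
        # Sample standard deviation; needs at least two days of data.
        "Stddev": statistics.stdev(counts) if len(counts) > 1 else 0.0,
    }


# Tiny made-up example: 3 mails on one day, 1 on the next, 2 on the third.
sample = [date(2024, 1, 1)] * 3 + [date(2024, 1, 2)] + [date(2024, 1, 3)] * 2
stats = daily_stats(sample)
print(stats)
```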


Sorry, by "stats" I meant figures on the average across users, since I suspect that you might actually be an outlier on the lower end among people who work professionally in or with technology.


I'm also thinking of a server/service with a web UI component, where server resources are shared. Yeah, running a search on a local SSD/NVMe is crazy fast... now do it when there are 100k other users on that filesystem.


I get about 32 per day:

  notmuch count date:-100d..-1d
  3248

And I have more than 100k emails:

  notmuch count
  267584



