Sort of related: this type of thing isn't difficult if you use Thunderbird. All your email is stored in a big SQLite file, and you can query it with all the power of SQL.
A few things I've done: using it to ignore my email except for a small number of senders while I'm working, viewing only starred messages for most of the day, and copying email threads as HTML documents into a project folder. There's something nice about being able to look at an "email inbox" that's a plain HTML page with six messages.
I haven't written anything up (maybe I should), but it's easy enough.
In Thunderbird, go to Help -> Troubleshooting Information -> Profile Directory.
That opens your profile directory. In there is a file global-messages-db.sqlite that holds all your messages. Treat that as read-only. If it does get messed up for some reason, you can delete it. It's an index that will get rebuilt.
If you close Thunderbird, you can open the db in sqlitebrowser to view the tables. Some tables I've used for queries:
folderLocations - tells you the id of each email folder
messages - metadata for each message
messagesText_content - subject, recipients, body of each email message
conversations - info about your email threads that can be matched with data in messages to retrieve full email threads
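For anyone who'd rather script this than click around in sqlitebrowser, here's a minimal Python sketch using only the standard sqlite3 module: an allowlist query (the "only a few senders while I'm working" view) and a thread-to-HTML export built by joining messages with conversations. The column names (conversationID, date, and the c0body/c1subject/c3author columns of messagesText_content, with docid matching messages.id) are my assumptions about the gloda schema and may differ between Thunderbird versions, so check them in sqlitebrowser first. The helper names are made up, and the demo runs against a tiny in-memory stand-in rather than your real profile.

```python
import html
import sqlite3
from pathlib import Path

# ASSUMED gloda column names (verify in sqlitebrowser first):
#   messages(id, conversationID, date), conversations(id, subject),
#   messagesText_content(docid = messages.id, c0body, c1subject, c3author)
THREAD_QUERY = """
    SELECT t.c3author, t.c1subject, t.c0body
    FROM messages AS m JOIN messagesText_content AS t ON t.docid = m.id
    WHERE m.conversationID = ?
    ORDER BY m.date
"""


def open_readonly(path):
    """Open the index read-only (SQLite URI mode=ro) so Thunderbird's file is never written."""
    return sqlite3.connect(Path(path).resolve().as_uri() + "?mode=ro", uri=True)


def messages_from(conn, senders):
    """Rows (id, author, subject) whose author contains any allowlisted address, oldest first."""
    if not senders:
        return []
    where = " OR ".join("t.c3author LIKE ?" for _ in senders)
    sql = (
        "SELECT m.id, t.c3author, t.c1subject "
        "FROM messages AS m JOIN messagesText_content AS t ON t.docid = m.id "
        f"WHERE {where} ORDER BY m.date"
    )
    return conn.execute(sql, [f"%{s}%" for s in senders]).fetchall()


def thread_as_html(conn, conversation_id):
    """Render one conversation as a standalone HTML page, one section per message."""
    row = conn.execute(
        "SELECT subject FROM conversations WHERE id = ?", (conversation_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no conversation {conversation_id}")
    title = html.escape(row[0] or "")
    parts = [f"<html><head><meta charset='utf-8'><title>{title}</title></head><body>",
             f"<h1>{title}</h1>"]
    for author, subject, body in conn.execute(THREAD_QUERY, (conversation_id,)):
        parts.append(f"<h2>{html.escape(author or '')}: {html.escape(subject or '')}</h2>")
        parts.append(f"<pre>{html.escape(body or '')}</pre>")
    parts.append("</body></html>")
    return "\n".join(parts)


def demo_db():
    """Tiny in-memory stand-in with the assumed schema subset, for trying the queries safely."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, conversationID INTEGER, date INTEGER);
        CREATE TABLE conversations (id INTEGER PRIMARY KEY, subject TEXT);
        CREATE TABLE messagesText_content (docid INTEGER PRIMARY KEY, c0body TEXT,
                                           c1subject TEXT, c3author TEXT);
        INSERT INTO conversations VALUES (1, 'Project kickoff');
        INSERT INTO messages VALUES (10, 1, 100), (11, 1, 200), (12, 2, 150);
        INSERT INTO messagesText_content VALUES
            (10, 'Shall we start Monday?', 'Project kickoff', 'Alice <alice@example.com>'),
            (11, 'Monday works.', 'Re: Project kickoff', 'Bob <bob@example.com>'),
            (12, 'Buy now!', 'Sale', 'Shop <deals@example.net>');
    """)
    return conn


if __name__ == "__main__":
    # With Thunderbird closed, swap in: open_readonly("<profile>/global-messages-db.sqlite")
    conn = demo_db()
    print(messages_from(conn, ["alice@example.com", "bob@example.com"]))
    print(thread_as_html(conn, 1))
```

Opening with mode=ro also means a wrong path fails loudly instead of silently creating a new, empty database next to your profile.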
Oh, that's a cool idea for my work mail. Gotta try that.
For personal email, though, I wanted this to work regardless of the client: I've got Thunderbird on my personal laptop, webmail on my work computer and K-9 Mail on my phone, so a client-side-only solution wasn't enough.
Are you talking about "global-messages-db.sqlite"? From cursory googling, that's only an index.[1] The default storage for Thunderbird is the mbox format, not SQLite.
When I experiment with some SQL queries on "global-messages-db.sqlite", I notice that many of my Gmail emails are not in that index cache.
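If you need the full messages rather than the index, Python's standard mailbox module can read Thunderbird's mbox files directly. A minimal sketch, assuming the default mbox storage (not maildir), where each folder is an extension-less file such as Inbox sitting next to its .msf summary file; the exact path under Mail/ or ImapMail/ depends on the account, and for IMAP accounts the file may be incomplete if the folder isn't synchronized for offline use. Work on a copy with Thunderbird closed. `summarize_mbox` and `make_demo_mbox` are hypothetical helpers.

```python
import email
import mailbox
import os
import tempfile
from email.header import decode_header, make_header


def summarize_mbox(path):
    """Return (sender, subject) for every message in an mbox file, in file order."""
    box = mailbox.mbox(path, create=False)  # raises NoSuchMailboxError if the path is wrong
    try:
        return [
            (str(make_header(decode_header(msg.get("From", "")))),
             str(make_header(decode_header(msg.get("Subject", "")))))
            for msg in box
        ]
    finally:
        box.close()


def make_demo_mbox():
    """Write a two-message mbox to a temp dir so the reader can be tried without a real profile."""
    path = os.path.join(tempfile.mkdtemp(), "Inbox")
    box = mailbox.mbox(path)
    box.add(email.message_from_string("From: Alice <alice@example.com>\nSubject: Hello\n\nHi there\n"))
    box.add(email.message_from_string("From: Bob <bob@example.com>\nSubject: Re: Hello\n\nHey\n"))
    box.close()  # close() flushes pending changes to disk
    return path


if __name__ == "__main__":
    # Real use (assumed layout): summarize_mbox("<copy of profile>/ImapMail/<server>/INBOX")
    print(summarize_mbox(make_demo_mbox()))
```

decode_header/make_header turn RFC 2047 encoded words (=?UTF-8?...?=) in From and Subject into readable text, which the raw header values would otherwise show.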
Granted, an index won't give you the full content, in case your intended logic needs that. Otherwise, is the fact that it's an index a problem? That seems ideal for performance reasons.
Purely because I'm curious, as a user who might end up doing this: are those messages perhaps not synced locally? And/or if you delete the cache and rebuild, do they appear? Or are there just some kinds of messages that intentionally don't appear in that index?
I use sanebox.com for approximately this behavior, and I'm happy with their service.
Sanebox works by logging into my mail server, not replacing it. I have three "send" addresses, and I've learned the hard way that I must authenticate through the SMTP server that goes with the address I'm claiming, or risk being classified as spam by my recipients' email servers. Of course, various email programs allow one to assert any return address one likes. It is a mistake to use this feature. Particularly if one is more tech savvy than one's recipients, it is all too easy to blame them when your mail goes to spam. No, it's your fault. I was that idiot.
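The fix for that lesson is to make the sending side refuse to mix addresses and servers. A minimal Python sketch with smtplib: every From address maps to the SMTP server that is authoritative for its domain, and anything unmapped is refused rather than sent through whatever server happens to be configured. The addresses, hosts and port 465 (implicit TLS) are hypothetical placeholders.

```python
import smtplib
from email.message import EmailMessage
from email.utils import parseaddr

# HYPOTHETICAL addresses and hosts: one entry per "send" address you own.
SMTP_FOR_ADDRESS = {
    "me@personal.example": ("smtp.personal.example", 465),
    "me@work.example": ("smtp.work.example", 465),
    "me@college.example": ("smtp.college.example", 465),
}


def smtp_server_for(msg):
    """Return (host, port) for the From address, refusing addresses with no matching server."""
    _, addr = parseaddr(str(msg["From"]))
    if addr.lower() not in SMTP_FOR_ADDRESS:
        raise ValueError(f"no SMTP server configured for {addr!r}; not sending")
    return SMTP_FOR_ADDRESS[addr.lower()]


def send(msg, username, password):
    """Authenticate against the server that owns the From address, then send over TLS."""
    host, port = smtp_server_for(msg)
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)


if __name__ == "__main__":
    msg = EmailMessage()
    msg["From"] = "Me <me@work.example>"
    msg["To"] = "colleague@work.example"
    msg["Subject"] = "Status"
    msg.set_content("Done.")
    print(smtp_server_for(msg))  # the real send() needs real hosts and credentials
```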
Similarly, Qualtrics will send out surveys for colleges and businesses, but won't send from the organization's own domain without extra steps. Surveys sent without those steps don't get read. My college was making this mistake until I pointed it out. Qualtrics has clear instructions on fixing this.
So Hey is clearly not ready for market if it can't properly support users who send business email from their business domains. This isn't a matter of waiting until one's employer signs on with Hey; any Hey user who is employed is going to hit this issue.
I have been looking into this type of system with a colleague for a few years now. We have been working on the concept of an attention-based email network, which is essentially a way to map our naturally limited attention onto the digital world.
This type of system requires a mindset that involves multiple parties, not just the receiver, so it becomes a non-trivial exercise once you consider the attention of multiple parties using multiple email systems.
In practice, we looked at adding a multi-party reputation system to the email standard, which would involve adding metadata to emails to indicate the "trust" level between the recipients. There are clearly quite a few technical challenges with this type of system. Fastmail seems to be doing some interesting work in this space.
This looks interesting.
How does it handle spoofed email addresses?
A feature I'd like along with this (although I realise this breaks the paradigm of only accepting email you actively want) would be to let unwanted senders email me provided they completed a certain amount of Mechanical Turk work to a suitable standard (i.e. you'd intersperse a handful of items with known good values and check that they gave the expected scores). For some people it could directly earn a very modest income, but for me it would be directly useful for various ML projects.
If you had a really inflated self-image, you could give those you really didn't like a higher target (whether that was communicated would be an interesting point).
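The "handful of known good values interspersed" idea is what crowdsourcing folks usually call gold-standard questions. A minimal Python sketch of the gate, with hypothetical task IDs, tolerance and thresholds; raising `min_tasks` for a particular sender would be the "higher target" idea.

```python
# HYPOTHETICAL task IDs and scores: known-good items hidden among the real labelling tasks.
GOLD = {"task-3": 4, "task-7": 1, "task-12": 5}


def passes_quality_check(answers, gold=GOLD, tolerance=1, min_accuracy=0.8, min_tasks=20):
    """Let a sender through only if they did enough tasks and matched the gold items.

    answers maps task id -> score given; gold maps task id -> expected score.
    """
    if not gold:
        raise ValueError("need at least one gold item")
    if len(answers) < min_tasks or not all(t in answers for t in gold):
        return False
    hits = sum(abs(answers[t] - expected) <= tolerance for t, expected in gold.items())
    return hits / len(gold) >= min_accuracy
```

The sender never learns which items are gold, so the only reliable way through is to answer everything carefully.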
It doesn't. However, it's less a security feature than a sorting filter, and I haven't received a spoofed email in a while, so the need isn't as high.
Re: your Mechanical Turk idea, my email provider actually has something like that: https://documentation.cpanel.net/display/1144Docs/BoxTrapper. However, there are so many cases where I think it would block email I actually want (like my kid's school, my MD, ...) that I never dared to turn it on.
You can't really spoof From addresses anymore. I mean, you can, but major email providers will reject it or junk it, at least for domains that publish the relevant records (see SPF, DKIM, DMARC). That's part of why sorting based on sender is such a good idea.
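Sorting on the sender only helps if the sort trusts the authentication verdict rather than the bare From header. A minimal Python sketch, assuming your provider stamps an Authentication-Results header (RFC 8601) and that you know its authserv-id (mx.example.net below is a placeholder): accept the From address only when your own server recorded dmarc=pass. DMARC is the check that ties SPF/DKIM to the visible From domain; spf=pass on its own may refer to a different (envelope) domain.

```python
import email
import re
from email.utils import parseaddr


def authenticated_sender(raw_message, authserv_id):
    """Return the lower-cased From address if our own server recorded dmarc=pass, else None.

    authserv_id is the first token your provider writes in Authentication-Results;
    headers stamped by anyone else are ignored, since senders can forge them.
    """
    msg = email.message_from_string(raw_message)
    for header in msg.get_all("Authentication-Results", []):
        serv, _, results = str(header).partition(";")
        tokens = serv.split()
        if not tokens or tokens[0].lower() != authserv_id.lower():
            continue  # not our server's verdict
        if re.search(r"\bdmarc=pass\b", results, re.IGNORECASE):
            return parseaddr(msg.get("From", ""))[1].lower() or None
    return None
```

Mail that comes back as None can go to a "check manually" pile instead of being sorted by the sender it claims.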
It reclassifies email afterwards; you just have to move it the same way you would for the original classification.
It also moves all previously classified email from that sender, although by design it will not move mail in folders for which you use a "screening folder" (i.e. an intermediary folder used only to tell Screenr where to classify email). The reason is that I wanted to allow a behavior where mail gets classified as a best guess, but you can still move email manually if need be. The best example is the "papertrail" folder I'm using: it automatically receives all the order confirmations, invoices and such, but sometimes I also put similar emails my wife forwards me in there. I don't want Screenr to classify her as papertrail, so I use a "Classify in papertrail" folder which exists only so I can tell the tool where mail should land.
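To make the screening-folder rule concrete, here is a toy Python model of the behaviour described above. It is a hypothetical sketch, not Screenr's actual code: mail is held in plain dicts, `record_manual_move`, `classify` and `SCREENING` are made-up names, and moving mail straight into a screened target such as papertrail teaches nothing, while dropping it into the screening folder does.

```python
# HYPOTHETICAL model, not Screenr's implementation.
# mailbox: folder name -> list of (sender, subject); rules: sender -> folder.
SCREENING = {"Classify in papertrail": "papertrail"}


def classify(rules, sender, default="INBOX"):
    """Folder where new mail from `sender` should land."""
    return rules.get(sender, default)


def record_manual_move(rules, mailbox, sender, dest, screening=SCREENING):
    """Learn from the user moving a message by `sender` into `dest`, then re-sort past mail."""
    protected = set(screening.values())  # folders fed through a screening folder
    if dest in protected:
        return  # direct drop into e.g. papertrail: manual placement, nothing learned
    target = screening.get(dest, dest)
    rules[sender] = target
    mailbox.setdefault(target, [])
    for folder in list(mailbox):
        if folder == target or folder in protected:
            continue  # never pull mail out of manually curated folders
        moved = [m for m in mailbox[folder] if m[0] == sender]
        mailbox[folder] = [m for m in mailbox[folder] if m[0] != sender]
        mailbox[target].extend(moved)
```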
As in, replicate the behaviour? I think you can get close, but setting up the filters is cumbersome. Gmail exposes IMAP, I believe, so it should work right out of the box.
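Since Gmail does offer IMAP (at imap.gmail.com, once IMAP access is enabled and you log in with an app password or OAuth), a client-independent sender filter can be sketched with Python's imaplib. The helper names are hypothetical. IMAP's OR search key takes exactly two operands, so several senders are nested recursively. Copy-then-expunge is used instead of the MOVE extension to stay within basic IMAP; on Gmail's default settings, expunging from INBOX should amount to archiving, but check your IMAP settings before pointing this at a real account.

```python
import imaplib


def from_any(senders):
    """Build an IMAP SEARCH key matching mail from any of `senders` (OR is binary, prefix)."""
    if not senders:
        raise ValueError("need at least one sender")
    first = f'FROM "{senders[0]}"'
    if len(senders) == 1:
        return first
    return f"OR {first} {from_any(senders[1:])}"


def file_senders(host, user, password, senders, dest):
    """Copy matching INBOX mail into `dest`, then remove it from INBOX (needs real credentials)."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select("INBOX")
        typ, data = conn.uid("SEARCH", None, from_any(senders))
        if typ != "OK":
            raise RuntimeError(f"search failed: {data!r}")
        uids = b",".join(data[0].split()).decode()
        if uids:
            conn.uid("COPY", uids, f'"{dest}"')
            conn.uid("STORE", uids, "+FLAGS", r"(\Deleted)")
            conn.expunge()
    finally:
        conn.logout()
```

Run on a schedule, something like `file_senders("imap.gmail.com", user, app_password, ["boss@work.example"], "Focus")` would behave the same whether you read mail in Thunderbird, webmail or K-9 (the label name and address here are placeholders).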