
Ask HN: How do you architect daily digest emails? - dazamarquez
Hi HN,
I'm researching how to build a system for daily digest emails on top of an existing legacy database / application.

Does it make sense to send a daily digest to every user of your app every day? I think that iterating through each user will be very computationally expensive.

Then, if you need to aggregate some data (maybe with SQL JOINs) or run other business logic to build the email content, and you keep doing that for each user, your resource usage will be very high.

Besides that, there are more issues: e.g. users may live all over the world and not share the same timezone, so the time of the "daily" digest can't be the same for everyone.

Do you have a strategy for these daily digests? Are there any learning resources you could point to?

Thank you very much!
======
Nextgrid
> I think that iterating through each user will be very computationally expensive.

You can preload the data you need in bulk. Let's say you have a query that will
give you a mapping of User ID -> Product IDs. You run that query first (for
all the users) and cache the result in memory (this is where an ORM will
probably be counter-productive; I suggest you convert the result to
primitive types like a dictionary to save memory). It's a huge query, but
it's also a single query, so the database can optimize it internally and it
shouldn't be too big of a problem.
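A minimal sketch of that bulk-preload step in Python. The schema and table names here are invented for illustration (your legacy database will differ), and sqlite3 stands in for whatever database you actually use:

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema standing in for the legacy app's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE user_products (user_id INTEGER, product_id INTEGER);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO user_products VALUES (1, 10), (1, 11), (2, 12);
""")

# One query covering ALL users, instead of one query per user.
rows = conn.execute(
    "SELECT user_id, product_id FROM user_products ORDER BY user_id"
)

# Collapse the result into primitive types (a plain dict of lists)
# rather than keeping ORM objects alive for every row.
products_by_user = defaultdict(list)
for user_id, product_id in rows:
    products_by_user[user_id].append(product_id)

print(dict(products_by_user))  # {1: [10, 11], 2: [12]}
```

The same pattern (one query, then fold into a dict keyed by user ID) applies to every dataset you preload.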

You repeat this for all the data you think you'll need (it's fine if you get a
bit extra, the optimization of fetching it in bulk makes up for it).

You can even reuse existing caches your app might be using. If the data you
need is already in Memcached/Redis as a result of another process you could
just fetch it from there directly and avoid hitting the database at all.
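The cache-first lookup can be sketched like this. A plain dict stands in for the Memcached/Redis client, and the key scheme (`user:<id>:product_ids`) is an assumption; in practice you'd reuse whatever keys your app already writes:

```python
# Stand-in for an existing Redis/Memcached cache, pre-warmed by another process.
cache = {"user:1:product_ids": [10, 11]}

def get_product_ids(user_id, fetch_from_db):
    """Try the existing cache first; fall back to the database on a miss."""
    key = f"user:{user_id}:product_ids"
    if key in cache:
        return cache[key]           # cache hit: no database round-trip
    value = fetch_from_db(user_id)  # cache miss: hit the database once...
    cache[key] = value              # ...and warm the cache for next time
    return value

# Simulated database lookup (your real one would be a SQL query).
db = {1: [10, 11], 2: [12]}
print(get_product_ids(1, db.get))  # served from the cache
print(get_product_ids(2, db.get))  # served from the database, then cached
```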

Now that you have all that data in memory, you do the actual processing in the
code. Compute and memory capacity is relatively cheap compared to engineering
efforts to optimize it further (especially if it involves rearchitecting your
database layout or denormalizing certain data) and you can go even cheaper if
you outsource this process to a bare-metal server which is more cost-effective
for raw compute power than a cloud provider.
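Once everything is preloaded, the per-user processing touches no external system at all. A sketch, with the dicts below standing in for the bulk-loaded data from the earlier queries (all names illustrative):

```python
# Assumed to be pre-loaded in bulk, as described above.
users = {1: "a@example.com", 2: "b@example.com"}
products_by_user = {1: [10, 11], 2: [12]}
product_names = {10: "Widget", 11: "Gadget", 12: "Gizmo"}

def build_digest(user_id):
    """Compose one digest purely from in-memory data -- zero queries per user."""
    names = [product_names[p] for p in products_by_user.get(user_id, [])]
    return f"Your daily digest: {', '.join(names)}"

# The full loop over every user is now just CPU and memory.
digests = {users[uid]: build_digest(uid) for uid in users}
print(digests["a@example.com"])  # Your daily digest: Widget, Gadget
```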

~~~
dazamarquez
Honestly, I asked my question without any expectations. I thought most people
would scoff at it because it makes me seem like a newbie/stupid person. Instead
you gave me an interesting and useful reply. I now have many more leads to dig
deeper into this topic. Thank you!!

~~~
Nextgrid
I originally upvoted your question without replying because I expected there
to be a _right_ solution for this, and I was hoping some experts would chime in.

Looking back at it I'm not sure if there is a "right" solution for this (or
maybe there is and the experts are still laughing) but at least here's my take
on it and how I would approach the problem.

Whether it's the "right" solution or not is up for debate, but at least it
will get you started. When attempting to solve a problem I recommend writing
the worst, most hacky solution you can as long as it gets the job done, and
then see if you can optimize it incrementally (by fetching data in bulk, etc).
Throwing hardware at the problem is also a valid solution to at least delay it
(sometimes the "delay" can be measured in _years_ in which case the hacky and
terrible solution would've made you tons of money in the meantime so you still
end up winning).

