> Delete in batches (as opposed to one-by-one), and without moving the event objects out of the database server – try to avoid loading them in the cleaning process before removing them, and don’t bother sending removed items back over the wire.
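The quoted advice can be sketched roughly like this, assuming a SQLite-flavored setup with a hypothetical `event` table and `created_at` column: each round trip is a single DELETE whose subquery picks the next batch, so no event rows ever leave the database server.

```python
import sqlite3

def delete_old_events(conn, horizon, batch_size=1000):
    """Delete events older than `horizon` in fixed-size batches.

    Table and column names (`event`, `created_at`) are hypothetical,
    taken from the discussion above. Each iteration is one DELETE
    statement; the rows themselves are never fetched by the client.
    """
    while True:
        cur = conn.execute(
            "DELETE FROM event WHERE rowid IN "
            "(SELECT rowid FROM event WHERE created_at < ? LIMIT ?)",
            (horizon, batch_size),
        )
        conn.commit()  # short transactions keep other writers unblocked
        if cur.rowcount < batch_size:
            break  # last, partial batch: nothing older remains
```

Batching keeps each transaction small, which matters on a busy table: a single giant DELETE can hold locks and bloat the undo/WAL for the whole run.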
Nah, I am quite sure
    for item in conn.query('SELECT * FROM event'):
        if item.created_at < created_at_horizon:
            conn.exec('DELETE FROM event WHERE ' + ' AND '.join(f'{k} = :{k}' for k in item.__dict__.keys()), item.__dict__)
is fine, so I won't even bother measuring it. It's also secure, don't worry: as you can see I used prepared statements in it.
I've run into this problem as well, but I'm reticent to embrace deleting the events table, because that data is the very heart and soul of what we are doing with information technology: tracking facts and values.
I would much rather a solution where I can offload old events into some other storage system, so that it doesn't clog my database but the information is still retrievable.
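A minimal sketch of that offload idea, again with hypothetical names: copy a batch of old rows into an `event_archive` table, then delete the same rows from the hot table, with both steps committed together. In practice the archive would likely be a separate, cheaper system rather than another table in the same database.

```python
import sqlite3

def archive_old_events(conn, horizon, batch_size=1000):
    """Move events older than `horizon` from `event` into `event_archive`,
    one batch at a time. All table/column names here are assumptions for
    illustration; the archive stands in for whatever cold storage you use.
    """
    while True:
        # Pick the next batch of victims by rowid.
        ids = [r[0] for r in conn.execute(
            "SELECT rowid FROM event WHERE created_at < ? LIMIT ?",
            (horizon, batch_size))]
        if not ids:
            break
        marks = ",".join("?" * len(ids))
        with conn:  # copy + delete commit atomically: no batch is half-moved
            conn.execute(
                f"INSERT INTO event_archive SELECT * FROM event WHERE rowid IN ({marks})",
                ids)
            conn.execute(
                f"DELETE FROM event WHERE rowid IN ({marks})",
                ids)
```

The key design point is the per-batch transaction: if the process dies mid-run, every batch is either fully archived or fully untouched, so the data stays retrievable either way.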
Events tables can also be privacy nightmares, depending on how stringent your industry's requirements are. No clear ownership, events that are really hard to deprecate: one of the worst kinds of tech debt in my experience.
Don't collect data if you don't have a clear purpose for it, IMO.
> I would much rather a solution where I can offload old events into some other storage system
At large enough scale, it becomes mandatory to separate the event bus from the system of record. I witnessed this in semiconductor manufacturing. There was a gigantic transaction table of finalized things, and all of the detail event data was on a separate messaging bus/system with its own log files. The raw event data was on the order of 300-400 megabytes per hour per mfg line.
Even with this separation, the high level table was still very scary to use. An unconstrained "SELECT *" would invariably result in a visit from the on-call DBA.