
    Open db connection
    get data for receipt
    generate receipt
    send email
    write success to database
    close connection

To a junior programmer this would probably look reasonable, and to be fair, it takes some experience and getting burned, or good training, to know it is not.
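In concrete terms, the trap looks something like this (a sketch in Python with psycopg2; the receipts table, id, and addresses are all made up):

    import smtplib
    import psycopg2

    receipt_id, to_addr = 42, "user@example.com"  # hypothetical values

    conn = psycopg2.connect("dbname=app")         # hypothetical DSN
    cur = conn.cursor()

    # BEGIN is implicit: everything up to commit() is one transaction,
    # so this connection is pinned and any locks are held...
    cur.execute("SELECT total FROM receipts WHERE id = %s", (receipt_id,))
    total = cur.fetchone()[0]

    body = f"Subject: Receipt\n\nYour total was {total}."

    # ...including across this slow, fallible network call:
    with smtplib.SMTP("localhost") as smtp:
        smtp.sendmail("billing@example.com", [to_addr], body)

    cur.execute("UPDATE receipts SET sent = true WHERE id = %s",
                (receipt_id,))
    conn.commit()
    conn.close()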




I was about to argue but then I realized: I've been working in the MS world for so long, I forgot that not everybody has connection pools.

The MS recommendation is the opposite: get the connection early, finish with it late, let us worry about the "real" connection. I've been working on multi-TB databases like that, with hundreds of concurrent requests, and never had problems.
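For anyone outside that world: the point of a pool is that "open" and "close" in application code become nearly free checkouts. A rough Python equivalent using psycopg2's built-in pool (DSN made up):

    from psycopg2.pool import SimpleConnectionPool

    # The pool keeps a few real TCP connections alive and hands them out.
    pool = SimpleConnectionPool(1, 20, "dbname=app")  # hypothetical DSN

    conn = pool.getconn()       # "open" early: just a checkout
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
    finally:
        pool.putconn(conn)      # "close" late: the real connection survives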


This has nothing to do with pooling: you can't use a connection pool in this scenario, because you're in a transaction. Even in the MS world, a transaction requires a single connection to be held throughout, without releasing it back to the pool. If you're not in a transaction, each access to the db uses a fresh connection from the pool and this problem never comes up; but the OP here is assuming the time between open and close is a transaction.


So the correct version is, I guess:

    Open db connection 1
    get data for receipt
    Close db connection 1
    generate receipt
    send email
    open db connection 2
    write success to database
    close db connection 2

I guess you could speed this up a lot by doing it in bulk instead of opening and closing db connections twice for every email. Anyway, the version you wrote sounds reasonable for everything but really big operations to me, but then again I'm fairly junior.
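In code, something like this (a sketch with psycopg2, reusing the made-up names from the sketch upthread; send_email is a hypothetical stand-in):

    import psycopg2

    # Connection/transaction 1: read what we need, then let go.
    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:    # commits on exit, doesn't close
        cur.execute("SELECT total FROM receipts WHERE id = %s",
                    (receipt_id,))
        total = cur.fetchone()[0]
    conn.close()

    body = f"Subject: Receipt\n\nYour total was {total}."
    send_email(to_addr, body)           # no db resources held during this

    # Connection/transaction 2: record success, briefly.
    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("UPDATE receipts SET sent = true WHERE id = %s",
                    (receipt_id,))
    conn.close()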


You can still do it in one transaction, sort of. Replace "send email" with "queue email", and then either eliminate the step that writes the success to the database (pushing that off to the queue processor, which will open, or will already have open, a database connection) or instead write to the DB that you queued the item.

The advantage you gain is that the queue is a nice buffer if something gets held up with email sending, and also, the queue processor can work in bulk, say sending 100 emails, and then opening a connection to the DB and writing them all in one statement.
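As a sketch, assuming a hypothetical email_outbox table:

    import psycopg2

    # One short transaction: read the receipt AND queue the email
    # atomically; no network calls happen inside it.
    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT total FROM receipts WHERE id = %s",
                    (receipt_id,))
        total = cur.fetchone()[0]
        cur.execute("INSERT INTO email_outbox (receipt_id, body) "
                    "VALUES (%s, %s)", (receipt_id, f"Total: {total}"))
    conn.close()

    # The queue processor drains in bulk: send up to 100 emails,
    # then record them all in one statement.
    def drain_outbox(conn):
        with conn, conn.cursor() as cur:
            cur.execute("SELECT id, body FROM email_outbox "
                        "WHERE sent_at IS NULL LIMIT 100")
            rows = cur.fetchall()
        for _id, body in rows:
            send_email(body)                       # made-up sender
        with conn, conn.cursor() as cur:
            cur.execute("UPDATE email_outbox SET sent_at = now() "
                        "WHERE id = ANY(%s)", ([r[0] for r in rows],))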


So if the "send email" step fails (temporarily), the next worker to come along will grab the same receipt and send it again?

I think a better solution would be to use a centralized queue that actually does the mail sending, and retries in case of failure.


That depends on the implementation of the queue. Most queues have a "failed/retry" concept, where they mark failed jobs to be retried in some set future point in time. So one failed job does not hold up your entire queue processing.
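E.g., with attempts/next_attempt_at columns added to the hypothetical outbox table above, a failure just reschedules the job with backoff:

    def mark_failed(conn, job_id, attempts):
        # Don't dequeue on failure: bump the attempt count and push
        # next_attempt_at into the future (exponential backoff).
        with conn, conn.cursor() as cur:
            cur.execute("UPDATE email_outbox SET attempts = %s, "
                        "next_attempt_at = now() + "
                        "(%s * interval '1 minute') WHERE id = %s",
                        (attempts + 1, 2 ** attempts, job_id))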


> a centralized queue that actually does the mail sending, and retries in case of failure.

A bit like SMTP?


Not necessarily: in the first write we can set the receipt status to "queued", so the queue system can update it to "sent" or retry.


Depending on the number of email-sending workers, it could be closer to:

    transaction 1:
        mark an unsent receipt as being processed
        get the marked receipt
    send email
    transaction 2:
        mark the receipt as sent
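In Postgres, for example, transaction 1 can claim a row atomically, so concurrent workers never grab the same receipt (a sketch; the table and status values are assumed):

    def claim_receipt(conn):
        # Transaction 1: mark one unsent receipt as processing and
        # fetch it in a single statement. SKIP LOCKED makes parallel
        # workers each claim a different row instead of blocking.
        with conn, conn.cursor() as cur:
            cur.execute("""
                UPDATE receipts SET status = 'processing'
                WHERE id = (SELECT id FROM receipts
                            WHERE status = 'unsent'
                            LIMIT 1 FOR UPDATE SKIP LOCKED)
                RETURNING id, body
            """)
            return cur.fetchone()   # None if nothing is pending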


You don't need to close the database connection to not have an open transaction, right?


No, you don't. But some ORMs batch writes unless you explicitly tell them not to, and will flush when you close the connection. So some programmers will just close the connection because they don't understand the difference or because they've learned that closing the connection guarantees that the data is written.
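With SQLAlchemy, for instance, the distinction looks like this (a sketch; the Receipt model and engine are assumed to exist):

    from sqlalchemy.orm import Session

    with Session(engine) as session:             # engine assumed configured
        session.add(Receipt(id=42, sent=True))   # hypothetical model
        session.flush()    # SQL is sent to the db, but still uncommitted
        session.commit()   # THIS is what makes the write durable; you
                           # never need close() for that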


> So some programmers will just close the connection because they don't understand the difference or because they've learned that closing the connection guarantees that the data is written.

To be fair, this problem isn't limited to databases. Filesystems, even HDDs/SSDs, have been known to readily ignore flush() calls in order to achieve better benchmark results.
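The layering is visible even with plain files: flush() only moves bytes from the process buffer to the OS, and even fsync() is only as honest as the drive underneath:

    import os

    with open("receipt.log", "a") as f:
        f.write("receipt 42 sent\n")
        f.flush()              # Python buffer -> OS page cache, not disk
        os.fsync(f.fileno())   # ask the OS to hit the device; a drive
                               # with a volatile write cache may still
                               # ack before the data is truly durable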


This can still be a pretty reasonable and correct course of action. Everything depends on the amount of data you are locking and the number of transactions open simultaneously.

When there's noticeable contention due to the number of parallel transactions, one should consider ways to loosen the data guarantees (the status can stay unset for some time) and go for an explicitly asynchronous approach.


Thanks for the illustration. But if the Uber team was using an ORM, then I'd expect the ORM to take care of the connection pool.

"get data for receipt" would be a SELECT, and only "write success to database" would do an INSERT or UPDATE. I'd expect a junior programmer to complete the above in at least two SQL calls. I have a feeling they were trying something smart.

EDIT: hmm, reading the other commenter above, they were probably trying to lock the data for full data integrity. Okay, that makes sense then. I was looking at the problem from the wrong angle.


Grapevine says that they were holding a transaction open for the entire duration of the Uber ride.



