Disclaimer: note that I have mostly redacted the product’s internal details. In places I also used simplified explanations of MySQL indexes, especially where detailed explanations already exist elsewhere on the web. I provided links where appropriate.
In the rest of this post (part 1), I first give a short overview of how we do certain things at Productive. Then I briefly explain what deadlocks are, followed by the approach we took to fix them. In part 2, which will follow, I will show a different approach we took to resolve deadlocks in our end-to-end environment.
At Productive we use Sidekiq for scheduling asynchronous jobs. Some of these jobs send emails to users when they do not track their time; others are batch jobs that update financial data, copy certain objects in a transaction-safe manner, and so on. A few months ago we started noticing a surge of deadlocks in our application. In particular, these deadlocks appeared in batched Sidekiq jobs that have to process a large number of relatively heavy database transactions. A simplified example of such a job is sketched below.
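To make the setup concrete, here is a minimal sketch of the kind of batched job described above. The class and method names (RecalculateFinancialsJob, Deal, recalculate_totals!) are illustrative placeholders, not Productive’s actual code; the point is that each batch runs a fairly heavy update inside a single database transaction.

```ruby
# Illustrative sketch only: names are made up for this post.
class RecalculateFinancialsJob
  include Sidekiq::Worker

  def perform(deal_ids)
    # The whole batch is updated inside one transaction, which is what makes
    # these jobs prone to lock contention when many of them run concurrently.
    ActiveRecord::Base.transaction do
      Deal.where(id: deal_ids).find_each do |deal|
        deal.recalculate_totals! # hypothetical model method doing the heavy lifting
      end
    end
  end
end
```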
Database deadlocks occur when two transactions cannot proceed because each holds a lock that the other one needs. There is nothing inherently wrong with deadlocks, as long as they do not impact your database to the point where certain transactions cannot be processed at all. You do, however, need a mechanism in place to retry deadlocked transactions. It is generally sufficient to retry only the transaction that was rolled back: when MySQL (InnoDB) detects a deadlock, it aborts one of the two transactions to release its locks, allowing the other transaction to continue.
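As a rough illustration of such a retry mechanism, here is a minimal sketch assuming a Rails/ActiveRecord setup: ActiveRecord raises ActiveRecord::Deadlocked when MySQL rolls a transaction back as the deadlock victim, and re-running the whole transaction block is usually enough. The helper name and retry count are our own choices for this example, not a library API.

```ruby
# Minimal sketch: retry a transaction a few times if it is picked as the
# deadlock victim, then give up and re-raise.
MAX_DEADLOCK_RETRIES = 3

def with_deadlock_retry(attempts = MAX_DEADLOCK_RETRIES)
  tries = 0
  begin
    ActiveRecord::Base.transaction { yield }
  rescue ActiveRecord::Deadlocked
    tries += 1
    retry if tries < attempts
    raise
  end
end

# Usage: the block re-runs from the start on each retry, so it must be safe
# to execute more than once.
# with_deadlock_retry do
#   deal.lock!
#   deal.update!(amount: new_amount)
# end
```

Note that Sidekiq’s built-in job retries give you a coarser version of the same safety net for free, since a job that fails with a deadlock error will simply be retried later.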