Poor MySQL, always getting picked on. However, it is pretty bad that a relational database in the year 2012 cannot handle simultaneous non-conflicting inserts on a table when transactions are involved.

First, some background. The InnoDB documentation does have a section dealing with deadlocks and possible ways to avoid them. There, it offers this wonderful fact:

You can get deadlocks even in the case of transactions that just insert or delete a single row.

A relatively recent update to archweb, the Arch Linux website code, added an inotify-based update script that watches the package repository database files for updates, and then kicks off the script for updating the website database.

Previously this ran on a fixed schedule via cron, so updates to the database were serialized, and things worked as expected. However, it now often occurs that the databases for the same repository for both architectures are updated at nearly the same time (within seconds of each other). The inotify-based script will thus kick off two simultaneous updates; however, these updates should always be non-conflicting and operating on different rows in the various touched tables.

After a few trials, I started noticing updates not running to completion, and tracebacks such as the following (slightly trimmed for brevity).

$ ./manage.py reporead_inotify -v0
Process Process-17:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
...
  File "/srv/http/archweb/devel/management/commands/reporead.py", line 223, in populate_pkg
    populate_files(dbpkg, repopkg, force=force)
  File "/srv/http/archweb/devel/management/commands/reporead.py", line 283, in populate_files
    pkgfile.save(force_insert=True)
  ...
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

The first step to debugging deadlocks is to ask the database what the heck happened. Here is the output from SHOW ENGINE INNODB STATUS;.

------------------------
LATEST DETECTED DEADLOCK
------------------------
120130 14:53:49
*** (1) TRANSACTION:
TRANSACTION 32116C73, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 12 lock struct(s), heap size 3112, 8 row lock(s), undo log entries 4
MySQL thread id 393561, OS thread handle 0x7f61cc503700, query id 5884410 localhost 127.0.0.1 archweb-update update
INSERT INTO `package_files` (`pkg_id`, `is_directory`, `directory`, `filename`) VALUES (70297, 1, 'usr/share/', NULL)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 139734 n bits 224 index `package_files_pkg_id` of table `archweb`.`package_files` trx id 32116C73 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 32116C74, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
11 lock struct(s), heap size 1248, 6 row lock(s), undo log entries 2
MySQL thread id 393562, OS thread handle 0x7f61d18ce700, query id 5884411 localhost 127.0.0.1 archweb-update update
INSERT INTO `package_files` (`pkg_id`, `is_directory`, `directory`, `filename`) VALUES (70298, 1, 'usr/', NULL)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 139734 n bits 224 index `package_files_pkg_id` of table `archweb`.`package_files` trx id 32116C74 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 139734 n bits 224 index `package_files_pkg_id` of table `archweb`.`package_files` trx id 32116C74 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (2)

Trying to debug this, I came across a great blog post that describes InnoDB deadlocking on concurrent inserts. More specifically, this is due to "gap locking".

What is unfortunate is the post describes this only happening when a UNIQUE index is involved. In the case detailed above, the package_files_pkg_id index is most-definitely not unique, and you can see the values going into that column (70297 and 70298) are different.

At this point, I've been hesitant to "solve" this problem by hacks in the code, as it seems like a total failure at the database level more than anything else. This seems like as good of an excuse as any to get the site switched to using PostgreSQL now that the hosting server isn't responsible for four MySQL databases anymore.