Database concurrency

Content:

Introduction
Two simple examples
Another example
Application thread synchronization
Transactions:
Explicit locking
Long term locking:

Introduction:

Most software developers learn about using database and using SQL relative quickly. But for many it takes a long time before they get exposed to database concurrency problems.

For many reasons:

the topic is rarely discussed in database tutorials
concurrency problems does not materialize in development environment
it is a sligtly complex/abstract problem

But it is something that every software developer need to understand.

This article will show examples using pseudo code with a mix of SQL and some programming language. It should not be a significant problem to convert the examples to a specific programming language.

Most of it is database independent, but sometimes specific databases will be referenced.

Two simple examples:

A lot of beginner have made the following two mistakes.

Problem: do an INSERT with a an id field that need to be unique as it is used as primary key.

Naive approach:

n = SELECT MAX(id)+1 FROM table
INSERT INTO table (id, s) VALUES(n, 'Some text')

This code has a concurrency issue and is not multi user safe. This code should never be used.

Explanation follows.

This flow works fine and are probably what the developer had in mind when writing the code:

User thread #1	User thread #2
n = SELECT MAX(id)+1 FROM table
INSERT INTO table (id, s) VALUES(n, 'Some text')
	n = SELECT MAX(id)+1 FROM table
	INSERT INTO table (id, s) VALUES(n, 'Some text')

But the following two flows does not work fine and can just as well happen:

User thread #1	User thread #2
n = SELECT MAX(id)+1 FROM table
	n = SELECT MAX(id)+1 FROM table
	INSERT INTO table (id, s) VALUES(n, 'Some text')
INSERT INTO table (id, s) VALUES(n, 'Some text')

and:

User thread #1	User thread #2
n = SELECT MAX(id)+1 FROM table
	n = SELECT MAX(id)+1 FROM table
INSERT INTO table (id, s) VALUES(n, 'Some text')
	INSERT INTO table (id, s) VALUES(n, 'Some text')

The problem can be resolved using transactions and a suitable transaction isolation level. More about that later.

But this particular problem is usually solved by using either auto increment or sequence.

Auto increment is used by MySQL, MS SQLServer, IBM DB2 etc. and looks like:

INSERT INTO table (s) VALUES('Some text') -- id is auto generated

Sequence is used by Oracle DB, PostgreSQL etc. and looks like:

INSERT INTO table (id, s) VALUES(idseq.NEXTVAL, 'Some text')

Problem: do an INSERT into one table and then an INSERT into another table where the auto generated primary key from the first table is needed for a field (foreign key).

Naive approach:

INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated
n = SELECT MAX(id) FROM table1
INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated

This code has a concurrency issue and is not multi user safe. This code should never be used.

Explanation follows.

This flow works fine and are probably what the developer had in mind when writing the code:

User thread #1	User thread #2
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated
n = SELECT MAX(id) FROM table1
INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated
	INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated
	n = SELECT MAX(id) FROM table1
	INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated

But many other flows does not work fine and can just as well happen. One example:

User thread #1	User thread #2
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated
	INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated
	n = SELECT MAX(id) FROM table1
	INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated
n = SELECT MAX(id) FROM table1
INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated

The problem can be resolved using transactions and a suitable transaction isolation level. More about that later.

But this particular problem is often solved by using a special function that retrieves the last auto generated id for a connection.

MySQL:

INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated
INSERT INTO table2 (ref, x) VALUES(LAST_INSERT_ID(), 123.50) -- another id is auto generated

MS SQLServer:

INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated
INSERT INTO table2 (ref, x) VALUES(SCOPE_IDENTITY(), 123.50) -- another id is auto generated

Often this value can also be retrieved via the database API:

Java JDBC has stmt.getGeneratedKeys() that return a ResultSet with generated ids for the connection
PHP mysqli extension has $con->insert_id that returns the last generated id for the connection
PHP PDO extensions has $con->lastInsertId() that returns the last generated id for the connection

Another example:

The previous examples where good at illustrating the fundamental problem of concurrency, but they had workarounds that were specific for the problem. We need a more general solution.

So let us look at a third example. This example is not commonly seen like the previous two but it is used in many text books. And it does not have similar workaround.

Problem: check if an account has sufficient balance to accomodate a withdrawal and if it does then do the withdrawal.

Obvious approach:

balance = SELECT balance FROM account WHERE accountno=xxxx
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx

This code has a concurrency issue and is not multi user safe. This code should never be used without mitigation of that concurrency issue.

Explanation follows.

The good flow is:

User thread #1	User thread #2
balance = SELECT balance FROM account WHERE accountno=xxxx
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx
	balance = SELECT balance FROM account WHERE accountno=xxxx
	if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx

One of the bad flows are:

User thread #1	User thread #2
balance = SELECT balance FROM account WHERE accountno=xxxx
	balance = SELECT balance FROM account WHERE accountno=xxxx
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx
	if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx

In this case the account may end up in negative.

What we have seen now is that unless something special is done, then one can not make any assumption about how sequences of SQL statements will be executed for different users/threads.

Application thread synchronization:

One way to solve the problem is to let the application ensure that no more than one thread execute the code concurrently.

Java:

synchronized(sharedobject) {
    // code
}

C#:

lock(sharedobject) {
    // code
}

(for full examples including more languages see Threads)

It is very easy to do.

But the solution has some serious restrictions. It only works in a single server and single application context. It does not work with the application running on two or more servers (cluster). It does not work with two different applications accessing the same database.

This makes it a not recommended solution in most cases.

Transactions:

The concept

The concept of a transaction is to bundle multiple SQL statements into a single transaction to ensure that either all of them succeed or none of the succeed.

The classic example is transferring money from one account to another account:

UPDATE account SET balance = balance - amount WHERE accountno = a
UPDATE account SET balance = balance + amount WHERE accountno = b

What happens is the system crash between the two SQL statements? Money disappear without a trace.

And we can not solve the problem by switching the two SQL statements. In that case a crash in the middle will result in money appearing out of nowhere.

The solution is to bundle the two SQL statements in a transaction:

BEGIN
UPDATE account SET balance = balance - amount WHERE accountno = a
UPDATE account SET balance = balance + amount WHERE accountno = b
COMMIT

Now either both are none are executed.

Note that in some databases it is necessary to explicit disable auto commit (automatic commit after each statement).

COMMIT means "submit" the entire transaction to the database.

ROLLBACK "cancel" the entire transaction.

Almost all databases support transactions. But there is one well-known exception: MySQL with MyISAM tables (MySQL with InndoDB tables does support transactions - so transactions is one very good reason to prefer InnoDB tables over MyISAM tables).

Transaction Isolation Level

Using transactions does not automatically solve concurrency problems.

To solve concurrency problems one need to use transactions and set transaction isolation level appropriately.

Transaction isolation level specify how different transactions are isolated from each other.

The standard transaction isolation levels are:

Transaction isolation level	Semantics	Protect against "dirty reads" (read data that has not been committed yet and in fact may never be committed)	Protect against "non-repeatable reads" (if the same row is read multiple times then the values may be changed)	Protect against "phantom reads" (if a query returning multiple rows is executed multiple times then a different set of rows may be returned
Uncommited Read	No isolation	No	No	No
Commited Read	Only read data that is commited	Yes	No	No
Repeatable Read	Rows read will not be changed by another user thread	Yes	Yes	No
Serializable	Rows in row set will not be changed by another user thread	Yes	Yes	Yes

Some databases has additional levels.

Behind the scenes

Transaction isolation level can be implemented two different ways:

Locking
MVCC (Multi Version Concurrency Control)

Locking is simple. When a user thread needs to ensure that some item in the database does not change then it put a lock on it preventing other from getting a lock. When the user thread is done then the lock is released. User threads needing a certain lock waits until it becomes available.

There are a couple of anomaly scenarios:

Lock timeout: A user thread only waits X milliseconds for a lock, if it does not get it before that it results in an error/exception
Deadlock: User thread #1 got a lock on row A and want a lock on row B. User thread #2 got a lock on row B and want a lock on row A. And they can wait on each other forever. To resolve the situation it usually results in an error/exception

Typical locks are put on either rows or pages in the database.

MVCC maintains different logical copies of data for each user thread. If an updated copy needs to commit and there is a conflict then it results in an error/exception.

This is realy snapshot isolation, but some databases uses it for repeatable read and serializable isolation.

Concurrency solved

Using repeatable read or serializable transaction isolation level can solve concurrency problems.

The problem from above can be solved as:

SET TRANSACTION ISOLATION LEVEL = REPEATABLE READ
BEGIN
balance = SELECT balance FROM account WHERE accountno=xxxx
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx
COMMIT

And it works with clusters and multiple applications as well, because the logic is in the database itself.

Note though that there is a few gotchas to be aware of:

High transaction isolation level comes with a performance penalty
High transaction isolation level tend to result in errors/exceptions that the application must be able to handle (often by retrying the failed transaction)

Syntax

Examples of syntax with common programming languages and database access technologies:

con.setAutoCommit(false);
...
con.commit();

EntityTransaction tx = em.getTransaction();
tx.begin();
...
tx.commit();

Note that in Java EE EJB context then transactions are implicit. See here for details.

Transaction tx = s.beginTransaction();
...
tx.commit();

XxxTransaction tx = con.BeginTransaction();
cmd.Transaction = tx;
...
tx.Commit();

using (TransactionScope scope = new TransactionScope())
{
    ...
    scope.Complete();
}

Note that this also works with LINQ to SQL, Entity Framework and NHibernate.

XxxTransaction tx = con.BeginTransaction()
cmd.Transaction = tx
...
tx.Commit()

Using scope As New TransactionScope()
    ...
    scope.Complete()
End Using

Note that this also works with LINQ to SQL and Entity Framework.

$con->begin_transaction();
...
$con->commit();

$con->beginTransaction();
...
$con->commit();

For complete code examples using transactions see Transactions - Atomicity and Transactions - Isolation.

Explicit locking:

Sometimes more explicit locking can be used than what the database it doing due to transaction isolation level.

The problem from above can in some databases also be solved with:

BEGIN
balance = SELECT balance FROM account WHERE accountno=xxxx FOR UPDATE
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx
COMMIT

And that will work for all transaction isolation levels, because the FOR UPDATE clause cause the database to lock the row.

Often explicit locking is faster than high transaction isolation level. But it is not possible for all problems. And not all databases support it - and even if they do then the syntax may vary.

Long term locking:

Problem:

Both implicit locks via transaction isolation level and explicit locks like SELECT FOR UPDATE are shortterms locks. They are intended to be hold in milliseconds.

Locks that need to be kept for minutes or hours are an entirely different problem.

A typical scenario is:

SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x
edit data in 5 minutes
UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x

That can also cause concurrency problems.

Good flow:

User session #1	User session #2
SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x
edit data in 5 minutes
UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x
	SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x
	edit data in 5 minutes
	UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x

Bad flow:

User session #1	User session #2
SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x
	SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x
edit data in 5 minutes
	edit data in 5 minutes
UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x
	UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x

User #2 overwrites the changes made by user #1 so that they are lost.

Neither transaction isolation level nor explicit locking are good solutions due to timeouts and performance impact.

Long term pessimistic locking

The concept is to mark the row as being edited. Simply add a column inuse of type BOOLEAN to the table and use that as indicator.

(it is also possible and maybe even desirable to use a separate table for this sort of indicators, but we will keep it simple)

Two different implementations:

SET TRANSACTION ISOLATION LEVEL = REPEATABLE READ
BEGIN
inuse = SELECT inuse FROM customer WHERE id=x
if inuse then
    inform user that customer is already being edited by someone else
else
    UPDATE customer SET inuse=TRUE WHERE id=x
    SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x
end if
edit data in 5 minutes
UPDATE customer SET address=y,zipcode=z,homephone=w,inuse=FALSE WHERE id=x

and:

rowsmodified = UPDATE customer UPDATE customer SET inuse=TRUE WHERE id=x AND NOT inuse
if rowsmodified = 0 then
    inform user that customer is already being edited by someone else
else
    SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x
end if
edit data in 5 minutes
UPDATE customer SET address=y,zipcode=z,homephone=w,inuse=FALSE WHERE id=x

The downsides with pessimistic locking are:

If a user never releases data then the inuse flag is staying set and a cleanup process is needed (easier if instead of a BOOLEAN field a DATETIME field is used and NULL means not locked and a value mean locked at that time)
A user just with no intention of actual changing the data prevents other users from changing the data

Long term optimistic locking

The concept is to check if data has been modified by someone else when saving changes. Simply add a column version of type INTEGER to the table and use that to increment at changes.

Two different implementations:

SELECT name,address,city,zipcode,homephone,cellphone,version FROM customer WHERE id=x
myversion = result[version]
edit data in 5 minutes
SET TRANSACTION ISOLATION LEVEL = REPEATABLE READ
BEGIN
currentversion = SELECT version FROM customer WHERE id=x
if myversion = currentversion then
    UPDATE customer SET address=y,zipcode=z,homephone=w,version=version+1 WHERE id=x
else
    inform user that customer has been modified by someone else so it is not possible to save
end if
COMMIT

and:

SELECT name,address,city,zipcode,homephone,cellphone,version FROM customer WHERE id=x
myversion = result[version]
edit data in 5 minutes
rowmodified = UPDATE customer SET address=y,zipcode=z,homephone=w,version=version+1 WHERE id=x AND version=myversion
if rowmodified = 0 then
    inform user that customer has been modified by someone else so it is not possible to save
end if

The downsides with optimistic locking are:

Users may need to start over again if somebody else has changed data

Optimistic locking works best if the risk of concurrent edits are small.

Some databases has builtin data type for this sort of version number.

Article history:

Version	Date	Description
1.0	July 8th 2006	Initial version (in Danish) published on Eksperten.dk
1.1	February 12th 2010	Minor changes
2.0	September 3rd 2016	Translation to English and complete reformatting and publishing here
2.1	October 7th 2016	Add syntax examples
2.2	October 8th 2016	Add content overview

Comments:

Please send comments to Arne Vajhøj

Database concurrency

Content:

Introduction:

Two simple examples:

Problem: do an INSERT with a an id field that need to be unique as it is used as primary key.

Problem: do an INSERT into one table and then an INSERT into another table where the auto generated primary key from the first table is needed for a field (foreign key).

Another example:

Problem: check if an account has sufficient balance to accomodate a withdrawal and if it does then do the withdrawal.

Application thread synchronization:

Transactions:

The concept

Transaction Isolation Level

Behind the scenes

Concurrency solved

Syntax

Explicit locking:

Long term locking:

Problem:

Long term pessimistic locking

Long term optimistic locking

Article history:

Other articles:

Comments: