Most software developers learn about using database and using SQL relative quickly. But for many it takes a long time before they get exposed to database concurrency problems.
For many reasons:
But it is something that every software developer need to understand.
This article will show examples using pseudo code with a mix of SQL and some programming language. It should not be a significant problem to convert the examples to a specific programming language.
Most of it is database independent, but sometimes specific databases will be referenced.
A lot of beginner have made the following two mistakes.
Naive approach:
n = SELECT MAX(id)+1 FROM table INSERT INTO table (id, s) VALUES(n, 'Some text')
This code has a concurrency issue and is not multi user safe. This code should never be used.
Explanation follows.
This flow works fine and are probably what the developer had in mind when writing the code:
User thread #1 | User thread #2 |
---|---|
n = SELECT MAX(id)+1 FROM table | |
INSERT INTO table (id, s) VALUES(n, 'Some text') | |
n = SELECT MAX(id)+1 FROM table | |
INSERT INTO table (id, s) VALUES(n, 'Some text') |
But the following two flows does not work fine and can just as well happen:
User thread #1 | User thread #2 |
---|---|
n = SELECT MAX(id)+1 FROM table | |
n = SELECT MAX(id)+1 FROM table | |
INSERT INTO table (id, s) VALUES(n, 'Some text') | |
INSERT INTO table (id, s) VALUES(n, 'Some text') |
and:
User thread #1 | User thread #2 |
---|---|
n = SELECT MAX(id)+1 FROM table | |
n = SELECT MAX(id)+1 FROM table | |
INSERT INTO table (id, s) VALUES(n, 'Some text') | |
INSERT INTO table (id, s) VALUES(n, 'Some text') |
The problem can be resolved using transactions and a suitable transaction isolation level. More about that later.
But this particular problem is usually solved by using either auto increment or sequence.
Auto increment is used by MySQL, MS SQLServer, IBM DB2 etc. and looks like:
INSERT INTO table (s) VALUES('Some text') -- id is auto generated
Sequence is used by Oracle DB, PostgreSQL etc. and looks like:
INSERT INTO table (id, s) VALUES(idseq.NEXTVAL, 'Some text')
Naive approach:
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated n = SELECT MAX(id) FROM table1 INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated
This code has a concurrency issue and is not multi user safe. This code should never be used.
Explanation follows.
This flow works fine and are probably what the developer had in mind when writing the code:
User thread #1 | User thread #2 |
---|---|
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated | |
n = SELECT MAX(id) FROM table1 | |
INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated | |
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated | |
n = SELECT MAX(id) FROM table1 | |
INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated |
But many other flows does not work fine and can just as well happen. One example:
User thread #1 | User thread #2 |
---|---|
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated | |
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated | |
n = SELECT MAX(id) FROM table1 | |
INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated | |
n = SELECT MAX(id) FROM table1 | |
INSERT INTO table2 (ref, x) VALUES(n, 123.50) -- another id is auto generated |
The problem can be resolved using transactions and a suitable transaction isolation level. More about that later.
But this particular problem is often solved by using a special function that retrieves the last auto generated id for a connection.
MySQL:
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated INSERT INTO table2 (ref, x) VALUES(LAST_INSERT_ID(), 123.50) -- another id is auto generated
MS SQLServer:
INSERT INTO table1 (s) VALUES('Some text') -- id is auto generated INSERT INTO table2 (ref, x) VALUES(SCOPE_IDENTITY(), 123.50) -- another id is auto generated
Often this value can also be retrieved via the database API:
The previous examples where good at illustrating the fundamental problem of concurrency, but they had workarounds that were specific for the problem. We need a more general solution.
So let us look at a third example. This example is not commonly seen like the previous two but it is used in many text books. And it does not have similar workaround.
Obvious approach:
balance = SELECT balance FROM account WHERE accountno=xxxx if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx
This code has a concurrency issue and is not multi user safe. This code should never be used without mitigation of that concurrency issue.
Explanation follows.
The good flow is:
User thread #1 | User thread #2 |
---|---|
balance = SELECT balance FROM account WHERE accountno=xxxx | |
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx | |
balance = SELECT balance FROM account WHERE accountno=xxxx | |
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx |
One of the bad flows are:
User thread #1 | User thread #2 |
---|---|
balance = SELECT balance FROM account WHERE accountno=xxxx | |
balance = SELECT balance FROM account WHERE accountno=xxxx | |
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx | |
if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx |
In this case the account may end up in negative.
What we have seen now is that unless something special is done, then one can not make any assumption about how sequences of SQL statements will be executed for different users/threads.
One way to solve the problem is to let the application ensure that no more than one thread execute the code concurrently.
Java:
synchronized(sharedobject) {
// code
}
C#:
lock(sharedobject) {
// code
}
(for full examples including more languages see Threads)
It is very easy to do.
But the solution has some serious restrictions. It only works in a single server and single application context. It does not work with the application running on two or more servers (cluster). It does not work with two different applications accessing the same database.
This makes it a not recommended solution in most cases.
The concept of a transaction is to bundle multiple SQL statements into a single transaction to ensure that either all of them succeed or none of the succeed.
The classic example is transferring money from one account to another account:
UPDATE account SET balance = balance - amount WHERE accountno = a UPDATE account SET balance = balance + amount WHERE accountno = b
What happens is the system crash between the two SQL statements? Money disappear without a trace.
And we can not solve the problem by switching the two SQL statements. In that case a crash in the middle will result in money appearing out of nowhere.
The solution is to bundle the two SQL statements in a transaction:
BEGIN UPDATE account SET balance = balance - amount WHERE accountno = a UPDATE account SET balance = balance + amount WHERE accountno = b COMMIT
Now either both are none are executed.
Note that in some databases it is necessary to explicit disable auto commit (automatic commit after each statement).
COMMIT means "submit" the entire transaction to the database.
ROLLBACK "cancel" the entire transaction.
Almost all databases support transactions. But there is one well-known exception: MySQL with MyISAM tables (MySQL with InndoDB tables does support transactions - so transactions is one very good reason to prefer InnoDB tables over MyISAM tables).
Using transactions does not automatically solve concurrency problems.
To solve concurrency problems one need to use transactions and set transaction isolation level appropriately.
Transaction isolation level specify how different transactions are isolated from each other.
The standard transaction isolation levels are:
Transaction isolation level | Semantics | Protect against "dirty reads" (read data that has not been committed yet and in fact may never be committed) | Protect against "non-repeatable reads" (if the same row is read multiple times then the values may be changed) | Protect against "phantom reads" (if a query returning multiple rows is executed multiple times then a different set of rows may be returned |
---|---|---|---|---|
Uncommited Read | No isolation | No | No | No |
Commited Read | Only read data that is commited | Yes | No | No |
Repeatable Read | Rows read will not be changed by another user thread | Yes | Yes | No |
Serializable | Rows in row set will not be changed by another user thread | Yes | Yes | Yes |
Some databases has additional levels.
Transaction isolation level can be implemented two different ways:
Locking is simple. When a user thread needs to ensure that some item in the database does not change then it put a lock on it preventing other from getting a lock. When the user thread is done then the lock is released. User threads needing a certain lock waits until it becomes available.
There are a couple of anomaly scenarios:
Typical locks are put on either rows or pages in the database.
MVCC maintains different logical copies of data for each user thread. If an updated copy needs to commit and there is a conflict then it results in an error/exception.
This is realy snapshot isolation, but some databases uses it for repeatable read and serializable isolation.
Using repeatable read or serializable transaction isolation level can solve concurrency problems.
The problem from above can be solved as:
SET TRANSACTION ISOLATION LEVEL = REPEATABLE READ BEGIN balance = SELECT balance FROM account WHERE accountno=xxxx if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx COMMIT
And it works with clusters and multiple applications as well, because the logic is in the database itself.
Note though that there is a few gotchas to be aware of:
Examples of syntax with common programming languages and database access technologies:
con.setAutoCommit(false);
...
con.commit();
EntityTransaction tx = em.getTransaction();
tx.begin();
...
tx.commit();
Note that in Java EE EJB context then transactions are implicit. See here for details.
Transaction tx = s.beginTransaction();
...
tx.commit();
XxxTransaction tx = con.BeginTransaction();
cmd.Transaction = tx;
...
tx.Commit();
using (TransactionScope scope = new TransactionScope())
{
...
scope.Complete();
}
Note that this also works with LINQ to SQL, Entity Framework and NHibernate.
XxxTransaction tx = con.BeginTransaction()
cmd.Transaction = tx
...
tx.Commit()
Using scope As New TransactionScope()
...
scope.Complete()
End Using
Note that this also works with LINQ to SQL and Entity Framework.
$con->begin_transaction();
...
$con->commit();
$con->beginTransaction();
...
$con->commit();
For complete code examples using transactions see Transactions - Atomicity and Transactions - Isolation.
Sometimes more explicit locking can be used than what the database it doing due to transaction isolation level.
The problem from above can in some databases also be solved with:
BEGIN balance = SELECT balance FROM account WHERE accountno=xxxx FOR UPDATE if balance >= amount then UPDATE account SET balance = balance - amount WHERE accountno=xxxx COMMIT
And that will work for all transaction isolation levels, because the FOR UPDATE clause cause the database to lock the row.
Often explicit locking is faster than high transaction isolation level. But it is not possible for all problems. And not all databases support it - and even if they do then the syntax may vary.
Both implicit locks via transaction isolation level and explicit locks like SELECT FOR UPDATE are shortterms locks. They are intended to be hold in milliseconds.
Locks that need to be kept for minutes or hours are an entirely different problem.
A typical scenario is:
SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x edit data in 5 minutes UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x
That can also cause concurrency problems.
Good flow:
User session #1 | User session #2 |
---|---|
SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x | |
edit data in 5 minutes | |
UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x | |
SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x | |
edit data in 5 minutes | |
UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x |
Bad flow:
User session #1 | User session #2 |
---|---|
SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x | |
SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x | |
edit data in 5 minutes | |
edit data in 5 minutes | |
UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x | |
UPDATE customer SET address=y,zipcode=z,homephone=w WHERE id=x |
User #2 overwrites the changes made by user #1 so that they are lost.
Neither transaction isolation level nor explicit locking are good solutions due to timeouts and performance impact.
The concept is to mark the row as being edited. Simply add a column inuse of type BOOLEAN to the table and use that as indicator.
(it is also possible and maybe even desirable to use a separate table for this sort of indicators, but we will keep it simple)
Two different implementations:
SET TRANSACTION ISOLATION LEVEL = REPEATABLE READ BEGIN inuse = SELECT inuse FROM customer WHERE id=x if inuse then inform user that customer is already being edited by someone else else UPDATE customer SET inuse=TRUE WHERE id=x SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x end if edit data in 5 minutes UPDATE customer SET address=y,zipcode=z,homephone=w,inuse=FALSE WHERE id=x
and:
rowsmodified = UPDATE customer UPDATE customer SET inuse=TRUE WHERE id=x AND NOT inuse if rowsmodified = 0 then inform user that customer is already being edited by someone else else SELECT name,address,city,zipcode,homephone,cellphone FROM customer WHERE id=x end if edit data in 5 minutes UPDATE customer SET address=y,zipcode=z,homephone=w,inuse=FALSE WHERE id=x
The downsides with pessimistic locking are:
The concept is to check if data has been modified by someone else when saving changes. Simply add a column version of type INTEGER to the table and use that to increment at changes.
Two different implementations:
SELECT name,address,city,zipcode,homephone,cellphone,version FROM customer WHERE id=x myversion = result[version] edit data in 5 minutes SET TRANSACTION ISOLATION LEVEL = REPEATABLE READ BEGIN currentversion = SELECT version FROM customer WHERE id=x if myversion = currentversion then UPDATE customer SET address=y,zipcode=z,homephone=w,version=version+1 WHERE id=x else inform user that customer has been modified by someone else so it is not possible to save end if COMMIT
and:
SELECT name,address,city,zipcode,homephone,cellphone,version FROM customer WHERE id=x myversion = result[version] edit data in 5 minutes rowmodified = UPDATE customer SET address=y,zipcode=z,homephone=w,version=version+1 WHERE id=x AND version=myversion if rowmodified = 0 then inform user that customer has been modified by someone else so it is not possible to save end if
The downsides with optimistic locking are:
Optimistic locking works best if the risk of concurrent edits are small.
Some databases has builtin data type for this sort of version number.
Version | Date | Description |
---|---|---|
1.0 | July 8th 2006 | Initial version (in Danish) published on Eksperten.dk |
1.1 | February 12th 2010 | Minor changes |
2.0 | September 3rd 2016 | Translation to English and complete reformatting and publishing here |
2.1 | October 7th 2016 | Add syntax examples |
2.2 | October 8th 2016 | Add content overview |
See list of all articles here
Please send comments to Arne Vajhøj