/aug 9, 2015

Distributed Synchronization with Spring JPA

By Chris Hut

Web sites and other distributed, multi-user systems present unique challenges for concurrent access to shared state. In this post we'll take a look at a simple strategy (with one big gotcha) for achieving distributed resource synchronization in the Spring JPA environment.

Case Study: Volunteer Signup

Let us imagine we have built a web portal that enables organizers to create jobs for which users sign up on a first-come, first-serve basis. For simplicity, we will assume that a user can have at most one job. So an initial simple job signup on the database level is merely:

UPDATE user SET job_id = ? WHERE id = ?;

And our Spring/JPA implementation might look like:

@Transactional
public void assignJob(User user, Job job) {
  user.setJob(job);
}

Since this wouldn't be much of a blog post if that was the extent of the requirement, let's make things a little more complicated and add an important specification: Each job has a maximum number of slots which can be filled:

@Transactional
public boolean assignJob(User user, Job job) {
  int count = jobRepository.countAssignedUsers(job);
  if (count >= job.getMaxCount()) {
    return false;
  }

  user.setJob(job);
  return true;
}

It is easy to see the pitfall with such an approach, however. Some jobs might of course be more desirable than others which can result in a concurrency issue: two threads arriving at this method simultaneously will get the same value for count and we could end up with an over-booked gig. If we were developing an extremely simple, single-server web service we could simply make this method synchronized and be done with it; this solution would not help in a multi-machine environment where two users might not hit the same VM.

Database Level Locking

Thus, we need to move the synchronization out from the VM and onto a shared resource: the data server. In MySQL this is accomplished by modifying a SELECT query:

SELECT * FROM job WHERE id = ? FOR UPDATE;

When this statement is executed, other transactions will block on the SELECT statement until the current transaction completes (either committed or rolled back). The syntax varies by RDBMS, but JPA has an abstraction for this functionality called LockModeType. The various Lock Modes available are out of the scope of this blog post (a good overview is available here), but for our purposes LockModeType.PESSIMISTIC_WRITE translates to acquiring a row level lock per the SQL statement above.

We can acquire a lock in several ways: Spring offers a @Lock annotation to decorate JPA repository methods; for more fine-grained control it is often desired to access the JPA EntityManager directly. In a Spring Service class, this is as simple as injecting the EntityManager via the JPA PersistenceContext annotation (not the usual Inject or Autowired):

@PersistenceContext
private EntityManager entityManager;

@Transactional
public boolean assignJob(User user, Job job) {
  entityManager.refresh(job, LockModeType.PESSIMISTIC_WRITE);

  // remainder of method omitted
}

This has the effect of adding a global synchronized keyword to the method: only a single thread across a distributed system will get the lock at one time. Our volunteer coordinators can breathe a big sigh of relief!

Caveat: Repeatable Reads

There is one important caveat when using this pattern for a check-then-act operation (checking if a condition is met before continuing) in a situation such as this: the lock acquisition must be the very first operation performed in the transaction. This is premise is so important, here is that statement a second time called out as a block quote:

Lock acquisition must be the very first operation performed in a transaction

Code such as the following will still exhibit concurrency failures:

@Transactional
public boolean assignJob(Long userId, Job job) {
  User user = userRepository.findOne(userId); // 1

  entityManager.refresh(job, LockModeType.PESSIMISTIC_WRITE); // 2

  // remainder of method omitted
}

The reason is a feature found in all enterprise-grade RDBMS: Transaction isolation. While a full treatment of RDBMS transaction isolation levels is also out of the scope of this post, the key concept is that at levels of REPEATABLE_READ (the MySQL default) or higher, the first query in a transaction serves to capture a snapshot of the state of the entire database, which lasts for the duration of that transaction.

This means that the SELECT query to look up a user at #1 above will capture all of the database state (including the number of users assigned to every job) at that point in time. Two threads arriving at #1 simultaneously will report the same number of users assigned to a job, even though that count is not queried until after the lock is acquired at #2. REPEATABLE_READ isolation means that, for the duration of the transaction, commits in other transaction are not visible.

It is possible to work around this behavior by explicitly setting a lower transaction level (such as READ_COMMITTED), but this is undesirable: even if the reduced transactional safety is acceptable, modifying the transaction level on an ad hoc basis is error prone and an ongoing maintenance challenge. Acquiring the lock at the start of the transaction ensures the code functions as expected.

In a future post, I will cover how (and why) this technique can be used to simulate table-level locking. Any questions or thoughts, please let me know in the comments!

Related Posts

By Chris Hut

Chris has been a Java developer for 19 years. He helped architect the SourceClear SCA platform.