/mar 31, 2015

Avoiding JPA Performance Pitfalls

By Chris Hut

Until my arrival at SourceClear, the Java Persistence API (JPA) is one technology (along with the Spring Framework) I inadvertently avoided using with in my career, despite working for 15 years on consumer websites with large relational data models. I consider this a blessing and a curse. The delay has served me well - gone are the dark days of verbose XML configurations that for so long gave these frameworks (and indeed, Java as a whole) the reputation of being clunky, slow and decidedly un-cool. Now, with simple annotations one can set in motion a complex dance between Spring, Hibernate and the RDBMS, all without writing a single line of XML or SQL. What's not to love?

Well, like a trusty hammer, JPA solves some problems extremely well; other problems require a bit more precision and care. To demonstrate, we'll use a simple app data model archetype that has only been done a few thousand times before* - a photo sharing website.

In this data model, users (the Account table) have zero or more Albums, which contain Photos. Since a single photo can appear in multiple albums, it is modeled as a many-to-many relationship with a join table AlbumPhoto:

+------------------+
|     Account      |
+------------------+
|   id (PK)        |
|   email          |
|   location       |
+------------------+
         |1
         |*
+------------------+
|     Album        |
+------------------+
|   id (PK)        |
|   account_id (FK)|
|   name           |
|   date           |
+------------------+
         |1
         |*
+------------------+
|    AlbumPhoto    |
+------------------+
|   album_id (FK)  |
|   photo_id (FK)  |
+------------------+
         |*
         |1
+------------------+
|      Photo       |
+------------------+
|   id (PK)        |
|   filename       |
|   caption        |
+------------------+

(* including twice by yours truly)

Here are the entity classes to model the above (note that some fields and accessors are omitted for brevity):

Account.java:

@Entity
@Table(name = "account")
public class Account implements Serializable {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private int id;

  private String email;

  private String location;

  @OneToMany(mappedBy = "account")
  @OrderBy("date")
  private Set<Album> albums;

  public Set<Album> getAlbums() {
    return albums;
  }
}

Album.java:

@Entity
@Table(name = "album")
public class Album implements Serializable {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private int id;

  @ManyToOne
  @JoinColumn(name = "account_id", nullable = false)
  private Account account;

  private String name;

  @ManyToMany
  @JoinTable(name = "album_photo",
      joinColumns = @JoinColumn(name = "album_id"),
      inverseJoinColumns = @JoinColumn(name = "photo_id"))
  private Set<Photo> photos;

  public Set<Photo> getPhotos() {
    return photos;
  }
}

Photo.java:

@Entity
@Table(name = "photo")
public class Photo implements Serializable {
  @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
  private int id;

  private String filename;

  private String caption;
}

We can see here the power and simplicity of JPA (some obscure nomenclature notwithstanding). With just a few annotations, we can model each of our to-many relationships; we don't even need to specify an entity class for the join table AlbumPhoto.

(Not pictured is the Spring/Hibernate/JPA setup process which is powerful, but anything but alluring. I hope to address some of the lessons learned there in a future blog post.)

To use these entities, we'll first create a Service method to transactionally load an account and view its albums:

@Transactional(readOnly = true)
@Override
public Account findByEmailWithAlbums(String email) {
  Account account = accountRepository.findByEmail(email);
  return account;
}

The client for our service looks like:

Account account = accountService.findByEmailWithAlbums("[email protected]");
System.out.println(account);

Set<Album> albums = account.getAlbums();
System.out.println(albums);

Running the above, here's the output:

Hibernate: select account0_.id as id1_0_, account0_.email as email2_0_ from account account0_ where account0_.email=?
Account{id=1, email=test@sourceclear.com}
Exception in thread "main" org.hibernate.LazyInitializationException: failed to lazily initialize a collection of role: model.Account.albums, could not initialize proxy - no Session

Hmmm, that didn't go too well. What happened?

Lazy Initialization

For performance reasons, JPA specifies that One-To-Many and Many-To-Many collections are initialized lazily by default. This means that the persistence layer will refrain from loading the join table (Albums in this case) until it is actually requested by calling code. The problem is that by the time the calling code requests the join data, the transaction and database session are already closed. (This case is particularly prevalent in Web MVC scenarios, where the rendering view may request joined data far after the database operations have completed.)

Naive solution: Eager fetching

In addition to the default lazy collection initialization, JPA allows you to initialize To-Many collections eagerly. To enable this we simply need to modify our annotation as follows:

@OneToMany(mappedBy = "account", fetch = FetchType.EAGER)
@OrderBy("date")
private Set<Album> albums;

Re-running our test code the console output is now:

Hibernate: select account0_.id as id1_0_, account0_.email as email2_0_ from account account0_ where account0_.email=?
Hibernate: select albums0_.account_id as account_4_0_0_, albums0_.id as id1_1_0_, albums0_.id as id1_1_1_, albums0_.account_id as account_4_1_1_, albums0_.date as date2_1_1_, albums0_.name as name3_1_1_ from album albums0_ where albums0_.account_id=? order by albums0_.date
Account{id=1, email=test@sourceclear.com}
[Album{id=1, person=Account{id=1, email=test@sourceclear.com}, name=Hawaii 2014, date=2015-03-30 16:21:00.0}, Album{id=10, person=Account{id=1, email=test@sourceclear.com}, name=Europe 2014, date=2015-03-30 18:36:22.0}]

This may appear to be the desired outcome: all data loaded and more importantly, no stack trace. However, we'll see why lazy fetching makes sense as the default after all, and the importance of knowing the use cases involved when working around errors. To demonstrate, let's add some new functionality to fetch a list of accounts in a specific location:

List<Account> accounts = accountService.findByLocation("94112");
System.out.println(accounts);

The console output:

Hibernate: select account0_.id as id1_0_, account0_.email as email2_0_, account0_.location as location3_0_ from account account0_ where account0_.location=?
Hibernate: select albums0_.account_id as account_4_0_0_, albums0_.id as id1_1_0_, albums0_.id as id1_1_1_, albums0_.account_id as account_4_1_1_, albums0_.date as date2_1_1_, albums0_.name as name3_1_1_ from album albums0_ where albums0_.account_id=? order by albums0_.date
Hibernate: select albums0_.account_id as account_4_0_0_, albums0_.id as id1_1_0_, albums0_.id as id1_1_1_, albums0_.account_id as account_4_1_1_, albums0_.date as date2_1_1_, albums0_.name as name3_1_1_ from album albums0_ where albums0_.account_id=? order by albums0_.date
Hibernate: select albums0_.account_id as account_4_0_0_, albums0_.id as id1_1_0_, albums0_.id as id1_1_1_, albums0_.account_id as account_4_1_1_, albums0_.date as date2_1_1_, albums0_.name as name3_1_1_ from album albums0_ where albums0_.account_id=? order by albums0_.date
...

What happened here? The query on the Account table looks fine, but now we have a slew of individual queries on the Album table. In fact, we have one query per user in the "94112" location. This will certainly not scale, and we will quickly revert back to FetchType.LAZY for our join collections.

Potentially-better solution: Manual fetching

Given that eager fetching is almost never a good default for datasets which may be large, what are some better options for retrieving our album data? One possibility is to trigger the fetch manually. Suppose (contrived example warning!) we want to show the first five users' albums from a given location:

@Transactional(readOnly = true)
@Override
public List<Account> findByLocationWithAlbums(String location, int limit) {
  List<Account> accounts = accountRepository.findByLocation(location);

  return accounts.stream()
      .limit(limit)
      .peek(account -> Hibernate.initialize(account.getAlbums()))
      .collect(toList());
}

The call to Hibernate.initialize() has the same effect as FetchType.EAGER on the collection - a separate query to fetch each album row and initialize the Album object is performed. Except in this case, we control the records for which the extra DB roundtrip is performed. The console output shows the Account query and (at most) 5 queries on Albums.

Peak performance (maintenance): Join queries

Spring JPA allows you to specify the query used by a repository method:

@Repository
public interface AccountRepository extends JpaRepository<Account, Integer> {
  List<Account> findByLocation(String location);

  @Query("FROM Account AS account LEFT JOIN FETCH account.albums WHERE account.location = ?1")
  List<Account> findByLocationJoinAlbums(String location);
}

This query retrieves and fills in the album collection data concurrent with the account lookup. The console output shows:

Hibernate: select account0_.id as id1_0_0_, albums1_.id as id1_1_1_, account0_.email as email2_0_0_, account0_.location as location3_0_0_, albums1_.account_id as account_4_1_1_, albums1_.date as date2_1_1_, albums1_.name as name3_1_1_, albums1_.account_id as account_4_0_0__, albums1_.id as id1_1_0__ from account account0_ left outer join album albums1_ on account0_.id=albums1_.account_id where account0_.location=? order by albums1_.date

Everything is fetched in a single database roundtrip. Note that there is a small tradeoff here. Whereas Spring JPA can automatically parse a method name construct (such as findByLocation) into a SQL query, overriding with our own query creates added maintenance cost and fragility in the String query construct. Fortunately, modern IDE's such as IntelliJ IDEA have full JPA/Hibernate integrations which can aid developers here, for example by auto-completing table and column names in custom queries and by linking the internal String values back to their appropriate entity classes/fields (e.g. when selecting "Go To Declaration").

We can go farther down the relational object tree, for example to fetch photos (normally also a lazily initialized collection) at the same time as albums and accounts:

@Query("FROM Account AS account LEFT JOIN FETCH account.albums AS album LEFT JOIN FETCH album.photos WHERE email = ?1")
Account findByEmailWithAlbumsAndPhotos(String email);

// ...
Account account = accountService.findByEmailWithAlbumsAndPhotos("[email protected]");

account.getAlbums().forEach(album -> {
  System.out.println("album: " + album.getName() + " - photo count: " + album.getPhotos().size());
});

The console output displays a single DB operation to retrieve all the data:

Hibernate: select account0_.id as id1_0_0_, albums1_.id as id1_1_1_, photo3_.id as id1_3_2_, account0_.email as email2_0_0_, account0_.location as location3_0_0_, albums1_.account_id as account_4_1_1_, albums1_.date as date2_1_1_, albums1_.name as name3_1_1_, albums1_.account_id as account_4_0_0__, albums1_.id as id1_1_0__, photo3_.caption as caption2_3_2_, photo3_.filename as filename3_3_2_, photos2_.album_id as album_id1_1_1__, photos2_.photo_id as photo_id2_2_1__ from account account0_ left outer join album albums1_ on account0_.id=albums1_.account_id left outer join album_photo photos2_ on albums1_.id=photos2_.album_id left outer join photo photo3_ on photos2_.photo_id=photo3_.id where email=? order by albums1_.date
album: Hawaii 2014 - photo count: 25
album: Europe 2014 - photo count: 8

Note also that in this case, if the data set is too large, we may run into I/O problems with the database. Care must be taken to ensure that the total size of data retrieval never exceeds reasonable expectations. Which leads us to...

Summary: It's All About The Use Case

JPA can be sneaky. It almost seems like the queries and SQL write themselves, and complex object relationships can be defined with little effort. But this "magic" comes with a price: One must always keep in mind the use cases for the an application and attack requirements with the right tools. Without that deep understanding, it is easy to create conditions where application performance suffers in unexpected ways. With it, JPA (particularly combined with Spring and Hibernate) is, I can finally admit, a powerful and worthy component of Java EE.

Related Posts

By Chris Hut

Chris has been a Java developer for 19 years. He helped architect the SourceClear SCA platform.