Thursday, October 21, 2010

Hibernate Batch Processing with MySQL

Batch processing is a common way to improve database performance when inserting or updating massive amounts of data in bulk. However, there are some hiccups you might run into. Let's walk through an example to discover the issues. Suppose we have a simple table named Account with columns name and age. Below are the SQL script (MySQL), the Hibernate entity bean, and the batch-insert code.

Account.sql
CREATE TABLE `Account` (
  `id` int(20) NOT NULL auto_increment,
  `name` varchar(20) NOT NULL,
  `age` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Account.java
@Entity
@Table(name="Account")
public class Account implements Serializable {
 
  private static final long serialVersionUID = 6954737637154168665L;
 
  private long id;
  private String name;
  private int age;
 
  @Id @GeneratedValue(strategy=GenerationType.IDENTITY)
  public long getId() {
    return id;
  }

  public void setId(long id) {
    this.id = id;
  }

  @Basic
  @Column(name="name", nullable=false, length=20)
  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }
 
  @Basic
  @Column(name="age", nullable=false)
  public int getAge() {
    return age;
  }

  public void setAge(int age) {
    this.age = age;
  }
 
}

Test.java
Transaction tx = session.beginTransaction();

for (int i = 0; i < accounts.size(); i++) {
  session.save(accounts.get(i));

  // Flush and clear every 50 records; (i + 1) avoids a useless flush at i == 0
  if ((i + 1) % 50 == 0) {
    session.flush();
    session.clear();
  }
}
tx.commit();

The first thing to notice is that you need to flush and clear the session every N (here, 50) records, because Hibernate caches newly inserted entities in the session-level cache. Otherwise, you will eventually hit an OutOfMemoryError when inserting a large number of records.

Secondly, it's essential to configure hibernate.jdbc.batch_size to achieve optimal performance. Its value should be the same as N; you can play with different values to find the best one for your application. In addition, the second-level cache needs to be disabled, either in the configuration or programmatically through CacheMode.
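A minimal sketch of those settings in hibernate.cfg.xml (the property names are standard Hibernate settings; 50 matches the N used above):

```xml
<!-- JDBC batch size, matching the flush interval in the insert loop -->
<property name="hibernate.jdbc.batch_size">50</property>
<!-- disable the second-level cache for the bulk load -->
<property name="hibernate.cache.use_second_level_cache">false</property>
```

Alternatively, before the batch loop you can call session.setCacheMode(CacheMode.IGNORE) to bypass the second-level cache for just that session.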

But wait... why didn't I see any performance improvement?

The problem is Hibernate explicitly says:
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.

As a result, we have to use an alternative identifier generator. I am going to show you how to use MultipleHiLoPerTableGenerator, because the hi/lo algorithm generates identifiers efficiently: it only hits the database once per block of ids. Here is the updated annotation in Account.java, as well as the new table that stores each sequence's next hi value.

@GenericGenerator(name="AccountIdGenerator", strategy="org.hibernate.id.MultipleHiLoPerTableGenerator",
  parameters = {
    @Parameter(name="table", value="IdentityGenerator"),
    @Parameter(name="primary_key_column", value="sequence_name"),
    @Parameter(name="primary_key_value", value="Account"),
    @Parameter(name="value_column", value="next_hi_value"),
    @Parameter(name="primary_key_length", value="100"),
    @Parameter(name="max_lo", value="1000")
  }
)
@Id @GeneratedValue(strategy=GenerationType.TABLE, generator="AccountIdGenerator")

CREATE TABLE `IdentityGenerator` (
  `id` int(11) NOT NULL auto_increment,
  `sequence_name` varchar(100) NOT NULL,
  `next_hi_value` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
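To see why hi/lo is cheap, here is a minimal sketch of the algorithm (an illustration only, not Hibernate's actual implementation): one database round trip fetches and increments next_hi_value, and that single hi value then covers max_lo + 1 locally generated ids.

```java
// Hi/lo sketch: a plain counter stands in for the IdentityGenerator table.
// A real implementation reads and increments next_hi_value in its own
// short transaction; everything else happens in memory.
public class HiLoSketch {
  private final int maxLo;
  private int hi;              // current hi block, "fetched from the table"
  private int lo;              // position inside the current block
  private int tableNextHi = 0; // stands in for IdentityGenerator.next_hi_value

  public HiLoSketch(int maxLo) {
    this.maxLo = maxLo;
    this.lo = maxLo + 1;       // force a "database" fetch on the first call
  }

  public long nextId() {
    if (lo > maxLo) {          // block exhausted: one round trip per block
      hi = tableNextHi++;      // i.e. SELECT next_hi_value; UPDATE next_hi_value + 1
      lo = 0;
    }
    return (long) hi * (maxLo + 1) + lo++;
  }

  public static void main(String[] args) {
    HiLoSketch gen = new HiLoSketch(1000);
    System.out.println(gen.nextId());          // 0: first id, first "fetch"
    for (int i = 0; i < 1000; i++) gen.nextId();
    System.out.println(gen.nextId());          // 1001: second block begins
  }
}
```

With max_lo = 1000 as in the annotation above, 1002 ids cost only two round trips to the IdentityGenerator table, instead of one per row as with IDENTITY.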

Yes.. I know.. The performance still sucks.

The reason is that MySQL's JDBC driver does not support a batched form of prepared-statement parameter bindings by default; it sends one INSERT per row. Fortunately, we can turn on rewriteBatchedStatements so that the driver rewrites the batch into a multi-row insert of the form "INSERT INTO Account VALUES (...), (...), (...)".

<property name="hibernate.connection.url">
  jdbc:mysql://127.0.0.1/sheng?rewriteBatchedStatements=true
</property>
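To make the rewrite concrete, here is a small sketch of the statement shape the driver produces: N single-row inserts collapse into one multi-row statement and one round trip. The buildRewrittenInsert helper below is purely illustrative, not a Connector/J API.

```java
// Illustrates the SQL shape that rewriteBatchedStatements produces.
// buildRewrittenInsert is a hypothetical helper for demonstration only.
public class RewriteSketch {

  static String buildRewrittenInsert(String table, String[] columns, int rows) {
    // One "(?, ?, ...)" group per batched row
    String placeholders = "(" + "?, ".repeat(columns.length - 1) + "?)";
    StringBuilder sql = new StringBuilder("INSERT INTO ").append(table)
        .append(" (").append(String.join(", ", columns)).append(") VALUES ");
    for (int i = 0; i < rows; i++) {
      if (i > 0) sql.append(", ");
      sql.append(placeholders);
    }
    return sql.toString();
  }

  public static void main(String[] args) {
    // Three batched rows collapse into a single statement:
    System.out.println(buildRewrittenInsert("Account", new String[]{"name", "age"}, 3));
    // INSERT INTO Account (name, age) VALUES (?, ?), (?, ?), (?, ?)
  }
}
```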

Congratulations, you are all set! You should see a significant performance gain with the same test set you ran before.