Thursday, October 21, 2010

Hibernate Batch Processing with MySQL

Batch processing is a common solution to improve database performance by inserting or updating massive data in bulk. However, there are some hiccups you might run into. Let's walk through an example to discover the issues. Suppose we have a simple table named Account with column name and age. Followings are the SQL script (MySQL), Hibernate bean, and batch insert codes.

Account.sql
CREATE TABLE `Account` (
  `id` int(20) NOT NULL auto_increment,
  `name` varchar(20) NOT NULL,
  `age` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Account.java
@Entity
@Table(name="Account")
public class Account implements Serializable {
 
  private static final long serialVersionUID = 6954737637154168665L;
 
  private long id;
  private String name;
  private int age;
 
  @Id @GeneratedValue(strategy=GenerationType.IDENTITY)
  public long getId() {
    return id;
  }

  public void setId(long id) {
    this.id = id;
  }

  @Basic
  @Column(name="name", nullable=false, length=20)
  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }
 
  @Basic
  @Column(name="age", nullable=false)
  public int getAge() {
    return age;
  }

  public void setAge(int age) {
    this.age = age;
  }
 
}

Test.java
Transaction tx = session.beginTransaction();
        
for(int i=0; i < accounts.size(); i++) {
  Account account = accounts.get(i);
  session.save(account);

  if(i % 50 == 0) {
    session.flush();
    session.clear();
  }
}
tx.commit();

First thing to notice is you will need to flush and clear the data every N (50 here) records as Hibernate caches newly inserted records in session-level cache. Otherwise, you will see OutOfMemoryException somewhere if you have a large amount of records.

Secondly, it's essential to configure hibernate.jdbc.batch_size to achieve optimal performance. The value should be the same with N. You could play with different values to find the best one for your application. In addition, the second-level cache needs to be disabled either in the configuration or through CacheMode programmingly.

But wait... Why didn't I see performance improvement??

The problem is Hibernate explicitly says:
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.

As a result, we have to use alternative identity generator. I am going to show you how to use MultipleHiLoPerTableGenerator due to efficiency of the HiLo algorithm for identity generation. Here are the updated annotation in Account.java as well as the new table to store the sequence's next hi value.

@GenericGenerator(name="AccountIdGenerator", strategy="org.hibernate.id.MultipleHiLoPerTableGenerator",
  parameters = {
    @Parameter(name="table", value="IdentityGenerator"),
    @Parameter(name="primary_key_column", value="sequence_name"),
    @Parameter(name="primary_key_value", value="Account"),
    @Parameter(name="value_column", value="next_hi_value"),
    @Parameter(name="primary_key_length", value="100"),
    @Parameter(name="max_lo", value="1000")
  }
)
@Id @GeneratedValue(strategy=GenerationType.TABLE, generator="AccountIdGenerator")

CREATE TABLE `IdentityGenerator` (
  `id` int(11) NOT NULL auto_increment,
  `sequence_name` varchar(100) NOT NULL,
  `next_hi_value` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE= InnoDB DEFAULT CHARSET=latin1;

Yes.. I know.. The performance still sucks.

The reason is because MySQL does not support batch form of prepare statement parameter bindings. Fortunately, we could turn on rewriteBatchedStatements so that it will rewrite them to the form as "INSERT INTO Account VALUES (...), (...), (...)".

<property name="hibernate.connection.url">
  jdbc:mysql://127.0.0.1/sheng?rewriteBatchedStatements=true
</property>

Congratulations, you are all set! You will see great performance gain with the same test set you run.

Tuesday, June 22, 2010

Running Seam 2.0 on Tomcat

Java Server Faces or JSF is a component-based MVC framework designed for rapid development. It includes predefined UI components and event model. The framework enables the integration of third party components. One of my favorites is RichFaces which provides an array of rich components and skinability.

JBoss Seam is a powerful enterprise web framework for seamless integration between JSF and EJB. Seam supports several features not in the JSF such as conversation scope, JBPM integration, navigation enhancement, just to name a few (Some features are supported since JSF 2.0).

If you had worked on JSF project without using Seam, you might see some exception complaining your session is closed when rendering Hibernate objects. With Seam, you can inject the Hibernate session (or EntityManager) into your bean. Seam will manage the session for you. It saves some tedious plumbing codes to to use Data Transfer Object.

However, I am not the fan of EJB as it is heavyweight. And you would need to deploy on J2EE compliance server such as JBoss. It takes comparatively longer to start up than our old friend Tomcat. So I began my research on how to deploy my web application on Tomcat without sacrificing the features I want in Seam. The first attempt was to use embedded JBoss in Tomcat. It works, but... it still gave me some exceptions during startup and I like to make deployment easier. Why can't I just deploy war file on Tomcat without any change? Fortunately, here is the solution. I assume readers have knowledge on how to configure JPA and Seam.

persistence.xml in the META-INF
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd">

  <persistence-unit name="testdb" type="RESOURCE_LOCAL">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <properties>
      <name="hibernate.connection.driver_class" value="org.gjt.mm.mysql.Driver"/>
      <name="hibernate.dialect" value="org.hibernate.dialect.MySQLInnoDBDialect"/>
      <name="hibernate.connection.url" value="jdbc:mysql://127.0.0.1/test"/>
      <name="hibernate.connection.username" value="test"/>
      <name="hibernate.connection.password" value="test"/>
      <name="hibernate.default_schema" value="test"/>
      <name="hibernate.bytecode.use_reflection_optimizer" value="false"/>
      <name="hibernate.cache.provider_class" value="org.hibernate.cache.HashtableCacheProvider"/>
      <name="show_sql" value="false"/>

      <name="hibernate.connection.provider_class" value="org.hibernate.connection.C3P0ConnectionProvider"/>
      <name="hibernate.c3p0.acquire_increment" value="1"/>
      <name="hibernate.c3p0.idle_test_period" value="120"/>
      <name="hibernate.c3p0.timeout" value="60"/>
      <name="hibernate.c3p0.max_size" value="50"/>
      <name="hibernate.c3p0.min_size" value="1"/>
      <name="hibernate.c3p0.max_statements" value="0"/>
      <name="hibernate.c3p0.preferredTestQuery" value="select 1;"/>
    </properties>
  </persistence-unit>
</persistence>

components.xml in Seam
<components xmlns="http://jboss.com/products/seam/components"
xmlns:core="http://jboss.com/products/seam/core"
xmlns:persistence="http://jboss.com/products/seam/persistence"
xmlns:transaction="http://jboss.com/products/seam/transaction"
xmlns:security="http://jboss.com/products/seam/security"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ui="http://jboss.com/products/seam/ui"
xsi:schemaLocation=
"http://jboss.com/products/seam/core http://jboss.com/products/seam/core-2.1.xsd 
http://jboss.com/products/seam/persistence http://jboss.com/products/seam/persistence-2.1.xsd
http://jboss.com/products/seam/transaction http://jboss.com/products/seam/transaction-2.1.xsd 
http://jboss.com/products/seam/security http://jboss.com/products/seam/security-2.1.xsd 
http://jboss.com/products/seam/components http://jboss.com/products/seam/components-2.1.xsd">

  <core:init debug="false" transaction-management-enabled="false"/>
  <core:manager conversation-timeout="120000"/>
  <core:resource-loader>
  <core:bundle-names>
    <value>org.sheng.bundle.messages</value>
  </core:bundle-names>
  </core:resource-loader>

  <persistence:entity-manager-factory name="entityManagerFactory" persistence-unit-name="testdb"/>
  <persistence:managed-persistence-context name="entityManager" auto-create="true" entity-manager-factory="#{entityManagerFactory}"/>
  <transaction:entity-transaction entity-manager="#{entityManager}"/>
  <factory name="session" scope="STATELESS" auto-create="true" value="#{entityManager.delegate}"/>
</components>

One thing to notice is the persistence-unit-name needs to match the one in persistence.xml. If you are using JPA, just remove the factory tag and inject your entityManager. Now you could simply drop the war file in the webapps folder of your tomcat. The version I use at the time of this writing is 6.0.26.

Tuesday, March 16, 2010

Effective Unit Testing with TestNG

Writing good unit tests is essential to maintain product quality. JUnit is a popular testing framework for Java. Starting with JUnit 4, it has a radical change on the framework in support of annotation based configuration. With annotations, it's no longer required to extend TestCase for all your test cases. However, the lack of good supporting dependency testing makes TestNG, another test automation framework, a formidable alternative to JUnit.

TestNG natively supports dependency testing with dependsOnMethods and dependsOnGroups. It has a concept of group that helps organize test cases on the level of category. Another important feature is parametric testing by using Data Provider. A variety of data values can be injected into same test method to verify different scenarios.

Ideally, every project should have as complete test cases covered as possible so that future enhancements could be assured not breaking existing codes. But, let's face the real world. Developers are driven by tight deadlines. Sluggish testing process would make them reluctant to run the tests.

With the help of in-memory database, you could run your unit testing comfortably fast. Here is an example of how Hibernate defines in-memory database using H2, a SQL database written in Java and proven very fast.

test-hibernate.cfg.xml
<!-- Basic -->
<property name="hibernate.connection.driver_class">org.h2.Driver</property>
<property name="hibernate.dialect">org.hibernate.dialect.HSQLDialect</property>
<property name="hibernate.connection.url">jdbc:h2:mem:test</property>
<property name="hibernate.connection.username">username</property>
<property name="hibernate.connection.password">password</property>
<property name="hibernate.connection.default_schema">test</property>
<property name="hbm2ddl.auto">create</property>

<!-- Mappings -->
<mapping class="org.sheng.db.User"/>

The property hbm2ddl.auto is set to create to make the testing process idempotent, meaning every execution of same test cases generates same result. You also don't need to clean testing data created from testing phases. Please note I intentionally use HSQLDialect here due to H2Dialect in some version of Hibernate is buggy. Followings are more details of the example:

testng.xml
<suite name="Test Suite">
  <test name="Test Cases">
    <classes>
      <class name="org.sheng.test.MyTest"/>
    </classes>
  </test>
</suite>

MyTest.java
public class MyTest {

  protected Session session;

  @BeforeSuite
  public void init() throws Exception {
    DaoFactory.configureSessionFactory("test-hibernate.cfg.xml");
  }

  @BeforeClass
  public void setUp() throws Exception {
    session = DaoFactory.newSession();
  }

  @DataProvider(name = "users") 
  public Object[][] getUsers() { 
    return new Object[][] { 
      new Object[] { "admin", "password" },
      new Object[] { "manager", "password" },
      new Object[] { "user1", "password" },
      new Object[] { "user2", "password" }
    };
  }

  @Test(groups = {"database"}, dataProvider = "users")
  public void createUser(String username, String password) throws Exception {
    User user = new User();
    user.setUsername(username);
    uesr.setPassword(password);
    DaoFactory.getUserDao().createUser(session, user);
  }

  @Test(groups = {"database"}, dependsOnMethods = {"createUser"})
  public void verifyUsers() throws Exception {
    //Verify users
  }

}

MyTest.java
public class DaoFactory {

  private static AnnotationConfiguration configuration;
  private static SessionFactory sessionFactory;

  public synchronized static void configureSessionFactory(String resource) {
    URL url = getResource(resource);
    configuration = new AnnotationConfiguration();
    sessionFactory = configuration.configure(url).buildSessionFactory();
  }

  public static Session newSession() {
    return sessionFactory.openSession();
  }

  public static URL getResource(String resource) {
    ClassLoader classLoader = null;
    URL url = null;

    try {
      Method method = Thread.class.getMethod("getContextClassLoader", (Class[])null);
      classLoader = (ClassLoader)method.invoke(Thread.currentThread(), (Object[])null);
      url = classLoader.getResource(resource);      

      if(url != null) {
        return url;
      }
    }
    catch(Exception e) {
      e.printStackTrace()
    }

    return ClassLoader.getSystemResource(resource);
  }

}

Effective testing process is important to the success of a project. It's difficult to keep track of every change especially when you are collaborating with other teammates. Well-written test cases also serves as documents to help a developer who is new to the project to make contributions in a timely manner.