Best practices to improve performance in JDBC
This topic illustrates the best practices to improve performance in JDBC with the following sections:
Overview of JDBC
JDBC defines how a Java program can communicate with a database. This section focuses mainly on the JDBC 2.0 API. The JDBC API provides two packages: java.sql and javax.sql. By using the JDBC API, you can connect to virtually any database, send SQL queries to it and process the results.
The JDBC architecture defines different layers between Java and any database: the JDBC API interfaces and classes at the top layer (to work with Java), a driver at the middle layer (which implements the JDBC API interfaces and maps Java to the database-specific language), and the database at the bottom (which stores the physical data). The following figure illustrates the JDBC architecture.
The JDBC API provides interfaces and classes to work with databases. The Connection interface encapsulates database connection functionality, the Statement interface encapsulates SQL query representation and execution, and the ResultSet interface encapsulates retrieval of the data produced by executing a SQL query with a Statement.
The following are the basic steps to write a JDBC program (a minimal sketch follows the list):
1. Import java.sql and javax.sql packages
2. Load JDBC driver
3. Establish connection to the database using Connection interface
4. Create a Statement by passing SQL query
5. Execute the Statement
6. Retrieve results by using ResultSet interface
7. Close Statement and Connection
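The following minimal sketch walks through these seven steps. It assumes the Oracle thin driver, a host/port/SID of localhost:1521:ORCL and a hypothetical employee table; substitute your own driver class, URL, user and password.
// 1. import java.sql (javax.sql is needed only for DataSource-based code)
import java.sql.*;

public class BasicJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.driver.OracleDriver");               // 2. load the JDBC driver
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:ORCL", "scott", "tiger"); // 3. establish connection
        Statement stmt = con.createStatement();                          // 4. create a Statement
        ResultSet rs = stmt.executeQuery("SELECT name, salary FROM employee"); // 5. execute it
        while (rs.next()) {                                               // 6. retrieve results
            System.out.println(rs.getString("name") + " " + rs.getDouble("salary"));
        }
        rs.close();
        stmt.close();                                                     // 7. close Statement
        con.close();                                                      //    and Connection
    }
}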
We will look at these areas one by one: what type of driver you need to load, how to use the Connection interface in the best manner, how to use the different Statement interfaces, how to process results using ResultSet, and finally how to optimize SQL queries to improve JDBC performance.
Note 1: Your JDBC driver should be fully compatible with JDBC 2.0 features in order to use some of the suggestions mentioned in this section.
Note 2: This section assumes that the reader has some basic knowledge of JDBC.
*Choosing the right Driver*
Here we will first walk through the types of drivers, their availability and their use in different situations, and then discuss which driver suits your application best.
The driver is the key player in a JDBC application; it acts as a mediator between the Java application and the database. It implements the JDBC API interfaces for a database, for example the Oracle driver for an Oracle database or the Sybase driver for a Sybase database. It maps the Java language to the database-specific language, including SQL.
JDBC defines four types of drivers to work with. Depending on your requirement you can choose one among them.
Here is a brief description of each type of driver:
Type 1 (two tier, JDBC-ODBC bridge driver): converts JDBC calls to ODBC calls through the JDBC-ODBC bridge, which in turn converts them to database calls. The client requires ODBC libraries.
Type 2 (two tier, Native API - partly Java driver): converts JDBC calls to database-specific native calls. The client requires database-specific libraries.
Type 3 (three tier, JDBC-Net - all Java driver): passes calls to a proxy server through a network protocol, which in turn converts them to database calls and forwards them through the database-specific protocol. The client doesn't require any driver.
Type 4 (two tier, Native protocol - all Java driver): calls the database directly. The client doesn't require any driver.
Obviously the choice of driver depends on availability and on your requirements. Generally all databases have their own drivers or drivers from third-party vendors. If you don't have a driver for your database, the JDBC-ODBC driver is the only choice, because almost all vendors support ODBC. If you have a tiered requirement (two tier or three tier) for your application, then you can filter down your choices; for example, if your application is three tiered, you can go for a Type 3 driver between the client and the proxy server, as shown below. If you want to connect to the database from a Java applet, you have to use a Type 4 driver, because it is the only driver that supports that feature. This figure shows the overall picture of drivers from the tiered perspective.
The figure illustrates the drivers that can be used for two-tiered and three-tiered applications. For three-tiered applications you can easily narrow the choice down to the Type 3 driver between client and proxy server, while Type 1, 2 and 4 drivers can be used in either kind of application. To be more precise, for Java applications (non-applet) you can use a Type 1, 2 or 4 driver. Here is exactly where you may make a mistake by choosing a driver without taking performance into consideration. Let us look at that perspective in the following section.
Type 3 and 4 drivers are faster than the other drivers: Type 3 can take advantage of optimization techniques provided by the application server, such as connection pooling, caching and load balancing, and a Type 4 driver need not translate database calls to ODBC or a native connectivity interface. Type 1 drivers are slow because they first convert JDBC calls to ODBC calls through the JDBC-ODBC bridge, and the ODBC driver then converts them into database-specific calls. Type 2 drivers give average performance compared to Type 3 and 4 drivers because the calls still have to be converted into database-specific native calls, but they perform better than Type 1 drivers.
Finally, to improve performance:
1. Use a Type 4 driver for applet-to-database communication.
2. Use a Type 2 driver for two-tiered applications for communication between the Java client and the database; it gives better performance than a Type 1 driver.
3. Use a Type 1 driver if your database doesn't have its own driver. This is a rare situation, because almost all major databases supply drivers or you can get one from third-party vendors.
4. Use a Type 3 driver to communicate between the client and a proxy server (WebLogic, WebSphere etc.) for three-tiered applications; it gives better performance than Type 1 and 2 drivers.
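Whichever type you choose, the driver is registered the same way. The class names below are only illustrative examples of a Type 1 and a Type 4 driver; check your vendor's documentation for the exact class name.
// Type 1: JDBC-ODBC bridge shipped with the JDK
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
// Type 4: Oracle thin driver (all Java, usable from applets)
Class.forName("oracle.jdbc.driver.OracleDriver");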
*Optimization with Connection*
The java.sql package in JDBC provides the Connection interface, which encapsulates database connection functionality. Using the Connection interface, you can fine-tune the following operations:
1. Set optimal row pre-fetch value
2. Use Connection pool
3. Control transaction
4. Choose optimal isolation level
5. Close Connection when finished
Each of these operations affects performance. We will walk through them one by one.
*1. Set optimal row pre-fetch value*
There are different approaches to establishing a connection with the database. The first type of approach is:
1. DriverManager.getConnection(String url)
2. DriverManager.getConnection(String url, Properties props)
3. DriverManager.getConnection(String url, String user, String password)
4. Driver.connect(String url, Properties props)
When you use this approach, you can pass database-specific information to the database through a Properties object to improve performance. For example, with an Oracle database you can pass the default number of rows to pre-fetch from the database server and the default batch value that triggers an execution request. Oracle's default value for both properties is 10. By increasing these values, you can reduce the number of database calls, which in turn improves performance. The following code snippet illustrates this approach.
java.util.Properties props = new java.util.Properties();
props.put("user","scott");
props.put("password","tiger");
props.put("defaultRowPrefetch","30");
props.put("defaultBatchValue","5");
Connection con = DriverManager.getConnection("jdbc:oracle:thin:@hoststring", props);
You need to figure out appropriate values for the above properties depending on your application's requirements. Suppose you want to tune these properties for a search facility: increasing defaultRowPrefetch can improve performance significantly.
The second type of approach is to get a connection from a DataSource.
You can get the connection using the javax.sql.DataSource interface. The advantage of this approach is that the DataSource works with JNDI. The implementation of DataSource is provided by the vendor; for example, you can find this feature in WebLogic, WebSphere etc. The vendor simply creates a DataSource implementation class and binds it to the JNDI tree. The following code shows how a vendor creates the implementation class and binds it to the JNDI tree.
DataSourceImpl dsi = new DataSourceImpl();
dsi.setServerName("oracle8i");
dsi.setDatabaseName("Demo");
Context ctx = new InitialContext();
ctx.bind("jdbc/demoDB", dsi);
This code registers the DataSourceImpl object in the JNDI tree; the programmer can then get the DataSource reference from the JNDI tree without knowing anything about the underlying implementation.
Context ctx = new InitialContext();
DataSource ds = (DataSource)ctx.lookup("jdbc/demoDB");
Connection con = ds.getConnection();
By using this approach we can improve performance. Nearly all major application servers such as WebLogic and WebSphere implement DataSource by taking a connection from a connection pool rather than creating a new connection every time. The application server creates the connection pool by default. We will discuss how a connection pool improves performance in the next section.
*2. Use Connection pool*
Creating a connection to the database server is expensive, and it is even more expensive if the server is located on another machine. A connection pool contains a number of open database connections between the minimum and maximum sizes that you specify; the pool expands and shrinks between those bounds depending on the incremental capacity. You need to give the minimum, maximum and incremental sizes as properties to the pool in order to maintain that behaviour, and you get connections from the pool rather than opening them directly. For example, if you give min, max and incremental sizes of 3, 10 and 1, the pool is created with 3 connections initially; when it reaches its capacity of 3 and a client requests a connection concurrently, it increments its capacity by 1 until it reaches 10, after which it puts further clients in a queue.
There are a few choices when using a connection pool.
1. You can depend on the application server if it supports this feature; generally all application servers support connection pools. The application server creates the connection pool on your behalf when it starts. You need to give properties like min, max and incremental sizes to the application server.
2. You can use the JDBC 2.0 interfaces ConnectionPoolDataSource and PooledConnection if your driver implements them (see the sketch after this list).
3. Or you can create your own connection pool if you are not using an application server or a JDBC 2.0 compatible driver.
By using any of these options, you can increase performance significantly. You need to take care of properties like min, max and incremental sizes. The maximum number of connections depends on your application's requirements, that is, how many concurrent clients access your database, and also on your database's capability to provide that many connections.
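As a rough sketch of option 2: a JDBC 2.0 driver that implements ConnectionPoolDataSource hands out PooledConnection objects that wrap reusable physical connections. VendorConnectionPoolDataSource below is a hypothetical driver class standing in for whatever implementation your driver ships; in practice an application server or pooling library usually manages these objects for you.
import javax.sql.ConnectionPoolDataSource;
import javax.sql.PooledConnection;
import java.sql.Connection;

ConnectionPoolDataSource cpds = new VendorConnectionPoolDataSource(); // hypothetical vendor class
PooledConnection pc = cpds.getPooledConnection("scott", "tiger");     // physical, reusable connection
Connection con = pc.getConnection();  // logical handle given to application code
// ... execute statements on con ...
con.close();   // releases the logical handle; the physical connection stays open for reuse
pc.close();    // closes the physical connection when the pool finally discards it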
*3. Control transaction*
In general, a transaction represents one unit of work: a block of code in the program that executes in its entirety or not at all. To be precise, it is all or no work. In JDBC, a transaction is a set of one or more statements that execute as a single unit.
The java.sql.Connection interface provides the following methods to control transactions:
public interface Connection {
boolean getAutoCommit();
void setAutoCommit(boolean autocommit);
void commit();
void rollback();
}
JDBC's default mechanism for transactions:
By default in JDBC, a transaction starts and commits after each statement's execution on a connection; that is, auto-commit mode is true. The programmer need not call commit() explicitly after each statement.
Obviously this default mechanism is convenient when you want to execute a single statement. But it gives poor performance when multiple statements on a connection are to be executed, because a commit is issued after each statement by default, and those unnecessary commits reduce performance. The remedy is to set auto-commit mode to false and call commit() after a set of statements has executed; this is called a batch transaction. Use rollback() in the catch block to roll back the transaction whenever an exception occurs in your program. The following code illustrates the batch transaction approach.
PreparedStatement ps = null;
PreparedStatement ps1 = null;
try{
    connection.setAutoCommit(false);
    ps = connection.prepareStatement("UPDATE employee SET Address=? WHERE name=?");
    ps.setString(1,"Austin");
    ps.setString(2,"RR");
    ps.executeUpdate();
    ps1 = connection.prepareStatement("UPDATE account SET salary=? WHERE name=?");
    ps1.setDouble(1, 5000.00);
    ps1.setString(2,"RR");
    ps1.executeUpdate();
    connection.commit();
    connection.setAutoCommit(true);
}catch(SQLException e){
    connection.rollback();   // undo both updates on failure
}finally{
    // assumes the enclosing method declares throws SQLException
    if(ps != null){ ps.close(); }
    if(ps1 != null){ ps1.close(); }
    if(connection != null){ connection.close(); }
}
This batch transaction gives good performance by reducing commit calls after each statement's execution.
*4. Choose optimal isolation level*
Isolation levels represent how a database maintains data integrity against problems like dirty reads, phantom reads and non-repeatable reads, which can occur due to concurrent transactions. The java.sql.Connection interface provides methods and constants to avoid these problems by setting different isolation levels.
public interface Connection {
public static final int TRANSACTION_NONE = 0;
public static final int TRANSACTION_READ_UNCOMMITTED = 1;
public static final int TRANSACTION_READ_COMMITTED = 2;
public static final int TRANSACTION_REPEATABLE_READ = 4;
public static final int TRANSACTION_SERIALIZABLE = 8;
int getTransactionIsolation();
void setTransactionIsolation(int isolationlevelconstant);
}
You can get the current isolation level with the getTransactionIsolation() method and set it with setTransactionIsolation(int level), passing one of the above constants.
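For example, a minimal sketch (con is assumed to be an open Connection):
int current = con.getTransactionIsolation();  // the driver's or database's default level
con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED); // prevent dirty reads only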
The following table shows, for each isolation level, which problems it permits and its performance impact:
Isolation level                  Dirty reads   Non-repeatable reads   Phantom reads   Performance
TRANSACTION_NONE                 N/A           N/A                    N/A             FASTEST
TRANSACTION_READ_UNCOMMITTED     YES           YES                    YES             FASTEST
TRANSACTION_READ_COMMITTED       NO            YES                    YES             FAST
TRANSACTION_REPEATABLE_READ      NO            NO                     YES             MEDIUM
TRANSACTION_SERIALIZABLE         NO            NO                     NO              SLOW
YES means that the Isolation level does not prevent the problem
NO means that the Isolation level prevents the problem
By setting isolation levels you affect performance as shown in the table above; databases use read and write locks to enforce them. Let us look at each of these problems and then at the impact on performance.
*Dirty read problem:*
The following steps illustrate the dirty read problem:
Step 1: Database row has PRODUCT = A001 and PRICE = 10
Step 2: Connection1 starts Transaction1 (T1) .
Step 3: Connection2 starts Transaction2 (T2) .
Step 4: T1 updates PRICE =20 for PRODUCT = A001
Step 5: Database has now PRICE = 20 for PRODUCT = A001
Step 6: T2 reads PRICE = 20 for PRODUCT = A001
Step 7: T2 commits transaction
Step 8: T1 rolls back the transaction because of some problem
The problem is that T2 gets the wrong PRICE = 20 for PRODUCT = A001 instead of 10, because it read uncommitted data. Reading inconsistent data like this is obviously very dangerous in critical transactions. If you are sure the data will not be accessed concurrently, you can tolerate this problem by setting TRANSACTION_READ_UNCOMMITTED or TRANSACTION_NONE, which in turn improves performance; otherwise you have to use TRANSACTION_READ_COMMITTED to avoid it.
*Non-repeatable read problem:*
The following steps illustrate the non-repeatable read problem:
Step 1: Database row has PRODUCT = A001 and PRICE = 10
Step 2: Connection1 starts Transaction1 (T1) .
Step 3: Connection2 starts Transaction2 (T2) .
Step 4: T1 reads PRICE =10 for PRODUCT = A001
Step 5: T2 updates PRICE = 20 for PRODUCT = A001
Step 6: T2 commits transaction
Step 7: Database row has PRODUCT = A001 and PRICE = 20
Step 8: T1 reads PRICE = 20 for PRODUCT = A001
Step 9: T1 commits transaction
Here the problem is that Transaction 1 reads 10 the first time and 20 the second time, but it should read the same value (10) every time it reads that record within the transaction. You can control this problem by setting the isolation level to TRANSACTION_REPEATABLE_READ.
*Phantom read problem:*
The following steps illustrate the phantom read problem:
Step 1: Database has a row PRODUCT = A001 and COMPANY_ID = 10
Step 2: Connection1 starts Transaction1 (T1) .
Step 3: Connection2 starts Transaction2 (T2) .
Step 4: T1 selects a row with a condition SELECT PRODUCT WHERE COMPANY_ID= 10
Step 5: T2 inserts a row PRODUCT = A002 with COMPANY_ID = 10
Step 6: T2 commits transaction
Step 7: Database has 2 rows with that condition
Step 8: T1 selects again with the condition SELECT PRODUCT WHERE COMPANY_ID = 10 and gets 2 rows instead of 1 row
Step 9: T1 commits transaction
Here the problem is that T1 gets 2 rows instead of 1 row when selecting with the same condition a second time. You can control this problem by setting the isolation level to TRANSACTION_SERIALIZABLE.
*Choosing the right isolation level for your program:*
Choosing the right isolation level depends upon your application's requirements, and within a single application the requirements generally vary. For example, if you write a program for searching a product catalog, you can easily choose TRANSACTION_READ_UNCOMMITTED, because you need not worry about the problems mentioned above: some other program may insert records at the same time, but you don't have to bother much about that insertion. Obviously this improves performance significantly.
If you write a critical program like a banking or stock-analysis program where you want to control all of the above problems, choose TRANSACTION_SERIALIZABLE for maximum safety. Here it is a tradeoff between safety and performance, and ultimately safety wins.
If your application does not have to deal with concurrent transactions, the best choice is TRANSACTION_NONE to improve performance.
The other two isolation levels need a good understanding of your requirements. If your application needs only committed records, then TRANSACTION_READ_COMMITTED is a good choice. If your application needs to read a row exclusively until you finish your work, then TRANSACTION_REPEATABLE_READ is the best choice.
*Note*: Be aware of your database server's support for these isolation levels; servers may not support all of them. Oracle, for example, supports only two isolation levels, TRANSACTION_READ_COMMITTED and TRANSACTION_SERIALIZABLE, with TRANSACTION_READ_COMMITTED as the default.
*5. Close Connection when finished*
Closing the connection explicitly allows the garbage collector to reclaim memory as early as possible. Remember that when you use a connection pool, closing the connection returns it to the pool rather than closing the physical connection to the database.
*Optimization with Statement*
The Statement interfaces represent a SQL query and its execution, and they provide a number of methods and constants to work with queries. They also provide methods to fine-tune performance, which programmers may overlook, resulting in poor performance. The following are tips to improve performance using the Statement interfaces:
1. Choose the right Statement interface
2. Do batch update
3. Do batch retrieval using Statement
4. Close Statement when finished
*1. Choose the right Statement interface*
There are three types of Statement interfaces in JDBC to represent a SQL query and execute it: Statement, PreparedStatement and CallableStatement.
Statement is used for static SQL statements with no input or output parameters, PreparedStatement is used for dynamic SQL statements with input parameters, and CallableStatement is used for dynamic SQL statements with both input and output parameters; however, PreparedStatement and CallableStatement can be used for static SQL statements as well. CallableStatement is mainly meant for stored procedures.
PreparedStatement gives better performance than Statement because it is pre-parsed and pre-compiled by the database the first time it is used, and the parsed and compiled statement is reused from then on. Because of this, it significantly improves performance when a statement executes repeatedly, since it avoids the repeated overhead of parsing and compiling.
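For example, preparing the statement once and re-executing it with new parameter values lets the database reuse the parsed and compiled plan. A minimal sketch, reusing the employee table from the earlier example and assuming names and addresses are arrays of equal length:
PreparedStatement ps = connection.prepareStatement(
        "UPDATE employee SET Address=? WHERE name=?");
for (int i = 0; i < names.length; i++) {
    ps.setString(1, addresses[i]);
    ps.setString(2, names[i]);
    ps.executeUpdate();   // reuses the statement parsed and compiled on first execution
}
ps.close();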
CallableStatement gives better performance than PreparedStatement and Statement when a single request has to process multiple complex statements. It parses and stores the stored procedure in the database and does all the work at the database itself, which in turn improves performance. But we lose Java portability and have to depend on database-specific stored procedures.
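A sketch of invoking a stored procedure through CallableStatement; the procedure name updateSalary and its parameters are hypothetical:
CallableStatement cs = connection.prepareCall("{call updateSalary(?, ?)}"); // hypothetical procedure
cs.setString(1, "RR");
cs.setDouble(2, 5000.00);
cs.execute();   // the work runs inside the database
cs.close();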
*2. Do batch update*
You can send multiple queries to the database at a time using the batch update feature of statement objects; this reduces the number of JDBC calls and improves performance. Here is an example of how you can do a batch update:
statement.addBatch( "sql query1");
statement.addBatch(" sql query2");
statement.addBatch(" sql query3");
statement.executeBatch();
All three types of statements have these methods to do batch update.
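A PreparedStatement can batch parameterized updates in the same way. A sketch, again using the employee table from the earlier examples (the second name/address pair is made up):
PreparedStatement ps = connection.prepareStatement(
        "UPDATE employee SET Address=? WHERE name=?");
ps.setString(1, "Austin");
ps.setString(2, "RR");
ps.addBatch();                          // queue the first parameter set
ps.setString(1, "Dallas");
ps.setString(2, "KK");
ps.addBatch();                          // queue the second parameter set
int[] updateCounts = ps.executeBatch(); // one round trip executes both updates
ps.close();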
*3. Do batch retrieval using Statement*
The driver provides a default number of rows to fetch at a time. You can improve performance by increasing the number of rows fetched from the database at a time, using the setFetchSize() method of the statement object.
Initially find the default size by using
Statement.getFetchSize(); and then set the size as per your requirement
Statement.setFetchSize(30);
Here it retrieves 30 rows at a time for all result sets of this statement.
*4. Close Statement when finished*
Close the statement object as soon as you finish working with it; this explicitly gives the garbage collector a chance to reclaim memory as early as possible, which in turn helps performance.
Statement.close();
*Optimization with ResultSet*
The ResultSet interface represents the data produced by executing a SQL query and provides a number of methods and constants to work with that data. It also provides methods to fine-tune the retrieval of data to improve performance. The following are fine-tuning tips to improve performance using the ResultSet interface:
1. Do batch retrieval using ResultSet
2. Set up proper direction for processing the rows
3. Use proper get methods
4. Close ResultSet when finished
*1. Do batch retrieval using ResultSet*
The ResultSet interface also provides a batch retrieval facility like Statement, as mentioned above. It overrides the Statement behaviour.
Initially find the default size by using
ResultSet.getFetchSize(); and then set the size as per requirement
ResultSet.setFetchSize(50);
This feature significantly improves performance when you are dealing with the retrieval of a large number of rows, such as in search functionality.
*2. Set up the proper direction for processing rows*
ResultSet has the capability of setting the direction in which you want to process the results; it has three constants for this purpose:
FETCH_FORWARD, FETCH_REVERSE, FETCH_UNKNOWN
Initially find the direction by using
ResultSet.getFetchDirection(); and then set the direction accordingly
ResultSet.setFetchDirection(FETCH_REVERSE);
*3. Use proper getxxx() methods*
The ResultSet interface provides a lot of getXXX() methods to read database data and convert database data types to Java data types, and it is flexible enough to convert even mismatched data types. For example,
getString(String columnName) returns a Java String object.
The columnName is recommended to be a VARCHAR or CHAR column in the database, but it can also be a NUMERIC, DATE etc.
If you pass a column of a non-recommended type, the driver has to convert it to the proper Java data type, which is expensive. For example, consider selecting a product's id through a search against a huge database that returns millions of records; converting all of those values is very expensive.
So always use the proper getXXX() method according to the JDBC recommendations.
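For example, if product_id is stored as a numeric column, read it with getInt() rather than getString() so the driver does not convert every value (the table and column names here are illustrative):
ResultSet rs = stmt.executeQuery("SELECT product_id FROM product"); // illustrative table and column
while (rs.next()) {
    int id = rs.getInt("product_id");   // matches the column's numeric type, no conversion needed
    // rs.getString("product_id") would force a costly number-to-string conversion on every row
}
rs.close();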
*4. Close ResultSet when finished*
Close the ResultSet object as soon as you finish working with it. Even though the Statement object closes the ResultSet implicitly when it is closed, closing the ResultSet explicitly gives the garbage collector a chance to reclaim memory as early as possible, because a ResultSet object may occupy a lot of memory depending on the query.
ResultSet.close();
*Optimization with SQL Query*
This is one of the areas where programmers generally make a mistake.
If you give a query like
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery("select * from employee where name='RR'");
the returned result set contains data for all the columns, although you may not need all of them and may want only the salary for RR.
The better query is "select salary from employee where name='RR'"
It returns the required data and reduces unnecessary data retrieval.
*Cache the read-only and read-mostly data*
Every database schema generally has read-only and read-mostly tables, often called lookup tables. Read-only tables contain static data that never changes in its lifetime. Read-mostly tables contain semi-dynamic data that changes only from time to time, and the application itself performs little or no writing to either kind of table.
If an application reads data from these tables for every client request, it is redundant, unnecessary and expensive. The solution is to cache read-only table data by reading it once, and to cache read-mostly table data by reading it and refreshing it at a time interval. This improves performance significantly. See the following link for source code of such a caching mechanism.
http://www.javaworld.com/javaworld/jw-07-2001/jw-0720-cache.html
You can tweak this code as per application requirements. For read-only data, you never need to refresh it. For read-mostly data, you need to refresh it at a time interval. It is better to put this refresh interval in a properties file so that it can be changed at any time.
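A minimal sketch of such a cache for a read-mostly lookup table; the table name, the query and the refresh interval are assumptions, and the article linked above shows a fuller implementation:
import java.sql.*;
import java.util.*;

// Caches a read-mostly lookup table and reloads it once the refresh interval has passed.
public class LookupCache {
    private Map data = new HashMap();
    private long loadedAt = 0;
    private final long refreshMillis;          // e.g. read from a properties file

    public LookupCache(long refreshMillis) {
        this.refreshMillis = refreshMillis;
    }

    public synchronized Map getData(Connection con) throws SQLException {
        if (loadedAt == 0 || System.currentTimeMillis() - loadedAt > refreshMillis) {
            Map fresh = new HashMap();
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT code, name FROM country"); // assumed lookup table
            while (rs.next()) {
                fresh.put(rs.getString("code"), rs.getString("name"));
            }
            rs.close();
            stmt.close();
            data = fresh;                       // swap in the freshly loaded copy
            loadedAt = System.currentTimeMillis();
        }
        return data;
    }
}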
*Fetch small amounts of data iteratively instead of fetching whole data at once*
Applications often need to retrieve large amounts of data from the database using JDBC, for example in search operations. If the client requests a search, the application might return the whole result set at once. This takes a lot of time and hurts performance. The solutions to the problem are:
1. Cache the search data on the server side and return it to the client iteratively. For example, if the search returns 1000 records, return the data to the client in 10 iterations of 100 records each (a sketch follows below).
2. Use stored procedures to return data iteratively. This does not use server-side caching; instead, the server-side application uses stored procedures to return small amounts of data iteratively.
Of these solutions, the second gives better performance because it need not keep the data in an in-memory cache. The first approach is useful when the total amount of data to be returned is not huge.
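A rough sketch of the first option: the search runs once, the full result is cached on the server side (for example in the user's session), and each client call gets one page back. The method name, page size and cache are all assumptions:
// Returns one page of an already cached search result.
public java.util.List getPage(java.util.List searchResults, int pageNumber, int pageSize) {
    int from = pageNumber * pageSize;
    int to = Math.min(from + pageSize, searchResults.size());
    if (from >= to) {
        return java.util.Collections.EMPTY_LIST;   // no more rows to return
    }
    return searchResults.subList(from, to);        // e.g. 100 records per iteration
}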
*Key Points*
1. Use a Type 2 driver for two-tiered applications to communicate from the Java client to the database; it gives better performance than a Type 1 driver.
2. Use a Type 4 driver for applet-to-database communication; it performs well in both two-tiered and three-tiered applications compared to the other drivers.
3. Use a Type 1 driver if you don't have a driver for your database. This is a rare situation, because all major databases support drivers or you can get one from third-party vendors.
4. Use a Type 3 driver to communicate between the client and a proxy server (WebLogic, WebSphere etc.) for three-tiered applications; it gives better performance than Type 1 and 2 drivers.
5. Pass database-specific properties like defaultRowPrefetch if your database supports them.
6. Get the database connection from a connection pool rather than getting it directly.
7. Use batch transactions.
8. Choose the right isolation level as per your requirements. TRANSACTION_READ_UNCOMMITTED gives the best performance for applications with concurrent transactions; TRANSACTION_NONE gives the best performance for applications without concurrent transactions.
9. Your database server may not support all isolation levels; be aware of your database server's features.
10. Use PreparedStatement when you execute the same statement more than once.
11. Use CallableStatement when you want results from multiple and complex statements for a single request.
12. Use batch update facility available in Statements.
13. Use batch retrieval facility available in Statements or ResultSet.
14. Set up proper direction for processing rows.
15. Use proper getXXX() methods.
16. Close ResultSet, Statement and Connection whenever you finish your work with them.
17. Write precise SQL queries.
18. Cache read-only and read-mostly tables data.
19. Fetch small amounts of data iteratively rather than the whole data at once when retrieving large amounts of data, as in database searches.