Apr 1 ’05
DB2 UDB for OS/390 and z/OS Application and SQL Performance Trade-Offs
Performance is a primary concern for most application implementations, but what’s the cost of high performance? We use relational databases for a reason. They’re supposed to make things easier by separating the user from the responsibility of having to know where and how their data is stored and managed. All the user needs to know is what they want, and the database is supposed to find it. The SQL language itself is a powerful programming language, enabling users to quickly answer complex questions. Advanced database objects such as constraints, triggers, stored procedures, and user-defined functions also play a significant role in application development and performance.
Any advanced database and application design can have a significant performance impact, so we spend time tuning, but at what cost? There have to be trade-offs in any database design. Do you want centralization, reusability, security, availability, ease of maintenance, quick time to delivery, or do you want the highest performance? Maybe you just want it all.
Locating Code at the Server
There are many advanced features of DB2 Universal Database (UDB) for z/OS that let you take advantage of the power of the mainframe server:
- Complex SQL statements
- User-Defined Functions (UDFs)
- Stored procedures
- Triggers and constraints.
These advanced features facilitate rapid application development by pushing some of the logic of the application into the database server. Usually, advanced functionality can be incorporated into the database using these features at a much lower development cost than coding the feature into the application itself. A feature such as database-enforced Referential Integrity (RI) is a perfect example of something that’s easy to implement in the database, but would take significantly longer time to code in a program.
These advanced database features also let you place application logic as part of the database engine itself, making this logic more easily reusable. Reusing existing logic will mean faster time-to-market for new applications that need that logic; having the logic centrally located makes it easier to manage than client code. Often, having data-intensive logic located on the database server will result in improved performance as that logic can process the data at the server, and only return a result to the client (see Figure 1).
SQL has evolved into a robust programming language. With correlated nested table expressions, UDFs, common table expressions, and recursion, most program processes can be written as a single SQL statement. This is extremely valuable because SQL is the world’s most portable language. Yes, Java can run on a PC, mainframe, midrange server, or cellular telephone, but SQL is even more portable because it always runs on the server to which it’s directed. For mainframe applications, this is extraordinarily valuable. Any statement that can run under SPUFI on the mainframe can be cut and pasted into any piece of application code using any of several programming languages on several client platforms.
SQL that runs in SPUFI can run in a batch job, on a Web server, or even on a PC using software such as Microsoft Excel. So the more logic that can be placed into a complex SQL statement, the more portable that logic becomes. Complex SQL statements that are a part of COBOL programs today can easily be ported to a Web application server on an Intel-based machine. The more logic that’s incorporated in the SQL statement, the less logic that needs to be translated from COBOL into another language such as Java.
The SQL language is also easier to code than most other application programming languages. For example, C programs that are thousands of lines in length can be rewritten as SQL statements that are hundreds of lines in length. Simplifying and shortening the code can mean faster time to implement and test fixes or changes to the code. Since SQL is a standardized language, there’s a better chance that more people are going to be qualified to work with it than with a language such as COBOL or C. Most Java programmers know SQL, but fewer know C, and even fewer know COBOL. Programmers who have written SQL to access Oracle databases can write SQL that accesses DB2.
The reusability, flexibility, and portability of complex SQL doesn’t come without a performance price. Complex SQL can be an extreme performance advantage if the program processes written into the SQL statement aggregate or filter data. If the complex SQL statement logic is data-intensive, then it will be a performance gain over the equivalent COBOL or Java logic. If the application program can issue one large statement that returns a result rather than many smaller statements that return data, it will gain all the advantages that complex SQL has to offer and a performance advantage. If a complex SQL statement processes data, rather than filters it, then it can be a performance detriment. SQL statements that use lots of UDFs, CASE expressions, and data conversions in the SELECT clause may impact performance. Nested table expressions that contain expressions in a SELECT clause and then have references to those nested expressions in outer expressions may also significantly impact performance.
So, complex SQL statements can be a performance advantage or disadvantage. Data-intensive logic is almost always a performance advantage; data-processing logic is almost always a performance disadvantage. The trade-off is being able to balance the performance with the reusability, flexibility, and portability of complex SQL (see Figure 2).
Functions are a useful way of extending the programming power of the database engine. Functions let us push additional logic into our SQL statements. User-defined scalar functions work on individual values of a parameter list, and return a single value result. A table function can return an actual table to an SQL statement for further processing (just like any other table).
UDFs provide a major breakthrough in database programming technology; they actually let developers and DBAs extend the capabilities of the database. This allows for more processing to be pushed into the database engine, which allows these types of processes to become more centralized and controllable. Virtually any type of processing can be placed in a UDF, including legacy application programs. This can be used to create some amazing results, as well as push legacy processing into SQL statements.
Once your processing is inside SQL statements, you can put those SQL statements anywhere. So that anywhere you can run your SQL statements (say, from a Web browser), you can run your programs! So, just like complex SQL statements, UDFs place more logic into the highly portable SQL statements (see Figures 3 and 4).
Also just like complex SQL, UDFs can be a performance advantage or disadvantage. If the UDFs process large amounts of data and return a result to the SQL statement, they may be a performance advantage over the equivalent client application code. However, if a UDF is invoked to process data only, then it can be a performance disadvantage, especially if the UDF is invoked many times or embedded in a table expression, as data type casting and task switch overhead are expensive (DB2 V8 relieves some of this overhead). Converting a legacy program into a UDF in about a day’s time, invoking that program from an SQL statement, and then placing that SQL statement where it can be accessed via a client process may just be worth that expense (see Figure 5).
Stored procedures are becoming more prevalent on the mainframe and can be part of a valuable implementation strategy. Stored procedures can be a performance benefit for distributed applications or a performance problem. In every good implementation, there are trade-offs. Most of the trade-offs involve sacrificing performance for things such as flexibility, reusability, security, and time to delivery. It’s possible to minimize the impact of distributed application performance with the proper use of stored procedures.
Since stored procedures can be used to encapsulate business logic in a central location on the mainframe, they offer a great advantage as a source of secured, reusable code. By using a stored procedure, the client will only need to have authority to execute the stored procedures and won’t need authority to the DB2 tables that are accessed from the stored procedures.
A properly implemented stored procedure can help improve availability. Stored procedures can be stopped, queuing all requesters. A change can be implemented while access is prevented, and the procedures restarted once the change has been made. If business logic and SQL access are encapsulated within stored procedures, there’s less dependency on client or application server code for business processes. The client takes care of things such as display logic and edits, and the stored procedure contains the business logic. This simplifies the change process and makes the code more reusable. Like UDFs, stored procedures can be used to access legacy data stores and quickly Web-enable our legacy processes.
The major advantage to stored procedures occurs when they’re implemented in a client/server application that must issue several remote SQL statements. The network overhead involved in sending multiple SQL commands and receiving result sets is significant, so proper use of stored procedures to accept a request, process that request with encapsulated SQL statements and business logic, and return a result will lessen the traffic across the network and reduce the application overhead (see Figure 6).
If a stored procedure is coded in this manner, then it can be a significant performance improvement. Conversely, if the stored procedures contain only a few or one SQL statement, the advantages of security, availability, and reusability can be realized, but performance will be worse than the equivalent single statement executions from the client due to task switch overhead (see Figure 7).
Triggers and Constraints
Triggers and constraints can be used to move application logic into the database. The greatest advantage to triggers and constraints is that they’re generally data-intensive operations, and these types of operations are better performers when placed close to the data. These features consist of:
- Database-enforced RI
- Table check constraints.
A trigger is a database object that contains some application logic in the form of SQL statements that are invoked when data in a DB2 table is changed. These triggers are installed into the database, and are then dependent upon the table on which they’re defined. SQL DELETE, UPDATE, and INSERT statements can activate triggers. They can be used to replicate data, enforce certain rules, and fabricate data.
Database-enforced RI can be used to ensure that relationships from tables are maintained automatically. Child table data cannot be created unless a parent row exists and rules can be implemented to tell DB2 to restrict or cascade deletes to a parent when child data exists.
Table check constraints are used to ensure values of specific table columns and are invoked during LOAD, insert, and update operations.
Triggers and constraints ease the programming burden because the logic, in the form of SQL, is much easier to code than the equivalent application programming logic. This helps make the application programs smaller and easier to manage. In addition, since the triggers and constraints are connected to DB2 tables, they’re centrally located rules and universally enforced. This helps to ensure data integrity across many application processes. Triggers can also be used to automatically invoke UDFs and stored procedures, which can introduce some automatic and centrally controlled application logic.
There are advantages to using triggers and constraints. They certainly provide for better data integrity, faster application delivery time, and centrally located reusable code. Since the logic in triggers and constraints is usually data-intensive, they typically outperform the equivalent application logic simply because no data has to be returned to the application when these automated processes fire. There’s one trade-off for performance, however. When triggers, RI or check constraints are used in place of application edits, they can be a serious performance disadvantage. This is especially true if several edits on a data-entry screen are verified at the server. It could be as bad as one trip to the server and back per edit. This would seriously increase message traffic between the client and the server. For this reason, data edits are best performed at the client when possible (see Figure 8).
Advanced SQL, UDFs, stored procedures, triggers, and constraints are all powerful database features that can be used to add more control, flexibility, reusability, and security into your database engine. They allow business logic that’s data-intensive to be placed close to the data, which can provide a significant performance boost. These advanced database features also introduce more implementation choices for security, location of application code, availability, and flexibility. Centralized reusable objects improve time to delivery of new applications, and UDFs and stored procedures allow for legacy programs to be placed into SQL and rapidly made Web-available. Properly implement these features to ensure the best performance. In situations where these choices result in less than optimal performance, the business case for implementing these advanced features needs to be weighed against that cost.