Denormalizing the Common Data
In many search situations, there are common criteria and not-so-common criteria. When you think about searching for a person, would it be faster to find someone named John Smith or someone named Zaphod Beeblebrox? Chances are that John Smith, being the far more common name, would be the more expensive search. Now factor in potential spelling errors, and you need LIKE predicates, compound predicates connected by ORs, or hash codes or soundex algorithms to make the search more generic. This combination of factors can cause extremely slow performance for generic searches against common names.
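To make the phonetic-hashing idea concrete, here is a minimal sketch of the classic American Soundex algorithm in Python. The original discussion is DB2-centric; this standalone function is for illustration only, showing how spelling variants of a name collapse to the same generic search key:

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    mapping = {}
    for group, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in group:
            mapping[ch] = digit
    name = name.lower()
    out = name[0].upper()              # keep the first letter as-is
    prev = mapping.get(name[0], "")    # code of the previous letter
    for ch in name[1:]:
        if ch in "hw":
            continue                   # h and w are ignored and do not break a run
        digit = mapping.get(ch, "")
        if digit and digit != prev:
            out += digit               # emit a digit only when the code changes
        prev = digit                   # vowels reset the run (digit == "")
    return (out + "000")[:4]           # pad or truncate to 4 characters
```

Because "Smith" and "Smyth" both hash to S530, a search on the stored soundex code finds either spelling with a single indexable equality predicate rather than a chain of LIKEs.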
In some of these situations, people have resorted to denormalization for improved search performance. The data is stored in a relational database for specific reasons: flexibility, data consistency, and manageability. Denormalization undermines all of these benefits and, in some cases, is an admission of failure. You should definitely exhaust all other options before considering denormalizing your database.
If you find your back against the wall and must consider denormalization, then perhaps a partial denormalization is possible. Consider one real situation: a 10-minute search response for a common name against 1 billion stored names. Denormalizing the entire store of 1 billion names would have been unmanageable, but the performance problems were only with the common names, so the solution was full denormalization of the common names only. Analysis of the data was performed, and a table of the keys for the most common names was developed. This table served as the driver for an extract-and-load process that pulled the common data into the denormalized table (see Figure 8). It also served as the driver for database triggers that fired whenever the base tables changed and data needed to be denormalized to the common name search table. The search process was modified to search the denormalized data first and, if nothing was found, to run the regular search query against the full set of normalized data (see Figure 9).
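The pattern described above can be sketched end to end. The following example uses Python's sqlite3 module purely for illustration (the original case was DB2, and all table and column names here are hypothetical): a driver table of common names, a trigger that keeps the denormalized search table in sync with the base table, and a search routine that tries the denormalized table first and falls back to the normalized data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Fully normalized base table: one row per person.
cur.execute("""CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    last_name  TEXT NOT NULL,
    first_name TEXT NOT NULL)""")

# Driver table listing the keys judged "common" by prior analysis of the data.
cur.execute("CREATE TABLE common_name (last_name TEXT PRIMARY KEY)")
cur.execute("INSERT INTO common_name VALUES ('SMITH')")

# Denormalized search table, populated only for common names.
cur.execute("""CREATE TABLE common_name_search (
    person_id  INTEGER PRIMARY KEY,
    last_name  TEXT NOT NULL,
    first_name TEXT NOT NULL)""")

# Trigger that fires on changes to the base table and copies rows
# for common names into the search table.
cur.execute("""CREATE TRIGGER person_ins AFTER INSERT ON person
WHEN NEW.last_name IN (SELECT last_name FROM common_name)
BEGIN
    INSERT INTO common_name_search VALUES
        (NEW.person_id, NEW.last_name, NEW.first_name);
END""")

cur.executemany("INSERT INTO person VALUES (?, ?, ?)",
                [(1, "SMITH", "JOHN"), (2, "BEEBLEBROX", "ZAPHOD")])

def find(last, first):
    """Search the denormalized table first; fall back to the full data."""
    row = cur.execute(
        "SELECT person_id FROM common_name_search "
        "WHERE last_name = ? AND first_name = ?", (last, first)).fetchone()
    if row is None:
        row = cur.execute(
            "SELECT person_id FROM person "
            "WHERE last_name = ? AND first_name = ?", (last, first)).fetchone()
    return row[0] if row is not None else None
```

Only the SMITH row is replicated into the search table; the rare name is still found by the fallback query. In the real case, the same driver table would also feed the one-time extract-and-load that seeds the denormalized table.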
All this can be quite complicated, but the payoff is worth it: sometimes it is the only way to meet Service-Level Agreements (SLAs).
Generic searching can be a performance killer, while specific searching is a performance winner. DB2 provides several options for searching either generically or specifically. Taking advantage of these features, with some clever experiments and coding, can prove extremely beneficial and offer a significant advantage over denormalization.