Reorg

Reorg is rearranging the rows according to the clustering order while preserving free space and reusing space. These actions usually improve access performance, especially when the access depends on clustering. It is also important to be proactive and identify critical conditions that may affect availability if an object is running out of space or reaching its maximum size. The DB2 administrator spends time and efforts in identifying those specific objects that require REORG to avoid performance and availability problems. An automated process is needed to identify the objects in the most critical condition, generate utility statements and execute the reorg jobs.

 

Identifying Candidates

Catalog statistics are commonly used to identify objects for REORG such as cluster ratio, number of relocated overflow rows or rows inserted not in optimal position. Statistics in the catalog are updated by statistics collection and might be outdated, especially when the condition of the object changes rapidly. The condition of the object can be equally identified using Real Time Statistics without relying on outdated catalog statistics. Real Time Statistics provide accurate current statistics on the rate of changes since the last reorg, number of extents, mass deletes, relocated rows and un-clustered rows. You can use Real Time Statistics to identify the objects that require REORG and generate jobs accordingly.

 

Triggering REORG

TUC allows you to define rules to examine Real Time Statistics and catalog statistics and qualify objects for REORG execution. REORG jobs can be automatically triggered  and handed over to a scheduler to REORG only the selected objects. TUC processes the indexes explicitly only if the associated tablespace partition was not selected for REORG.

 

Prioritizing Objects

It is a good idea to first reorg objects in critical condition or path. Objects should be prioritized and processed by the order of priority. Objects are qualified for reorg based on different criteria. If REORG is needed because of a high number of extents, then objects with more extents should be processed first. If REORG is needed because of a poor cluster ratio then objects with lower ratio should be processed first. The problem is how to determine which objects should be processed first if these different criteria are used. It should be possible to assign a priority based on the computed value of each condition using the same scale. In addition, objects in the critical path can be assigned with fixed priority that position the objects at the top of the list regardless of their condition. For example, tables that are critical for availability and performance.

priorities

TUC allows you to define priorities based  on the condition of each object. This way objects in critical condition can be processed first. For example, a tablespace with 200 extents should be processed before an object with 50 extents.

 

REORG based on Performance

Performance depends on access path and may be severely affected if the need for REORG is ignored. However, in some cases, REORG does not improve the access performance and there is no point in REORG unless there is a space issue. To avoid unnecessary REORG, it is important to monitor the access performance and identify if there is any increase in the number of synchronous IO requests that is usually associated with performance degradation. The idea is to REORG objects only when there is an impact on performance. TUC allows you to trigger REORG when there is a performance decrease and save the resources used by useless REORGs. TUC collects statistics trace records and is analyzing the access performance to justify REORG. An object is automatically excluded from REORG if it was identified as a candidate for REORG but does not suffer from increased IO.

reorgbyio

Processing Quota

You might not be able to afford to reorg all the objects that require REORG and prefer to REORG only some of the objects within the limits of your batch window and available resources. It is a good idea to limit the processed objects and process only a quota based on the number or size of the objects. The objects must be first sorted by priority and only the top priority objects that fit into the defined quota should be processed. Quota based on the size of the objects is easier to adjust into a predefined batch window. However, in some cases, a deadline must be defined to force normal termination of reorg jobs that run beyond the specified batch window. TUC allows you to define a quota of objects or total size to process only a limited number of objects.

batchwindow

REORG indexes

Indexes are rebuilt when the tablespace is reorged. However, in many cases, when there is no need to reorg the tablespace, the indexes may still require reorg. In fact, indexes tend to require reorg more often than the associated tablespaces. It should be possible to identify if the associated tablespace was selected for reorg, and generate REOGR INDEX statements accordingly.

union

Resizing Objects

REORG deletes and redefines the VSAM clusters using the primary and secondary allocation quantities defined in the DB2 catalog. Objects space can be altered prior to reorg. Space should be calculated considering the expected number of rows in each tables or at least the current number of rows to avoid failures on going out of space. You can use Real Time Statistics to get the current number of rows in each tablespace. The calculated space must also take into consideration free space and compression. Statistics history can provide indication on the growth rate of the table and you can allocate more space to accommodate the expected future growth.

 

TUC allows you to estimate  the maximum number of expected rows for each table. TUC automatically corrects the estimated number of rows when it is exceeded by the actual number of rows in the table. TUC calculates the required space using the provided estimates and alters space allocation and compression prior to REORG.

 

Inline Backup and Statistics

REORG can also backup the table and collect statistics so there is no point in running COPY and RUNSTATS separately. Overall lookup at the execution of utilities allows you to avoid backups for tables that were already covered by inline backups and avoid collecting statistics for tables that were already covered by inline statistics.

Go Back Up