Small tables are easier to maintain and access. Keeping your growing tables small and efficient is important for performance and maintenance. Smaller tables and indexes are accessed faster. Backups for smaller tables complete faster and require less space. You can decrease the size of your tables by archiving or purging old data. REORG can purge old data using DISCARD conditions. TUC automates the generation of the REORG DISCARD utility and allows you to easily purge or archive old data from your DB2 tables. This way you can trim down critical tables and improve access performance. The purged data is written to sequential flat files. TUC records the discarded flat files and allows you to easily reload or recall the data when needed. The discarded data can be loaded into history tables or reloaded back to the discarded tables.
Prior to DB2 version 8, you could run REORG DISCARD in SHRLEVEL NONE only. Prolonged execution affected availability and failures affected recoverability. Incorrect restart could have been resulted in loss of data. For this reason, administrators prohibited the usage of REORG DISCARD but also avoided purging data using their own authority to protect themselves from being blamed for accidental deletion of business critical data. In many cases, developers were instructed to write programs to delete rows. The programs had to commit the work every few minutes to release the locks and allow restart from the most recent commit point. These programming requirements made the development of purging programs complex. In addition, purging programs using SQL to delete rows, also impact logging because every deleted row is recorded in the DB2 log for recovery purposes. REORG DISCARD, on the other hand, allows you to use the LOG NO option and avoid the logging overhead. More important is that if data is deleted accidently, you can easily reload back the discarded data using the discard datasets created by REORG DISCARD.
Preparing Discard Conditions
Administrators can have better control over data purging if they allow users to predefine discard conditions and trigger the discard process when needed but leave the operability of REORG jobs in the hands of the administrators. This can be achieved by managing a repository of discard conditions accessible to users where they can define how the data should be pruned. TUC allows you to define the discard conditions for each table. Discard conditions are somewhat limited because you cannot use expressions or sub-queries. It is possible to extend discard conditions predicates by using a query that is executed prior to REORG DISCARD. TUC extends the functionality of discard conditions by allowing you to use a discard query. The results of the discard query can be used to generate the required discard conditions. You can also imbed symbols in discard conditions that are resolved at runtime based on user provided values. Finally the REORG statement with the desired discard conditions should be prepared before execution. The users should be able to trigger the execution of the discard process under the supervision of administrators. This way, if the process fails, administrators are still responsible to restart the process or recover the data.
Recording Discard Datasets
Discard datasets should also be recorded in a repository to allow you to easily recall discarded data. The datasets names can be identified by analyzing the output of the REORG utility. However, the output might be lost if the job is cancelled or when the job abends. The output might also be partial after restart. However, you can use the templates definitions to identify the datasets by resolving the symbols for all discarded objects and identifying the last dataset allocated by the template in use. This way there is no need to analyze or capture the output of the reorg step. Recording the datasets in a repository allows you to quickly identify all the discard datasets that were used to discard the data and generate LOAD jobs to load the data into history tables for archiving or reload back to the discarded tables for recall.
Discarding Dependent Rows
When discarding data from a parent table, you cannot always use the same discard conditions to remove the dependent rows from the dependent tables. To do that you need to first discard the parent rows using REORG DISCARD and then use the CHECK DATA utility with the DELETE YES option to remove the dependent rows from the dependent tables. CHECK DELETE required you to create a matching exception table for all dependent tables. The exception table should be created exactly like the dependent table and the additional RID and TIMESTAMP columns can be added. The exception table must be also created in a tablespace with the same page size and attributes as the dependent table tablespace. TUC automatically creates the required exceptions tables. The primary allocation size of the exceptions tablespace should accommodate for at least 20 percent of the rows in the dependent table. The Exception tables are dropped automatically when empty or when expired. The CHECK DELETE statements should be generated before reorg discard and executed after reorg discard. In addition, if some dependent tables are also purged using REORG DISCARD, you can avoid running CHECK DELETE but you might need to use the REPAIR SET NOCHECKPEND to remove the restricted state. While REORG DISCARD can create an inline copy, CHECK DELETE YES LOG NO might leave the tablespace in COPY PENDING, and an image copy is required to ensure recoverability for dependent tables. TUC allows you to generate the entire process have your pruned data in full integrity and availability upon completion.