O: Archiving : A case study for data replication

The case study below considers 2 schemas, HR schema and CUST schema.

These schemas are storing the employee and customer data for GROWYOURMONEY bank pvt ltd.

In the HR schema we have the EMPLOYEES table and the JOB_HISTORY table.

In the CUST schema we have the CUSTOMER table and the DAILY_TRANSACTIONS tables.

On a daily basis new employees are getting added to the company, and people are getting transferred to different departments because some departments are overstaffed and some departments have got new projects. GROWYOURMONEY bank has a good policy. It does not believe in laying off employees. So it cross trains its employees to work in other departments, which helps in the overall GROWTH of the employees and the GROWTH of its customer money.

This results in more customers getting added to the bank, for which the bank has to keep employing more people to mange the customer money more efficiently.

For this the bank also has to manage the growing data of its employees and customers, thus keeping the DBAs and IT staff on their toes.

To manage this growing employee and customer data, there is a need for the GROWYOURMONEY IT team to come up with a good data management policy.

One of the solutions they have come up with is an ARCHIVING strategy to archive the data of customers who dont want their money to grow and have left the bank. This strategy is working very well as has been demonstrated in the post below.

Archiving - A Case Study for optimizing disk space utilization with archiving

The new challenge being faced by the Database Administrators and IT is the huge high amount of time taken to generate the reports of customers as the customers of GROWYOURMONEY bank are increasing at a very large rate. For this they have done a high level analysis of the systems and have realized that the application system to manage the customer and daily transaction data and backend processes like report generation etc are using the same database.

While the software team is working on optimizing the queries which are slowing down the system, the IT team has decided that from a long term perspective they will need to run the application module and the backend reports module on 2 two different databases.

To achieve this they have to come up with a feasible oracle data replication solution.

Some of the well known oracle data replications solutions that can be considered are

Oracle Golden Gate
Oracle Advanced Queues
Oracle change notification
Oracle dataguard

Of the above, one approach they have considered is using the Oracle Dataguard configuration.

Dataguard Broker - Steps

Dataguard - Switchover and Failover

But considering the budget allocation for IT by the management, and the high licencing costs of Oracle Dataguard, and the fact that the replica database does not require data for all the tables but only a few tables which are required for the heavy reports in the reports module, they have decided to make an inhouse data replication module.

So the IT management has decided to use few supposedly underemployed employees who are not contributing to the bank ITs mission critical projects to build a data archive and data replication module.

This is to give a background on the need for the data archiving and replication.

Below are the steps of the data replication module developed to copy data from the required tables to the replica schema.

Create the objects of the data replication module.

Create HR schema objects

Create CUST schema objects

Once the objects in the replication schema, the HR schema and CUST schema are created, we now insert test data into these tables and start the data replication process.

Scripts to insert test data into EMPLOYEE schema tables

Scripts to insert test data into CUSTOMER schema tables

Replication schema used : REPLICA

Below are few grants to be given as DBA or SYSTEM user on tables to be archived/replicated.

GRANT SELECT, DELETE ON HR.EMPLOYEES TO REPLICA;
GRANT SELECT, DELETE ON HR.JOB_HISTORY TO REPLICA;
GRANT SELECT, DELETE ON HR.REPL_QUEUE TO REPLICA;

GRANT SELECT, DELETE ON CUST.CUSTOMER TO REPLICA;
GRANT SELECT, DELETE ON CUST.DAILY_TRANSACTIONS TO REPLICA;
GRANT SELECT, DELETE ON CUST.REPL_QUEUE TO REPLICA;


-- this grant to be given by DBA user is required to shrink space after a delete
GRANT ALTER ANY TABLE TO REPLICA;

-- this grant required on DAILY_TRANSACTIONS to shrink space after a delete for ARCHIVING
ALTER TABLE CUST.DAILY_TRANSACTIONS ENABLE ROW MOVEMENT;
ALTER TABLE CUST.DAILY_TRANSACTIONS SHRINK SPACE CASCADE;

-- this grant required on CUSTOMER to shrink space after a delete  for ARCHIVING
ALTER TABLE CUST.CUSTOMER ENABLE ROW MOVEMENT;
ALTER TABLE CUST.CUSTOMER SHRINK SPACE CASCADE;

-- this grant required on REPL_QUEUE to shrink space after a delete for data replication
ALTER TABLE CUST.REPL_QUEUE ENABLE ROW MOVEMENT;
ALTER TABLE CUST.REPL_QUEUE SHRINK SPACE CASCADE;

-- this grant required on EMPLOYEES to shrink space after a delete for ARCHIVING
ALTER TABLE HR.EMPLOYEES ENABLE ROW MOVEMENT;
ALTER TABLE HR.EMPLOYEES SHRINK SPACE CASCADE;

-- this grant required on JOB_HISTORY to shrink space after a delete for ARCHIVING
ALTER TABLE HR.JOB_HISTORY ENABLE ROW MOVEMENT;
ALTER TABLE HR.JOB_HISTORY SHRINK SPACE CASCADE;

-- this grant required on REPL_QUEUE to shrink space after a delete for data replication
ALTER TABLE HR.REPL_QUEUE ENABLE ROW MOVEMENT;
ALTER TABLE HR.REPL_QUEUE SHRINK SPACE CASCADE;

Configuration changes for data replication.

In PKG_ARCHIVE we need to change the RUN_MODE to REPLICATE.

RUN_MODE CONSTANT VARCHAR2(10) := 'REPLICATE';

A sample configuration in ARCHIVE_RULES table for data replication is below

NUM_RULE_ID	TXT_RULE_DESC	TXT_RULE_QUERY	TXT_ACTIVE_FLG	TXT_ARCHIVE_TABLE	TXT_DELETE_TABLE	TXT_TBL_OWNER
100002	replicate the customer data to HR_ARCHIVE schema	SELECT UNIQUE_ID FROM CUST.REPL_QUEUE WHERE REPL_STAT = 'PNDNG' AND TBL_NAME = 'CUSTOMER'	Y	CUSTOMER	REPL_QUEUE	CUST
100002	replicate the customer data to HR_ARCHIVE schema	SELECT UNIQUE_ID FROM CUST.REPL_QUEUE WHERE REPL_STAT = 'PNDNG' AND TBL_NAME = 'DAILY_TRANSACTIONS'	Y	DAILY_TRANSACTIONS	REPL_QUEUE	CUST
100002	replicate the employee data to hr_archive schema	SELECT UNIQUE_ID FROM HR.REPL_QUEUE WHERE REPL_STAT = 'PNDNG' AND TBL_NAME = 'EMPLOYEES'	Y	EMPLOYEES	REPL_QUEUE	HR
100002	replicate the employee data to hr_archive schema	SELECT UNIQUE_ID FROM HR.REPL_QUEUE WHERE REPL_STAT = 'PNDNG' AND TBL_NAME = 'JOB_HISTORY'	Y	JOB_HISTORY	REPL_QUEUE	HR

Before starting the replication process, we select the count of records in the tables from which data is to be replicated into the REPLICA schema.

SELECT COUNT(1) FROM HR.EMPLOYEES; --407

SELECT COUNT(1) FROM HR.JOB_HISTORY; -- 910

SELECT COUNT(1) FROM HR.REPL_QUEUE WHERE TBL_NAME = 'EMPLOYEES'; --300

SELECT COUNT(1) FROM HR.REPL_QUEUE WHERE TBL_NAME = 'JOB_HISTORY'; --900

SELECT COUNT(1) FROM REPLICA.EMPLOYEES; --0

SELECT COUNT(1) FROM REPLICA.JOB_HISTORY; -- 0

---------------------------------------------------------------------------
SELECT COUNT(1) FROM CUST.CUSTOMER; --16

SELECT COUNT(1) FROM CUST.DAILY_TRANSACTIONS; -- 376

SELECT COUNT(1) FROM CUST.REPL_QUEUE WHERE TBL_NAME = 'CUSTOMER'; --16

SELECT COUNT(1) FROM CUST.REPL_QUEUE WHERE TBL_NAME = 'DAILY_TRANSACTIONS'; --376

SELECT COUNT(1) FROM REPLICA.CUSTOMER; --0

SELECT COUNT(1) FROM REPLICA.DAILY_TRANSACTIONS; -- 0

We now run the archive_machine in replication mode and again verify the table data.

BEGIN
  PR_ARCHIVE_MACHINE();
  COMMIT;
--rollback; 
END;

We will again run the script to get the count of records which are archived.

SELECT COUNT(1) FROM HR.EMPLOYEES; --407

SELECT COUNT(1) FROM HR.JOB_HISTORY; -- 910

SELECT COUNT(1) FROM HR.REPL_QUEUE WHERE TBL_NAME = 'EMPLOYEES'; --0

SELECT COUNT(1) FROM HR.REPL_QUEUE WHERE TBL_NAME = 'JOB_HISTORY'; --0

SELECT COUNT(1) FROM REPLICA.EMPLOYEES; --300

SELECT COUNT(1) FROM REPLICA.JOB_HISTORY; -- 900

---------------------------------------------------------------------------
SELECT COUNT(1) FROM CUST.CUSTOMER; --16

SELECT COUNT(1) FROM CUST.DAILY_TRANSACTIONS; -- 376

SELECT COUNT(1) FROM CUST.REPL_QUEUE WHERE TBL_NAME = 'CUSTOMER'; --0

SELECT COUNT(1) FROM CUST.REPL_QUEUE WHERE TBL_NAME = 'DAILY_TRANSACTIONS'; --0

SELECT COUNT(1) FROM REPLICA.CUSTOMER; --15

SELECT COUNT(1) FROM REPLICA.DAILY_TRANSACTIONS; -- 375

From the counts above we can see that records have got inserted into the corresponding tables in the replication schema.

The count of records in REPL_QUEUE table for each table whose data is to be archived matches with the count of records in the replication schema for that corresponding table.

The count of tables in the main tables is more than that of the correspondging table in the replication schema.

This is because when the replication process is enabled, already there was some data existing in the main schema tables.

For these exisiting records(before the replication is enabled) a one time manual insert operation can be done. Scripts for manual one time insert of existing data into the replication schema.

AS SYSTEM or a DBA user we give the following grants

GRANT SELECT, INSERT ON REPLICA.JOB_HISTORY TO HR;
GRANT SELECT, INSERT ON REPLICA.EMPLOYEES TO HR;

GRANT SELECT, INSERT ON REPLICA.CUSTOMER TO CUST;
GRANT SELECT, INSERT ON REPLICA.DAILY_TRANSACTIONS TO CUST;

INSERT INTO REPLICA.CUSTOMER 
( CUST_ID, CUST_NAME, PHONE_NUMBER, EMAIL, ADDR_LINE_1, ADDR_LINE_2, ADDR_LINE_3, COUNTRY_ID, REGION_ID, GENDER, ARCHIVE_LOG_HEADER_ID, ARCHIVE_DATE ) 
SELECT 
CUST_ID, CUST_NAME, PHONE_NUMBER, EMAIL, ADDR_LINE_1, ADDR_LINE_2, ADDR_LINE_3, COUNTRY_ID, REGION_ID, GENDER, NULL, SYSDATE 
FROM CUSTOMER 
WHERE CUST_ID NOT IN ( SELECT CUST_ID FROM REPLICA.CUSTOMER);

INSERT INTO REPLICA.DAILY_TRANSACTIONS 
( TRANS_ID, TRANS_DATE, CUST_ID, TRANS_AMT, TRANS_CRNCY, TRANS_IND, ACCT_NO, BNK_NAME, BNK_LOCATION, BANK_IFSC, ARCHIVE_LOG_HEADER_ID, ARCHIVE_DATE )
SELECT TRANS_ID, TRANS_DATE, CUST_ID, TRANS_AMT, TRANS_CRNCY, TRANS_IND, ACCT_NO, BNK_NAME, BNK_LOCATION, BANK_IFSC, NULL, SYSDATE FROM DAILY_TRANSACTIONS WHERE TRANS_ID NOT IN 
(SELECT TRANS_ID FROM REPLICA.DAILY_TRANSACTIONS);

INSERT INTO REPLICA.EMPLOYEES 
( EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, DEPARTMENT_ID, ARCHIVE_LOG_HEADER_ID, ARCHIVE_DATE ) 
SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID, DEPARTMENT_ID, NULL, SYSDATE FROM EMPLOYEES 
WHERE EMPLOYEE_ID NOT IN (SELECT EMPLOYEE_ID FROM REPLICA.EMPLOYEES);

INSERT INTO REPLICA.JOB_HISTORY 
( EMPLOYEE_ID, START_DATE, END_DATE, JOB_ID, DEPARTMENT_ID, ARCHIVE_LOG_HEADER_ID, ARCHIVE_DATE ) 
SELECT EMPLOYEE_ID, START_DATE, END_DATE, JOB_ID, DEPARTMENT_ID, NULL, SYSDATE FROM JOB_HISTORY 
WHERE EMPLOYEE_ID NOT IN (SELECT EMPLOYEE_ID FROM REPLICA.JOB_HISTORY);

Now if we run the counts on both the schemas it should match.

SELECT COUNT(1) FROM HR.EMPLOYEES; --407

SELECT COUNT(1) FROM HR.JOB_HISTORY; -- 910

SELECT COUNT(1) FROM REPLICA.EMPLOYEES; --407

SELECT COUNT(1) FROM REPLICA.JOB_HISTORY; -- 910

---------------------------------------------------------------------------
SELECT COUNT(1) FROM CUST.CUSTOMER; --16

SELECT COUNT(1) FROM CUST.DAILY_TRANSACTIONS; -- 376

SELECT COUNT(1) FROM REPLICA.CUSTOMER; --16

SELECT COUNT(1) FROM REPLICA.DAILY_TRANSACTIONS; -- 376

Thought for the day

Do not grow weary when you pray;

do not neglect to give alms

Sirach 7:10

And do not forget the birth pangs of your mother

Remember that it was of you parents you were born

How can you repay what they have given to you?

Sirach 7:28

Do not hate hard labor or farm work

which was created by the Most High.

Sirach 7:13

Navigation Bar

Friday, January 24, 2025

Archiving : A case study for data replication

No comments:

Post a Comment