Friday, September 14, 2012

Oracle: Duplicate Records

Dear Readers,

My name is Franz Devantier, creator of this blog.  I am an Oracle Certified Professional (OCP DBA 11g) Security DBA.  I will be sharing with you the basic duties of an Oracle DBA, and also some of the undocumented, and not so well known tasks. 

Duplicate Records
You have a relational database, with child keys related to parent keys, but there are duplicates in the parent table, so instead of a unique parent value you are getting at least two matching parent records.

You may be experiencing difficulties with duplicate records.  You need to find them and make a decision as to what you are going to do with them.

One of the problems that you have, is that once you have identified a duplicate record, or multiple records with the same key values.  When you try to delete the duplicate, you can easily end up deleting all records with that key.  Also which record do you delete and which one do you keep?  If you try to change the key of one of the duplicates, then all the records with the same key are updated.  So you really have an interesting problem.

Of course when you enable the primary key, you can always do a enable novalidate.  This will not validate what is already in the table, but prevent duplicates from being generated in the future.  However if you are querying the data in the table, you will very likely experience difficulties with data integrity.  So we really need to fix the duplicates.

There are really two sides to the problem:
1) Finding the duplicates
2) Dealing with the duplicates (Let the IT Manager make this call, then they can't blame you)

Lets first look at a way to find the duplicates:  One approach is to write a discreet PL/SQL procedure to find the duplicates for you.  A discreet PL/SQL procedure is simply a procedure that is not saved in the database, but simply run from the command line.  Now you may say to me; "Yes but PL/SQL is not part of a DBA's speciality".  At the same time the programmers may be saying; "It is the DBA's responsibility to sort out the integrity problem of duplicates on the database".  So at some point in time, we need to stop passing the responsibility out to others, and stand up to the challenge.  By the way, PL/SQL should be part of the tools that any Oracle DBA worthy of the title should have, at least a working knowledge of.  Besides if you can fix this problem, you will impress the technical people around you, and that is an achievement.

Here is a discreet PL/SQL procedure, that you can just run from within SQL*Plus:

SQL>
set serveroutput on  -- this facilitates DBMS_OUTPUT

DECLARE 
-- declare all your variables here.
v1_keyField1  date      default null;
v1_KeyField2 number default null;
v1_rowid         rowid   default null;
-- We are going to use rowids,
-- to identify the individual duplicates.

-- second set of identical variables
v2_keyField1  date      default null;
v2_KeyField2 number default null;
v2_rowid         rowid   default null;

-- declare a cursor here is the variables section.       
CURSOR C1 IS SELECT rowid,keyField1, KeyField2
                          FROM <user_name>.<table_name>
                          order by keyField1, KeyField2; 
-- make sure that you use the order by clause to enable, the duplicates
-- to be located right next to each other.  this is important, to facilitate this
-- duplicate finding procedure.  

-- start processing
BEGIN
   FOR item IN C1   -- we are using a cursor for loop here to simplify the controlling
   LOOP                   -- of the looping process
      -- We load the second set of variables with the
      -- values in the cursor (first record in the cursor)
      v2_keyField1  := item.keyField1;
      v2_KeyField2 := item.KeyField2;
      v2_rowid         := item.rowid;   
   
      -- Now we simply compare the second set of variables
      -- with the first set of variables.
      -- the first time through, the first set of variables will have a null value.
      -- from the second and subsequent passes through the loop,
      -- if the first and second set of variables are the same it indicates
      -- a duplicate record.  We will then write the duplicate to the screen,
      -- using DBMS_OUTPUT     
      if      v2_keyField1      =  v1_keyField1
      and  v2_keyField2      =  v1_keyField2
      then
          -- This is where the duplicate records are printed out to the screen,
          -- make sure that you have set serveroutput to on, otherwise you will
          -- get no output here.
          DBMS_OUTPUT.PUT_LINE('keyField1   = ' || v1_keyField1  );
          DBMS_OUTPUT.PUT_LINE('keyField2   = ' || v1_keyField2  );
          DBMS_OUTPUT.PUT_LINE('Rowid         = ' || v1_rowid);
          DBMS_OUTPUT.PUT_LINE('Duplicate');
          DBMS_OUTPUT.PUT_LINE('KeyField1   = ' || v2_keyField1  );
          DBMS_OUTPUT.PUT_LINE('keyField2   = ' || v2_keyField2  );
          DBMS_OUTPUT.PUT_LINE('Rowid         = ' || v2_rowid);
          DBMS_OUTPUT.PUT_LINE('  ');
      end if;
   
      -- Finally we set the first set of variables to the values in the first record of the cursor.   
      v1_keyField1  := item.keyField1;
      v1_KeyField2 := item.KeyField2;
      v1_rowid         := item.rowid;
   
    -- Now we hit the end loop clause and go back to the top of the loop to process
    -- the second record in the cursor.
   END LOOP;  -- this will exit out, once we get to the end of the cursor.
END;
/
SQL>

2)
Now we have a listing of the duplicate records.  To decide what to do, we can look at the entire record, and make a decision.

SQL> select * from <user_name>.<table_name>
           2 where rowid in (v1_rowid, v2_rowid);

Now you can delete a record, referenced by the rowid, or update the key values of the record, if you decide to keep the duplicate, and referenced by the rowid.

What to do with the duplicates would probably be a business dicision, but at least now you are able to identify the duplicate records, and access the individual duplicates seperately.  You could probably prepare a report, detailing the duplicates, and your recommendations for handling them, and present them to the IT manager.  Then prepare a script for yourself, to carry out the proposed process.

When the IT manager comes back to you with the decision, you can potentially run the process through, and report back to the manager within minutes.  If you play your cards right, you can become known as the DBA, who can fix duplicate issues quickly; this could make you into a valuable resource.

Franz Devantier,

Need a database health check?
devantierf@gmail.com


Income stabilizer (win-win opportunity)
Join the carefully selected and tested cash-flow generating program below to potentially create a long-term residual or annuity type income enhancer for yourself.

Traffic Wave - Free Report:  The report will give you all the information you need to start making a nice long-term residual income stream for yourself.


No comments:

Post a Comment