Navigation Bar

Saturday, August 7, 2021

XMLDB: Convert xml file into equivalent relational dataset

Reading and parsing through large xml files can be a very cumbersome task. One also has to get well versed with the concepts and syntax used in xml db for extracting and modifying data elements in xml files. This can be made easy by converting the xml into an equivalent dataset. Thus, a very useful aspect and feature of XML DB is to convert in-bound xml files into equivalent relational datasets. 
An example is below
Create the directories in which the xml file and its corresponding xsd are going to reside
CREATE OR REPLACE DIRECTORY xsd_dir AS 'E:\example\xmldb\xsd';
CREATE OR REPLACE DIRECTORY xml_dir AS 'E:\example\xmldb\xml';
These sample files will be stored on the server in the directories created as above. 
Sample xsd file - employee.xsd 
-------------------------------------
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xdb="http://xmlns.oracle.com/xdb" xdb:storeVarrayAsTable="true">
<xs:element name="EMPLOYEE_INFO" type="EMPLOYEE_INFO_TYPE" xdb:defaultTable="EMP_INFO_TAB"/>
<xs:complexType name="EMPLOYEE_INFO_TYPE" xdb:SQLType="EMP_INFO_OT2" xdb:maintainDOM="false">
 <xs:sequence>
  <xs:element name="EMPLOYEE" type="EMPLOYEE_TYPE" xdb:SQLCollType="EMP_INFO_NTT_1" minOccurs="0" maxOccurs="unbounded"  xdb:maintainOrder="false"/>
 </xs:sequence>
</xs:complexType>
<xs:complexType name="EMPLOYEE_TYPE" xdb:SQLType="EMP_INFO_OT3" xdb:maintainDOM="false">
 <xs:sequence >
  <xs:element name="NAME" type="xs:string"  />
  <xs:element name="DEPARTMENT" type="xs:string" />
  <xs:element name="TELEPHONE" type="xs:string" />
  <xs:element name="EMAIL" type="xs:string" />
 </xs:sequence>
 <xs:attribute name="EMPLOYEE_NUMBER" type="xs:int" use="required"/>
</xs:complexType>
</xs:schema>

Sample xml file - employee.xml
---------------------------------------
<?xml version="1.0"?>
<EMPLOYEE_INFO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://localhost:8080/example/dataset/xsd/employee.xsd">
<EMPLOYEE  EMPLOYEE_NUMBER="105"><NAME>MASASHI OKAMURA</NAME><DEPARTMENT>DESIGN DEPARTMENT</DEPARTMENT><TELEPHONE>03-1452-4567</TELEPHONE><EMAIL>OKAMURA@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="109"><NAME>AIKO TANAKA</NAME><DEPARTMENT>SALES DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>TANAKA@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="110"><NAME>Chang Hu</NAME><DEPARTMENT>HR DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>Chang@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="111"><NAME>Lin Dan</NAME><DEPARTMENT>IT DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>LinDan@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="112"><NAME>Lee Chen</NAME><DEPARTMENT>IT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>LeeChen@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="113"><NAME>King Kong</NAME><DEPARTMENT>HR DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>KingKong@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="114"><NAME>Yu Ming</NAME><DEPARTMENT>ADMIN DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>YuMing@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="115"><NAME>Huan He</NAME><DEPARTMENT>SALES DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>HuanHe@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="116"><NAME>Jiang Li</NAME><DEPARTMENT>ADMIN DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>JiangLi@XMLTR.CO.JP</EMAIL></EMPLOYEE>
<EMPLOYEE  EMPLOYEE_NUMBER="117"><NAME>Shizuka</NAME><DEPARTMENT>ADMIN DEPARTMENT</DEPARTMENT><TELEPHONE>03-6459-98764</TELEPHONE><EMAIL>Shizuka@XMLTR.CO.JP</EMAIL></EMPLOYEE>
</EMPLOYEE_INFO>
Since XML DB manages its metadata in a repository in the database, we first create the folder structure for storing this information. 
For this we use DBMS_XDB and create the folder structure as follows; 
This folder structure is given in the xml file for the path of the xsd file.
DECLARE   
   v_return BOOLEAN;
   
BEGIN
	BEGIN
         v_return := DBMS_XDB.CREATEFOLDER('/example/');
	EXCEPTION
         WHEN OTHERS THEN NULL;
    END;
	
	BEGIN
		v_return := DBMS_XDB.CREATEFOLDER('/example/dataset/');
	EXCEPTION
         WHEN OTHERS THEN NULL;
    END;
	
	BEGIN
		v_return := DBMS_XDB.CREATEFOLDER('/example/dataset/xsd/');
	EXCEPTION
         WHEN OTHERS THEN NULL;
    END;
	
	BEGIN
		v_return := DBMS_XDB.CREATEFOLDER('/example/dataset/xml/');
	EXCEPTION
         WHEN OTHERS THEN NULL;
    END;
   COMMIT;
END;
/
Once the xsd is in the relevant directory specified by the db directory 'xsd_dir', on the server, we upload and register the xsd. 
Uploading the xsd is done using DBMS_XDB.CREATERESOURCE API. 
Registering the xsd is done using DBMS_XMLSCHEMA.REGISTERSCHEMA API.
BEGIN
  IF DBMS_XDB.CREATERESOURCE(
     abspath => '/example/dataset/xsd/employee.xsd',
     data    => BFILENAME ('XSD_DIR','employee.xsd')
     )
  THEN
     DBMS_XMLSCHEMA.REGISTERSCHEMA(
        schemaurl => 'http://localhost:8080/example/dataset/xsd/employee.xsd',
        schemadoc => sys.UriFactory.getUri('/example/dataset/xsd/employee.xsd')
        );
     COMMIT;
  END IF;
END;
/
SHOW ERRORS
Ensure the xml file to be uploaded is in the directory specified by 'xml_dir'. Now we can load the xml file into the database using DBMS_XDB.CREATERESOURCE API.
DECLARE
   v_return BOOLEAN;
BEGIN
   v_return := DBMS_XDB.CREATERESOURCE(
                  abspath => '/example/dataset/xml/employee.xml',
                  data    => BFILENAME('XML_DIR', 'employee.xml')
                  );
   COMMIT;
END;
/
You can query the db tables for the various db tables and objects created as shown below.
SELECT object_name, object_type
FROM   user_objects
WHERE  created > TRUNC(SYSDATE)
AND OBJECT_NAME LIKE 'EMP%';
This query gives a list of tables and types created once the employee xml directory is registered.
EMP_INFO_TAB	TABLE
EMP_INFO_NTT_1	TYPE
EMP_INFO_OT2	TYPE
EMP_INFO_OT3	TYPE
EMP_INFO_TAB$xd	TRIGGER
SELECT column_name
,      data_type
FROM   user_tab_cols
WHERE  table_name = 'EMP_INFO_TAB';
column_name                    data_type 
-----------------                    -----------
SYS_NC_OID$	        RAW
SYS_NC_ROWINFO$	    XMLTYPE
XMLEXTRA	        XMLTYPEEXTRA
SYS_NC00004$	    XMLTYPEPI
SYS_NC00005$	    XMLTYPEPI
XMLDATA	            EMP_INFO_OT2
SYS_NC00007$	    RAW
SYS_NC00008$	    EMP_INFO_NTT_1
SYS_NC0000800009$   RAW
ACLOID	            RAW
OWNERID	            RAW
The columns of the table are mostly of RAW datatype or some object datatype that is created. 
So where does the data actually reside in the table. 
It resides in the XMLDATA column of type EMP_INFO_OT2.
SELECT type_name, attr_name, attr_type_name
   FROM   user_type_attrs
   WHERE  type_name LIKE 'EMP_INFO%'
   ORDER  BY  type_name, attr_no;
type_name                attr_name                 attr_type_name 
-------------                ------------                  -------------------
EMP_INFO_OT2	EMPLOYEE	     EMP_INFO_NTT_1
EMP_INFO_OT3	EMPLOYEE_NUMBER	 NUMBER
EMP_INFO_OT3	NAME	         VARCHAR2
EMP_INFO_OT3	DEPARTMENT	     VARCHAR2
EMP_INFO_OT3	TELEPHONE	     VARCHAR2
EMP_INFO_OT3	EMAIL	         VARCHAR2
SELECT type_name, elem_type_name
FROM   user_coll_types
WHERE  type_name LIKE 'EMP_INFO%';
type_name           elem_type_name 
-------------           ---------------------
EMP_INFO_NTT_1	EMP_INFO_OT3
From the above 2 queries, it is clear that EMP_INFO_OT3 is an object type that defines a single row of data. 
And EMP_INFO_NTT_1 is a collections of these objects.
SELECT schema_url, schema
FROM   user_xml_schemas;
schema_url schema
-------------- ---------
http://localhost:8080/example/dataset/xsd/employee.xsd	
Querying xml db for the uploaded xml data
----------------------------------------------------
SELECT EXTRACTVALUE(xseq.column_value, '/EMPLOYEE/NAME')       AS name
,      EXTRACTVALUE(xseq.column_value, '/EMPLOYEE/DEPARTMENT') AS department
,      EXTRACTVALUE(xseq.column_value, '/EMPLOYEE/TELEPHONE')  AS telephone
,      EXTRACTVALUE(xseq.column_value, '/EMPLOYEE/EMAIL')      AS email
FROM   EMP_INFO_TAB  do
,      TABLE(XMLSEQUENCE(EXTRACT(VALUE(do), '/EMPLOYEE_INFO/EMPLOYEE'))) xseq;


SELECT xml.NAME
	  ,xml.DEPARTMENT
	  ,xml.TELEPHONE
	  ,xml.EMAIL
FROM   EMP_INFO_TAB do
	 , TABLE (do.xmldata.Employee) xml;
Output of both these queries should be as below
name                          department                 telephone             email
------                           --------------                  -----------              -------
MASASHI OKAMURA	 DESIGN DEPARTMENT	03-1452-4567	OKAMURA@XMLTR.CO.JP
AIKO TANAKA	     SALES DEPARTMENT	03-6459-98764	TANAKA@XMLTR.CO.JP
Chang Hu	     HR DEPARTMENT	    03-6459-98764	Chang@XMLTR.CO.JP
Lin Dan	         IT DEPARTMENT	    03-6459-98764	LinDan@XMLTR.CO.JP
Lee Chen	     IT	                03-6459-98764	LeeChen@XMLTR.CO.JP
King Kong	     HR DEPARTMENT	    03-6459-98764	KingKong@XMLTR.CO.JP
Yu Ming	         ADMIN DEPARTMENT	03-6459-98764	YuMing@XMLTR.CO.JP
Huan He	         SALES DEPARTMENT	03-6459-98764	HuanHe@XMLTR.CO.JP
Jiang Li	     ADMIN DEPARTMENT	03-6459-98764	JiangLi@XMLTR.CO.JP
Shizuka	         ADMIN DEPARTMENT	03-6459-98764	Shizuka@XMLTR.CO.JP	 
To rollback this data, in case changes and further testing are to be done execute
BEGIN
   DBMS_XDB.DELETERESOURCE(
      abspath => '/example/dataset/xml/employee.xml'
      );
   DBMS_XDB.DELETERESOURCE(
      abspath => '/example/dataset/xsd/employee.xsd'
      );
   DBMS_XMLSCHEMA.DELETESCHEMA(
      schemaurl => 'http://localhost:8080/example/dataset/xsd/employee.xsd',
      delete_option => 4
      );
   DBMS_XDB.DELETERESOURCE(
      abspath       => '/example',
      delete_option => 4
      );
   COMMIT;
END;
/

No comments:

Post a Comment