SSIS vs Python Experiment 2

This exercise is similar to Experiment 1, but with a bit more complexity. Once again, I am creating an SSIS package & then replicating the functionality using Python for a PostgreSQL database. In Experiment 1, the task was to read from CSV file into a table, with the inclusion of a Derived Column. In Experiment 2, the data source will be an XML file & a Conditional Split/Filter will be applied.

Overview: Insert data into a table from an XML document while applying a filter.

  1. Create sample data from an existing table (output in XML format).
  2. Import into target table.

Notes:

  • I used the Person.Person table in the AdventureWorks2014 database in both environments.
  • The sample data is the first 100 records from the table.
  • The filter was applied on the EmailPromotion field. Excluding records with an EmailPromotion value of 0 left 49 records to be inserted.

Process:

Sample Data: XML Document Creation – SQL Server
SELECT TOP (100) p.businessentityid,
    p.persontype,
    p.namestyle,
    p.title,
    p.firstname,
    p.middlename,
    p.lastname,
    p.suffix,
    p.emailpromotion,
    p.modifieddate
FROM Person.Person p FOR XML AUTO, ROOT('Person'), ELEMENTS XSINIL;

Note: ROOT was necessary to avoid an “XML document cannot contain multiple root level elements” xml validation error in the SSIS package.

Sample Data: XLM Document Creation – PostgreSQL
COPY
    (SELECT query_to_xml 
        ('SELECT p.businessentityid,
        p.persontype,
        p.namestyle,
        p.title,
        p.firstname,
        p.middlename,
        p.lastname,
        p.suffix,
        p.emailpromotion,
        p.modifieddate FROM Person.Person p LIMIT 100', TRUE, FALSE, '')
    )
TO 'C:\quick\DBExports\PersonTop100_Postgres.xml';

Note: Note the use of the “query_to_xml” function. Additional documentation can be found here.

SSIS Package:

The SSIS package consisted of the:

  • Connection Manager to the Database
  • Data Flow with:
    • an XML Source
    • a Conditional Split
    • an OLE DB Destination

The condition for the Conditional Split:

emailpromotion != 0

The package ran successfully, resulting in the output below.

Python/PostgreSQL:

  • XPath was used to access the desired nodes from the XML document.
  • Filtering was accomplished by adding a WHERE clause.
  • The XML document had an extra line at the end of the file; specifically, the file terminated with “</table>\n”. The additional “\n” caused an error when attempting to import the data.

The Python code consists of 2 blocks; one to remove the extra line & the other to write the data to the table.

with open('C:\quick\DBExports\PersonTop100_Postgres.xml', 'r') as file:
    filedata = file.read()

filedata = filedata.replace('</table>\\n', '</table>')

with open('C:\quick\DBExports\PersonTop100_Postgres.xml', 'w') as file:
    file.write(filedata)
import psycopg2
with psycopg2.connect(user='postgres', password='mnbvcx', database='Adventureworks', host='localhost') as connection:
    with connection.cursor() as cursor:
        cursor.execute("INSERT INTO x.xperson (businessentityid, persontype, namestyle, title, firstname, \
            middlename, lastname, emailpromotion, modifieddate) \
            SELECT (xpath('//row/businessentityid/text()', x))[1]::text::int, \
                   (xpath('//row/persontype/text()', x))[1]::text, \
                   (xpath('//row/namestyle/text()', x))[1]::text::bool, \
                   (xpath('//row/title/text()', x))[1]::text, \
                   (xpath('//row/firstname/text()', x))[1]::text, \
                   (xpath('//row/middlename/text()', x))[1]::text, \
                   (xpath('//row/lastname/text()', x))[1]::text, \
                   (xpath('//row/emailpromotion/text()', x))[1]::text::int, \
                   (xpath('//row/modifieddate/text()', x))[1]::text::TIMESTAMP \
            FROM unnest(xpath('//row', pg_read_file('C:\quick\DBExports\PersonTop100_Postgres.xml')::XML)) x \
            WHERE (xpath('//row/emailpromotion/text()', x))[1]::text::int <> 0 ; \
            ")

The code ran successfully. The results can be seen below.

Operations:

  • Query output to XML format
    • In Postgres, this involved the “query_to_xml” function
  • Import to tables from XML document – with filtering
  • Selecting XML nodes with XPath
  • Reading & writing to files

Closing:

  • If you have thoughts, tips, tricks, corrections, etc., please leave a comment below.
  • If you have an questions, please feel free to ask in the comments & I will be happy to follow up.
SSIS vs Python Experiment 2

Leave a Reply

Your email address will not be published. Required fields are marked *