GLTRANS - (Incremental update to DW) Identifying newly inserted or updated records

debika_sharma
I am a data warehouse/BI architect developing a data warehouse that must extract records from GLTRANS once nightly. I don't want to do a full refresh of the GLTRANS table (that would mean pulling over several hundred million rows). Instead I would like to find a way to identify the new records and the records that have been updated since my last extract.

So, for example, let's say I have in my data warehouse all records from GLTRANS through 2/22/07. When my ETL job runs the next day, I want it to extract only those records that are new or have changed since 2/22/07.

Any suggestions would be extremely helpful.
Deleted User

You can use the "Update_Date" field, which holds the date when the record was last inserted or updated in GLTRANS.

Note: Please check whether all the programs in your system update the "Update_Date" field in GLTRANS when a record is modified.

Whenever the ETL job runs, it needs to store the current date (Last_Run_Date) in a table.
The next time the ETL job runs, it should pick up the most recent run date, i.e. the highest Last_Run_Date, from that table.

A filter can then be added to the ETL:
Select * from GLTRANS where Update_Date > Last_Run_Date
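
For anyone who wants to see the run-date bookkeeping end to end, here is a minimal sketch in Python using sqlite3 as a stand-in database. The ETL_RUN_LOG table name and the extract_incremental function are made up for illustration, and dates are assumed to be stored as ISO text (YYYY-MM-DD) so that string comparison matches date order. It is not how any particular ETL tool implements this.

```python
import sqlite3


def extract_incremental(conn, run_date):
    """Return GLTRANS rows changed since the last recorded run, then log this run.

    ETL_RUN_LOG is a hypothetical bookkeeping table with one Last_Run_Date
    value per ETL run. Dates are assumed to be ISO text (YYYY-MM-DD).
    """
    cur = conn.cursor()

    # Highest Last_Run_Date stored so far; None if the job has never run.
    cur.execute("SELECT MAX(Last_Run_Date) FROM ETL_RUN_LOG")
    last_run = cur.fetchone()[0]

    if last_run is None:
        # First run: nothing to compare against, so take everything.
        cur.execute("SELECT * FROM GLTRANS")
    else:
        # Later runs: only rows touched after the previous run date.
        cur.execute("SELECT * FROM GLTRANS WHERE Update_Date > ?", (last_run,))
    rows = cur.fetchall()

    # Record this run so the next one starts from here.
    cur.execute("INSERT INTO ETL_RUN_LOG (Last_Run_Date) VALUES (?)", (run_date,))
    conn.commit()
    return rows
```

One thing to watch: if Update_Date holds only a date with no time, a row updated later on the same day as the run would not satisfy the strict ">" on the next run. Comparing with ">=" and making the warehouse load tolerate re-reading the same row is one way around that.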

Hope this helps...

John Henley
Debika,
What product are you using to do the ETL? I've had some success with Informatica doing incremental updates for some AR tables, based on existence of primary keys for INSERTs and field changes for UPDATEs. That was necessary for the AR tables because they--like most of the Lawson tables--don't have update dates/times (or don't populate them consistently).

I looked at GL190 and some of the GL programs, and it looks like update_date--as Saraj suggested--should work OK for GLTRANS.
Thanks for using the LawsonGuru.com forums!
John
debika_sharma
Thanks for the feedback so far. However, I am still in need of a solution because of a conflicting scenario.

I thought I could use the update_date field, but I have a scenario that invalidates the logic. In our system, a record can be inserted into GLTRANS on 2/28/07 and not yet posted (so not in r-status 8 or 9). In this case the posted_date column is blank but the update_date column stores 2/28/07. Then, let's say this record actually posts on 8/31/07, at which point the posted_date column would say 8/31/07 and the update_date column would remain unchanged and still say 2/28/07. In this scenario, I would not be able to identify this record as changed.

Any feedback would be appreciated.

(To answer the other question - we are using Informatica for our ETL.)
John Henley
Debika,
1. In your scenario, GL190 will change UPDATE_DATE from 2/28/07 to whatever the system date is when GL190 runs and the status changes to 9.
2. Have you looked at using Informatica's incremental updates? It does field-by-field comparisons and it's pretty fast, although with hundreds of millions of rows that may not be the case. I'm judging based on millions, not hundreds of millions. However, you could do some things to speed that up, like only looking at the current and future fiscal years, etc.
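
To make the key-existence and field-comparison idea in point 2 concrete, here is a rough Python sketch. It is not Informatica's actual mechanism, only the general logic: a source row whose key is missing from the warehouse is an insert, and a row whose key exists but whose fields differ is an update. The names classify_changes, id, and fiscal_year are hypothetical, and rows are assumed to be plain dictionaries.

```python
def classify_changes(source_rows, warehouse_rows, key="id", min_fiscal_year=None):
    """Split source rows into inserts (new key) and updates (changed fields).

    Rows are dicts. `key` names the primary-key field. `min_fiscal_year`,
    if given, skips source rows from older fiscal years to cut the
    comparison volume (assumes a hypothetical `fiscal_year` field).
    """
    # Index warehouse rows by primary key for constant-time lookups.
    existing = {row[key]: row for row in warehouse_rows}

    inserts, updates = [], []
    for row in source_rows:
        if min_fiscal_year is not None and row["fiscal_year"] < min_fiscal_year:
            continue  # older fiscal years are assumed not to change
        current = existing.get(row[key])
        if current is None:
            inserts.append(row)   # key not in the warehouse yet
        elif current != row:
            updates.append(row)   # same key, at least one field differs
    return inserts, updates
```

With hundreds of millions of rows the in-memory dictionary would not be practical as written. The fiscal-year cutoff is the kind of restriction that keeps the compared set small.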
John
Deleted User
I agree with what John suggested. GL190 should take care of your problem.
If you are still facing that problem, you can modify your ETL transformation to check whether either the Update_Date field or the Posting_Date field is greater than your Last_Run_Date. With the logical "OR", the row is returned if it was posted, modified, or inserted since the last run. I came across a similar situation and this worked, but we were using OWB (Oracle Warehouse Builder) as the ETL tool.
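
As a small illustration of the OR filter, here is a sketch in Python with sqlite3 standing in for the real database. The changed_since function is a made-up name, dates are assumed to be ISO text (YYYY-MM-DD), and a blank posting date is assumed to be stored as NULL or an empty string.

```python
import sqlite3

# A row qualifies if EITHER date is later than the last run.
# A NULL or empty Posting_Date never compares greater than a real date,
# so unposted rows are picked up only through Update_Date.
CHANGED_SINCE_SQL = """
SELECT * FROM GLTRANS
WHERE Update_Date > :last_run
   OR Posting_Date > :last_run
"""


def changed_since(conn, last_run):
    """Return GLTRANS rows inserted, updated, or posted after last_run."""
    return conn.execute(CHANGED_SINCE_SQL, {"last_run": last_run}).fetchall()
```

In the scenario described earlier in the thread (inserted 2/28/07, posted 8/31/07 with Update_Date unchanged), a run on 9/1/07 with a Last_Run_Date of 8/30/07 would still return the row because of the Posting_Date condition.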