Friday, March 9, 2012

Keeping track of last time of extract

I am building an SSIS package in which I extract data from a number of transaction tables. The rows in each transaction table contain a timestamp. I would like to do the following.

* Every time I extract rows from a table I want to extract only the records added since last extract (no rows are modified or deleted - only added)
* In the extract process I want to find the maximum timestamp in a given table. This timestamp should be stored and updated in a table on the SQL Server. This way, the next time the extraction process is run only "new" rows are extracted

How would you go about doing this in SSIS? I have thought of the following approach, but I am unsure whether it is too cumbersome.

* A package variable is defined for each of the transaction tables
* A series of Execute SQL Tasks populates each of these variables with the timestamps from the table in SQL Server
* A series of data flow tasks extract the data (using the variables populated above in the WHERE-condition). Contained in the data flow tasks is a script component which records the maximum timestamp extracted and places this in yet another package variable (I am unsure how to do this, by the way...)
* A series of Execute SQL Tasks updates the table in SQL Server with the new timestamps

How would you do this?

Thanx! SmileThis is how I'd do it. It isn't particularly a SSIS solution...just the way I'd do it!

Have 2 values stored in a table called (e.g.) tblConfig. They are:
LastExtractDate
ThisExtractDate

Step 1) UPDATE tblConfig SET LastExtractDate=ThisExtractDate, ThisExtractDate = GETDATE()

Step 2) SELECT * FROM <source_table> WHERE <some_tstamp_value> >= LastExtractDate AND <some_tstamp_value> < ThisExtractDate> (this would be inside a data-flow source adapter)

OK, column names may change etc...but you get the idea!!!

-Jamie|||Very nice... And simple. Smile One question though (for doing this in SSIS): Do you enlist the two tasks in the same transaction (using a sequence container for instance) so that if the data flow tasks fails, you will not "miss" any rows on the next run?

/Michael|||

Reckless wrote:

Very nice... And simple. Smile One question though (for doing this in SSIS): Do you enlist the two tasks in the same transaction (using a sequence container for instance) so that if the data flow tasks fails, you will not "miss" any rows on the next run?

/Michael

Yeah. Good idea!

-Jamie|||Hi, folks:
Exactly how do we do this. Do we do a OLE data source > something > OLE data destination. Thanks in advance.|||

Al_chan wrote:

Hi, folks:
Exactly how do we do this. Do we do a OLE data source > something > OLE data destination. Thanks in advance.

Hi Al,
Step 1 I would do in an Exec SQL Task.
Step 2 most likely in a data flow - yes, using an OLE DB Data Source.

-Jamie

No comments:

Post a Comment