Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
351 views
in Technique[技术] by (71.8m points)

sql server 2008 - How do I optimize Upsert (Update and Insert) operation within SSIS package?

I am not a DBA but I do work for a small company as the IT person. I have to replicate a database from staging to production. I have created an SSIS package to do this but it takes hours to run. This isn't a large data warehouse type of project, either, it's a pretty straightforward Upsert. I'm assuming that I am the weak link in how I designed it.

Here's my procedure:

  1. Truncate staging tables (EXECUTE SQL TASK)
  2. Pull data from a development table into staging (Data Flow Task)
  3. Run a data flow task
    1. OLE DB Source
    2. Conditional Split Transformation (Condition used: [!]ISNULL(is_new_flag))
    3. If new insert, if existing update

The data flow task is mimicked a few times to change tables/values but the flow is the same. I've read several things about OLE DB components being slow to updates being slow and have tried a few things but haven't gotten it to run very quickly.

I'm not sure what other details to give, but I can give anything that's asked for.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Sample package using SSIS 2008 R2 that inserts or updates using batch operation:

Here is a sample package written in SSIS 2008 R2 that illustrates how to perform insert, update between two databases using batch operations.

  • Using OLE DB Command will slow down the update operations on your package because it does not perform batch operations. Every row is updated individually.

The sample uses two databases namely Source and Destination. In my example, both the databases reside on the server but the logic can still be applied for databases residing on different servers and locations.

I created a table named dbo.SourceTable in my source database Source.

CREATE TABLE [dbo].[SourceTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL,
    [IsActive] [bit] NULL
)

Also, created two tables named dbo.DestinationTable and dbo.StagingTable in my destination database Destination.

CREATE TABLE [dbo].[DestinationTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL
) 
GO

CREATE TABLE [dbo].[StagingTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL
) 
GO

Inserted about 1.4 million rows in the table dbo.SourceTable with unique values into RowNumber column. The tables dbo.DestinationTable and dbo.StagingTable were empty to begin with. All the rows in the table dbo.SourceTable have the flag IsActive set to false.

Row counts - 1

Created an SSIS package with two OLE DB connection managers, each connecting to Source and Destination databases. Designed the Control Flow as shown below:

  • First Execute SQL Task executes the statement TRUNCATE TABLE dbo.StagingTable against the destination database to truncate the staging tables.

  • Next section explains how the Data Flow Task is configured.

  • Second Execute SQL Task executes the below given SQL statement that updates data in dbo.DestinationTable using the data available in dbo.StagingTable, assuming that there is a unique key that matches between those two tables. In this case, the unique key is the column RowNumber.

Script to update:

UPDATE      D 
SET         D.CreatedOn = S.CreatedOn
        ,   D.ModifiedOn = S.ModifiedOn 
FROM        dbo.DestinationTable D 
INNER JOIN  dbo.StagingTable S 
ON          D.RowNumber = S.RowNumber

Control Flow

I have designed the Data Flow Task as shown below.

  • OLE DB Source reads data from dbo.SourceTable using the SQL command SELECT RowNumber,CreatedOn, ModifiedOn FROM Source.dbo.SourceTable WHERE IsActive = 1

  • Lookup transformation is used to check if the RowNumber value already exists in the table dbo.DestinationTable

  • If the record does not exist, it will be redirected to the OLE DB Destination named as Insert into destination table, which inserts the row into dbo.DestinationTable

  • If the record exists, it will be redirected to the OLE DB Destination named as Insert into staging table, which inserts the row into dbo.StagingTable. This data in staging table will be used in the second `Execute SQL Task to perform batch update.

Data Flow tab

To activate few more rows for OLE DB Source, I ran the below query to activate some records

UPDATE  dbo.SourceTable 
SET     IsActive = 1 
WHERE   (RowNumber % 9 = 1) 
OR      (RowNumber % 9 = 2)

Update rows - 1

First execution of the package looked as shown below. All the rows were directed to destination table because it was empty. The execution of the package on my machine took about 3 seconds.

Execution 1

Execution time 1

Ran the row count query again to find the row counts in all three table.

Row counts - 2

To activate few more rows for OLE DB Source, I ran the below query to activate some records

UPDATE  dbo.SourceTable 
SET     IsActive = 1 
WHERE   (RowNumber % 9 = 3) 
OR      (RowNumber % 9 = 5) 
OR      (RowNumber % 9 = 6) 
OR      (RowNumber % 9 = 7)

Row counts - 3

Second execution of the package looked as shown below. 314,268 rows that were previously inserted during first execution were redirected to staging table. 628,766 new rows were directly inserted into the destination table. The execution of the package on my machine took about 12 seconds. 314,268 rows in destination table were updated in the second Execute SQL Task with the data using staging table.

Execution - 2

Execution time - 2

Ran the row count query again to find the row counts in all three table.

Row counts - 3

I hope that gives you an idea to implement your solution.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...