Friday, February 24, 2012

How to achieve partition parallelism?

Hi,

Is it possible to achieve partition parallelism in SSIS? What I am asking is, In DataStage, if I load some data like 'data reader -> trans1 -> trans2 -> destination' (and assume that I have 4 nodes configured), the tool divides the data into 4 different datasets and executes the package as 4 instances. This way the data load is very fast. Is it possible in SSIS?

Of course we can divide the dataset and load them thru multiple instances? But then dividing the dataset will differ for every load and so we need to modify the package all the time. Even if we divide the dataset, I am not sure 4 instances will run in 4 different nodes or in a same node? So anybody has any idea about it?

Thanks.

You can't do that like you can in Data Stage/Ab Initio etc... unfortunately, no. The only way is, like you describe, mimicing parallellism by breaking teh data up explicitly and executing (say) 4 different data-flows.

I wouldn't be surprised to see it in an upcoming version though. I know its on their radar.

-Jamie

|||Thanks for your clarification Jamie. If I mimic parallelism, how can I ensure that each instance is executed in different node? Is it possible to do it?|||

Have a look at this entry in BOL: http://msdn2.microsoft.com/en-us/ms345184.aspx

One approach that can be successful is to use a conditional split to divide a data load to (say) four remote raw files, then use SQL Server Agent to execute jobs on the four remote servers to read the raw files.

The performance is impacted by the need to write and read the data from the raw files on disk, which may negate any benefit of the parallelism. However, net benefits depend on the size of the load, nature of the data, speed of the network, and speed of read/write from disk.

Donald

No comments:

Post a Comment