Sunday, February 19, 2012

How to accomplish this?

I'm a new user to SSIS and am trying to figure out something that I suspect may be very basic, but I'm having a hard time figuring it out.

I have a single table of "events" generated by an application. I will get a daily feed of these events. Events may be like so:

Event A Created
Event A Modified
Event A Property A Created
Event B Created
Event C Created
Event B Property A Modified
Event A Property B Created
Event C Voided
Event A Property A Closed
Event A Closed

In the end I want to create two tables like so:

Table 1:
Event A / Closed
Event B / Modified

(Event C is not here because it was voided, so any prior records of C are gone).

Table 2:
Event A / Property A / Closed
Event A / Property B / Created

(Only events with properties would be in this table).

In essence, I'm collapsing the transaction-based feed into normalized event-based and property-based tables, with only the latest "status" stored. These tables would be used to generate various other reports.

Obviously, as usual, the true situation is more complex than this but this simplified example covers my basic problem and I hope it can help someone to point me in the right direction.

"Events" can span over a day. It isn't practical to recreate the two sub-tables every day from scratch using SQL (and we don't want to do a complex query against the massive source table).

I would like to iterate over the new batch of events one record at a time. If I get an Event Created, then I will create a row in Table 1. If I get an Event Property Created, I will create a row in Table 2. Voids would cause a delete in both tables. Modifies of Events or Properties would cause the appropriate update in either table (Creates will always come before Modifies in the feed).

I tried using a Conditional Split, but I found that it appears to not go record-by-record, but instead prepares a list of records for each condition and then processes it in parallel... That was quite a shock. I expected it to go record-by-record and process things in order because I depend on "Create" to create the records that "Modifies" will update.

So my question is, is there a way to do what I am looking for in SSIS? Am I missing a simple setting to make Conditional Split process records in order?

Thanks for your help.May be you need to move the 'modifies' logic to a separate dataflow after the 'creates' are done. I don't think there is a easy way of enforcing the order of parallel data pipelines in a single dataflow.|||

Hi StGeorge,

I think you need to create some mechanism by which to order properties. I trust you when you say your actual process is more complicated, so I may be headed down the path with this suggestion - but here goes:

It sounds as if you are adding a row to Table 2 on a state named Created. It also sounds as if you are removing this row from Table 2 and its parent row in Table 1 on a similar state named Void.

Would it work if you created an integer column in Table 2 called PropertyOrder? You could populate it with 1 on Created, 3 on Void, and 2 for any other property event. In SSIS, you could then load the records from the source ordered by this field. If your transactional environment is intense (lots of asynchronous transactions), you may be forced to stage the ordered data (in a table or raw file) and then act on this resultset in a subsequent data flow or Execute SQL Task to guarantee transactional integrity (which is how I interpret Table 1).

I may be way off here. If I am, please accept my apologies.

The method I propose above is roughly analogous to a method many deterministic network protocols use to guarantee packets are output in the order sent. The packets are generated synchronously, transmitted asynchronously, received in any order, cached, and reconstructed by ordered key before delivery. I think you have all the pieces except the key by which to order the records.

Hope this helps,
Andy

No comments:

Post a Comment