I apologize upfront for a very long message.
We receive large files, usually flat files that can contain over a million records. We have to process each file in its entirety and then send a summary back to the client. This summary generally contains information about the rejected records and so forth. My biggest problem is knowing when a file is done.

We have a custom adapter on the send side that receives the messages and then uses an internal thread pool to distribute message processing over those threads. This way we get off the engine thread quickly, but the actual work may go on for several hours until the threads finish. The key problem is recognizing when processing is done for all the messages that came in one file.

Several things complicate the problem. First, on the receive side, we use a regular FILE adapter that converts the flat file into XML messages and publishes those messages to the MessageBox. The send adapter subscribes to those messages and distributes them over the thread pool. So by the time the messages reach the send adapter, the notion of the file has already evaporated. Furthermore, the send adapter returns after scheduling this large volume of messages on the threads, so from BizTalk's point of view the send adapter is done. So how do we recognize that every message that came in a file has been processed to completion? Currently, for the sake of speed, we do not use any orchestration in this scenario.
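To make the hand-off problem concrete, here is a minimal sketch in Python rather than BizTalk code. The names `schedule_messages` and `slow_work` are hypothetical stand-ins for our send adapter and its per-message processing; the only point is that the scheduling call returns while every piece of work is still pending.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Gate used to simulate long-running processing on the worker threads.
gate = threading.Event()


def slow_work(message):
    """Hypothetical per-message processing that blocks until the gate opens."""
    gate.wait()
    return message.upper()


def schedule_messages(pool, messages):
    """Hypothetical stand-in for the send adapter: hand off and return at once."""
    return [pool.submit(slow_work, m) for m in messages]


pool = ThreadPoolExecutor(max_workers=4)
futures = schedule_messages(pool, ["a", "b", "c"])

# The "adapter" has returned, yet none of the work has finished.
adapter_returned_with_work_pending = not any(f.done() for f in futures)

gate.set()  # let the workers finish
results = [f.result() for f in futures]
pool.shutdown()
```

In this toy version the caller still holds the futures and can wait on them. In our scenario the engine sees only that the adapter returned, so something else has to track completion per file.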
I am considering making the following changes to this scenario:
1) Write a pipeline component for the receive pipeline that extracts the filename and puts it in a promoted property of each message that is published.
2) Write an orchestration that uses a correlation set on that filename. When a message is published, if an orchestration already exists for that filename, it will receive the message. Otherwise a new orchestration will be started. This way, we will have one orchestration per file rather than millions of orchestrations.
3) The orchestration counts the messages it receives and then forwards them to the send port, where the adapter schedules them on threads. Each thread publishes a response message back to the orchestration.
4) The orchestration receives the response messages and counts them. When it has received as many responses as the messages it forwarded, it assumes that the file processing is done.
5) At this point, the orchestration can send a summary report back to the client.
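The counting in steps 3 to 5 could be sketched roughly as follows. This is plain Python, not orchestration code, and `FileCompletionTracker` and its methods are hypothetical names. It also assumes an explicit end-of-file signal is available, because the two counts can be equal while messages for the same file are still arriving.

```python
import threading
from collections import defaultdict


class FileCompletionTracker:
    """Per-file counters keyed by filename (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._received = defaultdict(int)
        self._responded = defaultdict(int)
        self._rejected = defaultdict(list)
        self._closed = set()  # files whose end-of-file signal has arrived

    def message_received(self, filename):
        with self._lock:
            self._received[filename] += 1

    def end_of_file(self, filename):
        """Mark that no more messages will arrive; return True if done."""
        with self._lock:
            self._closed.add(filename)
            return self._is_done(filename)

    def response_received(self, filename, record_id, accepted):
        """Count one response; return True if the file is now complete."""
        with self._lock:
            self._responded[filename] += 1
            if not accepted:
                self._rejected[filename].append(record_id)
            return self._is_done(filename)

    def _is_done(self, filename):
        # Equal counts alone are not enough: the file must also be closed.
        return (filename in self._closed
                and self._responded[filename] == self._received[filename])

    def summary(self, filename):
        with self._lock:
            return {"file": filename,
                    "total": self._received[filename],
                    "rejected": list(self._rejected[filename])}


tracker = FileCompletionTracker()
name = "orders.txt"

tracker.message_received(name)
early = tracker.response_received(name, 0, True)   # counts equal, file still open
tracker.message_received(name)
tracker.message_received(name)
closed = tracker.end_of_file(name)                 # closed, responses outstanding
mid = tracker.response_received(name, 1, False)    # one rejected record
last = tracker.response_received(name, 2, True)    # final response completes the file
report = tracker.summary(name)
```

The `early` case is the one that worries me: one message in and one response back looks complete if only the counts are compared, which is why the sketch waits for the end-of-file signal as well.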
I have the following questions:
1) Is it possible to write a component for the receive pipeline that publishes an end-of-file message?
2) What pattern do I implement in the orchestration to ensure that there is one orchestration per file?
3) The send adapter, after it processes a message, is going to publish the response. This response will be of a different type than the original message. Can the same running orchestration instance receive a message of a different type and correlate it back to the original message?
4) This seems like a lot of work for what I need to do. In a custom-developed application, it is easy to process a file and then issue a summary. Is this the best way to accomplish this?
I would greatly appreciate it if you could answer my questions.
Thanks.
Waqar

