-->

Tuesday 12 August 2008

Data handling in workflows

For workflows to be useful, they must carry some information and process it. NGinn provides a data model that is simple to understand and makes integration with external applications easy.

Each instance of an NGinn process contains some data, called process instance data. This data consists of variables, each with a name and a value of some type. These variables are global to the process instance. There is also 'task data': a set of variables holding the data for a single task instance.

More on variables

Variables are the basis of NGinn's data handling. Each variable has a name, a type specification, a 'requiredness' flag and a direction. Here's an example of a process variable definition:


<variable name="requestorName" type="string" required="true" isArray="false" dir="In" />

This line defines a required 'requestorName' variable holding a string value. The variable is single instance (isArray="false") and it is an input variable.
Let's explain this a bit:
  • direction (dir="In"). In NGinn, variables can be input (dir="In"), output (dir="Out"), both ways (dir="InOut") or local (dir="Local"). Think of them as procedure arguments. Input variables pass data to a process or task instance. Output variables return execution results from a process or task instance. Local variables are internal to the process or task and invisible to the outside world.
  • type (type="string"). Every variable has a type. NGinn supports basic types, such as string, int, date and bool, and complex types (like 'structs' in C or C#), explained later.
  • array (isArray="false"). Variables can be single instance, holding only one value, or multiple instance, holding an array of values of the same type.
  • required. Variables can be required or optional. Required variables must be passed to the process or task. Optional variables can be omitted, and a default value can be provided for them. The default value is used when the variable is not specified.
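
To make the four attributes concrete, here is a small Python sketch that reads a variable definition into a record. It is only a model for illustration: the 'Variable' class and 'parse_variable' function are hypothetical names and not part of NGinn, and the defaults used when an attribute is missing are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DIRECTIONS = ("In", "Out", "InOut", "Local")


@dataclass
class Variable:
    name: str
    type: str
    required: bool
    is_array: bool
    direction: str  # one of DIRECTIONS


def parse_variable(xml_text: str) -> Variable:
    """Read one <variable .../> element into a Variable record."""
    elem = ET.fromstring(xml_text)
    direction = elem.get("dir", "Local")  # assumed default, not confirmed by NGinn
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    return Variable(
        name=elem.attrib["name"],
        type=elem.attrib["type"],
        required=elem.get("required", "false") == "true",  # assumed default
        is_array=elem.get("isArray", "false") == "true",  # assumed default
        direction=direction,
    )


requestor = parse_variable(
    '<variable name="requestorName" type="string" '
    'required="true" isArray="false" dir="In" />'
)
print(requestor)
```

The record mirrors the XML attributes one to one, which keeps the 'procedure argument' analogy visible: name and type describe the argument, and the direction says which way the data flows.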

Process data consists of a set of process variables; each process has its own set. The names and types of the variables are defined in the process definition. Each task executing in the process instance has its own data, held in task variables and completely separated from the process data. This means that a task can operate only on its own variables and cannot access the data of its parent process or of other tasks.
So the question is: if task and process data are completely isolated, how do they exchange information?
The answer is: through data bindings.

Data bindings

Data bindings define how process data is mapped to task input data, and how task output data is mapped back to process data.
Each task has a set of input bindings and output bindings. Input bindings define which information from the process instance data is put into the task's input variables. Output bindings define what happens to the task output data, that is, which process variables receive the values of the task output variables.
Input bindings are executed before the task is started. Output bindings are executed after the task completes.

<input-bindings>
  <binding variable="requestor" bindingType="CopyVar" sourceVariable="requestedById" />
  <binding variable="requestorName" bindingType="CopyVar" sourceVariable="requestedByName" />
</input-bindings>
<output-bindings>
  <binding variable="managerApprovalDecision" bindingType="Expr">
    <expression>decision</expression>
  </binding>
</output-bindings>

This is an example of a task's input and output bindings.

Each binding names the variable that receives the data (the 'variable' attribute). For input bindings it is a task input variable; for output bindings it is a process instance variable. In this example, the first input binding says that the 'requestor' task input variable will receive the value of the 'requestedById' process variable. The 'CopyVar' binding type means that the binding is a simple value copy from sourceVariable to variable.
The output binding above is more interesting. It is a binding for the managerApprovalDecision process variable and it is expression-based (bindingType="Expr"). This means that an expression is evaluated to get the value for managerApprovalDecision. Here the expression is 'decision', so it returns the value of the 'decision' task variable. Of course, expressions can be more complex.
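
The flow of data through the bindings can be modelled in a few lines of Python. This is a rough sketch and not NGinn code: the 'apply_bindings' function, the wrapping 'task' element and the sample values are made up for illustration, and the expression handling is reduced to a bare variable lookup because the expression language itself is not covered here.

```python
import xml.etree.ElementTree as ET

# The bindings from the example, wrapped in a hypothetical <task> root
# so that the fragment parses as one document.
BINDINGS_XML = """
<task>
  <input-bindings>
    <binding variable="requestor" bindingType="CopyVar" sourceVariable="requestedById" />
    <binding variable="requestorName" bindingType="CopyVar" sourceVariable="requestedByName" />
  </input-bindings>
  <output-bindings>
    <binding variable="managerApprovalDecision" bindingType="Expr">
      <expression>decision</expression>
    </binding>
  </output-bindings>
</task>
"""


def apply_bindings(bindings, source):
    """Compute the target variables produced by a list of <binding> elements."""
    target = {}
    for binding in bindings:
        kind = binding.get("bindingType")
        if kind == "CopyVar":
            target[binding.get("variable")] = source[binding.get("sourceVariable")]
        elif kind == "Expr":
            # Assumption: the expression is just a variable name.
            expression = binding.findtext("expression").strip()
            target[binding.get("variable")] = source[expression]
        else:
            raise ValueError(f"unsupported binding type: {kind}")
    return target


root = ET.fromstring(BINDINGS_XML)
process_data = {"requestedById": 17, "requestedByName": "Alice"}  # sample values

# Input bindings run before the task starts: process data -> task data.
task_data = apply_bindings(root.find("input-bindings"), process_data)

# The task sees only its own variables; this line stands in for real work.
task_data["decision"] = "YES"

# Output bindings run after the task completes: task data -> process data.
process_data.update(apply_bindings(root.find("output-bindings"), task_data))
print(process_data)
```

Note that the task's 'decision' variable never appears in the process data. Only the value named by the output binding crosses the boundary, which is the isolation described above.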

Data types

NGinn has built-in support for several basic data types, but the list of types can be extended by custom type definitions.
Supported basic types are:
  • string
  • int
  • double
  • date
  • datetime (time stamp)
  • future releases may add more simple types
Complex data structures can be built out of simple types, recursively. Each process definition contains a 'processDataTypes' section, where custom data types can be defined.
There are two kinds of custom data types that can be defined in NGinn: records (structs, similar to C/C# structs) and enums (enumerations of possible values).
Here's an example of a struct:

<processDataTypes>
  <struct name="OrderItem">
    <member name="code" type="string" required="true" />
    <member name="name" type="string" required="true" />
    <member name="quantity" type="int" required="true" />
  </struct>
</processDataTypes>

This is a definition of the 'OrderItem' type, which is a struct with three fields: 'code', 'name' and 'quantity'. Each field is required and holds a single value (not an array).

And here's an enum type with two possible values: YES and NO:

<enum name="YesNo">
  <value>NO</value>
  <value>YES</value>
</enum>
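
To show what such type definitions mean for actual values, here is a Python sketch that checks a value against the struct and enum defined above. It is a model under assumptions and not NGinn's own validation: the 'load_types' and 'check' functions are hypothetical, the mapping of simple type names to Python types is assumed, and arrays are left out for brevity.

```python
import xml.etree.ElementTree as ET

TYPES_XML = """
<processDataTypes>
  <struct name="OrderItem">
    <member name="code" type="string" required="true" />
    <member name="name" type="string" required="true" />
    <member name="quantity" type="int" required="true" />
  </struct>
  <enum name="YesNo">
    <value>NO</value>
    <value>YES</value>
  </enum>
</processDataTypes>
"""

# Assumed mapping from simple type names to Python types.
SIMPLE_TYPES = {"string": str, "int": int, "double": float}


def load_types(xml_text):
    """Return (enums, structs) dictionaries keyed by type name."""
    root = ET.fromstring(xml_text)
    enums = {e.get("name"): [v.text for v in e.findall("value")]
             for e in root.findall("enum")}
    structs = {s.get("name"): s.findall("member") for s in root.findall("struct")}
    return enums, structs


def check(value, type_name, enums, structs):
    """Return a list of problems found in value; an empty list means valid."""
    if type_name in SIMPLE_TYPES:
        if isinstance(value, SIMPLE_TYPES[type_name]):
            return []
        return [f"expected {type_name}, got {value!r}"]
    if type_name in enums:
        return [] if value in enums[type_name] else [f"{value!r} is not a {type_name}"]
    if type_name not in structs:
        return [f"unknown type {type_name}"]
    if not isinstance(value, dict):
        return [f"expected struct {type_name}, got {value!r}"]
    problems = []
    for member in structs[type_name]:
        name = member.get("name")
        if name not in value:
            if member.get("required") == "true":
                problems.append(f"missing required member {name}")
            continue
        # Struct members may themselves be structs or enums, hence the recursion.
        problems += check(value[name], member.get("type"), enums, structs)
    return problems


enums, structs = load_types(TYPES_XML)
print(check({"code": "A1", "name": "Bolt", "quantity": 3}, "OrderItem", enums, structs))
print(check({"code": "A1", "name": "Bolt"}, "OrderItem", enums, structs))
print(check("MAYBE", "YesNo", enums, structs))
```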

Process data and XML

Data structures in NGinn are designed so that they can be easily converted to XML and back, which simplifies data exchange with external applications. A process or task data definition can be converted to an XML schema, so by defining the process or task data structure we also get XML schemas for data exchange.
For example, let's take this process data definition:


<processDataTypes>
  <enum name="YesNo">
    <value>NO</value>
    <value>YES</value>
  </enum>

  <enum name="WeekDay">
    <value>Sun</value>
    <value>Mon</value>
    <value>Tue</value>
    <value>Wed</value>
    <value>Thu</value>
    <value>Fri</value>
    <value>Sat</value>
  </enum>

  <struct name="DeStrukt">
    <member name="Decision" type="YesNo" required="true" isArray="false" />
    <member name="Day" type="WeekDay" required="true" isArray="false" />
  </struct>
</processDataTypes>
<variables>
  <variable name="value" type="DeStrukt" required="true" dir="In" />
</variables>

This definition contains two enumeration types, one struct type and one input variable of that struct type. And here's how it converts to XML schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="YesNo">
    <xs:restriction base="xs:string">
      <xs:enumeration value="NO" />
      <xs:enumeration value="YES" />
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="WeekDay">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Sun" />
      <xs:enumeration value="Mon" />
      <xs:enumeration value="Tue" />
      <xs:enumeration value="Wed" />
      <xs:enumeration value="Thu" />
      <xs:enumeration value="Fri" />
      <xs:enumeration value="Sat" />
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="DeStrukt">
    <xs:sequence>
      <xs:element name="Decision" type="YesNo" minOccurs="1" maxOccurs="1" />
      <xs:element name="Day" type="WeekDay" minOccurs="1" maxOccurs="1" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="DataStructs">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="value" type="DeStrukt" minOccurs="1" maxOccurs="1" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
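
The conversion follows a regular pattern: an enum becomes a restricted xs:string and a struct becomes a sequence of elements. The Python sketch below reproduces that pattern for a shortened version of the definition. It is a guess at the mapping based on the example above and not NGinn's actual generator: the function names are hypothetical, and the handling of optional members (minOccurs="0"), arrays (maxOccurs="unbounded") and simple types (the 'SIMPLE' table) is assumed, since the example shows only required, single-valued members of custom types.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Assumed mapping of NGinn simple types to XML Schema built-in types.
SIMPLE = {"string": "xs:string", "int": "xs:int", "double": "xs:double",
          "date": "xs:date", "datetime": "xs:dateTime"}

TYPES_XML = """
<processDataTypes>
  <enum name="YesNo"><value>NO</value><value>YES</value></enum>
  <struct name="DeStrukt">
    <member name="Decision" type="YesNo" required="true" isArray="false" />
  </struct>
</processDataTypes>
"""


def xs(tag):
    """Qualify a tag name with the XML Schema namespace."""
    return f"{{{XS}}}{tag}"


def enum_to_schema(enum):
    """Map an <enum> to an xs:simpleType restricting xs:string."""
    simple = ET.Element(xs("simpleType"), {"name": enum.get("name")})
    restriction = ET.SubElement(simple, xs("restriction"), {"base": "xs:string"})
    for value in enum.findall("value"):
        ET.SubElement(restriction, xs("enumeration"), {"value": value.text})
    return simple


def struct_to_schema(struct):
    """Map a <struct> to an xs:complexType with one element per member."""
    complex_type = ET.Element(xs("complexType"), {"name": struct.get("name")})
    sequence = ET.SubElement(complex_type, xs("sequence"))
    for member in struct.findall("member"):
        type_name = member.get("type")
        ET.SubElement(sequence, xs("element"), {
            "name": member.get("name"),
            "type": SIMPLE.get(type_name, type_name),
            "minOccurs": "1" if member.get("required") == "true" else "0",
            "maxOccurs": "unbounded" if member.get("isArray") == "true" else "1",
        })
    return complex_type


types = ET.fromstring(TYPES_XML)
schema = ET.Element(xs("schema"))
for enum in types.findall("enum"):
    schema.append(enum_to_schema(enum))
for struct in types.findall("struct"):
    schema.append(struct_to_schema(struct))
print(ET.tostring(schema, encoding="unicode"))
```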

Why is this schema important? Because it defines the structure of the XML message carrying the process input data, so we can start a new process instance by sending XML similar to this:

<DataStructs>
  <value>
    <Decision>YES</Decision>
    <Day>Fri</Day>
  </value>
</DataStructs>
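
On the receiving side, a message like this maps naturally onto nested variables. Here is a small Python sketch of that mapping. It is an illustration only and does not show how NGinn itself reads the message: the 'element_to_data' function is a hypothetical helper, and repeated elements (arrays) are not handled.

```python
import xml.etree.ElementTree as ET

MESSAGE = """
<DataStructs>
  <value>
    <Decision>YES</Decision>
    <Day>Fri</Day>
  </value>
</DataStructs>
"""


def element_to_data(elem):
    """Leaf elements become strings, elements with children become dicts."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_data(child) for child in children}


process_input = element_to_data(ET.fromstring(MESSAGE))
print(process_input)
```

The result holds one entry per process input variable, here the single 'value' variable with its two struct members.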

With the XML schema auto-generated from the process definition, it's much easier to integrate NGinn with external tools. For example, we can take this schema and feed it into the Microsoft InfoPath form designer. Then we can design an InfoPath form for starting a new process instance, and the InfoPath designer will know the resulting XML data structure from the schema.
This applies not only to process input data, but also to process output data and to task input/output data. This way, XML can be used to integrate NGinn with external applications without introducing custom protocols.