
Friday 28 November 2008

How to use NGinn rules engine

1. Required libraries

To use NGinn.RippleBoo engine in your application you need to add references to the following libraries:

  • NGinn.RippleBoo.dll

  • Rhino.DSL.dll

  • Boo.Lang.dll

  • Boo.Lang.Compiler.dll

  • NLog.dll



2. Invoking RippleBoo

The code below shows how to configure a rule repository and how to execute some rules.

using System;
using System.Collections.Generic;
using NGinn.RippleBoo;

class TestMe
{
    private RuleRepository _repos;

    public TestMe()
    {
        _repos = new RuleRepository();
        _repos.BaseDirectory = "c:\\rules";
        _repos.ImportNamespaces.Add("System");
    }

    public void RunSomeRules()
    {
        Dictionary<string, object> variables = new Dictionary<string, object>();
        variables["Email"] = "my@email.com";
        variables["Timestamp"] = DateTime.Now;

        Dictionary<string, object> context = new Dictionary<string, object>();
        context["Output"] = Console.Out;

        _repos.EvaluateRules("some_rules.boo", variables, context);
    }
}


The RuleRepository class stores common configuration properties for your rules and lets you invoke rules stored as '*.boo' files in the base directory. It compiles the rule scripts and caches them, so subsequent evaluations are fast. If a rule script changes, it is automatically recompiled. You should create the rule repository once and keep it alive as long as needed.

Rule evaluation happens in the 'RunSomeRules' method. To execute rules you call RuleRepository.EvaluateRules, passing the rule file name and two dictionaries.
The first contains variables that rules can reference through the 'Variables' object. The second contains context objects that rules can reference through the 'Context' object.

Example rule:
ruleset "SomeRules":
    rule "R1":
        when Variables.Email.EndsWith("mydomain.com")
        action:
            Context.Output.WriteLine("Email from my domain")



This rule references the 'Email' variable and the 'Output' context object. Note that in rules you don't have to quote variable names - thanks to Boo's IQuackFu 'magic' interface, Variables.Email is resolved at runtime.
The RuleRepository.EvaluateRules method is thread safe.

Thursday 27 November 2008

Rules engine improved

First attempts to use the RippleBoo rules engine in real software showed that version 0.1 wasn't very useful, so I had to prepare version 0.2.
First of all, the structure of a rule definition was changed - it's now more descriptive:



rule "SPAM":
    label "Spam? - move to spam"
    when IS_SPAM()
    except_rule "Friendly_spam"
    action:
        MOVE_TO "Spam"
    else_rule "WORK"



What we have here:

  • declaration of rule "SPAM"
  • label - for documentation
  • when - this is rule condition
  • except_rule - the 'exception' rule, which contains an exception to the rule condition. Our 'SPAM' rule fires only when its condition is true and the exception rule does not fire
  • action - executed when rule fires
  • else_rule - successor when rule condition is not satisfied


You should read it like this: when IS_SPAM() returns true, move the message to the 'Spam' folder, except for messages where the "Friendly_spam" rule applies. If IS_SPAM() returns false, do nothing and proceed to rule "WORK".
That's basically how Ripple Down Rules work. Note that only one rule will fire - the one with a satisfied condition and no applicable exceptions. This can be a problem when you want to execute some code every time a rule condition is satisfied, whether or not there are exceptions. In that case you can either put your code in the rule condition, or use the special 'side_effect' block:



rule "VERY_IMPORTANT":
    label "Important? - mark high priority"
    when __msg.From == "customer_care@mybank.com"
    side_effect:
        __msg.Priority = "High"



The 'side_effect' block is executed immediately after the rule condition evaluates to true, but BEFORE exception rules are checked. In contrast, the 'action' block is executed only when the rule condition evaluates to true AND no exceptions apply (no rule fires while evaluating the exception subtree).
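The evaluation order described above - condition first, then side_effect, then the exception subtree, and the action only if no exception rule fired - can be sketched in a few lines. This is an illustrative Python model with an invented data layout, not the actual NGinn.RippleBoo implementation:

```python
# Illustrative Ripple Down Rules evaluation (data layout invented for this sketch).
def evaluate(rules, name, ctx):
    """Evaluate rule `name`; return the name of the rule that fired, or None."""
    rule = rules[name]
    if rule["when"](ctx):
        if "side_effect" in rule:
            rule["side_effect"](ctx)          # runs BEFORE exception rules are checked
        exc = rule.get("except_rule")
        if exc is not None:
            fired = evaluate(rules, exc, ctx)
            if fired is not None:
                return fired                  # an exception rule fired instead of this one
        rule["action"](ctx)
        return name
    nxt = rule.get("else_rule")
    return evaluate(rules, nxt, ctx) if nxt is not None else None

log = []
rules = {
    "SPAM": {"when": lambda m: "[--spam--]" in m["Subject"],
             "except_rule": "Friendly_spam",
             "action": lambda m: log.append("Spam"),
             "else_rule": "WORK"},
    "Friendly_spam": {"when": lambda m: "enlarge" in m["Subject"],
                      "action": lambda m: log.append("Useful_spam")},
    "WORK": {"when": lambda m: m["From"].endswith("mycompany.com"),
             "action": lambda m: log.append("Work")},
}

print(evaluate(rules, "SPAM", {"Subject": "[--spam--] hello", "From": "x@y.com"}))   # SPAM
```

Note how the exception rule "wins" only when its own condition is satisfied; otherwise the outer rule's action runs as usual.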


Here's an example rule definition file, containing simple email message processing rules. NGinn.RippleBoo engine allows you to declare your own 'local' variables and helper functions that can be used in rules:

#variable alias
__msg = Variables.Message

#helper function - check if message is spam
IS_SPAM = def():
    return __msg.Subject.IndexOf("[--spam--]") >= 0

#helper - move message to specified folder
MOVE_TO = def(folder):
    Context.MessageDb.MoveMessage(__msg, folder)


ruleset "Email_default_rules":

    rule "SPAM":
        label "Spam? - move to spam"
        when IS_SPAM()
        except_rule "Friendly_spam"
        action:
            MOVE_TO "Spam"
        else_rule "WORK"

    rule "Friendly_spam":
        label "Interesting subject? - read!"
        when __msg.Subject.IndexOf("enlarge") >= 0
        action:
            MOVE_TO "Useful_spam"

    rule "WORK":
        label "Work? - move to WORK"
        when __msg.From.EndsWith("mycompany.com")
        action:
            MOVE_TO "Work"
        else_rule "VERY_IMPORTANT"

    rule "VERY_IMPORTANT":
        label "Important? - mark high priority"
        when __msg.From == "customer_care@mybank.com"
        side_effect:
            __msg.Priority = "High"




And here's a graphical representation of the ruleset defined above.



The picture is generated automatically from the rule definition using the GraphViz tool (useful, but a very user-unfriendly, unix-style program).

Other features


Importantly, we can define several rulesets in a single file. The first ruleset is the default one, but RippleBoo also allows you to invoke the others.
You can call other rulesets from your actions by executing
goto_ruleset "another ruleset"

Think of secondary rulesets as sub-procedures that can be called from the main procedure.

There is also an option to execute rules from an external file:
goto_file "another_rules.boo"

This will execute rules from another file.
Remember, you call goto_ruleset or goto_file from an action block inside some rule. Only one action will ever be executed, so you don't need to worry about continuation after the goto - there will be none. Simply put, there is no return from goto_ruleset or goto_file.
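The "no return" behaviour can be pictured as a plain hand-off loop: each ruleset either names its successor or ends the run, and the caller never resumes. A tiny illustrative Python model (names invented):

```python
# Illustration of goto_ruleset/goto_file semantics: control is handed over,
# the current ruleset never resumes.
def run(rulesets, start):
    trace = []
    current = start
    while current is not None:
        trace.append(current)
        # the fired rule's action may name the next ruleset (like goto_ruleset);
        # returning None ends evaluation
        current = rulesets[current]()
    return trace

rulesets = {
    "main_rules": lambda: "cleanup_rules",   # like: action: goto_ruleset "cleanup_rules"
    "cleanup_rules": lambda: None,           # no goto - evaluation stops here
}
print(run(rulesets, "main_rules"))   # ['main_rules', 'cleanup_rules']
```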



OK, I'll shed some light on using RippleBoo in your programs in next posts, because now I'm getting sick of code formatting at this blog engine. Does anyone know why it sucks so much and what can I do so it stops messing with my html?

Friday 24 October 2008

Rules engine for NGinn

Today I have added a first working version of a rules engine to NGinn. The source code is in the 'NGinn.RippleBoo' folder.

The RippleBoo engine implements an algorithm called 'Ripple Down Rules' - basically a binary decision tree. Each rule has a simple "if <condition> then <action>" structure, where the condition is a boolean expression and the action is a block of instructions. Apart from that, each rule defines which rule will be evaluated next, by specifying a successor rule for the positive case and a successor rule for the negative case.

When a rule condition evaluates to true, its action is executed and the next rule to evaluate is the 'positive' successor. When the condition evaluates to false, the action is not executed and the next rule is the 'negative' successor. In effect we get a binary decision tree, but we don't have to worry about its completeness, because at least one rule is guaranteed to fire no matter what the conditions are (the first rule's condition is always true).
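The evaluation loop just described is simple enough to sketch directly. Here it is in illustrative Python (the tuple layout is invented; the real engine is Boo/C#), with each rule as (condition, action, positive successor, negative successor):

```python
# Sketch of the v0.1 evaluation loop: follow positive/negative successors.
def run_rules(rules, start, ctx):
    fired = []
    name = start
    while name is not None:
        cond, action, pos, neg = rules[name]
        if cond(ctx):
            action(ctx)
            fired.append(name)
            name = pos       # positive successor
        else:
            name = neg       # negative successor; action skipped
    return fired

ctx = {"Counter": 5}
rules = {
    "R1": (lambda c: c["Counter"] < 9, lambda c: None, "R2", None),
    "R2": (lambda c: c["Counter"] < 8, lambda c: None, "R3", None),
    "R3": (lambda c: c["Counter"] == 1, lambda c: None, None, "R4"),
    "R4": (lambda c: True, lambda c: None, None, None),
}
print(run_rules(rules, "R1", ctx))   # ['R1', 'R2', 'R4']
```

With Counter = 5, rules R1 and R2 fire, R3's condition fails so evaluation moves to its negative successor R4, which always fires.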

Rules are implemented in the Boo language using the Rhino DSL library from the Rhino-tools package. Rhino DSL is a library for building DSLs (domain specific languages) in Boo. Here's a link to its author's blog: http://ayende.com/Blog/archive/2007/12/03/Implementing-a-DSL.aspx. The author has done great work and many interesting DSL examples can be found there.

Below is an example ruleset in my "rule definition language". BTW, it's also a valid Boo script:


Ruleset "MyRules"

rule "R1", "R2", null, V.Counter < 9:
    log.Info("AAA")

rule "R2", "R3", null, V.Counter < 8:
    log.Info("R2")

rule "R3", "R4", null, V.Counter < 7:
    log.Info("R3")

rule "R4", null, "R5", V.Counter == 1:
    log.Info("R4")

rule "R5", "R6", null, 1 == 1:
    log.Info("R5: Counter is ${V.Counter}")

rule "R6", "X", null, 2 % 2 == 0:
    log.Info("Rule six: {0}", date.Now)

rule "X", null, null, date.Today > date.Parse('2008-10-11'):
    log.Warn("The X Rule!!!")


Sorry for the formatting, I'll fix it in my spare time. And a short explanation of what each 'rule' means:
the 'rule' keyword defines a new rule. It has 5 parameters:
  • rule id
  • id of the positive successor rule (null if there is none)
  • id of the negative successor rule (null if there is none)
  • condition
  • action (the action starts on a new line, after the last colon - Boo allows such syntax).

So this entry:

rule "R6", "X", null, 2 % 2 == 0:
    log.Info("Rule six: {0}", date.Now)

means: 'define rule R6 that fires if the expression "2 % 2 == 0" evaluates to true. If it does, execute the action, which writes the current date to the log. The next rule to evaluate will be "X" if the rule fires, or none otherwise.'

Currently the rules engine is a completely standalone project, but I plan to integrate it into the NGinn process engine. It will be used in many places - certainly as part of the process logic, but also for message routing and preprocessing.

The main problem is that Boo is not yet used in NGinn outside the RippleBoo project. Currently Script.NET is the main scripting language for NGinn processes, and I wouldn't like to mix the two languages. So probably only one is here to stay, and chances are it will be Boo. Script.NET is more flexible and easier to use, but Boo is more mature, better tested and better documented. The main issue with Boo is that it's a compiled language, so it will require more effort to integrate with the NGinn engine, which is very 'dynamic' in nature.


Wednesday 8 October 2008

BPMN - a close family

I have always considered BPMN (Business Process Modelling Notation) to be the 'best' language in its domain - very expressive and well thought out, able to describe real-world situations without strange hacks and without oversimplification. However, I had never thought about implementing it in NGinn - a full BPMN 1.1 implementation seemed too complex an objective for an open-source project, especially a single-person one.

So I turned to less complex ideas after reading a bit about YAWL, and decided to implement a similar language for .Net. But it turns out that YAWL (and NGinn as a consequence) uses concepts very similar to those found in BPMN. That's because all of them are based on Petri nets; they differ in what was added on top of the basic Petri-net model. For example, BPMN defines several control structures based on non-local events, such as errors (exception handling), compensation and cancellation - quite useful. NGinn has no special constructs for exception handling and no notion of compensation. But when we analyze which 'workflow patterns' can be implemented in these languages, it turns out that there are no patterns in BPMN that could not be implemented in NGinn or YAWL. It's only a matter of convenience - for example, error handling or compensation is easy to express in BPMN and not so obvious in NGinn (custom logic required). Maybe material for a 2.0 version.

Here's a link to a very nice website about BPMN - Dive Into BPM. Enjoy the dive!

Tuesday 7 October 2008

Today I'd like to describe some examples of processes that are known to work in the current version of NGinn. The focus is on control structures, not the actual task functionality (which is very incomplete as of now). I have selected rather complex and non-obvious examples, because the basic ones - parallelism (AND-split), sequences and decisions (XOR-splits) - well, should just work, or there wouldn't be much to talk about.

Deferred choice with a timeout

This is a very common pattern - a deferred choice with a timeout. It can be used to add time limits to manual or other tasks. When a token is placed in 'start', both tasks are enabled: 'eval_candidate' and 'timeout'. If 'eval_candidate' completes first, the timeout is cancelled. If the timeout completes first (the deadline is reached), 'eval_candidate' is cancelled.

Deferred choice - complex situation


This process is an example of a more complex implicit choice. There are two places with implicit choice (p1 and p2), each having two possible tasks; however, they share the t2 task. The system enables all three tasks - t1, t2 and t3 - after tokens arrive at p1 and p2. This construction ensures that either t1 and t3 can complete, or t2 can complete. When t2 completes, t1 and t3 are cancelled. When t1 completes, t2 is cancelled and t3 stays enabled. When t3 completes, t2 is cancelled and t1 stays enabled.

OR-join with 'escaping' tokens

This is a rather complex example, so I was very happy to see it working. What do we have here? First of all, there's the t1 task with an OR-split. The split can choose the V1 or V2 path, or both. The eval_candidate4 task is the corresponding OR-join.

The catch is that we have a deferred choice in place p1, and the eval_candidate3 task can 'steal' the token from p1, effectively moving it out of the OR-join's scope. Situations where only V1 or only V2 is chosen are not very interesting. However, if both V1 and V2 are chosen, the eval_candidate4 OR-join should wait for two tokens to arrive before it can be enabled. But if eval_candidate3 steals the token, eval_candidate4 should 'change its mind' and wait for one token only. Why? Because no more tokens can arrive in that situation - none of the OR-join's possible input paths contain any more tokens.

OR-join with tokens 'stolen' by a cancellation (cancel sets)

Here the situation is similar to the previous case - we have an OR-split, an OR-join and the two paths V1 and V2. However, there's a little red arrow from t3 to p2. This arrow is a cancellation (cancel set): when t3 completes, all tokens are removed from p2 (effectively cancelling the eval_candidate2 task).

The effect is that when both V1 and V2 are chosen, you need to complete eval_candidate and eval_candidate2 before eval_candidate4 is enabled. Alternatively, you can complete t3 - then you will not have to do eval_candidate2. After you complete eval_candidate2, completing t3 has no side effects.

Short update about current development status

Recently I have made some important changes to the NGinn engine and feel that it's getting close to what I'd like to achieve. Here's the list of most important changes made:

  1. ProcessInstance class makeover. The most important change is that tokens no longer have an identity. At the beginning it was assumed that each token is an independent object, and tracking the relation between tasks and tokens was quite complicated. However, all tokens are the same - they don't convey information - so it was sensible to get rid of their identity. Now only numbers count: all we need to know about tokens is how many of them sit in each place. The result: 50% of the code thrown away while retaining the same functionality, with improved performance and clarity.
  2. Custom process state serialization. I have decided to use custom XML serialization instead of the binary serialization used previously. The main reason is that binary serialization doesn't support versioning, so upgrading the library breaks processes persisted by the old version. Custom serialization adds some work to task implementation, but we get complete control over persistence.
  3. Introduced distributed transactions (each step of a process runs in a separate transaction).
  4. The basic infrastructure is working. Now I need to concentrate on details and on providing complete functionality. Task implementations, especially, are quite behind.
  5. A number of examples were added to the NGinn.XmlFormsWWW project. They demonstrate how to start and cancel process instances and how to handle worklist functionality (manual tasks). A simple TODO-list web application is working (sort of).
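The "tokens have no identity" idea from point 1 reduces a process marking to a count of tokens per place. A minimal illustrative sketch (names invented; not NGinn's actual classes):

```python
from collections import Counter

# A Petri-net marking as a simple count of tokens per place.
def fire(marking, consume, produce):
    """Fire a transition: take one token from each input place, put one in each output place."""
    for place in consume:
        if marking[place] < 1:
            raise ValueError("transition not enabled: no token in " + place)
    for place in consume:
        marking[place] -= 1
    for place in produce:
        marking[place] += 1

marking = Counter({"start": 1})
fire(marking, consume=["start"], produce=["p1", "p2"])    # an AND-split
print(marking["p1"], marking["p2"])   # 1 1
```

All the engine needs to ask is "how many tokens sit in this place?", which is exactly why dropping token identity removed so much bookkeeping code.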
Summing up, the NGinn API is maturing and there are no more heavy modifications of the public interface. It's time to start documenting it. I hope to use the engine in a commercial project, which should speed things up and improve quality. Sounds nice.

Tuesday 23 September 2008

Tasks in NGinn

Tasks are what a workflow is 'really' made of - they provide the functionality. The rest of the workflow definition - places and arrows - just defines how the tasks are interconnected and what the run-time dependencies between them are.
So let's try to describe the most common types of tasks and what they can do. NGinn is not complete yet, so this will be a wish list rather than typical technical documentation.

  1. Manual task

    Manual tasks are tasks assigned to people (application users). Usually the application provides some kind of 'TODO' list where each user can see the tasks currently assigned to them and pick the next task to be done. NGinn provides the 'Manual task' building block, but it does not contain an actual TODO list or GUI implementation - this is application-specific and NGinn does not restrict the implementation.
    Manual tasks have the following configurable parameters:
    • Assignee - id of person responsible for the task
    • Assignee group - id of group responsible for the task (either Assignee or Assignee group must be specified)
    • Task title (short summary)
    • Description (textual description of the task)
    Much more can be said about manual tasks - for example, we haven't touched at all on resource management (the people database) or organizational structure (groups). I'm going to give you more details on this in upcoming posts.

  2. Timer task

    Timer tasks introduce configurable delays into the process. At runtime, a timer task 'starts' when it is enabled (that is, when it receives all required input tokens) and then waits a specified amount of time before completing. The task has two parameters:
    • Delay amount (for example 00:00:30, meaning 30 seconds), or
    • Due date (a fixed moment in time when the task will complete).
    Exactly one of these parameters must be specified, depending on the situation.


  3. Subprocess task

    As the name suggests, this task starts a sub-process. When the task is enabled it initiates an instance of the sub-process and waits until the sub-process completes; then the task completes as well.
    To start a sub-process we need to know its name (definition ID) and we need input data with the correct structure (as defined by the sub-process' input variables). The subprocess task's input data becomes the input data of the newly created process, and when the sub-process completes, its output data becomes the subprocess task's output data. Therefore we need to make sure the subprocess task has the same input/output data structure as the sub-process.

  4. Notification task

    Notification task is used for sending email / sms / other notifications to users.

  5. 'Receive Message' task

    The 'receive message' task waits for a message. It is used in scenarios where communication with external systems is necessary and our process needs to wait for some information sent by an external party. Each external message contains some data and must contain a special ID, called the Message Correlation ID (MCID). The MCID is a runtime parameter of the Receive Message task - that is, we need to specify the MCID for each Receive Message task. We are free to choose any MCID, but it must uniquely identify the task waiting for the message. By default (when not specified), the MCID is assumed to have the following structure: [process instance id].[task id], for example e3bc903829badca321.wait_task.
    The structure of the message is defined by the Receive Message task's output variables - the message should simply contain values for these variables. When the message is received, its contents are put into the task's output variables and the task completes.
    The most important fact here is that the MCID must be known to the external party when it sends us the message. So either the MCID is agreed on in advance, or our process needs to send the MCID to the external system before it can receive the message.


  6. Script task

    Script tasks add custom logic to the process. Currently they can be programmed in the 'Script.NET' language. Script tasks are synchronous - they cannot be 'put to sleep' and reactivated by the NGinn execution engine. Script code can access and modify the task's variables, but it can also communicate with other objects in the application's runtime, so it can be used for communication between business processes and the rest of the application.

  7. Empty task

    An empty task does nothing - it completes just after being started. But all variable bindings still do their work, so empty tasks can be used for synchronization without side effects - and this is their main purpose.

  8. REST/WS call task

    Synchronous communication with external systems using XML/HTTP or SOAP. The task sends an HTTP request containing its input data and expects to receive XML with the output data (the XML structure is defined by the task's output data structure). Currently there's no implementation of web service calls - I'm waiting for the right idea.

  9. Custom tasks

    Custom tasks can be used to introduce custom or application-specific components into the NGinn process description language. They can be implemented in any CLR language - they just have to implement a few interfaces and conform to some rules. This is a good topic for a separate post.
And that's it - I consider the list to be complete and broad enough at the same time. Almost all real-life task examples can be implemented using one (or more) of the NGinn tasks, and whatever can't be implemented, or is difficult to squeeze into a built-in task type, can be done as a custom task.
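Two details from the list above are concrete enough to sketch: how a timer task's completion moment follows from its two mutually exclusive parameters, and the default MCID format of the Receive Message task. Both helpers below are hypothetical illustrations, not NGinn's actual API:

```python
from datetime import datetime, timedelta

def timer_completion(enabled_at, delay=None, due_date=None):
    """Completion moment of a timer task: exactly one of delay/due_date is given."""
    if (delay is None) == (due_date is None):
        raise ValueError("specify exactly one of: delay, due_date")
    if due_date is not None:
        return due_date
    h, m, s = (int(part) for part in delay.split(":"))   # "00:00:30" -> 30 seconds
    return enabled_at + timedelta(hours=h, minutes=m, seconds=s)

def default_mcid(process_instance_id, task_id):
    """Default correlation id: [process instance id].[task id]."""
    return "%s.%s" % (process_instance_id, task_id)

print(timer_completion(datetime(2008, 10, 7, 12, 0, 0), delay="00:01:00"))
print(default_mcid("e3bc903829badca321", "wait_task"))   # e3bc903829badca321.wait_task
```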

Monday 8 September 2008

Great resource on workflow patterns

Here's a link to 'Workflow Patterns' website - a great source of information about workflow definition patterns. Please take a look at animated examples of how each pattern works - this is very much like it is implemented in NGinn. No wonder, however, as the website was created by YAWL guys.

Saturday 6 September 2008

First presentation of NGinn

Today I gave a short presentation of the NGinn project at 'Zine day' - a Warsaw-based meeting of .Net geeks from around Poland. My presentation was part of an open source project contest (and even won second prize, thanks guys!). What really pleased me was that many people got interested in the project and asked lots of questions about how they could use the NGinn engine in their own projects. That gives me some clues about which features should be implemented in the nearest future, and shows how important it is to release the first version. The presentation was in Polish, here's the file - sorry, no English version for now.
I enjoyed the meeting very much and would like to thank the organisers and attendees - especially those who voted for NGinn.

Tuesday 12 August 2008

Data handling in workflows

In order for workflows to be usable, they must convey some information and process it. NGinn provides a simple to understand data model that allows for easy integration with external applications.

Each instance of an NGinn process contains some data, called process instance data. This data consists of variables, each having a name and holding a value of some type. These variables are global to the process instance. There is also 'task data' - a set of variables containing data for a task instance.

More on variables

Variables are the basis of NGinn's data handling concept. Each variable has a name, a type specification, 'requiredness' and a direction. Here's an example process variable definition:


<variable name="requestorName" type="string" required="true" isArray="false" dir="In" />

This line defines a 'requestorName' variable holding a string value, which is required. The variable is single instance (isArray="false") and it is an input variable.
Let's explain this a bit:
  • direction (dir="In"). In NGinn, variables can be input (dir="In"), output (dir="Out"), both ways (dir="InOut") and local (dir="Local"). Think of them as procedure arguments. Input variables are used for passing data to process or task instance. Output variables can return execution results from a process or task instance. Local variables are local to process or task, that is they are internal only and invisible to the outside world.
  • type (type="string"). Variables can be of some type. NGinn supports basic types, such as string, int, date, bool and complex types (like 'structs' in C or C#), explained later.
  • array (isArray="false"). Variables can be single instance, holding only one value, or multiple instance, holding an array of values of the same type.
  • required. Variables can be required or optional. Required variables must be passed to the process or task. Optional variables can be omitted, and an optional variable can be given a default value, which is used when the variable is not specified.
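The four attributes above are enough to drive basic validation of incoming data. Here's an illustrative Python sketch (the checking logic is invented; NGinn's actual validation may differ):

```python
# Validate a value against a variable definition like the XML example above.
TYPE_MAP = {"string": str, "int": int, "bool": bool}

def check_variable(defn, values):
    name = defn["name"]
    if name not in values:
        if defn.get("required"):
            raise ValueError("required variable missing: " + name)
        return defn.get("default")    # default value applies only to optional variables
    value = values[name]
    expected = TYPE_MAP[defn["type"]]
    items = value if defn.get("isArray") else [value]
    if not all(isinstance(item, expected) for item in items):
        raise TypeError(name + ": expected " + defn["type"])
    return value

defn = {"name": "requestorName", "type": "string", "required": True, "isArray": False, "dir": "In"}
print(check_variable(defn, {"requestorName": "Alice"}))   # Alice
```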

Process data consists of a set of process variables - each process has its own set. The names and types of variables are defined in the process definition. Each task executing in the process instance has its own data, held in task variables and completely separated from the process data. This means a task can operate only on its own variables and cannot access the parent process' or other tasks' data.
So there is a question: if task and process data are completely isolated, how do they exchange information?
The answer is: through data bindings.

Data bindings

Data bindings define how process data is mapped to task input data, or how task output data is mapped back to process data.
Each task has a set of input bindings and output bindings. Input bindings define what information from process instance data will be put in task's input variables. Output bindings define what will happen with task output data - what process variables will receive the values of task output variables.
Input bindings are executed before the task is started. Output bindings are executed after the task completes.

<input-bindings>
    <binding variable="requestor" bindingType="CopyVar" sourceVariable="requestedById" />
    <binding variable="requestorName" bindingType="CopyVar" sourceVariable="requestedByName" />
</input-bindings>
<output-bindings>
    <binding variable="managerApprovalDecision" bindingType="Expr">
        <expression>decision</expression>
    </binding>
</output-bindings>

This is an example of task's input and output bindings.

Each binding names the variable that receives the data (the 'variable' attribute). For input bindings this is a task input variable; for output bindings it is a process instance variable. In this example, the first input binding states that the 'requestor' task input variable will receive the value of the 'requestedById' process variable. The 'CopyVar' binding type means the binding is a simple value copy - from sourceVariable to variable.
The output binding above is more interesting. It is a binding for the managerApprovalDecision process variable, and it is expression-based (bindingType="Expr"). To get the value for managerApprovalDecision, the expression is evaluated - here the expression is 'decision', so it returns the value of the 'decision' task variable. Of course, expressions can be more complex.
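Applying bindings boils down to walking the binding list and filling a target dictionary. An illustrative Python sketch of the two binding kinds shown above ('Expr' is modelled here with Python's eval; NGinn evaluates expressions in its own scripting language):

```python
# Apply CopyVar and Expr bindings from a source data set to a target data set.
def apply_bindings(bindings, source):
    target = {}
    for b in bindings:
        if b["bindingType"] == "CopyVar":
            target[b["variable"]] = source[b["sourceVariable"]]    # plain value copy
        elif b["bindingType"] == "Expr":
            target[b["variable"]] = eval(b["expression"], {}, dict(source))
        else:
            raise ValueError("unknown binding type: " + b["bindingType"])
    return target

process_data = {"requestedById": "u123", "requestedByName": "Alice"}
input_bindings = [
    {"variable": "requestor", "bindingType": "CopyVar", "sourceVariable": "requestedById"},
    {"variable": "requestorName", "bindingType": "CopyVar", "sourceVariable": "requestedByName"},
]
print(apply_bindings(input_bindings, process_data))
# {'requestor': 'u123', 'requestorName': 'Alice'}
```

The same function handles output bindings: the task's variables become the source and the process variables become the target.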

Data types

NGinn has built-in support for several basic data types, but the list of types can be extended by custom type definitions.
Supported basic types are:
  • string
  • int
  • double
  • date
  • datetime (time stamp)
  • future releases may add more simple types
Complex data structures can be built out of simple types, in a recursive way. Each process definition contains 'processDataTypes' section, where custom data types can be defined.
There are two kinds of custom data types that can be defined in NGinn - records (structs, similar to C/C# structs) and enums (enumeration of possible values).
Here's an example of a struct:

<processDataTypes>
    <struct name="OrderItem">
        <member name="code" type="string" required="true" />
        <member name="name" type="string" required="true" />
        <member name="quantity" type="int" required="true" />
    </struct>
</processDataTypes>

This defines the 'OrderItem' type - a struct with three fields: 'code', 'name' and 'quantity'. Each field is required and single-valued (not an array).

And here's an enum type with two possible values: YES and NO:

<enum name="YesNo">
    <value>NO</value>
    <value>YES</value>
</enum>

Process data and XML

Data structures in NGinn are designed so that they can be easily converted to XML and back, which simplifies data exchange with external applications. A process or task data definition can be converted to an XML schema, so by defining a process or task data structure we also get XML schemas for data exchange.
For example, let's take some process data definition:


<processDataTypes>
    <enum name="YesNo">
        <value>NO</value>
        <value>YES</value>
    </enum>

    <enum name="WeekDay">
        <value>Sun</value>
        <value>Mon</value>
        <value>Tue</value>
        <value>Wed</value>
        <value>Thu</value>
        <value>Fri</value>
        <value>Sat</value>
    </enum>

    <struct name="DeStrukt">
        <member name="Decision" type="YesNo" required="true" isArray="false" />
        <member name="Day" type="WeekDay" required="true" isArray="false" />
    </struct>
</processDataTypes>
<variables>
    <variable name="value" type="DeStrukt" required="true" dir="In" />
</variables>

This definition contains two enumeration types and one struct type. And here's how it converts to XML schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:simpleType name="YesNo">
        <xs:restriction base="xs:string">
            <xs:enumeration value="NO" />
            <xs:enumeration value="YES" />
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="WeekDay">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Sun" />
            <xs:enumeration value="Mon" />
            <xs:enumeration value="Tue" />
            <xs:enumeration value="Wed" />
            <xs:enumeration value="Thu" />
            <xs:enumeration value="Fri" />
            <xs:enumeration value="Sat" />
        </xs:restriction>
    </xs:simpleType>
    <xs:complexType name="DeStrukt">
        <xs:sequence>
            <xs:element name="Decision" type="YesNo" minOccurs="1" maxOccurs="1" />
            <xs:element name="Day" type="WeekDay" minOccurs="1" maxOccurs="1" />
        </xs:sequence>
    </xs:complexType>
    <xs:element name="DataStructs">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="value" type="DeStrukt" minOccurs="1" maxOccurs="1" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

Why is it important? Because it defines the structure of XML message with process input data, so we can start new process instance by sending xml similar to this:

<DataStructs>
    <value>
        <Decision>YES</Decision>
        <Day>Fri</Day>
    </value>
</DataStructs>

With the XML schema auto-generated from the process definition, it's much easier to integrate NGinn with external tools. For example, we can take this schema and feed it into the Microsoft InfoPath form designer; we can then design an InfoPath form for starting a new process instance, and the designer will know the resulting XML data structure from the schema.
Not only process input data can be defined this way - the same applies to process output data and to task input/output data. XML can thus be used to integrate NGinn with external applications without introducing custom protocols.
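The enum-to-schema part of the mapping shown earlier is easy to picture as a small transformation. A hand-rolled illustrative sketch (the real generator lives inside NGinn and may differ):

```python
# Map an NGinn enum definition to an xs:simpleType restricting xs:string,
# mirroring the schema fragment shown in the post.
def enum_to_schema(name, values):
    lines = ['<xs:simpleType name="%s">' % name,
             '    <xs:restriction base="xs:string">']
    lines += ['        <xs:enumeration value="%s" />' % v for v in values]
    lines += ['    </xs:restriction>', '</xs:simpleType>']
    return "\n".join(lines)

print(enum_to_schema("YesNo", ["NO", "YES"]))
```

Structs map analogously to xs:complexType with an xs:sequence of xs:element entries, one per member.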

Monday 30 June 2008

A simple process definition example

NGinn is under development and almost every part of it keeps changing - including the language itself. Currently NGinn is based on an XML description of the process structure, and there is no graphical representation. It is important to have graphical design tools for process graphs, but I think it is too early to develop them - the XML form is enough for now. A graphical NGinn representation will be developed once the language becomes stable enough.

OK, so what does an NGinn process definition look like?

First of all, an NGinn process is a Petri net, so the definition contains the net structure. A Petri net consists of places, transitions and connections between them. On top of that, process-specific extensions are added, such as specialized tasks and process data specifications. Let's see an example:

1. Here's a simple process

2. And here's the XML description of this process:



<?xml version="1.0" encoding="utf-8"?>
<process version="2" name="TimerTask" xmlns="http://www.nginn.org/WorkflowDefinition.1_0.xsd">
<places>
<place id="start" type="StartPlace"></place>
<place id="end" type="EndPlace"></place>
</places>
<tasks>
<task id="init" type="EmptyTask" splitType="AND">
</task>
<task id="timeout" type="TimerTask" joinType="AND" splitType="AND">
<timerTask>
<delayTime>00:01:00</delayTime>
</timerTask>
</task>
<task id="timeout2" type="TimerTask" joinType="AND" splitType="AND">
<timerTask>
<delayTime>00:01:00</delayTime>
</timerTask>
</task>
</tasks>

<flows>
<flow from="start" to="init" />
<flow from="init" to="timeout" />
<flow from="init" to="timeout2" />
<flow from="timeout" to="end" />
<flow from="timeout2" to="end" />
</flows>
<processDataTypes>
</processDataTypes>
<variables>
<variable name="delayAmount" type="string" required="true" dir="In" />
</variables>
</process>


OK, so let's see what we have here:
  • The main 'process' element is the root of the process definition file. It contains process identification attributes, such as the name and version number.
  • The 'places' section contains the list of places in the process definition. Here we have only the start and end places (note the 'type' attribute identifying them).
  • The 'tasks' section contains a list of tasks, that is, transitions in Petri net terminology. Tasks come in several types offering different functionality. Here we used an empty task ('init'), which does nothing but is present for synchronization purposes, and two timer tasks which wait for a specified period of time. Each task is defined in a 'task' element, but the internal structure of that element depends on the task type.
    Note the 'joinType' and 'splitType' attributes of the tasks. These are very important attributes - they specify the synchronization logic between tasks. There are three types of split and join: AND, OR and XOR; we will discuss each type in later posts.
  • The 'flows' section connects places and tasks. Each 'flow' has a starting node (from) and an ending node (to).

That's the Petri net structure description. However, there is one deviation from the Petri net model. Note that the 'init' task is connected directly to the 'timeout' and 'timeout2' tasks, without intermediate places. Petri nets forbid that - a flow can go only from a place to a transition or from a transition to a place; place-place and transition-transition connections are not allowed. In NGinn two tasks can be connected directly. In such a case an implicit place is inserted between the two tasks. In our example, there would be one implicit place between 'init' and 'timeout' and a second one between 'init' and 'timeout2'. The purpose of this construct is only to simplify process definitions; in some cases, however, we will need to specify places explicitly.
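To see what this means, the fragment below sketches how the same net would look with the implicit places written out explicitly. The place ids 'p1' and 'p2' are my own names for illustration, not engine-generated output:

```xml
<!-- Sketch: the same net with the implicit places made explicit.
     The ids 'p1' and 'p2' are assumed names, not engine output. -->
<places>
  <place id="start" type="StartPlace"></place>
  <place id="p1"></place> <!-- implicit place between 'init' and 'timeout' -->
  <place id="p2"></place> <!-- implicit place between 'init' and 'timeout2' -->
  <place id="end" type="EndPlace"></place>
</places>
<flows>
  <flow from="start" to="init" />
  <flow from="init" to="p1" />
  <flow from="p1" to="timeout" />
  <flow from="init" to="p2" />
  <flow from="p2" to="timeout2" />
  <flow from="timeout" to="end" />
  <flow from="timeout2" to="end" />
</flows>
```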

There are two more sections in the process definition XML:

  • The 'processDataTypes' section contains definitions of the data structures used for representing process data. Here it is empty; data structures will be discussed in later posts.
  • The 'variables' section contains definitions of process variables. Process variables are like the arguments of a function - there can be input variables (input arguments), output variables (return values) and local variables. Input-output variables are also possible. Here we have only one variable - the string 'delayAmount'.
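As a sketch, a process using all three kinds of variables might declare them like this. Note that the 'Out' and 'Local' direction values and the 'int' type are my guesses extrapolated from the 'In' example above, not confirmed syntax:

```xml
<!-- Sketch only: 'Out', 'Local' and 'int' are assumed values,
     extrapolated from the dir="In" example in the process above -->
<variables>
  <variable name="delayAmount" type="string" required="true" dir="In" />
  <variable name="result" type="string" required="false" dir="Out" />
  <variable name="retryCount" type="int" required="false" dir="Local" />
</variables>
```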

OK, done with process definition. But what does this process do?

  1. The 'init' task executes. It does nothing, but then the execution splits into two parallel flows (remember the 'AND' split). So the 'init' task consumes one token from the 'start' place and produces two tokens, one in each of the implicit places before the 'timeout' and 'timeout2' tasks.
  2. 'timeout' and 'timeout2' execute simultaneously - each of them waits exactly 1 minute and then completes, consuming the token from the implicit place between 'init' and itself and producing a token in the 'end' place.
  3. When no tokens remain anywhere except the 'end' place, the process is completed.
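Since 'delayAmount' is declared as a required input variable, the XML message that starts an instance of this process should look roughly like the sketch below. The root element name is my assumption; the authoritative structure is given by the schema auto-generated from the process definition:

```xml
<!-- Hypothetical start message for the 'TimerTask' process;
     the real element names come from the auto-generated schema -->
<TimerTask>
  <delayAmount>00:01:00</delayAmount>
</TimerTask>
```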

Friday 27 June 2008

Process definition

The NGinn process model is based on Petri nets extended with process-specific information. The picture below shows a basic Petri net. It consists of places (circles) and transitions (rectangles), connected with arrows.

 

I will not explain here how a Petri net works - explanations can be found in many sources, for example here. The most important principle is that tokens (black dots) represent the current process status. Each transition consumes tokens from its input places and produces tokens in its output places, so effectively tokens move across the Petri net. In NGinn, we add two special places to the picture - a start place and an end place. The start place is the process starting point - the process starts when a token is placed in the start place. The end place is the finish point - the process finishes when all tokens reach the end place. In the picture above, we could say P1 is the start place and P4 is the end place.

Transitions in NGinn are called tasks. Tasks are the basic building blocks of a process and represent the various actions needed to complete the process. There are many types of tasks in NGinn; they will be described later. When a task completes, it consumes one or more tokens from its input places (which tokens are consumed depends on the task's 'join' type - described later) and produces one or more tokens in its output places (how many tokens are produced depends on the task's 'split' type - also described later).

Places in NGinn keep their original Petri net meaning. They are just places where tokens are held during periods of inactivity (when nothing happens in the process or when we are waiting for a task to complete).

The NGinn design is based on the YAWL language - the process building blocks are generally the same, so you can read about YAWL to get an idea of what process definitions look like. However, NGinn provides a completely different implementation of the engine and process tasks.

Introduction

Hi, this blog will be dedicated to the 'NGinn' workflow engine. Here I will document its functionality and the development process. You can find NGinn at http://code.google.com/p/nginn.

You may wonder why someone would create a workflow engine for .Net when there is one freely available from Microsoft and built into .Net - Windows Workflow Foundation. My opinion is that WF does not provide the functionality expected from a business process engine. First of all, WF is low-level: it provides basic constructs that could be used to build a workflow engine but that are not very useful for modelling business processes directly. WF looks like a set of components for programmers (actually, like a graphical representation of some procedural code), and developers are probably happy with it. However, business process analysts would find it difficult to model processes directly in WF. Secondly, WF does not define a process description standard - programmers are free to model process logic and process data however they want, so there is no standard process representation. This limits the portability of process definitions and makes it difficult to integrate different applications and processes.

NGinn will focus on providing a more standardized and restrictive process description language in order to improve portability, and at the same time it will offer higher-level process building blocks that can be understood and used by business analysts. NGinn will also include additional components needed for running business processes:

  • Resource management - so the information about people and organizational structure can be accessed and used in process definitions
  • Embeddable and standalone process execution engine
  • GUI for end users (a 'proof of concept' worklist application)
  • Integration and communication components (email notifications, web service calls, etc)


