configuration

Before discussing configuration, it’s perhaps a good idea to have a look at the post announcing ruote 2.1; it details the design decisions behind ruote 2.1 and helps in understanding the three objects detailed on this page: engine, worker and storage.

In ruote 2.1, the engine class became shallow: just a few methods that insert launch and reply orders into the storage and read it back when querying for process statuses.

The real engine is composed of a storage (persistent core) and of one or more workers.

Storage implementations are meant to be process/thread safe, i.e. they can be used by multiple worker processes and engines.

The engine class is nothing more than a dashboard, with indicators and a few knobs.

This is the ruote configuration you can see on the entry page of this documentation:

            require 'ruote'
            require 'ruote/storage/fs_storage'
            
            engine = Ruote::Engine.new(Ruote::Worker.new(Ruote::FsStorage.new('work')))
            

This is a super-vanilla configuration, packaging the engine, a worker, and the storage together.

Using ruote-dm (DataMapper persistence), an engine/worker initialization would look like:

            require 'ruote'
            require 'ruote/dm/storage'
            
            engine = Ruote::Engine.new(
              Ruote::Worker.new(
                Ruote::Dm::DmStorage.new(:default)))
            

Where :default simply indicates that we want to work with the default configured DataMapper repository.


engine

Engine options are in fact passed to the storage at initialization time.

            require 'ruote'
            require 'ruote/storage/fs_storage'
            
            engine = Ruote::Engine.new(
              Ruote::Worker.new(
                Ruote::FsStorage.new(
                  'work',
                  'remote_definition_allowed' => true, 'ruby_eval_allowed' => true)))
            

They can also be set directly in the storage (document type ‘configurations’, key ‘engine’), but that’s an advanced technique (and you usually don’t want or need to change them at runtime).

Some people might prefer writing:

            engine = Ruote::Engine.new(
              Ruote::Worker.new(
                Ruote::FsStorage.new('work')))
            
            engine.configure('remote_definition_allowed', true)
            engine.configure('ruby_eval_allowed', true)
            

engine options

  • participant_threads_enabled: (from ruote 2.3.0 on)

Defaults to true.

By default, the dispatching of workitems to participants is done in a new Ruby thread, so that potentially lengthy operations don’t block the worker. Dispatching via homing pigeon, for example, is a costly operation: you have to take the pigeon out of its cage, attach the message, and set it free… By default, such blocking operations are performed in their own threads.

In deployments where multiple workers are the norm, having one worker blocked while dispatching is not a problem. In such contexts, it’s OK to turn off threaded dispatching, as in the sketch below.
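
A minimal sketch, reusing the FsStorage setup from above with threaded dispatching turned off:

            require 'ruote'
            require 'ruote/storage/fs_storage'
            
            # multi-worker deployment: dispatch workitems in the worker's own thread
            engine = Ruote::Engine.new(
              Ruote::Worker.new(
                Ruote::FsStorage.new(
                  'work',
                  'participant_threads_enabled' => false)))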

  • remote_definition_allowed:

Defaults to false.

Remote definitions are process definitions reachable over HTTP. Since process definitions are ‘code’, ruote, by default, prevents you from doing things like

            Ruote.process_definition :name => 'main process' do
              sequence do
                subprocess 'http://example.com/definitions/head_process.rb'
                subprocess 'http://example.com/definitions/tail_process.rb'
              end
            end
            
            # or
            
            engine.variables['head'] = 'http://example.com/definitions/head_process.rb'
            engine.variables['tail'] = 'http://example.com/definitions/tail_process.rb'
            
            Ruote.process_definition :name => 'main process' do
              sequence do
                head
                tail
              end
            end
            
            # or simply
            
            engine.launch('http://example.com/definitions/main.xml')
            

You have to explicitly set ‘remote_definition_allowed’ to true.
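
For example, using Engine#configure as shown earlier, the last snippet above becomes legitimate:

            engine.configure('remote_definition_allowed', true)
            
            # launching a definition fetched over HTTP is now allowed
            engine.launch('http://example.com/definitions/main.xml')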

  • ruby_eval_allowed:

Defaults to false.

More about this option in the page about the dollar notation.

  • wait_logger_max:

Defaults to 147.

(this setting only makes sense in single-worker, test/development environments; in other environments, just leave it alone)

The WaitLogger is a component inside of ruote that keeps track of the latest messages processed by the local worker (if any), 147 of them by default. This number can be tuned via the ‘wait_logger_max’ option.
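
A small sketch, assuming the option is set like any other engine option (the value 500 is arbitrary):

            # keep the latest 500 msgs instead of the default 147
            engine.configure('wait_logger_max', 500)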

  • preserve_configuration:

Defaults to false.

More of a storage configuration. When set to true, the engine/worker/storage group will not write any configuration to the persistence behind the storage; it will simply read it.

This option is useful in multi-worker settings, where the configuration is done once and then only read by the workers, as in the sketch below.
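
A sketch of such a secondary worker process, following the same initialization pattern as above:

            require 'ruote'
            require 'ruote/storage/fs_storage'
            
            # this worker reads the configuration written by the "main" process
            # and never overwrites it
            engine = Ruote::Engine.new(
              Ruote::Worker.new(
                Ruote::FsStorage.new('work', 'preserve_configuration' => true)))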

  • restless_worker:

Defaults to false.

More of a worker configuration. When set to true, the worker will not sleep between its storage polls for msgs and schedules to execute. This option is set to true by some storage implementations, namely those that hold a blocking connection to their persistence backend and get msgs and/or schedules pushed to them.

You can safely ignore this option.

  • worker_state_enabled:

Defaults to false.

When set to true, “unlocks” the #worker_state= method of Ruote::Dashboard (Ruote::Engine). Possible states are “running” (the default), “paused” and “stopped”. Workers read that state and pause/resume/stop accordingly. Please note that a stopped worker won’t read the state any further, since it’s stopped and gone; use “paused” for pause/resume.
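
A sketch, assuming the option is set like any other engine option:

            engine.configure('worker_state_enabled', true)
            
            engine.worker_state = 'paused'   # workers pause
            engine.worker_state = 'running'  # workers resume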

engine on_error / on_terminate

The on_error and on_cancel attributes are common to all expressions. Engine#on_error and Engine#on_terminate are quite close to those, but there is an important catch: the on_error attribute cancels the expression it is attached to and then runs the on_error ‘routine’, while Engine#on_error runs independently of the process whose error triggered the reaction. The same goes for on_terminate.

The processes triggered by on_error and on_terminate are independent processes (but they do not trigger further on_error / on_terminate reactions, as a cascade prevention).

Looking at the functional tests for on_error and on_terminate might help in understanding them.

on_error

Each time an unchecked error occurs in a process instance, the participant or the subprocess given in on_error will get triggered.

            # you can pass a participant name
            engine.on_error = 'administrator'
            
            # or a subprocess name
            engine.on_error = 'error_procedure'
            
            # or directly a subprocess definition
            engine.on_error = Ruote.define do
              concurrence do
                administrator :msg => 'something went wrong'
                supervisor :msg => 'something went wrong'
              end
            end
            

The workitem used in the on_error “handler” is a copy of the workitem at the error point.

engine on_terminate

This handler launches a subprocess each time a process instance terminates in a regular way.

Its usage is similar to Engine#on_error:

            # you can pass a participant name
            engine.on_terminate = 'archiver'
            
            # or a subprocess name
            engine.on_terminate = 'archival_procedure'
            
            # or directly a subprocess definition
            engine.on_terminate = Ruote.define do
              concurrence do
                supervisor :msg => 'process ${wfid} terminated'
                archiver
              end
            end
            

The workitem passed to the triggered process instance is a copy of the one in the process that just terminated.


worker

As of now, there are no configuration options for workers. They don’t complain, they are not syndicated, they just work.


storage

Workflows / business processes usually involve real persons, humans, and they are slower than computers. These days, processes also involve multiple systems / services. These two things imply that workflows / processes may last a long time. Persistence is thus necessary; this is what the storage provides.

Since workers share the storage, it not only has to provide reliable persistence but also helpers to avoid worker collisions.

The following table summarizes the various storage implementations.

multiple workers? indicates whether the storage supports multiple workers;
remote workers? indicates whether workers running on a different host than the storage are possible;
speed is a relative indication of the speed of the storage.

storage                   | multiple workers? | remote workers? | speed                  | notes
Ruote::HashStorage        | no (1)            | no              | best                   | in-memory storage, limited to the current ruby process, totally transient
Ruote::FsStorage          | yes (2)           | no              | 2nd                    | hierarchy of JSON files, uses file locks to prevent collisions between workers
Ruote::Redis::Storage     | yes               | yes             | 1st                    | Redis based persistence
Ruote::Sequel::Storage    | yes               | yes             | 4th                    | Sequel based persistence
ruote-postgres            | yes               | yes             | 4th                    | persistence directly based on PostgreSQL
Ruote::Mon::Storage       | yes               | yes             | like the Redis storage | MongoDB based persistence
Ruote::Beanstalk::Storage | yes               | yes             | 3rd                    | FsStorage based persistence with a Beanstalk front
Ruote::Dm::Storage        | yes (3)           | yes             | 5th                    | DataMapper based persistence (not maintained anymore)
Ruote::Couch::Storage     | yes               | yes             | slowest                | Apache CouchDB based persistence, very slow (not maintained anymore)

(1) well, it’s ‘yes’ but there isn’t much to gain from one more worker in the same Ruby runtime
(2) multiple workers, but not on Windows
(3) “no” before ruote-dm 2.2.0

Ruote::HashStorage

source

A completely transient, in-memory storage for ruote. Cannot be shared by multiple workers. Mostly used for testing or for transient workflows.
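
A typical test setup might look like this (a sketch; HashStorage ships with ruote itself):

            require 'ruote'
            require 'ruote/storage/hash_storage'
            
            # everything stays in memory and is gone when the Ruby process exits
            engine = Ruote::Engine.new(Ruote::Worker.new(Ruote::HashStorage.new))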

Ruote::FsStorage

source

Stores ruote information in a hierarchy of JSON files. Can be shared by multiple workers (though only by workers on the same host).

Rather fast. Easy to use (just a bunch of files).

Doesn’t work with multiple workers on Windows (the file locking mechanism it uses is not supported on this platform).

Ruote::Redis::Storage

source

A Redis based storage. Very fast, usable by multiple workers (including remote ones).
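
A hedged initialization sketch, assuming the ruote-redis gem and its Ruote::Redis::Storage.new(redis_client, options) constructor (check the ruote-redis README for the exact require path and options):

            require 'redis'
            require 'ruote'
            require 'ruote/redis/storage' # provided by the ruote-redis gem
            
            engine = Ruote::Engine.new(
              Ruote::Worker.new(
                Ruote::Redis::Storage.new(
                  ::Redis.new(:db => 14, :thread_safe => true),
                  'remote_definition_allowed' => true)))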

Ruote::Sequel::Storage

source

A Sequel persistence. Tested with PostgreSQL and MySQL. OK with multiple workers.

ruote-dm and ruote-sequel share the same schema (1 table).
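
A hedged initialization sketch, assuming the ruote-sequel gem and its Ruote::Sequel::Storage.new(sequel_db, options) constructor (check the ruote-sequel README for the exact require path and table setup):

            require 'sequel'
            require 'ruote'
            require 'ruote/sequel/storage' # provided by the ruote-sequel gem
            
            sequel = Sequel.connect('postgres://localhost/ruote_work')
            
            engine = Ruote::Engine.new(
              Ruote::Worker.new(
                Ruote::Sequel::Storage.new(
                  sequel,
                  'remote_definition_allowed' => true)))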

Ruote::Dm::Storage

source

A DataMapper storage implementation. OK with multiple workers (since ruote-dm 2.2.0).

ruote-dm and ruote-sequel share the same schema (1 table).

Ruote::Couch::Storage

source

A CouchDB storage implementation. OK with multiple workers.

It’s rather slow. People tend to use it to store workitems, and let the msgs and schedules be stored in faster implementations (see CompositeStorage).

Ruote::Mon::Storage

source

MongoDB storage implementation.

Ruote::Beanstalk::BsStorage

source

This storage is an experiment. It uses a set of Beanstalk tubes to make an FsStorage available to remote workers.

composite storage

source

The “composite” storage lets you select which storage to use for which category of objects to be persisted. Here is a table detailing those ‘categories’:

type           | description
expressions    | the atomic pieces of process instances
msgs           | ‘messages’ to apply/reply to expressions
schedules      | msgs scheduled for later processing
errors         | errors that occurred during process execution (msgs processing)
variables      | engine (global) variables
configurations | engine configurations
workitems      | StorageParticipant workitems
history        | (if you use Ruote::StorageHistory)

An example where everything is handled by an FsStorage while the ‘msgs’ are stored in a HashStorage:

              opts = {
                'remote_definition_allowed' => true,
                'ruby_eval_allowed' => true
              }
            
              engine =
                Ruote::Engine.new(
                  Ruote::Worker.new(
                    Ruote::CompositeStorage.new(
                      Ruote::FsStorage.new('ruote_work', opts),
                      'msgs' => Ruote::HashStorage.new(opts))))