SaFi Bank Space : Orchestration engine

Status: COMPLETE

Impact: MEDIUM

Driver: Juraj Macháč

Approver:

Contributors:

Informed:

Due date:

Resources:

Relevant data

Background

Some processes (e.g. onboarding, or any process that performs a series of operations) require orchestrating many other services and actions. It should be simple to define and change these actions, and to get a visual overview of them.

The workflow engine should be suitable for orchestrating processes across microservices. The main properties to look for are:

  • Performance - the engine will be used to orchestrate many parallel business processes. Latency between the steps should be minimized.

  • Resilience - the engine should be resilient enough to not lose its state under any (reasonable) circumstances.

  • Testability - one should be able to write tests for each of the defined business processes. The workflow engine should offer a testing framework.

  • Process versions - it should be possible to have multiple versions of the same process running concurrently to support backward-compatible upgrades.

  • Retry-ability - to achieve eventual consistency, the workflow engine should guarantee (or make configurable) at-least-once delivery of each message to its consumer.

  • Clarity of definition - it should be possible to define the processes in code, which makes the definitions easier to review.
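
The retry-ability requirement can be illustrated with a minimal sketch in plain Java. It is not the API of any particular workflow engine; the class and method names (`AtLeastOnceSketch`, `deliver`, `IdempotentConsumer`) are hypothetical, and the "lost acknowledgement" is simulated with an exception. The sketch assumes the engine redelivers a message until the consumer confirms it, so the consumer has to be idempotent to keep the business action from running twice.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of at-least-once delivery; not tied to any workflow engine.
class AtLeastOnceSketch {

    /**
     * Redelivers the message until the consumer completes without an exception
     * or maxAttempts is reached. Returns the number of attempts that were needed.
     */
    static int deliver(String messageId, Consumer<String> consumer, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                consumer.accept(messageId);
                return attempt; // acknowledged
            } catch (RuntimeException e) {
                // No acknowledgement received: fall through and redeliver.
                // A real engine would typically wait (back off) between attempts.
            }
        }
        throw new IllegalStateException("delivery failed after " + maxAttempts + " attempts");
    }

    /** Consumer that remembers processed ids so a redelivery does not repeat the side effect. */
    static final class IdempotentConsumer implements Consumer<String> {
        private final Set<String> processed = new HashSet<>();
        private int sideEffects = 0;
        private int lostAcks; // how many acknowledgements to "lose" (simulation only)

        IdempotentConsumer(int lostAcks) {
            this.lostAcks = lostAcks;
        }

        @Override
        public void accept(String messageId) {
            if (processed.add(messageId)) {
                sideEffects++; // the business action, executed once per message id
            }
            if (lostAcks > 0) {
                lostAcks--;
                throw new RuntimeException("simulated lost acknowledgement");
            }
        }

        int sideEffects() {
            return sideEffects;
        }
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer(2);
        int attempts = deliver("onboarding-42", consumer, 5);
        // prints: attempts=3, sideEffects=1
        System.out.println("attempts=" + attempts + ", sideEffects=" + consumer.sideEffects());
    }
}
```

The message is delivered three times, but the side effect happens only once. This is the trade-off behind at-least-once delivery: the engine may repeat a step, so each step should tolerate repetition.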

Options considered

Zeebe (Camunda)

Description

Orchestration engine oriented towards the BPMN 2.0 specification, created as a lightweight, performant version of Camunda BPM.

https://camunda.com/products/cloud/workflow-engine/

Pros and cons

(plus) Defined in BPMN 2.0, the standard for defining business processes, which also acts as living documentation

(plus) Horizontally scalable

(plus) Exports a stream of events for further processing in a warehouse (e.g. to Elastic)

(plus) Java worker client SDK available

(minus) A Java testing framework is provided but deprecated; its replacement is still in active development

(minus) The default UI (zeebe-operate) does not support RBAC. It is possible to change instance data from the UI directly. However, there should be read-only alternatives available

(minus) Although it has a public release (1.3.x), it is still fairly new technology with a small community

(minus) While the definition is in standard BPMN 2.0 and is nicely visualized, it is hard to code review because it is XML

Temporal.io

Description

Orchestration engine oriented towards writing reliable, resilient and scalable applications, with the execution of both the workflow and the jobs distributed across the worker nodes. A fork of Cadence, the workflow engine used by Uber.

https://temporal.io/

Pros and cons

(plus) Defined directly as Java/Kotlin code. Conditions and flow are easy to develop without having to learn BPMN

(plus) Horizontally scalable

(plus) Already deployed in Jago, so we can draw on it for deployment and usage

(plus) Kotlin SDK available

(plus) Testing framework available

(plus) Changes to an existing workflow can only be made by changing the code, which gives more control over the changes

(plus) Workflows are defined in Kotlin, which promotes readability and makes the code reviewable

(minus) The default UI is not very user-friendly and is aimed at technical users

Netflix Conductor

Description

Dedicated workflow engine for orchestrating microservices, developed and used by Netflix.

https://netflix.github.io/conductor/

Pros and cons

(plus) Definitions are written in JSON, which is then built into a graph. Reviewable by developers

(plus) Java SDK available

(minus) No type safety in variables

(minus) No testing framework. Testing is done only by mocking the services it connects to

(minus) Jago moved away from Conductor to Temporal.io

Estimated cost

Action items

Outcome

Temporal.io was chosen as the orchestration engine because it is closest to the development process and promotes readability of the defined workflows. While a visual representation as a form of living documentation is certainly a nice feature, it should not be the deciding factor for workflows that are going to be implemented by developers anyway.
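
To make the readability argument concrete, here is a minimal sketch in plain Java of a process defined as code. It deliberately does not use the Temporal.io SDK. All names (`OnboardingWorkflowSketch`, `OnboardingActivities`, `RecordingActivities`) and the onboarding steps are hypothetical and only illustrate the idea that branching logic written as ordinary code can be reviewed and unit-tested like any other code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; plain Java, not the Temporal.io SDK.
class OnboardingWorkflowSketch {

    /** Hypothetical activities, each standing in for a call to another microservice. */
    interface OnboardingActivities {
        boolean verifyIdentity(String customerId);
        void openAccount(String customerId);
        void notifyRejection(String customerId);
    }

    /** The process itself: conditions and flow are ordinary, reviewable code. */
    static String run(String customerId, OnboardingActivities activities) {
        if (activities.verifyIdentity(customerId)) {
            activities.openAccount(customerId);
            return "ONBOARDED";
        }
        activities.notifyRejection(customerId);
        return "REJECTED";
    }

    /** Test double that records which activities were invoked, in order. */
    static final class RecordingActivities implements OnboardingActivities {
        final List<String> calls = new ArrayList<>();
        private final boolean identityOk;

        RecordingActivities(boolean identityOk) {
            this.identityOk = identityOk;
        }

        @Override
        public boolean verifyIdentity(String customerId) {
            calls.add("verifyIdentity");
            return identityOk;
        }

        @Override
        public void openAccount(String customerId) {
            calls.add("openAccount");
        }

        @Override
        public void notifyRejection(String customerId) {
            calls.add("notifyRejection");
        }
    }

    public static void main(String[] args) {
        RecordingActivities fake = new RecordingActivities(false);
        // prints: REJECTED [verifyIdentity, notifyRejection]
        System.out.println(run("customer-1", fake) + " " + fake.calls);
    }
}
```

In a real engine the activities would be executed and retried by the engine's workers. Here they are called directly, so the sketch shows only the shape of the definition and how a test can replace the activities with a fake.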