Introducing: Symphony’s Engineering Blog

Do mesh with me!

At Symphony, we provide secure and compliant collaboration as a service to financial services organizations. Founded in 2014 by a consortium of financial institutions, we inherited a monolithic, single-tenant architecture that has served us well so far. As our open platform solidifies its place in the fintech ecosystem, we continue to push the boundaries of interoperability, connect with multiple partners, and expand our portfolio with acquisitions. In executing this mission, our globally distributed engineering team faces the challenge of modernizing our core architecture while maintaining best-in-class services to power mission-critical applications and workflows.

In this first blog of the series, we focus on how a microservices architecture can be choreographed to deliver overall product functionality, enable data sharing, and power data analytics in an enterprise architecture. As software architects, we need to answer these questions:

  • How can we scope and organize services in a microservice architecture so that it scales to a large number of services and keeps pace with organizational change?
  • How can we ease interconnections across the services?
  • Which patterns will help provide operational excellence?

Recognizing that synchronous request/response patterns for communication between microservices can hamper the scalability of a service-oriented architecture, we sought other patterns for scaling our platform.

Enter… data mesh

To begin our architecting exercise, we needed to understand how to group data and functions into cohesive services to support our various product lines, anticipating the need for data sharing. We decided to follow the data mesh approach, a concept developed by ThoughtWorks’ Zhamak Dehghani, starting with Domain Driven Design (DDD).

With our first architecture designs, we quickly realized that we would likely have to deal with dozens of services to provide all the functionality of our product portfolio. As a result, the concept of the bounded context proved critical. A bounded context identifies models that are interrelated because they serve related business needs. We can then identify touchpoints between contexts, which often surface as concepts with similar semantics in adjacent capabilities. Extending this from data models to functions and services, bounded contexts help group functions by context and utility: functions that manipulate the same data and interact frequently, functions that bridge data flows between contexts, and unrelated functions.

From this view (fully acknowledging Conway’s law), it becomes easier to attain organizational alignment even as our organization constantly evolves. It becomes easier to identify team ownership of a set of services and to make organizational decisions without performing major rewrites of our services. We also gain a better understanding of runtime coupling between services, and can start applying governance at the interconnection points.

A simplified, partial decomposition of our DDD model with three identified bounded contexts could be defined as follows for our secure and compliant collaboration services:

Three main bounded contexts stand out: collaboration, identity, and compliance. Concepts like “participant” and “public profile” are linked across the contexts. The model also highlights data sets from other contexts that services require. For instance, the profile discovery and conversations services depend on compliance control rules to authorize access to public profiles or the addition of participants to conversations.
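
As a toy illustration of that linkage (a sketch with purely hypothetical class names, not our actual code), the same user can be modeled independently in the identity and collaboration contexts, with a translation at the touchpoint rather than a shared model:

```java
// Toy sketch: the same real-world user is modeled separately in each bounded context.
public class BoundedContextSketch {

    // Identity context: owns the canonical user and their public profile.
    record PublicProfile(String userId, String displayName, String company) {}

    // Collaboration context: only knows a participant within a conversation.
    record Participant(String userId, String conversationId, String displayName) {}

    // Touchpoint: collaboration translates identity's model at the boundary
    // instead of sharing a single model across contexts.
    static Participant join(PublicProfile profile, String conversationId) {
        return new Participant(profile.userId(), conversationId, profile.displayName());
    }

    public static void main(String[] args) {
        PublicProfile profile = new PublicProfile("u-42", "Alice", "Example Bank");
        System.out.println(join(profile, "conv-7"));
    }
}
```

Keeping the two models separate lets each context evolve its representation independently, with governance applied only at the translation point.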

The more entities, services, and contexts are set up and running, the harder it becomes for development teams to identify and rely on existing assets.

Discoverability

In our architecture group, we initiated registries for all data entities, APIs, and service documentation to provide visibility and standardization throughout the organization. Here is the information we collect from our engineering teams at Symphony, with example standards:

Registry type | Contracts | Addressing | Metrics
Schema | Entity schema – JSON Schema, Protobuf, Avro | Link to APIs or topics | Duplication, latency to data update, cardinality, changes throughput
API | API specifications – OpenAPI, gRPC | Paths, link to service | Response time, throughput, uptime
Service | Not standardized | Domains, routes, clusters | Response time, throughput, uptime

We enrich registries with automated or gated checks and documentation to apply governance across the deliverables from the different teams, provide best practices, ensure consistency, and enforce API or service scopes to maintain separation of concerns.
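
To make the idea of a gated check concrete, here is a minimal, hypothetical sketch (the entry model, field names, and sample entries are illustrative, not our actual registry or tooling): every API entry must link to a registered service before it can be published.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of registry entries and one gated governance check.
public class RegistryCheck {

    enum RegistryType { SCHEMA, API, SERVICE }

    record RegistryEntry(RegistryType type, String name, String contract, String linkedTo) {}

    // Gated check: every API entry must link to a registered service.
    static List<RegistryEntry> apisWithoutService(List<RegistryEntry> entries) {
        Set<String> services = entries.stream()
                .filter(e -> e.type() == RegistryType.SERVICE)
                .map(RegistryEntry::name)
                .collect(Collectors.toSet());
        return entries.stream()
                .filter(e -> e.type() == RegistryType.API && !services.contains(e.linkedTo()))
                .toList();
    }

    public static void main(String[] args) {
        List<RegistryEntry> entries = List.of(
                new RegistryEntry(RegistryType.SERVICE, "profile-discovery", "not standardized", "n/a"),
                new RegistryEntry(RegistryType.API, "public-profile-api", "OpenAPI", "profile-discovery"),
                new RegistryEntry(RegistryType.API, "orphan-api", "OpenAPI", "unknown-service"));
        System.out.println("Failing the gated check: " + apisWithoutService(entries));
    }
}
```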

Service mesh

A data mesh provides an excellent framework for asynchronous communication, but there will still be a need for some synchronous interconnection between services. A service mesh pattern can address this need.

Event mesh

A particular slice of data mesh is about real-time data sharing, providing an excellent framework for an event-driven architecture. We refer to this subdivision as the event mesh.

A key aspect of implementing this pattern is the ability to incorporate data from external sources by building data caches, replicas, or materialized views within bounded contexts. We can achieve this without tight coupling, details of which will be presented in a future article in this series.
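
As a rough sketch of what such a replica can look like (assuming a Kafka-style event backbone; the topic name, configuration, and types below are hypothetical, and the actual implementation is the subject of that future article), the collaboration context could maintain a local materialized view of compliance control rules by consuming the events the compliance context publishes:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the collaboration context keeps a local, read-only replica
// of compliance control rules by consuming the events the compliance context
// publishes, instead of calling it synchronously on every authorization check.
public class ControlRuleReplica {

    // Local materialized view: rule id -> serialized rule payload.
    private final Map<String, String> rulesById = new ConcurrentHashMap<>();

    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumption
        props.put("group.id", "collaboration-control-rule-replica"); // assumption
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("compliance.control-rules"));  // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() == null) {
                        rulesById.remove(record.key());               // tombstone: rule deleted
                    } else {
                        rulesById.put(record.key(), record.value());  // create or update the rule
                    }
                }
            }
        }
    }

    // Authorization checks read the local view; no synchronous call to compliance.
    public boolean isRuleKnown(String ruleId) {
        return rulesById.containsKey(ruleId);
    }
}
```

If the publishing context is unavailable, the replica keeps serving the last known rules, degrading freshness rather than availability; this is the contained blast radius discussed in the tradeoffs below.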

Tradeoffs in mesh approaches

At Symphony, as in most IT or SaaS companies, one of our primary goals is to maintain the highest service levels possible for mission-critical applications. We use the following metrics to measure progress:

  • Uptime SLA / SLO: percentage of time the service is able to process requests successfully over a one-month or one-year period
  • Mean Time to Recovery (MTTR): the mean time required to bring service back into a stable state after an incident
  • Incident severity distribution: a score to measure the severity of an incident’s impact on organizations or individuals
  • Latency: average, p90, p99 time required to process a service request
  • Throughput: number of requests a functioning service can process per unit of time
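
To make these definitions concrete, here is a small, purely illustrative computation (the SLO and latency samples are made up, not our actual figures): a 99.9% uptime SLO over a 30-day month leaves about 43 minutes of allowed downtime, and p90/p99 latencies are read from the sorted sample of request durations.

```java
import java.util.Arrays;

// Purely illustrative numbers: converts an uptime SLO into an allowed-downtime
// budget and computes latency percentiles from a sample of request durations.
public class SloMath {

    public static void main(String[] args) {
        double slo = 0.999;                    // hypothetical 99.9% monthly uptime SLO
        double minutesPerMonth = 30 * 24 * 60; // 43,200 minutes in a 30-day month
        System.out.printf("Allowed downtime: %.1f minutes per month%n",
                (1 - slo) * minutesPerMonth);  // ~43.2 minutes

        long[] latenciesMs = {12, 15, 18, 22, 25, 30, 40, 55, 80, 250}; // sample data
        Arrays.sort(latenciesMs);
        System.out.println("p90 latency: " + percentile(latenciesMs, 0.90) + " ms");
        System.out.println("p99 latency: " + percentile(latenciesMs, 0.99) + " ms");
    }

    // Nearest-rank percentile over an ascending-sorted sample.
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```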

Here is a comparison of service mesh and event mesh, with their impacts on these metrics. Each value is assessed relative to a do-nothing approach.

Metric | Service Mesh | Event Mesh
Uptime SLA | Increased through standardizing exception-handling flows | Greatly increased through contained blast radius
MTTR | Reduced through standardizing exception-handling flows | Greatly reduced through contained blast radius
Incident severity | Reduced through standardizing exception-handling flows | Greatly reduced through contained blast radius
Latency | Significant variance during incidents, low end-to-end latency | Lesser variance, longer end-to-end latency
Throughput | Capped by throughput of slowest service in end-to-end flow | Driven by front APIs throughput only
Key benefit | Minimize end-to-end latency | Full isolation

Conclusion

Through an example and a simplified model, we have introduced the use of DDD and shown how it extends to structuring microservices and governing their interactions. We have also highlighted the impact of two service communication patterns, service mesh and event mesh, on key operational metrics. Event mesh has the more significant positive impact; however, not all communication can be asynchronous. It therefore makes sense to favor the event mesh whenever possible and to rely on the service mesh otherwise.

In our next blog, we will explore where event modeling fits in the software development lifecycle, and detail our practical approach to building a collaborative and governed event catalog.

