Introduction to Data Mesh Architecture

Maja Perusic, Iain Hunter

2024-03-13

Why Data Mesh

In 1988, IBM researchers Barry Devlin and Paul Murphy addressed the growing business need for a centralized data repository by proposing the concept of a Data Warehouse. The Data Warehouse became the dominant architecture for data management within organizations for the next 20 years.

More recently, the advent of cloud computing and the continually falling price of storage led large organizations to generate and retain far more data than traditional data warehouses could handle. This led organizations to create data lakes and, more recently, to combine the two approaches in Data Lakehouses.

Classical data approaches worked for the limited volumes of data we needed to manage in the past, but nowadays it's not uncommon for companies to generate billions of data events per day, far more than people can manage by hand. Companies have tried multiple approaches: larger IT departments, specialized data engineering teams, and IT personnel embedded within each line of business. However, each of these solutions brings its own challenges.

A single centralized data team quickly becomes a bottleneck, because it can't answer every analytical question from every stakeholder at the same time. If instead each department handles its own data, we're back to a decentralized architecture where many valuable insights are expensive to compute, complicated to manage, or lost entirely. Either way, the Data Lake slowly becomes a Data Swamp in which organizations struggle to get insight from the data they generate.

In 2019, Zhamak Dehghani of Thoughtworks proposed a new solution. What if we could have it all: decentralized data, with each subject managed by the department that knows it best, yet easily accessible from a central catalogue?

What is Data Mesh

Data Mesh is an analytical data architecture and operating model where data is treated as a product and owned by teams that most intimately know and consume the data. (Thoughtworks, https://www.thoughtworks.com/what-we-do/data-and-ai/data-mesh)

In today's world data IS a valuable product, and Data Mesh requires that it be handled as such. The architecture proposes that each line of business has its own team that owns its data and uses it in everyday operations, but also publishes it as a data product to a centralized location. From there, it can be rapidly consumed and shared with whoever needs it.

In describing Data Mesh, Dehghani introduces terminology for the key actors required to implement it successfully:


Data Producers – Each line of business that publishes one or more data products. Producers create the data, transform it as needed, and are responsible for its quality before finally publishing it to the centralized data repository.
Data Governance – Data Mesh recognizes the need in large enterprises to ensure data is managed securely and that only cleared individuals can access PII (personally identifiable information). The Data Governance team is responsible for cataloguing data products, managing access policies, and granting access to Data Consumers.
Data Consumers – Any individual or team interested in a published data product, such as Data Science or Business Intelligence teams, or C-level executives looking for insight reports. Consumers refer to the catalogue managed by Data Governance and request access to the data products they are interested in. If Governance approves a request, the consumer gets immediate access, confident that the data is timely and correct.
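To make the interplay between the three actors concrete, here is a minimal in-memory sketch of the flow described above: a producer publishes a product to a catalogue, a consumer requests access, and governance approves the request, with PII products requiring clearance. All class and method names (`Catalogue`, `publish`, `request_access`, `approve`, `read`) are hypothetical and do not come from any real Data Mesh product; a real platform would back this with a metadata store and an identity and access management system.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    domain: str          # owning line of business, e.g. "Credit Cards"
    contains_pii: bool
    rows: list


@dataclass
class Catalogue:
    """Hypothetical central catalogue managed by Data Governance."""
    products: dict = field(default_factory=dict)
    pending: set = field(default_factory=set)   # open (consumer, product) requests
    grants: set = field(default_factory=set)    # approved (consumer, product) pairs

    # Data Producer: publish a product to the central catalogue.
    def publish(self, product: DataProduct) -> None:
        self.products[product.name] = product

    # Data Consumer: ask for access to a catalogued product.
    def request_access(self, consumer: str, product_name: str) -> None:
        if product_name not in self.products:
            raise KeyError(f"unknown data product: {product_name}")
        self.pending.add((consumer, product_name))

    # Data Governance: approve a pending request; PII requires clearance.
    def approve(self, consumer: str, product_name: str, cleared_for_pii: bool = False) -> bool:
        request = (consumer, product_name)
        if request not in self.pending:
            return False
        if self.products[product_name].contains_pii and not cleared_for_pii:
            return False  # request stays pending until clearance is confirmed
        self.pending.discard(request)
        self.grants.add(request)
        return True

    # Data Consumer: read only after access has been granted.
    def read(self, consumer: str, product_name: str) -> list:
        if (consumer, product_name) not in self.grants:
            raise PermissionError(f"{consumer} has no access to {product_name}")
        return self.products[product_name].rows


catalogue = Catalogue()
catalogue.publish(DataProduct("card_transactions", "Credit Cards", contains_pii=True,
                              rows=[{"txn_id": 1, "amount": 42.0}]))
catalogue.request_access("bi_team", "card_transactions")
print(catalogue.approve("bi_team", "card_transactions"))                        # False: PII, no clearance
print(catalogue.approve("bi_team", "card_transactions", cleared_for_pii=True))  # True
print(catalogue.read("bi_team", "card_transactions"))
```

Keeping approval as a separate step from the request mirrors the split of responsibilities: consumers can discover and ask, but only governance can grant.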

If your company already has multiple data teams that sometimes collaborate or share a data source, congratulations: implementing Data Mesh may simply be a case of building on and formalizing these existing structures.

Data Mesh Core Pillars


Data mesh architecture is based on four core pillars:

1 Domain Ownership

Different parts of the system can and should be divided into domains; the simplest way to achieve this is to divide data ownership along lines of business (LOBs). For example, a bank could be divided into Investments, Retail, Credit Cards, and Insurance domains. Data products are then owned by Data Producer teams within each LOB, who guarantee their correctness.
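Domain ownership can be sketched as a registry that maps each domain to exactly one accountable producer team, using the bank example above. The team names and the `<domain>/<product>` identifier convention are hypothetical assumptions for illustration only.

```python
# Hypothetical mapping of a bank's domains (lines of business) to the
# Data Producer team that owns each one.
DOMAIN_OWNERS = {
    "Investments": "investments-data-team",
    "Retail": "retail-data-team",
    "Credit Cards": "cards-data-team",
    "Insurance": "insurance-data-team",
}


def can_publish(team: str, domain: str) -> bool:
    """A team may publish data products only into the domain it owns."""
    return DOMAIN_OWNERS.get(domain) == team


def owner_of(product_id: str) -> str:
    """Resolve the accountable team from an assumed '<domain>/<product>' identifier."""
    domain, _, _ = product_id.partition("/")
    if domain not in DOMAIN_OWNERS:
        raise KeyError(f"no owning team registered for domain {domain!r}")
    return DOMAIN_OWNERS[domain]


print(can_publish("retail-data-team", "Retail"))        # True
print(can_publish("retail-data-team", "Credit Cards"))  # False
print(owner_of("Credit Cards/card_transactions"))       # cards-data-team
```

Because every product resolves to a single owner, questions about data correctness always have a clear team to go to.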

2 Data as a product

When data is published, it needs to be designed as a fully self-contained product. This means the data is cleaned, its syntax and semantics are standardized, and its quality is assured, so that it is easily usable outside its original domain.
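A producer might enforce these requirements with a small contract that every batch must satisfy before publication. The sketch below assumes a hypothetical `retail_customers` product with a hand-written schema; in practice teams often use a dedicated data-quality or schema library, and the specific columns and rules here are illustrative only.

```python
from datetime import date

# Hypothetical contract for a "retail_customers" data product: each column's
# expected Python type, plus the columns that must never be empty.
SCHEMA = {"customer_id": int, "country_code": str, "signup_date": date}
REQUIRED = {"customer_id", "signup_date"}


def standardize(row: dict) -> dict:
    """Example semantic standardization: country codes trimmed and upper-cased."""
    cleaned = dict(row)
    if isinstance(cleaned.get("country_code"), str):
        cleaned["country_code"] = cleaned["country_code"].strip().upper()
    return cleaned


def validate(rows: list) -> list:
    """Return human-readable problems; an empty list means the batch is publishable."""
    problems = []
    for i, row in enumerate(rows):
        unexpected = set(row) - set(SCHEMA)
        if unexpected:
            problems.append(f"row {i}: unexpected columns {sorted(unexpected)}")
        for column, expected_type in SCHEMA.items():
            value = row.get(column)
            if value is None:
                if column in REQUIRED:
                    problems.append(f"row {i}: missing required {column}")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: {column} should be {expected_type.__name__}")
    return problems


raw = [
    {"customer_id": 1, "country_code": " hr ", "signup_date": date(2024, 3, 13)},
    {"customer_id": "2", "country_code": "GB", "signup_date": None},
]
rows = [standardize(r) for r in raw]
print(rows[0]["country_code"])  # HR
print(validate(rows))
```

Running the checks inside the producing domain keeps responsibility for quality with the team that understands the data best.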

3 Self-serve data platform

In a self-serve data platform, data should be easily discoverable in the central data catalogue, and requesting access should follow a standardized process. The platform should also make publishing easy for Data Producers, with strong controls on security and access management.
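Discoverability depends on producers registering useful metadata alongside each product. Here is a minimal sketch of keyword search over such metadata, assuming a hypothetical `CatalogueEntry` record with a domain, owner, description, and tags; real catalogues offer far richer search, and the entries shown are invented examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical metadata a producer registers when publishing."""
    name: str
    domain: str
    owner: str
    description: str
    tags: frozenset


ENTRIES = [
    CatalogueEntry("card_transactions", "Credit Cards", "cards-data-team",
                   "Daily settled card transactions", frozenset({"transactions", "pii"})),
    CatalogueEntry("retail_customers", "Retail", "retail-data-team",
                   "Customer master data for retail banking", frozenset({"customers", "pii"})),
    CatalogueEntry("fund_prices", "Investments", "investments-data-team",
                   "End-of-day fund prices", frozenset({"prices", "market-data"})),
]


def search(keyword: str, domain: Optional[str] = None) -> list:
    """Case-insensitive substring match on name/description, or exact tag match."""
    kw = keyword.lower()
    hits = []
    for entry in ENTRIES:
        if domain is not None and entry.domain != domain:
            continue  # optional filter by owning domain
        text = f"{entry.name} {entry.description}".lower()
        if kw in text or kw in entry.tags:
            hits.append(entry.name)
    return hits


print(search("customer"))                 # ['retail_customers']
print(search("pii"))                      # ['card_transactions', 'retail_customers']
print(search("prices", domain="Retail"))  # []
```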

4 Federated computational governance

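Governance must ensure consistency and alignment with the organization's overall strategy. It is typically made up of representatives of both Producers and Consumers, who must agree on global policies covering interoperability, documentation, security, privacy, and compliance. The "computational" part means these policies are, wherever possible, encoded and enforced automatically by the platform rather than checked by hand.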
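Computational governance is often described as policy as code. The sketch below encodes a few hypothetical global policies, one each for documentation, privacy, and interoperability, as plain functions that run automatically against a product's metadata before it is published. The metadata keys, the allowed formats, and the "restricted" classification are assumptions chosen for illustration, not an agreed standard.

```python
from typing import Optional

# Hypothetical global policies agreed by the federated governance group.
# Each inspects a data product's metadata and returns an error message,
# or None if the product complies.


def has_owner(meta: dict) -> Optional[str]:
    return None if meta.get("owner") else "documentation: no owning team declared"


def has_description(meta: dict) -> Optional[str]:
    return None if meta.get("description") else "documentation: missing description"


def pii_is_classified(meta: dict) -> Optional[str]:
    # Privacy: products with PII columns must be classified as restricted.
    if meta.get("pii_columns") and meta.get("classification") != "restricted":
        return "privacy: PII present but product not classified as restricted"
    return None


def uses_standard_format(meta: dict) -> Optional[str]:
    # Interoperability: only formats agreed by governance are accepted.
    allowed = {"parquet", "csv"}
    fmt = meta.get("format")
    return None if fmt in allowed else f"interoperability: format {fmt!r} not allowed"


GLOBAL_POLICIES = [has_owner, has_description, pii_is_classified, uses_standard_format]


def check(meta: dict) -> list:
    """Run every global policy; an empty list means the product is compliant."""
    return [msg for policy in GLOBAL_POLICIES if (msg := policy(meta)) is not None]


product = {
    "name": "retail_customers",
    "owner": "retail-data-team",
    "description": "Customer master data for retail banking",
    "pii_columns": ["email"],
    "classification": "internal",
    "format": "json",
}
print(check(product))
```

Because the policies are shared code rather than per-team checklists, every domain is held to the same rules while still owning its own products.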

Implementing a Data Mesh Architecture


Figure 1, Datamesh-architecture.com

Please refer to our post "Creating and Managing a Data Mesh in AWS with Lake Formation", where we discuss best practices for constructing a Data Mesh in AWS.

Conclusion

Should you adopt Data Mesh in your organisation? We would argue that its pillars and actors can be adopted successfully by organisations large and small that are looking for a framework for managing their data in a federated way.

Data Mesh is clearly designed for large organisations that struggle to catalogue and provide data products to consumers in a timely manner. By design, you're encouraged to implement Data Mesh along lines of business, which makes it feel like a natural way of organising data initiatives rather than forcing large data programs of work across the business.

The downside at present (as you'll see in our Data Mesh AWS post linked above) is that tooling lags behind theory, so implementing Data Mesh can be technically challenging. However, vendors are recognizing the demand in the market and are responding rapidly. AWS has launched Amazon DataZone, first released as a preview, and it is undoubtedly heavily influenced by the Data Mesh paradigm.

Overall, we feel that the Data Mesh architecture can fit well into large organisations. Traction continues to grow in the market, with companies such as JP Morgan acknowledging its value.

If you would like to know more about Data Mesh or are thinking about implementing one in your business, please reach out. Our experts can help you rapidly get a Data Mesh proof-of-value implemented.
