The vast amounts of data collected over the years have given rise to a number of problems, particularly around access to that data. The usual approach of establishing a centralized repository has proved untenable as technology evolves and the number of people seeking access to that repository grows. The outbreak of the COVID-19 pandemic further highlighted the problem and the need for an alternative that offers more democratic data access.
What data mesh is about
Given these challenges, Zhamak Dehghani, a director of technology for IT consultancy firm ThoughtWorks, introduced the idea of a data mesh, a decentralized data architecture that organizes data in domains, which are collections of data organized around a particular business purpose, such as marketing, procurement, or a particular customer segment or region. By organizing data into domains, more ownership is provided to the producers of a given dataset, making these producers responsible for its quality, accessibility, and security.
Another benefit of the data mesh is that it recognizes and respects the differences between operational data and analytical data. Operational data sits in databases behind business capabilities served with microservices, has a transactional nature, keeps the current state, and serves the needs of the applications running the business. Analytical data is a temporal and aggregated view of the facts of the business over time, often modeled to provide retrospective or future-perspective insights; it trains the ML models or feeds the analytical reports. The data mesh lets users work with each of these data types on its own terms without needing to separate the organization, teams, and people who work on them.
A data mesh can help close the insights gap and grease the wheels of innovation, allowing companies to better predict the direction of change and proactively respond to it.
Building a data mesh
The data mesh does not need to be constructed in one fell swoop. In fact, it can be set up one step at a time. For example, a company can start by providing data from an operational data warehouse through a data mesh to feed operational reporting on its production performance. The data product team then works on improving data quality and standardizing data into a harmonized format. Business users are thus able to explore and develop new applications more quickly at the proof-of-concept stage and then scale them to full production.
Centralized standards for data quality, data architecture, and data sovereignty must also be established and adopted by all data product owners. Some companies that already have centralized standards in place can adjust them to reflect the needs of a decentralized data organization. Others start by defining standards for a single data domain, testing them for practical applicability, and improving them as needed. They then roll the standards out in waves to the rest of the organization, alongside training and capability-building sessions that ensure governance is applied consistently across the organization.
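One way such centralized standards might be made checkable is to express them as code that every data product is validated against. The sketch below is only an illustration under stated assumptions: the `DataProduct` and `Standards` classes, their fields, the allowed formats and regions, and the completeness threshold are all hypothetical and are not taken from any particular data mesh tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its dataset."""
    name: str
    domain: str
    owner: str            # accountable data product owner
    data_format: str      # e.g. "parquet"
    storage_region: str   # where the data physically resides
    completeness: float   # share of populated required fields, 0.0 to 1.0


@dataclass
class Standards:
    """Hypothetical central standards; all values here are illustrative."""
    allowed_formats: set = field(default_factory=lambda: {"parquet", "avro"})
    allowed_regions: set = field(default_factory=lambda: {"eu-west", "eu-central"})
    min_completeness: float = 0.95


def check(product: DataProduct, standards: Standards) -> list:
    """Return a list of violations; an empty list means the product complies."""
    violations = []
    if not product.owner:
        violations.append("no data product owner assigned")
    if product.data_format not in standards.allowed_formats:
        violations.append(f"format '{product.data_format}' is not a harmonized format")
    if product.storage_region not in standards.allowed_regions:
        violations.append(f"region '{product.storage_region}' violates data sovereignty rules")
    if product.completeness < standards.min_completeness:
        violations.append(
            f"completeness {product.completeness:.2f} is below {standards.min_completeness:.2f}"
        )
    return violations


standards = Standards()
compliant = DataProduct(
    "production_performance", "manufacturing", "jane.doe", "parquet", "eu-west", 0.98
)
non_compliant = DataProduct("campaign_results", "marketing", "", "csv", "us-east", 0.80)

for product in (compliant, non_compliant):
    problems = check(product, standards)
    status = "OK" if not problems else "; ".join(problems)
    print(f"{product.domain}/{product.name}: {status}")
```

Keeping the rules in one shared `Standards` object while each domain supplies its own `DataProduct` descriptor mirrors the split described above: standards are defined centrally, and compliance is the responsibility of each data product owner.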
In most cases, building a data mesh is a continuum. Leaders must communicate to the organization what the company is trying to achieve and what the road map looks like in terms of timing and capability building. Bringing a data mesh from concept to reality requires managing it as a business transformation, not a technological one. Companies can do so by adopting two important practices:
Put the business in the lead - Stewardship of the data mesh implementation must come from the business, supported by executive sponsors and backed by a formal change-management team. There also needs to be a committed data product owner within the business who is willing to take on the challenge of “selling” data internally to other business users and application teams. In addition, there should be a central data-infrastructure team that can implement “data governance as code” in tools that are not yet fully mature.
Let ROI guide data provisioning - Both centralized and decentralized approaches to data management can be effective. Companies with a modern IT landscape and well-established local data repositories might get more value from exposing data through virtualized links (while still registering it in a central data marketplace or catalog). By contrast, those in the middle of an enterprise resource planning (ERP) transformation or other large IT change might find it better to move toward a central data platform.
Regular dialog helps to sustain long-term efforts, keeping the transition alive and reinforcing its benefits.