Understanding Data Products: Viewing Data as an Asset in a Data Mesh Framework

A significant shift is underway in how businesses handle data. Instead of seeing data as a by-product of business processes, organizations that adopt Data Product Thinking treat data as a product in its own right. Driven by the Data Mesh approach, this shift is changing how businesses create, manage, and use their data.


What is Data Product Thinking?

At its core, Data Product Thinking encapsulates the idea that data, like any other product, should be designed, created, and managed to meet the needs of its data consumers. From raw data harvested by data engineers to the sophisticated data products developed and deployed by data product managers and developers, every element in the data lifecycle serves a purpose and brings value to the business.


Moving from Monolithic to Distributed Data Management

This isn’t just about managing databases, data pipelines, or ensuring data quality. It’s about a profound shift in data management, moving away from monolithic data warehouses to a distributed, domain-oriented data mesh architecture. By creating reusable data assets and products that cater to specific business needs, organizations can turn their data into a strategic tool that drives business success and competitive advantage.


Principles of Data Mesh

Central to this new wave of data-focused strategies is the Data Mesh. This innovative approach seeks to redefine how businesses handle their data management, shifting from a centralized model to a more distributed, domain-focused one. Data Mesh is built around four fundamental principles:


1. Domain-Oriented Decentralized Data Ownership and Architecture

The first principle asserts that data ownership should reside with the specific domain teams that best understand and utilize the data. This approach ensures that the teams responsible for the data products are those who are most familiar with the data sources and their value.


2. Data as a Product

Each domain team is responsible for the full lifecycle of their data product, from inception to retirement. This brings about a shift in mindset where data isn’t just a by-product of operations but is considered a standalone product with its own intrinsic value.


3. Self-Serve Data Infrastructure as a Platform

This principle emphasizes that a data infrastructure should be designed to be self-serve for data consumers, data analysts, and data scientists. This ensures the accessibility of data and enables domain teams to manage their data products independently.


4. Federated Computational Governance

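Under a federated governance model, responsibility for data quality, security, and privacy is shared across the domain data teams. The teams agree on common standards, and those standards are applied consistently, and where possible automatically, to every data product in the Data Mesh. This keeps data quality and accountability high without returning to a central gatekeeper.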
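
One way to read the word "computational" in this principle is that shared policies are written as code and evaluated automatically against every domain's data product. The sketch below illustrates that idea only; the policy functions and the metadata keys (`owner`, `contains_pii`, `pii_policy`) are hypothetical and do not come from any particular governance tool.

```python
from typing import Callable, Optional

# A policy inspects a product's metadata and returns an error message,
# or None when the product complies. (Hypothetical convention.)
Policy = Callable[[dict], Optional[str]]


def require_owner(meta: dict) -> Optional[str]:
    # Every data product must name an accountable owner.
    return None if meta.get("owner") else "missing owner"


def require_pii_handling(meta: dict) -> Optional[str]:
    # Products that contain PII must declare how it is handled.
    if meta.get("contains_pii") and not meta.get("pii_policy"):
        return "PII present but no handling policy declared"
    return None


# Policies agreed on by the federation of domain teams.
GLOBAL_POLICIES: list[Policy] = [require_owner, require_pii_handling]


def evaluate(meta: dict, policies: list[Policy] = GLOBAL_POLICIES) -> list[str]:
    """Return the list of policy violations for one data product."""
    return [msg for policy in policies if (msg := policy(meta)) is not None]


if __name__ == "__main__":
    print(evaluate({"owner": "Marketing Domain", "contains_pii": True}))
```

Each domain team still owns its product; the shared rules are simply run against it in the same way everywhere.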


Embracing Data Product Thinking

Central to the Data Mesh approach is the concept of Data Product Thinking. It’s a perspective that redefines the way data teams view, manage, and interact with their data assets. By treating data as a product, organizations can optimize their data management strategies, aligning their data with their business objectives more efficiently and effectively.


What Does it Mean to Treat Data as a Product?

Treating data as a product implies that data isn’t merely an output of operations, but a standalone, valuable asset that can create business value and competitive advantage. This shift in perspective means that data must have defined quality standards, a lifecycle, and a dedicated team for its development and maintenance — namely, the data product team. Each data product is designed to serve the needs of specific data consumers, ensuring that the data is not just available but valuable, usable, and fit for purpose.


Data Products vs. Data as a Product

The terms data product and data as a product may sound similar, but there’s a critical distinction. A data product is often a well-defined output that serves specific use cases, like a report, a dashboard, or a dataset used to train machine learning algorithms. On the other hand, data as a product is a broader concept that encapsulates the entire journey of data — from raw data to a refined, valuable asset.


Why Is Data as a Product Essential in Today’s Data-Driven Landscape?

The concept of data as a product aligns perfectly with today’s data-driven landscape. As businesses become more reliant on data for their decision-making processes, treating data as a valuable asset rather than just a by-product of operations can lead to more meaningful insights and better business decisions.


Traits of Successful Data Products

Successful data products exhibit several key traits:


  • Discoverable: They provide information about their existence, purpose, owner, and key metrics. Discoverability allows data users to confidently search, find, and use the data they need.
  • Understandable: Data users must be able to understand the data product, including its semantics and syntax. Understanding how the data is presented, serialized, and accessed is crucial.
  • Trustworthy: Data users need to confidently know that the data product is truthful. Aspects like timeliness, completeness, data lineage, and operational qualities contribute to the trustworthiness of the data.
  • Addressable: Successful data products provide a unique and permanent address that users can access either programmatically or manually.
  • Interoperable and Composable: Effective data products standardize elements like field types, identifiers, and metadata fields to facilitate interoperability and composability.
  • Natively Accessible: Data products should be natively accessible to various data user personas, allowing data analysts, data scientists, or developers to access and use them with their preferred tools and methods.
  • Valuable on Its Own: A successful data product is valuable as a standalone product, contributing to business growth and customer satisfaction.
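
To make the "Discoverable" and "Addressable" traits above more concrete, here is a minimal in-memory sketch of a catalog that enforces unique addresses and supports keyword search. The `DataProduct` and `Catalog` classes and the `dp://` address scheme are illustrative assumptions, not part of any real catalog product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: str
    purpose: str
    address: str  # unique, stable identifier (hypothetical "dp://" scheme)


class Catalog:
    def __init__(self) -> None:
        self._by_address: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Addressable: an address may refer to only one product.
        if product.address in self._by_address:
            raise ValueError(f"address already in use: {product.address}")
        self._by_address[product.address] = product

    def get(self, address: str) -> DataProduct:
        # Programmatic access by permanent address.
        return self._by_address[address]

    def search(self, keyword: str) -> list[DataProduct]:
        # Discoverable: match the keyword against name and purpose.
        kw = keyword.lower()
        return [
            p for p in self._by_address.values()
            if kw in p.name.lower() or kw in p.purpose.lower()
        ]


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register(DataProduct(
        name="Real-Time Inventory Status",
        owner="Warehouse Operations Team",
        purpose="Near real-time product availability",
        address="dp://supply-chain/inventory-status",
    ))
    print([p.name for p in catalog.search("inventory")])
```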


Evolving Roles: Data Product Managers/Owners

The role of Data Product Managers (DPMs) or Data Product Owners has gained significance in the new data-oriented business environment. These individuals are key figures in developing, managing, and improving data products, playing a crucial part in the interface between domain experts, data scientists, data engineers, and business analysts.


As an integral part of the domain team, DPMs work closely with domain and data experts to transform business needs into data requirements and ensure that these requirements are met. Their goal is to provide data products that are not only compliant with FAIR principles (Findability, Accessibility, Interoperability, and Reusability) but also bring measurable value to the business.


The Lifecycle of Data Products in a Data Mesh Environment

The lifecycle of a data product in a Data Mesh environment begins with its creation, where raw data is transformed into a valuable asset. This process forms the foundation for data-driven decision-making and strategic initiatives.


Creation of Data Products: From Raw Data to Valuable Assets

Creating a data product involves a series of steps, including data collection, preprocessing, and cleaning. During this phase, a data contract is developed that sets out how the data may be used and handled. Once created, the data product is added to an enterprise data product catalog so that the newly available data is easy to discover.


Developing Data Products: Data Pipelines and Dataset Instances

With the transformation of raw data into valuable assets complete, the focus shifts to the development of data products. This entails crafting data pipelines — sequential data processing steps — and generating dataset instances. Each pipeline is custom-built to fulfill particular business objectives.
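
A pipeline in this sense can be sketched as a list of functions applied one after another to a stream of records. The example below is a minimal illustration under assumed names: the step functions and the record fields (`customer_id`, `amount`) are hypothetical, and a production pipeline would normally run on a dedicated processing framework.

```python
from functools import reduce
from typing import Callable, Iterable

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]


def drop_incomplete(records: Iterable[Record]) -> Iterable[Record]:
    # Cleaning step: discard records missing a customer or an amount.
    return (
        r for r in records
        if r.get("customer_id") and r.get("amount") is not None
    )


def normalize_amount(records: Iterable[Record]) -> Iterable[Record]:
    # Preprocessing step: coerce amounts to floats rounded to 2 decimals.
    return ({**r, "amount": round(float(r["amount"]), 2)} for r in records)


def run_pipeline(records: Iterable[Record], steps: list[Step]) -> list[Record]:
    """Apply each step in order and materialize the dataset instance."""
    return list(reduce(lambda data, step: step(data), steps, records))


if __name__ == "__main__":
    raw = [
        {"customer_id": "c1", "amount": "19.999"},
        {"customer_id": None, "amount": "5"},
    ]
    print(run_pipeline(raw, [drop_incomplete, normalize_amount]))
```

Keeping each step a small, independent function makes it straightforward to assemble a different pipeline for a different business objective.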


Data as a Product Examples

To illustrate how a data product looks within a Data Mesh, consider a “Customer Purchase History” dataset from a hypothetical retail company. This data product encompasses valuable information about customer transactions and is a key asset for teams like marketing and sales.


Customer Purchase History - Data Product Catalogue Entry

  • Data Product Name: Customer Purchase History
  • Data Product ID: DPH123
  • Data Owner: Marketing Domain
  • Data Product Manager: Jane Doe (Contact: jane.doe@company.com)
  • SLA: Data refreshed daily at 12:00 AM UTC; 99.5% availability
  • Data Confidentiality: Contains Personally Identifiable Information (PII), handled according to GDPR and company’s privacy policies
  • Data Quality Checks: Completeness, validity, accuracy, consistency, and uniformity checks with each ingestion
  • Description: Historical data of all customer transactions across retail outlets and online platforms. Includes transaction data, payment method, basket size, timestamp, store location, and product details.
  • Technical Information:
    • Data Format: Parquet
    • Data Size: ~500 GB updated daily
    • API Access: Yes
    • Access Endpoint: https://api.company.com/data/dph123
    • Data Dictionary: Attached document

  • Usage: Connect via the provided API endpoint or download directly. Filter data by attributes such as date range, store location, and product category.
  • Related Documentation: Link to API Documentation, Data Dictionary, Usage Guidelines, GDPR compliance details
  • Version: v2.0.3
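
The SLA line in this entry can be turned into checks that a consumer or the platform runs automatically. The sketch below works out how much downtime 99.5% availability allows (about 3.6 hours over a 30-day window) and tests whether the latest refresh satisfies the daily midnight UTC schedule. The function names and the one-hour grace period are assumptions made for illustration; the entry itself does not specify a grace period.

```python
from datetime import datetime, timedelta, timezone

AVAILABILITY_TARGET = 0.995  # taken from the SLA line of the catalogue entry


def allowed_downtime(period: timedelta,
                     target: float = AVAILABILITY_TARGET) -> timedelta:
    """Downtime budget for a given period at the target availability."""
    return period * (1 - target)


def is_fresh(last_refresh: datetime, now: datetime,
             grace: timedelta = timedelta(hours=1)) -> bool:
    """True if the data reflects the most recent due refresh.

    Both datetimes must be timezone-aware. The grace period (an assumption)
    gives the daily job time to finish after midnight UTC.
    """
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    # Inside the grace window, yesterday's refresh is still acceptable.
    due = midnight if now_utc >= midnight + grace else midnight - timedelta(days=1)
    return last_refresh >= due


if __name__ == "__main__":
    print(allowed_downtime(timedelta(days=30)))
    now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
    print(is_fresh(datetime(2024, 1, 2, 0, 5, tzinfo=timezone.utc), now))
```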


Real-Time Inventory Status - Data Product Catalogue Entry

  • Title: Real-Time Inventory Status
  • Description: Provides near real-time updates about product availability in stores or warehouses. Used for tracking product availability and making data-driven decisions.
  • Domain: Supply Chain and Inventory Management
  • Domain Team: Warehouse Operations Team
  • Data Product Manager: Jane Doe
  • Data Steward: John Smith
  • Data Source(s): Warehouse Management Systems, In-store Point of Sales Systems
  • Technical Information:
    • Data Contract: Agreed format of inventory status messages including Product ID, Store ID, Current Quantity, Last Updated Timestamp
    • Data Platform: Apache Kafka with connectors for sourcing data, and Kafka Streams or KSQL for processing it
    • Data Frequency: Near real-time, updates with every inventory change event
    • Data Quality Metrics: Freshness, accuracy metrics

  • Usage: Consumed by multiple teams for monitoring availability, planning campaigns, and inventory management. Links to business and technical documentation provided.
  • Access and Security: Requires appropriate authentication and authorization. Adheres to GDPR regulations and privacy laws.
  • Data Discoverability: Discoverable in the central self-serve data platform's catalog
  • Version: v1.0.0
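
The data contract in this entry lists four fields for each inventory status message. A consumer could check incoming messages against that contract and compute the freshness metric roughly as follows. This is a sketch under assumptions: the key names (`product_id`, `store_id`, `current_quantity`, `last_updated`), the field types, and the ISO 8601 timestamp format are hypothetical choices, and the code validates plain dictionaries rather than reading from Kafka.

```python
from datetime import datetime, timezone

# Field names and types are assumed; the entry only names the four fields.
REQUIRED_FIELDS = {
    "product_id": str,
    "store_id": str,
    "current_quantity": int,
    "last_updated": str,  # ISO 8601 with offset, e.g. 2024-01-01T12:00:00+00:00
}


def validate(message: dict) -> list[str]:
    """Return contract violations for one inventory status message."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected):
            errors.append(f"wrong type for {field}")
    quantity = message.get("current_quantity")
    if isinstance(quantity, int) and quantity < 0:
        errors.append("current_quantity must be non-negative")
    return errors


def freshness_seconds(message: dict, now: datetime) -> float:
    """Age of the message in seconds, a simple freshness metric."""
    updated = datetime.fromisoformat(message["last_updated"])
    return (now - updated).total_seconds()


if __name__ == "__main__":
    msg = {
        "product_id": "P-1",
        "store_id": "S-9",
        "current_quantity": 4,
        "last_updated": "2024-01-01T12:00:00+00:00",
    }
    print(validate(msg))
    print(freshness_seconds(msg, datetime.now(timezone.utc)))
```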


The Future of Data Management with Data as a Product

As digital transformation continues, Data as a Product is emerging as a powerful paradigm. It represents a significant shift from the traditional, monolithic approach to data management, giving organizations the ability to scale and adapt quickly in a data-centric business environment. By adopting a decentralized, product-oriented model, the data mesh architecture makes it possible to treat data as valuable, standalone products that serve specific business needs, are owned by domain teams, are delivered through self-serve data platforms, and are governed under a federated model.


With the application of data product thinking, your organization can embrace a more agile, robust, and efficient way of leveraging data. It paves the way for a future where every stakeholder can discover, understand, trust, and use data autonomously to drive actionable insights and impactful results.


Transitioning towards a Data as a Product mindset may require rethinking your current data strategies and structures. If you’re considering this shift, our data engineering experts are ready to guide your journey. With deep experience in data product management and data mesh implementation, we can help you craft and execute a strategy tailored to your organization’s unique requirements.


Contact our data engineering experts to explore more about how your organization can benefit from this approach. The future of data management is here, and it’s more promising than ever.


FAQs

What is meant by "Data as a Product"?


"Data as a Product" is a concept where data is treated as a standalone, valuable asset rather than just an output of business operations. It requires the data to be self-describing, discoverable, secure, and trustworthy.


How does the Data as a Product approach benefit businesses?


This approach benefits businesses by making data easier to find, trust, and reuse. It promotes interoperability, domain ownership, self-serve access, and federated governance, making it easier for different teams to use the data.


What is the role of a Data Product Manager in a Data Mesh architecture?


In a Data Mesh architecture, a Data Product Manager acts as a bridge between data and domain experts, guiding the development and usage of data products. They are part of the domain team and have an intimate understanding of the product and its associated data.