
Analytics Design Markup Language (ADML)—a universal framework to build compelling data products.

By Nadav Rayman and Dr James Pearce, February 2022.

Abstract

We introduce Analytics Design Markup Language (ADML), a methodology which encompasses the set of processes required to develop and deliver a successful data product, together with an accompanying markup language based on JSON. ADML is used to capture the context and codify the outcomes, assumptions, data requirements, resources, hypotheses and learnings that comprise the data product, as well as the interrelationships between these components. It relates the data product to its intended and realised business value. The markup language provides a flexible mechanism for defining a schema for the lifecycle of a data product built to address a specific issue, and it ensures the resulting data product aligns with an organisation’s strategic imperatives or operational obligations. It is applicable to a spectrum of data product types, spanning analytical dashboards to machine learning models. ADML provides a common language for the component processes required to build a data product.

Introduction

Decision support systems have been a mainstream technique of exploiting data for business management for over two decades. Over the years these systems have been represented in numerous technology trends: reporting, corporate performance management, business intelligence, data warehousing, analytics, big data, artificial intelligence (AI) to name a few.

The last five years have brought an unprecedented adoption of these systems, as technology has become more affordable and easier to use.

Despite this long history and wide adoption, organisations struggle to co-ordinate activities to define, design and enhance the artifacts of these decision support systems that we will refer to as “data products”.

What is a data product?

We define a data product as a configuration of data that can be consumed to solve a particular problem. A data product embeds business requirements, design thinking and intended outcomes. It is important to note that, in many cases, people start analysis with a data product in mind. We assert that, regardless of whether this is the case, ADML can be applied to refine the definition of the product being developed. In the instances where the analysis is less focused, ADML encourages those who use it to align their designs and artifacts with a defined outcome to be delivered via the data product.

Our definition of a data product includes different types of products that might be developed. Despite the emergence of specific functions in organisations that relate to only one type of data product (such as data science teams, data analytics teams and visualisation teams), all of these can — and should — be unified by the desired outcomes and associated hypotheses of the business problem being addressed.

Data product versus outcome focus

Irrespective of whether the development of a data product starts with the data product itself or with the end outcome in mind, ADML is positioned to ensure the data product is aligned with the outcome. With ADML, data products are developed with the outcome and context of the problem firmly in mind. Equally, by starting from the desired outcome, data products can be developed to ensure that outcome is achieved.

Challenges in building data products

Reflecting on our experiences across a wide variety of industries and organisation sizes, we observe the following consistent themes concerning the challenges of developing data products that are well adopted and contribute business value.

Lack of stakeholder consultation

A key dysfunctional behaviour that we observe often in analytics implementations is the lack of representation of stakeholders in the design process. This can be driven by both:

  • Practitioners who “think they know best” and do not have the patience to consult with a broad range of stakeholders; and
  • Executives and managers who are “too busy to get into the detail” and delegate decisions to other staff without the context they need or without decision-making authority.

At its worst, this results in data products that are not relevant to a business, data products that people refuse to use due to lack of buy-in to the design process or projects that fail to deliver a data product due to a lack of direction.

Lack of lineage and context

Organisations invest large amounts of time and money building a suite of data products that suffer from a lack of adoption, and they continue to produce similar assets over and over again, resulting in a duplication of effort. Even when there is a catalogue in place that allows people to discover these assets, people often complain of a lack of context as follows.

  • They do not understand what business rules have been applied;
  • They do not understand the data sources and lineage, or how the data has been transformed from the source to the asset;
  • The quality limitations of the data are unclear;
  • They do not understand what design assumptions have been made and by whom; and
  • Acceptable uses of the asset are not provided.

A ‘data first’ approach

Often organisations develop data products based on the data that is currently captured by an upstream system, such as a data warehouse or system of record. This approach gives rise to the following issues:

  • A lack of focus or agreement on the goals of the asset, leading to an asset being developed that is of low utility;
  • A vague-at-best data product in mind, resulting in a poorly designed asset that fails to achieve its desired outcomes; and
  • An absence of design for end-use consumption, leading to low adoption.

Lack of data validation

Most people presume that data is a readily available resource, in a form appropriate to answer any business question. The reality is that this is rarely the case. In fact, data has often never been validated for any use beyond retrieval for an operator on a screen. This is possibly the biggest area of misunderstanding by executive stakeholders wanting to extract better analytical insight or even just produce consistent reports.

Typical areas of friction in using data for analytics include:

  • Identification of the specific source of data (i.e. the specific system, table or API endpoint);
  • Enabling source system access (for political or technical reasons);
  • Process compliance issues in capturing data consistently;
  • Data integrity issues between systems; and
  • Lack of business rule consensus to address uncertainty in raw data.

Lack of structure to capture learnings

As a by-product of competing priorities, work spread across multiple teams and a workforce with high levels of turnover, capturing learnings from the use of data products is often deferred or omitted entirely. In doing so, organisations miss out on storing the insight of what did and didn’t work in their corporate memory bank, and are unable to use past experiences to guide future decisions.

Without retaining this knowledge, the feedback loop is impaired and the organisation has a reduced ability to act on:

  • Underlying data readiness issues that need to be resolved before a data product can be developed;
  • Improvements that should be made to a data product;
  • Whether a data product no longer delivers sufficient benefits and should be retired; and
  • How one data product might benefit another.

Lack of requirements and design diligence

In the traditional product development process, design often comes from a grab bag of user requirements gathered by a business analyst from existing users. These users have a bias towards “fixing” the current state and can lack the context required to prioritise appropriately. This can lead to a myopic focus on features over usability and delivering future business value in developing the data product. Some good methodologies have evolved over time, such as CRISP-DM or Microsoft’s Team Data Science Process (TDSP), but they do not apply to all types of data products. Other techniques, such as design thinking, offer some useful inspiration, but are not specifically geared to the needs of data product design to:

  • Define the business imperative;
  • Define the target audience;
  • Define the information model;
  • Validate the data resources available against those needed; and
  • Validate the utility of the data product.

ADML framework

ADML is a framework to manage the design and specification of data analytics assets, which we term “data products”.

It is underpinned by:

  • A design methodology; and
  • A technical schema to capture outputs of the design process.

By design methodology, we mean the processes and procedures to follow.

By technical schema, we mean the standardised data structure to capture the outputs of the design methodology. The markup language adheres to this technical schema.

ADML objectives

The ADML design methodology aims to go further than provide a set of general principles that one might find in a white paper, and instead furnishes a practical set of processes that can be used to predictably create compelling data products. This approach persists the design reasons and context with the data products that are produced, allowing the learnings generated to flow through the asset development cycle as refinements. This representation improves corporate memory and leads to reuse of existing assets and insights, reduction of duplicated effort and increased benefit realisation from assets.

ADML is specifically designed to provide a framework that is agnostic to technology choices, operating models and project management methodologies to manage the specification of data analytics assets. This is to ensure that it can be easily implemented in organisations without conflicting with existing standards.

ADML is intended to be easy to follow; we provide a way to interpret needs, issues, outcomes and available resources into a set of clear features for a data product. It will aid stakeholders’ understanding of the complex domain of analytics and data products. It provides a common set of terminology to facilitate discussions and agreement on matters concerning scope.

Why a markup language?

We developed ADML as a markup language because we wanted to provide a taxonomy that could be implemented in any analytics management ecosystem, and that is descriptive, interoperable and portable. Validation of a markup language is straightforward to automate, and it is easily rendered in tools such as text editors and browsers.
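
To make this concrete, the sketch below parses a hypothetical ADML fragment and automates a basic structural check. The field names (`adml_version`, `data_product`) are illustrative assumptions for the sketch, not the published ADML schema:

```python
import json

# A minimal, illustrative ADML fragment. The field names here are our
# own assumptions, not the published ADML schema.
adml_text = """
{
  "adml_version": "0.1",
  "data_product": {
    "name": "Customer churn dashboard",
    "product_category": "Dashboard",
    "imperatives": ["Maximise customer portfolio value"]
  }
}
"""

REQUIRED_KEYS = {"adml_version", "data_product"}

def validate(document: str) -> dict:
    """Parse an ADML document and check its top-level keys."""
    parsed = json.loads(document)  # malformed JSON raises an error here
    missing = REQUIRED_KEYS - parsed.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    return parsed

doc = validate(adml_text)
```

Because the document is plain JSON, the same check works unchanged in any editor, CI pipeline or catalogue tool that can parse JSON.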

What ADML is not

ADML is not a data modelling methodology. It is not a data governance methodology. It does not prescribe data management techniques.

ADML benefits

There are many benefits of using a structured approach such as ADML to improve the development of data products and address problems using data assets. They include:

  • Developing a reusable library of assets to improve the speed and reduce the cost of development;
  • Capturing the expected contribution of a data product to business value to help with managing resources and giving priority to the right data investments;
  • Linking data lineage to context (i.e., problems, imperatives and business value contribution) so that change processes can be managed in line with their value;
  • Ensuring data products are developed that focus on solving business issues;
  • Embedding the value of data to help organisations realise the value of data assets;
  • Codifying organisational learning and facilitating continuous improvement;
  • Combining business requirements with design thinking to ensure data products deliver value and are aligned with the intended outcome and organisational needs or issues, thereby mitigating unintended consequences of data product use;
  • Linking with complementary methodologies such as CRISP-DM while avoiding lock-in and constraints; and
  • Implementable at any stage during a data asset’s lifecycle by any type of user archetype, reducing risk for new and in-flight projects alike.

Limitations

As with any framework, the adoption of ADML is not a panacea. It will require additional governance, resources, commitment to the rituals and an agreed location in which to store the ADML schema and associated artifacts.

Prerequisites

An organisation is best placed to benefit from adopting ADML when it satisfies one or more of the conditions listed below.

  • There is organisational alignment in the development of data products using cross-functional teams;
  • It is investing in, or considering investing in, data assets;
  • There is a desire to solve strategic, tactical and operational issues using data; and
  • It has an appetite to increase its analytics maturity, improve return on data investment or increase the number of data-informed decisions made.

Methodology overview

We describe the analytics design process underpinning ADML as the “Analytics Heliosphere”: a dynamic, interconnected system that responds to changing business conditions. As the data products at its core mature and generate more “charge”, its magnetic force grows and it extends further out into the organisation.

The Analytics Heliosphere is broken down into the following core components:

  • Four “waves”: themes that are emitted by an organisation and can be harnessed into the design of a data product;
  • Four design “rituals” that solicit a set of attributes to describe the needs of a data product; and
  • The data product that is produced and continuously enhanced as a result of the design process.

[Figure: The Analytics Heliosphere]

Data product overview

At the core of the Analytics Heliosphere is the data product. Specifying and developing it is the central purpose of all design rituals. We define a data product as an analytics asset that is consumed by an end user or as a component of an application that is in turn used by an end user.

In addition to tracking the design ritual outputs against a data product, we recommend capturing the following core metadata against each data product:

| Data product metadata | Example metadata values |
| --- | --- |
| Product category | Dashboard; Report; Alert; Recommendation; What-if tool; Predictive model; Optimisation |
| Publishing mechanism | API; Application embedded; Visualisation portal; Desktop tool; Dataset |
| Approved purpose | Compliance; Operational; Strategic planning; Exploratory |
| Reliability rating | Gold; Silver; Bronze |
| Sensitivity rating | Public; General; Confidential; Highly Confidential |

This provides a richly described catalogue of data products across multiple technology platforms and teams.
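
The vocabularies in the table translate directly into automated checks on catalogue entries. A sketch, assuming an illustrative record layout (the snake_case field names are ours, not prescribed by ADML):

```python
# Controlled vocabularies taken from the metadata table above; the
# record structure itself is an illustrative sketch, not the ADML schema.
METADATA_VALUES = {
    "product_category": {"Dashboard", "Report", "Alert", "Recommendation",
                         "What-if tool", "Predictive model", "Optimisation"},
    "publishing_mechanism": {"API", "Application embedded",
                             "Visualisation portal", "Desktop tool", "Dataset"},
    "approved_purpose": {"Compliance", "Operational",
                         "Strategic planning", "Exploratory"},
    "reliability_rating": {"Gold", "Silver", "Bronze"},
    "sensitivity_rating": {"Public", "General", "Confidential",
                           "Highly Confidential"},
}

def invalid_fields(record: dict) -> list:
    """Return the metadata fields whose values fall outside the vocabulary."""
    return [field for field, allowed in METADATA_VALUES.items()
            if record.get(field) not in allowed]

record = {
    "product_category": "Predictive model",
    "publishing_mechanism": "API",
    "approved_purpose": "Operational",
    "reliability_rating": "Silver",
    "sensitivity_rating": "Confidential",
}
problems = invalid_fields(record)  # empty for a well-formed record
```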

Detailed methodology

To ensure consistency of the design process, we propose a set of “rituals”: a sequence of activities with an array of structured questions to inform the design lifecycle.

It can be very tempting to skip steps to placate stakeholders or to “expedite” an outcome. We urge you to trust the process and the benefits that each stage brings to:

  • Articulate implicit knowledge and assumptions;
  • Bring consensus between a wide variety of stakeholders on priorities and business rules;
  • Consolidate a view of what is needed and what is possible across different parts of the organisation;
  • Anticipate obstacles based on data limitations; and
  • Mitigate wasted time building assets that are not adopted.

Information design

The purpose of this ritual is to define the semantics of a business area, identifying the information needs to support monitoring of performance and descriptive analytics. The primary “wave” that this ritual is concerned with is the business needs that imply measurement.

The four stages (typically run in a facilitated workshop for each process area) guide workshop facilitators and participants through problem definition from different perspectives covering:

  1. Imperative definition;
  2. User information design;
  3. Information mapping; and
  4. Conceptual data model.

1. Imperative definition

The purpose of this stage is to elicit umbrella statements that summarise the business need for data analytics within each prioritised process area (functional areas or capabilities of importance). We assume that a business strategy planning process has already occurred to inform these priorities.

The statements should read as “imperatives”, such as maximise customer acquisition or optimise gross margin. They should be general enough to cover a wide set of information needs but not as general as a functional area such as “marketing” or “sales”. Ideally the imperatives will be cross-functional to best represent how information flows between departments and job functions to fulfil a business outcome.

Arriving at a definitive statement of imperative can take some time, as a group of workshop participants debate semantics, business priorities, relevance as a need of data analytics versus more general technology or process needs and the appropriate level of summarisation. Once these issues are teased out, though, you will find that you have a very constructive set of statements that can be used as overarching “analytic themes” which relate to multiple metrics and data products.

Typical drivers of an imperative include:

| Imperative driver | Description |
| --- | --- |
| Strategic alignment | Requirement to support an organisation’s strategic goal. This may be either the establishment of a new capability or improving an existing capability. |
| Cross-functional integration | Integrating multiple functional areas or teams to fulfil an end-to-end process. |
| Addressing an unsolved problem | Solves a problem of lack of visibility to monitor a business activity. |
| Core business capability | Is known to be integral to the successful operation of an organisation, usually associated with improving the productivity of monitoring a business activity. |

2. User information design

For each imperative, the information needs of the user audience are broken down for each of three archetypes:

| User archetype | Description | Example role | Information needs | Historical systems used |
| --- | --- | --- | --- | --- |
| Sponsor | A stakeholder who is ultimately accountable for an outcome. | Chief Executive Officer | Evaluative information, e.g. KPIs / scorecards | Corporate performance management / executive information systems |
| Optimiser | An agent who is responsible for identifying improvements to achieve a particular outcome. | Manager; Analyst | Suggestive information; driver metrics / dashboards | Business intelligence visualisation tools; statistical packages |
| Implementer | An agent who is responsible for taking action in the business process. Typically acts on instructions or recommendations and is not tasked with reflection nor analysis. | Sales Representative; Field Technician | Instructive information; alerts / reports | Operational reporting |

The workshop session should elicit the “who” (which roles in the organisation) and the “what” (specifics of information needs). It is important to note that this stage is more concerned with the information elements required and not the detail of the design or format.

NOTE: It is recommended to avoid capturing simple “counts” and “aggregates” as these are implicit in the identification of activities and entities in the information mapping stage that follows.

Historically the audiences represented by the archetypes were serviced by distinct systems; this led to the disadvantage of disconnected data. Implicit in the suggested process is to inform the design of a shared “data hub” to allow a seamless flow of information between different decision making roles and navigation of data granularity.

Note that the workshop activity ideally includes representatives of each archetype, to avoid biasing the understanding of business needs towards one particular group of users. It is common to meet resistance from stakeholders asked to commit time to a half-day requirements workshop, but their participation in this critical stage of the design process is a key driver of success and well worth the persistence!

3. Information mapping

Information mapping is a stage of the workshop which contextualises the flow of information, eliciting the nouns and verbs that describe a business process. Not to be confused with formal business process mapping, the purpose is to capture the semantics of a process that needs to be measured. This naturally follows on from user information design, which has identified outcome metrics (KPIs) and driver metrics and places them in the context of particular business activities that are being measured.

A distinction is made between categories of information elements to broadly delineate between:

  • The measurement of a business process; and
  • Descriptors which are categorical data elements (typically used to group or “slice” results).

| Information element category | Information element type | Description | Example value | Example symbol type |
| --- | --- | --- | --- | --- |
| Measurement | Outcome | An evaluative measure of a desired business outcome, typically a KPI. | Revenue per customer versus target | (image) |
| Measurement | Effectiveness measure | The measurement of a business process’s efficiency or effectiveness, contributing to an outcome. | Churn rate | (image) |
| Measurement | Activity | An event or set of events that occur as part of a business process. | Sales pipeline | (image) |
| Descriptor | Entity | An object that participates in or is created as a result of a business process. | Opportunity | (image) |
| Descriptor | Facet | Grouping of aspects or attributes of an organisational entity or activity. | Sales stage | (image) |
| Descriptor | Resource | A person or system that records data. | CRM system | (image) |

Information mapping is best facilitated by a visual map which can be easily read left-to-right. This sometimes means simplifying the connections between elements to keep the map readable. To assist with review after a workshop, it is recommended to use standard symbols to represent each element type.
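
An information map also has a natural machine-readable form: elements typed by category, plus directed links read left to right. The sketch below uses the element examples from the table above; the links are our own illustrative assumptions, and the check confirms every connection refers to a defined element:

```python
# The six element types from the information mapping table.
ELEMENT_TYPES = {"outcome", "effectiveness_measure", "activity",
                 "entity", "facet", "resource"}

# Elements taken from the examples above.
elements = {
    "Revenue per customer vs target": "outcome",
    "Churn rate": "effectiveness_measure",
    "Sales pipeline": "activity",
    "Opportunity": "entity",
    "Sales stage": "facet",
    "CRM system": "resource",
}

# Directed links, read left to right on the visual map (illustrative).
links = [
    ("CRM system", "Sales pipeline"),
    ("Opportunity", "Sales pipeline"),
    ("Sales stage", "Sales pipeline"),
    ("Sales pipeline", "Churn rate"),
    ("Churn rate", "Revenue per customer vs target"),
]

def dangling_links(elements, links):
    """Find links that refer to elements missing from the map."""
    return [(a, b) for a, b in links
            if a not in elements or b not in elements]

assert all(t in ELEMENT_TYPES for t in elements.values())
issues = dangling_links(elements, links)  # empty when the map is consistent
```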

[Figure: Example information map]

4. Conceptual data model

The purpose of this stage is to consolidate the inventory of information elements gathered in a series of workshops. A conceptual data model represents a high-level view of how data should be organised for analytics. The primary objective of the conceptual data model is to establish a common language to describe things in a business.

Note: Earlier we emphasised that ADML is not a data modelling methodology. Although this section discusses the role of conceptual data models, it does not insist on a particular data modelling or architecture approach.

It should represent the following concepts, and the interrelationships between them:

  • Analytic themes;
  • Entities (generally relating to “dimensions” in data models); and
  • Activities (generally relating to “facts” in data models).

Step 1. Initial mapping to conceptual elements

An initial list should be compiled of all the information elements, categorised by information element type, which are then “mapped” to a conceptual item. This is an efficient way to consolidate the raw information element list by:

  • Reducing duplication of similar concepts that were named differently by different stakeholders;
  • Grouping metrics and other attributes with the relevant activity; and
  • Providing lineage of which aspects of the conceptual data model are shared by different analytic themes.

Illustrative example

| Analytic theme | Item name | Information element type | Target conceptual item |
| --- | --- | --- | --- |
| Maximise customer portfolio value | Revenue per customer vs target | Outcome | Pipeline |
| Maximise customer portfolio value | Churn rate | Effectiveness measure | Pipeline |
| Maximise customer portfolio value | Sales pipeline | Activity | Pipeline |
| Maximise customer portfolio value | Opportunity | Entity | Opportunity |
| Maximise customer portfolio value | Sales stage | Facet | Pipeline |
| Maximise customer portfolio value | CRM system | Resource | |
| Maximise customer portfolio value | Sales representative | Entity | Staff |
| Improve customer satisfaction | Days to issue resolution vs target | Activity | Service requests |
| Improve customer satisfaction | Call wait time | Activity | Communications |
| Improve customer satisfaction | Operator | Entity | Staff |
| Improve customer satisfaction | Inbound call | Activity | Communications |

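
The consolidation in this step is essentially a grouping exercise. A sketch over a subset of the rows above, deriving the lineage of which conceptual items are shared across analytic themes:

```python
from collections import defaultdict

# (analytic theme, item name, element type, target conceptual item),
# transcribed from a subset of the illustrative example above.
raw_elements = [
    ("Maximise customer portfolio value", "Revenue per customer vs target", "Outcome", "Pipeline"),
    ("Maximise customer portfolio value", "Churn rate", "Effectiveness measure", "Pipeline"),
    ("Maximise customer portfolio value", "Sales pipeline", "Activity", "Pipeline"),
    ("Maximise customer portfolio value", "Opportunity", "Entity", "Opportunity"),
    ("Maximise customer portfolio value", "Sales representative", "Entity", "Staff"),
    ("Improve customer satisfaction", "Operator", "Entity", "Staff"),
    ("Improve customer satisfaction", "Inbound call", "Activity", "Communications"),
]

# Lineage: which analytic themes share each conceptual item?
themes_by_item = defaultdict(set)
for theme, _, _, item in raw_elements:
    themes_by_item[item].add(theme)

shared_items = sorted(item for item, themes in themes_by_item.items()
                      if len(themes) > 1)
```

Items used by more than one theme (here, Staff) are exactly the shared aspects of the conceptual data model that repay careful definition.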
Step 2. Cross-tabulate entities and activities

This exercise captures glossary definitions of each conceptual element and identifies the nature of the relationship between entities and activities. Rather than simply marking an “x” at the intersection of the matrix, we recommend succinctly describing how the entity and activity are related in words.

A simplified illustrative example

| Entity | Description | Campaign activity | Sales | Inventory | Orders | Work schedule | Delivery schedule |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *(Activity description)* | | A sequence of all inbound and outbound communications and outcomes | A sequence of all sales transactions | A timeseries snapshot of stock levels and movements between locations | A sequence of requests for stock by stores | An allocation of staff to meet forecast capacity requirements by store | A daily assignment of vehicles to fulfil orders |
| Customer | An individual or organisation who has previously made an enquiry or purchased a product. | Approached by; responded to offer from; converted to sale by | Associated with | N/A | N/A | N/A | N/A |
| Store | A location at which a sale is processed. | N/A | Associated with | Checks levels of | Replenishes stock levels with | Generates a weekly | N/A |
| Product | An item that is able to be purchased by a customer. | N/A | Associated with | Quantities represented in | Included in | N/A | N/A |
| Campaign | A promotional activity to a customer. | Generates a series of communications which may result in a contact with a person or a response/conversion | May be associated with | N/A | N/A | N/A | N/A |
| Staff | An individual who is employed directly by the company. | N/A | Paid commission by | N/A | N/A | Rostered to work by the weekly schedule | N/A |
| Vehicle | A member of the company fleet which redistributes stock from warehouses to stores. | N/A | N/A | N/A | Delivers | N/A | Adheres to |
| Warehouse | A location where products are stored in reserve. | N/A | N/A | N/A | Provides stock to fulfil | N/A | N/A |
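
The cross-tabulation can be captured as a nested mapping keyed by entity and activity, holding the relationship description rather than an “x”; absent keys stand in for the N/A cells. A sketch over a fragment of the matrix above:

```python
# A fragment of the entity/activity cross-tabulation above as a nested
# mapping. Cells hold a short relationship description; absent keys
# mean no relationship (the N/A cells).
relationships = {
    "Store": {
        "Inventory": "Checks levels of",
        "Orders": "Replenishes stock levels with",
        "Work schedule": "Generates a weekly",
    },
    "Vehicle": {
        "Orders": "Delivers",
        "Delivery schedule": "Adheres to",
    },
}

def describe(entity: str, activity: str) -> str:
    """Look up how an entity relates to an activity, if at all."""
    return relationships.get(entity, {}).get(activity, "N/A")

store_inventory = describe("Store", "Inventory")
```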

Hypothesis design

The purpose of this ritual is to define the levers and constraints of a particular business issue, identifying the needs to support data-driven intervention, often associated with the practice of statistical analysis, machine learning and prescriptive analytics.

The primary “wave” that this ritual is concerned with is the business issues that the organisation wishes to influence. As opposed to information design, which is concerned with driving visibility and accountability for the “what”, hypothesis design is focused on the “so what”, using the scientific method to build an understanding of causality which can drive improvement in performance. It identifies the actions and interventions that an organisation can take and relates them to the anticipated outcomes that result from making them. In this way, it captures a set of assumptions that can be tested with data.

The two stages (typically run in a facilitated workshop per process area) guide participants through hypothesis refinement, covering:

  1. Issue definition; and
  2. Driver hypothesis mapping.

1. Issue definition

A list of problem statements, or “issues”, is compiled against each “imperative” identified in the information design stage.

For each issue, three questions are answered:

  1. What is the target state outcome?
  2. How would success be measured?
  3. What interventions are proposed to be made?

2. Driver hypothesis mapping

Hypothesis mapping contextualises the relationship between potential intervention decisions, drivers, constraints and the outcomes that are targeted for change. A defining characteristic of a hypothesis is that it can be tested using data and analysis, given the required data can be made available. Even before all the required data is obtained, hypotheses can use assumptions about relationships and their strength to investigate scenarios.

This generates a list of follow-on questions to be answered:

  • What are the business impediments to influence change?
  • Is all the data required actually being collected to investigate the hypothesis?
  • What are the discernible dynamics based on historical data? What controlled experiments may be required to determine the elasticity of particular drivers?

| Information element type | Description | Example value | Example symbol type |
| --- | --- | --- | --- |
| Issue | An existing or potential problem affecting process imperatives and the ability to achieve goals. | Not retaining customers beyond cooling off period | (image) |
| Outcome change | The desired change in an outcome that will result from the issue being successfully addressed. | Increase retention rate by 10% | (image) |
| Driver | The variables that are related to the outcome change. | Competitor price | (image) |
| Intervention | The things we can change to bring about an outcome change. | Price match offer | (image) |

It is important, and often confronting to participants, to be specific in attaching a quantity to the outcome change. This grounds the proposed data product in the reality that a successful result requires effecting a change in the outcome, a concept that has often never been considered prior to the workshop.
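
A driver hypothesis can be captured as a structured record, with a simple check that the outcome change is quantified. The field names below are illustrative, not the ADML schema, and the digit test is a deliberately crude stand-in for a fuller quantification rule:

```python
import re

# An illustrative hypothesis record using the example values above.
hypothesis = {
    "issue": "Not retaining customers beyond cooling off period",
    "outcome_change": "Increase retention rate by 10%",
    "drivers": ["Competitor price"],
    "interventions": ["Price match offer"],
}

def outcome_is_quantified(record: dict) -> bool:
    """Require a number in the outcome change, e.g. '10%' or '5 days'."""
    return bool(re.search(r"\d", record["outcome_change"]))

quantified = outcome_is_quantified(hypothesis)
```

Rejecting unquantified outcome changes at capture time forces the workshop conversation this section describes.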


[Figure: Example hypothesis map]


Data readiness

The purpose of this design ritual is to validate whether data exists in a form that will support the analytics objectives. The primary “wave” that this ritual is concerned with is the resources that capture or record data.

Before a data product can be built, the feasibility of fulfilling stakeholder expectations needs to be validated. Based on the information design and hypothesis design, sources of data need to be identified and tested for their completeness and utility.

It is critical to the success of any analytics initiative that stakeholders take an active interest in the limitations of an organisation’s data resources to support their analytics needs, and understand what compromises may be required to produce a short-term output.

Data sourcing

The purpose of this activity is to capture annotations of raw data elements with the context of their meaning, and a decision of where they map to in the target conceptual model.

Whilst seemingly straightforward, this is typically an extremely challenging exercise! To complete this effectively requires sound system subject matter expertise, particularly of how business processes are defined to populate a system. Often this knowledge is spread across a group of people whose different perspectives need to be reconciled and validated.

The output is best captured as a “catalogue” of data elements, with additional metadata columns that describe what the source data represents, any known issues and what target concept it maps to.
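
Such a catalogue lends itself to a simple tabular structure with a check for unresolved mappings. The source element names below are hypothetical, invented for the sketch:

```python
# Hypothetical catalogue rows: raw element, its meaning, known issues,
# and the target concept it maps to (None while still unresolved).
catalogue = [
    {"source": "crm.opportunity.stage_cd", "meaning": "Sales stage code",
     "issues": "Codes changed meaning in 2019", "target_concept": "Pipeline"},
    {"source": "crm.contact.create_dt", "meaning": "Contact creation date",
     "issues": None, "target_concept": "Customer"},
    {"source": "crm.opportunity.flag_3", "meaning": "Unknown legacy flag",
     "issues": "Meaning disputed between teams", "target_concept": None},
]

def unmapped(rows):
    """List source elements still awaiting a target concept decision."""
    return [row["source"] for row in rows if row["target_concept"] is None]

outstanding = unmapped(catalogue)
```

The list of outstanding elements gives the reconciliation sessions with subject matter experts a concrete agenda.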

Data quality review

Discovering the quality of data is an inevitability of any analytics endeavour.

The intention of highlighting this as a key stage is to focus on communicating the limitations of data in its raw form to stakeholders. This may drive some operational initiatives to address the issues, or at the very least an agreement to modify the scope to a more modest set of objectives that are better supported by the available data. The principle that we emphasise here is to validate the data early in the project lifecycle. This “feasibility” stage should do more than simple profiling; it should test whether source data can be “reshaped” into the constructs outlined in the information and hypothesis design phases.

The purpose here is not to be too specific about the technique to use, but rather suggest a conformed set of evaluation criteria that is easily relatable to different data usage scenarios.

| Data readiness category | Description | Example measurements |
| --- | --- | --- |
| Availability | How easy is it to acquire data in a timely and repeatable manner? | Data extraction window |
| Clarity | Is the meaning of the data well understood? | Data definition supplied; % business rule adherence |
| Coverage | To what extent is data being captured as part of a business process? | % null / blank |
| Consistency | Is there too much variation in how data is being captured? | Value distribution; % abnormalities |
| Authority | Has the best source of data been made available? | Anecdotal stakeholder feedback; attribute duplication across data sets |
| Cross-relatable | Can data be related to other entities and processes reliably? | % orphaned rows in data set comparisons |

For each readiness category a rating should be captured, along with a scenario-specific annotation of any issues or limitations observed.

| Data readiness rating value | Data readiness rating description |
| --- | --- |
| 1 | Inadequate |
| 2 | Limited |
| 3 | Fair |
| 4 | Good |
| 5 | Exceptional |
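As a sketch of how these assessments could be captured in practice, the snippet below models one readiness rating per category with its annotation. The `DataReadiness` record is a hypothetical structure for illustration; the category names and rating scale come from the tables above, while the resource and annotation values are invented.

```python
from dataclasses import dataclass

# Rating scale from the table above: 1 = Inadequate ... 5 = Exceptional
RATING_LABELS = {1: "Inadequate", 2: "Limited", 3: "Fair", 4: "Good", 5: "Exceptional"}

# Readiness categories from the first table above
CATEGORIES = {"Availability", "Clarity", "Coverage",
              "Consistency", "Authority", "Cross-relatable"}

@dataclass
class DataReadiness:
    """One readiness rating for a data resource, with a scenario-specific annotation."""
    resource: str
    category: str
    rating: int          # 1-5, per the rating table
    annotation: str = ""

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown readiness category: {self.category}")
        if self.rating not in RATING_LABELS:
            raise ValueError("rating must be an integer from 1 to 5")

    @property
    def rating_label(self) -> str:
        return RATING_LABELS[self.rating]

# Example: a CRM extract that is hard to obtain on a repeatable schedule
crm_availability = DataReadiness("CRM extract", "Availability", 2,
                                 "Extraction window limited to a monthly batch")
```

Capturing one record per category for each data resource yields the complete readiness assessment for that resource.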


Learning

The purpose of this design ritual is to evaluate the performance of a data product. The primary “wave” with which this ritual is concerned is the set of outcomes that can be related to a data product.

A key reason for this design ritual is to ensure corporate memory of why the data product was developed in the first place, and what challenges were faced and overcome in the past. This knowledge can be used to refine assumptions, update expected benefits and inform future decisions.

It also helps to understand whether the data product continues to provide value in the way that was intended and hence understand when the data product should be adjusted, when it should be redeveloped and when it should be retired.

Learning session

The purpose of a learning session is to conduct a review of a data product to understand its benefits and areas for improvement. The workshop is similar to a post-implementation review, with an important distinction: the aim of the learning session is to collect information in a structured way that can be drawn on easily in future data product iterations.

Each session generates a snapshot in time, ideally stored in a way that accumulates a historical record of all review sessions. The frequency of the review sessions depends on the stability of the data product and the frequency of decisions made with it; an early stage data product will likely require more frequent reviews to ensure its continuous improvement.

| Learning category | Learning category description | Additional questions to prompt responses |
| --- | --- | --- |
| Precedent | Have we done this before? | What was the context? What did we do? |
| History | What happened last time? | What interventions were taken? What were the outcomes? Did the results validate the assumptions? |
| Differences to precedent | How is this different to previous times? | How were our assumptions different to historical assumptions? Can we translate these differences into quantitative associations? |
| Conclusion | What should we do differently? | How are our future interventions informed by the combination of what we have done before, what happened and how this time is different from other times? |
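The accumulation of session snapshots described above can be sketched as follows. The function and field names are hypothetical, and the example responses are invented; the learning categories come from the table above.

```python
import datetime

# Learning categories from the table above
LEARNING_CATEGORIES = ["Precedent", "History", "Differences to precedent", "Conclusion"]

def record_learning_session(history, data_product, responses, session_date=None):
    """Append a timestamped learning snapshot to the accumulated history.

    `responses` maps each learning category to the answers captured in the workshop.
    """
    missing = [c for c in LEARNING_CATEGORIES if c not in responses]
    if missing:
        raise ValueError(f"no response captured for: {missing}")
    snapshot = {
        "dataProduct": data_product,
        "sessionDate": (session_date or datetime.date.today()).isoformat(),
        "responses": responses,
    }
    history.append(snapshot)
    return snapshot

# Usage: snapshots accumulate into a historical record across review sessions
history = []
record_learning_session(history, "Churn model", {
    "Precedent": "Similar retention model built in 2019",
    "History": "Interventions reduced churn by 2%",
    "Differences to precedent": "New pricing tiers introduced since the last build",
    "Conclusion": "Re-weight the price sensitivity drivers",
})
```

Requiring a response for every category keeps each snapshot structurally complete, so future iterations can be compared like-for-like.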

Benefits tracking

The original benefits of a data product are often forgotten with time. The risk is that a data product may be accruing benefits “invisibly”, so that they only become evident (or worse, never do!) when the data product is retired.

As an ongoing activity associated with a data product, the capture of benefits can mitigate this issue. This may be an automated measurement, or a point in time manual review, capturing the following:

| Data product benefit attributes | Example |
| --- | --- |
| Benefit name | Incremental profit |
| Benefit amount | $230,000 |
| Period | 2021-06-01 to 2021-12-31 |

Ideally over time this will depict a time series view of benefits accrued by data product. Data executives will appreciate the value of this to help justify operational expenditure for the maintenance of their portfolio of data assets.
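A minimal sketch of this accumulation, using invented records whose fields follow the attribute table above (the product name and amounts are illustrative):

```python
from collections import defaultdict
from datetime import date

# Benefit records follow the attribute table above: a named benefit,
# an amount and the period it covers (product and values are invented).
benefits = [
    {"dataProduct": "Churn model", "benefitName": "Incremental profit",
     "amount": 230_000, "start": date(2021, 6, 1), "end": date(2021, 12, 31)},
    {"dataProduct": "Churn model", "benefitName": "Incremental profit",
     "amount": 180_000, "start": date(2022, 1, 1), "end": date(2022, 6, 30)},
]

def benefits_by_product(records):
    """Roll recorded benefits up to a per-product total, the accumulated
    view that helps justify ongoing maintenance of the data asset."""
    totals = defaultdict(int)
    for record in records:
        totals[record["dataProduct"]] += record["amount"]
    return dict(totals)
```

Because each record carries its period, the same records can also be plotted as the time series view described above.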

Implementation of ADML schema

To ensure that the details collected through the design process we’ve outlined are easily consolidated in a central hub, we developed the ADML schema. The ADML schema is a conformed data structure defined in JSON format.

Components

To implement the ADML schema you will need the following:

  • A data capture system that collects the data elements in the JSON schema;
  • A process to output the data into the JSON format files; and
  • A system to accumulate the JSON files and display them in a catalogue.
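The third component, accumulating JSON files into a catalogue, can be sketched as below. The function name and file contents are hypothetical; only the idea of gathering ADML JSON files into one collection comes from the text.

```python
import json
import tempfile
from pathlib import Path

def load_catalogue(directory):
    """Accumulate the ADML JSON files found in a directory into one catalogue list."""
    catalogue = []
    for path in sorted(Path(directory).glob("*.json")):
        catalogue.append(json.loads(path.read_text()))
    return catalogue

# Usage: two illustrative ADML files accumulated into a single catalogue
with tempfile.TemporaryDirectory() as d:
    Path(d, "churn.json").write_text(json.dumps({"dataProduct": "Churn model"}))
    Path(d, "sales.json").write_text(json.dumps({"dataProduct": "Sales dashboard"}))
    catalogue = load_catalogue(d)
```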

Schema definition

ADML is defined as a JSON schema that can be implemented in a range of platforms.

The core concepts in the JSON schema are:

  • Objects;
  • Attributes;
  • Relationships; and
  • Bridges.

For a complete technical guide to the JSON schema refer to https://admlguide.github.io/.

Objects

An object is a collection of attributes that represents a concept in the design process (e.g., hypothesis). Custom objects can be created to extend the logical model.

| Name | Description |
| --- | --- |
| Process area | A functional theme, ideally cross-functional. |
| Imperative | An overarching theme for a set of related improvement objectives. |
| Persona | The type of audience for a data product. |
| Measurement | Metrics which measure a business process. |
| Activity | An event that occurs as part of a business process. |
| Entity | An object that participates in or is created as part of a business process. |
| Hypothesis | A theory of how issues may be able to be affected. |
| Issue | An existing or potential problem affecting process imperatives and the ability to achieve goals. |
| Outcome change | The desired change in an outcome that will result from the issue being successfully addressed. |
| Driver | The variables that are related to the outcome change. |
| Intervention | The things we can change to bring about an outcome change. |
| Resource | A person or technology which records data. |
| Data readiness | An assessment of the fitness for purpose of data resources to support an analytics need. |
| Data readiness rating | A reference of rating levels (scaled from 1 to 5). |
| Data product | A configuration of data that can be consumed to solve a particular problem. |
| Catalogue | A configurable set of categorical labels for a data product. |
| Learning | A time series review of a data product’s performance. |
| Data product benefit | A time series recording of benefits associated with a data product. |

Attributes

Attributes are the fields within an object or bridge that correspond to data values within the data file. For example, a Hypothesis object may have attributes such as HypothesisID and HypothesisName. Custom attributes can be added to any object.

For a complete list of attributes refer to https://admlguide.github.io/.

Relationships

Relationships are a logical definition of how objects are connected; for example, an Imperative:ImperativeID joins to Process Area:ImperativeID.

Relationships are always assumed to be one-to-many, and generally use a foreign key relationship.
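The one-to-many foreign key join can be illustrated with the Imperative to Process area example from the text. The row values and the `orphaned` helper are invented for illustration; only the key names follow the example above.

```python
# One-to-many via a foreign key: the text's example joins Imperative:ImperativeID
# to Process Area:ImperativeID (the IDs and names below are invented).
imperatives = [
    {"ImperativeID": "IMP1", "ImperativeName": "Reduce customer churn"},
]
process_areas = [
    {"ProcessAreaID": "PA1", "ProcessAreaName": "Customer retention", "ImperativeID": "IMP1"},
    {"ProcessAreaID": "PA2", "ProcessAreaName": "Billing", "ImperativeID": "IMP9"},
]

def orphaned(children, parents, fk, pk):
    """Return child rows whose foreign key matches no parent, the referential
    check implied by a one-to-many relationship."""
    parent_keys = {parent[pk] for parent in parents}
    return [child for child in children if child[fk] not in parent_keys]
```

A check like this is also one way to compute the “% Orphaned rows” measurement from the data readiness table.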

For a complete list of relationships refer to https://admlguide.github.io/.

Bridges

Bridges provide a physical schema object to resolve many-to-many relationships.
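As a sketch, a bridge can be represented as a list of key pairs. The DataProduct to Persona pairing below is a hypothetical example of a many-to-many relationship; the bridge name, IDs and values are invented.

```python
# A bridge resolves a many-to-many relationship into two one-to-many joins.
# Hypothetical example: data products serving several personas.
personas = {"P1": "Executive", "P2": "Analyst"}

# Bridge rows: one row per (data product, persona) pairing
data_product_persona_bridge = [
    {"DataProductID": "DP1", "PersonaID": "P1"},
    {"DataProductID": "DP1", "PersonaID": "P2"},
    {"DataProductID": "DP2", "PersonaID": "P2"},
]

def personas_for(product_id):
    """Traverse the bridge from a data product to the personas it serves."""
    return [personas[row["PersonaID"]]
            for row in data_product_persona_bridge
            if row["DataProductID"] == product_id]
```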

For a complete list of bridges refer to https://admlguide.github.io/.

Entity relationship diagram

This entity relationship diagram provides a summary of all of the objects in ADML and how they are related.

ADML entity relationship diagram

ADML example

Shown below for illustrative purposes is an excerpt from an ADML JSON file.

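In the same spirit, a minimal ADML-style document can be assembled and serialised with Python’s standard `json` module. The object and attribute names follow the schema tables above; the IDs and values are invented for the example.

```python
import json

# An illustrative ADML-style document: object and attribute names follow the
# schema tables above; the IDs and values are invented for the example.
adml = {
    "hypotheses": [
        {"HypothesisID": "HYP1",
         "HypothesisName": "Discount offers reduce churn among at-risk customers"},
    ],
    "issues": [
        {"IssueID": "ISS1", "IssueName": "Rising customer churn"},
    ],
    "dataProducts": [
        {"DataProductID": "DP1", "DataProductName": "Churn dashboard"},
    ],
}

adml_json = json.dumps(adml, indent=2)
```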