Analytics Design Markup Language (ADML)—a universal framework to build compelling data products.
By Nadav Rayman and Dr James Pearce, February 2022.
Abstract
We introduce Analytics Design Markup Language (ADML), a methodology which encompasses the set of processes required to develop and deliver a successful data product and an accompanying markup language based on JSON. ADML is used to capture the context and codify the outcomes, assumptions, data requirements, resources, hypotheses and learning that comprise the data product, as well as the interrelationships between these components. It relates the data product to its intended and realised business benefit value. The markup language provides a flexible mechanism for defining a schema for the lifecycle of a data product built to address a specific issue and it ensures the resulting data product aligns with an organisation’s strategic imperatives or operational obligations. It is applicable for a spectrum of data product types spanning analytical dashboards to machine learning models. ADML provides a common language for the component processes required to build a data product.
Introduction
Decision support systems have been a mainstream technique of exploiting data for business management for over two decades. Over the years these systems have been represented in numerous technology trends: reporting, corporate performance management, business intelligence, data warehousing, analytics, big data, artificial intelligence (AI) to name a few.
The last five years have brought an unprecedented adoption of these systems, as technology has become more affordable and easier to use.
Despite this long history and wide adoption, organisations struggle to co-ordinate activities to define, design and enhance the artifacts of these decision support systems that we will refer to as “data products”.
What is a data product?
We define a data product as a configuration of data that can be consumed to solve a particular problem. A data product embeds business requirements, design thinking and intended outcomes. It is important to note that, in many cases, people start analysis with a data product in mind. We assert that, regardless of whether this is the case, ADML can be applied to refine the definition of the product being developed. In the instances where the analysis is less focused, ADML encourages those who use it to align their designs and artifacts with a defined outcome to be delivered via the data product.
Our definition of a data product includes different types of products that might be developed. Despite the emergence of specific functions in organisations that relate to only one type of data product (such as data science teams, data analytics teams and visualisation teams), all of these can — and should — be unified by the desired outcomes and associated hypotheses of the business problem being addressed.
Data product versus outcome focus
Irrespective of whether the development of a data product starts with the data product itself or with the end outcome in mind, ADML is positioned to ensure the data product is aligned with the outcome. With ADML, data products are developed with the outcome and context of the problem firmly in mind. Equally, by starting from the desired outcome, data products can be developed to ensure the outcome is achieved.
Challenges in building data products
Reflecting on our experiences across a wide variety of industries and organisation sizes, we observe the following consistent themes concerning the challenges of developing data products that are well adopted and contribute business value.
Lack of stakeholder consultation
A key dysfunctional behaviour that we observe often in analytics implementations is the lack of representation of stakeholders in the design process. This can be driven by both:
- Practitioners who “think they know best” and do not have the patience to consult with a broad range of stakeholders; and
- Executives and managers who are “too busy to get into the detail” and delegate decisions to other staff without the context they need or without decision-making authority.
At its worst, this results in data products that are not relevant to a business, data products that people refuse to use due to lack of buy-in to the design process or projects that fail to deliver a data product due to a lack of direction.
Lack of lineage and context
Organisations invest large amounts of time and money building a suite of data products that suffer from a lack of adoption, and they continue to produce similar assets over and over again, resulting in a duplication of effort. Even when there is a catalogue in place that allows people to discover these assets, people often complain of a lack of context as follows.
- They do not understand what business rules have been applied;
- They do not understand the data sources and lineage, or how the data has been transformed from the source to the asset;
- The quality limitations of the data are unclear;
- They do not understand what design assumptions have been made and by whom; and
- Acceptable uses of the asset are not provided.
A ‘data first’ approach
Often organisations develop data products based on the data that is currently captured by an upstream system, such as a data warehouse or system of record. This approach gives rise to the following issues:
- A lack of focus or agreement on the goals of the asset, leading to an asset being developed that is of low utility;
- A vague-at-best data product in mind, resulting in a poorly designed asset that fails to achieve its desired outcomes; and
- An absence of design for end-use consumption, leading to low adoption.
Lack of data validation
Most people presume that data is a readily available resource, in a form appropriate to answer any business question. The reality is that this is rarely the case. In fact, data has often never been validated for any use beyond retrieval by an operator on a screen. This is possibly the biggest area of misunderstanding by executive stakeholders wanting to extract better analytical insight or even just produce consistent reports.
Typical areas of friction in using data for analytics include:
- Identification of the specific source of data (i.e. the specific system, table or API endpoint);
- Enabling source system access (for political or technical reasons);
- Process compliance issues in capturing data consistently;
- Data integrity issues between systems; and
- Lack of business rule consensus to address uncertainty in raw data.
Lack of structure to capture learnings
As a by-product of competing priorities, work spread across multiple teams and a workforce with high levels of turnover, capturing learnings from the use of data products is often deferred or omitted entirely. In doing so, organisations miss out on storing the insight of what did and did not work in their corporate memory, and are unable to use past experiences to guide future decisions.
Without retaining this knowledge, the feedback loop is impaired, reducing the organisation's ability to act on:
- Underlying data readiness issues that need to be resolved before a data product can be developed;
- Improvements that should be made to a data product;
- Whether a data product no longer delivers sufficient benefits and should be retired; and
- How one data product might benefit another.
Lack of requirements and design diligence
In the traditional product development process, design often comes from a grab bag of user requirements gathered by a business analyst from existing users. These users have a bias towards “fixing” the current state and can lack the context required to prioritise appropriately. This can lead to a myopic focus on features over usability and future business value. Some good methodologies have evolved over time, such as CRISP-DM or Microsoft’s Team Data Science Process (TDSP), but they do not apply to all types of data products. Other techniques, such as design thinking, offer some useful inspiration, but are not specifically geared to the needs of data product design to:
- Define the business imperative;
- Define the target audience;
- Define the information model;
- Validate the data resources available against those needed; and
- Validate the utility of the data product.
ADML framework
ADML is a framework to manage the design and specification of data analytics assets, which we term “data products”.
It is underpinned by:
- A design methodology; and
- A technical schema to capture outputs of the design process.
By design methodology, we mean the processes and procedures to follow.
By technical schema, we mean the standardised data structure to capture the outputs of the design methodology. The markup language adheres to this technical schema.
ADML objectives
The ADML design methodology aims to go further than providing the set of general principles one might find in a white paper; instead, it furnishes a practical set of processes that can be used to predictably create compelling data products. This approach persists the design reasons and context alongside the data products that are produced, allowing the learnings generated to flow through the asset development cycle as refinements. This representation improves corporate memory and leads to reuse of existing assets and insights, reduced duplication of effort and increased benefit realisation from assets.
ADML is specifically designed to provide a framework that is agnostic to technology choices, operating models and project management methodologies to manage the specification of data analytics assets. This is to ensure that it can be easily implemented in organisations without conflicting with existing standards.
ADML is intended to be easy to follow; we provide a way to interpret needs, issues, outcomes and available resources into a set of clear features for a data product. It will aid stakeholders’ understanding of the complex domain of analytics and data products. It provides a common set of terminology to facilitate discussions and agreement on matters concerning scope.
Why a markup language?
We developed ADML as a markup language because we wanted to provide a taxonomy that could be implemented in any analytics management ecosystem, and that is descriptive, interoperable and portable. Validation of a markup language is straightforward to automate, and it is easily rendered in tools such as text editors and browsers.
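As an illustration, the sketch below shows what a fragment of a JSON-based ADML document and an automated check over it might look like. The field names (`data_product`, `outcomes`, `assumptions`, `data_requirements`) are our own illustrative assumptions rather than the actual ADML schema; Python is used here purely as the validation harness.

```python
import json

# A minimal, hypothetical ADML document. The field names below are
# illustrative only -- the ADML specification defines the actual schema.
adml_doc = json.loads("""
{
  "data_product": {
    "name": "Customer churn dashboard",
    "imperative": "Maximise customer portfolio value",
    "outcomes": ["Increase retention rate"],
    "assumptions": ["CRM data is refreshed daily"],
    "data_requirements": ["CRM system"]
  }
}
""")

REQUIRED_KEYS = {"name", "imperative", "outcomes", "assumptions", "data_requirements"}

def validate(doc: dict) -> list[str]:
    """Return a list of validation errors (empty if the document is valid)."""
    product = doc.get("data_product", {})
    return [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - product.keys())]

errors = validate(adml_doc)
```

A check of this shape can run in a CI pipeline or catalogue ingestion step, which is what makes the markup-language form interoperable and easy to automate.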
What ADML is not
ADML is not a data modelling methodology. It is not a data governance methodology. It does not prescribe data management techniques.
ADML benefits
There are many benefits of using a structured approach such as ADML to improve the development of data products and address problems using data assets. They include:
- Developing a reusable library of assets to improve the speed and reduce the cost of development;
- Capturing the expected contribution of a data product to business value to help with managing resources and giving priority to the right data investments;
- Linking data lineage to context (i.e., problems, imperatives and business value contribution) so that change processes can be managed in line with their value;
- Ensuring data products are developed that focus on solving business issues;
- Embedding the value of data to help organisations realise the value of data assets;
- Codifying organisational learning and facilitating continuous improvement;
- Combining business requirements with design thinking to ensure data products deliver value and are aligned with the intended outcome and organisational needs or issues, thereby mitigating unintended consequences of data product use;
- Linking with complementary methodologies such as CRISP-DM while avoiding lock-in and constraints; and
- Being implementable at any stage during a data asset’s lifecycle by any user archetype, reducing risk for new and in-flight projects alike.
Limitations
As with any framework, the adoption of ADML is not a panacea. It will require additional governance, resources, commitment to the rituals and an agreed location in which to store the ADML schema and associated artifacts.
Prerequisites
An organisation will be best placed to benefit from adopting ADML when it satisfies one or more of the conditions listed below.
- There is organisational alignment in the development of data products using cross-functional teams;
- It is investing in, or considering investing in, data assets;
- There is a desire to solve strategic, tactical and operational issues using data; and
- It has an appetite to increase its analytics maturity, improve return on data investment or increase the number of data-informed decisions made.
Methodology overview
We describe the analytics design process underpinning ADML as the “Analytics Heliosphere”. The Analytics Heliosphere is a dynamic, interconnected system that responds to changing business conditions. As the data products at its core mature and generate more “charge”, its magnetic force grows and extends further into the organisation.
The Analytics Heliosphere is broken down into the following core components:
- Four “waves”: themes that are emitted by an organisation and can be harnessed into the design of a data product;
- Four design “rituals” that solicit a set of attributes to describe the needs of a data product; and
- The data product that is produced and continuously enhanced as a result of the design process.
Figure: The Analytics Heliosphere
Data product overview
At the core of the Analytics Heliosphere is the data product. Specifying and developing it is the central purpose of all design rituals. We define a data product as an analytics asset that is consumed by an end user or as a component of an application that is in turn used by an end user.
In addition to tracking the design ritual outputs against a data product, we recommend capturing the following core metadata against each data product:
Data product metadata | Example metadata values |
---|---|
Product category | Dashboard; Report; Alert; Recommendation; What-if tool; Predictive model; Optimisation |
Publishing mechanism | API; Application embedded; Visualisation portal; Desktop tool; Dataset |
Approved purpose | Compliance; Operational; Strategic planning; Exploratory |
Reliability rating | Gold; Silver; Bronze |
Sensitivity rating | Public; General; Confidential; Highly Confidential |
This will provide for a richly described catalogue of data products across multiple technology platforms and teams.
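To make this concrete, the sketch below validates a hypothetical catalogue entry against the example metadata values in the table above. The snake_case field names are assumptions for illustration, not part of ADML.

```python
# Allowed values taken from the example metadata table above.
ALLOWED = {
    "product_category": {"Dashboard", "Report", "Alert", "Recommendation",
                         "What-if tool", "Predictive model", "Optimisation"},
    "publishing_mechanism": {"API", "Application embedded", "Visualisation portal",
                             "Desktop tool", "Dataset"},
    "approved_purpose": {"Compliance", "Operational", "Strategic planning", "Exploratory"},
    "reliability_rating": {"Gold", "Silver", "Bronze"},
    "sensitivity_rating": {"Public", "General", "Confidential", "Highly Confidential"},
}

# A hypothetical catalogue entry for one data product.
entry = {
    "product_category": "Dashboard",
    "publishing_mechanism": "Visualisation portal",
    "approved_purpose": "Operational",
    "reliability_rating": "Silver",
    "sensitivity_rating": "General",
}

def invalid_fields(record: dict) -> list[str]:
    """List metadata fields whose value is missing or outside the allowed set."""
    return [f for f, allowed in ALLOWED.items() if record.get(f) not in allowed]

problems = invalid_fields(entry)
```

Constraining the metadata to controlled vocabularies like this is what allows the catalogue to be searched and compared consistently across technology platforms and teams.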
Detailed methodology
To ensure consistency of the design process, we propose a set of “rituals”: a sequence of activities with an array of structured questions to inform the design lifecycle.
It can be very tempting to skip steps to placate stakeholders or to “expedite” an outcome. We urge you to trust the process and the benefits that each stage brings to:
- Articulate implicit knowledge and assumptions;
- Bring consensus between a wide variety of stakeholders on priorities and business rules;
- Consolidate a view of what is needed and what is possible across different parts of the organisation;
- Anticipate obstacles based on data limitations; and
- Avoid wasting time building assets that are not adopted.
Information design
The purpose of this ritual is to define the semantics of a business area, identifying the information needs to support monitoring of performance and descriptive analytics. The primary “wave” that this ritual is concerned with is the business needs that imply measurement.
The four stages (typically run in a facilitated workshop for each process area) guide workshop facilitators and participants through problem definition from different perspectives covering:
- Imperative definition;
- User information design;
- Information mapping; and
- Conceptual data model.
1. Imperative definition
The purpose of this stage is to elicit umbrella statements that summarise the business need for data analytics within each prioritised process area (functional areas or capabilities of importance). We assume that a business strategy planning process has already occurred to inform these priorities.
The statements should read as “imperatives”, such as maximise customer acquisition or optimise gross margin. They should be general enough to cover a wide set of information needs but not as general as a functional area such as “marketing” or “sales”. Ideally the imperatives will be cross-functional to best represent how information flows between departments and job functions to fulfil a business outcome.
Arriving at a definitive statement of imperative can take some time, as workshop participants debate semantics, business priorities, the appropriate level of summarisation, and whether a need is genuinely one for data analytics rather than a more general technology or process need. Once these issues are teased out, though, you will find that you have a very constructive set of statements that can be used as overarching “analytic themes”, each relating to multiple metrics and data products.
Typical drivers of an imperative include:
Imperative driver | Description |
---|---|
Strategic alignment | Requirement to support an organisation’s strategic goal. This may be either the establishment of a new capability or improving an existing capability. |
Cross-functional integration | Integrating multiple functional areas or teams to fulfil an end-to-end process. |
Addressing an unsolved problem | Solves a problem of lack of visibility to monitor a business activity. |
Core business capability | Is known to be integral to the successful operation of an organisation, usually associated with improving productivity of monitoring a business activity. |
2. User information design
For each imperative, the user audience information needs are broken down for each of three archetypes:
User archetype | Description | Example role | Information needs | Historical systems used |
---|---|---|---|---|
Sponsor | A stakeholder who is ultimately accountable for an outcome. | Chief Executive Officer | Evaluative information, e.g., KPIs / scorecards | Corporate performance management / executive information systems |
Optimiser | An agent who is responsible for identifying improvements to achieve a particular outcome. | Manager; Analyst | Suggestive information, e.g., driver metrics / dashboards | Business intelligence visualisation tools; statistical packages |
Implementer | An agent who is responsible for taking action in the business process. Typically acts on instructions or recommendations and is not tasked with reflection nor analysis. | Sales Representative; Field Technician | Instructive information, e.g., alerts / reports | Operational reporting |
The workshop session should elicit the “who” (which roles in the organisation) and the “what” (specifics of information needs). It is important to note that this stage is more concerned with the information elements required and not the detail of the design or format.
NOTE: It is recommended to avoid capturing simple “counts” and “aggregates” as these are implicit in the identification of activities and entities in the information mapping stage that follows.
Historically the audiences represented by the archetypes were serviced by distinct systems; this led to the disadvantage of disconnected data. Implicit in the suggested process is to inform the design of a shared “data hub” to allow a seamless flow of information between different decision making roles and navigation of data granularity.
Note that the workshop activity ideally includes representatives of each archetype, to avoid biasing the design towards the understanding of business needs held by one particular group of users. It is common to meet resistance from stakeholders asked to commit time to a half-day requirements workshop, but ensuring their participation in this critical stage of the design process is a key driver of success and well worth the persistence!
3. Information mapping
Information mapping is a stage of the workshop which contextualises the flow of information, eliciting the nouns and verbs that describe a business process. Not to be confused with formal business process mapping, its purpose is to capture the semantics of a process that needs to be measured. It naturally follows on from user information design, which identified outcome metrics (KPIs) and driver metrics; information mapping places them in the context of the particular business activities being measured.
A distinction is made between categories of information elements to broadly delineate between:
- The measurement of a business process; and
- Descriptors which are categorical data elements (typically used to group or “slice” results).
Information element category | Information element type | Description | Example value | Example symbol type |
---|---|---|---|---|
Measurement | Outcome | An evaluative measure of a desired business outcome, typically a KPI. | Revenue per customer versus target | |
Measurement | Effectiveness measure | The measurement of a business process’s efficiency or effectiveness, contributing to an outcome. | Churn rate | |
Measurement | Activity | An event or set of events that occur as part of a business process. | Sales pipeline | |
Descriptor | Entity | An object that participates in or is created as a result of a business process. | Opportunity | |
Descriptor | Facet | Grouping of aspects or attributes of an organisational entity or activity. | Sales stage | |
Descriptor | Resource | A person or system that records data. | CRM system |
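A rough sketch of how the elements of an information map could be captured programmatically, using the example values from the table above. The link structure and its direction are illustrative assumptions, not prescribed by ADML.

```python
# Information elements typed by the categories in the table above.
elements = {
    "Revenue per customer versus target": "Outcome",
    "Churn rate": "Effectiveness measure",
    "Sales pipeline": "Activity",
    "Opportunity": "Entity",
    "Sales stage": "Facet",
    "CRM system": "Resource",
}

# Directed links, read left-to-right like the visual map (assumed edges).
links = [
    ("CRM system", "Sales pipeline"),        # resource records the activity
    ("Opportunity", "Sales pipeline"),       # entity participates in the activity
    ("Sales stage", "Sales pipeline"),       # facet slices the activity
    ("Sales pipeline", "Churn rate"),        # activity feeds the effectiveness measure
    ("Churn rate", "Revenue per customer versus target"),  # measure drives the outcome
]

def measurements() -> set[str]:
    """Elements in the 'Measurement' category (per the table above)."""
    return {name for name, kind in elements.items()
            if kind in {"Outcome", "Effectiveness measure", "Activity"}}
```

Capturing the map as data rather than only as a drawing makes it easy to check, for example, that every measurement is connected to at least one descriptor.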
Information mapping is best facilitated by a visual map which can be easily read left-to-right. This sometimes means simplifying the connections between elements to keep the map readable. To assist with review after a workshop, it is recommended to use standard symbols to represent each element type.
Figure: Example Information Map
4. Conceptual data model
The purpose of this stage is to consolidate the inventory of information elements gathered in a series of workshops. A conceptual data model represents a high-level view of how data should be organised for analytics. Its primary objective is to establish a common language for describing things in the business.
Note: Earlier we emphasised that ADML is not a data modelling methodology. Although this section discusses the role of conceptual data models, it does not insist on a particular data modelling or architecture approach.
It should represent the following concepts, and the interrelationships between them:
- Analytic themes;
- Entities (generally relating to “dimensions” in data models); and
- Activities (generally relating to “facts” in data models).
Step 1. Initial mapping to conceptual elements
An initial list should be compiled of all the information elements, categorised by information element type, which are then “mapped” to a conceptual item. This is an efficient way to consolidate the raw information element list by:
- Reducing duplication of similar concepts that were named differently by different stakeholders;
- Grouping metrics and other attributes with the relevant activity; and
- Providing lineage of which aspects of the conceptual data model are shared by different analytic themes.
Illustrative example
Analytic theme | Item name | Information element type | Target conceptual item |
---|---|---|---|
Maximise customer portfolio value | Revenue per customer vs target | Outcome | Pipeline |
Maximise customer portfolio value | Churn rate | Effectiveness measure | Pipeline |
Maximise customer portfolio value | Sales pipeline | Activity | Pipeline |
Maximise customer portfolio value | Opportunity | Entity | Opportunity |
Maximise customer portfolio value | Sales stage | Facet | Pipeline |
Maximise customer portfolio value | CRM system | Resource | — |
Maximise customer portfolio value | Sales representative | Entity | Staff |
Improve customer satisfaction | Days to issue resolution vs target | Outcome | Service requests |
Improve customer satisfaction | Call wait time | Activity | Communications |
Improve customer satisfaction | Operator | Entity | Staff |
Improve customer satisfaction | Inbound call | Activity | Communications |
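The consolidation in this step can be sketched in code: from mapping rows like those above we can derive which conceptual items are shared across analytic themes, giving the lineage of shared model aspects. The tuple structure is an illustrative assumption.

```python
from collections import defaultdict

# Rows from the illustrative mapping: (analytic theme, item, element type, target item).
rows = [
    ("Maximise customer portfolio value", "Revenue per customer vs target", "Outcome", "Pipeline"),
    ("Maximise customer portfolio value", "Churn rate", "Effectiveness measure", "Pipeline"),
    ("Maximise customer portfolio value", "Sales pipeline", "Activity", "Pipeline"),
    ("Maximise customer portfolio value", "Opportunity", "Entity", "Opportunity"),
    ("Maximise customer portfolio value", "Sales representative", "Entity", "Staff"),
    ("Improve customer satisfaction", "Operator", "Entity", "Staff"),
    ("Improve customer satisfaction", "Inbound call", "Activity", "Communications"),
]

def themes_by_conceptual_item(mapping_rows):
    """Group analytic themes by target conceptual item, exposing which parts
    of the conceptual data model are shared across themes."""
    shared = defaultdict(set)
    for theme, _item, _etype, target in mapping_rows:
        shared[target].add(theme)
    return shared

lineage = themes_by_conceptual_item(rows)
```

Here the Staff concept surfaces as shared by both themes, which is exactly the kind of reuse opportunity the consolidation is meant to expose.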
Step 2. Cross-tabulate entities and activities
This exercise captures glossary definitions of each conceptual element and identifies the nature of the relationship between entities and activities. Rather than simply marking an “x” at the intersection of the matrix, we recommend succinctly describing how the entity and activity are related in words.
A simplified illustrative example
The activities in the matrix are defined as follows:

- Campaign activity: a sequence of all inbound and outbound communications and outcomes;
- Sales: a sequence of all sales transactions;
- Inventory: a timeseries snapshot of stock levels and movements between locations;
- Orders: a sequence of requests for stock by stores;
- Work schedule: an allocation of staff to meet forecast capacity requirements by store; and
- Delivery schedule: a daily assignment of vehicles to fulfil orders.

Entity | Description | Campaign activity | Sales | Inventory | Orders | Work schedule | Delivery schedule |
---|---|---|---|---|---|---|---|
Customer | An individual or organisation who has previously made an enquiry or purchased a product. | Approached by; responded to offer from; converted to sale by | Associated with | N/A | N/A | N/A | N/A |
Store | A location at which a sale is processed. | N/A | Associated with | Checks levels of | Replenishes stock levels with | Generates a weekly | N/A |
Product | An item that is able to be purchased by a customer. | N/A | Associated with | Quantities represented in | Included in | N/A | N/A |
Campaign | A promotional activity to a customer. | Generates a series of communications which may result in a contact with a person or a response/conversion | May be associated with | N/A | N/A | N/A | N/A |
Staff | An individual who is employed directly by the company. | N/A | Paid commission by | N/A | N/A | Rostered to work by the weekly schedule | N/A |
Vehicle | A member of the company fleet which redistributes stock from warehouses to stores. | N/A | N/A | N/A | Delivers | N/A | Adheres to |
Warehouse | A location where products are stored in reserve. | N/A | N/A | N/A | Provides stock to fulfil | N/A | N/A |
Hypothesis design
The purpose of this ritual is to define the levers and constraints of a particular business issue, identifying the needs to support data-driven intervention, often associated with the practice of statistical analysis, machine learning and prescriptive analytics.
The primary “wave” that this ritual is concerned with is the business issues that the organisation wishes to influence. As opposed to information design, which is concerned with driving visibility and accountability for the “what”, hypothesis design is focused on the “so what”, using the scientific method to build an understanding of causality which can drive improvement in performance. It identifies the actions and interventions that an organisation can take and relates them to the anticipated outcomes that result from making them. In this way, it captures a set of assumptions that can be tested with data.
The two stages (typically run in a facilitated workshop per process area) guide participants through hypothesis refinement, covering:
- Issue definition; and
- Driver hypothesis mapping.
1. Issue definition
A list of problem statements, or “issues”, is compiled against each “imperative” identified in the information design ritual.
For each issue, three questions are answered:
- What is the target state outcome?
- How would success be measured?
- What interventions are proposed to be made?
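A hypothetical ADML issue record answering these three questions might look as follows; the field names are illustrative assumptions, not the published schema.

```python
import json

# An illustrative issue record linked to an imperative from the information
# design ritual. Field names are assumed for the sake of the sketch.
issue = json.loads("""
{
  "imperative": "Maximise customer portfolio value",
  "issue": "Not retaining customers beyond cooling off period",
  "target_state_outcome": "Increase retention rate by 10%",
  "success_measure": "Retention rate at 90 days",
  "proposed_interventions": ["Price match offer", "Onboarding call"]
}
""")

ANSWERED = ("target_state_outcome", "success_measure", "proposed_interventions")

def is_fully_defined(record: dict) -> bool:
    """An issue is fully defined once all three questions have non-empty answers."""
    return all(record.get(key) for key in ANSWERED)
```

A simple completeness check like `is_fully_defined` makes it easy to spot issues that were raised in a workshop but never fully specified.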
2. Driver hypothesis mapping
Hypothesis mapping contextualises the relationship between potential intervention decisions, drivers, constraints and the outcomes that are targeted for change. A defining characteristic of a hypothesis is that it can be tested using data and analysis, given the required data can be made available. Even before all the required data is obtained, hypotheses can use assumptions about relationships and their strength to investigate scenarios.
This generates a list of follow-on questions to be answered:
- What are the business impediments to influence change?
- Is all the data required actually being collected to investigate the hypothesis?
- What are the discernible dynamics based on historical data? What controlled experiments may be required to determine the elasticity of particular drivers?
Information element type | Description | Example value | Example symbol type |
---|---|---|---|
Issue | An existing or potential problem affecting process imperatives and the ability to achieve goals. | Not retaining customers beyond cooling off period | |
Outcome change | The desired change in an outcome that will result from the issue being successfully addressed. | Increase retention rate by 10% | |
Driver | The variables that are related to the outcome change. | Competitor price | |
Intervention | The things we can change to bring about an outcome change. | Price match offer |
It is important—and often confronting to participants—to be specific in attaching a quantity to the outcome change. This grounds the proposed data product in the reality that a successful result requires effecting a change in the outcome, a concept that has often never been considered prior to the workshop.
Figure: Example Hypothesis Map
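One way to reason about a hypothesis map is as a directed graph from interventions through drivers to outcome changes. The sketch below uses the example values from the table above; the specific edges are assumed hypotheses for illustration, not assertions about any real business.

```python
# A driver hypothesis map as a directed graph. Edge direction reads
# "is hypothesised to influence". Edges here are illustrative assumptions.
edges = {
    "Price match offer": ["Competitor price"],               # intervention -> driver
    "Competitor price": ["Increase retention rate by 10%"],  # driver -> outcome change
}

def reaches_outcome(start: str, outcome: str) -> bool:
    """True if a chain of hypothesised influences links `start` to `outcome`,
    i.e. the hypothesis can be tested end-to-end once data is available."""
    frontier, seen = [start], set()
    while frontier:
        node = frontier.pop()
        if node == outcome:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(edges.get(node, []))
    return False
```

Representing the map as data allows automated checks such as flagging interventions that have no path to any outcome change, which usually signals a gap in the hypothesis.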
Data readiness
The purpose of this design ritual is to validate whether data exists in a form that will support the analytics objectives. The primary “wave” that this ritual is concerned with is the resources that capture or record data.
Before a data product can be built, the feasibility of fulfilling stakeholder expectations needs to be validated. Based on the information design and hypothesis design, sources of data need to be identified and tested for completeness and utility.
It is critical to the success of any analytics initiative that stakeholders take an active interest in the limitations of an organisation’s data resources to support their analytics needs, and understand what compromises may be required to produce a short-term output.
Data sourcing
The purpose of this activity is to annotate raw data elements with the context of their meaning, and a decision on where they map to in the target conceptual model.
Whilst seemingly straightforward, this is typically an extremely challenging exercise! To complete this effectively requires sound system subject matter expertise, particularly of how business processes are defined to populate a system. Often this knowledge is spread across a group of people whose different perspectives need to be reconciled and validated.
The output is best captured as a “catalogue” of data elements, with additional metadata columns that describe what the source data represents, any known issues and what target concept it maps to.
Data quality review
Discovering the quality of data is an inevitability of any analytics endeavour.
The intention of highlighting this as a key stage is to focus on communicating the limitations of data in its raw form to stakeholders. This may drive some operational initiatives to address the issues, or at the very least an agreement to modify the scope to a more modest set of objectives that are better supported by the available data. The principle that we emphasise here is to validate the data early in the project lifecycle. This “feasibility” stage should do more than simple profiling; it should test whether source data can be “reshaped” to the constructs outlined in the information and hypothesis design phases.
The purpose here is not to be too specific about the technique to use, but rather suggest a conformed set of evaluation criteria that is easily relatable to different data usage scenarios.
Data readiness category | Description | Example measurements |
---|---|---|
Availability | How easy is it to acquire data in a timely and repeatable manner? | Data extraction window |
Clarity | Is the meaning of the data well understood? | Data definition supplied %; Business rule adherence |
Coverage | To what extent is data being captured as part of a business process? | % Null / Blank |
Consistency | Is there too much variation in how data is being captured? | Value distribution; % Abnormalities |
Authority | Has the best source of data been made available? | Anecdotal stakeholder feedback; Attribute duplication across data sets |
Cross-relatable | Can data be related to other entities and processes reliably? | % Orphaned rows in data set comparisons |
For each readiness category a rating should be captured with a scenario specific annotation of any issues or limitations observed.
Data readiness rating value | Data readiness rating description |
---|---|
1 | Inadequate |
2 | Limited |
3 | Fair |
4 | Good |
5 | Exceptional |
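The readiness measurement and rating can be captured together. The sketch below derives a “Coverage” measurement (% null/blank) for one column and records it against the 1–5 scale above; the rating thresholds and field names are illustrative assumptions to be agreed per scenario, not part of ADML:

```python
# Sketch: derive a "Coverage" readiness measurement (% null/blank) for one
# column, then record it against the 1-5 rating scale. Thresholds are
# illustrative only; agree banding with stakeholders per scenario.
def pct_null(values):
    blanks = sum(1 for v in values if v in (None, ""))
    return 100.0 * blanks / len(values)

RATING_LABELS = {1: "Inadequate", 2: "Limited", 3: "Fair", 4: "Good", 5: "Exceptional"}

def coverage_rating(pct):
    if pct > 50:
        return 1
    if pct > 25:
        return 2
    if pct > 10:
        return 3
    if pct > 2:
        return 4
    return 5

sample = ["A", "B", None, "", "C", "A", "B", "C", "A", "B"]
pct = pct_null(sample)  # 2 blanks out of 10 values -> 20.0

assessment = {
    "category": "Coverage",
    "measurement": f"{pct:.1f}% null/blank",
    "rating": coverage_rating(pct),
    "annotation": "Stage code missing for roughly 1 in 5 records",
}
```

The scenario-specific annotation travels with the rating, so the “why” behind a score is preserved alongside the number.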
Learning
The purpose of this design ritual is to evaluate the performance of a data product. The primary “wave” that this ritual is concerned with is the set of outcomes that can be related to a data product.
A key reason for this design ritual is to ensure corporate memory of why the data product was developed in the first place, and what challenges were faced and overcome in the past. This knowledge can be used to refine assumptions, update expected benefits and inform future decisions.
It also helps to understand whether the data product continues to provide value in the way that was intended and hence understand when the data product should be adjusted, when it should be redeveloped and when it should be retired.
Learning session
The purpose of a learning session is to conduct a review of a data product to understand its benefits and areas for improvement. The workshop is similar to a post-implementation review, with an important distinction: the aim of the learning session is to collect information in a structured way that can be drawn on easily in future data product iterations.
Each session generates a snapshot in time, ideally stored in a way that accumulates a historical record of all review sessions. The frequency of the review sessions depends on the stability of the data product and the frequency of decisions made with it; an early stage data product will likely require more frequent reviews to ensure its continuous improvement.
Learning category | Learning category description | Additional questions to prompt responses |
---|---|---|
Precedent | Have we done this before? | What was the context? What did we do? |
History | What happened last time? | What interventions were taken? What were the outcomes? Did the results validate the assumptions? |
Differences to precedent | How is this different to previous times? | How were our assumptions different to historical assumptions? Can we translate these differences into quantitative associations? |
Conclusion | What should we do differently? | How are our future interventions informed by the combination of what we have done before, what happened and how this time is different from other times? |
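A learning-session snapshot keyed by the four categories above might be structured as follows. The shape and values are illustrative; ADML only requires that snapshots accumulate over time rather than overwrite one another:

```python
# Sketch of a learning-session snapshot keyed by the four learning
# categories. Field names and values are illustrative assumptions.
snapshot = {
    "data_product": "Churn propensity model",
    "session_date": "2021-12-15",
    "precedent": "Rules-based churn flags used in an earlier retention campaign",
    "history": "Campaign contact rate improved; the churn assumption held",
    "differences": "New pricing tiers introduced since the last review",
    "conclusion": "Retrain quarterly and add pricing tier as a driver",
}

# Sessions append to a log rather than replacing it, preserving
# corporate memory across iterations of the data product.
learning_log = []
learning_log.append(snapshot)

latest = max(learning_log, key=lambda s: s["session_date"])
```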
Benefits tracking
The original benefits of a data product are often forgotten with time. The issue with this is that a data product may be accruing benefits “invisibly” that will only be evident (or worse, not evident!) when a data product is retired.
As an ongoing activity associated with a data product, the capture of benefits can mitigate this issue. This may be an automated measurement, or a point in time manual review, capturing the following:
Data product benefit attributes | Example |
---|---|
Benefit name | Incremental profit |
Benefit amount | $230,000 |
Period | 2021-06-01 to 2021-12-31 |
Ideally over time this will depict a time series view of benefits accrued by data product. Data executives will appreciate the value of this to help justify operational expenditure for the maintenance of their portfolio of data assets.
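Accumulating benefit records in this shape makes the time series trivial to total. A minimal sketch, with illustrative values based on the example table above:

```python
# Sketch: accumulate data product benefit records over time and total them
# per benefit name, so benefits stay visible across the product's life.
# The second record is an illustrative assumption.
benefits = [
    {"benefit_name": "Incremental profit", "amount": 230_000,
     "period": ("2021-06-01", "2021-12-31")},
    {"benefit_name": "Incremental profit", "amount": 180_000,
     "period": ("2022-01-01", "2022-06-30")},
]

totals = {}
for b in benefits:
    totals[b["benefit_name"]] = totals.get(b["benefit_name"], 0) + b["amount"]
```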
Implementation of ADML schema
To ensure that the details collected through the design process we’ve outlined are easily consolidated in a central hub, we developed the ADML schema. The ADML schema is a conformed data structure defined in JSON.
Components
To implement the ADML schema you will need the following:
- A data capture system that collects the data elements in the JSON schema;
- A process to output the data into the JSON format files; and
- A system to accumulate the JSON files and display them in a catalogue.
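The second component, the output process, can be as simple as serialising the captured design data to a conformant file. A minimal sketch, mirroring the ADML example later in this paper (file handling details are assumptions):

```python
import json

# Sketch of an output process: serialise captured design data as an
# ADML-style JSON file, ready to be accumulated into a catalogue.
payload = {
    "schema": "ADML",
    "schemaVersion": 1.0,
    "schemalocation": "https://admlguide.github.io/",
    "name": "ADML for Organisation A",
    "object": "activity",
    "activity": [
        {"activityID": "1", "activityName": "Sales Pipeline"},
        {"activityID": "2", "activityName": "Order Fulfillment"},
    ],
}

with open("adml_activity.json", "w") as f:
    json.dump(payload, f, indent=2)
```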
Schema definition
ADML is defined as a JSON schema that can be implemented in a range of platforms.
The core concepts in the JSON schema are:
- Objects;
- Attributes;
- Relationships; and
- Bridges.
For a complete technical guide to the JSON schema refer to https://admlguide.github.io/.
Objects
An object is a collection of attributes that represents a concept in the design process (e.g., hypothesis). Custom objects can be created to extend the logical model.
Name | Description |
---|---|
Process area | A functional theme, ideally cross-functional. |
Imperative | An overarching theme for a set of related improvement objectives. |
Persona | The type of audience for a data product. |
Measurement | Metrics which measure a business process. |
Activity | An event that occurs as part of a business process. |
Entity | An object that participates or is created as part of a business process. |
Hypothesis | A theory of how issues may be affected. |
Issue | An existing or potential problem affecting process imperatives and the ability to achieve goals. |
Outcome change | The desired change in an outcome that will result from the issue being successfully addressed. |
Driver | The variables that are related to the outcome change. |
Intervention | The things we can change to bring about an outcome change. |
Resource | A person or technology which records data. |
Data readiness | An assessment of the fit for purpose of data resources to support an analytics need. |
Data readiness rating | A reference of rating levels (scaled from 1 to 5). |
Data product | A configuration of data that can be consumed to solve a particular problem. |
Catalogue | A configurable set of categorical labels for a data product. |
Learning | A time series review of a data product’s performance. |
Data product benefit | A time series recording of benefits associated with a data product. |
Attributes
Attributes are the fields within an object or bridge that correspond to data values within the data file. For example, a Hypothesis object may have attributes such as HypothesisID and HypothesisName. Custom attributes can be added to any object.
For a complete list of attributes refer to https://admlguide.github.io/.
Relationships
Relationships are a logical definition of how objects are connected, such as an Imperative:ImperativeID joining to Process Area:ImperativeID.
Relationships are always assumed to be one-to-many, and generally use a foreign key relationship.
For a complete list of relationships refer to https://admlguide.github.io/.
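Because relationships are foreign-key style, they are straightforward to check programmatically. A minimal sketch of validating the Imperative-to-Process Area join above, with illustrative sample data and key names:

```python
# Sketch: check a one-to-many relationship (Imperative -> Process Area via
# ImperativeID) as a foreign-key validation. Sample rows are illustrative.
imperatives = [{"ImperativeID": "I1"}, {"ImperativeID": "I2"}]
process_areas = [
    {"ProcessAreaID": "P1", "ImperativeID": "I1"},
    {"ProcessAreaID": "P2", "ImperativeID": "I3"},  # orphan: I3 has no parent
]

parent_keys = {i["ImperativeID"] for i in imperatives}
orphans = [p["ProcessAreaID"] for p in process_areas
           if p["ImperativeID"] not in parent_keys]
```

Orphaned rows surfaced this way feed directly into the “Cross-relatable” data readiness measurement described earlier.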
Bridges
Bridges provide a physical schema object to resolve many-to-many relationships.
For a complete list of bridges refer to https://admlguide.github.io/.
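A bridge decomposes a many-to-many relationship into two one-to-many joins. The sketch below assumes an illustrative bridge between data products and issues; the object and key names are not part of the published schema:

```python
# Sketch: a bridge resolving a many-to-many relationship (here, between
# data products and issues) into two one-to-many joins. Names are
# illustrative assumptions.
data_products = {"DP1": "Churn dashboard", "DP2": "Pricing model"}
issues = {"IS1": "Rising churn", "IS2": "Margin erosion"}

# Each bridge row pairs one data product with one issue.
bridge = [
    {"DataProductID": "DP1", "IssueID": "IS1"},
    {"DataProductID": "DP2", "IssueID": "IS1"},
    {"DataProductID": "DP2", "IssueID": "IS2"},
]

issues_for_dp2 = [issues[row["IssueID"]] for row in bridge
                  if row["DataProductID"] == "DP2"]
```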
Entity relationship diagram
This entity relationship diagram provides a summary of all of the objects in ADML and how they are related.
[Figure: ADML entity relationship diagram]
ADML example
Shown below for illustrative purposes is an excerpt from an ADML JSON file.
```json
{
  "schema": "ADML",
  "schemaVersion": 1.0,
  "schemalocation": "https://admlguide.github.io/",
  "name": "ADML for Organisation A",
  "description": "Organisation A's Enterprise Design Catalogue",
  "version": "1.2",
  "object": "activity",
  "customAttributes": [
    {
      "name": "Owner",
      "value": "John Smith"
    }
  ],
  "activity": [
    {"activityID": "1", "activityName": "Sales Pipeline"},
    {"activityID": "2", "activityName": "Order Fulfillment"}
  ]
}
```
Custom extensions and tools
Custom properties and attributes
ADML has been designed to be inclusive rather than prescriptive. It allows for user customisation and extension in those areas where it would not be feasible to enumerate all the possibilities an organisation may have.
Extensions can be to:
- An object’s properties; and
- Attributes within an object.
To manage these customisations, the JSON schema has a concept of a model which allows an organisation to track versions of these customisations independently of the JSON schema version.
The following example demonstrates a customised attribute:
```json
{
  "customAttributes": [
    {
      "name": "Owner",
      "value": "John Smith"
    }
  ]
}
```
Custom objects
Whilst the ADML schema is strict about the semantics required to conform to the design methodology, it can be extended to include additional objects. Adherence to the standard properties of an object is recommended to ensure consistency across an organisation’s ADML model. We strongly encourage the community to contribute their innovations so future versions may include learnings from a number of implementations.
Tools
There are a number of commonly available tools that can assist the management of ADML. Some of these are described below.
- Editing: As ADML’s component files are JSON files, standard text editors and integrated development environments (IDEs) are excellent for creating, checking, and comparing ADML files and versions. Examples include Visual Studio Code, Atom, and IntelliJ IDEA, although any editor capable of editing text will suffice.
- Validation: Since ADML comprises JSON files, it is straightforward to validate the JSON created against the ADML JSON Schema. Libraries for validating against a schema are readily available; see https://json-schema.org/implementations.html. Editors (as mentioned in the previous point) often include validation extensions, and online validators are also available.
- Excel workbook: As an aid to capturing the information, the authors have provided a workbook in Excel, which can be found at https://admlguide.github.io/. An accompaniment to this workbook is a script to generate files that adhere to the ADML JSON Schema.
- Source code management tools: Because the ADML files are JSON text files, they are ideal candidates for managing via common source code management tools such as git, Github and Bitbucket.
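The validation point above can be sketched without any third-party library. The fragment below is a stdlib-only pre-check that confirms a file parses and carries a few header fields; the required-field list is an illustrative assumption, and full validation should use a JSON Schema library against the published ADML schema:

```python
import json

# Minimal stdlib-only pre-check before full JSON Schema validation:
# confirms the document parses and carries illustrative ADML header fields.
# For full validation, use a JSON Schema library against the published schema.
REQUIRED = {"schema": str, "schemaVersion": (int, float), "name": str}

def precheck(text):
    doc = json.loads(text)  # raises ValueError on malformed JSON
    problems = [key for key, expected_type in REQUIRED.items()
                if not isinstance(doc.get(key), expected_type)]
    return doc, problems

doc, problems = precheck(
    '{"schema": "ADML", "schemaVersion": 1.0, "name": "ADML for Organisation A"}'
)
```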
Conclusion
In this white paper we have introduced ADML and given guidance on how it can be used as part of the development process for successful data products. Our experience is that using ADML is a strong driver of producing a successful data product and avoiding the common pitfalls of analytics projects. ADML provides a framework for eliciting and managing the context of data products, and complements existing technologies, standards, processes and methodologies.
Further information can be found on the ADML web site and on the ADML blog pages.
Appendices
A. Glossary
Term | Description | References |
---|---|---|
ADML | Analytics Design Markup Language - a specification for capturing decisions and observations in the design process of data products. | |
Analytics heliosphere | A dynamic interconnected system that responds to changing business conditions. | Methodology overview |
Data product | A configuration of data that can be consumed to solve a particular problem. | |
Decision support system | An information system that supports business or organisational decision-making activities. | https://en.wikipedia.org/wiki/Decision_support_system |
KPI | Key performance indicator. | |
Key performance indicator | A metric against an imperative used by an organisation to monitor its overall performance. |
B. Licence
ADML is licensed under Creative Commons Attribution-NoDerivatives 4.0 International.