Reliable Mental Models

The more complex the source and steps used to create the dataset, the greater the need for reliable mental models. Start your Data Card with simple explanations, and progressively embrace the complexity in your dataset.

A mental model is a simplified representation of the dataset that a reader can create in their minds, using context and structure from the Data Card. A solid mental model behaves as a lens that we can use to unpack the complexity and interconnectedness of different factors that define a dataset and its use.

Start with the simplest possible explanation and add more information.

Layer more sophisticated details that can aid simplicity – such as examples, context, justifications, policies and processes.

Introduce structural consistency by using a repeatable strategy that adheres to a consistent mental model to organize information within parts of the Data Card. Use well-defined structures that readers can use as ‘mental hooks’ and combat unnecessary complexity.

The Translated Wikipedia Biographies breaks down each datapoint into three components – the "scraped" content, the "translation," and the "annotation." They use this taxonomy to consistently describe the sources and the data collection method, so that there is no confusion about how different parts of the data was collected and put together.

Manage similar-sounding or easily conflated information with care.

It can be challenging to determine all possible applications of general purpose datasets that can be used in many tasks. Many times, these datasets can be used in closely related applications, but with very different implications.

Where possible, clarify the distinction between seemingly close applications that are easy to mix up. For example, face detection and face recognition are two distinct problems that require different datasets but are easily conflated. You can also consider defining similar-sounding terms to avoid confusion, within the card itself.

Though this is a Model Card (and not a Data Card), the Mediapipe BlazeFace Sparse Model Card clearly describes a range of unsuitable use cases. While this model is designed to detect the presence of faces in an image, the authors clearly state that counting faces detected and using the detected entities for identity recognition are out-of-scope.