Overview
Consider how cross-functional stakeholders in a dataset’s lifecycle engage in decision-making on the basis of a single transparency artifact by using this broad (yet decomposable) typology of Producers, Agents, and Users.
“Producers” of datasets are upstream creators of the dataset and its documentation, responsible for dataset collection, ownership, launch, and maintenance. We have observed that producers typically subscribe to a single, informal notion of “users” of Data Cards, loosely characterized by high data domain expertise, familiarity with similar datasets, and deep technical knowledge. In practice, however, we find that only a few “readers” or Agents actually meet all these requirements.
“Agents” are cross-functional stakeholders who read Data Cards (and, often, ML model-related documentation) and possess the agency to use, or determine how they or others might use, the described datasets or AI systems. Agents can hold operational and/or reviewer-type roles. This distinction is important because reviewers include stakeholders who may never directly use the dataset but will still engage with the Data Card (e.g. non-technical subject matter experts). Agents may or may not possess the technical expertise to navigate information presented in typical dataset documentation, but often have access to expertise as required.
Finally, Agents are distinct from “Users”, who are individuals and representatives who interact with products that rely on models trained on datasets. Users may consent to providing their data as a part of the product experience, but they typically require a significantly different set of explanations and controls grounded within product experiences – even when it comes to datasets.
These groups exist on a continuum, so stakeholders may fall into more than one group concurrently, depending on their context. When conceptualizing your Data Cards, use this typology to unearth assumptions that are often made about the rich intersectional attributes of individual stakeholders, such as expertise (e.g. novice or expert), data fluency (e.g. none to high), job roles (e.g. Data Scientist, Policy Maker), function performed vis-à-vis the data (e.g. Data Contributor, Rater), and goals or tasks (e.g. publishing a dataset, comparing datasets).
We’ve broken this typology down further into different types of Producers, Agents, and Users. While they may have common responsibilities and tasks, each of these sub-groups is characterized by its relationship to the dataset and its Data Card.
Producers = Create Datasets / Documentation

| Responsibilities & Tasks | Sub-groups | Identifying Question & Roles |
| --- | --- | --- |
| Responsible for the dataset’s design, creation, quality testing, documentation, launch, adoption, follow-up maintenance, and future updates. Common tasks: Driving dataset adoption, producing documentation, implementing, future-proofing, fairness & security tests and analysis, improvements to datasets and/or components | SOURCE - People who implicitly or explicitly contribute data towards a dataset. The people, behaviors, and cultures represented by a dataset. | Who implicitly or explicitly contributes data towards your dataset? For example: Product Users, Data Contributors, Surveyed Population |
| | CORE - The team of people responsible for producing and publishing the dataset(s), and for their launch, adoption, and/or success. | Who is responsible for producing, publishing, and ensuring the success of your dataset(s)? For example: Researchers, Data Scientists, Software Engineers, Managers, Subject Matter Experts |
| | ADJACENT - Individuals and groups recruited to collect or label the data, or to provide advice on methods or interpretation, at various points during the data lifecycle. | Who has been recruited to produce data or advise on critical decisions? For example: Surveyors, Raters, Labellers, Validators, 3rd Party Vendors, Domain Experts |
| | IMPACTED - Current and future team members, partners, clients, or data-hosting platforms responsible for dataset maintenance or upkeep, deployment in production, and monitoring. | Who is responsible for dataset maintenance or upkeep, deployment in production, and monitoring? For example: Domain Experts, Data Platform Owners, Data Aggregators |
Agents = Use, Evaluate, or Determine How the Dataset Is or Should Be Used

| Responsibilities & Tasks | Sub-groups | Identifying Question & Roles |
| --- | --- | --- |
| Producer’s stakeholders – people who will evaluate and use the dataset for their work, products, organizations, or communities. Common tasks: Manage complexity in pipelines and products, approve use or purchase of dataset, track accountability, make trade-offs in implementation, deploy in production, archive and audit datasets | CORE - Industry and academic roles that use dataset(s) in their products, platforms, tools, and research. | Who will use your dataset(s) in production, tooling, and research? For example: Developers, Product Managers, Data Scientists, Creative Coders, Researchers, Teachers, Students |
| | ADJACENT - Roles that don’t use the dataset, but evaluate and make decisions that can directly affect the goals of the producers or core agents. | Who will make critical decisions about the data but may not use it? For example: Industry Consultants, Policy Experts, Legal Entities, Investigative Journalists, Community Reps, Domain Experts |
| | IMPACTED - Professional, expert-system, and domain expert roles whose work is affected by availability, updates, and removal of the data. | Who will be affected by changes, updates, and removal of the data? For example: Domain Experts, Data Service Providers, Data Aggregators, Production Roles |
Users = Contribute to Data and Represent Demographics Who Are Impacted by the Way Data Is Used

| Responsibilities & Tasks | Sub-groups | Identifying Question & Roles |
| --- | --- | --- |
| Interact with the products, devices, and applications created by agents using the producer’s datasets. Common tasks: Use consumer or expert products, understand data/privacy pertaining to contributing data, provide feedback on product experiences, report concerns regarding data use, ask for data removal | TYPICAL - Individuals or cohorts of users of a product or service that uses the data, who have an as-expected or neutral experience. | Who are end-users who have a normal or typical experience of classes of products that use the data? For example: Consumers of products, platforms, or services |
| | IMPACTED - Individuals or cohorts of end users of products and services who are significantly affected (positively or negatively) as a result of the data being used in the product or service. | Who are end-users who have an atypical (positive or negative) experience of classes of products that use the data? For example: Users with extreme experiences, Non-profit organizations, Legal representatives |
| | CONTRIBUTORS - Users who produce or opt in data in the product experience, which is then collected and turned into a dataset. In this case, these are often the same as source producers. | Who are end-users who produce or opt in data in the product experience that is used to update the dataset(s)? For example: Users who opt in data, People who operate machines that generate data, Research and Industry partners |
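If your team tracks stakeholders in a script or notebook, the typology in the tables above could be modeled roughly as follows. This is only a sketch: the class and field names (`Group`, `Membership`, `Stakeholder`, `memberships`) are our own assumptions and not part of the playbook. What it illustrates is that a single stakeholder may hold several group memberships at once.

```python
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    PRODUCER = "Producer"
    AGENT = "Agent"
    USER = "User"


# Sub-groups per group, copied from the tables above.
SUB_GROUPS = {
    Group.PRODUCER: ("SOURCE", "CORE", "ADJACENT", "IMPACTED"),
    Group.AGENT: ("CORE", "ADJACENT", "IMPACTED"),
    Group.USER: ("TYPICAL", "IMPACTED", "CONTRIBUTORS"),
}


@dataclass(frozen=True)
class Membership:
    """One (group, sub-group) pair; the name is a hypothetical choice."""
    group: Group
    sub_group: str

    def __post_init__(self):
        # Reject pairs that do not appear in the typology.
        if self.sub_group not in SUB_GROUPS[self.group]:
            raise ValueError(
                f"{self.sub_group} is not a {self.group.value} sub-group"
            )


@dataclass
class Stakeholder:
    role: str
    memberships: list  # a stakeholder may hold several memberships at once

    def groups(self):
        """Return the set of top-level groups this stakeholder falls into."""
        return {m.group for m in self.memberships}


# A user who opts in data is both a source producer and a contributing user.
opt_in_user = Stakeholder(
    role="User who opts in data",
    memberships=[
        Membership(Group.PRODUCER, "SOURCE"),
        Membership(Group.USER, "CONTRIBUTORS"),
    ],
)
```

Keeping memberships as a list, rather than a single field, reflects the point that these groups sit on a continuum and can overlap depending on context.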
Key Takeaways
- There are three broad categories of stakeholders in a dataset’s lifecycle: producers of the dataset and its documentation, agents who use the datasets along with their Data Cards and documentation, and users who interact with products that use the datasets.
- Producers create Data Cards while agents read Data Cards. Users interact with products and require a different set of explanations and controls.
- An individual may fall into one or more of these groups at any given time, depending on context.
- Agents may not always use the dataset directly, but will always read the documentation. They may or may not possess the technical expertise necessary themselves, but often have access to expertise as required.
- This typology represents a continuum of constantly shifting needs and expectations from datasets and their documentation. There is no one-size-fits-all solution.
Actions
- Map Your Stakeholders. This basic activity will help you review your dataset’s lifecycle to identify your stakeholders. Take note of who may interact with the dataset or its documentation. Consider how stakeholders may contribute to the Data Card(s).
- Priority Matrix. Not all agents will be equally important to your Data Card efforts. Use the priority matrix activity to apply meaningful criteria and narrow down to your highest-priority audience groups.
- Align on Agents. This advanced activity will help you create personas and archetypes that describe your agents’ proficiency and their uses for datasets and Data Card(s), which you can reference when creating Data Card(s). Use our worksheets to conduct semi-structured interviews with individuals who represent your agents.
- Agent Information Journeys. Use this typology to craft “user journeys” for your Data Card efforts in the Agent Information Journey activity. Determine what’s essential to convey to an agent using the Data Card and set them up for success.
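The priority matrix activity defines its own criteria. Purely as an illustration of the idea of ranking agents against weighted criteria, here is a minimal sketch. The criteria names (`decision_impact`, `reliance_on_card`), the 1 to 5 scale, the weights, and every score are hypothetical placeholders, not values from the playbook.

```python
def prioritize(agents, weights):
    """Rank agents by a weighted sum of their criterion scores, highest first."""
    def score(agent):
        return sum(weights[c] * agent["scores"][c] for c in weights)

    return sorted(agents, key=score, reverse=True)


# Hypothetical criteria and weights; replace with the ones your team agrees on.
weights = {"decision_impact": 0.5, "reliance_on_card": 0.5}

# Roles are taken from the Agents table; the scores are placeholders.
agents = [
    {"role": "Data Scientists",
     "scores": {"decision_impact": 3, "reliance_on_card": 5}},
    {"role": "Policy Experts",
     "scores": {"decision_impact": 5, "reliance_on_card": 4}},
    {"role": "Data Aggregators",
     "scores": {"decision_impact": 2, "reliance_on_card": 2}},
]

ranked = [a["role"] for a in prioritize(agents, weights)]
```

A weighted sum is only one possible way to combine criteria; a two-axis grid filled in during a workshop may serve the same purpose.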
Considerations
- Does your team have a shared vocabulary around stakeholders who can affect, or be affected by, the data?
- Have you considered cross-functional producers, agents, and users organized by their upstream or downstream involvement in the dataset’s lifecycle and over time?