The Data Cards Playbook

A toolkit for transparency in AI dataset documentation.

What is a Data Card?

Transparent Dataset Documentation for Responsible AI

Data Cards are structured summaries of essential facts about various aspects of ML datasets needed by stakeholders across a project's lifecycle for responsible AI development.

Dataset Name (Acronym)

Write a short summary describing your dataset (limit 200 words). Include information about the content and topic of the data, the sources and motivations for the dataset, its benefits, and the problems or use cases it is suitable for.

Dataset Link

Data Card Author(s)

  • Name, Team: (Owner / Contributor / Manager)
  • Name, Team: (Owner / Contributor / Manager)
  • Name, Team: (Owner / Contributor / Manager)

Authorship

Publishers, Dataset Owners, Funding Sources

Publishers

Publishing Organization(s)

Organization Name

Industry Type(s)

  • Corporate - Tech
  • Corporate - Non-Tech (please specify)
  • Academic - Tech
  • Academic - Non-Tech (please specify)
  • Not-for-profit - Tech
  • Not-for-profit - Non-Tech (please specify)
  • Individual (please specify)
  • Others (please specify)

Contact Detail(s)

  • Publishing POC: Provide the name for a POC for this dataset's publishers
  • Affiliation: Provide the POC's institutional affiliation
  • Contact: Provide the POC's contact details
  • Mailing List: Provide a mailing list if available
  • Website: Provide a website for the dataset if available

Dataset Owners

Team(s)

Name of Group or Team

Contact Detail(s)

  • Dataset Owner(s): Provide the names of the dataset owners
  • Affiliation: Provide the affiliation of the dataset owners
  • Contact: Provide the email of the dataset owner
  • Group Email: Provide the group email address (for example, mailing-list@server.com) for the dataset owner team
  • Website: Provide a link to the website for the dataset owner team

Author(s)

  • Name, Title, Affiliation, YYYY
  • Name, Title, Affiliation, YYYY
  • Name, Title, Affiliation, YYYY
  • Name, Title, Affiliation, YYYY

Funding Sources

Institution(s)

  • Name of Institution
  • Name of Institution
  • Name of Institution

Funding or Grant Summary(ies)

For example, Institution 1 and Institution 2 jointly funded this dataset as part of the XYZ data program, supported by the XYZ grant awarded by Institution 3 for the years YYYY-YYYY.

Summarize here. Link to documents if available.

Additional Notes: Add here

Dataset Overview

Sensitivity of Data, Dataset Version and Maintenance

Data Subject(s)

  • Sensitive Data about people
  • Non-Sensitive Data about people
  • Data about natural phenomena
  • Data about places and objects
  • Synthetically generated data
  • Data about systems or products and their behaviors
  • Unknown
  • Others (Please specify)

Dataset Snapshot

Category | Data
Size of Dataset | 123456 MB
Number of Instances | 123456
Number of Fields | 123456
Labeled Classes | 123456
Number of Labels | 123456789
Average Labels Per Instance | 123456
Algorithmic Labels | 123456789
Human Labels | 123456789
Other Characteristics | 123456

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here.
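
For example, several rows of the snapshot table can be derived with a short script. The following is a minimal sketch, assuming each instance is a dict whose labels are stored in a list under a hypothetical `labels` field, and assuming that "Labeled Classes" means the number of distinct labels. The `snapshot` helper and the sample records are illustrative and not part of the template.

```python
# Hypothetical records: each instance is a dict of fields plus a list of labels.
instances = [
    {"id": 1, "text": "a", "labels": ["cat", "animal"]},
    {"id": 2, "text": "b", "labels": ["dog"]},
    {"id": 3, "text": "c", "labels": []},
]


def snapshot(rows, label_field="labels"):
    """Summarize counts for the Dataset Snapshot table."""
    # Collect every field name that appears in at least one instance.
    fields = set()
    for row in rows:
        fields.update(row.keys())
    # Flatten all labels across instances.
    all_labels = [label for row in rows for label in row[label_field]]
    return {
        "Number of Instances": len(rows),
        "Number of Fields": len(fields),
        "Labeled Classes": len(set(all_labels)),  # assumption: distinct labels
        "Number of Labels": len(all_labels),
        "Average Labels Per Instance": len(all_labels) / len(rows) if rows else 0.0,
    }


summary = snapshot(instances)
for category, value in summary.items():
    print(f"{category}: {value}")
```

Rows such as Size of Dataset, Algorithmic Labels, and Human Labels depend on storage and labeling metadata, so they are left out of this sketch.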

Content Description

Summarize here. Include links if available.

Additional Notes: Add here.

Descriptive Statistics

Statistic | Field Name | Field Name | Field Name | Field Name | Field Name | Field Name
count
mean
std
min
25%
50%
75%
max
mode

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here.
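
For example, the rows of the statistics table can be computed for one numeric field using only the Python standard library. The following is a minimal sketch, assuming the field's values are available as a plain list; the `describe` helper and the sample values are hypothetical. It uses the sample standard deviation and linearly interpolated quartiles, which may differ from the conventions of the tool used to produce your own table.

```python
import statistics


def describe(values):
    """Return the statistics listed in the table for one numeric field."""
    # Quartiles with linear interpolation between the closest data points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
        "mode": statistics.mode(values),
    }


scores = [2, 5, 5, 7, 11]  # hypothetical field values
stats = describe(scores)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Repeat the call for each field and place the results in the corresponding column.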

Sensitivity of Data

Sensitivity Type(s)

  • User Content
  • User Metadata
  • User Activity Data
  • Identifiable Data
  • S/PII
  • Business Data
  • Employee Data
  • Pseudonymous Data
  • Anonymous Data
  • Health Data
  • Children’s Data
  • None
  • Others (Please specify)

Field(s) with Sensitive Data

Intentionally Collected Sensitive Data

(S/PII were collected as a part of the dataset creation process.)

Field Name | Description
Field Name | Type of S/PII
Field Name | Type of S/PII
Field Name | Type of S/PII

Unintentionally Collected Sensitive Data

(S/PII were not explicitly collected as a part of the dataset creation process but can be inferred using additional methods.)

Field Name | Description
Field Name | Type of S/PII
Field Name | Type of S/PII
Field Name | Type of S/PII

Additional Notes: Add here

Security and Privacy Handling

Summarize here. Include links and metrics where applicable.

Method: description

Method: description

Method: description

Additional Notes: Add here

Risk Type(s)

  • Direct Risk
  • Indirect Risk
  • Residual Risk
  • No Known Risks
  • Others (Please Specify)

Supplemental Link(s)

Link Name or Document Type: link

Link Name or Document Type: link

Link Name or Document Type: link

Risk(s) and Mitigation(s)

Summarize here. Include links and metrics where applicable.

Risk type: Description + Mitigations

Risk type: Description + Mitigations

Risk type: Description + Mitigations

Additional Notes: Add here

Dataset Version and Maintenance

Maintenance Status

Regularly Updated - New versions of the dataset have been or will continue to be made available.

Actively Maintained - No new versions will be made available, but this dataset will be actively maintained, including but not limited to updates to the data.

Limited Maintenance - The data will not be updated, but any technical issues will be addressed.

Deprecated - This dataset is obsolete or is no longer being maintained.

Version Details

Current Version: 1.0

Last Updated: MM/YYYY

Release Date: MM/YYYY

Maintenance Plan

Summarize here. Include links and metrics where applicable.

Versioning: Summarize here. Include information about criteria for versioning the dataset.

Updates: Summarize here. Include information about criteria for refreshing or updating the dataset.

Errors: Summarize here. Include information about how errors in the dataset are reported and corrected.

Feedback: Summarize here. Include information about how feedback on the dataset is collected and addressed.

Additional Notes: Add here

Next Planned Update(s)

Version affected: 1.0

Next data update: MM/YYYY

Next version: 1.1

Next version update: MM/YYYY

Expected Change(s)

Updates to Data: Summarize here. Include links, charts, and visualizations as appropriate.

Updates to Dataset: Summarize here. Include links, charts, and visualizations as appropriate.

Additional Notes: Add here

Example of Data Points

Primary Data Modality

  • Image Data
  • Text Data
  • Tabular Data
  • Audio Data
  • Video Data
  • Time Series
  • Graph Data
  • Geospatial Data
  • Multimodal (please specify)
  • Unknown
  • Others (please specify)

Sampling of Data Points

  • Demo Link
  • Typical Data Point Link
  • Outlier Data Point Link
  • Other Data Point Link
  • Other Data Point Link

Data Fields

Field Name | Field Value | Description
Field Name | Field Value | Description
Field Name | Field Value | Description
Field Name | Field Value | Description

Above: Provide a caption for the above table or visualization if used.

Additional Notes: Add here

Typical Data Point

Summarize here. Include any criteria for the typicality of the data point.

{'q_id': '8houtx',
 'title': 'Why does water heated to room temperature feel colder than the air around it?',
 'selftext': '',
 'document': '',
 'subreddit': 'explainlikeimfive',
 'answers': {'a_id': ['dylcnfk', 'dylcj49'],
             'text': ["Water transfers heat more efficiently than air. When something feels cold it's because heat is being transferred from your skin to whatever you're touching. ... Get out of the water and have a breeze blow on you while you're wet, all of the water starts evaporating, pulling even more heat from you."],
             'score': [5, 2]},
 'title_urls': {'url': []},
 'selftext_urls': {'url': []},
 'answers_urls': {'url': []}}

Additional Notes: Add here

Atypical Data Point

Summarize here. Include any criteria for the atypicality of the data point.

{'q_id': '8houtx',
 'title': 'Why does water heated to room temperature feel colder than the air around it?',
 'selftext': '',
 'document': '',
 'subreddit': 'explainlikeimfive',
 'answers': {'a_id': ['dylcnfk', 'dylcj49'],
             'text': ["Water transfers heat more efficiently than air. When something feels cold it's because heat is being transferred from your skin to whatever you're touching. ... Get out of the water and have a breeze blow on you while you're wet, all of the water starts evaporating, pulling even more heat from you."],
             'score': [5, 2]},
 'title_urls': {'url': []},
 'selftext_urls': {'url': []},
 'answers_urls': {'url': []}}

Additional Notes: Add here

Motivations & Intentions

Motivations, Intended Use

Motivations

Purpose(s)

  • Monitoring
  • Research
  • Production
  • Others (please specify)

Domain(s) of Application

For example: Machine Learning, Computer Vision, Object Detection.

keyword, keyword, keyword

Motivating Factor(s)

For example:

  • Bringing demographic diversity to imagery training data for object-detection models
  • Encouraging academics to take on second-order challenges of cultural representation in object detection

Summarize motivation here. Include links where relevant.

Intended Use

Dataset Use(s)

  • Safe for production use
  • Safe for research use
  • Conditional use - some unsafe applications
  • Only approved use
  • Others (please specify)

Suitable Use Case(s)

Suitable Use Case: Summarize here. Include links where necessary.

Suitable Use Case: Summarize here. Include links where necessary.

Suitable Use Case: Summarize here. Include links where necessary.

Additional Notes: Add here

Unsuitable Use Case(s)

Unsuitable Use Case: Summarize here. Include links where necessary.

Unsuitable Use Case: Summarize here. Include links where necessary.

Unsuitable Use Case: Summarize here. Include links where necessary.

Additional Notes: Add here

Research and Problem Space(s)

Summarize here. Include any specific research questions.

Citation Guidelines

Guidelines & Steps: Summarize here. Include links where necessary.

BibTeX:

@article{kuznetsova2020open,
  title={The open images dataset v4},
  author={Kuznetsova, Alina and Rom, Hassan and Alldrin, Neil and others},
  journal={International Journal of Computer Vision},
  volume={128},
  number={7},
  pages={1956--1981},
  year={2020},
  publisher={Springer}
}

Additional Notes: Add here

Access, Retention, & Wipeout

Access, Retention, Wipeout and Deletion

Access

Access Type

  • Internal - Unrestricted
  • Internal - Restricted
  • External - Open Access
  • Others (please specify)

Documentation Link(s)

  • Dataset Website URL
  • GitHub URL

Prerequisite(s)

For example:

This dataset requires membership in [specific] database groups:

  • Complete the [Mandatory Training]
  • Read [Data Usage Policy]
  • Initiate a Data Request by filing

Policy Link(s)

  • Direct download URL
  • Other repository URL

Code to download data:

...
    

Access Control List(s)

Access Control List: Write summary and notes here.

Access Control List: Write summary and notes here.

Access Control List: Write summary and notes here.

Additional Notes: Add here

Retention

Duration

Specify duration in days, months, or years.

Policy Summary

Retention Plan ID: Write here

Summary: Write summary and notes here

Process Guide

For example:

This dataset complies with [standard policy guidelines].

Additional Notes: Add here

Exception(s) and Exemption(s)

Exemption Code: ANONYMOUS_DATA / EMPLOYEE_DATA / PUBLIC_DATA / INTERNAL_BUSINESS_DATA / SIMULATED_TEST_DATA

Summary: Write summary and notes here.

Additional Notes: Add here

Wipeout and Deletion

Duration

Specify duration in days, months, or years.

Deletion Event Summary

Sequence of deletion and processing events:

  • Summarize first event here
  • Summarize second event here
  • Summarize third event here

Additional Notes: Add here

Acceptable Means of Deletion

  • Write acceptable means of deletion
  • Write acceptable means of deletion
  • Write acceptable means of deletion

Post-Deletion Obligations

Sequence of post-deletion obligations:

  • Summarize first obligation here
  • Summarize second obligation here
  • Summarize third obligation here

Additional Notes: Add here

Operational Requirement(s)

Wipeout Integration Operational Requirements:

  • Write first requirement here
  • Write second requirement here
  • Write third requirement here

Exceptions and Exemptions

Policy Exception bug: [bug]

Summary: Write summary and notes here

Additional Notes: Add here

Provenance

Collection, Collection Criteria, Relationship to Source, Version and Maintenance

Collection

Method(s) Used

  • API
  • Artificially Generated
  • Crowdsourced - Paid
  • Crowdsourced - Volunteer
  • Vendor Collection Efforts
  • Scraped or Crawled
  • Surveys, forms, or polls
  • Taken from other existing datasets
  • Unknown
  • To be determined
  • Others (please specify)

Methodology Detail(s)

Collection Type

Source: Describe here. Include links where available.

Platform: [Platform Name], Describe platform here. Include links where relevant.

Is this source considered sensitive or high-risk? [Yes/No]

Dates of Collection: [MMM YYYY - MMM YYYY]

Primary modality of collected data:

Usage Note: Select one for this collection type.

  • Image Data
  • Text Data
  • Tabular Data
  • Audio Data
  • Video Data
  • Time Series
  • Graph Data
  • Geospatial Data
  • Unknown
  • Multimodal (please specify)
  • Others (please specify)

Update Frequency for collected data:

Usage Note: Select one for this collection type.

  • Yearly
  • Quarterly
  • Monthly
  • Biweekly
  • Weekly
  • Daily
  • Hourly
  • Static
  • Others (please specify)

Additional Links for this collection:

  • [Access Policy]
  • [Wipeout Policy]
  • [Retention Policy]

Additional Notes: Add here

Source Description(s)

  • Source: Describe here. Include links, data examples, metrics, visualizations where relevant.
  • Source: Describe here. Include links, data examples, metrics, visualizations where relevant.
  • Source: Describe here. Include links, data examples, metrics, visualizations where relevant.

Additional Notes: Add here

Collection Cadence

Static: Data was collected once from single or multiple sources.

Streamed: Data is continuously acquired from single or multiple sources.

Dynamic: Data is updated regularly from single or multiple sources.

Others: Please specify

Data Integration

Source

Included Fields

Data fields that were collected and are included in the dataset.

Field Name | Description
Field Name | Describe here. Include links, data examples, metrics, visualizations where relevant.
Field Name | Describe here. Include links, data examples, metrics, visualizations where relevant.

Additional Notes: Add here

Excluded Fields

Data fields that were collected but are excluded from the dataset.

Field Name | Description
Field Name | Describe here. Include links, data examples, metrics, visualizations where relevant.
Field Name | Describe here. Include links, data examples, metrics, visualizations where relevant.

Additional Notes: Add here

Data Processing

Collection Method or Source

Description: Describe here. Include links where relevant.

Methods employed: Describe here. Include links where relevant.

Tools or libraries: Describe here. Include links where relevant.

Additional Notes: Add here

Collection Criteria

Data Selection

  • Collection Method or Source: Summarize data selection criteria here. Include links where available.
  • Collection Method or Source: Summarize data selection criteria here. Include links where available.
  • Collection Method or Source: Summarize data selection criteria here. Include links where available.

Additional Notes: Add here

Data Inclusion

  • Collection Method or Source: Summarize data inclusion criteria here. Include links where available.
  • Collection Method or Source: Summarize data inclusion criteria here. Include links where available.
  • Collection Method or Source: Summarize data inclusion criteria here. Include links where available.

Additional Notes: Add here

Data Exclusion

  • Collection Method or Source: Summarize data exclusion criteria here. Include links where available.
  • Collection Method or Source: Summarize data exclusion criteria here. Include links where available.
  • Collection Method or Source: Summarize data exclusion criteria here. Include links where available.

Additional Notes: Add here

Relationship to Source

Use & Utility(ies)

  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.

Additional Notes: Add here

Benefit and Value(s)

  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.

Additional Notes: Add here

Limitation(s) and Trade-Off(s)

  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.

Version and Maintenance

First Version

  • Release date: MM/YYYY
  • Link to dataset: [Dataset Name + Version]
  • Status: [Select one: Actively Maintained/Limited Maintenance/Deprecated]
  • Size of Dataset: 123 MB
  • Number of Instances: 123456

Note(s) and Caveat(s)

Summarize here. Include links where available.

Additional Notes: Add here

Cadence

  • Yearly
  • Quarterly
  • Monthly
  • Biweekly
  • Weekly
  • Daily
  • Hourly
  • Static
  • Others (please specify)

Last and Next Update(s)

  • Date of last update: DD/MM/YYYY
  • Total data points affected: 12345
  • Data points updated: 12345
  • Data points added: 12345
  • Data points removed: 12345
  • Date of next update: DD/MM/YYYY

Changes on Update(s)

  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.
  • Source Type: Summarize here. Include links where available.

Additional Notes: Add here

Human and Other Sensitive Attributes

Sensitive Human Attribute(s)

  • Gender
  • Socio-economic status
  • Geography
  • Language
  • Age
  • Culture
  • Experience or Seniority
  • Others (please specify)

Intentionality

Intentionally Collected Attributes

Human attributes were labeled or collected as a part of the dataset creation process.

Field Name | Description
Field Name | Human Attribute Collected
Field Name | Human Attribute Collected

Additional Notes: Add here

Unintentionally Collected Attributes

Human attributes were not explicitly collected as a part of the dataset creation process but can be inferred using additional methods.

Field Name | Description
Field Name | Human Attribute Collected
Field Name | Human Attribute Collected

Additional Notes: Add here

Rationale

Summarize here. Include links, tables, and media as relevant.

Source(s)

  • Human Attribute: Sources
  • Human Attribute: Sources
  • Human Attribute: Sources

Additional Notes: Add here

Methodology Detail(s)

Human Attribute Method: Describe the collection method here. Include links where necessary

Collection task: Describe the task here. Include links where necessary

Platforms, tools, or libraries:

  • [Platform, tools, or libraries]: Write description here
  • [Platform, tools, or libraries]: Write description here
  • [Platform, tools, or libraries]: Write description here

Additional Notes: Add here

Distribution(s)

Human Attribute | Label or Class | Label or Class | Label or Class | Label or Class
Count | 123456 | 123456 | 123456 | 123456
[Statistic] | 123456 | 123456 | 123456 | 123456
[Statistic] | 123456 | 123456 | 123456 | 123456
[Statistic] | 123456 | 123456 | 123456 | 123456

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Known Correlations

[field_name, field_name]

Description: Summarize here. Include visualizations, metrics, or links where necessary.

Impact on dataset use: Summarize here. Include visualizations, metrics, or links where necessary.

Additional Notes: Add here

Risk(s) and Mitigation(s)

Human Attribute

Summarize here. Include links and metrics where applicable.

Risk type: [Description + Mitigations]

Risk type: [Description + Mitigations]

Risk type: [Description + Mitigations]

Trade-offs, caveats, & other considerations: Summarize here. Include visualizations, metrics, or links where necessary.

Additional Notes: Add here

Extended Use

Use with Other Data, Forking & Sampling, Use in ML or AI Systems

Use with Other Data

Safety Level

  • Safe to use with other data
  • Conditionally safe to use with other data
  • Should not be used with other data
  • Unknown
  • Others (please specify)

Known Safe Dataset(s) or Data Type(s)

Dataset or Data Type: Summarize here. Include visualizations, metrics, or links where necessary.

Dataset or Data Type: Summarize here. Include visualizations, metrics, or links where necessary.

Dataset or Data Type: Summarize here. Include visualizations, metrics, or links where necessary.

Best Practices

Summarize here. Include visualizations, metrics, demonstrative examples, or links where necessary.

Additional Notes: Add here

Known Unsafe Dataset(s) or Data Type(s)

Dataset or Data Type: Summarize here. Include visualizations, metrics, or links where necessary.

Dataset or Data Type: Summarize here. Include visualizations, metrics, or links where necessary.

Dataset or Data Type: Summarize here. Include visualizations, metrics, or links where necessary.

Limitation(s) and Recommendation(s)

Summarize here. Include links and metrics where applicable.

Limitation type: Dataset or data type, description and recommendation.

Limitation type: Dataset or data type, description and recommendation.

Limitation type: Dataset or data type, description and recommendation.

Additional Notes: Add here

Forking & Sampling

Safety Level

  • Safe to fork and/or sample
  • Conditionally safe to fork and/or sample
  • Should not be forked and/or sampled
  • Unknown
  • Others (please specify)

Acceptable Sampling Method(s)

  • Cluster Sampling
  • Haphazard Sampling
  • Multi-stage sampling
  • Random Sampling
  • Retrospective Sampling
  • Stratified Sampling
  • Systematic Sampling
  • Weighted Sampling
  • Unknown
  • Unsampled
  • Others (please specify)
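
To illustrate one of the methods listed above, the following is a minimal sketch of stratified sampling, assuming the dataset is a list of dicts and that strata are defined by a single hypothetical `language` field. The `stratified_sample` helper, the fixed seed, and the rule of keeping at least one row per stratum are illustrative choices, not requirements of the template.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction of rows from every stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        # Keep at least one row per stratum so small groups are not dropped.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical dataset: 80 English rows and 20 French rows.
rows = [{"id": i, "language": "en" if i < 80 else "fr"} for i in range(100)]
subset = stratified_sample(rows, key="language", fraction=0.1)
print(len(subset))
```

Because each stratum is sampled at the same rate, the subset preserves the proportions of the chosen field, which is worth stating in the Best Practice(s) notes if forks of the dataset are expected to stay representative.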

Best Practice(s)

Summarize here. Include links, figures, and demonstrative examples where available.

Additional Notes: Add here

Risk(s) and Mitigation(s)

Summarize here. Include links and metrics where applicable.

Risk Type: [Description + Mitigations]

Risk Type: [Description + Mitigations]

Risk Type: [Description + Mitigations]

Additional Notes: Add here

Limitation(s) and Recommendation(s)

Summarize here. Include links and metrics where applicable.

Limitation Type: [Description + Recommendation]

Limitation Type: [Description + Recommendation]

Limitation Type: [Description + Recommendation]

Additional Notes: Add here

Use in ML or AI Systems

Dataset Use(s)

  • Training
  • Testing
  • Validation
  • Development or Production Use
  • Fine Tuning
  • Others (please specify)

Notable Feature(s)

Exploration Demo: [Link to server or demo.]

Notable Field Name: Describe here. Include links, data examples, metrics, visualizations where relevant.

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Usage Guideline(s)

Usage Guidelines: Summarize here. Include links where necessary.

Approval Steps: Summarize here. Include links where necessary.

Reviewer: Provide the name of a reviewer for publications referencing this dataset.

Additional Notes: Add here

Distribution(s)

Set | Number of data points
Train | 62,563
Test | 62,563
Validation | 62,563
Dev | 62,563

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here
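
For example, the per-split counts in the table above can be tallied with a few lines of code. The following is a minimal sketch, assuming each data point carries a split name; the `split_of` list and its values are hypothetical.

```python
from collections import Counter

# Hypothetical assignment of each data point to a split.
split_of = ["train"] * 6 + ["test"] * 2 + ["validation"] * 1 + ["dev"] * 1

counts = Counter(split_of)
total = sum(counts.values())

# Print one row per split: name, count, and share of the whole dataset.
for name in ("train", "test", "validation", "dev"):
    share = counts[name] / total
    print(f"{name.capitalize():<12}{counts[name]:>8,}{share:>8.1%}")
```

Reporting the share alongside the raw count makes it easier to spot an unusually small test or validation set.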

Known Correlation(s)

field_name, field_name

Description: Summarize here. Include visualizations, metrics, or links where necessary.

Impact on dataset use: Summarize here. Include visualizations, metrics, or links where necessary.

Risks from correlation: Summarize here. Include recommended mitigative steps if available.

Additional Notes: Add here
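
For example, a correlation between two numeric fields can be quantified before it is described here. The following is a minimal sketch of the Pearson correlation coefficient, assuming both fields are equal-length lists of numbers; the `pearson` helper and the field names `answer_length` and `answer_score` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric fields."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either field is constant.
    return cov / math.sqrt(var_x * var_y)


answer_length = [10, 20, 30, 40, 50]  # hypothetical field values
answer_score = [1, 3, 5, 7, 9]
r = pearson(answer_length, answer_score)
print(round(r, 3))
```

A coefficient near 1 or -1 indicates a strong linear relationship. It does not capture non-linear dependence, so a low value alone is not evidence that two fields are unrelated.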

Split Statistics

Statistic | Train | Test | Valid | Dev
Count | 123456 | 123456 | 123456 | 123456
Descriptive Statistic | 123456 | 123456 | 123456 | 123456
Descriptive Statistic | 123456 | 123456 | 123456 | 123456
Descriptive Statistic | 123456 | 123456 | 123456 | 123456

Above: Caption for table above.

Transformations

Synopsis, Breakdown of Transformations

Synopsis

Transformation(s) Applied

  • Anomaly Detection
  • Cleaning Mismatched Values
  • Cleaning Missing Values
  • Converting Data Types
  • Data Aggregation
  • Dimensionality Reduction
  • Joining Input Sources
  • Redaction or Anonymization
  • Others (Please specify)

Field(s) Transformed

Transformation Type

Field Name | Source & Target
Field Name | Source Field: Target Field
Field Name | Source Field: Target Field
... | ...

Additional Notes: Add here

Library(ies) and Method(s) Used

Transformation Type

Method: Describe the transformation method here. Include links where necessary.

Platforms, tools, or libraries:

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Transformation Results: Provide results, outcomes, and actions taken because of the transformations. Include visualizations where available.

Additional Notes: Add here

Breakdown of Transformations

Cleaning Missing Value(s)

Summarize here. Include links where available.

Field Name: Count or description

Field Name: Count or description

Field Name: Count or description
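
For example, the per-field counts above can be produced by a short script. The following is a minimal sketch, assuming rows are dicts and that a value counts as missing when the key is absent, the value is `None`, or the value is an empty string. The `count_missing` helper and the sample rows are hypothetical, and your dataset may use other missing-value markers.

```python
def count_missing(rows, fields, missing=(None, "")):
    """Count, per field, the rows where the value is absent or empty."""
    counts = {field: 0 for field in fields}
    for row in rows:
        for field in fields:
            # row.get returns None when the key is absent from the row.
            if row.get(field) in missing:
                counts[field] += 1
    return counts


rows = [
    {"title": "Why is the sky blue?", "selftext": "", "score": 5},
    {"title": "", "selftext": "", "score": None},
    {"title": "How do magnets work?", "score": 2},
]
missing_counts = count_missing(rows, ["title", "selftext", "score"])
for field, count in missing_counts.items():
    print(f"{field}: {count}")
```

Running the same count before and after cleaning gives the figures for the Comparative Summary.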

Method(s) Used

Summarize here. Include links where necessary.

Platforms, tools, or libraries

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Comparative Summary

Summarize here. Include links, tables, visualizations where available.

Field Name | Diff
Field Name | Before: After
Field Name | Before: After
... | ...

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Residual & Other Risk(s)

Summarize here. Include links and metrics where applicable.

  • Risk Type: Description + Mitigations
  • Risk Type: Description + Mitigations
  • Risk Type: Description + Mitigations

Human Oversight Measure(s)

Summarize here. Include links where available.

Additional Considerations

Summarize here. Include links where available.

Cleaning Mismatched Value(s)

Summarize here. Include links where available.

Field Name: Count or Description

Field Name: Count or Description

Field Name: Count or Description

Method(s) Used

Summarize here. Include links where available.

Comparative Summary

Summarize here. Include links where available.

Field Name | Diff
Field Name | Before: After
Field Name | Before: After
... | ...

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Residual & Other Risk(s)

Summarize here. Include links and metrics where applicable.

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Human Oversight Measure(s)

Summarize here. Include links where available.

Additional Considerations

Summarize here. Include links where available.

Anomalies

Summarize here. Include links where available.

Field Name: Count or Description

Field Name: Count or Description

Field Name: Count or Description
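
For example, anomalies in a numeric field can be counted with a simple rule before they are reported here. The following is a minimal sketch using the interquartile range (Tukey's rule), assuming the field's values are a plain list of numbers. The `iqr_outliers` helper, the multiplier of 1.5, and the sample values are illustrative, and this is one of many possible detection methods rather than the one the template prescribes.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


scores = [5, 2, 4, 3, 6, 4, 5, 250]  # hypothetical field values
flagged = iqr_outliers(scores)
print(f"score: {len(flagged)} anomalous value(s): {flagged}")
```

Whichever rule is used, record it under Method(s) Used so that readers can judge how sensitive the reported counts are to the chosen threshold.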

Method(s) Used

Summarize here. Include links where necessary.

Platforms, tools, or libraries

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Comparative Summary

Summarize here. Include links, tables, visualizations where available.

Field Name | Diff
Field Name | Before: After
Field Name | Before: After
... | ...

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Residual & Other Risk(s)

Summarize here. Include links and metrics where applicable.

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Human Oversight Measure(s)

Summarize here. Include links where available.

Additional Considerations

Summarize here. Include links where available.

Dimensionality Reduction

Summarize here. Include links where available.

Field Name: Count or Description

Field Name: Count or Description

Field Name: Count or Description

Method(s) Used

Summarize here. Include links where necessary.

Platforms, tools, or libraries

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Comparative Summary

Summarize here. Include links, tables, visualizations where available.

Field Name | Diff
Field Name | Before: After
Field Name | Before: After
... | ...

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Residual & Other Risk(s)

Summarize here. Include links and metrics where applicable.

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Human Oversight Measure(s)

Summarize here. Include links where available.

Additional Considerations

Summarize here. Include links where available.

Joining Input Sources

Summarize here. Include links where available.

Field Name: Count or Description

Field Name: Count or Description

Field Name: Count or Description

Method(s) Used

Summarize here. Include links where necessary.

Platforms, tools, or libraries

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Comparative Summary

Summarize here. Include links, tables, visualizations where available.

Field Name | Diff
Field Name | Before: After
Field Name | Before: After
... | ...

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Residual & Other Risk(s)

Summarize here. Include links and metrics where applicable.

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Human Oversight Measure(s)

Summarize here. Include links where available.

Additional Considerations

Summarize here. Include links where available.

Redaction or Anonymization

Summarize here. Include links where available.

Field Name: Count or Description

Field Name: Count or Description

Field Name: Count or Description
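
For example, one common redaction technique replaces a direct identifier with a keyed hash. The following is a minimal sketch, assuming a hypothetical `user_id` field and a secret key stored separately from the dataset; the `pseudonymize` helper and the key value are illustrative only. Note that a keyed hash keeps records linkable to one another, so this is pseudonymization rather than full anonymization, and the remaining linkage belongs under Residual & Other Risk(s).

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


SECRET_KEY = b"replace-with-a-secret-key"  # hypothetical; store outside the dataset

rows = [{"user_id": "alice", "score": 5}, {"user_id": "alice", "score": 2}]
# Copy each row, overwriting only the identifier field.
redacted = [
    {**row, "user_id": pseudonymize(row["user_id"], SECRET_KEY)} for row in rows
]
print(redacted[0]["user_id"] == redacted[1]["user_id"])
```

The same input always maps to the same digest under a given key, which preserves joins and per-user statistics while removing the raw identifier from the published data.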

Method(s) Used

Summarize here. Include links where necessary.

Platforms, tools, or libraries

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Comparative Summary

Summarize here. Include links, tables, visualizations where available.

Field Name | Diff
Field Name | Before: After
Field Name | Before: After
... | ...

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Residual & Other Risk(s)

Summarize here. Include links and metrics where applicable.

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Human Oversight Measure(s)

Summarize here. Include links where available.

Additional Considerations

Summarize here. Include links where available.

Others (Please Specify)

Summarize here. Include links where available.

Field Name: Count or Description

Field Name: Count or Description

Field Name: Count or Description

Method(s) Used

Summarize here. Include links where necessary.

Platforms, tools, or libraries

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Comparative Summary

Summarize here. Include links, tables, visualizations where available.

Field Name | Diff
Field Name | Before: After
Field Name | Before: After
... | ...

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Residual & Other Risk(s)

Summarize here. Include links and metrics where applicable.

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Risk Type: Description + Mitigations

Human Oversight Measure(s)

Summarize here. Include links where available.

Additional Considerations

Summarize here. Include links where available.

Annotations & Labeling

Human Annotators

Annotation Workforce Type

  • Annotation Target in Data
  • Machine-Generated
  • Annotations
  • Human Annotations (Expert)
  • Human Annotations (Non-Expert)
  • Human Annotations (Employees)
  • Human Annotations (Contractors)
  • Human Annotations (Crowdsourcing)
  • Human Annotations (Outsourced / Managed)
  • Teams
  • Unlabeled
  • Others (Please specify)

Annotation Characteristic(s)

Annotation Type | Number
Number of unique annotations | 123456789
Total number of annotations | 123456789
Average annotations per example | 123456789
Number of annotators per example | 123456789
[Quality metric per granularity] | 123456789
[Quality metric per granularity] | 123456789
[Quality metric per granularity] | 123456789

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here
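
The counts in the table above could be derived from raw annotation records along the lines of the following sketch. It assumes each record is an `(example_id, annotator_id, label)` tuple and reads "unique annotations" as the number of distinct label values; both are assumptions, so adjust them to your own annotation schema.

```python
from collections import defaultdict


def annotation_characteristics(records):
    """Summarize (example_id, annotator_id, label) tuples (hypothetical layout)."""
    records = list(records)
    labels_per_example = defaultdict(list)
    annotators_per_example = defaultdict(set)
    for example_id, annotator_id, label in records:
        labels_per_example[example_id].append(label)
        annotators_per_example[example_id].add(annotator_id)

    n_examples = len(labels_per_example)
    total_annotators = sum(len(a) for a in annotators_per_example.values())
    return {
        # Assumption: "unique annotations" means distinct label values.
        "unique_annotations": len({label for _, _, label in records}),
        "total_annotations": len(records),
        "avg_annotations_per_example": len(records) / n_examples if n_examples else 0.0,
        "avg_annotators_per_example": total_annotators / n_examples if n_examples else 0.0,
    }


records = [("e1", "a1", "cat"), ("e1", "a2", "dog"), ("e2", "a1", "cat")]
print(annotation_characteristics(records))
```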

Annotation Description(s)

(Annotation Type)

Description: Description of annotations (labels, ratings) produced. Include how this was created or authored.

Link: Relevant URL link.

Platforms, tools, or libraries:

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Additional Notes: Add here

Annotation Distribution(s)

Annotation Type | Number
Annotations (or Class) | 12345 (20%)
Annotations (or Class) | 12345 (20%)
Annotations (or Class) | 12345 (20%)
Annotations (or Class) | 12345 (20%)
Annotations (or Class) | 12345 (20%)

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Annotation Task(s)

(Task Type)

Task description: Summarize here. Include links if available.

Task instructions: Summarize here. Include links if available.

Methods used: Summarize here. Include links if available.

Inter-rater adjudication policy: Summarize here. Include links if available.

Golden questions: Summarize here. Include links if available.

Additional notes: Add here
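
An inter-rater adjudication policy can often be stated precisely enough to express in a few lines of code. The sketch below shows one hypothetical policy, strict majority vote with unresolved items escalated for review; the function name `adjudicate` and the `min_agreement` threshold are illustrative assumptions, not a policy prescribed by this template.

```python
from collections import Counter


def adjudicate(labels, min_agreement=0.5):
    """Return the majority label, or None when agreement is too low.

    A label wins only if its share of votes is strictly greater than
    `min_agreement`; None signals that the item needs manual review.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None


print(adjudicate(["spam", "spam", "ham"]))  # majority label wins
print(adjudicate(["spam", "ham"]))          # tie: escalate for review
```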

Human Annotators

Annotator Description(s)

(Annotation Type)

Task type: Summarize here. Include links if available.

Number of unique annotators: Summarize here. Include links if available.

Expertise of annotators: Summarize here. Include links if available.

Description of annotators: Summarize here. Include links if available.

Language distribution of annotators: Summarize here. Include links if available.

Geographic distribution of annotators: Summarize here. Include links if available.

Summary of annotation instructions: Summarize here. Include links if available.

Summary of gold questions: Summarize here. Include links if available.

Annotation platforms: Summarize here. Include links if available.

Additional Notes: Add here

Annotator Task(s)

(Task Type)

Task description: Summarize here. Include links if available.

Task instructions: Summarize here. Include links if available.

Methods used: Summarize here. Include links if available.

Inter-rater adjudication policy: Summarize here. Include links if available.

Golden questions: Summarize here. Include links if available.

Additional notes: Add here

Language(s)

(Annotation Type)

  • Language [Percentage %]
  • Language [Percentage %]
  • Language [Percentage %]

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Location(s)

(Annotation Type)

  • Location [Percentage %]
  • Location [Percentage %]
  • Location [Percentage %]

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Gender(s)

(Annotation Type)

  • Gender [Percentage %]
  • Gender [Percentage %]
  • Gender [Percentage %]

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Validation Types

Description of Human Validators

Method(s)

  • Data Type Validation
  • Range and Constraint Validation
  • Code/cross-reference Validation
  • Structured Validation
  • Consistency Validation
  • Not Validated
  • Others (Please Specify)
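
To make several of the methods listed above concrete, here is a minimal sketch that applies data type, range and constraint, cross-reference, and consistency checks to a single record. The field names (`age`, `country`, `start_year`, `end_year`), the bounds, and the `ALLOWED_COUNTRIES` reference list are hypothetical and stand in for whatever your dataset schema defines.

```python
ALLOWED_COUNTRIES = {"US", "IN", "BR"}  # hypothetical reference list


def validate_record(record):
    """Return a list of (check, message) failures for one record."""
    failures = []

    age = record.get("age")
    # Data type validation (bool is excluded because it subclasses int).
    if isinstance(age, bool) or not isinstance(age, int):
        failures.append(("data_type", "age must be an integer"))
    # Range and constraint validation, only meaningful once the type is right.
    elif not 0 <= age <= 120:
        failures.append(("range", "age must be between 0 and 120"))

    # Code/cross-reference validation against a known list of values.
    if record.get("country") not in ALLOWED_COUNTRIES:
        failures.append(("cross_reference", "country is not in the reference list"))

    # Consistency validation across two related fields.
    start, end = record.get("start_year"), record.get("end_year")
    if isinstance(start, int) and isinstance(end, int) and start > end:
        failures.append(("consistency", "start_year is after end_year"))

    return failures


good = {"age": 30, "country": "US", "start_year": 2000, "end_year": 2010}
print(validate_record(good))
print(validate_record({**good, "age": 130, "country": "ZZ"}))
```

Tallying the failures by check name across all records gives the per-field counts that the Breakdown(s) section asks for.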

Breakdown(s)

(Validation Type)

Number of Data Points Validated: 12345

Fields Validated

Field | Count (if available)
Field | 123456
Field | 123456
Field | 123456

Above: Provide a caption for the above table or visualization.

Description(s)

(Validation Type)

Method: Describe the validation method here. Include links where necessary.

Platforms, tools, or libraries:

  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here
  • Platform, tool, or library: Write description here

Validation Results: Provide results, outcomes, and actions taken because of the validation. Include visualizations where available.

Additional Notes: Add here

Description of Human Validators

Characteristic(s)

(Validation Type)

  • Unique validators: 12345
  • Number of examples per validator: 123456
  • Average cost/task/validator: $$$
  • Training provided: Y/N
  • Expertise required: Y/N

Description(s)

(Validation Type)

Validator description: Summarize here. Include links if available.

Training provided: Summarize here. Include links if available.

Validator selection criteria: Summarize here. Include links if available.

Additional Notes: Add here

Language(s)

(Validation Type)

  • Language [Percentage %]
  • Language [Percentage %]
  • Language [Percentage %]

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Location(s)

(Validation Type)

  • Location [Percentage %]
  • Location [Percentage %]
  • Location [Percentage %]

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Gender(s)

(Validation Type)

  • Gender [Percentage %]
  • Gender [Percentage %]
  • Gender [Percentage %]

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here

Sampling Methods

Method(s) Used

  • Cluster Sampling
  • Haphazard Sampling
  • Multi-stage Sampling
  • Random Sampling
  • Retrospective Sampling
  • Stratified Sampling
  • Systematic Sampling
  • Weighted Sampling
  • Unknown
  • Unsampled
  • Others (Please specify)
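
For readers less familiar with the methods listed above, the following sketch illustrates two of them, systematic and stratified sampling, using only the Python standard library. The function names, the fixed seed, and the rule of taking at least one item per stratum are assumptions chosen for the example rather than a recommended procedure.

```python
import random
from collections import defaultdict


def systematic_sample(items, step, start=0):
    """Systematic sampling: take every `step`-th item beginning at `start`."""
    return items[start::step]


def stratified_sample(items, key, fraction, seed=0):
    """Stratified sampling: draw the same fraction from each stratum.

    `key` maps an item to its stratum. At least one item is drawn per
    stratum (an assumption made here so that small strata stay represented).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    sample = []
    for group in strata.values():
        k = min(len(group), max(1, round(len(group) * fraction)))
        sample.extend(rng.sample(group, k))
    return sample


items = [("a", i) for i in range(10)] + [("b", i) for i in range(4)]
print(systematic_sample(list(range(10)), step=3))
print(stratified_sample(items, key=lambda item: item[0], fraction=0.5))
```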

Characteristic(s)

(Sampling Type) | Number
Upstream Source | Write here
Total data sampled | 123m
Sample size | 123
Threshold applied | 123k units at property
Sampling rate | 123
Sample mean | 123
Sample std. dev | 123
Sampling distribution | 123
Sampling variation | 123
Sample statistic | 123

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here
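
Several of the characteristics in the table above can be computed directly from a numeric sample. The sketch below assumes the sample is a list of numbers and that the size of the upstream population is known; the function name and dictionary keys are illustrative only.

```python
import statistics


def sample_characteristics(population_size, sample):
    """Compute basic sample statistics for a list of numeric values.

    Requires at least two values, because statistics.stdev uses the
    sample (n - 1) denominator.
    """
    return {
        "sample_size": len(sample),
        "sampling_rate": len(sample) / population_size,
        "sample_mean": statistics.mean(sample),
        "sample_std_dev": statistics.stdev(sample),
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_characteristics(population_size=80, sample=sample))
```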

Sampling Criteria

  • Sampling method: Summarize here. Include links where applicable.
  • Sampling method: Summarize here. Include links where applicable.
  • Sampling method: Summarize here. Include links where applicable.

Known Applications & Benchmarks

ML Application(s)

For example: Classification, Regression, Object Detection

Evaluation Result(s)

(Model Name)

Model Card: [Link to full model card]

Evaluation Results

  • Accuracy: 123 (params)
  • Precision: 123 (params)
  • Recall: 123 (params)
  • Performance metric: 123 (params)

Above: Provide a caption for the above table or visualization.

Additional Notes: Add here
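
When reporting accuracy, precision, and recall, it helps to state how they were computed. The sketch below shows the standard definitions for a binary task, assuming labels are given as two equal-length lists and that `1` marks the positive class; the function name and the choice to return 0.0 for undefined ratios are assumptions of this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        # Undefined ratios (no predicted or no actual positives) are reported as 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```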

Evaluation Process(es)

(Model Name)

[Method used]: Summarize here. Include links where available.

  • Process: Summarize here. Include links, diagrams, visualizations, tables as relevant.
  • Factors: Summarize here. Include links, diagrams, visualizations, tables as relevant.
  • Considerations: Summarize here. Include links, diagrams, visualizations, tables as relevant.
  • Results: Summarize here. Include links, diagrams, visualizations, tables as relevant.

Additional Notes: Add here

Description(s) and Statistic(s)

(Model Name)

Model Card: Link to full model card

Model Description: Summarize here. Include links where applicable.

  • Model Size: 123 (params)
  • Model Weights: 123 (params)
  • Model Layers: 123 (params)
  • Latency: 123 (params)

Additional Notes: Add here

Expected Performance and Known Caveats

(Model Name)

Expected Performance: Summarize here. Include links where available.

Known Caveats: Summarize here. Include links, diagrams, visualizations, and tables as relevant.

Additional Notes: Add here

Terms of Art

Concepts and Definitions referenced in this Data Card

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Term of Art

Definition: Write here

Source: Write here and share link

Interpretation: Write here

Reflections on Data

Title, Title, Title

Title

Write notes here.

Title

Write notes here.

Title

Write notes here.

The Playbook

Proactive data transparency

Our Playbook contains four modules designed with participatory activities to define long-term transparency for your datasets and in your contexts. Our transparency patterns capture practical ways to create Data Cards that are people-centric, purposeful and actionable.

Ask

Create a Data Card template

Co-create transparency with stakeholders from across the dataset lifecycle. Deliver a metadata schema that is purposeful, relevant, and actionable for responsible AI development.

In this module, define what transparency means to your organization and the stakeholders who will maintain, track, or use your datasets. Work with them to co-create a schema that captures the human decisions and invisible explanations that shape datasets.

Inspect

Evaluate a Data Card template

Whether you’re starting from a Data Card or your own documentation schema, identify gaps and opportunities in dataset transparency.

Refine and validate your metadata schema. Test it with real-world datasets to incorporate feedback. Decide which questions can be automated, how to automate them, and how to keep responses people-centric.

Answer

Fill out a Data Card template

Navigate common challenges such as future-proofing your dataset. Follow responsible practices to ensure transparency in documentation.

Explore guidance for filling out Data Cards, drawn from deploying over 40 Data and Model Cards at Google. Produce Data Cards that readers of different backgrounds can easily understand and rely on for their decisions.

Audit

Assess a completed Data Card

Evaluate your completed documentation before publishing it. Measure and track transparency efforts across multiple datasets in your organization.

Audit your Data Card as a whole using structured assessments and user studies. Explore different ways to capture the impact of your Data Card based on organizational goals for transparency.

Quick Start

User Guide

Workflow best practices for creating Data Cards for your datasets.

Explore the user guide

Templates

Data and Model Card templates for your dataset and any associated models.

Showcase

Add your completed Data Cards to our growing repository.

Contribute at GitHub

Publication

Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI

Read the paper