User Guide


Adapt this workflow to your context and your team's practices.

8 steps to create Data Cards, complete with cautions.

Explore a step-by-step plan curated from observing teams across Google that have created Data & Model Cards.

1. Before you begin

Identify stakeholders and collaborators across your dataset’s lifecycle who can contribute to the Data Card. This ensures that your Data Card is accurate, complete, and faithful to the most recent knowledge in the domain.

Use the Typology of Stakeholders to identify key voices across your dataset's lifecycle.

2. Build flexibility into your timeline

While it can take just a few hours to fill out a Data Card or Model Card template, factors such as additional analyses, review processes, and stakeholder communication bottlenecks can substantially increase the time between sessions. Plan accordingly.

Transparency is often investigative. Build in slack for at least 2 to 3 additional analyses.

3. Collect your documents

Review the Data Card template and gather existing documentation to get a head start.

Documentation lives in many forms: papers, code, pipelines, analyses, emails, and slides. Sometimes it lives only in other people's heads.

4. Build up from the Data Card design

Our Data Card templates are structured to support non-linear and collaborative completion, based on how we’ve seen teams and individuals from entirely different backgrounds complete them.

The Playbook contains transparency patterns for creating truly actionable, purposeful, and people-centric Data Cards.

The numbered steps below show, in order, how teams completed our template.

1. Determine Relevance

Start with assessing the relevance of questions to your dataset.

2. Answer Multiple Choices

Answer the highest-level telescopic questions first. Frame your periscopic and microscopic questions based on these choices.

3. Link to documents

Add links to relevant documents in all questions. This way you’ll know where to pull your answers from, and collaborators can link to better sources if necessary.

4. Tackle the obvious

Get the easiest answers in. Fill out the periscopic questions with information from the linked and existing documentation.

5. Fill in the details

Answer microscopic questions with the necessary context. This may require wordsmithing or additional analyses and collaboration.

6. Make links useful

Summarize linked documents so that readers know what they are clicking into, and so that the Data Card still communicates a clear idea when access to the documents is restricted.
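The six-step pass above can be sketched as a small tracking structure. This is only an illustration, not part of the Data Card template: the `Question` class, its fields, and the helper functions are hypothetical names, and the example link is a placeholder. It assumes each question has one scope (telescopic, periscopic, or microscopic) and that broader scopes are answered first.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Scope names follow the guide's vocabulary; the numbers encode completion order.
SCOPE_ORDER = {"telescopic": 0, "periscopic": 1, "microscopic": 2}


@dataclass
class Question:
    """Hypothetical record for one template question."""
    prompt: str
    scope: str                                   # key of SCOPE_ORDER
    relevant: bool = True                        # step 1: determine relevance
    links: List[str] = field(default_factory=list)           # step 3
    link_summaries: List[str] = field(default_factory=list)  # step 6
    answer: Optional[str] = None                 # steps 2, 4, and 5


def next_up(questions):
    """Return unanswered, relevant questions, broadest scope first."""
    todo = [q for q in questions if q.relevant and q.answer is None]
    return sorted(todo, key=lambda q: SCOPE_ORDER[q.scope])


def unsummarized_links(questions):
    """Return prompts of relevant questions with more links than summaries."""
    return [
        q.prompt
        for q in questions
        if q.relevant and len(q.links) > len(q.link_summaries)
    ]


# Placeholder questions for illustration only.
draft = [
    Question("Which languages are covered?", "microscopic"),
    Question("Does the dataset contain human data?", "telescopic"),
    Question("Where was the data collected?", "periscopic",
             links=["https://example.com/collection-protocol"]),
]

for q in next_up(draft):
    print(q.scope, "-", q.prompt)
print("Links still needing a summary:", unsummarized_links(draft))
```

Keeping relevance, links, and summaries on the same record makes it easy for collaborators to pick up a draft mid-way, which matches the non-linear completion the template is designed for.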

5. Collaboratively review and expand your content

Get your Data Card draft reviewed by your peers and stakeholders. Conduct necessary analyses to verify your content and fill in the blanks. Work with experts, as necessary.

Get a deeper understanding of your dataset using tools like TensorFlow Data Validation.
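As a rough idea of the kind of per-column summary that can help verify Data Card claims, here is a minimal standard-library sketch. It is a lightweight stand-in, not TensorFlow Data Validation's API: TensorFlow Data Validation computes far richer statistics and can infer and validate schemas. The function name, the output keys, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def summarize_columns(csv_text):
    """Summarize each column of CSV text that has a header row.

    Returns {column: {"count", "missing", "distinct", "top"}}, where
    "missing" counts empty cells and "top" is the most frequent value.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in values:
            # Short rows yield None for absent cells; treat them as empty.
            values[name].append((row.get(name) or "").strip())

    summary = {}
    for name, column in values.items():
        present = [v for v in column if v != ""]
        counts = Counter(present)
        summary[name] = {
            "count": len(column),
            "missing": len(column) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(1)[0][0] if counts else None,
        }
    return summary


# Made-up sample data, for illustration only.
SAMPLE = "label,source\ncat,web\ndog,\ncat,survey\n"

for column, stats in summarize_columns(SAMPLE).items():
    print(column, stats)
```

Numbers like the missing-value count per column are the sort of evidence reviewers can use to confirm or correct statements in the draft.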

6. Final Checks

  • Are all acronyms spelled out the first time they are used?
  • Are there any relative words that can be avoided?
  • Are all links summarized?
  • Is necessary context available?
  • Are there any vague statements?
  • Are explanations simple without compromising nuance?
  • Are uncertainty and unknowns clearly explained?
  • Are terms of art explained?
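Two of the checks above, spelled-out acronyms and relative words, lend themselves to a quick automated first pass before the group review. The sketch below is an assumption-laden helper, not a Playbook tool: it treats an acronym as defined only if its first appearance is wrapped in parentheses, as in "Optical Character Recognition (OCR)", and `RELATIVE_WORDS` is a hypothetical starter list to tune for your domain. The remaining checks still need human judgment.

```python
import re

# Hypothetical starter list of relative words; adjust it for your domain.
RELATIVE_WORDS = {
    "recently", "currently", "soon", "large", "small", "many", "few", "often",
}


def undefined_acronyms(text):
    """Return acronyms whose first appearance is not in the form '(ACRONYM)'."""
    flagged = []
    seen = set()
    for match in re.finditer(r"\b[A-Z]{2,}\b", text):
        acronym = match.group()
        if acronym in seen:
            continue  # only the first use is checked
        seen.add(acronym)
        start, end = match.span()
        in_parens = (
            start > 0
            and text[start - 1] == "("
            and text[end:end + 1] == ")"
        )
        if not in_parens:
            flagged.append(acronym)
    return flagged


def relative_words(text):
    """Return the relative words found in text, in order of first appearance."""
    found = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in RELATIVE_WORDS and word not in found:
            found.append(word)
    return found


# Made-up draft sentence, for illustration only.
draft = (
    "The dataset uses Optical Character Recognition (OCR) output. "
    "OCR errors were recently audited by the QA team."
)
print(undefined_acronyms(draft))  # ['QA']
print(relative_words(draft))      # ['recently']
```

A flagged word is a prompt to reconsider, not an error: "recently" might be fine if the sentence also gives a date.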

Use this moment as a group to assess your Data Card on our transparency dimensions and heuristics to ensure that your Data Card is ⭐️⭐️⭐️⭐️⭐️ for any reader!

7. Wrap Up

Developmental Review

Run a detailed heuristics review or a dimensions assessment to ensure that your Data Card is purposeful, relevant, and useful for your readers.

Share Your Data Card

If your Data Card is public, share it with us! Add it to our growing repository of Data Cards on GitHub.

Feedback

Found this useful? Hit a snag? We want to hear about your experience.

🎉 8. You’re all done!

Set up a reminder to revisit your Data Card every 6 months or so and update its content to reflect changes such as:

  • New labels applied to data in your dataset
  • New subsets created from your dataset
  • New uses, applications, and corresponding performance
  • Any new risks or caveats that were discovered in practice
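If your team tracks review dates in code, the six-month reminder can be computed rather than remembered. This is a minimal sketch under stated assumptions: `add_months` and `review_due` are hypothetical helpers, the interval defaults to 6 months as suggested above, and month-end dates are clamped to the last valid day of the target month.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by `months`, clamping the day to the month's end."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def review_due(last_review, today, months=6):
    """Return True once `months` have passed since the last review."""
    return today >= add_months(last_review, months)


# Example dates chosen for illustration only.
last = date(2024, 8, 31)
print(add_months(last, 6))                   # 2025-02-28
print(review_due(last, date(2025, 3, 1)))    # True
```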

Epilogue

Transparency is not a one-time problem.

It's a systems problem that's knotty and wicked, and datasets are one part of it. The Playbook's approach to transparency is continuous and contextual. We see it as a marathon – less about the generation of new ideas, more about the deliberate consideration of what is right in front of us. Thank you for being on this journey with us!