Data Visualization: Planning

There are three main steps to completing a data visualization. Steps 1 and 2 are interchangeable and may sometimes occur simultaneously.

  1. Formulating the question you want your visualization to answer or the story you want your visualization to tell
  2. Gathering, understanding, and sorting the data
  3. Applying the visual representation

A good overview of the process can be found in the LinkedIn Learning course Data Visualization Fundamentals.

You’ll likely already have the data you need for your visualization if you’re planning on using data related to your research. However, if you’re having trouble finding the data you need or you just want to find some data to play around with, check out this Main Campus library guide for a great list of resources.

The following summarizes the stages of the data familiarization and preparation process outlined in Andy Kirk’s book Data Visualization: A Successful Design Process.

Examining the Data

  • Do you have all the data you need? Does it include all the variables that you are interested in?
  • Are there any obvious errors in your data? Is there any data that is missing?
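
The two questions above can be checked programmatically before any charting begins. The following is a minimal sketch using only Python's standard library. The sample data, the column names, and the `examine` helper are hypothetical and exist only for illustration; with real data you would read your own file instead of the inline string.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
RAW_CSV = """name,date_of_birth,city,score
Ada,1990-04-12,London,81
Grace,,New York,93
Alan,1985-11-02,,
"""

# The variables you expect to need ("country" is deliberately absent above).
REQUIRED_COLUMNS = ["name", "date_of_birth", "city", "score", "country"]


def examine(csv_text, required_columns):
    """Return (missing_columns, blank_counts) for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    present = reader.fieldnames or []
    # Variables you wanted that the data does not contain at all.
    missing_columns = [c for c in required_columns if c not in present]
    # Number of empty cells in each column that is present.
    blank_counts = {
        col: sum(1 for row in rows if not (row[col] or "").strip())
        for col in present
    }
    return missing_columns, blank_counts


missing_columns, blank_counts = examine(RAW_CSV, REQUIRED_COLUMNS)
print("Variables not in the data:", missing_columns)
print("Blank cells per column:", blank_counts)
```

A check like this only finds gaps. Obvious errors, such as impossible dates or misspelled categories, usually still need a visual scan or domain knowledge.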

Understanding the Data Types

  • What type of data have you acquired?

[Figure: chart of data types]

  • What is the range of values for each type of data?

[Figure: chart of data ranges]
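
As a rough illustration of both questions, the sketch below guesses a type for each column and then reports either its minimum and maximum (for numbers and dates) or its distinct values (for text). The records and the `parse_value` and `describe` helpers are hypothetical, and the type guessing is deliberately simple; it is not a substitute for the classification in Kirk's book.

```python
from datetime import date

# Hypothetical records; field names and values are made up for illustration.
records = [
    {"city": "London", "visits": "12", "rating": "4.5", "joined": "2021-03-01"},
    {"city": "Paris", "visits": "7", "rating": "3.8", "joined": "2019-11-15"},
    {"city": "London", "visits": "30", "rating": "4.9", "joined": "2022-06-30"},
]


def parse_value(text):
    """Try integer, then float, then ISO date; fall back to plain text."""
    for convert in (int, float, date.fromisoformat):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def describe(records):
    """Map each column to (type name, range or sorted distinct values)."""
    summary = {}
    for column in records[0]:
        values = [parse_value(row[column]) for row in records]
        kinds = {type(v).__name__ for v in values}
        kind = kinds.pop() if len(kinds) == 1 else "mixed"
        if kind in ("int", "float", "date"):
            # Quantitative or temporal: report the smallest and largest value.
            summary[column] = (kind, (min(values), max(values)))
        else:
            # Text or mixed: report the distinct values instead of a range.
            summary[column] = (kind, sorted(set(map(str, values))))
    return summary


summary = describe(records)
for column, (kind, value_range) in summary.items():
    print(column, kind, value_range)
```

A column reported as "mixed" is often a sign of a data entry problem worth investigating before you choose a chart type.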

Transforming for Quality

  • Do you need to clean up your data? Do you need to fix any errors or fill in any gaps in your data?
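
A small sketch of what fixing errors and filling gaps can look like in practice follows. The rows and the `clean` helper are hypothetical. Filling a missing score with the median is only one possible choice and is an assumption here; depending on your data, dropping the row or leaving the gap visible may be more honest.

```python
from statistics import median

# Hypothetical survey rows with typical problems: stray whitespace,
# inconsistent capitalization, and a missing score (None).
rows = [
    {"city": " london ", "score": 81},
    {"city": "LONDON", "score": None},
    {"city": "Paris", "score": 93},
    {"city": "paris ", "score": 70},
]


def clean(rows):
    """Return cleaned copies of the rows, leaving the originals untouched."""
    known_scores = [r["score"] for r in rows if r["score"] is not None]
    fill_value = median(known_scores)  # assumption: median is a sensible fill
    cleaned = []
    for r in rows:
        cleaned.append({
            # Trim whitespace and normalize capitalization.
            "city": r["city"].strip().title(),
            # Fill the gap, but record that the value was not observed.
            "score": fill_value if r["score"] is None else r["score"],
            "score_was_missing": r["score"] is None,
        })
    return cleaned


cleaned = clean(rows)
for row in cleaned:
    print(row)
```

Keeping a flag such as `score_was_missing` lets you show or exclude filled values later, so the cleaning step does not silently change the story the visualization tells.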

Transforming for Analysis

  • Parsing (splitting up) variables, such as extracting year from a date value
  • Merging variables to form new ones, such as creating a whole name out of title, forename, and surname
  • Converting qualitative data/free-text into coded values or keywords
  • Deriving new values out of others, such as gender from title or a sentiment out of some qualitative data
  • Creating calculations for use in analysis, such as percentage proportions
  • Removing redundant data for which you have no planned use (be careful though!)
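
Several of the transformations listed above can be sketched in a few lines of standard-library Python. Everything in this example is hypothetical: the records, the field names, and the `KEYWORDS` list used to code free-text comments are invented for illustration, and the keyword matching is far cruder than a real text-coding scheme.

```python
from datetime import date

# Hypothetical records; every field name and value is invented for illustration.
people = [
    {"title": "Dr", "forename": "Ada", "surname": "Lovelace",
     "joined": "2019-11-15", "comment": "Fast delivery, great price", "spend": 120},
    {"title": "Mr", "forename": "Alan", "surname": "Turing",
     "joined": "2021-03-01", "comment": "Delivery was slow", "spend": 60},
    {"title": "Ms", "forename": "Grace", "surname": "Hopper",
     "joined": "2021-07-09", "comment": "Good price", "spend": 20},
]

# Assumed keyword list for coding free-text comments.
KEYWORDS = ("delivery", "price")

total_spend = sum(p["spend"] for p in people)

transformed = []
for p in people:
    transformed.append({
        # Parsing: extract the year from an ISO date string.
        "joined_year": date.fromisoformat(p["joined"]).year,
        # Merging: build a whole name from title, forename, and surname.
        "full_name": " ".join([p["title"], p["forename"], p["surname"]]),
        # Converting: code free text into the keywords it mentions.
        "topics": [k for k in KEYWORDS if k in p["comment"].lower()],
        # Calculating: each person's share of total spend, as a percentage.
        "spend_pct": round(100 * p["spend"] / total_spend, 1),
    })

for row in transformed:
    print(row)
```

Building a new list rather than overwriting `people` keeps the original values available, which is a cheap safeguard when removing or reshaping data.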

Depending on how you collected or sourced your data, it may not need much cleaning. Basic steps like renaming column headers or splitting columns can be done in Excel or Google Sheets. However, if your dataset requires significant cleaning, the following tools can help automate the process:

  • OpenRefine
    A free and powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
  • DataWrangler
    A tool developed by the Stanford Visualization Group for wrangling and cleaning up your data into a format that can be interpreted by common data visualization tools.