AI Basics for Data Visualization
In this optional tutorial, we will explore how to use AI assistants to help create data visualizations. Specifically, we can collaborate with AI or "large language models" (LLMs) to build visualizations more quickly. However, we will do so while still applying the design knowledge and multi-disciplinary lenses we've developed together throughout the course.
Motivation
So far, we have been building data visualizations through a combination of pre-built charting libraries (such as matplotlib) and custom drawing with technologies like Sketchingpy. However, sometimes that means a lot of code, and moving from one idea to the next can take a long time. What if we could get through those cycles faster? Maybe we could explore more design possibilities!
With that in mind, we next introduce AI assistants, which can help you implement faster. These tools are at their best when used to accelerate the iteration process we've already seen in the class. In other words, as we will see in a moment, there is still value in building a prototype and experiencing it (as a human) to further improve design choices.
Let's try practicing going through this cycle by revisiting the moose and wolves example from the very start of class. However, this time, let's see what it feels like to be accelerated by AI while still applying our critical design perspective.
Role of AI
AI assistants like Claude, ChatGPT, or other language models can help generate visualization code quickly. However, they don't necessarily know about design and can't experience visualizations the same way you (and your human audiences) will. A lot of the heavy lifting still comes from guiding solutions to a good conclusion, taking advantage of the techniques we discussed in class. Put another way, they work best when you bring your visualization expertise to direct the conversation which involves:
- Knowing what kinds of visualization will best serve your data and purpose or what you might want to explore.
- Recognizing when a visualization generated has issues with readability, accessibility, or other design considerations like poor gestalt.
- Providing specific feedback to guide the output in the process of iterating towards a better result.
With all that in mind, think of an AI assistant as a way to accelerate implementation, but ask whether moving faster is always better. How could it be an issue? That depends on the task at hand. When writing code yourself, the implementation process confronts you with many questions and decisions because the code forces you to be explicit about what you want. When AI helps you move quickly, some of those choices may be made for you, possibly without you realizing. In other words, you need to be even more conscientious about applying the concepts from the class to ensure your ideas are layered in. Be careful that AI doesn't leave you with less control.
Prepare
We'll be using matplotlib to create a visualization of predator-prey dynamics between wolves and moose populations at Isle Royale National Park, returning to an example from Lesson 1. This gives us a nice dataset which has some dimensionality but still tame enough that we can take advantage of some pre-built charts. The goal here is to create a split bar plot showing both wolf and moose populations over time, with wolves extending to the left and moose extending to the right.
Before we can get into it, we do need access to an AI assistant that can generate Python for data visualization. I used Claude which typically offers strong performance in testing but other options include ChatGPT, Gemini, and Mistral AI. If in need of a free option, Mistral has also performed quite well.
Next, as in Tutorial 1, you'll also need a Python environment where you can run the generated code. This could be Jupyter Lite, which does not require installation. If you use something other than Jupyter Lite, make sure you have matplotlib installed in your environment. For example, this could be done with (optionally in a virtual environment):
pip install matplotlib
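If you'd like to confirm the install worked before involving the AI, a quick check like the following may help. This is only a minimal sketch: the throwaway chart and its labels are made up for the check and have nothing to do with the wolves and moose data.

```python
import matplotlib
import matplotlib.pyplot as plt

# If this import succeeded, matplotlib is installed in this environment.
print("matplotlib version:", matplotlib.__version__)

# Draw a tiny throwaway horizontal bar chart to confirm plotting works.
fig, ax = plt.subplots()
ax.barh(["a", "b", "c"], [1, 2, 3])
ax.set_title("Installation check")
```

In a notebook the figure should appear under the cell. In a plain script, add plt.show() at the end to open a window.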
Once you're set up, open your AI assistant in one window and your Python environment in another so you can easily execute the code as you go. This setup can help fuel that essential iteration process.
Note: The exact output you get from an AI assistant may differ slightly from what's shown here. That's completely normal and expected! The key learning objective here is to experience the process of iteration: making a request, reviewing the output, identifying what to improve, and refining your prompts accordingly.
Initial Request
Let's start by making our first request to the assistant, just to get a basic initial plot. Try copying this prompt into your chat window with your chosen AI:
Hello! Can you please use matplotlib to generate a graph of wolves vs moose populations as a split bar plot (one going to the left and the other going to the right) where the axes have different colors to correspond to the bar colors for the two series? Please have the left facing axis for wolves go from 0 to 50 and the right facing axis for moose go from 0 to 2500. See https://www.nps.gov/isro/learn/nature/wolf-moose-populations.htm for data.
Check the work: Run the code provided by the AI assistant. You should see a visualization with:
- Blue bars extending to the left representing wolf populations
- Orange (or similar color) bars extending to the right representing moose populations
- A timeline showing years (likely 1980-2019 or similar range)
- Color-coordinated axes with scales from 0-50 for wolves and 0-2500 for moose
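For reference, here is a rough sketch of the kind of code a first response might contain. Your assistant's code will almost certainly differ. The values below are synthetic placeholders generated from sine waves, not the Isle Royale counts, and the function name plot_shared_scale is made up for this example. The sketch assumes the assistant put both series on one shared horizontal scale.

```python
import math
import matplotlib.pyplot as plt

# PLACEHOLDER DATA: synthetic values invented for illustration only.
# These are NOT the Isle Royale counts; substitute the NPS figures.
years = list(range(1980, 2020))
wolves = [25 + 15 * math.sin((y - 1980) / 4) for y in years]
moose = [1300 + 800 * math.sin((y - 1980) / 4 + 2) for y in years]


def plot_shared_scale(years, wolves, moose):
    """Split bar plot where both series share one horizontal scale."""
    fig, ax = plt.subplots(figsize=(8, 10))
    # Negative widths make the wolf bars extend to the left of zero.
    ax.barh(years, [-w for w in wolves], color="tab:blue", label="Wolves")
    ax.barh(years, moose, color="tab:orange", label="Moose")
    # One symmetric scale: 2500 units each way, even though wolves stop at 50.
    ax.set_xlim(-2500, 2500)
    ax.axvline(0, color="black", linewidth=0.8)
    ax.invert_yaxis()  # earliest year at the top
    ax.set_ylabel("Year")
    ax.legend()
    return fig, ax


fig, ax = plot_shared_scale(years, wolves, moose)
```

With a layout like this, a wolf bar can reach at most 50 units on a side that spans 2500, so it fills no more than 2% of the left half of the plot.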
Take a moment to examine the output. What do you notice? Can you see the predator-prey relationship in the data? The classic predator-prey oscillation pattern might be visible. However, it's likely that the wolf data appear compressed because the AI used a symmetric scale: the range of the data on one side (up to 50) is much smaller than on the other (up to 2500). This brings us to our first iteration.
First Iteration
Let's address this issue where the wolf population changes may be hard to see because they're compressed into a narrow space compared to the moose. Specifically, let's improve this by giving wolves more horizontal space per individual. This will make the harmonic relationship between the two populations easier to see. How about another prompt:
This is great but the wolves are too compressed. Let's have the split at roughly half way across the plot. This means that there is more horizontal space per wolf than per moose. However, this will show the harmonic relationship better.
Check the work: Run the updated code. Now you should see:
- Wolves taking up approximately half the plot width (left side)
- Moose taking up approximately half the plot width (right side)
- Much more visible oscillations in the wolf population
- The predator-prey relationship should be much clearer
By allocating equal horizontal space to both scales, we've effectively "zoomed in" on the wolf data. Now you can see how when wolves are high, moose populations tend to be lower. Conversely, when wolves go down (especially around 2015-2018 in the data), moose populations go up.
This is a great example of how your human understanding of a visualization helps guide the AI. This is especially true as the story (or the capability for users to tell their own stories) becomes clearer through experience.
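The equal-width split described above can be achieved in several ways, and your assistant may choose a different one (for example, two side-by-side subplots). One possible sketch divides each series by its own axis maximum so both sides span 0 to 1, then relabels the ticks in the original units. The names to_split_positions and plot_split_halfway are hypothetical, and the data are again synthetic placeholders rather than the NPS counts.

```python
import math
import matplotlib.pyplot as plt

WOLF_MAX = 50     # axis limits taken from the initial prompt
MOOSE_MAX = 2500


def to_split_positions(wolves, moose):
    """Scale each series by its own maximum; wolves become negative."""
    left = [-w / WOLF_MAX for w in wolves]
    right = [m / MOOSE_MAX for m in moose]
    return left, right


def plot_split_halfway(years, wolves, moose):
    left, right = to_split_positions(wolves, moose)
    fig, ax = plt.subplots(figsize=(8, 10))
    ax.barh(years, left, color="tab:blue")
    ax.barh(years, right, color="tab:orange")
    ax.set_xlim(-1, 1)  # zero now sits at the horizontal center
    # Relabel the normalized positions in each side's original units.
    positions = [-1, -0.5, 0, 0.5, 1]
    labels = [
        f"{abs(p) * (WOLF_MAX if p < 0 else MOOSE_MAX):.0f}"
        for p in positions
    ]
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.invert_yaxis()
    return fig, ax


# PLACEHOLDER DATA: synthetic values, NOT the Isle Royale counts.
years = list(range(1980, 2020))
wolves = [25 + 15 * math.sin((y - 1980) / 4) for y in years]
moose = [1300 + 800 * math.sin((y - 1980) / 4 + 2) for y in years]
fig, ax = plot_split_halfway(years, wolves, moose)
```

Because each side has its own scale, one unit of width now means something different on the left than on the right. The relabeled ticks carry the real counts, so they deserve a careful look when you check the work.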
Refining Labels
The visualization is getting clearer but you may also notice that the axis labels might need some work. For instance, there might be overlapping text where the colored "Wolves" and "Moose" labels compete with a black axis title. If that is the case, let's clean this up and make the supporting elements more clear. Try this prompt:
Perfect! We have the colored wolves and moose labels overlapping with the black axis title. Let's set the black axis title to empty and then let's make the colors for both series darker so that it is easy to read wolves and moose. Finally, let's say "Number of Wolves" and "Number of Moose" in that colored text.
Check the work: Run the code and verify:
- The black axis title is now gone, eliminating the overlap
- The blue and orange colors are darker and hopefully easier to read
- The labels now should say "Number of Wolves" and "Number of Moose" instead of just "Wolves" and "Moose"
Notice how this iteration focused on reducing visual clutter and improving readability. This connects to Tufte's concepts of reducing chartjunk and maximizing the data-ink ratio!
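As a hedged illustration of this label cleanup, the sketch below clears the axis title and places one colored label under each half of the plot. The helper name label_sides, the two hex colors, and the label positions are all assumptions made for this example. Your assistant may pick other shades or a different placement approach.

```python
import matplotlib.pyplot as plt

# Hypothetical darker shades chosen for this illustration.
DARK_BLUE = "#1f4e79"
DARK_ORANGE = "#b35900"


def label_sides(ax):
    """Replace the single black axis title with one colored label per side."""
    ax.set_xlabel("")  # an empty title removes the overlap
    # Axes coordinates: x=0.25 centers on the left half, x=0.75 on the right.
    ax.text(0.25, -0.08, "Number of Wolves", color=DARK_BLUE,
            ha="center", va="top", transform=ax.transAxes)
    ax.text(0.75, -0.08, "Number of Moose", color=DARK_ORANGE,
            ha="center", va="top", transform=ax.transAxes)


fig, ax = plt.subplots()
ax.set_xlim(-1, 1)
label_sides(ax)
```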
Final Polish
There's one more improvement we can make. Right now, it's possible that the numerical labels along the horizontal axis (the tick labels showing the population counts) are all in black. Let's make them match their respective data series for maximum clarity. Try this final prompt:
Doing really well! One last thing. The labels for the count (along the bottom horizontal axis). Can you please make those the same color as the series they describe (so one color on left and a different color on right)?
Check the work: Run the final version. You should now see:
- Blue tick labels on the left side (for wolf counts)
- Orange tick labels on the right side (for moose counts)
- The tick marks themselves are also color-coded
- The entire visualization now uses color consistently to distinguish the two data series
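One way this coloring might be done is to walk the horizontal axis ticks and color each one by which side of zero it sits on. This is a sketch under assumptions: the helper name color_ticks_by_side and the color names are made up for the example, and it assumes the tick positions were already fixed with set_xticks so the ticks are not regenerated later.

```python
import matplotlib.pyplot as plt


def color_ticks_by_side(ax, left_color, right_color):
    """Color x-axis tick labels and tick marks by the side of zero they fall on."""
    ticks = ax.xaxis.get_major_ticks()
    for position, tick in zip(ax.get_xticks(), ticks):
        if position < 0:
            color = left_color
        elif position > 0:
            color = right_color
        else:
            continue  # leave the shared zero tick in its default color
        tick.label1.set_color(color)               # the number under the axis
        tick.tick1line.set_markeredgecolor(color)  # the small tick mark itself


fig, ax = plt.subplots()
ax.set_xlim(-1, 1)
ax.set_xticks([-1, -0.5, 0, 0.5, 1])
color_ticks_by_side(ax, "navy", "saddlebrown")
```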
Alright... after all of that valuable back and forth, you hopefully now have a polished visualization that clearly shows the predator-prey dynamics between wolves and moose. Iterating with AI can eventually reach a result similar to one built from scratch by hand, where every visual element is intentionally designed to support the viewer's understanding of the data.
Reflection
Let's step back and think about what we just accomplished and what it might tell us about working with AI assistants for data visualization. Specifically, notice that we (likely) didn't get a perfect visualization on the first try. Instead, we went through several rounds as we determined the elements about which we needed to be more precise:
- Initial request with basic requirements
- Adjusting the spatial allocation to better show the data
- Cleaning up labels and improving readability
- Fine-tuning color coding for consistency
Think back to your regular iterative process when you write code "by hand" and notice how similar it is to working well with AI assistants. Importantly, at each step, we have an opportunity to incorporate knowledge from this course and from our experience of a prototype. Just as we created early sketches in code and refined them toward higher fidelity in prior tutorials, we can continue to use the lenses of the class in a cycle: critically evaluating ever-improving outputs and responding with ever more specific prompts.
Next
This is a first stab at a relatively simple script. However, what happens when things get longer or more involved? What if we have a chart type for which there isn't already a pre-built implementation? Let's continue onwards to Tutorial 14 to learn about advanced AI techniques including llms.txt and subagents / agentic workflows.
Citations
- National Park Service, "Wolf and Moose Populations," Isle Royale National Park, 2025. [Online]. Available: https://www.nps.gov/isro/learn/nature/wolf-moose-populations.htm
- J. Hunter, et al., "Matplotlib: Visualization with Python," Matplotlib Development Team, 2025. [Online]. Available: https://matplotlib.org
- E. Tufte, "The Visual Display of Quantitative Information," Graphics Press, 2001.
- Anthropic, "Claude," Anthropic PBC, 2025. [Online]. Available: https://claude.ai
- OpenAI, "ChatGPT," OpenAI, 2025. [Online]. Available: https://chat.openai.com
- Google, "Gemini," Google, 2025. [Online]. Available: https://gemini.google.com/
- Mistral AI, "Mistral AI," Mistral AI, 2025. [Online]. Available: https://mistral.ai/
- Project Jupyter, "Jupyter," Project Jupyter, 2025. [Online]. Available: https://jupyter.org
- K. Reitz and T. Schlusser, "The Hitchhiker's Guide to Python," 2025. [Online]. Available: https://docs.python-guide.org