Effective Plotting

Brett Andrews
University of Pittsburgh

DESI Collaboration Meeting
12.01.2021

Don't make me think!

Take advantage of human perception.

Color is a better discriminator than shape.

Kieran Healy, "Data Visualization: A Practical Introduction."

Distinguishability falls off a cliff unless data is highly structured.

Kieran Healy, "Data Visualization: A Practical Introduction."

Don't need to show all data in one panel.

Kieran Healy, "Data Visualization: A Practical Introduction."

Multiple panels add structure.

Kieran Healy, "Data Visualization: A Practical Introduction."
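
One way to split data across panels is Matplotlib's `plt.subplots` with shared axes. This is a minimal sketch: the four groups and their values are hypothetical, generated only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: four groups of 2D points with different centers.
groups = {f"group {i}": rng.normal(loc=i, scale=1.0, size=(200, 2)) for i in range(4)}

# One panel per group; shared axes keep the panels directly comparable.
fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True, figsize=(12, 3))
for ax, (name, xy) in zip(axes, groups.items()):
    ax.scatter(xy[:, 0], xy[:, 1], s=5)
    ax.set_title(name)

fig.canvas.draw()  # render in memory; use fig.savefig(...) to write a file
```

Sharing the x and y axes means position carries the comparison, so no per-group color or shape is needed.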

Shapes and Colors: some choices are better than others

Unordered/Qualitative Data

Which shapes play nicely together?

Demiralp et al. (2014), "Learning Perceptual Kernels for Visualization Design."

Which colors play nicely together?

Demiralp et al. (2014), "Learning Perceptual Kernels for Visualization Design."

Qualitative Colormaps

Matplotlib docs
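
As a sketch of how a qualitative colormap might be used for unordered categories, the example below assigns one `tab10` color per category. The category labels and data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.get_cmap("tab10")      # qualitative colormap with 10 distinct colors
categories = ["a", "b", "c"]      # hypothetical category labels
# Integer input indexes the colormap's color list directly.
colors = {cat: cmap(i) for i, cat in enumerate(categories)}

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
for i, cat in enumerate(categories):
    xy = rng.normal(loc=2 * i, scale=1.0, size=(100, 2))  # hypothetical points
    ax.scatter(xy[:, 0], xy[:, 1], s=8, color=colors[cat], label=cat)
ax.legend()
fig.canvas.draw()  # render in memory; use fig.savefig(...) to write a file
```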

Ordered Data

Sequential vs. Diverging

Marvin

Sequential Colormaps

Matplotlib docs

Diverging Colormaps

Matplotlib docs
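
The distinction can be sketched in Matplotlib: a sequential colormap for a quantity that only increases from zero, and a diverging colormap centered on zero for a signed quantity. The arrays named `flux` and `velocity` are hypothetical stand-ins, and `viridis` and `RdBu_r` are just one reasonable pair of choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm
import numpy as np

rng = np.random.default_rng(1)
flux = rng.random((20, 20)) * 5.0             # hypothetical non-negative map
velocity = rng.normal(0.0, 100.0, (20, 20))   # hypothetical signed map

fig, (ax_seq, ax_div) = plt.subplots(1, 2, figsize=(9, 4))

# Sequential: lightness increases monotonically with the value.
im_seq = ax_seq.imshow(flux, cmap="viridis")

# Diverging: symmetric limits put the neutral color exactly at zero.
vmax = np.abs(velocity).max()
norm = TwoSlopeNorm(vmin=-vmax, vcenter=0.0, vmax=vmax)
im_div = ax_div.imshow(velocity, cmap="RdBu_r", norm=norm)

fig.colorbar(im_seq, ax=ax_seq)
fig.colorbar(im_div, ax=ax_div)
fig.canvas.draw()  # render in memory; use fig.savefig(...) to write a file
```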

Colorcet: 100+ more colormaps
(many of them perceptually uniform).

Visualization of big data can be misleading...

...and you might not even realize it.

Overplotting

Datashader: Plotting Pitfalls

Oversaturation: saturating pixel intensity

Datashader: Plotting Pitfalls

Oversaturation: interpretation depends on data set size and order

Datashader: Plotting Pitfalls

Oversaturation: pixel intensity depends on symbol size

Datashader: Plotting Pitfalls
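
The saturation effect follows from standard alpha compositing: stacking n markers of opacity alpha on one pixel gives a combined opacity of 1 - (1 - alpha)^n. The sketch below evaluates that formula. The function name `stacked_opacity` and the sample counts are illustrative.

```python
import numpy as np

def stacked_opacity(alpha, n_overlap):
    """Combined opacity of n_overlap stacked markers, each with opacity alpha."""
    return 1.0 - (1.0 - alpha) ** np.asarray(n_overlap, dtype=float)

# Number of markers overlapping on a single pixel (illustrative values).
counts = np.array([1, 10, 50, 100, 1000])
for alpha in (0.1, 0.01):
    print(alpha, np.round(stacked_opacity(alpha, counts), 3))
```

An alpha that separates dense from sparse regions for one data set size will saturate for a larger one, so alpha has to be retuned whenever the number of points or the marker size changes.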

Undersampling: the full distribution can be difficult to understand from a subsample

Datashader: Plotting Pitfalls

Heatmap: solves overplotting, oversaturation, undersampling
...but the bin size must be chosen carefully to convey the message

Datashader: Plotting Pitfalls
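
A rough illustration of the bin size trade-off, using `numpy.histogram2d` on hypothetical data (a narrow clump on top of a broad background): coarse bins blur the clump away, while very fine bins leave most pixels empty.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: broad background plus a narrow clump near (1, 1).
x = np.concatenate([rng.normal(0, 1.0, 50_000), rng.normal(1, 0.05, 5_000)])
y = np.concatenate([rng.normal(0, 1.0, 50_000), rng.normal(1, 0.05, 5_000)])

extent = [[-4, 4], [-4, 4]]
heatmaps = {}
for nbins in (8, 64, 512):
    counts, xedges, yedges = np.histogram2d(x, y, bins=nbins, range=extent)
    heatmaps[nbins] = counts
    empty = float((counts == 0).mean())
    print(f"{nbins} bins per axis: max count {int(counts.max())}, "
          f"empty fraction {empty:.3f}")
```

Each array in `heatmaps` can be displayed with `imshow` or `pcolormesh`. Comparing a few bin sizes side by side is a quick check on whether a feature is real or an artifact of the binning.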

Undersaturation: missing diffuse distributions

Datashader: Plotting Pitfalls

Undersaturation: offset to make low values visible

Datashader: Plotting Pitfalls

Fixing underutilized range: logarithmic transform

Datashader: Plotting Pitfalls
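
A minimal sketch of the logarithmic transform on a binned count image. The data are hypothetical (a very dense core plus a diffuse halo), and `fraction_visible` is an illustrative helper that measures how many non-empty pixels rise above a brightness threshold.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: a diffuse halo plus a very dense, narrow core.
x = np.concatenate([rng.normal(0, 2.0, 20_000), rng.normal(0, 0.02, 200_000)])
y = np.concatenate([rng.normal(0, 2.0, 20_000), rng.normal(0, 0.02, 200_000)])
counts, _, _ = np.histogram2d(x, y, bins=100, range=[[-6, 6], [-6, 6]])

def fraction_visible(img, threshold=0.05):
    """Fraction of non-empty pixels brighter than threshold after scaling to [0, 1]."""
    scaled = img / img.max()
    nonempty = img > 0
    return float((scaled[nonempty] > threshold).mean())

linear = counts
logged = np.log1p(counts)  # log(1 + count), so empty pixels stay at zero

print("linear:", fraction_visible(linear))
print("log:   ", fraction_visible(logged))
```

With linear scaling the few core pixels use up nearly the whole color range. The log transform compresses the high counts so that the diffuse halo becomes visible.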

Fixing underutilized range: histogram equalization

Datashader: Plotting Pitfalls
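
Histogram equalization replaces each pixel value with its position in the cumulative distribution of pixel values, so the color range is spread evenly over the values that actually occur. The function below is a simplified sketch in NumPy, not Datashader's own implementation.

```python
import numpy as np

def equalize(img):
    """Map each non-zero pixel to its cumulative-distribution position in (0, 1]."""
    out = np.zeros_like(img, dtype=float)
    mask = img > 0
    values = img[mask]
    # Unique non-zero values, an index back to them, and how often each occurs.
    uniq, inverse, freq = np.unique(values, return_inverse=True, return_counts=True)
    cdf = np.cumsum(freq) / values.size
    out[mask] = cdf[inverse]
    return out

# Small illustrative count image spanning three orders of magnitude.
img = np.array([[0, 1, 1], [10, 1000, 0]])
print(equalize(img))
```

Unlike the logarithmic transform, this needs no assumption about the shape of the distribution, but the mapping from color back to count is no longer a simple function, so a colorbar is harder to read quantitatively.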

Non-uniform colormap

Datashader: Plotting Pitfalls

Datashader: automatically avoid these pitfalls
and quickly plot up to 1 billion points on your laptop

Datashader: Plotting Pitfalls

Matplotlib-based option for fast scatter density plots.

mpl-scatter-density

Don't make me think!

Take advantage of human perception.