---
title: Analyzing Patient Data
teaching: 30
exercises: 20
questions:
- "How can I visualize tabular data files in Python?"
- "How can I group several plots together?"
objectives:
- "Plot simple graphs from data."
- "Group several graphs in a single figure."
keypoints:
- "Use the `pyplot` library from `matplotlib` for creating simple visualizations."
---

## Visualizing data
The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers," and
the best way to develop insight is often to visualize data. Visualization deserves an entire lecture
of its own, but we can explore a few features of Python's `matplotlib` library here. While
there is no official plotting library, `matplotlib` is the _de facto_ standard.
First, we will
import the `pyplot` module from `matplotlib` and use two of its functions to create and display a
heat map of our data:

~~~
import matplotlib.pyplot
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()
~~~
{: .language-python}

![Heatmap of the Data](../fig/inflammation-01-imshow.svg)

Blue pixels in this heat map represent low values, while yellow pixels represent high values. As we
can see, inflammation rises and falls over a 40-day period.
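
To read values off a heat map rather than judge them from color alone, a color bar can be attached
to it. The following is a minimal sketch, not part of the lesson's own script: it assumes a small
made-up array, `toy_data`, in place of the inflammation file so that it runs on its own, and it uses
`matplotlib.pyplot.colorbar`, which the lesson does not otherwise introduce.

```python
import numpy
import matplotlib.pyplot

# Made-up stand-in for the inflammation data: 3 patients (rows), 5 days (columns).
toy_data = numpy.array([[0.0, 1.0, 3.0, 1.0, 0.0],
                        [0.0, 2.0, 4.0, 2.0, 1.0],
                        [0.0, 1.0, 2.0, 2.0, 0.0]])

# Draw the heat map exactly as in the lesson.
image = matplotlib.pyplot.imshow(toy_data)

# Attach a color bar showing which color corresponds to which value.
colorbar = matplotlib.pyplot.colorbar(image)
```

Calling `matplotlib.pyplot.show()` afterwards displays the figure, as in the lesson's example.
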
Let's take a look at the average inflammation over time:

~~~
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()
~~~
{: .language-python}

![Average Inflammation Over Time](../fig/inflammation-01-average.svg)

Here, we have put the average per day across all patients in the variable `ave_inflammation`, then
asked `matplotlib.pyplot` to create and display a line graph of those values. The result is a
roughly linear rise and fall, which is suspicious: we might instead expect a sharper rise and slower
fall.
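
It can help to see on a tiny array what `axis=0` does before trusting the plot. The sketch below
assumes a made-up array, `toy_data`, with one row per patient and one column per day (the same
layout as the inflammation data); the variable names are illustrative only.

```python
import numpy
import matplotlib.pyplot

# Made-up stand-in for the inflammation data: 2 patients (rows), 4 days (columns).
toy_data = numpy.array([[0.0, 2.0, 4.0, 2.0],
                        [0.0, 4.0, 6.0, 0.0]])

# axis=0 averages down each column: one value per day, across patients.
ave_per_day = numpy.mean(toy_data, axis=0)      # [0. 3. 5. 1.]

# axis=1 averages along each row: one value per patient, across days.
ave_per_patient = numpy.mean(toy_data, axis=1)  # [2.  2.5]

# Plot the per-day average as a line graph, as in the lesson.
ave_plot = matplotlib.pyplot.plot(ave_per_day)
```

The per-day result has one entry per column, which is why the line graph in the lesson has one
point per day. Calling `matplotlib.pyplot.show()` afterwards displays the plot.
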
Let's have a look at two other statistics:

~~~
max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
matplotlib.pyplot.show()
~~~
{: .language-python}

![Maximum Value Along The First Axis](../fig/inflammation-01-maximum.svg)

~~~
min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
matplotlib.pyplot.show()
~~~
{: .language-python}

![Minimum Value Along The First Axis](../fig/inflammation-01-minimum.svg)

The maximum value rises and falls smoothly, while the minimum seems to be a step function. Neither
trend seems particularly likely, so either there's a mistake in our calculations or something is
wrong with our data. This insight would have been difficult to reach by examining the numbers
themselves without visualization tools.

### Grouping plots
You can group similar plots in a single figure using subplots.
The script below uses a number of new commands. The function `matplotlib.pyplot.figure()`
creates a space into which we will place all of our plots. The parameter `figsize`
tells Python how big to make this space. Each subplot is placed into the figure using
the figure's `add_subplot` [method]({{ page.root }}/reference/#method). The `add_subplot` method
takes 3 parameters. The first denotes how many total rows of subplots there are, the second
parameter refers to the total number of subplot columns, and the final parameter denotes which
subplot your variable is referencing (left-to-right, top-to-bottom). Each subplot is stored in a
different variable (`axes1`, `axes2`, `axes3`). Once a subplot is created, its axes can be labeled
using the `set_xlabel()` command (or `set_ylabel()`).
Here are our three plots side by side:

~~~
import numpy
import matplotlib.pyplot

data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0))

axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0))

fig.tight_layout()

matplotlib.pyplot.show()
~~~
{: .language-python}

![The Previous Plots as Subplots](../fig/inflammation-01-group-plot.svg)

The [call]({{ page.root }}/reference/#function-call) to `loadtxt` reads our data,
and the rest of the program tells the plotting library
how large we want the figure to be,
that we're creating three subplots,
what to draw for each one,
and that we want a tight layout.
(If we leave out that call to `fig.tight_layout()`,
the graphs will actually be squeezed together more closely.)

> ## Plot Scaling
>
> Why do all of our plots stop just short of the upper end of our graph?
>
> > ## Solution
> > Because matplotlib normally sets x- and y-axis limits to the min and max of our data
> > (depending on data range).
> {: .solution}
>
> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
> for example:
>
> ~~~
> axes3.set_ylim(0,6)
> ~~~
> {: .language-python}
>
> Update your plotting code to automatically set a more appropriate scale.
> (Hint: you can make use of the `max` and `min` methods to help.)
>
> > ## Solution
> > ~~~
> > # One method
> > axes3.set_ylabel('min')
> > axes3.plot(numpy.min(data, axis=0))
> > axes3.set_ylim(0,6)
> > ~~~
> > {: .language-python}
> {: .solution}
>
> > ## Solution
> > ~~~
> > # A more automated approach
> > min_data = numpy.min(data, axis=0)
> > axes3.set_ylabel('min')
> > axes3.plot(min_data)
> > axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}

> ## Drawing Straight Lines
>
> In the center and right subplots above, we expect all lines to look like step functions because
> non-integer values are not realistic for the minimum and maximum values. However, you can see
> that the lines are not always vertical or horizontal, and in particular the step function
> in the subplot on the right looks slanted. Why is this?
>
> > ## Solution
> > Because matplotlib interpolates (draws a straight line) between the points.
> > One way to avoid this is to use the Matplotlib `drawstyle` option:
> >
> > ~~~
> > import numpy
> > import matplotlib.pyplot
> >
> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
> >
> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
> >
> > axes1 = fig.add_subplot(1, 3, 1)
> > axes2 = fig.add_subplot(1, 3, 2)
> > axes3 = fig.add_subplot(1, 3, 3)
> >
> > axes1.set_ylabel('average')
> > axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
> >
> > axes2.set_ylabel('max')
> > axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
> >
> > axes3.set_ylabel('min')
> > axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
> >
> > fig.tight_layout()
> >
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> >
> > ![Plot with step lines](../fig/inflammation-01-line-styles.svg)
> {: .solution}
{: .challenge}

> ## Make Your Own Plot
>
> Create a plot showing the standard deviation (`numpy.std`)
> of the inflammation data for each day across all patients.
>
> > ## Solution
> > ~~~
> > std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}

> ## Moving Plots Around
>
> Modify the program to display the three plots on top of one another
> instead of side by side.
>
> > ## Solution
> > ~~~
> > import numpy
> > import matplotlib.pyplot
> >
> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
> >
> > # change figsize (swap width and height)
> > fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
> >
> > # change add_subplot (swap first two parameters)
> > axes1 = fig.add_subplot(3, 1, 1)
> > axes2 = fig.add_subplot(3, 1, 2)
> > axes3 = fig.add_subplot(3, 1, 3)
> >
> > axes1.set_ylabel('average')
> > axes1.plot(numpy.mean(data, axis=0))
> >
> > axes2.set_ylabel('max')
> > axes2.plot(numpy.max(data, axis=0))
> >
> > axes3.set_ylabel('min')
> > axes3.plot(numpy.min(data, axis=0))
> >
> > fig.tight_layout()
> >
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}

{% include links.md %}