Commit b95b5283 authored by Rémi Emonet

[DOI: 10.5281/zenodo.838768] Rebuilt HTML files for release 2017.08

jekyll version: jekyll 3.4.3
parent 4a60077c
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="last-modified" content="2017-08-04 00:20:27 +0200">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- meta "search-domain" used for google site search function google_search() -->
<meta name="search-domain" value="/swc-releases/2017.08/python-novice-inflammation">
<link rel="stylesheet" type="text/css" href="../assets/css/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="../assets/css/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="../assets/css/lesson.css" />
<link rel="shortcut icon" type="image/x-icon" href="/favicon-swc.ico" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<title>Programming with Python: Analyzing Patient Data</title>
</head>
<body>
<div class="container">
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../">Home</a>
</div>
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav">
<li><a href="../conduct/">Code of Conduct</a></li>
<li><a href="../setup/">Setup</a></li>
<li class="dropdown">
<a href="../" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Episodes <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../01-numpy/">Analyzing Patient Data</a></li>
<li><a href="../02-loop/">Repeating Actions with Loops</a></li>
<li><a href="../03-lists/">Storing Multiple Values in Lists</a></li>
<li><a href="../04-files/">Analyzing Data from Multiple Files</a></li>
<li><a href="../05-cond/">Making Choices</a></li>
<li><a href="../06-func/">Creating Functions</a></li>
<li><a href="../07-errors/">Errors and Exceptions</a></li>
<li><a href="../08-defensive/">Defensive Programming</a></li>
<li><a href="../09-debugging/">Debugging</a></li>
<li><a href="../10-cmdline/">Command-Line Programs</a></li>
<li role="separator" class="divider"></li>
<li><a href="../aio/">All in one page (Beta)</a></li>
</ul>
</li>
<li class="dropdown">
<a href="../" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Extras <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../reference/">Reference</a></li>
<li><a href="../about/">About</a></li>
<li><a href="../discuss/">Discussion</a></li>
<li><a href="../figures/">Figures</a></li>
<li><a href="../guide/">Instructor Notes</a></li>
</ul>
</li>
<li><a href="../license/">License</a></li>
</ul>
<form class="navbar-form navbar-right" role="search" id="search" onsubmit="google_search(); return false;">
<div class="form-group">
<input type="text" id="google-search" placeholder="Search..." aria-label="Google site search">
</div>
</form>
</div>
</div>
</nav>
<div class="row">
<div class="col-md-1">
<h3>
<a href="../"><span class="glyphicon glyphicon-menu-up" aria-hidden="true"></span><span class="sr-only">lesson home</span></a>
</h3>
</div>
<div class="col-md-10">
<h3 class="maintitle"><a href="../">Programming with Python</a></h3>
</div>
<div class="col-md-1">
<h3>
<a href="../02-loop/"><span class="glyphicon glyphicon-menu-right" aria-hidden="true"></span><span class="sr-only">next episode</span></a>
</h3>
</div>
</div>
<article>
<div class="row">
<div class="col-md-1">
</div>
<div class="col-md-10">
<h1 class="maintitle">Analyzing Patient Data</h1>
</div>
<div class="col-md-1">
</div>
</div>
<blockquote class="objectives">
<h2>Overview</h2>
<div class="row">
<div class="col-md-3">
<strong>Teaching:</strong> 30 min
<br/>
<strong>Exercises:</strong> 0 min
</div>
<div class="col-md-9">
<strong>Questions</strong>
<ul>
<li><p>How can I process tabular data files in Python?</p>
</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-md-3">
</div>
<div class="col-md-9">
<strong>Objectives</strong>
<ul>
<li><p>Explain what a library is, and what libraries are used for.</p>
</li>
<li><p>Import a Python library and use the functions it contains.</p>
</li>
<li><p>Read tabular data from a file into a program.</p>
</li>
<li><p>Assign values to variables.</p>
</li>
<li><p>Select individual values and subsections from data.</p>
</li>
<li><p>Perform operations on arrays of data.</p>
</li>
<li><p>Plot simple graphs from data.</p>
</li>
</ul>
</div>
</div>
</blockquote>
<p>In this lesson we will learn how to manipulate the inflammation dataset with Python. But before we discuss how to deal with many data points, we will show how to store a single value on the computer.</p>
<p>The line below <a href="reference.html#assignment">assigns</a> the value <code class="highlighter-rouge">55</code> to a <a href="reference.html#variable">variable</a> <code class="highlighter-rouge">weight_kg</code>:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>weight_kg = 55
</code></pre>
</div>
<p>A variable is just a name for a value,
such as <code class="highlighter-rouge">x_val</code>, <code class="highlighter-rouge">current_temperature</code>, or <code class="highlighter-rouge">subject_id</code>.
Variable names in Python must begin with a letter or an underscore and are <a href="reference.html#case-sensitive">case sensitive</a>.
We can create a new variable by assigning a value to it using <code class="highlighter-rouge">=</code>.
When we are finished typing and press Shift+Enter,
the notebook runs our command.</p>
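<p>As a small sketch of the naming rules (the names <code class="highlighter-rouge">Weight_kg</code> and <code class="highlighter-rouge">2nd_weight</code> are made up for illustration), the built-in string method <code class="highlighter-rouge">isidentifier</code> reports whether a piece of text would be accepted as a variable name:</p>

```python
# Names that differ only in capitalization are different variables.
weight_kg = 55
Weight_kg = 60  # hypothetical second variable, used only to show case sensitivity

print(weight_kg, Weight_kg)

# str.isidentifier() checks whether a piece of text is a legal Python name.
starts_with_letter = 'weight_kg'.isidentifier()
starts_with_digit = '2nd_weight'.isidentifier()

print(starts_with_letter, starts_with_digit)
```

<p>The first check succeeds and the second fails, because a name may not start with a digit.</p>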
<p>Once a variable has a value, we can print it to the screen:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(weight_kg)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>55
</code></pre>
</div>
<p>and do arithmetic with it:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print('weight in pounds:', 2.2 * weight_kg)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>weight in pounds: 121.00000000000001
</code></pre>
</div>
<p>As the example above shows,
we can print several things at once by separating them with commas.</p>
<p>We can also change a variable’s value by assigning it a new one:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>weight_kg = 57.5
print('weight in kilograms is now:', weight_kg)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>weight in kilograms is now: 57.5
</code></pre>
</div>
<p>If we imagine the variable as a sticky note with a name written on it,
assignment is like putting the sticky note on a particular value:</p>
<p><img src="../fig/python-sticky-note-variables-01.svg" alt="Variables as Sticky Notes" /></p>
<p>This means that assigning a value to one variable does <em>not</em> change the values of other variables.
For example,
let’s store the subject’s weight in pounds in a variable:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>weight_lb = 2.2 * weight_kg
print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>weight in kilograms: 57.5 and in pounds: 126.50000000000001
</code></pre>
</div>
<p><img src="../fig/python-sticky-note-variables-02.svg" alt="Creating Another Variable" /></p>
<p>and then change <code class="highlighter-rouge">weight_kg</code>:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>weight_kg = 100.0
print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>weight in kilograms is now: 100.0 and weight in pounds is still: 126.50000000000001
</code></pre>
</div>
<p><img src="../fig/python-sticky-note-variables-03.svg" alt="Updating a Variable" /></p>
<p>Since <code class="highlighter-rouge">weight_lb</code> doesn’t “remember” where its value came from,
it isn’t automatically updated when <code class="highlighter-rouge">weight_kg</code> changes.
This is different from the way spreadsheets work.</p>
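<p>One consequence is that a derived value has to be recomputed by hand when its input changes. The sketch below illustrates this; the name <code class="highlighter-rouge">stale_weight_lb</code> is introduced here only to keep the old value around for comparison:</p>

```python
weight_kg = 57.5
weight_lb = 2.2 * weight_kg   # computed from the current value of weight_kg

weight_kg = 100.0             # re-assigning weight_kg leaves weight_lb untouched
stale_weight_lb = weight_lb   # still based on the old 57.5 kg

weight_lb = 2.2 * weight_kg   # run the calculation again to bring weight_lb up to date

print('stale value is smaller than updated value:', stale_weight_lb < weight_lb)
```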
<blockquote class="callout">
<h2 id="whos-who-in-memory">Who’s Who in Memory</h2>
<p>You can use the <code class="highlighter-rouge">%whos</code> command at any time to see what
variables you have created and what modules you have loaded into the computer’s memory.
As this is an IPython command, it will only work if you are in an IPython terminal or the Jupyter Notebook.</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>%whos
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>Variable Type Data/Info
--------------------------------
numpy module &lt;module 'numpy' from '/Us&lt;...&gt;kages/numpy/__init__.py'&gt;
weight_kg float 100.0
weight_lb float 126.50000000000001
</code></pre>
</div>
</blockquote>
<p>Words are useful,
but what’s more useful are the sentences and stories we build with them.
Similarly,
while a lot of powerful, general tools are built into languages like Python,
specialized tools built up from these basic units live in <a href="reference.html#library">libraries</a>
that can be called upon when needed.</p>
<p>In order to load our inflammation data,
we need to access (<a href="reference.html#import">import</a> in Python terminology)
a library called <a href="http://docs.scipy.org/doc/numpy/" title="NumPy Documentation">NumPy</a>.
In general you should use this library when you want to work with numerical data,
especially if you have matrices or arrays.
We can import NumPy using:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>import numpy
</code></pre>
</div>
<p>Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench.
Libraries provide additional functionality to the basic Python package,
much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries
can sometimes complicate and slow down your programs, so we only import what we need for each program.
Once we’ve imported the library,
we can ask the library to read our data file for us:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>array([[ 0., 0., 1., ..., 3., 0., 0.],
[ 0., 1., 2., ..., 1., 0., 1.],
[ 0., 1., 1., ..., 2., 1., 1.],
...,
[ 0., 1., 1., ..., 1., 1., 1.],
[ 0., 0., 0., ..., 0., 2., 0.],
[ 0., 0., 1., ..., 1., 1., 0.]])
</code></pre>
</div>
<p>The expression <code class="highlighter-rouge">numpy.loadtxt(...)</code> is a <a href="reference.html#function-call">function call</a>
that asks Python to run the <a href="reference.html#function">function</a> <code class="highlighter-rouge">loadtxt</code> which belongs to the <code class="highlighter-rouge">numpy</code> library.
This <a href="reference.html#dotted-notation">dotted notation</a> is used everywhere in Python
to refer to the parts of things as <code class="highlighter-rouge">thing.component</code>.</p>
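<p>The same <code class="highlighter-rouge">thing.component</code> pattern can be tried with <code class="highlighter-rouge">math</code>, a library that ships with Python. It is used here only as a stand-in and is not part of the inflammation analysis:</p>

```python
import math  # a library that ships with Python

# thing.component: the sqrt function that belongs to the math library
root = math.sqrt(16)

# the same notation reaches a value stored in the library
circle_ratio = math.pi

print(root, circle_ratio)
```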
<p>Here we give <code class="highlighter-rouge">numpy.loadtxt</code> two <a href="reference.html#parameter">parameters</a>:
the name of the file we want to read,
and the <a href="reference.html#delimiter">delimiter</a> that separates values on a line.
These both need to be character strings (or <a href="reference.html#string">strings</a> for short),
so we put them in quotes.</p>
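<p>To experiment with these two parameters without the data file at hand, here is a minimal sketch. It assumes a tiny made-up table (two patients, three days) held in an <code class="highlighter-rouge">io.StringIO</code> object, which <code class="highlighter-rouge">numpy.loadtxt</code> accepts in place of a file name. The names <code class="highlighter-rouge">fake_file</code> and <code class="highlighter-rouge">readings</code> are hypothetical:</p>

```python
import io
import numpy

# Made-up stand-in for a small CSV file: 2 patients (rows), 3 days (columns).
fake_file = io.StringIO('0,0,1\n0,1,2\n')

# fname can be a file name or a file-like object; delimiter is the
# character that separates values on each line.
readings = numpy.loadtxt(fname=fake_file, delimiter=',')

print(readings)
```

<p>Changing the delimiter in both the text and the call (for example to <code class="highlighter-rouge">';'</code>) should give the same array.</p>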
<p>Since we haven’t told it to do anything else with the function’s output,
the notebook displays it.
In this case,
that output is the data we just loaded.
By default,
only a few rows and columns are shown
(with <code class="highlighter-rouge">...</code> to omit elements when displaying big arrays).
To save space,
NumPy displays numbers as <code class="highlighter-rouge">1.</code> instead of <code class="highlighter-rouge">1.0</code>
when there’s nothing interesting after the decimal point.</p>
<p>Our call to <code class="highlighter-rouge">numpy.loadtxt</code> read our file,
but didn’t save the data in memory.
To do that,
we need to assign the array to a variable. Just as we can assign a single value to a variable, we can also assign an array of values
to a variable using the same syntax. Let’s re-run <code class="highlighter-rouge">numpy.loadtxt</code> and save its result:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
</code></pre>
</div>
<p>This statement doesn’t produce any output because assignment doesn’t display anything.
If we want to check that our data has been loaded,
we can print the variable’s value:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(data)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>[[ 0. 0. 1. ..., 3. 0. 0.]
[ 0. 1. 2. ..., 1. 0. 1.]
[ 0. 1. 1. ..., 2. 1. 1.]
...,
[ 0. 1. 1. ..., 1. 1. 1.]
[ 0. 0. 0. ..., 0. 2. 0.]
[ 0. 0. 1. ..., 1. 1. 0.]]
</code></pre>
</div>
<p>Now that our data is in memory,
we can start doing things with it.
First,
let’s ask what <a href="../reference/#type">type</a> of thing <code class="highlighter-rouge">data</code> refers to:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(type(data))
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>&lt;class 'numpy.ndarray'&gt;
</code></pre>
</div>
<p>The output tells us that <code class="highlighter-rouge">data</code> currently refers to
an N-dimensional array created by the NumPy library.
These data correspond to arthritis patients’ inflammation.
The rows are the individual patients and the columns
are their daily inflammation measurements.</p>
<blockquote class="callout">
<h2 id="data-type">Data Type</h2>
<p>A NumPy array contains one or more elements
of the same type. <code class="highlighter-rouge">type</code> will only tell you that
a variable is a NumPy array.
We can also find out the type
of the data contained in the NumPy array.</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(data.dtype)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>float64
</code></pre>
</div>
<p>This tells us that the NumPy array’s elements are
<a href="../reference/#floating-point number">floating-point numbers</a>.</p>
</blockquote>
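<p>The difference between the type of an array and the type of its elements can be seen with two small hand-made arrays (the values are invented for this sketch). The exact name of the integer type depends on the platform, so the code only checks that it is some kind of integer:</p>

```python
import numpy

whole_numbers = numpy.array([1, 2, 3])        # hypothetical example values
measurements = numpy.array([1.0, 2.5, 4.0])

# type() gives the same answer for both: they are NumPy arrays.
print(type(whole_numbers), type(measurements))

# dtype describes the elements, which differ between the two arrays.
print(whole_numbers.dtype, measurements.dtype)

holds_integers = whole_numbers.dtype.kind == 'i'
holds_floats = measurements.dtype == numpy.float64
```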
<p>With this command we can see the array’s <a href="../reference/#shape">shape</a>:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(data.shape)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>(60, 40)
</code></pre>
</div>
<p>This tells us that <code class="highlighter-rouge">data</code> has 60 rows and 40 columns. When we created the
variable <code class="highlighter-rouge">data</code> to store our arthritis data, we didn’t just create the array, we also
created information about the array, called <a href="../reference/#member">members</a> or
attributes. This extra information describes <code class="highlighter-rouge">data</code> in
the same way an adjective describes a noun.
<code class="highlighter-rouge">data.shape</code> is an attribute of <code class="highlighter-rouge">data</code> which describes the dimensions of <code class="highlighter-rouge">data</code>.
We use the same dotted notation for the attributes of variables
that we use for the functions in libraries
because they have the same part-and-whole relationship.</p>
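<p>As an illustration, the sketch below builds a small array of zeros (the name <code class="highlighter-rouge">tiny</code> is made up) and reads a few of its attributes. <code class="highlighter-rouge">ndim</code> and <code class="highlighter-rouge">size</code> are further NumPy array attributes not discussed above, included only to show that <code class="highlighter-rouge">shape</code> is one of several:</p>

```python
import numpy

tiny = numpy.zeros((3, 5))    # 3 rows and 5 columns, all zeros

print(tiny.shape)

# shape is a pair of numbers, so it can be unpacked into two variables
rows, columns = tiny.shape

# other attributes use the same dotted notation
print(tiny.ndim, tiny.size)   # number of dimensions, total number of elements
```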
<p>If we want to get a single number from the array,
we must provide an <a href="../reference/#index">index</a> in square brackets,
just as we do in math when referring to an element of a matrix. Our inflammation data has two dimensions, so we will need to use two indices to refer to a value:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print('first value in data:', data[0, 0])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>first value in data: 0.0
</code></pre>
</div>
<div class="python highlighter-rouge"><pre class="highlight"><code>print('middle value in data:', data[30, 20])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>middle value in data: 13.0
</code></pre>
</div>
<p>The expression <code class="highlighter-rouge">data[30, 20]</code> may not surprise you,
but <code class="highlighter-rouge">data[0, 0]</code> might.
Programming languages like Fortran, MATLAB and R start counting at 1,
because that’s what human beings have done for thousands of years.
Languages in the C family (including C++, Java, Perl, and Python) count from 0
because it represents an offset from the first value in the array (the second
value is offset by one index from the first value). This is closer to the way
that computers represent arrays (if you are interested in the historical
reasons behind counting indices from zero, you can read
<a href="http://exple.tive.org/blarg/2013/10/22/citation-needed/">Mike Hoye’s blog post</a>).
As a result,
if we have an M×N array in Python,
its indices go from 0 to M-1 on the first axis
and 0 to N-1 on the second.
It takes a bit of getting used to,
but one way to remember the rule is that
the index is how many steps we have to take from the start to get the item we want.</p>
<p><img src="../fig/python-zero-index.png" alt="Zero Index" /></p>
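<p>The index range can be checked on a small made-up array. In this sketch <code class="highlighter-rouge">grid</code> has 3 rows and 4 columns, so its last element sits at <code class="highlighter-rouge">[2, 3]</code>, and asking for row 3 raises an <code class="highlighter-rouge">IndexError</code>:</p>

```python
import numpy

grid = numpy.arange(12).reshape(3, 4)   # values 0..11 laid out in 3 rows, 4 columns
n_rows, n_cols = grid.shape

first = grid[0, 0]                      # zero steps from the start on both axes
last = grid[n_rows - 1, n_cols - 1]     # the largest valid index on each axis

# An index equal to the axis length is one step too far.
try:
    grid[n_rows, 0]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(first, last, out_of_bounds)
```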
<blockquote class="callout">
<h2 id="in-the-corner">In the Corner</h2>
<p>What may also surprise you is that when Python displays an array,
it shows the element with index <code class="highlighter-rouge">[0, 0]</code> in the upper left corner
rather than the lower left.
This is consistent with the way mathematicians draw matrices,
but different from Cartesian coordinates.
The indices are (row, column) instead of (column, row) for the same reason,
which can be confusing when plotting data.</p>
</blockquote>
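<p>A quick way to see the (row, column) order is to index a small array whose values are easy to recognize. The array <code class="highlighter-rouge">table</code> below is invented for this purpose:</p>

```python
import numpy

table = numpy.array([[1, 2, 3],
                     [4, 5, 6]])

print(table)                 # the element at [0, 0] is printed in the upper left

upper_left = table[0, 0]
row0_col2 = table[0, 2]      # first row, third column
row1_col0 = table[1, 0]      # second row, first column

print(upper_left, row0_col2, row1_col0)
```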
<p>An index like <code class="highlighter-rouge">[30, 20]</code> selects a single element of an array,
but we can select whole sections as well.
For example,
we can select the first ten days (columns) of values
for the first four patients (rows) like this:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(data[0:4, 0:10])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>[[ 0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
[ 0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
[ 0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
[ 0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]
</code></pre>
</div>
<p>The <a href="../reference/#slice">slice</a> <code class="highlighter-rouge">0:4</code> means,
“Start at index 0 and go up to, but not including, index 4.”
Again,
the up-to-but-not-including takes a bit of getting used to,
but the rule is that the difference between the upper and lower bounds is the number of values in the slice.</p>
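<p>The rule can be verified on a one-dimensional array of made-up day numbers, where the slice <code class="highlighter-rouge">2:6</code> should hold 6 - 2 = 4 values:</p>

```python
import numpy

days = numpy.arange(10)      # the values 0, 1, ..., 9

window = days[2:6]           # start at index 2, stop before index 6
print(window)

n_values = len(window)
print(n_values == 6 - 2)
```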
<p>We don’t have to start slices at 0:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(data[5:10, 0:10])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>[[ 0. 0. 1. 2. 2. 4. 2. 1. 6. 4.]
[ 0. 0. 2. 2. 4. 2. 2. 5. 5. 8.]
[ 0. 0. 1. 2. 3. 1. 2. 3. 5. 3.]
[ 0. 0. 0. 3. 1. 5. 6. 5. 5. 8.]
[ 0. 1. 1. 2. 1. 3. 5. 3. 5. 8.]]
</code></pre>
</div>
<p>We also don’t have to include the upper and lower bound on the slice.
If we don’t include the lower bound,
Python uses 0 by default;
if we don’t include the upper,
the slice runs to the end of the axis,
and if we don’t include either
(i.e., if we just use ‘:’ on its own),
the slice includes everything:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>small = data[:3, 36:]
print('small is:')
print(small)
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>small is:
[[ 2. 3. 0. 0.]
[ 1. 1. 0. 1.]
[ 2. 2. 1. 1.]]
</code></pre>
</div>
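<p>To confirm what the defaults stand for, the following sketch compares slices with omitted bounds against the same slices written out in full. It uses a made-up 3×4 array and <code class="highlighter-rouge">numpy.array_equal</code>, which reports whether two arrays have the same shape and values:</p>

```python
import numpy

grid = numpy.arange(12).reshape(3, 4)   # 3 rows, 4 columns

# A missing lower bound means 0; a missing upper bound means the end of the axis.
same_rows = numpy.array_equal(grid[:2, :], grid[0:2, 0:4])
same_cols = numpy.array_equal(grid[:, 2:], grid[0:3, 2:4])

# ':' on its own selects the whole axis.
everything = numpy.array_equal(grid[:, :], grid)

print(same_rows, same_cols, everything)
```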
<p>Arrays also know how to perform common mathematical operations on their values.
The simplest operations with data are arithmetic:
add, subtract, multiply, and divide.
When you do such operations on arrays,
the operation is done on each individual element of the array.
Thus:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>doubledata = data * 2.0
</code></pre>
</div>
<p>will create a new array <code class="highlighter-rouge">doubledata</code>
in which each element is twice the corresponding element in <code class="highlighter-rouge">data</code>:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print('original:')
print(data[:3, 36:])
print('doubledata:')
print(doubledata[:3, 36:])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>original:
[[ 2. 3. 0. 0.]
[ 1. 1. 0. 1.]
[ 2. 2. 1. 1.]]
doubledata:
[[ 4. 6. 0. 0.]
[ 2. 2. 0. 2.]
[ 4. 4. 2. 2.]]
</code></pre>
</div>
<p>If,
instead of doing arithmetic with an array and a single value (as above),
you do the arithmetic with another array of the same shape,
the operation is performed on corresponding elements of the two arrays.
Thus:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>tripledata = doubledata + data
</code></pre>
</div>
<p>will give you an array where <code class="highlighter-rouge">tripledata[0,0]</code> will equal <code class="highlighter-rouge">doubledata[0,0]</code> plus <code class="highlighter-rouge">data[0,0]</code>,
and so on for all other elements of the arrays.</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print('tripledata:')
print(tripledata[:3, 36:])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>tripledata:
[[ 6. 9. 0. 0.]
[ 3. 3. 0. 3.]
[ 6. 6. 3. 3.]]
</code></pre>
</div>
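<p>Both kinds of arithmetic are easy to follow by eye on a 2×2 array. This sketch uses invented values and the hypothetical names <code class="highlighter-rouge">a</code>, <code class="highlighter-rouge">doubled</code> and <code class="highlighter-rouge">summed</code>:</p>

```python
import numpy

a = numpy.array([[1.0, 2.0],
                 [3.0, 4.0]])

doubled = a * 2.0        # array and single value: every element is multiplied by 2.0
summed = doubled + a     # two arrays of the same shape: elements are paired up by position

print(doubled)
print(summed)
```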
<p>Often, we want to do more than add, subtract, multiply, and divide values of data.
NumPy knows how to do more complex operations on arrays.
If we want to find the average inflammation for all patients on all days,
for example,
we can ask NumPy to compute <code class="highlighter-rouge">data</code>’s mean value:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(numpy.mean(data))
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>6.14875
</code></pre>
</div>
<p><code class="highlighter-rouge">mean</code> is a <a href="../reference/#function">function</a> that takes
an array as an <a href="../reference/#argument">argument</a>.
If variables are nouns, functions are verbs:
they do things with variables.</p>
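<p>To see what the function computes, this sketch compares <code class="highlighter-rouge">numpy.mean</code> with the same average worked out by hand for a small made-up array of six values:</p>

```python
import numpy

sample = numpy.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])

# The function takes the array as its argument and returns one number.
by_function = numpy.mean(sample)

# The same calculation written out: add every value, divide by how many there are.
by_hand = (1.0 + 2.0 + 3.0 + 4.0 + 5.0 + 6.0) / 6

print(by_function, by_hand)
```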
<blockquote class="callout">
<h2 id="not-all-functions-have-input">Not All Functions Have Input</h2>