Commit b95b5283 authored by Rémi Emonet

[DOI: 10.5281/zenodo.838768] Rebuilt HTML files for release 2017.08

jekyll version: jekyll 3.4.3
parent 4a60077c
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="last-modified" content="2017-08-04 00:20:27 +0200">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- meta "search-domain" used for google site search function google_search() -->
<meta name="search-domain" value="/swc-releases/2017.08/python-novice-inflammation">
<link rel="stylesheet" type="text/css" href="../assets/css/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="../assets/css/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="../assets/css/lesson.css" />
<link rel="shortcut icon" type="image/x-icon" href="/favicon-swc.ico" />
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<title>Programming with Python: Analyzing Data from Multiple Files</title>
</head>
<body>
<div class="container">
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../">Home</a>
</div>
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav">
<li><a href="../conduct/">Code of Conduct</a></li>
<li><a href="../setup/">Setup</a></li>
<li class="dropdown">
<a href="../" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Episodes <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../01-numpy/">Analyzing Patient Data</a></li>
<li><a href="../02-loop/">Repeating Actions with Loops</a></li>
<li><a href="../03-lists/">Storing Multiple Values in Lists</a></li>
<li><a href="../04-files/">Analyzing Data from Multiple Files</a></li>
<li><a href="../05-cond/">Making Choices</a></li>
<li><a href="../06-func/">Creating Functions</a></li>
<li><a href="../07-errors/">Errors and Exceptions</a></li>
<li><a href="../08-defensive/">Defensive Programming</a></li>
<li><a href="../09-debugging/">Debugging</a></li>
<li><a href="../10-cmdline/">Command-Line Programs</a></li>
<li role="separator" class="divider"></li>
<li><a href="../aio/">All in one page (Beta)</a></li>
</ul>
</li>
<li class="dropdown">
<a href="../" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Extras <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="../reference/">Reference</a></li>
<li><a href="../about/">About</a></li>
<li><a href="../discuss/">Discussion</a></li>
<li><a href="../figures/">Figures</a></li>
<li><a href="../guide/">Instructor Notes</a></li>
</ul>
</li>
<li><a href="../license/">License</a></li>
</ul>
<form class="navbar-form navbar-right" role="search" id="search" onsubmit="google_search(); return false;">
<div class="form-group">
<input type="text" id="google-search" placeholder="Search..." aria-label="Google site search">
</div>
</form>
</div>
</div>
</nav>
<div class="row">
<div class="col-md-1">
<h3>
<a href="../03-lists/"><span class="glyphicon glyphicon-menu-left" aria-hidden="true"></span><span class="sr-only">previous episode</span></a>
</h3>
</div>
<div class="col-md-10">
<h3 class="maintitle"><a href="../">Programming with Python</a></h3>
</div>
<div class="col-md-1">
<h3>
<a href="../05-cond/"><span class="glyphicon glyphicon-menu-right" aria-hidden="true"></span><span class="sr-only">next episode</span></a>
</h3>
</div>
</div>
<article>
<div class="row">
<div class="col-md-1">
</div>
<div class="col-md-10">
<h1 class="maintitle">Analyzing Data from Multiple Files</h1>
</div>
<div class="col-md-1">
</div>
</div>
<blockquote class="objectives">
<h2>Overview</h2>
<div class="row">
<div class="col-md-3">
<strong>Teaching:</strong> 20 min
<br/>
<strong>Exercises:</strong> 0 min
</div>
<div class="col-md-9">
<strong>Questions</strong>
<ul>
<li><p>How can I do the same operations on many different files?</p>
</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-md-3">
</div>
<div class="col-md-9">
<strong>Objectives</strong>
<ul>
<li><p>Use a library function to get a list of filenames that match a wildcard pattern.</p>
</li>
<li><p>Write a <code class="highlighter-rouge">for</code> loop to process multiple files.</p>
</li>
</ul>
</div>
</div>
</blockquote>
<p>We now have almost everything we need to process all our data files.
The only thing that’s missing is a library with a rather unpleasant name:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>import glob
</code></pre>
</div>
<p>The <code class="highlighter-rouge">glob</code> library contains a function, also called <code class="highlighter-rouge">glob</code>,
that finds files and directories whose names match a pattern.
We provide those patterns as strings:
the character <code class="highlighter-rouge">*</code> matches zero or more characters,
while <code class="highlighter-rouge">?</code> matches any one character.
We can use this to get the names of all the CSV files in the current directory whose names start with <code class="highlighter-rouge">inflammation</code>:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print(glob.glob('inflammation*.csv'))
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>['inflammation-05.csv', 'inflammation-11.csv', 'inflammation-12.csv', 'inflammation-08.csv', 'inflammation-03.csv', 'inflammation-06.csv', 'inflammation-09.csv', 'inflammation-07.csv', 'inflammation-10.csv', 'inflammation-02.csv', 'inflammation-04.csv', 'inflammation-01.csv']
</code></pre>
</div>
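<p>To see the two wildcards side by side, the minimal sketch below creates a few empty files in a temporary directory and globs them with two patterns. The file names are hypothetical, chosen only for the demonstration. <code class="highlighter-rouge">glob.escape</code> guards against special characters in the temporary directory's own path, and the results go through <code class="highlighter-rouge">sorted</code> because <code class="highlighter-rouge">glob.glob</code> makes no promise about order.</p>

```python
import glob
import os
import tempfile

# Hypothetical file names, used only for this demonstration.
NAMES = ['inflammation-01.csv', 'inflammation-02.csv', 'inflammation-10.csv',
         'inflammation.csv', 'small-01.csv']


def matching_names(directory, pattern):
    """Return the sorted base names in `directory` that match `pattern`."""
    # Escape the directory part so only `pattern` is treated as a wildcard.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    return sorted(os.path.basename(path) for path in glob.glob(full_pattern))


with tempfile.TemporaryDirectory() as tmp:
    for name in NAMES:
        # Create an empty file for each hypothetical name.
        open(os.path.join(tmp, name), 'w').close()

    # '*' matches zero or more characters, so 'inflammation.csv' matches too.
    star_matches = matching_names(tmp, 'inflammation*.csv')
    # '?' matches exactly one character, so only '-01' and '-02' fit '-0?'.
    question_matches = matching_names(tmp, 'inflammation-0?.csv')

print(star_matches)
print(question_matches)
```

<p>Neither pattern picks up <code class="highlighter-rouge">small-01.csv</code>, because both require the name to start with <code class="highlighter-rouge">inflammation</code>.</p>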
<p>As this example shows,
<code class="highlighter-rouge">glob.glob</code>’s result is a list of matching file and directory paths in arbitrary order.
This means we can loop over it
to do something with each filename in turn.
In our case,
the “something” we want to do is generate a set of plots for each file in our inflammation dataset.
If we want to start by analyzing just the first three files in alphabetical order, we can use the <code class="highlighter-rouge">sorted</code> built-in function to generate a new sorted list from the <code class="highlighter-rouge">glob.glob</code> output:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>import numpy
import matplotlib.pyplot
# sort the matching names, then keep only the first three
filenames = sorted(glob.glob('inflammation*.csv'))
filenames = filenames[0:3]
for f in filenames:
    print(f)
    data = numpy.loadtxt(fname=f, delimiter=',')
    # one figure per file, with three side-by-side panels
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    # axis=0 takes each statistic across patients, giving one value per day
    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))
    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))
    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))
    fig.tight_layout()
    matplotlib.pyplot.show()
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>inflammation-01.csv
</code></pre>
</div>
<p><img src="../fig/03-loop_49_1.png" alt="Analysis of inflammation-01.csv" /></p>
<div class="output highlighter-rouge"><pre class="highlight"><code>inflammation-02.csv
</code></pre>
</div>
<p><img src="../fig/03-loop_49_3.png" alt="Analysis of inflammation-02.csv" /></p>
<div class="output highlighter-rouge"><pre class="highlight"><code>inflammation-03.csv
</code></pre>
</div>
<p><img src="../fig/03-loop_49_5.png" alt="Analysis of inflammation-03.csv" /></p>
<p>Sure enough,
the maxima of the first two datasets show exactly the same ramp,
and their minima show the same staircase structure.
The third dataset reveals a different situation:
its maxima are a bit less regular, but its minima are consistently zero.</p>
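<p>The loop above relies on <code class="highlighter-rouge">sorted</code> and the slice <code class="highlighter-rouge">[0:3]</code> to pick out files 01 to 03. A minimal sketch of just that step, reusing the unordered list that <code class="highlighter-rouge">glob.glob</code> printed earlier (your own listing may come back in a different order), shows why this works: the numbers in these file names are zero-padded to two digits, so alphabetical order coincides with numerical order. The unpadded names in the last line are hypothetical and only illustrate what would go wrong without padding.</p>

```python
# The unordered listing printed by glob.glob earlier in this episode.
unordered = ['inflammation-05.csv', 'inflammation-11.csv', 'inflammation-12.csv',
             'inflammation-08.csv', 'inflammation-03.csv', 'inflammation-06.csv',
             'inflammation-09.csv', 'inflammation-07.csv', 'inflammation-10.csv',
             'inflammation-02.csv', 'inflammation-04.csv', 'inflammation-01.csv']

# sorted() returns a new list and leaves `unordered` unchanged.
filenames = sorted(unordered)
first_three = filenames[0:3]
print(first_three)

# Hypothetical unpadded names: alphabetical order no longer matches numerical
# order, because '1' sorts before '2' character by character.
print(sorted(['inflammation-2.csv', 'inflammation-10.csv']))
```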
<blockquote class="challenge">
<h2 id="plotting-differences">Plotting Differences</h2>
<p>Plot the difference between the average of the first dataset
and the average of the second dataset,
i.e., the difference between the leftmost plots of the first two figures.</p>
<blockquote class="solution">
<h2 id="solution">Solution</h2>
<div class="python highlighter-rouge"><pre class="highlight"><code>import glob
import numpy
import matplotlib.pyplot
filenames = sorted(glob.glob('inflammation*.csv'))
data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
matplotlib.pyplot.ylabel('Difference in average')
matplotlib.pyplot.plot(data0.mean(axis=0) - data1.mean(axis=0))
fig.tight_layout()
matplotlib.pyplot.show()
</code></pre>
</div>
</blockquote>
</blockquote>
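<p>Because each row of a data file is a patient and each column is a day, <code class="highlighter-rouge">mean(axis=0)</code> averages down the columns and yields one value per day, so subtracting two such arrays gives a day-by-day difference. The tiny arrays below are made up purely so that the arithmetic can be checked by hand.</p>

```python
import numpy

# Made-up data: 2 patients (rows) by 3 days (columns) in each "file".
data0 = numpy.array([[0.0, 2.0, 4.0],
                     [2.0, 4.0, 6.0]])
data1 = numpy.array([[1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0]])

# axis=0 collapses the rows, leaving one average per day (column).
mean0 = data0.mean(axis=0)   # [1.0, 3.0, 5.0]
mean1 = data1.mean(axis=0)   # [1.0, 1.0, 1.0]

# Element-wise subtraction: one difference per day.
difference = mean0 - mean1
print(difference)
```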
<blockquote class="challenge">
<h2 id="generate-composite-statistics">Generate Composite Statistics</h2>
<p>Use each of the files once to generate a composite dataset in which each value is the average of the corresponding values across all the files:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>filenames = glob.glob('inflammation*.csv')
composite_data = numpy.zeros((60, 40))
for f in filenames:
    # sum each new file's data into composite_data as it's read
    #
# and then divide the composite_data by the number of files
composite_data /= len(filenames)
</code></pre>
</div>
<p>Then use pyplot to plot the per-day average, max, and min of the composite data across all patients.</p>
<blockquote class="solution">
<h2 id="solution-1">Solution</h2>
<div class="python highlighter-rouge"><pre class="highlight"><code>import glob
import numpy
import matplotlib.pyplot
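filenames = glob.glob('inflammation*.csv')
composite_data = numpy.zeros((60, 40))
for f in filenames:
    data = numpy.loadtxt(fname=f, delimiter=',')
    # accumulate each file's values into the running total
    composite_data += data
# divide the total by the number of files to get the average
composite_data /= len(filenames)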
fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.set_ylabel('average')
axes1.plot(numpy.mean(composite_data, axis=0))
axes2.set_ylabel('max')
axes2.plot(numpy.max(composite_data, axis=0))
axes3.set_ylabel('min')
axes3.plot(numpy.min(composite_data, axis=0))
fig.tight_layout()
matplotlib.pyplot.show()
</code></pre>
</div>
</blockquote>
</blockquote>
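<p>The template above hard-codes the shape <code class="highlighter-rouge">(60, 40)</code>, which only works if every file has exactly 60 rows and 40 columns. A minimal sketch of a more general version starts the running total from the first file instead, assuming all files share whatever shape that first one has. The function name <code class="highlighter-rouge">average_files</code> and the tiny CSV files it is demonstrated on are hypothetical.</p>

```python
import os
import tempfile

import numpy


def average_files(filenames):
    """Element-wise average of the CSV arrays in `filenames`.

    Assumes every file holds a numeric table of the same shape.
    """
    if not filenames:
        raise ValueError('need at least one file to average')
    # Start the running total from the first file, so no shape is hard-coded.
    total = numpy.loadtxt(fname=filenames[0], delimiter=',')
    for f in filenames[1:]:
        total += numpy.loadtxt(fname=f, delimiter=',')
    return total / len(filenames)


# Demo on two tiny made-up files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, rows in enumerate(['0,2\n4,6\n', '2,2\n2,2\n']):
        path = os.path.join(tmp, 'example-%d.csv' % i)
        with open(path, 'w') as handle:
            handle.write(rows)
        paths.append(path)
    composite = average_files(paths)

print(composite)
```

<p>In the exercise, <code class="highlighter-rouge">average_files(glob.glob('inflammation*.csv'))</code> could then stand in for the loop that fills <code class="highlighter-rouge">composite_data</code>.</p>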
<blockquote class="keypoints">
<h2>Key Points</h2>
<ul>
<li><p>Use <code class="highlighter-rouge">glob.glob(pattern)</code> to create a list of files whose names match a pattern.</p>
</li>
<li><p>Use <code class="highlighter-rouge">*</code> in a pattern to match zero or more characters, and <code class="highlighter-rouge">?</code> to match any single character.</p>
</li>
</ul>
</blockquote>
</article>
<div class="row">
<div class="col-md-1">
<h3>
<a href="../03-lists/"><span class="glyphicon glyphicon-menu-left" aria-hidden="true"></span><span class="sr-only">previous episode</span></a>
</h3>
</div>
<div class="col-md-10">
</div>
<div class="col-md-1">
<h3>
<a href="../05-cond/"><span class="glyphicon glyphicon-menu-right" aria-hidden="true"></span><span class="sr-only">next episode</span></a>
</h3>
</div>
</div>
<footer>
<div class="row">
<div class="col-md-6" align="left">
<h4>
Copyright &copy; 2016–2017
<a href="https://software-carpentry.org">Software Carpentry Foundation</a>
</h4>
</div>
<div class="col-md-6" align="right">
<h4>
<a href="/blob/gh-pages/CONTRIBUTING.md">Contributing</a>
/
<a href="/">Source</a>
/
<a href="/blob/gh-pages/CITATION">Cite</a>
/
<a href="">Contact</a>
</h4>
</div>
</div>
</footer>
</div>
<script src="../assets/js/jquery.min.js"></script>
<script src="../assets/js/bootstrap.min.js"></script>
<script src="../assets/js/lesson.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-37305346-2', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>
@@ -67,3 +67,7 @@ exclude:
# Turn off built-in syntax highlighting.
highlighter: false
github:
url: '/swc-releases/2017.08/python-novice-inflammation'