<li><p>How can I process tabular data files in Python?</p>
</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-md-3">
</div>
<div class="col-md-9">
<strong>Objectives</strong>
<ul>
<li><p>Explain what a library is, and what libraries are used for.</p>
</li>
<li><p>Import a Python library and use the functions it contains.</p>
</li>
<li><p>Read tabular data from a file into a program.</p>
</li>
<li><p>Assign values to variables.</p>
</li>
<li><p>Select individual values and subsections from data.</p>
</li>
<li><p>Perform operations on arrays of data.</p>
</li>
<li><p>Plot simple graphs from data.</p>
</li>
</ul>
</div>
</div>
</blockquote>
<p>In this lesson we will learn how to manipulate the inflammation dataset with Python. But before we discuss how to deal with many data points, we will show how to store a single value on the computer.</p>
<p>The line below <a href="reference.html#assignment">assigns</a> the value <code class="highlighter-rouge">55</code> to a <a href="reference.html#variable">variable</a> <code class="highlighter-rouge">weight_kg</code>:</p>
such as <code class="highlighter-rouge">x_val</code>, <code class="highlighter-rouge">current_temperature</code>, or <code class="highlighter-rouge">subject_id</code>.
Variable names in Python must begin with a letter or an underscore and are <a href="reference.html#case-sensitive">case sensitive</a>.
We can create a new variable by assigning a value to it using <code class="highlighter-rouge">=</code>.
When we are finished typing and press Shift+Enter,
the notebook runs our command.</p>
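<p>As a minimal sketch of these naming rules, the lines below create a few variables. The name <code class="highlighter-rouge">Weight_kg</code> and all of the values are made up for illustration; the point is only that names differing in case refer to different variables.</p>

```python
# Create variables by assigning values with =.
weight_kg = 55                 # a whole number
Weight_kg = 60.5               # capital W: a separate variable (hypothetical)
subject_id = 'patient_001'     # a character string (hypothetical value)

# Names are case sensitive, so these two variables hold different values.
names_differ = weight_kg != Weight_kg   # True
```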
<p>Once a variable has a value, we can print it to the screen:</p>
<p>The expression <code class="highlighter-rouge">numpy.loadtxt(...)</code> is a <a href="reference.html#function-call">function call</a>
that asks Python to run the <a href="reference.html#function">function</a> <code class="highlighter-rouge">loadtxt</code> which belongs to the <code class="highlighter-rouge">numpy</code> library.
This <a href="reference.html#dotted-notation">dotted notation</a> is used everywhere in Python
to refer to the parts of things as <code class="highlighter-rouge">thing.component</code>.</p>
<p>In this call, <code class="highlighter-rouge">numpy.loadtxt</code> is given two <a href="reference.html#parameter">parameters</a>:
the name of the file we want to read,
and the <a href="reference.html#delimiter">delimiter</a> that separates values on a line.
These both need to be character strings (or <a href="reference.html#string">strings</a> for short),
so we put them in quotes.</p>
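<p>A self-contained sketch of such a call is shown below. It does not use the inflammation file: instead it first writes a tiny, made-up comma-separated file called <code class="highlighter-rouge">example.csv</code> into a temporary directory (both are assumptions for illustration), then reads it back by passing the file name and the delimiter as strings.</p>

```python
import os
import tempfile

import numpy

# Write a tiny made-up CSV file so the example runs on its own.
tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, 'example.csv')
with open(file_name, 'w') as handle:
    handle.write('0,1,2\n3,4,5\n')

# Both parameters are strings: the file name and the delimiter.
print(numpy.loadtxt(fname=file_name, delimiter=','))
```

<p>Changing the delimiter string (for example to a tab or a space) is how the same function would be pointed at files that separate their values differently.</p>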
<p>Since we haven’t told it to do anything else with the function’s output,
the notebook displays it.
In this case,
that output is the data we just loaded.
By default,
only a few rows and columns are shown
(with <code class="highlighter-rouge">...</code> to omit elements when displaying big arrays).
To save space,
NumPy displays numbers as <code class="highlighter-rouge">1.</code> instead of <code class="highlighter-rouge">1.0</code>
when there’s nothing interesting after the decimal point.</p>
<p>Our call to <code class="highlighter-rouge">numpy.loadtxt</code> read our file,
but didn’t save the data in memory.
To do that,
we need to assign the array to a variable. Just as we can assign a single value to a variable, we can also assign an array of values
to a variable using the same syntax. Let’s re-run <code class="highlighter-rouge">numpy.loadtxt</code> and save its result:</p>
<p>This tells us that <code class="highlighter-rouge">data</code> has 60 rows and 40 columns. When we created the
variable <code class="highlighter-rouge">data</code> to store our arthritis data, we didn’t just create the array, we also
created information about the array, called <a href="../reference/#member">members</a> or
attributes. This extra information describes <code class="highlighter-rouge">data</code> in
the same way an adjective describes a noun.
<code class="highlighter-rouge">data.shape</code> is an attribute of <code class="highlighter-rouge">data</code> which describes the dimensions of <code class="highlighter-rouge">data</code>.
We use the same dotted notation for the attributes of variables
that we use for the functions in libraries
because they have the same part-and-whole relationship.</p>
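<p>To see attributes on something smaller than the full dataset, the sketch below builds a made-up array named <code class="highlighter-rouge">small</code> (a hypothetical stand-in, not the inflammation data) and reads two of its attributes with the same dotted notation.</p>

```python
import numpy

# A made-up array of zeros with 3 rows and 5 columns.
small = numpy.zeros((3, 5))

# Attributes are read with dotted notation and no parentheses.
print(small.shape)   # (3, 5)
print(small.ndim)    # 2
```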
<p>If we want to get a single number from the array,
we must provide an <a href="../reference/#index">index</a> in square brackets,
just as we do in math when referring to an element of a matrix. Our inflammation data has two dimensions, so we will need to use two indices to refer to a value:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>print('first value in data:', data[0, 0])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>first value in data: 0.0
</code></pre>
</div>
<div class="python highlighter-rouge"><pre class="highlight"><code>print('middle value in data:', data[30, 20])
</code></pre>
</div>
<div class="output highlighter-rouge"><pre class="highlight"><code>middle value in data: 13.0
</code></pre>
</div>
<p>The expression <code class="highlighter-rouge">data[30, 20]</code> may not surprise you,
but <code class="highlighter-rouge">data[0, 0]</code> might.
Programming languages like Fortran, MATLAB and R start counting at 1,
because that’s what human beings have done for thousands of years.
Languages in the C family (including C++, Java, Perl, and Python) count from 0
because it represents an offset from the first value in the array (the second
value is offset by one index from the first value). This is closer to the way
that computers represent arrays (if you are interested in the historical
reasons behind counting indices from zero, you can read
<a href="http://exple.tive.org/blarg/2013/10/22/citation-needed/">Mike Hoye’s blog post</a>).
As a result,
if we have an M×N array in Python,
its indices go from 0 to M-1 on the first axis
and 0 to N-1 on the second.
It takes a bit of getting used to,
but one way to remember the rule is that
the index is how many steps we have to take from the start to get the item we want.</p>
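<p>The rule can be checked on a small example. The sketch below uses a made-up 3×4 array named <code class="highlighter-rouge">grid</code> (hypothetical values chosen so each entry encodes its position) and picks out the first and last elements, whose indices are 0 and M-1, N-1.</p>

```python
import numpy

# A made-up 3x4 array: M = 3 rows, N = 4 columns.
grid = numpy.array([[10, 11, 12, 13],
                    [20, 21, 22, 23],
                    [30, 31, 32, 33]])

rows, columns = grid.shape

# The first element sits zero steps from the start on both axes.
print(grid[0, 0])                    # 10

# The last element is at index M-1 on the first axis and N-1 on the second.
print(grid[rows - 1, columns - 1])   # 33
```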
instead of taking an array and doing arithmetic with a single value (as above)
you did the arithmetic operation with another array of the same shape,
the operation will be done on corresponding elements of the two arrays.
Thus:</p>
<div class="python highlighter-rouge"><pre class="highlight"><code>tripledata = doubledata + data
</code></pre>
</div>
<p>will give you an array where <code class="highlighter-rouge">tripledata[0,0]</code> will equal <code class="highlighter-rouge">doubledata[0,0]</code> plus <code class="highlighter-rouge">data[0,0]</code>,
and so on for all other elements of the arrays.</p>
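<p>A runnable sketch of this element-by-element behaviour follows. The small 2×2 array stands in for the real dataset (its values are made up), and <code class="highlighter-rouge">doubledata</code> is assumed to have been produced by multiplying <code class="highlighter-rouge">data</code> by a single value.</p>

```python
import numpy

# Made-up 2x2 stand-in for the inflammation data.
data = numpy.array([[0.0, 1.0],
                    [2.0, 3.0]])

# Arithmetic with a single value applies to every element.
doubledata = data * 2.0

# Arithmetic with another array of the same shape pairs up
# corresponding elements.
tripledata = doubledata + data

print(tripledata)
```

<p>Each element of <code class="highlighter-rouge">tripledata</code> here is three times the matching element of <code class="highlighter-rouge">data</code>, which is a quick way to confirm that the addition was done position by position.</p>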