Commit 08d82a55 authored by Brian Jackson, committed by GitHub

Updated file

Again, a number of small wording changes.
parent eaa796ea
...@@ -57,10 +57,9 @@ print(weight_kg) ...@@ -57,10 +57,9 @@ print(weight_kg)
~~~ ~~~
{: .output} {: .output}
and do arithmetic with it: and do arithmetic with it (remember, there are 2.2 pounds per kilogram):
~~~ ~~~
#There are 2.2 pounds per kilogram.
print('weight in pounds:', 2.2 * weight_kg) print('weight in pounds:', 2.2 * weight_kg)
~~~ ~~~
{: .language-python} {: .language-python}
...@@ -194,12 +193,13 @@ The expression `numpy.loadtxt(...)` is a [function call]({{ page.root }}/referen ...@@ -194,12 +193,13 @@ The expression `numpy.loadtxt(...)` is a [function call]({{ page.root }}/referen
that asks Python to run the [function]({{ page.root }}/reference/#function) `loadtxt` which belongs to the `numpy` library. that asks Python to run the [function]({{ page.root }}/reference/#function) `loadtxt` which belongs to the `numpy` library.
This [dotted notation]({{ page.root }}/reference/#dotted-notation) is used everywhere in Python: This [dotted notation]({{ page.root }}/reference/#dotted-notation) is used everywhere in Python:
the thing that appears before the dot contains the thing that appears after. the thing that appears before the dot contains the thing that appears after.
As an example, John Smith is the John that belongs to the Smith family, As an example, John Smith is the John that belongs to the Smith family,
We could use the dot notation to write his name `smith.john`, We could use the dot notation to write his name `smith.john`,
just as `loadtxt` is a function that belongs to the `numpy` library. just as `loadtxt` is a function that belongs to the `numpy` library.
`numpy.loadtxt` has two [parameters]({{ page.root }}/reference/#parameter): `numpy.loadtxt` has two [parameters]({{ page.root }}/reference/#parameter):
the name of the file we want to read, the name of the file we want to read
and the [delimiter]({{ page.root }}/reference/#delimiter) that separates values on a line. and the [delimiter]({{ page.root }}/reference/#delimiter) that separates values on a line.
These both need to be character strings (or [strings]({{ page.root }}/reference/#string) for short), These both need to be character strings (or [strings]({{ page.root }}/reference/#string) for short),
so we put them in quotes. so we put them in quotes.
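The two parameters described in this hunk can be tried without the lesson's data file. The following is a minimal sketch, assuming an in-memory stand-in for a CSV file; the names `csv_text` and `sample` are hypothetical and do not appear in the lesson.

```python
import io
import numpy

# Hypothetical stand-in for a small CSV file; the lesson itself reads
# 'inflammation-01.csv' from disk.
csv_text = "0,1,2\n3,4,5\n"

# fname accepts a file name or a file-like object; delimiter is the
# string that separates values on a line.
sample = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')
print(sample)
```

Passing a file name string such as `'inflammation-01.csv'` in place of the `io.StringIO` object should behave the same way, provided the file exists.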
...@@ -215,19 +215,19 @@ To save space, ...@@ -215,19 +215,19 @@ To save space,
Python displays numbers as `1.` instead of `1.0` Python displays numbers as `1.` instead of `1.0`
when there's nothing interesting after the decimal point. when there's nothing interesting after the decimal point.
Our call to `numpy.loadtxt` read our file, Our call to `numpy.loadtxt` read our file
but didn't save the data in memory. but didn't save the data in memory.
To do that, To do that,
we need to assign the array to a variable. Just as we can assign a single value to a variable, we can also assign an array of values we need to assign the array to a variable. Just as we can assign a single value to a variable, we can also assign an array of values
to a variable using the same syntax. Let's re-run `numpy.loadtxt` and save its result: to a variable using the same syntax. Let's re-run `numpy.loadtxt` and save the returned data:
~~~ ~~~
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',') data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
~~~ ~~~
{: .language-python} {: .language-python}
This statement doesn't produce any output because assignment doesn't display anything. This statement doesn't produce any output because we've assigned the output to the variable `data`.
If we want to check that our data has been loaded, If we want to check that the data have been loaded,
we can print the variable's value: we can print the variable's value:
~~~ ~~~
...@@ -246,8 +246,8 @@ print(data) ...@@ -246,8 +246,8 @@ print(data)
~~~ ~~~
{: .output} {: .output}
Now that our data is in memory, Now that the data are in memory,
we can start doing things with it. we can manipulate them.
First, First,
let's ask what [type]({{ page.root }}/reference/#type) of thing `data` refers to: let's ask what [type]({{ page.root }}/reference/#type) of thing `data` refers to:
...@@ -262,17 +262,18 @@ print(type(data)) ...@@ -262,17 +262,18 @@ print(type(data))
{: .output} {: .output}
The output tells us that `data` currently refers to The output tells us that `data` currently refers to
an N-dimensional array created by the NumPy library. an N-dimensional array, the functionality for which is provided by the NumPy library.
These data correspond to arthritis patients' inflammation. These data correspond to arthritis patients' inflammation.
The rows are the individual patients and the columns The rows are the individual patients, and the columns
are their daily inflammation measurements. are their daily inflammation measurements.
> ## Data Type > ## Data Type
> >
> A Numpy array contains one or more elements > A Numpy array contains one or more elements
> of the same type. `type` will only tell you that > of the same type. The `type` function will only tell you that
> a variable is a NumPy array. > a variable is a NumPy array but won't tell you the type of
> We can also find out the type > thing inside the array.
> We can find out the type
> of the data contained in the NumPy array. > of the data contained in the NumPy array.
> >
> ~~~ > ~~~
...@@ -289,7 +290,7 @@ are their daily inflammation measurements. ...@@ -289,7 +290,7 @@ are their daily inflammation measurements.
> [floating-point numbers]({{ page.root }}/reference/#floating-point number). > [floating-point numbers]({{ page.root }}/reference/#floating-point number).
{: .callout} {: .callout}
With this command we can see the array's [shape]({{ page.root }}/reference/#shape): With the following command, we can see the array's [shape]({{ page.root }}/reference/#shape):
~~~ ~~~
print(data.shape) print(data.shape)
...@@ -301,19 +302,19 @@ print(data.shape) ...@@ -301,19 +302,19 @@ print(data.shape)
~~~ ~~~
{: .output} {: .output}
This tells us that `data` has 60 rows and 40 columns. When we created the The output tells us that the `data` array variable contains 60 rows and 40 columns. When we created the
variable `data` to store our arthritis data, we didn't just create the array, we also variable `data` to store our arthritis data, we didn't just create the array; we also
created information about the array, called [members]({{ page.root }}/reference/#member) or created information about the array, called [members]({{ page.root }}/reference/#member) or
attributes. This extra information describes `data` in attributes. This extra information describes `data` in
the same way an adjective describes a noun. the same way an adjective describes a noun.
`data.shape` is an attribute of `data` which describes the dimensions of `data`. `data.shape` is an attribute of `data` which describes the dimensions of `data`.
We use the same dotted notation for the attributes of variables We use the same dotted notation for the attributes of variables
that we use for the functions in libraries that we use for the functions in libraries
because they have the same part-and-whole relationship. because they have the same part-and-whole relationship.
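As a rough illustration of the part-and-whole relationship described above, the sketch below uses the same dotted notation for an attribute of a variable and for a function in a library. The array `example` is a hypothetical stand-in for the lesson's `data`.

```python
import numpy

# Hypothetical small array standing in for the lesson's 60 x 40 `data`.
example = numpy.zeros((3, 4))

# Attributes belong to the variable: no parentheses, nothing is called.
print(example.shape)
print(example.dtype)

# Functions belong to the library: parentheses, and the array is the argument.
print(numpy.mean(example))
```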
If we want to get a single number from the array, If we want to get a single number from the array,
we must provide an [index]({{ page.root }}/reference/#index) in square brackets, we must provide an [index]({{ page.root }}/reference/#index) in square brackets after the variable name,
just as we do in math when referring to an element of a matrix. Our inflammation data has two dimensions, so we will need to use two indices to refer to a value: just as we do in math when referring to an element of a matrix. Our inflammation data has two dimensions, so we will need to use two indices to refer to one specific value:
~~~ ~~~
print('first value in data:', data[0, 0]) print('first value in data:', data[0, 0])
...@@ -337,7 +338,7 @@ middle value in data: 13.0 ...@@ -337,7 +338,7 @@ middle value in data: 13.0
The expression `data[30, 20]` accesses the element at row 30, column 20. While this expression may not surprise you, The expression `data[30, 20]` accesses the element at row 30, column 20. While this expression may not surprise you,
`data[0, 0]` might. `data[0, 0]` might.
Programming languages like Fortran, MATLAB and R start counting at 1, Programming languages like Fortran, MATLAB and R start counting at 1
because that's what human beings have done for thousands of years. because that's what human beings have done for thousands of years.
Languages in the C family (including C++, Java, Perl, and Python) count from 0 Languages in the C family (including C++, Java, Perl, and Python) count from 0
because it represents an offset from the first value in the array (the second because it represents an offset from the first value in the array (the second
...@@ -360,7 +361,7 @@ the index is how many steps we have to take from the start to get the item we wa ...@@ -360,7 +361,7 @@ the index is how many steps we have to take from the start to get the item we wa
> What may also surprise you is that when Python displays an array, > What may also surprise you is that when Python displays an array,
> it shows the element with index `[0, 0]` in the upper left corner > it shows the element with index `[0, 0]` in the upper left corner
> rather than the lower left. > rather than the lower left.
> This is consistent with the way mathematicians draw matrices, > This is consistent with the way mathematicians draw matrices
> but different from the Cartesian coordinates. > but different from the Cartesian coordinates.
> The indices are (row, column) instead of (column, row) for the same reason, > The indices are (row, column) instead of (column, row) for the same reason,
> which can be confusing when plotting data. > which can be confusing when plotting data.
...@@ -422,6 +423,7 @@ print('small is:') ...@@ -422,6 +423,7 @@ print('small is:')
print(small) print(small)
~~~ ~~~
{: .language-python} {: .language-python}
The above example selects rows 0 through 2 and columns 36 through to the end of the array.
~~~ ~~~
small is: small is:
...@@ -433,9 +435,9 @@ small is: ...@@ -433,9 +435,9 @@ small is:
Arrays also know how to perform common mathematical operations on their values. Arrays also know how to perform common mathematical operations on their values.
The simplest operations with data are arithmetic: The simplest operations with data are arithmetic:
add, subtract, multiply, and divide. addition, subtraction, multiplication, and division.
When you do such operations on arrays, When you do such operations on arrays,
the operation is done on each individual element of the array. the operation is done element-by-element.
Thus: Thus:
~~~ ~~~
...@@ -444,7 +446,7 @@ doubledata = data * 2.0 ...@@ -444,7 +446,7 @@ doubledata = data * 2.0
{: .language-python} {: .language-python}
will create a new array `doubledata` will create a new array `doubledata`
whose elements have the value of two times the value of the corresponding elements in `data`: each element of which is twice the value of the corresponding element in `data`:
~~~ ~~~
print('original:') print('original:')
...@@ -467,7 +469,7 @@ doubledata: ...@@ -467,7 +469,7 @@ doubledata:
{: .output} {: .output}
If, If,
instead of taking an array and doing arithmetic with a single value (as above) instead of taking an array and doing arithmetic with a single value (as above),
you did the arithmetic operation with another array of the same shape, you did the arithmetic operation with another array of the same shape,
the operation will be done on corresponding elements of the two arrays. the operation will be done on corresponding elements of the two arrays.
Thus: Thus:
...@@ -494,8 +496,8 @@ tripledata: ...@@ -494,8 +496,8 @@ tripledata:
~~~ ~~~
{: .output} {: .output}
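The element-by-element behaviour described in this part of the lesson can be checked on an array small enough to verify by hand. This is only a sketch; `small`, `doubled`, and `tripled` are hypothetical names chosen here to mirror `data`, `doubledata`, and `tripledata`.

```python
import numpy

# Hypothetical 2 x 2 array, small enough to check by hand.
small = numpy.array([[1.0, 2.0],
                     [3.0, 4.0]])

# Array with a single value: every element is multiplied by 2.0.
doubled = small * 2.0

# Array with an array of the same shape: corresponding elements are added.
tripled = doubled + small

print(doubled)
print(tripled)
```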
Often, we want to do more than add, subtract, multiply, and divide values of data. Often, we want to do more than add, subtract, multiply, and divide array elements.
NumPy knows how to do more complex operations on arrays. NumPy knows how to do more complex operations, too.
If we want to find the average inflammation for all patients on all days, If we want to find the average inflammation for all patients on all days,
for example, for example,
we can ask NumPy to compute `data`'s mean value: we can ask NumPy to compute `data`'s mean value:
...@@ -512,8 +514,6 @@ print(numpy.mean(data)) ...@@ -512,8 +514,6 @@ print(numpy.mean(data))
`mean` is a [function]({{ page.root }}/reference/#function) that takes `mean` is a [function]({{ page.root }}/reference/#function) that takes
an array as an [argument]({{ page.root }}/reference/#argument). an array as an [argument]({{ page.root }}/reference/#argument).
If variables are nouns, functions are verbs:
they do things with variables.
> ## Not All Functions Have Input > ## Not All Functions Have Input
> >
...@@ -551,7 +551,7 @@ print('minimum inflammation:', minval) ...@@ -551,7 +551,7 @@ print('minimum inflammation:', minval)
print('standard deviation:', stdval) print('standard deviation:', stdval)
~~~ ~~~
{: .language-python} {: .language-python}
Here we've assigned the return value from `numpy.max(data)` to the variable `maxval`, the value from `numpy.min(data)` to `minval`, and so on.
~~~ ~~~
maximum inflammation: 20.0 maximum inflammation: 20.0
minimum inflammation: 0.0 minimum inflammation: 0.0
...@@ -562,18 +562,18 @@ standard deviation: 4.61383319712 ...@@ -562,18 +562,18 @@ standard deviation: 4.61383319712
> ## Mystery Functions in IPython > ## Mystery Functions in IPython
> >
> How did we know what functions NumPy has and how to use them? > How did we know what functions NumPy has and how to use them?
> If you are working in the IPython/Jupyter Notebook there is an easy way to find out. > If you are working in the IPython/Jupyter Notebook, there is an easy way to find out.
> If you type the name of something followed by a dot, then you can use tab completion > If you type the name of something followed by a dot, then you can use tab completion
> (e.g. type `numpy.` and then press tab) > (e.g. type `numpy.` and then press tab)
> to see a list of all functions and attributes that you can use. After selecting one you > to see a list of all functions and attributes that you can use. After selecting one, you
> can also add a question mark (e.g. `numpy.cumprod?`) and IPython will return an > can also add a question mark (e.g. `numpy.cumprod?`), and IPython will return an
> explanation of the method! This is the same as doing `help(numpy.cumprod)`. > explanation of the method! This is the same as doing `help(numpy.cumprod)`.
{: .callout} {: .callout}
When analyzing data, though, When analyzing data, though,
we often want to look at partial statistics, we often want to look at variations in statistical values,
such as the maximum value per patient such as the maximum inflammation per patient
or the average value per day. or the average inflammation per day.
One way to do this is to create a new temporary array of the data we want, One way to do this is to create a new temporary array of the data we want,
then ask it to do the calculation: then ask it to do the calculation:
...@@ -607,13 +607,13 @@ maximum inflammation for patient 2: 19.0 ...@@ -607,13 +607,13 @@ maximum inflammation for patient 2: 19.0
{: .output} {: .output}
What if we need the maximum inflammation for each patient over all days (as in the What if we need the maximum inflammation for each patient over all days (as in the
next diagram on the left), or the average for each day (as in the next diagram on the left) or the average for each day (as in the
diagram on the right)? As the diagram below shows, we want to perform the diagram on the right)? As the diagram below shows, we want to perform the
operation across an axis: operation across an axis:
![Operations Across Axes](../fig/python-operations-across-axes.png) ![Operations Across Axes](../fig/python-operations-across-axes.png)
To support this, To support this functionality,
most array functions allow us to specify the axis we want to work on. most array functions allow us to specify the axis we want to work on.
If we ask for the average across axis 0 (rows in our 2D example), If we ask for the average across axis 0 (rows in our 2D example),
we get: we get:
...@@ -672,10 +672,10 @@ which is the average inflammation per patient across all days. ...@@ -672,10 +672,10 @@ which is the average inflammation per patient across all days.
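To make the two axes concrete, here is a minimal sketch on a hypothetical two-patient, three-day array called `toy`. It assumes the lesson's layout, where rows are patients and columns are days.

```python
import numpy

# Hypothetical data: 2 patients (rows) by 3 days (columns).
toy = numpy.array([[0.0, 2.0, 4.0],
                   [2.0, 4.0, 6.0]])

# axis=0 works down the rows, giving one average per day (column).
per_day = numpy.mean(toy, axis=0)

# axis=1 works across the columns, giving one average per patient (row).
per_patient = numpy.mean(toy, axis=1)

print(per_day.shape, per_day)
print(per_patient.shape, per_patient)
```

Checking `.shape` on each result is a quick way to confirm which axis was averaged away: the result has one entry per day for `axis=0` and one entry per patient for `axis=1`.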
The mathematician Richard Hamming once said, The mathematician Richard Hamming once said,
"The purpose of computing is insight, not numbers," "The purpose of computing is insight, not numbers,"
and the best way to develop insight is often to visualize data. and the best way to develop insight is often to visualize data.
Visualization deserves an entire lecture (of course) of its own, Visualization deserves an entire lecture of its own,
but we can explore a few features of Python's `matplotlib` library here. but we can explore a few features of Python's `matplotlib` library here.
While there is no "official" plotting library, While there is no official plotting library,
this package is the de facto standard. `matplotlib` is the de facto standard.
First, First,
we will import the `pyplot` module from `matplotlib` we will import the `pyplot` module from `matplotlib`
and use two of its functions to create and display a heat map of our data: and use two of its functions to create and display a heat map of our data:
...@@ -689,7 +689,7 @@ matplotlib.pyplot.show() ...@@ -689,7 +689,7 @@ matplotlib.pyplot.show()
![Heatmap of the Data](../fig/01-numpy_71_0.png) ![Heatmap of the Data](../fig/01-numpy_71_0.png)
Blue regions in this heat map are low values, while red shows high values. Blue pixels in this heat map represent low values, while red pixels represent high values.
As we can see, As we can see,
inflammation rises and falls over a 40-day period. inflammation rises and falls over a 40-day period.
...@@ -724,10 +724,9 @@ matplotlib.pyplot.show() ...@@ -724,10 +724,9 @@ matplotlib.pyplot.show()
Here, Here,
we have put the average per day across all patients in the variable `ave_inflammation`, we have put the average per day across all patients in the variable `ave_inflammation`,
then asked `matplotlib.pyplot` to create and display a line graph of those values. then asked `matplotlib.pyplot` to create and display a line graph of those values.
The result is roughly a linear rise and fall, The result is a roughly linear rise and fall,
which is suspicious: which is suspicious:
based on other studies, we might instead expect a sharper rise and slower fall.
we expect a sharper rise and slower fall.
Let's have a look at two other statistics: Let's have a look at two other statistics:
~~~ ~~~
...@@ -746,13 +745,13 @@ matplotlib.pyplot.show() ...@@ -746,13 +745,13 @@ matplotlib.pyplot.show()
![Minimum Value Along The First Axis](../fig/01-numpy_75_3.png) ![Minimum Value Along The First Axis](../fig/01-numpy_75_3.png)
The maximum value rises and falls perfectly smoothly, The maximum value rises and falls smoothly,
while the minimum seems to be a step function. while the minimum seems to be a step function.
Neither result seems particularly likely, Neither trend seems particularly likely,
so either there's a mistake in our calculations so either there's a mistake in our calculations
or something is wrong with our data. or something is wrong with our data.
This insight would have been difficult to reach by This insight would have been difficult to reach by
examining the data without visualization tools. examining the numbers themselves without visualization tools.
You can group similar plots in a single figure using subplots. You can group similar plots in a single figure using subplots.
This script below uses a number of new commands. The function `matplotlib.pyplot.figure()` This script below uses a number of new commands. The function `matplotlib.pyplot.figure()`
...@@ -801,8 +800,7 @@ how large we want the figure to be, ...@@ -801,8 +800,7 @@ how large we want the figure to be,
that we're creating three subplots, that we're creating three subplots,
what to draw for each one, what to draw for each one,
and that we want a tight layout. and that we want a tight layout.
(Perversely, (If we leave out that call to `fig.tight_layout()`,
if we leave out that call to `fig.tight_layout()`,
the graphs will actually be squeezed together more closely.) the graphs will actually be squeezed together more closely.)
> ## Scientists Dislike Typing > ## Scientists Dislike Typing
...@@ -959,7 +957,7 @@ the graphs will actually be squeezed together more closely.) ...@@ -959,7 +957,7 @@ the graphs will actually be squeezed together more closely.)
> ## Drawing Straight Lines > ## Drawing Straight Lines
> >
> In the center and right subplots above, we expect all lines to look like step functions, because > In the center and right subplots above, we expect all lines to look like step functions because
> non-integer values are not realistic for the minimum and maximum values. However, you can see > non-integer values are not realistic for the minimum and maximum values. However, you can see
> that the lines are not always vertical or horizontal, and in particular the step function > that the lines are not always vertical or horizontal, and in particular the step function
> in the subplot on the right looks slanted. Why is this? > in the subplot on the right looks slanted. Why is this?
...@@ -1144,19 +1142,18 @@ the graphs will actually be squeezed together more closely.) ...@@ -1144,19 +1142,18 @@ the graphs will actually be squeezed together more closely.)
>## Change In Inflammation >## Change In Inflammation
> >
>This patient data is _longitudinal_ in the sense that each row represents a >This patient data is _longitudinal_ in the sense that each row represents a
>series of observations relating to one individual. This means that change >series of observations relating to one individual, and so a change in
>inflammation is a meaningful concept. >one patient's inflammation level may be important.
> >
>The `numpy.diff()` function takes a NumPy array and returns the >The `numpy.diff()` function takes a NumPy array and returns the
>difference along a specified axis. >difference from one element to the next along a specified axis.
> >
>Which axis would it make sense to use this function along? >Which axis would it make sense to use this function along?
> >
> > ## Solution > > ## Solution
> > Since the row axis (0) is patients, it does not make sense to get the > > Since the row axis (0) is patients, it might not make sense to get the
> > difference between two arbitrary patients. The column axis (1) is in > > difference between two patients. The column axis (1) is in
> > days, so the difference is the change in inflammation -- a meaningful > > days, so the difference is the change in time for inflammation values.
> > concept.
> > > >
> > ~~~ > > ~~~
> > numpy.diff(data, axis=1) > > numpy.diff(data, axis=1)
...@@ -1195,8 +1192,9 @@ the graphs will actually be squeezed together more closely.) ...@@ -1195,8 +1192,9 @@ the graphs will actually be squeezed together more closely.)
> > ~~~ > > ~~~
> > {: .language-python} > > {: .language-python}
> > > >
> > If a difference is a *decrease*, then the difference will be negative. If > > If inflammation values *decrease* along an axis, then the difference from
> > you are interested in the **magnitude** of the change and not just the > > one element to the next will be negative. If
> > you are interested in the **magnitude** of the change and not the
> > direction, the `numpy.absolute()` function will provide that. > > direction, the `numpy.absolute()` function will provide that.
> > > >
> > Notice the difference if you get the largest _absolute_ difference > > Notice the difference if you get the largest _absolute_ difference
......
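The `numpy.diff()` and `numpy.absolute()` behaviour discussed in the last hunk can be sketched on a tiny array. The values in `toy` are hypothetical, chosen so that the largest signed change and the largest absolute change differ for the first patient.

```python
import numpy

# Hypothetical data: 2 patients (rows) by 3 days (columns).
toy = numpy.array([[5.0, 1.0, 2.0],
                   [1.0, 1.0, 4.0]])

# Day-to-day change for each patient: one fewer column than the input.
changes = numpy.diff(toy, axis=1)

# Magnitude of each change, ignoring whether it is a rise or a fall.
magnitudes = numpy.absolute(changes)

print(changes)
print(numpy.max(changes, axis=1))
print(numpy.max(magnitudes, axis=1))
```

For the first patient the largest signed change is a rise of 1.0, while the largest absolute change is the fall of 4.0, which is why the two maxima disagree.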