diff --git a/01-numpy/index.html b/01-numpy/index.html new file mode 100644 index 0000000000000000000000000000000000000000..e640e08616afaf6c733a7b5cac6479aef06b9956 --- /dev/null +++ b/01-numpy/index.html @@ -0,0 +1,1457 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Analyzing Patient Data + + +
# Analyzing Patient Data


## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I process tabular data files in Python?

+
• + +
+
+
+ +
+
+
+
+ Objectives +
+ +
• Explain what a library is, and what libraries are used for.

+
• + +
• Import a Python library and use the functions it contains.

+
• + +
• Read tabular data from a file into a program.

+
• + +
• Assign values to variables.

+
• + +
• Select individual values and subsections from data.

+
• + +
• Perform operations on arrays of data.

+
• + +
• Plot simple graphs from data.

+
• + +
+
+
+ +
+ +

In this lesson we will learn how to manipulate the inflammation dataset with Python. But before we discuss how to deal with many data points, we will show how to store a single value on the computer.

+ +

The line below assigns the value `55` to a variable `weight_kg`:

+ +
``````weight_kg = 55
+``````
+
+ +

A variable is just a name for a value, such as `x_val`, `current_temperature`, or `subject_id`. Python’s variable names must begin with a letter or an underscore, may contain only letters, digits, and underscores, and are case sensitive. We can create a new variable by assigning a value to it using `=`. When we are finished typing and press Shift+Enter, the notebook runs our command.

+ +

Once a variable has a value, we can print it to the screen:

+ +
``````print(weight_kg)
+``````
+
+ +
``````55
+``````
+
+ +

and do arithmetic with it:

+ +
``````print('weight in pounds:', 2.2 * weight_kg)
+``````
+
+ +
``````weight in pounds: 121.0
+``````
+
+ +

As the example above shows, we can print several things at once by separating them with commas.

+ +

We can also change a variable’s value by assigning it a new one:

+ +
``````weight_kg = 57.5
+print('weight in kilograms is now:', weight_kg)
+``````
+
+ +
``````weight in kilograms is now: 57.5
+``````
+
+ +

If we imagine the variable as a sticky note with a name written on it, assignment is like putting the sticky note on a particular value:

+ + + +

This means that assigning a value to one variable does not change the values of other variables. For example, let’s store the subject’s weight in pounds in a variable:

+ +
``````weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+``````
+
+ +
``````weight in kilograms: 57.5 and in pounds: 126.5
+``````
+
+ + + +

and then change `weight_kg`:

+ +
``````weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+``````
+
+ +
``````weight in kilograms is now: 100.0 and weight in pounds is still: 126.5
+``````
+
+ + + +

Since `weight_lb` doesn’t “remember” where its value came from, it isn’t automatically updated when `weight_kg` changes. This is different from the way spreadsheets work.

+ +
+

## Who’s Who in Memory

+ +

You can use the `%whos` command at any time to see what variables you have created and what modules you have loaded into the computer’s memory. As this is an IPython command, it will only work if you are in an IPython terminal or the Jupyter Notebook.

+ +
``````%whos
+``````
+
+ +
``````Variable    Type       Data/Info
+--------------------------------
+numpy       module     <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
+weight_kg   float      100.0
+weight_lb   float      126.5
+``````
+
+
+ +

Words are useful, but what’s more useful are the sentences and stories we build with them. Similarly, while a lot of powerful, general tools are built into languages like Python, specialized tools built up from these basic units live in libraries that can be called upon when needed.

+ +

In order to load our inflammation data, we need to access (import, in Python terminology) a library called NumPy. NumPy is the library to reach for when you need to do numerical work, especially with matrices or arrays. We can import NumPy using:

+ +
``````import numpy
+``````
+
+ +

Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries provide additional functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs, so we only import what we need for each program. Once we’ve imported the library, we can ask the library to read our data file for us:

+ +
``````numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+``````
+
+ +
``````array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+``````
+
+ +

The expression `numpy.loadtxt(...)` is a function call that asks Python to run the function `loadtxt` which belongs to the `numpy` library. This dotted notation is used everywhere in Python to refer to the parts of things as `thing.component`.

+ +

We pass two arguments to `numpy.loadtxt`: the name of the file we want to read, and the delimiter that separates values on a line. These both need to be character strings (or strings for short), so we put them in quotes.
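To see how the two arguments work without depending on a particular file, here is a minimal sketch that reads from an in-memory text buffer instead of `inflammation-01.csv`. The sample values and the names `csv_text` and `sample` are made up for illustration; it relies on `numpy.loadtxt` accepting a file-like object as well as a file name.

```python
import io
import numpy

# Hypothetical three-column sample standing in for a real CSV file.
csv_text = "0,1,2\n3,4,5\n"

# fname can be a file name or a file-like object; delimiter is the
# string that separates values on each line.
sample = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')

print(sample)
print(sample.shape)
```

Changing `delimiter` to a string that does not match the file's separator would make the read fail, which is why both arguments have to agree with the file.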

+ +

Since we haven’t told it to do anything else with the function’s output, the notebook displays it. In this case, that output is the data we just loaded. By default, only a few rows and columns are shown (with `...` to omit elements when displaying big arrays). To save space, NumPy displays numbers as `1.` instead of `1.0` when there’s nothing interesting after the decimal point.

+ +

Our call to `numpy.loadtxt` read our file, but didn’t save the data in memory. To do that, we need to assign the array to a variable. Just as we can assign a single value to a variable, we can also assign an array of values to a variable using the same syntax. Let’s re-run `numpy.loadtxt` and save its result:

+ +
``````data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+``````
+
+ +

This statement doesn’t produce any output because assignment doesn’t display anything. If we want to check that our data has been loaded, we can print the variable’s value:

+ +
``````print(data)
+``````
+
+ +
``````[[ 0.  0.  1. ...,  3.  0.  0.]
+ [ 0.  1.  2. ...,  1.  0.  1.]
+ [ 0.  1.  1. ...,  2.  1.  1.]
+ ...,
+ [ 0.  1.  1. ...,  1.  1.  1.]
+ [ 0.  0.  0. ...,  0.  2.  0.]
+ [ 0.  0.  1. ...,  1.  1.  0.]]
+``````
+
+ +

Now that our data is in memory, we can start doing things with it. First, let’s ask what type of thing `data` refers to:

+ +
``````print(type(data))
+``````
+
+ +
``````<class 'numpy.ndarray'>
+``````
+
+ +

The output tells us that `data` currently refers to an N-dimensional array created by the NumPy library. These data correspond to arthritis patients’ inflammation. The rows are the individual patients and the columns are their daily inflammation measurements.

+ +
+

## Data Type

+ +

A NumPy array contains one or more elements of the same type. `type` will only tell you that a variable is a NumPy array. We can also find out the type of the data contained in the NumPy array.

+ +
``````print(data.dtype)
+``````
+
+ +
``````float64
+``````
+
+ +

This tells us that the NumPy array’s elements are floating-point numbers.

+
+ +

With this command we can see the array’s shape:

+ +
``````print(data.shape)
+``````
+
+ +
``````(60, 40)
+``````
+
+ +

This tells us that `data` has 60 rows and 40 columns. When we created the variable `data` to store our arthritis data, we didn’t just create the array, we also created information about the array, called members or attributes. This extra information describes `data` in the same way an adjective describes a noun. `data.shape` is an attribute of `data` which describes the dimensions of `data`. We use the same dotted notation for the attributes of variables that we use for the functions in libraries because they have the same part-and-whole relationship.
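As a rough illustration of attributes, the sketch below builds a small made-up array (the name `grid` and its values are assumptions, not part of the inflammation data) and reads a few of the attributes NumPy arrays carry.

```python
import numpy

# A small made-up array: 2 rows, 3 columns.
grid = numpy.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

# Attributes are read with dotted notation and no parentheses.
print('shape:', grid.shape)   # rows, columns
print('ndim:', grid.ndim)     # number of axes
print('size:', grid.size)     # total number of elements
print('dtype:', grid.dtype)   # type of each element
```

Because attributes describe the array rather than do something with it, they are written without the parentheses that a function call needs.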

+ +

If we want to get a single number from the array, we must provide an index in square brackets, just as we do in math when referring to an element of a matrix. Our inflammation data has two dimensions, so we will need to use two indices to refer to a value:

+ +
``````print('first value in data:', data[0, 0])
+``````
+
+ +
``````first value in data: 0.0
+``````
+
+ +
``````print('middle value in data:', data[30, 20])
+``````
+
+ +
``````middle value in data: 13.0
+``````
+
+ +

The expression `data[30, 20]` may not surprise you, but `data[0, 0]` might. Programming languages like Fortran, MATLAB and R start counting at 1, because that’s what human beings have done for thousands of years. Languages in the C family (including C++, Java, Perl, and Python) count from 0 because it represents an offset from the first value in the array (the second value is offset by one index from the first value). This is closer to the way that computers represent arrays (if you are interested in the historical reasons behind counting indices from zero, you can read Mike Hoye’s blog post). As a result, if we have an M×N array in Python, its indices go from 0 to M-1 on the first axis and 0 to N-1 on the second. It takes a bit of getting used to, but one way to remember the rule is that the index is how many steps we have to take from the start to get the item we want.
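The index range described above can be checked on a small array. This is only a sketch: the array `table` and its contents are made up, and the `try`/`except` is there to show what happens one step past the last valid index.

```python
import numpy

# A made-up 3x4 array whose values are 0..11, filled row by row.
table = numpy.arange(12).reshape(3, 4)
rows, cols = table.shape

first = table[0, 0]                  # top-left element
last = table[rows - 1, cols - 1]     # bottom-right element
print('first:', first, 'last:', last)

# Index M on the first axis is one step past the end.
try:
    table[rows, 0]
except IndexError as error:
    print('out of range:', error)
```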

+ + + +
+

## In the Corner

+ +

What may also surprise you is that when Python displays an array, it shows the element with index `[0, 0]` in the upper left corner rather than the lower left. This is consistent with the way mathematicians draw matrices, but different from Cartesian coordinates. The indices are (row, column) instead of (column, row) for the same reason, which can be confusing when plotting data.

+
+ +

An index like `[30, 20]` selects a single element of an array, but we can select whole sections as well. For example, we can select the first ten days (columns) of values for the first four patients (rows) like this:

+ +
``````print(data[0:4, 0:10])
+``````
+
+ +
``````[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
+ [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
+ [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
+ [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]
+``````
+
+ +

The slice `0:4` means, “Start at index 0 and go up to, but not including, index 4.” Again, the up-to-but-not-including takes a bit of getting used to, but the rule is that the difference between the upper and lower bounds is the number of values in the slice.
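The rule about bounds can be checked with a plain Python list, which slices the same way along one axis. The list `values` and the bounds `low` and `high` are made-up names for this sketch.

```python
# Made-up list of ten values, so each value equals its own index.
values = list(range(10))

low, high = 2, 5
chunk = values[low:high]

print(chunk)
print('number of values:', len(chunk), '== high - low:', high - low)
```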

+ +

We don’t have to start slices at 0:

+ +
``````print(data[5:10, 0:10])
+``````
+
+ +
``````[[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.]
+ [ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.]
+ [ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.]
+ [ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.]
+ [ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.]]
+``````
+
+ +

We also don’t have to include the upper and lower bound on the slice. If we don’t include the lower bound, Python uses 0 by default; if we don’t include the upper, the slice runs to the end of the axis, and if we don’t include either (i.e., if we just use ‘:’ on its own), the slice includes everything:

+ +
``````small = data[:3, 36:]
+print('small is:')
+print(small)
+``````
+
+ +
``````small is:
+[[ 2.  3.  0.  0.]
+ [ 1.  1.  0.  1.]
+ [ 2.  2.  1.  1.]]
+``````
+
+ +

Arrays also know how to perform common mathematical operations on their values. The simplest operations with data are arithmetic: add, subtract, multiply, and divide. When you do such operations on arrays, the operation is done on each individual element of the array. Thus:

+ +
``````doubledata = data * 2.0
+``````
+
+ +

will create a new array `doubledata` whose elements have the value of two times the value of the corresponding elements in `data`:

+ +
``````print('original:')
+print(data[:3, 36:])
+print('doubledata:')
+print(doubledata[:3, 36:])
+``````
+
+ +
``````original:
+[[ 2.  3.  0.  0.]
+ [ 1.  1.  0.  1.]
+ [ 2.  2.  1.  1.]]
+doubledata:
+[[ 4.  6.  0.  0.]
+ [ 2.  2.  0.  2.]
+ [ 4.  4.  2.  2.]]
+``````
+
+ +

If, instead of doing arithmetic with an array and a single value (as above), you do the arithmetic with another array of the same shape, the operation is done on corresponding elements of the two arrays. Thus:

+ +
``````tripledata = doubledata + data
+``````
+
+ +

will give you an array where `tripledata[0,0]` will equal `doubledata[0,0]` plus `data[0,0]`, and so on for all other elements of the arrays.

+ +
``````print('tripledata:')
+print(tripledata[:3, 36:])
+``````
+
+ +
``````tripledata:
+[[ 6.  9.  0.  0.]
+ [ 3.  3.  0.  3.]
+ [ 6.  6.  3.  3.]]
+``````
+
+ +

Often, we want to do more than add, subtract, multiply, and divide values of data. NumPy knows how to do more complex operations on arrays. If we want to find the average inflammation for all patients on all days, for example, we can ask NumPy to compute `data`’s mean value:

+ +
``````print(numpy.mean(data))
+``````
+
+ +
``````6.14875
+``````
+
+ +

`mean` is a function that takes an array as an argument. If variables are nouns, functions are verbs: they do things with variables.

+ +
+

## Not All Functions Have Input

+ +

Generally, a function uses inputs to produce outputs. However, some functions produce outputs without needing any input. For example, checking the current time doesn’t require any input.

+ +
``````import time
+print(time.ctime())
+``````
+
+ +
``````Sat Mar 26 13:07:33 2016
+``````
+
+ +

For functions that don’t take in any arguments, we still need parentheses (`()`) to tell Python to go and do something for us.

+
+ +

NumPy has lots of useful functions that take an array as input. Let’s use three of those functions to get some descriptive values about the dataset. We’ll also use multiple assignment, a convenient Python feature that will enable us to do this all in one line.

+ +
``````maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)
+
+print('maximum inflammation:', maxval)
+print('minimum inflammation:', minval)
+print('standard deviation:', stdval)
+``````
+
+ +
``````maximum inflammation: 20.0
+minimum inflammation: 0.0
+standard deviation: 4.61383319712
+``````
+
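Multiple assignment, used in the example above, matches names on the left with values on the right by position. The sketch below uses plain Python with made-up values (`readings`, `highest`, and `lowest` are illustrative names) to show that matching, and that the whole right-hand side is evaluated before any name is assigned.

```python
# Made-up readings used only to illustrate multiple assignment.
readings = [3.0, 7.5, 1.5]

# Names on the left are matched with values on the right by position.
highest, lowest = max(readings), min(readings)
print('highest:', highest, 'lowest:', lowest)

# The right-hand side is evaluated first, so this swaps the two values.
highest, lowest = lowest, highest
print('after swap:', highest, lowest)
```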
+ +
+

## Mystery Functions in IPython

+ +

How did we know what functions NumPy has and how to use them? If you are working in the IPython/Jupyter Notebook there is an easy way to find out. If you type the name of something followed by a dot, then you can use tab completion (e.g. type `numpy.` and then press tab) to see a list of all functions and attributes that you can use. After selecting one you can also add a question mark (e.g. `numpy.cumprod?`) and IPython will return an explanation of the method! This is the same as doing `help(numpy.cumprod)`.

+
+ +

When analyzing data, though, we often want to look at partial statistics, such as the maximum value per patient or the average value per day. One way to do this is to create a new temporary array of the data we want, then ask it to do the calculation:

+ +
``````patient_0 = data[0, :] # 0 on the first axis (rows), everything on the second (columns)
+print('maximum inflammation for patient 0:', patient_0.max())
+``````
+
+ +
``````maximum inflammation for patient 0: 18.0
+``````
+
+ +

Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python. Comments allow programmers to leave explanatory notes for other programmers or their future selves.

+ +

We don’t actually need to store the row in a variable of its own. Instead, we can combine the selection and the function call:

+ +
``````print('maximum inflammation for patient 2:', numpy.max(data[2, :]))
+``````
+
+ +
``````maximum inflammation for patient 2: 19.0
+``````
+
+ +

What if we need the maximum inflammation for each patient over all days (as in the next diagram on the left), or the average for each day (as in the diagram on the right)? As the diagram below shows, we want to perform the operation across an axis:

+ + + +

To support this, most array functions allow us to specify the axis we want to work on. If we ask for the average across axis 0 (rows in our 2D example), we get:

+ +
``````print(numpy.mean(data, axis=0))
+``````
+
+ +
``````[  0.           0.45         1.11666667   1.75         2.43333333   3.15
+   3.8          3.88333333   5.23333333   5.51666667   5.95         5.9
+   8.35         7.73333333   8.36666667   9.5          9.58333333
+  10.63333333  11.56666667  12.35        13.25        11.96666667
+  11.03333333  10.16666667  10.           8.66666667   9.15         7.25
+   7.33333333   6.58333333   6.06666667   5.95         5.11666667   3.6
+   3.3          3.56666667   2.48333333   1.5          1.13333333
+   0.56666667]
+``````
+
+ +

As a quick check, we can ask this array what its shape is:

+ +
``````print(numpy.mean(data, axis=0).shape)
+``````
+
+ +
``````(40,)
+``````
+
+ +

The expression `(40,)` tells us we have a one-dimensional array of 40 values, one per column, so this is the average inflammation per day for all patients. If we average across axis 1 (columns in our 2D example), we get:

+ +
``````print(numpy.mean(data, axis=1))
+``````
+
+ +
``````[ 5.45   5.425  6.1    5.9    5.55   6.225  5.975  6.65   6.625  6.525
+  6.775  5.8    6.225  5.75   5.225  6.3    6.55   5.7    5.85   6.55
+  5.775  5.825  6.175  6.1    5.8    6.425  6.05   6.025  6.175  6.55
+  6.175  6.35   6.725  6.125  7.075  5.725  5.925  6.15   6.075  5.75
+  5.975  5.725  6.3    5.9    6.75   5.925  7.225  6.15   5.95   6.275  5.7
+  6.1    6.825  5.975  6.725  5.7    6.25   6.4    7.05   5.9  ]
+``````
+
+ +

which is the average inflammation per patient across all days.
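The effect of the `axis` argument is easier to follow on an array small enough to check by hand. The sketch below uses made-up values (the names `tiny`, `per_day`, and `per_patient` are illustrative) with the same layout as the inflammation data: rows for patients, columns for days.

```python
import numpy

# Made-up data: 2 "patients" (rows) by 3 "days" (columns).
tiny = numpy.array([[1.0, 2.0, 3.0],
                    [3.0, 4.0, 8.0]])

per_day = numpy.mean(tiny, axis=0)       # one value per column
per_patient = numpy.mean(tiny, axis=1)   # one value per row

print('per day:', per_day, per_day.shape)
print('per patient:', per_patient, per_patient.shape)
```

In both cases the axis named in the call is the one that disappears from the shape of the result.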

+ +

The mathematician Richard Hamming once said, “The purpose of computing is insight, not numbers,” and the best way to develop insight is often to visualize data. Visualization deserves an entire lecture of its own, but we can explore a few features of Python’s `matplotlib` library here. While there is no “official” plotting library, this package is the de facto standard. First, we will import the `pyplot` module from `matplotlib` and use two of its functions to create and display a heat map of our data:

+ +
``````import matplotlib.pyplot
+image = matplotlib.pyplot.imshow(data)
+matplotlib.pyplot.show()
+``````
+
+ + + +

Blue regions in this heat map are low values, while red shows high values. As we can see, inflammation rises and falls over a 40-day period.

+ +
+

## Some IPython Magic

+ +

If you’re using an IPython / Jupyter notebook, you’ll need to execute the following command in order for your matplotlib images to appear in the notebook when `show()` is called:

+ +
``````%matplotlib inline
+``````
+
+ +

The `%` indicates an IPython magic function, a function that is only valid within the notebook environment. Note that you only have to execute this function once per notebook.

+
+ +

Let’s take a look at the average inflammation over time:

+ +
``````ave_inflammation = numpy.mean(data, axis=0)
+ave_plot = matplotlib.pyplot.plot(ave_inflammation)
+matplotlib.pyplot.show()
+``````
+
+ + + +

Here, we have put the average per day across all patients in the variable `ave_inflammation`, then asked `matplotlib.pyplot` to create and display a line graph of those values. The result is roughly a linear rise and fall, which is suspicious: based on other studies, we expect a sharper rise and slower fall. Let’s have a look at two other statistics:

+ +
``````max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
+matplotlib.pyplot.show()
+``````
+
+ + + +
``````min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
+matplotlib.pyplot.show()
+``````
+
+ + + +

The maximum value rises and falls perfectly smoothly, while the minimum seems to be a step function. Neither result seems particularly likely, so either there’s a mistake in our calculations or something is wrong with our data. This insight would have been difficult to reach by examining the data without visualization tools.

+ +

You can group similar plots in a single figure using subplots. The script below uses a number of new commands. The function `matplotlib.pyplot.figure()` creates a space into which we will place all of our plots. The parameter `figsize` tells Python how big to make this space. Each subplot is placed into the figure using its `add_subplot` method. The `add_subplot` method takes 3 parameters. The first denotes how many total rows of subplots there are, the second parameter refers to the total number of subplot columns, and the final parameter denotes which subplot your variable is referencing (left-to-right, top-to-bottom). Each subplot is stored in a different variable (`axes1`, `axes2`, `axes3`). Once a subplot is created, the axes can be labeled using the `set_xlabel()` command (or `set_ylabel()`). Here are our three plots side by side:

+ +
``````import numpy
+import matplotlib.pyplot
+
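+data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
+
+axes1 = fig.add_subplot(1, 3, 1)
+axes2 = fig.add_subplot(1, 3, 2)
+axes3 = fig.add_subplot(1, 3, 3)
+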
+axes1.set_ylabel('average')
+axes1.plot(numpy.mean(data, axis=0))
+
+axes2.set_ylabel('max')
+axes2.plot(numpy.max(data, axis=0))
+
+axes3.set_ylabel('min')
+axes3.plot(numpy.min(data, axis=0))
+
+fig.tight_layout()
+
+matplotlib.pyplot.show()
+``````
+
+ + + +

The call to `loadtxt` reads our data, and the rest of the program tells the plotting library how large we want the figure to be, that we’re creating three subplots, what to draw for each one, and that we want a tight layout. (Perversely, if we leave out that call to `fig.tight_layout()`, the graphs will actually be squeezed together more closely.)

+ +
+

## Scientists Dislike Typing

+ +

We will always use the syntax `import numpy` to import NumPy. However, in order to save typing, it is often suggested to make a shortcut like so: `import numpy as np`. If you ever see Python code online using a NumPy function with `np` (for example, `np.loadtxt(...)`), it’s because they’ve used this shortcut. When working with other people, it is important to agree on a convention of how common libraries are imported.
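For reference, this is roughly what code written with the shortcut looks like. The array `values` is made up for the example; `np` refers to the same library as `numpy`.

```python
import numpy as np

# np is just a shorter name for the numpy library.
values = np.array([1.0, 2.0, 3.0])
average = np.mean(values)
print(average)
```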

+
+
+

+ +

Draw diagrams showing what variables refer to what values after each statement in the following program:

+ +
``````mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+``````
+
+
+ +
+

## Sorting Out References

+ +

What does the following program print out?

+ +
``````first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+``````
+
+ +
+

## Solution

+
``````Hopper Grace
+``````
+
+
+
+ +
+

## Slicing Strings

+ +

A section of an array is called a slice. We can take slices of character strings as well:

+ +
``````element = 'oxygen'
+print('first three characters:', element[0:3])
+print('last three characters:', element[3:6])
+``````
+
+ +
``````first three characters: oxy
+last three characters: gen
+``````
+
+ +

What is the value of `element[:4]`? What about `element[4:]`? Or `element[:]`?

+ +
+

## Solution

+
``````oxyg
+en
+oxygen
+``````
+
+
+ +

What is `element[-1]`? What is `element[-2]`?

+ +
+

## Solution

+
``````n
+e
+``````
+
+
+ +

Given those answers, explain what `element[1:-1]` does.

+ +
+

## Solution

+

It creates a substring from index 1 up to (not including) the final index, effectively removing the first and last letters from ‘oxygen’.
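The explanation above can be checked directly. In this sketch the names `last` and `middle` are illustrative; the final line rewrites the same slice with non-negative indices to show that the two forms agree.

```python
element = 'oxygen'

# A negative index counts back from the end of the string.
last = element[-1]
# So 1:-1 starts after the first character and stops before the last.
middle = element[1:-1]

print(last)
print(middle)
# The same slice written with non-negative indices.
print(element[1:len(element) - 1])
```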

+
+
+ +
+

## Thin Slices

+ +

The expression `element[3:3]` produces an empty string, i.e., a string that contains no characters. If `data` holds our array of patient data, what does `data[3:3, 4:4]` produce? What about `data[3:3, :]`?

+ +
+

## Solution

+
``````[]
+[]
+``````
+
+
+
+ +
+

## Plot Scaling

+ +

Why do all of our plots stop just short of the upper end of our graph?

+ +
+

## Solution

+

Because matplotlib normally sets x and y axes limits to the min and max of our data (depending on data range).

+
+ +

If we want to change this, we can use the `set_ylim(min, max)` method of each ‘axes’, for example:

+ +
``````axes3.set_ylim(0,6)
+``````
+
+ +

Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the `max` and `min` methods to help.)

+ +
+

## Solution

+
``````# One method
+axes3.set_ylabel('min')
+axes3.plot(numpy.min(data, axis=0))
+axes3.set_ylim(0,6)
+``````
+
+
+ +
+

## Solution

+
``````# A more automated approach
+min_data = numpy.min(data, axis=0)
+axes3.set_ylabel('min')
+axes3.plot(min_data)
+axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
+``````
+
+
+
+ +
+

## Drawing Straight Lines

+ +

In the center and right subplots above, we expect all lines to look like step functions, because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

+ +
+

## Solution

+

Because matplotlib interpolates (draws a straight line) between the points. One way to avoid this is to use the Matplotlib `drawstyle` option:

+ +
``````import numpy
+import matplotlib.pyplot
+
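+data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
+
+axes1 = fig.add_subplot(1, 3, 1)
+axes2 = fig.add_subplot(1, 3, 2)
+axes3 = fig.add_subplot(1, 3, 3)
+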
+axes1.set_ylabel('average')
+axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
+
+axes2.set_ylabel('max')
+axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
+
+axes3.set_ylabel('min')
+axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
+
+fig.tight_layout()
+
+matplotlib.pyplot.show()
+``````
+
+ +
+
+ +
+

+ +

Create a plot showing the standard deviation (`numpy.std`) of the inflammation data for each day across all patients.

+ +
+

## Solution

+
``````std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
+matplotlib.pyplot.show()
+``````
+
+
+
+ +
+

## Moving Plots Around

+ +

Modify the program to display the three plots on top of one another instead of side by side.

+ +
+

## Solution

+
``````import numpy
+import matplotlib.pyplot
+
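+data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+# change figsize (swap width and height)
+fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
+
+# change add_subplot (swap first two parameters)
+axes1 = fig.add_subplot(3, 1, 1)
+axes2 = fig.add_subplot(3, 1, 2)
+axes3 = fig.add_subplot(3, 1, 3)
+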
+axes1.set_ylabel('average')
+axes1.plot(numpy.mean(data, axis=0))
+
+axes2.set_ylabel('max')
+axes2.plot(numpy.max(data, axis=0))
+
+axes3.set_ylabel('min')
+axes3.plot(numpy.min(data, axis=0))
+
+fig.tight_layout()
+
+matplotlib.pyplot.show()
+``````
+
+
+
+ +
+

## Stacking Arrays

+ +

Arrays can be concatenated and stacked on top of one another, using NumPy’s `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.

+ +
``````import numpy
+
+A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
+print('A = ')
+print(A)
+
+B = numpy.hstack([A, A])
+print('B = ')
+print(B)
+
+C = numpy.vstack([A, A])
+print('C = ')
+print(C)
+``````
+
+ +
``````A =
+[[1 2 3]
+ [4 5 6]
+ [7 8 9]]
+B =
+[[1 2 3 1 2 3]
+ [4 5 6 4 5 6]
+ [7 8 9 7 8 9]]
+C =
+[[1 2 3]
+ [4 5 6]
+ [7 8 9]
+ [1 2 3]
+ [4 5 6]
+ [7 8 9]]
+``````
+
+ +

Write some additional code that slices the first and last columns of `A`, and stacks them into a 3x2 array. Make sure to `print` the results to verify your solution.

+ +
+

## Solution

+ +

A ‘gotcha’ with array indexing is that singleton dimensions are dropped by default. That means `A[:, 0]` is a one dimensional array, which won’t stack as desired. To preserve singleton dimensions, the index itself can be a slice or array. For example, `A[:, :1]` returns a two dimensional array with one singleton dimension (i.e. a column vector).

+ +
``````D = numpy.hstack((A[:, :1], A[:, -1:]))
+print('D = ')
+print(D)
+``````
+
+ +
``````D =
+[[1 3]
+ [4 6]
+ [7 9]]
+``````
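The dropped dimension is easiest to see by comparing shapes. This short sketch reuses the array `A` from the exercise; the names `flat` and `column` are made up for illustration.

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# An integer index drops that axis; a slice keeps it.
flat = A[:, 0]
column = A[:, :1]

print(flat.shape)
print(column.shape)
```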
+
+
+ +
+

## Solution

+ +

An alternative way to achieve the same result is to use NumPy’s `delete` function to remove the second column of `A`.

+ +
``````D = numpy.delete(A, 1, 1)
+print('D = ')
+print(D)
+``````
+
+ +
``````D =
+[[1 3]
+ [4 6]
+ [7 9]]
+``````
+
+
+
+ +
+

## Change In Inflammation

+ +

This patient data is longitudinal in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept.

+ +

The `numpy.diff()` function takes a NumPy array and returns the difference along a specified axis.

+ +

Which axis would it make sense to use this function along?

+ +
+

## Solution

+

Since the row axis (0) is patients, it does not make sense to get the difference between two arbitrary patients. The column axis (1) is in days, so the difference is the change in inflammation, a meaningful concept.

+ +
``````numpy.diff(data, axis=1)
+``````
+
+
+ +

If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the shape of the array be after you run the `diff()` function and why?

+ +
+

## Solution

+

The shape will be `(60, 39)` because there is one fewer difference between columns than there are columns in the data.
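The same effect shows up on a small array. This is a sketch with made-up readings (the names `readings` and `changes` are illustrative): four columns of data produce three differences per row.

```python
import numpy

# Made-up readings: 2 patients (rows), 4 days (columns).
readings = numpy.array([[0.0, 2.0, 5.0, 4.0],
                        [1.0, 1.0, 3.0, 0.0]])

# Each output value is a day's reading minus the previous day's.
changes = numpy.diff(readings, axis=1)
print(changes)
print(readings.shape, '->', changes.shape)
```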

+
+ +

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

+ +
+

## Solution

+

By using the `numpy.max()` function after you apply the `numpy.diff()` function, you will get the largest difference between days.

+ +
``````numpy.max(numpy.diff(data, axis=1), axis=1)
+``````
+
+ +
``````array([  7.,  12.,  11.,  10.,  11.,  13.,  10.,   8.,  10.,  10.,   7.,
+         7.,  13.,   7.,  10.,  10.,   8.,  10.,   9.,  10.,  13.,   7.,
+        12.,   9.,  12.,  11.,  10.,  10.,   7.,  10.,  11.,  10.,   8.,
+        11.,  12.,  10.,   9.,  10.,  13.,  10.,   7.,   7.,  10.,  13.,
+        12.,   8.,   8.,  10.,  10.,   9.,   8.,  13.,  10.,   7.,  10.,
+         8.,  12.,  10.,   7.,  12.])
+``````
+
+ +

If a difference is a decrease, then the difference will be negative. If you are interested in the magnitude of the change and not just the direction, the `numpy.absolute()` function will provide that.

+ +

Notice the difference if you get the largest absolute difference between readings.

+ +
``````numpy.max(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
+``````
+
+ +
``````array([ 12.,  14.,  11.,  13.,  11.,  13.,  10.,  12.,  10.,  10.,  10.,
+        12.,  13.,  10.,  11.,  10.,  12.,  13.,   9.,  10.,  13.,   9.,
+        12.,   9.,  12.,  11.,  10.,  13.,   9.,  13.,  11.,  11.,   8.,
+        11.,  12.,  13.,   9.,  10.,  13.,  11.,  11.,  13.,  11.,  13.,
+        13.,  10.,   9.,  10.,  10.,   9.,   9.,  13.,  10.,   9.,  10.,
+        11.,  13.,  10.,  10.,  12.])
+``````
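Why the two results differ can be seen on a single made-up patient whose largest change is a drop. In this sketch the values and the names `largest_increase` and `largest_change` are assumptions chosen for illustration.

```python
import numpy

# Made-up readings for one patient: a rise, then a larger drop.
readings = numpy.array([[1.0, 3.0, 9.0, 2.0]])
changes = numpy.diff(readings, axis=1)   # differences: 2, 6, -7

# max alone ignores the drop because it is negative.
largest_increase = numpy.max(changes, axis=1)
# Taking absolute values first treats rises and drops alike.
largest_change = numpy.max(numpy.absolute(changes), axis=1)

print(largest_increase, largest_change)
```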
+
+ +
+
+ + +
+

## Key Points

+
+ +
• Import a library into a program using `import libraryname`.

+
• + +
• Use the `numpy` library to work with arrays in Python.

+
• + +
• Use `variable = value` to assign a value to a variable in order to record it in memory.

+
• + +
• Variables are created on demand whenever a value is assigned to them.

+
• + +
• Use `print(something)` to display the value of `something`.

+
• + +
• The expression `array.shape` gives the shape of an array.

+
• + +
• Use `array[x, y]` to select a single element from a 2D array.

+
• + +
• Array indices start at 0, not 1.

+
• + +
• Use `low:high` to specify a `slice` that includes the indices from `low` to `high-1`.

+
• + +
• All the indexing and slicing that works on arrays also works on strings.

+
• + +
• Use `# some kind of explanation` to add comments to programs.

+
• + +
• Use `numpy.mean(array)`, `numpy.max(array)`, and `numpy.min(array)` to calculate simple statistics.

+
• + +
• Use `numpy.mean(array, axis=0)` or `numpy.mean(array, axis=1)` to calculate statistics across the specified axis.

+
• + +
• Use the `pyplot` library from `matplotlib` for creating simple visualizations.

+
• + +
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/02-loop/index.html b/02-loop/index.html new file mode 100644 index 0000000000000000000000000000000000000000..0156a29dbdafe7e0e6f0de170244e69f84399ecf --- /dev/null +++ b/02-loop/index.html @@ -0,0 +1,643 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Repeating Actions with Loops + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Repeating Actions with Loops

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I do the same operations on many different values?

+
• + +
+
+
+ +
+
+
+
+ Objectives +
+ +
• Explain what a `for` loop does.

+
• + +
• Correctly write `for` loops to repeat simple calculations.

+
• + +
• Trace changes to a loop variable as the loop runs.

+
• + +
• Trace changes to other variables as they are updated by a `for` loop.

+
• + +
+
+
+ +
+ +

In the last lesson, we wrote some code that plots values of interest from our first inflammation dataset, `inflammation-01.csv`, and reveals some suspicious features in it.

+ + + +

We have a dozen data sets right now, though, and more on the way. +We want to create plots for all of our data sets with a single statement. +To do that, we’ll have to teach the computer how to repeat things.

+ +

An example task that we might want to repeat is printing each character in a +word on a line of its own.

+ +
``````word = 'lead'
+``````
+
+ +

We can access a character in a string using its index. For example, we can get the first character of the word `'lead'` by using `word[0]`. One way to print each character is to use four `print` statements:

+ +
``````print(word[0])
+print(word[1])
+print(word[2])
+print(word[3])
+``````
+
+ +
``````l
+e
+a
+d
+``````
+
+ +

This is a bad approach for two reasons:

+ +
+
1. It doesn’t scale: if we want to print the characters in a string that’s hundreds of letters long, we’d be better off just typing them in.
2. It’s fragile: if we give it a longer string, it only prints part of the data, and if we give it a shorter one, it produces an error because we’re asking for characters that don’t exist.
+ +
``````word = 'tin'
+print(word[0])
+print(word[1])
+print(word[2])
+print(word[3])
+
+``````
+
+ +
``````t
+i
+n
+``````
+
+ +
``````---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+      3 print(word[1])
+      4 print(word[2])
+----> 5 print(word[3])
+
+IndexError: string index out of range
+``````
+
+ +

Here’s a better approach:

+ +
``````word = 'lead'
+for char in word:
+    print(char)
+
+``````
+
+ +
``````l
+e
+a
+d
+``````
+
+ +

This is shorter—certainly shorter than something that prints every character in a hundred-letter string—and +more robust as well:

+ +
``````word = 'oxygen'
+for char in word:
+    print(char)
+``````
+
+ +
``````o
+x
+y
+g
+e
+n
+``````
+
+ +

The improved version uses a for loop +to repeat an operation—in this case, printing—once for each thing in a sequence. +The general form of a loop is:

+ +
``````for element in variable:
+    do things with element
+``````
+
+ +

Using the oxygen example above, the loop might look like this:

+ + + +

where each character (`char`) in the variable `word` is looped through and printed one character after another. +The numbers in the diagram denote which loop cycle the character was printed in (1 being the first loop, and 6 being the final loop).

+ +

We can call the loop variable anything we like, +but there must be a colon at the end of the line starting the loop, +and we must indent anything we want to run inside the loop. Unlike many other languages, there is no +command to signify the end of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.
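As a small sketch of how indentation marks the loop body (the name `shout` is only illustrative), the indented line below runs once per character, while the unindented `print` runs once, after the loop has finished:

```python
word = 'tin'
shout = ''
for char in word:
    shout = shout + char + '!'   # indented: part of the loop body
print(shout)                     # not indented: runs once, after the loop
```

Moving the `print` call four spaces to the right would make it part of the loop body, so it would run three times instead of once.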

+ +
+

## What’s in a name?

+ +

In the example above, the loop variable was given the name `char` as a mnemonic; it is short for ‘character’. ‘Char’ is not a keyword in Python that pulls the characters from words or strings. In fact when a similar loop is run over a list rather than a word, the output would be each member of that list printed in order, rather than the characters.

+ +
``````elements = ['oxygen', 'nitrogen', 'argon']
+for char in elements:
+   print(char)
+``````
+
+ +
``````oxygen
+nitrogen
+argon
+``````
+
+ +

We can choose any name we want for variables. We might just as easily have chosen the name `banana` for the loop variable, as long as we use the same name when we invoke the variable inside the loop:

+ +
``````word = 'oxygen'
+for banana in word:
+    print(banana)
+``````
+
+ +
``````o
+x
+y
+g
+e
+n
+``````
+
+ +

It is a good idea to choose variable names that are meaningful, otherwise it would be more difficult to understand what the loop is doing.

+
+ +

Here’s another loop that repeatedly updates a variable:

+ +
``````length = 0
+for vowel in 'aeiou':
+    length = length + 1
+print('There are', length, 'vowels')
+``````
+
+ +
``````There are 5 vowels
+``````
+
+ +

It’s worth tracing the execution of this little program step by step. +Since there are five characters in `'aeiou'`, +the statement on line 3 will be executed five times. +The first time around, +`length` is zero (the value assigned to it on line 1) +and `vowel` is `'a'`. +The statement adds 1 to the old value of `length`, +producing 1, +and updates `length` to refer to that new value. +The next time around, +`vowel` is `'e'` and `length` is 1, +so `length` is updated to be 2. +After three more updates, +`length` is 5; +since there is nothing left in `'aeiou'` for Python to process, +the loop finishes +and the `print` statement on line 4 tells us our final answer.
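One way to check a trace like this is to let the program report its own state. The sketch below is the same loop with an extra `print` inside the body; the wording of the message is just an illustration:

```python
length = 0
for vowel in 'aeiou':
    length = length + 1
    # Report the loop variable and the updated counter on every pass.
    print('vowel is', vowel, 'and length is now', length)
print('There are', length, 'vowels')
```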

+ +

Note that a loop variable is just a variable that’s being used to record progress in a loop. +It still exists after the loop is over, +and we can re-use variables previously defined as loop variables as well:

+ +
``````letter = 'z'
+for letter in 'abc':
+    print(letter)
+print('after the loop, letter is', letter)
+``````
+
+ +
``````a
+b
+c
+after the loop, letter is c
+``````
+
+ +

Note also that finding the length of a string is such a common operation +that Python actually has a built-in function to do it called `len`:

+ +
``````print(len('aeiou'))
+``````
+
+ +
``````5
+``````
+
+ +

`len` is much faster than any function we could write ourselves, +and much easier to read than a two-line loop; +it will also give us the length of many other things that we haven’t met yet, +so we should always use it when we can.

+ +
+

## From 1 to N

+ +

Python has a built-in function called `range` that creates a sequence of numbers. `range` can +accept 1, 2, or 3 parameters.

+ +
+
• If one parameter is given, `range` creates a sequence of that length, starting at zero and incrementing by 1. For example, `range(3)` produces the numbers `0, 1, 2`.
• +
• If two parameters are given, `range` starts at +the first and ends just before the second, incrementing by one. +For example, `range(2, 5)` produces `2, 3, 4`.
• +
• If `range` is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.
• +
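The three forms above can be checked directly. In Python 3, `range` produces its numbers on demand, so this sketch wraps each call in `list` to make the contents visible (the variable names are illustrative):

```python
one_arg = list(range(3))            # stop only
two_args = list(range(2, 5))        # start, stop
three_args = list(range(3, 10, 2))  # start, stop, step

print(one_arg)
print(two_args)
print(three_args)
```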
+ +

Write a loop that uses `range` to print the first 3 natural numbers:

+ +
``````1
+2
+3
+``````
+
+ +
+

## Solution

+
``````for i in range(1, 4):
+   print(i)
+``````
+
+
+
+ +
+

## Computing Powers With Loops

+ +

Exponentiation is built into Python:

+ +
``````print(5 ** 3)
+``````
+
+ +
``````125
+``````
+
+ +

Write a loop that calculates the same result as `5 ** 3` using +multiplication (and without exponentiation).

+ +
+

## Solution

+
``````result = 1
+for i in range(0, 3):
+   result = result * 5
+print(result)
+``````
+
+
+
+ +
+

## Reverse a String

+ +

Knowing that two strings can be concatenated using the `+` operator, +write a loop that takes a string +and produces a new string with the characters in reverse order, +so `'Newton'` becomes `'notweN'`.

+ +
+

## Solution

+
``````newstring = ''
+oldstring = 'Newton'
+for char in oldstring:
+   newstring = char + newstring
+print(newstring)
+``````
+
+
+
+ +
+

## Computing the Value of a Polynomial

+ +

The built-in function `enumerate` takes a sequence (e.g. a list) and generates a +new sequence of the same length. Each element of the new sequence is a pair composed of the index +(0, 1, 2,…) and the value from the original sequence:

+ +
``````for i, x in enumerate(xs):
+    # Do something with i and x
+``````
+
+ +

The loop above assigns the index to `i` and the value to `x`.
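As a concrete sketch of that pattern, here is `enumerate` applied to a short list (the list `gases` and the helper list `pairs` are made-up names for illustration):

```python
gases = ['oxygen', 'nitrogen', 'argon']   # hypothetical example data
pairs = []
for i, gas in enumerate(gases):
    print(i, gas)            # index first, then the value at that index
    pairs.append((i, gas))
```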

+ +

Suppose you have encoded a polynomial as a list of coefficients in +the following way: the first element is the constant term, the +second element is the coefficient of the linear term, the third is the +coefficient of the quadratic term, etc.

+ +
``````x = 5
+cc = [2, 4, 3]
+``````
+
+ +
``````y = cc[0] * x**0 + cc[1] * x**1 + cc[2] * x**2
+y = 97
+``````
+
+ +

Write a loop using `enumerate(cc)` which computes the value `y` of any +polynomial, given `x` and `cc`.

+ +
+

## Solution

+
``````y = 0
+for i, c in enumerate(cc):
+    y = y + x**i * c
+``````
+
+
+
+ + +
+

## Key Points

+
+ +
• Use `for variable in sequence` to process the elements of a sequence one at a time.

+
• + +
• The body of a `for` loop must be indented.

+
• + +
• Use `len(thing)` to determine the length of something that contains other values.

+
• + +
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/03-lists/index.html b/03-lists/index.html new file mode 100644 index 0000000000000000000000000000000000000000..3798832ec8bbee792e4c17da8b707e38047eabd0 --- /dev/null +++ b/03-lists/index.html @@ -0,0 +1,782 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Storing Multiple Values in Lists + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Storing Multiple Values in Lists

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I store many values together?

+
• + +
+
+
+ +
+
+
+
+ Objectives +
+ +
• Explain what a list is.

+
• + +
• Create and index lists of simple values.

+
• + +
+
+
+ +
+ +

Just as a `for` loop is a way to do operations many times, +a list is a way to store many values. +Unlike NumPy arrays, +lists are built into the language (so we don’t have to load a library +to use them). +We create a list by putting values inside square brackets and separating the values with commas:

+ +
``````odds = [1, 3, 5, 7]
+print('odds are:', odds)
+``````
+
+ +
``````odds are: [1, 3, 5, 7]
+``````
+
+ +

We select individual elements from lists by indexing them:

+ +
``````print('first and last:', odds[0], odds[-1])
+``````
+
+ +
``````first and last: 1 7
+``````
+
+ +

and if we loop over a list, +the loop variable is assigned elements one at a time:

+ +
``````for number in odds:
+    print(number)
+``````
+
+ +
``````1
+3
+5
+7
+``````
+
+ +

There is one important difference between lists and strings: +we can change the values in a list, +but we cannot change individual characters in a string. +For example:

+ +
``````names = ['Newton', 'Darwing', 'Turing'] # typo in Darwin's name
+print('names is originally:', names)
+names[1] = 'Darwin' # correct the name
+print('final value of names:', names)
+``````
+
+ +
``````names is originally: ['Newton', 'Darwing', 'Turing']
+final value of names: ['Newton', 'Darwin', 'Turing']
+``````
+
+ +

works, but:

+ +
``````name = 'Darwin'
+name[0] = 'd'
+``````
+
+ +
``````---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-8-220df48aeb2e> in <module>()
+      1 name = 'Darwin'
+----> 2 name[0] = 'd'
+
+TypeError: 'str' object does not support item assignment
+``````
+
+ +

does not.
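If we do need a changed string, one option is to build a new string from pieces of the old one and assign it to the same name. A minimal sketch, using the slicing we saw in the first lesson:

```python
name = 'Darwin'
name = 'd' + name[1:]   # builds a new string; no character is modified in place
print(name)
```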

+ +
+

## Ch-Ch-Ch-Changes

+ +

Data which can be modified in place is called mutable, +while data which cannot be modified is called immutable. +Strings and numbers are immutable. This does not mean that variables with string or number values are constants, +but when we want to change the value of a string or number variable, we can only replace the old value +with a completely new value.

+ +

Lists and arrays, on the other hand, are mutable: we can modify them after they have been created. We can +change individual elements, append new elements, or reorder the whole list. For some operations, like +sorting, we can choose whether to use a function that modifies the data in place or a function that returns a +modified copy and leaves the original unchanged.
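Sorting illustrates that choice. In the sketch below (the list contents are made up), the built-in `sorted` function returns a sorted copy, while the list method `sort` reorders the list in place and returns `None`:

```python
first = [21, 18, 25]
second = [21, 18, 25]

ordered_copy = sorted(first)   # new list; first keeps its original order
result = second.sort()         # second is reordered in place; result is None

print(first, ordered_copy)
print(second, result)
```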

+ +

Be careful when modifying data in place. If two variables refer to the same list, and you modify the list +value, it will change for both variables!

+ +
``````salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
+mySalsa = salsa        # <-- mySalsa and salsa point to the *same* list data in memory
+salsa[0] = 'hot peppers'
+print('Ingredients in my salsa:', mySalsa)
+``````
+
+ +
``````Ingredients in my salsa: ['hot peppers', 'onions', 'cilantro', 'tomatoes']
+``````
+
+ +

If you want variables with mutable values to be independent, you +must make a copy of the value when you assign it.

+ +
``````salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
+mySalsa = list(salsa)        # <-- makes a *copy* of the list
+salsa[0] = 'hot peppers'
+print('Ingredients in my salsa:', mySalsa)
+``````
+
+ +
``````Ingredients in my salsa: ['peppers', 'onions', 'cilantro', 'tomatoes']
+``````
+
+ +

Because of pitfalls like this, code which modifies data in place can be more difficult to understand. However, +it is often far more efficient to modify a large data structure in place than to create a modified copy for +every small change. You should consider both of these aspects when writing your code.

+
+ +
+

## Nested Lists

+

Since a list can contain any Python value, it can even contain other lists.

+ +

For example, we could represent the products in the shelves of a small grocery shop:

+ +
``````x = [['pepper', 'zucchini', 'onion'],
+     ['cabbage', 'lettuce', 'garlic'],
+     ['apple', 'pear', 'banana']]
+``````
+
+ +

Here is a visual example of how indexing a list of lists `x` works:

+ +

+ + +

Using the previously declared list `x`, these would be the results of the +index operations shown in the image:

+ +
``````print([x[0]])
+``````
+
+ +
``````[['pepper', 'zucchini', 'onion']]
+``````
+
+ +
``````print(x[0])
+``````
+
+ +
``````['pepper', 'zucchini', 'onion']
+``````
+
+ +
``````print(x[0][0])
+``````
+
+ +
``````pepper
+``````
+
+ +

Thanks to Hadley Wickham +for the image above.

+
+ +

There are many ways to change the contents of lists besides assigning new values to +individual elements:

+ +
``````odds.append(11)
+print('odds after adding a value:', odds)
+``````
+
+ +
``````odds after adding a value: [1, 3, 5, 7, 11]
+``````
+
+ +
``````del odds[0]
+print('odds after removing the first element:', odds)
+``````
+
+ +
``````odds after removing the first element: [3, 5, 7, 11]
+``````
+
+ +
``````odds.reverse()
+print('odds after reversing:', odds)
+``````
+
+ +
``````odds after reversing: [11, 7, 5, 3]
+``````
+
+ +

While modifying in place, it is useful to remember that Python treats lists in a slightly counter-intuitive way.

+ +

If we make a list and (attempt to) copy it then modify in place, we can cause all sorts of trouble:

+ +
``````odds = [1, 3, 5, 7]
+primes = odds
+primes.append(2)
+print('primes:', primes)
+print('odds:', odds)
+``````
+
+ +
``````primes: [1, 3, 5, 7, 2]
+odds: [1, 3, 5, 7, 2]
+``````
+
+ +

This is because Python stores a list in memory, and then can use multiple names to refer to the same list. +If all we want to do is copy a (simple) list, we can use the `list` function, so we do not modify a list we did not mean to:

+ +
``````odds = [1, 3, 5, 7]
+primes = list(odds)
+primes.append(2)
+print('primes:', primes)
+print('odds:', odds)
+``````
+
+ +
``````primes: [1, 3, 5, 7, 2]
+odds: [1, 3, 5, 7]
+``````
+
+ +

This is different from how variables worked in lesson 1, and more similar to how a spreadsheet works.
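The word "simple" matters in the copy above: `list` copies only the outer list. With a nested list, the inner lists are still shared. The sketch below shows this, and uses `copy.deepcopy` from the standard library to get a fully independent copy (the variable names are illustrative):

```python
import copy

shelves = [['pepper', 'onion'], ['apple', 'pear']]
shallow = list(shelves)           # new outer list, but the same inner lists
deep = copy.deepcopy(shelves)     # new outer list and new inner lists

shelves[0][0] = 'hot pepper'      # modify an inner list in place
print(shallow[0][0])              # the shallow copy sees the change
print(deep[0][0])                 # the deep copy does not
```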

+ +
+

## Turn a String Into a List

+ +

Use a for-loop to convert the string “hello” into a list of letters:

+ +
``````["h", "e", "l", "l", "o"]
+``````
+
+ +

Hint: You can create an empty list like this:

+ +
``````my_list = []
+``````
+
+ +
+

## Solution

+
``````my_list = []
+for char in "hello":
+    my_list.append(char)
+print(my_list)
+``````
+
+
+
+ +

Subsets of lists and strings can be accessed by specifying ranges of values in brackets, +similar to how we accessed ranges of positions in a Numpy array. +This is commonly referred to as “slicing” the list/string.

+ +
``````binomial_name = "Drosophila melanogaster"
+group = binomial_name[0:10]
+print("group:", group)
+
+species = binomial_name[11:24]
+print("species:", species)
+
+chromosomes = ["X", "Y", "2", "3", "4"]
+autosomes = chromosomes[2:5]
+print("autosomes:", autosomes)
+
+last = chromosomes[-1]
+print("last:", last)
+``````
+
+ +
``````group: Drosophila
+species: melanogaster
+autosomes: ['2', '3', '4']
+last: 4
+``````
+
+ +
+

## Slicing From the End

+ +

Use slicing to access only the last four characters of a string or entries of a list.

+ +
``````string_for_slicing = "Observation date: 02-Feb-2013"
+list_for_slicing = [["fluorine", "F"], ["chlorine", "Cl"], ["bromine", "Br"], ["iodine", "I"], ["astatine", "At"]]
+``````
+
+ +
``````"2013"
+[["chlorine", "Cl"], ["bromine", "Br"], ["iodine", "I"], ["astatine", "At"]]
+``````
+
+ +

Would your solution work regardless of whether you knew beforehand +the length of the string or list +(e.g. if you wanted to apply the solution to a set of lists of different lengths)? +If not, try to change your approach to make it more robust.

+ +
+

## Solution

+

Use negative indices to count elements from the end of a container (such as list or string):

+ +
``````string_for_slicing[-4:]
+list_for_slicing[-4:]
+``````
+
+
+
+ +
+

## Non-Continuous Slices

+ +

So far we’ve seen how to use slicing to take single blocks +of successive entries from a sequence. +But what if we want to take a subset of entries +that aren’t next to each other in the sequence?

+ +

You can achieve this by providing a third argument +to the range within the brackets, called the step size. +The example below shows how you can take every third entry in a list:

+ +
``````primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
+subset = primes[0:12:3]
+print("subset", subset)
+``````
+
+ +
``````subset [2, 7, 17, 29]
+``````
+
+ +

Notice that the slice taken begins with the first entry in the range, +followed by entries taken at equally-spaced intervals (the steps) thereafter. +If you wanted to begin the subset with the third entry, +you would need to specify that as the starting point of the sliced range:

+ +
``````primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
+subset = primes[2:12:3]
+print("subset", subset)
+``````
+
+ +
``````subset [5, 13, 23, 37]
+``````
+
+ +

Use the step size argument to create a new string +that contains only every other character in the string +“In an octopus’s garden in the shade”

+ +
``````beatles = "In an octopus's garden in the shade"
+``````
+
+ +
``````I notpssgre ntesae
+``````
+
+ +
+

## Solution

+

To obtain every other character you need to provide a slice with the step +size of 2:

+ +
``````beatles[0:35:2]
+``````
+
+ +

You can also leave out the beginning and end of the slice to take the whole string +and provide only the step argument to go every second +element:

+ +
``````beatles[::2]
+``````
+
+
+
+ +

If you want to take a slice from the beginning of a sequence, you can omit the first index in the range:

+ +
``````date = "Monday 4 January 2016"
+day = date[0:6]
+print("Using 0 to begin range:", day)
+day = date[:6]
+print("Omitting beginning index:", day)
+``````
+
+ +
``````Using 0 to begin range: Monday
+Omitting beginning index: Monday
+``````
+
+ +

And similarly, you can omit the ending index in the range to take a slice to the very end of the sequence:

+ +
``````months = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
+sond = months[8:12]
+print("With known last position:", sond)
+sond = months[8:len(months)]
+print("Using len() to get last entry:", sond)
+sond = months[8:]
+print("Omitting ending index:", sond)
+``````
+
+ +
``````With known last position: ['sep', 'oct', 'nov', 'dec']
+Using len() to get last entry: ['sep', 'oct', 'nov', 'dec']
+Omitting ending index: ['sep', 'oct', 'nov', 'dec']
+``````
+
+ +
+

## Swapping the contents of variables

+ +

Explain what the overall effect of this code is:

+ +
``````left = 'L'
+right = 'R'
+
+temp = left
+left = right
+right = temp
+``````
+
+ +

Compare it to:

+ +
``````left, right = [right, left]
+``````
+
+ +

Do they always do the same thing? +Which do you find easier to read?

+ +
+

## Solution

+

Both examples exchange the values of `left` and `right`:

+ +
``````print(left, right)
+``````
+
+ +
``````R L
+``````
+
+ +

In the first case we used a temporary variable `temp` to keep the value of `left` before we overwrite it with the value of `right`. In the second case, `right` and `left` are packed into a list and then unpacked into `left` and `right`.
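The list brackets are optional here. A common way to write the same swap, sketched below, puts a bare comma-separated pair on the right-hand side; Python evaluates that pair first and then unpacks it into the names on the left:

```python
left = 'L'
right = 'R'
left, right = right, left   # the right-hand side is evaluated before any assignment
print(left, right)
```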

+
+
+ +
+

+ +

`+` usually means addition, but when used on strings or lists, it means “concatenate”. +Given that, what do you think the multiplication operator `*` does on lists? +In particular, what will be the output of the following code?

+ +
``````counts = [2, 4, 6, 8, 10]
+repeats = counts * 2
+print(repeats)
+``````
+
+ +
+
1. `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
2. `[4, 8, 12, 16, 20]`
3. `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
4. `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`
+ +

The technical term for this is operator overloading: +a single operator, like `+` or `*`, +can do different things depending on what it’s applied to.

+ +
+

## Solution

+ +

The multiplication operator `*` used on a list replicates elements of the list and concatenates them together:

+ +
``````[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
+``````
+
+ +

It’s equivalent to:

+ +
``````counts + counts
+``````
+
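The same two operators behave the same way on strings. A short sketch that checks the list case and then repeats it with a string (the names `tripled` and `doubled` are illustrative):

```python
counts = [2, 4, 6, 8, 10]
repeats = counts * 2
print(repeats == counts + counts)   # repetition matches concatenating the list with itself

word = 'ab'
tripled = word * 3                  # repeats the string three times
doubled = word + word               # concatenates the string with itself
print(tripled, doubled)
```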
+
+
+ + +
+

## Key Points

+
+ +
• `[value1, value2, value3, ...]` creates a list.

+
• + +
• Lists are indexed and sliced in the same way as strings and arrays.

+
• + +
• Lists are mutable (i.e., their values can be changed in place).

+
• + +
• Strings are immutable (i.e., the characters in them cannot be changed).

+
• + +
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/04-files/index.html b/04-files/index.html new file mode 100644 index 0000000000000000000000000000000000000000..cdc2f8a4d60f6a211239cf5fd439444625467d6e --- /dev/null +++ b/04-files/index.html @@ -0,0 +1,443 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Analyzing Data from Multiple Files + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Analyzing Data from Multiple Files

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 20 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I do the same operations on many different files?

+
• + +
+
+
+ +
+
+
+
+ Objectives +
+ +
• Use a library function to get a list of filenames that match a wildcard pattern.

+
• + +
• Write a `for` loop to process multiple files.

+
• + +
+
+
+ +
+ +

We now have almost everything we need to process all our data files. +The only thing that’s missing is a library with a rather unpleasant name:

+ +
``````import glob
+``````
+
+ +

The `glob` library contains a function, also called `glob`, +that finds files and directories whose names match a pattern. +We provide those patterns as strings: +the character `*` matches zero or more characters, +while `?` matches any one character. +We can use this to get the names of all the CSV files in the current directory:

+ +
``````print(glob.glob('inflammation*.csv'))
+``````
+
+ +
``````['inflammation-05.csv', 'inflammation-11.csv', 'inflammation-12.csv', 'inflammation-08.csv', 'inflammation-03.csv', 'inflammation-06.csv', 'inflammation-09.csv', 'inflammation-07.csv', 'inflammation-10.csv', 'inflammation-02.csv', 'inflammation-04.csv', 'inflammation-01.csv']
+``````
+
+ +
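To compare `*` and `?` without depending on the real data files, the sketch below creates a few empty files in a temporary directory and matches them with two patterns. The file names and the temporary directory are assumptions made for illustration only:

```python
import glob
import os
import tempfile

# Create a few empty, hypothetical files in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    for name in ['inflammation-01.csv', 'inflammation-02.csv',
                 'inflammation-10.csv', 'small-01.csv']:
        open(os.path.join(folder, name), 'w').close()

    star = glob.glob(os.path.join(folder, 'inflammation*.csv'))
    question = glob.glob(os.path.join(folder, 'inflammation-0?.csv'))

    # glob returns full paths in arbitrary order, so keep only the names and sort them.
    star_names = sorted(os.path.basename(path) for path in star)
    question_names = sorted(os.path.basename(path) for path in question)

print(star_names)       # every file starting with 'inflammation'
print(question_names)   # only files with exactly one character after '-0'
```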

As these examples show, +`glob.glob`’s result is a list of file and directory paths in arbitrary order. +This means we can loop over it +to do something with each filename in turn. +In our case, +the “something” we want to do is generate a set of plots for each file in our inflammation dataset. +If we want to start by analyzing just the first three files in alphabetical order, we can use the `sorted` built-in function to generate a new sorted list from the `glob.glob` output:

+ +
``````import numpy
+import matplotlib.pyplot
+
+filenames = sorted(glob.glob('inflammation*.csv'))
+filenames = filenames[0:3]
+for f in filenames:
+    print(f)
+
+    data = numpy.loadtxt(fname=f, delimiter=',')
+
+    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
+
+    axes1 = fig.add_subplot(1, 3, 1)
+    axes2 = fig.add_subplot(1, 3, 2)
+    axes3 = fig.add_subplot(1, 3, 3)
+
+    axes1.set_ylabel('average')
+    axes1.plot(numpy.mean(data, axis=0))
+
+    axes2.set_ylabel('max')
+    axes2.plot(numpy.max(data, axis=0))
+
+    axes3.set_ylabel('min')
+    axes3.plot(numpy.min(data, axis=0))
+
+    fig.tight_layout()
+    matplotlib.pyplot.show()
+``````
+
+ +
``````inflammation-01.csv
+``````
+
+ + + +
``````inflammation-02.csv
+``````
+
+ + + +
``````inflammation-03.csv
+``````
+
+ + + +

Sure enough, the maxima of the first two data sets show exactly the same ramp, and their minima show the same staircase structure; the third dataset reveals a different situation, where the maxima are a bit less regular but the minima are consistently zero.

+ +
+

## Plotting Differences

+ +

Plot the difference between the average of the first dataset +and the average of the second dataset, +i.e., the difference between the leftmost plot of the first two figures.

+ +
+

## Solution

+
``````import glob
+import numpy
+import matplotlib.pyplot
+
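+filenames = sorted(glob.glob('inflammation*.csv'))
+
+data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
+data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
+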
+fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
+
+matplotlib.pyplot.ylabel('Difference in average')
+matplotlib.pyplot.plot(data0.mean(axis=0) - data1.mean(axis=0))
+
+fig.tight_layout()
+matplotlib.pyplot.show()
+``````
+
+
+
+ +
+

## Generate Composite Statistics

+ +

Use each of the files once to generate a dataset containing values averaged over all patients:

+ +
``````filenames = glob.glob('inflammation*.csv')
+composite_data = numpy.zeros((60,40))
+for f in filenames:
+    # sum each new file's data into composite_data as it's read
+    #
+# and then divide the composite_data by number of samples
+composite_data /= len(filenames)
+``````
+
+ +

Then use pyplot to generate average, max, and min for all patients.

+ +
+

## Solution

+
``````import glob
+import numpy
+import matplotlib.pyplot
+
+filenames = glob.glob('data/inflammation*.csv')
+composite_data = numpy.zeros((60,40))
+
+for f in filenames:
+    data = numpy.loadtxt(fname = f, delimiter=',')
+    composite_data += data
+
+composite_data/=len(filenames)
+
+fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
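+
+axes1 = fig.add_subplot(1, 3, 1)
+axes2 = fig.add_subplot(1, 3, 2)
+axes3 = fig.add_subplot(1, 3, 3)
+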
+axes1.set_ylabel('average')
+axes1.plot(numpy.mean(composite_data, axis=0))
+
+axes2.set_ylabel('max')
+axes2.plot(numpy.max(composite_data, axis=0))
+
+axes3.set_ylabel('min')
+axes3.plot(numpy.min(composite_data, axis=0))
+
+fig.tight_layout()
+
+matplotlib.pyplot.show()
+``````
+
+
+
+ + +
+

## Key Points

+
+ +
• Use `glob.glob(pattern)` to create a list of files whose names match a pattern.

+
• + +
• Use `*` in a pattern to match zero or more characters, and `?` to match any single character.

+
• + +
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/05-cond/index.html b/05-cond/index.html new file mode 100644 index 0000000000000000000000000000000000000000..9816f73778b3dba51b2616823d17086207993065 --- /dev/null +++ b/05-cond/index.html @@ -0,0 +1,687 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Making Choices + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Making Choices

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can my programs do different things based on data values?

+
• + +
+
+
+ +
+
+
+
+ Objectives +
+ +
• Write conditional statements including `if`, `elif`, and `else` branches.

+
• + +
• Correctly evaluate expressions containing `and` and `or`.

+
• + +
+
+
+ +
+ +

In our last lesson, we discovered something suspicious was going on +in our inflammation data by drawing some plots. +How can we use Python to automatically recognize the different features we saw, +and take a different action for each? In this lesson, we’ll learn how to write code that +runs only when certain conditions are true.

+ +

## Conditionals

+ +

We can ask Python to take different actions, depending on a condition, with an `if` statement:

+ +
``````num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+``````
+
+ +
``````not greater
+done
+``````
+
+ +

The second line of this code uses the keyword `if` to tell Python that we want to make a choice. If the test that follows `if` is true, the body of the `if` (i.e., the lines indented underneath it) is executed. If the test is false, the body of the `else` is executed instead. Only one or the other is ever executed.

+ + + +

Conditional statements don’t have to include an `else`. +If there isn’t one, +Python simply does nothing if the test is false:

+ +
``````num = 53
+print('before conditional...')
+if num > 100:
+    print('53 is greater than 100')
+print('...after conditional')
+``````
+
+ +
``````before conditional...
+...after conditional
+``````
+
+ +

We can also chain several tests together using `elif`, +which is short for “else if”. +The following Python code uses `elif` to print the sign of a number.

+ +
``````num = -3
+
+if num > 0:
+    print(num, "is positive")
+elif num == 0:
+    print(num, "is zero")
+else:
+    print(num, "is negative")
+``````
+
+ +
``````-3 is negative
+``````
+
+ +

One important thing to notice in the code above is that we use a double equals sign `==` to test for equality +rather than a single equals sign +because the latter is used to mean assignment.

+ +

We can also combine tests using `and` and `or`. +`and` is only true if both parts are true:

+ +
``````if (1 > 0) and (-1 > 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+``````
+
+ +
``````at least one part is false
+``````
+
+ +

while `or` is true if at least one part is true:

+ +
``````if (1 < 0) or (-1 < 0):
+    print('at least one test is true')
+``````
+
+ +
``````at least one test is true
+``````
+
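`and` and `or` can combine tests on variables as well as on literal numbers. A small sketch, assuming a hypothetical `temperature` reading and made-up thresholds:

```python
temperature = 38.5   # hypothetical reading in degrees Celsius

# Both comparisons must hold for 'and'; either one is enough for 'or'.
in_normal_range = (temperature >= 36.0) and (temperature <= 37.5)
needs_attention = (temperature < 35.0) or (temperature > 38.0)

if in_normal_range:
    status = 'normal'
elif needs_attention:
    status = 'needs attention'
else:
    status = 'borderline'
print(status)
```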
+ +

## Checking our Data

+ +

Now that we’ve seen how conditionals work, +we can use them to check for the suspicious features we saw in our inflammation data. +In the first couple of plots, the maximum inflammation per day +seemed to rise like a straight line, one unit per day. +We can check for this inside the `for` loop we wrote with the following conditional:

+ +
``````if numpy.max(data, axis=0)[0] == 0 and numpy.max(data, axis=0)[20] == 20:
+    print('Suspicious looking maxima!')
+``````
+
+ +

We also saw a different problem in the third dataset; +the minima per day were all zero (looks like a healthy person snuck into our study). +We can also check for this with an `elif` condition:

+ +
``````elif numpy.sum(numpy.min(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+``````
+
+ +

And if neither of these conditions is true, we can use `else` to give the all-clear:

+ +
``````else:
+    print('Seems OK!')
+``````
+
+ +

Let’s test that out:

+ +
``````data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+if numpy.max(data, axis=0)[0] == 0 and numpy.max(data, axis=0)[20] == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.min(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+``````
+
+ +
``````Suspicious looking maxima!
+``````
+
+ +
``````data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
+if numpy.max(data, axis=0)[0] == 0 and numpy.max(data, axis=0)[20] == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.min(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+``````
+
+ +
``````Minima add up to zero!
+``````
+
+ +

In this way, +we have asked Python to do something different depending on the condition of our data. +Here we printed messages in all cases, +but we could also imagine not using the `else` catch-all +so that messages are only printed when something is wrong, +freeing us from having to manually examine every plot for features we’ve seen before.
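
A minimal sketch of the "no `else`" idea, using invented summary numbers in place of values computed from the real files (the `summaries` and `problems` lists, and the file names in them, are assumptions for illustration only):

```python
# Each entry: (file name, max on day 0, max on day 20, sum of daily minima).
# These numbers are invented for the sketch, not read from the data files.
summaries = [
    ('file-a', 0.0, 20.0, 5.0),
    ('file-b', 0.0, 17.0, 0.0),
    ('file-c', 1.0, 12.0, 9.0),
]

problems = []
for name, max_day0, max_day20, min_total in summaries:
    if max_day0 == 0 and max_day20 == 20:
        problems.append(name + ': suspicious looking maxima')
    elif min_total == 0:
        problems.append(name + ': minima add up to zero')
    # no else branch: files that pass both checks produce no message

for message in problems:
    print(message)
```

Only the first two entries produce a message; the third passes both tests and stays silent.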

+ +
+

## How Many Paths?

+ +

Consider this code:

+ +
``````if 4 > 5:
+    print('A')
+elif 4 == 5:
+    print('B')
+elif 4 < 5:
+    print('C')
+``````
+
+ +

Which of the following would be printed if you were to run this code? +Why did you pick this answer?

+ +
+
1. A
2. +
3. B
4. +
5. C
6. +
7. B and C
8. +
+ +
+

## Solution

+

C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true, +but `4 < 5` is true.

+
+
+ +
+

## What Is Truth?

+ +

`True` and `False` are special words in Python called `booleans` +which represent true and false statements. +However, they aren’t the only values in Python that are true and false. +In fact, any value can be used in an `if` or `elif`. +After reading and running the code below, +explain what the rule is for which values are considered true and which are considered false.

+ +
``````if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+``````
+
+
+ +
+

## That’s Not Not What I Meant

+ +

Sometimes it is useful to check whether some condition is not true. +The Boolean operator `not` can do this explicitly. +After reading and running the code below, +write some `if` statements that use `not` to test the rule +that you formulated in the previous challenge.

+ +
``````if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+``````
+
+
+ +
+

## Close Enough

+ +

Write some conditions that print `True` if the variable `a` is within 10% of the variable `b` +and `False` otherwise. +Compare your implementation with your partner’s: +do you get the same answer for all possible pairs of numbers?

+ +
+

## Solution 1

+
``````a = 5
+b = 5.1
+
+if abs(a - b) < 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+``````
+
+
+ +
+

## Solution 2

+
``````print(abs(a - b) < 0.1 * abs(b))
+``````
+
+ +

This works because the Booleans `True` and `False` +have string representations which can be printed.
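
For comparison, the standard library's `math.isclose` also performs a relative-tolerance check. It is not identical to the solutions above: it measures the difference relative to the larger of the two magnitudes and uses `<=` rather than `<`, so the two approaches can disagree near the boundary. A small sketch:

```python
import math

a = 5
b = 5.1

# rel_tol=0.1 asks whether a and b differ by at most 10%
# of the larger of abs(a) and abs(b).
close = math.isclose(a, b, rel_tol=0.1)
far = math.isclose(5, 6, rel_tol=0.1)

print(close)
print(far)
```

Differences like this are exactly what the "compare with your partner" question is meant to surface.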

+
+
+ +
+

## In-Place Operators

+ +

Python, like most languages in the C family, provides in-place operators that work like this:

+ +
``````x = 1  # original value
+x += 1 # add one to x, assigning result back to x
+x *= 3 # multiply x by 3
+print(x)
+``````
+
+ +
``````6
+``````
+
+ +

Write some code that sums the positive and negative numbers in a list separately, +using in-place operators. +Do you think the result is more or less readable than writing the same without in-place operators?

+ +
+

## Solution

+
``````positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+``````
+
+ +

Here `pass` means “don’t do anything”. +In this particular case, it’s not actually needed, since if `num == 0` neither +sum needs to change, but it illustrates the use of `elif` and `pass`.
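
For comparison, here is a sketch of the same solution without the `pass` branch. Zeros simply match neither test:

```python
positive_sum = 0
negative_sum = 0
test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
for num in test_list:
    if num > 0:
        positive_sum += num
    elif num < 0:
        negative_sum += num
    # zeros match neither test, so both sums are left alone

print(positive_sum, negative_sum)
```

Both versions give the same sums; which one reads better is a matter of taste.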

+
+
+ +
+

## Sorting a List Into Buckets

+ +

The folder containing our data files has large data sets whose names start with “inflammation-”, small ones whose names start with “small-”, and possibly other files whose sizes we don’t know. Our goal is to sort those files into three lists called `large_files`, `small_files`, and `other_files` respectively. Add code to the template below to do this. Note that the string method `startswith` returns `True` if and only if the string it is called on starts with the string passed as an argument.

+ +
``````files = ['inflammation-01.csv', 'myscript.py', 'inflammation-02.csv', 'small-01.csv', 'small-02.csv']
+large_files = []
+small_files = []
+other_files = []
+``````
+
+ +

+ +
+
1. loop over the names of the files
2. +
3. figure out which group each filename belongs to
4. +
5. append the filename to that list
6. +
+ +

In the end the three lists should be:

+ +
``````large_files = ['inflammation-01.csv', 'inflammation-02.csv']
+small_files = ['small-01.csv', 'small-02.csv']
+other_files = ['myscript.py']
+``````
+
+ +
+

## Solution

+
``````for file in files:
+    if file.startswith('inflammation-'):
+        large_files.append(file)
+    elif file.startswith('small-'):
+        small_files.append(file)
+    else:
+        other_files.append(file)
+
+print(large_files)
+print(small_files)
+print(other_files)
+``````
+
+
+
+ +
+

## Counting Vowels

+ +
+
1. Write a loop that counts the number of vowels in a character string.
2. +
3. Test it on a few individual words and full sentences.
4. +
5. Once you are done, compare your solution to your neighbor’s. +Did you make the same decisions about how to handle the letter ‘y’ +(which some people think is a vowel, and some do not)?
6. +
+ +
+

## Solution

+
``````vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print("The number of vowels in this string is " + str(count))
+``````
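
One possible variation, offered as a sketch rather than the intended answer: wrapping the loop in a function makes it easy to test several strings, and lower-casing each character shortens the vowel list. Functions are covered in the next episode, and the name `count_vowels` and its `vowels` parameter are made up here.

```python
def count_vowels(text, vowels='aeiou'):
    """Count the characters in text that appear in vowels, ignoring case."""
    count = 0
    for char in text:
        if char.lower() in vowels:
            count += 1
    return count

print(count_vowels('Mary had a little lamb.'))
print(count_vowels('rhythm'))                    # no a, e, i, o, or u
print(count_vowels('rhythm', vowels='aeiouy'))   # treating 'y' as a vowel
```

Passing a different `vowels` string is one way to make the decision about 'y' explicit instead of hiding it inside the loop.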
+
+
+
+ + +
+

## Key Points

+
+ +
• Use `if condition` to start a conditional statement, `elif condition` to provide additional tests, and `else` to provide a default.

+
• + +
• The bodies of the branches of conditional statements must be indented.

+
• + +
• Use `==` to test for equality.

+
• + +
• `X and Y` is only true if both `X` and `Y` are true.

+
• + +
• `X or Y` is true if either `X` or `Y`, or both, are true.

+
• + +
• Zero, the empty string, and the empty list are considered false; all other numbers, strings, and lists are considered true.

+
• + +
• Nest loops to operate on multi-dimensional data.

+
• + +
• Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.

+
• + +
+
+ +
+ +
+
+

+
+
+ +
+
+

### + + next episode + +

+
+
+ + + + + + + +
+ + + + + + + + diff --git a/06-func/index.html b/06-func/index.html new file mode 100644 index 0000000000000000000000000000000000000000..16b1dec4201a5bff42314ff32bf514d7f04846b3 --- /dev/null +++ b/06-func/index.html @@ -0,0 +1,1256 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Creating Functions + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Creating Functions

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I define new functions?

+
• + +
• What’s the difference between defining and calling a function?

+
• + +
• What happens when I call a function?

+
• + +
+
+
+ +
+
+
+
+ Objectives +
+ +
• Define a function that takes parameters.

+
• + +
• Return a value from a function.

+
• + +
• Test and debug a function.

+
• + +
• Set default values for function parameters.

+
• + +
• Explain why we should divide programs into small, single-purpose functions.

+
• + +
+
+
+ +
+ +

At this point, +we’ve written code to draw some interesting features in our inflammation data, +loop over all our data files to quickly draw these plots for each of them, +and have Python make decisions based on what it sees in our data. +But, our code is getting pretty long and complicated; +what if we had thousands of datasets, +and didn’t want to generate a figure for every single one? +Commenting out the figure-drawing code is a nuisance. +Also, what if we want to use that code again, +on a different dataset or at a different point in our program? +Cutting and pasting it is going to make our code get very long and very repetitive, +very quickly. +We’d like a way to package our code so that it is easier to reuse, +and Python provides for this by letting us define things called ‘functions’ — +a shorthand way of re-executing longer pieces of code.

+ +

Let’s start by defining a function `fahr_to_kelvin` that converts temperatures from Fahrenheit to Kelvin:

+ +
``````def fahr_to_kelvin(temp):
+    return ((temp - 32) * (5/9)) + 273.15
+``````
+
+ + + + + +

The function definition opens with the keyword `def` followed by the +name of the function and a parenthesized list of parameter names. The +body of the function — the +statements that are executed when it runs — is indented below the +definition line.

+ +

When we call the function, +the values we pass to it are assigned to those variables +so that we can use them inside the function. +Inside the function, +we use a return statement to send a result back to whoever asked for it.

+ +

Let’s try running our function.

+ +
``````fahr_to_kelvin(32)
+``````
+
+ +

This command calls our function with `32` as the input and returns the converted value.

+ +

In fact, calling our own function is no different from calling any other function:

+
``````print('freezing point of water:', fahr_to_kelvin(32))
+print('boiling point of water:', fahr_to_kelvin(212))
+``````
+
+ +
``````freezing point of water: 273.15
+boiling point of water: 373.15
+``````
+
+ +

We’ve successfully called the function that we defined, +and we have access to the value that we returned.

+ +
+

## Integer Division

+ +

We are using Python 3, where division always returns a floating point number:

+ +
``````$ python3 -c "print(5/9)"
+``````
+
+ +
``````0.5555555555555556
+``````
+
+ +

Unfortunately, this wasn’t the case in Python 2:

+ +
``````5/9
+``````
+
+ +
``````0
+``````
+
+ +

If you are using Python 2 and want to keep the fractional part of division +you need to convert one or the other number to floating point:

+ +
``````float(5)/9
+``````
+
+ +
``````0.555555555556
+``````
+
+ +
``````5/float(9)
+``````
+
+ +
``````0.555555555556
+``````
+
+ +
``````5.0/9
+``````
+
+ +
``````0.555555555556
+``````
+
+
``````5/9.0
+``````
+
+ +
``````0.555555555556
+``````
+
+ +

And if you want an integer result from division in Python 3, +use a double-slash:

+ +
``````4//2
+``````
+
+ +
``````2
+``````
+
+ +
``````3//2
+``````
+
+ +
``````1
+``````
+
+
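
One detail worth knowing, shown here as a short sketch: `//` rounds down toward negative infinity rather than toward zero, and the built-in `divmod` returns the quotient and remainder together.

```python
print(7 // 2)        # 3
print(-7 // 2)       # -4, because floor division rounds down, not toward zero
print(7 % 2)         # 1, the remainder

quotient, remainder = divmod(7, 2)
print(quotient, remainder)

# A float operand gives a float result, still rounded down.
float_result = 7.0 // 2
print(float_result)
```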
+ +

## Composing Functions

+ +

Now that we’ve seen how to turn Fahrenheit into Kelvin, +it’s easy to turn Kelvin into Celsius:

+ +
``````def kelvin_to_celsius(temp_k):
+    return temp_k - 273.15
+
+print('absolute zero in Celsius:', kelvin_to_celsius(0.0))
+``````
+
+ +
``````absolute zero in Celsius: -273.15
+``````
+
+ +

What about converting Fahrenheit to Celsius? +We could write out the formula, +but we don’t need to. +Instead, +we can compose the two functions we have already created:

+ +
``````def fahr_to_celsius(temp_f):
+    temp_k = fahr_to_kelvin(temp_f)
+    result = kelvin_to_celsius(temp_k)
+    return result
+
+print('freezing point of water in Celsius:', fahr_to_celsius(32.0))
+``````
+
+ +
``````freezing point of water in Celsius: 0.0
+``````
+
+ +

This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here (typically half a dozen to a few dozen lines), but they shouldn’t ever be much longer than that, or the next person who reads them won’t be able to understand what’s going on.
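
Composition can also be written as a single nested call. The sketch below repeats the two conversion functions so that it runs on its own, and uses `math.isclose` for the boiling-point check because floating-point round-off may leave the result a hair away from 100:

```python
import math

def fahr_to_kelvin(temp):
    return ((temp - 32) * (5/9)) + 273.15

def kelvin_to_celsius(temp_k):
    return temp_k - 273.15

def fahr_to_celsius(temp_f):
    # the inner call runs first; its result becomes the argument of the outer call
    return kelvin_to_celsius(fahr_to_kelvin(temp_f))

print('freezing point of water in Celsius:', fahr_to_celsius(32.0))
boiling = fahr_to_celsius(212.0)
print('boiling point is close to 100:', math.isclose(boiling, 100.0))
```

The step-by-step version with intermediate variables is often easier to debug, since each intermediate value can be printed; the nested form is more compact.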

+ +

## Tidying up

+ +

Now that we know how to wrap bits of code up in functions, +we can make our inflammation analysis easier to read and easier to reuse. +First, let’s make an `analyze` function that generates our plots:

+ +
``````def analyze(filename):
+    data = numpy.loadtxt(fname=filename, delimiter=',')
+
+    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
+
+    axes1 = fig.add_subplot(1, 3, 1)
+    axes2 = fig.add_subplot(1, 3, 2)
+    axes3 = fig.add_subplot(1, 3, 3)
+
+    axes1.set_ylabel('average')
+    axes1.plot(numpy.mean(data, axis=0))
+
+    axes2.set_ylabel('max')
+    axes2.plot(numpy.max(data, axis=0))
+
+    axes3.set_ylabel('min')
+    axes3.plot(numpy.min(data, axis=0))
+
+    fig.tight_layout()
+    matplotlib.pyplot.show()
+``````
+
+ +

and another function called `detect_problems` that checks for those systematics +we noticed:

+ +
``````def detect_problems(filename):
+    data = numpy.loadtxt(fname=filename, delimiter=',')
+
+    if numpy.max(data, axis=0)[0] == 0 and numpy.max(data, axis=0)[20] == 20:
+        print('Suspicious looking maxima!')
+    elif numpy.sum(numpy.min(data, axis=0)) == 0:
+        print('Minima add up to zero!')
+    else:
+        print('Seems OK!')
+``````
+
+ +

Notice that rather than jumbling this code together in one giant `for` loop, +we can now read and reuse both ideas separately. +We can reproduce the previous analysis with a much simpler `for` loop:

+ +
``````for f in filenames[:3]:
+    print(f)
+    analyze(f)
+    detect_problems(f)
+``````
+
+ +

By giving our functions human-readable names, +we can more easily read and understand what is happening in the `for` loop. +Even better, if at some later date we want to use either of those pieces of code again, +we can do so in a single line.

+ +

## Testing and Documenting

+ +

Once we start putting things in functions so that we can re-use them, +we need to start testing that those functions are working correctly. +To see how to do this, +let’s write a function to center a dataset around a particular value:

+ +
``````def center(data, desired):
+    return (data - numpy.mean(data)) + desired
+``````
+
+ +

We could test this on our actual data, +but since we don’t know what the values ought to be, +it will be hard to tell if the result was correct. +Instead, +let’s use NumPy to create a matrix of 0’s +and then center that around 3:

+ +
``````z = numpy.zeros((2,2))
+print(center(z, 3))
+``````
+
+ +
``````[[ 3.  3.]
+ [ 3.  3.]]
+``````
+
+ +

That looks right, +so let’s try `center` on our real data:

+ +
``````data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(center(data, 0))
+``````
+
+ +
``````[[-6.14875 -6.14875 -5.14875 ..., -3.14875 -6.14875 -6.14875]
+ [-6.14875 -5.14875 -4.14875 ..., -5.14875 -6.14875 -5.14875]
+ [-6.14875 -5.14875 -5.14875 ..., -4.14875 -5.14875 -5.14875]
+ ...,
+ [-6.14875 -5.14875 -5.14875 ..., -5.14875 -5.14875 -5.14875]
+ [-6.14875 -6.14875 -6.14875 ..., -6.14875 -4.14875 -6.14875]
+ [-6.14875 -6.14875 -5.14875 ..., -5.14875 -5.14875 -6.14875]]
+``````
+
+ +

It’s hard to tell from the default output whether the result is correct, +but there are a few simple tests that will reassure us:

+ +
``````print('original min, mean, and max are:', numpy.min(data), numpy.mean(data), numpy.max(data))
+centered = center(data, 0)
+print('min, mean, and max of centered data are:', numpy.min(centered), numpy.mean(centered), numpy.max(centered))
+``````
+
+ +
``````original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of centered data are: -6.14875 2.84217094304e-16 13.85125
+``````
+
+ +

That seems almost right: +the original mean was about 6.1, +so the lower bound from zero is now about -6.1. +The mean of the centered data isn’t quite zero — we’ll explore why not in the challenges — but it’s pretty close. +We can even go further and check that the standard deviation hasn’t changed:

+ +
``````print('std dev before and after:', numpy.std(data), numpy.std(centered))
+``````
+
+ +
``````std dev before and after: 4.61383319712 4.61383319712
+``````
+
+ +

Those values look the same, +but we probably wouldn’t notice if they were different in the sixth decimal place. +Let’s do this instead:

+ +
``````print('difference in standard deviations before and after:', numpy.std(data) - numpy.std(centered))
+``````
+
+ +
``````difference in standard deviations before and after: -3.5527136788e-15
+``````
+
+ +
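
Checks like these can also be written as assertions, so that Python complains loudly if a result drifts outside a tolerance. The sketch below is a stand-in built on assumptions: it uses plain lists and the standard library's `statistics` module instead of NumPy arrays, and `center_list` is a hypothetical list-based version of `center`.

```python
import math
import statistics

def center_list(values, desired):
    """List-based stand-in for center: shift values so their mean is desired."""
    mean = statistics.mean(values)
    return [(v - mean) + desired for v in values]

values = [0.0, 1.0, 3.0, 6.0, 20.0]
centered = center_list(values, 0.0)

# The mean should be zero up to floating-point round-off.
assert math.isclose(statistics.mean(centered), 0.0, abs_tol=1e-9)
# Shifting every value by the same amount must not change the spread.
assert math.isclose(statistics.pstdev(values), statistics.pstdev(centered))

print('all checks passed')
```

An absolute tolerance is needed for the comparison with zero, because a purely relative tolerance can never be satisfied when the expected value is exactly 0.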

Again, the difference is very small. It’s still possible that our function is wrong, but it seems unlikely enough that we should probably get back to doing our analysis. We have one more task first, though: we should write some documentation for our function to remind ourselves later what it’s for and how to use it. The usual way to put documentation in software is to add comments like this:

+ +

+ +
``````# center(data, desired): return a new array containing the original data centered around the desired value.
+def center(data, desired):
+    return (data - numpy.mean(data)) + desired
+``````
+
+ +

There’s a better way, though. +If the first thing in a function is a string that isn’t assigned to a variable, +that string is attached to the function as its documentation:

+ +
``````def center(data, desired):
+    '''Return a new array containing the original data centered around the desired value.'''
+    return (data - numpy.mean(data)) + desired
+``````
+
+ +

This is better because we can now ask Python’s built-in help system to show us the documentation for the function:

+ +
``````help(center)
+``````
+
+ +
``````Help on function center in module __main__:
+
+center(data, desired)
+    Return a new array containing the original data centered around the desired value.
+``````
+
+ +

A string like this is called a docstring. +We don’t need to use triple quotes when we write one, +but if we do, +we can break the string across multiple lines:

+ +
``````def center(data, desired):
+    '''Return a new array containing the original data centered around the desired value.
+    Example: center([1, 2, 3], 0) => [-1, 0, 1]'''
+    return (data - numpy.mean(data)) + desired
+
+help(center)
+``````
+
+ +
``````Help on function center in module __main__:
+
+center(data, desired)
+    Return a new array containing the original data centered around the desired value.
+    Example: center([1, 2, 3], 0) => [-1, 0, 1]
+``````
+
+ +
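
As a side note, the docstring is stored on the function object itself in an attribute named `__doc__`, which is what `help` reads. A small sketch, using a NumPy-free function (`shift`, a made-up example) so that it runs on its own:

```python
def shift(values, offset):
    '''Return a new list with offset added to every value.
    Example: shift([1, 2, 3], 10) => [11, 12, 13]'''
    result = []
    for v in values:
        result.append(v + offset)
    return result

print(shift.__doc__)     # the raw docstring text
print(shift([1, 2, 3], 10))
```

Because the example in the docstring is just text, nothing guarantees it stays in sync with the code; running it by hand, as in the last line, is a cheap sanity check.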

## Defining Defaults

+ +

We have passed parameters to functions in two ways: +directly, as in `type(data)`, +and by name, as in `numpy.loadtxt(fname='something.csv', delimiter=',')`. +In fact, +we can pass the filename to `loadtxt` without the `fname=`:

+ +
``````numpy.loadtxt('inflammation-01.csv', delimiter=',')
+``````
+
+ +
``````array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+``````
+
+ +

but we still need to say `delimiter=`:

+ +
``````numpy.loadtxt('inflammation-01.csv', ',')
+``````
+
+ +
``````---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-26-e3bc6cf4fd6a> in <module>()
+
+    775     try:
+    776         # Make sure we're dealing with a proper dtype
+--> 777         dtype = np.dtype(dtype)
+    778         defconv = _getconv(dtype)
+    779
+
+TypeError: data type "," not understood
+``````
+
+ +

To understand what’s going on, +and make our own functions easier to use, +let’s re-define our `center` function like this:

+ +
``````def center(data, desired=0.0):
+    '''Return a new array containing the original data centered around the desired value (0 by default).
+    Example: center([1, 2, 3], 0) => [-1, 0, 1]'''
+    return (data - numpy.mean(data)) + desired
+``````
+
+ +

The key change is that the second parameter is now written `desired=0.0` instead of just `desired`. +If we call the function with two arguments, +it works as it did before:

+ +
``````test_data = numpy.zeros((2, 2))
+print(center(test_data, 3))
+``````
+
+ +
``````[[ 3.  3.]
+ [ 3.  3.]]
+``````
+
+ +

But we can also now call it with just one parameter, +in which case `desired` is automatically assigned the default value of 0.0:

+ +
``````more_data = 5 + numpy.zeros((2, 2))
+print('data before centering:')
+print(more_data)
+print('centered data:')
+print(center(more_data))
+``````
+
+ +
``````data before centering:
+[[ 5.  5.]
+ [ 5.  5.]]
+centered data:
+[[ 0.  0.]
+ [ 0.  0.]]
+``````
+
+ +

This is handy: +if we usually want a function to work one way, +but occasionally need it to do something else, +we can allow people to pass a parameter when they need to +but provide a default to make the normal case easier. +The example below shows how Python matches values to parameters:

+ +
``````def display(a=1, b=2, c=3):
+    print('a:', a, 'b:', b, 'c:', c)
+
+print('no parameters:')
+display()
+print('one parameter:')
+display(55)
+print('two parameters:')
+display(55, 66)
+``````
+
+ +
``````no parameters:
+a: 1 b: 2 c: 3
+one parameter:
+a: 55 b: 2 c: 3
+two parameters:
+a: 55 b: 66 c: 3
+``````
+
+ +

As this example shows, +parameters are matched up from left to right, +and any that haven’t been given a value explicitly get their default value. +We can override this behavior by naming the value as we pass it in:

+ +
``````print('only setting the value of c')
+display(c=77)
+``````
+
+ +
``````only setting the value of c
+a: 1 b: 2 c: 77
+``````
+
+ +
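
Positional and named values can be mixed in one call. The sketch below uses a variant of `display` that returns its text instead of printing it (a change made here only so the result is easy to check), and shows what happens when the same parameter is given twice:

```python
def display(a=1, b=2, c=3):
    return 'a: ' + str(a) + ' b: ' + str(b) + ' c: ' + str(c)

# positional values fill parameters from the left; named values go where they say
print(display(55, c=77))
# named arguments may be given in any order
print(display(c=7, a=5))

# giving the same parameter twice is an error
try:
    display(55, a=1)
except TypeError as error:
    print('TypeError:', error)
```

In a call, positional values must come before named ones; writing `display(c=77, 55)` is rejected as a syntax error.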

With that in hand, +let’s look at the help for `numpy.loadtxt`:

+ +
``````help(numpy.loadtxt)
+``````
+
+ +
``````Help on function loadtxt in module numpy.lib.npyio:
+
+    Load data from a text file.
+
+    Each row in the text file must have the same number of values.
+
+    Parameters
+    ----------
+    fname : file or str
+        File, filename, or generator to read.  If the filename extension is
+        ``.gz`` or ``.bz2``, the file is first decompressed. Note that
+        generators should return byte strings for Python 3k.
+    dtype : data-type, optional
+        Data-type of the resulting array; default: float.  If this is a
+        record data-type, the resulting array will be 1-dimensional, and
+        each row will be interpreted as an element of the array.  In this
+        case, the number of columns used must match the number of fields in
+        the data-type.
+    comments : str, optional
+        The character used to indicate the start of a comment;
+        default: '#'.
+    delimiter : str, optional
+        The string used to separate values.  By default, this is any
+        whitespace.
+    converters : dict, optional
+        A dictionary mapping column number to a function that will convert
+        that column to a float.  E.g., if column 0 is a date string:
+        ``converters = {0: datestr2num}``.  Converters can also be used to
+        provide a default value for missing data (but see also `genfromtxt`):
+        ``converters = {3: lambda s: float(s.strip() or 0)}``.  Default: None.
+    skiprows : int, optional
+        Skip the first `skiprows` lines; default: 0.
+    usecols : sequence, optional
+        Which columns to read, with 0 being the first.  For example,
+        ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
+        The default, None, results in all columns being read.
+    unpack : bool, optional
+        If True, the returned array is transposed, so that arguments may be
+        unpacked using ``x, y, z = loadtxt(...)``.  When used with a record
+        data-type, arrays are returned for each field.  Default is False.
+    ndmin : int, optional
+        The returned array will have at least `ndmin` dimensions.
+        Otherwise mono-dimensional axes will be squeezed.
+        Legal values: 0 (default), 1 or 2.
+
+    Returns
+    -------
+    out : ndarray
+        Data read from the text file.
+
+    See Also
+    --------
+    genfromtxt : Load data with missing values handled as specified.
+
+    Notes
+    -----
+    This function aims to be a fast reader for simply formatted files.  The
+    `genfromtxt` function provides more sophisticated handling of, e.g.,
+    lines with missing values.
+
+    Examples
+    --------
+    >>> from StringIO import StringIO   # StringIO behaves like a file object
+    >>> c = StringIO("0 1\n2 3")
+    >>> np.loadtxt(c)
+    array([[ 0.,  1.],
+           [ 2.,  3.]])
+
+    >>> d = StringIO("M 21 72\nF 35 58")
+    >>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
+    ...                      'formats': ('S1', 'i4', 'f4')})
+    array([('M', 21, 72.0), ('F', 35, 58.0)],
+          dtype=[('gender', '|S1'), ('age', '<i4'), ('weight', '<f4')])
+
+    >>> c = StringIO("1,0,2\n3,0,4")
+    >>> x, y = np.loadtxt(c, delimiter=',', usecols=(0, 2), unpack=True)
+    >>> x
+    array([ 1.,  3.])
+    >>> y
+    array([ 2.,  4.])
+``````
+
+ +

There’s a lot of information here, +but the most important part is the first couple of lines:

+ +
``````loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None,
+        unpack=False, ndmin=0)
+``````
+
+ +

This tells us that `loadtxt` has one parameter called `fname` that doesn’t have a default value, +and eight others that do. +If we call the function like this:

+ +
``````numpy.loadtxt('inflammation-01.csv', ',')
+``````
+
+ +

then the filename is assigned to `fname` (which is what we want), +but the delimiter string `','` is assigned to `dtype` rather than `delimiter`, +because `dtype` is the second parameter in the list. However `','` isn’t a known `dtype` so +our code produced an error message when we tried to run it. +When we call `loadtxt` we don’t have to provide `fname=` for the filename because it’s the +first item in the list, but if we want the `','` to be assigned to the variable `delimiter`, +we do have to provide `delimiter=` for the second parameter since `delimiter` is not +the second parameter in the list.
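
The same matching rule can be observed directly with the standard library's `inspect` module. The sketch below uses `load_text`, a made-up stand-in whose first four parameters mimic the order in the `loadtxt` signature shown above; it does not read any file.

```python
import inspect

def load_text(fname, dtype=float, comments='#', delimiter=None):
    """Hypothetical stand-in for loadtxt; only the parameter order matters here."""
    return fname, dtype, comments, delimiter

signature = inspect.signature(load_text)

# Two positional values: the second lands in dtype, not delimiter.
positional = signature.bind('inflammation-01.csv', ',')
print(dict(positional.arguments))

# Naming the parameter sends the comma where we want it.
named = signature.bind('inflammation-01.csv', delimiter=',')
print(dict(named.arguments))
```

`bind` only reports which parameter each supplied value would be assigned to; it neither calls the function nor fills in defaults.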

+ +

+ +

Consider these two functions:

+ +
``````def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+``````
+
+ +

The functions `s` and `std_dev` are computationally equivalent (they +both calculate the sample standard deviation), but to a human reader, +they look very different. You probably found `std_dev` much easier to +read and understand than `s`.

+ +

As this example illustrates, both documentation and a programmer’s +coding style combine to determine how easy it is for others to read +and understand the programmer’s code. Choosing meaningful variable +names and using blank spaces to break the code into logical “chunks” +are helpful techniques for producing readable code. This is useful +not only for sharing code with others, but also for the original +programmer. If you need to revisit code that you wrote months ago and +haven’t thought about since then, you will appreciate the value of +readable code!

+ +
+

## Combining Strings

+ +

“Adding” two strings produces their concatenation: +`'a' + 'b'` is `'ab'`. +Write a function called `fence` that takes two parameters called `original` and `wrapper` +and returns a new string that has the wrapper character at the beginning and end of the original. +A call to your function should look like this:

+ +
``````print(fence('name', '*'))
+``````
+
+ +
``````*name*
+``````
+
+ +
+

## Solution

+
``````def fence(original, wrapper):
+    return wrapper + original + wrapper
+``````
+
+
+
+ +
+

## Selecting Characters From Strings

+ +

If the variable `s` refers to a string, then `s[0]` is the string’s first character and `s[-1]` is its last. Write a function called `outer` that returns a string made up of just the first and last characters of its input. A call to your function should look like this:

+ +
``````print(outer('helium'))
+``````
+
+ +
``````hm
+``````
+
+ +
+

## Solution

+
``````def outer(input_string):
+    return input_string[0] + input_string[-1]
+``````
+
+
+
+ +
+

## Rescaling an Array

+ +

Write a function `rescale` that takes an array as input +and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0. +(Hint: If `L` and `H` are the lowest and highest values in the original array, +then the replacement for a value `v` should be `(v-L) / (H-L)`.)

+ +
+

## Solution

+
``````def rescale(input_array):
+    L = numpy.min(input_array)
+    H = numpy.max(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+``````
+
+
+
+ +
+

## Testing and Documenting Your Function

+ +

Run the commands `help(numpy.arange)` and `help(numpy.linspace)` +to see how to use these functions to generate regularly-spaced values, +then use those values to test your `rescale` function. +Once you’ve successfully tested your function, +add a docstring that explains what it does.

+ +
+

## Solution

+
``````'''Takes an array as input, and returns a corresponding array scaled so
+that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+Examples:
+>>> rescale(numpy.arange(10.0))
+array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
+       0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
+>>> rescale(numpy.linspace(0, 100, 5))
+array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
+'''
+``````
+
+
+
+ +
+

## Defining Defaults

+ +

Rewrite the `rescale` function so that it scales data to lie between `0.0` and `1.0` by default, +but will allow the caller to specify lower and upper bounds if they want. +Compare your implementation to your neighbor’s: +do the two functions always behave the same way?

+ +
+

## Solution

+
``````def rescale(input_array, low_val=0.0, high_val=1.0):
+    '''rescales input array values to lie between low_val and high_val'''
+    L = numpy.min(input_array)
+    H = numpy.max(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+``````
+
+
+
+ +
+

## Variables Inside and Outside Functions

+ +

What does the following piece of code display when run — and why?

+ +
``````f = 0
+k = 0
+
+def f2k(f):
+  k = ((f-32)*(5.0/9.0)) + 273.15
+  return k
+
+f2k(8)
+f2k(41)
+f2k(32)
+
+print(k)
+``````
+
+ +
+

## Solution

+ +
``````259.81666666666666
+278.15
+273.15
+0
+``````
+
+

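`k` is 0 because the `k` inside the function `f2k` is a local variable: it is separate from the `k` defined outside the function, and assigning to it leaves the outer `k` unchanged. The three values above the `0` are the results returned by the calls to `f2k`; they are displayed only when the lines are entered one at a time at an interactive prompt, while running the code as a single script or notebook cell shows just the `0` from `print(k)`.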
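
To actually change the outer `k`, the value returned by the function has to be assigned to it. A short sketch of the difference (the name `unchanged` is introduced here only to record the intermediate value):

```python
k = 0

def f2k(f):
    k = ((f - 32) * (5.0 / 9.0)) + 273.15   # this k is local to f2k
    return k

f2k(32)          # return value is discarded; the outer k is untouched
unchanged = k
print(unchanged)

k = f2k(32)      # assigning the return value updates the outer k
print(k)
```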

+
+
+ +
+

## Mixing Default and Non-Default Parameters

+ +

Given the following code:

+ +
``````def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+``````
+
+ +

what do you expect will be printed? What is actually printed? +What rule do you think Python is following?

+ +
+
1. `1234`
2. +
3. `one2three4`
4. +
5. `1239`
6. +
7. `SyntaxError`
8. +
+ +

Given that, what does the following piece of code display when run?

+ +
``````def func(a, b=3, c=6):
+  print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+``````
+
+ +
+
1. `a: b: 3 c: 6`
2. `a: -1 b: 3 c: 6`
3. `a: -1 b: 2 c: 6`
4. `a: b: -1 c: 2`
+ +
+

## Solution

+

Attempting to define the `numbers` function results in `4. SyntaxError`. +The defined parameters `two` and `four` are given default values. Because +`one` and `three` are not given default values, they are required to be +included as arguments when the function is called and must be placed +before any parameters that have default values in the function definition.

+ +

The given call to `func` displays `a: -1 b: 2 c: 6`. -1 is assigned to +the first parameter `a`, 2 is assigned to the next parameter `b`, and `c` is +not passed a value, so it uses its default value 6.
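As a hedged illustration of the rule, here is one possible reordering of `numbers` that Python accepts. Moving `three` ahead of `two` is an assumption about what the author of the function intended.

```python
def numbers(one, three, two=2, four=4):
    # Parameters without defaults come first, so this definition is accepted.
    return str(one) + str(two) + str(three) + str(four)

print(numbers(1, three=3))    # 1234
print(numbers(1, 3, four=9))  # 1239
```

Because `three` no longer has a defaulted parameter in front of it, it can be passed either by position or by name.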

+
+
+ +
+

## The Old Switcheroo

+ +

Consider this code:

+ +
``````a = 3
+b = 7
+
+def swap(a, b):
+    temp = a
+    a = b
+    b = temp
+
+swap(a, b)
+
+print(a, b)
+``````
+
+ +

Which of the following would be printed if you were to run this code? Why did you pick this answer?

+ +
+
1. `7 3`
2. `3 7`
3. `3 3`
4. `7 7`
+ +
+

## Solution

+

`3 7` is correct. Initially `a` has a value of 3 and `b` has a value of 7. +When the swap function is called, it creates local variables (also called +`a` and `b` in this case) and trades their values. The function does not +return any values and does not alter `a` or `b` outside of its local copy. +Therefore the original values of `a` and `b` remain unchanged.
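If the goal really is to exchange the two values, one option is to return them and reassign at the call site. This is a sketch; the function name `swapped` is hypothetical.

```python
def swapped(a, b):
    # Return the two values in reverse order instead of rebinding local names.
    return b, a

a = 3
b = 7
a, b = swapped(a, b)
print(a, b)  # 7 3
```

In everyday Python the same effect is usually written without a function, as `a, b = b, a`.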

+
+
+ +
+

+ +

Revise a function you wrote for one of the previous exercises to try to make +the code more readable. Then, collaborate with one of your neighbors +to critique each other’s functions and discuss how your function implementations +could be further improved to make them more readable.

+
+ + +
+

## Key Points

+
+ +
• Define a function using `def name(...params...)`.
• The body of a function must be indented.
• Call a function using `name(...values...)`.
• Numbers are stored as integers or floating-point numbers.
• Integer division produces the whole part of the answer (not the fractional part).
• Each time a function is called, a new stack frame is created on the call stack to hold its parameters and local variables.
• Python looks for variables in the current stack frame before looking for them at the top level.
• Use `help(thing)` to view help for something.
• Put docstrings in functions to provide help for that function.
• Specify default values for parameters when defining a function using `name=value` in the parameter list.
• Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/07-errors/index.html b/07-errors/index.html new file mode 100644 index 0000000000000000000000000000000000000000..da392b36f858e66d81a3638053d886bd64e7fb4a --- /dev/null +++ b/07-errors/index.html @@ -0,0 +1,846 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Errors and Exceptions + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Errors and Exceptions

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +

+
• How can I handle errors in Python programs?
+
+
+ +
+
+
+
+ Objectives +
+ +
• To be able to read a traceback, and determine where the error took place and what type it is.
• To be able to describe the types of situations in which syntax errors, indentation errors, name errors, index errors, and missing file errors occur.
+
+
+ +
+ +

Every programmer encounters errors, +both those who are just beginning, +and those who have been programming for years. +Encountering errors and exceptions can be very frustrating at times, +and can make coding feel like a hopeless endeavour. +However, +understanding what the different types of errors are +and when you are likely to encounter them can help a lot. +Once you know why you get certain types of errors, +they become much easier to fix.

+ +

Errors in Python have a very specific form, +called a traceback. +Let’s examine one:

+ +
``````# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        "chocolate",
+        "vanilla",
+        "strawberry"
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+``````
+
+ +
``````---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+      6     print(ice_creams[3])
+      7
+----> 8 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+      4         "vanilla", "strawberry"
+      5     ]
+----> 6     print(ice_creams[3])
+      7
+      8 favorite_ice_cream()
+
+IndexError: list index out of range
+``````
+
+ +

This particular traceback has two levels. +You can determine the number of levels by looking for the number of arrows on the left hand side. +In this case:

+ +
+
1. The first shows code from the cell above, +with an arrow pointing to Line 8 (which is `favorite_ice_cream()`).
2. The second shows some code in the function `favorite_ice_cream`, +with an arrow pointing to Line 6 (which is `print(ice_creams[3])`).
+ +

The last level is the actual place where the error occurred. +The other level(s) show what function the program executed to get to the next level down. +So, in this case, the program first performed a function call to the function `favorite_ice_cream`. +Inside this function, +the program encountered an error on Line 6, when it tried to run the code `print(ice_creams[3])`.

+ +
+

## Long Tracebacks

+ +

Sometimes, you might see a traceback that is very long – sometimes they might even be 20 levels deep! +This can make it seem like something horrible happened, +but really it just means that your program called many functions before it ran into the error. +Most of the time, +you can just pay attention to the bottom-most level, +which is the actual place where the error occurred.

+
+ +

So what error did the program actually encounter? +In the last line of the traceback, +Python helpfully tells us the category or type of error (in this case, it is an `IndexError`) +and a more detailed error message (in this case, it says “list index out of range”).
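The same information can also be read programmatically. The sketch below uses the standard `traceback` module and assumes the function asks for index 3 of a three-item list; the variable names are chosen for illustration only.

```python
import traceback

def favorite_ice_cream():
    ice_creams = ["chocolate", "vanilla", "strawberry"]
    print(ice_creams[3])  # assumed out-of-range index, for illustration

try:
    favorite_ice_cream()
except IndexError as error:
    # Each entry in levels corresponds to one arrow in the printed traceback.
    levels = traceback.extract_tb(error.__traceback__)
    error_type = type(error).__name__
    error_message = str(error)
    innermost = levels[-1].name  # function where the error actually occurred
    print(len(levels), error_type, error_message, innermost)
```

The last entry in `levels` plays the role of the bottom-most level of the printed traceback.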

+ +

If you encounter an error and don’t know what it means, +it is still important to read the traceback closely. +That way, +if you fix the error, +but encounter a new one, +you can tell that the error changed. +Additionally, +sometimes just knowing where the error occurred is enough to fix it, +even if you don’t entirely understand the message.

+ +

If you do encounter an error you don’t recognize, +try looking at the official documentation on errors. +However, +note that you may not always be able to find the error there, +as it is possible to create custom errors. +In that case, +hopefully the custom error message is informative enough to help you figure out what went wrong.

+ +

## Syntax Errors

+ +

When you forget a colon at the end of a line, +accidentally add one space too many when indenting under an `if` statement, +or forget a parenthesis, +you will encounter a syntax error. +This means that Python couldn’t figure out how to read your program. +This is similar to forgetting punctuation in English: +for example, +this text is difficult to read there is no punctuation there is also no capitalization +why is this hard because you have to figure out where each sentence ends +you also have to figure out where each sentence begins +to some extent it might be ambiguous if there should be a sentence break or not

+ +

People can typically figure out what is meant by text with no punctuation, +but people are much smarter than computers. +If Python doesn’t know how to read the program, +it will just give up and inform you with an error. +For example:

+ +
``````def some_function()
+    msg = "hello, world!"
+    print(msg)
+     return msg
+``````
+
+ +
``````  File "<ipython-input-3-6bb841ea1423>", line 1
+    def some_function()
+                       ^
+SyntaxError: invalid syntax
+``````
+
+ +

Here, Python tells us that there is a `SyntaxError` on line 1, +and even puts a little arrow in the place where there is an issue. +In this case the problem is that the function definition is missing a colon at the end.

+ +

Actually, the function above has two issues with syntax. +If we fix the problem with the colon, +we see that there is also an `IndentationError`, +which means that the lines in the function definition do not all have the same indentation:

+ +
``````def some_function():
+    msg = "hello, world!"
+    print(msg)
+     return msg
+``````
+
+ +
``````  File "<ipython-input-4-ae290e7659cb>", line 4
+    return msg
+    ^
+IndentationError: unexpected indent
+``````
+
+ +

Both `SyntaxError` and `IndentationError` indicate a problem with the syntax of your program, +but an `IndentationError` is more specific: +it always means that there is a problem with how your code is indented.
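To see the relationship between the two error types without crashing a notebook cell, one can compile source text held in a string. This is a sketch: the helper name `classify` and the two sample strings are assumptions made for illustration.

```python
missing_colon = 'def some_function()\n    return 1\n'
extra_indent = 'def some_function():\n    msg = "hi"\n     return msg\n'

def classify(source):
    '''Return the name of the error raised when compiling source, or None.'''
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as error:
        # IndentationError is caught here too, because it is a kind of SyntaxError.
        return type(error).__name__
    return None

print(classify(missing_colon))
print(classify(extra_indent))
print(issubclass(IndentationError, SyntaxError))
```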

+ +
+

## Tabs and Spaces

+ +

A quick note on indentation errors: +they can sometimes be insidious, +especially if you are mixing spaces and tabs. +Because they are both whitespace, +it is difficult to visually tell the difference. +The Jupyter notebook actually gives us a bit of a hint, +but not all Python editors will do that. +In the following example, +the first two lines are using a tab for indentation, +while the third line uses four spaces:

+ +
``````def some_function():
+    msg = "hello, world!"
+    print(msg)
+    return msg
+``````
+
+ +
``````  File "<ipython-input-5-653b36fbcd41>", line 4
+    return msg
+              ^
+IndentationError: unindent does not match any outer indentation level
+``````
+
+ +

By default, one tab is equivalent to eight spaces, +so the only way to mix tabs and spaces is to make it look like this. +In general, it is better to just never use tabs and always use spaces, +because it can make things very confusing.

+
+ +

## Variable Name Errors

+ +

Another very common type of error is called a `NameError`, +and occurs when you try to use a variable that does not exist. +For example:

+ +
``````print(a)
+``````
+
+ +
``````---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+``````
+
+ +

Variable name errors come with some of the most informative error messages, +which are usually of the form “name ‘the_variable_name’ is not defined”.

+ +

Why does this error message occur? +That’s a harder question to answer, +because it depends on what your code is supposed to do. +However, +there are a few very common reasons why you might have an undefined variable. +The first is that you meant to use a string, but forgot to put quotes around it:

+ +
``````print(hello)
+``````
+
+ +
``````---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+``````
+
+ +

The second is that you just forgot to create the variable before using it. +In the following example, +`count` should have been defined (e.g., with `count = 0`) before the for loop:

+ +
``````for number in range(10):
+    count = count + number
+print("The count is:", count)
+``````
+
+ +
``````---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+      1 for number in range(10):
+----> 2     count = count + number
+      3 print("The count is:", count)
+
+NameError: name 'count' is not defined
+``````
+
+ +

Finally, the third possibility is that you made a typo when you were writing your code. +Let’s say we fixed the error above by adding the line `Count = 0` before the for loop. +Frustratingly, this actually does not fix the error. +Remember that variables are case-sensitive, +so the variable `count` is different from `Count`. We still get the same error, because we still have not defined `count`:

+ +
``````Count = 0
+for number in range(10):
+    count = count + number
+print("The count is:", count)
+``````
+
+ +
``````---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+      1 Count = 0
+      2 for number in range(10):
+----> 3     count = count + number
+      4 print("The count is:", count)
+
+NameError: name 'count' is not defined
+``````
+
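For comparison, a version that runs: the variable is created before the loop and spelled exactly as it is used inside it.

```python
count = 0  # define the variable, spelled exactly as it is used below
for number in range(10):
    count = count + number
print("The count is:", count)  # The count is: 45
```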
+ +

## Index Errors

+ +

Next up are errors having to do with containers (like lists and strings) and the items within them. +If you try to access an item in a list or a string that does not exist, +then you will get an error. +This makes sense: +if you asked someone what day they would like to get coffee, +and they answered “caturday”, +you might be a bit annoyed. +Python gets similarly annoyed if you try to ask it for an item that doesn’t exist:

+ +
``````letters = ['a', 'b', 'c']
+print("Letter #1 is", letters[0])
+print("Letter #2 is", letters[1])
+print("Letter #3 is", letters[2])
+print("Letter #4 is", letters[3])
+``````
+
+ +
``````Letter #1 is a
+Letter #2 is b
+Letter #3 is c
+``````
+
+ +
``````---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+      3 print("Letter #2 is", letters[1])
+      4 print("Letter #3 is", letters[2])
+----> 5 print("Letter #4 is", letters[3])
+
+IndexError: list index out of range
+``````
+
+ +

Here, +Python is telling us that there is an `IndexError` in our code, +meaning we tried to access a list index that did not exist.
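When an out-of-range position is a real possibility, the error can be caught and turned into a readable message. The helper name `describe` below is hypothetical, and returning a message string is just one possible design.

```python
letters = ['a', 'b', 'c']

def describe(items, position):
    '''Return the item at position, or a note if the position is out of range.'''
    try:
        return items[position]
    except IndexError:
        return 'no item at index ' + str(position) + ' (length is ' + str(len(items)) + ')'

print(describe(letters, 2))   # c
print(describe(letters, 3))   # no item at index 3 (length is 3)
print(describe(letters, -1))  # c, negative indices count from the end
```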

+ +

## File Errors

+ +

The last type of error we’ll cover today +is the kind associated with reading and writing files. +If you try to read a file that does not exist, +you will receive a `FileNotFoundError` telling you so. +If you attempt to write to a file that was opened read-only, Python 3 +raises an `UnsupportedOperation` error (defined in the `io` module). +More generally, problems with input and output manifest as +`IOError`s or `OSError`s, depending on the version of Python you use.

+ +
``````file_handle = open('myfile.txt', 'r')
+``````
+
+ +
``````---------------------------------------------------------------------------
+FileNotFoundError                         Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+``````
+
+ +

One reason for receiving this error is that you specified an incorrect path to the file. +For example, +if I am currently in a folder called `myproject`, +and I have a file in `myproject/writing/myfile.txt`, +but I try to just open `myfile.txt`, +this will fail. +The correct path would be `writing/myfile.txt`. +It is also possible (like with `NameError`) that you just made a typo.

+ +

A related issue can occur if you open a file with the wrong mode flag, for example “write” when you meant “read”. +Python will not give you an error if you try to open a file for writing when the file does not exist. +However, +if you meant to open a file for reading, +but accidentally opened it for writing, +and then try to read from it, +you will get an `UnsupportedOperation` error +telling you that the file was not opened for reading:

+ +
``````file_handle = open('myfile.txt', 'w')
+file_handle.read()
+``````
+
+ +
``````---------------------------------------------------------------------------
+UnsupportedOperation                      Traceback (most recent call last)
+<ipython-input-15-b846479bc61f> in <module>()
+      1 file_handle = open('myfile.txt', 'w')
+----> 2 file_handle.read()
+
+UnsupportedOperation: not readable
+``````
+
+ +

These are the most common errors with files, +though many others exist. +If you get an error that you’ve never seen before, +searching the Internet for that error type +often reveals common reasons why you might get that error.
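Both file errors described above can be reproduced safely inside a temporary folder. This sketch assumes Python 3; the variable names are illustrative, and the scratch folder is deleted when the `with` block ends.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')

    # Reading a file that does not exist yet.
    try:
        open(path, 'r')
    except FileNotFoundError as error:
        first_error = type(error).__name__

    # Opening for writing creates the file, but reading from it is not allowed.
    with open(path, 'w') as handle:
        try:
            handle.read()
        except io.UnsupportedOperation as error:
            second_error = str(error)

print(first_error)
print(second_error)
```

Catching the specific error type, rather than every possible exception, keeps unrelated problems visible.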

+ +
+

+ +

Read the Python code and the resulting traceback below, and answer the following questions:

+ +
+
1. How many levels does the traceback have?
2. What is the function name where the error occurred?
3. On which line number in this function did the error occur?
4. What is the type of error?
5. What is the error message?
+ +
``````# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = {
+        "monday": "Hello, world!",
+        "tuesday": "Today is tuesday!",
+        "wednesday": "It is the middle of the week.",
+        "thursday": "Today is Donnerstag in German!",
+        "friday": "Last day of the week!",
+        "saturday": "Hooray for the weekend!",
+        "sunday": "Aw, the weekend is almost over."
+    }
+    print(messages[day])
+
+def print_friday_message():
+    print_message("Friday")
+
+print_friday_message()
+``````
+
+ +
``````---------------------------------------------------------------------------
+KeyError                                  Traceback (most recent call last)
+     14     print_message("Friday")
+     15
+---> 16 print_friday_message()
+
+     12
+     13 def print_friday_message():
+---> 14     print_message("Friday")
+     15
+     16 print_friday_message()
+
+      9         "sunday": "Aw, the weekend is almost over."
+     10     }
+---> 11     print(messages[day])
+     12
+     13 def print_friday_message():
+
+KeyError: 'Friday'
+``````
+
+ +
+

## Solution

+
+
1. 3 levels
2. `print_message`
3. 11
4. `KeyError`
5. There isn’t really a message; you’re supposed to infer that `Friday` is not a key in `messages`.
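One way to avoid this particular `KeyError` is to normalize the key before looking it up. The sketch below assumes that day names should match regardless of capitalization; the helper name `lookup` and the shortened dictionary are for illustration only.

```python
messages = {"friday": "Last day of the week!"}

def lookup(day):
    # Assumption: day names should match regardless of capitalization.
    key = day.lower()
    if key not in messages:
        return "No message for " + repr(day)
    return messages[key]

print(lookup("Friday"))   # Last day of the week!
print(lookup("Freitag"))  # No message for 'Freitag'
```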
+
+
+ +
+

## Identifying Syntax Errors

+ +
+
1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. Is it a `SyntaxError` or an `IndentationError`?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.
+ +
``````def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+``````
+
+ +
+

## Solution

+

`SyntaxError` for missing `():` at end of first line, +`IndentationError` for mismatch between second and third lines. +A fixed version is:

+ +
``````def another_function():
+    print("Syntax errors are annoying.")
+    print("But at least python tells us about them!")
+    print("So they are usually not too hard to fix.")
+``````
+
+
+
+ +
+

## Identifying Variable Name Errors

+ +
+
1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. +What type of `NameError` do you think this is? +In other words, is it a string with no quotes, +a misspelled variable, +or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.
+ +
``````for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + "b"
+print(message)
+``````
+
+ +
+

## Solution

+

3 `NameError`s for `number` being misspelled, for `message` not defined, and for `a` not being in quotes.

+ +

Fixed version:

+ +
``````message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+``````
+
+
+
+ +
+

## Identifying Index Errors

+ +
+
1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. What type of error is it?
3. Fix the error.
+ +
``````seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+``````
+
+ +
+

## Solution

+

`IndexError`; the last entry is `seasons[3]`, so `seasons[4]` doesn’t make sense. +A fixed version is:

+ +
``````seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[-1])
+``````
+
+
+
+ + +
+

## Key Points

+
+ +
• Tracebacks can look intimidating, but they give us a lot of useful information about what went wrong in our program, including where the error occurred and what type of error it was.
• An error having to do with the ‘grammar’ or syntax of the program is called a `SyntaxError`. If the issue has to do with how the code is indented, then it will be called an `IndentationError`.
• A `NameError` will occur if you use a variable that has not been defined, either because you meant to use quotes around a string, you forgot to define the variable, or you just made a typo.
• Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an `IndexError`.
• Trying to read a file that does not exist will give you a `FileNotFoundError`. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an `UnsupportedOperation` error, which is a kind of `IOError`.
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/08-defensive/index.html b/08-defensive/index.html new file mode 100644 index 0000000000000000000000000000000000000000..0107162cacf470145147a09835635da800cf80f0 --- /dev/null +++ b/08-defensive/index.html @@ -0,0 +1,817 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Defensive Programming + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Defensive Programming

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I make my programs more reliable?
+
+
+ +
+
+
+
+ Objectives +
+ +
• Explain what an assertion is.
• Add assertions that check the program’s state is correct.
• Correctly add precondition and postcondition assertions to functions.
• Explain what test-driven development is, and use it when creating new functions.
• Explain why variables should be initialized using actual data values rather than arbitrary constants.
+
+
+ +
+ +

Our previous lessons have introduced the basic tools of programming: +variables and lists, +file I/O, +loops, +conditionals, +and functions. +What they haven’t done is show us how to tell +whether a program is getting the right answer, +and how to tell if it’s still getting the right answer +as we make changes to it.

+ +

To achieve that, +we need to:

+ +
+
• Write programs that check their own operation.
• Write and run tests for widely-used functions.
• Make sure we know what “correct” actually means.
+ +

The good news is, +doing these things will speed up our programming, +not slow it down. +As in real carpentry — the kind done with lumber — the time saved +by measuring carefully before cutting a piece of wood +is much greater than the time that measuring takes.

+ +

## Assertions

+ +

The first step toward getting the right answers from our programs +is to assume that mistakes will happen +and to guard against them. +This is called defensive programming, +and the most common way to do it is to add assertions to our code +so that it checks itself as it runs. +An assertion is simply a statement that something must be true at a certain point in a program. +When Python sees one, +it evaluates the assertion’s condition. +If it’s true, +Python does nothing, +but if it’s false, +Python halts the program immediately +and prints the error message if one is provided. +For example, +this piece of code halts as soon as the loop encounters a value that isn’t positive:

+ +
``````numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
+total = 0.0
+for n in numbers:
+    assert n > 0.0, 'Data should only contain positive values'
+    total += n
+print('total is:', total)
+``````
+
+ +
``````---------------------------------------------------------------------------
+AssertionError                            Traceback (most recent call last)
+<ipython-input-19-33d87ea29ae4> in <module>()
+      2 total = 0.0
+      3 for n in numbers:
+----> 4     assert n > 0.0, 'Data should only contain positive values'
+      5     total += n
+      6 print('total is:', total)
+
+AssertionError: Data should only contain positive values
+``````
+
+ +

Programs like the Firefox browser are full of assertions: +10-20% of the code they contain +is there to check that the other 80-90% is working correctly. +Broadly speaking, +assertions fall into three categories:

+ +
+
• A precondition is something that must be true at the start of a function in order for it to work correctly.
• A postcondition is something that the function guarantees is true when it finishes.
• An invariant is something that is always true at a particular point inside a piece of code.
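The three categories can be seen side by side in a small function. This is a sketch only: the function `largest` and the particular checks are assumptions chosen to illustrate each kind of assertion.

```python
def largest(values):
    '''Return the largest item in a non-empty list.'''
    # Precondition: the caller must supply at least one value.
    assert len(values) > 0, 'Need at least one value'

    best = values[0]
    for v in values:
        if v > best:
            best = v
        # Invariant: at this point best is never smaller than the value just seen.
        assert best >= v, 'Running maximum fell below a seen value'

    # Postcondition: the result is one of the inputs.
    assert best in values, 'Result is not one of the inputs'
    return best

print(largest([1.5, 4.4, 0.7]))  # 4.4
```

Calling `largest([])` stops at the precondition, before any of the real work starts.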
+ +

For example, +suppose we are representing rectangles using a tuple of four coordinates `(x0, y0, x1, y1)`, +representing the lower left and upper right corners of the rectangle. +In order to do some calculations, +we need to normalize the rectangle so that the lower left corner is at the origin +and the longest side is 1.0 units long. +This function does that, +but checks that its input is correctly formatted and that its result makes sense:

+ +
``````def normalize_rectangle(rect):
+    '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.'''
+    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
+    x0, y0, x1, y1 = rect
+    assert x0 < x1, 'Invalid X coordinates'
+    assert y0 < y1, 'Invalid Y coordinates'
+
+    dx = x1 - x0
+    dy = y1 - y0
+    if dx > dy:
+        scaled = float(dx) / dy
+        upper_x, upper_y = 1.0, scaled
+    else:
+        scaled = float(dx) / dy
+        upper_x, upper_y = scaled, 1.0
+
+    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
+    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'
+
+    return (0, 0, upper_x, upper_y)
+``````
+
+ +

The preconditions on lines 3, 5, and 6 catch invalid inputs:

+ +
``````print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate
+``````
+
+ +
``````---------------------------------------------------------------------------
+AssertionError                            Traceback (most recent call last)
+<ipython-input-21-3a97b1dcab70> in <module>()
+----> 1 print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate
+
+<ipython-input-20-408dc39f3915> in normalize_rectangle(rect)
+      1 def normalize_rectangle(rect):
+      2     '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.'''
+----> 3     assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
+      4     x0, y0, x1, y1 = rect
+      5     assert x0 < x1, 'Invalid X coordinates'
+
+AssertionError: Rectangles must contain 4 coordinates
+``````
+
+ +
``````print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted
+``````
+
+ +
``````---------------------------------------------------------------------------
+AssertionError                            Traceback (most recent call last)
+<ipython-input-22-f05ae7878a45> in <module>()
+----> 1 print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted
+
+<ipython-input-20-408dc39f3915> in normalize_rectangle(rect)
+      3     assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
+      4     x0, y0, x1, y1 = rect
+----> 5     assert x0 < x1, 'Invalid X coordinates'
+      6     assert y0 < y1, 'Invalid Y coordinates'
+      7
+
+AssertionError: Invalid X coordinates
+``````
+
+ +

The post-conditions on lines 17 and 18 help us catch bugs by telling us when our calculations cannot have been correct. +For example, +if we normalize a rectangle that is taller than it is wide everything seems OK:

+ +
``````print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))
+``````
+
+ +
``````(0, 0, 0.2, 1.0)
+``````
+
+ +

but if we normalize one that’s wider than it is tall, +the assertion is triggered:

+ +
``````print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))
+``````
+
+ +
``````---------------------------------------------------------------------------
+AssertionError                            Traceback (most recent call last)
+<ipython-input-24-5f0ef7954aeb> in <module>()
+----> 1 print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) ))
+
+<ipython-input-20-408dc39f3915> in normalize_rectangle(rect)
+     16
+     17     assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
+---> 18     assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'
+     19
+     20     return (0, 0, upper_x, upper_y)
+
+AssertionError: Calculated upper Y coordinate invalid
+``````
+
+ +

Re-reading our function, +we realize that line 11 should divide `dy` by `dx` rather than `dx` by `dy`. +(You can display line numbers by typing Ctrl-M, then L.) +If we had left out the assertion at the end of the function, +we would have created and returned something that had the right shape as a valid answer, +but wasn’t. +Detecting and debugging that would almost certainly have taken more time in the long run +than writing the assertion.
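With that one change applied, the function might look like the sketch below. Everything except the corrected division is copied from the version above, and both test rectangles now pass the postconditions.

```python
def normalize_rectangle(rect):
    '''Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.'''
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dy) / dx  # corrected: shorter side divided by longer side
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

print(normalize_rectangle((0.0, 0.0, 5.0, 1.0)))  # (0, 0, 1.0, 0.2)
print(normalize_rectangle((0.0, 0.0, 1.0, 5.0)))  # (0, 0, 0.2, 1.0)
```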

+ +

But assertions aren’t just about catching errors: +they also help people understand programs. +Each assertion gives the person reading the program +a chance to check (consciously or otherwise) +that their understanding matches what the code is doing.

+ +

Most good programmers follow two rules when adding assertions to their code. +The first is, fail early, fail often. +The greater the distance between when and where an error occurs and when it’s noticed, +the harder the error will be to debug, +so good code catches mistakes as early as possible.

+ +

The second rule is, turn bugs into assertions or tests. +Whenever you fix a bug, write an assertion that catches the mistake +should you make it again. +If you made a mistake in a piece of code, +the odds are good that you have made other mistakes nearby, +or will make the same mistake (or a related one) +the next time you change it. +Writing assertions to check that you haven’t regressed +(i.e., haven’t re-introduced an old problem) +can save a lot of time in the long run, +and helps to warn people who are reading the code +(including your future self) +that this bit is tricky.

+ +

## Test-Driven Development

+ +

An assertion checks that something is true at a particular point in the program. +The next step is to check the overall behavior of a piece of code, +i.e., +to make sure that it produces the right output when it’s given a particular input. +For example, +suppose we need to find where two or more time series overlap. +The range of each time series is represented as a pair of numbers, +which are the times at which the interval started and ended. +The output is the largest range that they all include:

+ + + +

Most novice programmers would solve this problem like this:

+ +
+
1. Write a function `range_overlap`.
2. Call it interactively on two or three different inputs.
3. If it produces the wrong answer, fix the function and re-run that test.
+ +

This clearly works — after all, thousands of scientists are doing it right now — but +there’s a better way:

+ +
+
1. Write a short function for each test.
2. Write a `range_overlap` function that should pass those tests.
3. If `range_overlap` produces any wrong answers, fix it and re-run the test functions.
+ +

Writing the tests before writing the function they exercise +is called test-driven development (TDD). +Its advocates believe it produces better code faster because:

+ +
+
1. If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.
2. Writing tests helps programmers figure out what the function is actually supposed to do.
+ +

Here are three test functions for `range_overlap`:

+ +
``````assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
+assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
+assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
+``````
+
+ +
``````---------------------------------------------------------------------------
+AssertionError                            Traceback (most recent call last)
+<ipython-input-25-d8be150fbef6> in <module>()
+----> 1 assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
+      2 assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
+      3 assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
+
+AssertionError:
+``````
+
+ +

The error is actually reassuring: +we haven’t written `range_overlap` yet, +so if the tests passed, +it would be a sign that someone else had +and that we were accidentally using their function.

+ +

And as a bonus of writing these tests, +we’ve implicitly defined what our input and output look like: +we expect a list of pairs as input, +and produce a single pair as output.

+ +

Something important is missing, though. +We don’t have any tests for the case where the ranges don’t overlap at all:

+ +
``````assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == ???
+``````
+
+ +

What should `range_overlap` do in this case: +fail with an error message, +produce a special value like `(0.0, 0.0)` to signal that there’s no overlap, +or something else? +Any actual implementation of the function will do one of these things; +writing the tests first helps us figure out which is best +before we’re emotionally invested in whatever we happened to write +before we realized there was an issue.

+ +

+ +
``````assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == ???
+``````
+
+ +

Do two segments that touch at their endpoints overlap or not? +Mathematicians usually say “yes”, +but engineers usually say “no”. +The best answer is “whatever is most useful in the rest of our program”, +but again, +any actual implementation of `range_overlap` is going to do something, +and whatever it is ought to be consistent with what it does when there’s no overlap at all.

+ +

Since we’re planning to use the range this function returns +as the X axis in a time series chart, +we decide that:

+ +
+
1. every overlap has to have non-zero width, and
2. we will return the special value `None` when there’s no overlap.
+ +

`None` is built into Python, +and means “nothing here”. +(Other languages often call the equivalent value `null` or `nil`). +With that decision made, +we can finish writing our last two tests:

+ +
``````assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
+assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
+``````
+
+ +
``````---------------------------------------------------------------------------
+AssertionError                            Traceback (most recent call last)
+<ipython-input-26-d877ef460ba2> in <module>()
+----> 1 assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
+      2 assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
+
+AssertionError:
+``````
+
+ +

Again, +we get an error because we haven’t written our function, +but we’re now ready to do so:

+ +
``````def range_overlap(ranges):
+    '''Return common overlap among a set of [low, high] ranges.'''
+    lowest = 0.0
+    highest = 1.0
+    for (low, high) in ranges:
+        lowest = max(lowest, low)
+        highest = min(highest, high)
+    return (lowest, highest)
+``````
+
+ +

(Take a moment to think about why we use `max` to raise `lowest` +and `min` to lower `highest`). +We’d now like to re-run our tests, +but they’re scattered across three different cells. +To make running them easier, +let’s put them all in a function:

+ +
``````def test_range_overlap():
+    assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
+    assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
+    assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
+    assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
+    assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
+``````
+
+ +

We can now test `range_overlap` with a single function call:

+ +
``````test_range_overlap()
+``````
+
+ +
``````---------------------------------------------------------------------------
+AssertionError                            Traceback (most recent call last)
+<ipython-input-29-cf9215c96457> in <module>()
+----> 1 test_range_overlap()
+
+<ipython-input-28-5d4cd6fd41d9> in test_range_overlap()
+      1 def test_range_overlap():
+----> 2     assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) == None
+      3     assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) == None
+      4     assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
+      5     assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
+
+AssertionError:
+``````
+
+ +

The first test that was supposed to produce `None` fails, +so we know something is wrong with our function. +We don’t know whether the other tests passed or failed +because Python halted the program as soon as it spotted the first error. +Still, +some information is better than none, +and if we trace the behavior of the function with that input, +we realize that we’re initializing `lowest` and `highest` to 0.0 and 1.0 respectively, +regardless of the input values. +This violates another important rule of programming: +always initialize from data.
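To see why initializing from data matters, here is a small sketch using a different, hypothetical pair of functions (so as not to give away the fix for `range_overlap`). Both are meant to find the largest value in a list; the first starts from an arbitrary constant, the second from the data itself:

```python
def largest_from_constant(values):
    '''Flawed: starts from 0.0 no matter what the data look like.'''
    largest = 0.0
    for v in values:
        largest = max(largest, v)
    return largest

def largest_from_data(values):
    '''Starts from the first value, so the result always comes from the input.'''
    assert len(values) > 0, 'Need at least one value'
    largest = values[0]
    for v in values[1:]:
        largest = max(largest, v)
    return largest

readings = [-3.0, -1.0, -2.0]
print(largest_from_constant(readings))   # 0.0, which is not in the list at all
print(largest_from_data(readings))       # -1.0
```

The constant only gives the right answer when the data happen to straddle it, which is the same trap `range_overlap` falls into with 0.0 and 1.0.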

+ +
+

## Pre- and Post-Conditions

+ +

Suppose you are writing a function called `average` that calculates the average of the numbers in a list. What pre-conditions and post-conditions would you write for it? Compare your answer to your neighbor’s: can you think of a function that will pass your tests but not theirs, or vice versa?

+ +
+

## Solution

+
``````# a possible pre-condition:
+assert len(input_list) > 0, 'List length must be non-zero'
+# a possible post-condition:
+assert numpy.min(input_list) <= average <= numpy.max(input_list), 'Average should be between min and max of input values (inclusive)'
+``````
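Putting the two assertions above into a complete function might look like the sketch below. The body of `average` is our assumption (the exercise only asks for the conditions), and it uses Python’s built-in `min` and `max` rather than `numpy` so that it runs on its own:

```python
def average(input_list):
    '''Return the arithmetic mean of a non-empty list of numbers.'''
    # pre-condition: there must be something to average
    assert len(input_list) > 0, 'List length must be non-zero'
    result = sum(input_list) / len(input_list)
    # post-condition: the mean cannot lie outside the range of the data
    assert min(input_list) <= result <= max(input_list), \
        'Average should be between min and max of input values (inclusive)'
    return result

print(average([1.0, 2.0, 6.0]))   # 3.0
```

With floating-point inputs, rounding can in principle nudge the result just past the minimum or maximum, so a small tolerance in the post-condition may be appropriate for real data.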
+
+
+
+ +
+

## Testing Assertions

+ +

Given a sequence of car counts, the function `get_total_cars` returns the total number of cars.

+ +
``````get_total_cars([1, 2, 3, 4])
+``````
+
+ +
``````10
+``````
+
+ +
``````get_total_cars(['a', 'b', 'c'])
+``````
+
+ +
``````ValueError: invalid literal for int() with base 10: 'a'
+``````
+
+ +

Explain in words what the assertions in this function check, +and for each one, +give an example of input that will make that assertion fail.

+ +
``````def get_total_cars(values):
+    assert len(values) > 0
+    for element in values:
+        assert int(element)
+    values = [int(element) for element in values]
+    total = sum(values)
+    assert total > 0
+    return total
+``````
+
+ +
+

## Solution

+
+
• The first assertion checks that the input sequence `values` is not empty. An empty sequence such as `[]` will make it fail.
• The second assertion checks that each value in the list can be turned into an integer. Input such as `[1, 2, 'c', 3]` will make it fail.
• The third assertion checks that the total of the list is greater than 0. Input such as `[-10, 2, 3]` will make it fail.
+
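Running the function on a few inputs shows that two different kinds of error can come out of it. The sketch below repeats the function (with the `return total` it needs in order to produce the `10` shown earlier, which is our assumption) and adds a hypothetical helper, `outcome`, that reports either the total or the name of the error raised:

```python
def get_total_cars(values):
    assert len(values) > 0
    for element in values:
        assert int(element)
    values = [int(element) for element in values]
    total = sum(values)
    assert total > 0
    return total

def outcome(values):
    '''Return the total, or the name of the exception that was raised.'''
    try:
        return get_total_cars(values)
    except (AssertionError, ValueError) as error:
        return type(error).__name__

print(outcome([1, 2, 3, 4]))     # 10
print(outcome([]))               # AssertionError
print(outcome([1, 2, 'c', 3]))   # ValueError
print(outcome([0, 5]))           # AssertionError
print(outcome([-10, 2, 3]))      # AssertionError
```

For `'c'`, the failure is a `ValueError` rather than an `AssertionError`: `int('c')` raises before `assert` has a value to test. On the other hand, a perfectly valid count of `0` trips that same assertion, because `int(0)` is `0`, which Python treats as false.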
+
+ +
+

## Fixing and Testing

+ +

Fix `range_overlap`. Re-run `test_range_overlap` after each change you make.

+ +
+

## Solution

+
``````import numpy
+
+def range_overlap(ranges):
+    '''Return common overlap among a set of [low, high] ranges.'''
+    if not ranges:
+        # ranges is None or an empty list
+        return None
+    lowest, highest = ranges[0]
+    for (low, high) in ranges[1:]:
+        lowest = max(lowest, low)
+        highest = min(highest, high)
+    if lowest >= highest:  # no overlap
+        return None
+    else:
+        return (lowest, highest)
+``````
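Because `test_range_overlap` stops at its first failing `assert`, it cannot tell us how many cases are broken. One possible alternative, sketched below, keeps the test cases in a list and checks them all; `CASES` and `run_cases` are hypothetical names, and `range_overlap` is repeated only so that the example is self-contained:

```python
def range_overlap(ranges):
    '''Return common overlap among a set of [low, high] ranges.'''
    if not ranges:
        return None
    lowest, highest = ranges[0]
    for (low, high) in ranges[1:]:
        lowest = max(lowest, low)
        highest = min(highest, high)
    if lowest >= highest:  # no overlap
        return None
    return (lowest, highest)

# Each entry pairs an input with the output we expect for it.
CASES = [
    ([(0.0, 1.0), (5.0, 6.0)], None),
    ([(0.0, 1.0), (1.0, 2.0)], None),
    ([(0.0, 1.0)], (0.0, 1.0)),
    ([(2.0, 3.0), (2.0, 4.0)], (2.0, 3.0)),
    ([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)], (0.0, 1.0)),
]

def run_cases(function, cases):
    '''Run every case and return a list of (input, expected, actual) failures.'''
    failures = []
    for (ranges, expected) in cases:
        actual = function(ranges)
        if actual != expected:
            failures.append((ranges, expected, actual))
    return failures

print(run_cases(range_overlap, CASES))   # [] means every case passed
```

An empty list means all five cases pass; anything else shows exactly which inputs still need work.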
+
+
+
+ + +
+

## Key Points

+
+ +
• Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
• Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.
• Use preconditions to check that the inputs to a function are safe to use.
• Use postconditions to check that the output from a function is safe to use.
• Write tests before writing code in order to help determine exactly what that code is supposed to do.
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/09-debugging/index.html b/09-debugging/index.html new file mode 100644 index 0000000000000000000000000000000000000000..676166f62472f450ca9db4a6763f66d56a2721a1 --- /dev/null +++ b/09-debugging/index.html @@ -0,0 +1,573 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Debugging + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Debugging

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I debug my program?
+
+
+ +
+
+
+
+ Objectives +
+ +
• Debug code containing an error systematically.
• Identify ways of making code less error-prone and more easily tested.
+
+
+ +
+ +

Once testing has uncovered problems, +the next step is to fix them. +Many novices do this by making more-or-less random changes to their code +until it seems to produce the right answer, +but that’s very inefficient +(and the result is usually only correct for the one case they’re testing). +The more experienced a programmer is, +the more systematically they debug, +and most follow some variation on the rules explained below.

+ +

## Know What It’s Supposed to Do

+ +

The first step in debugging something is to +know what it’s supposed to do. +“My program doesn’t work” isn’t good enough: +in order to diagnose and fix problems, +we need to be able to tell correct output from incorrect. +If we can write a test case for the failing case — i.e., +if we can assert that with these inputs, +the function should produce that result — +then we’re ready to start debugging. +If we can’t, +then we need to figure out how we’re going to know when we’ve fixed things.

+ +

But writing test cases for scientific software is frequently harder than +writing test cases for commercial applications, +because if we knew what the output of the scientific code was supposed to be, +we wouldn’t be running the software: +we’d be writing up our results and moving on to the next program. +In practice, +scientists tend to do the following:

+ +
+
1. Test with simplified data. Before doing statistics on a real data set, we should try calculating statistics for a single record, for two identical records, for two records whose values are one step apart, or for some other case where we can calculate the right answer by hand.

2. Test a simplified case. If our program is supposed to simulate magnetic eddies in rapidly-rotating blobs of supercooled helium, our first test should be a blob of helium that isn’t rotating, and isn’t being subjected to any external electromagnetic fields. Similarly, if we’re looking at the effects of climate change on speciation, our first test should hold temperature, precipitation, and other factors constant.

3. Compare to an oracle. A test oracle is something whose results are trusted, such as experimental data, an older program, or a human expert. We use test oracles to determine if our new program produces the correct results. If we have a test oracle, we should store its output for particular cases so that we can compare it with our new results as often as we like without re-running that program.

4. Check conservation laws. Mass, energy, and other quantities are conserved in physical systems, so they should be in programs as well. Similarly, if we are analyzing patient data, the number of records should either stay the same or decrease as we move from one analysis to the next (since we might throw away outliers or records with missing values). If “new” patients start appearing out of nowhere as we move through our pipeline, it’s probably a sign that something is wrong.

5. Visualize. Data analysts frequently use simple visualizations to check both the science they’re doing and the correctness of their code (just as we did in the opening lesson of this tutorial). This should only be used for debugging as a last resort, though, since it’s very hard to compare two visualizations automatically.
+ +
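The first and fourth ideas are easy to turn into code. As a rough sketch, `check_simple_cases` below tries a mean function on inputs whose answers we can work out by hand, and `drop_missing` (a hypothetical cleaning step, with `None` standing in for a missing value) asserts that cleaning never increases the number of records:

```python
import statistics

def check_simple_cases(mean_function):
    '''Try a mean function on cases small enough to verify by hand.'''
    assert mean_function([3.0]) == 3.0         # a single record
    assert mean_function([4.0, 4.0]) == 4.0    # two identical records
    assert mean_function([1.0, 2.0]) == 1.5    # two records one step apart
    return True

def drop_missing(records):
    '''Remove records containing None; the record count must never grow.'''
    cleaned = [r for r in records if None not in r]
    assert len(cleaned) <= len(records), 'Records appeared out of nowhere'
    return cleaned

print(check_simple_cases(statistics.mean))
print(drop_missing([[70, 1.8], [80, None], [150, 1.7]]))
```

Here `statistics.mean` merely stands in for whatever function we are really trying to test.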

## Make It Fail Every Time

+ +

We can only debug something when it fails, +so the second step is always to find a test case that +makes it fail every time. +The “every time” part is important because +few things are more frustrating than debugging an intermittent problem: +if we have to call a function a dozen times to get a single failure, +the odds are good that we’ll scroll past the failure when it actually occurs.

+ +

As part of this, it’s always important to check that our code is “plugged in”, i.e., that we’re actually exercising the problem that we think we are. Every programmer has spent hours chasing a bug, only to realize that they were actually calling their code on the wrong data set, or with the wrong configuration parameters, or were using the wrong version of the software entirely. Mistakes like these are particularly likely to happen when we’re tired, frustrated, and up against a deadline, which is one of the reasons late-night (or overnight) coding sessions are almost never worthwhile.

+ +

## Make It Fail Fast

+ +

If it takes 20 minutes for the bug to surface, +we can only do three experiments an hour. +That doesn’t just mean we’ll get less data in more time: +we’re also more likely to be distracted by other things as we wait for our program to fail, +which means the time we are spending on the problem is less focused. +It’s therefore critical to make it fail fast.

+ +

As well as making the program fail fast in time, +we want to make it fail fast in space, +i.e., +we want to localize the failure to the smallest possible region of code:

+ +
+
1. The smaller the gap between cause and effect, the easier the connection is to find. Many programmers therefore use a divide and conquer strategy to find bugs, i.e., if the output of a function is wrong, they check whether things are OK in the middle, then concentrate on either the first or second half, and so on.

2. N things can interact in N² different ways, so every line of code that isn’t run as part of a test means more than one thing we don’t need to worry about.
+ +
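Divide and conquer can be sketched for a calculation made of several steps in sequence. The function below is an illustration under two assumptions that are ours: the work can be written as a list of single-argument functions, and once an intermediate result goes bad, every later result is bad too. `first_bad_step` and `is_ok` are hypothetical names:

```python
def first_bad_step(steps, value, is_ok):
    '''Return the index of the first step whose output fails is_ok, or None.'''
    # Record every intermediate result once.
    outputs = []
    for step in steps:
        value = step(value)
        outputs.append(value)
    # Check the middle, then keep only the half that must contain the fault.
    low, high = 0, len(outputs)
    while low < high:
        middle = (low + high) // 2
        if is_ok(outputs[middle]):
            low = middle + 1    # fine up to here: look in the second half
        else:
            high = middle       # already wrong here: look in the first half
    return low if low < len(outputs) else None

steps = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: -x,       # the "bug": results are supposed to stay positive
    lambda x: x * 3,
]
print(first_bad_step(steps, 1, lambda v: v > 0))   # 2
```

With four steps the saving is small, but the number of checks grows with the logarithm of the number of steps rather than with the number itself.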

## Change One Thing at a Time, For a Reason

+ +

Replacing random chunks of code is unlikely to do much good. +(After all, +if you got it wrong the first time, +you’ll probably get it wrong the second and third as well.) +Good programmers therefore +change one thing at a time, for a reason. +They are either trying to gather more information +(“is the bug still there if we change the order of the loops?”) +or test a fix +(“can we make the bug go away by sorting our data before processing it?”).

+ +

Every time we make a change, however small, we should re-run our tests immediately, because the more things we change at once, the harder it is to know what’s responsible for what (those N² interactions again). And we should re-run all of our tests: more than half of fixes made to code introduce (or re-introduce) bugs, so re-running all of our tests tells us whether we have regressed.

+ +

## Keep Track of What You’ve Done

+ +

Good scientists keep track of what they’ve done +so that they can reproduce their work, +and so that they don’t waste time repeating the same experiments +or running ones whose results won’t be interesting. +Similarly, +debugging works best when we +keep track of what we’ve done +and how well it worked. +If we find ourselves asking, +“Did left followed by right with an odd number of lines cause the crash? +Or was it right followed by left? +Or was I using an even number of lines?” +then it’s time to step away from the computer, +take a deep breath, +and start working more systematically.

+ +

Records are particularly useful when the time comes to ask for help. +People are more likely to listen to us +when we can explain clearly what we did, +and we’re better able to give them the information they need to be useful.

+ +
+

## Version Control Revisited

+ +

Version control is often used to reset software to a known state during debugging, +and to explore recent changes to code that might be responsible for bugs. +In particular, +most version control systems have a `blame` command +that will show who last changed particular lines of code…

+
+ +

## Be Humble

+ +

And speaking of help: +if we can’t find a bug in 10 minutes, +we should be humble and ask for help. +Just explaining the problem aloud is often useful, +since hearing what we’re thinking helps us spot inconsistencies and hidden assumptions.

+ +

Asking for help also helps alleviate confirmation bias. +If we have just spent an hour writing a complicated program, +we want it to work, +so we’re likely to keep telling ourselves why it should, +rather than searching for the reason it doesn’t. +People who aren’t emotionally invested in the code can be more objective, +which is why they’re often able to spot the simple mistakes we have overlooked.

+ +

Part of being humble is learning from our mistakes. +Programmers tend to get the same things wrong over and over: +either they don’t understand the language and libraries they’re working with, +or their model of how things work is wrong. +In either case, +taking note of why the error occurred +and checking for it next time +quickly turns into not making the mistake at all.

+ +

And that is what makes us most productive in the long run. As the saying goes, “A week of hard work can sometimes save you an hour of thought.” If we train ourselves to avoid making some kinds of mistakes, to break our code into modular, testable chunks, and to turn every assumption (or mistake) into an assertion, it will actually take us less time to produce working programs, not more.

+ +
+

## Debug With a Neighbor

+ +

Take a function that you have written today, and introduce a tricky bug. +Your function should still run, but will give the wrong output. +Switch seats with your neighbor and attempt to debug +the bug that they introduced into their function. +Which of the principles discussed above did you find helpful?

+
+ +
+

## Not Supposed to be the Same

+ +

You are assisting a researcher with Python code that computes the Body Mass Index (BMI) of patients. The researcher is concerned because all patients seemingly have unusual and identical BMIs, despite having different physiques. BMI is calculated as weight in kilograms divided by the square of height in metres.

+ +

Use the debugging principles in this exercise and locate problems +with the code. What suggestions would you give the researcher for +ensuring any later changes they make work correctly?

+ +
``````patients = [[70, 1.8], [80, 1.9], [150, 1.7]]
+
+def calculate_bmi(weight, height):
+    return weight / (height ** 2)
+
+for patient in patients:
+    weight, height = patients[0]
+    bmi = calculate_bmi(height, weight)
+    print("Patient's BMI is: %f" % bmi)
+``````
+
+ +
``````Patient's BMI is: 0.000367
+Patient's BMI is: 0.000367
+Patient's BMI is: 0.000367
+``````
+
+ +
+

## Solution

+
+
• The loop is not being utilised correctly. `height` and `weight` are always set as the first patient’s data during each iteration of the loop.

• The height/weight variables are reversed in the function call to `calculate_bmi(...)`. The correct BMIs are 21.604938, 22.160665 and 51.903114.
+
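One possible corrected version is sketched below. It unpacks the current `patient` inside the loop, passes the arguments in the order the function expects, and adds an assertion for a case we can check by hand (70 kg at 1.8 m is 70 / 3.24), which is the kind of test we might suggest to the researcher. The `bmis` list is our addition, used only to collect the results:

```python
patients = [[70, 1.8], [80, 1.9], [150, 1.7]]

def calculate_bmi(weight, height):
    return weight / (height ** 2)

# A case small enough to check by hand: 70 / (1.8 ** 2) is about 21.6.
assert abs(calculate_bmi(70, 1.8) - 21.604938) < 1e-6

bmis = []
for patient in patients:
    weight, height = patient              # the current patient, not patients[0]
    bmi = calculate_bmi(weight, height)   # weight first, then height
    bmis.append(bmi)
    print("Patient's BMI is: %f" % bmi)
```

Keeping the hand-checked assertion next to the function means any later change that breaks the calculation is caught the next time the script runs.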
+
+ + +
+

## Key Points

+
+ +
• Know what code is supposed to do before trying to debug it.
• Make it fail every time.
• Make it fail fast.
• Change one thing at a time, and for a reason.
• Keep track of what you’ve done.
• Be humble.
+
+ +
+ +
+
+

+
+
+ +
+
+


+
+
+ + + + + + + +
+ + + + + + + + diff --git a/10-cmdline/index.html b/10-cmdline/index.html new file mode 100644 index 0000000000000000000000000000000000000000..013b27d0fff1f389ffda5f50e982efcd62999fe4 --- /dev/null +++ b/10-cmdline/index.html @@ -0,0 +1,1246 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Command-Line Programs + + +
+ + + + +
+
+

+
+
+ +

+ +
+
+

+
+
+ +
+
+
+
+
+

# Command-Line Programs

+
+
+
+
+ + +
+

## Overview

+ +
+
+ Teaching: 30 min +
+ Exercises: 0 min +
+
+ Questions +
+ +
• How can I write Python programs that will work like Unix command-line tools?
+
+
+ +
+
+
+
+ Objectives +
+ +
• Use the values of command-line arguments in a program.
• Handle flags and files separately in a command-line program.
• Read data from standard input in a program so that it can be used in a pipeline.
+
+
+ +
+ +

The Jupyter Notebook and other interactive tools are great for prototyping code and exploring data, +but sooner or later we will want to use our program in a pipeline +or run it in a shell script to process thousands of data files. +In order to do that, +we need to make our programs work like other Unix command-line tools. +For example, +we may want a program that reads a dataset +and prints the average inflammation per patient.

+ +
+

## Switching to Shell Commands

+ +

In this lesson we are switching from typing commands in a Python interpreter to typing commands in a shell terminal window (such as bash). When you see a `$` in front of a command, that tells you to run that command in the shell rather than the Python interpreter.

+
+ +

This program does exactly what we want: it prints the average inflammation per patient for a given file.

+ +
``````$ python ../code/readings_04.py --mean inflammation-01.csv
+5.45
+5.425
+6.1
+...
+6.4
+7.05
+5.9
+``````
+
+ +

We might also want to look at the minimum of the first four lines:

+ +
``````$ head -4 inflammation-01.csv | python ../code/readings_04.py --min
+``````
+
+ +

or the maximum inflammations in several files one after another:

+ +
``````$ python ../code/readings_04.py --max inflammation-*.csv
+``````
+
+ +

Our scripts should do the following:

+ +
+
1. If no filename is given on the command line, read data from standard input.
2. If one or more filenames are given, read data from them and report statistics for each file separately.
3. Use the `--min`, `--mean`, or `--max` flag to determine what statistic to print.
+ +

To make this work, +we need to know how to handle command-line arguments in a program, +and how to get at standard input. +We’ll tackle these questions in turn below.

+ +

## Command-Line Arguments

+ +

Using the text editor of your choice, +save the following in a text file called `sys_version.py`:

+ +
``````import sys
+print('version is', sys.version)
+``````
+
+ +

The first line imports a library called `sys`, +which is short for “system”. +It defines values such as `sys.version`, +which describes which version of Python we are running. +We can run this script from the command line like this:

+ +
``````$ python sys_version.py
+``````
+
+ +
``````version is 3.4.3+ (default, Jul 28 2015, 13:17:50)
+[GCC 4.9.3]
+``````
+
+ +

Create another file called `argv_list.py` and save the following text to it.

+ +
``````import sys
+print('sys.argv is', sys.argv)
+``````
+
+ +

The strange name `argv` stands for “argument values”. +Whenever Python runs a program, +it takes all of the values given on the command line +and puts them in the list `sys.argv` +so that the program can determine what they were. +If we run this program with no arguments:

+ +
``````$ python argv_list.py
+``````
+
+ +
``````sys.argv is ['argv_list.py']
+``````
+
+ +

the only thing in the list is the name of our script, which is always `sys.argv[0]`. If we run it with a few arguments, however:

+ +
``````$ python argv_list.py first second third
+``````
+
+ +
``````sys.argv is ['argv_list.py', 'first', 'second', 'third']
+``````
+
+ +

then Python adds each of those arguments to that magic list.
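Since `sys.argv` is an ordinary list, we can take it apart with indexing and slicing. The sketch below uses a hypothetical helper, `split_arguments`, and passes it a hand-written list in place of `sys.argv` so that it behaves the same wherever it is run:

```python
def split_arguments(argv):
    '''Separate the script name from the remaining command-line arguments.'''
    script = argv[0]       # always the script itself
    arguments = argv[1:]   # everything typed after the script name
    return script, arguments

# In a real script we would call split_arguments(sys.argv) instead.
script, arguments = split_arguments(['argv_list.py', 'first', 'second', 'third'])
print(script)      # argv_list.py
print(arguments)   # ['first', 'second', 'third']
```

When no arguments are given, the slice `argv[1:]` is simply an empty list, so code that loops over it does nothing rather than failing.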

+ +

With this in hand, +let’s build a version of `readings.py` that always prints the per-patient mean of a single data file. +The first step is to write a function that outlines our implementation, +and a placeholder for the function that does the actual work. +By convention this function is usually called `main`, +though we can call it whatever we want:

+ +
``````$ cat ../code/readings_01.py
+``````
+
+ +
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    filename = sys.argv[1]
+    data = numpy.loadtxt(filename, delimiter=',')
+    for m in numpy.mean(data, axis=1):
+        print(m)
+``````
+
+ +

This function gets the name of the script from `sys.argv[0]`, because that’s where it’s always put, and the name of the file to process from `sys.argv[1]`. Here’s a simple test:

+ +
``````$ python ../code/readings_01.py inflammation-01.csv
+``````
+
+ +

There is no output because we have defined a function, +but haven’t actually called it. +Let’s add a call to `main`:

+ +
``````$ cat ../code/readings_02.py
+``````
+
+ +
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    filename = sys.argv[1]
+    data = numpy.loadtxt(filename, delimiter=',')
+    for m in numpy.mean(data, axis=1):
+        print(m)
+
+if __name__ == '__main__':
+   main()
+``````
+
+ +

and run that:

+ +
``````$ python ../code/readings_02.py inflammation-01.csv
+``````
+
+ +
``````5.45
+5.425
+6.1
+5.9
+5.55
+6.225
+5.975
+6.65
+6.625
+6.525
+6.775
+5.8
+6.225
+5.75
+5.225
+6.3
+6.55
+5.7
+5.85
+6.55
+5.775
+5.825
+6.175
+6.1
+5.8
+6.425
+6.05
+6.025
+6.175
+6.55
+6.175
+6.35
+6.725
+6.125
+7.075
+5.725
+5.925
+6.15
+6.075
+5.75
+5.975
+5.725
+6.3
+5.9
+6.75
+5.925
+7.225
+6.15
+5.95
+6.275
+5.7
+6.1
+6.825
+5.975
+6.725
+5.7
+6.25
+6.4
+7.05
+5.9
+``````
+
+ +
+

## Running Versus Importing

+ +

Running a Python script in bash is very similar to +importing that file in Python. +The biggest difference is that we don’t expect anything +to happen when we import a file, +whereas when running a script, we expect to see some +output printed to the console.

+ +

In order for a Python script to work as expected +when imported or when run as a script, +we typically put the part of the script +that produces output in the following if statement:

+ +
``````if __name__ == '__main__':
+    main()  # Or whatever function produces output
+``````
+
+ +

When you import a Python file, `__name__` is the name +of that file (e.g., when importing `readings.py`, +`__name__` is `'readings'`). However, when running a +script in bash, `__name__` is always set to `'__main__'` +in that script so that you can determine if the file +is being imported or run as a script.

+
+ +
+

## The Right Way to Do It

+ +

If our programs can take complex parameters or multiple filenames, +we shouldn’t handle `sys.argv` directly. +Instead, +we should use Python’s `argparse` library, +which handles common cases in a systematic way, +and also makes it easy for us to provide sensible error messages for our users. +We will not cover this module in this lesson +but you can go to Tshepang Lekhonkhobe’s Argparse tutorial +that is part of Python’s Official Documentation.
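As a taste of what that looks like, here is a minimal sketch of how the three flags and the list of filenames used in this lesson could be declared with `argparse`. It is one possible arrangement rather than the lesson’s own code; `parse_arguments` is a hypothetical name, and the list passed to it stands in for `sys.argv[1:]`:

```python
import argparse

def parse_arguments(argv):
    '''Parse one required flag (--min, --mean or --max) and any filenames.'''
    parser = argparse.ArgumentParser(description='Print per-patient statistics.')
    # Exactly one of the three flags must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--min', dest='action', action='store_const', const='min')
    group.add_argument('--mean', dest='action', action='store_const', const='mean')
    group.add_argument('--max', dest='action', action='store_const', const='max')
    # Zero or more filenames; an empty list could mean "use standard input".
    parser.add_argument('filenames', nargs='*', help='CSV files to process')
    return parser.parse_args(argv)

args = parse_arguments(['--max', 'small-01.csv', 'small-02.csv'])
print(args.action, args.filenames)   # max ['small-01.csv', 'small-02.csv']
```

If we pass a flag the parser does not know, or leave the flag out, `argparse` prints a usage message and exits instead of carrying on silently.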

+
+ +

## Handling Multiple Files

+ +

The next step is to teach our program how to handle multiple files. +Since 60 lines of output per file is a lot to page through, +we’ll start by using three smaller files, +each of which has three days of data for two patients:

+ +
``````$ ls small-*.csv
+``````
+
+ +
``````small-01.csv small-02.csv small-03.csv
+``````
+
+ +
``````$ cat small-01.csv
+``````
+
+ +
``````0,0,1
+0,1,2
+``````
+
+ +
``````$ python ../code/readings_02.py small-01.csv
+``````
+
+ +
``````0.333333333333
+1.0
+``````
+
+ +

Using small data files as input also allows us to check our results more easily: +here, +for example, +we can see that our program is calculating the mean correctly for each line, +whereas we were really taking it on faith before. +This is yet another rule of programming: +test the simple things first.

+ +

We want our program to process each file separately, so we need a loop that executes once for each filename. If we specify the files on the command line, the filenames will be in `sys.argv`, but we need to be careful: `sys.argv[0]` will always be the name of our script, rather than the name of a file. We also need to handle an unknown number of filenames, since our program could be run for any number of files.

+ +

The solution to both problems is to loop over the contents of `sys.argv[1:]`. +The ‘1’ tells Python to start the slice at location 1, +so the program’s name isn’t included; +since we’ve left off the upper bound, +the slice runs to the end of the list, +and includes all the filenames. +Here’s our changed program +`readings_03.py`:

+ +
``````$ cat ../code/readings_03.py
+``````
+
+ +
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    for filename in sys.argv[1:]:
+        data = numpy.loadtxt(filename, delimiter=',')
+        for m in numpy.mean(data, axis=1):
+            print(m)
+
+if __name__ == '__main__':
+   main()
+``````
+
+ +

and here it is in action:

+ +
``````$ python ../code/readings_03.py small-01.csv small-02.csv
+``````
+
+ +
``````0.333333333333
+1.0
+13.6666666667
+11.0
+``````
+
+ +
+

## The Right Way to Do It

+ +

At this point, +we have created three versions of our script called `readings_01.py`, +`readings_02.py`, and `readings_03.py`. +We wouldn’t do this in real life: +instead, +we would have one file called `readings.py` that we committed to version control +every time we got an enhancement working. +For teaching, +though, +we need all the successive versions side by side.

+
+ +

## Handling Command-Line Flags

+ +

The next step is to teach our program to pay attention to the `--min`, `--mean`, and `--max` flags. +These always appear before the names of the files, +so we could just do this:

+ +
``````$ cat ../code/readings_04.py
+``````
+
+ +
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    action = sys.argv[1]
+    filenames = sys.argv[2:]
+
+    for f in filenames:
+        data = numpy.loadtxt(f, delimiter=',')
+        if action == '--min':
+            values = numpy.min(data, axis=1)
+        elif action == '--mean':
+            values = numpy.mean(data, axis=1)
+        elif action == '--max':
+            values = numpy.max(data, axis=1)
+
+        for m in values:
+            print(m)
+
+if __name__ == '__main__':
+   main()
+``````
+
+ +

This works:

+ +
``````$ python ../code/readings_04.py --max small-01.csv
+``````
+
+ +
``````1.0
+2.0
+``````
+
+ +

but there are several things wrong with it:

+ +
+
1. `main` is too large to read comfortably.

2. If we specify only one additional argument on the command line, rather than the two we need (one for the flag and one for the filename), the program will not throw an exception but will run anyway. It treats `sys.argv[1]` as the `action`, even if it is a filename, and assumes that the file list is empty. Silent failures like this are always hard to debug.

3. The program should check if the submitted `action` is one of the three recognized flags.
+ +

This version pulls the processing of each file out of the loop into a function of its own. +It also checks that `action` is one of the allowed flags +before doing any processing, +so that the program fails fast:

+ +
``````\$ cat ../code/readings_05.py
+``````
+
+ +
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    action = sys.argv[1]
+    filenames = sys.argv[2:]
+    assert action in ['--min', '--mean', '--max'], \
+           'Action is not one of --min, --mean, or --max: ' + action
+    for f in filenames:
+        process(f, action)
+
+def process(filename, action):
+    data = numpy.loadtxt(filename, delimiter=',')
+
+    if action == '--min':
+        values = numpy.min(data, axis=1)
+    elif action == '--mean':
+        values = numpy.mean(data, axis=1)
+    elif action == '--max':
+        values = numpy.max(data, axis=1)
+
+    for m in values:
+        print(m)
+
+if __name__ == '__main__':
+   main()
+``````
+
+ +

This is four lines longer than its predecessor, +but broken into more digestible chunks of 8 and 12 lines.
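The argument checks can also be collected in one function that is easy to try without the command line. The following is only a sketch under stated assumptions: `parse_arguments` and `ALLOWED_ACTIONS` are hypothetical names that do not appear in any `readings_*.py` script, and the function takes the argument list as a parameter instead of reading `sys.argv` directly.

```python
ALLOWED_ACTIONS = ['--min', '--mean', '--max']

def parse_arguments(argv):
    """Return (action, filenames), or raise ValueError with a hint.

    Hypothetical helper for illustration; argv has the same layout as sys.argv.
    """
    if len(argv) < 2:
        raise ValueError('usage: python readings.py action [filenames]')
    action = argv[1]
    if action not in ALLOWED_ACTIONS:
        raise ValueError('Action is not one of --min, --mean, or --max: ' + action)
    return action, argv[2:]

# A filename passed where the flag should be is reported, not silently accepted.
try:
    parse_arguments(['readings.py', 'small-01.csv'])
except ValueError as error:
    print('error:', error)
```

Raising an exception rather than using `assert` is a design choice in this sketch: assertions are skipped when Python is started with the `-O` option, whereas an explicit `raise` always runs.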

+ +

## Handling Standard Input

+ +

The next thing our program has to do is read data from standard input if no filenames are given +so that we can put it in a pipeline, +redirect input to it, +and so on. +Let’s experiment in another script called `count_stdin.py`:

+ +
``````\$ cat ../code/count_stdin.py
+``````
+
+ +
``````import sys
+
+count = 0
+for line in sys.stdin:
+    count += 1
+
+print(count, 'lines in standard input')
+``````
+
+ +

This little program reads lines from a special “file” called `sys.stdin`, +which is automatically connected to the program’s standard input. +We don’t have to open it — Python and the operating system +take care of that when the program starts up — +but we can do almost anything with it that we could do to a regular file. +Let’s try running it as if it were a regular command-line program:

+ +
``````\$ python ../code/count_stdin.py < small-01.csv
+``````
+
+ +
``````2 lines in standard input
+``````
+
+ +

A common mistake is to try to run something that reads from standard input like this:

+ +
``````\$ python ../code/count_stdin.py small-01.csv
+``````
+
+ +

i.e., to forget the `<` character that redirects the file to standard input. +In this case, +there’s nothing in standard input, +so the program waits at the start of the loop for someone to type something on the keyboard. +Since there’s no way for us to do this, +our program is stuck, +and we have to halt it using the `Interrupt` option from the `Kernel` menu in the Notebook.
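Because `sys.stdin` behaves like an open file, the counting loop can also be tried without a shell redirect by handing it any file-like object. This is a minimal sketch, assuming a hypothetical function `count_lines` and using `io.StringIO` as a stand-in for standard input; the two-line sample text is illustrative only.

```python
import io

def count_lines(stream):
    """Count the lines in any iterable of lines, such as sys.stdin or an open file."""
    count = 0
    for line in stream:
        count += 1
    return count

# io.StringIO wraps a string in a file-like object (illustrative data only).
fake_stdin = io.StringIO('0,0,1\n0,1,2\n')
print(count_lines(fake_stdin), 'lines in the stand-in for standard input')
```

Passing the stream in as a parameter means the same function works for `sys.stdin`, for a file returned by `open`, and for in-memory text in a test.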

+ +

We now need to rewrite the program so that it loads data from `sys.stdin` if no filenames are provided. +Luckily, +`numpy.loadtxt` can handle either a filename or an open file as its first parameter, +so we don’t actually need to change `process`. +Only `main` changes:

+ +
``````\$ cat ../code/readings_06.py
+``````
+
+ +
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    action = sys.argv[1]
+    filenames = sys.argv[2:]
+    assert action in ['--min', '--mean', '--max'], \
+           'Action is not one of --min, --mean, or --max: ' + action
+    if len(filenames) == 0:
+        process(sys.stdin, action)
+    else:
+        for f in filenames:
+            process(f, action)
+
+def process(filename, action):
+    data = numpy.loadtxt(filename, delimiter=',')
+
+    if action == '--min':
+        values = numpy.min(data, axis=1)
+    elif action == '--mean':
+        values = numpy.mean(data, axis=1)
+    elif action == '--max':
+        values = numpy.max(data, axis=1)
+
+    for m in values:
+        print(m)
+
+if __name__ == '__main__':
+   main()
+``````
+
+ +

Let’s try it out:

+ +
``````\$ python ../code/readings_06.py --mean < small-01.csv
+``````
+
+ +
``````0.333333333333
+1.0
+``````
+
+ +

That’s better. +In fact, +that’s done: +the program now does everything we set out to do.
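The `if`/`elif` chain in `process` is one way to map a flag to a calculation; a dictionary lookup is another. The sketch below is an alternative design, not the lesson's code: it swaps NumPy for built-in functions and plain lists so that it is self-contained, and `ACTIONS` and `summarize_rows` are hypothetical names.

```python
from statistics import mean

# Map each flag to the function that summarizes one row.
ACTIONS = {'--min': min, '--mean': mean, '--max': max}

def summarize_rows(rows, action):
    """Apply the calculation selected by action to every row."""
    assert action in ACTIONS, 'Action is not one of --min, --mean, or --max: ' + action
    calculate = ACTIONS[action]
    return [calculate(row) for row in rows]

rows = [[0, 0, 1], [0, 1, 2]]  # illustrative values
print(summarize_rows(rows, '--max'))
```

With this layout, supporting a new flag means adding one dictionary entry, and the list of allowed flags cannot drift out of step with the branches that handle them.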

+ +
+

## Arithmetic on the Command Line

+ +

Write a command-line program that does addition and subtraction:

+ +
``````\$ python arith.py add 1 2
+``````
+
+ +
``````3
+``````
+
+ +
``````\$ python arith.py subtract 3 4
+``````
+
+ +
``````-1
+``````
+
+ +
+

## Solution

+
``````import sys
+
+def main():
+    assert len(sys.argv) == 4, 'Need exactly 3 arguments'
+
+    operator = sys.argv[1]
+    assert operator in ['add', 'subtract', 'multiply', 'divide'], \
+        'Operator is not one of add, subtract, multiply, or divide: bailing out'
+    try:
+        operand1, operand2 = float(sys.argv[2]), float(sys.argv[3])
+    except ValueError:
+        print('cannot convert input to a number: bailing out')
+        return
+
+    do_arithmetic(operand1, operator, operand2)
+
+def do_arithmetic(operand1, operator, operand2):
+
+    if operator == 'add':
+        value = operand1 + operand2
+    elif operator == 'subtract':
+        value = operand1 - operand2
+    elif operator == 'multiply':
+        value = operand1 * operand2
+    elif operator == 'divide':
+        value = operand1 / operand2
+    print(value)
+
+main()
+``````
+
+
+
+ +
+

## Finding Particular Files

+ +

Using the `glob` module introduced earlier, +write a simple version of `ls` that shows files in the current directory with a particular suffix. +A call to this script should look like this:

+ +
``````\$ python my_ls.py py
+``````
+
+ +
``````left.py
+right.py
+zero.py
+``````
+
+ +
+

## Solution

+
``````import sys
+import glob
+
+def main():
+    '''prints names of all files with sys.argv[1] as suffix'''
+    assert len(sys.argv) >= 2, 'Argument list cannot be empty'
+    suffix = sys.argv[1] # NB: behaviour is not as you'd expect if sys.argv[1] is *
+    glob_input = '*.' + suffix # construct the input
+    glob_output = sorted(glob.glob(glob_input)) # call the glob function
+    for item in glob_output: # print the output
+        print(item)
+    return
+
+main()
+``````
+
+
+
+ +
+

## Changing Flags

+ +

Rewrite `readings.py` so that it uses `-n`, `-m`, and `-x` instead of `--min`, `--mean`, and `--max` respectively. +Is the code easier to read? +Is the program easier to understand?

+ +
+

## Solution

+
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    action = sys.argv[1]
+    filenames = sys.argv[2:]
+    assert action in ['-n', '-m', '-x'], \
+           'Action is not one of -n, -m, or -x: ' + action
+    if len(filenames) == 0:
+        process(sys.stdin, action)
+    else:
+        for f in filenames:
+            process(f, action)
+
+def process(filename, action):
+    data = numpy.loadtxt(filename, delimiter=',')
+
+    if action == '-n':
+        values = numpy.min(data, axis=1)
+    elif action == '-m':
+        values = numpy.mean(data, axis=1)
+    elif action == '-x':
+        values = numpy.max(data, axis=1)
+
+    for m in values:
+        print(m)
+
+main()
+``````
+
+
+
+ +
+

+ +

Separately, +modify `readings.py` so that if no parameters are given +(i.e., no action is specified and no filenames are given), +it prints a message explaining how it should be used.

+ +
+

## Solution

+
``````# this is code/readings_08.py
+import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    if len(sys.argv) == 1: # no arguments, so print help message
+        print("""Usage: python readings_08.py action filenames
+              action must be one of --min --mean --max
+              if filenames is blank, input is taken from stdin;
+              otherwise, each filename in the list of arguments is processed in turn""")
+        return
+
+    action = sys.argv[1]
+    filenames = sys.argv[2:]
+    assert action in ['--min', '--mean', '--max'], \
+           'Action is not one of --min, --mean, or --max: ' + action
+    if len(filenames) == 0:
+        process(sys.stdin, action)
+    else:
+        for f in filenames:
+            process(f, action)
+
+def process(filename, action):
+    data = numpy.loadtxt(filename, delimiter=',')
+
+    if action == '--min':
+        values = numpy.min(data, axis=1)
+    elif action == '--mean':
+        values = numpy.mean(data, axis=1)
+    elif action == '--max':
+        values = numpy.max(data, axis=1)
+
+    for m in values:
+        print(m)
+
+main()
+``````
+
+
+
+ +
+

+ +

Separately, +modify `readings.py` so that if no action is given +it displays the means of the data.

+ +
+

## Solution

+
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    action = sys.argv[1]
+    if action not in ['--min', '--mean', '--max']: # if no action given
+        action = '--mean'    # set a default action, that being mean
+        filenames = sys.argv[1:] # start the filenames one place earlier in the argv list
+    else:
+        filenames = sys.argv[2:]
+
+    if len(filenames) == 0:
+        process(sys.stdin, action)
+    else:
+        for f in filenames:
+            process(f, action)
+
+def process(filename, action):
+    data = numpy.loadtxt(filename, delimiter=',')
+
+    if action == '--min':
+        values = numpy.min(data, axis=1)
+    elif action == '--mean':
+        values = numpy.mean(data, axis=1)
+    elif action == '--max':
+        values = numpy.max(data, axis=1)
+
+    for m in values:
+        print(m)
+
+main()
+``````
+
+
+
+ +
+

## A File-Checker

+ +

Write a program called `check.py` that takes the names of one or more inflammation data files as arguments +and checks that all the files have the same number of rows and columns. +What is the best way to test your program?

+ +
+

## Solution

+
``````import sys
+import numpy
+
+def main():
+    script = sys.argv[0]
+    filenames = sys.argv[1:]
+    if len(filenames) <=1: #nothing to check
+        print('Only 1 file specified on input')
+    else:
+        nrow0, ncol0 = row_col_count(filenames[0])
+        print('First file %s: %d rows and %d columns' % (filenames[0], nrow0, ncol0))
+        for f in filenames[1:]:
+            nrow, ncol = row_col_count(f)
+            if nrow != nrow0 or ncol != ncol0:
+                print('File %s does not check: %d rows and %d columns' % (f, nrow, ncol))
+            else:
+                print('File %s checks' % f)
+        return
+
+def row_col_count(filename):
+    try:
+        nrow, ncol = numpy.loadtxt(filename, delimiter=',').shape
+    except ValueError: #get this if file doesn't have same number of rows and columns, or if it has non-numeric content
+        nrow, ncol = (0, 0)
+    return nrow, ncol
+
+main()
+``````
+
+
+
+ +
+

## Counting Lines

+ +

Write a program called `line_count.py` that works like the Unix `wc` command:

+ +
+
• If no filenames are given, it reports the number of lines in standard input.
• +
• If one or more filenames are given, it reports the number of lines in each, followed by the total number of lines.
• +
+ +
+

## Solution

+
``````import sys
+
+def main():
+    '''print each input filename and the number of lines in it,
+       and print the sum of the number of lines'''
+    filenames = sys.argv[1:]
+    sum_nlines = 0 #initialize counting variable
+
+    if len(filenames) == 0: # no filenames, just stdin
+        sum_nlines = count_file_like(sys.stdin)
+        print('stdin: %d' % sum_nlines)
+    else:
+        for f in filenames:
+            n = count_file(f)
+            print('%s %d' % (f, n))
+            sum_nlines += n
+        print('total: %d' % sum_nlines)
+
+def count_file(filename):
+    '''count the number of lines in a file'''
+    f = open(filename,'r')
+    nlines = len(f.readlines())
+    f.close()
+    return(nlines)
+
+def count_file_like(file_like):
+    '''count the number of lines in a file-like object (eg stdin)'''
+    n = 0
+    for line in file_like:
+        n = n+1
+    return n
+
+main()
+
+``````
+
+
+
+ +
+

## Generate an Error Message

+ +

Write a program called `check_arguments.py` that prints usage +then exits the program if no arguments are provided. +(Hint: You can use `sys.exit()` to exit the program.)

+ +
``````\$ python check_arguments.py
+``````
+
+ +
``````usage: python check_arguments.py filename.txt
+``````
+
+ +
``````\$ python check_arguments.py filename.txt
+``````
+
+ +
``````Thanks for specifying arguments!
+``````
+
+
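One possible shape for such a program is sketched below. It is not an official solution: `usage_or_thanks` is a hypothetical helper, the exit status of 1 is an assumption, and the argument list is passed in as a parameter so the logic can be tried without the command line. In a real script, `main(sys.argv)` would be the call at the bottom.

```python
import sys

def usage_or_thanks(argv):
    """Return (message, exit_code) for the given argument list."""
    if len(argv) < 2:
        return 'usage: python check_arguments.py filename.txt', 1
    return 'Thanks for specifying arguments!', 0

def main(argv):
    message, exit_code = usage_or_thanks(argv)
    print(message)
    if exit_code != 0:
        sys.exit(exit_code)  # stop the program with a non-zero status

# Example call with an explicit argument list instead of sys.argv:
main(['check_arguments.py', 'filename.txt'])
```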
+ + +
+

## Key Points

+
+ +
• The `sys` library connects a Python program to the system it is running on.

+
• + +
• The list `sys.argv` contains the command-line arguments that a program was run with.

+
• + +
• Avoid silent failures.

+
• + +
• The pseudo-file `sys.stdin` connects to a program’s standard input.

+
• + +
• The pseudo-file `sys.stdout` connects to a program’s standard output.

+
• + +
+
+ +
+ +
+
+

+
+
+ +
+
+

### + + lesson home + +

+
+
+ + + + + + + +
+ + + + + + + + diff --git a/_config.yml b/_config.yml index 44fdf0842e05cf3b6bc9b0379692bb882436fd13..8eba02f4e77a146f3a39111494356840a00e3d22 100644 --- a/_config.yml +++ b/_config.yml @@ -67,3 +67,7 @@ exclude: # Turn off built-in syntax highlighting. highlighter: false + +github: + url: '/swc-releases/2017.08/python-novice-inflammation' + diff --git a/about/index.html b/about/index.html new file mode 100644 index 0000000000000000000000000000000000000000..e15d9701234075e26e5c1029c630472bd4e3cc18 --- /dev/null +++ b/about/index.html @@ -0,0 +1,211 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: About + + +
+ + + + +

+ + +
+ +
+ Since 1998, + Software Carpentry + has been teaching researchers in science, engineering, medicine, and related disciplines + the computing skills they need to get more done in less time and with less pain. + Its volunteer instructors have run hundreds of events + for thousands of learners in the past two and a half years. +
+
+

+
+ +
+ Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. + Its target audience is researchers who have little to no prior computational experience, + and its lessons are domain specific, + building on learners' existing knowledge to enable them to quickly apply skills learned to their own research. +
+
+

+
+ +
+ Library Carpentry is made by librarians to help librarians + automate repetitive, boring, error-prone tasks; + create, maintain and analyse sustainable and reusable data; + work effectively with IT and systems colleagues; + better understand the use of software in research; + and much more. + Library Carpentry was the winner of the 2016 + British Library Labs Teaching and Learning Award. +
+
+ + + + + + + + +
+ + + + + + + + diff --git a/aio/index.html b/aio/index.html new file mode 100644 index 0000000000000000000000000000000000000000..7d9a00680ed4dbfc9a6cc733a413f801553143e6 --- /dev/null +++ b/aio/index.html @@ -0,0 +1,233 @@ + + + + + + + + + + + + + + + + + + + Programming with Python + + +
+ + + + +

# Programming with Python

+ + + +
+ +
+ +
+ +
+ +
+ +
+ +
+ +
+ +
+ +
+ + + + + + + + +
+ + + + +

# Programming with Python: Contributor Code of Conduct

+ +

As contributors and maintainers of this project, +we pledge to respect all people who contribute through reporting issues, +posting feature requests, +updating documentation, +submitting pull requests or patches, +and other activities.

+ +

We are committed to making participation in this project a harassment-free experience for everyone, +regardless of level of experience, +gender, +gender identity and expression, +sexual orientation, +disability, +personal appearance, +body size, +race, +ethnicity, +age, +or religion.

+ +

Examples of unacceptable behavior by participants include the use of sexual language or imagery, +derogatory comments or personal attacks, +trolling, +public or private harassment, +insults, +or other unprofessional conduct.

+ +

Project maintainers have the right and responsibility to remove, edit, or reject +comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to our Code of Conduct. +Project maintainers who do not follow the Code of Conduct may be removed from the project team.

+ +

Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our reporting guidelines.

+ + + + + + + + + + +
+ + + + + + + + diff --git a/discuss/index.html b/discuss/index.html new file mode 100644 index 0000000000000000000000000000000000000000..336ded617c6a0732c667c0a2556284de2304381b --- /dev/null +++ b/discuss/index.html @@ -0,0 +1,528 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Discussion + + +
+ + + + +

# Programming with Python: Discussion

+ +

## Rules of Debugging

+ +
+
1. Fail early, fail often.
2. +
3. Always initialize from data.
4. +
5. Know what it’s supposed to do.
6. +
7. Make it fail every time.
8. +
9. Make it fail fast.
10. +
11. Change one thing at a time, for a reason.
12. +
13. Keep track of what we’ve done.
14. +
15. Be humble.
16. +
17. Test the simple things first.
18. +
+ +

And remember, +a week of hard work can sometimes save you an hour of thought.

+ +

## The Call Stack

+ +

Let’s take a closer look at what happens when we call `fahr_to_celsius(32.0)`. +To make things clearer, +we’ll start by putting the initial value 32.0 in a variable +and store the final result in one as well:

+ +
``````original = 32.0
+final = fahr_to_celsius(original)
+``````
+
+ +

The diagram below shows what memory looks like after the first line has been executed:

+ + + +

When we call `fahr_to_celsius`, +Python doesn’t create the variable `temp` right away. +Instead, +it creates something called a stack frame +to keep track of the variables defined by `fahr_to_celsius`. +Initially, +this stack frame only holds the value of `temp`:

+ + + +

When we call `fahr_to_kelvin` inside `fahr_to_celsius`, +Python creates another stack frame to hold `fahr_to_kelvin`’s variables:

+ + + +

It does this because there are now two variables in play called `temp`: +the parameter to `fahr_to_celsius`, +and the parameter to `fahr_to_kelvin`. +Having two variables with the same name in the same part of the program would be ambiguous, +so Python (and every other modern programming language) creates a new stack frame for each function call +to keep that function’s variables separate from those defined by other functions.

+ +

When the call to `fahr_to_kelvin` returns a value, +Python throws away `fahr_to_kelvin`’s stack frame +and creates a new variable in the stack frame for `fahr_to_celsius` to hold the temperature in Kelvin:

+ + + +

It then calls `kelvin_to_celsius`, +which means it creates a stack frame to hold that function’s variables:

+ + + +

Once again, +Python throws away that stack frame when `kelvin_to_celsius` is done +and creates the variable `result` in the stack frame for `fahr_to_celsius`:

+ + + +

Finally, +when `fahr_to_celsius` is done, +Python throws away its stack frame +and puts its result in a new variable called `final` +that lives in the stack frame we started with:

+ + + +

This final stack frame is always there; +it holds the variables we defined outside the functions in our code. +What it doesn’t hold is the variables that were in the various stack frames. +If we try to get the value of `temp` after our functions have finished running, +Python tells us that there’s no such thing:

+ +
``````print('final value of temp after all function calls:', temp)
+``````
+
+ +
``````---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-12-ffd9b4dbd5f1> in <module>()
+----> 1 print('final value of temp after all function calls:', temp)
+
+NameError: name 'temp' is not defined
+``````
+
+ +
+ +
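The same behavior can be reproduced in a self-contained script. The function bodies below are assumptions (this section names `fahr_to_celsius`, `fahr_to_kelvin`, and `kelvin_to_celsius` but does not show their definitions), and `temp_visible` is a hypothetical variable added for the check. What matters is that `temp` exists only inside a stack frame.

```python
def fahr_to_kelvin(temp):
    # Assumed conversion formula, for illustration.
    return ((temp - 32) * (5 / 9)) + 273.15

def kelvin_to_celsius(temp_k):
    return temp_k - 273.15

def fahr_to_celsius(temp):
    temp_k = fahr_to_kelvin(temp)       # a new frame is created, then thrown away
    result = kelvin_to_celsius(temp_k)  # likewise
    return result

original = 32.0
final = fahr_to_celsius(original)

try:
    temp  # only ever existed inside the functions' stack frames
    temp_visible = True
except NameError:
    temp_visible = False

print('final:', final)
print('temp visible after the calls:', temp_visible)
```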

Why go to all this trouble? +Well, +here’s a function called `span` that calculates the difference between +the minimum and maximum values in an array:

+ +
``````import numpy
+
+def span(a):
+    diff = numpy.max(a) - numpy.min(a)
+    return diff
+
+print('span of data:', span(data))
+``````
+
+ +
``````span of data: 20.0
+``````
+
+ +

Notice that `span` assigns a value to a variable called `diff`. +We might very well use a variable with the same name to hold data:

+ +
``````diff = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print('span of data:', span(diff))
+``````
+
+ +
``````span of data: 20.0
+``````
+
+ +

We don’t expect `diff` to have the value 20.0 after this function call, +so the name `diff` cannot refer to the same thing inside `span` as it does in the main body of our program. +And yes, +we could probably choose a different name than `diff` in our main program in this case, +but we don’t want to have to read every line of NumPy to see what variable names its functions use +before calling any of those functions, +just in case they change the values of our variables.

+ +

The big idea here is encapsulation, +and it’s the key to writing correct, comprehensible programs. +A function’s job is to turn several operations into one +so that we can think about a single function call +instead of a dozen or a hundred statements +each time we want to do something. +That only works if functions don’t interfere with each other; +if they do, +we have to pay attention to the details once again, +which quickly overloads our short-term memory.
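A small experiment makes the point concrete. It is a sketch that substitutes the built-in `max` and `min` and a short made-up list for NumPy and the inflammation data, so that it runs on its own.

```python
def span(a):
    diff = max(a) - min(a)  # local to span's stack frame
    return diff

diff = [3.0, 10.0, 23.0]  # a global variable that happens to share the name
result = span(diff)

print('span of data:', result)
print('diff is unchanged:', diff)
```

The assignment to `diff` inside `span` creates a variable in that call's stack frame, so the list named `diff` in the main program is left alone.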

+ +
+

## Following the Call Stack

+ +

We previously wrote functions called `fence` and `outer`. +Draw a diagram showing how the call stack changes when we run the following:

+ +
``````print(outer(fence('carbon', '+')))
+``````
+
+
+ +

## Image Grids

+ +

Let’s start by creating some simple heat maps of our own +using a library called `ipythonblocks`. +The first step is to create our own “image”:

+ +
``````from ipythonblocks import ImageGrid
+``````
+
+ +

Unlike the `import` statements we have seen earlier, +this one doesn’t load the entire `ipythonblocks` library. +Instead, +it just loads `ImageGrid` from that library, +since that’s the only thing we need (for now).

+ +

Once we have `ImageGrid` loaded, +we can use it to create a very simple grid of colored cells:

+ +
``````grid = ImageGrid(5, 3)
+grid.show()
+``````
+
+ + + +

Just like a NumPy array, +an `ImageGrid` has some properties that hold information about it:

+ +
``````print('grid width:', grid.width)
+print('grid height:', grid.height)
+print('grid lines on:', grid.lines_on)
+``````
+
+ +
``````grid width: 5
+grid height: 3
+grid lines on: True
+``````
+
+ +

The obvious thing to do with a grid like this is color in its cells, +but in order to do that, +we need to know how computers represent color. +The most common scheme is RGB, +which is short for “red, green, blue”. +RGB is an additive color model: +every shade is some combination of red, green, and blue intensities. +We can think of these three values as being the axes in a cube:

+ + + +

An RGB color is an example of a multi-part value: +like a Cartesian coordinate, +it is one thing with several parts. +We can represent such a value in Python using a tuple, +which we write using parentheses instead of the square brackets used for a list:

+ +
``````position = (12.3, 45.6)
+print('position is:', position)
+color = (10, 20, 30)
+print('color is:', color)
+``````
+
+ +
``````position is: (12.3, 45.6)
+color is: (10, 20, 30)
+``````
+
+ +

We can select elements from tuples using indexing, +just as we do with lists and arrays:

+ +
``````print('first element of color is:', color[0])
+``````
+
+ +
``````first element of color is: 10
+``````
+
+ +

Unlike lists and arrays, +though, +tuples cannot be changed after they are created — in technical terms, +they are immutable:

+ +
``````color[0] = 40
+print('first element of color after change:', color[0])
+``````
+
+ +
``````---------------------------------------------------------------------------
+TypeError                                 Traceback (most recent call last)
+<ipython-input-11-9c3dd30a4e52> in <module>()
+----> 1 color[0] = 40
+2 print('first element of color after change:', color[0])
+
+TypeError: 'tuple' object does not support item assignment
+``````
+
+ +
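Because a tuple cannot be modified in place, "changing" one component means building a new tuple. A minimal sketch, with `with_red` as a hypothetical helper name:

```python
def with_red(color, red):
    """Return a new (red, green, blue) tuple with the first element replaced."""
    return (red,) + color[1:]

color = (10, 20, 30)
changed = with_red(color, 40)

print('original color:', color)
print('new color:', changed)
```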

If a tuple represents an RGB color, +its red, green, and blue components can take on values between 0 and 255. +The upper bound may seem odd, +but it’s the largest number that can be represented in an 8-bit byte +(i.e., 2^8 - 1). +This makes it easy for computers to manipulate colors, +while providing fine enough gradations to fool most human eyes, +most of the time.
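The arithmetic behind the upper bound can be checked directly. In this sketch, `clamp_channel` is a hypothetical helper showing one way to keep a value inside the valid range; it is not part of `ipythonblocks`.

```python
BITS_PER_CHANNEL = 8
MAX_CHANNEL = 2 ** BITS_PER_CHANNEL - 1  # largest value that fits in 8 bits

def clamp_channel(value):
    """Force a channel value into the range 0..MAX_CHANNEL."""
    return max(0, min(MAX_CHANNEL, value))

print('largest channel value:', MAX_CHANNEL)
print('clamped 300:', clamp_channel(300))
```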

+ +

Let’s see what a few RGB colors actually look like:

+ +
``````row = ImageGrid(8, 1)
+row[0, 0] = (0, 0, 0)   # no color => black
+row[1, 0] = (255, 255, 255) # all colors => white
+row[2, 0] = (255, 0, 0) # all red
+row[3, 0] = (0, 255, 0) # all green
+row[4, 0] = (0, 0, 255) # all blue
+row[5, 0] = (255, 255, 0) # red and green
+row[6, 0] = (255, 0, 255) # red and blue
+row[7, 0] = (0, 255, 255) # green and blue
+row.show()
+``````
+
+ + + +

Simple color values like `(0,255,0)` are easy enough to decipher with a bit of practice, +but what color is `(214,90,127)`? +To help us, +`ipythonblocks` provides a function called `show_color`:

+ +
``````from ipythonblocks import show_color
+show_color(214, 90, 127)
+``````
+
+ + + +

It also provides a table of standard colors:

+ +
``````from ipythonblocks import colors
+c = ImageGrid(3, 2)
+c[0, 0] = colors['Fuchsia']
+c[0, 1] = colors['Salmon']
+c[1, 0] = colors['Orchid']
+c[1, 1] = colors['Lavender']
+c[2, 0] = colors['LimeGreen']
+c[2, 1] = colors['HotPink']
+c.show()
+``````
+
+ + + +
+

## Making a Colorbar

+ +

Fill in the `____` in the code below to create a bar that changes color from dark blue to black.

+ +
``````bar = ImageGrid(10, 1)
+for x in range(10):
+    bar[x, 0] = (0, 0, ____)
+bar.show()
+``````
+
+
+ +
+

## Why RGB?

+ +

Why do computers use red, green, and blue as their primary colors?

+
+ +
+

## Nested Loops

+ +

Will changing the nesting of the loops in the code above — i.e., +wrapping the Y-axis loop around the X-axis loop — change the final image? +Why or why not?

+
+ +
+

## Where to Change Data

+ +

Why did we transpose our data outside our heat map function? +Why not have the function perform the transpose?

+
+ +
+

## Return Versus Display

+ +

Why does the heat map function return the grid rather than displaying it immediately? +Do you think this is a good or bad design choice?

+
+ + + + + + + +
+ + + + + + + + diff --git a/figures/index.html b/figures/index.html new file mode 100644 index 0000000000000000000000000000000000000000..8a6f376fe2dbb16553865ed8a5a920d7af265782 --- /dev/null +++ b/figures/index.html @@ -0,0 +1,257 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Figures + + +
+ + + + +

# Programming with Python: Figures

+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + +
+ + + + + + + + + + +
+ + + + + + + + diff --git a/guide/index.html b/guide/index.html new file mode 100644 index 0000000000000000000000000000000000000000..3b0e1b2f4da004a32c726bef3378de5d3d6ad494 --- /dev/null +++ b/guide/index.html @@ -0,0 +1,282 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Instructor Notes + + +
+ + + + +

# Programming with Python: Instructor Notes

+ +

## Legend

+ +

We are using a dataset with records on inflammation from patients following an +arthritis treatment.

+ +

We mention in the lesson that this data looks somewhat strange. It is strange +because it is fabricated! The script used to generate the inflammation data +is included as `tools/gen_inflammation.py`.

+ +

## Overall

+ +

This lesson is written as an introduction to Python, +but its real purpose is to introduce the single most important idea in programming: +how to solve problems by building functions, +each of which can fit in a programmer’s working memory. +In order to teach that, +we must teach people a little about +the mechanics of manipulating data with lists and file I/O +so that their functions can do things they actually care about. +Our teaching order tries to show practical uses of every idea as soon as it is introduced; +instructors should resist the temptation to explain +the “other 90%” of the language +as well.

+ +

The final example asks them to build a command-line tool +that works with the Unix pipe-and-filter model. +We do this because it is a useful skill +and because it helps learners see that the software they use isn’t magical. +Tools like `grep` might be more sophisticated than +the programs our learners can write at this point in their careers, +but it’s crucial they realize this is a difference of scale rather than kind.

+ +

Explain that we use Python because:

+ +
+
• It’s free.
• +
• It has a lot of scientific libraries, and more are constantly being added.
• +
• It has a large scientific user community.
• +
• It’s easier for novices to learn than most of the mature alternatives. +(Software Carpentry originally used Perl; +when we switched, +we found that we could cover as much material in two days in Python +as we’d covered in three days in Perl, +and that retention was higher.)
• +
+ +

We do not include instructions on running the Jupyter Notebook in the tutorial +because we want to focus on the language rather than the tools. +Instructors should, however, walk learners through some basic operations:

+ +
+
• Launch from the command line with `jupyter notebook`.
• +
• Create a new notebook.
• +
• Enter code or data in a cell and execute it.
• +
• Explain the difference between `In[#]` and `Out[#]`.
• +
+ +

Watching the instructor grow programs step by step +is as helpful to learners as anything to do with Python. +Resist the urge to update a single cell repeatedly +(which is what you’d probably do in real life). +Instead, +clone the previous cell and write the update in the new copy +so that learners have a complete record of how the program grew. +Once you’ve done this, +you can say, +“Now why don’t we just break things into small functions right from the start?”

+ +

The discussion of command-line scripts +assumes that students understand standard I/O and building filters, +which are covered in the lesson on the shell.

+ +

## Frequently Argued Issues (FAI)

+ +
+
• +

`import ... as ...` syntax.

+ +

This syntax is commonly used in the scientific Python community; +it is explicitly recommended in documentation to `import numpy as np` +and `import matplotlib.pyplot as plt`. Even so, we have decided +not to introduce aliasing imports in this novice lesson: the +additional cognitive load they put on students outweighs the typing +they save. A good summary of arguments for and against can be found in +PR #61.

+ +

It is up to you as an individual instructor whether you want to introduce +these aliases when you teach this lesson, but we encourage you to +read those arguments thoroughly before deciding one way or the other.

+
• +
• +

NumPy methods.

+ +

We used to use NumPy array methods in the first NumPy topic. +We switched these methods to the equivalent functions because a majority +of instructors supported the change; see +PR #244 +for detailed arguments for and against the change.

+
• +
• +

Underscores vs. hyphens in filenames

+ +

We used to use hyphens in filenames in order to signify that these Python +files should only be run as scripts and never imported. However, after some +discussion, +including an informal Twitter poll, we switched over to underscores because +many files that start off as Python scripts end up being imported eventually. +For that reason, we also added `if __name__ == '__main__'` guards around +`main()` calls, which is how real-world Python scripts ensure that imports +do not result in side-effects.

+
• +
+ +
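A minimal sketch of the guard mentioned in the last item, using a hypothetical script. Here `main` takes the argument list as a parameter (an assumption made for illustration) and only reports how it was started.

```python
import sys

def main(argv):
    """Entry point; takes the argument list so other code can call it too."""
    return 'running ' + argv[0] + ' with ' + str(len(argv) - 1) + ' argument(s)'

# True when the file is run as a script, False when it is imported,
# so an import does not trigger main() as a side effect.
if __name__ == '__main__':
    print(main(sys.argv))
```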

A good time to introduce the `b *= 2` syntax is after discussing the challenges.

+ + + + + + + +
+ + + + + + + + diff --git a/index.html b/index.html new file mode 100644 index 0000000000000000000000000000000000000000..a38f30cfe75473c93fb4fb12035694450af53040 --- /dev/null +++ b/index.html @@ -0,0 +1,537 @@ + + + + + + + + + + + + + + + + + + + Programming with Python + + +
+ + + + +

# Programming with Python

+ +

The best way to learn how to program is to do something useful, +so this introduction to Python is built around a common scientific task: +data analysis.

+ +

Our real goal isn’t to teach you Python, +but to teach you the basic concepts that all programming depends on. +We use Python in our lessons because:

+ +
+
1. we have to use something for examples;
2. it’s free, well-documented, and runs almost everywhere;
3. it has a large (and growing) user base among scientists; and
4. experience shows that it’s easier for novices to pick up than most other languages.
+ +

But the two most important things are +to use whatever language your colleagues are using, +so that you can share your work with them easily, +and to use that language well.

+ +

We are studying inflammation in patients who have been given a new treatment for arthritis, +and need to analyze the first dozen data sets of their daily inflammation. +The data sets are stored in comma-separated values (CSV) format: +each row holds information for a single patient, +and the columns represent successive days. +The first few rows of our first file look like this:

+ +
``````0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
+0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
+0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
+0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
+0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
+``````
+
+ +

We want to:

+ +
+
• load that data into memory,
• calculate the average inflammation per day across all patients, and
• plot the result.
+ +

To do all that, we’ll have to learn a little bit about programming.
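As a preview, the first two steps can be sketched with the standard library alone. This is a stand-in, not the lesson's method: the lesson does this with NumPy, and plotting is left out because it needs `matplotlib`. The helper names `load_rows` and `daily_means` are hypothetical, and the data is the first two rows of the sample shown above.

```python
import csv
import io

# First two rows of the sample file: one row per patient, one column per day.
SAMPLE = """\
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
"""


def load_rows(handle):
    """Read comma-separated values into a list of rows of floats."""
    return [[float(value) for value in row] for row in csv.reader(handle) if row]


def daily_means(rows):
    """Average each column (day) across all rows (patients)."""
    return [sum(day) / len(day) for day in zip(*rows)]


rows = load_rows(io.StringIO(SAMPLE))
means = daily_means(rows)
print(len(rows), "patients,", len(means), "days")
print(means[:3])
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle.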

+ +
+

## Prerequisites

+ +

Learners need to understand the concepts of files and directories +(including the working directory) and how to start a Python +interpreter before tackling this lesson. This lesson references the Jupyter (IPython) +Notebook although it can be taught through any Python interpreter. +The commands in this lesson pertain to Python 3.

+
+ +

+

+ + +
+

## Schedule

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Time | Episode | Questions |
|---|---|---|
| | Setup | Download files required for the lesson |
| 00:00 | 1. Analyzing Patient Data | How can I process tabular data files in Python? |
| 00:30 | 2. Repeating Actions with Loops | How can I do the same operations on many different values? |
| 01:00 | 3. Storing Multiple Values in Lists | How can I store many values together? |
| 01:30 | 4. Analyzing Data from Multiple Files | How can I do the same operations on many different files? |
| 01:50 | 5. Making Choices | How can my programs do different things based on data values? |
| 02:20 | 6. Creating Functions | How can I define new functions? What’s the difference between defining and calling a function? What happens when I call a function? |
| 02:50 | 7. Errors and Exceptions | How does Python report errors? How can I handle errors in Python programs? |
| 03:20 | 8. Defensive Programming | How can I make my programs more reliable? |
| 03:50 | 9. Debugging | How can I debug my program? |
| 04:20 | 10. Command-Line Programs | How can I write Python programs that will work like Unix command-line tools? |
| 04:50 | Finish | |
+ +

+ The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor. +

+ +
+ + + + + + + +
+ + + + + + + + diff --git a/license/index.html b/license/index.html new file mode 100644 index 0000000000000000000000000000000000000000..4635602b46aa5f3ebb8d94b14739570f717a7d46 --- /dev/null +++ b/license/index.html @@ -0,0 +1,247 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Licenses + + +
+ + + + +

+ +

## Instructional Material

+ +

All Software Carpentry and Data Carpentry instructional material is +made available under the Creative Commons Attribution +license. The following is a human-readable summary of +(and not a substitute for) the full legal text of the CC BY 4.0 +license.

+ +

You are free:

+ +
+
• to Share—copy and redistribute the material in any medium or format
• to Adapt—remix, transform, and build upon the material
+ +

for any purpose, even commercially.

+ +

The licensor cannot revoke these freedoms as long as you follow the +license terms.

+ +

Under the following terms:

+ +
+
• Attribution—You must give appropriate credit (mentioning that +your work is derived from work that is Copyright © Software +Carpentry and, where practical, linking to +http://software-carpentry.org/), provide a link to the +license, and indicate if changes were made. You may do +so in any reasonable manner, but not in any way that suggests the +licensor endorses you or your use.
+ +

No additional restrictions—You may not apply legal terms or +technological measures that legally restrict others from doing +anything the license permits. With the understanding that:

+ +

Notices:

+ +
+
• You do not have to comply with the license for elements of the +material in the public domain or where your use is permitted by an +applicable exception or limitation.
• No warranties are given. The license may not give you all of the +permissions necessary for your intended use. For example, other +rights such as publicity, privacy, or moral rights may limit how you +use the material.
+ +

## Software

+ +

Except where otherwise noted, the example programs and other software +provided by Software Carpentry and Data Carpentry are made available under the +OSI-approved +MIT license.

+ +

Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +“Software”), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions:

+ +

The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software.

+ +

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

+ +

+ +

“Software Carpentry” and “Data Carpentry” and their respective logos +are registered trademarks of NumFOCUS.

+ + + + + + + + +
+ + + + + + + + diff --git a/reference/index.html b/reference/index.html new file mode 100644 index 0000000000000000000000000000000000000000..75b158d0a96026683b9e497be40cf8251e3f0b7d --- /dev/null +++ b/reference/index.html @@ -0,0 +1,686 @@ + + + + + + + + + + + + + + + + + + + Programming with Python + + +
+ + + + +

# Programming with Python

+ + +

## Key Points

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
### Analyzing Patient Data

• Import a library into a program using `import libraryname`.
• Use the `numpy` library to work with arrays in Python.
• Use `variable = value` to assign a value to a variable in order to record it in memory.
• Variables are created on demand whenever a value is assigned to them.
• Use `print(something)` to display the value of `something`.
• The expression `array.shape` gives the shape of an array.
• Use `array[x, y]` to select a single element from a 2D array.
• Array indices start at 0, not 1.
• Use `low:high` to specify a `slice` that includes the indices from `low` to `high-1`.
• All the indexing and slicing that works on arrays also works on strings.
• Use `# some kind of explanation` to add comments to programs.
• Use `numpy.mean(array)`, `numpy.max(array)`, and `numpy.min(array)` to calculate simple statistics.
• Use `numpy.mean(array, axis=0)` or `numpy.mean(array, axis=1)` to calculate statistics across the specified axis.
• Use the `pyplot` library from `matplotlib` for creating simple visualizations.

### Repeating Actions with Loops

• Use `for variable in sequence` to process the elements of a sequence one at a time.
• The body of a `for` loop must be indented.
• Use `len(thing)` to determine the length of something that contains other values.

### Storing Multiple Values in Lists

• `[value1, value2, value3, ...]` creates a list.
• Lists are indexed and sliced in the same way as strings and arrays.
• Lists are mutable (i.e., their values can be changed in place).
• Strings are immutable (i.e., the characters in them cannot be changed).

### Analyzing Data from Multiple Files

• Use `glob.glob(pattern)` to create a list of files whose names match a pattern.
• Use `*` in a pattern to match zero or more characters, and `?` to match any single character.

### Making Choices

• Use `if condition` to start a conditional statement, `elif condition` to provide additional tests, and `else` to provide a default.
• The bodies of the branches of conditional statements must be indented.
• Use `==` to test for equality.
• `X and Y` is only true if both `X` and `Y` are true.
• `X or Y` is true if either `X` or `Y`, or both, are true.
• Zero, the empty string, and the empty list are considered false; all other numbers, strings, and lists are considered true.
• Nest loops to operate on multi-dimensional data.
• Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.

### Creating Functions

• Define a function using `def name(...params...)`.
• The body of a function must be indented.
• Call a function using `name(...values...)`.
• Numbers are stored as integers or floating-point numbers.
• Integer division produces the whole part of the answer (not the fractional part).
• Each time a function is called, a new stack frame is created on the call stack to hold its parameters and local variables.
• Python looks for variables in the current stack frame before looking for them at the top level.
• Use `help(thing)` to view help for something.
• Put docstrings in functions to provide help for that function.
• Specify default values for parameters when defining a function using `name=value` in the parameter list.
• Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).

### Errors and Exceptions

• Tracebacks can look intimidating, but they give us a lot of useful information about what went wrong in our program, including where the error occurred and what type of error it was.
• An error having to do with the ‘grammar’ or syntax of the program is called a `SyntaxError`. If the issue has to do with how the code is indented, then it will be called an `IndentationError`.
• A `NameError` will occur if you use a variable that has not been defined, either because you meant to use quotes around a string, you forgot to define the variable, or you just made a typo.
• Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an `IndexError`.
• Trying to read a file that does not exist will give you a `FileNotFoundError`. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an `IOError`.

### Defensive Programming

• Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
• Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.
• Use preconditions to check that the inputs to a function are safe to use.
• Use postconditions to check that the output from a function is safe to use.
• Write tests before writing code in order to help determine exactly what that code is supposed to do.

### Debugging

• Know what code is supposed to do before trying to debug it.
• Make it fail every time.
• Make it fail fast.
• Change one thing at a time, and for a reason.
• Keep track of what you’ve done.
• Be humble.

### Command-Line Programs

• The `sys` library connects a Python program to the system it is running on.
• The list `sys.argv` contains the command-line arguments that a program was run with.
• Avoid silent failures.
• The pseudo-file `sys.stdin` connects to a program’s standard input.
• The pseudo-file `sys.stdout` connects to a program’s standard output.
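The key points on indexing and slicing can be checked quickly with a string, since the same rules apply to arrays, lists, and strings. This is a small sketch; the variable `element` and its value are illustrative only.

```python
element = 'oxygen'

first = element[0]                    # indices start at 0, so this is 'o'
head = element[0:3]                   # indices 0, 1, 2 (stops before index 3)
tail = element[3:len(element)]        # index 3 up to the end

print(first)
print(head)
print(tail)
```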
+ +

## Glossary

+ +
+
additive color model
+
A way to represent colors as the sum of contributions from primary colors +such as red, green, and blue.
+
argument
+
A value given to a function or program when it runs. +The term is often used interchangeably (and inconsistently) with parameter.
+
assertion
+
An expression which is supposed to be true at a particular point in a program. +Programmers typically put assertions in their code to check for errors; +if the assertion fails (i.e., if the expression evaluates as false), +the program halts and produces an error message. +See also: invariant, precondition, postcondition.
+
assign
+
To give a value a name by associating a variable with it.
+
body
+
(of a function): the statements that are executed when a function runs.
+
call stack
+
A data structure inside a running program that keeps track of active function calls.
+
case-insensitive
+
Treating text as if upper and lower case characters of the same letter were the same. +See also: case-sensitive.
+
case-sensitive
+
Treating text as if upper and lower case characters of the same letter are different. +See also: case-insensitive.
+
comment
+
A remark in a program that is intended to help human readers understand what is going on, +but is ignored by the computer. +Comments in Python, R, and the Unix shell start with a `#` character and run to the end of the line; +comments in SQL start with `--`, +and other languages have other conventions.
+
compose
+
To apply one function to the result of another, such as `f(g(x))`.
+
conditional statement
+
A statement in a program that might or might not be executed +depending on whether a test is true or false.
+
comma-separated values
+
(CSV) A common textual representation for tables +in which the values in each row are separated by commas.
+
default value
+
A value to use for a parameter if nothing is specified explicitly.
+
defensive programming
+
The practice of writing programs that check their own operation to catch errors as early as possible.
+
delimiter
+
A character or characters used to separate individual values, +such as the commas between columns in a CSV file.
+
docstring
+
Short for “documentation string”, +this refers to textual documentation embedded in Python programs. +Unlike comments, docstrings are preserved in the running program +and can be examined in interactive sessions.
+
documentation
+
Human-language text written to explain what software does, +how it works, or how to use it.
+
dotted notation
+
A two-part notation used in many programming languages +in which `thing.component` refers to the `component` belonging to `thing`.
+
empty string
+
A character string containing no characters, +often thought of as the “zero” of text.
+
encapsulation
+
The practice of hiding something’s implementation details +so that the rest of a program can worry about what it does +rather than how it does it.
+
floating-point number
+
A number containing a fractional part and an exponent. +See also: integer.
+
for loop
+
A loop that is executed once for each value in some kind of set, list, or range. +See also: while loop.
+
function
+
A group of instructions (i.e., lines of code) that transform +some input arguments to some output.
+
function call
+
A use of a function in another piece of software.
+
immutable
+
Unchangeable. +The value of immutable data cannot be altered after it has been created. +See also: mutable.
+
import
+
To load a library into a program.
+
in-place operators
+
An operator such as `+=` that provides a shorthand notation for +the common case in which the variable being assigned to +is also an operand on the right hand side of the assignment. +For example, the statement `x += 3` means the same thing as `x = x + 3`.
+
index
+
A subscript that specifies the location of a single value in a collection, +such as a single pixel in an image.
+
inner loop
+
A loop that is inside another loop. +See also: outer loop.
+
integer
+
A whole number, such as -12343. +See also: floating-point number.
+
invariant
+
An expression whose value doesn’t change during the execution of a program, +typically used in an assertion. +See also: precondition, postcondition.
+
library
+
A family of code units (functions, classes, variables) that implement a set of +related tasks.
+
loop variable
+
The variable that keeps track of the progress of the loop.
+
member
+
A variable contained within an object.
+
method
+
A function which is tied to a particular object. +Each of an object’s methods typically implements one of the things it can do, +or one of the questions it can answer.
+
object
+
A collection of conceptually related variables (members) and +functions using those variables (methods).
+
outer loop
+
A loop that contains another loop. +See also: inner loop.
+
parameter
+
A variable named in the function’s declaration that is used to hold a value passed into the call. +The term is often used interchangeably (and inconsistently) with argument.
+
pipe
+
A connection from the output of one program to the input of another. +When two or more programs are connected in this way, they are called a “pipeline”.
+
postcondition
+
A condition that a function (or other block of code) guarantees is true +once it has finished running. +Postconditions are often represented using assertions.
+
precondition
+
A condition that must be true in order for a function (or other block of code) to run correctly.
+
regression
+
The re-introduction of a bug that was once fixed.
+
return statement
+
A statement that causes a function to stop executing and return a value to its caller immediately.
+
RGB
+
An additive model +that represents colors as combinations of red, green, and blue. +Each color’s value is typically in the range 0..255 +(i.e., a one-byte integer).
+
sequence
+
A collection of information that is presented in a specific order. +For example, in Python, a string is a sequence of characters, +while a list is a sequence of values of any type.
+
shape
+
An array’s dimensions, represented as a vector. +For example, a 5×3 array’s shape is `(5,3)`.
+
silent failure
+
Failing without producing any warning messages. +Silent failures are hard to detect and debug.
+
slice
+
A regular subsequence of a larger sequence, +such as the first five elements or every second element.
+
stack frame
+
A data structure that provides storage for a function’s local variables. +Each time a function is called, a new stack frame is created +and put on the top of the call stack. When the function returns, +the stack frame is discarded.
+
standard input
+
A process’s default input stream. +In interactive command-line applications, +it is typically connected to the keyboard; in a pipe, +it receives data from the standard output of the preceding process.
+
standard output
+
A process’s default output stream. +In interactive command-line applications, +data sent to standard output is displayed on the screen; +in a pipe, +it is passed to the standard input of the next process.
+
string
+
Short for “character string”, +a sequence of zero or more characters.
+
syntax error
+
A programming error that occurs when statements are in an order or contain characters +not expected by the programming language.
+
test oracle
+
A program, device, data set, or human being +against which the results of a test can be compared.
+
test-driven development
+
The practice of writing unit tests before writing the code they test.
+
traceback
+
The sequence of function calls that led to an error.
+
tuple
+
An immutable sequence of values.
+
type
+
The classification of something in a program (for example, the contents of a variable) +as a kind of number (e.g. floating-point, integer), string, or something else.
+
type of error
+
Indicates the nature of an error in a program. For example, in Python, +an `IOError` refers to problems with file input/output. +See also: syntax error.
+
while loop
+
A loop that keeps executing as long as some condition is true. +See also: for loop.
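Several of the glossary terms (assertion, precondition, postcondition) are easier to grasp together in a short example. The sketch below is illustrative only: the function `normalize` and its checks are hypothetical and not taken from the lesson.

```python
def normalize(values):
    """Rescale a list of numbers so the smallest is 0.0 and the largest is 1.0."""
    # Preconditions: what must be true for the function to work correctly.
    assert len(values) > 0, 'need at least one value'
    low, high = min(values), max(values)
    assert high > low, 'values must not all be equal'

    result = [(value - low) / (high - low) for value in values]

    # Postcondition: what the function guarantees once it has finished.
    assert min(result) == 0.0 and max(result) == 1.0
    return result


print(normalize([2, 4, 6]))
```

If any assertion fails, Python halts with an `AssertionError` and the message given after the comma.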
+
+ + + + + + + + +
+ + + + + + + + diff --git a/setup/index.html b/setup/index.html new file mode 100644 index 0000000000000000000000000000000000000000..ae5d8795c11b20ca50435e9cbea5739571e548d5 --- /dev/null +++ b/setup/index.html @@ -0,0 +1,201 @@ + + + + + + + + + + + + + + + + + + + Programming with Python: Setup + + +
+ + + + +

# Programming with Python: Setup

+ +

In preparation for this lesson, you will need to download two zipped files and place them in the specified folder:

+ +
+
1. Make a new folder on your Desktop called `python-novice-inflammation`.
2. Download the two zipped files and move them into this folder.
3. If the files aren’t unzipped yet, double-click to unzip them. You should end up with +two new folders called `data` and `code`.
4. To get started, go into the `data` folder from the Unix shell with:
+ +
``````$ cd
+$ cd Desktop/python-novice-inflammation/data
+``````
+
+ +

If you will be using the Jupyter (IPython) notebook for the lesson, +you should have already +installed Anaconda +which includes the notebook.

+ +

To start the notebook, open a terminal or git bash and type the command:

+ +
``````$ jupyter notebook
+``````
+
+ +

To start the Python interpreter without the notebook, open a terminal or git bash and type the command:

+ +
``````$ python
+``````
+
+ + + + + + + + +
+ + + + + + + +