Commit 0dd391df authored by Greg Wilson

Converting existing material to new format

parent befe9d73
python-novice-inflammation
==========================
Introduction to Python for non-programmers using inflammation data.
An introduction to Python for non-programmers using inflammation data.
> Please see [https://github.com/swcarpentry/lesson-example](https://github.com/swcarpentry/lesson-example)
> for instructions on formatting, building, and submitting lessons,
> or run `make` in this directory for a list of helpful commands.
See [the lesson template documentation][lesson-example]
for instructions on formatting, building, and submitting material,
or run `make` in this directory for a list of helpful commands.
Maintainers:
* [Trevor Bekolay](http://software-carpentry.org/team/#jackson_m)
* [Valentina Staneva](http://software-carpentry.org/team/#staneva_valentina)
* [Trevor Bekolay][bekolay_trevor]
* [Valentina Staneva][staneva_valentina]
[bekolay_trevor]: http://software-carpentry.org/team/#bekolay_trevor
[staneva_valentina]: http://software-carpentry.org/team/#staneva_valentina
---
layout: page
title: Programming with Python
subtitle: Repeating Actions with Loops
minutes: 30
title: Repeating Actions with Loops
teaching: 30
exercises: 0
questions:
- "FIXME"
objectives:
- "Explain what a for loop does."
- "Correctly write for loops to repeat simple calculations."
- "Trace changes to a loop variable as the loop runs."
- "Trace changes to other variables as they are updated by a for loop."
keypoints:
- "FIXME"
---
> ## Learning Objectives {.objectives}
>
> * Explain what a for loop does.
> * Correctly write for loops to repeat simple calculations.
> * Trace changes to a loop variable as the loop runs.
> * Trace changes to other variables as they are updated by a for loop.
In the last lesson,
we wrote some code that plots some values of interest from our first inflammation dataset, `inflammation-01.csv`,
and reveals some suspicious features in it:
![Analysis of inflammation-01.csv](fig/03-loop_2_0.png)\
![Analysis of inflammation-01.csv]({{ site.github.url }}/fig/03-loop_2_0.png)
We have a dozen data sets right now, though, and more on the way.
We want to create plots for all of our data sets with a single statement.
......@@ -24,26 +26,30 @@ To do that, we'll have to teach the computer how to repeat things.
An example task that we might want to repeat is printing each character in a
word on a line of its own.
~~~ {.python}
~~~
word = 'lead'
~~~
{: .python}
We can access a character in a string using its index. For example, we can get the first
character of the word `'lead'` by using `word[0]`. One way to print each character is to use
four `print` statements:
~~~ {.python}
~~~
print(word[0])
print(word[1])
print(word[2])
print(word[3])
~~~
~~~ {.output}
{: .python}
~~~
l
e
a
d
~~~
{: .output}
This is a bad approach for two reasons:
......@@ -57,7 +63,7 @@ This is a bad approach for two reasons:
and if we give it a shorter one,
it produces an error because we're asking for characters that don't exist.
~~~ {.python}
~~~
word = 'tin'
print(word[0])
print(word[1])
......@@ -65,12 +71,16 @@ print(word[2])
print(word[3])
~~~
~~~ {.output}
{: .python}
~~~
t
i
n
~~~
~~~ {.error}
{: .output}
~~~
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-7974b6cdaf14> in <module>()
......@@ -80,34 +90,37 @@ IndexError Traceback (most recent call last)
IndexError: string index out of range
~~~
{: .error}
Here's a better approach:
~~~ {.python}
~~~
word = 'lead'
for char in word:
print(char)
~~~
{: .python}
~~~ {.output}
~~~
l
e
a
d
~~~
{: .output}
This is shorter---certainly shorter than something that prints every character in a hundred-letter string---and
more robust as well:
~~~ {.python}
~~~
word = 'oxygen'
for char in word:
print(char)
~~~
{: .python}
~~~ {.output}
~~~
o
x
y
......@@ -115,40 +128,44 @@ g
e
n
~~~
{: .output}
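To see why the loop is the more robust of the two approaches, the sketch below wraps it in a small helper and applies it to words of different lengths. The helper name `print_characters` is an illustrative assumption, not part of the lesson, and the sketch uses a function definition and a list, neither of which has been introduced at this point.

```python
def print_characters(word):
    """Print each character of word on its own line; return how many were printed."""
    count = 0
    for char in word:
        print(char)
        count = count + 1
    return count

# The same loop handles a short word, the original word, and an empty string
# without an IndexError, because it never asks for a position that is missing.
for example in ['tin', 'lead', '']:
    printed = print_characters(example)
    print('printed', printed, 'characters')
```

Unlike the four hard-coded `print` calls, nothing here depends on knowing the length of the word in advance.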
The improved version uses a [for loop](reference.html#for-loop)
to repeat an operation---in this case, printing---once for each thing in a collection.
The general form of a loop is:
~~~ {.python}
~~~
for variable in collection:
do things with variable
~~~
{: .python}
Using the oxygen example above, the loop might look like this:
![loop_image](./fig/loops_image.png)
Where each character (`char`) in the variable `word` is looped through and printed one character after another. The numbers in the diagram denote which loop cycle the character was printed in (1 being the first loop, and 6 being the final loop).
![loop_image]({{ site.github.url }}/fig/loops_image.png)
where each character (`char`) in the variable `word` is looped through and printed one character after another.
The numbers in the diagram denote which loop cycle the character was printed in (1 being the first loop, and 6 being the final loop).
We can call the [loop variable](reference.html#loop-variable) anything we like,
but there must be a colon at the end of the line starting the loop,
and we must indent anything we want to run inside the loop. Unlike many other languages, there is no
command to signify the end of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.
Here's another loop that repeatedly updates a variable:
~~~ {.python}
~~~
length = 0
for vowel in 'aeiou':
length = length + 1
print('There are', length, 'vowels')
~~~
{: .python}
~~~ {.output}
~~~
There are 5 vowels
~~~
{: .output}
It's worth tracing the execution of this little program step by step.
Since there are five characters in `'aeiou'`,
......@@ -172,38 +189,41 @@ Note that a loop variable is just a variable that's being used to record progres
It still exists after the loop is over,
and we can re-use variables previously defined as loop variables as well:
~~~ {.python}
~~~
letter = 'z'
for letter in 'abc':
print(letter)
print('after the loop, letter is', letter)
~~~
{: .python}
~~~ {.output}
~~~
a
b
c
after the loop, letter is c
~~~
{: .output}
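One way to follow the step-by-step trace of the vowel-counting loop is to print the variables on every pass. The sketch below is that same loop with one extra `print` inside the body; the added line is there only for illustration.

```python
length = 0
for vowel in 'aeiou':
    length = length + 1
    # Show the loop variable and the running total on each pass.
    print('vowel is', vowel, 'and length is now', length)

# Both variables still exist after the loop has finished.
print('after the loop: vowel =', vowel, 'length =', length)
```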
Note also that finding the length of a string is such a common operation
that Python actually has a built-in function to do it called `len`:
~~~ {.python}
~~~
print(len('aeiou'))
~~~
{: .python}
~~~ {.output}
~~~
5
~~~
{: .output}
`len` is much faster than any function we could write ourselves,
and much easier to read than a two-line loop;
it will also give us the length of many other things that we haven't met yet,
so we should always use it when we can.
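As a rough check of the claim above, the sketch below counts characters by hand and compares the result with `len`. It also applies `len` to a list, which may be one of the "other things" the text has in mind; the variable names are illustrative.

```python
word = 'oxygen'

# Count by hand, one character at a time.
count = 0
for char in word:
    count = count + 1

print('counted by loop:', count)
print('reported by len:', len(word))

# len also works on other collections, such as a list.
print('items in list:', len([1, 3, 5, 7]))
```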
> ## From 1 to N {.challenge}
> ## From 1 to N
>
> Python has a built-in function called `range` that creates a sequence of numbers. `range` can
> accept 1-3 parameters. If one parameter is input, `range` creates a sequence of that length,
......@@ -216,28 +236,35 @@ so we should always use it when we can.
> Using `range`,
> write a loop that prints the first 3 natural numbers:
>
> ~~~ {.python}
> ~~~
> 1
> 2
> 3
> ~~~
> {: .python}
{: .challenge}
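To illustrate the one-, two-, and three-parameter forms of `range` without giving away the exercise, here is a small sketch with different numbers. Wrapping `range` in `list` is done here only so that all the values can be printed at once.

```python
# One parameter: start at 0 and stop before 4.
print(list(range(4)))

# Two parameters: start at 2 and stop before 6.
print(list(range(2, 6)))

# Three parameters: start at 10, stop before 20, in steps of 5.
print(list(range(10, 20, 5)))

# range is most often used directly in a for loop.
total = 0
for number in range(2, 6):
    total = total + number
print('sum of 2 to 5 is', total)
```

In every form the stop value itself is excluded, which is the detail most likely to trip up a first attempt at the exercise.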
> ## Computing powers with loops {.challenge}
> ## Computing Powers With Loops
>
> Exponentiation is built into Python:
>
> ~~~ {.python}
> ~~~
> print(5 ** 3)
> ~~~
> ~~~ {.output}
> {: .python}
>
> ~~~
> 125
> ~~~
> {: .output}
>
> Write a loop that calculates the same result as `5 ** 3` using
> multiplication (and without exponentiation).
{: .challenge}
> ## Reverse a string {.challenge}
> ## Reverse a String
>
> Write a loop that takes a string,
> and produces a new string with the characters in reverse order,
> so `'Newton'` becomes `'notweN'`.
{: .challenge}
---
layout: page
title: Programming with Python
subtitle: Storing Multiple Values in Lists
minutes: 30
title: Storing Multiple Values in Lists
teaching: 30
exercises: 0
questions:
- "FIXME"
objectives:
- "Explain what a list is."
- "Create and index lists of simple values."
keypoints:
- "FIXME"
---
> ## Learning Objectives {.objectives}
>
> * Explain what a list is.
> * Create and index lists of simple values.
Just as a `for` loop is a way to do operations many times,
a list is a way to store many values.
......@@ -16,65 +18,74 @@ lists are built into the language (so we don't have to load a library
to use them).
We create a list by putting values inside square brackets:
~~~ {.python}
~~~
odds = [1, 3, 5, 7]
print('odds are:', odds)
~~~
{: .python}
~~~ {.output}
~~~
odds are: [1, 3, 5, 7]
~~~
{: .output}
We select individual elements from lists by indexing them:
~~~ {.python}
~~~
print('first and last:', odds[0], odds[-1])
~~~
{: .python}
~~~ {.output}
~~~
first and last: 1 7
~~~
{: .output}
and if we loop over a list,
the loop variable is assigned elements one at a time:
~~~ {.python}
~~~
for number in odds:
print(number)
~~~
{: .python}
~~~ {.output}
~~~
1
3
5
7
~~~
{: .output}
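Negative indices count back from the end of the list. A minimal sketch of how they line up with ordinary indices, reusing the `odds` list from above (the names `last_index`, `position`, and `negative` are illustrative):

```python
odds = [1, 3, 5, 7]

# odds[-1] is the last element, odds[-2] the one before it, and so on.
last_index = len(odds) - 1
print('odds[-1] is', odds[-1], 'and odds[len(odds) - 1] is', odds[last_index])

# Loop over positions to show each element next to its negative index.
for position in range(len(odds)):
    negative = position - len(odds)
    print(position, negative, odds[position], odds[negative])
```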
There is one important difference between lists and strings:
we can change the values in a list,
but we cannot change the characters in a string.
For example:
~~~ {.python}
~~~
names = ['Newton', 'Darwing', 'Turing'] # typo in Darwin's name
print('names is originally:', names)
names[1] = 'Darwin' # correct the name
print('final value of names:', names)
~~~
{: .python}
~~~ {.output}
~~~
names is originally: ['Newton', 'Darwing', 'Turing']
final value of names: ['Newton', 'Darwin', 'Turing']
~~~
{: .output}
works, but:
~~~ {.python}
~~~
name = 'Bell'
name[0] = 'b'
~~~
{: .python}
~~~ {.error}
~~~
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-220df48aeb2e> in <module>()
......@@ -83,10 +94,11 @@ TypeError Traceback (most recent call last)
TypeError: 'str' object does not support item assignment
~~~
{: .error}
does not.
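Because strings cannot be changed in place, a common workaround is to build a new string and assign it to the same name. The sketch below shows one way to do that. It uses slicing (`name[1:]`), which is covered later in this lesson, and a `try`/`except` block purely so the error can be shown without stopping the program.

```python
name = 'Bell'

try:
    name[0] = 'b'          # strings do not allow item assignment
except TypeError as error:
    print('could not modify in place:', error)

# Build a new string instead and rebind the name to it.
name = 'b' + name[1:]
print('new value of name:', name)
```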
> ## Ch-Ch-Ch-Changes {.callout}
> ## Ch-Ch-Ch-Changes
>
> Data which can be modified in place is called [mutable](reference.html#mutable),
> while data which cannot be modified is called [immutable](reference.html#immutable).
......@@ -106,134 +118,159 @@ does not.
> Because of pitfalls like this, code which modifies data in place can be more difficult to understand. However,
> it is often far more efficient to modify a large data structure in place than to create a modified copy for
> every small change. You should consider both of these aspects when writing your code.
{: .callout}
> ## Nested Lists {.callout}
> ## Nested Lists
> Since a list can contain any Python value, it can even contain other lists.
>
> For example, we could represent the products in the shelves of a small grocery shop:
>
> ~~~ {.python}
> ~~~
> x = [['pepper', 'zucchini', 'onion'],
> ['cabbage', 'lettuce', 'garlic'],
> ['apple', 'pear', 'banana']]
> ~~~
>
> {: .python}
>
> Here is a visual example of how indexing a list of lists `x` works:
>
> <a href='https://twitter.com/hadleywickham/status/643381054758363136'>
> ![The first element of a list. Adapted from @hadleywickham's tweet about R > lists.](img/indexing_lists_python.png)</a>
> ![The first element of a list. Adapted from @hadleywickham's tweet about R lists.]({{ site.github.url }}/fig/indexing_lists_python.png)</a>
>
> Using the previously declared list `x`, these would be the results of the
> index operations shown in the image:
>
> ~~~ {.python}
> ~~~
> print([x[0]])
> ~~~
> {: .python}
>
> ~~~ {.output}
> ~~~
> [['pepper', 'zucchini', 'onion']]
> ~~~
> {: .output}
>
> ~~~ {.python}
> ~~~
> print(x[0])
> ~~~
> {: .python}
>
> ~~~ {.output}
> ~~~
> ['pepper', 'zucchini', 'onion']
> ~~~
> {: .output}
>
> ~~~ {.python}
> ~~~
> print(x[0][0])
> ~~~
> {: .python}
>
> ~~~ {.output}
> ~~~
> pepper
> ~~~
> {: .output}
>
> Thanks to [Hadley Wickham](https://twitter.com/hadleywickham/status/643381054758363136)
> for the image above.
{: .callout}
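The double index can be read as "pick a shelf, then pick a product". A small sketch, reusing the list `x` from the callout above, that indexes one product and then loops over both levels (the names `shelf`, `product`, and `count` are illustrative):

```python
x = [['pepper', 'zucchini', 'onion'],
     ['cabbage', 'lettuce', 'garlic'],
     ['apple', 'pear', 'banana']]

# x[1] is the second shelf; x[1][2] is the third product on it.
print(x[1][2])

# An outer loop visits each shelf and an inner loop visits each product.
count = 0
for shelf in x:
    for product in shelf:
        count = count + 1
print('number of products:', count)
```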
There are many ways to change the contents of lists besides assigning new values to
individual elements:
~~~ {.python}
~~~
odds.append(11)
print('odds after adding a value:', odds)
~~~
~~~ {.output}
{: .python}
~~~
odds after adding a value: [1, 3, 5, 7, 11]
~~~
{: .output}
~~~ {.python}
~~~
del odds[0]
print('odds after removing the first element:', odds)
~~~
~~~ {.output}
{: .python}
~~~
odds after removing the first element: [3, 5, 7, 11]
~~~
{: .output}
~~~ {.python}
~~~
odds.reverse()
print('odds after reversing:', odds)
~~~
~~~ {.output}
{: .python}
~~~
odds after reversing: [11, 7, 5, 3]
~~~
{: .output}
While modifying in place, it is useful to remember that Python treats lists in a slightly counterintuitive way.
If we make a list and (attempt to) copy it then modify in place, we can cause all sorts of trouble:
~~~ {.python}
~~~
odds = [1, 3, 5, 7]
primes = odds
primes += [2]
print('primes:', primes)
print('odds:', odds)
~~~
~~~ {.output}
{: .python}
~~~
primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7, 2]
~~~
{: .output}
This is because Python stores a list in memory, and then can use multiple names to refer to the same list.
If all we want to do is copy a (simple) list, we can use the `list` function, so we do not modify a list we did not mean to:
~~~ {.python}
~~~
odds = [1, 3, 5, 7]
primes = list(odds)
primes += [2]
print('primes:', primes)
print('odds:', odds)
~~~
~~~ {.output}
{: .python}
~~~
primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7]
~~~
{: .output}
This is different from how variables worked in lesson 1, and more similar to how a spreadsheet works.
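One way to check whether two names refer to the same list is the `is` operator. The sketch below also suggests why the text says "(simple)" list: `list` copies only the outer list, so any inner lists are still shared. The variable names are illustrative.

```python
odds = [1, 3, 5, 7]
alias = odds               # a second name for the same list
duplicate = list(odds)     # a new list with the same elements

print('alias is odds:', alias is odds)
print('duplicate is odds:', duplicate is odds)
print('duplicate == odds:', duplicate == odds)

# list() makes a shallow copy: nested lists are shared, not duplicated.
nested = [[1, 3], [5, 7]]
shallow = list(nested)
shallow[0].append(99)
print('nested after changing shallow:', nested)
```

For nested data, the standard library's `copy.deepcopy` is the usual tool when a fully independent copy is needed.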
> ## Turn a string into a list {.challenge}
> ## Turn a String Into a List
>
> Use a for-loop to convert the string "hello" into a list of letters:
>
> ~~~ {.python}
> ~~~
> ["h", "e", "l", "l", "o"]
> ~~~
> {: .python}
>
> Hint: You can create an empty list like this:
>
> ~~~ {.python}
> ~~~
> my_list = []
> ~~~
> {: .python}
{: .challenge}
Subsets of lists and strings can be accessed by specifying ranges of values in brackets,
similar to how we accessed ranges of positions in a Numpy array.
This is commonly referred to as "slicing" the list/string.
~~~ {.python}
~~~
binomial_name = "Drosophila melanogaster"
group = binomial_name[0:10]
print("group:", group)
......@@ -248,30 +285,38 @@ print("autosomes:", autosomes)
last = chromosomes[-1]
print("last:", last)
~~~
~~~ {.output}
{: .python}
~~~
group: Drosophila
species: melanogaster
autosomes: ['2', '3', '4']
last: 4
~~~
{: .output}
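A slice `[start:stop]` includes the element at `start` and stops just before `stop`, so its length is `stop - start`. A brief sketch with a different string and list (the variable names are illustrative):

```python
element = 'oxygen'

middle = element[1:4]    # characters at positions 1, 2 and 3
print('middle:', middle)
print('length of slice:', len(middle))

# Slicing creates a new object; the original is unchanged.
print('element is still:', element)

# The same notation works on lists.
numbers = [10, 20, 30, 40, 50]
print('numbers[1:4]:', numbers[1:4])
```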
> ## Slicing from the end {.challenge}
> ## Slicing From the End
> Use slicing to access only the last four characters of a string or entries of a list.
>
> ~~~ {.python}
> ~~~
> string_for_slicing = "Observation date: 02-Feb-2013"
> list_for_slicing = [["fluorine", "F"], ["chlorine", "Cl"], ["bromine", "Br"], ["iodine", "I"], ["astatine", "At"]]
> ~~~
> ~~~ {.output}
> {: .python}
>
> ~~~
> "2013"
> [["chlorine", "Cl"], ["bromine", "Br"], ["iodine", "I"], ["astatine", "At"]]
> ~~~
> {: .output}
>
> Would your solution work regardless of whether you knew beforehand
> the length of the string or list
> (e.g. if you wanted to apply the solution to a set of lists of different lengths)?
> If not, try to change your approach to make it more robust.
{: .challenge}
> ## Non-continuous slices {.challenge}
> ## Non-Continuous Slices
> So far we've seen how to use slicing to take single blocks
> of successive entries from a sequence.
> But what if we want to take a subset of entries
......@@ -281,57 +326,70 @@ last: 4
> to the range within the brackets, called the _step size_.
> The example below shows how you can take every third entry in a list:
>
> ~~~ {.python}
> ~~~
> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
> subset = primes[0:12:3]
> print("subset", subset)
> ~~~