Commit c2bc3766 authored by thc

refactoring swc gits

MIT License
Copyright (c) 2018 swc-bb
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# SWC/DC workshops at Potsdam, Germany
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/swc-bb/4learners_python/master)
This repository contains materials for [Software Carpentry](https://software-carpentry.org/) and [Data Carpentry](http://www.datacarpentry.org/) workshops hosted at Potsdam, Germany.
The present branch of the repository relates to the **Python Novice workshop -- An Introduction to Scientific Computing and Reproducible Research** held at the [Potsdam Institute for Climate Impact Research (PIK)](https://www.pik-potsdam.de/) in Potsdam, Germany, on **22-23 February 2018**.
The homepage with a detailed schedule for this particular workshop is found [here](https://swc-bb.github.io/2018-02-22-Potsdam-Berlin/).
The etherpad we used during the workshop is found [here](http://pad.software-carpentry.org/2018-02-22-Potsdam-Berlin).
To re-run the workshop materials, we encourage you to use the [conda](https://conda.io/docs/) package manager. Once it is installed, create an environment with all required dependencies on your machine by typing
`conda env create -f environment.yml`
into your console. You activate your new environment by typing
`source activate python-workshop` (on Linux and Mac) or
`activate python-workshop` (on Windows).
Then you are ready to go (if you get stuck, check out the [conda documentation site](https://conda.io/docs/user-guide/tasks/manage-environments.html#)). Alternatively, you may launch [binder](https://binderhub.readthedocs.io/en/latest/) to get a reproducible, executable environment immediately in your browser. Simply click the _launch binder_ icon in the upper left corner.
If you want to get in touch with us, please send an email to swc-workshop-org@gfz-potsdam.de.
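After activating the environment, a quick way to confirm that the install worked is to check that the packages can be found. The snippet below is a minimal sketch and not part of the workshop material: the helper `missing_packages` is a hypothetical name, and the module list is an assumption derived from the dependencies in `environment.yml` (note that the `ipython` package is imported as `IPython`).

```python
import importlib.util

# Import names assumed from the dependencies listed in environment.yml.
WORKSHOP_MODULES = [
    "numpy", "pandas", "matplotlib", "IPython", "xarray",
    "cartopy", "geopandas", "folium", "geopy",
]


def missing_packages(modules):
    """Return the names from `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_packages(WORKSHOP_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All workshop packages were found.")
```

If anything is reported as missing, re-creating the environment from `environment.yml` is probably the simplest fix.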
_If you are here because you are looking for a particular Python workshop you attended in the past, make sure you visit the appropriate branch of this repository (note that the branches are ordered by date)._
***
The workshop focuses on three foundational tools for scientific computing and reproducible research:
* the shell
* git for version control
* scientific computing with Python
## the shell
In this workshop we closely follow the Software Carpentry lesson [The Unix Shell](https://swcarpentry.github.io/shell-novice/).
## git for version control
In this workshop we closely follow the Software Carpentry lesson [Version Control with Git](https://swcarpentry.github.io/git-novice/).
## scientific computing with Python
In this workshop we teach four main aspects of scientific computing with Python.
* 01 - Introduction to Python
* 02 - Functions and Code Structures
* 03 - Defensive Programming
* 04 - Exploratory Data Analysis
All data sets, all code snippets, all [Jupyter](http://jupyter.org/) notebooks and the `environment.yml` file for reproducibility are available through this self-contained repository.
The structure of this repository is outlined below:
4learners
│   .git                  # git internals
│   .gitignore            # specify files/folders to be ignored by git
└───data
│   │   ...               # find all the raw data files
└───figures
│   │   ...               # saved figures go here
└───notebooks
│   └───_img
│   │   │   ...           # rendered images are placed here
│   │   ...               # find all Jupyter notebooks here
│   README.md
│   environment.yml       # conda environment specifications for reproducibility
└───src
    │   ...               # here go the code snippets and scripts
    └───_solutions
        │   ...           # solutions for coding challenges (don't cheat yourself ;-))
_Note that we are currently working on lessons that present our curriculum in a more generic form, so check the [master branch](https://github.com/swc-bb/4learners_python) for updates from time to time._
***
name: python-workshop
channels:
- conda-forge
- defaults
dependencies:
- python=3.6
- numpy
- pandas
- matplotlib
- ipython
- xarray
- cartopy
- geopandas
- folium
- geopy
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Interpreting Errors and Exceptions\n",
"**Every programmer encounters errors**. Errors and exceptions can be very frustrating at times, and can make coding feel like a hopeless endeavour. However, knowing what the different types of errors are and when you are likely to encounter them can help a lot.\n",
"\n",
"Errors in Python are thrown in a very specific form, called a _**traceback**_. Let’s examine one:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This code has an intentional error. You can type it directly or\n",
"# use it for reference to understand the error message below.\n",
"def favorite_ice_cream():\n",
"    ice_creams = [\n",
"        \"chocolate\",\n",
"        \"vanilla\",\n",
"        \"strawberry\"\n",
"    ]\n",
"    print(ice_creams[3])\n",
"\n",
"favorite_ice_cream()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SyntaxError\n",
"When you forget a colon at the end of a line, accidentally add one space too many when indenting under an if statement, or forget a parenthesis, you will encounter a syntax error. This means that Python couldn’t figure out how to read your program."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def some_function()\n",
"    msg = \"hello, world!\"\n",
"    print(msg)\n",
"    return msg"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"favorite_fruits = [\n",
"    \"apples\",\n",
"    \"coconut\",\n",
"    \"banana\"\n",
"    42\n",
"]\n",
"print(favorite_fruits)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NameError\n",
"Another very common type of error is called a `NameError`, and occurs when one tries to use a variable that does \n",
"not exist. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another common cause is that you simply forgot to create the variable before using it. In the following example, `count` should have been defined (e.g., with `count = 0`) before the `for` loop: the loop body reads `count` on the right-hand side before any value has been assigned to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for number in range(10):\n",
"    count = count + number\n",
"print(\"The count is:\", count)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Count = 0\n",
"for number in range(10):\n",
"    count = count + number\n",
"print(\"The count is:\", count)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## IOError\n",
"These exceptions are raised when an input/output operation fails, for example when a file cannot be opened. A common one is the `FileNotFoundError`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"file_handle = open('sponge.bob')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading Exceptions\n",
"Sometimes an exception produces a **long traceback**. This happens when the error is raised several function calls deep in the call stack."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This code has an intentional error. Do not type it directly;\n",
"# use it for reference to understand the error message below.\n",
"def print_message(day):\n",
"    messages = {\n",
"        \"monday\": \"Hello, world!\",\n",
"        \"tuesday\": \"Today is tuesday!\",\n",
"        \"wednesday\": \"It is the middle of the week.\",\n",
"        \"thursday\": \"Today is Donnerstag in German!\",\n",
"        \"friday\": \"Last day of the week!\",\n",
"        \"saturday\": \"Hooray for the weekend!\",\n",
"        \"sunday\": \"Aw, the weekend is almost over.\"\n",
"    }\n",
"    print(messages[day])\n",
"\n",
"def print_friday_message():\n",
"    print_message(\"Friday\")\n",
"\n",
"print_friday_message()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Catch Exceptions!\n",
"Sometimes you **expect exceptions** to be thrown. Then it is useful to catch and handle those."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating `Exceptions`\n",
"Defining your own, meaningfully named exceptions can make code more expressive."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise: Create and catch your own Exception\n",
"Create and catch your own exception."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"# Key Points\n",
"* Tracebacks can look intimidating, but they give us a lot of useful information about what went wrong in our program, including where the error occurred and what type of error it was.\n",
"* An error having to do with the ‘grammar’ or syntax of the program is called a `SyntaxError`. If the issue has to do with how the code is indented, then it will be called an `IndentationError`.\n",
"* A `NameError` will occur if you use a variable that has not been defined, either because you meant to use quotes around a string, you forgot to define the variable, or you just made a typo.\n",
"* Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an `IndexError`.\n",
"* Trying to read a file that does not exist will give you a `FileNotFoundError`. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an `IOError`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
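The notebook above leaves the code cells under "Catch Exceptions!" and "Creating `Exceptions`" empty. A minimal sketch of what they might contain follows, reusing the `print_message` idea from the notebook. The `UnknownDayError` class, the `lookup_message` helper and the shortened `MESSAGES` table are hypothetical names introduced here for illustration; they are not part of the workshop material.

```python
class UnknownDayError(Exception):
    """Raised when a day name is not in the messages table (hypothetical)."""


# Shortened version of the dictionary used in the notebook.
MESSAGES = {
    "monday": "Hello, world!",
    "friday": "Last day of the week!",
}


def lookup_message(day):
    # Catch the built-in KeyError and re-raise it as a more
    # descriptive, domain-specific exception.
    try:
        return MESSAGES[day]
    except KeyError:
        raise UnknownDayError("no message for day {!r}".format(day)) from None


try:
    print(lookup_message("Friday"))  # wrong capitalisation on purpose
except UnknownDayError as error:
    print("Caught:", error)
```

Catching only the specific exception that is expected, rather than a bare `except:`, keeps unrelated errors visible instead of silently hiding them.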
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Assertions and Defensive Programming\n",
"The first step toward getting the right answers from our programs is to assume that mistakes will happen and to guard against them. This is called defensive programming, and the most common way to do it is to add assertions to our code so that it checks itself as it runs. **An assertion is simply a statement that something must be true at a certain point in a program.**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"ename": "AssertionError",
"evalue": "Data should only contain positive values",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-92fe4b03414b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mtotal\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mnumbers\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mn\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m0.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Data should only contain positive values'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mtotal\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'total is:'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtotal\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mAssertionError\u001b[0m: Data should only contain positive values"
]
}
],
"source": [
"numbers = [1.5, 2.3, 0.7, -0.001, 4.4]\n",
"total = 0.0\n",
"\n",
"# Sum all numbers\n",
"for n in numbers:\n",
"    assert n > 0.0, 'Data should only contain positive values'\n",
"    total += n\n",
"\n",
"print('Sum is:', total)"
]
},
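{
"cell_type": "markdown",
"metadata": {},
"source": [
"Programs like the Firefox browser are full of assertions: 10-20% of the code they contain is there to check that the other 80-90% is working correctly. Broadly speaking, assertions fall into three categories:\n",
"\n",
"* A **precondition** is something that must be true at the start of a function in order for it to work correctly.\n",
"* A **postcondition** is something that the function guarantees is true when it finishes.\n",
"* An **invariant** is something that is always true at a particular point inside a piece of code."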
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assertions in Applications\n",
"Now we create a function `scale_rectangle` and harden it with assertions to make it more robust against invalid input."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# missing the fourth coordinate"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# X axis inverted"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test-Driven Development\n",
"The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it’s given a particular input.\n",
"\n",
"### Example\n",
"Suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. The output is the largest range that they all include:\n",
"\n",
"![range](http://swcarpentry.github.io/python-novice-inflammation/fig/python-overlapping-ranges.svg)\n",
"\n",
"We will write the function `range_overlap`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"One can solve this problem like this:\n",
"\n",
"1. Write a function `range_overlap`.\n",
"2. Call it interactively on two or three different inputs.\n",
"3. If it produces the wrong answer, fix the function and re-run that test.\n",
"\n",
"This clearly works (after all, thousands of scientists are doing it right now), but there’s a better way:\n",
"\n",
"#### But: First we write the test!\n",
"\n",
"1. Write some tests.\n",
"2. Write a `range_overlap` function that should pass those tests.\n",
"3. If `range_overlap` produces any wrong answers, fix it and re-run the test functions.\n",
"\n",
"Writing the tests before writing the function they exercise is called test-driven development (TDD). Its advocates believe it produces better code faster because:\n",
"\n",
"* If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.\n",
"* Writing tests first helps programmers figure out what the function is actually supposed to do."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# write test_range_overlap()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"# write range_overlap()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# test_range_overlap()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first test that was supposed to produce `None` fails, so we know something is wrong with our function. We don’t know whether the other tests passed or failed because Python halted the program as soon as it spotted the first error.\n",
"\n",
"We realize that we’re initializing `lowest` and `highest` to 0.0 and 1.0 respectively, regardless of the input values. This violates another important rule of programming: **always initialize from data**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise:\n",
"Fix `range_overlap` and re-run `test_range_overlap`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"# Key Points\n",
"* **Program defensively**, i.e., assume that errors are going to arise, and write code to detect them when they do.\n",
"* **Put assertions in programs to check their state** as they run, and to help readers understand how those programs are supposed to work.\n",
"* Writing tests before writing code can be useful."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
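The test-driven workflow described in the notebook above can be sketched as follows. This is one possible solution to the exercise, not the workshop's reference solution: the specific test cases and the choice to treat ranges that only touch at an endpoint as non-overlapping are assumptions made here. The tests are written first, and `range_overlap` initializes `lowest` and `highest` from the data rather than from fixed constants.

```python
def test_range_overlap():
    # Tests come first: they pin down what the function should do.
    assert range_overlap([(0.0, 1.0)]) == (0.0, 1.0)
    assert range_overlap([(2.0, 3.0), (2.0, 4.0)]) == (2.0, 3.0)
    assert range_overlap([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]) == (0.0, 1.0)
    # Disjoint ranges have no overlap at all.
    assert range_overlap([(0.0, 1.0), (5.0, 6.0)]) is None
    # Assumption: ranges that only touch at one point do not overlap.
    assert range_overlap([(0.0, 1.0), (1.0, 2.0)]) is None


def range_overlap(ranges):
    """Return the common overlap of (low, high) pairs, or None if there is none."""
    assert len(ranges) > 0, "at least one range is required"
    # Initialize from data instead of from constants such as 0.0 and 1.0.
    lowest, highest = ranges[0]
    for low, high in ranges[1:]:
        lowest = max(lowest, low)
        highest = min(highest, high)
    if lowest >= highest:
        return None
    return (lowest, highest)


test_range_overlap()
print("All tests passed.")
```

Because each `assert` halts at the first failure, a test runner that reports every failing case separately may be more convenient once the number of tests grows.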
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Command-Line Programs\n",
"The **Jupyter Notebook and other interactive tools are great for prototyping code and exploring data**. But sooner or later we will want to use our program in a pipeline to process thousands of data files. In order to do that, we need to make our programs work like other Unix command-line tools.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Switching to Shell Commands\n",
"\n",
"In this lesson we are switching from typing commands in a Python interpreter to typing commands in a shell terminal window (such as bash). When you see a `$` in front of a command, that tells you to run that command in the shell rather than the Python interpreter."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"My command-line arguments: ['/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py', '-f', '/run/user/17802/jupyter/kernel-c828dcaa-1827-45c0-879c-975513e66be6.json']\n"
]
}
],
"source": [
"# %load ../src/pycli-1.py\n",
"#!/usr/bin/env python3\n",
"# valid for UNIX system\n",
"import sys\n",
"print('My command-line arguments:', sys.argv)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The strange name `argv` stands for “argument values”. Whenever Python runs a program, it takes all of the values given on the command line and puts them in the list `sys.argv` so that the program can determine what they were.\n",
"\n",
"### Implementation of a call guardian\n",
"\n",
"Wrapping the script's entry point in an `if __name__ == '__main__':` block is useful if your code **also acts as an importable module**: the guarded code runs only when the file is executed as a script, not when it is imported."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []