cells:
- markdown: |
# More plotting and elementary stats
## A quick exercise
metadata:
slideshow:
slide_type: slide
- code: |
# Start with this.
%pylab inline
id: 0
metadata:
slideshow:
slide_type: slide
- markdown: |
## Simple plotting
- Plot $x$, $-x$, $\sin(x)$ and $x \sin(x)$ for $x$ in the range $-5\pi$ to $5\pi$
- Add a legend
- Annotate the origin
- Set axes limits to the range of x
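
One possible shape for these steps is sketched below. It uses explicit imports rather than the `%pylab` namespace so it can run on its own. The annotation text, its offset, and reading "range of x" as the x-axis limits are all assumptions, not part of the exercise statement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5 * np.pi, 5 * np.pi, 500)

fig, ax = plt.subplots()
ax.plot(x, x, label="x")
ax.plot(x, -x, label="-x")
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, x * np.sin(x), label="x sin(x)")

ax.legend()
# Mark the origin with an arrow; the text position is an arbitrary choice.
ax.annotate("origin", xy=(0, 0), xytext=(5, -12),
            arrowprops=dict(arrowstyle="->"))
# Assumption: "range of x" means the x-axis limits.
ax.set_xlim(x.min(), x.max())
```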
metadata:
slideshow:
slide_type: slide
- code: |
# Solution
id: 1
metadata:
slideshow:
slide_type: slide
- markdown: |
## Mean, standard deviation, percentiles, ...
metadata:
slideshow:
slide_type: slide
- code: |
x = np.array([3, 2, 1, 4, 4, 5, 15, 24, 22, 25, 18, 32, 33])
- code: |
np.mean(x)
- code: |
np.median(x)
- markdown: |
## Variance, standard deviation and degrees of freedom
$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$
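
As a rough check of the formula, the sketch below evaluates it term by term on the sample array used in this notebook. The names `n`, `x_bar`, `s2` and `s` are illustrative only.

```python
import numpy as np

x = np.array([3, 2, 1, 4, 4, 5, 15, 24, 22, 25, 18, 32, 33])

n = len(x)
x_bar = x.sum() / n                        # sample mean
s2 = ((x - x_bar) ** 2).sum() / (n - 1)    # sample variance, n - 1 in the denominator
s = np.sqrt(s2)                            # sample standard deviation
```

The values should agree with `np.var(x, ddof=1)` and `np.std(x, ddof=1)` up to floating-point rounding.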
metadata:
slideshow:
slide_type: slide
- code: |
np.var(x, ddof=1)
- markdown: |
`ddof=1` ("delta degrees of freedom") puts $n-1$ in the denominator; the default, `ddof=0`, divides by $n$.
- code: |
np.std(x, ddof=1)
- markdown: |
## Percentiles
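
By default `np.percentile` interpolates linearly between neighbouring sorted values. The helper below, `percentile_linear`, is a hypothetical name used only to sketch that idea; it is not a NumPy function.

```python
import numpy as np

x = np.array([3, 2, 1, 4, 4, 5, 15, 24, 22, 25, 18, 32, 33])

def percentile_linear(values, q):
    """Percentile by linear interpolation between sorted neighbours."""
    s = np.sort(values)
    pos = q / 100 * (len(s) - 1)     # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

p34 = percentile_linear(x, 34)       # falls between the sorted values 4 and 5
```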
metadata:
slideshow:
slide_type: slide
- code: |
np.percentile(x, 34)
id: 3
- code: |
np.percentile(x, [25, 50, 75, 34])
id: 4
- code: |
sorted(x)
id: 5
metadata:
slideshow:
slide_type: slide
- markdown: |
## More statistics from `scipy.stats`
Use the `scipy.stats` module for more statistics-related functions.
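
To make two of these functions less of a black box, the sketch below computes the geometric and harmonic means directly from their textbook definitions. The `*_manual` names are illustrative, and both definitions assume strictly positive data.

```python
import numpy as np
import scipy.stats

x = np.array([3, 2, 1, 4, 4, 5, 15, 24, 22, 25, 18, 32, 33])

# Geometric mean: exponential of the mean of the logarithms.
gmean_manual = np.exp(np.mean(np.log(x)))
# Harmonic mean: reciprocal of the mean of the reciprocals.
hmean_manual = 1.0 / np.mean(1.0 / x)
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.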
metadata:
slideshow:
slide_type: slide
- code: |
import scipy.stats
- code: |
scipy.stats?
- code: |
scipy.stats.mode(x)
- code: |
scipy.stats.gmean(x)
- code: |
scipy.stats.hmean(x)
- markdown: |
## Load the given `data.txt` file
- Load the data file into NumPy arrays.
- Plot one column against the other using points.
- Find the mean and standard deviation of each column.
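
The sketch below shows what `unpack=True` does, using a small in-memory stand-in for a two-column file. The numbers are made up for illustration and are not the contents of `data.txt`.

```python
import io
import numpy as np

# Hypothetical stand-in for a two-column text file.
text = io.StringIO("1.0 2.0\n2.0 4.1\n3.0 5.9\n4.0 8.2\n")

# unpack=True transposes the result, so each column becomes its own array.
x, y = np.loadtxt(text, unpack=True)

x_mean, y_mean = x.mean(), y.mean()
x_std, y_std = x.std(ddof=1), y.std(ddof=1)   # sample standard deviations
```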
metadata:
slideshow:
slide_type: slide
- code: |
x, y = loadtxt('data/data.txt', unpack=True)
- code: |
# Find mean and standard deviation
- code: |
# Your plotting code here...
- markdown: |
## More plotting functions
We explore a few more plotting functions below: scatter plots, histograms and box plots.
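
A histogram is a count of values per bin, and a cumulative histogram is the running sum of those counts. The sketch below reproduces both with NumPy alone, on randomly generated data (an assumption; the notebook's `y` comes from the data file).

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=1000)           # hypothetical data for illustration

counts, edges = np.histogram(y, bins=10)   # 10 bins gives 11 bin edges
cumulative = np.cumsum(counts)             # running total of the counts
```

The last entry of `cumulative` equals the number of samples, which is the height of the final bar in a cumulative histogram.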
metadata:
slideshow:
slide_type: slide
- code: |
scatter(x, y, s=x, c=y);
id: 7
metadata:
slideshow:
slide_type: slide
- code: |
# Histogram
hist(y);
id: 8
metadata:
slideshow:
slide_type: slide
- code: |
hist(y, cumulative=True);
id: 9
- code: |
# Boxplot
boxplot(y, showmeans=True);
grid()
axis('tight');
id: 10
metadata:
slideshow:
slide_type: slide
- markdown: |
## Simple image processing
- Load the image.
- Show the image.
- Drop every alternate pixel to reduce the size of the image.
- Crop the picture to only show the baby penguin.
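
Downsampling and cropping are both plain array slicing. The sketch below uses an array of zeros as a hypothetical stand-in for the image, so only the shapes are meaningful; the crop indices mirror the ones used in this notebook.

```python
import numpy as np

# Hypothetical stand-in for an RGBA image: 400 rows, 300 columns, 4 channels.
img = np.zeros((400, 300, 4))

half = img[::2, ::2]         # keep every second row and every second column
crop = img[225:, 100:250]    # rows 225 onward, columns 100 through 249
green = img[:, :, 1]         # a single channel is a 2D array
```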
metadata:
slideshow:
slide_type: slide
- code: |
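# Load the image before using it; channel 3 is alpha in an RGBA image.
img = imread('images/penguins.png')
imshow(img[:,:,3], cmap='gray');
colorbar();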
id: 11
- code: |
img = imread('images/penguins.png')
figure(figsize=(10, 5))
subplot(2, 2, 1)
imshow(img)
subplot(2, 2, 2)
imshow(img[::2,::2])
subplot(2, 2, 3)
imshow(img[225:,100:250])
subplot(2, 2, 4)
imshow(img[:,:,1])
id: 12
metadata:
slideshow:
slide_type: slide
- markdown: |
Use a `for` loop to plot the three color channels of the image (r, g, b),
each in its own subplot.
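
One possible shape for such a loop is sketched below on a random array standing in for the image. The 1-by-3 layout and the grayscale colormap are assumptions; explicit imports are used so the block runs outside the `%pylab` namespace.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((40, 60, 4))    # hypothetical RGBA image

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
for channel, (ax, name) in enumerate(zip(axes, ["r", "g", "b"])):
    ax.imshow(img[:, :, channel], cmap="gray")   # one channel per subplot
    ax.set_title(name)
```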
metadata:
slideshow:
slide_type: slide
- code: |
# Solution.
- markdown: |
## Histogram of image pixel data
- Changing the dimensions of a numpy array.
- Making a 2D array into a 1D array.
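
The two operations can be sketched on a small array as follows. `ravel` returns a view when the memory layout allows it, whereas `flatten` always copies.

```python
import numpy as np

a = np.arange(9)
b = a.reshape(3, 3)    # 2D view of the same nine values
flat = b.ravel()       # back to 1D, without copying for a contiguous array
copy = b.flatten()     # 1D as well, but always a fresh copy
```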
metadata:
slideshow:
slide_type: slide
- code: |
a = np.arange(9)
a.shape = (3, 3)
a.ravel()
id: 13
- code: |
np.ravel(img[:, :, 0]).shape
id: 14
- code: |
# Histogram of one channel, flattened to 1D (channel 3 here)
hist(np.ravel(img[:,:,3]));
id: 15
metadata:
slideshow:
slide_type: slide
- code: |
# Putting it together
print(numpy.ravel(img[:,:,0]).shape)
for color in [0, 1, 2, 3]:
subplot(2, 2, color + 1)
hist(numpy.ravel(img[:,:,color]), bins=40, density=True, edgecolor='b');
id: 17
metadata:
slideshow:
slide_type: slide
- markdown: |
## Pie charts
| **Cancer** | Lung | Breast | Colon | Prostate | Melanoma | Bladder |
|-------------|------|--------|-------|----------|----------|---------|
| **Numbers** | 42 | 50 | 32 | 55 | 9 | 12 |
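
A pie chart draws each category as its share of the total. The sketch below computes those shares from the table; the `shares` dictionary is an illustrative name.

```python
cancer = ['Lung', 'Breast', 'Colon', 'Prostate', 'Melanoma', 'Bladder']
numbers = [42, 50, 32, 55, 9, 12]

total = sum(numbers)
# Percentage of the whole that each wedge represents.
shares = {name: 100 * n / total for name, n in zip(cancer, numbers)}
```

Passing `autopct='%1.1f%%'` to `pie` prints these percentages on the wedges.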
metadata:
slideshow:
slide_type: slide
- code: |
# Solution
cancer = ['Lung', 'Breast', 'Colon', 'Prostate', 'Melanoma', 'Bladder']
numbers = [42, 50, 32, 55, 9, 12]
pie(numbers, labels=cancer);
id: 18
metadata:
slideshow:
slide_type: slide
- markdown: |
## More statistics
- Covariance, correlation
- Linear regression
metadata:
slideshow:
slide_type: slide
- markdown: |
## Covariance/correlation
$$\operatorname{Cov}(X, Y) = E[(X-\mu_x)(Y - \mu_y)] = E[XY] - E[X]E[Y]$$
$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$$
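
The sketch below evaluates sample versions of these definitions on a small made-up data set. It divides by $n-1$, which is also the default in `np.cov`, so the expectations in the formulas are replaced by sample estimates.

```python
import numpy as np

# Hypothetical illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
# Sample covariance: mean product of deviations, with n - 1 in the denominator.
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
# Correlation: covariance scaled by the two standard deviations.
corr_xy = cov_xy / np.sqrt(x.var(ddof=1) * y.var(ddof=1))
```

These should match the off-diagonal entries of `np.cov(x, y)` and `np.corrcoef(x, y)`; the diagonal of the covariance matrix holds the two variances.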
metadata:
slideshow:
slide_type: slide
- code: |
# Loading some data
x, y = loadtxt('data/data.txt', unpack=True)
metadata:
slideshow:
slide_type: slide
- code: |
np.cov(x, y)
- code: |
np.corrcoef(x, y)
- markdown: |
## Trying with other data
- Find the correlation coefficient and covariance.
- Plot the data.
metadata:
slideshow:
slide_type: slide
- code: |
x1 = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y1 = np.array([30, 33, 53, 73, 78, 85, 91, 92, 100, 120])
- code: |
# Your solution.
- markdown: |
## Simple linear regression
- We have already seen how to do least-squares fits with NumPy.
- `scipy.stats.linregress` offers a simpler way.
- `np.polyfit` can also be used.
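
For a single predictor, the least-squares line has a closed form: the slope is the ratio of the summed cross-deviations to the summed squared deviations of $x$. The sketch below applies it to the `x1`, `y1` data from this notebook and compares it with `np.polyfit`; the other variable names are illustrative.

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y1 = np.array([30, 33, 53, 73, 78, 85, 91, 92, 100, 120])

dx = x1 - x1.mean()
slope = (dx * (y1 - y1.mean())).sum() / (dx ** 2).sum()
intercept = y1.mean() - slope * x1.mean()

# np.polyfit returns coefficients with the highest power first.
poly_slope, poly_intercept = np.polyfit(x1, y1, deg=1)
```

`scipy.stats.linregress(x1, y1)` should report the same slope and intercept, along with the correlation coefficient, p-value and standard error.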
metadata:
slideshow:
slide_type: slide
- code: |
scipy.stats.linregress(x1, y1)
- code: |
# The result behaves like a named tuple: fields are accessible by index or by name.
- code: |
res = scipy.stats.linregress(x1, y1)
metadata:
slideshow:
slide_type: slide
- code: |
res
- code: |
res[0]
- code: |
res.slope
- code: |
# Using np.polyfit
np.polyfit(x1, y1, deg=1)
metadata:
slideshow:
slide_type: slide
- code: |
np.polyfit?
- markdown: |
* Use `deg=1` for linear regression.
* Use `deg=2` for quadratic functions.
* ...
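
As a quick illustration of a higher degree, the sketch below fits a quadratic to noise-free points generated from a known polynomial, so the recovered coefficients can be checked. The data are made up for this example.

```python
import numpy as np

# Hypothetical noise-free data from y = 2x^2 - 3x + 1.
x = np.linspace(-2, 2, 9)
y = 2 * x**2 - 3 * x + 1

coeffs = np.polyfit(x, y, deg=2)   # highest power first: [a, b, c]
fitted = np.polyval(coeffs, x)     # evaluate the fitted polynomial at x
```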
# The lines below here may be deleted if you do not need them.
# ---------------------------------------------------------------------------
metadata:
celltoolbar: Slideshow
kernelspec:
display_name: Python 3
language: python
name: python3
language_info:
codemirror_mode:
name: ipython
version: 3
file_extension: .py
mimetype: text/x-python
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.6.0
rise:
scroll: true
transition: none
nbformat: 4
nbformat_minor: 2