cells: - markdown: | # More plotting and elementary stats ## A quick exercise metadata: slideshow: slide_type: slide - code: | # Start with this. %pylab inline id: 0 metadata: slideshow: slide_type: slide - markdown: | ## Simple plotting - Plot $x, -x, \sin(x), x \sin(x)$ in range $-5\pi$ to $5\pi$ - Add a legend - Annotate the origin - Set axes limits to the range of x Sample plot to reproduce metadata: slideshow: slide_type: slide - code: | # Solution id: 1 metadata: slideshow: slide_type: slide - markdown: | ## Mean, standard deviation, percentiles, ... metadata: slideshow: slide_type: slide - code: | x = np.array([3, 2, 1, 4, 4, 5, 15, 24, 22, 25, 18, 32, 33]) - code: | np.mean(x) - code: | np.median(x) - markdown: | ## Variance, standard deviation and degrees of freedom $$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$$ metadata: slideshow: slide_type: slide - code: | np.var(x, ddof=1) - markdown: | `ddof=1` corresponds to $n-1$ in the denominator. - code: | np.std(x, ddof=1) - markdown: | ## Percentiles metadata: slideshow: slide_type: slide - code: | np.percentile(x, 34) id: 3 - code: | np.percentile(x, [25, 50, 75, 34]) id: 4 - code: | sorted(x) id: 5 metadata: slideshow: slide_type: slide - markdown: | ## More statistics from `scipy.stats` Use the `scipy.stats` module for more stats related functions metadata: slideshow: slide_type: slide - code: | import scipy.stats - code: | scipy.stats? - code: | scipy.stats.mode(x) - code: | scipy.stats.gmean(x) - code: | scipy.stats.hmean(x) - markdown: | ## Load the given `data.txt` file - Load up the data file into numpy arrays. - Plot the two columns one vs the other with points. - Find the mean and standard deviation of the columns metadata: slideshow: slide_type: slide - code: | x, y = loadtxt('data/data.txt', unpack=True) - code: | # Find mean and standard deviation - code: | # Your plotting code here... - markdown: | ## More plotting functions We explore more plotting functions below metadata: slideshow: slide_type: slide - code: | scatter(x, y, s=x, c=y); id: 7 metadata: slideshow: slide_type: slide - code: | # Histogram hist(y); id: 8 metadata: slideshow: slide_type: slide - code: | hist(y, cumulative=True); id: 9 - code: | # Boxplot boxplot(y, showmeans=True); grid() axis('tight'); id: 10 metadata: slideshow: slide_type: slide - markdown: | ## Simple image processing - Load the image. - Show the image. - Drop every alternate pixel to reduce the size of the image - Crop the picture to only show the baby penguin. Sample plot metadata: slideshow: slide_type: slide - code: | imshow(img[:,:,3], cmap='gray'); colorbar(); id: 11 - code: | img = imread('images/penguins.png') figure(figsize=(10, 5)) subplot(2, 2, 1) imshow(img) subplot(2, 2, 2) imshow(img[::2,::2]) subplot(2, 2, 3) imshow(img[225:,100:250]) subplot(2, 2, 4) imshow(img[:,:,1]) id: 12 metadata: slideshow: slide_type: slide - markdown: | Use a for loop to plot the 3 channels of the image in a sub-plot, i.e. r, g, b color channels. metadata: slideshow: slide_type: slide - code: | # Solution. - markdown: | ## Histogram of image pixel data - Changing the dimensions of a numpy array. - Making a 2D array into a 1D array. metadata: slideshow: slide_type: slide - code: | a = np.arange(9) a.shape = (3, 3) a.ravel() id: 13 - code: | np.ravel(img[:, :, 0]).shape id: 14 - code: | # Doing more #print(numpy.ravel(img[:,:,0]).shape) hist(numpy.ravel(img[:,:,3])); id: 15 metadata: slideshow: slide_type: slide - code: | # Putting it together print(numpy.ravel(img[:,:,0]).shape) for color in [0, 1, 2, 3]: subplot(2, 2, color + 1) hist(numpy.ravel(img[:,:,color]), bins=40, density=True, edgecolor='b'); id: 17 metadata: slideshow: slide_type: slide - markdown: | ## Pie charts | **Cancer** | Lung | Breast | Colon | Prostate | Melanoma | Bladder | |-------------|------|--------|-------|----------|----------|---------| | **Numbers** | 42 | 50 | 32 | 55 | 9 | 12 | metadata: slideshow: slide_type: slide - code: | # Solution cancer = ['Lung', 'Breast', 'Colon', 'Prostate', 'Melanoma', 'Bladder'] numbers = [42, 50, 32, 55, 9, 12] pie(numbers, labels=cancer); id: 18 metadata: slideshow: slide_type: slide - markdown: | ## More statistics - Covariance, correlation - Linear regression metadata: slideshow: slide_type: slide - markdown: | ## Covariance/correlation $$Cov(X, Y) = E[(X-\mu_x)(Y - \mu_y)] = E[XY] - E[X]E[Y]$$ $$Corr(X, Y) = Cov(X, Y)/\sqrt{Var(X) Var(Y)}$$ metadata: slideshow: slide_type: slide - code: | # Loading some data x, y = loadtxt('data/data.txt', unpack=True) metadata: slideshow: slide_type: slide - code: | np.cov(x, y) - code: | np.corrcoef(x, y) - markdown: | ## Trying with other data - Find the correlation coefficient and covariance - Plot the data. metadata: slideshow: slide_type: slide - code: | x1 = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) y1 = np.array([30, 33, 53, 73, 78, 85, 91, 92, 100, 120]) - code: | # Your solution. - markdown: | ## Simple linear regression - Already seen how to do least-square fits with numpy - Simpler way with `scipy.stats.linregress` - Can also use `np.polyfit` metadata: slideshow: slide_type: slide - code: | scipy.stats.linregress(x1, y1) - code: | # Result is a named tuple (collections.namedtuple)! - code: | res = scipy.stats.linregress(x1, y1) metadata: slideshow: slide_type: slide - code: | res - code: | res[0] - code: | res.slope - code: | # Using np.polyfit np.polyfit(x1, y1, deg=1) metadata: slideshow: slide_type: slide - code: | np.polyfit? - markdown: | * Use `deg=1` for linear regression. * Use `deg=2` for quadratic functions. * ... # The lines below here may be deleted if you do not need them. # --------------------------------------------------------------------------- metadata: celltoolbar: Slideshow kernelspec: display_name: Python 3 language: python name: python3 language_info: codemirror_mode: name: ipython version: 3 file_extension: .py mimetype: text/x-python name: python nbconvert_exporter: python pygments_lexer: ipython3 version: 3.6.0 rise: scroll: true transition: none nbformat: 4 nbformat_minor: 2