author | Nishanth Amuluru | 2010-10-18 21:11:34 +0530 |
---|---|---|
committer | Nishanth Amuluru | 2010-10-18 21:11:34 +0530 |
commit | 5ff51d25821b168672247194a8ba9ce5e62d88f2 (patch) | |
tree | 1f41ee7de294b91e2673b96a9a2cffa433e8e139 /statistics | |
parent | e029f60f1a5254f12b32a62e9a00947dcfc16aa5 (diff) | |
parent | 53be576ae08a5ebe40d268f34b33229739396155 (diff) | |
Merged
Diffstat (limited to 'statistics')
-rw-r--r-- | statistics/quickref.tex | 8 |
-rw-r--r-- | statistics/script.rst | 174 |
-rw-r--r-- | statistics/slides.org | 33 |
-rw-r--r-- | statistics/slides.tex | 106 |
4 files changed, 321 insertions, 0 deletions
diff --git a/statistics/quickref.tex b/statistics/quickref.tex
new file mode 100644
index 0000000..b26d168
--- /dev/null
+++ b/statistics/quickref.tex
@@ -0,0 +1,8 @@
+Creating a linear array:\\
+{\ex \lstinline| x = linspace(0, 2*pi, 50)|}
+
+Plotting two variables:\\
+{\ex \lstinline| plot(x, sin(x))|}
+
+Plotting two lists of equal length x, y:\\
+{\ex \lstinline| plot(x, y)|}
diff --git a/statistics/script.rst b/statistics/script.rst
new file mode 100644
index 0000000..5398e21
--- /dev/null
+++ b/statistics/script.rst
@@ -0,0 +1,174 @@
+Hello friends and welcome to the tutorial on statistics using Python
+
+{{{ Show the slide containing title }}}
+
+{{{ Show the slide containing the outline slide }}}
+
+In this tutorial, we shall learn
+ * Doing simple statistical operations in Python
+ * Applying these to real world problems
+
+You will need IPython with pylab running on your computer
+to use this tutorial.
+
+Also you will need to know about loading data using loadtxt to be
+able to follow the real world application.
+
+We will first start with the most necessary statistical
+operation, i.e. finding the mean.
+
+We have a list of ages of a random group of people ::
+
+    age_list=[4,45,23,34,34,38,65,42,32,7]
+
+One way of getting the mean is to take the sum of
+all the elements and divide it by the length of the list::
+
+    sum_age_list =sum(age_list)
+
+The sum function gives us the sum of the elements::
+
+    mean_using_sum=float(sum_age_list)/len(age_list)
+
+This gives the mean age, but pylab provides a more direct
+way of getting the mean. This is the mean function::
+
+    mean(age_list)
+
+mean can be used in more ways in the case of two dimensional lists.
+Take a two dimensional list ::
+
+    two_dimension=[[1,5,6,8],[1,3,4,5]]
+
+The mean function used in the default manner gives the mean of the
+flattened sequence. Flattened sequence means the two lists taken
+as if they were a single list of elements ::
+
+    mean(two_dimension)
+    flattened_seq=[1,5,6,8,1,3,4,5]
+    mean(flattened_seq)
+
+As you can see, both the results are the same. The other way is the mean
+of each column::
+
+    mean(two_dimension,0)
+    array([ 1. , 4. , 5. , 6.5])
+
+We pass an extra argument 0 in that case.
+
+To get the mean along the rows, the argument is 1::
+
+    mean(two_dimension,1)
+    array([ 5. , 3.25])
+
+We can see more options of mean using ::
+
+    mean?
+
+Similarly we can calculate the median and standard deviation of a list
+using the functions median and std::
+
+    median(age_list)
+    std(age_list)
+
+Median and std can also be calculated for two dimensional arrays along columns and rows just like mean.
+
+ For example ::
+
+    median(two_dimension,0)
+    std(two_dimension,1)
+
+This gives us the median along the columns and standard deviation along the rows.
+
+Now let's apply this to a real world example.
+
+We will use a data file that is at the path
+``/home/fossee/sslc2.txt``. It contains records of students and their
+performance in one of the State Secondary Board Examinations. It has
+180,000 lines of records. We are going to read it and process this
+data. We can see the content of the file by double clicking on it. It
+might take some time to open since it is quite a large file. Please
+don't edit the data. This file has a particular structure.
+
+We can do ::
+
+    cat /home/fossee/sslc2.txt
+
+to check the contents of the file.
+
+Each line in the file is a set of 11 fields separated
+by semi-colons. Consider a sample line from this file.
+A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;
+
+The following are the fields in any given line.
+* Region Code which is 'A'
+* Roll Number 015163
+* Name JOSEPH RAJ S
+* Marks of 5 subjects: ** English 083 ** Hindi 042 ** Maths 47 **
+Science AA (Absent) ** Social 72
+* Total marks 244
+*
+
+Now let's try and find the mean of English marks of all students.
+
+For this we do ::
+
+    L=loadtxt('/home/fossee/sslc2.txt',usecols=(3,),delimiter=';')
+    L
+    mean(L)
+
+The loadtxt function loads data from an external file. delimiter specifies
+the character by which the fields of data are separated.
+usecols specifies the columns to be used, here (3,). The comma is added
+because usecols is a sequence.
+
+To get the median marks ::
+
+    median(L)
+
+Standard deviation ::
+
+    std(L)
+
+
+Now let's try and get the mean for all the subjects ::
+
+    L=loadtxt('/home/fossee/sslc2.txt',usecols=(3,4,5,6,7),delimiter=';')
+    mean(L,0)
+    array([ 73.55452504, 53.79828941, 62.83342759, 50.69806158, 63.17056881])
+
+As we can see from the result of mean(L,0), the resultant sequence
+is the mean marks of all students that took the exam, for each of the five subjects.
+
+and ::
+
+    mean(L,1)
+
+
+is the average of the marks of each individual student. Clearly, mean(L,0)
+was a column wise calculation while mean(L,1) was a row wise calculation.
+
+
+{{{ Show summary slide }}}
+
+This brings us to the end of the tutorial.
+We have learnt
+
+ * How to do the standard statistical operations sum, mean,
+   median and standard deviation in Python.
+ * Combine text loading and the statistical operation to solve
+   real world problems.
+
+{{{ Show the "sponsored by FOSSEE" slide }}}
+
+
+This tutorial was created as a part of the FOSSEE project, NME ICT, MHRD India
+
+Hope you have enjoyed it and found it useful.
+Thank you
+
+.. Author : Amit Sethi
+   Internal Reviewer 1 :
+   Internal Reviewer 2 :
+   External Reviewer :
+
diff --git a/statistics/slides.org b/statistics/slides.org
new file mode 100644
index 0000000..d4a5548
--- /dev/null
+++ b/statistics/slides.org
@@ -0,0 +1,33 @@
+#+LaTeX_CLASS: beamer
+#+LaTeX_CLASS_OPTIONS: [presentation]
+#+BEAMER_FRAME_LEVEL: 1
+
+#+BEAMER_HEADER_EXTRA: \usetheme{Warsaw}\useoutertheme{infolines}\usecolortheme{default}\setbeamercovered{transparent}
+#+COLUMNS: %45ITEM %10BEAMER_env(Env) %10BEAMER_envargs(Env Args) %4BEAMER_col(Col) %8BEAMER_extra(Extra)
+#+PROPERTY: BEAMER_col_ALL 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 :ETC
+#+OPTIONS: H:5 num:t toc:nil \n:nil @:t ::t |:t ^:t -:t f:t *:t <:t
+
+#+TITLE: Statistics
+#+AUTHOR: FOSSEE
+#+DATE: 2010-09-14 Tue
+#+EMAIL: info@fossee.in
+
+# \author[FOSSEE] {FOSSEE}
+
+# \institute[IIT Bombay] {Department of Aerospace Engineering\\IIT Bombay}
+# \date{}
+
+* Tutorial Plan
+** Doing simple statistical operations in Python
+** Using loadtxt to solve statistics problems
+
+* Summary
+** seq=[1,5,6,8,1,3,4,5]
+** sum(seq)
+** mean(seq)
+** median(seq)
+** std(seq)
+
+* Summary
+
+** loadtxt
diff --git a/statistics/slides.tex b/statistics/slides.tex
new file mode 100644
index 0000000..df1462c
--- /dev/null
+++ b/statistics/slides.tex
@@ -0,0 +1,106 @@
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%Tutorial slides on Python.
+%
+% Author: FOSSEE
+% Copyright (c) 2009, FOSSEE, IIT Bombay
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\documentclass[14pt,compress]{beamer}
+%\documentclass[draft]{beamer}
+%\documentclass[compress,handout]{beamer}
+%\usepackage{pgfpages}
+%\pgfpagesuselayout{2 on 1}[a4paper,border shrink=5mm]
+
+% Modified from: generic-ornate-15min-45min.de.tex
+\mode<presentation>
+{
+  \usetheme{Warsaw}
+  \useoutertheme{infolines}
+  \setbeamercovered{transparent}
+}
+
+\usepackage[english]{babel}
+\usepackage[latin1]{inputenc}
+%\usepackage{times}
+\usepackage[T1]{fontenc}
+
+\usepackage{ae,aecompl}
+\usepackage{mathpazo,courier,euler}
+\usepackage[scaled=.95]{helvet}
+
+\definecolor{darkgreen}{rgb}{0,0.5,0}
+
+\usepackage{listings}
+\lstset{language=Python,
+  basicstyle=\ttfamily\bfseries,
+  commentstyle=\color{red}\itshape,
+  stringstyle=\color{darkgreen},
+  showstringspaces=false,
+  keywordstyle=\color{blue}\bfseries}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Macros
+\setbeamercolor{emphbar}{bg=blue!20, fg=black}
+\newcommand{\emphbar}[1]
+{\begin{beamercolorbox}[rounded=true]{emphbar}
+  {#1}
+  \end{beamercolorbox}
+}
+\newcounter{time}
+\setcounter{time}{0}
+\newcommand{\inctime}[1]{\addtocounter{time}{#1}{\tiny \thetime\ m}}
+
+\newcommand{\typ}[1]{\lstinline{#1}}
+
+\newcommand{\kwrd}[1]{ \texttt{\textbf{\color{blue}{#1}}} }
+
+% Title page
+\title{Your Title Here}
+
+\author[FOSSEE] {FOSSEE}
+
+\institute[IIT Bombay] {Department of Aerospace Engineering\\IIT Bombay}
+\date{}
+
+% DOCUMENT STARTS
+\begin{document}
+
+\begin{frame}
+  \maketitle
+\end{frame}
+
+\begin{frame}[fragile]
+  \frametitle{Outline}
+  \begin{itemize}
+    \item
+  \end{itemize}
+\end{frame}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%% All other slides here. %%
+%% The same slides will be used in a classroom setting. %%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\begin{frame}[fragile]
+  \frametitle{Summary}
+  \begin{itemize}
+    \item
+  \end{itemize}
+\end{frame}
+
+\begin{frame}
+  \frametitle{Thank you!}
+  \begin{block}{}
+  \begin{center}
+  This spoken tutorial has been produced by the
+  \textcolor{blue}{FOSSEE} team, which is funded by the
+  \end{center}
+  \begin{center}
+    \textcolor{blue}{National Mission on Education through \\
+      Information \& Communication Technology \\
+      MHRD, Govt. of India}.
+  \end{center}
+  \end{block}
+\end{frame}
+
+\end{document}
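Illustrative sketch (not part of the commit): the operations that statistics/script.rst walks through can be reproduced without pylab, which may help when checking what the axis arguments 0 and 1 compute. The sketch below swaps in Python's standard library (statistics, csv) for pylab's mean, median, std and loadtxt. The two-record sample is an assumption standing in for /home/fossee/sslc2.txt: its first record is the sample line quoted in the script, its second record is invented for illustration.

```python
import csv
import io
import statistics

# Same data as in the tutorial script.
age_list = [4, 45, 23, 34, 34, 38, 65, 42, 32, 7]
two_dimension = [[1, 5, 6, 8], [1, 3, 4, 5]]

mean_age = statistics.mean(age_list)
median_age = statistics.median(age_list)
# pstdev is the population standard deviation, which matches the
# default behaviour of pylab's std() (no degrees-of-freedom correction).
std_age = statistics.pstdev(age_list)

# Counterpart of mean(two_dimension, 0): one mean per column.
column_means = [statistics.mean(col) for col in zip(*two_dimension)]
# Counterpart of mean(two_dimension, 1): one mean per row.
row_means = [statistics.mean(row) for row in two_dimension]

# Hypothetical stand-in for sslc2.txt. The first record is the sample
# line from the script; the second record is made up for this sketch.
sample = io.StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;\n"
    "A;000002;EXAMPLE STUDENT;070;050;60;40;80;300;;;\n"
)
rows = list(csv.reader(sample, delimiter=";"))

# Fields 3 to 7 hold the five subject marks, like usecols=(3,4,5,6,7).
marks = [[float(field) for field in row[3:8]] for row in rows]

# Counterpart of mean(L, 0): mean mark per subject across students.
subject_means = [statistics.mean(col) for col in zip(*marks)]
# Counterpart of mean(L, 1): mean mark per student across subjects.
student_means = [statistics.mean(row) for row in marks]

print(mean_age, median_age, std_age)
print(column_means, row_means)
print(subject_means, student_means)
```

With pylab available, the calls in the script itself are the more direct route; this version only makes the column-wise and row-wise groupings explicit through zip(*rows) and plain iteration over rows.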