cells:

- markdown: |
    # Introduction to Data Analysis with Python

    ### Prabhu Ramachandran
    ### The FOSSEE Python group &
    ### Department of Aerospace Engineering
    ### IIT Bombay

  metadata:
    slideshow:
      slide_type: slide

- markdown: |
    ## Introduction

    - A world of data!

    - Can we use data to drive decisions and form opinions?

  metadata:
    slideshow:
      slide_type: slide


- markdown: |
    ## Real data is not perfect

    - Partial information
    - Uncertainty
    - Errors
    <br/>

  metadata:
    slideshow:
      slide_type: subslide

- markdown: |

    - Important to check and clean data

  metadata:
    slideshow:
      slide_type: fragment

- markdown: |
    ## Statistical approach

    - Data collection

    <br/>

  metadata:
    slideshow:
      slide_type: subslide

- markdown: |

    - Visualization
    - Inference
    - Modeling
    - Prediction

  metadata:
    slideshow:
      slide_type: fragment

- markdown: |
    ## Importance of computers

    - Datasets are large
    - Easy to process on the computer
    - Simulation!

  metadata:
    slideshow:
      slide_type: subslide

- markdown: |
    ## This course

    - Uses Python for data analysis
    - Exposes you to the basic tools available
    - Does not teach you statistics!
    - Points out resources for learning statistics

  metadata:
    slideshow:
      slide_type: slide


- markdown: |
    ## Pre-requisites

    - Basic Python programming
    - `numpy`
    - Python 3.x with Jupyter, `scipy`, `matplotlib`, `pandas`, `statsmodels`

    - Mathematics (12th grade)
    - Introduction to statistics


  metadata:
    slideshow:
      slide_type: slide

- markdown: |
    ## Tools and Topics

    - Simple statistics with `numpy`
    - Statistical plots with `matplotlib`
    - Random variables with `scipy.stats`
    - Using `pandas` for data ingestion and analysis
    - Introduction to `statsmodels` for regression
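
    As a small taste of the first topic, here is a minimal sketch of simple
    statistics with `numpy`. The `scores` array is made-up data used only for
    illustration; it is not part of the course material.

    ```python
    import numpy as np

    # Hypothetical exam scores, invented purely for this illustration.
    scores = np.array([62, 75, 81, 90, 68, 77, 85])

    mean = scores.mean()        # arithmetic mean
    median = np.median(scores)  # middle value of the sorted data
    std = scores.std(ddof=1)    # sample standard deviation (divides by n - 1)

    print(f"mean={mean:.2f}, median={median:.1f}, std={std:.2f}")
    ```

    Passing `ddof=1` gives the sample standard deviation; the `numpy` default
    (`ddof=0`) gives the population value instead.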


  metadata:
    slideshow:
      slide_type: slide

- markdown: |
    ## Resources for learning

    - [Khan Academy Statistics and Probability](https://www.khanacademy.org/math/statistics-probability)
    - [A Concrete Introduction to Probability](http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability.ipynb) by Peter Norvig

    - [Penn State Stat 414 course](https://onlinecourses.science.psu.edu/stat414)

    - [Computational and Inferential Thinking](https://www.inferentialthinking.com/) by Ani Adhikari and John DeNero

    - [Think Stats, 2nd edition](http://greenteapress.com/wp/think-stats-2e/) by Allen B. Downey

  metadata:
    slideshow:
      slide_type: slide



- markdown: |
    ## Summary

    - Introduction to data analysis
    - Pre-requisites for this course
    - Tools covered
    - Resources for statistics and probability

  metadata:
    slideshow:
      slide_type: slide