Diffstat (limited to 'data_analysis/README.md')
-rw-r--r-- data_analysis/README.md | 67
1 files changed, 67 insertions, 0 deletions
diff --git a/data_analysis/README.md b/data_analysis/README.md
new file mode 100644
index 0000000..a00b95d
--- /dev/null
+++ b/data_analysis/README.md
@@ -0,0 +1,67 @@
+# Introduction to data analysis with Python
+
+This material covers a short course on using Python for data analysis.
+
+The material assumes that the student is aware of basic mathematics and
+statistics. While doing statistical analysis it always helps to know
+statistics fairly well. We will attempt to provide some links to freely
+available material that covers some of these basics.
+
+An excellent book on doing statistical analysis with Python is Allen Downey's
+Think Stats, which is freely available. The material is not a traditional
+approach to statistics but will get you thinking for sure.
+
+The emphasis of this course is to expose the student to the various libraries
+and tools available in Python so they can embark on their own data analysis.
+There is a lot of material already available. We will attempt to provide the
+attendees with links to some useful material.
+
+## Pre-requisites
+
+- Students should have completed the basic Python programming material.
+- One should have a Python 3.x installation with the following packages:
+  - IPython, scipy, matplotlib
+  - pandas, statsmodels
+- Use a reasonable editor; Canopy will work.
+- If one desires a more advanced editor, I suggest VS Code
+  (https://code.visualstudio.com/) which is free, open source, and very
+  powerful.
+- Knowledge of basic statistics.
+
+## Contents
+
+* Introduction
+
+* Simple statistics with `numpy`
+  * Basic stats functions, mean, std etc.
+  * Percentiles
+  * Random numbers: normal, random, choice, shuffle
+
+* Statistical plots
+  * hist
+  * boxplot
+  * scatter
+  * pie chart
+
+* Using `scipy.stats`
+  * pdf
+  * cdf
+  * rvs
+
+* Using `pandas`
+  * Quick introduction
+  * Categorical vs numerical data
+  * Data frames
+  * Basic operations
+  * String operations
+  * simple plots
+  * Groupby
+  * Pivot
+  * Maps
+  * pdvega
+
+* Using `statsmodels`
+  * regression
+  * anova
+
+* Seaborn?