Diffstat (limited to 'data_analysis/README.md')
-rw-r--r-- data_analysis/README.md | 67
1 file changed, 67 insertions, 0 deletions
diff --git a/data_analysis/README.md b/data_analysis/README.md
new file mode 100644
index 0000000..a00b95d
--- /dev/null
+++ b/data_analysis/README.md
@@ -0,0 +1,67 @@
+# Introduction to data analysis with Python
+
+This material covers a short course on using Python for data analysis.
+
+The material assumes that the student is familiar with basic mathematics and
+statistics; a solid grounding in statistics always helps when doing
+statistical analysis. We will attempt to provide links to freely available
+material that covers some of these basics.
+
+An excellent book on doing statistical analysis with Python is Allen Downey's
+*Think Stats*, which is freely available. It does not take a traditional
+approach to statistics, but it will certainly get you thinking.
+
+The emphasis of this course is on exposing the student to the various libraries
+and tools available in Python so they can embark on their own data analysis.
+A lot of material is already available, and we will attempt to provide
+attendees with links to some of the most useful.
+
+## Pre-requisites
+
+- Students should have completed the basic Python programming material.
+- A Python 3.x installation with the following packages is required:
+  - IPython, scipy, matplotlib
+  - pandas, statsmodels
+- Use a reasonable editor; Canopy will work.
+- If you want a more advanced editor, I suggest VS Code
+  (https://code.visualstudio.com/), which is free, open source, and very
+  powerful.
+- Knowledge of basic statistics.
+
+## Contents
+
+* Introduction
+
+* Simple statistics with `numpy`
+  * Basic stats functions: mean, std, etc.
+  * Percentiles
+  * Random numbers: normal, random, choice, shuffle
+
+* Statistical plots
+  * hist
+  * boxplot
+  * scatter
+  * pie chart
+
+* Using `scipy.stats`
+  * pdf
+  * cdf
+  * rvs
+
+* Using `pandas`
+  * Quick introduction
+  * Categorical vs numerical data
+  * Data frames
+  * Basic operations
+  * String operations
+  * Simple plots
+  * Groupby
+  * Pivot
+  * Maps
+  * pdvega
+
+* Using `statsmodels`
+  * Regression
+  * ANOVA
+
+* Seaborn?