# Introduction to data analysis with Python

This material covers a short course on using Python for data analysis. It assumes that the student is familiar with basic mathematics and statistics; when doing statistical analysis, it always helps to know statistics fairly well. We will attempt to provide links to freely available material that covers some of these basics.

An excellent book on doing statistical analysis with Python is Allen Downey's *Think Stats*, which is freely available. It does not take a traditional approach to statistics, but it will certainly get you thinking.

The emphasis of this course is to expose the student to the various libraries and tools available in Python so they can embark on their own data analysis. A lot of material is already available, and we will attempt to point attendees to some of the more useful resources.

## Pre-requisites

- Students should have completed the basic Python programming material.
- One should have a Python 3.x installation with the following packages:
    - IPython, scipy, matplotlib
    - pandas, statsmodels
- If one desires a more advanced editor, I suggest VS Code (https://code.visualstudio.com/) which is free, open source, and very powerful.
- Knowledge of basic statistics.

## Contents

* Introduction
* Simple statistics with `numpy`
    * Basic stats functions, mean, std etc.
    * Percentiles
    * Random numbers: normal, random, choice, shuffle
* Statistical plots
    * hist
    * boxplot
    * scatter
    * pie chart
* Using `scipy.stats`
    * pdf
    * cdf
    * rvs
* Using `pandas`
    * Quick introduction
    * Categorical vs numerical data
    * Data frames
    * Basic operations
    * String operations
    * Simple plots
    * Groupby
    * Pivot
    * Maps
    * pdvega
* Using `statsmodels`
    * regression
    * anova
* Seaborn?
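
To confirm that the packages listed under Pre-requisites are available before starting, something like the following minimal sketch may help. It is not part of the course material: the helper name `installed_versions` and the `REQUIRED` list are illustrative assumptions, and it relies on `importlib.metadata` from the standard library (Python 3.8 or newer).

```python
from importlib import metadata

# Distribution names taken from the pre-requisites list (assumed PyPI names).
REQUIRED = ["ipython", "scipy", "matplotlib", "pandas", "statsmodels"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so missing ones are easy to spot.
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing can usually be installed with `pip` or `conda`, depending on how Python was set up.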