Intro to Data Analysis in Python
PyLadies Vancouver Workshop
Skill Level: Beginner
A little bit of previous experience with Python or another coding language would be helpful, but not required!
Description
In this workshop, you will develop skills with powerful data analysis tools from Python’s rich ecosystem of libraries. If you’re wrestling with spreadsheets on a regular basis and want to find better ways to analyze and visualize your data, handle messy and missing data, and automate repetitive tasks, this workshop is for you. If you’re coming from another data analysis environment (R, MATLAB, etc.) or you’re a Python enthusiast who is curious about Python’s data analysis capabilities, this workshop is also for you!
Working with real-world data and the pandas library, you’ll learn how to load data from a comma-separated values (CSV) file, quickly summarize it from many different angles, and visualize it in graphs, all with just a few lines of code. You’ll also learn how to dive into the data for a deeper analysis with techniques such as subsets, filters, text processing, and aggregation.
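To give a feel for what "a few lines of code" can look like, here is a minimal sketch of loading, summarizing, filtering, and aggregating CSV data with pandas. The column names and values are made up for illustration and are not the workshop's dataset; the CSV text is read from a string so the example runs on its own, whereas in practice you would pass a file path to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical CSV contents, invented for this example.
csv_text = """species,site,weight
finch,A,12.5
finch,B,13.1
sparrow,A,24.0
sparrow,B,
"""

# Load the data; the empty weight in the last row becomes a missing value (NaN).
df = pd.read_csv(io.StringIO(csv_text))

# Quick numeric summary (count, mean, min, max, quartiles).
summary = df.describe()

# Filter: keep only the rows where weight is above 13.
heavy = df[df["weight"] > 13]

# Aggregate: mean weight per species (missing values are skipped).
mean_by_species = df.groupby("species")["weight"].mean()

print(summary)
print(heavy)
print(mean_by_species)
```

With matplotlib installed, a call such as `mean_by_species.plot(kind="bar")` would turn the aggregated result into a bar chart.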
Setup
You’ll want to bring your laptop for lots of hands-on practice as we work through the lessons and exercises. We’ll be using Python 3.6, Jupyter Lab, and pandas. I highly recommend using Anaconda to set up your environment, especially if you’re new to Python and/or data analysis is your main reason for using Python.
Please download and install the required software on your laptop before the workshop (see the setup instructions).
Agenda
- Navigating the Python world as a data geek
- Jupyter Lab orientation + quick recap of Python basics
- Working with spreadsheet data
- Reading and summarizing CSV files
- Basic calculations and graphs
- Text data and messy / missing data
- Sorting, aggregation, and subsets
- Data visualization: a brief tour of the Python landscape
- Next steps, ideas, and inspiration
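As a rough preview of the agenda items on text data, messy / missing data, sorting, and aggregation, the sketch below cleans a small table and totals it by group. The data, column names, and the choice to fill missing counts with 0 are assumptions made for this example only, not the workshop's actual exercises.

```python
import pandas as pd

# Hypothetical messy data: inconsistent spacing and capitalization, plus gaps.
df = pd.DataFrame({
    "city": [" Vancouver", "vancouver ", "Burnaby", None],
    "visits": [10, None, 7, 3],
})

# Text cleanup: strip stray whitespace and standardize capitalization.
df["city"] = df["city"].str.strip().str.title()

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Drop rows with no city, then treat missing visit counts as 0 (an assumption).
cleaned = df.dropna(subset=["city"]).fillna({"visits": 0})

# Aggregate visits per city and sort from largest to smallest.
totals = cleaned.groupby("city")["visits"].sum().sort_values(ascending=False)

print(missing_per_column)
print(totals)
```

Whether to drop or fill missing values depends on the data and the question being asked; filling with 0 here is just one option.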
Credits: Some portions of this workshop are adapted from or inspired by the fantastic instructional materials in Data Insights with Python for Beginners (Copyright © Ladies Learning Code) and Python for Ecologists (Copyright © Software Carpentry), both made available under the Creative Commons Attribution license.