Course Overview

Intermediate Python for Data Science covers the essentials of using Python for exploratory data analysis, complex visualizations, and large-scale distributed processing on “Big Data”. The course covers essential mathematical and statistical libraries such as NumPy, pandas, SciPy, scikit-learn, and TensorFlow, as well as visualization tools such as Matplotlib, PIL, and seaborn. The course is ‘intermediate level’ because it assumes that attendees have a solid background in data analytics and data science, plus basic Python knowledge. Topics are introductory in nature but are covered in depth and are geared for experienced students.

Key Learning Areas

This course is approximately 50% hands-on, combining expert lectures, real-world demonstrations, and group discussions with machine-based practical labs and exercises. Our engaging instructors and mentors are highly experienced practitioners who bring years of current "on-the-job" experience into every classroom.

Working in a hands-on learning environment, guided by our expert team, attendees will learn:

  • How to work with Python in a data science context
  • How to use NumPy, pandas, and Matplotlib
  • How to create and process images with PIL
  • How to visualize data with seaborn
  • Key features of SciPy and scikit-learn

Course Outline

Python Quick Refresher

  • Python Language
  • Essential Syntax
  • Lists, Sets, Dictionaries, and Comprehensions
  • Functions
  • Classes, Modules, and imports
  • Exceptions

IPython

  • IPython basics
  • Terminal and GUI shells
  • Creating and using notebooks
  • Saving and loading notebooks
  • Ad hoc data visualization
  • Web Notebooks (Jupyter)

numpy

  • numpy basics
  • Creating arrays
  • Indexing and slicing
  • Large number sets
  • Transforming data
  • Advanced tricks

scipy

  • What can scipy do?
  • Most useful functions
  • Curve fitting
  • Modeling
  • Data visualization
  • Statistics

A Tour of scipy subpackages

  • Clustering
  • Physical and mathematical Constants
  • FFTs
  • Integral and differential solvers
  • Interpolation and smoothing
  • Input and Output
  • Linear Algebra
  • Image Processing
  • Orthogonal Distance Regression
  • Root-finding
  • Signal Processing
  • Sparse Matrices
  • Spatial data and algorithms
  • Statistical distributions and functions
  • C/C++ Integration

pandas

  • pandas overview
  • Dataframes
  • Reading and writing data
  • Data alignment and reshaping
  • Fancy indexing and slicing
  • Merging and joining data sets

matplotlib

  • Creating a basic plot
  • Commonly used plots
  • Ad hoc data visualization
  • Advanced usage
  • Exporting images

The Python Imaging Library (PIL)

  • PIL overview
  • Core image library
  • Image processing
  • Displaying images

seaborn

  • seaborn overview
  • Bivariate and univariate plots
  • Visualizing Linear Regressions
  • Visualizing Data Matrices
  • Working with Time Series data

scikit-learn Machine Learning Essentials

  • SciKit overview
  • scikit-learn overview
  • Algorithms Overview
  • Classification, Regression, Clustering, and Dimensionality Reduction
  • SciKit Demo

Optional: Working with TensorFlow

  • TensorFlow overview
  • Keras
  • Getting Started with TensorFlow

Who Benefits

This course is geared for experienced data analysts, developers, engineers, or anyone who needs to use Python for data analytics.

Prerequisites

Attendees are required to have basic Python development skills.

Take Before: Students should have attended the following courses or have equivalent skills:

  • Python Primer for Data Science
  • Applied Python for Data Science