Accelerating Python Pandas Workflows with Modin Training Course

Introduction

  • Modin vs Dask vs Ray
  • Overview of Modin features and architecture
  • Pandas fundamentals

Getting Started

  • Installing Modin
  • Importing Pandas from Modin
  • Defaulting to Pandas in Modin
  • Supported APIs
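
As a rough illustration of the "Importing Pandas from Modin" topic, the sketch below swaps the import and leaves the rest of the code unchanged. The fallback to plain pandas and the `backend` variable are assumptions added here so the snippet still runs where Modin is not installed; they are not part of the course material.

```python
# Try Modin's drop-in module first; fall back to plain pandas if it is absent.
# The fallback is an assumption made for portability of this sketch.
try:
    import modin.pandas as pd
    backend = "modin"
except ImportError:
    import pandas as pd
    backend = "pandas"

# The same DataFrame code is written once, whichever backend was imported.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
total = int(df["a"].sum() + df["b"].sum())

print(f"backend={backend}, total={total}")
```

Only the import line differs between the two backends, which is the property the drop-in approach relies on.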

Managing Pandas workflows using Modin

  • Using Modin on a single node
  • Using Modin on a cluster
  • Connecting to a database (read_sql)
  • Optimizing resources for Modin

Interacting with Datasets

  • Reading data, dropping columns, and finding values
  • Executing advanced Pandas operations
  • Common issues and examples
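
The first topic above (reading data, dropping columns, finding values) might look like the following minimal sketch. It uses plain pandas with an in-memory CSV so that it runs anywhere; the column names and values are made up for illustration, and replacing the import with `import modin.pandas as pd` is the intended swap when Modin is available.

```python
import io

import pandas as pd  # assumption: swap for `import modin.pandas as pd` under Modin

# Hypothetical sample data, kept in memory so no file path is needed.
csv_text = "city,temp_c,station_id\nOslo,4,A1\nCairo,31,B2\nLima,19,C3\n"

# Reading data
df = pd.read_csv(io.StringIO(csv_text))

# Dropping a column
trimmed = df.drop(columns=["station_id"])

# Finding values with a boolean mask
warm = trimmed[trimmed["temp_c"] > 15]
warm_cities = warm["city"].tolist()

print(warm_cities)
```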

Troubleshooting

Summary and Next Steps

Scaling Data Analysis with Python and Dask Training Course

Introduction

  • Overview of Dask features and advantages
  • Parallel computing in Python

Getting Started

  • Installing Dask
  • Dask libraries, components, and APIs
  • Best practices and tips

Scaling NumPy, SciPy, and Pandas

  • Dask array examples and use cases
  • Chunks and blocked algorithms
  • Overlapping computations
  • SciPy stats and LinearOperator
  • NumPy slicing and assignment
  • DataFrames and Pandas
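
The idea behind "Chunks and blocked algorithms" can be sketched without Dask at all: split the data into chunks, compute a partial result per chunk, then combine the partials. The helper names `iter_chunks` and `blocked_sum` are hypothetical and this is a pure-Python toy, not Dask's implementation; in Dask the per-chunk work would be scheduled as separate tasks.

```python
def iter_chunks(values, chunk_size):
    """Yield consecutive slices of `values`, each at most `chunk_size` long."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def blocked_sum(values, chunk_size):
    """Sum `values` by summing each chunk, then summing the partial results."""
    partials = [sum(chunk) for chunk in iter_chunks(values, chunk_size)]
    return sum(partials)


data = list(range(10))
chunks = list(iter_chunks(data, 4))
total = blocked_sum(data, 4)

print(chunks)
print(total)
```

Because each partial sum depends only on its own chunk, the per-chunk step is the part that a parallel scheduler can run independently.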

Dask Internals and Graphical UI

  • Supported interfaces
  • Scheduler and diagnostics
  • Analyzing performance
  • Graph computation
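
For the "Graph computation" topic, Dask documents a task graph as a dictionary that maps keys to either literal values or tuples of the form `(function, arg1, arg2, ...)`. The sketch below builds such a dictionary and evaluates it with a toy recursive function. The `get` and `_evaluate` helpers are hypothetical stand-ins written for this outline, restricted to string keys, and are not Dask's scheduler.

```python
from operator import add, mul


def get(graph, key, cache=None):
    """Compute the value of `key`, evaluating its dependencies first."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _evaluate(graph, graph[key], cache)
    return cache[key]


def _evaluate(graph, node, cache):
    # A task is a tuple whose first element is callable.
    if isinstance(node, tuple) and node and callable(node[0]):
        func, *args = node
        return func(*(_evaluate(graph, arg, cache) for arg in args))
    # A string that names another key is a dependency (toy restriction).
    if isinstance(node, str) and node in graph:
        return get(graph, node, cache)
    # Anything else is treated as a literal value.
    return node


graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),   # z = x + y
    "w": (mul, "z", 10),    # w = z * 10
}

result = get(graph, "w")
print(result)
```

The cache ensures each key is computed once even if several tasks depend on it, which mirrors why a graph representation is useful for sharing intermediate results.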

Optimizing and Deploying Dask

  • Setting up adaptive deployments
  • Connecting to remote data
  • Debugging parallel programs
  • Deploying Dask clusters
  • Working with GPUs
  • Deploying Dask on cloud environments

Troubleshooting

Summary and Next Steps