Introduction
- Modin vs Dask vs Ray
- Overview of Modin features and architecture
- Pandas fundamentals
Getting Started
- Installing Modin
- Importing Pandas from Modin
- Defaulting to Pandas in Modin
- Supported APIs
Managing Pandas Workflows Using Modin
- Using Modin on a single node
- Using Modin on a cluster
- Connecting to a database (read_sql)
- Optimizing resources for Modin
Interacting with Datasets
- Reading data, dropping columns, and finding values
- Executing advanced Pandas operations
- Common issues and examples
Troubleshooting
Summary and Next Steps
Introduction
- Overview of Dask features and advantages
- Parallel computing in Python
Getting Started
- Installing Dask
- Dask libraries, components, and APIs
- Best practices and tips
Scaling NumPy, SciPy, and Pandas
- Dask arrays examples and use cases
- Chunks and blocked algorithms
- Overlapping computations
- SciPy stats and LinearOperator
- NumPy slicing and assignment
- DataFrames and Pandas
Dask Internals and Graphical UI
- Supported interfaces
- Scheduler and diagnostics
- Analyzing performance
- Graph computation
Optimizing and Deploying Dask
- Setting up adaptive deployments
- Connecting to remote data
- Debugging parallel programs
- Deploying Dask clusters
- Working with GPUs
- Deploying Dask on cloud environments
Troubleshooting
Summary and Next Steps