Course curriculum

    1. Introduction

    2. Imports & Loading the Dataset

    3. Defining the Model

    4. Train Loop Per Worker

    5. Defining the Training Loop Configuration

    6. Configuring the Scaling Config

    7. Defining the Model Wrapper

    8. Building the Dataloader

    9. Reporting Metrics and Checkpointing

    10. Persistent Storage

    11. Putting It All Together with TorchTrainer

    12. Inspecting the Training Results

    13. Inference with Your Trained Model

    14. Full Chapter Notebook

    1. Introduction

    2. Train Loop Using Ray Data

    3. Building a Ray Data-Backed Dataloader

    4. Preparing and Loading the Dataset for Ray Data

    5. Transformations with Ray Data

    6. Configuring TorchTrainer and Launching Training

    7. Full Chapter Notebook

    1. Introduction

    2. Checkpoint Loading for Fault Tolerance

    3. Saving Fault Tolerant Checkpoints

    4. Launching Fault Tolerant Training

    5. Manually Restoring from Checkpoints

    6. Cleaning Up Cluster Storage and Conclusion

    7. Concluding the Intro Tutorials and Next Steps

    8. Full Chapter Notebook

About this course

  • Free
  • 29 lessons
  • 1 hour of video content