Abstract: As the deep learning model grows larger, training model with a single computational resource becomes impractical. To solve this, hybrid parallelism, which combines data and pipeline ...