[Event] [2023.04.27 (Thu.)] Artificial Intelligence & AI Convergence Network Colloquium
- College of Software Convergence, Academic Affairs Team
- 김성민
- Create Date 2023-04-19
- Views 494
< Artificial Intelligence & AI Convergence Network Colloquium >
acceleration. While they improve performance, GPUs remain underutilized during training. This paper proposes
out-of-order (ooo) back-prop, an effective scheduling technique for neural network training. By exploiting the
dependencies of gradient computations, ooo backprop makes it possible to reorder their executions to make the
most of the GPU resources. We show that GPU utilization in both single- and multi-GPU training can be
improved by applying ooo backprop and prioritizing critical operations. We propose three scheduling
algorithms based on ooo backprop. For single-GPU training, we schedule with multi-stream ooo computation
to mask the kernel launch overhead. In data-parallel training, we reorder the gradient computations to
maximize the overlap of computation and parameter communication; in pipeline-parallel training, we
prioritize critical gradient computations to reduce pipeline stalls. We evaluate our optimizations with twelve
neural networks and five public datasets. Compared to the respective state-of-the-art training systems, our
algorithms improve training throughput by 1.03--1.58× for single-GPU training, by 1.10--1.27× for
data-parallel training, and by 1.41--1.99× for pipeline-parallel training.
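The key dependency insight behind the abstract can be sketched in plain Python: in backprop, the output gradient of layer *i* depends only on the output gradient of layer *i+1*, while the weight gradient of layer *i* depends only on the output gradient of that same layer. So the output-gradient chain is the critical path, and weight-gradient computations can be deferred and overlapped with, e.g., parameter communication. The task names and toy scheduler below are illustrative assumptions, not the paper's actual implementation.

```python
# Toy model of backprop task scheduling (no GPU): tasks are (kind, layer) pairs.
# "out_grad" = output-gradient computation, "w_grad" = weight-gradient computation.

def default_schedule(num_layers):
    """Ordinary backprop: for each layer (last to first), compute the
    output gradient, then immediately the weight gradient."""
    order = []
    for i in reversed(range(num_layers)):
        order.append(("out_grad", i))
        order.append(("w_grad", i))
    return order

def ooo_schedule(num_layers):
    """Out-of-order backprop (sketch): run the critical out_grad chain
    first, then the deferred w_grad tasks, which a real system could
    overlap with gradient communication or use to fill idle GPU slots."""
    critical = [("out_grad", i) for i in reversed(range(num_layers))]
    deferred = [("w_grad", i) for i in reversed(range(num_layers))]
    return critical + deferred

def check_dependencies(order, num_layers):
    """Verify a schedule respects backprop's true data dependencies:
    out_grad(i) needs out_grad(i+1); w_grad(i) needs out_grad(i)."""
    done = set()
    for kind, i in order:
        if kind == "out_grad" and i + 1 < num_layers:
            assert ("out_grad", i + 1) in done
        if kind == "w_grad":
            assert ("out_grad", i) in done
        done.add((kind, i))
    return True

print(check_dependencies(ooo_schedule(4), 4))  # → True: reordering is legal
```

Both schedules satisfy the same dependencies; the reordered one simply exposes more freedom for the scheduler, which is what the paper's three algorithms exploit in the single-GPU, data-parallel, and pipeline-parallel settings.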