About the author
Young-Jun Ko
Young-Jun is an AI DevTech Engineer at NVIDIA, currently working on accelerating NLP inference workloads on GPUs. Previously, he worked on HPC and AI and contributed to the RAPIDS open-source project. Before joining NVIDIA, Young-Jun received a PhD in computer science from EPFL and worked as a machine learning engineer at an adtech startup.
Post by Young-Jun Ko
Figure 6. Compute latency
By Purnendu Mukherjee, Eddie Weill, Rohit Taneja, Davide Onofrio, Young-Jun Ko and Siddharth Sharma