Bill Dally’s invited talk at Cornell. Bill has been a senior figure in accelerators and computer architecture since 1985. The talk focuses primarily on NVIDIA’s efforts to improve GPU performance, from Kepler (2012) through the recent Hopper H200, including better low-precision arithmetic and sparsity support. Beyond the technical material, one comment he made about AI caught my attention. It directly echoes Rich Sutton’s “The Bitter Lesson” blog post: AI model architectures remain simple, much as they were 50 years ago; the significant change is the compute power now at our disposal, which lets us stack deeper layers. Modern hardware enables Bayesian-style AI models such as LLMs to approximate the true distribution of the world more effectively.
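As a side note on the sparsity support mentioned above: recent NVIDIA GPUs accelerate 2:4 structured sparsity, where every group of four weights keeps at most two nonzero entries, letting the tensor cores skip half the multiplies. A minimal sketch of the pruning pattern (my own illustration, not NVIDIA’s implementation — the function name is made up) looks like this:

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude entries in each group of four.

    This produces the 2:4 structured-sparsity pattern: at most two
    nonzeros per group of four consecutive weights.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]),
                      reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0
                      for j, w in enumerate(group))
    return pruned

print(prune_2_of_4([0.9, -0.1, 0.05, -1.2, 0.3, 0.2, -0.7, 0.01]))
# → [0.9, 0.0, 0.0, -1.2, 0.3, 0.0, -0.7, 0.0]
```

In practice the pruned network is fine-tuned afterward to recover accuracy, but the hardware win comes purely from the regular sparsity pattern, which the GPU can exploit without irregular memory access.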