Recorded Events
134 recordings
Recordings are produced via SlidesLive and hosted on this site. They become freely available 30 days after the conference ends.
Closing Remarks
1 recording
Industry
1 recording
Industry Lightning Talks
Invited Talk
4 recordings
Extreme PyTorch: Inside the Most Demanding ML Workloads—and the Open Challenges in Building AI Agents to Democratize Them
Presenter:
Soumith Chintala
An AI stack: from scaling AI workloads to evaluating LLMs
Presenter:
Ion Stoica
Hardware-aware training and inference for large-scale AI
Presenter:
Animashree Anandkumar
Responsible Finetuning of Large Language Models
Presenter:
Ling Liu
Opening Remarks
2 recordings
Opening Remarks - Young Professional Symposium
Opening Remarks
Panel Discussion
1 recording
Panel Discussion
Presenters:
Manasi Joshi, Tim Dettmers, Soumith Chintala
Poster
120 recordings
A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers
Presenters:
Chenxi Yang, Yan Li, Martin Maas, Mustafa Uysal, Ubaid Hafeez, Arif Merchant, Richard McDougall
AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine
Presenters:
Carlo Siebenschuh, Kyle Hippe, Ozan Gokdemir, Alexander Brace, Arham Khan, Khalid Hossain, Yadu Babuji, Nicholas Chia, Venkatram Vishwanath, Arvind Ramanathan, Rick Stevens, Ian Foster, Robert Underwood
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Presenters:
Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, S R
APOLLO: SGD-like Memory, AdamW-level Performance
Presenters:
Hanqing Zhu, Zhenyu Zhang, Wenyan Cong, Xi Liu, Sem Park, Vikas Chandra, Bo Long, David Pan, Atlas Wang, Jinwon Lee
Balancing Pipeline Parallelism with Vocabulary Parallelism
Presenters:
Man Tsung Yeung, Penghui Qi, Min Lin, Xinyi Wan
COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts
Presenters:
Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, Qi Hou, Weihao Cui, Size Zheng, Li-Wen Chang, Quan Chen, Xin Liu
Context Parallelism for Scalable Million-Token Inference
Presenters:
Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jongsoo Park, Jianyu Huang
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling
Presenters:
Sohaib Ahmad, Qizheng Yang, Haoliang Wang, Ramesh Sitaraman, Hui Guan
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Presenters:
Marco Federici, Davide Belli, Mart van Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul Whatmough
Efficient On-Device Machine Learning with a Biologically-Plausible Forward-Only Algorithm
Presenters:
Baichuan Huang, Amir Aminifar
Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
Presenters:
Geonhwa Jeong, Po-An Tsai, Abhimanyu Rajeshkumar Bambhaniya, Stephen Keckler, Tushar Krishna
FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference
Presenters:
Zaifeng Pan, Yitong Ding, Yue Guan, Zheng Wang, Zhongkai Yu, Xulong Tang, Yida Wang, Yufei Ding
FedProphet: Memory-Efficient Federated Adversarial Training via Robust and Consistent Cascade Learning
Presenters:
Minxue Tang, Yitu Wang, Jingyang Zhang, Louis DiValentin, Aolin Ding, Amin Hass, Yiran Chen, Hai Li
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Presenters:
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
FlexAttention: A Programming Model for Generating Fused Attention Variants
Presenters:
Juechu Dong, BOYUAN FENG, Driss Guessous, Yanbo Liang, Horace He
FlexInfer: Flexible LLM Inference with CPU Computations
Presenters:
Seonjin Na, Geonhwa Jeong, Byung Hoon Ahn, Aaron Jezghani, Jeffrey Young, Christopher Hughes, Tushar Krishna, Hyesoon Kim
FLStore: Efficient Federated Learning Storage for non-training workloads
Presenters:
Ahmad Faraz Khan, Samuel Fountain, Ahmed Mohamed Abdelmoniem Sayed, Ali R. Butt, Ali Anwar
Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs
Presenters:
Zichao Yue, Chenhui Deng, Zhiru Zhang
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Presenters:
Sandeep Polisetty, Juelin Liu, Yi Fung, Seung-Hwan Lim, Hui Guan, Marco Serafini
HyC-LoRA: Memory Efficient LoRA Fine-tuning with Hybrid Activation Compression
Presenters:
Yujin Wang, Shunan Dong, Zongle Huang, Yichen You, Liu He, Huazhong Yang, Yongpan Liu, Hongyang Jia
Interference-aware Edge Runtime Prediction with Conformal Matrix Completion
Presenters:
Tianshu Huang, Arjun Ramesh, Emily Ruppel, Nuno Pereira, Anthony Rowe, Carlee Joe-Wong
Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
Presenters:
Neel P. Bhatt, Yunhao Yang, Rohan Siva, Daniel Milan, Ufuk Topcu, Atlas Wang
LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions
Presenters:
Jianheng Ling, Pratik Worah, Yawen Wang, Yunchuan Kong, Chunlei Wang, Clifford Stein, Diwakar Gupta, Jason Behmer, Logan Bush, Prakash Ramanan, Rajesh Kumar, Thomas Chestna, Yajing Liu, Ying Liu, Ye Zhao, Kathryn S. McKinley, Meeyoung Park, Martin Maas
LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Presenters:
Rya Sanovar, Srikant Bharadwaj, Renée St. Amant, Victor Ruehle, Saravan Rajmohan
Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers
Presenters:
Francesco Daghero, Daniele Jahier Pagliari, Francesco Conti, Luca Benini, Massimo Poncino, Alessio Burrello
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Presenters:
Shang Yang, Junxian Guo, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Presenters:
Mingyu Liang, Hiwot Kassa, Wenyin Fu, Brian Coutinho, Louis Feng, Christina Delimitrou
Marconi: Prefix Caching for the Era of Hybrid LLMs
Presenters:
Rui Pan, Zhuang Wang, Zhen Jia, Can Karakus, Luca Zancato, Tri Dao, Yida Wang, Ravi Netravali
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Presenters:
Mohammadali Shakerdargah, Shan Lu, Chao Gao, Di Niu
MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs
Presenters:
Abhishek Moitra, Arkapravo Ghosh, Shrey Agrawal, Aporva Amarnath, Karthik Swaminathan, Priyadarshini Panda
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Presenters:
Beichen Huang, Yueming Yuan, Zelei Shao, Minjia Zhang
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Presenters:
Xuanlin Jiang, Yang Zhou, Shiyi Cao, Ion Stoica, Minlan Yu
On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions
Presenters:
Maximilian Böther, Abe Sebastian, Pranjal Awasthi, Ana Klimovic, Srikumar Ramalingam
Optimizing LLM Queries in Relational Data Analytics Workloads
Presenters:
Shu Liu, Asim Biswal, Audrey Cheng, Amog Kamsetty, Luis Gaspar Schroeder, Liana Patel, Shiyi Cao, Xiangxi Mo, Ion Stoica, Joseph Gonzalez, Matei Zaharia
Photon: Federated LLM Pre-Training
Presenters:
Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, Yan Gao, Wanru Zhao, Dongqi Cai, Zexi Li, Xinchi Qiu, Nic Lane
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training
Presenters:
Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory R. Ganger, Yida Wang
ProtoRAIL: A Risk-cognizant Imitation Agent for Adaptive vCPU Oversubscription In the Cloud
Presenters:
Lu Wang, Mayukh Das, Fangkai Yang, Bo Qiao, Hang Dong, Si Qin, Victor Ruehle, Chetan Bansal, Eli Cortez, Íñigo Goiri, S R, Qingwei Lin, Dongmei Zhang
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Presenters:
Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
Radius: Range-based Gradient Sparsity for Large Foundation Model Pre-training
Presenters:
Mingkai Zheng, Zhao Zhang
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
Presenters:
Zhiyu Mei, WEI FU, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu
Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving
Presenters:
Wei Gao, Xinyu Zhou, Peng Sun, Tianwei Zhang, Yonggang Wen
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Presenters:
Xinyi Zhang, Hanyu Zhao, Wencong Xiao, Xianyan Jia, Fei Xu, Yong Li, Wei Lin, Fangming Liu
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Presenters:
Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Xiao Chuanfu, Dahua Lin, Chao Yang
ScaleFusion: Scalable Inference of Spatial-Temporal Diffusion Transformers for High-Resolution Long Video Generation
Presenters:
Jiacheng Yang, Jun Wu, Zhen Zhang, Xinwei Fu, Zhiying Xu, Zhen Jia, Yida Wang, Gennady Pekhimenko
Scaling Deep Learning Training with MPMD Pipeline Parallelism
Presenters:
Anxhelo Xhebraj, Sean Lee, Hanfeng Chen, Vinod Grover
Seesaw: High-throughput LLM Inference via Model Re-sharding
Presenters:
Qidong Su, Wei Zhao, Xin Li, Muralidhar Andoorveedu, Chenhao Jiang, Zhanda Zhu, Kevin Song, Christina Giannoula, Gennady Pekhimenko
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Presenters:
Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie
SOLA: Optimizing SLO Attainment for Large Language Model Serving with State-Aware Scheduling
Presenters:
Ke Hong, Xiuhong Li, Lufang Chen, Qiuli Mao, Guohao Dai, Xuefei Ning, Shengen Yan, Yun Liang, Yu Wang
SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations
Presenters:
Md Saidul Hoque Anik, Ariful Azad
Supply-Chain Attacks in Machine Learning Frameworks
Presenters:
Yue Gao, Ilia Shumailov, Kassem Fawaz
SwiftVI: Time-Efficient Planning and Learning with MDPs
Presenters:
Kasper Overgaard Mortensen, Konstantinos Skitsas, Emil Morre Christensen, Mohammad Sadegh Talebi, Andreas Pavlogiannis, Davide Mottin, Panagiotis Karras
The Hidden Bloat in Machine Learning Systems
Presenters:
Huaifeng Zhang, Ahmed Ali-Eldin Hassan
ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Presenters:
YOUHE JIANG, Fangcheng Fu, Xiaozhe Yao, Taiyi Wang, Bin CUI, Ana Klimovic, Eiko Yoneki
TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
Presenters:
Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ningxin Zheng, Ziheng Jiang, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Xin Liu
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Presenters:
Jinghan Yao, Sam Jacobs, Masahiro Tanaka, Olatunji Ruwase, Hari Subramoni, Dhabaleswar Panda
TurboAttention: Efficient Attention Approximation for High-Throughput LLMs
Presenters:
Hao Kang, Srikant Bharadwaj, James Hensman, Tushar Krishna, Victor Ruehle, Saravan Rajmohan
Venn: Resource Management For Collaborative Learning Jobs
Presenters:
Jiachen Liu, Fan Lai, Eric Ding, Yiwen Zhang, Mosharaf Chowdhury
VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution
Presenters:
Chendong Wang, Anlan Zhang, Yifan Yang, Lili Qiu, Yuqing Yang, XINYANG JIANG, Feng Qian, Suman Banerjee
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models
Presenters:
Yixin Dong, Charlie Ruan, Yaxing Cai, Ziyi Xu, Yilong Zhao, Ruihang Lai, Tianqi Chen
Youmu: Efficient Columnar Data Pipeline for LLM Training
Presenters:
Tianle Zhong, Jiechen Zhao, Qiang Su, Geoffrey Fox
Poster Session
1 recording
Poster Session and Reception - Young Professional Symposium
Talk
4 recordings
Lessons Learned from Successful PhD Students
Presenter:
Tim Dettmers
LMArena: An Open Platform for Crowdsourced AI benchmarks
Presenter:
Wei-Lin Chiang
Designing Models from the Hardware Up
Presenter:
Simran Arora
YPS - Talk by Beidi Chen
Presenter:
Beidi Chen