Book
Proceedings of Machine Learning and Systems 3 pre-proceedings (MLSys 2021)
Edited by:
A. Smola and A. Dimakis and I. Stoica
Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, Ameer Abdelhadi, Ciaran Bannon, Milos Nikolic, Mostafa Mahmoud, Alberto Delmas Lascorz, Gennady Pekhimenko, Andreas Moshovos
To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks Xiaohu Tang, Shihao Han, Li Lyna Zhang, Ting Cao, Yunxin Liu
Cortex: A Compiler for Recursive Deep Learning Models Pratik Fegade, Tianqi Chen, Phillip Gibbons, Todd Mowry
Adaptive Gradient Communication via Critical Learning Regime Identification Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos
Exploring the Limits of Concurrency in ML Training on Google TPUs Sameer Kumar, Yu Wang, Cliff Young, James Bradbury, Naveen Kumar, Dehao Chen, Andy Swing
Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy Lucas Liebenwein, Cenk Baykal, Brandon Carter, David Gifford, Daniela Rus
Learning Fitness Functions for Machine Programming Shantanu Mandal, Todd Anderson, Javier Turek, Justin Gottschlich, Shengtian Zhou, Abdullah Muzahid
Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Anshumali Shrivastava
IOS: Inter-Operator Scheduler for CNN Acceleration Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han
A Deep Learning Based Cost Model for Automatic Code Optimization Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel Abdous, Taha Arbaoui, Karima Benatchba, Saman Amarasinghe
Don't Forget to Sign the Gradients! Omid Aramoon, Pin-Yu Chen, Gang Qu
Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang
Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators Hamzah Abdelaziz, Ali Shafiee, Jong Hoon Shin, Ardavan Pedram, Joseph Hassoun
Swift for TensorFlow: A portable, flexible platform for deep learning Brennan Saeta, Denys Shabalin
Equality Saturation for Tensor Graph Superoptimization Yichen Yang, Phitchaya Phothilimthana, Yisu Wang, Max Willsey, Sudip Roy, Jacques Pienaar
PipeMare: Asynchronous Pipeline Parallel DNN Training Bowen Yang, Jian Zhang, Jonathan Li, Christopher Re, Christopher Aberger, Christopher De Sa
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini
Value Learning for Throughput Optimization of Deep Learning Workloads Benoit Steiner, Chris Cummins, Horace He, Hugh Leather
Scaling Distributed Training with Adaptive Summation Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum
Learning on Distributed Traces for Data Center Storage Systems Giulio Zhou, Martin Maas
Pufferfish: Communication-efficient Models At No Extra Cost Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos
A Learned Performance Model for Tensor Processing Units Sam Kaufman, Phitchaya Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows
Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu
ModularNAS: Towards Modularized and Reusable Neural Architecture Search Yunfeng Lin, Guilin Li, Xing Zhang, Weinan Zhang, Bo Chen, Ruiming Tang, Zhenguo Li, Jiashi Feng, Yong Yu
FLAML: A Fast and Lightweight AutoML Library Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu
TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models Chunxing Yin, Bilge Acun, Carole-Jean Wu, Xing Liu
SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu
Pipelined Backpropagation at Scale: Training Large Models without Batches Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster
Fluid: Resource-aware Hyperparameter Tuning Engine Peifeng Yu, Jiachen Liu, Mosharaf Chowdhury
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul Whatmough
Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices Urmish Thakker, Paul Whatmough, Zhigang Liu, Matthew Mattina, Jesse Beu
A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-core Systems Guixiang Ma, Yao Xiao, Theodore Willke, Nesreen Ahmed, Shahin Nazarian, Paul Bogdan
Bit Error Robustness for Energy-Efficient DNN Accelerators David Stutz, Nandhini Chandramoorthy, Matthias Hein, Bernt Schiele
Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko
Characterizing and Taming Model Instability Across Edge Devices Eyal Cidon, Evgenya Pergament, Zain Asgar, Asaf Cidon, Sachin Katti
Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu
FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation Bharathan Balaji, Christopher Kakovitch, Balakrishnan Narayanaswamy
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data Guanhua Wang, Zhuang Liu, Brandon Hsieh, Siyuan Zhuang, Joseph Gonzalez, Trevor Darrell, Ion Stoica
Larq Compute Engine: Design, Benchmark and Deploy State-of-the-Art Binarized Neural Networks Tom Bannink, Adam Hillier, Lukas Geiger, Tim de Bruin, Leon Overweel, Jelmer Neeven, Koen Helwegen
Wavelet: Efficient DNN Training with Tick-Tock Scheduling Guanhua Wang, Kehan Wang, Kenan Jiang, Xiangjun Li, Ion Stoica
Data Movement Is All You Need: A Case Study on Optimizing Transformers Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
Scaling Polyhedral Neural Network Verification on GPUs Christoph Müller, François Serre, Gagandeep Singh, Markus Püschel, Martin Vechev
Accounting for Variance in Machine Learning Benchmarks Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Tal Arbel, Chris Pal, Gael Varoquaux, Pascal Vincent
Amazon SageMaker Debugger: A System for Real-Time Insights into Machine Learning Model Training Nathalie Rauschmayr, Vikas Kumar, Rahul Huilgol, Andrea Olgiati, Satadal Bhattacharjee, Nihal Harish, Vandana Kannan, Amol Lele, Anirudh Acharya, Jared Nielsen, Lakshmi Ramakrishnan, Ishan Bhatt, Kohen Chia, Neelesh Dodda, Zhihan Li, Jiacheng Gu, Miyoung Choi, Balajee Nagarajan, Jeffrey Geevarghese, Denis Davydenko, Sifei Li, Lu Huang, Edward Kim, Tyler Hill, Krishnaram Kenthapadi
RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads James Gleeson, Moshe Gabel, Gennady Pekhimenko, Eyal de Lara, Srivatsan Krishnan, Vijay Janapa Reddi
TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, Pete Warden, Rocky Rhodes
ByzShield: An Efficient and Robust System for Distributed Training Konstantinos Konstantinidis, Aditya Ramamoorthy
In-network Aggregation for Shared Machine Learning Clusters Nadeen Gebara, Manya Ghobadi, Paolo Costa
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso
Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity Toshiaki Wakatsuki, Sekitoshi Kanai, Yasuhiro Fujiwara
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference Steve Dai, Rangha Venkatesan, Mark Ren, Brian Zimmer, William Dally, Brucek Khailany
CODE: Compiler-based Neuron-aware Ensemble training Ettore M. G. Trainiti, Thanapon Noraset, David Demeter, Doug Downey, Simone Campanoni