SAVE: Software-Implemented Fault Tolerance for Model Inference against GPU Memory Bit Flips

Published in 2025 USENIX Annual Technical Conference (USENIX ATC 25), 2025

Recommended citation: Wenxin Zheng, Bin Xu, Jinyu Gu, and Haibo Chen. "SAVE: Software-Implemented Fault Tolerance for Model Inference against GPU Memory Bit Flips". In 2025 USENIX Annual Technical Conference (USENIX ATC 25) (2025). https://www.usenix.org/conference/atc25/presentation/zheng

Software-implemented fault tolerance for model inference against GPU memory bit flips.
