
Excited to share that our paper, "TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations," will be presented at the International Conference on Supercomputing (ICS). This work is a collaboration between the Computer Systems Lab at UTH, William & Mary, and The University of Georgia.
Mainstream mobile GPUs (such as Qualcomm's Adreno) usually have a 2.5D L1 texture cache that offers higher throughput than on-chip memory. However, the performance characteristics of this 2.5D cache are still poorly understood, which limits how well software can be optimized for it. TMModel introduces a novel performance modeling framework for mobile GPUs that combines micro-benchmarking, an analytical performance model, and a lightweight compiler to optimize DNN execution based on access patterns and GPU parameters. TMModel delivers up to 66× speedup for end-to-end on-device DNN training, with significantly lower tuning cost than existing frameworks. As mobile devices grow more powerful, this work is a step towards efficient, real-time deep learning training directly on such devices.