Exception Analysis of Running Complex and Computation-intensive Deep Learning Models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Advances in GPUs have facilitated the design and execution of complex, computation-intensive deep learning models. As model complexity increases, so does the risk of problems caused by very large model size, large individual tensor size, Not a Number (NaN) values, and memory leaks. Left untreated, these problems lead to substantially longer execution times, unpredictable results, and memory leak exceptions. In this paper, we address these problems, focusing in particular on large tensor support, C++ kernel changes, and recompilation of the TensorFlow framework. We also explore issues in debugging NaN values with existing debugging toolkits, along with solutions to alleviate memory leaks. Based on the experience gained from our analysis, we propose solutions involving better tensor dimension sanity checks, alternative tensor loop procedures, different ways of applying kernels to tensors, a debug trace file filter method, and ways to resolve memory leak exceptions. While these problems and solutions may apply to running any complex, computation-intensive deep learning model, we describe how we encountered them in a use case in which we designed a deep learning model for activity and gesture recognition using radio data, aiming to mitigate the domain shift problem.
Original language: English
Title of host publication: 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM)
Editors: Khalil Ibrahimi, Mohamed El Kamili, Abdellatif Kobbane, Ibraheem Shayea
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 7
ISBN (Electronic): 979-8-3503-2967-4
ISBN (Print): 979-8-3503-2968-1
Publication status: Published - 22 Nov 2023
Event: International Conference on Wireless Networks and Mobile Communications - Istanbul, Turkey
Duration: 26 Oct 2023 – 28 Oct 2023
Conference number: 10
https://www.wincom-conf.org/WINCOM_2023/

Conference

Conference: International Conference on Wireless Networks and Mobile Communications
Abbreviated title: WINCOM
Country/Territory: Turkey
City: Istanbul
Period: 26/10/23 – 28/10/23

Keywords

  • deep learning
  • high performance computing
  • resource complexity
  • kernel function
  • exception analysis
