My research interests are mobile computing, parallel programming, and deep learning applications.
I build mobile frameworks to support cloud-free deep neural network applications, especially vision sensing apps. I focus on leveraging different types of processors (CPU, GPU, DSP, etc.) to speed up the processing pipeline, since deep learning algorithms can be highly parallelized. I am also interested in applying approximation techniques to reduce the computational cost of current state-of-the-art deep learning models.
Huynh, Loc Nguyen, Youngki Lee, and Rajesh Krishna Balan. "DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices." Proceedings of the 2016 Workshop on Wearable Systems and Applications. ACM, 2016. link
Huynh, Loc N., Youngki Lee, and Rajesh Krishna Balan. "DeepMon: Mobile GPU-based deep learning framework for continuous vision applications." Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2017. link
DeepMon: DeepMon is a lightweight mobile framework that supports deep learning applications on mobile devices (smartphones and tablets). It leverages mobile GPUs (Adreno or Mali) to process deep neural networks in parallel, reducing inference latency. We built DeepMon on top of the OpenCL framework in order to support both the Mali and Adreno GPU architectures commonly found on modern mobile devices.
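For illustration only, here is a minimal pyopencl sketch (not DeepMon's actual code) of the kind of layer-level parallelism involved: a fully connected layer y = Wx is dispatched to whatever OpenCL device is available, with one work-item per output neuron. All names and sizes are made up for the example; portability comes from the fact that the same OpenCL C kernel compiles for both Mali and Adreno.

    import numpy as np
    import pyopencl as cl

    # One work-item per output neuron computes one row of y = W x.
    KERNEL = """
    __kernel void matvec(__global const float *W, __global const float *x,
                         __global float *y, const int n_in) {
        int row = get_global_id(0);
        float acc = 0.0f;
        for (int j = 0; j < n_in; ++j)
            acc += W[row * n_in + j] * x[j];
        y[row] = acc;
    }
    """

    n_in, n_out = 512, 256
    W = np.random.randn(n_out, n_in).astype(np.float32)
    x = np.random.randn(n_in).astype(np.float32)

    ctx = cl.create_some_context()        # picks any available OpenCL device
    queue = cl.CommandQueue(ctx)
    prog = cl.Program(ctx, KERNEL).build()

    mf = cl.mem_flags
    W_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=W)
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, n_out * 4)

    prog.matvec(queue, (n_out,), None, W_buf, x_buf, y_buf, np.int32(n_in))
    y = np.empty(n_out, dtype=np.float32)
    cl.enqueue_copy(queue, y, y_buf)      # read the result back to the host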
Mobile GPU scheduler: Using the GPU for computation-intensive tasks (e.g., vision sensing apps) in the background can interfere with the user's apps in the foreground: because GPU tasks are non-preemptive, the UI may become unresponsive and the user experience suffers. In this project, we are building a GPU scheduler that preserves the user experience while sensing tasks run in the background. To tackle the non-preemptiveness problem, we divide each sensing task into multiple micro-tasks, each of which completes within an estimated amount of time. Furthermore, we build the GPU scheduler in kernel space so that it can track all requests to the GPU and prioritize foreground tasks when needed.
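The real scheduler lives in kernel space and dispatches actual GPU kernels; the toy user-space Python sketch below only illustrates the micro-task decomposition and the priority check performed between micro-tasks. All names here are hypothetical.

    import time
    from collections import deque

    class MicroTaskScheduler:
        """Toy sketch: a long GPU job is pre-split into short micro-tasks so
        foreground work can be serviced between them, sidestepping the
        non-preemptiveness of one monolithic GPU kernel."""

        def __init__(self):
            self.background = deque()

        def submit(self, micro_tasks):
            # e.g., one micro-task per DNN layer, sized offline so that each
            # finishes within an estimated time budget
            self.background.extend(micro_tasks)

        def run(self, foreground_busy):
            while self.background:
                if foreground_busy():        # priority check between tasks
                    time.sleep(0.001)        # yield the GPU to the foreground
                    continue
                self.background.popleft()()  # one short, bounded dispatch

    sched = MicroTaskScheduler()
    sched.submit([lambda i=i: print(f"micro-task {i} done") for i in range(3)])
    sched.run(foreground_busy=lambda: False)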
Compressing convolutional neural networks: CNNs are well known to be computationally intensive. However, there is a lot of computational redundancy within the CNN pipeline: for example, we tend to train with many filters, and many of them end up producing almost-zero outputs across a lot of channels. Traditional pruning methods remove unnecessary weights to make the model smaller; unfortunately, they produce sparse weight matrices, which make it hard to leverage optimized dense matrix multiplication libraries to compute the outputs efficiently. We propose adding a small additional network (with little computational cost) before a convolutional layer to predict whether a particular filter is necessary. This approach lets us remove entire unnecessary filters (instead of arbitrary weights) without any significant loss in accuracy. The additional network has two primary goals. First, we can use it to compress the original network (by removing the most unnecessary filters, along with the additional network itself) and make it more computationally efficient. Second, we hypothesize that each output channel is activated only for certain inputs; hence, the additional network can act as an "online predictor" that chooses which filters to compute for the current input, reducing processing time while preserving accuracy.
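Below is a minimal PyTorch sketch of the idea under our own (hypothetical) naming: a cheap side network predicts a per-filter gate, and gated-off filters contribute nothing. For clarity the sketch merely masks the conv outputs; an efficient implementation would skip the masked filters entirely, and training through the hard threshold would need soft gates or a straight-through estimator.

    import torch
    import torch.nn as nn

    class GatedConv(nn.Module):
        """Sketch: a tiny predictor network decides, per input, which filters
        of the main convolution are worth computing."""

        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            self.gate = nn.Sequential(      # the cheap additional network
                nn.AdaptiveAvgPool2d(1),    # summarize the input spatially
                nn.Flatten(),
                nn.Linear(in_ch, out_ch),
                nn.Sigmoid(),               # per-filter "usefulness" score
            )

        def forward(self, x):
            g = (self.gate(x) > 0.5).float()  # hard keep/skip decision
            y = self.conv(x)                  # a real impl. would skip here
            return y * g[:, :, None, None]    # masked filters output zero

    x = torch.randn(1, 16, 32, 32)
    print(GatedConv(16, 32)(x).shape)   # torch.Size([1, 32, 32, 32])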