Publication

Optimizing graph analyses on GPUs

Han, Wei
Date
Date Issued
2021
Embargo Expires
2022-09-10
Abstract
The massive parallelism on GPUs provides opportunities to accelerate computation on large real-world graphs. However, implementing parallel graph algorithms on GPUs is quite challenging for two reasons. First, the DRAM on a GPU is limited, and some real-world graphs are larger than a GPU's global memory. Second, the irregularity of node degrees poses severe load-balancing issues for graph applications. We address these issues for two main graph problems in this thesis. In the first part of this thesis, we introduce a new GPU-based graph traversal system, called Graphie, which can handle out-of-core graphs. We divide the edge list of a graph into partitions with an equal number of edges so that each partition can fit into the global memory. The biggest advantage of this approach is that it improves load balancing. We then propose several techniques that leverage features of modern GPUs to accelerate graph traversal algorithms. We develop two renaming techniques to quickly find updated partitions and cache vertex values in shared memory for lower memory latency. We use a small array to keep track of active partitions and to guide data movement between the host and device so that unnecessary data transfers are avoided. We further utilize the Hyper-Q technique on modern GPUs to overlap computation and data movement. In the second part, we present our GPU-based subgraph matching system, DGSM. Current subgraph matching frameworks suffer from the need to maintain overly large intermediate states, which are normally far larger than a GPU's main memory. As a result, these frameworks cannot match large queries on large data graphs. Another issue is that these systems perform redundant computation. DGSM addresses both of these issues. We design a GPU-friendly data structure to reduce memory access latency. Several techniques are incorporated in our system to further improve the performance.
Our experimental results show that our system is about two orders of magnitude faster than state-of-the-art systems on both labeled and unlabeled subgraph matching.
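The edge-list partitioning and active-partition tracking described in the abstract can be illustrated with a small host-side sketch. This is not the Graphie implementation: it runs on the CPU in plain Python, the names `partition_edges` and `bfs_levels` are hypothetical, and the `transfers` counter merely stands in for a host-to-device copy. It assumes a partition is "active" when at least one of its source vertices was updated in the previous iteration, which is one plausible reading of the abstract.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


def partition_edges(edges: List[Edge], max_edges: int) -> List[List[Edge]]:
    """Split the edge list into consecutive chunks of at most max_edges edges.

    Every chunk has the same number of edges except possibly the last one.
    max_edges is a hypothetical stand-in for "what fits in global memory".
    """
    if max_edges <= 0:
        raise ValueError("max_edges must be positive")
    return [edges[i:i + max_edges] for i in range(0, len(edges), max_edges)]


def bfs_levels(num_vertices: int, edges: List[Edge], source: int,
               max_edges: int) -> Tuple[List[float], int]:
    """Compute BFS levels by relaxing edges one partition at a time.

    Returns the level of each vertex and the number of partition loads,
    which models how many host-to-device transfers would be needed.
    """
    inf = float("inf")
    level: List[float] = [inf] * num_vertices
    level[source] = 0

    partitions = partition_edges(edges, max_edges)
    # Source vertices per partition, used to decide which partitions are active.
    sources: List[Set[int]] = [{u for u, _ in part} for part in partitions]

    updated: Set[int] = {source}
    transfers = 0
    while updated:
        # Only partitions containing an edge out of an updated vertex are loaded.
        active = [i for i, s in enumerate(sources) if s & updated]
        next_updated: Set[int] = set()
        for i in active:
            transfers += 1  # assumption: one copy per active partition
            for u, v in partitions[i]:
                if level[u] + 1 < level[v]:
                    level[v] = level[u] + 1
                    next_updated.add(v)
        updated = next_updated
    return level, transfers
```

On a five-edge graph split into partitions of two edges, each iteration of this sketch loads only the one partition whose source vertices changed, rather than all three.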
Rights
Copyright of the original work is retained by the author.