Optimizing graph analyses on GPUs
Han, Wei
Date Issued
2021
Embargo Expires
2022-09-10
Abstract
The massive parallelism of GPUs provides opportunities to accelerate computation on large
real-world graphs. However, implementing parallel graph algorithms on GPUs is challenging
for two reasons. First, the DRAM on a GPU is limited, and some real-world graphs can be
larger than the global memory of a GPU. Second, the irregularity of node degrees causes
severe load-balancing problems in graph applications. This thesis addresses these issues
for two main graph problems.
In the first part of this thesis, we introduce a new GPU-based graph traversal system,
called Graphie, which can handle out-of-core graphs. We divide the edge list of a graph
into partitions with an equal number of edges so that each partition can fit into the global
memory. The biggest advantage of this approach is that it improves load balancing. We then
propose several techniques that leverage features of modern GPUs to accelerate graph
traversal algorithms. We develop two renaming techniques to quickly find updated
partitions and cache vertex values in the shared memory for lower memory latency. We use
a small array to keep track of active partitions and to guide data movement between
the host and the device, so that unnecessary data transfers are avoided. We further use the
Hyper-Q feature of modern GPUs to overlap computation and data movement.
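The partitioning and active-partition ideas described above can be illustrated with a small CPU-only sketch. This is not Graphie's implementation: the function names `partition_edges` and `bfs_levels`, the `transfers` counter (a stand-in for host-to-device copies), and the choice of breadth-first levels as the traversal are all assumptions made for illustration, and the renaming, shared-memory caching, and Hyper-Q overlap are not modeled.

```python
from math import ceil


def partition_edges(edges, max_edges_per_partition):
    """Split an edge list into consecutive partitions of (nearly) equal edge count.

    Hypothetical helper: max_edges_per_partition stands in for the number of
    edges that fit into GPU global memory at once.
    """
    if max_edges_per_partition <= 0:
        raise ValueError("max_edges_per_partition must be positive")
    num_parts = max(1, ceil(len(edges) / max_edges_per_partition))
    base, extra = divmod(len(edges), num_parts)
    parts, start = [], 0
    for p in range(num_parts):
        size = base + (1 if p < extra else 0)  # sizes differ by at most one edge
        parts.append(edges[start:start + size])
        start += size
    return parts


def bfs_levels(edges, num_vertices, source, max_edges_per_partition):
    """Compute BFS levels by repeatedly relaxing only the active partitions.

    A partition is treated as active when at least one of its source vertices
    was updated in the previous iteration; inactive partitions are skipped,
    which models avoiding an unnecessary host-to-device transfer.
    """
    parts = partition_edges(edges, max_edges_per_partition)
    part_sources = [{u for u, _ in part} for part in parts]

    inf = float("inf")
    level = [inf] * num_vertices
    level[source] = 0
    updated = {source}
    transfers = 0  # number of partitions that would have been copied to the device

    while updated:
        # Small per-partition flag array, rebuilt every iteration.
        active = [bool(srcs & updated) for srcs in part_sources]
        next_updated = set()
        for p, part in enumerate(parts):
            if not active[p]:
                continue
            transfers += 1
            for u, v in part:
                if level[u] + 1 < level[v]:
                    level[v] = level[u] + 1
                    next_updated.add(v)
        updated = next_updated
    return level, transfers


if __name__ == "__main__":
    demo_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    levels, copied = bfs_levels(demo_edges, 6, 0, max_edges_per_partition=2)
    print(levels, copied)
```

Because every partition holds almost the same number of edges, each unit of work is roughly the same size regardless of how skewed the vertex degrees are, which is the load-balancing benefit the abstract refers to.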
In the second part, we present our GPU-based subgraph matching system, DGSM.
Current subgraph matching frameworks suffer from the need to maintain overly large
intermediate states, which are normally far larger than a GPU's main memory. As a result,
these frameworks cannot match large queries on large data graphs. Another issue is that
these systems perform redundant computation. DGSM addresses both of these issues in
existing subgraph matching systems. We design a GPU-friendly data structure to reduce
memory access latency. Several techniques are incorporated in our system to further
improve the performance. Our experimental results show that our system is about two orders
of magnitude faster than state-of-the-art systems on both labeled and unlabeled
subgraph matching.
Associated Publications
Rights
Copyright of the original work is retained by the author.