
NVIDIA CUDA Unified Memory Not Needed?... World's Best Graph Computation Framework Developed

Published: 2025-05-27 09:35:54
Updated: 2025-05-27 09:35:54
KAIST Department of Computer Science Professor Kim Min-su's research team developed the GFlux framework. Provided by KAIST


[Financial News] Researchers at KAIST (Korea Advanced Institute of Science and Technology) have developed a computation framework with world-leading performance: using a single computer equipped with one graphics processing unit (GPU), it can handle a workload that previously required 25 computers and about 2,000 seconds.
KAIST announced on the 27th that the research team of Professor Kim Min-su of the Department of Computer Science has developed GFlux, a general-purpose computation framework with scheduling and memory-management technologies that can rapidly run diverse computations on ultra-large graphs of up to one trillion edges using a GPU with limited memory.
At its core, GFlux divides a graph computation into GPU-optimized unit tasks called 'GTasks' and uses a specialized scheduling technique to allocate and process them efficiently on the GPU. It converts graphs into HGF, a compression format the team developed and optimized for GPU processing, and stores and manages them on storage devices such as SSDs.
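The HGF format itself is not public, but the standard layout it is compared against, CSR, can be sketched briefly. In CSR, a graph is stored as an offset array plus one flat neighbor array, so vertex `v`'s neighbors occupy a contiguous slice. This is a minimal illustration, not part of GFlux:

```python
def to_csr(num_vertices, edges):
    """Build a CSR (compressed sparse row) view of a directed graph:
    vertex v's neighbors are targets[offsets[v]:offsets[v + 1]]."""
    # Count each vertex's out-degree.
    counts = [0] * num_vertices
    for u, _ in edges:
        counts[u] += 1
    # Prefix-sum the degrees into the offset array.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + counts[v]
    # Scatter each edge's target into its vertex's slice.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
    return offsets, targets

offsets, targets = to_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
assert targets[offsets[0]:offsets[1]] == [1, 2]  # neighbors of vertex 0
```

With 64-bit vertex IDs, the neighbor array alone for a trillion-edge graph is roughly 8 TB, which is consistent with the ~9 TB CSR figure cited below.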
Stored in the standard CSR (compressed sparse row) format, a graph with one trillion edges occupies about 9 terabytes (TB); the HGF format nearly halves this to 4.6 TB. In addition, the team is the first to use a 3-byte address scheme on the GPU, previously avoided because of memory-alignment constraints, reducing GPU memory usage by about 25%.
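The arithmetic behind that 25% figure is simple: a 3-byte (24-bit) index takes three quarters of the space of the usual 4-byte one, at the cost of unaligned access and a 2^24 limit on directly addressable entries. A small sketch of such packing (an illustration only, not the GFlux implementation):

```python
def pack_u24(values):
    """Pack integer IDs (< 2**24) into a buffer using 3 bytes per entry."""
    buf = bytearray()
    for v in values:
        if not 0 <= v < 1 << 24:
            raise ValueError(f"ID {v} does not fit in 24 bits")
        buf += v.to_bytes(3, "little")
    return bytes(buf)

def unpack_u24(buf):
    """Recover the IDs from a 3-byte-packed buffer."""
    return [int.from_bytes(buf[i:i + 3], "little") for i in range(0, len(buf), 3)]

ids = [0, 7, 65_536, 16_777_215]  # 16_777_215 = 2**24 - 1, the largest 24-bit ID
packed = pack_u24(ids)
assert unpack_u24(packed) == ids
assert len(packed) == 3 * len(ids)  # 3 bytes per ID vs 4 for uint32: 25% smaller
```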
Notably, GFlux does not rely on NVIDIA CUDA's Unified Memory at all. Instead, a GTask-aware memory-management technology jointly manages main memory and GPU memory, preventing computation failures caused by running out of memory.
Professor Kim Min-su's research team verified GFlux's performance on complex graph computations such as triangle counting.
In an experiment on a graph with about 70 billion edges, the previous state-of-the-art system needed about 2,000 seconds and 25 computers connected by a high-speed network to count the triangles, while GFlux finished in just 1,184 seconds on a single GPU-equipped computer, nearly twice as fast.
This is the largest graph known to have been processed for triangle counting on a single computer.
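For readers unfamiliar with the benchmark: triangle counting asks how many unordered vertex triples are mutually connected. A standard (single-machine, non-GPU) formulation intersects the adjacency sets of each edge's endpoints; this sketch is the textbook method, not GFlux's algorithm:

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    # Build adjacency sets (deduplicating edges, ignoring self-loops).
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = 0
    # For each edge (u, v) with u < v, count common neighbors w with w > v,
    # so every triangle {u, v, w} is counted exactly once.
    for u in adj:
        for v in adj[u]:
            if u < v:
                triangles += sum(1 for w in adj[u] & adj[v] if w > v)
    return triangles

# A 4-clique contains exactly 4 triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
assert count_triangles(k4) == 4
```

At 70 billion edges, the adjacency structure alone far exceeds GPU memory, which is why the out-of-core scheduling and compression described above are the hard part of the problem.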
Professor Kim Min-su said, "The importance of high-speed computation technology for large-scale graphs is growing in areas such as graph RAG (retrieval-augmented generation), knowledge graphs, and graph-based vector indexing," adding, "GFlux is expected to address these needs effectively."
The first and second authors are Oh Se-yeon and Yoon Hee-yong, Ph.D. candidates in the Department of Computer Science; the third author is Han Dong-hyung, a researcher at Graphy, a graph deep-tech company founded by Professor Kim; Professor Kim is the corresponding author. The results were presented on the 22nd at the IEEE International Conference on Data Engineering (ICDE).
jiany@fnnews.com Reporter Yeon Ji-an