Tradeoff Options for Bipartite Graph Partitioning


Web connectivity graphs and similar linked data such as inverted indexes are important components of the information access systems provided by social media and web search services. The Bipartite Graph Partitioning mechanism of Dhulipala et al. [KDD 2016] relabels the vertices of large sparse graphs, seeking to enhance compressibility and thus reduce the storage space occupied by these costly structures. Here we develop a range of algorithmic and heuristic refinements to Bipartite Graph Partitioning (BP) that lead to faster computation of space-reducing vertex orderings whilst continuing to apply the same broad algorithmic paradigm. Using a range of web graph and information retrieval system index data as test cases, we demonstrate an implementation that executes up to approximately four times faster than the baseline implementation we commenced with, while holding compressibility approximately constant. We have also improved the asymptotic execution time of BP by replacing a sorting step by a customized median-finding step.

IEEE Transactions on Data Engineering