Personalized PageRank (PPR) の効率的な計算

この論文は，Personalized PageRankの3つの問題を効率的に計算する手法を提案した．
QR分解あたりの証明が難しく，あとで再確認する．

問題

Problem1 (Single-node relevance computation)

Given: Query node x and query distribution d.
Find: Relevance score of node x according to query distribution d.

Problem2 (Top-k nodes identification)

Given: The number of required answer nodes, K, and query distribution d.
Find: Top K nodes with the highest relevance scores with respect to query distribution d.

Problem3 (Highly relevant node detection)

Given: The threshold, ε, and query distribution d.
Find: Nodes whose relevance scores are higher than ε with respect to query distribution d.

提案手法

アイデア

ベクトルの内積によってsingle node scoreを計算．(Problem1)
最終的なスコアの下限値と上限値を推定し，解になり得るかどうかを判定する．(Problem2, 3)

詳細

Problem1

内積の形に変形するためにQR分解をする．
QとRが疎になるように，ノードをpermuteする．

Problem2, 3

はじめに全てのノードの下限値を推定．
下限値の降順でノードをたどる．
解になり得ないノードに到達したとき，処理を終了する．

Problem2, 3でノードのスコアが必要になったとき，Problem1のsingle node score計算を利用している．
（説明が雑なので，あとでノートで確認する...）

実験

Table2より，Q，Rが確かに疎になっていることが確認できる．
すべての問題について，良い性能を示した．

Case-Studies

航空データを用いたcase studyが紹介されている．
San Diego, New York, Denver, Orlandoをseed nodeとして，関連の強い都市を検出した．

メモ帳

メモ帳です．

Efficient Personalized PageRank with Accuracy Assurance [KDD'12]