Swanson MA279 Fall2022 topic12 - Rhea

Introduction

The Internet is a huge part of our daily lives. We search for and consume a lot of the information through search engines. One might wonder how search engines work.

In the early 90s, the first search engines used a text-based ranking system to decide the relevance of websites given a query. The way this worked was the pages with the highest occurrences of key words corresponding to the query were determined as relevant. However, an obvious problem with this approach is that suppose one wanted to find information about Purdue. If the query was “Purdue”, one might expect “www.purdue.edu” to be the most relevant site for the query. But if there was a website that used the word “Purdue” more than this site, that would be determined as the most relevant one instead. This is undesirable.

The usefulness of a search engine depends on the relevance of its search results. One would want results to have relevant, popular, and authoritative websites. Modern search engines have more sophisticated methods of determining the relevance of websites based on a search query.

One of the most known and influential algorithms is the PageRank algorithm, which we will be exploring in this project.

PageRank Algorithm

Let us understand the Page Rank Algorithm through an example. We use a simplified version of it, which only considers connected graphs. In real life, we could have disjoint sets of websites. Our focus of the algorithm is its linear algebra perspective.

Suppose we have a smaller version of the internet of websites A, B, C, D, E as described in the table below.

Website	Links to Website(s)
A	B, C, D, E
B	C, D
C	A, E
D	A, C, E
E

We can visualize the relationship of websites by a directed graph with the websites as nodes, edges from vertex u to vertex v if website u references website v, and the weights of edges being the “importance” of website u being transferred to website v equally amongst all its links.

For example, as we can see in the graph, since website A has links to all other pages, it has outgoing edges to all other vertices each with weight $\frac{1}{4}$ (= 0.25).

Step 1: Create a transition matrix.

Create a transition matrix of the graph with entry (i,j) being the importance that website j transfers to website i. The transition matrix A for this graph would be as follows:

$\begin{bmatrix} 0 & 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & 0 & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & 0 & \frac{1}{2} & \frac{1}{3} & 0 \end{bmatrix}$

If there is a website that does not have any outgoing edges, then we distribute its importance to each other website equally. This avoids problems where the resulting PageRank vector becomes 0 in the next step due to calculations.

Step 2: Calculate the PageRank vector.

There are two ways to do this:

Way 1: Iterative version

The PageRank algorithm is an iterative algorithm as described in the official published paper.

Start with an initial rank vector v, with all entries equal to $\frac{1}{n}$ , where n is the number of websites. Each incoming link should increase the importance of a website, so we update its rank by adding the importance of the incoming links to it. This is the same as multiplying matrix A with v.

This is because in matrix multiplication of a matrix with a vector, we essentially take a dot product of each row to the vector. In the context of this problem, for example, we would be multiplying the importance each website gives to website A in row 1 to its rank, which is each corresponding entry in the vector.

Update the rank vector by repeatedly multiplying matrix A to the new rank vector after each step until it converges. Essentially, we are multiplying A by v in iteration 1, A by Av in iteration 2, …, A by $A^{(n-1)}v$ in iteration n, until the rank vector converges. This converged rank vector is called the PageRank vector.

Iteration 1:

$\begin{bmatrix} 0.21666... \\ 0.1 \\ 0.26666... \\ 0.2 \\ 0.21666... \end{bmatrix}$

Iteration 2:

$\begin{bmatrix} 0.25416... \\ 0.10833... \\ 0.225 \\ 0.15833... \\ 0.25416... \end{bmatrix}$

Iteration 3:

$\begin{bmatrix} 0.22881... \\ 0.12708... \\ 0.23402... \\ 0.18125 \\ 0.22881... \end{bmatrix}$

Iteration 4:

$\begin{bmatrix} 0.23463... \\ 0.11440... \\ 0.23836... \\ 0.17795... \\ 0.23463... \end{bmatrix}$

Iteration 5:

$\begin{bmatrix} 0.23716... \\ 0.11731... \\ 0.23383... \\ 0.17452... \\ 0.23716... \end{bmatrix}$

Iteration 6:

$\begin{bmatrix} 0.23438... \\ 0.11858... \\ 0.23541... \\ 0.17362... \\ 0.23438... \end{bmatrix}$

Iteration 7:

$\begin{bmatrix} 0.23538... \\ 0.11719... \\ 0.23556... \\ 0.17648... \\ 0.23538... \end{bmatrix}$

Iteration 8:

$\begin{bmatrix} 0.23545... \\ 0.11769... \\ 0.23511... \\ 0.17628... \\ 0.23545... \end{bmatrix}$

As we can see, after 8 iterations, the thousandth place has converged.

The PageRank vector is:

$\begin{bmatrix} 0.235 \\ 0.117 \\ 0.235 \\ 0.176 \\ 0.235 \end{bmatrix}$

This is an eigenvector of A with eigenvalue 1. It always works out this way. We will compute this directly in Way 2. Later we will explain why this works.

Way 2: Non-iterative equivalent

Let us call the importance of each of the five websites as $$ x_1, \dots, x_5 $$ . Then, we get the following system of equations:

$x_1 = \frac{1}{2} \cdot x_3 + \frac{1}{3} \cdot x_4 + \frac{1}{4} \cdot x_5 \\ x_2 = \frac{1}{4} \cdot x_1 + \frac{1}{4} \cdot x_5 \\ x_3 = \frac{1}{4} \cdot x_1 + \frac{1}{2} \cdot x_2 + \frac{1}{3} \cdot x_4 + \frac{1}{4} \cdot x_5 \\ x_4 = \frac{1}{4} \cdot x_1 + \frac{1}{2} \cdot x_2 + \frac{1}{4} \cdot x_5 \\ x_5 = \frac{1}{4} \cdot x_1 + \frac{1}{2} \cdot x_3 + \frac{1}{3} \cdot x_4$

Each equation comes from the idea that the importance of a website is the sum of the importance of the incoming links to it.

This system of equations is equivalent to asking for the solution to:

$\begin{bmatrix} 0 & 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & 0 & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & 0 & \frac{1}{2} & \frac{1}{3} & 0 \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}$

Clearly, we want to find the eigenvector corresponding to the eigenvalue of 1. Solving for it, we get the following form for the eigenvectors:

$c \cdot \begin{bmatrix} 1 \\ \frac{1}{2} \\ 1 \\ \frac{3}{4} \\ 1 \end{bmatrix}$

The PageRank vector denotes the relative importance of the websites, so we would want all entries to sum to 1. Eigenvectors are all scalar multiples of each other, so we can normalize it by dividing each entry by $\frac{17}{4} (= 1 + \frac{1}{2} + 1 + \frac{3}{4} + 1)$ . This gives us the PageRank vector:

$\frac{4}{17} \cdot \begin{bmatrix} 1 \\ \frac{1}{2} \\ 1 \\ \frac{3}{4} \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{4}{17} \\ \frac{2}{17} \\ \frac{4}{17} \\ \frac{3}{17} \\ \frac{4}{17} \end{bmatrix} = \begin{bmatrix} 0.23529... \\ 0.11764... \\ 0.23529... \\ 0.17647... \\ 0.23529... \end{bmatrix} \approx \begin{bmatrix} 0.235 \\ 0.117 \\ 0.235 \\ 0.176 \\ 0.235 \end{bmatrix}$

As we can see, this is the same vector calculated by Way 1.

Now that we have seen the non-iterative equivalent, we can explain why we can always arrive to an eigenvector of eigenvalue 1.

A stochastic matrix is a square matrix in which all entries are non-negative. It is called a right stochastic matrix when all the columns sum to 1. Clearly, the transition matrix A from earlier is a right stochastic matrix. In fact, a transition matrix is another name for a stochastic matrix.

A property of a right stochastic matrix is that it always has an eigenvalue of 1.

Proof:

Let A be a right stochastic matrix:

$\begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nn} \\ \end{bmatrix}$

Then, matrix A - I is (where I is the identity matrix)

$\begin{bmatrix} A_{11}-1 & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nn}-1 \\ \end{bmatrix}$

Each column must sum to 0 now, since each column has an additional -1 term.

This means

$(A-I)^T \cdot \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = 0 \cdot \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$

since each column (which is a row in the transpose) sums to 0.

Clearly, matrix A-I has an eigenvalue of 0. This means that the matrix is singular, i.e. it does not have an inverse, and its rows/columns are not linearly dependent, so there is a non-zero vector v in its null space which is the eigenvector of A-I with eigenvalue 0:

$(A-I) \cdot v = 0 \cdot v \implies A \cdot v - I \cdot v = 0 \implies A \cdot v = I \cdot v \implies A \cdot v = v \implies A \cdot v = 1 \cdot v$

Therefore, matrix A must always have an eigenvalue of 1 with eigenvector v.

This is the first version of the algorithm described in the paper. Since then, it has gone through many changes which are not available publicly.