Student project for MA375: The use of graph theory

In the class we will learn much about graphs. As we known, graph theory is widely used in different areas, especially in computer science. Here I want to share some programming problems which can be solved using graph theory. It may be helpful in understanding how to modeling a problem using graph theory.
The first problem I want to share is about the MST (Minimum Spanning Tree). The algorithm I used is the Prim’s Algorithm, which will be demonstrated in the lecture.
The second problem is also about the Tree. But it’s not MST this time. What I want to talk is how to do dynamic programming on Trees.
The last one is a little beyond the lecture. I want to talk a little about Network-Flow Algorithm. The algorithm I used to get the max-flow is the Dinic Algorithm. And another thing I want to emphasize to this problem is the process to construct the network.



Problem 1. Agri-Net

Farmer John has been elected mayor of his town! One of his campaign promises was to bring internet connectivity to all farms in the area. He needs your help, of course.
Farmer John ordered a high speed connection for his farm and is going to share his connectivity with the other farmers. To minimize cost, he wants to lay the minimum amount of optical fiber to connect his farm to all the other farms.
Given a list of how much fiber it takes to connect each pair of farms, you must find the minimum amount of fiber needed to connect them all together. Each farm must connect to some other farm such that a packet can flow from any one farm to any other farm.
The distance between any two farms will not exceed 100,000.
The input includes several cases. For each case, the first line contains the number of farms, N (3 <= N <= 100). The following lines contain the N x N conectivity matrix, where each element shows the distance from on farm to another. Logically, they are N lines of N space-separated integers. Physically, they are limited in length to 80 characters, so some lines continue onto others. Of course, the diagonal will be 0, since the distance from farm i to itself is not interesting for this problem.
For each case, output a single integer length that is the sum of the minimum length of fiber required to connect the entire set of farms.
Source: USACO 102


Solve:

To solve this problem, we first need to figure out how to present the problem in mathematic words. Actually, the input description already gives us a clue. It uses the N by N matrix to describe the relation between farms. Obviously, the farms are the vertices of a graph G. The cable between the ith farm and jth farm is the the edge connecting vertex i and vertex j. Finally, the quantity of fiber needed to connecting farm i and farm j is the weight of the edge(i, j).
We have completely described the problem by graph. We are asked to calculate the minimum length of fiber required to connect the entire set of farms. To connect n vertices, we need at least n – 1 edges and make the graph a tree. So what the problem actually asking is just to calculate a MST. We can use the Prim’s Algorithm to solve it.


Analysis: We solved this problem by using the adjacency matrix, so the complexity of the Prim’s Algorithm is O(N^2).The maximum N in this problem is just 100, so our algorithm is efficient enough.


bool init()
{
     if(scanf("%d",&n)==EOF)return false;
     for(int i=0;i<n;i++)
        for(int j=0;j<n;j++)
        {
             scanf("%ld",&map[i][j]);
        }
     for(int i=0;i<n;i++)
        low[i]=map[0][i];
     return true;
}
void prim()
{
     long min;
     ans=0;
     for(int i=0;i<n-1;i++)
     {
             min=MAX;int k;
			 for(int j=0;j<n;j++)
			 {
				 if(low[j]<min && low[j]!=0)
				 {
					 min=low[j];k=j;
				 }
			 }
             if(min<MAX)ans+=min;
             low[k]=0;
             for(int j=0;j<n;j++)
             if(map[k][j]<low[j])
             low[j]=map[k][j];
     }
}



Problem 2. Rebuilding Roads

The cows have reconstructed Farmer John's farm, with its N barns (1 <= N <= 150, number 1..N) after the terrible earthquake last May. The cows didn't have time to rebuild any extra roads, so now there is exactly one way to get from any given barn to any other barn. Thus, the farm transportation system can be represented as a tree.

Farmer John wants to know how much damage another earthquake could do. He wants to know the minimum number of roads whose destruction would isolate a subtree of exactly P (1 <= P <= N) barns from the rest of the barns.
* Line 1: Two integers, N and P
* Lines 2..N: N-1 lines, each with two integers I and J. Node I is node J's parent in the tree of roads.
Source: USACO 2002 February


Solve:

The problem itself has helped us do the modeling thing: there is a tree which has N nodes. The problem is asking at least how many edges we need to delete so that in the final graph there is a subtree of exactly P nodes.
The naive idea to solve this problem is to enumerate all the cases which means we need to generate a sequence represent which edges to be deleted. But since we have N-1 edges for a tree with N nodes, there are 2^(N-1) such sequences. Obviously this method is not acceptable.
But let’s consider some properties of the tree. For a given root, if we know the information of all the subtrees whose roots are the children of the given root then we will also know the information for the whole tree. This is actually a recursion relation and it leads us to consider the dynamic programming.
We can use dp[i][j] to represent the minimum number of edges on the subtree with the ith node as its root which need to be deleted so that this subtree would have exactly j nodes.

With the states dp[i][j], It’s seems not very difficult to write down the dynamic programming equation. dp[i][j + k] = min(dp[i][k] + dp[c][j]) where node c is one of node i’s children. This means a subtree with k nodes and root i is connected to a subtree with j nodes and root c, where there is an edge between c and i.
 

But wait! Is there any problem with this equation? Consider the following situation, if the subtree with k nodes and root i contains the child c, we are in trouble. To fix this, we can keep another state lineDP[i][j] for each node s, which represents the sum of the minimum number of edges which need to be deleted for the first i’s subtrees of the root s. The method we used is kind like the packing problem. The linear dynamic programming equation is lineDP[i][j] = min (dp[ch[i]][k] + lineDP[i - 1][j –k ]), where ch[i] is the number of the node for the ith child of s, and for special cases (k = 0) lineDP[i][j] = min(lineDP[i][j], lineDP[I - 1][j] + 1). For details, please check the code below:


int csize(int f,int s)
{
	size[s]=1;
	for(int i=0;i<tree[s].size();i++)if(tree[s][i]!=f)
	{
		size[s]+=csize(s,tree[s][i]);
	}
	return size[s];
}
void get_size()
{
	memset(size,0,sizeof(size));
	size[0]=csize(0,1);
}
void work(int f,int s)
{
	int totch=0;
	if (dp[s][1]<200) return;
	for(int i=0;i<tree[s].size();i++)
	{
		if(tree[s][i]!=f)
		work(s,tree[s][i]);
	}
	memset(lineDP,1,sizeof(lineDP));
	memset(ch,0,sizeof(ch));
	memset(sumsize,0,sizeof(sumsize));
	for(int i=0;i<tree[s].size();i++)
		if(tree[s][i]!=f)
		{
			totch++;ch[totch]=tree[s][i];sumsize[totch]=sumsize[totch-1]+size[tree[s][i]];
		};
	lineDP[1][0]=1;for(int i=1;i<=sumsize[1];i++)lineDP[1][i]=dp[ch[1]][i];
	for(int i=2;i<=totch;i++)
	{
		lineDP[i][0]=lineDP[i-1][0]+1;
		for(int j=1;j<=sumsize[i];j++)
		{
			for(int k=1;k<=size[ch[i]]&&k<=j;k++)if(lineDP[i][j]>(dp[ch[i]][k]+lineDP[i-1][j-k]))lineDP[i][j]=dp[ch[i]][k]+lineDP[i-1][j-k];
			if (lineDP[i][j]>lineDP[i-1][j]) lineDP[i][j]=lineDP[i-1][j]+1;
		}
	}
	for (int i=1;i<=size[s];i++) dp[s][i]=lineDP[totch][i-1];
}
int main()
{
	read_tree();
        if (n==1) {printf("%d\n",0);return 0;}
	if (p>n) {printf("%d\n",0);return 0;}
	get_size();
	memset(dp,1,sizeof(dp));
	for(int i=2;i<=n;i++)
	{if(tree[i].size()==1)dp[i][1]=0;}
	work(0,1);
	int mincut=10000;
	for(int i=2;i<n;i++)if(dp[i][p]<mincut)mincut=dp[i][p]+1;
	if(mincut>dp[1][p])mincut=dp[1][p];
	printf("%d\n",mincut);
	return 0;
}


Problem 3. Dining

Cows are such finicky eaters. Each cow has a preference for certain foods and drinks, and she will consume no others.
Farmer John has cooked fabulous meals for his cows, but he forgot to check his menu against their preferences. Although he might not be able to stuff everybody, he wants to give a complete meal of both food and drink to as many cows as possible.
Farmer John has cooked F (1 ≤ F ≤ 100) types of foods and prepared D (1 ≤ D ≤ 100) types of drinks. Each of his N (1 ≤ N ≤ 100) cows has decided whether she is willing to eat a particular food or drink a particular drink. Farmer John must assign a food type and a drink type to each cow to maximize the number of cows who get both.
Each dish or drink can only be consumed by one cow (i.e., once food type 2 is assigned to a cow, no other cow can be assigned food type 2).
Input:
Line 1: Three space-separated integers: N, F, and D
Lines 2..N+1: Each line i starts with a two integers Fi and Di, the number of dishes that cow i likes and the number of drinks that cow i likes. The next Fi integers denote the dishes that cow i will eat, and the Di integers following that denote the drinks that cow i will drink.
Output a single integer that is the maximum number of cows that can be fed both food and drink that conform to their wishes
Source: USACO 2007 Open Gold


Solve:

One particular cow can be assigned only one type of drink and one type of food. Moreover, if one cow has been assigned one type of drink which he likes, then he must be assigned one type of food he likes and vice versa. To fulfill the first requirement, we can represent each cow by 2 nodes, namely, cow_i and cow_i’, and connect these two nodes by an edge with capacity 1. This means that only one unit of flow can go through these two nodes. To fulfill the second requirement, we connect food_p to cow_i if the ith cow likes the pth type of food. Also we connect cow_i’ to drink_k if the ith cow likes the kth type of drink. In addition, because each type of food or drink can be assigned only once, we connect the source node and food_p by an edge of capacity 1 and connect the terminal node and drink_k by and edge of capacity 1. The complete network is like: source-> food -> cow -> cow -> drink -> terminal. Then, this problem is solved by calculating a maximum flow.


S = 1;
    T = F + 2 * n + D + 2;
    for(int i = 1; i <= F; i ++)
    {
        c[S][S + i] = 1;
    }
    for(int i = 1; i <= n; i ++)
    {
        c[F + i + 1][F + n + i + 1] = 1;
    }
    for(int i = 1; i <= D; i ++)
    {
        c[F + 2 * n + i + 1][T] = 1;
    }
 
    for(int i = 1; i <= n; i ++)
    {
        int x,y;
        int temp;
        scanf("%d%d",&x,&y);
        for(int j = 0; j < x; j ++)
        {
            scanf("%d",&temp);
            c[S + temp][F + i + 1] = 1;
        }
        for(int j = 0; j < y; j ++)
        {
            scanf("%d",&temp);
            c[F + n + i  + 1][F + 2 * n + temp + 1] = 1;
        }
    }


Here’s a brief description for the Dinic’s Algorithm. It is a very fast algorithm designed to solve the max flow problem. Without any special data structure the time complexity of Dinic’s algorithm is O(V^2*E). Because E is usually proportional to V^2, obviously, we can see that Dinic’s will usually perform better than the Edmond-Karp algorithm whose time complexity is O(V*E^2). But in fact, Dinic’s is very similar to the Edmond-Karp. By the way, with a special data structure which is called the Dynamic tree, we can improve the Dinic’s algorithm’s time complexity up to O(V*E*Log V). This is actually a huge improvement but the coding can be very difficult.

The basic idea of the Dinic’s algorithm is to augment in the so called level graph. The level graph is actually a sub graph of the residual graph which consists of all the shortest augmenting paths in the residual graph. To construct the level graph, we use the BFS method, labeling each vertex by its distance from the terminal node. Then for each edge, if the capacity of edge (u, v) is greater than zero and d[u] = d[v] + 1, then edge (u, v) is in the level graph. In fact, we do not need to store the capacity graph explicitly; we can just check the labeling every time.

After the labeling (actually now we know everything about the level graph), we simply do a DFS to augment the flow. Repeat the labeling and augmenting until there's no augmenting path.

The procedures are like blow:
1. Initialize the flow, compute the capacities.
2. Start from the terminal node and use BFS to do the labeling in the residual graph, if the source node cannot be reached, exit.
3. Use DFS to augment the flow and go back to procedure 2.


reference: http://en.wikipedia.org/wiki/Dinic's_algorithm


bool bfs()
{
    int b[maxn];
    int l = 0,r = 0,u,v;
    q[r ++] = T; d[T] = 0;
    memset(b,0,sizeof(b));
    b[T] = 1;
    while(l < r)
    {
        u = q[l ++];
        for(v = 1; v <= T; v ++)
        if(f[v][u] < c[v][u] && !b[v])
        {
            d[v] = d[u] + 1;
            b[v] = 1;
            q[r ++] = v;
        }
        if(b[S])return true;
    }
    return false;
}
 
int maxflow()
{
    int u,v,flow = 0, a[maxn];
    while(bfs())
    {
        a[u = S] = (1 << 30);
        memset(cur,0,sizeof(cur));
        while(true)
        {
            int ok = 0;
            for(v = cur[u]; v <= T; v ++)
            {
                if(f[u][v] < c[u][v] && d[u] == d[v] + 1)
                {
                    ok = 1;
                    break;
                }
            }
            if(ok)
            {
                cur[u] = v + 1;
                pre[v] = u;
                a[v] = c[u][v] - f[u][v];
                if(a[v] > a[u])a[v] = a[u];
                u = v;
                if(v == T)
                {
                    do
                    {
                        cur[pre[u]] = u;
                        f[pre[u]][u] += a[T];
                        f[u][pre[u]] -= a[T];
                        if(f[pre[u]][u] == c[pre[u]][u])cur[S] = pre[u];
                        u = pre[u];
                    }while(u != S);
                    flow += a[T];
                    a[S] = (1 << 30);
                }
            }else
            {
                if(u != S)
                {
                    cur[u] = T;
                    u = pre[u];
                }else
                {
                    break;
                }
            }
        }
    }
    return flow;
}

Back to MA375, Spring 2012, Prof. Walther

Alumni Liaison

Sees the importance of signal filtering in medical imaging

Dhruv Lamba, BSEE2010