Some parts of this blog post might get a bit technical for those who aren’t familiar with computer science topics. It took me a long time to wrap my head around some of these concepts myself. If you have any questions or find anything unclear, please message me and I’ll try to update the post to clarify.
If you’ve been on the internet recently (which includes everyone except the Amish), then you’ve definitely seen some version of an article with the buzzwords big data, AI, blockchain, or machine learning. The applications of computer science are numerous, and the limits of what we can do with computing are being pushed by researchers and state-of-the-art companies every day. It seems like everybody wants to learn some coding to keep up with this explosion of opportunity in various tech fields. It is amazing what we can do with computers: amateurs with good programming skills can beat the best chess players in the world, image recognition can distinguish malignant from benign tumors better than most doctors, and AI can hold a phone conversation with customers without them realizing they aren’t talking to a real human (still in an experimental stage).
This article is going to talk about the things we can’t actually do with computer science. I will be focusing on the most important unsolved problem in computer science: P vs NP. Chances are you have heard of this problem in passing, but I often find it is underexplained and misunderstood by many who talk about it. Here is my stab at simplifying this critical problem to make it easier to grasp.
What is P? In computer science we try to classify problems based upon how hard they are to solve. For example, adding two numbers can be perceived as an easier problem than raising one number to the power of another (non-modular exponentiation). However, 125246431634 + 25146214612 is harder to solve (at least for me) than 2^2. That is why we are more interested in how difficult a problem is given its input size. In general, you can think of input size as the length it takes to write out the problem. That brings us to a definition of P: these are the problems that, given some input length n, take us n^k time to solve for some constant k. In other words, they take polynomial time in terms of the input. For example, if I had to find the greatest common divisor (GCD) of two values, I could do it in roughly n^2 time, where n is the length of the input, using Euclid’s algorithm. If this is too hard to understand, the main point you should leave with is that P is the class of problems that are relatively easy to solve.
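To make that concrete, here is a minimal Python sketch of Euclid’s algorithm. The number of loop iterations grows polynomially with the number of digits of the inputs, which is exactly what puts GCD in P.

```python
def gcd(a, b):
    # Euclid's algorithm: repeatedly replace the larger number
    # with the remainder of dividing it by the smaller one.
    while b:
        a, b = b, a % b
    return a

print(gcd(125246431634, 25146214612))
```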
Then we have NP: the Non-deterministic Polynomial problems. These are problems we could solve in polynomial time if we had a magical computer that could try multiple different values at the same time. However, a simpler way to think of NP problems is that, given a candidate solution from that magical computer, it is relatively easy to verify whether the solution is valid. For example, I might not be able to factor a large number into primes, but if you give me two primes, I can quickly multiply them to see if they actually are factors of the number.
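As a toy illustration (just a sketch; real cryptographic verification is more involved), checking a proposed factorization takes a single multiplication, while finding the factors in the first place is the hard part:

```python
def is_factorization(n, p, q):
    # Verifying is easy: multiply the proposed factors and compare.
    return p * q == n

print(is_factorization(15, 3, 5))   # True
print(is_factorization(15, 3, 7))   # False
```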
So P vs NP basically boils down to the question: are problems that are relatively easy to verify also relatively easy to solve? Common sense and many top computer science researchers say the answer is no, but no one has been able to prove it. Many believe that the proof itself is impossible.
We actually rely on P not being equal to NP in a lot of modern security and cryptography. For example, the RSA algorithm, which is used to encrypt many of your passwords and credit card numbers when you send them over websites, relies on the fact that large numbers are hard to factor into primes. If P actually equaled NP, none of your transactions and secrets would be safe, since anyone could easily decrypt them.
This leads me to a very special set of problems in computer science: NP-complete problems. Here are some examples of such problems:
- Traveling salesman: A salesman is going from city to city trying to sell his merchandise. Is there a route he can take that visits all cities and is also shorter than “x” miles?
- System of Inequalities: You are given a set of inequalities, 3x + y > 3, x + 5z < 4, …, and you want to find out whether there is an integer solution for x, y, z, … that satisfies all of them.
- Min Vertex Cover: You are given a graph of vertices and edges (picture a Facebook graph where you and all your friends are points and there is a line connecting two people if they are friends). Is there a set of “x” vertices in this graph that touches every edge? In other words, can we choose a small set of points such that every line in the graph touches at least one of them? (A quick verification sketch follows this list.)
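As a small sketch of why these problems sit in NP, here is how quickly a proposed vertex cover can be checked in Python, even though finding the smallest one is believed to be hard:

```python
def is_vertex_cover(edges, cover):
    # A set of vertices is a cover if every edge touches at least one of them.
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)

edges = [(0, 1), (1, 2), (2, 3)]
print(is_vertex_cover(edges, {1, 2}))  # True: every edge touches 1 or 2
print(is_vertex_cover(edges, {0, 3}))  # False: edge (1, 2) is uncovered
```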
What is crazy is that these are all basically the same problem: each one can be converted (reduced) into any of the others in polynomial time. Not only that, but if we could find an efficient solution to any one of them, we would prove that all NP problems are easy to solve, thereby settling P vs NP. Since P vs NP remains open despite thousands of smart researchers attempting various approaches, suffice it to say that finding a good solution to any of these problems is very, very hard.
Just because we can’t find exact solutions to these problems doesn’t mean we should roll over and give up. A continuously developing field in computer science studies how well we can approximate a solution to an NP-complete problem. For example, we may not be able to find the exact optimum, but we can find an answer that is guaranteed to be within a factor of 2 of it. Some of these approximation algorithms are very straightforward and give powerful results.
All this exposition brings me to my main topic: Min Vertex Cover, one of the NP-complete problems I mentioned above. It is a good one to explore since a lot of researchers have used it as a case study for various techniques. One of these is a 2-approximation of Min Vertex Cover which was discovered (or first written about) in 1982. The premise of the 2-approximation algorithm is simple: for an edge to be covered, it has to touch one of its two endpoints. So for every edge, at least one of its two endpoints has to be in any vertex cover, including the minimum one.
Using that insight, the algorithm is straightforward: choose a random edge; since one of its two endpoints must be in the cover, take both of them (which is where the factor of two comes from); then delete all the edges that touch those two vertices, since they are now covered. We keep plucking edges and vertices out of the graph until none are left. If you want a good visual example of how this works, you can go to the link here.
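Here is a rough Python sketch of that 2-approximation; the name basic_two_approx matches what I call it later, though this is a simplified reconstruction rather than the exact notebook code.

```python
import random

def basic_two_approx(edges):
    # Classic 2-approximation: pick an uncovered edge at random,
    # add both endpoints to the cover, and discard every edge they touch.
    remaining = list(edges)
    cover = set()
    while remaining:
        u, v = random.choice(remaining)
        cover.update((u, v))
        remaining = [(a, b) for a, b in remaining
                     if a not in cover and b not in cover]
    return cover

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
print(basic_two_approx(edges))
```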
So that’s kind of cool (at least to me). My next question was: can we do better than a 2-approximation for Min Vertex Cover? Alas, after doing some research I realized there is a very good chance that we can’t. A Princeton paper published in 2005 proved that we cannot achieve an approximation factor better than 1.36, using the PCP theorem (a very powerful theorem about probabilistically checkable proofs and polynomial reductions). Even grimmer, a 2008 paper showed that we actually can’t do better than 2 assuming the two-prover Unique Games Conjecture.
However, I do not give up so easily. Here is a good rule in computer science: if you cannot solve a general problem, get more and more specific until maybe you can solve some instance of it. So instead of trying to solve Min Vertex Cover on all graphs, my next question was how well I could approximate it on a random graph. This actually seems to be an open area of research. There is a 2007 paper that uses some techniques to reduce almost all random graphs to a 3/2 approximation. It takes advantage of some common-sense properties of vertex covers and reduces graphs to simpler ones until it gets a solid result. In a later blog post I’d like to implement their reductions and possibly improve upon them in the general case.
Now comes the fun part. I started playing around in Python with a couple of heuristics that I thought were important in determining how good an approximation we could attain. My first attempt was just to run the basic algorithm (which I will refer to as “basic_two_approx”) a bunch of times and see what the high and low were. Since I have limited computational resources, I tested this on random graphs of n edges and m vertices, where n was bounded by 1000. The histogram below shows what I got from running the randomized algorithm a hundred times on a graph of size 1000. The largest cover found, divided by 2, gives a lower bound on the optimum (since the algorithm is a 2-approximation).
After running basic_two_approx many times, I can compute an empirical approximation constant as 2 * (smallest cover found) / (largest cover found). For the histogram above that gives (940/964) * 2 = 1.95, so in this case it’s not much better than a plain 2-approximation. In fact, below is a plot of how well this approach does for various edge densities relative to the number of nodes.
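A rough sketch of that experiment, reusing the basic_two_approx sketch above; random_graph here is my own stand-in helper, not necessarily how the graphs were generated in my notebook.

```python
import random

def random_graph(n_nodes, n_edges):
    # Stand-in helper: sample a simple random graph as a list of edges.
    edges = set()
    while len(edges) < n_edges:
        u, v = random.sample(range(n_nodes), 2)
        edges.add((min(u, v), max(u, v)))
    return list(edges)

edges = random_graph(1000, 1000)
sizes = [len(basic_two_approx(edges)) for _ in range(100)]
lower_bound = max(sizes) / 2            # OPT >= largest cover / 2
best_cover = min(sizes)                 # OPT <= smallest cover found
print("empirical ratio:", 2 * best_cover / max(sizes))
```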
A slightly more sophisticated approach breaks away from choosing edges at random and chooses them in a smarter way. For example, picking an edge between nodes with lots of other neighbors is more likely to lead toward an optimal min vertex cover. This approach prefers edges whose endpoints have many neighbors to get a smaller cover (a better minimum), and does the opposite to get a larger cover and hence a better lower bound. Surprisingly, these approaches do a lot better. For the graph that produced the histogram above, the new max_two_approx was able to find a covering of 1000 and the min_two_approx found one of 650, giving a ratio closer to 1.3, which is a major improvement. I played around with both geometric and arithmetic averages of the endpoint degrees when scoring edges, and the arithmetic average performed slightly better.
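Here is a sketch of the degree-guided minimizing variant under my reading of the heuristic (the exact scoring in the notebook may differ); a maximizing variant would flip the max to a min.

```python
def min_two_approx(edges):
    # Degree-guided variant: at each step pick the remaining edge whose
    # endpoints have the highest combined degree, add both endpoints,
    # and discard every edge they touch.
    remaining = list(edges)
    cover = set()
    while remaining:
        degree = {}
        for a, b in remaining:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        u, v = max(remaining, key=lambda e: degree[e[0]] + degree[e[1]])
        cover.update((u, v))
        remaining = [(a, b) for a, b in remaining
                     if a not in cover and b not in cover]
    return cover
```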
There is another thing we can do to improve the ratio. Any cover returned by the above algorithms is valid but usually contains redundant vertices, ones whose removal still leaves every edge covered. So why not keep plucking such vertices out of the result until we can no longer do so, getting a much better upper bound? This realization, coupled with the above algorithm, leads to a much stronger k-approximation. A graph of the new approximations using these insights is below. This result is actually really cool, even though the approximation factor gets large for edge densities close to 2. We don’t actually focus on those graphs in real-world simulations, since in most real graphs each node has a bounded number of neighbors, and the 2007 paper finds a neat way to reduce those problems to a factor better than 3/2. The fact that this algorithm does so well in the 0.8-1.5 density range is very promising.
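A sketch of that plucking step, reusing the helpers above; the order in which vertices are tried is arbitrary here.

```python
def prune(edges, cover):
    # Drop any vertex whose removal still leaves every edge covered.
    cover = set(cover)
    for v in list(cover):
        trial = cover - {v}
        if all(a in trial or b in trial for a, b in edges):
            cover = trial
    return cover

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
cover = basic_two_approx(edges)
print(len(cover), len(prune(edges, cover)))
```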
I also tried one more thing. Another heuristic I looked at was how many times a node was included in the output of the original randomized algorithm: the more often it is included, the more likely it is to actually belong to a minimum vertex cover. So I sorted the nodes by how often they were included rather than by number of neighbors, and ran the same plucking algorithm to get a better minimum. The results are plotted below. It is interesting how similarly two completely different heuristics performed.
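A sketch of that frequency-based plucking, again reusing basic_two_approx; the number of trials and the tie-breaking are my own guesses.

```python
from collections import Counter

def frequency_prune(edges, cover, trials=100):
    # Count how often each vertex appears across many randomized runs,
    # then try to pluck the least frequently chosen vertices out first.
    counts = Counter()
    for _ in range(trials):
        counts.update(basic_two_approx(edges))
    cover = set(cover)
    for v in sorted(cover, key=lambda x: counts[x]):
        trial = cover - {v}
        if all(a in trial or b in trial for a, b in edges):
            cover = trial
    return cover
```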
It actually did somewhat better than the original result, which is promising. Next, I want to see whether I can use a machine learning model on these and other features to select which nodes to reduce and which edges to choose. Maybe then we could improve on the state of the art and find high-quality approximations to NP-complete problems with feature-driven machine learning. But that is for another time; this post is way too long already. You can contact me if you want access to the notebook I used to generate my data.