TIW: Linked List Cycle

2016-12-16 4 minute read

This is more like a special topics post, because it is a very specific algorithm with a very narrow application. The problem statement: given a linked list which has a cycle, determine where the cycle begins.

To explain further, a linked list with a cycle looks something like the number 6. In fact, the only 2 topologies (read: shapes) a linked list could have are a straight line or the number 6. We start walking from the top, and end up walking indefinitely in a loop. To determine whether there is a loop or not is fairly simple: create an unordered set, insert all the visited nodes (or just their pointers, or anything unique to the nodes) into the set, until we insert the same node twice or we run into the end of the linked list (in which case there will be no cycle).

Let’s say we have this declaration of list node.

struct ListNode {
    ListNode* next;
    int val;
};

And this would be the function to return the first node in a cycle.

ListNode* detectCycle(ListNode* head) {
    unordered_set<ListNode*> vis;
    while (head) {
        if (vis.count(head))
            return head;
        vis.insert(head);
        head = head->next;
    }
    return NULL;
}

This is trivial. What is not trivial is to accomplish the exact same task with O(n) time as before but with O(1) space.

The algorithm that does this uses 2 pointers, one fast and one slow. They both start from the head, and the fast pointer moves 2 nodes while the slow pointer moves 1 node per iteration. If there is no loop, the fast pointer will reach the end. If there is a loop, they will fall into the loop , and eventually they will end up at the same node at some point. At that time we will be sure that there is a loop. Why are we sure that they will always collide? Consider them both in the cycle already. In each iteration, the relative distance between the two pointers will increase by 1. When the distance hits a multiple of the length of the cycle, they will effectively have a distance of 0, and hence will be at the same node.

OK, that sounds clever. But we still do not know the beginning of the loop, right?

Actually, we can figure it out! After they collide, move the fast pointer back to the head. Now in each iteration, move them both at the same pace of 1 node, and eventually they will collide again. When that happens, we have found the beginning of the loop. I do not know of an intuitive explanation of this, but it is provable with a little algebra. Let’s say the part before the loop has m nodes, and the loop itself is n nodes long. Say, after k iterations, the fast pointer meets with the slow pointer. Then we know (k*2-m)-(k-m) = 0 (mod n), i.e. the relative distance between the two pointers is 0. Hence k is a multiple of n. After m more steps, the slow pointer in total has moved k+m steps. That is equivalent to moving m steps, and then moving k more steps. But moving k steps in the loop does nothing, because k is a multiple of n. Therefore after k+m steps, the slow pointer points at the beginning of the loop. Coincidentally the fast pointer, in the second phase of the algorithm, after moving m single steps, also arrive at the beginning of the loop. Therefore, we have proven (not very rigorously) the first time the two pointers meet in the second phase of the algorithm is at the beginning of the loop.

Perhaps I should show some code to make it clearer.

ListNode* detectCycle(ListNode* head) {
    ListNode *fast = head, *slow = head;
    do {
        for (int i = 0; i < 2; i++) {
            if (!fast)
                return NULL;
            fast = fast->next;
        }
        slow = slow->next;
    } while (fast != slow);
    fast = head;
    while (fast != slow) {
        fast = fast->next;
        slow = slow->next;
    }
    return fast;
}

That’s the algorithm. There is one problem on Leetcode that uses this algorithm: Find the Duplicate Number. It is not exactly obvious, so please spend some time to convince yourself this problem represents a linked list. Basically the idea is that each number in the array is a node, the index being the address, and the number being the address of the next node. The graph looks like a 6 because there is exactly one node with in degree 2, ignoring the parts of the graph that cannot be reached from the head. Here’s the code for it:

int findDuplicate(vector<int>& nums) {
    int fast = 0, slow = 0;
    do {
        fast = nums[nums[fast]];
        slow = nums[slow];
    } while (fast != slow);
    fast = 0;
    while (fast != slow) {
        fast = nums[fast];
        slow = nums[slow];
    }
    return fast;
}