Find missing numbers in 4 billion unique numbers with 50MB RAM

Upasana | December 08, 2019 | 2 min read | 148 views


4 billion numbers can not fit into 50MB RAM. Even if we take a bitset of 4 billion bits, it will need 4 GB RAM. So we need a multi-pass solution.

Solution

  1. To solve this limited memory problem, we will create buckets of integers, each bucket covering 10,000 integers.

  2. So first bucket will cover 0-9999, second bucket will cover 10,000-19,999 and so on.

  3. To cover 4 x 109 integers, we will need 4x109 / 104 = 4x105 buckets

  4. We will store count of numbers (not actual numbers) in each bucket, that way we will save lot of memory at the cost of extra passes. If no number is missing in a bucket, then the number of integers in that bucket would be 10, 000

  5. After processing all 4 billion integers, we will make a second pass to figure out the missing numbers in those buckets that have less than 10,000 items.

Time Complexity of approach: 3xO(n)

For simplicity purpose, i am providing code with much lower values, so that it can be easily tested and understood.

Find Missing Number with Limited RAM
import java.util.*;
import java.util.stream.LongStream;

public class Problem2 {

    public static final int SINGLE_BUCKET_SIZE = 10_000;
    public static final int TOTAL_BUCKETS = 400;

    final int[] buckets = new int[TOTAL_BUCKETS];
    Set<Integer> bucketsWithProblem = new HashSet<>();
    Map<Integer, BitSet> bucketData = new HashMap<>();

    public void countItemsInEachBucket(long input) {
        int bucketId = (int) (input / SINGLE_BUCKET_SIZE);
        buckets[bucketId]++;
    }

    public void findBucketsWithLesserItems() {
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] < SINGLE_BUCKET_SIZE) {
                bucketsWithProblem.add(i);
                System.out.println("Bucket has a problem: " + i + ", total items =" + buckets[i] + ", missing items = " + (SINGLE_BUCKET_SIZE - buckets[i]));
            }
        }
    }

    public void locateMissingNumbers(long input) {
        int bucketId = (int) (input / SINGLE_BUCKET_SIZE);
        if (bucketsWithProblem.contains(bucketId)) {
            bucketData.computeIfAbsent(bucketId, BitSet::new).set((int) (input % SINGLE_BUCKET_SIZE));
        }

    }

    public void printMissingNumbers() {
        bucketData.forEach((integer, bitSet) -> {
            int nextClearBit = 0, lastClearBit = 0;
            do {
                nextClearBit = bitSet.nextClearBit(lastClearBit);
                if (nextClearBit < SINGLE_BUCKET_SIZE) {
                    int missingNumber = integer * SINGLE_BUCKET_SIZE + nextClearBit;
                    System.out.println("missing number = " + missingNumber);
                } else break;
                lastClearBit = nextClearBit + 1;
            } while (lastClearBit < SINGLE_BUCKET_SIZE);
        });

    }

    public static void main(String[] args) {
        HashSet<Long> missingNumbers = new HashSet<>(10);
        missingNumbers.add(10L);
        missingNumbers.add(100L);
        missingNumbers.add(900L);
        missingNumbers.add(15L);
        missingNumbers.add(3999L);

        Problem2 problem2 = new Problem2();
        LongStream.range(0, 4_000_000L).filter(value -> !missingNumbers.contains(value)).forEach(problem2::countItemsInEachBucket);
        System.out.println("Pass 1 is complete now");
        problem2.findBucketsWithLesserItems();
        System.out.println("Pass 2 is complete now");
        LongStream.range(0, 4_000_000L).filter(value -> !missingNumbers.contains(value)).forEach(problem2::locateMissingNumbers);
        System.out.println("Pass 3 is complete now");
        problem2.printMissingNumbers();
        System.out.println("Pass 4 is complete now");
    }
}

Top articles in this category:
  1. Given a collection of 1 million integers, all ranging between 1 to 9, sort them in Big O(n) time
  2. Factorial of a large number in Java BigInteger
  3. can we write a java method that swaps two integers
  4. Generate Random Numbers in a range using Java 8
  5. What will happen if we don't synchronize getters/accessors of a shared mutable object in multi-threaded applications
  6. What do you understand by Java Memory Model?
  7. Blocking Queue implementation in Java

Recommended books for interview preparation:

Find more on this topic: