
Want a job working on mongoDB? Your first online interview is in this post  - meghan
http://maxschireson.com/2011/04/23/want-a-job-working-on-mongodb-your-first-online-interview-is-in-this-post/
======
yuvadam
Here's a sketchy description for a possible solution.

The easy part is that we know that if x and y are the missing and duplicate
indices from vector V then we have:

    
    
        |x-y| = |sum(V) - (N*(N-1))/2|
    

So far - constant space and linear time. From here it's either cleverly
(linearly!) scanning the vector or similarly concluding something else about
the vector and crossing that info with what we have ;)

~~~
makmanalp
This is close to the solution I had in mind. However, summing lots of 64-bit
integers probably yields a huge number that overflows, so some Bigints (which
are comparatively slow) or clever hackery might be required to mitigate that.

~~~
antics
That's not (EDIT:) space-constant.

~~~
pdhborges
The sum is guaranteed to fit into 128 bits

------
warrenwilkinson
Pretty easy.

Compare the sum with the expected sum. Compare the sum of squares with the
expected sum of squares.

Two equations, just solve them. I wrote the final equations in the blog
comments.
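
A minimal sketch of the two-equation idea, assuming the array holds 1..N with one value replaced by a duplicate. The function name is made up for illustration, and Python's arbitrary-precision ints stand in for bigints:

```python
def find_missing_and_duplicate(values):
    """Return (missing, duplicate) for a list holding 1..N with one value replaced."""
    n = len(values)
    # Differences between the actual and the expected sums.
    # Python ints are arbitrary precision, so nothing overflows here.
    a = sum(values) - n * (n + 1) // 2                               # d - m
    b = sum(v * v for v in values) - n * (n + 1) * (2 * n + 1) // 6  # d^2 - m^2
    total = b // a  # d + m, exact because b = (d - m) * (d + m)
    duplicate = (total + a) // 2
    missing = (total - a) // 2
    return missing, duplicate


print(find_missing_and_duplicate([1, 2, 3, 4, 6, 6, 7, 8, 9, 10]))  # (5, 6)
```

The division is safe because d - m is never zero when the duplicate and the missing value differ.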

~~~
cynicalkane
You can't solve these equations (at least in the general case, maybe there's
some clever way in this specific case I'm not getting), because the ring of
64-bit integers is not a factorial ring, meaning not every multiplication can
be undone. That is, if xy = z, there might be multiple y's for a given x and
z. (Edit--previously I had said it's because it's not a field, but that's a
stronger condition than we need.) If the sum is 0 and the sum of squares is 2
(mod 2^64), then (1, -1) and (2^63 + 1, 2^63 - 1) are both solutions.
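
A quick way to check the two pairs above is to reduce everything modulo 2^64, as unsigned 64-bit arithmetic would. This is only an illustrative sketch; the helper name is made up:

```python
MASK = (1 << 64) - 1  # keep the low 64 bits, i.e. reduce modulo 2^64


def sum_and_squares(x, y):
    """Return (x + y, x*x + y*y), both reduced modulo 2^64."""
    return (x + y) & MASK, (x * x + y * y) & MASK


pair_a = sum_and_squares(1, -1)
pair_b = sum_and_squares(2**63 + 1, 2**63 - 1)
print(pair_a, pair_b)    # the two pairs give identical results
print(pair_a == pair_b)  # True
```

Two different pairs with the same sum and the same sum of squares means the equations alone cannot tell them apart in wrapping 64-bit arithmetic.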

You could do this solution using the field of bitfields modulo a 64-bit
irreducible polynomial, but I expect that's over the heads of most
interviewers :P

(edit: just realized that you can do the necessary field multiplication and
addition very quickly in constant time and space. It's even linear in the
number of bits you have to process. This looks very close to an ideal
solution, so I posted a link on the blog. There may be a clever solution that
is better.)

(edit #2: the clever solution is just to use bigints. The sum of squares is
guaranteed to fit in 192 bits. This might be what the parent poster meant all
along. This works because integers are a factorial ring, so the equations are
guaranteed to have a unique solution.)

~~~
mschireson
Simpler to do the arithmetic modulo a prime just smaller than 2^64 if you
ignore the case where n is very close to 2^64 (which should be fine in real
life). In case you leave, your code will be easier to maintain than if you
bring Galois fields into it.

I was kinda expecting the bigint solution though.

\-- Max
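
A rough sketch of the modulo-a-prime variant, assuming values 1..N and N well below 2^63 so that d + m stays under the prime. The constant is 2^64 - 59, the largest prime below 2^64; the function name is invented for this example. In a fixed-width language the products would need a 128-bit multiply before reducing, which Python hides:

```python
P = 2**64 - 59  # largest prime below 2^64


def find_missing_and_duplicate_mod_p(values):
    """Return (missing, duplicate) using only arithmetic modulo P."""
    a = 0  # running (sum of values - sum of 1..N) mod P, equals d - m
    b = 0  # same for the squares, equals d^2 - m^2
    for i, v in enumerate(values, start=1):
        a = (a + v - i) % P
        b = (b + v * v - i * i) % P
    # d + m = (d^2 - m^2) / (d - m). Division is multiplication by the
    # modular inverse, computed here via Fermat's little theorem.
    total = b * pow(a, P - 2, P) % P
    half = pow(2, P - 2, P)  # inverse of 2 modulo P
    duplicate = (total + a) * half % P
    missing = (total - a) * half % P
    return missing, duplicate


print(find_missing_and_duplicate_mod_p([1, 2, 3, 4, 6, 6, 7, 8, 9, 10]))  # (5, 6)
```

Because P is prime, every nonzero residue has an inverse, which is what makes the division step well defined.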

------
pedrocr
This seems to work (in Ruby):

    
    
      def find_missing_duplicate(seq)
        sum = seq.reduce(0, :+)
        corrsum = (seq.size*(seq.size+1))/2
    
        sumpow = seq.map{|i| i*i}.reduce(0, :+)
        corrsumpow = (1..seq.size).map{|i| i*i}.reduce(0, :+)
        
        diff = sum - corrsum
        diffpow = sumpow - corrsumpow
       
        seq.each do |i|
          miss = i-diff
          if (i*i - miss*miss == diffpow)
            return miss, i
          end
        end 
      end
    

And here's some code to test it:

    
    
      MIN_N = 2
      MAX_N = 1000
    
      (MIN_N..MAX_N).each do |n|
        duplicate = missing = rand(n)+1
        duplicate = rand(n)+1 while duplicate == missing
        seq = (1..n).map{|i| i}
        seq[missing-1]=duplicate
    
        miss, dup = find_missing_duplicate(seq)
        if !(miss == missing and dup == duplicate)
          $stderr.puts "ERROR: expecting #{missing}:#{duplicate} got #{miss}:#{dup}"
        end
      end

------
BarkMore
Here's a solution in Go.

    
    
        package main
    
        import (
            "fmt"
        )
    
        func main() {
            // a is the sample array
            a := []int{0, 2, 3, 3}
    
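            // Reorder values such that a[i] == i when i is not the missing value.
            for i := 0; i < len(a); i++ {
                // Keep swapping a[i] toward its home slot until a[i] is in
                // place or its home slot already holds the same value.
                for a[i] != i && a[a[i]] != a[i] {
                    j := a[i]
                    a[i], a[j] = a[j], a[i]
                }
            }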
    
            // Look for a[i] != i
            for i := 0; i < len(a); i++ {
                if a[i] != i {
                    fmt.Println("missing:", i, "duplicate:", a[i])
                    break
                }
            }
        }
    

This is a simple expression of a solution. The code can be improved.

~~~
BarkMore
Oops, the problem statement says that the array cannot be modified.

------
bermanoid
(EDIT: this doesn't work, as lisper pointed out below)

I haven't worked out the details or explicitly proven that this will work, but
I think the following algorithm should do it:

1) Count up the number of even numbers in the list and compare it to the
number we'd expect if the list was "right". This will either be correct (in
which case both missing and dupe are even, or both are odd), high (the missing
one was odd and the duped one was even), or low (the missing one was even and
the duped one was odd).

If high or low, we've just pinned down the last binary digit of each of the
numbers.

If the count matches, the digit remains in an unknown state.

2) Slice off the last digit.

3) Repeat 1) and 2) for all 64 bits.

4) Now we have our missing number in the form 001001x1...xx01x (64 bits
worth), where the x's represent digits where the missing number's digit
matches the duped one.

5) We know the difference between the missing and duped numbers because of the
triangular sum formula N(N+1)/2. Use that to solve for the xs in the missing
number. That equation will _not_ be possible to solve in general, but the
statement of the problem here assures us that it will have a solution;
further, I'm pretty sure that as long as the duped and missing numbers are not
equal, that solution will be unique (EDIT: it's not, and this method doesn't
work at all :P ).

Maybe I'm missing something, though? Haven't thought this through too
deeply...

~~~
lisper
That doesn't work. Consider the case where the missing and dup differ by 1.

~~~
bermanoid
D'oh, you're absolutely right, the solutions aren't even close to unique. Good
catch.

------
biobot
This is the solution I posted there:

Call the missing number m and the duplicate d.

The main idea is to find m+d and m-d. Then it’s trivial to find m and d.

1) Find m XOR d:

    
    
       X_array = A[1];
       I_array = 1;
    
       for (i=2..N)
    
          X_array = X_array XOR A[i]; I_array = I_array XOR i;
    
       endfor
    

Then m XOR d = I_array XOR X_array

2) Find m – d

It will be a little bit more complicated. The idea is to take the sum of (1..N)
and subtract the sum of our array. But we have to be careful not to overflow or
underflow the result.

    
    
       Diff = 0;
    
       i = j = 0; // i index I, j index A
    
       while (i < N) && (j < N)
    
         if (Diff + i < N-1)
    
             Diff = Diff + i;
    
             i++; //consume one in I
    
         else
    
             Diff = Diff - A[j];
    
             j++; // consume one in J
    
         endif
    
       end
    

// Need some corner case checks here too, but I will omit them
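
One way the interleaving could be written out, as a sketch rather than the exact rule above: add expected values while the running total is not positive, subtract array values otherwise, and drain whichever side is left. The function name is hypothetical and values are assumed to be 1..N. Python ints never overflow, so the assert stands in for a fixed-width bound check:

```python
def bounded_difference(values):
    """Return sum(1..N) - sum(values) while keeping the running total small."""
    n = len(values)
    diff = 0
    i = 1  # next expected value to add
    j = 0  # next array position to subtract
    while i <= n or j < n:
        # Add while the total is not positive, subtract otherwise; once one
        # side is used up, drain the other.
        if j == n or (i <= n and diff <= 0):
            diff += i
            i += 1
        else:
            diff -= values[j]
            j += 1
        assert -n <= diff <= n  # the running total never leaves [-N, N]
    return diff  # equals missing - duplicate


print(bounded_difference([1, 2, 3, 4, 6, 6, 7, 8, 9, 10]))  # -1
```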

3) So we have (m XOR d) and m – d. Also, note that m XOR d is basically m + d
without the carry bit so it will be trivial to get m and d.

running time: 3N <- can be squeezed to 2N by merging the two loops

space : O(1)

~~~
pedrocr
>note that m XOR d is basically m + d without the carry bit

This isn't true. XOR is addition without any carry bits, so there are several
solutions that have the same XOR and difference.

~~~
biobot
There are at most 2 solutions: if m XOR d = a then either m+d = a or m+d =
a+N.

With the combination of m-d = b, you will find m and d.

If both of them are satisfied (remember m and d are bounded by 1 and N) then we
actually have two solutions.

~~~
pedrocr
Here's a counterexample.

Sequence is [1, 2, 3, 4, 6, 6, 7, 8, 9, 10] so m = 5 and d = 6

    
    
      diff = 5-6 = -1
      xor = 5^6 = 3
    
      1-2 = -1
      1^2 = 3
    
      5-6 = -1
      5^6 = 3
    
      9-10 = -1
      9^10 = 3
    

So three different combinations, within N, that obey the two constraints.
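
The same count can be reproduced by brute force. This is just an illustrative helper (the name is made up) that lists every pair in 1..n matching a given difference and XOR:

```python
def candidate_pairs(n, diff, xor):
    """List every (m, d) in 1..n with m - d == diff and m ^ d == xor."""
    pairs = []
    for m in range(1, n + 1):
        d = m - diff
        if 1 <= d <= n and m ^ d == xor:
            pairs.append((m, d))
    return pairs


print(candidate_pairs(10, -1, 3))  # [(1, 2), (5, 6), (9, 10)]
```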

~~~
biobot
OK. I think I found a way around.

Assume d XOR m = a. Find k = position of the first set bit (from MSB to LSB) in a.
Then d + m can be in [a + 2^(k+1),a + 2^(k+2),...,a + 2^63].

We will have less than 63 cases. In each case, after we find d and m, make
sure they are in range and then linear test.

So in the end, we may need to run at most 63+3 = 66 N. But it's still linear
though.

------
bumbledraven
Since the list consists entirely of 64-bit integers, there's a trivial linear-
time solution:

    
    
      for x in [0..2^64) {
        count = 0
        for i in [0..N) {
          count += (array[i] == x)
        }
        number[count] = x     // 0 <= count <= 2 
      }
      print "missing: ", number[0]," duplicate: ", number[2]

------
biobot
One of my friends came up with this solution:

m is missing,d is duplicate. I is 1..N, A is input.

Step 1: Compute m XOR d by XORing all values of A and I together. Let's say x = m
XOR d. Then x must have at least one 1 bit (or else m == d). Call that bit p. So
now we can separate A into two lists: A0 holds those with bit p == 0 and A1
those with bit p == 1. So we have separated m and d into two different lists.

Step 2: Iterate through I, maintaining two numbers x0 and x1 as the results of XOR
operations over those elements with p bit == 0 and == 1 respectively. Do the
same for A to get a0 and a1. Then we know that m is either a0 XOR x0 or a1 XOR
x1. One more linear test in A confirms the missing element m.

In the end, the running time can be squeezed into 2 N. space = O(1).
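
A sketch of how those two steps might look in code, assuming values 1..N. The function name is invented here, and this version uses three passes rather than the squeezed two:

```python
def find_by_xor_partition(values):
    """Return (missing, duplicate) for a list holding 1..N with one value replaced."""
    # Pass 1: XOR of all values and all of 1..N leaves m XOR d.
    x = 0
    for i, v in enumerate(values, start=1):
        x ^= v ^ i
    p = x & -x  # lowest set bit of x; m and d differ in this bit
    # Pass 2: XOR only the numbers (from both lists) that have bit p set.
    with_bit = 0
    for i, v in enumerate(values, start=1):
        if v & p:
            with_bit ^= v
        if i & p:
            with_bit ^= i
    other = x ^ with_bit
    # with_bit and other are m and d in some order. The duplicate appears in
    # the array and the missing value does not, so one scan settles it.
    if any(v == with_bit for v in values):
        return other, with_bit
    return with_bit, other


print(find_by_xor_partition([1, 2, 3, 4, 6, 6, 7, 8, 9, 10]))  # (5, 6)
```

Only a handful of integers are kept, so the extra space stays constant.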

------
carbocation
I wonder if a Bloom filter would allow you to (probably) identify the missing
value and the duplicated value.

Create a Bloom filter on your array, one element at a time. For each element,
test whether it is present in the set, and then add it to the set. By testing
whether the element is already in the set, you identify the _duplicated
value_. After creating your complete Bloom filter, then test integers 1 to N
to see which is not in the set; this is your _missing value_. I believe this
should be constant space and linear time.

Above I said "probably" because there are false positives (though not false
negatives).

------
BarkMore
This was a common Microsoft interview question in the 1990s. The answer is
probably easy to find on the inter-tubes.

------
mikepurvis
Not sure if this is what's being discussed in the comments already, but it
seems like you should be able to XOR all the numbers in the array, and then
compare that to the XOR of all the consecutive numbers (which will xor to 0 if
n mod 4 == 3).
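
For reference, the XOR of 1..n follows a period-4 pattern, so it needs no loop. A small sketch (the function name is made up):

```python
def xor_upto(n):
    """Return 1 ^ 2 ^ ... ^ n without looping, using the period-4 pattern."""
    # n % 4 == 0 -> n, 1 -> 1, 2 -> n + 1, 3 -> 0
    return (n, 1, n + 1, 0)[n % 4]


print(xor_upto(3), xor_upto(4), xor_upto(6))  # 0 4 7
```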

~~~
esrauch
This only works if you are simply missing one value; in this case the problem
is that one number is missing and one is duplicated.

(xor-all data) xor (xor-all Range-1-N) = missing# xor duplicate#

Simply knowing that last xor won't let you determine the actual values of
missing# and duplicate# (just as the same thing with sum will end up giving
you the difference between them and not their actual values).

~~~
mikepurvis
Interesting you should say that. What about having both the sum and the xor?
Then it's two equations, two unknowns. That should be enough to solve for
it...

~~~
esrauch
I actually posted a reply to this above; knowing both the sum and the xor is
not sufficient to determine the two variables.

Consider a situation where you know that (a xor b) = 1 and (a - b) = 1. Missing
2/extra 3, missing 4/extra 5, and any 2^N/2^N+1 pair are all solutions to those
equations.

Just because you have two equations and two unknowns doesn't necessarily mean
you end up with a single value for each except in certain circumstances (eg
they are linear equations of full rank).

------
laxk
Python solution. <http://pastebin.com/KyFTXpEr> O(N) / No additional arrays

------
zwischenzug
bitmap, one extra int and some bitfucking at the end to get the zero bit.

~~~
zwischenzug
Hm, 2^64 bits is rather more space than I thought...

------
zxcvb
This site is flagged as "Sites likely to contain little or no useful content"
by my university's web filtering software.

~~~
mschireson
curious what algorithm you use to determine that. i personally think my blog
has useful content. maybe because i talked about a job it looks like just a
job board?

\-- Max

~~~
zxcvb
I'm not really sure to be honest. Job boards and such things are not blocked.
I've never seen this category before. It's not blocked; it's just flagged.

edit: just noticed it has a secondary category as: "potentially damaging
content"

~~~
mschireson
Is it Hacker News that's potentially damaging or my blog? If it's my blog I'm
kinda flattered :)

(Would rather be "potentially damaging" than useless)

\-- Max

