Calculating the size of a set by observing the proportionality change of its disjoint subsets

Background

Instagram provides a polling feature, which allows the user to ask a question with two answers to the audiences. Once an audience pick one of the two answers, the percentage of users who pick each answer is displayed.

So the questions are: how many people actually voted in the poll? How many people voted for each option?

After presenting my original solution to Instagram polling problem to Cosmin Gorgovan, he said:

If you make observations before and after one vote, you can directly calculate the total number of votes from the weight of that one vote.
Cosmin Gorgovan, Fri 12 Apr 2019 14:43:14 BST

I thought he made a really good point. I thought this problem is actually mathematically interesting, and practically useful. So I set out to solve this problem.

We can consider everyone who voted in an Instagram poll as a set, and the two options are disjoint subsets of the superset.

Formal problem statement

Set $A$ consists of disjoint subsets $a_1, a_2, \ldots, a_n$. Although we do not know the cardinality of set $A$ (denoted by $|A|$) and the cardinality of each of the subset, we do know the proportion of each subset in terms of set $A$, that is we know $\frac{|a_1|}{|A|}, \frac{|a_2|}{|A|}, \ldots \frac{|a_n|}{|A|}$. We are allowed to add $n$ elements into subset $a_1$, and observe its change in proportionality. What is the cardinality of set $A$ before elements were added to $a_1$?

Solution

This problem can be solved by modelling the process of adding $n$ elements to subset $a_1$. We denote the proportionality of $a_1$ as $\alpha_1$, that is $\alpha_1 = \frac{|a_1|}{|A|}$, and the change in proportionality of $a_1$ as $\delta_{a_1}$

We begin by writing down the process that leads to the proportionality change. We add $n$ elements into subset $a_1$

$$ \frac{|a_1| + n }{|A| + n} - \frac{|a_1|}{|A|} = \delta_{a_1} $$

By multiplying the equation with the denominators in the left hand side, we obtain:

$$ (|a_1| + n )|A| - |a_1|(|A| + n) = \delta_{a_1}(|A| + n)(|A|)$$

By expanding the brackets, we obtain:

$$ |a_1||A| + n|A| - |a_1||A| - |a_1|n = \delta_{a_1}|A|^2 + \delta_{a_1}n|A| $$

By simplifying and rearranging the above equation, we obtain:
$$ |A|^2\delta_{a_1} + |A|(\delta_{a_1}n - n) + |a_1|n= 0$$

The above equation has two unknowns – $|A|$ and $|a_1|$. We can remove the unknown $|a_1|$ by substituting in $|a_1| = \alpha_1|A|$. By doing so, we obtain:

$$ |A|^2\delta_{a_1} + |A|(\delta_{a_1}n - n) + \alpha_1|A|n = 0 $$

By simplifying the above equation, we obtain:
$$ |A|^2\delta_{a_1} + |A|n(\delta_{a_1} - 1 + \alpha_1)= 0 $$

By dividing the above equation by $|A|$ and rearrangement, we obtain:
$$ |A| = \frac{n(1 - \alpha_1 - \delta_{a_1})}{\delta_{a_1}} $$

You know everything in the right hand side of the equation, so solving $|A|$ is very easy.

  • public/calculating_the_size_of_a_set_by_observing_the_proportionality_change_of_its_disjoint_subsets.txt
  • Last modified: 2019/04/17 02:08
  • by fangfufu