Two dice are tossed. If $X$ is the sum of the numbers shown, find the probability mass function of $X$.

A random variable is a variable that is subject to variation due to random chance. One can think of a random variable as the result of a random experiment, such as rolling a die, flipping a coin, or picking a number from a given interval. The idea is that, each time you perform the experiment, you obtain a sample of the random variable. Since the variable is random, you expect to get different values as you obtain multiple samples. (Some values might be more likely than others, as in an experiment of rolling two six-sided dice and recording the sum of the resulting two numbers, where obtaining a value of 7 is much more likely than obtaining a value of 12.) A probability distribution is a function that describes how likely you are to obtain the different possible values of the random variable.

It turns out that probability distributions have quite different forms depending on whether the random variable takes on discrete values (such as numbers from the set $\{1,2,3,4,5,6\}$) or takes on any value from a continuum (such as any real number in the interval $[0,1]$). Despite their different forms, one can do the same manipulations and calculations with either discrete or continuous random variables. The main difference is usually just whether one uses a sum or an integral.

Discrete probability distribution

A discrete random variable is a random variable that can take on any value from a discrete set of values. The set of possible values could be finite, such as in the case of rolling a six-sided die, where the values lie in the set $\{1,2,3,4,5,6\}$. However, the set of possible values could also be countably infinite, such as the set of integers $\{0, 1, -1, 2, -2, 3, -3, \ldots \}$. The requirement for a discrete random variable is that we can enumerate all the values in the set of its possible values, as we will need to sum over all these possibilities.

For a discrete random variable $X$, we form its probability distribution function by assigning a probability that $X$ is equal to each of its possible values. For example, for a six-sided die, we would assign a probability of $1/6$ to each of the six options. In the context of discrete random variables, we can refer to the probability distribution function as a probability mass function. The probability mass function $P(x)$ for a random variable $X$ is defined so that for any number $x$, the value of $P(x)$ is the probability that the random variable $X$ equals the given number $x$, i.e., \begin{align*} P(x) = \Pr(X = x). \end{align*} Often, we denote the random variable of the probability mass function with a subscript, so we may write \begin{align*} P_X(x) = \Pr(X = x). \end{align*}

For a function $P(x)$ to be a valid probability mass function, $P(x)$ must be non-negative for each possible value $x$. Moreover, the random variable must take on some value in the set of possible values with probability one, so we require that the values $P(x)$ sum to one. In equations, the requirements are \begin{gather*} P(x) \ge 0 \quad \text{for all $x$}\\ \sum_x P(x) = 1, \end{gather*} where the sum is implicitly over all possible values of $X$.

For the example of rolling a six-sided die, the probability mass function is \begin{gather*} P(x) = \begin{cases} \frac{1}{6} & \text{if $x \in \{1,2,3,4,5,6\}$}\\ 0 & \text{otherwise.} \end{cases} \end{gather*}

If we rolled two six-sided dice, and let $X$ be the sum, then $X$ could take on any value in the set $\{2,3,4,5,6,7,8,9,10,11,12\}$. The probability mass function for this $X$ is \begin{gather*} P(x) = \begin{cases} \frac{1}{36} & \text{if $x \in \{2,12\}$}\\ \frac{2}{36}=\frac{1}{18} & \text{if $x \in \{3,11\}$}\\ \frac{3}{36}=\frac{1}{12} & \text{if $x \in \{4,10\}$}\\ \frac{4}{36}=\frac{1}{9} & \text{if $x \in \{5,9\}$}\\ \frac{5}{36} & \text{if $x \in \{6,8\}$}\\ \frac{6}{36} =\frac{1}{6} & \text{if $x = 7$}\\ 0 & \text{otherwise.} \end{cases} \end{gather*} $P(x)$ is plotted as a bar graph in the following figure.

[Figure: bar graph of the probability mass function $P(x)$ for the sum of two dice.]
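
As a quick cross-check of the table above, here is a minimal Python sketch (an illustrative addition, not part of the original text) that derives the same PMF by enumerating all 36 equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes (d1, d2) and count
# how many produce each possible sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# P(X = x) = (favorable outcomes) / 36, kept as exact fractions.
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}
for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")   # e.g. P(X = 7) = 1/6

# A valid PMF must sum to one.
assert sum(pmf.values()) == 1
```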

Continuous probability distribution

A continuous random variable is a random variable that can take on any value from a continuum, such as the set of all real numbers or an interval. We cannot form a sum over such a set of numbers. (There are too many, since such a continuum is uncountable.) Instead, we replace the sum used for discrete random variables with an integral over the set of possible values.

For a continuous random variable $X$, we cannot form its probability distribution function by assigning a probability that $X$ is exactly equal to each value. The probability distribution function we must use in this case is called a probability density function, which essentially assigns the probability that $X$ is near each value. For intuition behind why we must use such a density rather than assigning individual probabilities, see the page that describes the idea behind the probability density function.

Given the probability density function $\rho(x)$ for $X$, we determine the probability that $X$ is in any set $A$ (i.e., that $X \in A$) by integrating $\rho(x)$ over the set $A$, i.e., \begin{gather*} \Pr(X \in A) = \int_A \rho(x)dx. \end{gather*} Often, we denote the random variable of the probability density function with a subscript, so we may write \begin{gather*} \Pr(X \in A) = \int_A \rho_X(x)dx. \end{gather*}

The definition of this probability using an integral gives one important consequence for continuous random variables. If the set $A$ contains just a single element, we can immediately see that the probability that $X$ is equal to that one value is exactly zero, as the integral over a single point is zero. For a continuous random variable $X$, the probability that $X$ is any single value is always zero.

In other respects, the probability density function of a continuous random variable behaves just like the probability mass function for a discrete random variable; we just need to use integrals rather than sums. For a function $\rho(x)$ to be a valid probability density function, $\rho(x)$ must be non-negative for each possible value $x$. Just as for a discrete random variable, a continuous random variable must take on some value in the set of possible values with probability one. In this case, we require that $\rho(x)$ must integrate to one. In equations, the requirements are \begin{gather*} \rho(x) \ge 0 \quad \text{for all $x$}\\ \int \rho(x)dx = 1, \end{gather*} where the integral is implicitly over all possible values of $X$.
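
To make these two requirements concrete, here is a minimal Python sketch (assuming SciPy is available) using a hypothetical uniform density on $[0,2]$; the density, interval, and the helper name `rho` are illustrative assumptions, not from the text. Numerical integration plays the role that the sum plays for discrete variables, and the zero-width integral echoes the single-point observation above:

```python
from scipy.integrate import quad

# Hypothetical density: uniform on [0, 2], i.e. rho(x) = 1/2 there, 0 elsewhere.
def rho(x):
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

total, _ = quad(rho, 0, 2)       # integral over all possible values -> 1.0
prob, _ = quad(rho, 0.5, 1.5)    # Pr(0.5 <= X <= 1.5) -> 0.5
point, _ = quad(rho, 1, 1)       # integral over a single point -> 0.0
print(total, prob, point)
```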

For examples of continuous random variables and their associated probability density functions, see the page on the idea behind the probability density function.

You are conditioning on the sum not being prime; in the conditional universe, all outcomes compatible with the conditioning event keep the same relative probabilities as in the unconditional model. You calculated these probabilities for the unconditional model; now all you need to do is renormalize them so that they add up to 1.

PS: From the probabilities in your question (I did not check them), the new common denominator is $3+5+5+4+3+1=21$, so in the conditional model, for example, $P(X=4)=3/21$ (same numerator, new denominator), etc.
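
The renormalization described above can be sketched in a few lines of Python (assuming, as in the question being answered, that the conditioning event is "the sum of two dice is not prime"):

```python
from collections import Counter

# Unconditional counts of each sum of two fair dice (out of 36 outcomes).
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# Conditioning event: the sum is not prime (primes up to 12: 2, 3, 5, 7, 11).
primes = {2, 3, 5, 7, 11}
kept = {x: n for x, n in counts.items() if x not in primes}

# Renormalize: same numerators, new common denominator 3+5+5+4+3+1 = 21.
total = sum(kept.values())
for x, n in sorted(kept.items()):
    print(f"P(X = {x} | sum not prime) = {n}/{total}")  # e.g. 3/21 for x = 4
```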


3.1.3 Probability Mass Function (PMF)

If $X$ is a discrete random variable, then its range $R_X$ is a countable set, so we can list the elements in $R_X$. In other words, we can write $$R_X=\{x_1,x_2,x_3,...\}.$$ Note that here $x_1, x_2, x_3,...$ are possible values of the random variable $X$. While random variables are usually denoted by capital letters, to represent the numbers in the range we usually use lowercase letters such as $x$, $x_1$, $y$, $z$, etc. For a discrete random variable $X$, we are interested in knowing the probabilities of $X=x_k$. Note that here, the event $A=\{X=x_k\}$ is defined as the set of outcomes $s$ in the sample space $S$ for which the corresponding value of $X$ is equal to $x_k$. In particular, $$A=\{s \in S | X(s)=x_k\}.$$ The probabilities of the events $\{X=x_k\}$ are formally shown by the probability mass function (pmf) of $X$.

Definition
Let $X$ be a discrete random variable with range $R_X=\{x_1,x_2,x_3, ...\}$ (finite or countably infinite). The function $$P_X(x_k)=P(X=x_k), \textrm{ for } k=1,2,3,...,$$ is called the probability mass function (PMF) of $X$.


Thus, the PMF is a probability measure that gives us probabilities of the possible values for a random variable. While the above notation is the standard notation for the PMF of $X$, it might look confusing at first. The subscript $X$ here indicates that this is the PMF of the random variable $X$. Thus, for example, $P_X(1)$ shows the probability that $X=1$. To better understand all of the above concepts, let's look at some examples.


Example 3.3

I toss a fair coin twice, and let $X$ be defined as the number of heads I observe. Find the range of $X$, $R_X$, as well as its probability mass function $P_X$.

  • Solution
    • Here, our sample space is given by $$S=\{HH,HT,TH,TT\}.$$ The number of heads will be $0$, $1$, or $2$. Thus $$R_X=\{0,1,2\}.$$ Since this is a finite (and thus countable) set, the random variable $X$ is a discrete random variable. Next, we need to find the PMF of $X$. The PMF is defined as $$P_X(k)=P(X=k) \textrm{ for } k=0,1,2.$$ We have $$P_X(0)=P(X=0)=P(TT)=\frac{1}{4},$$ $$P_X(1) =P(X=1)=P(\{HT,TH\})=\frac{1}{4}+\frac{1}{4}=\frac{1}{2},$$ $$P_X(2)=P(X=2)=P(HH)=\frac{1}{4}.$$
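
A minimal simulation sketch (an illustrative addition, not part of the original solution) confirms these three probabilities empirically:

```python
import random
from collections import Counter

# Toss a fair coin twice, many times, and tally X = number of heads.
trials = 100_000
counts = Counter(sum(random.random() < 0.5 for _ in range(2))
                 for _ in range(trials))

for k in (0, 1, 2):
    print(f"P_X({k}) is approximately {counts[k] / trials:.3f}")
# Expect about 0.25, 0.50, 0.25.
```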


Although the PMF is usually defined for values in the range, it is sometimes convenient to extend the PMF of $X$ to all real numbers. If $x \notin R_X$, we can simply write $P_X(x)=P(X=x)=0$. Thus, in general we can write \begin{equation} \nonumber P_X(x) = \left\{ \begin{array}{l l} P(X=x) & \quad \text{if $x$ is in } R_X\\ 0 & \quad \text{otherwise} \end{array} \right. \end{equation}
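
This extension is straightforward to mirror in code; the sketch below (illustrative, reusing the PMF from Example 3.3) returns $0$ for any $x$ outside the range:

```python
from fractions import Fraction

# PMF of Example 3.3, extended to all real numbers: zero outside R_X = {0, 1, 2}.
def P_X(x):
    table = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
    return table.get(x, Fraction(0))

print(P_X(1))    # 1/2
print(P_X(5))    # 0  (5 is not in R_X)
print(P_X(0.7))  # 0  (non-integer values are not in R_X either)
```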

To better visualize the PMF, we can plot it. Figure 3.1 shows the PMF of the above random variable $X$. As we see, the random variable can take three possible values: $0$, $1$, and $2$. The figure also clearly indicates that the event $X=1$ is twice as likely as each of the other two possible values. The figure can be interpreted in the following way: if we repeat the random experiment (tossing a coin twice) a large number of times, then about half of the time we observe $X=1$, about a quarter of the time we observe $X=0$, and about a quarter of the time we observe $X=2$.

Fig. 3.1 - PMF of the random variable $X$ in Example 3.3.

For discrete random variables, the PMF is also called the probability distribution. Thus, when asked to find the probability distribution of a discrete random variable $X$, we can do this by finding its PMF. The phrase distribution function is usually reserved exclusively for the cumulative distribution function CDF (as defined later in the book). The word distribution, on the other hand, in this book is used in a broader sense and could refer to PMF, probability density function (PDF), or CDF.


Example 3.4

I have an unfair coin for which $P(H)=p$, where $0 < p < 1$. I toss the coin repeatedly until I observe heads for the first time. Let $Y$ be the total number of coin tosses. Find the distribution of $Y$.

  • Solution
    • First, we note that the random variable $Y$ can potentially take any positive integer, so we have $R_Y=\mathbb{N}=\{1,2,3,...\}$. To find the distribution of $Y$, we need to find $P_Y(k)=P(Y=k)$ for $k=1,2,3,...$. We have \begin{align*} P_Y(1) &=P(Y=1)=P(H)=p,\\ P_Y(2) &=P(Y=2)=P(TH)=(1-p)p,\\ P_Y(3) &=P(Y=3)=P(TTH)=(1-p)^2 p,\\ &\hspace{12pt}\vdots\\ P_Y(k) &=P(Y=k)=P(TT...TH)=(1-p)^{k-1} p. \end{align*} Thus, we can write the PMF of $Y$ in the following way \begin{equation} \nonumber P_Y(y) = \left\{ \begin{array}{l l} (1-p)^{y-1} p& \quad \text{for } y=1,2,3,...\\ 0 & \quad \text{otherwise} \end{array} \right. \end{equation}
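
The pattern $(1-p)^{k-1}p$ can be checked by simulation; the following sketch (illustrative, with $p=0.3$ chosen arbitrarily) compares empirical frequencies against the formula:

```python
import random

# Toss a biased coin (P(H) = p) until the first heads; Y = total tosses.
def sample_Y(p):
    tosses = 1
    while random.random() >= p:   # this toss came up tails
        tosses += 1
    return tosses

p, trials = 0.3, 100_000
samples = [sample_Y(p) for _ in range(trials)]
for k in (1, 2, 3, 4):
    empirical = sum(s == k for s in samples) / trials
    exact = (1 - p) ** (k - 1) * p
    print(f"P_Y({k}): simulated {empirical:.3f}, formula {exact:.3f}")
```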


Consider a discrete random variable $X$ with Range$(X)=R_X$. Note that by definition the PMF is a probability measure, so it satisfies all properties of a probability measure. In particular, we have

  • $0\leq P_X(x) \leq 1$ for all $x$, and
  • $\sum_{x \in R_X} P_X(x)=1$.
Also note that for any set $A \subset R_X$, we can find the probability that $X \in A$ using the PMF $$P(X \in A)=\sum_{x \in A} P_X(x).$$

Properties of PMF:

  • $0\leq P_X(x) \leq 1$ for all $x$;
  • $\sum_{x \in R_X} P_X(x)=1$;
  • for any set $A \subset R_X, P(X \in A)=\sum_{x \in A} P_X(x)$.
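
The third property reduces to a finite sum in practice; for instance, a short sketch (reusing the two-dice PMF from earlier on this page as an assumed example) computes $P(X \in A)$ for $A$ the set of even sums:

```python
from collections import Counter
from fractions import Fraction

# Two-dice PMF from earlier on this page, as an assumed example.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {x: Fraction(n, 36) for x, n in counts.items()}

# P(X in A) = sum of PMF values over A; here A = even sums.
A = {x for x in pmf if x % 2 == 0}
print(sum(pmf[x] for x in A))   # 1/2
```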



Example 3.5

For the random variable $Y$ in Example 3.4,

  1. Check that $\sum_{y \in R_Y} P_Y(y)=1$.
  2. If $p=\frac{1}{2}$, find $P(2\leq Y <5)$.

  • Solution
    • In Example 3.4, we obtained $$P_Y(k) =P(Y=k)=(1-p)^{k-1} p, \textrm{ for } k=1,2,3,...$$ Thus,
      1. to check that $\sum_{y \in R_Y} P_Y(y)=1$, we have \begin{align*} \sum_{y \in R_Y} P_Y(y) &= \sum_{k=1}^{\infty} (1-p)^{k-1} p\\ &= p \sum_{j=0}^{\infty} (1-p)^{j} \quad \textrm{(letting $j=k-1$)}\\ &= p \cdot \frac{1}{1-(1-p)} \quad \textrm{(geometric sum)}\\ &= 1; \end{align*}
      2. if $p=\frac{1}{2}$, to find $P(2\leq Y < 5)$, we can write \begin{align*} P(2\leq Y < 5) &= \sum_{k=2}^{4} P_Y(k)\\ &= \sum_{k=2}^{4} (1-p)^{k-1} p\\ &= \frac{1}{2}\bigg(\frac{1}{2}+\frac{1}{4}+\frac{1}{8}\bigg)\\ &= \frac{7}{16}. \end{align*}
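
A short sketch in exact rational arithmetic (an illustrative check, not part of the original solution) confirms the value $\frac{7}{16}$:

```python
from fractions import Fraction

p = Fraction(1, 2)
# P(2 <= Y < 5) = sum over k = 2, 3, 4 of (1 - p)**(k - 1) * p
prob = sum((1 - p) ** (k - 1) * p for k in range(2, 5))
print(prob)   # 7/16
```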