Recall the definitions of the expected value of a discrete random variable $X$ and a continuous random variable $Y$, respectively:
$E(X) = \sum_x x\,f(x)$ and $E(Y) = \int_{-\infty}^{\infty} y\,f(y)\,dy$
Recall from Calculus the linearity properties of summations and integrals:
$\sum_i (a x_i + b y_i) = a\sum_i x_i + b\sum_i y_i$ and $\int (a f(x) + b g(x))\,dx = a\int f(x)\,dx + b\int g(x)\,dx$
Unsurprisingly and easily proved (see textbook or try it yourself) are the following theorems: $E(aX + b) = aE(X) + b$ and $E(X \pm Y) = E(X) \pm E(Y)$.
A little more interesting is Theorem 4.8: if $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$.
Note: not a linear combination! The caveat is that $X$ and $Y$ must be independent.
Recall the definition of the variance of a random variable $X$:
$Var(X) = E[(X - E(X))^2]$
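Expanding the square and applying linearity of expectation gives the computational shortcut used in the proof below:
$Var(X) = E[X^2 - 2XE(X) + (E(X))^2] = E[X^2] - (E[X])^2$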
Theorem: $Var(aX + b) = a^2Var(X)$
Proof: Let $\mu = E[aX+b]$, then $Var(aX + b) = E[((aX+b)-\mu)^2]$
$=E[(aX+b)^2 - 2(aX+b)\mu + \mu^2]$
$=E[(aX+b)^2] - 2\mu^2 + \mu^2$
$=a^2E[X^2]+2abE[X]+b^2 - (aE[X]+ b)^2$
$=a^2E[X^2]+2abE[X]+b^2 - (a^2(E[X])^2 +2abE[X] +b^2)$
$=a^2E[X^2] - a^2(E[X])^2 = a^2Var(X)$
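To sanity-check the result numerically, here is a minimal simulation sketch (assuming NumPy is available; the distribution and constants are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 3.0, 5.0                                  # arbitrary constants
x = rng.exponential(scale=2.0, size=1_000_000)   # any distribution works

print(np.var(a * x + b))   # empirical Var(aX + b)
print(a**2 * np.var(x))    # a^2 * Var(X); both ≈ 36 since Var(X) = 4 here
```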
Similarly, we can prove the more general Theorem 4.9: $Var(aX + bY) = a^2Var(X) + b^2Var(Y) + 2ab\,Cov(X,Y)$,
which your textbook details, but I recommend performing this proof on your own as an exercise.
Recall that $Cov(X,Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$
Now Theorem 4.8 states that if $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$.
As we had suspected, if $X$ and $Y$ are independent, then their covariance is 0. However, the converse does not hold! For example, if $X$ takes the values $-1, 0, 1$ with equal probability and $Y = X^2$, then $Cov(X,Y) = E(X^3) - E(X)E(X^2) = 0$, even though $Y$ is completely determined by $X$.
And we get the more general corollary to Theorem 4.9:
If $X_1, X_2, \dots, X_n$ are independent r.v.'s, then $Var(a_1X_1 + a_2X_2 + \dots +a_nX_n) =$ $a_1^2Var(X_1) + a_2^2Var(X_2) + \dots + a_n^2Var(X_n)$
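A similar simulation sketch checks the corollary (again, the distributions and coefficients below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.normal(0.0, 2.0, n)    # independent r.v.'s; any distributions work
x2 = rng.uniform(0.0, 1.0, n)
a1, a2 = 2.0, -3.0

print(np.var(a1 * x1 + a2 * x2))                 # empirical variance of the combination
print(a1**2 * np.var(x1) + a2**2 * np.var(x2))   # a1^2 Var(X1) + a2^2 Var(X2)
```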
Bad news: no general rule for getting the exact expectation and variance of a random variable $Y = g(X)$ where $g$ is a nonlinear function.
Good news: we can use the Taylor series approximation of the nonlinear function $g$ centered at $\mu = E(X)$ to get the approximate expectation and variance of $g(X)$.
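Concretely, expanding $g$ to second order about $\mu$, $g(X) \approx g(\mu) + g'(\mu)(X - \mu) + \frac{g''(\mu)}{2}(X - \mu)^2$, and taking expectations gives (a sketch, assuming $g$ is sufficiently smooth near $\mu$):
$E[g(X)] \approx g(\mu) + \frac{g''(\mu)}{2}Var(X)$
$Var[g(X)] \approx [g'(\mu)]^2\,Var(X)$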
Consider the set of Bernoulli trials where 3 items are selected at random from a manufacturing process, inspected, and classified as defective or nondefective. A defective item is designated a success.
The number of successes is a random variable $X$ assuming integer values from 0 through 3. The eight possible outcomes and the corresponding values of $X$ are:
| Outcome | $NNN$ | $NDN$ | $NND$ | $DNN$ | $NDD$ | $DND$ | $DDN$ | $DDD$ |
|---|---|---|---|---|---|---|---|---|
| $x$ | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 3 |
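A short enumeration sketch reproduces this table and computes the p.m.f. of $X$ (the defect probability $p$ below is a hypothetical value, chosen only for illustration):

```python
from itertools import product

p = 0.25  # hypothetical probability that an item is defective (a success)
pmf = {}
for outcome in product("ND", repeat=3):   # all 8 outcomes NNN, NND, ..., DDD
    x = outcome.count("D")                # number of defectives in this outcome
    prob = p**x * (1 - p)**(3 - x)        # trials assumed independent
    pmf[x] = pmf.get(x, 0.0) + prob

print(pmf)  # {0: 0.4219, 1: 0.4219, 2: 0.1406, 3: 0.0156} (rounded)
```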
Does example 1 describe a Bernoulli process? Why or why not?
The pool of prospective jurors for a certain case consists of 50 individuals, of whom 35 are employed. Suppose that 6 of these individuals are randomly selected one by one to sit in the jury box for initial questioning by lawyers for the defense and the prosecution. Label the $i$th person selected (the $i$th trial) as a success $S$ if he or she is employed and a failure $F$ otherwise.
Does example 2 describe a Bernoulli process? Why or why not?
What if we had access to 500,000 individuals, of whom 400,000 are employed, and had to sample only 10 of them?
So the Bernoulli process can be seen either as sampling with replacement from a small dichotomous population (e.g. heads vs. tails from coin tosses),
or as sampling without replacement from a dichotomous population of size $N$, such that the number of trials $n$ is at most 5% of the population size (e.g. picking 3 items from a large manufacturing assembly line in order to test for defective parts).
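This 5% guideline can be checked numerically; here is a sketch comparing the exact (hypergeometric) and approximate (binomial) probabilities for the jury example, assuming SciPy is available:

```python
from scipy.stats import binom, hypergeom

# Jury example: 6 draws from 50 people, of whom 35 are employed (12% sampled).
for k in range(7):
    exact = hypergeom.pmf(k, 50, 35, 6)   # population 50, 35 successes, 6 draws
    approx = binom.pmf(k, 6, 35 / 50)     # Bernoulli approximation with p = 0.7
    print(k, round(exact, 4), round(approx, 4))
```

The two columns differ noticeably here because $6/50 = 12\%$ exceeds the 5% guideline; rerunning with the 500,000-person pool (10 draws, $p = 0.8$) makes them agree to several decimal places.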
If $X$ is a binomial random variable, it depends on two important parameters: $n$, the number of trials, and $p$, the probability of success.
We denote the p.m.f. of a binomial r.v. by $b(x; n, p) = \binom{n}{x}p^x(1-p)^{n-x}$, for $x = 0, 1, 2, \dots, n$.
For example, consider four tosses of a biased coin with probability $p$ of showing $H$eads, where heads counts as a success $S$:
Question: How many outcomes have 3 successes?
Each of six randomly selected cola drinkers is given a glass containing cola S and one containing cola F. The glasses are identical in appearance except for a code on the bottom to identify the cola.
Suppose there is actually no tendency among cola drinkers to prefer one cola to the other, so that $p = P$(a selected individual prefers S) = 0.5. Let $X$ be the number among the six who prefer S, that is $X \sim Bin(6,0.5)$.
Compute $P(X = 3)$.
Compute $P(X \leq 5)$.
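Both can be computed directly from the p.m.f., or with a short SciPy sketch:

```python
from scipy.stats import binom

n, p = 6, 0.5
print(binom.pmf(3, n, p))  # P(X = 3)  = C(6,3) * 0.5^6 = 20/64 = 0.3125
print(binom.cdf(5, n, p))  # P(X <= 5) = 1 - 0.5^6     = 63/64 ≈ 0.9844
```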
When each trial has more than two possible outcomes, the binomial generalizes to the multinomial distribution; however, we will spend more time with the binomial today!
You can read more about the multinomial distribution here.
Let $X \sim Bin(n,p)$. Then $E(X) = np$ and $Var(X) = np(1-p)$.
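SciPy confirms these formulas for any particular parameter values (the $n$ and $p$ below are arbitrary):

```python
from scipy.stats import binom

n, p = 20, 0.3                 # arbitrary illustrative parameters
mean, var = binom.stats(n, p, moments="mv")
print(mean, n * p)             # both 6.0
print(var, n * p * (1 - p))    # both ≈ 4.2
```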
How do the parameters $n$ and $p$ affect the shape of the binomial distribution?
We have several ways to explore this!
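One way is to plot the p.m.f. side by side for several parameter settings; a sketch assuming Matplotlib and SciPy are available:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, (n, p) in zip(axes, [(20, 0.1), (20, 0.5), (20, 0.9)]):
    x = np.arange(n + 1)
    ax.bar(x, binom.pmf(x, n, p))   # p.m.f. b(x; n, p)
    ax.set_title(f"n={n}, p={p}")
plt.show()
```

For $p < 0.5$ the distribution is skewed right, at $p = 0.5$ it is symmetric, and for $p > 0.5$ it is skewed left; increasing $n$ makes it look increasingly bell-shaped.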
We will do these exercises in class: 5.11, 5.22