RLab 2: Probability Distributions

Published

March 1, 2022

Setup

If you want to use the custom R Markdown templates for this course, install the following packages:

install.packages("remotes") # Allows you to install packages from GitHub
remotes::install_github("inqs909/inqstools") # Installs my R package for the course

Probability

We will discuss several R functions used for different probability distributions.

Distributions

To see what distributions R supports, use

?Distributions

or

help(Distributions)

for more information. Every distribution has four functions associated with it. The letter at the beginning of the function name indicates what the function does.

Letter   Functionality
“d”      returns the height of the probability density function (or the probability of a value for a discrete distribution)
“p”      returns the cumulative distribution function value
“q”      returns the inverse cumulative distribution function value (percentiles)
“r”      returns randomly generated numbers
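
As a quick illustration of the four prefixes, the sketch below applies each one to the standard normal distribution. The choice of the standard normal and the seed value are assumptions made for this example only.

```r
# The four function types, shown for the standard normal N(0, 1)
d0   <- dnorm(0)       # "d": height of the density at 0 (about 0.399)
p196 <- pnorm(1.96)    # "p": cumulative probability P(X <= 1.96) (about 0.975)
q975 <- qnorm(0.975)   # "q": value with 97.5% of the probability below it (about 1.96)

set.seed(1)            # arbitrary seed so the draws can be reproduced
draws <- rnorm(3)      # "r": three random draws

print(c(d0, p196, q975))
print(draws)
```

Replacing norm with another distribution name (for example exp, gamma, or binom) gives the same four functions for that distribution, each with its own parameters.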

Probabilities

R can compute the probabilities of a distribution given the correct parameters. If you need a cumulative probability, put p in front of the distribution name. If you need the probability of a single value for a discrete distribution, put d in front of the distribution name. Below are a few examples.

To find \(P(X \leq 3 )\) where \(X \sim N(5,2)\), we will use the pnorm() function and set q = 3, mean = 5, and sd = sqrt(2). The second parameter of \(N(5,2)\) is the variance, so the standard deviation is \(\sqrt{2}\).

pnorm(3, 5, sqrt(2))
[1] 0.0786496

To find \(P(X \geq 7 )\) where \(X \sim N(5, 2)\), we will use the pnorm() function and set q = 7, mean = 5, sd = sqrt(2), and lower.tail = FALSE.

pnorm(7, 5, sqrt(2), lower.tail = FALSE)
[1] 0.0786496
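
The examples above cover only cumulative probabilities for a continuous distribution. As a sketch of the discrete case, assume a hypothetical \(X \sim Bin(10, 0.5)\) (these parameters are not from the lab): dbinom() gives \(P(X = 3)\) and pbinom() gives \(P(X \le 3)\). The last lines show how an interval probability can be obtained as the difference of two cumulative probabilities, reusing the \(N(5, 2)\) distribution from above.

```r
# Hypothetical discrete example: X ~ Bin(10, 0.5)
p_eq <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3)
p_le <- pbinom(3, size = 10, prob = 0.5)   # P(X <= 3)

# The cumulative probability equals the sum of the point probabilities
p_sum <- sum(dbinom(0:3, size = 10, prob = 0.5))

# Interval probability for X ~ N(5, 2): P(3 <= X <= 7) = P(X <= 7) - P(X <= 3)
p_int <- pnorm(7, mean = 5, sd = sqrt(2)) - pnorm(3, mean = 5, sd = sqrt(2))

print(c(p_eq = p_eq, p_le = p_le, p_sum = p_sum, p_int = p_int))
```

For a continuous distribution, \(P(X \le a)\) and \(P(X < a)\) are equal, so the same difference works for open and closed intervals. For a discrete distribution the endpoints matter, and the lower endpoint usually has to be shifted by one.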

Percentiles

Percentiles identify the \(x_0\) that satisfies \(P(X \le x_0)=p\), where \(p\) is the probability that corresponds to the percentile (for example, \(p = 0.95\) for the \(95^{th}\) percentile).

To find the \(95^{th}\) percentile of a \(N(10, 5)\) distribution, we will use the qnorm() function and set p = 0.95, mean = 10, and sd = sqrt(5).

qnorm(.95, 10, sqrt(5))
[1] 13.678

To find the \(95^{th}\) percentile of a \(Gamma(6,9)\) distribution, we will use the qgamma() function and set p = 0.95, shape = 6, and scale = 9.

qgamma(.95, shape = 6, scale = 9)
[1] 94.61731
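
Because the q functions invert the p functions, passing a percentile back through the matching p function should return the original probability. The sketch below checks this for the two examples above. It also shows that qgamma() accepts either scale or rate (with rate = 1/scale). This matters because the notation \(Gamma(6, 9)\) alone does not say which parameterization is meant. The examples in this lab use scale.

```r
# Percentiles from the two examples above
x95_norm  <- qnorm(0.95, mean = 10, sd = sqrt(5))
x95_gamma <- qgamma(0.95, shape = 6, scale = 9)

# Round trip: the p function applied to a percentile returns the probability
back_norm  <- pnorm(x95_norm, mean = 10, sd = sqrt(5))
back_gamma <- pgamma(x95_gamma, shape = 6, scale = 9)
print(c(back_norm, back_gamma))

# Same gamma distribution written with rate = 1 / scale
x95_gamma_rate <- qgamma(0.95, shape = 6, rate = 1 / 9)
print(c(x95_gamma, x95_gamma_rate))
```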

Random Number Generator

R is capable of generating random numbers. For example, if we want to generate a random sample of size fifty from a normal distribution with mean 2.8 and variance 1.3, we will use the rnorm() function. In general, to generate a random sample from any distribution, use the distribution function with r in front of it.

Let’s first generate the random sample of fifty from \(X \sim N(2.8, 1.3)\). This is done with the rnorm() function, setting n = 50, mean = 2.8, and sd = sqrt(1.3).

rnorm(50, 2.8, sqrt(1.3))
 [1] 2.3984352 1.4008082 1.2924412 3.4322535 3.6152810 1.9523617 4.1170462
 [8] 3.7906858 3.0900068 5.1273407 3.0034423 3.6245478 3.6448887 2.4031711
[15] 2.0416685 4.2866305 2.8720997 3.9598131 2.6707746 3.9459731 3.5395160
[22] 4.1360112 2.6852912 3.4666843 1.2614398 1.2829051 4.2200997 1.7934212
[29] 4.1501066 2.9224111 4.4917564 4.0302493 3.4382405 4.1481276 0.5492955
[36] 1.8696131 4.0539968 4.7393620 0.6735766 2.3967131 2.8062328 2.2085940
[43] 1.4721390 3.0471010 0.9531207 2.6963201 1.9368524 3.2875979 2.2220100
[50] 3.4363829

Now let’s generate a random sample of 100 from \(X \sim Beta(2,4)\). This is done by using the rbeta() function and setting n = 100, shape1 = 2, and shape2 = 4.

rbeta(100, 2, 4)
  [1] 0.45048094 0.29911046 0.39158462 0.23092726 0.15540259 0.16178897
  [7] 0.31284933 0.63717347 0.26597619 0.20518293 0.64686145 0.22807687
 [13] 0.49658783 0.53139689 0.59426850 0.36964069 0.21429172 0.27978432
 [19] 0.58111280 0.15238081 0.34764626 0.10334779 0.05634440 0.22302732
 [25] 0.34684400 0.44708954 0.37575899 0.22659533 0.53716318 0.15734513
 [31] 0.17497741 0.19120091 0.56198257 0.56324203 0.33913040 0.65788353
 [37] 0.22170993 0.49862582 0.57057629 0.05067749 0.40472211 0.68350947
 [43] 0.11824307 0.06236188 0.15753464 0.54591359 0.33642585 0.29199375
 [49] 0.05156889 0.27769844 0.08791769 0.54504105 0.07507829 0.36047290
 [55] 0.37077282 0.24285707 0.19516605 0.59634502 0.17325232 0.67318238
 [61] 0.30216733 0.18851617 0.12069659 0.55605024 0.28275480 0.33518883
 [67] 0.18489209 0.01219168 0.19893678 0.20761076 0.38899473 0.13329773
 [73] 0.40731814 0.21692830 0.80399415 0.15949113 0.42604846 0.38839914
 [79] 0.43795522 0.40032065 0.41024243 0.52763295 0.39468933 0.28112947
 [85] 0.54849943 0.17862052 0.29072740 0.21216697 0.46346533 0.54896485
 [91] 0.68470050 0.17837589 0.17619992 0.26564852 0.16730040 0.46155221
 [97] 0.27720365 0.80356860 0.05962084 0.37511875
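
The samples printed above will differ each time the code is run. If reproducible output is needed, for example so that a knitted report matches its written discussion, the seed can be fixed with set.seed() before sampling. The seed value 2022 below is an arbitrary choice for illustration.

```r
# Fixing the seed makes the random draws reproducible
set.seed(2022)                                     # arbitrary seed value
sample_a <- rnorm(5, mean = 2.8, sd = sqrt(1.3))

set.seed(2022)                                     # same seed again
sample_b <- rnorm(5, mean = 2.8, sd = sqrt(1.3))

# Both samples contain exactly the same values
print(identical(sample_a, sample_b))
```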

Histograms

Histograms are used to plot the frequencies of observed values of a random variable. This provides a rough estimate of what a distribution will look like. Below, we will use the hist() function to plot the distribution of a random sample of 1000 generated from an \(Exp(1.3)\) distribution:

x <- rexp(1000, 1.3)
hist(x)

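Looking at the plot above, we can see that the most common values are near 0 and that the frequencies fall off as \(x\) increases. This is to be expected from an exponential distribution, whose density is highest at 0. Because rexp() takes the rate as its second argument, \(E(X)=1/1.3\approx 0.77\). Another way to roughly estimate the expected value is by taking the mean: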

mean(x)
[1] 0.7588649

Notice how close it is to the expected value. If we are interested in \(E(X^2)\), we can roughly estimate it by squaring all the values and taking the mean of the squares:

mean(x^2)
[1] 1.158253

Now, we know that the variance of an exponential distribution with rate \(\lambda\) is \(1/\lambda^2\), which gives \(Var(X)=1/1.3^2\approx 0.59\) for the above distribution. We can roughly estimate the variance by plugging the rough estimates into \(E(X^2)-E(X)^2\):

mean(x^2) - mean(x)^2
[1] 0.5823773

While it is not exact, it can give you a rough idea.
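
To judge how rough these estimates are, they can be set beside the theoretical values. The sketch below assumes that \(Exp(1.3)\) denotes rate 1.3, which is how rexp() interprets its second argument, so that \(E(X)=1/1.3\), \(E(X^2)=2/1.3^2\), and \(Var(X)=1/1.3^2\). The seed and the larger sample size are assumptions made so that the comparison is stable. The variable name x_big is hypothetical.

```r
rate <- 1.3
set.seed(2022)                    # arbitrary seed for reproducibility
x_big <- rexp(100000, rate)       # larger sample than in the lab example

# Rough estimates computed from the sample
estimates <- c(mean          = mean(x_big),
               second_moment = mean(x_big^2),
               variance      = mean(x_big^2) - mean(x_big)^2)

# Theoretical values for an exponential distribution with this rate
theory <- c(mean          = 1 / rate,
            second_moment = 2 / rate^2,
            variance      = 1 / rate^2)

print(rbind(estimates, theory))

# var() divides by n - 1 instead of n, so it differs slightly
print(var(x_big))
```

With a larger sample the estimates typically move closer to the theoretical values, which is why the 1000-draw estimates above are only approximate.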

Problems

Use an RMD file to answer the following questions:

  1. Find the following probabilities:
    1. \(X\sim N(3, 3.5)\); Find \(P(2 \le X \le 6)\)

    2. \(X\sim Beta(6,8)\); Find \(P(0.5 \le X \le .7)\)

    3. \(X\sim Unif(6,25)\); Find \(P(16 \le X \le 23)\)

  2. Find the following percentiles:
    1. \(X\sim Unif(1,5)\); Find the \(68\)th percentile.
    2. \(X\sim N(17, 6)\); Find the \(52\)nd percentile.
    3. \(X\sim Exp(1)\); Find the \(43\)rd percentile.
  3. Generate 50 realizations of the following random variables:
    1. \(X\sim Weibull(3,2)\)

    2. \(X\sim Laplace(2,1/2)\) (see footnote 1)

    3. \(X\sim LogNorm(0.5, \sqrt 2)\)

  4. Generate 500 realizations of the following random variables and create a histogram:
    1. \(X\sim Exp(1.5)\)

    2. \(X \sim Exp(2)\)

    3. \(X \sim Exp(5)\)

    4. \(X\sim Exp(10)\)

  5. Generate 5000 realizations of \(X\sim Gamma(7, 12)\) and compute the following rough estimates:
    1. \(E(X)\)

    2. \(E(X^2)\)

    3. \(Var(X)\)

Submit your Lab assignment as an RMD file in Canvas on 3/10/2023 by 11:59 PM.

Footnotes

  1. You will need to install the VGAM R package.