SciPy - Continuous Probability Distributions



Continuous probability distributions refer to statistical models where the random variable can take any value within a specified range or interval. These distributions are fundamental in many scientific fields such as physics, engineering and economics, as they can model real-world scenarios like measurements or time intervals.

The scipy.stats module of SciPy provides an extensive collection of tools for working with these distributions. It lets us calculate important quantities such as the probability density function (PDF), the cumulative distribution function (CDF) and more.
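A continuous random variable can take any value in an interval, so the probability of any single exact value is zero. Probabilities instead come from the area under the PDF over an interval, which is what the CDF accumulates. The minimal sketch below uses the standard normal distribution (scipy.stats.norm) and SciPy's numerical integrator scipy.integrate.quad to check that integrating the PDF over [-1, 1] matches the difference of two CDF values:

```python
from scipy.integrate import quad
from scipy.stats import norm

# Probability that a standard normal variable falls in [-1, 1], computed two ways
area, _abs_err = quad(norm.pdf, -1, 1)    # numerically integrate the PDF
from_cdf = norm.cdf(1) - norm.cdf(-1)     # difference of CDF values

print(f"Integral of PDF over [-1, 1]: {area:.4f}")
print(f"CDF(1) - CDF(-1):             {from_cdf:.4f}")

# An interval of zero width (a single point) carries zero probability
point_mass, _ = quad(norm.pdf, 0.5, 0.5)
print("P(X = 0.5):", point_mass)
```

Both approaches give about 0.6827. In practice the CDF route is preferred because it avoids numerical integration.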

Key Continuous Distributions in SciPy

SciPy offers a wide variety of continuous probability distributions in scipy.stats, and each one exposes a common set of methods for working with it. The sections below cover four of the most widely used: the normal, exponential, gamma and beta distributions.

Normal Distribution

The Normal Distribution, often referred to as the Gaussian distribution, is one of the most commonly used continuous distributions in statistics. It has a symmetric, bell-shaped curve whose center is set by its mean and whose spread is set by its standard deviation. It is widely applied in fields such as quality control, finance and the natural sciences.

In SciPy the normal distribution is represented by the scipy.stats.norm object, whose loc and scale parameters correspond to the mean and standard deviation. Here's an example of calculating and visualizing the probability density and cumulative distribution of a normal distribution −

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

# Define mean and standard deviation
mean = 0
std_dev = 1

# Generate an array of values for x between -5 and 5
x_values = np.linspace(-5, 5, 100)

# Calculate the probability density function (PDF) and cumulative distribution function (CDF)
pdf_values = norm.pdf(x_values, mean, std_dev)
cdf_values = norm.cdf(x_values, mean, std_dev)

# Plot the results
plt.figure(figsize=(12, 6))

# PDF plot
plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Normal Distribution - PDF')
plt.legend()

# CDF plot
plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Normal Distribution - CDF')
plt.legend()

plt.tight_layout()
plt.show()

Here is the output of the normal distribution calculated using the scipy.stats.norm.pdf() and scipy.stats.norm.cdf() functions −

[Figure: Normal Distribution - PDF and CDF]
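The CDF also answers questions about how much probability lies near the mean. The short sketch below reuses the mean and std_dev values from the example above and computes the probability of falling within 1, 2 and 3 standard deviations of the mean as a difference of two CDF values:

```python
from scipy.stats import norm

mean = 0
std_dev = 1

# P(mean - k*sd <= X <= mean + k*sd) = CDF(mean + k*sd) - CDF(mean - k*sd)
coverage = {}
for k in (1, 2, 3):
    upper = norm.cdf(mean + k * std_dev, mean, std_dev)
    lower = norm.cdf(mean - k * std_dev, mean, std_dev)
    coverage[k] = upper - lower
    print(f"Within {k} standard deviation(s): {coverage[k]:.4f}")
```

The printed values are approximately 0.6827, 0.9545 and 0.9973, which is the familiar 68-95-99.7 rule. They are the same for any mean and standard deviation, because shifting and scaling a normal variable does not change these proportions.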

Exponential Distribution

The Exponential Distribution is often used to model the time between events in a Poisson process, where events occur independently and at a constant average rate. The distribution has a single parameter, the rate λ (lambda), which is the average number of events per unit time. The mean waiting time is therefore 1/λ. This makes the distribution useful for processes that involve waiting times.

In SciPy the exponential distribution is handled by the scipy.stats.expon object. It is parameterized by scale = 1/λ rather than by the rate itself. Here's an example of calculating and plotting the PDF and CDF for the exponential distribution −

from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt

# Set the rate (lambda)
rate = 1

# Create an array of x values from 0 to 10
x_values = np.linspace(0, 10, 100)

# Compute the PDF and CDF (SciPy's expon takes scale = 1/lambda, not the rate)
pdf_values = expon.pdf(x_values, scale=1/rate)
cdf_values = expon.cdf(x_values, scale=1/rate)

# Plot the distributions
plt.figure(figsize=(12, 6))

# PDF plot
plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Exponential Distribution - PDF')
plt.legend()

# CDF plot
plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Exponential Distribution - CDF')
plt.legend()

plt.tight_layout()
plt.show()

Following is the output of the Exponential distribution calculated using the scipy.stats.expon.pdf() and scipy.stats.expon.cdf() functions −

[Figure: Exponential Distribution - PDF and CDF]
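The link between the rate λ and SciPy's scale parameter is easy to check numerically. The sketch below uses an arbitrary illustrative rate of 0.5 events per unit time. It shows that the mean waiting time is 1/λ and that the survival function P(T > t) equals exp(-λt). It also shows the memoryless property: having already waited s units does not change the chance of waiting another t.

```python
import numpy as np
from scipy.stats import expon

rate = 0.5                    # illustrative rate (lambda), chosen arbitrarily
wait = expon(scale=1 / rate)  # SciPy's expon is parameterized by scale = 1/lambda

print("Mean waiting time:", wait.mean())  # equals 1/rate

# Survival function P(T > t) = 1 - CDF(t) = exp(-rate * t)
t = 3.0
print("P(T > 3):", wait.sf(t), "exp(-rate*t):", np.exp(-rate * t))

# Memoryless property: P(T > s + t | T > s) == P(T > t)
s = 1.5
conditional = wait.sf(s + t) / wait.sf(s)
print("P(T > s+t | T > s):", conditional)
```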

Gamma Distribution

The Gamma Distribution is a generalization of the exponential distribution that adds a shape parameter, which allows a much wider variety of distribution shapes. With a shape of 1 it reduces to the exponential distribution. It is frequently used in queuing theory and reliability analysis.

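In SciPy the gamma distribution is represented by the scipy.stats.gamma object, which takes the shape parameter (named a) as its first argument after x, along with an optional scale. Below is an example of calculating the PDF and CDF for the gamma distribution −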

from scipy.stats import gamma
import numpy as np
import matplotlib.pyplot as plt

# Parameters for the gamma distribution
shape_param = 2
scale_param = 1

# Generate an array of x values
x_values = np.linspace(0, 10, 100)

# Compute the PDF and CDF
pdf_values = gamma.pdf(x_values, shape_param, scale=scale_param)
cdf_values = gamma.cdf(x_values, shape_param, scale=scale_param)

# Plot the results
plt.figure(figsize=(12, 6))

# PDF plot
plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Gamma Distribution - PDF')
plt.legend()

# CDF plot
plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Gamma Distribution - CDF')
plt.legend()

plt.tight_layout()
plt.show()

Below is the output of the Gamma distribution calculated using the scipy.stats.gamma.pdf() and scipy.stats.gamma.cdf() functions −

[Figure: Gamma Distribution - PDF and CDF]
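To see why the gamma distribution is called a generalization of the exponential, we can compare the two directly. The sketch below checks that a gamma PDF with shape 1 matches the exponential PDF at the same scale. It then prints the gamma mean for a few shapes. For this distribution the mean is shape × scale.

```python
import numpy as np
from scipy.stats import expon, gamma

x_values = np.linspace(0, 10, 100)
scale_param = 1

# With shape a = 1 the gamma PDF reduces to the exponential PDF
gamma_shape1 = gamma.pdf(x_values, 1, scale=scale_param)
exponential = expon.pdf(x_values, scale=scale_param)
print("Shape 1 matches exponential:", np.allclose(gamma_shape1, exponential))

# The mean of a gamma distribution is shape * scale
for shape_param in (1, 2, 5):
    mean = gamma.mean(shape_param, scale=scale_param)
    print(f"shape={shape_param}: mean={mean:.1f}")
```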

Beta Distribution

The Beta Distribution is a versatile distribution used to model random variables that are constrained to a fixed interval, typically [0, 1]. It is often applied to probabilities and proportions, for example in Bayesian statistics.

The beta distribution is represented in SciPy by scipy.stats.beta. Here's an example of plotting the PDF and CDF of a beta distribution −

from scipy.stats import beta
import numpy as np
import matplotlib.pyplot as plt

# Set the shape parameters for the beta distribution
alpha = 2
beta_param = 5

# Generate values for x in the range [0, 1]
x_values = np.linspace(0, 1, 100)

# Calculate the PDF and CDF
pdf_values = beta.pdf(x_values, alpha, beta_param)
cdf_values = beta.cdf(x_values, alpha, beta_param)

# Plot the results
plt.figure(figsize=(12, 6))

# PDF plot
plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Beta Distribution - PDF')
plt.legend()

# CDF plot
plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Beta Distribution - CDF')
plt.legend()

plt.tight_layout()
plt.show()

Below is the output of the Beta distribution calculated using the scipy.stats.beta.pdf() and scipy.stats.beta.cdf() functions −

[Figure: Beta Distribution - PDF and CDF]
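The shape parameters of the beta distribution have simple closed-form consequences. The sketch below reuses alpha = 2 and beta_param = 5 from the example above and checks three of them. The mean equals a / (a + b). All probability mass lies in [0, 1]. For a, b > 1 the PDF peaks at the mode (a - 1) / (a + b - 2).

```python
from scipy.stats import beta

alpha = 2
beta_param = 5

dist = beta(alpha, beta_param)  # distribution with both shape parameters fixed

# Closed-form mean of Beta(a, b) is a / (a + b)
print("SciPy mean: ", dist.mean())
print("a / (a + b):", alpha / (alpha + beta_param))

# All probability mass lies in [0, 1]
print("CDF at 0 and 1:", dist.cdf(0), dist.cdf(1))

# For a, b > 1 the PDF peaks at the mode (a - 1) / (a + b - 2)
mode = (alpha - 1) / (alpha + beta_param - 2)
print("Mode:", mode, "PDF at mode:", dist.pdf(mode))
```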

Working with Continuous Distributions in SciPy

SciPy provides a common set of methods for working with any continuous distribution, listed below −

  • PDF (Probability Density Function): distribution.pdf(x, params) evaluates the probability density at x. This is a density rather than a probability; probabilities are obtained by integrating it over an interval.
  • CDF (Cumulative Distribution Function): distribution.cdf(x, params) calculates the cumulative probability P(X ≤ x).
  • PPF (Percent-Point Function): distribution.ppf(p, params) is the inverse of the CDF. It returns the value x at which the cumulative probability equals p.
  • Random Sampling: distribution.rvs(params, size=N) generates N random values from the distribution.
  • Mean and Variance: distribution.mean(params) and distribution.var(params) return the mean and variance of the distribution.
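Rather than repeating the parameters in every call, we can also call the distribution object with its parameters. This returns a "frozen" distribution on which all of the methods above are available. The sketch below uses norm(loc=10, scale=2) purely as an illustrative choice. It shows that ppf inverts cdf and that a seeded NumPy generator makes rvs reproducible.

```python
import numpy as np
from scipy.stats import norm

# Freeze the parameters once (illustrative values: mean 10, standard deviation 2)
dist = norm(loc=10, scale=2)

print("pdf(10):", dist.pdf(10))  # density at the mean, not a probability
print("cdf(10):", dist.cdf(10))  # half of the mass lies below the mean

p = dist.cdf(12)
print("ppf(cdf(12)):", dist.ppf(p))  # ppf inverts cdf, recovering 12 up to rounding

rng = np.random.default_rng(seed=0)  # seeded generator for reproducible samples
samples = dist.rvs(size=5, random_state=rng)
print("Samples:", samples)

print("Mean and variance:", dist.mean(), dist.var())  # variance is scale**2
```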

For instance, we can calculate the mean and variance of the standard normal distribution (loc=0, scale=1) as follows −

from scipy.stats import norm

# Calculate the mean and variance of the normal distribution
mean = norm.mean(loc=0, scale=1)
variance = norm.var(loc=0, scale=1)

print("Mean of Normal Distribution:", mean)
print("Variance of Normal Distribution:", variance)

Here is the output showing the mean and variance of the normal distribution −

Mean of Normal Distribution: 0.0
Variance of Normal Distribution: 1.0
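The stats() method returns several moments in one call; passing moments="mv" requests the mean and variance. The sketch below applies it to each distribution covered in this tutorial, using the same parameters as the earlier examples.

```python
from scipy.stats import beta, expon, gamma, norm

# stats(..., moments="mv") returns (mean, variance) in a single call
results = {
    "Normal(loc=0, scale=1)": norm.stats(loc=0, scale=1, moments="mv"),
    "Exponential(scale=1)": expon.stats(scale=1, moments="mv"),
    "Gamma(a=2, scale=1)": gamma.stats(2, scale=1, moments="mv"),
    "Beta(a=2, b=5)": beta.stats(2, 5, moments="mv"),
}

for name, (mean, variance) in results.items():
    print(f"{name}: mean={float(mean):.4f}, variance={float(variance):.4f}")
```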

SciPy's scipy.stats module offers a powerful suite of tools for working with continuous probability distributions. Whether we're analyzing simple distributions like the normal and exponential or more flexible models like the gamma and beta, SciPy provides the functions needed to calculate key statistical measures and analyze continuous data in depth.