Theoretica
A C++ numerical and automatic mathematical library
Statistical functions. More...
Functions | |
real | likelihood (const vec< real > &X, const vec< real > &theta, stat_function f) |
Compute the likelihood of a distribution f with the given parameters theta and measurements X. More... | |
real | log_likelihood (const vec< real > &X, const vec< real > &theta, stat_function f) |
Compute the log-likelihood of a distribution f with the given parameters theta and measurements X. More... | |
template<typename Matrix = mat<real>, typename Dataset = vec<real>, enable_vector< Dataset > = true> | |
Matrix | covar_mat (const std::vector< Dataset > &v) |
Build the covariance matrix given a vector of datasets by computing the covariance between all pairs of datasets. More... | |
template<unsigned int N = 0, typename MultiDualFunction = autodiff::dreal_t<N>(*)(autodiff::dvec_t<N>)> | |
real | error_propagation (MultiDualFunction f, const vec< real, N > &x_best, const vec< real, N > &delta_x) |
Automatically propagate uncertainties under quadrature on an arbitrary function given the uncertainties on the variables, the mean values of the variables and the function itself, by using automatic differentiation. More... | |
template<unsigned int N = 0, typename Matrix , enable_matrix< Matrix > = true, typename MultiDualFunction = autodiff::dreal_t<N>(*)(autodiff::dvec_t<N>)> | |
real | error_propagation (MultiDualFunction f, const vec< real, N > &x_best, const Matrix &cm) |
Automatically propagate uncertainties under quadrature on an arbitrary function given the uncertainties on the variables, the mean values of the variables and the function itself, by using automatic differentiation. More... | |
template<unsigned int N = 0, typename MultiDualFunction = multidual<N>(*)(autodiff::dvec_t<N>), typename Dataset = vec<real, N>> | |
real | error_propagation (MultiDualFunction f, const std::vector< Dataset > &v) |
Automatically propagate uncertainties under quadrature on an arbitrary function given the function and the set of measured data. More... | |
template<typename Function > | |
real | error_propagation_mc (Function f, std::vector< pdf_sampler > &rv, unsigned int N=1E+6) |
Propagate the statistical error on a given function using the Monte Carlo method, by generating a sample following the probability distribution of the function and computing its standard deviation. More... | |
real | mean (const histogram &h) |
Compute the mean of the values of a histogram. | |
real | tss (const histogram &h) |
Compute the total sum of squares of the values of the histogram. | |
real | variance (const histogram &h) |
Compute the variance of the values of a histogram. | |
real | stdev (const histogram &h) |
Compute the standard deviation of the values of a histogram. | |
template<typename Dataset > | |
real | mean (const Dataset &X) |
Compute the mean of a dataset. More... | |
template<typename Dataset > | |
real | range (const Dataset &X) |
Compute the range of a data set, defined as \(x_{max} - x_{min}\). More... | |
template<typename Dataset > | |
real | semidispersion (const Dataset &X) |
Compute the maximum semidispersion of a data set, defined as \((x_{max} - x_{min}) / 2\). More... | |
template<typename Dataset > | |
real | propagate_sum (const Dataset &sigma) |
Propagate the error over a sum of random variables under quadrature, as \(\sqrt{\sum_{i = 1}^n \sigma_i^2}\), where each \(\sigma_i\) corresponds to the standard deviation of a variable. More... | |
template<typename Dataset1 , typename Dataset2 > | |
real | propagate_product (const Dataset1 &sigma, const Dataset2 &mean) |
Propagate the error over a product of random variables under quadrature, as \(\sqrt{\sum_{i = 1}^n (\sigma_i / \mu_i)^2}\), where each \(\sigma_i\) corresponds to the standard deviation of a variable and each \(\mu_i\) to its mean. More... | |
template<typename Dataset > | |
real | total_sum_squares (const Dataset &X) |
Compute the total sum of squares (TSS) of a given dataset as \(\sum_{i = 1}^n (x_i - \hat \mu)^2\) using Welford's one-pass method. More... | |
template<typename Dataset > | |
real | variance (const Dataset &X, unsigned int constraints=1) |
Compute the variance given a dataset and the number of constraints. More... | |
template<typename Dataset > | |
void | moments2 (const Dataset &X, real &out_mean, real &out_variance, unsigned int constraints=1) |
Compute the mean and the variance of a dataset in a single pass, using Welford's method, with the given number of constraints (defaults to 1 for Bessel's correction). More... | |
template<typename Dataset > | |
real | stdev (const Dataset &data, unsigned int constraints=1) |
Compute the standard deviation given a dataset and the number of constraints. More... | |
template<typename Dataset > | |
real | stdom (const Dataset &X) |
Compute the standard deviation of the mean given a dataset. More... | |
template<typename Dataset > | |
real | standard_relative_error (const Dataset &X) |
Compute the relative error on a dataset using estimates of its mean and standard deviation, with the given number of constraints (defaults to 1 for Bessel's correction). More... | |
template<typename Dataset1 , typename Dataset2 > | |
real | covariance (const Dataset1 &X, const Dataset2 &Y, unsigned int constraints=1) |
Compute the covariance between two datasets with the given number of constraints. More... | |
template<typename Dataset1 , typename Dataset2 > | |
real | correlation_coefficient (const Dataset1 &X, const Dataset2 &Y) |
Compute Pearson's correlation coefficient R between two datasets. More... | |
template<typename Dataset > | |
real | autocorrelation (const Dataset &X, unsigned int n=1) |
Compute the lag-n autocorrelation of a dataset. More... | |
template<typename Dataset > | |
real | absolute_deviation (const Dataset &X) |
Compute the mean absolute deviation of a dataset as \(\frac{\sum_{i = 1}^n |x_i - \hat \mu|}{n}\). More... | |
template<typename Dataset > | |
real | skewness (const Dataset &X) |
Compute the skewness of a dataset as \(\frac{\sum_{i=1}^n (\frac{x_i - \hat \mu}{\hat \sigma})^3}{n}\). More... | |
template<typename Dataset > | |
real | kurtosis (const Dataset &X) |
Compute the normalized kurtosis of a dataset as \(\frac{\sum_{i=1}^n (\frac{x_i - \hat \mu}{\hat \sigma})^4}{n} - 3\). More... | |
template<typename RealFunction > | |
real | gaussian_expectation (RealFunction g, real mean, real sigma) |
Compute the expectation value of a given function with respect to a Gaussian distribution with the given parameters. More... | |
real | z_score (real x, real mean, real sigma) |
Compute the Z-score of an observed value with respect to a Gaussian distribution with the given parameters. More... | |
template<typename Dataset > | |
Dataset | normalize_z_score (const Dataset &X) |
Normalize a data set using Z-score normalization. More... | |
template<typename Dataset1 , typename Dataset2 , typename Dataset3 > | |
real | chi_square (const Dataset1 &O, const Dataset2 &E, const Dataset3 &sigma) |
Compute the chi-square from the set of observed quantities, expected quantities and errors. More... | |
real | pvalue_chi_squared (real chi_sqr, unsigned int ndf) |
Compute the right-tailed p-value associated with a computed chi-square value, as the integral of the chi-squared distribution from the given value to infinity. More... | |
template<typename Dataset1 , typename Dataset2 , typename Dataset3 > | |
real | chi_square_linear (const Dataset1 &X, const Dataset2 &Y, const Dataset3 &sigma, real intercept, real slope) |
Compute the chi-square on a linear regression, as the sum of the squares of the normalized residuals, where each residual is divided by the standard deviation of its point. More... | |
template<typename Dataset1 , typename Dataset2 , typename Dataset3 > | |
real | reduced_chi_square_linear (const Dataset1 &X, const Dataset2 &Y, const Dataset3 &sigma, real intercept, real slope) |
Compute the reduced chi-square on a linear regression, as the chi-square (computed by chi_square_linear) divided by the number of degrees of freedom of the model (\(N - 2\)). More... | |
Statistical functions.
|
inline |
Compute the mean absolute deviation of a dataset as \(\frac{\sum_{i = 1}^n |x_i - \hat \mu|}{n}\).
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
|
inline |
Compute the lag-n autocorrelation of a dataset.
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
n | The lag (defaults to lag-1) |
|
inline |
Compute the chi-square from the set of observed quantities, expected quantities and errors.
The provided sets should all have the same size.
Dataset1 | Any type representing a dataset as a vector of values |
Dataset2 | Any type representing a dataset as a vector of values |
Dataset3 | Any type representing a dataset as a vector of values |
O | The set of observed values |
E | The set of expected values |
sigma | The set of standard deviations on the observations |
|
inline |
Compute the chi-square on a linear regression, as the sum of the squares of the normalized residuals, where each residual is divided by the standard deviation of its point.
Dataset1 | Any type representing a dataset as a vector of values |
Dataset2 | Any type representing a dataset as a vector of values |
Dataset3 | Any type representing a dataset as a vector of values |
X | A vector of the X values of the sample |
Y | A vector of the Y values of the sample |
sigma | The standard deviations of each point of the sample |
intercept | The intercept of the linear model |
slope | The slope of the linear model |
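The sum described above can be sketched in plain C++ as follows. This is only an illustration of the formula, not the library's implementation: the names `chi_square_line` and `reduced_chi_square_line` and the use of `std::vector<double>` are assumptions made for the example, and the model is taken to be \(y = \text{intercept} + \text{slope} \cdot x\).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Theoretica): chi-square of the linear
// model y = intercept + slope * x against points (X[i], Y[i]) with
// standard deviations sigma[i].
double chi_square_line(const std::vector<double>& X,
                       const std::vector<double>& Y,
                       const std::vector<double>& sigma,
                       double intercept, double slope) {
    assert(X.size() == Y.size() && X.size() == sigma.size());

    double chi2 = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        // Normalized residual: distance from the model in units of sigma
        const double r = (Y[i] - intercept - slope * X[i]) / sigma[i];
        chi2 += r * r;
    }
    return chi2;
}

// Hypothetical helper: the same quantity divided by N - 2, the number of
// degrees of freedom left after fitting intercept and slope.
double reduced_chi_square_line(const std::vector<double>& X,
                               const std::vector<double>& Y,
                               const std::vector<double>& sigma,
                               double intercept, double slope) {
    assert(X.size() > 2);
    return chi_square_line(X, Y, sigma, intercept, slope)
        / static_cast<double>(X.size() - 2);
}
```

With four points and a single residual of two standard deviations, the first function returns 4 and the second returns 2.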
|
inline |
Compute Pearson's correlation coefficient R between two datasets.
The two datasets must have the same size.
Dataset1 | Any type representing a dataset as a vector of values |
Dataset2 | Any type representing a dataset as a vector of values |
X | The first dataset |
Y | The second dataset |
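For reference, Pearson's coefficient is the covariance of the two datasets divided by the product of their standard deviations; the normalization factors cancel, so only sums of deviations are needed. The sketch below illustrates that definition in standalone C++. The name `pearson_r` and the `std::vector<double>` inputs are assumptions for the example and do not reflect how the library computes it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Theoretica): Pearson's R as
// sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)).
double pearson_r(const std::vector<double>& X, const std::vector<double>& Y) {
    assert(X.size() == Y.size() && !X.empty());
    const std::size_t n = X.size();

    // First pass: means of the two datasets
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += X[i];
        mean_y += Y[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    // Second pass: sums of products of deviations
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = X[i] - mean_x;
        const double dy = Y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Perfectly linear data with positive slope gives a value of 1, and with negative slope a value of -1.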
|
inline |
Build the covariance matrix given a vector of datasets by computing the covariance between all pairs of datasets.
v | A vector of datasets of measurements |
|
inline |
Compute the covariance between two datasets with the given number of constraints.
The two datasets must have the same size.
Dataset1 | Any type representing a dataset as a vector of values |
Dataset2 | Any type representing a dataset as a vector of values |
X | The first dataset |
Y | The second dataset |
constraints | The number of constraints (defaults to 1 for Bessel's correction). |
|
inline |
Automatically propagate uncertainties under quadrature on an arbitrary function given the function and the set of measured data.
The covar_mat function is used to estimate the covariance matrix from the data sets. For this to work, the data sets should have the same size, so as to estimate their covariance.
f | The function to propagate error on |
v | A vector of datasets, each holding the measurements of one variable |
|
inline |
Automatically propagate uncertainties under quadrature on an arbitrary function given the uncertainties on the variables, the mean values of the variables and the function itself, by using automatic differentiation.
f | The function to propagate error on |
x_best | Best values for the variables |
cm | Covariance matrix of the variables, where diagonal entries are the variance of the variables and off-diagonal entries are the covariance between different variables. May be constructed from datasets using the function covar_mat. |
|
inline |
Automatically propagate uncertainties under quadrature on an arbitrary function given the uncertainties on the variables, the mean values of the variables and the function itself, by using automatic differentiation.
This function assumes that the correlation between different variables is zero; if that is not the case, the overload taking a covariance matrix should be used.
f | The function to propagate error on |
x_best | Best values for the variables |
delta_x | Vector of uncertainties on the variables |
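For uncorrelated variables, propagation under quadrature amounts to \(\sigma_f = \sqrt{\sum_i (\partial f / \partial x_i)^2 \, \delta x_i^2}\), with the derivatives evaluated at the best values. The sketch below illustrates this formula in standalone C++. Note that it replaces automatic differentiation with central finite differences so that it needs no library types; the name `propagate_uncorrelated`, the step `h` and the `std::function` signature are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not the library's API): first-order error
// propagation for uncorrelated variables. Partial derivatives are
// approximated with central finite differences of step h, as a stand-in
// for the automatic differentiation used by the library.
double propagate_uncorrelated(
    const std::function<double(const std::vector<double>&)>& f,
    const std::vector<double>& x_best,
    const std::vector<double>& delta_x,
    double h = 1e-6) {

    assert(x_best.size() == delta_x.size());

    std::vector<double> x = x_best;
    double variance = 0.0;

    for (std::size_t i = 0; i < x.size(); ++i) {
        // Perturb only the i-th variable around its best value
        x[i] = x_best[i] + h;
        const double f_plus = f(x);
        x[i] = x_best[i] - h;
        const double f_minus = f(x);
        x[i] = x_best[i];

        const double df = (f_plus - f_minus) / (2.0 * h);
        variance += df * df * delta_x[i] * delta_x[i];
    }
    return std::sqrt(variance);
}
```

For \(f(x, y) = x y\) at \((2, 3)\) with uncertainties \((0.1, 0.2)\), the partial derivatives are 3 and 2, so the propagated uncertainty is \(\sqrt{0.09 + 0.16} = 0.5\).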
real theoretica::stats::error_propagation_mc (Function f, std::vector< pdf_sampler > &rv, unsigned int N = 1E+6)
Propagate the statistical error on a given function using the Monte Carlo method, by generating a sample following the probability distribution of the function and computing its standard deviation.
N sample vectors of size M are generated by sampling the M different pdf_sampler distributions which correspond to the input variables of the function. The resulting sample is used to estimate the standard deviation over the result of the function.
f | The function to propagate error on |
rv | A list of distribution samplers which sample from the probability distributions of the random variables. |
N | The number of sampled values to use, defaults to 1 million. |
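The procedure described above can be sketched with the standard `<random>` facilities. This is an illustration under stated assumptions rather than the library's code: the `sampler` alias (a callable returning one draw) stands in for `pdf_sampler`, and the name `propagate_mc` is hypothetical. The standard deviation of the outputs is accumulated in one pass so the sample does not need to be stored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical stand-in for pdf_sampler: a callable returning one draw.
using sampler = std::function<double()>;

// Hypothetical helper: draw N input vectors, evaluate f on each one and
// return the sample standard deviation of the results.
double propagate_mc(const std::function<double(const std::vector<double>&)>& f,
                    std::vector<sampler>& rv, unsigned int N = 1000000) {
    assert(N > 1);

    std::vector<double> x(rv.size());
    double mean = 0.0; // running mean of f(x)
    double tss = 0.0;  // running sum of squared deviations

    for (unsigned int k = 0; k < N; ++k) {
        // One sample vector: one draw from each input distribution
        for (std::size_t i = 0; i < rv.size(); ++i)
            x[i] = rv[i]();

        const double y = f(x);
        const double delta = y - mean;
        mean += delta / static_cast<double>(k + 1);
        tss += delta * (y - mean);
    }
    return std::sqrt(tss / static_cast<double>(N - 1));
}
```

As a sanity check, the sum of two independent Gaussian variables with standard deviations 3 and 4 should give a result close to 5, up to statistical fluctuations that shrink as N grows.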
|
inline |
Compute the expectation value of a given function with respect to a Gaussian distribution with the given parameters.
This function uses Gauss-Hermite quadrature, which approximates integrals of the form \(\int_{-\infty}^{+\infty} g(x) e^{-x^2} dx\).
RealFunction | A function or lambda representing a univariate real function |
mean | The mean of the Gaussian distribution |
sigma | The standard deviation of the Gaussian distribution |
g | The function to compute the expectation of |
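To connect the Gaussian expectation to the Gauss-Hermite form, substitute \(x = \mu + \sqrt{2} \sigma t\), which gives \(E[g] = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{+\infty} g(\mu + \sqrt{2} \sigma t) e^{-t^2} dt\). The sketch below applies the 3-point Gauss-Hermite rule to this integral. It is a minimal illustration: the name `gaussian_expectation_gh3` is hypothetical, and the number of nodes used by the library is not stated here, so the choice of three nodes is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Theoretica): expectation of g under a
// Gaussian with the given mean and sigma, using the 3-point Gauss-Hermite
// rule. Nodes are 0 and +/- sqrt(3/2); the weights 2 sqrt(pi) / 3 and
// sqrt(pi) / 6 become 2/3 and 1/6 after dividing by sqrt(pi).
template<typename RealFunction>
double gaussian_expectation_gh3(RealFunction g, double mean, double sigma) {
    assert(sigma > 0.0);

    const double t = std::sqrt(1.5);            // positive node
    const double scale = std::sqrt(2.0) * sigma; // from x = mean + sqrt(2) sigma t

    return (2.0 / 3.0) * g(mean)
         + (1.0 / 6.0) * (g(mean + scale * t) + g(mean - scale * t));
}
```

A 3-point rule is exact for polynomials up to degree 5, so for \(g(x) = x^2\) it reproduces \(\mu^2 + \sigma^2\); smooth non-polynomial functions generally need more nodes.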
|
inline |
Compute the normalized kurtosis of a dataset as \(\frac{\sum_{i=1}^n (\frac{x_i - \hat \mu}{\hat \sigma})^4}{n} - 3\).
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
|
inline |
Compute the likelihood of a distribution f with the given parameters theta and measurements X.
X | The dataset of the sample |
theta | The parameters of the distribution |
f | The statistical distribution function |
|
inline |
Compute the log-likelihood of a distribution f with the given parameters theta and measurements X.
X | The dataset of the sample |
theta | The parameters of the distribution |
f | The statistical distribution function |
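For independent measurements, the log-likelihood is the sum of the logarithms of the density evaluated at each point, \(\ln L(\theta) = \sum_i \ln f(x_i; \theta)\). The sketch below shows this in standalone C++. The `density` alias is only an assumed shape for a function taking a value and a parameter vector, and `log_likelihood_sketch` is a hypothetical name; neither is taken from the library.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Assumed shape of a density f(x; theta), for illustration only.
using density = std::function<double(double, const std::vector<double>&)>;

// Hypothetical helper: sum of log f(x_i; theta) over the sample.
double log_likelihood_sketch(const std::vector<double>& X,
                             const std::vector<double>& theta,
                             const density& f) {
    double sum = 0.0;
    for (double x : X) {
        const double p = f(x, theta);
        assert(p > 0.0); // the logarithm requires a positive density
        sum += std::log(p);
    }
    return sum;
}
```

Summing logarithms is usually preferred to multiplying densities directly, since the product of many small values can underflow in floating point.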
|
inline |
Compute the mean of a dataset.
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
|
inline |
Compute the mean and the variance of a dataset in a single pass, using Welford's method, with the given number of constraints (defaults to 1 for Bessel's correction).
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
out_mean | A reference to overwrite with the computed mean |
out_variance | A reference to overwrite with the computed variance |
constraints | The number of constraints (defaults to 1) |
|
inline |
Normalize a data set using Z-score normalization.
Dataset | Any type representing a dataset as a vector of values |
X | The data set to normalize |
|
inline |
Propagate the error over a product of random variables under quadrature, as \(\sqrt{\sum_{i = 1}^n (\sigma_i / \mu_i)^2}\), where each \(\sigma_i\) corresponds to the standard deviation of a variable and each \(\mu_i\) to its mean.
The random variables are assumed to be statistically independent and the result is the relative error over the product.
Dataset1 | Any type representing a dataset as a vector of values |
Dataset2 | Any type representing a dataset as a vector of values |
sigma | The vector of standard deviations |
mean | The vector of the mean values |
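The formula for the relative error over a product can be written out directly. The following is a minimal standalone sketch; the name `relative_error_product` and the `std::vector<double>` inputs are assumptions for the example, and the means are assumed to be nonzero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Theoretica): relative error on a
// product of independent variables, sqrt(sum((sigma_i / mu_i)^2)).
double relative_error_product(const std::vector<double>& sigma,
                              const std::vector<double>& mean) {
    assert(sigma.size() == mean.size());

    double sum = 0.0;
    for (std::size_t i = 0; i < sigma.size(); ++i) {
        const double rel = sigma[i] / mean[i]; // relative error of variable i
        sum += rel * rel;
    }
    return std::sqrt(sum);
}
```

For two variables with relative errors of 3% and 4%, the relative error on their product is 5%; multiplying by the absolute value of the product gives the absolute error.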
|
inline |
Propagate the error over a sum of random variables under quadrature, as \(\sqrt{\sum_{i = 1}^n \sigma_i^2}\), where each \(\sigma_i\) corresponds to the standard deviation of a variable.
The random variables are assumed to be statistically independent.
sigma | The vector of standard deviations |
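The quadrature sum can be written out as a short standalone sketch; the name `quadrature_sum` and the `std::vector<double>` input are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of Theoretica): error on a sum of
// independent variables, sqrt(sum(sigma_i^2)).
double quadrature_sum(const std::vector<double>& sigma) {
    double sum = 0.0;
    for (double s : sigma) {
        assert(s >= 0.0); // standard deviations are non-negative
        sum += s * s;
    }
    return std::sqrt(sum);
}
```

Standard deviations of 3 and 4 combine to 5, which is smaller than their plain sum of 7 because independent fluctuations partially cancel.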
Compute the right-tailed p-value associated with a computed chi-square value, as the integral of the chi-squared distribution from the given value to infinity.
An equivalent integral is computed using Gauss-Laguerre quadrature: \( p = \frac{e^{-X^2}}{2 \Gamma (k/2)} \int_0^{+\infty} (\sqrt{x + X^2})^{k - 2} e^{-x} dx \)
chi_sqr | The computed Chi-squared |
ndf | Number of Degrees of Freedom |
|
inline |
Compute the range of a data set, defined as \(x_{max} - x_{min}\).
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
|
inline |
Compute the reduced chi-square on a linear regression, as the chi-square (computed by chi_square_linear) divided by the number of degrees of freedom of the model (\(N - 2\)).
Dataset1 | Any type representing a dataset as a vector of values |
Dataset2 | Any type representing a dataset as a vector of values |
Dataset3 | Any type representing a dataset as a vector of values |
X | A vector of the X values of the sample |
Y | A vector of the Y values of the sample |
sigma | The standard deviations of each point of the sample |
intercept | The intercept of the linear model |
slope | The slope of the linear model |
|
inline |
Compute the maximum semidispersion of a data set, defined as \((x_{max} - x_{min}) / 2\).
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
|
inline |
Compute the skewness of a dataset as \(\frac{\sum_{i=1}^n (\frac{x_i - \hat \mu}{\hat \sigma})^3}{n}\).
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
|
inline |
Compute the relative error on a dataset using estimates of its mean and standard deviation, with the given number of constraints (defaults to 1 for Bessel's correction).
The relative error is computed as \(\epsilon_{rel} = \frac{\sigma}{\mu}\) and is not multiplied by 100.
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
constraints | The number of constraints for the estimators (defaults to 1) |
|
inline |
Compute the standard deviation given a dataset and the number of constraints.
Welford's one-pass method is used. The number of constraints defaults to 1, applying Bessel's correction. A value of 0 may be used to compute the population standard deviation.
Dataset | Any type representing a dataset as a vector of values |
data | The dataset |
constraints | The number of constraints, defaults to 1 |
|
inline |
Compute the standard deviation of the mean given a dataset.
Welford's one-pass method is used and Bessel's correction is applied.
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
|
inline |
Compute the total sum of squares (TSS) of a given dataset as \(\sum_{i = 1}^n (x_i - \hat \mu)^2\) using Welford's one-pass method.
Dataset | Any type representing a dataset as a vector of values |
X | The dataset to compute the TSS of |
|
inline |
Compute the variance given a dataset and the number of constraints.
Welford's one-pass method is used. The number of constraints defaults to 1, applying Bessel's correction. A value of 0 may be used to compute the population variance.
Dataset | Any type representing a dataset as a vector of values |
X | The dataset |
constraints | The number of constraints, defaults to 1 |
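Welford's one-pass update, mentioned above, maintains a running mean and a running sum of squared deviations, so the data is read only once and the subtraction of two large, nearly equal sums is avoided. The sketch below is a standalone illustration and not the library's implementation; the name `welford_variance` and the `std::vector<double>` input are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Theoretica): one-pass variance with
// Welford's method. `constraints` is subtracted from the sample size in
// the denominator: 1 gives the sample variance (Bessel's correction),
// 0 gives the population variance.
double welford_variance(const std::vector<double>& X,
                        unsigned int constraints = 1) {
    assert(X.size() > constraints);

    double mean = 0.0; // running mean of the values seen so far
    double tss = 0.0;  // running total sum of squares

    for (std::size_t i = 0; i < X.size(); ++i) {
        const double delta = X[i] - mean;
        mean += delta / static_cast<double>(i + 1);
        // The second factor uses the already updated mean
        tss += delta * (X[i] - mean);
    }
    return tss / static_cast<double>(X.size() - constraints);
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the total sum of squares is 32, so the population variance (0 constraints) is 4 and the sample variance (1 constraint) is 32/7.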
Compute the Z-score of an observed value with respect to a Gaussian distribution with the given parameters.
x | The observed value |
mean | The mean of the distribution |
sigma | The standard deviation of the distribution |