statistics
Basic statistics module.
This module provides functions for calculating statistics of data, including
averages, variance, and standard deviation.
Calculating averages
--------------------
================== ==================================================
Function           Description
================== ==================================================
mean               Arithmetic mean (average) of data.
fmean              Fast, floating point arithmetic mean.
geometric_mean     Geometric mean of data.
harmonic_mean      Harmonic mean of data.
median             Median (middle value) of data.
median_low         Low median of data.
median_high        High median of data.
median_grouped     Median, or 50th percentile, of grouped data.
mode               Mode (most common value) of data.
multimode          List of modes (most common values of data).
quantiles          Divide data into intervals with equal probability.
================== ==================================================
Calculate the arithmetic mean ("the average") of data:
>>> mean([-1.0, 2.5, 3.25, 5.75])
2.625
Calculate the standard median of discrete data:
>>> median([2, 3, 4, 5])
3.5
Calculate the median, or 50th percentile, of data grouped into class intervals
centred on the data values provided. E.g. if your data points are rounded to
the nearest whole number:
>>> median_grouped([2, 2, 3, 3, 3, 4]) #doctest: +ELLIPSIS
2.8333333333...
This should be interpreted in this way: you have two data points in the class
interval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in
the class interval 3.5-4.5. The median of these data points is 2.8333...
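The interpolation behind this result can be sketched with the usual grouped-median formula, L + interval * (n/2 - cf) / f. Here L is the lower limit of the class that holds the middle value, cf is the number of points in lower classes, and f is the number of points in the median class. The helper name `grouped_median_sketch` is hypothetical. This is a simplified restatement for sorted numeric data, not the module's own code.

```python
from bisect import bisect_left, bisect_right
from statistics import median_grouped


def grouped_median_sketch(data, interval=1):
    """Hypothetical helper: interpolated median of grouped numeric data."""
    data = sorted(data)
    n = len(data)
    x = data[n // 2]                 # midpoint of the class holding the middle value
    lower = x - interval / 2         # L: lower limit of that class
    cf = bisect_left(data, x)        # points that fall in lower classes
    f = bisect_right(data, x) - cf   # points inside the median class
    return lower + interval * (n / 2 - cf) / f


data = [2, 2, 3, 3, 3, 4]
print(grouped_median_sketch(data))   # 2.5 + (3 - 2) / 3, about 2.8333
print(median_grouped(data))          # same value from the library
```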
Calculating variability or spread
---------------------------------
================== =============================================
Function           Description
================== =============================================
pvariance          Population variance of data.
variance           Sample variance of data.
pstdev             Population standard deviation of data.
stdev              Sample standard deviation of data.
================== =============================================
Calculate the standard deviation of sample data:
>>> stdev([2.5, 3.25, 5.5, 11.25, 11.75]) #doctest: +ELLIPSIS
4.38961843444...
If you have previously calculated the mean, you can pass it as the optional
second argument to the four "spread" functions to avoid recalculating it:
>>> data = [1, 2, 2, 4, 4, 4, 5, 6]
>>> mu = mean(data)
>>> pvariance(data, mu)
2.5
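As a cross-check, this minimal sketch recomputes the population variance from its definition (the mean of squared deviations from the mean). It then compares that value with pvariance() and pstdev() when the precomputed mean is passed in. The data is the same as in the example above.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance

data = [1, 2, 2, 4, 4, 4, 5, 6]
mu = mean(data)                       # 3.5, computed once and reused below

# Population variance: the average squared deviation from the mean.
by_hand = sum((x - mu) ** 2 for x in data) / len(data)

print(by_hand)                        # 2.5
print(pvariance(data, mu))            # 2.5
print(isclose(pstdev(data, mu), sqrt(by_hand)))  # True: pstdev is the square root
```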
Statistics for relations between two inputs
-------------------------------------------
================== ====================================================
Function           Description
================== ====================================================
covariance         Sample covariance for two variables.
correlation        Pearson's correlation coefficient for two variables.
linear_regression  Intercept and slope for simple linear regression.
================== ====================================================
Calculate covariance, Pearson's correlation, and simple linear regression
for two inputs:
>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> covariance(x, y)
0.75
>>> correlation(x, y) #doctest: +ELLIPSIS
0.31622776601...
>>> linear_regression(x, y)
LinearRegression(slope=0.1, intercept=1.5)
Exceptions
----------
A single exception is defined: StatisticsError is a subclass of ValueError.
Classes
Counter
Dict subclass for counting hashable items. Sometimes called a bag
or multiset. Elements are stored as dictionary keys and their counts
are stored as dictionary values.
>>> c = Counter('abcdeabcdabcaba') # count elements from a string
>>> c.most_common(3) # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c) # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements())) # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values()) # total of all counts
15
>>> c['a'] # count of letter 'a'
5
>>> for elem in 'shazam': # update counts from an iterable
... c[elem] += 1 # by adding 1 to each element's count
>>> c['a'] # now there are seven 'a'
7
>>> del c['b'] # remove all 'b'
>>> c['b'] # now there are zero 'b'
0
>>> d = Counter('simsalabim') # make another counter
>>> c.update(d) # add in the second counter
>>> c['a'] # now there are nine 'a'
9
>>> c.clear() # empty the counter
>>> c
Counter()
Note: If a count is set to zero or reduced to zero, it will remain
in the counter until the entry is deleted or the counter is cleared:
>>> c = Counter('aaabbc')
>>> c['b'] -= 2 # reduce the count of 'b' by two
>>> c.most_common() # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]
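To drop zero and negative entries without deleting them one at a time, use Counter's unary plus operator. It returns a new Counter that keeps only the positive counts. A minimal sketch:

```python
from collections import Counter

c = Counter('aaabbc')
c['b'] -= 2          # 'b' stays in the counter with a count of zero
c['c'] -= 3          # counts may also go negative
print(c)             # Counter({'a': 3, 'b': 0, 'c': -2})

positive = +c        # new Counter with only the positive counts
print(positive)      # Counter({'a': 3})
```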
clear(...)
D.clear() -> None. Remove all items from D.
copy(self)
Return a shallow copy.
elements(self)
Iterator over elements repeating each as many times as its count.
>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']
# Knuth's example for prime factors of 1836: 2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements(): # loop over factors
... product *= factor # and multiply them
>>> product
1836
Note: if an element's count has been set to zero or is negative,
elements() will ignore it.
fromkeys(iterable, v=None)
get(self, key, default=None, /)
Return the value for key if key is in the dictionary, else default.
items(...)
D.items() -> a set-like object providing a view on D's items
keys(...)
D.keys() -> a set-like object providing a view on D's keys
most_common(self, n=None)
List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
pop(...)
D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If the key is not found, return the default if given; otherwise,
raise a KeyError.
popitem(self, /)
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order.
Raises KeyError if the dict is empty.
setdefault(self, key, default=None, /)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
subtract(self, iterable=None, /, **kwds)
Like dict.update() but subtracts counts instead of replacing them.
Counts can be reduced below zero. Both the inputs and outputs are
allowed to contain zero and negative counts.
Source can be an iterable, a dictionary, or another Counter instance.
>>> c = Counter('which')
>>> c.subtract('witch') # subtract elements from another iterable
>>> c.subtract(Counter('watch')) # subtract elements from another counter
>>> c['h'] # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w'] # 1 in which, minus 1 in witch, minus 1 in watch
-1
total(self)
Sum of the counts.
update(self, iterable=None, /, **kwds)
Like dict.update() but adds counts instead of replacing them.
Source can be an iterable, a dictionary, or another Counter instance.
>>> c = Counter('which')
>>> c.update('witch') # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d) # add elements from another counter
>>> c['h'] # four 'h' in which, witch, and watch
4
values(...)
D.values() -> an object providing a view on D's values
Decimal
Construct a new Decimal object. 'value' can be an integer, string, tuple,
or another Decimal object. If no value is given, return Decimal('0'). The
context does not affect the conversion and is only passed to determine if
the InvalidOperation trap is active.
adjusted(self, /)
Return the adjusted exponent of the number. Defined as exp + digits - 1.
as_integer_ratio(self, /)
Decimal.as_integer_ratio() -> (int, int)
Return a pair of integers, whose ratio is exactly equal to the original
Decimal and with a positive denominator. The ratio is in lowest terms.
Raise OverflowError on infinities and a ValueError on NaNs.
as_tuple(self, /)
Return a tuple representation of the number.
canonical(self, /)
Return the canonical encoding of the argument. Currently, the encoding
of a Decimal instance is always canonical, so this operation returns its
argument unchanged.
compare(self, /, other, context=None)
Compare self to other. Return a decimal value:
a or b is a NaN ==> Decimal('NaN')
a < b ==> Decimal('-1')
a == b ==> Decimal('0')
a > b ==> Decimal('1')
compare_signal(self, /, other, context=None)
Identical to compare, except that all NaNs signal.
compare_total(self, /, other, context=None)
Compare two operands using their abstract representation rather than
their numerical value. Similar to the compare() method, but the result
gives a total ordering on Decimal instances. Two Decimal instances with
the same numeric value but different representations compare unequal
in this ordering:
>>> Decimal('12.0').compare_total(Decimal('12'))
Decimal('-1')
Quiet and signaling NaNs are also included in the total ordering. The result
of this function is Decimal('0') if both operands have the same representation,
Decimal('-1') if the first operand is lower in the total order than the second,
and Decimal('1') if the first operand is higher in the total order than the
second operand. See the specification for details of the total order.
This operation is unaffected by context and is quiet: no flags are changed
and no rounding is performed. As an exception, the C version may raise
InvalidOperation if the second operand cannot be converted exactly.
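The difference between numeric comparison and the total ordering can be seen in a short sketch. Two numerically equal values with different exponents compare equal under compare() but not under compare_total(). A quiet NaN produces NaN from compare() but has a fixed place in the total order.

```python
from decimal import Decimal

a, b = Decimal('12.0'), Decimal('12')

print(a == b)                 # True: numerically equal
print(a.compare(b))           # 0
print(a.compare_total(b))     # -1: '12.0' sorts before '12' in the total order

nan = Decimal('NaN')
print(nan.compare(Decimal(1)))        # NaN
print(nan.compare_total(Decimal(1)))  # 1: a positive quiet NaN sorts after every number
```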
compare_total_mag(self, /, other, context=None)
Compare two operands using their abstract representation rather than their
value as in compare_total(), but ignoring the sign of each operand.
x.compare_total_mag(y) is equivalent to x.copy_abs().compare_total(y.copy_abs()).
This operation is unaffected by context and is quiet: no flags are changed
and no rounding is performed. As an exception, the C version may raise
InvalidOperation if the second operand cannot be converted exactly.
conjugate(self, /)
Return self.
copy_abs(self, /)
Return the absolute value of the argument. This operation is unaffected by
context and is quiet: no flags are changed and no rounding is performed.
copy_negate(self, /)
Return the negation of the argument. This operation is unaffected by context
and is quiet: no flags are changed and no rounding is performed.
copy_sign(self, /, other, context=None)
Return a copy of the first operand with the sign set to be the same as the
sign of the second operand. For example:
>>> Decimal('2.3').copy_sign(Decimal('-1.5'))
Decimal('-2.3')
This operation is unaffected by context and is quiet: no flags are changed
and no rounding is performed. As an exception, the C version may raise
InvalidOperation if the second operand cannot be converted exactly.
exp(self, /, context=None)
Return the value of the (natural) exponential function e**x at the given
number. The function always uses the ROUND_HALF_EVEN mode and the result
is correctly rounded.
fma(self, /, other, third, context=None)
Fused multiply-add. Return self*other+third with no rounding of the
intermediate product self*other.
>>> Decimal(2).fma(3, 5)
Decimal('11')
from_float(f, /)
Class method that converts a float to a decimal number, exactly.
Since 0.1 is not exactly representable in binary floating point,
Decimal.from_float(0.1) is not the same as Decimal('0.1').
>>> Decimal.from_float(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal.from_float(float('nan'))
Decimal('NaN')
>>> Decimal.from_float(float('inf'))
Decimal('Infinity')
>>> Decimal.from_float(float('-inf'))
Decimal('-Infinity')
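This sketch contrasts exact float conversion with construction from a string. It also shows that Fraction.from_float() recovers the same exact binary value.

```python
from decimal import Decimal
from fractions import Fraction

exact = Decimal.from_float(0.1)     # exact binary value stored for the float 0.1
typed = Decimal('0.1')              # the decimal value as written

print(exact == typed)               # False
print(Decimal(0.1) == exact)        # True: the constructor also converts floats exactly
print(Fraction.from_float(0.1) == Fraction(exact))  # True: same exact value
```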
is_canonical(self, /)
Return True if the argument is canonical and False otherwise. Currently,
a Decimal instance is always canonical, so this operation always returns
True.
is_finite(self, /)
Return True if the argument is a finite number, and False if the argument
is infinite or a NaN.
is_infinite(self, /)
Return True if the argument is either positive or negative infinity and
False otherwise.
is_nan(self, /)
Return True if the argument is a (quiet or signaling) NaN and False
otherwise.
is_normal(self, /, context=None)
Return True if the argument is a normal finite non-zero number with an
adjusted exponent greater than or equal to Emin. Return False if the
argument is zero, subnormal, infinite or a NaN.
is_qnan(self, /)
Return True if the argument is a quiet NaN, and False otherwise.
is_signed(self, /)
Return True if the argument has a negative sign and False otherwise.
Note that both zeros and NaNs can carry signs.
is_snan(self, /)
Return True if the argument is a signaling NaN and False otherwise.
is_subnormal(self, /, context=None)
Return True if the argument is subnormal, and False otherwise. A number is
subnormal if it is non-zero, finite, and has an adjusted exponent less
than Emin.
is_zero(self, /)
Return True if the argument is a (positive or negative) zero and False
otherwise.
ln(self, /, context=None)
Return the natural (base e) logarithm of the operand. The function always
uses the ROUND_HALF_EVEN mode and the result is correctly rounded.
log10(self, /, context=None)
Return the base ten logarithm of the operand. The function always uses the
ROUND_HALF_EVEN mode and the result is correctly rounded.
logb(self, /, context=None)
For a non-zero number, return the adjusted exponent of the operand as a
Decimal instance. If the operand is a zero, then Decimal('-Infinity') is
returned and the DivisionByZero condition is raised. If the operand is
an infinity then Decimal('Infinity') is returned.
logical_and(self, /, other, context=None)
Return the digit-wise 'and' of the two (logical) operands.
logical_invert(self, /, context=None)
Return the digit-wise inversion of the (logical) operand.
logical_or(self, /, other, context=None)
Return the digit-wise 'or' of the two (logical) operands.
logical_xor(self, /, other, context=None)
Return the digit-wise 'exclusive or' of the two (logical) operands.
max(self, /, other, context=None)
Maximum of self and other. If one operand is a quiet NaN and the other is
numeric, the numeric operand is returned.
max_mag(self, /, other, context=None)
Similar to the max() method, but the comparison is done using the absolute
values of the operands.
min(self, /, other, context=None)
Minimum of self and other. If one operand is a quiet NaN and the other is
numeric, the numeric operand is returned.
min_mag(self, /, other, context=None)
Similar to the min() method, but the comparison is done using the absolute
values of the operands.
next_minus(self, /, context=None)
Return the largest number representable in the given context (or in the
current default context if no context is given) that is smaller than the
given operand.
next_plus(self, /, context=None)
Return the smallest number representable in the given context (or in the
current default context if no context is given) that is larger than the
given operand.
next_toward(self, /, other, context=None)
If the two operands are unequal, return the number closest to the first
operand in the direction of the second operand. If both operands are
numerically equal, return a copy of the first operand with the sign set
to be the same as the sign of the second operand.
normalize(self, /, context=None)
Normalize the number by stripping the rightmost trailing zeros and
converting any result equal to Decimal('0') to Decimal('0e0'). Used
for producing canonical values for members of an equivalence class.
For example, Decimal('32.100') and Decimal('0.321000e+2') both normalize
to the equivalent value Decimal('32.1').
number_class(self, /, context=None)
Return a string describing the class of the operand. The returned value
is one of the following ten strings:
* '-Infinity', indicating that the operand is negative infinity.
* '-Normal', indicating that the operand is a negative normal number.
* '-Subnormal', indicating that the operand is negative and subnormal.
* '-Zero', indicating that the operand is a negative zero.
* '+Zero', indicating that the operand is a positive zero.
* '+Subnormal', indicating that the operand is positive and subnormal.
* '+Normal', indicating that the operand is a positive normal number.
* '+Infinity', indicating that the operand is positive infinity.
* 'NaN', indicating that the operand is a quiet NaN (Not a Number).
* 'sNaN', indicating that the operand is a signaling NaN.
quantize(self, /, exp, rounding=None, context=None)
Return a value equal to the first operand after rounding and having the
exponent of the second operand.
>>> Decimal('1.41421356').quantize(Decimal('1.000'))
Decimal('1.414')
Unlike other operations, if the length of the coefficient after the quantize
operation would be greater than precision, then an InvalidOperation is signaled.
This guarantees that, unless there is an error condition, the quantized exponent
is always equal to that of the right-hand operand.
Also unlike other operations, quantize never signals Underflow, even if the
result is subnormal and inexact.
If the exponent of the second operand is larger than that of the first, then
rounding may be necessary. In this case, the rounding mode is determined by the
rounding argument if given, else by the given context argument; if neither
argument is given, the rounding mode of the current thread's context is used.
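A common use of quantize() is rounding to a fixed number of decimal places with an explicit rounding mode. This minimal sketch uses an illustrative value. It also triggers the "too many digits for the precision" case described above, which the default context traps as an exception.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_EVEN, ROUND_HALF_UP, localcontext

price = Decimal('2.665')
cents = Decimal('0.01')   # its exponent (-2) becomes the exponent of the result

half_even = price.quantize(cents, rounding=ROUND_HALF_EVEN)  # 2.66
half_up = price.quantize(cents, rounding=ROUND_HALF_UP)      # 2.67
print(half_even, half_up)

# 123.456 quantized to two places needs 5 digits, more than a precision of 3.
overflowed = False
with localcontext() as ctx:
    ctx.prec = 3
    try:
        Decimal('123.456').quantize(cents)
    except InvalidOperation:
        overflowed = True
print(overflowed)   # True
```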
radix(self, /)
Return Decimal(10), the radix (base) in which the Decimal class does
all its arithmetic. Included for compatibility with the specification.
remainder_near(self, /, other, context=None)
Return the remainder from dividing self by other. This differs from
self % other in that the sign of the remainder is chosen so as to minimize
its absolute value. More precisely, the return value is self - n * other
where n is the integer nearest to the exact value of self / other, and
if two integers are equally near then the even one is chosen.
If the result is zero then its sign will be the sign of self.
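This sketch compares the % operator with remainder_near(). The % result takes the sign of the dividend. remainder_near() chooses n as the nearest integer to self / other and breaks ties toward the even integer.

```python
from decimal import Decimal

results = {}
for dividend in (18, 25, 35):
    d = Decimal(dividend)
    # 18/10 = 1.8 -> n = 2; 25/10 = 2.5 -> n = 2 (even); 35/10 = 3.5 -> n = 4 (even)
    results[dividend] = (d % 10, d.remainder_near(10))
    print(dividend, *results[dividend])
# 18 8 -2
# 25 5 5
# 35 5 -5
```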
rotate(self, /, other, context=None)
Return the result of rotating the digits of the first operand by an amount
specified by the second operand. The second operand must be an integer in
the range -precision through precision. The absolute value of the second
operand gives the number of places to rotate. If the second operand is
positive then rotation is to the left; otherwise rotation is to the right.
The coefficient of the first operand is padded on the left with zeros to
length precision if necessary. The sign and exponent of the first operand are
unchanged.
same_quantum(self, /, other, context=None)
Test whether self and other have the same exponent or whether both are NaN.
This operation is unaffected by context and is quiet: no flags are changed
and no rounding is performed. As an exception, the C version may raise
InvalidOperation if the second operand cannot be converted exactly.
scaleb(self, /, other, context=None)
Return the first operand with the exponent adjusted by the second. Equivalently,
return the first operand multiplied by 10**other. The second operand must be
an integer.
shift(self, /, other, context=None)
Return the result of shifting the digits of the first operand by an amount
specified by the second operand. The second operand must be an integer in
the range -precision through precision. The absolute value of the second
operand gives the number of places to shift. If the second operand is
positive, then the shift is to the left; otherwise the shift is to the
right. Digits shifted into the coefficient are zeros. The sign and exponent
of the first operand are unchanged.
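Because rotate() and shift() work on a coefficient padded to the context precision, their results depend on that precision. This sketch sets the precision to 9 so the effect is easy to see. Rotation wraps digits around, while shifting drops them and fills with zeros.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 9                        # coefficients are treated as 9 digits wide
    n = Decimal('123456789')
    rotated_left = n.rotate(2)          # 345678912: leading digits wrap to the end
    rotated_right = n.rotate(-2)        # 891234567
    shifted_left = n.shift(2)           # 345678900: zeros shifted in on the right
    shifted_right = n.shift(-2)         # 1234567: rightmost digits dropped
    padded = Decimal('34').rotate(8)    # 000000034 rotated left by 8 -> 400000003
    print(rotated_left, rotated_right, shifted_left, shifted_right, padded)
```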
sqrt(self, /, context=None)
Return the square root of the argument to full precision. The result is
correctly rounded using the ROUND_HALF_EVEN rounding mode.
to_eng_string(self, /, context=None)
Convert to an engineering-type string. Engineering notation has an exponent
which is a multiple of 3, so there are up to 3 digits left of the decimal
point. For example, Decimal('123E+1') is converted to the string '1.23E+3'.
The value of context.capitals determines whether the exponent character is
'e' or 'E'. Otherwise, the context does not affect the operation.
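This sketch compares scientific and engineering strings. The difference only appears when the scientific exponent is not already a multiple of 3. The Context passed in the last call is only there to show the effect of capitals.

```python
from decimal import Context, Decimal

value = Decimal('123E+1')
print(str(value))                 # 1.23E+3: exponent already a multiple of 3
print(value.to_eng_string())      # 1.23E+3

big = Decimal('1E+7')
print(str(big))                   # 1E+7
print(big.to_eng_string())        # 10E+6: exponent moved to a multiple of 3

# capitals=0 selects a lower-case exponent character.
print(value.to_eng_string(context=Context(capitals=0)))  # 1.23e+3
```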
to_integral(self, /, rounding=None, context=None)
Identical to the to_integral_value() method. The to_integral() name has been
kept for compatibility with older versions.
to_integral_exact(self, /, rounding=None, context=None)
Round to the nearest integer, signaling Inexact or Rounded as appropriate if
rounding occurs. The rounding mode is determined by the rounding parameter
if given, else by the given context. If neither parameter is given, then the
rounding mode of the current default context is used.
to_integral_value(self, /, rounding=None, context=None)
Round to the nearest integer without signaling Inexact or Rounded. The
rounding mode is determined by the rounding parameter if given, else by
the given context. If neither parameter is given, then the rounding mode
of the current default context is used.
imag
real
Fraction
This class implements rational numbers.
In the two-argument form of the constructor, Fraction(8, 6) will
produce a rational number equivalent to 4/3. Both arguments must
be Rational. The numerator defaults to 0 and the denominator
defaults to 1 so that Fraction(3) == 3 and Fraction() == 0.
Fractions can also be constructed from:
- numeric strings similar to those accepted by the
float constructor (for example, '-2.3' or '1e10')
- strings of the form '123/456'
- float and Decimal instances
- other Rational instances (including integers)
as_integer_ratio(self)
Return the integer ratio as a tuple.
Return a tuple of two integers, whose ratio is equal to the
Fraction and with a positive denominator.
conjugate(self)
Conjugate is a no-op for Reals.
from_decimal(dec)
Converts a finite Decimal instance to a rational number, exactly.
from_float(f)
Converts a finite float to a rational number, exactly.
Beware that Fraction.from_float(0.3) != Fraction(3, 10).
limit_denominator(self, max_denominator=1000000)
Closest Fraction to self with denominator at most max_denominator.
>>> Fraction('3.141592653589793').limit_denominator(10)
Fraction(22, 7)
>>> Fraction('3.141592653589793').limit_denominator(100)
Fraction(311, 99)
>>> Fraction(4321, 8765).limit_denominator(10000)
Fraction(4321, 8765)
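The warning that Fraction.from_float(0.3) != Fraction(3, 10) pairs naturally with limit_denominator(). The exact conversion keeps the float's power-of-two denominator, and limiting the denominator recovers the intended ratio. A minimal sketch:

```python
from fractions import Fraction

exact = Fraction.from_float(0.3)      # exact binary value of the float 0.3
print(exact == Fraction(3, 10))       # False
print(exact.denominator == 2**54)     # True: a power of two

approx = exact.limit_denominator()    # closest fraction with denominator <= 1_000_000
print(approx)                         # 3/10
```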
denominator
imag
Real numbers have no imaginary component.
numerator
real
Real numbers are their real component.
LinearRegression
LinearRegression(slope, intercept)
count(self, value, /)
Return number of occurrences of value.
index(self, value, start=0, stop=9223372036854775807, /)
Return first index of value.
Raises ValueError if the value is not present.
intercept
Alias for field number 1
slope
Alias for field number 0
NormalDist
Normal distribution of a random variable
cdf(self, x)
Cumulative distribution function. P(X <= x)
from_samples(data)
Make a normal distribution instance from sample data.
inv_cdf(self, p)
Inverse cumulative distribution function. x : P(X <= x) = p
Finds the value of the random variable such that the probability of
the variable being less than or equal to that value equals the given
probability.
This function is also called the percent point function or quantile
function.
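Since inv_cdf() inverts cdf(), applying one after the other should return the starting value, up to floating point error. This sketch checks that round trip. The parameters mu=100 and sigma=15 are illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)

p = dist.cdf(130)            # P(X <= 130), about 0.977
x = dist.inv_cdf(p)          # maps the probability back to (about) 130
center = dist.inv_cdf(0.5)   # 100.0: the median of a normal distribution is its mean
print(p, x, center)
```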
overlap(self, other)
Compute the overlapping coefficient (OVL) between two normal distributions.
Measures the agreement between two normal probability distributions.
Returns a value between 0.0 and 1.0 giving the overlapping area in
the two underlying probability density functions.
>>> N1 = NormalDist(2.4, 1.6)
>>> N2 = NormalDist(3.2, 2.0)
>>> N1.overlap(N2)
0.8035050657330205
pdf(self, x)
Probability density function. P(x <= X < x+dx) / dx
quantiles(self, n=4)
Divide into *n* continuous intervals with equal probability.
Returns a list of (n - 1) cut points separating the intervals.
Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles.
Set *n* to 100 for percentiles, which gives the 99 cut points that
separate the normal distribution into 100 equal-sized groups.
samples(self, n, *, seed=None)
Generate *n* samples for a given mean and standard deviation.
zscore(self, x)
Compute the Standard Score. (x - mean) / stdev
Describes *x* in terms of the number of standard deviations
above or below the mean of the normal distribution.
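This sketch exercises zscore(), quantiles() and from_samples(). The parameters (100, 15) are illustrative. The sample list is the one used in the stdev example earlier, so the fitted standard deviation matches statistics.stdev on the same data.

```python
from math import isclose
from statistics import NormalDist, stdev

dist = NormalDist(100, 15)
z = dist.zscore(130)              # (130 - 100) / 15 = 2.0
q = dist.quantiles()              # quartile cut points, symmetric around 100
print(z, q)

sample = [2.5, 3.25, 5.5, 11.25, 11.75]
sd = NormalDist.from_samples(sample).stdev
print(isclose(sd, stdev(sample)))  # True: from_samples uses the sample standard deviation
```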
mean
Arithmetic mean of the normal distribution.
median
Return the median of the normal distribution.
mode
Return the mode of the normal distribution.
The mode is the value x at which the probability density
function (pdf) takes its maximum value.
stdev
Standard deviation of the normal distribution.
variance
Square of the standard deviation.
StatisticsError
with_traceback(...)
Exception.with_traceback(tb) --
set self.__traceback__ to tb and return self.
args
groupby
make an iterator that returns consecutive keys and groups from the iterable
iterable
Elements to divide into groups according to the key function.
key
A function for computing the group category for each element.
If the key function is not specified or is None, the element itself
is used for grouping.
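groupby() only merges consecutive equal keys, so data is usually sorted first when counting whole groups. This short sketch uses illustrative data to show both the sorted case and what happens with unsorted input.

```python
from itertools import groupby

colors = sorted(['red', 'blue', 'red', 'green', 'red', 'blue'])
runs = [(key, len(list(group))) for key, group in groupby(colors)]
print(runs)   # [('blue', 2), ('green', 1), ('red', 3)]

# Without sorting, separate runs of the same key stay separate.
keys = [key for key, _ in groupby('AAABBA')]
print(keys)   # ['A', 'B', 'A']
```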
itemgetter
itemgetter(item, ...) --> itemgetter object
Return a callable object that fetches the given item(s) from its operand.
After f = itemgetter(2), the call f(r) returns r[2].
After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3])
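A typical use of itemgetter() is as a sort key for (value, count) pairs. This sketch sorts Counter items by count and also shows the multi-item form.

```python
from collections import Counter
from operator import itemgetter

counts = Counter('abracadabra')
# Each item is a (value, count) pair; itemgetter(1) extracts the count.
by_count = sorted(counts.items(), key=itemgetter(1), reverse=True)
print(by_count[0])        # ('a', 5)

picked = itemgetter(0, 2)('xyz')
print(picked)             # ('x', 'z')
```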
repeat
repeat(object [,times]) -> create an iterator which returns the object
for the specified number of times. If not specified, returns the object
endlessly.
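This short sketch shows repeat() both with a fixed count and as an endless constant argument to map(). map() stops when its shortest input, the range, is exhausted.

```python
from itertools import repeat

three = list(repeat('x', 3))
print(three)                                  # ['x', 'x', 'x']

squares = list(map(pow, range(5), repeat(2)))
print(squares)                                # [0, 1, 4, 9, 16]
```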
Functions
bisect_left
bisect_left(a, x, lo=0, hi=None, *, key=None)
Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x. So if x already appears in the list, a.insert(i, x) will
insert just before the leftmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
bisect_right
bisect_right(a, x, lo=0, hi=None, *, key=None)
Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e <= x, and all e in
a[i:] have e > x. So if x already appears in the list, a.insert(i, x) will
insert just after the rightmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
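On a sorted list with repeated values, bisect_left() and bisect_right() bracket the run of equal items. Their difference is therefore the number of occurrences. A minimal sketch:

```python
from bisect import bisect_left, bisect_right

a = [1, 2, 2, 2, 3]
left = bisect_left(a, 2)     # 1: insertion point before the first 2
right = bisect_right(a, 2)   # 4: insertion point after the last 2
print(left, right, right - left)   # 1 4 3
```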
correlation
correlation(x, y, /)
Pearson's correlation coefficient
Return Pearson's correlation coefficient for two inputs. Pearson's
correlation coefficient *r* takes values between -1 and +1. It measures the
strength and direction of a linear relationship: +1 means a perfect
positive linear relationship, -1 a perfect negative linear relationship,
and 0 no linear relationship.
>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> correlation(x, x)
1.0
>>> correlation(x, y)
-1.0
covariance
covariance(x, y, /)
Covariance
Return the sample covariance of two inputs *x* and *y*. Covariance
is a measure of the joint variability of two inputs.
>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> covariance(x, y)
0.75
>>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> covariance(x, z)
-7.5
>>> covariance(z, x)
-7.5
erf
erf(x, /)
Error function at x.
exp
exp(x, /)
Return e raised to the power of x.
fabs
fabs(x, /)
Return the absolute value of the float x.
fmean
fmean(data)
Convert data to floats and compute the arithmetic mean.
This runs faster than the mean() function and it always returns a float.
If the input dataset is empty, it raises a StatisticsError.
>>> fmean([3.5, 4.0, 5.25])
4.25
fsum
fsum(seq, /)
Return an accurate floating point sum of values in the iterable seq.
Assumes IEEE-754 floating point arithmetic.
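The benefit of fsum() shows up when rounding errors accumulate. This sketch uses an explicit left-to-right loop for the naive sum. Recent Python versions compensate errors inside the built-in sum(), so sum() would not show the problem.

```python
from math import fsum

values = [0.1] * 10

total = 0.0
for v in values:
    total += v            # plain float addition, error accumulates
print(total)              # 0.9999999999999999

accurate = fsum(values)   # tracks exact partial sums
print(accurate)           # 1.0
```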
geometric_mean
geometric_mean(data)
Convert data to floats and compute the geometric mean.
Raises a StatisticsError if the input dataset is empty,
if it contains a zero, or if it contains a negative value.
No special efforts are made to achieve exact results.
(However, this may change in the future.)
>>> round(geometric_mean([54, 24, 36]), 9)
36.0
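One standard way to define the geometric mean is as the exponential of the arithmetic mean of the logarithms, which is why all values must be positive. This sketch checks that formulation against geometric_mean(). It is a restatement of the definition, not a claim about the module's internals.

```python
from math import exp, fsum, isclose, log
from statistics import geometric_mean

data = [54, 24, 36]
gm = exp(fsum(log(x) for x in data) / len(data))   # exp(mean of logs)
print(round(gm, 9))                                # 36.0
print(isclose(gm, geometric_mean(data)))           # True
```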
harmonic_mean
harmonic_mean(data, weights=None)
Return the harmonic mean of data.
The harmonic mean is the reciprocal of the arithmetic mean of the
reciprocals of the data. It can be used for averaging ratios or
rates, for example speeds.
Suppose a car travels 40 km/hr for 5 km and then speeds up to
60 km/hr for another 5 km. What is the average speed?
>>> harmonic_mean([40, 60])
48.0
Suppose a car travels 40 km/hr for 5 km, and when traffic clears,
speeds up to 60 km/hr for the remaining 30 km of the journey. What
is the average speed?
>>> harmonic_mean([40, 60], weights=[5, 30])
56.0
If ``data`` is empty, or any element is less than zero,
``harmonic_mean`` will raise ``StatisticsError``.
hypot
hypot(...)
hypot(*coordinates) -> value
Multidimensional Euclidean distance from the origin to a point.
Roughly equivalent to:
sqrt(sum(x**2 for x in coordinates))
For a two-dimensional point (x, y), gives the hypotenuse
using the Pythagorean theorem: sqrt(x*x + y*y).
For example, the hypotenuse of a 3/4/5 right triangle is:
>>> hypot(3.0, 4.0)
5.0
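Because hypot() accepts any number of coordinates, it also gives distances in three or more dimensions. A short sketch:

```python
from math import hypot, isclose, sqrt

d2 = hypot(3.0, 4.0)          # 5.0
d3 = hypot(1.0, 2.0, 2.0)     # 3.0: distance from the origin to (1, 2, 2)
print(d2, d3)
print(isclose(d3, sqrt(1.0**2 + 2.0**2 + 2.0**2)))   # True
```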
linear_regression
linear_regression(x, y, /)
Slope and intercept for simple linear regression.
Return the slope and intercept of simple linear regression
parameters estimated using ordinary least squares. Simple linear
regression describes the relationship between an independent variable
*x* and a dependent variable *y* in terms of a linear function:
y = slope * x + intercept + noise
where *slope* and *intercept* are the regression parameters that are
estimated, and noise represents the variability of the data that was
not explained by the linear regression (it is equal to the
difference between predicted and actual values of the dependent
variable).
The parameters are returned as a named tuple.
>>> x = [1, 2, 3, 4, 5]
>>> noise = NormalDist().samples(5, seed=42)
>>> y = [3 * x[i] + 2 + noise[i] for i in range(5)]
>>> linear_regression(x, y) #doctest: +ELLIPSIS
LinearRegression(slope=3.09078914170..., intercept=1.75684970486...)
log
log(...)
log(x, [base=math.e])
Return the logarithm of x to the given base.
If the base is not specified, returns the natural logarithm (base e) of x.
mean
mean(data)
Return the sample arithmetic mean of data.
>>> mean([1, 2, 3, 4, 4])
2.8
>>> from fractions import Fraction as F
>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
Fraction(13, 21)
>>> from decimal import Decimal as D
>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
Decimal('0.5625')
If ``data`` is empty, StatisticsError will be raised.
median
median(data)
Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point.
When the number of data points is even, the median is interpolated by
taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0
median_grouped
median_grouped(data, interval=1)
Return the 50th percentile (median) of grouped continuous data.
>>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
3.7
>>> median_grouped([52, 52, 53, 54])
52.5
This calculates the median as the 50th percentile, and should be
used when your data is continuous and grouped. In the above example,
the values 1, 2, 3, etc. actually represent the midpoint of classes
0.5-1.5, 1.5-2.5, 2.5-3.5, etc. The middle value falls somewhere in
class 3.5-4.5, and interpolation is used to estimate it.
Optional argument ``interval`` represents the class interval, and
defaults to 1. Changing the class interval naturally changes the
interpolated 50th percentile value:
>>> median_grouped([1, 3, 3, 5, 7], interval=1)
3.25
>>> median_grouped([1, 3, 3, 5, 7], interval=2)
3.5
This function does not check whether the data points are at least
``interval`` apart.
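The interpolation uses the usual grouped-median formula. Let x be the value at index n // 2 of the sorted data, L = x - interval/2 the lower boundary of its class, cf the number of data points smaller than x, f the number equal to x, and n the total. The estimate is then L + interval * (n/2 - cf) / f. The sketch below implements that formula and reproduces the examples above. `median_grouped_sketch` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

def median_grouped_sketch(data, interval=1):
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    x = values[n // 2]                  # midpoint of the median class
    i = bisect_left(values, x)          # cf: points below the median class
    j = bisect_right(values, x, lo=i)   # values[i:j] are all equal to x
    lower = x - interval / 2            # lower class boundary L
    f = j - i                           # frequency of the median class
    return lower + interval * (n / 2 - i) / f

print(median_grouped_sketch([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))  # 3.7
print(median_grouped_sketch([1, 3, 3, 5, 7], interval=2))      # 3.5
```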
median_high
median_high(data)
Return the high median of data.
When the number of data points is odd, the middle value is returned.
When it is even, the larger of the two middle values is returned.
>>> median_high([1, 3, 5])
3
>>> median_high([1, 3, 5, 7])
5
median_low
median_low(data)
Return the low median of numeric data.
When the number of data points is odd, the middle value is returned.
When it is even, the smaller of the two middle values is returned.
>>> median_low([1, 3, 5])
3
>>> median_low([1, 3, 5, 7])
3
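Unlike `median`, neither `median_low` nor `median_high` averages anything: both return an actual data point, chosen by its index in the sorted data. The sketch below shows that index choice. The `*_sketch` names are hypothetical. Because the sketch only sorts and indexes, it also works on any orderable values.

```python
def median_low_sketch(data):
    values = sorted(data)
    if not values:
        raise ValueError("no median for empty data")
    # Odd n: (n - 1) // 2 == n // 2, the middle; even n: the lower of the two middles
    return values[(len(values) - 1) // 2]

def median_high_sketch(data):
    values = sorted(data)
    if not values:
        raise ValueError("no median for empty data")
    # Odd n: the middle; even n: the upper of the two middles
    return values[len(values) // 2]

print(median_low_sketch([1, 3, 5, 7]), median_high_sketch([1, 3, 5, 7]))  # 3 5
```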
mode
mode(data)
Return the most common data point from discrete or nominal data.
``mode`` assumes discrete data, and returns a single value. This is the
standard treatment of the mode as commonly taught in schools:
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
This also works with nominal (non-numeric) data:
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
If there are multiple modes with the same frequency, the first one
encountered is returned:
>>> mode(['red', 'red', 'green', 'blue', 'blue'])
'red'
If *data* is empty, ``mode`` raises StatisticsError.
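The first-encountered tie-breaking matches how `collections.Counter.most_common` orders elements with equal counts. The sketch below is built on it. `mode_sketch` is a hypothetical name, and the sketch is an illustration only, not necessarily how the library itself is written.

```python
from collections import Counter
from statistics import StatisticsError

def mode_sketch(data):
    counts = Counter(data)   # value -> frequency, kept in first-seen order
    if not counts:
        raise StatisticsError("no mode for empty data")
    # most_common orders equal counts by first appearance
    value, _count = counts.most_common(1)[0]
    return value

print(mode_sketch(['red', 'red', 'green', 'blue', 'blue']))  # red
print(mode_sketch([1, 1, 2, 3, 3, 3, 3, 4]))                 # 3
```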
multimode
multimode(data)
Return a list of the most frequently occurring values.
Will return more than one result if there are multiple modes
or an empty list if *data* is empty.
>>> multimode('aabbbbbbbbcc')
['b']
>>> multimode('aabbbbccddddeeffffgg')
['b', 'd', 'f']
>>> multimode('')
[]
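`multimode` can be sketched by counting every value and keeping all values that reach the highest count, in first-seen order. `multimode_sketch` is a hypothetical name.

```python
from collections import Counter

def multimode_sketch(data):
    counts = Counter(data)            # preserves first-seen order of values
    if not counts:
        return []                     # empty input: no modes, and no error
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(multimode_sketch('aabbbbccddddeeffffgg'))  # ['b', 'd', 'f']
print(multimode_sketch(''))                      # []
```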
namedtuple
namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)
Returns a new subclass of tuple with named fields.
>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point.__doc__ # docstring for the new class
'Point(x, y)'
>>> p = Point(11, y=22) # instantiate with positional args or keywords
>>> p[0] + p[1] # indexable like a plain tuple
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> d = p._asdict() # convert to a dictionary
>>> d['x']
11
>>> Point(**d) # convert from a dictionary
Point(x=11, y=22)
>>> p._replace(x=100) # _replace() is like str.replace() but targets named fields
Point(x=100, y=22)
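This `namedtuple` is `collections.namedtuple`, which the module imports; the `LinearRegression` result above is one such tuple. The examples do not cover the keyword-only `defaults` and `rename` parameters from the signature. The illustration below uses made-up type names. Per the collections documentation, defaults apply to the rightmost fields, and `rename=True` replaces invalid or duplicate field names with positional names.

```python
from collections import namedtuple

# defaults=[0] supplies a default for the rightmost field only (z)
Point3D = namedtuple('Point3D', ['x', 'y', 'z'], defaults=[0])
p = Point3D(1, 2)
print(p)                        # Point3D(x=1, y=2, z=0)
print(Point3D._field_defaults)  # {'z': 0}

# 'class' is a keyword and 'id' is repeated; rename=True substitutes _1 and _2
Row = namedtuple('Row', ['id', 'class', 'id'], rename=True)
print(Row._fields)              # ('id', '_1', '_2')
```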
pstdev
pstdev(data, mu=None)
Return the square root of the population variance.
See ``pvariance`` for arguments and other details.
>>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
0.986893273527251
pvariance
pvariance(data, mu=None)
Return the population variance of ``data``.
data should be a sequence or iterable of Real-valued numbers, with at least one
value. The optional argument mu, if given, should be the mean of
the data. If it is missing or None, the mean is automatically calculated.
Use this function to calculate the variance from the entire population.
To estimate the variance from a sample, the ``variance`` function is
usually a better choice.
Examples:
>>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
>>> pvariance(data)
1.25
If you have already calculated the mean of the data, you can pass it as
the optional second argument to avoid recalculating it:
>>> mu = mean(data)
>>> pvariance(data, mu)
1.25
Decimals and Fractions are supported:
>>> from decimal import Decimal as D
>>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
Decimal('24.815')
>>> from fractions import Fraction as F
>>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
Fraction(13, 72)
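Population variance divides the sum of squared deviations from the mean by n, and `pstdev` is its square root. The sketch below implements that definition. It converts every input to `Fraction` so the arithmetic is exact; that conversion is a choice made here for clarity. `pvariance_sketch` is a hypothetical name.

```python
import math
from fractions import Fraction

def pvariance_sketch(data, mu=None):
    values = [Fraction(v) for v in data]   # exact rational copies of the inputs
    n = len(values)
    if n < 1:
        raise ValueError("pvariance requires at least one data point")
    mu = sum(values) / n if mu is None else Fraction(mu)
    # Sum of squared deviations divided by n (not n - 1)
    return sum((v - mu) ** 2 for v in values) / n

data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
print(pvariance_sketch(data))  # 5/4
# Population standard deviation is the square root of the population variance
print(math.sqrt(float(pvariance_sketch([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]))))
```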
quantiles
quantiles(data, *, n=4, method='exclusive')
Divide *data* into *n* continuous intervals with equal probability.
Returns a list of (n - 1) cut points separating the intervals.
Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles.
Set *n* to 100 for percentiles, which gives the 99 cut points that
separate *data* into 100 equal-sized groups.
The *data* can be any iterable containing sample data.
The cut points are linearly interpolated between data points.
If *method* is set to *inclusive*, *data* is treated as population
data. The minimum value is treated as the 0th percentile and the
maximum value is treated as the 100th percentile.
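The interpolation can be sketched as follows. For a sorted sample of length m, the default exclusive method places the i-th cut at 1-based position i*(m+1)/n, clamped to the data. The inclusive method places it at 0-based position i*(m-1)/n, so the minimum and maximum act as the 0th and 100th percentiles. The index arithmetic follows CPython's pure-Python implementation as far as we understand it, so treat this as an illustration. `quantiles_sketch` is a hypothetical name.

```python
import math
import statistics

def quantiles_sketch(data, n=4, method='exclusive'):
    values = sorted(data)
    ld = len(values)
    if n < 1:
        raise ValueError("n must be at least 1")
    if ld < 2:
        raise ValueError("need at least two data points")
    cuts = []
    if method == 'inclusive':
        m = ld - 1                     # 0-based position of cut i is i*m/n
        for i in range(1, n):
            j, delta = divmod(i * m, n)
            cuts.append((values[j] * (n - delta) + values[j + 1] * delta) / n)
    elif method == 'exclusive':
        m = ld + 1                     # 1-based position of cut i is i*m/n
        for i in range(1, n):
            j = min(max(i * m // n, 1), ld - 1)   # clamp to the data
            delta = i * m - j * n
            cuts.append((values[j - 1] * (n - delta) + values[j] * delta) / n)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return cuts

print(quantiles_sketch(range(1, 11)))                      # [2.75, 5.5, 8.25]
print(quantiles_sketch(range(1, 11), method='inclusive'))  # [3.25, 5.5, 7.75]

# True if the sketch agrees with the library on this sample
sample = [105, 129, 87, 86, 111, 111, 89, 81, 108, 92]
print(all(math.isclose(a, b) for a, b in
          zip(quantiles_sketch(sample, n=10), statistics.quantiles(sample, n=10))))
```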
sqrt
sqrt(x, /)
Return the square root of x.
stdev
stdev(data, xbar=None)
Return the square root of the sample variance.
See ``variance`` for arguments and other details.
>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
1.0810874155219827
variance
variance(data, xbar=None)
Return the sample variance of data.
data should be an iterable of Real-valued numbers, with at least two
values. The optional argument xbar, if given, should be the mean of
the data. If it is missing or None, the mean is automatically calculated.
Use this function when your data is a sample from a population. To
calculate the variance from the entire population, see ``pvariance``.
Examples:
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> variance(data)
1.3720238095238095
If you have already calculated the mean of your data, you can pass it as
the optional second argument ``xbar`` to avoid recalculating it:
>>> m = mean(data)
>>> variance(data, m)
1.3720238095238095
This function does not check that ``xbar`` is actually the mean of
``data``. Giving arbitrary values for ``xbar`` may lead to invalid or
impossible results.
Decimals and Fractions are supported:
>>> from decimal import Decimal as D
>>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
Decimal('31.01875')
>>> from fractions import Fraction as F
>>> variance([F(1, 6), F(1, 2), F(5, 3)])
Fraction(67, 108)
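Sample variance uses the same sum of squared deviations as `pvariance` but divides by n - 1 (Bessel's correction), and `stdev` is its square root. The sketch below also uses exact `Fraction` arithmetic; `variance_sketch` is a hypothetical name. It also shows why `xbar` should really be the mean. The sum of squares is smallest around the mean, so in this formula any other `xbar` overstates the variance.

```python
from fractions import Fraction

def variance_sketch(data, xbar=None):
    values = [Fraction(v) for v in data]   # exact rational copies of the inputs
    n = len(values)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    xbar = sum(values) / n if xbar is None else Fraction(xbar)
    # Bessel's correction: divide by n - 1 for a sample
    return sum((v - xbar) ** 2 for v in values) / (n - 1)

print(variance_sketch([Fraction(1, 6), Fraction(1, 2), Fraction(5, 3)]))  # 67/108

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
print(float(variance_sketch(data)))                           # approximately 1.3720238
print(variance_sketch(data, xbar=0) > variance_sketch(data))  # True: a wrong xbar inflates it

# Sample standard deviation is the square root of the sample variance
print(float(variance_sketch([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])) ** 0.5)
```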
Other members
tau = 6.283185307179586
Modules
math
numbers
random