statistics


Basic statistics module.

This module provides functions for calculating statistics of data, including
averages, variance, and standard deviation.

Calculating averages
--------------------

==================  ==================================================
Function            Description
==================  ==================================================
mean                Arithmetic mean (average) of data.
fmean               Fast, floating point arithmetic mean.
geometric_mean      Geometric mean of data.
harmonic_mean       Harmonic mean of data.
median              Median (middle value) of data.
median_low          Low median of data.
median_high         High median of data.
median_grouped      Median, or 50th percentile, of grouped data.
mode                Mode (most common value) of data.
multimode           List of modes (most common values of data).
quantiles           Divide data into intervals with equal probability.
==================  ==================================================

Calculate the arithmetic mean ("the average") of data:

>>> mean([-1.0, 2.5, 3.25, 5.75])
2.625


Calculate the standard median of discrete data:

>>> median([2, 3, 4, 5])
3.5


Calculate the median, or 50th percentile, of data grouped into class intervals
centred on the data values provided. E.g. if your data points are rounded to
the nearest whole number:

>>> median_grouped([2, 2, 3, 3, 3, 4])  #doctest: +ELLIPSIS
2.8333333333...

This should be interpreted in this way: you have two data points in the class
interval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in
the class interval 3.5-4.5. The median of these data points is 2.8333...
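
The interpolation behind that result can be sketched by hand. This is an
illustrative reconstruction of the usual grouped-median formula, not the
module's actual implementation; the names lower, below and in_class are
made up for the example:

```python
from statistics import median_grouped

data = [2, 2, 3, 3, 3, 4]
interval = 1                      # class width (the default for median_grouped)

# The median class is the one centred on 3, i.e. the interval 2.5-3.5.
lower = 3 - interval / 2          # lower limit of the median class
n = len(data)                     # total number of data points
below = data.count(2)             # points in the classes below the median class
in_class = data.count(3)          # points inside the median class

# Interpolate linearly inside the median class.
by_hand = lower + interval * (n / 2 - below) / in_class

print(by_hand)
print(median_grouped(data))
```

Both printed values should agree to within floating point rounding.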


Calculating variability or spread
---------------------------------

==================  =============================================
Function            Description
==================  =============================================
pvariance           Population variance of data.
variance            Sample variance of data.
pstdev              Population standard deviation of data.
stdev               Sample standard deviation of data.
==================  =============================================

Calculate the standard deviation of sample data:

>>> stdev([2.5, 3.25, 5.5, 11.25, 11.75])  #doctest: +ELLIPSIS
4.38961843444...

If you have previously calculated the mean, you can pass it as the optional
second argument to the four "spread" functions to avoid recalculating it:

>>> data = [1, 2, 2, 4, 4, 4, 5, 6]
>>> mu = mean(data)
>>> pvariance(data, mu)
2.5
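
A short sketch of the same idea applied to all four "spread" functions. The
variable name centre is only illustrative; the population functions treat the
second argument as the population mean and the sample functions treat it as
the sample mean:

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [1, 2, 2, 4, 4, 4, 5, 6]
centre = mean(data)               # computed once and reused below

# Population functions take the population mean as the second argument ...
pop_var = pvariance(data, centre)
pop_sd = pstdev(data, centre)

# ... and sample functions take the sample mean.
samp_var = variance(data, centre)
samp_sd = stdev(data, centre)

print(pop_var, samp_var)
```

Passing a value that is not the actual mean of the data gives meaningless
results, so this shortcut is only safe when the mean was computed from the
same data.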


Statistics for relations between two inputs
-------------------------------------------

==================  ====================================================
Function            Description
==================  ====================================================
covariance          Sample covariance for two variables.
correlation         Pearson's correlation coefficient for two variables.
linear_regression   Slope and intercept for simple linear regression.
==================  ====================================================

Calculate covariance, Pearson's correlation, and simple linear regression
for two inputs:

>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> covariance(x, y)
0.75
>>> correlation(x, y)  #doctest: +ELLIPSIS
0.31622776601...
>>> linear_regression(x, y)
LinearRegression(slope=0.1, intercept=1.5)


Exceptions
----------

A single exception is defined: StatisticsError is a subclass of ValueError.
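
Because StatisticsError derives from ValueError, it can be caught under either
name. A minimal sketch, where safe_mean is a hypothetical helper written for
this example and not part of the module:

```python
from statistics import StatisticsError, mean


def safe_mean(data, default=None):
    """Return mean(data), or *default* when data is empty (hypothetical helper)."""
    try:
        return mean(data)
    except StatisticsError:       # "except ValueError" would catch it too
        return default


print(safe_mean([1, 2, 3]))
print(safe_mean([]))
```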

Classes

Counter

Dict subclass for counting hashable items.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.

    >>> c = Counter('abcdeabcdabcaba')  # count elements from a string

    >>> c.most_common(3)                # three most common elements
    [('a', 5), ('b', 4), ('c', 3)]
    >>> sorted(c)                       # list all unique elements
    ['a', 'b', 'c', 'd', 'e']
    >>> ''.join(sorted(c.elements()))   # list elements with repetitions
    'aaaaabbbbcccdde'
    >>> sum(c.values())                 # total of all counts
    15

    >>> c['a']                          # count of letter 'a'
    5
    >>> for elem in 'shazam':           # update counts from an iterable
    ...     c[elem] += 1                # by adding 1 to each element's count
    >>> c['a']                          # now there are seven 'a'
    7
    >>> del c['b']                      # remove all 'b'
    >>> c['b']                          # now there are zero 'b'
    0

    >>> d = Counter('simsalabim')       # make another counter
    >>> c.update(d)                     # add in the second counter
    >>> c['a']                          # now there are nine 'a'
    9

    >>> c.clear()                       # empty the counter
    >>> c
    Counter()

    Note:  If a count is set to zero or reduced to zero, it will remain
    in the counter until the entry is deleted or the counter is cleared:

    >>> c = Counter('aaabbc')
    >>> c['b'] -= 2                     # reduce the count of 'b' by two
    >>> c.most_common()                 # 'b' is still in, but its count is zero
    [('a', 3), ('c', 1), ('b', 0)]
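
    One way to drop such zero (and negative) entries without deleting them one
    by one is the unary plus operator, which returns a new counter holding
    only the positive counts. A small sketch, importing Counter from
    collections, where it is defined:

```python
from collections import Counter

c = Counter('aaabbc')
c['b'] -= 2                       # 'b' stays in the counter with a count of 0

positive = +c                     # new Counter with only counts greater than zero

print(sorted(c.items()))          # [('a', 3), ('b', 0), ('c', 1)]
print(sorted(positive.items()))   # [('a', 3), ('c', 1)]
```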

clear(...)

  D.clear() -> None.  Remove all items from D.

copy(self)

  Return a shallow copy.

elements(self)

  Iterator over elements repeating each as many times as its count.

          >>> c = Counter('ABCABC')
          >>> sorted(c.elements())
          ['A', 'A', 'B', 'B', 'C', 'C']

          # Knuth's example for prime factors of 1836:  2**2 * 3**3 * 17**1
          >>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
          >>> product = 1
          >>> for factor in prime_factors.elements():     # loop over factors
          ...     product *= factor                       # and multiply them
          >>> product
          1836

          Note, if an element's count has been set to zero or is a negative
          number, elements() will ignore it.

fromkeys(iterable, v=None)

get(self, key, default=None, /)

  Return the value for key if key is in the dictionary, else default.

items(...)

  D.items() -> a set-like object providing a view on D's items

keys(...)

  D.keys() -> a set-like object providing a view on D's keys

most_common(self, n=None)

  List the n most common elements and their counts from the most
          common to the least.  If n is None, then list all element counts.

          >>> Counter('abracadabra').most_common(3)
          [('a', 5), ('b', 2), ('r', 2)]

pop(...)

  D.pop(k[,d]) -> v, remove specified key and return the corresponding value.

  If the key is not found, return the default if given; otherwise,
  raise a KeyError.

popitem(self, /)

  Remove and return a (key, value) pair as a 2-tuple.

  Pairs are returned in LIFO (last-in, first-out) order.
  Raises KeyError if the dict is empty.

setdefault(self, key, default=None, /)

  Insert key with a value of default if key is not in the dictionary.

  Return the value for key if key is in the dictionary, else default.

subtract(self, iterable=None, /, **kwds)

  Like dict.update() but subtracts counts instead of replacing them.
          Counts can be reduced below zero.  Both the inputs and outputs are
          allowed to contain zero and negative counts.

          Source can be an iterable, a dictionary, or another Counter instance.

          >>> c = Counter('which')
          >>> c.subtract('witch')             # subtract elements from another iterable
          >>> c.subtract(Counter('watch'))    # subtract elements from another counter
          >>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
          0
          >>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
          -1

total(self)

  Sum of the counts

update(self, iterable=None, /, **kwds)

  Like dict.update() but adds counts instead of replacing them.

          Source can be an iterable, a dictionary, or another Counter instance.

          >>> c = Counter('which')
          >>> c.update('witch')           # add elements from another iterable
          >>> d = Counter('watch')
          >>> c.update(d)                 # add elements from another counter
          >>> c['h']                      # four 'h' in which, witch, and watch
          4

values(...)

  D.values() -> an object providing a view on D's values

Decimal

Construct a new Decimal object. 'value' can be an integer, string, tuple,
or another Decimal object. If no value is given, return Decimal('0'). The
context does not affect the conversion and is only passed to determine if
the InvalidOperation trap is active.

adjusted(self, /)

  Return the adjusted exponent of the number.  Defined as exp + digits - 1.

as_integer_ratio(self, /)

  Decimal.as_integer_ratio() -> (int, int)

  Return a pair of integers, whose ratio is exactly equal to the original
  Decimal and with a positive denominator. The ratio is in lowest terms.
  Raise OverflowError on infinities and a ValueError on NaNs.
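
  A brief sketch of both the normal case and the two error cases described
  above; the flag names are illustrative only:

```python
from decimal import Decimal

# Exact ratio in lowest terms, with a positive denominator.
ratio = Decimal('-3.14').as_integer_ratio()
print(ratio)                      # (-157, 50)

# Infinities have no integer ratio, so OverflowError is raised.
try:
    Decimal('Infinity').as_integer_ratio()
except OverflowError:
    infinity_rejected = True
else:
    infinity_rejected = False

# NaNs raise ValueError instead.
try:
    Decimal('NaN').as_integer_ratio()
except ValueError:
    nan_rejected = True
else:
    nan_rejected = False

print(infinity_rejected, nan_rejected)
```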

as_tuple(self, /)

  Return a tuple representation of the number.

canonical(self, /)

  Return the canonical encoding of the argument.  Currently, the encoding
  of a Decimal instance is always canonical, so this operation returns its
  argument unchanged.

compare(self, /, other, context=None)

  Compare self to other.  Return a decimal value:

      a or b is a NaN ==> Decimal('NaN')
      a < b           ==> Decimal('-1')
      a == b          ==> Decimal('0')
      a > b           ==> Decimal('1')

compare_signal(self, /, other, context=None)

  Identical to compare, except that all NaNs signal.

compare_total(self, /, other, context=None)

  Compare two operands using their abstract representation rather than
  their numerical value.  Similar to the compare() method, but the result
  gives a total ordering on Decimal instances.  Two Decimal instances with
  the same numeric value but different representations compare unequal
  in this ordering:

      >>> Decimal('12.0').compare_total(Decimal('12'))
      Decimal('-1')

  Quiet and signaling NaNs are also included in the total ordering. The result
  of this function is Decimal('0') if both operands have the same representation,
  Decimal('-1') if the first operand is lower in the total order than the second,
  and Decimal('1') if the first operand is higher in the total order than the
  second operand. See the specification for details of the total order.

  This operation is unaffected by context and is quiet: no flags are changed
  and no rounding is performed. As an exception, the C version may raise
  InvalidOperation if the second operand cannot be converted exactly.

compare_total_mag(self, /, other, context=None)

  Compare two operands using their abstract representation rather than their
  value as in compare_total(), but ignoring the sign of each operand.

  x.compare_total_mag(y) is equivalent to x.copy_abs().compare_total(y.copy_abs()).

  This operation is unaffected by context and is quiet: no flags are changed
  and no rounding is performed. As an exception, the C version may raise
  InvalidOperation if the second operand cannot be converted exactly.

conjugate(self, /)

  Return self.

copy_abs(self, /)

  Return the absolute value of the argument.  This operation is unaffected by
  context and is quiet: no flags are changed and no rounding is performed.

copy_negate(self, /)

  Return the negation of the argument.  This operation is unaffected by context
  and is quiet: no flags are changed and no rounding is performed.

copy_sign(self, /, other, context=None)

  Return a copy of the first operand with the sign set to be the same as the
  sign of the second operand. For example:

      >>> Decimal('2.3').copy_sign(Decimal('-1.5'))
      Decimal('-2.3')

  This operation is unaffected by context and is quiet: no flags are changed
  and no rounding is performed. As an exception, the C version may raise
  InvalidOperation if the second operand cannot be converted exactly.

exp(self, /, context=None)

  Return the value of the (natural) exponential function e**x at the given
  number.  The function always uses the ROUND_HALF_EVEN mode and the result
  is correctly rounded.

fma(self, /, other, third, context=None)

  Fused multiply-add.  Return self*other+third with no rounding of the
  intermediate product self*other.

      >>> Decimal(2).fma(3, 5)
      Decimal('11')

from_float(f, /)

  Class method that converts a float to a decimal number, exactly.
  Since 0.1 is not exactly representable in binary floating point,
  Decimal.from_float(0.1) is not the same as Decimal('0.1').

      >>> Decimal.from_float(0.1)
      Decimal('0.1000000000000000055511151231257827021181583404541015625')
      >>> Decimal.from_float(float('nan'))
      Decimal('NaN')
      >>> Decimal.from_float(float('inf'))
      Decimal('Infinity')
      >>> Decimal.from_float(float('-inf'))
      Decimal('-Infinity')

is_canonical(self, /)

  Return True if the argument is canonical and False otherwise.  Currently,
  a Decimal instance is always canonical, so this operation always returns
  True.

is_finite(self, /)

  Return True if the argument is a finite number, and False if the argument
  is infinite or a NaN.

is_infinite(self, /)

  Return True if the argument is either positive or negative infinity and
  False otherwise.

is_nan(self, /)

  Return True if the argument is a (quiet or signaling) NaN and False
  otherwise.

is_normal(self, /, context=None)

  Return True if the argument is a normal finite non-zero number with an
  adjusted exponent greater than or equal to Emin. Return False if the
  argument is zero, subnormal, infinite or a NaN.

is_qnan(self, /)

  Return True if the argument is a quiet NaN, and False otherwise.

is_signed(self, /)

  Return True if the argument has a negative sign and False otherwise.
  Note that both zeros and NaNs can carry signs.

is_snan(self, /)

  Return True if the argument is a signaling NaN and False otherwise.

is_subnormal(self, /, context=None)

  Return True if the argument is subnormal, and False otherwise. A number is
  subnormal if it is non-zero, finite, and has an adjusted exponent less
  than Emin.

is_zero(self, /)

  Return True if the argument is a (positive or negative) zero and False
  otherwise.

ln(self, /, context=None)

  Return the natural (base e) logarithm of the operand. The function always
  uses the ROUND_HALF_EVEN mode and the result is correctly rounded.

log10(self, /, context=None)

  Return the base ten logarithm of the operand. The function always uses the
  ROUND_HALF_EVEN mode and the result is correctly rounded.

logb(self, /, context=None)

  For a non-zero number, return the adjusted exponent of the operand as a
  Decimal instance.  If the operand is a zero, then Decimal('-Infinity') is
  returned and the DivisionByZero condition is raised. If the operand is
  an infinity then Decimal('Infinity') is returned.

logical_and(self, /, other, context=None)

  Return the digit-wise 'and' of the two (logical) operands.

logical_invert(self, /, context=None)

  Return the digit-wise inversion of the (logical) operand.

logical_or(self, /, other, context=None)

  Return the digit-wise 'or' of the two (logical) operands.

logical_xor(self, /, other, context=None)

  Return the digit-wise 'exclusive or' of the two (logical) operands.

max(self, /, other, context=None)

  Maximum of self and other.  If one operand is a quiet NaN and the other is
  numeric, the numeric operand is returned.

max_mag(self, /, other, context=None)

  Similar to the max() method, but the comparison is done using the absolute
  values of the operands.

min(self, /, other, context=None)

  Minimum of self and other. If one operand is a quiet NaN and the other is
  numeric, the numeric operand is returned.

min_mag(self, /, other, context=None)

  Similar to the min() method, but the comparison is done using the absolute
  values of the operands.

next_minus(self, /, context=None)

  Return the largest number representable in the given context (or in the
  current default context if no context is given) that is smaller than the
  given operand.

next_plus(self, /, context=None)

  Return the smallest number representable in the given context (or in the
  current default context if no context is given) that is larger than the
  given operand.

next_toward(self, /, other, context=None)

  If the two operands are unequal, return the number closest to the first
  operand in the direction of the second operand.  If both operands are
  numerically equal, return a copy of the first operand with the sign set
  to be the same as the sign of the second operand.

normalize(self, /, context=None)

  Normalize the number by stripping the rightmost trailing zeros and
  converting any result equal to Decimal('0') to Decimal('0e0').  Used
  for producing canonical values for members of an equivalence class.
  For example, Decimal('32.100') and Decimal('0.321000e+2') both normalize
  to the equivalent value Decimal('32.1').
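
  The example values above can be checked directly. The sketch below also
  shows a side effect that is easy to miss: stripping trailing zeros from a
  whole number moves them into the exponent.

```python
from decimal import Decimal

a = Decimal('32.100').normalize()
b = Decimal('0.321000e+2').normalize()
zero = Decimal('0.00').normalize()
big = Decimal('1200').normalize()

print(a, b)                       # 32.1 32.1
print(zero)                       # 0
print(big)                        # 1.2E+3 (trailing zeros become an exponent)
```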

number_class(self, /, context=None)

  Return a string describing the class of the operand.  The returned value
  is one of the following ten strings:

      * '-Infinity', indicating that the operand is negative infinity.
      * '-Normal', indicating that the operand is a negative normal number.
      * '-Subnormal', indicating that the operand is negative and subnormal.
      * '-Zero', indicating that the operand is a negative zero.
      * '+Zero', indicating that the operand is a positive zero.
      * '+Subnormal', indicating that the operand is positive and subnormal.
      * '+Normal', indicating that the operand is a positive normal number.
      * '+Infinity', indicating that the operand is positive infinity.
      * 'NaN', indicating that the operand is a quiet NaN (Not a Number).
      * 'sNaN', indicating that the operand is a signaling NaN.

quantize(self, /, exp, rounding=None, context=None)

  Return a value equal to the first operand after rounding and having the
  exponent of the second operand.

      >>> Decimal('1.41421356').quantize(Decimal('1.000'))
      Decimal('1.414')

  Unlike other operations, if the length of the coefficient after the quantize
  operation would be greater than precision, then an InvalidOperation is signaled.
  This guarantees that, unless there is an error condition, the quantized exponent
  is always equal to that of the right-hand operand.

  Also unlike other operations, quantize never signals Underflow, even if the
  result is subnormal and inexact.

  If the exponent of the second operand is larger than that of the first, then
  rounding may be necessary. In this case, the rounding mode is determined by the
  rounding argument if given, else by the given context argument; if neither
  argument is given, the rounding mode of the current thread's context is used.
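
  A small sketch of how the rounding argument changes the result when digits
  have to be discarded. The value 2.665 is chosen only because it sits exactly
  halfway between two candidates:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

value = Decimal('2.665')
cent = Decimal('0.01')            # target exponent: two decimal places

half_even = value.quantize(cent, rounding=ROUND_HALF_EVEN)
half_up = value.quantize(cent, rounding=ROUND_HALF_UP)
down = value.quantize(cent, rounding=ROUND_DOWN)

print(half_even)                  # 2.66 (tie goes to the even digit)
print(half_up)                    # 2.67 (tie goes away from zero)
print(down)                       # 2.66 (truncation)
```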

radix(self, /)

  Return Decimal(10), the radix (base) in which the Decimal class does
  all its arithmetic. Included for compatibility with the specification.

remainder_near(self, /, other, context=None)

  Return the remainder from dividing self by other.  This differs from
  self % other in that the sign of the remainder is chosen so as to minimize
  its absolute value. More precisely, the return value is self - n * other
  where n is the integer nearest to the exact value of self / other, and
  if two integers are equally near then the even one is chosen.

  If the result is zero then its sign will be the sign of self.
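
  The difference from the % operator is easiest to see side by side. A short
  sketch with small integer operands:

```python
from decimal import Decimal

ten = Decimal(10)

plain = Decimal(18) % ten                    # ordinary remainder
near = Decimal(18).remainder_near(ten)       # 18 - 2 * 10, since 1.8 rounds to 2
tie_even = Decimal(25).remainder_near(ten)   # 2.5 is a tie, even n = 2 is chosen
tie_odd = Decimal(35).remainder_near(ten)    # 3.5 is a tie, even n = 4 is chosen

print(plain, near, tie_even, tie_odd)        # 8 -2 5 -5
```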

rotate(self, /, other, context=None)

  Return the result of rotating the digits of the first operand by an amount
  specified by the second operand.  The second operand must be an integer in
  the range -precision through precision. The absolute value of the second
  operand gives the number of places to rotate. If the second operand is
  positive then rotation is to the left; otherwise rotation is to the right.
  The coefficient of the first operand is padded on the left with zeros to
  length precision if necessary. The sign and exponent of the first operand are
  unchanged.

same_quantum(self, /, other, context=None)

  Test whether self and other have the same exponent or whether both are NaN.

  This operation is unaffected by context and is quiet: no flags are changed
  and no rounding is performed. As an exception, the C version may raise
  InvalidOperation if the second operand cannot be converted exactly.

scaleb(self, /, other, context=None)

  Return the first operand with the exponent adjusted by the second.  Equivalently,
  return the first operand multiplied by 10**other. The second operand must be
  an integer.

shift(self, /, other, context=None)

  Return the result of shifting the digits of the first operand by an amount
  specified by the second operand.  The second operand must be an integer in
  the range -precision through precision. The absolute value of the second
  operand gives the number of places to shift. If the second operand is
  positive, then the shift is to the left; otherwise the shift is to the
  right. Digits shifted into the coefficient are zeros. The sign and exponent
  of the first operand are unchanged.

sqrt(self, /, context=None)

  Return the square root of the argument to full precision. The result is
  correctly rounded using the ROUND_HALF_EVEN rounding mode.

to_eng_string(self, /, context=None)

  Convert to an engineering-type string.  Engineering notation has an exponent
  which is a multiple of 3, so there are up to 3 digits to the left of the
  decimal point. For example, Decimal('123E+1') is converted to the string
  '1.23E+3'.

  The value of context.capitals determines whether the exponent indicator is
  lower or upper case. Otherwise, the context does not affect the operation.
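
  A quick sketch comparing the engineering string with the ordinary str()
  form. The two agree whenever the scientific exponent is already a multiple
  of 3 and differ otherwise:

```python
from decimal import Decimal

same = Decimal('123E+1')
differs = Decimal('123E+3')

print(str(same), same.to_eng_string())        # 1.23E+3 1.23E+3
print(str(differs), differs.to_eng_string())  # 1.23E+5 123E+3
```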

to_integral(self, /, rounding=None, context=None)

  Identical to the to_integral_value() method.  The to_integral() name has been
  kept for compatibility with older versions.

to_integral_exact(self, /, rounding=None, context=None)

  Round to the nearest integer, signaling Inexact or Rounded as appropriate if
  rounding occurs.  The rounding mode is determined by the rounding parameter
  if given, else by the given context. If neither parameter is given, then the
  rounding mode of the current default context is used.

to_integral_value(self, /, rounding=None, context=None)

  Round to the nearest integer without signaling Inexact or Rounded.  The
  rounding mode is determined by the rounding parameter if given, else by
  the given context. If neither parameter is given, then the rounding mode
  of the current default context is used.

imag = <attribute 'imag' of 'decimal.Decimal' objects>

real = <attribute 'real' of 'decimal.Decimal' objects>

Fraction

This class implements rational numbers.

    In the two-argument form of the constructor, Fraction(8, 6) will
    produce a rational number equivalent to 4/3. Both arguments must
    be Rational. The numerator defaults to 0 and the denominator
    defaults to 1 so that Fraction(3) == 3 and Fraction() == 0.

    Fractions can also be constructed from:

      - numeric strings similar to those accepted by the
        float constructor (for example, '-2.3' or '1e10')

      - strings of the form '123/456'

      - float and Decimal instances

      - other Rational instances (including integers)

as_integer_ratio(self)

  Return the integer ratio as a tuple.

          Return a tuple of two integers, whose ratio is equal to the
          Fraction and with a positive denominator.

conjugate(self)

  Conjugate is a no-op for Reals.

from_decimal(dec)

  Converts a finite Decimal instance to a rational number, exactly.

from_float(f)

  Converts a finite float to a rational number, exactly.

          Beware that Fraction.from_float(0.3) != Fraction(3, 10).
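
  A short sketch of that caveat and two common ways around it: building the
  fraction from a string, or limiting the denominator afterwards.

```python
from fractions import Fraction

exact = Fraction.from_float(0.3)          # the binary float's exact value
print(exact == Fraction(3, 10))           # False

from_string = Fraction('0.3')             # parsed as a decimal string
approx = exact.limit_denominator(10)      # closest fraction with denominator <= 10

print(from_string, approx)                # 3/10 3/10
```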

limit_denominator(self, max_denominator=1000000)

  Closest Fraction to self with denominator at most max_denominator.

          >>> Fraction('3.141592653589793').limit_denominator(10)
          Fraction(22, 7)
          >>> Fraction('3.141592653589793').limit_denominator(100)
          Fraction(311, 99)
          >>> Fraction(4321, 8765).limit_denominator(10000)
          Fraction(4321, 8765)

denominator = <property object>

imag = <property object>
  Real numbers have no imaginary component.

numerator = <property object>

real = <property object>
  Real numbers are their real component.

LinearRegression

LinearRegression(slope, intercept)

count(self, value, /)

  Return number of occurrences of value.

index(self, value, start=0, stop=9223372036854775807, /)

  Return first index of value.

  Raises ValueError if the value is not present.

intercept = _tuplegetter(1, 'Alias for field number 1')
  Alias for field number 1

slope = _tuplegetter(0, 'Alias for field number 0')
  Alias for field number 0

NormalDist

Normal distribution of a random variable

cdf(self, x)

  Cumulative distribution function.  P(X <= x)

from_samples(data)

  Make a normal distribution instance from sample data.

inv_cdf(self, p)

  Inverse cumulative distribution function.  x : P(X <= x) = p

          Finds the value of the random variable such that the probability of
          the variable being less than or equal to that value equals the given
          probability.

          This function is also called the percent point function or quantile
          function.
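
  The relationship between cdf() and inv_cdf() can be sketched as a round
  trip. The mean of 100 and standard deviation of 15 are illustrative values
  only:

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)   # illustrative parameters

p = scores.cdf(115)               # P(X <= 115), one standard deviation above the mean
x = scores.inv_cdf(p)             # value whose cumulative probability is p

print(round(p, 4))                # 0.8413
print(round(x, 6))                # 115.0
```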

overlap(self, other)

  Compute the overlapping coefficient (OVL) between two normal distributions.

          Measures the agreement between two normal probability distributions.
          Returns a value between 0.0 and 1.0 giving the overlapping area in
          the two underlying probability density functions.

              >>> N1 = NormalDist(2.4, 1.6)
              >>> N2 = NormalDist(3.2, 2.0)
              >>> N1.overlap(N2)
              0.8035050657330205

pdf(self, x)

  Probability density function.  P(x <= X < x+dx) / dx

quantiles(self, n=4)

  Divide into *n* continuous intervals with equal probability.

          Returns a list of (n - 1) cut points separating the intervals.

          Set *n* to 4 for quartiles (the default).  Set *n* to 10 for deciles.
          Set *n* to 100 for percentiles, which gives the 99 cut points that
          separate the normal distribution into 100 equal-sized groups.
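
  A small sketch using the standard normal distribution, showing that n
  intervals always produce n - 1 cut points:

```python
from statistics import NormalDist

standard = NormalDist()                 # mean 0, standard deviation 1

quartiles = standard.quantiles()        # n=4 by default, so 3 cut points
deciles = standard.quantiles(n=10)      # 9 cut points

print([round(q, 4) for q in quartiles]) # approximately [-0.6745, 0.0, 0.6745]
print(len(deciles))                     # 9
```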

samples(self, n, *, seed=None)

  Generate *n* samples for a given mean and standard deviation.

zscore(self, x)

  Compute the Standard Score.  (x - mean) / stdev

          Describes *x* in terms of the number of standard deviations
          above or below the mean of the normal distribution.

mean = <property object>
  Arithmetic mean of the normal distribution.

median = <property object>
  Return the median of the normal distribution.

mode = <property object>
  Return the mode of the normal distribution.

          The mode is the value x at which the probability density
          function (pdf) takes its maximum value.

stdev = <property object>
  Standard deviation of the normal distribution.

variance = <property object>
  Square of the standard deviation.

StatisticsError

with_traceback(...)

  Exception.with_traceback(tb) --
      set self.__traceback__ to tb and return self.

args = <attribute 'args' of 'BaseException' objects>

groupby

Make an iterator that returns consecutive keys and groups from the iterable.

  iterable
    Elements to divide into groups according to the key function.
  key
    A function for computing the group category for each element.
    If the key function is not specified or is None, the element itself
    is used for grouping.
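
Note that only consecutive equal keys are grouped, so unsorted input can
yield the same key more than once. A minimal sketch, importing groupby from
itertools, where it is defined; the sample data is made up:

```python
from itertools import groupby

# Without sorting, 'A' appears as two separate runs.
runs = [(key, len(list(group))) for key, group in groupby('AAABBBCCDAA')]
print(runs)       # [('A', 3), ('B', 3), ('C', 2), ('D', 1), ('A', 2)]

# Sorting by the same key first gives one group per key.
words = ['apple', 'bob', 'cat', 'avocado', 'bee']
by_letter = {
    key: list(group)
    for key, group in groupby(sorted(words), key=lambda word: word[0])
}
print(by_letter)  # {'a': ['apple', 'avocado'], 'b': ['bee', 'bob'], 'c': ['cat']}
```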

itemgetter

itemgetter(item, ...) --> itemgetter object

Return a callable object that fetches the given item(s) from its operand.
After f = itemgetter(2), the call f(r) returns r[2].
After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3])
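
A short sketch of both call forms, plus the common use as a sort key. The
tuples are invented sample data, and itemgetter is imported from operator,
where it is defined:

```python
from operator import itemgetter

row = ('alice', 31, 'nl')

age = itemgetter(1)(row)                  # single item: returns row[1]
country_name = itemgetter(2, 0)(row)      # several items: returns a tuple

pairs = [('b', 3), ('a', 1), ('c', 2)]
by_count = sorted(pairs, key=itemgetter(1))

print(age, country_name)                  # 31 ('nl', 'alice')
print(by_count)                           # [('a', 1), ('c', 2), ('b', 3)]
```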

repeat

repeat(object [,times]) -> create an iterator which returns the object
for the specified number of times.  If not specified, returns the object
endlessly.

Functions

bisect_left

bisect_left(a, x, lo=0, hi=None, *, key=None)

  Return the index where to insert item x in list a, assuming a is sorted.

  The return value i is such that all e in a[:i] have e < x, and all e in
  a[i:] have e >= x.  So if x already appears in the list, a.insert(i, x) will
  insert just before the leftmost x already there.

  Optional args lo (default 0) and hi (default len(a)) bound the
  slice of a to be searched.

bisect_right

bisect_right(a, x, lo=0, hi=None, *, key=None)

  Return the index where to insert item x in list a, assuming a is sorted.

  The return value i is such that all e in a[:i] have e <= x, and all e in
  a[i:] have e > x.  So if x already appears in the list, a.insert(i, x) will
  insert just after the rightmost x already there.

  Optional args lo (default 0) and hi (default len(a)) bound the
  slice of a to be searched.
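
  The two functions differ only when x is already present. A minimal sketch
  with a small sorted list, importing both from bisect, where they are
  defined:

```python
from bisect import bisect_left, bisect_right

a = [1, 2, 2, 2, 3]               # must already be sorted

left = bisect_left(a, 2)          # index of the leftmost 2
right = bisect_right(a, 2)        # index just past the rightmost 2
occurrences = right - left        # how many times 2 appears

print(left, right, occurrences)   # 1 4 3

# For a value that is absent, both return the same insertion point.
print(bisect_left(a, 2.5), bisect_right(a, 2.5))   # 4 4
```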

correlation

correlation(x, y, /)

  Pearson's correlation coefficient

      Return Pearson's correlation coefficient for two inputs. Pearson's
      correlation coefficient *r* takes values between -1 and +1. It measures
      the strength and direction of the linear relationship, where +1 means a
      very strong, positive linear relationship, -1 a very strong, negative
      linear relationship, and 0 no linear relationship.

      >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
      >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
      >>> correlation(x, x)
      1.0
      >>> correlation(x, y)
      -1.0

covariance

covariance(x, y, /)

  Covariance

      Return the sample covariance of two inputs *x* and *y*. Covariance
      is a measure of the joint variability of two inputs.

      >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
      >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
      >>> covariance(x, y)
      0.75
      >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
      >>> covariance(x, z)
      -7.5
      >>> covariance(z, x)
      -7.5

erf

erf(x, /)

  Error function at x.

exp

exp(x, /)

  Return e raised to the power of x.

fabs

fabs(x, /)

  Return the absolute value of the float x.

fmean

fmean(data)

  Convert data to floats and compute the arithmetic mean.

      This runs faster than the mean() function and it always returns a float.
      If the input dataset is empty, it raises a StatisticsError.

      >>> fmean([3.5, 4.0, 5.25])
      4.25

fsum

fsum(seq, /)

  Return an accurate floating point sum of values in the iterable seq.

  Assumes IEEE-754 floating point arithmetic.
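
  A classic illustration of why this matters, comparing fsum() with the
  built-in sum() on ten copies of 0.1:

```python
from math import fsum

values = [0.1] * 10

naive = sum(values)               # accumulates rounding error at each step
accurate = fsum(values)           # tracks the lost low-order bits

print(naive == 1.0)               # False
print(accurate == 1.0)            # True
```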

geometric_mean

geometric_mean(data)

  Convert data to floats and compute the geometric mean.

      Raises a StatisticsError if the input dataset is empty,
      if it contains a zero, or if it contains a negative value.

      No special efforts are made to achieve exact results.
      (However, this may change in the future.)

      >>> round(geometric_mean([54, 24, 36]), 9)
      36.0

harmonic_mean

harmonic_mean(data, weights=None)

  Return the harmonic mean of data.

      The harmonic mean is the reciprocal of the arithmetic mean of the
      reciprocals of the data.  It can be used for averaging ratios or
      rates, for example speeds.

      Suppose a car travels 40 km/hr for 5 km and then speeds up to
      60 km/hr for another 5 km. What is the average speed?

          >>> harmonic_mean([40, 60])
          48.0

      Suppose a car travels 40 km/hr for 5 km, and when traffic clears,
      speeds up to 60 km/hr for the remaining 30 km of the journey. What
      is the average speed?

          >>> harmonic_mean([40, 60], weights=[5, 30])
          56.0

      If ``data`` is empty, or any element is less than zero,
      ``harmonic_mean`` will raise ``StatisticsError``.

hypot

hypot(...)

  hypot(*coordinates) -> value

  Multidimensional Euclidean distance from the origin to a point.

  Roughly equivalent to:
      sqrt(sum(x**2 for x in coordinates))

  For a two dimensional point (x, y), gives the hypotenuse
  using the Pythagorean theorem:  sqrt(x*x + y*y).

  For example, the hypotenuse of a 3/4/5 right triangle is:

      >>> hypot(3.0, 4.0)
      5.0

linear_regression

linear_regression(x, y, /)

  Slope and intercept for simple linear regression.

      Return the slope and intercept of simple linear regression
      parameters estimated using ordinary least squares. Simple linear
      regression describes the relationship between an independent variable
      *x* and a dependent variable *y* in terms of a linear function:

          y = slope * x + intercept + noise

      where *slope* and *intercept* are the regression parameters that are
      estimated, and noise represents the variability of the data that was
      not explained by the linear regression (it is equal to the
      difference between predicted and actual values of the dependent
      variable).

      The parameters are returned as a named tuple.

      >>> x = [1, 2, 3, 4, 5]
      >>> noise = NormalDist().samples(5, seed=42)
      >>> y = [3 * x[i] + 2 + noise[i] for i in range(5)]
      >>> linear_regression(x, y)  #doctest: +ELLIPSIS
      LinearRegression(slope=3.09078914170..., intercept=1.75684970486...)

log

log(...)

  log(x, [base=math.e])
  Return the logarithm of x to the given base.

  If the base is not specified, returns the natural logarithm (base e) of x.

mean

mean(data)

  Return the sample arithmetic mean of data.

      >>> mean([1, 2, 3, 4, 4])
      2.8

      >>> from fractions import Fraction as F
      >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
      Fraction(13, 21)

      >>> from decimal import Decimal as D
      >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
      Decimal('0.5625')

      If ``data`` is empty, StatisticsError will be raised.

median

median(data)

  Return the median (middle value) of numeric data.

      When the number of data points is odd, return the middle data point.
      When the number of data points is even, the median is interpolated by
      taking the average of the two middle values:

      >>> median([1, 3, 5])
      3
      >>> median([1, 3, 5, 7])
      4.0

median_grouped

median_grouped(data, interval=1)

  Return the 50th percentile (median) of grouped continuous data.

      >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
      3.7
      >>> median_grouped([52, 52, 53, 54])
      52.5

      This calculates the median as the 50th percentile, and should be
      used when your data is continuous and grouped. In the above example,
      the values 1, 2, 3, etc. actually represent the midpoint of classes
      0.5-1.5, 1.5-2.5, 2.5-3.5, etc. The middle value falls somewhere in
      class 3.5-4.5, and interpolation is used to estimate it.

      Optional argument ``interval`` represents the class interval, and
      defaults to 1. Changing the class interval naturally will change the
      interpolated 50th percentile value:

      >>> median_grouped([1, 3, 3, 5, 7], interval=1)
      3.25
      >>> median_grouped([1, 3, 3, 5, 7], interval=2)
      3.5

      This function does not check whether the data points are at least
      ``interval`` apart.
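
      The interpolation described above can be sketched with the standard
      grouped-median formula ``L + interval * (n/2 - cf) / f``, where ``L``
      is the lower limit of the median class, ``cf`` the number of points
      below that class, and ``f`` the number of points in it. The helper
      name ``median_grouped_sketch`` is hypothetical, and the sketch assumes
      numeric input.

```python
def median_grouped_sketch(data, interval=1):
    """Grouped median by linear interpolation (illustrative only)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("no median for empty data")
    x = ordered[n // 2]                       # midpoint of the median class
    lower = x - interval / 2                  # lower limit of that class
    cf = sum(1 for v in ordered if v < x)     # points below the median class
    f = ordered.count(x)                      # points inside the median class
    return lower + interval * (n / 2 - cf) / f

print(median_grouped_sketch([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))
print(median_grouped_sketch([1, 3, 3, 5, 7], interval=2))
```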

median_high

median_high(data)

  Return the high median of data.

      When the number of data points is odd, the middle value is returned.
      When it is even, the larger of the two middle values is returned.

      >>> median_high([1, 3, 5])
      3
      >>> median_high([1, 3, 5, 7])
      5

median_low

median_low(data)

  Return the low median of numeric data.

      When the number of data points is odd, the middle value is returned.
      When it is even, the smaller of the two middle values is returned.

      >>> median_low([1, 3, 5])
      3
      >>> median_low([1, 3, 5, 7])
      3
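
      The low, interpolated, and high medians differ only when the number of
      data points is even. A minimal sketch that computes all three from the
      sorted data follows; ``three_medians_sketch`` is a hypothetical helper.

```python
from statistics import median, median_high, median_low

def three_medians_sketch(data):
    """Return (low, interpolated, high) medians (illustrative only)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: all three medians are the single middle value.
        return ordered[mid], ordered[mid], ordered[mid]
    low, high = ordered[mid - 1], ordered[mid]
    return low, (low + high) / 2, high

even = [1, 3, 5, 7]
print(three_medians_sketch(even))
print(median_low(even), median(even), median_high(even))
```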

mode

mode(data)

  Return the most common data point from discrete or nominal data.

      ``mode`` assumes discrete data, and returns a single value. This is the
      standard treatment of the mode as commonly taught in schools:

          >>> mode([1, 1, 2, 3, 3, 3, 3, 4])
          3

      This also works with nominal (non-numeric) data:

          >>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
          'red'

      If there are multiple modes with the same frequency, return the first
      one encountered:

          >>> mode(['red', 'red', 'green', 'blue', 'blue'])
          'red'

      If *data* is empty, ``mode`` raises StatisticsError.

multimode

multimode(data)

  Return a list of the most frequently occurring values.

      Will return more than one result if there are multiple modes
      or an empty list if *data* is empty.

      >>> multimode('aabbbbbbbbcc')
      ['b']
      >>> multimode('aabbbbccddddeeffffgg')
      ['b', 'd', 'f']
      >>> multimode('')
      []
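
      One way to picture this behaviour is as a frequency count followed by a
      filter on the highest count. The sketch below assumes a
      ``collections.Counter`` based approach; ``multimode_sketch`` is a
      hypothetical helper, not the module's code.

```python
from collections import Counter

def multimode_sketch(data):
    """All values tied for the highest count, in first-seen order (illustrative only)."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps insertion order, so ties come out in order of first appearance.
    return [value for value, count in counts.items() if count == top]

print(multimode_sketch('aabbbbccddddeeffffgg'))
print(multimode_sketch(['red', 'red', 'green', 'blue', 'blue']))
```

      Taking the first element of that list gives the "first one
      encountered" behaviour described for ``mode``.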

namedtuple

namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)

  Returns a new subclass of tuple with named fields.

      >>> Point = namedtuple('Point', ['x', 'y'])
      >>> Point.__doc__                   # docstring for the new class
      'Point(x, y)'
      >>> p = Point(11, y=22)             # instantiate with positional args or keywords
      >>> p[0] + p[1]                     # indexable like a plain tuple
      33
      >>> x, y = p                        # unpack like a regular tuple
      >>> x, y
      (11, 22)
      >>> p.x + p.y                       # fields also accessible by name
      33
      >>> d = p._asdict()                 # convert to a dictionary
      >>> d['x']
      11
      >>> Point(**d)                      # convert from a dictionary
      Point(x=11, y=22)
      >>> p._replace(x=100)               # _replace() is like str.replace() but targets named fields
      Point(x=100, y=22)

pstdev

pstdev(data, mu=None)

  Return the square root of the population variance.

      See ``pvariance`` for arguments and other details.

      >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
      0.986893273527251

pvariance

pvariance(data, mu=None)

  Return the population variance of ``data``.

      data should be a sequence or iterable of Real-valued numbers, with at least one
      value. The optional argument mu, if given, should be the mean of
      the data. If it is missing or None, the mean is automatically calculated.

      Use this function to calculate the variance from the entire population.
      To estimate the variance from a sample, the ``variance`` function is
      usually a better choice.

      Examples:

      >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
      >>> pvariance(data)
      1.25

      If you have already calculated the mean of the data, you can pass it as
      the optional second argument to avoid recalculating it:

      >>> mu = mean(data)
      >>> pvariance(data, mu)
      1.25

      Decimals and Fractions are supported:

      >>> from decimal import Decimal as D
      >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
      Decimal('24.815')

      >>> from fractions import Fraction as F
      >>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
      Fraction(13, 72)
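
      For float data, the population variance is the mean of the squared
      deviations from the mean. A minimal sketch of that definition follows;
      ``pvariance_sketch`` is a hypothetical helper and does not preserve
      Decimal or Fraction types the way the examples above do.

```python
from statistics import fmean

def pvariance_sketch(data, mu=None):
    """Mean squared deviation from mu, dividing by n (illustrative only)."""
    values = list(data)
    n = len(values)
    if n < 1:
        raise ValueError("need at least one data point")
    if mu is None:
        mu = fmean(values)
    return sum((v - mu) ** 2 for v in values) / n

data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
print(pvariance_sketch(data))
```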

quantiles

quantiles(data, *, n=4, method='exclusive')

  Divide *data* into *n* continuous intervals with equal probability.

      Returns a list of (n - 1) cut points separating the intervals.

      Set *n* to 4 for quartiles (the default).  Set *n* to 10 for deciles.
      Set *n* to 100 for percentiles, which gives the 99 cut points that
      separate *data* into 100 equal-sized groups.

      The *data* can be any iterable containing sample data.
      The cut points are linearly interpolated between data points.

      If *method* is set to *inclusive*, *data* is treated as population
      data.  The minimum value is treated as the 0th percentile and the
      maximum value is treated as the 100th percentile.
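
      A short usage sketch comparing the two methods on the same data
      (the data values are made up for illustration):

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Default method: data is treated as a sample, so cut points may fall
# between data values near the ends.
exclusive_cuts = quantiles(data, n=4)

# Inclusive method: the minimum and maximum are the 0th and 100th percentiles.
inclusive_cuts = quantiles(data, n=4, method='inclusive')

print(exclusive_cuts)
print(inclusive_cuts)
```

      In both cases the result has n - 1 entries, here the three quartile
      cut points.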

sqrt

sqrt(x, /)

  Return the square root of x.

stdev

stdev(data, xbar=None)

  Return the square root of the sample variance.

      See ``variance`` for arguments and other details.

      >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
      1.0810874155219827

variance

variance(data, xbar=None)

  Return the sample variance of data.

      data should be an iterable of Real-valued numbers, with at least two
      values. The optional argument xbar, if given, should be the mean of
      the data. If it is missing or None, the mean is automatically calculated.

      Use this function when your data is a sample from a population. To
      calculate the variance from the entire population, see ``pvariance``.

      Examples:

      >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
      >>> variance(data)
      1.3720238095238095

      If you have already calculated the mean of your data, you can pass it as
      the optional second argument ``xbar`` to avoid recalculating it:

      >>> m = mean(data)
      >>> variance(data, m)
      1.3720238095238095

      This function does not check that ``xbar`` is actually the mean of
      ``data``. Giving arbitrary values for ``xbar`` may lead to invalid or
      impossible results.

      Decimals and Fractions are supported:

      >>> from decimal import Decimal as D
      >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
      Decimal('31.01875')

      >>> from fractions import Fraction as F
      >>> variance([F(1, 6), F(1, 2), F(5, 3)])
      Fraction(67, 108)
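
      The sample and population variances differ only in the divisor
      (n - 1 versus n), so one can be obtained from the other by rescaling.
      The sketch below checks that relationship on the float data used
      above.

```python
from statistics import pvariance, variance

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
n = len(data)

sample_var = variance(data)                     # divides by n - 1
rescaled_pvar = pvariance(data) * n / (n - 1)   # divides by n, then rescaled

print(sample_var, rescaled_pvar)
```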

Other members

tau = 6.283185307179586

Modules

math

numbers

random