💾 Archived View for tris.fyi › pydoc › statistics captured on 2023-04-26 at 13:32:14. Gemini links have been rewritten to link to archived content
Basic statistics module.

This module provides functions for calculating statistics of data, including
averages, variance, and standard deviation.

Calculating averages
--------------------

==================  ==================================================
Function            Description
==================  ==================================================
mean                Arithmetic mean (average) of data.
fmean               Fast, floating point arithmetic mean.
geometric_mean      Geometric mean of data.
harmonic_mean       Harmonic mean of data.
median              Median (middle value) of data.
median_low          Low median of data.
median_high         High median of data.
median_grouped      Median, or 50th percentile, of grouped data.
mode                Mode (most common value) of data.
multimode           List of modes (most common values of data).
quantiles           Divide data into intervals with equal probability.
==================  ==================================================

Calculate the arithmetic mean ("the average") of data:

>>> mean([-1.0, 2.5, 3.25, 5.75])
2.625

Calculate the standard median of discrete data:

>>> median([2, 3, 4, 5])
3.5

Calculate the median, or 50th percentile, of data grouped into class intervals
centred on the data values provided. E.g. if your data points are rounded to
the nearest whole number:

>>> median_grouped([2, 2, 3, 3, 3, 4])  #doctest: +ELLIPSIS
2.8333333333...

This should be interpreted in this way: you have two data points in the class
interval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in
the class interval 3.5-4.5. The median of these data points is 2.8333...

Calculating variability or spread
---------------------------------

==================  =============================================
Function            Description
==================  =============================================
pvariance           Population variance of data.
variance            Sample variance of data.
pstdev              Population standard deviation of data.
stdev               Sample standard deviation of data.
==================  =============================================

Calculate the standard deviation of sample data:

>>> stdev([2.5, 3.25, 5.5, 11.25, 11.75])  #doctest: +ELLIPSIS
4.38961843444...

If you have previously calculated the mean, you can pass it as the optional
second argument to the four "spread" functions to avoid recalculating it:

>>> data = [1, 2, 2, 4, 4, 4, 5, 6]
>>> mu = mean(data)
>>> pvariance(data, mu)
2.5

Statistics for relations between two inputs
-------------------------------------------

==================  ====================================================
Function            Description
==================  ====================================================
covariance          Sample covariance for two variables.
correlation         Pearson's correlation coefficient for two variables.
linear_regression   Intercept and slope for simple linear regression.
==================  ====================================================

Calculate covariance, Pearson's correlation, and simple linear regression
for two inputs:

>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> covariance(x, y)
0.75
>>> correlation(x, y)  #doctest: +ELLIPSIS
0.31622776601...
>>> linear_regression(x, y)  #doctest:
LinearRegression(slope=0.1, intercept=1.5)

Exceptions
----------

A single exception is defined: StatisticsError is a subclass of ValueError.
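Because StatisticsError derives from ValueError, callers can catch either one. The following is a minimal sketch of guarding against empty input; the helper name `safe_mean` and its `default` parameter are illustrative assumptions, not part of the module.

```python
from statistics import StatisticsError, mean


def safe_mean(data, default=None):
    """Return mean(data), or `default` when data is empty (hypothetical helper)."""
    try:
        return mean(data)
    except StatisticsError:
        # mean() raises StatisticsError on empty input
        return default


empty_result = safe_mean([], default=0.0)
normal_result = safe_mean([1, 2, 3, 6])
```

Catching ValueError instead would also work, at the cost of hiding unrelated value errors raised by the same call.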
Dict subclass for counting hashable items. Sometimes called a bag or multiset.
Elements are stored as dictionary keys and their counts are stored as
dictionary values.

>>> c = Counter('abcdeabcdabcaba')  # count elements from a string

>>> c.most_common(3)                # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c)                       # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements()))   # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values())                 # total of all counts
15

>>> c['a']                          # count of letter 'a'
5
>>> for elem in 'shazam':           # update counts from an iterable
...     c[elem] += 1                # by adding 1 to each element's count
>>> c['a']                          # now there are seven 'a'
7
>>> del c['b']                      # remove all 'b'
>>> c['b']                          # now there are zero 'b'
0

>>> d = Counter('simsalabim')       # make another counter
>>> c.update(d)                     # add in the second counter
>>> c['a']                          # now there are nine 'a'
9

>>> c.clear()                       # empty the counter
>>> c
Counter()

Note: If a count is set to zero or reduced to zero, it will remain in the
counter until the entry is deleted or the counter is cleared:

>>> c = Counter('aaabbc')
>>> c['b'] -= 2                     # reduce the count of 'b' by two
>>> c.most_common()                 # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]
clear(...) D.clear() -> None. Remove all items from D.
copy(self) Return a shallow copy.
elements(self)
    Iterator over elements repeating each as many times as its count.

    >>> c = Counter('ABCABC')
    >>> sorted(c.elements())
    ['A', 'A', 'B', 'B', 'C', 'C']

    # Knuth's example for prime factors of 1836: 2**2 * 3**3 * 17**1
    >>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
    >>> product = 1
    >>> for factor in prime_factors.elements():  # loop over factors
    ...     product *= factor                    # and multiply them
    >>> product
    1836

    Note, if an element's count has been set to zero or is a negative
    number, elements() will ignore it.
fromkeys(iterable, v=None)
get(self, key, default=None, /) Return the value for key if key is in the dictionary, else default.
items(...) D.items() -> a set-like object providing a view on D's items
keys(...) D.keys() -> a set-like object providing a view on D's keys
most_common(self, n=None) List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts. >>> Counter('abracadabra').most_common(3) [('a', 5), ('b', 2), ('r', 2)]
pop(...) D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If the key is not found, return the default if given; otherwise, raise a KeyError.
popitem(self, /) Remove and return a (key, value) pair as a 2-tuple. Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
setdefault(self, key, default=None, /) Insert key with a value of default if key is not in the dictionary. Return the value for key if key is in the dictionary, else default.
subtract(self, iterable=None, /, **kwds)
    Like dict.update() but subtracts counts instead of replacing them.
    Counts can be reduced below zero. Both the inputs and outputs are
    allowed to contain zero and negative counts.

    Source can be an iterable, a dictionary, or another Counter instance.

    >>> c = Counter('which')
    >>> c.subtract('witch')           # subtract elements from another iterable
    >>> c.subtract(Counter('watch'))  # subtract elements from another counter
    >>> c['h']                        # 2 in which, minus 1 in witch, minus 1 in watch
    0
    >>> c['w']                        # 1 in which, minus 1 in witch, minus 1 in watch
    -1
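The interaction between subtract(), negative counts, and elements() can be sketched as follows. The inventory names are made up for illustration; the unary plus operator used at the end is a standard Counter operation that drops zero and negative counts.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
stock.subtract(Counter(apples=1, pears=2))  # pears goes below zero

as_dict = dict(stock)                # negative count is kept in the mapping
visible = sorted(stock.elements())   # elements() skips counts <= 0
positive_only = +stock               # unary plus drops counts <= 0
```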
total(self) Sum of the counts
update(self, iterable=None, /, **kwds) Like dict.update() but add counts instead of replacing them. Source can be an iterable, a dictionary, or another Counter instance. >>> c = Counter('which') >>> c.update('witch') # add elements from another iterable >>> d = Counter('watch') >>> c.update(d) # add elements from another counter >>> c['h'] # four 'h' in which, witch, and watch 4
values(...) D.values() -> an object providing a view on D's values
Construct a new Decimal object. 'value' can be an integer, string, tuple, or another Decimal object. If no value is given, return Decimal('0'). The context does not affect the conversion and is only passed to determine if the InvalidOperation trap is active.
adjusted(self, /) Return the adjusted exponent of the number. Defined as exp + digits - 1.
as_integer_ratio(self, /) Decimal.as_integer_ratio() -> (int, int) Return a pair of integers, whose ratio is exactly equal to the original Decimal and with a positive denominator. The ratio is in lowest terms. Raise OverflowError on infinities and a ValueError on NaNs.
as_tuple(self, /) Return a tuple representation of the number.
canonical(self, /) Return the canonical encoding of the argument. Currently, the encoding of a Decimal instance is always canonical, so this operation returns its argument unchanged.
compare(self, /, other, context=None)
    Compare self to other. Return a decimal value:

        a or b is a NaN ==> Decimal('NaN')
        a < b           ==> Decimal('-1')
        a == b          ==> Decimal('0')
        a > b           ==> Decimal('1')
compare_signal(self, /, other, context=None) Identical to compare, except that all NaNs signal.
compare_total(self, /, other, context=None) Compare two operands using their abstract representation rather than their numerical value. Similar to the compare() method, but the result gives a total ordering on Decimal instances. Two Decimal instances with the same numeric value but different representations compare unequal in this ordering: >>> Decimal('12.0').compare_total(Decimal('12')) Decimal('-1') Quiet and signaling NaNs are also included in the total ordering. The result of this function is Decimal('0') if both operands have the same representation, Decimal('-1') if the first operand is lower in the total order than the second, and Decimal('1') if the first operand is higher in the total order than the second operand. See the specification for details of the total order. This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly.
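To make the difference between numeric comparison and the total ordering concrete, here is a small sketch. Sorting through `functools.cmp_to_key` is just one way to apply compare_total(); the sample values are chosen for illustration.

```python
from decimal import Decimal
from functools import cmp_to_key

a, b = Decimal('12.0'), Decimal('12')

numeric = a.compare(b)      # compares values only
total = a.compare_total(b)  # also distinguishes representations

# Sort numerically equal values by their position in the total order.
values = [Decimal('12'), Decimal('12.00'), Decimal('12.0')]
ordered = sorted(values, key=cmp_to_key(lambda x, y: int(x.compare_total(y))))
ordered_text = [str(d) for d in ordered]
```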
compare_total_mag(self, /, other, context=None) Compare two operands using their abstract representation rather than their value as in compare_total(), but ignoring the sign of each operand. x.compare_total_mag(y) is equivalent to x.copy_abs().compare_total(y.copy_abs()). This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly.
conjugate(self, /) Return self.
copy_abs(self, /) Return the absolute value of the argument. This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed.
copy_negate(self, /) Return the negation of the argument. This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed.
copy_sign(self, /, other, context=None) Return a copy of the first operand with the sign set to be the same as the sign of the second operand. For example: >>> Decimal('2.3').copy_sign(Decimal('-1.5')) Decimal('-2.3') This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly.
exp(self, /, context=None) Return the value of the (natural) exponential function e**x at the given number. The function always uses the ROUND_HALF_EVEN mode and the result is correctly rounded.
fma(self, /, other, third, context=None) Fused multiply-add. Return self*other+third with no rounding of the intermediate product self*other. >>> Decimal(2).fma(3, 5) Decimal('11')
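The effect of skipping the intermediate rounding only shows up when the product needs more digits than the context allows. The sketch below assumes a deliberately tiny precision of 3 digits, set through a local context, to make the difference visible.

```python
from decimal import Decimal, localcontext

a = Decimal('1.01')
c = Decimal('-1.02')

with localcontext() as ctx:
    ctx.prec = 3              # assumed small precision for demonstration
    fused = a.fma(a, c)       # exact product 1.0201, then add, then round once
    separate = a * a + c      # product is rounded to 1.02 before the addition
```

With this setup the fused result keeps the small residue that the two-step calculation rounds away.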
from_float(f, /) Class method that converts a float to a decimal number, exactly. Since 0.1 is not exactly representable in binary floating point, Decimal.from_float(0.1) is not the same as Decimal('0.1'). >>> Decimal.from_float(0.1) Decimal('0.1000000000000000055511151231257827021181583404541015625') >>> Decimal.from_float(float('nan')) Decimal('NaN') >>> Decimal.from_float(float('inf')) Decimal('Infinity') >>> Decimal.from_float(float('-inf')) Decimal('-Infinity')
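A short sketch of the practical consequence: converting through a float preserves the binary approximation, whereas converting through the float's string form gives the short decimal most people expect. Going via `str()` is a common workaround, shown here as one option rather than a recommendation from the source.

```python
from decimal import Decimal

exact = Decimal.from_float(0.1)        # carries the binary rounding error
via_str = Decimal(str(0.1))            # uses the shortest repr, '0.1'
representable = Decimal.from_float(0.5)  # 0.5 is exact in binary
```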
is_canonical(self, /) Return True if the argument is canonical and False otherwise. Currently, a Decimal instance is always canonical, so this operation always returns True.
is_finite(self, /) Return True if the argument is a finite number, and False if the argument is infinite or a NaN.
is_infinite(self, /) Return True if the argument is either positive or negative infinity and False otherwise.
is_nan(self, /) Return True if the argument is a (quiet or signaling) NaN and False otherwise.
is_normal(self, /, context=None) Return True if the argument is a normal finite non-zero number with an adjusted exponent greater than or equal to Emin. Return False if the argument is zero, subnormal, infinite or a NaN.
is_qnan(self, /) Return True if the argument is a quiet NaN, and False otherwise.
is_signed(self, /) Return True if the argument has a negative sign and False otherwise. Note that both zeros and NaNs can carry signs.
is_snan(self, /) Return True if the argument is a signaling NaN and False otherwise.
is_subnormal(self, /, context=None) Return True if the argument is subnormal, and False otherwise. A number is subnormal if it is non-zero, finite, and has an adjusted exponent less than Emin.
is_zero(self, /) Return True if the argument is a (positive or negative) zero and False otherwise.
ln(self, /, context=None) Return the natural (base e) logarithm of the operand. The function always uses the ROUND_HALF_EVEN mode and the result is correctly rounded.
log10(self, /, context=None) Return the base ten logarithm of the operand. The function always uses the ROUND_HALF_EVEN mode and the result is correctly rounded.
logb(self, /, context=None) For a non-zero number, return the adjusted exponent of the operand as a Decimal instance. If the operand is a zero, then Decimal('-Infinity') is returned and the DivisionByZero condition is raised. If the operand is an infinity then Decimal('Infinity') is returned.
logical_and(self, /, other, context=None) Return the digit-wise 'and' of the two (logical) operands.
logical_invert(self, /, context=None) Return the digit-wise inversion of the (logical) operand.
logical_or(self, /, other, context=None) Return the digit-wise 'or' of the two (logical) operands.
logical_xor(self, /, other, context=None) Return the digit-wise 'exclusive or' of the two (logical) operands.
max(self, /, other, context=None) Maximum of self and other. If one operand is a quiet NaN and the other is numeric, the numeric operand is returned.
max_mag(self, /, other, context=None) Similar to the max() method, but the comparison is done using the absolute values of the operands.
min(self, /, other, context=None) Minimum of self and other. If one operand is a quiet NaN and the other is numeric, the numeric operand is returned.
min_mag(self, /, other, context=None) Similar to the min() method, but the comparison is done using the absolute values of the operands.
next_minus(self, /, context=None) Return the largest number representable in the given context (or in the current default context if no context is given) that is smaller than the given operand.
next_plus(self, /, context=None) Return the smallest number representable in the given context (or in the current default context if no context is given) that is larger than the given operand.
next_toward(self, /, other, context=None) If the two operands are unequal, return the number closest to the first operand in the direction of the second operand. If both operands are numerically equal, return a copy of the first operand with the sign set to be the same as the sign of the second operand.
normalize(self, /, context=None) Normalize the number by stripping the rightmost trailing zeros and converting any result equal to Decimal('0') to Decimal('0e0'). Used for producing canonical values for members of an equivalence class. For example, Decimal('32.100') and Decimal('0.321000e+2') both normalize to the equivalent value Decimal('32.1').
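As a quick illustration of the equivalence-class idea, the sketch below normalizes two representations of the same value and also shows that stripping trailing zeros from an integer switches it to exponent notation.

```python
from decimal import Decimal

first = Decimal('32.100').normalize()
second = Decimal('0.321000e+2').normalize()
same_text = str(first) == str(second)   # both print as '32.1'

# Trailing zeros of an integer are stripped too, moving them into the exponent.
hundred_text = str(Decimal('100').normalize())
```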
number_class(self, /, context=None) Return a string describing the class of the operand. The returned value is one of the following ten strings: * '-Infinity', indicating that the operand is negative infinity. * '-Normal', indicating that the operand is a negative normal number. * '-Subnormal', indicating that the operand is negative and subnormal. * '-Zero', indicating that the operand is a negative zero. * '+Zero', indicating that the operand is a positive zero. * '+Subnormal', indicating that the operand is positive and subnormal. * '+Normal', indicating that the operand is a positive normal number. * '+Infinity', indicating that the operand is positive infinity. * 'NaN', indicating that the operand is a quiet NaN (Not a Number). * 'sNaN', indicating that the operand is a signaling NaN.
quantize(self, /, exp, rounding=None, context=None) Return a value equal to the first operand after rounding and having the exponent of the second operand. >>> Decimal('1.41421356').quantize(Decimal('1.000')) Decimal('1.414') Unlike other operations, if the length of the coefficient after the quantize operation would be greater than precision, then an InvalidOperation is signaled. This guarantees that, unless there is an error condition, the quantized exponent is always equal to that of the right-hand operand. Also unlike other operations, quantize never signals Underflow, even if the result is subnormal and inexact. If the exponent of the second operand is larger than that of the first, then rounding may be necessary. In this case, the rounding mode is determined by the rounding argument if given, else by the given context argument; if neither argument is given, the rounding mode of the current thread's context is used.
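A typical use of quantize() is rounding to a fixed number of decimal places with an explicit rounding mode. The sketch below uses a made-up price and compares two of the rounding constants exported by the decimal module.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

cents = Decimal('0.01')      # the exponent of this operand sets the target scale
price = Decimal('2.665')     # illustrative value that sits exactly on a tie

half_even = price.quantize(cents, rounding=ROUND_HALF_EVEN)  # tie goes to even digit
half_up = price.quantize(cents, rounding=ROUND_HALF_UP)      # tie goes away from zero
```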
radix(self, /) Return Decimal(10), the radix (base) in which the Decimal class does all its arithmetic. Included for compatibility with the specification.
remainder_near(self, /, other, context=None) Return the remainder from dividing self by other. This differs from self % other in that the sign of the remainder is chosen so as to minimize its absolute value. More precisely, the return value is self - n * other where n is the integer nearest to the exact value of self / other, and if two integers are equally near then the even one is chosen. If the result is zero then its sign will be the sign of self.
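The contrast with the % operator is easiest to see side by side. This sketch evaluates both for a few dividends against a divisor of 10, including two tie cases where the nearest even quotient is chosen.

```python
from decimal import Decimal

ten = Decimal(10)

# Map each dividend to (self % other, self.remainder_near(other)).
results = {
    n: (Decimal(n) % ten, Decimal(n).remainder_near(ten))
    for n in (18, 25, 35)
}
```

For 25 the exact quotient 2.5 ties between 2 and 3, so the even quotient 2 is used; for 35 the tie between 3 and 4 resolves to 4.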
rotate(self, /, other, context=None) Return the result of rotating the digits of the first operand by an amount specified by the second operand. The second operand must be an integer in the range -precision through precision. The absolute value of the second operand gives the number of places to rotate. If the second operand is positive then rotation is to the left; otherwise rotation is to the right. The coefficient of the first operand is padded on the left with zeros to length precision if necessary. The sign and exponent of the first operand are unchanged.
same_quantum(self, /, other, context=None) Test whether self and other have the same exponent or whether both are NaN. This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly.
scaleb(self, /, other, context=None) Return the first operand with the exponent adjusted by the second. Equivalently, return the first operand multiplied by 10**other. The second operand must be an integer.
shift(self, /, other, context=None) Return the result of shifting the digits of the first operand by an amount specified by the second operand. The second operand must be an integer in the range -precision through precision. The absolute value of the second operand gives the number of places to shift. If the second operand is positive, then the shift is to the left; otherwise the shift is to the right. Digits shifted into the coefficient are zeros. The sign and exponent of the first operand are unchanged.
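Since both shift() and rotate() work on a coefficient padded to the context precision, the result depends on that precision. The sketch below assumes a context with 9 digits of precision so that a 9-digit coefficient fills it exactly.

```python
from decimal import Context, Decimal

ctx = Context(prec=9)            # assumed precision matching the 9-digit operand
d = Decimal('123456789')

rotated = d.rotate(Decimal(2), context=ctx)        # digits wrap around
shifted_left = d.shift(Decimal(2), context=ctx)    # zeros fill from the right
shifted_right = d.shift(Decimal(-2), context=ctx)  # low digits are dropped
```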
sqrt(self, /, context=None) Return the square root of the argument to full precision. The result is correctly rounded using the ROUND_HALF_EVEN rounding mode.
to_eng_string(self, /, context=None) Convert to an engineering-type string. Engineering notation has an exponent which is a multiple of 3, so there are up to 3 digits left of the decimal place. For example, Decimal('123E+1') is converted to the string '1.23E+3'. The value of context.capitals determines whether the exponent sign is lower or upper case. Otherwise, the context does not affect the operation.
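To see where engineering notation differs from the default scientific string, compare str() with to_eng_string() on a value whose adjusted exponent is not a multiple of 3. The values are illustrative.

```python
from decimal import Decimal

d = Decimal('12E+3')

scientific = str(d)               # one digit before the decimal point
engineering = d.to_eng_string()   # exponent forced to a multiple of 3

# When the adjusted exponent is already a multiple of 3, both forms agree.
agree = str(Decimal('123E+1')) == Decimal('123E+1').to_eng_string()
```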
to_integral(self, /, rounding=None, context=None) Identical to the to_integral_value() method. The to_integral() name has been kept for compatibility with older versions.
to_integral_exact(self, /, rounding=None, context=None) Round to the nearest integer, signaling Inexact or Rounded as appropriate if rounding occurs. The rounding mode is determined by the rounding parameter if given, else by the given context. If neither parameter is given, then the rounding mode of the current default context is used.
to_integral_value(self, /, rounding=None, context=None) Round to the nearest integer without signaling Inexact or Rounded. The rounding mode is determined by the rounding parameter if given, else by the given context. If neither parameter is given, then the rounding mode of the current default context is used.
imag = <attribute 'imag' of 'decimal.Decimal' objects>
real = <attribute 'real' of 'decimal.Decimal' objects>
This class implements rational numbers.

In the two-argument form of the constructor, Fraction(8, 6) will produce a
rational number equivalent to 4/3. Both arguments must be Rational. The
numerator defaults to 0 and the denominator defaults to 1 so that
Fraction(3) == 3 and Fraction() == 0.

Fractions can also be constructed from:

  - numeric strings similar to those accepted by the float constructor
    (for example, '-2.3' or '1e10')
  - strings of the form '123/456'
  - float and Decimal instances
  - other Rational instances (including integers)
as_integer_ratio(self) Return the integer ratio as a tuple. Return a tuple of two integers, whose ratio is equal to the Fraction and with a positive denominator.
conjugate(self) Conjugate is a no-op for Reals.
from_decimal(dec) Converts a finite Decimal instance to a rational number, exactly.
from_float(f) Converts a finite float to a rational number, exactly. Beware that Fraction.from_float(0.3) != Fraction(3, 10).
limit_denominator(self, max_denominator=1000000)
    Closest Fraction to self with denominator at most max_denominator.

    >>> Fraction('3.141592653589793').limit_denominator(10)
    Fraction(22, 7)
    >>> Fraction('3.141592653589793').limit_denominator(100)
    Fraction(311, 99)
    >>> Fraction(4321, 8765).limit_denominator(10000)
    Fraction(4321, 8765)
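The from_float() caveat and limit_denominator() fit together: the exact conversion of a float exposes its binary rounding error, and limiting the denominator recovers the intended simple fraction. A minimal sketch, with denominator bounds chosen for illustration:

```python
import math
from fractions import Fraction

exact = Fraction.from_float(0.3)          # exact value of the binary float
recovered = exact.limit_denominator(10)   # nearest fraction with denominator <= 10

pi_approx = Fraction(math.pi).limit_denominator(1000)
```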
denominator = <property object at 0x7f75e0ca84a0>
imag = <property object at 0x7f75e1394680> Real numbers have no imaginary component.
numerator = <property object at 0x7f75e0ca8400>
real = <property object at 0x7f75e1394630> Real numbers are their real component.
LinearRegression(slope, intercept)
count(self, value, /) Return number of occurrences of value.
index(self, value, start=0, stop=9223372036854775807, /) Return first index of value. Raises ValueError if the value is not present.
intercept = _tuplegetter(1, 'Alias for field number 1') Alias for field number 1
slope = _tuplegetter(0, 'Alias for field number 0') Alias for field number 0
Normal distribution of a random variable
cdf(self, x) Cumulative distribution function. P(X <= x)
from_samples(data) Make a normal distribution instance from sample data.
inv_cdf(self, p) Inverse cumulative distribution function. x : P(X <= x) = p Finds the value of the random variable such that the probability of the variable being less than or equal to that value equals the given probability. This function is also called the percent point function or quantile function.
overlap(self, other) Compute the overlapping coefficient (OVL) between two normal distributions. Measures the agreement between two normal probability distributions. Returns a value between 0.0 and 1.0 giving the overlapping area in the two underlying probability density functions. >>> N1 = NormalDist(2.4, 1.6) >>> N2 = NormalDist(3.2, 2.0) >>> N1.overlap(N2) 0.8035050657330205
pdf(self, x) Probability density function. P(x <= X < x+dx) / dx
quantiles(self, n=4) Divide into *n* continuous intervals with equal probability. Returns a list of (n - 1) cut points separating the intervals. Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set *n* to 100 for percentiles, which gives the 99 cut points that separate the normal distribution into 100 equal-sized groups.
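As a sanity check on the "equal probability" wording, the sketch below computes the quartile cut points of a standard normal distribution and feeds them back through cdf(). The variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()                   # mean 0, standard deviation 1
quartiles = std.quantiles(n=4)       # three cut points
probs = [std.cdf(q) for q in quartiles]  # approximately 0.25, 0.5, 0.75
```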
samples(self, n, *, seed=None) Generate *n* samples for a given mean and standard deviation.
zscore(self, x) Compute the Standard Score. (x - mean) / stdev Describes *x* in terms of the number of standard deviations above or below the mean of the normal distribution.
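The methods zscore(), cdf(), and inv_cdf() can be chained to move between raw values, standard scores, and probabilities. The sketch assumes a hypothetical test score distribution with mean 100 and standard deviation 15.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)   # hypothetical score distribution

z = scores.zscore(130)        # (130 - 100) / 15
p = scores.cdf(130)           # probability of a score at or below 130
back = scores.inv_cdf(p)      # inverse maps the probability back to about 130
```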
mean = <property object at 0x7f75e09ec3b0> Arithmetic mean of the normal distribution.
median = <property object at 0x7f75e09ed620> Return the median of the normal distribution
mode = <property object at 0x7f75e09ed7b0> Return the mode of the normal distribution. The mode is the value x at which the probability density function (pdf) takes its maximum value.
stdev = <property object at 0x7f75e09ed760> Standard deviation of the normal distribution.
variance = <property object at 0x7f75e09ed800> Square of the standard deviation.
with_traceback(...) Exception.with_traceback(tb) -- set self.__traceback__ to tb and return self.
args = <attribute 'args' of 'BaseException' objects>
make an iterator that returns consecutive keys and groups from the iterable

    iterable
        Elements to divide into groups according to the key function.
    key
        A function for computing the group category for each element.
        If the key function is not specified or is None, the element itself
        is used for grouping.
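The word "consecutive" matters: groupby() starts a new group every time the key changes, so unsorted input yields repeated keys. A small sketch with made-up words grouped by length:

```python
from itertools import groupby

words = ['kiwi', 'fig', 'plum', 'pear', 'yam', 'lime']

# Without sorting, the same key appears in several separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=len)]

# Sorting by the same key first gives one group per key.
sorted_groups = [
    (k, list(g)) for k, g in groupby(sorted(words, key=len), key=len)
]
```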
itemgetter(item, ...) --> itemgetter object Return a callable object that fetches the given item(s) from its operand. After f = itemgetter(2), the call f(r) returns r[2]. After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3])
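A common use of itemgetter is as a sort key. The rows below are illustrative data.

```python
from operator import itemgetter

rows = [('a', 3), ('b', 1), ('c', 2)]

by_count = sorted(rows, key=itemgetter(1))   # sort on the second field
swapped = itemgetter(1, 0)(rows[0])          # multiple items come back as a tuple
```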
repeat(object [,times]) -> create an iterator which returns the object for the specified number of times. If not specified, returns the object endlessly.
bisect_left(a, x, lo=0, hi=None, *, key=None) Return the index where to insert item x in list a, assuming a is sorted. The return value i is such that all e in a[:i] have e < x, and all e in a[i:] have e >= x. So if x already appears in the list, a.insert(i, x) will insert just before the leftmost x already there. Optional args lo (default 0) and hi (default len(a)) bound the slice of a to be searched.
bisect_right(a, x, lo=0, hi=None, *, key=None) Return the index where to insert item x in list a, assuming a is sorted. The return value i is such that all e in a[:i] have e <= x, and all e in a[i:] have e > x. So if x already appears in the list, a.insert(i, x) will insert just after the rightmost x already there. Optional args lo (default 0) and hi (default len(a)) bound the slice of a to be searched.
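The two functions differ only in how they treat values equal to x, which makes their difference a count of occurrences. A brief sketch on an illustrative sorted list:

```python
from bisect import bisect_left, bisect_right

a = [1, 2, 2, 2, 3]

left = bisect_left(a, 2)     # index of the first 2
right = bisect_right(a, 2)   # index just past the last 2
occurrences = right - left   # number of items equal to 2
```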
correlation(x, y, /)
    Pearson's correlation coefficient

    Return the Pearson's correlation coefficient for two inputs. Pearson's
    correlation coefficient *r* takes values between -1 and +1. It measures
    the strength and direction of the linear relationship, where +1 means a
    very strong positive linear relationship, -1 a very strong negative
    linear relationship, and 0 no linear relationship.

    >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
    >>> correlation(x, x)
    1.0
    >>> correlation(x, y)
    -1.0
covariance(x, y, /) Covariance Return the sample covariance of two inputs *x* and *y*. Covariance is a measure of the joint variability of two inputs. >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9] >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3] >>> covariance(x, y) 0.75 >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1] >>> covariance(x, z) -7.5 >>> covariance(z, x) -7.5
erf(x, /) Error function at x.
exp(x, /) Return e raised to the power of x.
fabs(x, /) Return the absolute value of the float x.
fmean(data) Convert data to floats and compute the arithmetic mean. This runs faster than the mean() function and it always returns a float. If the input dataset is empty, it raises a StatisticsError. >>> fmean([3.5, 4.0, 5.25]) 4.25
fsum(seq, /) Return an accurate floating point sum of values in the iterable seq. Assumes IEEE-754 floating point arithmetic.
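The accuracy claim is easy to observe with repeated additions of 0.1, which the built-in sum() accumulates with rounding error.

```python
from math import fsum

values = [0.1] * 10

naive = sum(values)      # rounding error accumulates at each step
accurate = fsum(values)  # tracks partial sums to avoid that loss
```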
geometric_mean(data) Convert data to floats and compute the geometric mean. Raises a StatisticsError if the input dataset is empty, if it contains a zero, or if it contains a negative value. No special efforts are made to achieve exact results. (However, this may change in the future.) >>> round(geometric_mean([54, 24, 36]), 9) 36.0
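One way to read the geometric mean is as the exponential of the arithmetic mean of the logarithms. The sketch below checks that identity numerically; it is a mathematical cross-check and makes no claim about the function's internal algorithm.

```python
from math import exp, log
from statistics import fmean, geometric_mean

data = [54, 24, 36]

direct = geometric_mean(data)
via_logs = exp(fmean(log(v) for v in data))  # exp(mean(ln x))
```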
harmonic_mean(data, weights=None)
    Return the harmonic mean of data.

    The harmonic mean is the reciprocal of the arithmetic mean of the
    reciprocals of the data. It can be used for averaging ratios or rates,
    for example speeds.

    Suppose a car travels 40 km/hr for 5 km and then speeds up to 60 km/hr
    for another 5 km. What is the average speed?

    >>> harmonic_mean([40, 60])
    48.0

    Suppose a car travels 40 km/hr for 5 km, and when traffic clears, speeds
    up to 60 km/hr for the remaining 30 km of the journey. What is the
    average speed?

    >>> harmonic_mean([40, 60], weights=[5, 30])
    56.0

    If ``data`` is empty, or any element is less than zero, ``harmonic_mean``
    will raise ``StatisticsError``.
hypot(...) hypot(*coordinates) -> value Multidimensional Euclidean distance from the origin to a point. Roughly equivalent to: sqrt(sum(x**2 for x in coordinates)) For a two dimensional point (x, y), gives the hypotenuse using the Pythagorean theorem: sqrt(x*x + y*y). For example, the hypotenuse of a 3/4/5 right triangle is: >>> hypot(3.0, 4.0) 5.0
linear_regression(x, y, /)
    Slope and intercept for simple linear regression.

    Return the slope and intercept of simple linear regression parameters
    estimated using ordinary least squares. Simple linear regression
    describes the relationship between an independent variable *x* and a
    dependent variable *y* in terms of a linear function:

        y = slope * x + intercept + noise

    where *slope* and *intercept* are the regression parameters that are
    estimated, and noise represents the variability of the data that was not
    explained by the linear regression (it is equal to the difference
    between predicted and actual values of the dependent variable).

    The parameters are returned as a named tuple.

    >>> x = [1, 2, 3, 4, 5]
    >>> noise = NormalDist().samples(5, seed=42)
    >>> y = [3 * x[i] + 2 + noise[i] for i in range(5)]
    >>> linear_regression(x, y)  #doctest: +ELLIPSIS
    LinearRegression(slope=3.09078914170..., intercept=1.75684970486...)
log(...) log(x, [base=math.e]) Return the logarithm of x to the given base. If the base is not specified, returns the natural logarithm (base e) of x.
mean(data) Return the sample arithmetic mean of data. >>> mean([1, 2, 3, 4, 4]) 2.8 >>> from fractions import Fraction as F >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)]) Fraction(13, 21) >>> from decimal import Decimal as D >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")]) Decimal('0.5625') If ``data`` is empty, StatisticsError will be raised.
median(data) Return the median (middle value) of numeric data. When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values: >>> median([1, 3, 5]) 3 >>> median([1, 3, 5, 7]) 4.0
median_grouped(data, interval=1)
    Return the 50th percentile (median) of grouped continuous data.

    >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
    3.7
    >>> median_grouped([52, 52, 53, 54])
    52.5

    This calculates the median as the 50th percentile, and should be used
    when your data is continuous and grouped. In the above example, the
    values 1, 2, 3, etc. actually represent the midpoint of classes 0.5-1.5,
    1.5-2.5, 2.5-3.5, etc. The middle value falls somewhere in class
    3.5-4.5, and interpolation is used to estimate it.

    Optional argument ``interval`` represents the class interval, and
    defaults to 1. Changing the class interval naturally will change the
    interpolated 50th percentile value:

    >>> median_grouped([1, 3, 3, 5, 7], interval=1)
    3.25
    >>> median_grouped([1, 3, 3, 5, 7], interval=2)
    3.5

    This function does not check whether the data points are at least
    ``interval`` apart.
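The interpolation in the first example can be reproduced by hand with the usual grouped-median formula, L + interval * (n/2 - cf) / f, where L is the lower limit of the median class, cf the number of values below it, and f the number of values in it. This is a sketch of that textbook formula for comparison, not a copy of the library's code.

```python
from statistics import median_grouped

data = sorted([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
interval = 1
n = len(data)

median_class = data[n // 2]                         # midpoint of the median class
lower = median_class - interval / 2                 # L: lower class limit
below = sum(1 for v in data if v < median_class)    # cf: count below the class
in_class = data.count(median_class)                 # f: count inside the class

manual = lower + interval * (n / 2 - below) / in_class
library = median_grouped(data, interval=interval)
```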
median_high(data) Return the high median of data. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned. >>> median_high([1, 3, 5]) 3 >>> median_high([1, 3, 5, 7]) 5
median_low(data) Return the low median of numeric data. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned. >>> median_low([1, 3, 5]) 3 >>> median_low([1, 3, 5, 7]) 3
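A short comparison of the three median functions on the same even-length data, as a sketch: median interpolates between the two middle values, while median_low and median_high always return a value that actually occurs in the data. The variable names are illustrative only.

```python
from statistics import median, median_high, median_low

data = [1, 3, 5, 7]          # even number of points, middle values 3 and 5

mid = median(data)           # average of the two middle values: 4.0
low = median_low(data)       # smaller of the two middle values: 3
high = median_high(data)     # larger of the two middle values: 5

# With an odd number of points all three agree on the middle value.
odd = [1, 3, 5]
same = (median(odd), median_low(odd), median_high(odd))  # (3, 3, 3)
```

The low and high variants may be preferable when the data is discrete and an interpolated value such as 4.0 would not be a meaningful data point.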
mode(data) Return the most common data point from discrete or nominal data. ``mode`` assumes discrete data, and returns a single value. This is the standard treatment of the mode as commonly taught in schools: >>> mode([1, 1, 2, 3, 3, 3, 3, 4]) 3 This also works with nominal (non-numeric) data: >>> mode(["red", "blue", "blue", "red", "green", "red", "red"]) 'red' If there are multiple modes with the same frequency, return the first one encountered: >>> mode(['red', 'red', 'green', 'blue', 'blue']) 'red' If *data* is empty, ``mode`` raises StatisticsError.
multimode(data) Return a list of the most frequently occurring values. Will return more than one result if there are multiple modes or an empty list if *data* is empty. >>> multimode('aabbbbbbbbcc') ['b'] >>> multimode('aabbbbccddddeeffffgg') ['b', 'd', 'f'] >>> multimode('') []
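The difference between mode and multimode shows up when several values tie for the highest frequency. The sketch below reuses the tied data from the mode example and also shows the different handling of empty input. The variable names are illustrative only.

```python
from statistics import StatisticsError, mode, multimode

colors = ['red', 'red', 'green', 'blue', 'blue']

first_mode = mode(colors)        # 'red': first of the tied values
all_modes = multimode(colors)    # ['red', 'blue']: every tied value, in
                                 # the order first encountered

# Empty input: multimode returns an empty list, mode raises an error.
empty_modes = multimode([])
try:
    mode([])
    raised = False
except StatisticsError:
    raised = True
```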
namedtuple(typename, field_names, *, rename=False, defaults=None, module=None) Returns a new subclass of tuple with named fields. >>> Point = namedtuple('Point', ['x', 'y']) >>> Point.__doc__ # docstring for the new class 'Point(x, y)' >>> p = Point(11, y=22) # instantiate with positional args or keywords >>> p[0] + p[1] # indexable like a plain tuple 33 >>> x, y = p # unpack like a regular tuple >>> x, y (11, 22) >>> p.x + p.y # fields also accessible by name 33 >>> d = p._asdict() # convert to a dictionary >>> d['x'] 11 >>> Point(**d) # convert from a dictionary Point(x=11, y=22) >>> p._replace(x=100) # _replace() is like str.replace() but targets named fields Point(x=100, y=22)
pstdev(data, mu=None) Return the square root of the population variance. See ``pvariance`` for arguments and other details. >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 0.986893273527251
pvariance(data, mu=None) Return the population variance of ``data``. data should be a sequence or iterable of Real-valued numbers, with at least one value. The optional argument mu, if given, should be the mean of the data. If it is missing or None, the mean is automatically calculated. Use this function to calculate the variance from the entire population. To estimate the variance from a sample, the ``variance`` function is usually a better choice. Examples: >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25] >>> pvariance(data) 1.25 If you have already calculated the mean of the data, you can pass it as the optional second argument to avoid recalculating it: >>> mu = mean(data) >>> pvariance(data, mu) 1.25 Decimals and Fractions are supported: >>> from decimal import Decimal as D >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('24.815') >>> from fractions import Fraction as F >>> pvariance([F(1, 4), F(5, 4), F(1, 2)]) Fraction(13, 72)
quantiles(data, *, n=4, method='exclusive') Divide *data* into *n* continuous intervals with equal probability. Returns a list of (n - 1) cut points separating the intervals. Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set *n* to 100 for percentiles, which gives the 99 cut points that separate *data* into 100 equal-sized groups. The *data* can be any iterable containing sample data. The cut points are linearly interpolated between data points. If *method* is set to *inclusive*, *data* is treated as population data. The minimum value is treated as the 0th percentile and the maximum value is treated as the 100th percentile.
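A small sketch of how the two methods differ on the same data. The data set is made up for illustration; the quartiles for the default exclusive method and for the inclusive method are computed on seven evenly spaced values.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]

# Default method: data is treated as a sample, so the population may
# contain values beyond the observed minimum and maximum.
exclusive = quantiles(data, n=4)                      # [2.0, 4.0, 6.0]

# Inclusive method: the minimum is the 0th percentile and the maximum
# is the 100th percentile, so the cut points move inward.
inclusive = quantiles(data, n=4, method='inclusive')  # [2.5, 4.0, 5.5]

# n cut into deciles gives n - 1 = 9 cut points.
deciles = quantiles(data, n=10)
```

Both calls return n - 1 cut points, and the middle quartile agrees with the median of the data in either method.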
sqrt(x, /) Return the square root of x.
stdev(data, xbar=None) Return the square root of the sample variance. See ``variance`` for arguments and other details. >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 1.0810874155219827
variance(data, xbar=None) Return the sample variance of data. data should be an iterable of Real-valued numbers, with at least two values. The optional argument xbar, if given, should be the mean of the data. If it is missing or None, the mean is automatically calculated. Use this function when your data is a sample from a population. To calculate the variance from the entire population, see ``pvariance``. Examples: >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5] >>> variance(data) 1.3720238095238095 If you have already calculated the mean of your data, you can pass it as the optional second argument ``xbar`` to avoid recalculating it: >>> m = mean(data) >>> variance(data, m) 1.3720238095238095 This function does not check that ``xbar`` is actually the mean of ``data``. Giving arbitrary values for ``xbar`` may lead to invalid or impossible results. Decimals and Fractions are supported: >>> from decimal import Decimal as D >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('31.01875') >>> from fractions import Fraction as F >>> variance([F(1, 6), F(1, 2), F(5, 3)]) Fraction(67, 108)
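The relationship between the population and sample functions can be sketched by computing the sum of squared deviations once and dividing by n or by n - 1. This is a hand calculation for illustration, using the data from the pvariance example, and is not how the module computes its results internally.

```python
from math import sqrt
from statistics import fmean, pstdev, pvariance, stdev, variance

data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
n = len(data)
mu = fmean(data)                          # 1.375

# Sum of squared deviations from the mean.
ss = sum((x - mu) ** 2 for x in data)     # 10.0

pop_var = ss / n            # population variance: divide by n
sample_var = ss / (n - 1)   # sample variance: divide by n - 1

# The standard deviations are the square roots of the variances.
pop_sd = sqrt(pop_var)
sample_sd = sqrt(sample_var)
```

The sample variance is always the larger of the two, by a factor of n / (n - 1), which matters most for small data sets.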
tau = 6.283185307179586