AWK(1P)                    POSIX Programmer's Manual                   AWK(1P)

PROLOG
       This manual page is part of the POSIX Programmer's Manual. The Linux
       implementation of this interface may differ (consult the corresponding
       Linux manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME
       awk -- pattern scanning and processing language

SYNOPSIS
       awk [-F sepstring] [-v assignment]... program [argument...]

       awk [-F sepstring] -f progfile [-f progfile]... [-v assignment]...
            [argument...]

DESCRIPTION
       The awk utility shall execute programs written in the awk programming
       language, which is specialized for textual data manipulation. An awk
       program is a sequence of patterns and corresponding actions. When
       input is read that matches a pattern, the action associated with that
       pattern is carried out.

       Input shall be interpreted as a sequence of records. By default, a
       record is a line, less its terminating <newline>, but this can be
       changed by using the RS built-in variable. Each record of input shall
       be matched in turn against each pattern in the program. For each
       pattern matched, the associated action shall be executed.

       The awk utility shall interpret each input record as a sequence of
       fields where, by default, a field is a string of non-<blank>
       non-<newline> characters. This default <blank> and <newline> field
       delimiter can be changed by using the FS built-in variable or the
       -F sepstring option. The awk utility shall denote the first field in a
       record $1, the second $2, and so on. The symbol $0 shall refer to the
       entire record; setting any other field causes the re-evaluation of $0.
       Assigning to $0 shall reset the values of all other fields and the NF
       built-in variable.

OPTIONS
       The awk utility shall conform to the Base Definitions volume of
       POSIX.1-2017, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -F sepstring
                 Define the input field separator. This option shall be
                 equivalent to:

                     -v FS=sepstring

                 except that if -F sepstring and -v FS=sepstring are both
                 used, it is unspecified whether the FS assignment resulting
                 from -F sepstring is processed in command line order or is
                 processed after the last -v FS=sepstring. See the
                 description of the FS built-in variable, and how it is used,
                 in the EXTENDED DESCRIPTION section.

       -f progfile
                 Specify the pathname of the file progfile containing an awk
                 program. A pathname of '-' shall denote the standard input.
                 If multiple instances of this option are specified, the
                 concatenation of the files specified as progfile in the
                 order specified shall be the awk program. The awk program
                 can alternatively be specified in the command line as a
                 single argument.

       -v assignment
                 The application shall ensure that the assignment argument is
                 in the same form as an assignment operand. The specified
                 variable assignment shall occur prior to executing the awk
                 program, including the actions associated with BEGIN
                 patterns (if any). Multiple occurrences of this option can
                 be specified.

OPERANDS
       The following operands shall be supported:

       program   If no -f option is specified, the first operand to awk shall
                 be the text of the awk program. The application shall supply
                 the program operand as a single argument to awk. If the text
                 does not end in a <newline>, awk shall interpret the text as
                 if it did.

       argument  Either of the following two types of argument can be
                 intermixed:

                 file      A pathname of a file that contains the input to be
                           read, which is matched against the set of patterns
                           in the program. If no file operands are specified,
                           or if a file operand is '-', the standard input
                           shall be used.

                 assignment
                           An operand that begins with an <underscore> or
                           alphabetic character from the portable character
                           set (see the table in the Base Definitions volume
                           of POSIX.1-2017, Section 6.1, Portable Character
                           Set), followed by a sequence of underscores,
                           digits, and alphabetics from the portable
                           character set, followed by the '=' character,
                           shall specify a variable assignment rather than a
                           pathname. The characters before the '=' represent
                           the name of an awk variable; if that name is an
                           awk reserved word (see Grammar) the behavior is
                           undefined. The characters following the
                           <equals-sign> shall be interpreted as if they
                           appeared in the awk program preceded and followed
                           by a double-quote ('"') character, as a STRING
                           token (see Grammar), except that if the last
                           character is an unescaped <backslash>, it shall be
                           interpreted as a literal <backslash> rather than
                           as the first character of the sequence "\"". The
                           variable shall be assigned the value of that
                           STRING token and, if appropriate, shall be
                           considered a numeric string (see Expressions in
                           awk), the variable shall also be assigned its
                           numeric value. Each such variable assignment shall
                           occur just prior to the processing of the
                           following file, if any. Thus, an assignment before
                           the first file argument shall be executed after
                           the BEGIN actions (if any), while an assignment
                           after the last file argument shall occur before
                           the END actions (if any). If there are no file
                           arguments, assignments shall be executed before
                           processing the standard input.

STDIN
       The standard input shall be used only if no file operands are
       specified, or if a file operand is '-', or if a progfile
       option-argument is '-'; see the INPUT FILES section.

       If the awk program contains no actions and no patterns, but is
       otherwise a valid awk program, standard input and any file operands
       shall not be read and awk shall exit with a return status of zero.
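The ordering rules for -v assignments and assignment operands described under OPERANDS can be checked with a small experiment. The following is only a sketch: the function name assign_demo and the variable names early, late, and last are made up for illustration, and a file operand of '-' is used so that the records come from standard input.

```shell
# Hypothetical helper: shows when each kind of assignment takes effect.
assign_demo() {
    printf 'one\ntwo\n' | awk -v early=set '
        # -v assignments are already visible in BEGIN; operand assignments are not.
        BEGIN { print "BEGIN: early=" early ", late=" late }
        # "late=now" precedes the file operand, so it is set before the first record.
              { print $0 ": late=" late ", last=" last }
        # "last=done" follows the last file operand, so it is set before END runs.
        END   { print "END: last=" last }
    ' late=now - last=done
}

assign_demo
```

Under the rules above, the BEGIN line should show early=set with late still empty, the two record lines should show late=now with last still empty, and the END line should show last=done.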
INPUT FILES
       Input files to the awk program from any of the following sources shall
       be text files:

        *  Any file operands or their equivalents, achieved by modifying the
           awk variables ARGV and ARGC

        *  Standard input in the absence of any file operands

        *  Arguments to the getline function

       Whether the variable RS is set to a value other than a <newline> or
       not, for these files, implementations shall support records terminated
       with the specified separator up to {LINE_MAX} bytes and may support
       longer records.

       If -f progfile is specified, the application shall ensure that the
       files named by each of the progfile option-arguments are text files
       and their concatenation, in the same order as they appear in the
       arguments, is an awk program.

ENVIRONMENT VARIABLES
       The following environment variables shall affect the execution of awk:

       LANG      Provide a default value for the internationalization
                 variables that are unset or null. (See the Base Definitions
                 volume of POSIX.1-2017, Section 8.2, Internationalization
                 Variables for the precedence of internationalization
                 variables used to determine the values of locale
                 categories.)

       LC_ALL    If set to a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine the locale for the behavior of ranges, equivalence
                 classes, and multi-character collating elements within
                 regular expressions and in comparisons of string values.

       LC_CTYPE  Determine the locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte
                 as opposed to multi-byte characters in arguments and input
                 files), the behavior of character classes within regular
                 expressions, the identification of characters as letters,
                 and the mapping of uppercase and lowercase characters for
                 the toupper and tolower functions.

       LC_MESSAGES
                 Determine the locale that should be used to affect the
                 format and contents of diagnostic messages written to
                 standard error.

       LC_NUMERIC
                 Determine the radix character used when interpreting numeric
                 input, performing conversions between numeric and string
                 values, and formatting numeric output. Regardless of locale,
                 the <period> character (the decimal-point character of the
                 POSIX locale) is the decimal-point character recognized in
                 processing awk programs (including assignments in command
                 line arguments).

       NLSPATH   Determine the location of message catalogs for the
                 processing of LC_MESSAGES.

       PATH      Determine the search path when looking for commands executed
                 by system(expr), or input and output pipes; see the Base
                 Definitions volume of POSIX.1-2017, Chapter 8, Environment
                 Variables.

       In addition, all environment variables shall be visible via the awk
       variable ENVIRON.

ASYNCHRONOUS EVENTS
       Default.

STDOUT
       The nature of the output files depends on the awk program.

STDERR
       The standard error shall be used only for diagnostic messages.

OUTPUT FILES
       The nature of the output files depends on the awk program.

EXTENDED DESCRIPTION
   Overall Program Structure
       An awk program is composed of pairs of the form:

           pattern { action }

       Either the pattern or the action (including the enclosing brace
       characters) can be omitted.

       A missing pattern shall match any record of input, and a missing
       action shall be equivalent to:

           { print }

       Execution of the awk program shall start by first executing the
       actions associated with all BEGIN patterns in the order they occur in
       the program. Then each file operand (or standard input if no files
       were specified) shall be processed in turn by reading data from the
       file until a record separator is seen (<newline> by default). Before
       the first reference to a field in the record is evaluated, the record
       shall be split into fields, according to the rules in Regular
       Expressions, using the value of FS that was current at the time the
       record was read. Each pattern in the program then shall be evaluated
       in the order of occurrence, and the action associated with each
       pattern that matches the current record executed. The action for a
       matching pattern shall be executed before evaluating subsequent
       patterns. Finally, the actions associated with all END patterns shall
       be executed in the order they occur in the program.

   Expressions in awk
       Expressions describe computations used in patterns and actions. In the
       following table, valid expression operations are given in groups from
       highest precedence first to lowest precedence last, with
       equal-precedence operators grouped between horizontal lines. In
       expression evaluation, where the grammar is formally ambiguous, higher
       precedence operators shall be evaluated before lower precedence
       operators. In this table expr, expr1, expr2, and expr3 represent any
       expression, while lvalue represents any entity that can be assigned to
       (that is, on the left side of an assignment operator). The precise
       syntax of expressions is given in Grammar.

                    Table 4-1: Expressions in Decreasing Precedence in awk

       +---------------------+-------------------------+----------------+--------------+
       |       Syntax        |          Name           | Type of Result |Associativity |
       +---------------------+-------------------------+----------------+--------------+
       |( expr )             |Grouping                 |Type of expr    |N/A           |
       +---------------------+-------------------------+----------------+--------------+
       |$expr                |Field reference          |String          |N/A           |
       +---------------------+-------------------------+----------------+--------------+
       |lvalue ++            |Post-increment           |Numeric         |N/A           |
       |lvalue --            |Post-decrement           |Numeric         |N/A           |
       +---------------------+-------------------------+----------------+--------------+
       |++ lvalue            |Pre-increment            |Numeric         |N/A           |
       |-- lvalue            |Pre-decrement            |Numeric         |N/A           |
       +---------------------+-------------------------+----------------+--------------+
       |expr ^ expr          |Exponentiation           |Numeric         |Right         |
       +---------------------+-------------------------+----------------+--------------+
       |! expr               |Logical not              |Numeric         |N/A           |
       |+ expr               |Unary plus               |Numeric         |N/A           |
       |- expr               |Unary minus              |Numeric         |N/A           |
       +---------------------+-------------------------+----------------+--------------+
       |expr * expr          |Multiplication           |Numeric         |Left          |
       |expr / expr          |Division                 |Numeric         |Left          |
       |expr % expr          |Modulus                  |Numeric         |Left          |
       +---------------------+-------------------------+----------------+--------------+
       |expr + expr          |Addition                 |Numeric         |Left          |
       |expr - expr          |Subtraction              |Numeric         |Left          |
       +---------------------+-------------------------+----------------+--------------+
       |expr expr            |String concatenation     |String          |Left          |
       +---------------------+-------------------------+----------------+--------------+
       |expr < expr          |Less than                |Numeric         |None          |
       |expr <= expr         |Less than or equal to    |Numeric         |None          |
       |expr != expr         |Not equal to             |Numeric         |None          |
       |expr == expr         |Equal to                 |Numeric         |None          |
       |expr > expr          |Greater than             |Numeric         |None          |
       |expr >= expr         |Greater than or equal to |Numeric         |None          |
       +---------------------+-------------------------+----------------+--------------+
       |expr ~ expr          |ERE match                |Numeric         |None          |
       |expr !~ expr         |ERE non-match            |Numeric         |None          |
       +---------------------+-------------------------+----------------+--------------+
       |expr in array        |Array membership         |Numeric         |Left          |
       |( index ) in array   |Multi-dimension array    |Numeric         |Left          |
       |                     |membership               |                |              |
       +---------------------+-------------------------+----------------+--------------+
       |expr && expr         |Logical AND              |Numeric         |Left          |
       +---------------------+-------------------------+----------------+--------------+
       |expr || expr         |Logical OR               |Numeric         |Left          |
       +---------------------+-------------------------+----------------+--------------+
       |expr1 ? expr2 : expr3|Conditional expression   |Type of selected|Right         |
       |                     |                         |expr2 or expr3  |              |
       +---------------------+-------------------------+----------------+--------------+
       |lvalue ^= expr       |Exponentiation assignment|Numeric         |Right         |
       |lvalue %= expr       |Modulus assignment       |Numeric         |Right         |
       |lvalue *= expr       |Multiplication assignment|Numeric         |Right         |
       |lvalue /= expr       |Division assignment      |Numeric         |Right         |
       |lvalue += expr       |Addition assignment      |Numeric         |Right         |
       |lvalue -= expr       |Subtraction assignment   |Numeric         |Right         |
       |lvalue = expr        |Assignment               |Type of expr    |Right         |
       +---------------------+-------------------------+----------------+--------------+

       Each expression shall have either a string value, a numeric value, or
       both. Except as stated for specific contexts, the value of an
       expression shall be implicitly converted to the type needed for the
       context in which it is used.
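A few one-line expressions illustrate how the precedence and associativity columns of the table are read. This is a sketch rather than an exhaustive check; prec_demo is a hypothetical name, and each comment gives the grouping implied by the table.

```shell
# Hypothetical helper: each print exercises one precedence or associativity rule.
prec_demo() {
    awk 'BEGIN {
        print 2 ^ 3 ^ 2      # exponentiation is right-associative: 2 ^ (3 ^ 2)
        print -2 ^ 2         # ^ binds tighter than unary minus: -(2 ^ 2)
        print 1 + 2 * 3      # * binds tighter than +: 1 + (2 * 3)
        print 2 * 3 % 4      # * and % are left-associative: (2 * 3) % 4
        print 1 " " 2 + 3    # + binds tighter than concatenation: 1 " " (2 + 3)
    }'
}

prec_demo
```

Read against the table, the five lines printed should be 512, -4, 7, 2, and "1 5" (without the quotes).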
       A string value shall be converted to a numeric value either by the
       equivalent of the following calls to functions defined by the ISO C
       standard:

           setlocale(LC_NUMERIC, "");
           numeric_value = atof(string_value);

       or by converting the initial portion of the string to type double
       representation as follows:

              The input string is decomposed into two parts: an initial,
              possibly empty, sequence of white-space characters (as
              specified by isspace()) and a subject sequence interpreted as a
              floating-point constant.

              The expected form of the subject sequence is an optional '+' or
              '-' sign, then a non-empty sequence of digits optionally
              containing a <period>, then an optional exponent part. An
              exponent part consists of 'e' or 'E', followed by an optional
              sign, followed by one or more decimal digits.

              The sequence starting with the first digit or the <period>
              (whichever occurs first) is interpreted as a floating constant
              of the C language, and if neither an exponent part nor a
              <period> appears, a <period> is assumed to follow the last
              digit in the string. If the subject sequence begins with a
              <hyphen-minus>, the value resulting from the conversion is
              negated.

       A numeric value that is exactly equal to the value of an integer (see
       Section 1.1.2, Concepts Derived from the ISO C Standard) shall be
       converted to a string by the equivalent of a call to the sprintf
       function (see String Functions) with the string "%d" as the fmt
       argument and the numeric value being converted as the first and only
       expr argument. Any other numeric value shall be converted to a string
       by the equivalent of a call to the sprintf function with the value of
       the variable CONVFMT as the fmt argument and the numeric value being
       converted as the first and only expr argument. The result of the
       conversion is unspecified if the value of CONVFMT is not a
       floating-point format specification. This volume of POSIX.1-2017
       specifies no explicit conversions between numbers and strings. An
       application can force an expression to be treated as a number by
       adding zero to it, or can force it to be treated as a string by
       concatenating the null string ("") to it.

       A string value shall be considered a numeric string if it comes from
       one of the following:

        1. Field variables

        2. Input from the getline() function

        3. FILENAME

        4. ARGV array elements

        5. ENVIRON array elements

        6. Array elements created by the split() function

        7. A command line variable assignment

        8. Variable assignment from another numeric string variable

       and an implementation-dependent condition corresponding to either case
       (a) or (b) below is met.

        a. After the equivalent of the following calls to functions defined
           by the ISO C standard, string_value_end would differ from
           string_value, and any characters before the terminating null
           character in string_value_end would be <blank> characters:

               char *string_value_end;
               setlocale(LC_NUMERIC, "");
               numeric_value = strtod (string_value, &string_value_end);

        b. After all the following conversions have been applied, the
           resulting string would lexically be recognized as a NUMBER token
           as described by the lexical conventions in Grammar:

           --  All leading and trailing <blank> characters are discarded.

           --  If the first non-<blank> is '+' or '-', it is discarded.

           --  Each occurrence of the decimal point character from the
               current locale is changed to a <period>.

       In case (a) the numeric value of the numeric string shall be the value
       that would be returned by the strtod() call. In case (b) if the first
       non-<blank> is '-', the numeric value of the numeric string shall be
       the negation of the numeric value of the recognized NUMBER token;
       otherwise, the numeric value of the numeric string shall be the
       numeric value of the recognized NUMBER token. Whether or not a string
       is a numeric string shall be relevant only in contexts where that term
       is used in this section.
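The conversion rules and the numeric-string notion above can be observed with a short experiment. This is a sketch under stated assumptions: numstr_demo is a hypothetical name, the input record "10 9 abc" is invented for the example, and LC_ALL=C is set so that the radix character and string ordering are those of the POSIX locale.

```shell
# Hypothetical helper: contrasts numeric strings, forced strings, and CONVFMT.
numstr_demo() {
    printf '10 9 abc\n' | LC_ALL=C awk '{
        print ($1 < $2)          # both fields are numeric strings: compared as numbers
        a = $1 ""; b = $2 ""     # concatenating "" forces plain string values
        print (a < b)            # compared as strings: "10" collates before "9"
        print $3 + 0             # adding zero forces a number; "abc" has no numeric prefix
        x = 0.1 + 0.2
        CONVFMT = "%.2f"
        s = x ""; print s        # non-integer value is converted through CONVFMT
        y = 3.0
        t = y ""; print t        # integer value is converted as if by "%d"
    }'
}

numstr_demo
```

With these inputs the five lines printed should be 0, 1, 0, 0.30, and 3. Note that print applied directly to a number uses OFMT rather than CONVFMT, which is why the sketch concatenates "" before printing.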
       When an expression is used in a Boolean context, if it has a numeric
       value, a value of zero shall be treated as false and any other value
       shall be treated as true. Otherwise, a string value of the null string
       shall be treated as false and any other value shall be treated as
       true. A Boolean context shall be one of the following:

        *  The first subexpression of a conditional expression

        *  An expression operated on by logical NOT, logical AND, or logical
           OR

        *  The second expression of a for statement

        *  The expression of an if statement

        *  The expression of the while clause in either a while or do...while
           statement

        *  An expression used as a pattern (as in Overall Program Structure)

       All arithmetic shall follow the semantics of floating-point arithmetic
       as specified by the ISO C standard (see Section 1.1.2, Concepts
       Derived from the ISO C Standard).

       The value of the expression:

           expr1 ^ expr2

       shall be equivalent to the value returned by the ISO C standard
       function call:

           pow(expr1, expr2)

       The expression:

           lvalue ^= expr

       shall be equivalent to the ISO C standard expression:

           lvalue = pow(lvalue, expr)

       except that lvalue shall be evaluated only once. The value of the
       expression:

           expr1 % expr2

       shall be equivalent to the value returned by the ISO C standard
       function call:

           fmod(expr1, expr2)

       The expression:

           lvalue %= expr

       shall be equivalent to the ISO C standard expression:

           lvalue = fmod(lvalue, expr)

       except that lvalue shall be evaluated only once.

       Variables and fields shall be set by the assignment statement:

           lvalue = expression

       and the type of expression shall determine the resulting variable
       type. The assignment includes the arithmetic assignments ("+=", "-=",
       "*=", "/=", "%=", "^=", "++", "--") all of which shall produce a
       numeric result. The left-hand side of an assignment and the target of
       increment and decrement operators can be one of a variable, an array
       with index, or a field selector.

       The awk language supplies arrays that are used for storing numbers or
       strings. Arrays need not be declared. They shall initially be empty,
       and their sizes shall change dynamically. The subscripts, or element
       identifiers, are strings, providing a type of associative array
       capability. An array name followed by a subscript within square
       brackets can be used as an lvalue and thus as an expression, as
       described in the grammar; see Grammar. Unsubscripted array names can
       be used in only the following contexts:

        *  A parameter in a function definition or function call

        *  The NAME token following any use of the keyword in as specified in
           the grammar (see Grammar); if the name used in this context is not
           an array name, the behavior is undefined

       A valid array index shall consist of one or more <comma>-separated
       expressions, similar to the way in which multi-dimensional arrays are
       indexed in some programming languages. Because awk arrays are really
       one-dimensional, such a <comma>-separated list shall be converted to a
       single string by concatenating the string values of the separate
       expressions, each separated from the other by the value of the SUBSEP
       variable. Thus, the following two index operations shall be
       equivalent:

           var[expr1, expr2, ... exprn]

           var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]

       The application shall ensure that a multi-dimensioned index used with
       the in operator is parenthesized. The in operator, which tests for the
       existence of a particular array element, shall not cause that element
       to exist. Any other reference to a nonexistent array element shall
       automatically create it.

       Comparisons (with the '<', "<=", "!=", "==", '>', and ">=" operators)
       shall be made numerically if both operands are numeric, if one is
       numeric and the other has a string value that is a numeric string, or
       if one is numeric and the other has the uninitialized value.
       Otherwise, operands shall be converted to strings as required and a
       string comparison shall be made as follows:

        *  For the "!=" and "==" operators, the strings should be compared to
           check if they are identical but may be compared using the
           locale-specific collation sequence to check if they collate
           equally.

        *  For the other operators, the strings shall be compared using the
           locale-specific collation sequence.

       The value of the comparison expression shall be 1 if the relation is
       true, or 0 if the relation is false.

   Variables and Special Variables
       Variables can be used in an awk program by referencing them. With the
       exception of function parameters (see User-Defined Functions), they
       are not explicitly declared. Function parameter names shall be local
       to the function; all other variable names shall be global. The same
       name shall not be used as both a function parameter name and as the
       name of a function or a special awk variable. The same name shall not
       be used both as a variable name with global scope and as the name of a
       function. The same name shall not be used within the same scope both
       as a scalar variable and as an array. Uninitialized variables,
       including scalar variables, array elements, and field variables, shall
       have an uninitialized value. An uninitialized value shall have both a
       numeric value of zero and a string value of the empty string.
       Evaluation of variables with an uninitialized value, to either string
       or numeric, shall be determined by the context in which they are used.

       Field variables shall be designated by a '