email.header

Header encoding and decoding functionality.

Classes

Charset

Map character sets to their email properties.

    This class provides information about the requirements imposed on email
    for a specific character set.  It also provides convenience routines for
    converting between character sets, given the availability of the
    applicable codecs.  Given a character set, it will do its best to provide
    information on how to use that character set in an email in an
    RFC-compliant way.

    Certain character sets must be encoded with quoted-printable or base64
    when used in email headers or bodies.  Certain character sets must be
    converted outright, and are not allowed in email.  Instances of this
    class expose the following information about a character set:

    input_charset: The initial character set specified.  Common aliases
                   are converted to their `official' email names (e.g. latin_1
                   is converted to iso-8859-1).  Defaults to 7-bit us-ascii.

    header_encoding: If the character set must be encoded before it can be
                     used in an email header, this attribute will be set to
                     charset.QP (for quoted-printable), charset.BASE64 (for
                     base64 encoding), or charset.SHORTEST for the shortest of
                     QP or BASE64 encoding.  Otherwise, it will be None.

    body_encoding: Same as header_encoding, but describes the encoding for the
                   mail message's body, which may differ from the header
                   encoding.  charset.SHORTEST is not allowed for
                   body_encoding.

    output_charset: Some character sets must be converted before they can be
                    used in email headers or bodies.  If the input_charset is
                    one of them, this attribute will contain the name of the
                    charset output will be converted to.  Otherwise, it will
                    be None.

    input_codec: The name of the Python codec used to convert the
                 input_charset to Unicode.  If no conversion codec is
                 necessary, this attribute will be None.

    output_codec: The name of the Python codec used to convert Unicode
                  to the output_charset.  If no conversion codec is necessary,
                  this attribute will have the same value as the input_codec.
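
A minimal sketch of inspecting these attributes. It assumes Charset and the QP, BASE64 and SHORTEST constants are imported from email.charset, the module email.header takes Charset from. The values shown reflect the default charset table in recent CPython releases and may differ in other versions.

```python
from email.charset import Charset, QP, BASE64, SHORTEST

# 'latin_1' is an alias; Charset normalizes it to the official email name.
latin1 = Charset('latin_1')
print(latin1.input_charset)                # iso-8859-1
print(latin1.header_encoding == QP)        # True
print(latin1.body_encoding == QP)          # True

# UTF-8 headers use whichever of QP/base64 is shorter; bodies use base64.
utf8 = Charset('utf-8')
print(utf8.header_encoding == SHORTEST)    # True
print(utf8.body_encoding == BASE64)        # True

# The default charset is 7-bit us-ascii, which needs no header encoding.
ascii_cs = Charset()
print(ascii_cs.input_charset, ascii_cs.header_encoding)  # us-ascii None
```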

body_encode(self, string)

  Body-encode a string by converting it first to bytes.

          The type of encoding (base64 or quoted-printable) will be based on
          self.body_encoding.  If body_encoding is None, we assume the
          output charset is a 7bit encoding, so re-encoding the decoded
          string using the ascii codec produces the correct string version
          of the content.
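
The three branches described above (base64, quoted-printable, and no encoding) can be seen side by side in this sketch. It assumes the default charset table, where utf-8 bodies use base64, iso-8859-1 bodies use quoted-printable, and us-ascii has no body encoding; the round-trip checks use the standard base64 and quopri modules.

```python
import base64
import quopri
from email.charset import Charset

text = 'café'

# utf-8 bodies are base64-encoded; the result ends with a newline.
b64 = Charset('utf-8').body_encode(text)
print(repr(b64))                       # 'Y2Fmw6k=\n'

# iso-8859-1 bodies are quoted-printable; the byte 0xE9 becomes =E9.
qp = Charset('iso-8859-1').body_encode(text)
print(qp)                              # caf=E9

# us-ascii has no body encoding, so ASCII text passes through unchanged.
plain = Charset('us-ascii').body_encode('hello')
print(plain)                           # hello

# Round-trip checks with the standard decoders.
assert base64.b64decode(b64) == text.encode('utf-8')
assert quopri.decodestring(qp.encode('ascii')) == text.encode('iso-8859-1')
```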

get_body_encoding(self)

  Return the content-transfer-encoding used for body encoding.

          This is either the string `quoted-printable' or `base64' depending on
          the encoding used, or it is a function, in which case you should call
          it with a single argument, the Message object being encoded.  The
          function should then set the Content-Transfer-Encoding header itself
          to whatever is appropriate.

          Returns "quoted-printable" if self.body_encoding is QP.
          Returns "base64" if self.body_encoding is BASE64.
          Returns a conversion function otherwise.
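
A short sketch of both kinds of return value, assuming the default charset table. For us-ascii, current CPython returns email.encoders.encode_7or8bit. Calling that function on a Message sets the Content-Transfer-Encoding header, as the docstring describes.

```python
from email.charset import Charset
from email.message import Message

print(Charset('iso-8859-1').get_body_encoding())  # quoted-printable
print(Charset('utf-8').get_body_encoding())       # base64

# us-ascii has no body encoding, so a function is returned instead;
# calling it on a Message sets Content-Transfer-Encoding itself.
set_cte = Charset('us-ascii').get_body_encoding()
msg = Message()
msg.set_payload('plain ascii text')
set_cte(msg)
print(msg['Content-Transfer-Encoding'])           # 7bit
```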

get_output_charset(self)

  Return the output character set.

          This is self.output_charset if that is not None, otherwise it is
          self.input_charset.
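
A minimal sketch contrasting a charset that needs no conversion with one that does. It assumes the default charset table, in which euc-jp is converted to iso-2022-jp for use in email.

```python
from email.charset import Charset

# No conversion is needed for iso-8859-1, so that name is reported.
print(Charset('latin_1').get_output_charset())   # iso-8859-1

# euc-jp input is converted to iso-2022-jp before use in email.
eucjp = Charset('euc-jp')
print(eucjp.input_charset)                        # euc-jp
print(eucjp.get_output_charset())                 # iso-2022-jp
```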

header_encode(self, string)

  Header-encode a string by converting it first to bytes.

          The type of encoding (base64 or quoted-printable) will be based on
          this charset's `header_encoding`.

          :param string: A unicode string for the header.  It must be possible
              to encode this string to bytes using the character set's
              output codec.
          :return: The encoded string, with RFC 2047 chrome.
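
A sketch of header_encode for the three cases, assuming the default charset table. For utf-8 the SHORTEST rule picks base64 here because the encoded payload is 8 characters against 9 for quoted-printable. decode_header from email.header reverses the encoding.

```python
from email.charset import Charset
from email.header import decode_header

# iso-8859-1 headers use quoted-printable ("q") encoded words.
q_word = Charset('iso-8859-1').header_encode('café')
print(q_word)              # =?iso-8859-1?q?caf=E9?=

# utf-8 uses SHORTEST; base64 wins for this input.
b_word = Charset('utf-8').header_encode('café')
print(b_word)              # =?utf-8?b?Y2Fmw6k=?=

# us-ascii needs no encoding, so the string comes back unchanged.
print(Charset('us-ascii').header_encode('hello'))   # hello

# decode_header() returns the raw bytes together with the charset name.
print(decode_header(b_word))   # [(b'caf\xc3\xa9', 'utf-8')]
```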

header_encode_lines(self, string, maxlengths)

  Header-encode a string by converting it first to bytes.

          This is similar to `header_encode()` except that the string is fit
          into maximum line lengths as given by the argument.

          :param string: A unicode string for the header.  It must be possible
              to encode this string to bytes using the character set's
              output codec.
          :param maxlengths: Maximum line length iterator.  Each element
              returned from this iterator will provide the next maximum line
              length.  This parameter is used as an argument to built-in next()
              and should never be exhausted.  The maximum line lengths should
              not count the RFC 2047 chrome.  These line lengths are only a
              hint; the splitter does the best it can.
          :return: Lines of encoded strings, each with RFC 2047 chrome.
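
A sketch of header_encode_lines with an infinite maxlengths iterator. itertools.repeat never runs out, which satisfies the "never exhausted" requirement. The limit of 40 is an arbitrary value chosen for illustration. A charset with a header encoding (utf-8 here) is assumed, since the method produces encoded words.

```python
import itertools
from email.charset import Charset
from email.header import decode_header

text = 'Grüße aus Köln, schöne Grüße aus München'
cs = Charset('utf-8')

# Arbitrary per-line hint of 40; the iterator is infinite.
lines = cs.header_encode_lines(text, itertools.repeat(40))
for line in lines:
    print(len(line), line)

# Each line is a standalone encoded word.  Splits fall on character
# boundaries, so decoding each word and joining restores the text.
restored = ''.join(
    decode_header(line)[0][0].decode('utf-8') for line in lines
)
print(restored == text)   # True
```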

Header

append(self, s, charset=None, errors='strict')

  Append a string to the MIME header.

          Optional charset, if given, should be a Charset instance or the name
          of a character set (which will be converted to a Charset instance).  A
          value of None (the default) means that the charset given in the
          constructor is used.

          s may be a byte string or a Unicode string.  If it is a byte string
          (i.e. isinstance(s, str) is false), then charset is the encoding of
          that byte string, and a UnicodeError will be raised if the string
          cannot be decoded with that charset.  If s is a Unicode string, then
          charset is a hint specifying the character set of the characters in
          the string.  In either case, when producing an RFC 2822 compliant
          header using RFC 2047 rules, the string will be encoded using the
          output codec of the charset.  If the string cannot be encoded to the
          output codec, a UnicodeError will be raised.

          Optional `errors' is passed as the errors argument to the decode
          call if s is a byte string.
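
A minimal sketch of append with a byte string and with the two failure modes described above: undecodable bytes, and a str that the output codec cannot represent. The header text and charsets are illustrative choices. A failed append leaves the header unchanged.

```python
from email.header import Header

h = Header('Subject text')           # charset defaults to us-ascii

# A byte string is decoded using the charset given for it.
h.append(b'caf\xe9', 'iso-8859-1')
print(str(h))                        # Subject text café

# Undecodable bytes, or a str the output codec cannot encode, raise
# a UnicodeError subclass and are not added to the header.
errors_seen = []
for data, cs in [(b'\xff', 'us-ascii'), ('日本', 'iso-8859-1')]:
    try:
        h.append(data, cs)
    except UnicodeError as exc:
        errors_seen.append(type(exc).__name__)
print(errors_seen)   # ['UnicodeDecodeError', 'UnicodeEncodeError']
```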

encode(self, splitchars=';, \t', maxlinelen=None, linesep='\n')

  Encode a message header into an RFC-compliant format.

          There are many issues involved in converting a given string for use in
          an email header.  Only certain character sets are readable in most
          email clients, and as header strings can only contain a subset of
          7-bit ASCII, care must be taken to properly convert and encode (with
          Base64 or quoted-printable) header strings.  In addition, RFC 2047
          limits each encoded word to 75 characters, so line-wrapping must be
          performed, even with double-byte character sets.

          Optional maxlinelen specifies the maximum length of each generated
          line, exclusive of the linesep string.  Individual lines may be longer
          than maxlinelen if a folding point cannot be found.  The first line
          will be shorter by the length of the header name plus ": " if a header
          name was specified at Header construction time.  The default value for
          maxlinelen is determined at header construction time.

          Optional splitchars is a string containing characters which should be
          given extra weight by the splitting algorithm during normal header
          wrapping.  This is in very rough support of RFC 2822's `higher level
          syntactic breaks':  split points preceded by a splitchar are preferred
          during line splitting, with the characters preferred in the order in
          which they appear in the string.  Space and tab may be included in the
          string to indicate whether preference should be given to one over the
          other as a split point when other split chars do not appear in the line
          being split.  Splitchars does not affect RFC 2047 encoded lines.

          Optional linesep is a string to be used to separate the lines of
          the value.  The default value is the most useful for typical
          Python applications, but it can be set to \r\n to produce RFC-compliant
          line separators when needed.
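
A sketch of folding a long non-ASCII header with CRLF separators and then checking that the folded value decodes back to the original text. The subject string is an arbitrary example. decode_header and make_header are the module functions documented below.

```python
from email.header import Header, decode_header, make_header

subject = ' '.join(['Grüße aus Köln'] * 6)
h = Header(subject, 'utf-8', header_name='Subject')

# Fold using \r\n separators for RFC-compliant output.
folded = h.encode(linesep='\r\n')
for line in folded.split('\r\n'):
    print(repr(line))

# Continuation lines start with the continuation whitespace (a space).
# Decoding the folded value and rebuilding a Header recovers the text.
restored = str(make_header(decode_header(folded)))
print(restored == subject)   # True
```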

HeaderParseError

Error while parsing headers.

with_traceback(...)

  Exception.with_traceback(tb) --
      set self.__traceback__ to tb and return self.


Functions

decode_header

decode_header(header)

  Decode a message header value without converting charset.

      Returns a list of (string, charset) pairs containing each of the decoded
      parts of the header.  Charset is None for non-encoded parts of the header,
      otherwise a lower-case string containing the name of the character set
      specified in the encoded string.

      header may be a string that may or may not contain RFC 2047 encoded words,
      or it may be a Header object.

      An email.errors.HeaderParseError may be raised when certain decoding errors
      occur (e.g. a base64 decoding exception).
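
A few calls showing the return shapes described above. Once a value contains an encoded word, every part comes back as bytes, including the unencoded parts, which carry None as their charset. The last call passes a single base64 character, which cannot encode a whole byte, so it triggers the HeaderParseError case.

```python
from email.errors import HeaderParseError
from email.header import decode_header

# Without encoded words, the value comes back as a single str part.
print(decode_header('plain text'))
# [('plain text', None)]

# With encoded words, all parts are bytes; plain parts get charset None.
print(decode_header('=?iso-8859-1?q?caf=E9?= au lait'))
# [(b'caf\xe9', 'iso-8859-1'), (b' au lait', None)]

# Charset names are returned in lower case.
print(decode_header('=?UTF-8?B?Y2Fmw6k=?='))
# [(b'caf\xc3\xa9', 'utf-8')]

# Malformed base64 payload: decoding fails with HeaderParseError.
err = None
try:
    decode_header('=?utf-8?b?a?=')
except HeaderParseError as exc:
    err = exc
print('HeaderParseError:', err)
```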

make_header

make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')

  Create a Header from a sequence of pairs as returned by decode_header().

      decode_header() takes a header value string and returns a sequence of
      pairs of the format (decoded_string, charset) where charset is the string
      name of the character set.

      This function takes one of those sequences of pairs and returns a Header
      instance.  Optional maxlinelen, header_name, and continuation_ws are as in
      the Header constructor.
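
A minimal sketch of the usual decode_header() and make_header() pairing, which turns a raw RFC 2047 value into readable text. The input string and header_name are illustrative. Converting the resulting Header with str() gives the decoded text, and encode() produces a re-encoded form.

```python
from email.header import decode_header, make_header

raw = '=?iso-8859-1?q?caf=E9?= au lait'
pairs = decode_header(raw)
h = make_header(pairs, header_name='Subject')

print(str(h))       # café au lait
print(h.encode())   # re-encoded using RFC 2047 rules
```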

Other members

BSPACE = b' '

EMPTYSTRING = ''

FWS = ' \t'

MAXLINELEN = 78

NL = '\n'

SPACE = ' '

SPACE8 = '        '

USASCII = us-ascii

UTF8 = utf-8

ecre = re.compile('\n  =\\?                   # literal =?\n  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset\n  \\?                    # literal ?\n  (?P<encoding>[qQbB])  # either a "q" or a "b", c, re.MULTILINE|re.VERBOSE)

fcre = re.compile('[\\041-\\176]+:$')

Modules

binascii