Network Working Group                                        D. Cromwell
Request for Comments: 2897                               Nortel Networks
Category: Informational                                      August 2000


              Proposal for an MGCP Advanced Audio Package

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

   This document is a proposal to add a new event/signal package to the
   Media Gateway Control Protocol (MGCP) to control an ARF (Audio
   Resource Function), which may reside on a Media Gateway or a
   specialized Audio Server.

   This event package provides support for the standard IVR (Interactive
   Voice Response) operations of PlayAnnouncement, PlayCollect, and
   PlayRecord.  It supports direct references to simple audio as well as
   indirect references to simple and complex audio. It provides audio
   variables, control of audio interruptibility, digit buffer control,
   special key sequences, and support for reprompting during data
   collection.  It also provides an arbitrary number of user-defined
   qualifiers to be used in resolving complex audio structures.  For
   example, the user could define qualifiers for any or all of the
   following: language, accent, audio file format, gender, speaker, or
   customer.


Cromwell                     Informational                      [Page 1]

RFC 2897              MGCP Advanced Audio Package            August 2000


Table of Contents

   1. Introduction ................................................  2
   1.1. Background ................................................  3
   1.1.1. Sequences And Sets ......................................  3
   1.1.2. Segment Types ...........................................  4
   2. Advanced Audio Package ......................................  5
   3. Events ......................................................  5
   4. Signal Interactions .........................................  7
   5. Parameters ..................................................  7
   6. Variables ................................................... 14
   7. Selectors ................................................... 17
   8. Aliases ..................................................... 18
   9. Examples .................................................... 21
   10. Formal Syntax Description .................................. 22
   11. References ................................................. 22
   12. Formal Syntax Description .................................. 25
   13. References ................................................. 32
   14. Author's Address ........................................... 33
   15. Full Copyright Statement ................................... 34

1.  Introduction

   The following syntax supports both simple and complex audio
   structures.  A simple audio structure might be a single announcement
   such as "Welcome to Bell South's Automated Directory Assistance
   Service".  A more complex audio structure might consist of an
   announcement followed by a voice variable followed by another
   announcement, for example "There are thirty-seven minutes remaining
   on your prepaid calling card," where "There are" is a prompt, the
   number of minutes is a voice variable, and "minutes remaining on your
   prepaid calling card" is another prompt.

   It is also possible to define complex audio structures that are
   qualified by user-defined selectors such as language, audio file
   format, gender, accent, customer, or voice talent.  For instance, if
   the above example were qualified by language and accent selectors, it
   would be possible to play "There are thirty-seven minutes remaining
   on your prepaid calling card" in English spoken with a southern
   accent or in English spoken with a mid-western accent, provided that
   the audio to support this had been provisioned.

   There are two methods of specifying complex audio.  The first is to
   directly reference the individual components.  This requires a
   complete description of each component to be specified via the
   protocol.  The second method is to provision the components on the
   Audio Server as a single entity and to export a reference to that
   entity to the call agent.  In this case, only the reference (plus any
   dynamic data required, such as variable data) is passed via the
   protocol, and no specification of individual components is necessary.

   The Advanced Audio Package provides significant functionality, most
   of which is controlled via protocol parameters.  Most parameters are
   optional and, wherever possible, default to reasonable values.  An
   audio application that references provisioned, complex audio
   structures, and that takes advantage of parameter optionality and
   defaults, can specify audio events using a minimum of syntax.

1.1.  Background

   The next two sections contain background information which may be
   helpful in understanding the syntax.

1.1.1.  Sequences And Sets

   The syntax supports abstractions of set and sequence for storing and
   referencing audio data.

   A sequence is a provisioned sequence of one or more audio segments.
   Component segments are not necessarily all of the same type.  Every
   sequence is assigned a unique segment id.  On playback, a sequence id
   reference is deconstructed into its individual parts, each of which
   is played in order.

   A set is a provisioned collection of audio segments with an
   associated selector.  On playback, the selector value is resolved to
   a particular set element.  Selector types are supported by the
   syntax, but individual selector types are not defined in the syntax
   except for the pre-defined language selector; they are instead
   defined by the user (i.e., the provisioner).  A user could define one
   or more of the following selector types: language, accent, audio file
   format, gender, customer, or day of the week.  For each selector
   type, the user must define a range of valid values.  The user may
   also choose to define a default value.  At runtime, if a selector
   value is not supplied, the default value is used.

   For example, to support an application which plays a particular piece
   of audio in either English, French, or Russian, a provisioner would
   define a set with the pre-defined selector, "Lang", and would define
   three possible values for that selector, "eng", "fra", and "rus".
   The provisioner would then provision three recordings of the prompt,
   one in each language, and would associate the French recording with
   the "fra" selector value, etc.  The provisioner could also define a
   default selector value, "eng" for instance, to be used when no
   selector value is supplied.  The entire set would be assigned a
   unique segment id.


   At runtime a reference to the set with the selector set to "rus"
   would result in the Russian version of the prompt being played.  A
   reference to the set with no selector would result in the English
   version of the prompt being played since English has been set as the
   default selector value.
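
   The selector resolution described above can be sketched as follows.
   This is only an illustrative model, not part of the package
   definition: the AudioSet class, its field names, the segment id
   1001, and the recording names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AudioSet:
    """Hypothetical in-memory model of a provisioned set."""
    segment_id: int
    selector_type: str                # e.g. "Lang"
    elements: Dict[str, str]          # selector value -> recording name
    default: Optional[str] = None     # optional default selector value

    def resolve(self, selector_value: Optional[str] = None) -> str:
        """Return the element chosen by the selector, or by the default."""
        value = selector_value if selector_value is not None else self.default
        if value is None:
            raise ValueError("no selector value supplied and no default defined")
        if value not in self.elements:
            raise ValueError(
                f"{value!r} is not a valid {self.selector_type} value")
        return self.elements[value]


# The English/French/Russian prompt from the text (names are made up).
prompt = AudioSet(
    segment_id=1001,
    selector_type="Lang",
    elements={"eng": "prompt-eng", "fra": "prompt-fra", "rus": "prompt-rus"},
    default="eng",
)
```

   With this model, prompt.resolve("rus") selects the Russian
   recording, and prompt.resolve() falls back to the English one
   because "eng" is the provisioned default.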

   Nested definition of both sets and sequences is allowed, i.e., it is
   legal to define a set of sets or a sequence of sequences.  In
   addition, audio structures may also be specified by intermixing sets
   and sequences, and it is possible to specify a set of sequences or a
   sequence containing one or more set elements.  Direct or transitive
   definition of a set or sequence in terms of itself is not allowed.
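
   One way an Audio Server might expand nested sets and sequences
   while rejecting self-referential definitions is sketched below.
   The table layout, the "rec"/"seq"/"set" tags, and the expand()
   helper are assumptions made for illustration; the package does not
   prescribe any particular storage model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical provisioning table: segment id -> definition.
#   ("rec", name)                     a leaf recording
#   ("seq", [ids])                    parts played in order
#   ("set", {value: id}, default)     one element chosen by selector
Segments = Dict[int, tuple]


def expand(seg_id: int, table: Segments, selector: Optional[str] = None,
           _path: Tuple[int, ...] = ()) -> List[str]:
    """Flatten a segment reference into its ordered leaf recordings."""
    if seg_id in _path:
        # seg_id is one of its own ancestors: direct or transitive cycle.
        raise ValueError(f"segment {seg_id} is defined in terms of itself")
    path = _path + (seg_id,)
    kind, body = table[seg_id][0], table[seg_id][1:]
    if kind == "rec":
        return [body[0]]
    if kind == "seq":
        leaves: List[str] = []
        for part in body[0]:
            leaves.extend(expand(part, table, selector, path))
        return leaves
    if kind == "set":
        elements, default = body
        key = selector if selector is not None else default
        if key not in elements:
            raise ValueError(f"no element for selector value {key!r}")
        return expand(elements[key], table, selector, path)
    raise ValueError(f"unknown segment kind {kind!r}")


table: Segments = {
    1: ("rec", "there-are"),
    2: ("rec", "minutes-remaining-eng"),
    3: ("rec", "minutes-remaining-fra"),
    4: ("set", {"eng": 2, "fra": 3}, "eng"),
    5: ("seq", [1, 4]),          # a sequence containing a set
    6: ("seq", [5, 6]),          # illegal: refers to itself
}
```

   Only the chain of ancestors is tracked, so a segment that is shared
   by several parents is still accepted; only a segment that contains
   itself is rejected.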

1.1.2.  Segment Types

   The syntax supports the following segment types:

      RECORDING:  A reference by unique id to a single piece of recorded
      audio.

      RECORDINGs may be provisioned or they may be made during the
      course of a call.  A RECORDING made during the course of a call
      can be temporary or persistent.  A temporary RECORDING lasts only
      for the life of the call during which it was recorded.  A
      persistent RECORDING lasts beyond the life of the call during
      which it was recorded.

      A provisioned RECORDING may be replaced (or overridden) by a
      persistent RECORDING.  A reference to the id of the provisioned
      RECORDING will then resolve to the persistent RECORDING.  The
      overriding persistent audio can subsequently be deleted and the
      original provisioned audio can be restored.

      A provisioned RECORDING may be overridden more than once.  In this
      case, the id of the provisioned RECORDING refers to the latest
      overriding RECORDING.  When the overriding RECORDING is deleted,
      the original provisioned RECORDING is restored, even if the
      segment has been overridden multiple times.

      TEXT:  A reference to a block of text to be converted to speech or
      to be displayed on a device. Reference may be by unique id to a
      block of provisioned text or by direct specification of text in a
      parameter.

      SILENCE:  A specification of a length of silence to be played in
      units of 100 milliseconds.


      TONE:  The specification of a tone to be played by algorithmic
      generation.  Most tones, however, will probably be recorded, not
      generated.  Exact specification of this segment type is TBD.

      VARIABLE:  The specification of a voice variable by the parameters
      of type, subtype, and value.  Specification of variables is
      considered in more detail in a subsequent section of this
      document.

      SEQUENCE: A reference by unique id to a provisioned sequence of
      mixed RECORDING, TEXT, SILENCE, TONE, VARIABLE, SET, or SEQUENCE
      segments. Nested definition of SEQUENCE segments is allowed.
      Direct or transitive definition of a SEQUENCE segment in terms of
      itself is not allowed.

      SET:  A reference by unique id to a provisioned set of segments.
      The intended and recommended use of the SET type is that all
      segments in the set should be semantically equivalent; however,
      there is no real way of enforcing this restriction either in the
      protocol or in provisioning.  Every set has an associated selector
      which is used at runtime to resolve the set reference to a
      specific element of the set.  The elements of a set may be one of
      the following segment types:  RECORDING, TEXT, TONE, SILENCE,
      SEQUENCE, or SET.  Specific selector types are not specified by
      the protocol and must be defined by the user.  Nested definition
      of SET segments is allowed. Direct or transitive definition of a
      SET segment in terms of itself is not allowed.
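
   The override behavior described for the RECORDING segment type can
   be modeled as in the following sketch.  The RecordingStore class
   and its method names are hypothetical.  The sketch assumes that
   only the latest override needs to be kept, since deleting the
   override restores the original provisioned audio regardless of how
   many times the segment was overridden.

```python
from typing import Dict


class RecordingStore:
    """Hypothetical store mapping recording ids to audio."""

    def __init__(self) -> None:
        self._provisioned: Dict[int, str] = {}
        self._override: Dict[int, str] = {}

    def provision(self, rec_id: int, audio: str) -> None:
        self._provisioned[rec_id] = audio

    def override(self, rec_id: int, audio: str) -> None:
        """Replace a provisioned recording with a persistent one."""
        if rec_id not in self._provisioned:
            raise KeyError(f"recording {rec_id} is not provisioned")
        # A later override simply replaces an earlier one.
        self._override[rec_id] = audio

    def delete_override(self, rec_id: int) -> None:
        """Drop the override; the provisioned audio becomes visible again."""
        self._override.pop(rec_id, None)

    def resolve(self, rec_id: int) -> str:
        """Return the audio that a reference to rec_id currently plays."""
        if rec_id in self._override:
            return self._override[rec_id]
        return self._provisioned[rec_id]


store = RecordingStore()
store.provision(7, "greeting-provisioned")
store.override(7, "greeting-override-1")
store.override(7, "greeting-override-2")
latest = store.resolve(7)        # the latest overriding recording
store.delete_override(7)
restored = store.resolve(7)      # the original provisioned recording
```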

2.  Advanced Audio Package

   Package Name: AU

   This package defines events and signals for an ARF package for an
   Audio Server Media Gateway.

3.  Events

______________________________________________________________________
| Symbol       |   Definition           |  R   |   S       Duration   |
|______________|________________________|______|______________________|
| pa(parms)    |   PlayAnnouncement     |      |   TO      variable   |
| pc(parms)    |   PlayCollect          |      |   TO      variable   |
| pr(parms)    |   PlayRecord           |      |   TO      variable   |
| es(parm)     |   EndSignal            |      |   BR                 |
| oc(parms)    |   OperationComplete    |  x   |                      |
| of(parms)    |   OperationFailed      |  x   |                      |
|______________|________________________|______|______________________|


   The events provided by the Advanced Audio Package are defined as
   follows:

   PlayAnnouncement:
      Plays an announcement in situations where there is no need for
      interaction with the user.  Because there is no need to monitor
      the incoming media stream, this event is an efficient mechanism
      for treatments, informational announcements, etc.

   PlayCollect:
      Plays a prompt and collects DTMF digits entered by a user.  If no
      digits are entered or an invalid digit pattern is entered, the
      user may be reprompted and given another chance to enter a correct
      pattern of digits.  The following digits are supported:  0-9, *,
      #, A, B, C, D.  By default PlayCollect does not play an initial
      prompt, makes only one attempt to collect digits, and therefore
      functions as a simple Collect operation.  Various special purpose
      keys, key sequences, and key sets can be defined for use during
      the PlayCollect operation.

   PlayRecord:
      Plays a prompt and records user speech.  If the user does not
      speak, the user may be reprompted and given another chance to
      record.  By default PlayRecord does not play an initial prompt,
      makes only one attempt to record, and therefore functions as a
      simple Record operation.

   OperationComplete:
      Detected upon the successful completion of a PlayAnnouncement,
      PlayRecord, or PlayCollect signal.

   OperationFailed:
      Detected upon the failure of a PlayAnnouncement, PlayRecord, or
      PlayCollect signal.

   EndSignal:
      Gracefully terminates a PlayAnnouncement, PlayCollect, or
      PlayRecord signal.  For each of these signals, if the signal is
      terminated with the EndSignal signal, the resulting
      OperationComplete or OperationFailed event will contain all the
      parameters it would normally contain, including any collected
      digits or the recording id of the recording that was in progress
      when the EndSignal signal was received.
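
   The default and reprompting behavior of PlayCollect described above
   can be illustrated with a simplified simulation.  The play_collect()
   function, its argument names, and the way user input is supplied
   are assumptions made for this sketch; it does not distinguish
   between the different reprompt types, and it does not model timers
   or special keys.  The returned "oc" and "of" strings stand for the
   OperationComplete and OperationFailed events.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Digits supported by PlayCollect: 0-9, *, #, A, B, C, D.
VALID_DTMF = frozenset("0123456789*#ABCD")


def play_collect(entries: Iterable[str],
                 pattern_ok: Callable[[str], bool],
                 attempts: int = 1,
                 initial_prompt: Optional[str] = None,
                 reprompt: Optional[str] = None
                 ) -> Tuple[str, Optional[str], List[str]]:
    """Simulate PlayCollect; returns (event, digits, prompts played).

    entries yields what the user keys in on each attempt ("" = nothing).
    """
    played: List[str] = []
    source = iter(entries)
    for attempt in range(attempts):
        prompt = initial_prompt if attempt == 0 else reprompt
        if prompt is not None:
            played.append(prompt)
        digits = next(source, "")
        if digits and set(digits) <= VALID_DTMF and pattern_ok(digits):
            return "oc", digits, played
    return "of", None, played


# Default behavior: no prompt, one attempt (a simple Collect).
simple = play_collect(["1234"], lambda d: len(d) == 4)

# Three attempts with an initial prompt and a reprompt.
retried = play_collect(["", "12E4", "99"], lambda d: len(d) == 2,
                       attempts=3, initial_prompt="enter-code",
                       reprompt="try-again")
```

   In the second call the first attempt collects nothing and the
   second contains "E", which is not a supported digit, so the user is
   reprompted twice before the third entry is accepted.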


4.  Signal Interactions

   If an Advanced Audio Package signal is active on an endpoint and
   another signal of the same type is applied, the two signals,
   including parameters and parameter values, will be compared.  If the
   signals are identical, the signal in progress will be allowed to
   continue and the new signal will be discarded.  Because of this
   behavior, the Advanced Audio Package may not interoperate well with
   some other packages such as the Line and Trunk packages.
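
   The comparison rule can be sketched as follows.  The Signal class
   and the helper names are hypothetical, and the sketch assumes that
   the order in which parameters are written does not affect whether
   two signals are identical.  It covers only the identical-signal
   case described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Signal:
    """Hypothetical representation of an applied package signal."""
    name: str                               # "pa", "pc", or "pr"
    params: Tuple[Tuple[str, str], ...]     # sorted (parameter, value) pairs


def make_signal(name: str, **params: str) -> Signal:
    # Sorting makes the comparison independent of parameter order
    # (an assumption of this sketch).
    return Signal(name, tuple(sorted(params.items())))


def should_discard(active: Optional[Signal], new: Signal) -> bool:
    """True if the new signal duplicates the one already in progress."""
    return active is not None and active == new


active = make_signal("pa", an="39")
duplicate = make_signal("pa", an="39")
different = make_signal("pa", an="40")
```

   Here should_discard(active, duplicate) is true, so the signal in
   progress continues, while should_discard(active, different) is
   false.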

5.  Parameters

   The PlayAnnouncement, PlayRecord, and PlayCollect events may each be
   qualified by a string of parameters, most of which are optional.
   Where appropriate, parameters default to reasonable values.  The
   only event with a required parameter is PlayAnnouncement.  If a
   PlayAnnouncement event is not provided with a parameter specifying
   some form of playable audio, an error is returned to the application.


   These parameters are shown in the following table:

_______________________________________________________________________
| Parameters                                                           |
|______________________________________________________________________|
| Symbol    |  Definition                     |   pa   |  pc    |  pr  |
|___________|_________________________________|________|________|______|
| an        |  announcement                   |   x    |        |      |
| ip        |  initial prompt                 |        |  x     |  x   |
| rp        |  reprompt                       |        |  x     |  x   |
| nd        |  no digits reprompt             |        |  x     |      |
| ns        |  no speech reprompt             |        |        |  x   |
| fa