Network Working Group                                David D. Clark (MIT)
Request for Comments: 993                           Mark L. Lambert (MIT)
Obsoletes: RFC-984                                          December 1986

PCMAIL: A Distributed Mail System for Personal Computers

1. Status of this Document

This document is a discussion of the Pcmail workstation-based distributed mail system. It is a revision of the design published in NIC RFC-984. The revision is based on discussion and comment from a variety of sources, as well as further research into the design of interactive Pcmail clients and the use of client code on machines other than IBM PCs. As this design may change, implementation of this document is not advised. Distribution of this memo is unlimited.

2. Introduction

Pcmail is a distributed mail system providing mail service to an arbitrary number of users, each of whom owns one or more workstations. Pcmail's motivation is to provide very flexible mail service to a wide variety of different workstations, ranging in power from small, resource-limited machines like IBM PCs to resource-rich (where "resources" are primarily processor speed and disk space) machines like Suns or Microvaxes. It attempts to provide limited service to resource-limited workstations while still providing full service to resource-rich machines. It is intended to work well with machines only infrequently connected to a network as well as machines permanently connected to a network. It is also designed to offer diskless workstations full mail service.

The system is divided into two halves. The first consists of a single entity called the "repository". The repository is a storage center for incoming mail. Mail for a Pcmail user can arrive externally from the Internet or internally from other repository users. The repository also maintains a stable copy of each user's mail state (this will hereafter be referred to as the user's "global mail state"). The repository is therefore typically a computer with a large amount of disk storage.

The second half of Pcmail consists of one or more "clients". Each Pcmail user may have an arbitrary number of clients, typically single-user workstations. The clients provide a user with a friendly means of accessing the user's global mail state over a network. In order to make the interaction between the repository and a user's clients more efficient, each client maintains a local copy of its user's global mail state, called the "local mail state". It is assumed that clients, possibly being small personal computers, may not always have access to a network (and therefore to the global mail state in the repository). This means that the local and global mail states may not be identical all the time, making synchronization between local and global mail states necessary.

Clients communicate with the repository via the Distributed Mail System Protocol (DMSP); the specification for this protocol appears in appendix A. The repository is therefore a DMSP server in addition to a mail end-site and storage facility. DMSP provides a complete set of mail manipulation operations ("send a message", "delete a message", "print a message", etc.). DMSP also provides special operations to allow easy synchronization between a user's global mail state and his clients' local mail states.

Particular attention has been paid to the way in which DMSP operations act on a user's mail state. All DMSP operations are failure-atomic (that is, they are guaranteed either to succeed completely, or leave the user's mail state unchanged). A client can be abruptly disconnected from the repository without leaving inconsistent or damaged mail states.

Pcmail's design has been directed by the characteristics of currently available workstations. Some workstations are fairly portable, and can be packed up and moved in the back seat of an automobile. A few are truly portable--about the size of a briefcase--and battery-powered.
Some workstations have constant access to a high-speed local-area network; Pcmail should allow for "on-line" mail delivery for these machines while at the same time providing "batch" mail delivery for other workstations that are not always connected to a network. Portable and semi-portable workstations tend to be resource-poor. A typical IBM PC has a small amount (typically less than one megabyte) of main memory and little in the way of mass storage (floppy-disk drives that can access perhaps 360 kilobytes of data). Pcmail must be able to provide machines like this with adequate mail service without hampering its performance on more resource-rich workstations. Finally, all workstations have some common characteristics that Pcmail should take advantage of. For instance, workstations are fairly inexpensive compared to the various time-shared systems that most people use for mail service. This means that people may own more than one workstation, perhaps putting a Microvax in an office and an IBM PC at home.

Pcmail's design reflects the differing characteristics of the various workstations. Since one person can own several workstations, Pcmail allows users multiple access points to their mail state. Each Pcmail user can have several client workstations, each of which can access the user's mail by communicating with the repository over a network. The clients all maintain local copies of the user's global mail state, and synchronize the local and global states using DMSP. It is also possible that some workstations will only infrequently be connected to a network (and thus be able to communicate with the repository). The Pcmail design therefore allows two modes of communication between repository and client. "Interactive mode" is used when the client is always connected to the network.
Any changes to the client's local mail state are immediately also made to the repository's global mail state, and any incoming mail is immediately transmitted from repository to client. "Batch mode" is used by clients that have infrequent access to the repository. Users manipulate the client's local mail state, queueing the changes locally. When the client is next connected to the repository, the changes are executed, and the client's local mail state is synchronized with the repository's global mail state.

Finally, the Pcmail design minimizes the effect of using a resource-poor workstation as a client. Mail messages are split into two parts: a "descriptor" and a "body". The descriptor is a capsule message summary whose length (typically about 100 bytes) is independent of the actual message length. The body is the actual message text, including an RFC-822 standard message header. While the client may not have enough storage to hold a complete set of messages, it can usually hold a complete set of descriptors, thus providing the user with at least a summary of his mail state. For clients with extremely limited resources, Pcmail allows the storage of partial sets of descriptors. Although this means the user does not have a complete local mail state, he can at least look at summaries of some messages. In the cases where the client cannot immediately store message bodies, it can always pull them over from the repository as storage becomes available.

The remainder of this document is broken up into sections discussing the following:

- The repository architecture

- DMSP, its operations, and motivation for its design

- The client architecture

- A typical DMSP session between the repository and a client

- The current Pcmail implementation

3. Repository architecture

A typical machine running repository code has a relatively powerful processor and a large amount of disk storage. It must also be a permanent network site, for two reasons.
First, clients communicate with the repository over a network, and rely on the repository's being available at any time. Second, people sending mail to repository users rely on the repository's being available to receive mail at any time.

The repository must perform several tasks. First, and most importantly, the repository must efficiently manage a potentially large number of users and their mail states. Mail must be reliably stored in a manner that makes it easy for multiple clients to access the global mail state and synchronize their local mail states with the global state. Since a large category of electronic mail is represented by bulletin boards (bboards), the repository should efficiently manage bboard mail, using a minimum of storage to store bboard messages in a manner that still allows any user subscribing to the bboard to read the mail. Second, the repository must be able to communicate efficiently with its clients. The protocol used to communicate between repository and client must be reliable and must provide operations that (1) allow typical mail manipulation, and (2) support Pcmail's distributed nature by allowing efficient synchronization between local and global mail states. Third, the repository must be able to process mail from sources outside the repository's own user community (a primary outside source is the Internet). Internet mail will arrive with a NIC RFC-822 standard message header; the recipient names in the message must be properly translated from the RFC-822 namespace into the repository's namespace.

3.1. Management of user mail state

Pcmail divides the world into a community of users. Each user is referred to by a user object.
A user object consists of a unique name, a password (which the user's clients use to authenticate themselves to the repository before manipulating a global mail state), a list of "client objects" describing those clients belonging to the user, and a list of "mailbox objects".

A client object consists of a unique name and a status. A user has one client object for every client he owns; a client cannot communicate with the repository unless it has a corresponding client object in a user's client list. Client objects therefore serve as a means of identifying valid clients to the repository. Client objects also allow the repository to manage local and global mail state synchronization; the repository associates with every global state change a list of client objects corresponding to those clients which have not recorded the global change locally.

A client's status is either "active" or "inactive". The repository defines inactive clients as those clients which have not connected to the repository within a set time period (one week in the current repository implementation). When an inactive client does connect to the repository, the repository notifies the client that it has been "reset". The repository resets a client by marking all messages in the user's mail state as having changed since the client last logged in. When the client next synchronizes with the repository, it will receive a complete copy of the repository's global mail state. A forced reset is performed on the assumption that enough global state changes occur in a week that the client would spend too much time performing an ordinary local state-global state synchronization.

Messages are stored in mailboxes. Users can have an arbitrary number of mailboxes, which serve both to store and to categorize messages. A mailbox object both names a mailbox and describes its contents.
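
The user and client objects described above, together with the inactivity reset, might be modelled roughly as follows. This is only an illustrative sketch, not the repository's actual data layout: the class names, the `last_connected`, `messages`, and `unrecorded` fields, and the `record_change` and `connect` helpers are all assumptions introduced here, and the one-week limit is simply the figure quoted for the current implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption: one week, the period the text gives for the current repository.
INACTIVITY_LIMIT = timedelta(weeks=1)


@dataclass
class ClientObject:
    name: str                 # unique name of the client
    last_connected: datetime  # hypothetical field used to derive the status

    def status(self, now: datetime) -> str:
        """Return "active" or "inactive" based on the time since last contact."""
        idle = now - self.last_connected
        return "inactive" if idle > INACTIVITY_LIMIT else "active"


@dataclass
class UserObject:
    name: str
    password: str  # kept in the clear only to keep the sketch short
    clients: dict = field(default_factory=dict)     # client name -> ClientObject
    mailboxes: list = field(default_factory=list)   # mailbox objects (unused here)
    messages: set = field(default_factory=set)      # hypothetical (mailbox, UID) keys
    unrecorded: dict = field(default_factory=dict)  # key -> clients lacking the change


def record_change(user: UserObject, key, recorded_by=()):
    """Attach to a global change the clients that have not recorded it locally."""
    user.messages.add(key)
    user.unrecorded[key] = set(user.clients) - set(recorded_by)


def connect(user: UserObject, client_name: str, now: datetime) -> bool:
    """Admit a known client; return True if it was inactive and has been reset."""
    client = user.clients.get(client_name)
    if client is None:
        raise PermissionError(f"no client object named {client_name!r}")
    was_reset = client.status(now) == "inactive"
    if was_reset:
        # A reset marks every message as changed for this client.
        for key in user.messages:
            user.unrecorded.setdefault(key, set()).add(client_name)
    client.last_connected = now
    return was_reset
```

In this sketch a client that connects after more than a week finds every message queued for it, so its next synchronization amounts to a full copy of the global mail state.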
Mailboxes are identified by a unique name; their contents are described by three numeric values. The first is the total number of messages in the mailbox, the second is the total number of unseen messages (messages that have never been seen by the user via any client) in the mailbox, and the third is the mailbox's next available message unique identifier (UID). The above information is stored in the mailbox object to allow clients to get a summary of a mailbox's contents without having to read all the messages within the mailbox.

Some mailboxes are special, in that other users may read the messages stored in them. These mailboxes are called "bulletin board mailboxes" or "bboard mailboxes". The repository uses bboard mailboxes to store bboard mail. Bboard mailboxes differ from ordinary mailboxes in the following ways:

- Their names are unique across the entire repository; for instance, only one bboard mailbox named "sf-lovers" may exist in the entire repository community. This does not preclude other users from having an ordinary mailbox named "sf-lovers".

- Subscribers to the bboard are granted read-only access to the messages in the bboard mailbox. The bboard mailbox's owner (typically the system manager) has read/update/delete access to the mailbox.

A bboard subscriber keeps track of the messages he has looked at via a bboard object. The bboard object contains the name of the bboard, its owner (the user who owns the bboard mailbox where all the messages are stored), and the UID of the first message not yet seen by the subscriber. Users gain read-only access to a bboard by "subscribing" to it; they lose that access when they "unsubscribe" from it.

Associated with each mailbox are an arbitrary number of message objects. Each message is broken into two parts--a "descriptor", which contains a summary of useful information about the message, and a "body", which is the message text itself, including its NIC RFC-822 message header.
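
As a rough illustration of the mailbox and bboard objects described above, the following sketch keeps the three mailbox counters and the subscriber's first-unseen UID. The class and method names are hypothetical, as are the choices to start UIDs at 1 and to count every new message as unseen; the text does not specify either.

```python
from dataclasses import dataclass


@dataclass
class MailboxObject:
    name: str
    total: int = 0     # total number of messages in the mailbox
    unseen: int = 0    # messages never seen by the user via any client
    next_uid: int = 1  # next available UID (assumption: UIDs start at 1)

    def add_message(self) -> int:
        """Store a new message and return the UID taken from next_uid."""
        uid = self.next_uid
        self.next_uid += 1
        self.total += 1
        self.unseen += 1
        return uid


@dataclass
class BboardObject:
    name: str                  # name of the bboard
    owner: str                 # user who owns the bboard mailbox
    first_unseen_uid: int = 1  # UID of the first message not yet seen

    def unread_uids(self, mailbox: MailboxObject) -> range:
        """Candidate UIDs the subscriber has not looked at (ignores deletions)."""
        return range(self.first_unseen_uid, mailbox.next_uid)

    def mark_read_through(self, uid: int) -> None:
        """Advance the marker past uid; it never moves backwards."""
        self.first_unseen_uid = max(self.first_unseen_uid, uid + 1)
```

Because each subscriber holds only a small bboard object while the messages stay in the owner's mailbox, a sketch like this shows how many users could read one bboard without the repository storing a copy of the messages per subscriber.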
Each message is assigned a monotonically increasing UID based on the owning mailbox's next available UID. Each mailbox has its own set of UIDs which, together with the mailbox name and user name, uniquely identify the message within the repository.

A descriptor holds the following information: the message UID, the message size in bytes and lines, four "useful" message header fields (the "date:", "to:", "from:", and "subject:" fields), and sixteen flags. These flags are given identifying numbers 0 through 15. Eight of these flags are reserved for the repository's use. Some of these are actually used by the repository, while others are simply held for informational purposes. In the current repository implementation these flags mark:

- (#0) whether the message has been deleted

- (#1) whether it has been seen

- (#2) whether it has been forwarded to the user

- (#3) whether it has been forwarded by the user

- (#4) whether it has been filed (written to a text file outside the repository)

- (#5) whether it has been printed (locally or remotely)

- (#6) whether it has been replied to

- (#7) whether it has been copied to another mailbox

The remaining eight flags are reserved for future use. Descriptors serve as an efficient means for clients to get message information without having to waste time retrieving the message from the repository.

3.2. Repository-to-RFC-822 name tr