NCSC TECHNICAL REPORT - 005 Volume 2/5 Library No. S-243,039 FOREWARD This report is the second of five companion documents to the Trusted Database Management System Interpretation of the Trusted Computer System Evaluation Criteria. The companion documents address topics that are important to the design and development of secure database management systems, and are written for database vendors, system designers, evaluators, and researchers. This report addresses entity and referential integrity issues in multilevel secure database management systems. ACKNOWLEDGMENTS The National Computer Security Center extends special recognition to the authors of this document. The initial version was written by Vinti Doshi and Sushil Jajodia of the MITRE Corporation. The final version was written by Bill Wilson and Stan Wisseman of Arca Systems, Inc. The documents in this series were produced under the guidance of Shawn P. O'Brien of the National Security Agency, LouAnna Notargiacomo and Barbara Blaustein of the MITRE Corporation, and David Wichers of Arca Systems, Inc. We wish to thank the members of the information security community who enthusiastically gave of their time and technical expertise in reviewing these documents and in providing valuable comments and suggestions. TABLE OF CONTENTS SECTION PAGE 1.0 INTRODUCTION 1 1.1 BACKGROUND AND PURPOSE 1 1.2 SCOPE 1 1.3 INTRODUCTION TO ENTITY AND REFERENTIAL INTEGRITY 2 1.4 AUDIENCES OF THIS DOCUMENT 3 1.5 ORGANIZATION OF THIS DOCUMENT 4 2.0 BACKGROUND 5 2.1 BASIC CONCEPTS OF ENTITY AND REFERENTIAL INTEGRITY 6 2.2 REFERENTIAL INTEGRITY RULES 10 2.2.1 DELETE 10 2.2.2 UPDATE of Referenced Relation 13 2.2.3 NSERT or UPDATE of Referencing Relation 17 2.2.4 Additional Rules 17 2.3 PROBLEMS WITH REFERENTIAL INTEGRITY CONSTRAINTS 17 3.0 ENTITY AND REFERENTIAL INTEGRITY IN MLS DATABASES 18 3.1 RELATED TCSEC REQUIREMENTS 18 3.2 BASIC CONCEPTS IN MULTILEVEL RELATIONS 19 3.3 THE CONCEPT OF PRIMARY KEY IN MULTILEVEL RELATIONS 21 3.4 ENTITY INTEGRITY IN MULTILEVEL RELATIONS 21 3.5 REFERENTIAL INTEGRITY IN MULTILEVEL RELATIONS 22 3.6 ENFORCEMENT OF INTEGRITY CONSTRAINTS 25 4.0 COVERT CHANNELS 27 4.1 PRIMARY KEY UNIQUENESS 27 4.2 ENFORCEMENT OF REFERENTIAL INTEGRITY RULES 28 4.2.1 Enforcement of Integrity Rules When C[FK] = C[PK] 29 4.2.2 Enforcement of Integrity Rules When C[FK] > C[PK] 30 4.3 RELATION TO DBMS ARCHITECTURE 36 5.0 SUMMARY 38 REFERENCES 39 LIST OF FIGURES FIGURE PAGE 2.1: RELATION SOD 6 2.2: RELATION PS 8 2.3: REFERENTIAL CONSTRAINTS IN RELATIONS PR, PS, AND SOD 9 2.4: RELATIONS SOD AND PS AFTER STARSHIP = "APOLLO" IS DELETED UNDER THE CASCADES-DELETE RULE 11 2.5: RELATIONS SOD AND PS AFTER STARSHIP = "APOLLO" IS DELETED UNDER THE NULLIFIES-DELETE RULE 12 2.6: RELATIONS SOD AND PS AFTER STARSHIP = "ENTERPRISE" IS UPDATED TO THE VALUE "USS ENTERPRISE" UNDER THE CASCADES-UPDATE RULE 15 2.7: RELATIONS SOD AND PS AFTER STARSHIP = "ENTERPRISE" IS UPDATED TO THE VALUE "USS ENTERPRISE" UNDER THE NULLIFIES-UPDATE RULE 16 3.1: DIFFERENT SOD VIEWS BASED ON ACCESS CLASS 21 3.2: TYPICAL RELATION INSTANCES FOR SOD AND PS 23 3.3: UNCLASSIFIED INSTANCES OF PS AND SOD RELATIONS 24 4.1: SOD AFTER INSERTION OF TUPLE CORRESPONDING TO STARSHIP = "VOYAGER" BY UNCLASSIFIED USER 28 4.2: RELATION PS AND SOD WHEN C[FK] = C[PK] 30 4.3: RELATIONS SOD AND PS WITHOUT POLYINSTANTIATION 32 4.4: RELATIONS WITH TUPLE-LEVEL GRANULARITY 34 4.5: RELATIONS WITH ATTRIBUTE-LEVEL GRANULARITY 35 LIST OF TABLES TABLE PAGE 4.1: SUMMARY OF COVERT CHANNELS IN REFERENTIAL INTEGRITY RULES 36 SECTION 1 INTRODUCTION This document is the second volume in the series of companion documents to the Trusted Database Management System Interpretation of the Trusted Computer System Evaluation Criteria [TDI 91; DoD 85]. This document examines entity and referential integrity issues in multilevel secure (MLS) database management systems and summarizes the research to date in these areas. 1.1 BACKGROUND AND PURPOSE In 1991 the National Computer Security Center published the Trusted Database Management System Interpretation (TDI) of the Trusted Computer System Evaluation Criteria (TCSEC). The TDI, however, does not address many topics that are important to the design and development of secure database management systems (DBMSs). These topics (such as inference, aggregation, and database integrity) are being addressed by ongoing research and development. Since specific techniques in these topic areas had not yet gained broad acceptance, the topics were considered inappropriate for inclusion in the TDI. The TDI is being supplemented by a series of companion documents to address these issues specific to secure DBMSs. Each companion document focuses on one topic by describing the problem, discussing the issues, and summarizing the research that has been done to date. The intent of the series is to make it clear to DBMS vendors, system designers, evaluators, and researchers what the issues are, the current approaches, their pros and cons, how they relate to a TCSEC/TDI evaluation, and what specific areas require additional research. Although some guidance may be presented, nothing contained within these documents should be interpreted as criteria. These documents assume the reader understands basic DBMS concepts and relational database terminology. A security background sufficient to use the TDI and TCSEC is also assumed; however, fundamentals are discussed whenever a common understanding is important to the discussion. 1.2 SCOPE This document addresses entity and referential integrity issues in multilevel secure DBMSs. It is the second of five volumes in the series of TDI companion documents, which includes the following documents: � Inference and Aggregation Issues in Secure Database Management Systems [Inference 96] � Entity and Referential Integrity Issues in Multilevel Secure Database Management Systems � Polyinstantiation Issues in Multilevel Secure Database Management Systems [Poly 96] � Auditing Issues in Secure Database Management Systems [Audit 96] � Discretionary Access Control Issues in High Assurance Secure Database Management Systems [DAC 96] This series of documents uses terminology from the relational model to provide a common basis for understanding the concepts presented. For most of the topics covered in this series the concepts presented should apply to most database modeling paradigms, depending on the specifics of each model. For the entity and referential integrity constraints discussed in this document, however, there are important differences between different models. This is because the primary keys in a relational DBMS represent both unique identifiers for tuples and values of attributes in the tuple. They must be protected in accordance with the sensitivity of the data they represent and this can interfere with their role as a unique identifier of tuples. By contrast, the object identifier (OID) in an Object-Oriented DBMS is not a property or attribute of the object it identifies, and does not represent a user-visible data value. Khoshafian and Copeland provide a useful taxonomy of different ways of specifying object identity including the techniques used in relational and Object- Oriented DBMSs [Khoshafian 86]. Similarly, in a relational DBMS, foreign keys represent data values and serve as pointers to other tuples. In an Object-Oriented DBMS, pointers to other objects are not used to store data values, but rather to represent the inclusion of one object within another. As in relational databases, problems may occur if a reference is deleted and some references remain in objects. The implications of these distinctions for enforcement of secrecy in an Object-Oriented DBMS require further research. This document specifically addresses relational DBMSs. This document is related to the companion documents Inference and Aggregation Issues in Secure Database Management Systems and Polyinstantiation Issues in Multilevel Secure Database Management Systems. Much of the discussion of the relationship between enforcement of integrity constraints and multilevel security centers on the potential inference channels which integrity constraints can introduce. One way to avoid these channels for entity integrity is to use polyinstantiation. This is discussed later in more detail. 1.3 INTRODUCTION TO ENTITY AND REFERENTIAL INTEGRITY Secrecy and integrity are two key concepts in the community of database security researchers and users. Secrecy refers to the protection or safety of the data against unauthorized disclosure, and integrity refers to the unauthorized alteration or destruction of data. In the wider database research community, integrity refers more generally to the correctness, accuracy, and internal consistency of data. Preserving the accuracy of information is extremely important rn any database. In the relational model, one way in which accuracy of information is preserved is by preventing updates that violate integrity constraints. Entity integrity and referential integrity are two of the most important integrity constraints. They are defined on relations and are enforced by the DBMS. Entity integrity states that no primary key attribute of a relation is allowed to have null values. In addition, the primary key must have a unique value. These properties of a primary key correspond to the fact that entities in the real world are distinguishable (i.e., they have a unique identification of some kind). In the relational model, the primary key performs the function of unique identification. Referential integrity is an interrelation integrity constraint. In general, referential integrity defines existence relationships between entities in a database that need to be maintained by the DBMS. This document defines these integrity constraints formally and presents their importance in the context of security in detail. In multilevel applications, data are classified to control disclosure. It is important to realize the necessary trade-off to be made between secrecy and integrity because of the challenges in enforcing integrity constraints across multiple secrecy levels without disclosure of information. This document defines entity and referential integrity for a standard relational DBMS and then presents methods for resolving the inherent integrity/secrecy conflict in an MLS environment. 1.4 AUDIENCES OF THIS DOCUMENT This document is targeted at four primary audiences: the security research community, database application developers/system integrators, trusted product vendors, and product evaluators. In general, this document is intended to present a framework for understanding the nature of entity and referential integrity policies or requirements and how they conflict with the need to enforce an access control policy. This framework is used to describe the research that has been done to date and the approaches they present for providing solutions. Each of the specific audiences should expect to get the following from this document: Researcher This document describes the basic problems in simultaneously enforcing integrity constraints and MAC. Some important research contributions are discussed as various topics are covered. Related research on other aspects of integrity and the relationship to secrecy is discussed in Section 3.6. For additional relevant work, the researcher should consult two other companion documents, Inference and Aggregation Issues in Secure Database Management Systems [Inference 96] and Polyinstantiation Issues in Multilevel Secure Database Management Systems [Poly 96]. Database Application Developer/System Integrator This document describes various entity and referential integrity facilities which could be provided by an MLS DBMS. It discusses the implications of these facilities relative to the tradeoffs involved in meeting a system's requirements for secrecy and integrity. Trusted Product Vendor This document describes the conflicts between Mandatory Access Controls (MAC) and multilevel entity and referential integrity constraints. It then discusses approaches to entity and referential integrity enforcement in an MLS database and the benefits and drawbacks of these approaches. In particular, it considers how the issues with enforcement of entity and referential integrity constraints in an MLS database relate to the TCSEC access control and covert channel requirements. This is discussed in the context of tuple level as well as element level labeling. This document will provide a framework for understanding specific requirements interpretations as they are developed by the National Computer Security Center. Evaluator This document describes access control and covert channel issues related to entity and referential integrity. It provides a framework for analyzing specific vendor mechanisms and understanding Technical Review Board/Interpretations Working Group decisions. 1.5 ORGANIZATION OF THIS DOCUMENT The organization of the remainder of this document is as follows: � Section 2 gives the definitions of entity and referential integrity with respect to single-level DBMSs and describes various important concepts related to referential integrity. � Section 3 defines the basic concepts of entity and referential integrity with respect to multilevel relations. � Section 4 discusses the issues associated with covert channels related to enforcing entity and referential integrity constraints. � Section 5 presents conclusions. SECTION 2 BACKGROUND Entity integrity and referential integrity are two basic integrity requirements in a relational DBMS (RDBMS). They help prevent incorrect data from being entered into the database. Entity integrity guarantees a unique representation of each entity in the database through specification of primary key attributes for each relation. Referential integrity assures that if any references exist between two or more entities, then the related entities do exist in the database. Referential integrity places restrictions on foreign key attributes. Referential integrity is one of the most important interrelation integrity constraints for the relational data model. Although its basic concepts have been discussed over the past ten years, a successful referential integrity capability for a RDBMS is still a challenging implementation problem. While referential integrity appears to be a simple concept, the operational definitions that appear in the literature are imprecise and lead to a host of anomalies. A standard relational database can be perceived by its users as a collection of relations. A relation has well-defined mathematical properties and is used to store data. Each relation has two parts: 1. A state-invariant relation schema R (A1, A2, ..., An), where each Ai is an attribute over some domain Di, 1 � i � n, which is a set of acceptable values for Ai. 2. A state-dependent relation R over R, which is a set of distinct tuples of the form (a1, a2..., an), where each element ai; is a value in domain di, 1 � i � n. The following notation and conventions are used throughout this document. In definitions, relation schemes are denoted in bold fonts (such as R, R1, R2), and the corresponding relations are denoted using plain fonts (such as R, R1, R2). However, when clear in specific examples, the same symbol is used to denote the relation schema, as well as the corresponding relation. Let X denote one or more attributes of a relation R. If t is a tuple in a relation R, t [X] denotes the projection of t onto X. This section first defines entity and referential integrity, followed by terminology related to these concepts. The examples make use of Structured Query Language (SQL) syntax [ANSI 92, 94]. The section then gives various rules for enforcing referential integrity and discusses the problems associated with referential integrity in single-level relational databases. From this understanding of integrity, we can then better discuss the conflict with enforcement of a MAC policy in Section 3. 2.1 BASIC CONCEPTS OF ENTITY AND REFERENTIAL INTEGRITY Before defining entity integrity, it is necessary to define the concept of a primary key. In simple terms, the primary key of a relation is just a unique identifier of each tuple in the relation. More precisely, a primary key for a relation R is a set of attributes PK with the following properties: 1. The value of PK uniquely identifies each tuple in the relation R. That is, for any state of R, and any tuples t1 and t2, if t1[PK] = t2[PK] then t1 = t2. 2. PK is minimal. That is, there is no proper subset of PK satisfying property 1. It is possible there is more than one set of attributes satisfying these properties. Any such set is called a candidate key. Precisely one candidate must be chosen and designated as the primary key of R. Example 1. Consider the relation schema SOD (Starship, Objective, Destination), which contains for each starship its name (Starship), the purpose of the flight (Objective), and its destination (Destination). An example relation for relation schema SOD is given in Figure 2.1. STARSHIP OBJECTIVE DESTINATION Enterprise Exploration Talos Voyager Spying Mars Apollo Exploration Moon Saratoga Mining Moon Figure 2.1: Relation SOD Since the attribute Starship uniquely identifies each tuple of the relation SOD, it is a candidate key of the relation SOD. As Starship is the only candidate key of the relation SOD, it is also the primary key of relation SOD. In SQL syntax, the attribute(s) constituting the primary key can be specified when a relation schema is created, as the following shows: CREATE TABLE SOD (Starship CHAR(20) NOT NULL, Objective CHAR(15), Destination CHAR(10), PRIMARY KEY (Starship)) Notice that the ANSI syntax uses the term "TABLE" instead of the term "relation" used in this document. The entity integrity constraint requires that all attribute values of the primary key be defined for each tuple. Entity Integrity: If PK is the primary key for relation R, then for all states of R, for every tuple t in R and every attribute A in PK, t[A] is not null. A foreign key provides a basis for a tuple to refer to another tuple. Let R1 and R2 be two relation schemas, and let R1 and R2 denote the respective relations corresponding to these schemas. Referential Integrity: Let PK denote the primary key of R2, and let FK denote one or more attributes of the relational schema R1. FK is said to be a foreign key of R1 referencing R2 if given any tuple t1 in R1, the following two requirements are met: 1. t1 [FK] is either wholly null or wholly non-null, and 2. Whenever t1[FX] is non-null, there is a tuple t2 in R2 such that t1[FK] = t2[PK]. Here R1 is the referencing relation and R2 is the referenced relation. Sometimes PK is referred to as the target value of the foreign key FK. Notice that R1 and R2 need not be distinct. Example 2. In addition to the relation schema SOD given in Example 1, consider the relation schema PS (Person_Name, Starship), which contains the name of the employee (Person_Name) and the name of the ship assigned to the employee (Starship). An example relation for PS is given in Figure 2.2. Suppose Person_Name uniquely identifies each employee in the PS relation, then Person_Name becomes the primary key for the relation schema PS. If Starship is declared a foreign key for PS with the relation schema SOD as the referenced relation, a DBMS that supports referential integrity will automatically ensure that whenever an employee is assigned a (non-null) ship, there is a matching tuple in the relation schema SOD with an identical value for the attribute Starship. Notice that employee Mr. Spock, given in Figure 2.2, has not been assigned any starship. It seems from the definition of the PS relation that nulls are allowed. In some applications it may be desirable to prohibit this situation (i.e., every employee must be assigned a starship). The referencing relation's schema definition can specify whether or not the foreign key attribute can assume a null value. Using SQL syntax, foreign keys can be specified as follows: CREATE TABLE PS Person_Name CHAR(15) NOT NULL, Starship CHAR(20), PRIMARY KEY (Person_Name), FOREIGN KEY (Starship) REFERENCES SOD (Starship)). PERSON_NAME STARSHIP James Kirk Enterprise Mr. Spock Tim McKelley Apollo Figure 2.2: Relation PS It follows from the definition of the foreign key that there is an identical matching primary key value in the referenced relation for every foreign key value in the referencing relation. It is important to maintain the integrity between the referencing values (foreign key values) and the referenced values (primary key values). Example 3. Consider the relation schema PR (Person_Name, Rank), which contains the employee name (Person_Name) and the rank of the employee (Rank). Person_Name is the primary key for the relation schema. Person_Name is also the foreign key referring to relation schema PS. The relation schema PR can be expressed in SQL as follows: CREATE TABLE PR (Person_Name CHAR(20) NOT NULL, Rank CHAR(15), PRIMARY KEY (Person_Name), FOREIGN KEY (Person_Name) REFERENCES PS (Person Name)) Here, the relation schema PR references the relation schema PS given in Example 2. In addition, the relation schema PS in Example 2 references the relation schema SOD. Since there is a referential constraint from PR to PS, and also from PS to SOD, relation schema PS becomes both a referencing relation and a referenced relation, see Figure 2.3. Figure 2.3: Referential Constraints in Relations PR, PS, and SOD This can be depicted as a referential path from PR to SOD as follows: PR > PS > SOD Referential Cycles: Closed referential paths, or referential paths that point back to the starting relation, are referred to as referential cycles. In general, the referential cycle including n relations (where n > 1) can be represented as follows [Date 86, 90]: Rn > Rn-1 > Rn-2 > ... > R2 > R1 > Rn. Referential cycles are of interest since they can cause insertion problems. In the presence of referential cycles, it is necessary that one of the following constraints be true; otherwise, it is impossible to insert the first tuple in any or each of the relations: 1. One of the foreign keys in the referential cycle must be allowed to have a null value, or 2. The constraint check should be done only at the end-of-transaction (i.e., when the transaction commits). A self-referencing relation is a special case of a referential cycle that includes exactly one relation: a relation can have multiple foreign keys corresponding to different relations. Some important concepts related to the terms defined in this section can be summarized as follows: � A primary key of a relation cannot have a null value in any tuple within that relation (entity integrity) and each primary key must be unique (property of a primary key). � Foreign keys can have null values. For example, in Figure 2.2 the employee Mr. Spock has not been assigned any starship as yet; therefore, the Starship value for Mr. Spock is null. (Note: If in a specific instance it is desired to prohibit null values for foreign keys, this can be done by using a NOT NULL clause for the attributes in the foreign key when the relation is created.) � For referential integrity, every non-null value of a foreign key must be matched by an identical primary key value in the referenced relation. But the converse need not be true; for example, starship Saratoga in Figure 2.1 has no employee assigned to it in Figure 2.2. � A relation can be a referenced relation, as well as a referencing relation. 2.2 REFERENTIAL INTEGRITY RULES Whenever two or more relations are related through referential constraints, it is necessary that references be kept consistent in the face of insertions, deletions, and updates to these relations. Date identifies several actions which can be taken to maintain consistency of references [Date 90]. Exactly which action is chosen for a particular relation depends on the behavior desired by the underlying application. These possible actions are discussed in the following subsections. 2.2.1 DELETE When the target of a foreign key is deleted it could result in a dangling reference. Some action is needed to make sure this does not happen. Four options that may be specified for the referenced relation are RESTRICTED-delete, CASCADES-delete, NULLIFIES-delete, and SET DEFALILT-delete. Each of these is explained in turn. RESTRICTED-delete If the RESTRICTED-delete option is specified, the primary key value in the referenced relation cannot be deleted as long as there exists at least one matching foreign key value in the referencing relation. The deletion of the referenced value is restricted to the case where there is no corresponding referencing value in the referencing relation. For instance, suppose an attempt is made to delete the starship Apollo from the relation SOD. Since the referencing relation PS refers to the relation SOD through the attribute Starship, under the RESTRICTED-delete rule, this deletion cannot be performed as there is a tuple in PS referencing starship Apollo. The delete is denied and the two relations would remain unchanged. The syntax for specifying the RESTRICTED-delete rule may be given as follows: CREATE TABLE PS (Person_Name CHAR(20) NOT NULL, Starship CHAR(15), PRIMARY KEY (Person_Name), FOREIGN KEY (Starship) REFERENCES SOD (Starship), ON DELETE RESTRICT) CASCADES-delete Under the CASCADES-delete option, whenever the target tuple of a foreign key is deleted, all referencing foreign key tuples are also deleted. In the previous example, if an attempt is made to delete the starship Apollo from the relation SOD, the delete would be cascaded to the relation PS and the employee tuples referencing the starship Apollo would also be deleted. After such a CASCADES-delete operation, the relations SOD and PD would change to the form given in Figure 2.4. SOD STARSHIP OBJECTIVE DESTINATION Enterprise Exploration Talos Voyager Spying Mars Saratoga Mining Rigel PS PERSON_NAME STARSHIP James Kirk Enterprise Mr. Spock Figure 2.4: Relations SOD and PS after Starship = "Apollo" is Deleted under the CASCADES-Delete Rule In SQL, the CASCADES-delete rule can be specified as follows: CREATE TABLE PS (Person_Name CHAR(20) NOT NULL, Starship CHAR(15), PRIMARY KEY (Person_Name), FOREIGN KEY (Starship) REFERENCES SOD (Starship), ON DELETE CASCADE) NULLIFIES-delete Whenever the NULLIFIES-delete option is specified for a foreign key, the referencing foreign key attribute values will be set to a null value when the tuple containing the referenced value is deleted. In the example given earlier, when starship Apollo is deleted, the foreign key value, for all tuples referencing Apollo in relation PS, is set to null. The new state of the relations is as shown in Figure 2.5. SOD STARSHIP OBJECTIVE DESTINATION Enterprise Exploration Talos Voyager Spying Mars Saratoga Mining Rigel PS PERSON_NAME STARSHIP James Kirk Enterprise Mr. Spock Tim McKelley Figure 2.5: Relations SOD and PS after Starship = "Apollo" is Deleted under the NULLIFIES-Delete Rule Before setting the referencing value to null, it is important to make sure that the relation schema allows a null value for the foreign key attribute. Otherwise, there will be a conflict. The NULLIFIES-delete rule can be expressed in SQL as follows: CREATE TABLE PS (Person_Name CHAR(20) NOT NULL, Starship CHAR(15), PRIMARY KEY (Person_Name), FOREIGN KEY (Starship) REFERENCES SOD (Starship), ON DELETE SET NULL) SET DEFAULT-delete The SQL2 [ANSI 92] and SQL3 [ANSI 94] standards provide yet another referential action for the delete operation, called SET DEFAULT. This option is similar to the NULLIFIES delete rule given above. The only difference is that the SET DEFAULT option contains a default clause that specifies the default value for the attribute. The default value may be a null or a non-null value. If the referential action specifies the SET DEFAULT option, then each referencing attribute is defined by an implicit or explicit default clause. If the default value is null, then SET DEFAULT is equivalent to a SET NULL. The SQL syntax for the SET DEFAULT-delete rule is as follows: CREATE TABLE PS (Person_Name CHAR(20) NOT NULL, Starship CHAR(15), PRIMARY KEY (Person_Name), FOREIGN KEY (Starship) REFERENCES SOD (Starship), ON DELETE SET DEFAULT < option>) The default option is specified in the