Integrity Constraints: Semantics and Applications
Parke Godfrey, John Grant, Jarek Gryz, & Jack Minker
(Many people had already picked up the April draft, so we leave it here to avoid causing them confusion. The newer draft is cleaner and more concise.)
Databases contain knowledge as well as data. The database's schema (how the data is organized) is knowledge, and it yields constraints on the form the data must take. The relationships that must hold among the data, such as functional and inclusion dependencies, are knowledge. General rules about the world or domain, to which the database's data must always conform, are knowledge as well. Such knowledge defines the semantics of the database. It is beneficial for a database to store its knowledge explicitly, in addition to its data; this has long been recognized in relational databases. Some of the database's knowledge is captured and stored via integrity constraints, statements about which states and transitions of the database are legal. Integrity constraints (ICs) were introduced to prevent incorrect data from being entered into the database and to check the integrity of the database.
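As an illustration of the two dependency kinds named above, here is a minimal sketch in Python. The relations `employee` and `department`, their columns, and the helper names `holds_fd` and `holds_ind` are all hypothetical, chosen only for this example; relations are modeled as plain lists of tuples.

```python
# Hypothetical relations (assumed for illustration only):
#   employee(name, dept, manager)    department(dept)
employee = [
    ("ann", "sales", "dee"),
    ("bob", "sales", "dee"),
    ("cat", "audit", "eli"),
]
department = [("sales",), ("audit",)]


def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds.

    lhs and rhs are lists of column indexes into each row.
    """
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def holds_ind(rows, cols, target_rows, target_cols):
    """Return True if the inclusion dependency rows[cols] within target_rows[target_cols] holds."""
    allowed = {tuple(r[i] for i in target_cols) for r in target_rows}
    return all(tuple(r[i] for i in cols) in allowed for r in rows)


fd_ok = holds_fd(employee, lhs=[1], rhs=[2])          # dept -> manager
ind_ok = holds_ind(employee, [1], department, [0])    # employee.dept within department.dept
```

Both checks succeed on the sample data. Adding a row such as `("dan", "sales", "eli")` would break the functional dependency, since the same department would then have two managers.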
Integrity constraints have much wider applicability, however. In addition to integrity checking, their applications include semantic query optimization, cooperative query answering, combining databases in a semantically consistent manner, and view updating. It is commonly held that integrity constraints are an adequate and suitable knowledge representation in databases. Thus the types of knowledge that should be kept by databases can, and should, be written as ICs. With a standard, uniform representation for the database's knowledge, the various applications that rely on the database's semantics can all employ the same representation.
In this chapter, we consider logic databases (also called deductive databases). Logic databases employ the logic model, a subset of the first-order predicate calculus, to describe the database and queries. Records are represented as logical facts. Rules in logic databases allow implicit facts to be derived via logical deduction. (Views play such a role in relational databases.) The logic model can be extended to allow formulas as integrity constraints. The advantage of taking a logical approach to databases is that data, rules, queries, and integrity constraints can all be handled in a common framework, so that formal techniques rather than ad hoc approaches can be employed for all database applications.
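The roles of facts, rules, and integrity constraints can be sketched with a toy example in Python. This is only an illustration under stated assumptions, not the chapter's formalism: the predicates `parent` and `ancestor`, the function names, and the naive fixpoint loop are all hypothetical choices made for this sketch.

```python
# Facts: parent(X, Y) means X is a parent of Y (hypothetical data).
parent = {("ann", "bob"), ("bob", "cat")}


def derive_ancestor(parent_facts):
    """Derive implicit facts by naive bottom-up evaluation of the rules

        ancestor(X, Y) <- parent(X, Y)
        ancestor(X, Z) <- parent(X, Y), ancestor(Y, Z)

    repeating the second rule until no new fact appears.
    """
    ancestor = set(parent_facts)
    while True:
        new = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if new <= ancestor:      # fixpoint reached
            return ancestor
        ancestor |= new


def violations(ancestor):
    """Integrity constraint written as a denial: <- ancestor(X, X).

    Returns every X that witnesses a violation.
    """
    return {x for (x, y) in ancestor if x == y}


ancestor = derive_ancestor(parent)
bad = violations(ancestor)
```

Here `ancestor` contains the derived fact `("ann", "cat")`, which is stored nowhere explicitly, and `bad` is empty. Inserting the fact `("cat", "ann")` would create a cycle, and the same constraint check would then report all three individuals.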
There is a broad body of work on logic and relational databases, and a general consensus on what databases (facts and rules) and queries mean. However, there is less work on the meaning of integrity constraints, and certainly no consensus. What is meant by an IC can differ widely from system to system. For instance, one may require that ICs be consistent with the database, or require that they be provable statements, deducible from the database. Another view is that ICs really represent meta-knowledge (knowledge about the database itself) and should, perhaps, be written in an extended logic beyond first-order. The general situation becomes more complex when we permit databases to contain indefinite (disjunctive) information or to use negation. Subtle but profound differences in meaning can arise from different interpretations of ICs. In many systems, the semantics for ICs is never made clear; at times one interpretation seems intended, while at other times another is evident. This ambiguity is dangerous, and could allow a database to become corrupt in unanticipated ways.
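To see how the consistency and provability readings can disagree, consider a small propositional sketch in Python. Everything here is an assumption made for illustration: the atoms `p` and `q`, the representation of formulas as Python functions over a truth assignment, and the brute-force enumeration of models.

```python
from itertools import product

ATOMS = ["p", "q"]  # hypothetical propositional atoms


def models(theory):
    """Return all truth assignments over ATOMS satisfying every formula in theory."""
    result = []
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if all(formula(world) for formula in theory):
            result.append(world)
    return result


def consistent(db, ic):
    """Consistency reading: some model of the database also satisfies the IC."""
    return len(models(db + [ic])) > 0


def entailed(db, ic):
    """Provability reading: every model of the database satisfies the IC."""
    return all(ic(world) for world in models(db))


# An indefinite (disjunctive) database: "p or q".
indefinite_db = [lambda w: w["p"] or w["q"]]
# A definite database: "p".
definite_db = [lambda w: w["p"]]
# The integrity constraint: "p".
ic = lambda w: w["p"]

indefinite_result = (consistent(indefinite_db, ic), entailed(indefinite_db, ic))
definite_result = (consistent(definite_db, ic), entailed(definite_db, ic))
```

For the definite database the two readings agree. For the indefinite one, the constraint is consistent with the database (the model where `p` is true) but not entailed by it (the model where only `q` is true), so whether the database is judged legal depends entirely on which reading the system adopts.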