FLEXIBLE METADATA STORING - ELSEM REPORT
- Introduction
- Storage alternatives for semantic metadata
- OpenCyc
- Overall description of the SLOR prototype
- Integrating the large commonsense ontology
- Problems found
- Final Proposed Architecture
OPENCYC

OpenCyc (Lenat, 1995) is the open-source version of the Cyc technology, one of the most complete common-sense knowledge bases. It is currently being considered by the IEEE as a standard upper ontology.

The next version of OpenCyc (version 1.0) is expected to include the entire Cyc ontology (more than 300,000 terms), concepts in other languages (translation capabilities), and new XML features for extending the Cyc ontology with other schemas (such as those written in the newest semantic web ontology languages).

The main advantage that Cyc has over other knowledge bases is the language in which its knowledge is written, CycL, whose syntax derives from first-order predicate calculus and from Lisp. CycL is expressive enough to state common-sense rules, whereas description logics are less expressive because they are restricted subsets of first-order logic. In addition, CycL's compact syntax is better suited to inference tasks than the XML syntax used by the semantic web language specifications. The different layers of the knowledge base are built from thousands of terms and rules written in CycL.
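Because CycL inherits Lisp's parenthesized syntax, a CycL sentence can be read with a very small parser. The following Python sketch is an illustration only, not part of the Cyc or OpenCyc distribution: the function names `tokenize` and `parse` are hypothetical, and the parser handles only atoms (constants such as #$Golf, variables such as ?X, numbers) and nested parentheses, ignoring strings, quoting and comments.

```python
import re

# A token is "(", ")" or any run of characters that are neither whitespace nor parentheses.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def tokenize(text):
    return TOKEN_RE.findall(text)


def _read(tokens, pos):
    """Read one expression starting at tokens[pos]; return (expression, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == ")":
        raise SyntaxError("unexpected ')'")
    if token != "(":
        return token, pos + 1  # an atom (constant, variable or number), kept as a string
    items = []
    pos += 1
    while pos < len(tokens) and tokens[pos] != ")":
        item, pos = _read(tokens, pos)
        items.append(item)
    if pos >= len(tokens):
        raise SyntaxError("missing ')'")
    return items, pos + 1


def parse(text):
    """Parse a single CycL expression into nested Python lists of atom strings."""
    tokens = tokenize(text)
    expression, pos = _read(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("extra tokens after expression")
    return expression


print(parse("(#$isa #$Golf #$Sport)"))
# ['#$isa', '#$Golf', '#$Sport']
print(parse("(#$arity #$biologicalMother 2)"))
# ['#$arity', '#$biologicalMother', '2']
```

Keeping atoms as plain strings makes the sketch simple; a real reader would also classify them as constants, variables or literals.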

OpenCyc also has another, more restricted version written in OWL, which includes fewer terms and no rules at all. Because the taxonomy of this OWL version is so large, the most popular reasoners need too much time to carry out even simple inferences. In the fourth section, we explain all the complications found.

The following sections give a brief introduction to the knowledge base structure and the storage techniques.

Knowledge Base Schema

A common-sense knowledge base is a vast taxonomy of concepts and relations. OpenCyc has a pyramidal, layered structure that ranges from very abstract concepts and the relationships between them down to facts about particular individuals (figure 5):


Figure 5 - OpenCyc Knowledge Base Layers

- Upper Ontology: represents very general relations between very general concepts. The Upper Ontology doesn’t say much about the world at all.

- Core Theories: represent general facts about space, time, and causality. These are the theories that are essential to almost all common-sense reasoning.

- Domain-Specific Theories: These theories apply to special areas of interest like military movement, the propagation of diseases, finance, chemistry, etc.

- Facts: These are statements about particular individuals in the world.

OpenCyc Ontology Storage

As mentioned before, OpenCyc uses CycL to represent its knowledge. Although natural language is more expressive than CycL, it does not support automated reasoning and poses particular problems when the knowledge has to be stored. In contrast, CycL is a logic-based computer language that supports automated inference without losing the expressiveness of a first-order logic model. Not everything can be described in first-order logic, but that problem is outside the scope of these technical notes; for a detailed discussion, see (CYCL ref).

The OpenCyc knowledge base provides a large universe of discourse, described through terms, predicates, facts and rules combined into well-formed formulas.

a) Terms:

- Constants can denote individuals, collections, or collections of collections, e.g., the individuals #$GeorgeWBush and #$Sudan, or the collection #$WorldLeader.

- Functions take arguments and return results, e.g., (#$PresidentFn #$France) returns #$JacquesChirac, and (#$GroupFn #$Person) denotes the collection of groups whose members are persons (groups of Americans, smokers, athletes, etc.).

- Variables, e.g., ?X or ?Y, range over the values in the universe of discourse. A variable is a term that can be used as an argument of predicates or functions.
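The three kinds of terms can be modeled as a small data structure. This Python sketch is a hypothetical in-memory representation (the class names `Constant`, `Variable` and `FunctionTerm` are illustrative, not an OpenCyc API); it prints terms back in CycL notation and checks whether a term contains any variables.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Constant:
    name: str  # stored without the "#$" prefix

    def __str__(self):
        return "#$" + self.name


@dataclass(frozen=True)
class Variable:
    name: str  # stored without the "?" prefix

    def __str__(self):
        return "?" + self.name


@dataclass(frozen=True)
class FunctionTerm:
    """A function applied to arguments, e.g. (#$PresidentFn #$France)."""
    function: Constant
    args: Tuple["Term", ...]

    def __str__(self):
        parts = [str(self.function)] + [str(arg) for arg in self.args]
        return "(" + " ".join(parts) + ")"


Term = Union[Constant, Variable, FunctionTerm]


def is_ground(term):
    """True if the term contains no variables."""
    if isinstance(term, Variable):
        return False
    if isinstance(term, FunctionTerm):
        return all(is_ground(arg) for arg in term.args)
    return True


president_of_france = FunctionTerm(Constant("PresidentFn"), (Constant("France"),))
print(president_of_france)             # (#$PresidentFn #$France)
print(is_ground(president_of_france))  # True
print(is_ground(FunctionTerm(Constant("PresidentFn"), (Variable("X"),))))  # False
```

The classes are frozen, so terms are hashable and can be used directly as dictionary keys or set members in a term index.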

b) Basic predicates:

- #$isa is the most basic predicate in CycL. It states that something is an instance of a collection. Everything is an instance of at least one collection.

(#$isa #$Golf #$Sport)

- #$genls states that one collection is a subcollection of another.

(#$genls #$Cat #$Mammal)
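Combining #$isa and #$genls gives inheritance: if something is an instance of a collection, it is also an instance of every collection reachable from that one through a chain of #$genls assertions (including the collection itself, since #$genls is reflexive). The Python sketch below computes this closure over a handful of illustrative assertions; the individual #$Tom and the #$Cat, #$Mammal, #$Vertebrate chain are hypothetical sample data, not extracted from the OpenCyc knowledge base.

```python
from collections import defaultdict

# Hypothetical sample assertions; names are stored without the "#$" prefix.
ISA = [("Tom", "Cat"), ("Golf", "Sport")]
GENLS = [("Cat", "Mammal"), ("Mammal", "Vertebrate")]


def build_index(pairs):
    """Map each first argument to the set of its second arguments."""
    index = defaultdict(set)
    for sub, sup in pairs:
        index[sub].add(sup)
    return index


def all_genls(collection, genls_index):
    """Every collection reachable via #$genls, including the collection itself."""
    seen = {collection}
    stack = [collection]
    while stack:
        current = stack.pop()
        for sup in genls_index.get(current, ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def all_isa(term, isa_index, genls_index):
    """Every collection `term` is an instance of, inherited through #$genls."""
    result = set()
    for collection in isa_index.get(term, ()):
        result |= all_genls(collection, genls_index)
    return result


isa_index = build_index(ISA)
genls_index = build_index(GENLS)
print(sorted(all_isa("Tom", isa_index, genls_index)))
# ['Cat', 'Mammal', 'Vertebrate']
```

The depth-first search keeps a `seen` set, so it also terminates if the #$genls graph contains cycles.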

CycL also has predicates that describe the syntactic and semantic conditions for writing well-formed sentences. These are #$arity and #$argNIsa (that is, #$arg1Isa, #$arg2Isa, and so on):

- #$arity states the number of arguments that a predicate must take, for example:

(#$arity #$biologicalMother 2)

- #$argNIsa constrains the type of the Nth argument of a predicate, for example:

(#$arg1Isa #$biologicalMother #$Animal)
(#$arg2Isa #$biologicalMother #$FemaleAnimal)
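These declarations are enough to reject ill-formed sentences before they are stored. The following Python sketch checks a sentence against the #$arity and #$argNIsa declarations for #$biologicalMother given above. The type table `TYPES` is hypothetical sample data, assumed to be already closed under #$genls, and `check_sentence` is an illustration, not how the Cyc engine performs its well-formedness checks.

```python
# Declarations taken from the text above (names without the "#$" prefix).
ARITY = {"biologicalMother": 2}
ARG_ISA = {
    ("biologicalMother", 1): "Animal",
    ("biologicalMother", 2): "FemaleAnimal",
}

# Hypothetical type information; each set is assumed to already include
# every collection reachable through #$genls.
TYPES = {
    "Anne": {"Person", "Animal"},
    "Luisa": {"Person", "Animal", "FemaleAnimal"},
    "Sudan": {"Country"},
}


def check_sentence(predicate, args):
    """Return the list of well-formedness violations (empty if the sentence is well formed)."""
    if predicate not in ARITY:
        return [f"unknown predicate #${predicate}"]
    errors = []
    if len(args) != ARITY[predicate]:
        errors.append(f"#${predicate} takes {ARITY[predicate]} arguments, got {len(args)}")
    for position, arg in enumerate(args, start=1):
        required = ARG_ISA.get((predicate, position))
        if required is not None and required not in TYPES.get(arg, set()):
            errors.append(f"argument {position} (#${arg}) is not known to be a #${required}")
    return errors


print(check_sentence("biologicalMother", ["Anne", "Luisa"]))  # []
print(check_sentence("biologicalMother", ["Luisa", "Sudan"]))
# ['argument 2 (#$Sudan) is not known to be a #$FemaleAnimal']
```

Returning every violation, rather than stopping at the first one, makes it easier to report all problems in a sentence at once.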

CycL has a simple syntax for asserting facts: the predicates defined above are applied to constants to create new facts. For example, “Luisa is the biological mother of Anne” is written in CycL as follows, with the mother in the second argument position, as required by the #$arg2Isa constraint above:

(#$biologicalMother #$Anne #$Luisa)
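Once asserted, facts can be retrieved with query patterns that contain variables. A minimal Python sketch of such a fact store, assuming facts are flat tuples of constants (nested function terms are not handled) and that variables are strings starting with `?`. The names `assert_fact` and `ask` are hypothetical, not the OpenCyc API, and the sketch stores the child as the first argument and the mother as the second, following the #$arg2Isa #$FemaleAnimal declaration.

```python
facts = []  # each fact is a tuple such as ("#$biologicalMother", "#$Anne", "#$Luisa")


def assert_fact(*sentence):
    """Store a ground fact, ignoring duplicates."""
    if sentence not in facts:
        facts.append(sentence)


def is_variable(term):
    return term.startswith("?")


def match(pattern, fact):
    """Match a flat query pattern against a fact; return variable bindings or None."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if is_variable(p):
            # A variable that occurs twice must be bound to the same value both times.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def ask(*pattern):
    """Return one bindings dictionary per stored fact that matches the pattern."""
    results = []
    for fact in facts:
        bindings = match(pattern, fact)
        if bindings is not None:
            results.append(bindings)
    return results


assert_fact("#$biologicalMother", "#$Anne", "#$Luisa")
print(ask("#$biologicalMother", "#$Anne", "?MOTHER"))  # [{'?MOTHER': '#$Luisa'}]
```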


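c) Well-formed formulas:

Rules are written as well-formed formulas that combine quantifiers (#$forAll, #$thereExists) with logical connectives (#$implies, #$and). The first example below states that anyone who heads the government of a superpower has the status of world leader; the second states that every vertebrate has a tongue as one of its anatomical parts: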

(#$forAll ?COUNTRY
  (#$forAll ?PERSON
    (#$implies
      (#$and
        (#$isa ?COUNTRY #$Superpower)
        (#$headsGovernment ?COUNTRY ?PERSON))
      (#$hasStatus ?PERSON #$WorldLeader))))

(#$forAll ?ANIMAL
  (#$implies
    (#$isa ?ANIMAL #$Vertebrate)
    (#$thereExists ?PART
      (#$and
        (#$isa ?PART #$Tongue)
        (#$anatomicalParts ?ANIMAL ?PART)))))
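Rules like the first one can be applied mechanically by forward chaining: find every binding of ?COUNTRY and ?PERSON that satisfies the antecedent, then add the instantiated consequent as a new fact, repeating until nothing new appears. The Python sketch below does this naively. It is an illustration under simplifying assumptions (flat facts, hypothetical sample data, and only rules without an existential quantifier in the consequent), not the inference method used by Cyc. The second rule, with #$thereExists, cannot be applied this way without introducing a new term for the unnamed ?PART (for instance by Skolemization).

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, fact, bindings):
    """Extend `bindings` so that `pattern` equals `fact`; return None if impossible."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def solve(conditions, facts, bindings):
    """Yield every set of bindings that satisfies all conditions (a conjunction)."""
    if not conditions:
        yield bindings
        return
    first, rest = conditions[0], conditions[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, facts, extended)


def substitute(pattern, bindings):
    return tuple(bindings.get(t, t) for t in pattern)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    known = set(facts)
    while True:
        derived = set()
        for conditions, conclusion in rules:
            for bindings in solve(conditions, list(known), {}):
                derived.add(substitute(conclusion, bindings))
        if derived <= known:
            return known
        known |= derived


# First rule from the text: heads of government of a superpower are world leaders.
RULES = [
    (
        [("#$isa", "?COUNTRY", "#$Superpower"),
         ("#$headsGovernment", "?COUNTRY", "?PERSON")],
        ("#$hasStatus", "?PERSON", "#$WorldLeader"),
    )
]

# Hypothetical sample facts.
FACTS = {
    ("#$isa", "#$UnitedStatesOfAmerica", "#$Superpower"),
    ("#$headsGovernment", "#$UnitedStatesOfAmerica", "#$GeorgeWBush"),
    ("#$headsGovernment", "#$Sudan", "#$HypotheticalLeader"),
}

for fact in sorted(forward_chain(FACTS, RULES) - FACTS):
    print(fact)
# ('#$hasStatus', '#$GeorgeWBush', '#$WorldLeader')
```

Only the leader of the country asserted to be a #$Superpower is derived; the head of government of #$Sudan does not satisfy the first condition of the antecedent.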