FLEXIBLE METADATA STORING - ELSEM REPORT -
- Introduction
- Storage alternatives for semantic metadata
- OpenCyc
- Overall description of the SLOR prototype
- Integrating the large commonsense ontology
- Problems found
- Final Proposed Architecture
Integrating the large commonsense ontology
The IEEE LOM standard (LTSC, 2002) defines several elements that hold different kinds of metadata describing a learning object. LOM stores all knowledge about a learning object in these fields as natural language. Because the fields are only human-readable, they are highly expressive but do not support reasoning over metadata records.

For example, the coverage field could link to ontology concepts such as “Baroque” or “Renaissance period”, inserting new facts that enable reasoning and open the way to more powerful search methods. The assertions inserted in a field must be expressed in a logic language.

Marvin Minsky said: “People have silly reasons why computers don't really think. The answer is we haven't programmed them right; they just don't have much common sense.” This assertion leads to the key question: how can common sense be built into a computer's reasoning process? To answer this question, and to give computers the capacity to understand and reason about the world as intimately as people do, Cyc Corporation has made a huge research effort over recent years.

Although the launch of the release candidate was scheduled for April 2006 and then delayed to August, it has still not been released. For that reason we have been forced to use version 0.9.

Our research explores the integration of common sense reasoning into SLOR using the OpenCyc knowledge base. In particular, we have studied two ways to integrate OpenCyc into SLOR:

• First approach: Using an OpenCyc server, connecting through the OpenCyc Java API.

• Second approach: Storing the OpenCyc knowledge base in an RDBMS, such as MySQL or Oracle, managed by RDF frameworks that use semantic technology features.

OpenCyc Java API

When the OpenCyc server starts up, it creates a new instance that opens the default port (3600) and then shows the CycL prompt “CYC(1):” at the command line.

We can connect to an OpenCyc server instance through the OpenCyc Java API. To open a new connection, we need an instance of the CycAccess class. The class constructor receives the connection parameters (server address, server port, connection protocol and type of connection) that set up the connection to the server. We can call the traceOn method if we want to enable status messages on the server console.

import org.opencyc.api.*;

public class OpencycController
{
      CycAccess _cycaccess;

      /** Creates a new instance of OpencycController */
      public OpencycController() throws java.io.IOException
      {
            // Server address, server port, connection protocol and connection type
            _cycaccess = new CycAccess("192.168.0.31",
                                       3600,
                                       CycConnection.BINARY_MODE,
                                       CycAccess.PERSISTENT_CONNECTION);

            // Enable the status messages
            _cycaccess.traceOn();
      }

      //…
}

Connect to the OpenCyc Server

We can also send CycL commands, such as ‘cyc-query’ queries, to retrieve inferred data from the knowledge base. The process has four steps, shown in the example below:

1) Declare all variables by means of the CycVariable class.
2) Build the query using the CycAccess object.
3) Link the declared variables to the query through the CycObject factory.
4) Finally, send the query through the CycAccess object and get all results into a CycList.

 

import java.util.ArrayList;
import java.util.Iterator;
import org.opencyc.api.*;
import org.opencyc.cycobject.*;

public class OpencycController
{
      CycAccess _cycaccess;

      public ArrayList getLivingLanguages() throws java.io.IOException
      {
            // Build the query and declare the variable it binds
            CycList query = _cycaccess.makeCycList("(#$isa ?X #$LivingLanguage)");
            CycVariable languageVariable = CycObjectFactory.makeCycVariable("?X");

            // Send the query in the InferencePSC microtheory
            CycList response = _cycaccess.askWithVariable(
                  query,
                  languageVariable,
                  _cycaccess.inferencePSC);

            // Collect the name of every constant bound to ?X
            ArrayList results = new ArrayList();
            Iterator iterator = response.iterator();
            while (iterator.hasNext())
            {
                  CycConstant item = (CycConstant) iterator.next();
                  results.add(item.getName());
            }
            return results;
      }

      // …
}

Simple Inference: an example of living languages retrieval from OpenCyc

We evaluated different queries and recorded their response times. The results showed a satisfactorily low average response time overall, which makes this approach a strong candidate for the final SLOR design.
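A measurement of this kind can be sketched in plain Java. The following is only an illustration of how average response times might be collected, not the harness we used: the QueryTimer class, its method names and the stand-in query are all hypothetical, and a real test would pass a call such as getLivingLanguages() instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical helper: times repeated runs of a query and averages them. */
class QueryTimer {

    /** Runs the query once and returns the elapsed wall-clock time in milliseconds. */
    static double timeOnceMillis(Callable<?> query) throws Exception {
        long start = System.nanoTime();
        query.call();
        return (System.nanoTime() - start) / 1000000.0;
    }

    /** Arithmetic mean of the samples; 0.0 for an empty list. */
    static double average(List<Double> samples) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double sample : samples) {
            sum += sample;
        }
        return sum / samples.size();
    }

    /** Times the given number of runs and returns the average in milliseconds. */
    static double averageMillis(Callable<?> query, int runs) throws Exception {
        List<Double> samples = new ArrayList<Double>();
        for (int i = 0; i < runs; i++) {
            samples.add(timeOnceMillis(query));
        }
        return average(samples);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real call against the OpenCyc server (assumption).
        Callable<Integer> standInQuery = new Callable<Integer>() {
            public Integer call() {
                return 42;
            }
        };
        System.out.println("average ms: " + averageMillis(standInQuery, 10));
    }
}
```

Averaging over several runs smooths out one-off effects such as connection warm-up, which is why the sketch takes the number of runs as a parameter.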

Are all metadata records written in CycL language?

There is no easy answer. OpenCyc has a closed model that does not bind to ontology concepts written in other languages. Although several research projects are studying how OpenCyc can interact with other thesauri, corpora or ontology schemas to improve common sense reasoning, no real implementation exists yet, in particular for interaction with Semantic Web schemas.

In order to provide a flexible schema that enables the reasoning capabilities offered by both the OpenCyc knowledge base and the Semantic Web technologies, we have studied two different approaches to store the knowledge within learning object metadata records:

- Storing the metadata records in OpenCyc format: In this approach, we write the SLOR ontology in CycL as a microtheory in the OpenCyc knowledge base, because of the high performance offered by the OpenCyc engine. The only problems are linking to other concepts and reasoning with them. Following the study of ontology mappings to OpenCyc by Reed and Lenat, we chose to link to concepts in other ontologies in the same way, even though this requires many assertions:

(synonymousExternalConcept TERM SOURCE STRING)

(overlappingExternalConcept TERM SOURCE STRING)

(extConceptOverlapsColAndReln COL RELN SOURCE STRING)

A future research direction is to extend OpenCyc inference so that it works together with ontologies written in different languages.

- A blended model between OpenCyc and OWL ontologies: We postulated the following hypothesis: if all the information is stored in the same format (RDF), we can increase the reasoning capabilities, because concepts from different ontologies can be linked and created without a mapping process. Following this hypothesis, we serialized the OpenCyc OWL file into a relational database using a Semantic Web framework (the next section explains this process in depth). Although this approach solves the problems described in the previous approach (since all knowledge is stored in the same format), the reasoning capabilities are considerably lower. The OWL language only provides a subset of first-order logic, called description logic (Baader, et al. 2003), materialized in the OWL-DL specification. However, this problem is less important because there are new rule languages (e.g. SWRL, which combines OWL and RuleML (Horrocks, et al. 2002)) that improve the reasoning capabilities by providing new inference features. To enable a machine-readable description within learning object metadata records, the knowledge must be written in a logic language. Besides, the techniques used must preserve OWL axioms. Therefore, we conclude that a sublanguage is required to create OWL relationships among the different fields of a metadata record.

We have developed a sublanguage to insert semantic expressions into LOM metadata fields. For a detailed discussion, see (Sicilia, Sánchez-Alonso, and Soto, 2005).

Figure 7 - SLOR Sublanguage: Adding semantic expressions within metadata records
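Because the first approach needs many mapping assertions, it can help to generate them rather than write them by hand. The sketch below formats the three mapping predicates listed earlier as CycL strings. It is a minimal illustration under assumptions: the ExternalConceptMapping class and its methods are hypothetical, the constants in the usage example are made up, and we assume the `#$` constant prefix used in the query example above.

```java
/** Hypothetical helper that formats the mapping assertions as CycL strings. */
class ExternalConceptMapping {

    /** Wraps the value in double quotes, escaping backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** (synonymousExternalConcept TERM SOURCE STRING) */
    static String synonymous(String term, String source, String externalName) {
        return "(#$synonymousExternalConcept " + term + " " + source + " "
                + quote(externalName) + ")";
    }

    /** (overlappingExternalConcept TERM SOURCE STRING) */
    static String overlapping(String term, String source, String externalName) {
        return "(#$overlappingExternalConcept " + term + " " + source + " "
                + quote(externalName) + ")";
    }

    /** (extConceptOverlapsColAndReln COL RELN SOURCE STRING) */
    static String overlapsColAndReln(String col, String reln, String source,
                                     String externalName) {
        return "(#$extConceptOverlapsColAndReln " + col + " " + reln + " " + source
                + " " + quote(externalName) + ")";
    }

    public static void main(String[] args) {
        // Made-up constants, used only to show the shape of the output.
        System.out.println(synonymous("#$ExampleTerm", "#$ExampleOntology", "example"));
    }
}
```

The resulting strings could then be sent to the server in the same way as the query shown earlier; how the assertion is submitted is outside the scope of this sketch.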

OpenCyc integrated in RDF Frameworks

OpenCyc can be integrated into a Semantic Web framework by using the OWL file published on the web (more than 700MB). We have worked with Jena and Sesame, two of the most widely used Semantic Web frameworks.

When a new OpenCyc OWL model is created with Jena, all triples are stored in the stmt table (Figure 8).

Figure 8 - Snapshot of Stmt Table View (OpenCyc stored in Jena)

Sesame uses a different layout: when a new OpenCyc OWL model is created, the statements are stored in its Triples table, which is tightly bound to the relational system (Figure 9).


Figure 9 - Snapshot of Triples Table View (OpenCyc stored in Sesame)
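In both frameworks the model ends up as rows of statements. The following in-memory sketch only illustrates that idea and the lookup it allows; it is not the actual Jena or Sesame schema. The TripleTableSketch class, the three column names and every identifier in the usage example are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a statement table. */
class TripleTableSketch {

    /** One row: subject, predicate and object (assumed column names). */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<Triple> rows = new ArrayList<Triple>();

    /** Appends one statement to the table. */
    void add(String subject, String predicate, String object) {
        rows.add(new Triple(subject, predicate, object));
    }

    /** Returns the objects of all rows with the given subject and predicate, in insertion order. */
    List<String> objectsOf(String subject, String predicate) {
        List<String> result = new ArrayList<String>();
        for (Triple row : rows) {
            if (row.subject.equals(subject) && row.predicate.equals(predicate)) {
                result.add(row.object);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleTableSketch table = new TripleTableSketch();
        // Made-up statements: a metadata record pointing at concepts from two ontologies.
        table.add("ex:record1", "ex:coverage", "ex:ConceptFromOntologyA");
        table.add("ex:record1", "ex:coverage", "ex:ConceptFromOntologyB");
        System.out.println(table.objectsOf("ex:record1", "ex:coverage"));
    }
}
```

Because every statement has the same shape, a record can point at concepts from different ontologies without a separate mapping step, which is the hypothesis behind the blended model.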

To create the OpenCyc model, a new connection to a database system must be opened. Focusing on Jena, the code below shows how the OpenCyc OWL file can be stored in a relational database and how it can be connected to a Jena model.

import com.hp.hpl.jena.db.*;
import com.hp.hpl.jena.rdf.model.*;

public class OntoSchemaFactory
{
      //…

      // Declared static because it is invoked as OntoSchemaFactory.createOpenCycModel()
      public static Model createOpenCycModel() throws PersistenceException
      {
            // Load the driver
            try
            {
                  Class.forName("com.mysql.jdbc.Driver");
            }
            catch (ClassNotFoundException ex)
            {
                  SystemPersistenceException errsp = new SystemPersistenceException();
                  throw (errsp);
            }

            String DB_URL2 = "jdbc:mysql://192.168.0.31/opencyc"; // URL of database server
            String DB_USER = "root";     // database user id
            String DB_PASSWD = "senux";  // database password
            String DB = "MySQL";         // database type

            // Create database connection
            IDBConnection conn2 = new DBConnection(DB_URL2, DB_USER, DB_PASSWD, DB);
            try
            {
                  ModelMaker c = ModelFactory.createModelRDBMaker(conn2);
                  Model opencyc;
                  if (!conn2.containsModel("opencyc"))
                  {
                        // Create the OpenCyc model and load the OWL file into it
                        opencyc = c.createModel("opencyc");
                        System.out.println("BUILDING THE MODEL");
                        opencyc.read("http://www.cyc.com/2004/06/04/cyc/#");
                  }
                  else
                  {
                        // Open the existing OpenCyc model
                        c.openModel("opencyc");
                  }

                  // Return a handle on the persistent model
                  opencyc = ModelRDB.open(conn2, "opencyc");
                  return (opencyc);
            }
            catch (RDFRDBException ex1)
            {
                  ConnectPersistenceException errcon = new ConnectPersistenceException();
                  throw (errcon);
            }
      }

      // …
}

Jena Persistent Framework - Connecting to the underlying database system.


Creating a Jena Model

Once the persistent connection has been established, the model parameters should be set to enable transactions, both for changing the schema and for appending new information. It is also possible to enable an OWL reasoner to discover knowledge hidden among ontological relationships. Jena provides different options to set the reasoning level used by its built-in reasoner. In particular, there are four reasoning levels, ranging from simple to complex:

Reasoning Level    Description
TRANS_INF          Uses a simple transitive reasoner.
RDFS_INF           Uses a reasoner that enables the entailed RDFS inferences, including transitive reasoning.
MICRORULE_INF      Uses a rule-based reasoner that attempts to make a useful trade-off between semantic completeness and computational efficiency. (Recommended for OWL)
RULE_INF           Uses a rule-based reasoner that adheres most closely to the semantic model. This level requires an inference engine to perform more complex inferences. Rules written in SWRL can be triggered by the inference engine to produce the results requested by the reasoner.

The following code example shows how a Jena model can be created and its initial parameters, such as the reasoning level, set:

// m is the OntModel used by the rest of the class
Model mbase = OntoSchemaFactory.createOpenCycModel();
m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, mbase);

Although this looks good in theory, and small datasets gave acceptable response times for several inferences, we have experienced a dramatic drop in performance when an OpenCyc model is attached to a reasoner. Why does this happen?

The reasoner carries out all inference processes to discover new assertions. In big models such as OpenCyc, the large number of relationships complicates the inference tasks, which increases query response times. In the first versions of the SLOR prototype we verified that this leads to unusable applications (retrieving some data takes up to 30 seconds). For this reason, section four discusses several solutions to these problems.

Retrieving inferred data

To retrieve inferred data, we have used the generic classes and interfaces provided by the Jena API. As the example below shows, we can retrieve all direct and indirect subclasses of a given class. First, we retrieve from the model the ontology class on which the inference is to be run (through the getOntClass method of the model). Second, if the reference is not null, we call the listSubClasses method of the OntClass interface to retrieve all direct and indirect subclasses (when the flag parameter is set to false; setting it to true restricts the result to direct subclasses). The listSubClasses method returns an “extended iterator” for moving through the inference results. Finally, in each iteration of the loop we add the current class (returned by the next method of the ExtendedIterator interface) to the result list.

In OpenCyc, this simple inference takes a long time, due to the large amount of data stored in the underlying persistent model.

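// Requires java.util.LinkedList, com.hp.hpl.jena.ontology.OntClass
// and com.hp.hpl.jena.util.iterator.ExtendedIterator
public void getOPENCYCDerivedClasses(String dclass, LinkedList listClasses)
{
      // Look up the ontology class in the OpenCyc namespace
      OntClass c = m.getOntClass(Schema.NSCYC + dclass);
      if (c != null)
      {
            // false: list direct and indirect subclasses
            ExtendedIterator i = c.listSubClasses(false);
            while (i.hasNext())
            {
                  OntClass subc = (OntClass)i.next();
                  // Add the subclass found in this iteration, not the parent class
                  listClasses.add(subc.getLocalName());
            }
      }
}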
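The difference between direct and indirect subclasses can be illustrated without Jena. The sketch below is a plain Java stand-in, not the Jena implementation: the SubclassSketch class and the class names in the usage example are hypothetical, and only the meaning of the boolean flag mirrors listSubClasses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a class hierarchy with direct and indirect lookups. */
class SubclassSketch {

    // Maps each class name to the names of its direct subclasses.
    private final Map<String, List<String>> directSubclasses =
            new HashMap<String, List<String>>();

    /** Records that child is a direct subclass of parent. */
    void addSubclass(String parent, String child) {
        List<String> children = directSubclasses.get(parent);
        if (children == null) {
            children = new ArrayList<String>();
            directSubclasses.put(parent, children);
        }
        children.add(child);
    }

    /** direct = true: only direct subclasses; false: all subclasses, breadth first. */
    List<String> listSubClasses(String cls, boolean direct) {
        List<String> children = directSubclasses.get(cls);
        if (children == null) {
            return new ArrayList<String>();
        }
        if (direct) {
            return new ArrayList<String>(children);
        }
        Set<String> seen = new LinkedHashSet<String>();
        Deque<String> queue = new ArrayDeque<String>(children);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            // Skip the starting class and anything already visited (guards against cycles).
            if (current.equals(cls) || !seen.add(current)) {
                continue;
            }
            List<String> next = directSubclasses.get(current);
            if (next != null) {
                queue.addAll(next);
            }
        }
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        SubclassSketch hierarchy = new SubclassSketch();
        // Made-up hierarchy: A has direct subclasses B and D; C is a subclass of B.
        hierarchy.addSubclass("A", "B");
        hierarchy.addSubclass("B", "C");
        hierarchy.addSubclass("A", "D");
        System.out.println(hierarchy.listSubClasses("A", true));
        System.out.println(hierarchy.listSubClasses("A", false));
    }
}
```

The indirect lookup has to walk the whole subtree, which gives an idea of why the same query becomes slow on a model as large as OpenCyc.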