Archive for March, 2006

SWOOP - Helping to Debug Ontologies

Friday, March 24th, 2006

Henry has discovered SWOOP, a small ontology editor from MindSwap. This reminds me to point out a very nice feature of SWOOP, one that Protege does not have out of the box.

SWOOP, through its integration with Pellet, can not only compute classifications but also tell you *why* your ontology is inconsistent. This, to me, is a huge step towards being able to use ontologies in “everyday” problems.

The use case here is: You are collecting information from many different sources. You have an ontology that defines your view of the world. It’s possible that the different information sources will present you with conflicting information. Your ontology defines what is consistent, so it should be able to tell you when you have information that is *inconsistent*.

SWOOP does a pretty good job at giving you the feedback to find the source of the inconsistency. An example output is:

Inconsistent ontology Reason: Individual ErrorBoat has more than 1 values for property atLocation violating the cardinality restriction
Axioms causing the problem:
1) (ErrorBoat decommissionedAt DryDock)
2) |_(DryDock ≠ SanDiego)
3) |_(ErrorBoat deployedTo SanDiego)
4) |_(ErrorBoat rdf:type Boat)
5) |_(Boat ⊆ (= 1 atLocation))

Now that’s a simple problem, where an instance has two values for a property that is constrained to have only one.
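To make the check concrete, here is a minimal Python sketch of the same idea. It is not how Pellet works internally. It assumes something the output above doesn't show: that decommissionedAt and deployedTo are both sub-properties of atLocation. The names SUB_PROPERTY_OF, MAX_CARDINALITY, values_for and violations are made up for illustration.

```python
# Toy triples taken from the axioms listed above.
TRIPLES = [
    ("ErrorBoat", "decommissionedAt", "DryDock"),
    ("ErrorBoat", "deployedTo", "SanDiego"),
]

# Assumption: both properties are sub-properties of atLocation.
SUB_PROPERTY_OF = {
    "decommissionedAt": "atLocation",
    "deployedTo": "atLocation",
}

# Boat ⊆ (= 1 atLocation): a Boat may have only one atLocation value.
MAX_CARDINALITY = {"atLocation": 1}


def values_for(individual, prop, triples):
    """Collect distinct values of prop, counting sub-properties too."""
    found = set()
    for s, p, o in triples:
        if s == individual and (p == prop or SUB_PROPERTY_OF.get(p) == prop):
            found.add(o)
    return found


def violations(individual, triples):
    """Return the properties whose value count exceeds the allowed maximum."""
    return [
        prop
        for prop, limit in MAX_CARDINALITY.items()
        if len(values_for(individual, prop, triples)) > limit
    ]


print(violations("ErrorBoat", TRIPLES))  # ['atLocation']
```

Counting distinct names only works here because axiom 2 says DryDock ≠ SanDiego. Without that axiom, an OWL reasoner would be free to conclude the two names refer to the same place instead of reporting an error.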

What about when an individual belongs to two disjoint classes? This one is a bit more confusing if you don’t have a background in DL, but if you stare long enough at it, you might see the real cause:

Inconsistent ontology Reason: Individual ErrorBoat is forced to belong to class all(decommissionedAt, not(Location)) and its complement

Axioms causing the problem:
1) (ErrorBoat decommissionedAt DryDock)
2) |_(DryDock rdf:type Location)
3) |_(SanDiego rdf:type Location)
4) |_(ErrorBoat deployedTo SanDiego)
5) (Decommissioned ≡ (∃decommissionedAt . Location))
6) (Active ⊆ ¬ Decommissioned)
7) (Active ≡ (∃deployedTo . Location))

To explain this one better, I’ve declared that the class Active and the class Decommissioned are disjoint. And I’ve placed the instance ErrorBoat into both classes.
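The chain of reasoning in those seven axioms can be sketched in a few lines of Python. This is only a toy walk-through of this one example, not a DL reasoner, and the names DEFINITIONS, DISJOINT, inferred_classes and clashes are invented for the sketch.

```python
# Toy facts taken from the axioms listed above.
TYPES = {"DryDock": {"Location"}, "SanDiego": {"Location"}}
TRIPLES = [
    ("ErrorBoat", "decommissionedAt", "DryDock"),
    ("ErrorBoat", "deployedTo", "SanDiego"),
]

# Class ≡ (∃property . filler), written as class -> (property, filler).
DEFINITIONS = {
    "Decommissioned": ("decommissionedAt", "Location"),
    "Active": ("deployedTo", "Location"),
}

# Active ⊆ ¬Decommissioned
DISJOINT = [("Active", "Decommissioned")]


def inferred_classes(individual):
    """Classes the individual falls into via the existential definitions."""
    classes = set()
    for cls, (prop, filler) in DEFINITIONS.items():
        for s, p, o in TRIPLES:
            if s == individual and p == prop and filler in TYPES.get(o, set()):
                classes.add(cls)
    return classes


def clashes(individual):
    """Disjoint pairs that the individual belongs to at the same time."""
    classes = inferred_classes(individual)
    return [pair for pair in DISJOINT if set(pair) <= classes]


print(clashes("ErrorBoat"))  # [('Active', 'Decommissioned')]
```

Neither class was asserted directly on ErrorBoat in this sketch. Both memberships fall out of the property values, which is why the real error message is hard to read at first.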

For this reason, I like SWOOP and Pellet, though Protege still has nicer workflows.

RDF - Connecting Software and People - Google Video

Thursday, March 23rd, 2006

Found RDF - Connecting Software and People - Google Video via a post to the swig mailing list. The video is quite compressed, but you can view the original slides and a high-quality video from the original blog post by the author.

The video does a good job of grounding RDF in some real-world problem domains with real-world tools such as NetBeans (the author works for Sun).

The funny bit is, as I started to watch the video, I was impressed by the cool soundtrack it had. I thought, “Wow, this guy went all out and put in a cool trance soundtrack to match the cutting edge, futuristic feel of the semantic web.” Then I realized I had an internet radio station playing in the background.

Time for a remix of the RDF presentation video? :)

More on temporal relations in RDF

Thursday, March 23rd, 2006

Temporal relations from Henry Story on 2006-03-23 (semantic-web@w3.org from March 2006) is a post to the swig mailing list dealing with time in RDF.

The proposal in the email attempts to use N3’s ability to declare metadata about graphs, in order to say when the graph was fetched. For instance:

{ :Oven :temp "22"^^t:celsius . } :fetched [ :at "2006-03-23T10:00:00Z"^^xsd:dateTime; ] .

Of course, the problem here is there’s no formal RDF semantics for this, as there’s no formal way to write triples about a graph. This is an N3 thing. I would guess the closest thing in RDF would be reification.

This is similar to how I’ve been thinking about the problem, which is marking the time the triples were retrieved from the source. The email, though, points out that it is difficult to merge those graphs together, as you’ll end up with many values for an Oven’s :temp. Once merged, how do you query for a particular temp at a particular time?

Another way to do it is to make the measurement a formal object. I’ve blogged on this before, and it’s been heavily discussed in comments, but in short:

:Measurement :of :Oven ; :takenAt "2006-03-23T10:00:00Z"^^xsd:dateTime ; :valueRecorded "22"^^t:celsius .

A bit of OWL makes it slightly more interesting:

:TemperatureMeasurement owl:equivalentClass [ a owl:Restriction; owl:onProperty :valueRecorded; owl:someValuesFrom t:celsius ].

(Does that work? Will have to try…)
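Here is a rough Python sketch of why treating the measurement as an object makes the "temp at a particular time" query easy. It sidesteps RDF entirely and just uses plain objects. The Measurement class, the reading_at helper and the second reading are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    of: str
    taken_at: datetime
    value_celsius: float


def reading_at(measurements, subject, when):
    """Latest measurement of subject taken at or before when, or None."""
    candidates = [
        m for m in measurements if m.of == subject and m.taken_at <= when
    ]
    return max(candidates, key=lambda m: m.taken_at, default=None)


utc = timezone.utc
readings = [
    Measurement("Oven", datetime(2006, 3, 23, 10, 0, tzinfo=utc), 22.0),
    # Hypothetical second reading, not from the email.
    Measurement("Oven", datetime(2006, 3, 23, 11, 0, tzinfo=utc), 180.0),
]

hit = reading_at(readings, "Oven", datetime(2006, 3, 23, 10, 30, tzinfo=utc))
print(hit.value_celsius)  # 22.0
```

Merging two sets of measurements is just list concatenation, and nothing ends up with "many values" for a single property, because each reading carries its own timestamp.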

Instance Store

Wednesday, March 22nd, 2006

Instance Store is a

> Java application for performing efficient and scalable Description Logic (DL) reasoning over individuals

What does this mean? It’s an attempt to build a system that can reason (using OWL, for instance) across a large set of instances.

Apparently it stores individuals in a RDBMS, such as MySQL or Oracle. It then connects that store to a reasoner, such as Racer or FaCT++. I haven’t tried it yet, but I will soon, so I’m storing this here for future reference.

Would be interesting to see if plugging this directly into Oracle 10g’s RDF store provides any benefits.

I should note that a quick glance at the code indicates that it hasn’t been touched in about two years.

Expert Spring MVC Given Away at Symantec Seminar

Wednesday, March 22nd, 2006

The J2EE Best Practice seminar put on by Symantec is giving attendees a free copy of Expert Spring MVC and Web Flow. That’s pretty exciting, but what about a free trip to London for the authors? :)

Expert Spring MVC Gets Second Printing

Tuesday, March 21st, 2006

Expert Spring MVC and Web Flow is off for a second printing. I have no idea how many were printed in the first place, but this is good news nonetheless. A big mahalo to everyone who purchased a copy. All of the typos and errors identified will be fixed.

OWL Consistency Checking

Tuesday, March 21st, 2006

ConsVISor is a nice web service that can check an OWL ontology (with individuals) for consistency. It displays any errors as easy-to-read sentences free of technical jargon. I’ve been trying to find a workflow where I can test a set of individuals against an ontology to see where the inconsistencies are, and this tool is the best so far for reporting any problems.

FaCT++, via Protege, doesn’t seem to provide any error messages. It does fail, though (correctly). Can’t figure out how to get it to tell me what and where the problem is.

Pellet, via their online demo, seems to work. However, the error messages are cryptic and seemingly misleading. I’m assuming they are correct, but they would never lead the layperson to the real problem.

Anyone else using OWL for consistency checking across individuals? What tool are you using? Can it provide useful error messages?

Response to Why we need explicit temporal labelling

Sunday, March 19th, 2006

Why we need explicit temporal labelling is an excellent new post in the continuing saga of temporal labeling in RDF. The author provides a great example of a real-world scenario: the changing value of a web page’s title. To reiterate, yesterday this triple was valid:

:page dc:title "I like Cheeses";

but today it’s now:

:page dc:title "I like Cheese";

The author asserts that there are now two triples, which would indicate that there are two titles.

Going back to my relational database roots, I don’t see how there would be two triples (unless you explicitly store two triples in your local Model). Given just the source RDF document that the triple is found in, at any one time there is at most one triple that asserts the page’s dc:title. If I’m consuming the RDF document that asserts the triple, I’m in a position to store the URI of the RDF document. When my RDF crawler hits the same RDF page again, it will simply update its local store with all new values. The old triple will be deleted and replaced by whatever new triples are asserted.
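A minimal Python sketch of that crawling strategy, assuming a simple in-memory store keyed by document URI. LocalStore and the example URI are hypothetical and not taken from any RDF library.

```python
class LocalStore:
    """Keeps the triples of each source document separately."""

    def __init__(self):
        self._by_source = {}  # document URI -> set of (s, p, o)

    def update(self, source_uri, triples):
        """Replace everything previously learned from source_uri."""
        self._by_source[source_uri] = set(triples)

    def objects(self, subject, predicate):
        """All current values for subject/predicate across every source."""
        return {
            o
            for triples in self._by_source.values()
            for s, p, o in triples
            if s == subject and p == predicate
        }


store = LocalStore()
doc = "http://example.org/page.rdf"  # hypothetical document URI
store.update(doc, [(":page", "dc:title", "I like Cheeses")])  # yesterday's crawl
store.update(doc, [(":page", "dc:title", "I like Cheese")])   # today's crawl
print(store.objects(":page", "dc:title"))  # {'I like Cheese'}
```

Because the old triples are dropped per document, a second document that asserts its own title for the same page would still show up as a second value. Only re-crawls of the same source overwrite each other.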

Of course, that’s one strategy for crawling/consuming RDF documents. But it does remove the need to attach arbitrary metadata to triples just to attach a timestamp. I believe that if we let time into the model, it won’t stop there. We have reification for saying things *about* statements. And reification has a bad rap mainly because of the syntax, not the model.

In any case, the use case of a web page’s title changing over time is excellent, but correctly modeling it doesn’t require a new addition to the RDF model. You can store the time you received the RDF document that asserted the triple, you can use reification to say what time the statement was asserted, or you can model explicitly that titles have a date at which they were said. Heck, nothing stops you from adding your own reifications to the triples you just downloaded.

I want to talk about one statement the blog post made:

> In the current model, I would end up with two titles for this article. While technically correct, it is intuitively wrong - and that difference is what holds back RDF for most developers. They expect to see a single title with the updated value.

Developers don’t always expect to see a single value for the title. What if someone says “I want to know what the title for the web page was two weeks ago?” In other words, it’s all in how you look at the data and what you’re trying to see. If all you care about is the *now*, then track where triples came from (the original RDF document) and consistently update it. Delete all old triples from the original document when you do an update.
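If you do care about "two weeks ago", one option is to keep each crawl as a dated snapshot instead of overwriting. Here is a rough Python sketch under that assumption. The function names and crawl times are invented. Only the two titles come from the post being discussed.

```python
from datetime import datetime

# (time the document was fetched, triples it asserted at that time)
snapshots = []


def record(fetched_at, triples):
    """Remember what the document said when it was fetched."""
    snapshots.append((fetched_at, set(triples)))
    snapshots.sort(key=lambda snap: snap[0])


def value_as_of(subject, predicate, when):
    """Value from the newest snapshot fetched at or before when, or None."""
    current = None
    for fetched_at, triples in snapshots:
        if fetched_at > when:
            break
        current = triples
    for s, p, o in current or ():
        if s == subject and p == predicate:
            return o
    return None


# Hypothetical crawl times.
record(datetime(2006, 3, 18, 9, 0), [(":page", "dc:title", "I like Cheeses")])
record(datetime(2006, 3, 19, 9, 0), [(":page", "dc:title", "I like Cheese")])

print(value_as_of(":page", "dc:title", datetime(2006, 3, 18, 12, 0)))  # I like Cheeses
print(value_as_of(":page", "dc:title", datetime(2006, 3, 19, 12, 0)))  # I like Cheese
```

The time lives on the snapshot of the document, not on the individual triples, so nothing new is needed in the RDF model itself.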

Maybe this points out that an RDF triple is pretty bare all alone, and tracking its source document is pretty important.

On the semantic web, you can’t un-say something, and that’s part of this whole problem. If I can’t un-say something, how do I say, “This thing I just said, well, it’s no longer true.” Attaching a timestamp doesn’t really help to un-say anything, because there’s no semantics of TTL to the timestamp. Just because there’s a timestamp of yesterday on a triple doesn’t mean that today that triple is invalid.

The bigger question I have is, why don’t I ever have this problem of temporal labeling when writing relational database applications? When I need time as explicit data, I put it into the relational model (usually as a created_on, updated_at, performed_on, etc.). If time isn’t important to the data, it’s assumed that whatever is in the database is the truth as of now.

The web has a nice way to declare whether resource representations can be cached, and therefore whether you can trust the data inside the representation for longer than the moment you received the document. If I receive an RDF document whose HTTP headers say not to cache it, then I’d better treat the triples inside the document as only truthful for *now*. For if I try to query the triples again from a local cache, I’d better understand that the values might have been updated at the source Resource. So what is the relationship between a triple, the document it’s in, and the HTTP headers sent with the document?
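One possible answer, sketched in Python: use the Cache-Control header to decide how long the triples from a document can be treated as current. This is a deliberate simplification. It only looks at no-store, no-cache and max-age, and it treats no-cache as "good only for now", where a real HTTP cache would revalidate instead. trusted_until is a made-up helper name.

```python
import re
from datetime import datetime, timedelta, timezone


def trusted_until(fetched_at, cache_control):
    """When triples from a document should stop being treated as current.

    Only looks at no-store, no-cache and max-age; a real HTTP cache also
    has to consider Expires, Age, validators and more.
    """
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return fetched_at  # only good for "now"
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return fetched_at + timedelta(seconds=int(match.group(1)))
    return fetched_at  # no freshness information: be conservative


fetched = datetime(2006, 3, 19, 12, 0, tzinfo=timezone.utc)
print(trusted_until(fetched, "max-age=3600"))  # 2006-03-19 13:00:00+00:00
print(trusted_until(fetched, "no-cache") == fetched)  # True
```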

Wow, got off track there.

Computer Networks: The Heralds of Resource Sharing - Google Video

Sunday, March 19th, 2006

Computer Networks: The Heralds of Resource Sharing is a 1972 documentary on ARPAnet. It’s a really interesting look back at the initial thinking about computer networks, especially in contrast to what we take for granted today. It includes such quotes as “programming is fun.” Amen to that. Another excellent quote: “We should deal with information, not the paper it is written on.” Watch this, and learn your roots.

Does RDF’s Model Need to Include Explicit Support for Temporal Labelling?

Friday, March 17th, 2006

John Barstow, in Visions of Aestia » Thinking about RDF-lite, requests that an RDF-lite type of proposal include:

> Formally include provenance and temporal labelling in the model without requiring reification.

I agree that provenance should be a first class citizen in the RDF world. Assuming that RDF is used on the web, and many of the triples will come from some URI, why not include support for marking a triple with where it came from? From what I can see, this is required if you ever want to start working on the Trust layer. As pointed out, you can do this now with reification, but that’s a difficult and roundabout concept to teach and implement. Most RDF systems support quads under the covers anyway, so there’s an obvious need to support Subject, Predicate, Object, Source (provenance).
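As a rough illustration of what a quad buys you, here is a small Python sketch. The Quad type, the example URIs and the idea of a hand-maintained trusted set are all assumptions for the example, not how any particular RDF store does it.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # URI of the document the triple came from


# Hypothetical statements from two hypothetical documents.
quads = [
    Quad(":Seth", "foaf:name", "Seth", "http://example.org/seth.rdf"),
    Quad(":Seth", "foaf:name", "Bob", "http://example.com/untrusted.rdf"),
]

# A crude stand-in for a Trust layer: only believe some sources.
trusted = {"http://example.org/seth.rdf"}

names = [q.obj for q in quads if q.source in trusted]
print(names)  # ['Seth']
```

With plain triples the two foaf:name values would be indistinguishable. The fourth field is what lets you decide which one to believe.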

I disagree that RDF needs to include some explicit support for temporal labelling. IMHO modeling events that need to be clarified by time is perfectly possible now, without crazy hacks.

Example: Let’s say someone asks, “What is Seth doing now?” The answer would be, “Seth is currently running.” OK, no problem.

A naive approach to modeling “Seth is currently running” would be to first create a triple like:

:Seth :is :running.

Hmm… is Seth always running? What about yesterday? I believe it’s this type of thinking that makes people think they need time and date in the model. How would you clarify that the running is only “now”, where “now” is some point in time?

Flip the triple around, and think in Nouns. The above triple doesn’t work because it’s modelling a verb (:running). Turning it around, you can model it like:

:ExerciseRun :performedBy :Seth ;
:startedAt "2006-01-03T12:23:45";
:endedAt "2006-01-03T13:21:20".

This says something to the effect of, “Seth went for a run for exercise between 12:23 and 1:21 on the 3rd of Jan.” I’ve made the verb an instance of a class here, in other words a Noun.
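The same noun-based model, sketched in Python, makes the "what is Seth doing now?" question a simple interval check. The ExerciseRun dataclass mirrors the triples above. The doing_at helper is made up for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ExerciseRun:
    performed_by: str
    started_at: datetime
    ended_at: datetime


def doing_at(events, person, when):
    """Events the person was in the middle of at the given moment."""
    return [
        e for e in events
        if e.performed_by == person and e.started_at <= when <= e.ended_at
    ]


run = ExerciseRun(
    "Seth",
    datetime(2006, 1, 3, 12, 23, 45),
    datetime(2006, 1, 3, 13, 21, 20),
)

print(len(doing_at([run], "Seth", datetime(2006, 1, 3, 12, 30))))  # 1
print(len(doing_at([run], "Seth", datetime(2006, 1, 4, 12, 30))))  # 0
```

"Now" is just another value for the when argument, so the model never has to say that Seth is running without saying when.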

The question I have is: what temporal events can’t be modeled this way?