Archive for the ‘rdf’ Category

Semantic Web Talk Debrief

Friday, April 14th, 2006

The Semantic Web talk at HJUG went fairly well. About eight people attended, and a few have either used RDF or are currently using RDF and OWL. The questions from the audience were insightful, demonstrating that they were thinking critically about the technologies. Some questions were quite expected:

Q) If anyone can say anything about anything, what’s to stop someone from saying something wrong? How will I know *not* to use those incorrect assertions for my reasoning?

A) Good question! I have two answers for this. The first is that Google-esque algorithms and heuristics will appear, allowing the most-linked RDF documents to bubble to the top. That is, the more people who link to an RDF document, the more likely it is that the assertions it contains are valid for a majority of the views of the world. The second answer relies on ontologies, for they are able to determine if there are inconsistencies in the world. If someone says that cars and people are disjoint, and you have assertions in your RDF model that say X is a person and a car, then your reasoner can determine that at least one fact is incorrect. (Of course, the hand waving here is that someone has to write the ontology, and you have to have a reasoner that provides feedback in a human-readable way.)
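As a rough sketch of the second answer, here is a tiny disjointness check in plain Python. It is not a real reasoner; the class names, the individuals, and the `find_disjoint_violations` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical ontology fragment: Person and Car are declared disjoint.
DISJOINT = {frozenset({"Person", "Car"})}


def find_disjoint_violations(type_assertions, disjoint_pairs):
    """Return (individual, class_a, class_b) for each individual typed with two disjoint classes."""
    classes_of = defaultdict(set)
    for individual, cls in type_assertions:
        classes_of[individual].add(cls)

    violations = []
    for individual, classes in sorted(classes_of.items()):
        for pair in disjoint_pairs:
            # The pair is violated when both of its classes are asserted.
            if pair <= classes:
                class_a, class_b = sorted(pair)
                violations.append((individual, class_a, class_b))
    return violations


# Hypothetical assertions: X is said to be both a person and a car.
assertions = [("X", "Person"), ("X", "Car"), ("Y", "Person")]
violations = find_disjoint_violations(assertions, DISJOINT)
print(violations)
```

The check can only say that at least one of the two assertions about X is wrong, not which one. That is the same limit a reasoner has here.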

Q) What if I use foaf:interest, but someone else uses yahoo:interest? My SPARQL queries won’t work.

A) You’re right, non-reasoning RDF stores won’t know that foaf:interest and yahoo:interest are the same thing (for your view of the world). Again, ontologies are required to provide the mapping between different ontologies. *If* you have a mapping ontology, and *if* your RDF store performs reasoning, then your SPARQL queries will work when you have two URIs for the same concept.
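To make the two *ifs* concrete, here is a minimal sketch in plain Python rather than SPARQL. The triples, the mapping pairs, and the `equivalents` and `match` helpers are hypothetical; a real store would get the mapping from an ontology and apply it in its reasoner.

```python
# Hypothetical data: two vocabularies using different URIs for "interest".
TRIPLES = [
    ("alice", "foaf:interest", "rdf"),
    ("bob", "yahoo:interest", "owl"),
    ("carol", "foaf:name", "Carol"),
]

# Hypothetical mapping ontology, reduced to pairs of equivalent properties.
EQUIVALENT_PROPERTIES = [("foaf:interest", "yahoo:interest")]


def equivalents(prop, pairs):
    """Return prop plus every property declared equivalent to it, transitively."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for a, b in pairs:
            if a in found and b not in found:
                found.add(b)
                changed = True
            elif b in found and a not in found:
                found.add(a)
                changed = True
    return found


def match(triples, prop, pairs=()):
    """Return (subject, object) for triples whose predicate is prop or an equivalent."""
    props = equivalents(prop, pairs)
    return [(s, o) for s, p, o in triples if p in props]


without_mapping = match(TRIPLES, "foaf:interest")
with_mapping = match(TRIPLES, "foaf:interest", EQUIVALENT_PROPERTIES)
print(without_mapping)
print(with_mapping)
```

Without the mapping the query only sees alice. With it, bob's yahoo:interest is picked up as well.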

Q) So… what have people built with RDF?

A) The answer I gave here is that the biggest adoption of semweb technologies has been in bioinformatics, afaik. However, I couldn’t think of any business critical, production applications using RDF as a key component. I need to do more research here.

A better question might be, “What are some applications that utilize RDF/OWL that would have been very difficult to create otherwise?” And by applications, I mean business critical, production applications that customers use every day. I have a feeling that a lot of the RDF/OWL work is done for in-house, custom applications. What are those applications like? What scale have they been built out to?

All in all, a good talk. I met some people who are using Protege and Racer Pro to develop decision support applications for first responders. Don’t know if it’s a proof of concept or a deployed application yet.

The URI for RDF itself?

Tuesday, April 11th, 2006

Was wondering what the URI for RDF, *the concept*, is? There is http://www.w3.org/RDF/ but that’s the homepage for RDF, as far as I’m concerned.

Is there a httpRange-14 compatible URI for RDF *the idea*?

Is this a case of the shoemaker’s children going around barefoot?

Rx4RDF

Monday, April 3rd, 2006

Rx4RDF is an entire Python application framework for full stack integration with RDF. From storage to presentation layers, it appears that every effort was made to utilize RDF as the foundation for data representation.

ActiveRDF has some competition.

ActiveRDF: object oriented RDF in Ruby

Monday, April 3rd, 2006

ActiveRDF: object-oriented RDF in Ruby is a paper submitted to Scripting for the Semantic Web 2006. Inside, the authors discuss the challenges and successes of using RDF as a storage backend for applications, much in the same way that RDBMS are currently used. Scripting languages, such as Ruby, offer the best chance for RDF integration, because RDF is so flexible, dynamic, and often untyped. Ruby, because it is so dynamic (you can easily add methods to classes, for instance), is a good place to see where an OO scripting language and semantic web technologies intersect.

I’d like to see integration with transactions, plus explicit support for Redland’s contexts. ActiveRDF is a great step forward in learning how deep RDF should go in the application stack.

SWOOP - Helping to Debug Ontologies

Friday, March 24th, 2006

Henry has discovered SWOOP, a small ontology editor from MindSwap. This reminds me to point out a very nice feature of SWOOP, one that Protege does not have out of the box.

SWOOP, through its integration with Pellet, is able not only to compute classifications but also to tell you *why* your ontology is inconsistent. This, to me, is a huge step towards being able to use ontologies in “every day” problems.

The use case here is: You are collecting information from many different sources. You have an ontology that defines your view of the world. It’s possible that the different information sources will present you with conflicting information. Your ontology defines what is consistent, so it should be able to tell you when you have information that is *inconsistent*.

SWOOP does a pretty good job at giving you the feedback to find the source of the inconsistency. An example output is:

Inconsistent ontology Reason: Individual ErrorBoat has more than 1 values for property atLocation violating the cardinality restriction
Axioms causing the problem:
1) (ErrorBoat decommissionedAt DryDock)
2) |_(DryDock ≠ SanDiego)
3) |_(ErrorBoat deployedTo SanDiego)
4) |_(ErrorBoat rdf:type Boat)
5) |_(Boat ⊆ (= 1 atLocation))

Now that’s a simple problem, where an instance has two values for a property that is constrained to have only one.
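Here is a rough Python sketch of the logic behind that first explanation. It is not how Pellet works internally. It also assumes that decommissionedAt and deployedTo are sub-properties of atLocation, which the output above implies but does not list. The `single_value_violations` helper is hypothetical.

```python
from itertools import combinations

# Assumption (not listed in the SWOOP output): both properties are
# sub-properties of atLocation.
SUB_PROPERTY_OF = {"decommissionedAt": "atLocation", "deployedTo": "atLocation"}

FACTS = [
    ("ErrorBoat", "decommissionedAt", "DryDock"),
    ("ErrorBoat", "deployedTo", "SanDiego"),
]

# Axiom 2 from the output: DryDock and SanDiego are different individuals.
DIFFERENT_FROM = {frozenset({"DryDock", "SanDiego"})}

# Axiom 5 from the output: a Boat has exactly one atLocation.
SINGLE_VALUED = {"atLocation"}


def single_value_violations(facts, sub_property_of, different_from, single_valued):
    """Find subjects with two provably different values for a single-valued property."""
    values = {}
    for subject, prop, obj in facts:
        parent = sub_property_of.get(prop, prop)
        if parent in single_valued:
            values.setdefault((subject, parent), set()).add(obj)

    violations = []
    for (subject, prop), objs in sorted(values.items()):
        for a, b in combinations(sorted(objs), 2):
            # Two names only clash if they are declared to be different.
            if frozenset({a, b}) in different_from:
                violations.append((subject, prop, a, b))
    return violations


violations = single_value_violations(FACTS, SUB_PROPERTY_OF, DIFFERENT_FROM, SINGLE_VALUED)
print(violations)
```

Note that dropping the DryDock ≠ SanDiego axiom makes the violation disappear. OWL does not assume that two different names refer to two different things, so without that axiom a reasoner could simply conclude they are the same place.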

What about when an individual belongs to two disjoint classes? This one is a bit more confusing if you don’t have a background in DL, but if you stare long enough at it, you might see the real cause:

Inconsistent ontology Reason: Individual ErrorBoat is forced to belong to class
all(decommissionedAt, not(Location))
and its complement

Axioms causing the problem:
1) (ErrorBoat decommissionedAt DryDock)
2) |_(DryDock rdf:type Location)
3) |_(SanDiego rdf:type Location)
4) |_(ErrorBoat deployedTo SanDiego)
5) (Decommissioned ≡ (∃decommissionedAt . Location))
6) (Active ⊆ ¬ Decommissioned)
7) (Active ≡ (∃deployedTo . Location))

To explain this one better, I’ve declared that the class Active and the class Decommissioned are disjoint. And I’ve placed the instance ErrorBoat into both classes.

For this reason, I like SWOOP and Pellet. Though, Protege still has nicer workflows.

RDF - Connecting Software and People - Google Video

Thursday, March 23rd, 2006

Found RDF - Connecting Software and People - Google Video via a post to the swig mailing list. The video is quite compressed, but you can view the original slides and high quality video from the original blog post by the author.

The video does a good job of grounding RDF into some real world problem domains with real world tools such as NetBeans (the author works for Sun).

The funny bit is, as I started to watch the video, I was impressed by the cool soundtrack it had. I thought, “Wow, this guy went all out and put in a cool trance soundtrack to match the cutting edge, futuristic feel of the semantic web.” Then I realized I had an internet radio station playing in the background.

Time for a remix of the RDF presentation video? :)

More on temporal relations in RDF

Thursday, March 23rd, 2006

Temporal relations from Henry Story on 2006-03-23 (semantic-web@w3.org from March 2006) is a post to the swig mailing list dealing with time in RDF.

The proposal in the email attempts to use N3’s ability to declare metadata about graphs, in order to say when the graph was fetched. For instance:

{ :Oven :temp "22"^^t:celsius . } :fetched [ :at "2006-03-23T10:00:00Z"^^xsd:dateTime; ] .

Of course, the problem here is there’s no formal RDF semantics for this, as there’s no formal way to write triples about a graph. This is an N3 thing. I would guess the closest thing in RDF would be reification.

This is similar to how I’ve been thinking about the problem, which is marking the time the triples were retrieved from the source. The email, though, points out that it is difficult to merge those graphs together, as you’ll end up with many values for an Oven’s :temp. Once merged, how do you query for a particular temp at a particular time?

Another way to do it is to make the measurement a formal object. I’ve blogged on this before, and it’s been heavily discussed in comments, but in short:

:Measurement :of :Oven ; :takenAt "2006-03-23T10:00:00Z"^^xsd:dateTime ; :valueRecorded "22"^^t:celsius .

A bit of OWL makes it slightly more interesting:

:TemperatureMeasurement owl:equivalentClass [ a owl:Restriction; owl:onProperty :valueRecorded; owl:someValuesFrom t:celsius ].

(Does that work? Will have to try…)
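To show why making the measurement a formal object helps with the "particular temp at a particular time" question, here is a minimal sketch in plain Python. The records mirror the :of, :takenAt and :valueRecorded properties above. The second and third readings and the `value_at` helper are invented for illustration.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse an xsd:dateTime string ending in 'Z' into a timezone-aware datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Hypothetical measurements; only the first one appears in the post.
MEASUREMENTS = [
    {"of": ":Oven", "takenAt": parse_utc("2006-03-23T10:00:00Z"), "valueRecorded": 22},
    {"of": ":Oven", "takenAt": parse_utc("2006-03-23T11:00:00Z"), "valueRecorded": 180},
    {"of": ":Fridge", "takenAt": parse_utc("2006-03-23T10:00:00Z"), "valueRecorded": 4},
]


def value_at(measurements, subject, when):
    """Return the latest value recorded for subject at or before `when`, or None."""
    candidates = [
        m for m in measurements if m["of"] == subject and m["takenAt"] <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m["takenAt"])["valueRecorded"]


print(value_at(MEASUREMENTS, ":Oven", parse_utc("2006-03-23T10:30:00Z")))
```

Because every reading carries its own timestamp, merging graphs from different fetches just adds more measurements. Nothing collides on the Oven itself.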

Instance Store

Wednesday, March 22nd, 2006

Instance Store is a

> Java application for performing efficient and scalable Description Logic (DL) reasoning over individuals

What does this mean? It’s an attempt to build a system that can reason (using OWL, for instance) across a large set of instances.

Apparently it stores individuals in a RDBMS, such as MySQL or Oracle. It then connects that store to a reasoner, such as Racer or FaCT++. I haven’t tried it yet, but I will soon, so I’m storing this here for future reference.

Would be interesting to see if plugging this directly into Oracle 10g’s RDF store provides any benefits.

I should note that a quick glance at the code indicates that it hasn’t been touched in about two years.

OWL Consistency Checking

Tuesday, March 21st, 2006

ConsVISor is a nice web service that can check an OWL ontology (with individuals) for consistency. It displays any errors with easy to read non-technical-jargon sentences. I’ve been trying to find a workflow where I can test a set of individuals against an ontology to see where the inconsistencies are, and this tool is the best so far for reporting any problems.

FaCT++, via Protege, doesn’t seem to provide any error messages. It does fail, though (correctly). Can’t figure out how to get it to tell me what and where the problem is.

Pellet, via their online demo, seems to work. However, the error messages are cryptic and seemingly misleading. I’m assuming they are correct, but they would never lead the layperson to the real problem.

Anyone else using OWL for consistency checking across individuals? What tool are you using? Can it provide useful error messages?

Response to Why we need explicit temporal labelling

Sunday, March 19th, 2006

Why we need explicit temporal labelling is an excellent new post on the continuing saga about temporal labeling in RDF. The author provides a great example of a real world scenario: the changing value of a web page’s title. To reiterate, yesterday this triple was valid:

:page dc:title “I like Cheeses”;

but today it’s now:

:page dc:title “I like Cheese”;

The author asserts that there are now two triples, which would indicate that there are two titles.

Going back to my relational database roots, I don’t see how there would be two triples (unless you explicitly store two triples in your local Model). Given just the source RDF document that the triple is found in, at any one time there is at most one triple that asserts the page’s dc:title. If I’m consuming the RDF document that asserts the triple, I’m in a position to store the URI of the RDF document. When my RDF crawler hits the same RDF page again, it will simply update its local store with all the new values. The old triple will be deleted and replaced by whatever new triples are asserted.
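A minimal sketch of that crawling strategy, in plain Python. The `SourceTrackedStore` class and the example document URI are hypothetical. The only idea it shows is keying triples by the document they came from and replacing them wholesale on each re-crawl.

```python
class SourceTrackedStore:
    """Toy triple store that remembers which document each triple came from."""

    def __init__(self):
        self._by_source = {}

    def update(self, source_uri, triples):
        """Replace every triple previously loaded from source_uri."""
        self._by_source[source_uri] = set(triples)

    def triples(self):
        """Return the union of the triples from all known documents."""
        merged = set()
        for source_triples in self._by_source.values():
            merged |= source_triples
        return merged

    def objects(self, subject, predicate):
        """Return the sorted objects asserted for (subject, predicate)."""
        return sorted(
            o for s, p, o in self.triples() if s == subject and p == predicate
        )


store = SourceTrackedStore()
doc = "http://example.org/page.rdf"  # hypothetical source document

# Yesterday's crawl.
store.update(doc, [(":page", "dc:title", "I like Cheeses")])
# Today's crawl of the same document replaces the old triples.
store.update(doc, [(":page", "dc:title", "I like Cheese")])

titles = store.objects(":page", "dc:title")
print(titles)
```

After the second crawl there is a single title, because the old triple went away with the old copy of the document.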

Of course, that’s one strategy for crawling/consuming RDF documents. But it does remove the need to attach arbitrary metadata to triples just to attach a timestamp. I believe that if we let time into the model, it won’t stop there. We have reification for saying things *about* statements. And reification has a bad rap mainly because of the syntax, not the model.

In any case, the use case of a web page’s title changing over time is excellent, but correctly modeling it doesn’t require a new addition to the RDF model. You can store the time you received the RDF document that asserted the triple, you can use reification to say what time the statement was asserted, or you can model explicitly that titles have a date at which they were said. Heck, nothing stops you from adding your own reifications to the triples you just downloaded.
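As a sketch of the reification option, here is how the two title statements could be kept side by side with a timestamp on each. rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard reification vocabulary. The ex:assertedAt property, the blank node ids, the timestamps, and the helper functions are assumptions made up for the example.

```python
def reify(statement_id, triple, asserted_at):
    """Describe a triple as an rdf:Statement and attach a timestamp to it."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
        # ex:assertedAt is a hypothetical property, not part of RDF.
        (statement_id, "ex:assertedAt", asserted_at),
    ]


def object_as_of(reified_statements, when):
    """Return the object of the latest statement asserted at or before `when`."""
    best = None
    for statement in reified_statements:
        fields = {p: o for _, p, o in statement}
        stamp = fields["ex:assertedAt"]
        # Timestamps in this fixed UTC format compare correctly as strings.
        if stamp <= when and (best is None or stamp > best[0]):
            best = (stamp, fields["rdf:object"])
    return None if best is None else best[1]


# Hypothetical timestamps for "yesterday" and "today".
yesterday = reify("_:s1", (":page", "dc:title", "I like Cheeses"), "2006-03-18T09:00:00Z")
today = reify("_:s2", (":page", "dc:title", "I like Cheese"), "2006-03-19T09:00:00Z")

print(object_as_of([yesterday, today], "2006-03-18T12:00:00Z"))
```

Nothing here changes the RDF model. The time lives in ordinary triples about the statements, so the consumer decides whether to ask for the current title or an older one.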

I want to talk about one statement the blog post made:

> In the current model, I would end up with two titles for this article. While technically correct, it is intuitively wrong - and that difference is what holds back RDF for most developers. They expect to see a single title with the updated value.

Developers don’t always expect to see a single value for the title. What if someone says “I want to know what the title for the web page was two weeks ago?” In other words, it’s all in how you look at the data and what you’re trying to see. If all you care about is the *now*, then track where triples came from (the original RDF document) and consistently update it. Delete all old triples from the original document when you do an update.

Maybe this points out that an RDF triple is pretty bare all alone, and tracking its source document is pretty important.

On the semantic web, you can’t un-say something, and that’s part of this whole problem. If I can’t un-say something, how do I say, “This thing I just said, well, it’s no longer true.” Attaching a timestamp doesn’t really help to un-say anything, because there’s no semantics of TTL to the timestamp. Just because there’s a timestamp of yesterday on a triple doesn’t mean that today that triple is invalid.

The bigger question I have is, why don’t I ever have this problem of temporal labeling when writing relational database applications? When I need time as explicit data, I put it into the relational model (usually as a created_on, updated_at, performed_on, etc). If time isn’t important to the data, it’s assumed that whatever is in the database is the truth as of now.

The web has a nice way to declare whether resource representations can be cached, and therefore whether you can trust the data inside the representation for longer than the moment you received the document. If I receive an RDF document whose HTTP headers say not to cache it, then I had better treat the triples inside the document as only truthful for *now*. For if I try to query the triples again from a local cache, I had better understand that the values might have been updated at the source Resource. So what is the relationship between a triple, the document it’s in, and the HTTP headers sent with the document?
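One possible answer, sketched in Python: read the Cache-Control header that came with the document and derive how long its triples can be trusted. This is a simplification under stated assumptions. It only looks at the no-store, no-cache and max-age directives and ignores Expires, validators and everything else. The `trusted_until` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def trusted_until(cache_control, received_at):
    """Return the time until which triples from this response may be reused.

    Returns received_at itself when the triples are only good for "now".
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    # no-store forbids keeping a copy; no-cache requires revalidation
    # before reuse. Either way the local triples cannot be trusted later.
    if "no-store" in directives or "no-cache" in directives:
        return received_at

    max_age = directives.get("max-age", "")
    if max_age.isdigit():
        return received_at + timedelta(seconds=int(max_age))

    # No usable freshness information: be conservative.
    return received_at


received = datetime(2006, 3, 19, 12, 0, tzinfo=timezone.utc)
print(trusted_until("public, max-age=3600", received))
print(trusted_until("no-cache", received))
```

Combined with tracking each triple's source document, this gives every triple an implied "good until" time without adding anything to the RDF model.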

Wow, got off track there.