Archive for the ‘semantic web’ Category

W3C Practicing What It Preaches

Friday, May 11th, 2007

I was very happy to see that Tim Berners-Lee published the WWW2007 conference program as RDF. It came a little late, but this is exactly the type of grassroots raw RDF data generation we need to build out the semantic web. And who better to show us how it’s done than the people who are preaching the semantic web?

Thanks, TBL! I’m hoping this kind of pragmatic data generation will show us A) it’s not that hard and B) some best practices for putting real life data on the web as RDF.

Dabble DB Brings the Web of Data to Life

Wednesday, April 11th, 2007

Dabble DB has completely blown me away. Dabble DB is like Club Med for your data. You want your data to get a massage while sipping a Mai Tai on the beach, you got it. Your data will get the five star treatment at Dabble DB.

So everyone is talking about Web 3.0, AKA the Web of Data, AKA the Semantic Web. Those visions are all well and good, and I do believe we’ll see a Data Centric Web soon. But if there’s a Web of Data, that must mean you’ve got Data on the Web.

What? Your data is buried in some SQL Server database on the company LAN? That doesn’t sound very webby to me. And you’re building all these custom, one-off Visual Basic apps or Excel macros to manage your data? Tsk, tsk. So not webby.

This is where Dabble DB comes in. Not only does it provide a very slick, dripping with AJAX interface for you to import and manage your data, it’s a *very* smart interface. Normal muggles (Haven’t read Harry Potter? Whaaaa?) can easily use Dabble DB to classify, link, sort, and visualize their data. Dabble DB is not a snazzy front end to a relational database system. Dabble DB is a snazzy front end to data.

Let’s put it this way: I haven’t seen a desktop application that helps you with your data like Dabble DB.

OK, enough of the uber love fest. Bringing it all back to the semantic web, Dabble DB might be in a class of killer applications for the semantic web. I really love Dave Beckett’s description of the semantic web: “The semantic web is webby data.” So the semantic web will need, as a killer app, something that makes managing and *linking* data super easy and, more importantly, incredibly rewarding.

That last statement is important. The killer application for the semantic web must be *rewarding*. That is, you will get out of it more than you put into it. Dabble DB does this to some extent, as you can graph your data, map your data, export your data, subscribe to your data.

It doesn’t appear that Dabble exports to RDF, nor does it appear that you can link data together via ontologies. But if Dabble DB doesn’t do that, someone else will. For data that is truly webby is data that can be extended by sources outside of your control.

At work, we’ve been building a large data warehouse, and the interface to go with it, so systems like Dabble DB are extremely interesting to me. I want to give my users an experience like Dabble.

Oracle 11g Gains Native OWL Support

Thursday, March 22nd, 2007

Oracle 11g will gain native OWL support.

From the article:

> (2) Native OWL inferencing (for an OWL subset that includes property characteristics, class comparisons, proprety comparisons, individual comparions and class expressions) [New API]

Way to go, Oracle! I’ve always had a soft spot for Oracle’s RDF support. The way that you can blend RDF data sets and traditional relational data sets in the same query helps to deploy RDF slowly but surely. Add the fact that Oracle has already solved all the main problems that an RDBMS should solve (like ACID compliance, backup and recovery, strong security, and a wide developer toolset), and Oracle’s RDF support (and soon OWL) becomes a strong contender among RDF data stores.

Why the Semantic Web Marketing Message Has Failed

Wednesday, March 21st, 2007

So some guy writes why the semantic web will fail and ends up on Slashdot. How Slashdot picks its articles, I’ll never know. The article is pure opinion and guesswork (as all predictions seem to be), and it’s perfectly OK for this guy to blog his opinions.

I’m not going to argue that the semantic web (that’s *small s* semantic) will succeed, although I think it will prove useful in a large sense in some form, even if that form isn’t RDF. I think what’s really telling about the doom and gloom post is that the marketing message of the semantic web has failed.

For example, a quote from the blog post:

> The Semantic Web will never work because it depends on businesses working together, on them cooperating

Where, in all of the W3C’s semantic web literature, does it say that companies must work together for the semantic web to succeed? I think this is one of the biggest misinterpretations of the semantic web. For some reason, people think that the semantic web requires these large agreed-upon ontologies before anything useful happens. Not only is that near impossible (for anything but the most generic or free-form terms and definitions), but as we all know, specifications born out of committee have an awfully hard time meeting the pragmatic needs of the masses.

For the semantic web to succeed, the W3C doesn’t need more technical specifications (although a new RDF XML serialization would be nice). Instead, the W3C needs to completely revamp its marketing message. For instance, distance the semantic web from AI. AI, no matter how promising, leaves a bad taste in your mouth. We need to completely deny any relationship to AI. Secondly, the W3C needs to rebrand the semantic web as “Simply Putting Your Database On The Web. No More, No Less. Anything Else Is Purely Serendipity.” Thirdly, the W3C needs to really drive home that the semantic web will succeed *only if* it is not built with large top down ontologies.

So repeat after me: “The semantic web is just an effort to help expose the database that you already have to the web as RDF. Primary keys become URIs, and the intersection of a row and a column is a triple.”

Or, to put it another way:

Problem: I have data, most likely in a relational database, that I need to get on the Web.
Solution: Expose that data as RDF. URIs are the primary keys for the data.
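
The mapping above can be sketched in a few lines of Python. This is only a minimal illustration, not a real exporter: the base URI `http://example.org`, the `person` table, and the `row_to_triples` helper are all made up for the example, and a real deployment would also need datatypes, foreign keys, and a proper RDF serializer.

```python
from urllib.parse import quote


def row_to_triples(base_uri, table, key_column, row):
    """Turn one relational row (a dict) into (subject, predicate, object) triples."""
    # The primary key value becomes the subject URI (hypothetical URI layout).
    subject = f"{base_uri}/{table}/{quote(str(row[key_column]))}"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject; NULLs yield no triple
        # Each column becomes a predicate; each cell becomes the object.
        predicate = f"{base_uri}/{table}#{column}"
        triples.append((subject, predicate, value))
    return triples


# Made-up sample row standing in for a database record.
row = {"id": 42, "name": "Ada", "city": "London"}
triples = row_to_triples("http://example.org", "person", "id", row)
for s, p, o in triples:
    print(s, p, repr(o))
```

One row with two non-key columns yields two triples, both sharing the subject URI derived from the primary key.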

I’m Squinting… But No Agents So Far

Thursday, March 8th, 2007

Jim Hendler asks, “So where are the agents?” More specifically, I’d like to ask: what do we need before agents can be deployed?

Let me define what I believe an agent is by looking at what it would do for me. I think a software agent is a program that can be given a set of rules and is able to seek out data that satisfies those rules. Agents differ from other software that can answer queries in that Agents would be able to reason about the world and would be capable of acting towards their goal(s) over a long period of time. These agents would act without direct human control, which is especially important if the task would take some time to complete.

Given that definition of an Agent, I revisit the question: What do we need before agents can be deployed?

Because Agents are task focused, we need a way to define the task in such a way that the Agent understands it. I can imagine simple use cases like “Schedule my dentist appointment every six months. Of course, make sure my calendar and the doctor’s calendar match up. If I happen to schedule a very important event for the same day, kindly move my dentist appointment to a later date.” Many calendar and event related use cases come to mind. Calendaring seems to be a perfect use case for Agents, because the data can be relatively easy to encode and because all the events are in the future, it gives the Agents some time to finish their tasks.

As you can tell, my natural language description of my simple task was easy and quick to write down, but left out many specifics that a computer might demand. Two average adults would be able to understand the gist of the Task. However, the adults would probably need to ask one or more questions for clarification.

And this brings me to the second thing an Agent needs to do: reason about the world and ask for clarification. Let’s assume for a moment that I was able to instruct my agent about my need to maintain healthy teeth on a regular schedule. The really difficult part becomes how to have my Agent converse with the Dentist’s Scheduling Agent. (Not to mention, my Agent has to locate the Dentist’s Agent in the first place.) Once the two Agents are communicating (waving arms in the air here), how do they begin to speak the same language? Did they agree on some standard Scheduling Ontology beforehand? I hope so, because Ontology languages such as OWL would let the two Agents formally agree upon some semantics for their conversation.

There aren’t any Agents out there helping normal people with real life tasks because Real Life ™ is too vague, complex, dirty, abstract, and otherwise beautiful to be coded into a language that a computer program can understand. We’re just not able to give abstract and fuzzy task descriptions to a computer program yet.

I don’t need an agent that can continually run Google queries and let me know if something new was found. That’s not an agent, that’s a cron job, and that’s a simple task. I need an Agent that can begin to understand my world, my tastes, _my rules_, and handle the simple things like negotiating my calendar so I don’t have to. Let the Agent handle the 80% and let me handle the 20%. That’s more fun, challenging, and ultimately rewarding.

So where are the Agents? It’s better to ask, What’s preventing the Agents from appearing?

Is my definition of an Agent way off? Am I asking too much? What’s a good middle ground?

baetle - Ontology for Software Bugs

Wednesday, March 7th, 2007

baetle is an ontology for software bugs and bug tracking systems. Henry Story has opened the baetle project on Google Code.

baetle is an effort to standardize a view into the software bug tracking world. There are a kazillion bug tracking systems out there. Heck, see for yourself.

So what’s a use case for being able to have a consistent view into bugs and issues across all those systems? For one, you could query one system just like another system. Another use case might be if your enterprise runs and maintains several different bug tracking systems, and you need to query across all of them.

Hmm, sounds like a Data Warehouse, doesn’t it? Multiple systems combined and filtered into one cohesive view for reporting and querying. Ontologies allowing for a way to combine and filter all those data sources. SPARQL for all that querying.

So are Ontologies and SPARQL the new ETL?
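
To make the ETL comparison a bit more concrete, here is a rough sketch in plain Python rather than real OWL and SPARQL. The tracker names, field names, and the `bug:` terms are all invented for the example; they are not taken from baetle. The mapping table plays the role of the ontology, and the final list comprehension plays the role of the query.

```python
# Hypothetical mappings from each tracker's field names to a shared vocabulary.
# The "bug:" terms are placeholders, not actual baetle properties.
MAPPINGS = {
    "tracker_a": {"summary": "bug:title", "state": "bug:status"},
    "tracker_b": {"headline": "bug:title", "resolution": "bug:status"},
}


def normalize(source, record):
    """Rename a record's fields into the shared vocabulary, dropping unmapped ones."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


# Made-up records from two different bug tracking systems.
records = [
    ("tracker_a", {"summary": "Crash on save", "state": "open"}),
    ("tracker_b", {"headline": "Typo in menu", "resolution": "closed"}),
    ("tracker_b", {"headline": "Slow startup", "resolution": "open"}),
]

# "Transform and load": one cohesive view over both systems.
unified = [normalize(source, record) for source, record in records]

# "Query": the same question now works against every system at once.
open_titles = [r["bug:title"] for r in unified if r["bug:status"] == "open"]
print(open_titles)
```

The point of the sketch is only that once every source is described in the same terms, one query covers all of them.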

SPARQL Via HTTP Methods

Sunday, March 4th, 2007

Querying the web might get a bit easier, with the union of SPARQL directly with HTTP. TripleSoup, a promising proposal at Apache, aims to expose Triple Stores (RDF databases) directly via HTTP.

This reminds me of URIQA, which is an effort to provide native HTTP methods for accessing metadata about a certain resource. URIQA is interesting because it allows you to say:

MGET /foo HTTP/1.1

which means “Retrieve the metadata for resource `/foo`”
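
For illustration, here is a minimal Python sketch that assembles such a request by hand. The host `example.org` and the `build_mget_request` helper are assumptions for the example, and the sketch only builds the bytes; it sends nothing, since most servers would not recognize the MGET method.

```python
def build_mget_request(host, path):
    """Build the raw bytes of an HTTP/1.1 request using the URIQA MGET method."""
    lines = [
        f"MGET {path} HTTP/1.1",  # same request line as above
        f"Host: {host}",          # HTTP/1.1 requires a Host header
        "Connection: close",
        "",                        # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Hypothetical host; the path matches the example above.
request = build_mget_request("example.org", "/foo")
print(request.decode("ascii"))
```

Apart from the method name, this is an ordinary HTTP request, which is part of URIQA's appeal: the resource URI itself is all you need to know.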

It looks like TripleSoup is a bit different, in that the URI in the request methods is some type of application. TripleSoup seems to be a gateway directly into the triple store, whereas URIQA masks the concept of talking to the triple store. In URIQA, it looks like the triple store *is* the server you are connecting to. With TripleSoup, the triple store is located at the URI you are sending requests to.

URIQA’s advantage is that you don’t need to know the URI to the application or triple store, you can just send an MGET to the resource. Of course, URIQA doesn’t handle queries with SPARQL.

My first question with TripleSoup is, how would I discover the URI that I can use for querying? It’s the same problem that URIQA tries to solve, “I know the URI for the resource, but I want to get its metadata.” I can ask that question in SPARQL, but who do I ask?

Best of luck to the TripleSoup team, really looking forward to the code.

A Way To Add Trust To OpenID?

Tuesday, February 20th, 2007

Thinking about OpenID, the next step is obviously a way to integrate trust into an identity. The first question people will want to ask, I believe, is, “Is this person a spammer?” (Insert your own definition for spammer here, but typically this will mean “Will this person use this site/application/service for the originally intended purpose and abide by the policies and rules of the site/application/service?”)

Now that it seems like everyone is getting on board with OpenID (AOL, digg, Technorati, LiveJournal, even Microsoft), there are a lot of identities swimming around. This is a Good Thing. However, nothing stops a spammer or Bad Guy from creating their own OpenID. This is also a Good Thing, because OpenID is only there to verify the identity. Other technologies and layers are then free to add in Trust.

There’s a lot of built up trust information out on the web if we can just get to it. Think about all the hard earned feedback profiles and rankings you’ve amassed over the years. Some examples might include:

* eBay
* slashdot
* digg
* epinions
* amazon (product comments and ratings)
* amazon marketplace
* technorati
* your Google PageRank?

If there’s a way to integrate my identity with my profile on these sites, I could build an aggregate of my Trust Rating. If you trust eBay’s trust rating, and I have a high rating, then you could trust me. It’s trust by proxy, and the entire SSL infrastructure runs on this.

Over time, each of the mentioned services will offer an OpenID. So we’ll need a way to be able to assert that all those identities are views of the same entity (person, in this case). Second, we’ll need a way to convey whatever ranking or profile each identity has with each service. Third, and optionally, it would be very nice to somehow create a TrustRank given all those statistics.

Services like eBay and Amazon won’t only be OpenID providers, but also over time will become OpenTrust providers.

Semantic Web technologies that might help to make this happen:

* OWL with its owl:sameAs, to assert that all my identities are effectively “me”.
* A simple RDF vocab with OWL rules for expressing my ranking on a particular site.
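
As a rough sketch of how those two pieces might fit together, the Python below merges identities linked by sameAs-style statements and then averages the ratings attached to them. Everything here is an assumption for illustration: the URIs are made up, the ratings are invented numbers on a 0 to 1 scale, and a plain average is just the simplest possible stand-in for a TrustRank.

```python
def merge_identities(same_as_pairs):
    """Group identities linked by sameAs-style pairs; returns a lookup function."""
    parent = {}

    def find(x):
        # Follow links to the representative identity, shortening paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)
    return find


# Hypothetical sameAs assertions: all three URIs are "me".
same_as = [
    ("https://openid.example/alice", "https://ebay.example/user/alice99"),
    ("https://ebay.example/user/alice99", "https://digg.example/users/al"),
]

# Made-up per-site ratings, normalized to the range 0..1.
ratings = {
    "https://ebay.example/user/alice99": 0.98,
    "https://digg.example/users/al": 0.80,
    "https://digg.example/users/mallory": 0.10,
}

find = merge_identities(same_as)
by_person = {}
for identity, score in ratings.items():
    by_person.setdefault(find(identity), []).append(score)

# Simple mean per person; a real TrustRank would weight sources differently.
trust = {person: sum(scores) / len(scores) for person, scores in by_person.items()}
alice_trust = trust[find("https://openid.example/alice")]
print(round(alice_trust, 2))
```

Looking up any one of the linked identities returns the same aggregate, while the unlinked identity keeps its own separate score.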

There’s Semantics in Them Thar Hills

Friday, January 12th, 2007

Bill de hÓra rightfully proposes that it would be very interesting to augment Planet software with scanning tools that extract RDF from microformats (uF) and republish the RDF for SPARQL queries or as RSS 1.0.

I’ve often wondered why PlanetRDF.com, or something like it, doesn’t use semantic technologies to pull blog posts from the blogosphere that are related to, or about, semantic web technologies, RDF, OWL, etc. What’s stopping this from happening?

Isn’t the great Use Case of the semantic web to allow for more exact searches of web information? (I’m not specifically talking about Google knowing the difference between an apple (the fruit) and Apple (the computer company)) Why can’t I say, “Create a Planet of all blog posts that are about the Semantic Web. Oh, and use only posts from authors who are known by Dave Beckett, Danny Ayers, and those that work with Jim Hendler.” Wouldn’t that be neat?
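
A toy version of that request can be sketched in Python, assuming the social graph and the post topics were already available as data. The `knows` table, the author identifiers, and the posts are all placeholders invented for the example; only the two curator names come from the sentence above, and nothing here reflects who they actually know.

```python
# Hypothetical FOAF-style "knows" data; the author identifiers are placeholders.
knows = {
    "Dave Beckett": {"author-a", "author-b"},
    "Danny Ayers": {"author-b", "author-c"},
}

# Made-up posts with already-extracted topics.
posts = [
    {"title": "RDF in practice", "author": "author-a", "topics": {"semantic web", "rdf"}},
    {"title": "My cat", "author": "author-b", "topics": {"pets"}},
    {"title": "OWL tips", "author": "author-c", "topics": {"semantic web", "owl"}},
    {"title": "Buy now", "author": "author-z", "topics": {"semantic web"}},
]


def build_planet(posts, knows, curators, topic):
    """Keep posts on the topic whose author is known by at least one curator."""
    known = set()
    for curator in curators:
        known |= knows.get(curator, set())
    return [p["title"] for p in posts if p["author"] in known and topic in p["topics"]]


planet = build_planet(posts, knows, ["Dave Beckett", "Danny Ayers"], "semantic web")
print(planet)
```

The filtering is the easy part. The open question is where the `knows` and `topics` data would come from.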

Of course, there is benefit in a human edited Planet RDF.

But what would it take to build one by using semantic web technologies other than RSS 1.0?

Would that mean all blog authors that want their post to show up at Planet RDF 2.0 would have to mark the subject of their post with a globally agreed upon URI? Or a set of URIs, of which they are unified by some agreed upon OWL?

Or would a simple set of tags work just as well?

How would we build this site right now, using the data that is on the web right now?

I see lots of XHTML, CSS, tags (in various formats, alas), Atom, RSS 2.0, and some RSS 1.0, microformats, a smidge of FOAF. There’s got to be a lot of great semantic information in that mix.

Wow, I certainly can ramble.

No One That I Know

Tuesday, January 9th, 2007

Lee rightly asks, “Who loves RDF/XML?” to which I reply, “No one that I know.”

Is RDF/XML the core of the semantic web? I think it’s easy to argue that the W3C thinks of RDF as the core of the semantic web. The web is made up of documents. Those documents are serializations (representations) of resources. The W3C’s Recommended way of serializing RDF is RDF/XML. Therefore, if RDF is the core of the semantic web and it has to be shuffled around the internets, I think it’s also safe to assume that the Recommended way to do that is RDF/XML.