Archive for the ‘semantic web’ Category

Nest Those Rails Resources Or Make Baby Semantic Web Cry

Wednesday, April 2nd, 2008

Proper web architecture dictates that you should “Assign distinct URIs to distinct resources.” And Cool URIs for the Semantic Web states that:

There should be no confusion between identifiers for Web documents and identifiers for other resources. URIs are meant to identify only one of them, so one URI can’t stand for both a Web document and a real-world object.

So we know that a URI should refer to one and only one resource. (Of course, you may have many URIs all referring to the same resource.) So why do so many web sites have URIs like http://www.example.com/myaccount? That same URI is used to refer to any account in the system, depending on who is logged in. And that makes Baby Semantic Web cry.

Why is the baby sobbing? A generic URI like http://www.example.com/myaccount isn’t useful on the semantic web, because it’s very difficult to make meaningful statements about that URI. Let’s go and try.


http://www.example.com/myaccount is the account page of "Seth Ladd".

and


http://www.example.com/myaccount is the account page of "Bob Smith".

Hmm… so http://www.example.com/myaccount is the account page for both Seth and Bob? That doesn’t make much sense!

A better URI for an account page would be http://www.example.com/accounts/23232, which can easily be made unique for each user.

The moral of this story is that every one of your URIs should identify exactly one resource. So let’s bring this all the way back to Rails and resources.

When building your resources, ask yourself, “If I GET this URI, will I see the same thing no matter who is logged in?” If the answer is “No” then you need to nest your resources so that the URI is unique and the same representation is returned no matter who you are logged in as.

For example, a typical URI would be http://www.example.com/books, which could easily be a collection of books for the user. The contents of that URI are relative to the person logged in, so we have a problem. To fully qualify the URI, we need to nest books inside of the user collection. We end up with http://www.example.com/users/1/books, which is unique and follows web architecture best practices. Now we can make unambiguous statements about the URI, thus populating the semantic web with more useful and meaningful triples.
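The difference can be sketched in plain Ruby, without Rails. Everything here (the BOOKS_BY_USER data, the method names, the user ids) is hypothetical and only illustrates the idea; a real Rails application would declare the nesting in its routes rather than parse paths by hand.

```ruby
# Hypothetical in-memory data; ids and titles are made up for illustration.
BOOKS_BY_USER = {
  1 => ["Programming Ruby"],
  2 => ["Weaving the Web"]
}.freeze

# Session-dependent lookup behind a single /books URI:
# the result changes with whoever is logged in.
def ambiguous_books(current_user_id)
  BOOKS_BY_USER.fetch(current_user_id, [])
end

# Nested URI: the owner is part of the path itself.
def nested_books_path(user_id)
  "/users/#{user_id}/books"
end

# Resolving a nested path needs no session at all, so every
# viewer gets the same representation for the same URI.
def books_at(path)
  match = %r{\A/users/(\d+)/books\z}.match(path)
  return nil unless match

  BOOKS_BY_USER.fetch(match[1].to_i, [])
end
```

With the nested form, books_at(nested_books_path(1)) depends only on the URI, which is exactly the property the “same thing no matter who is logged in” test asks for.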

The Semantic Web in Action Article From Scientific American Online and Free

Monday, March 31st, 2008

The Semantic Web in Action, originally published in the December 2007 issue of Scientific American, is now online and free. The original article published in the May 2001 issue of Scientific American was certainly due for an update.

The original article made a lot of grand promises, while the December 2007 article details current efforts at applying semantic web technologies to real-life problems. Check it out if you’re interested in how companies are building the semantic web today.

Why Flickr Doesn’t Do FOAF

Sunday, March 23rd, 2008

Tim Berners-Lee asks “So do you think Flickr could be persuaded to source FOAF?”

Given what I’ve heard from Stewart Butterfield (co-founder of Flickr), the answer is a No.

Back in 2004 (Mon, Nov 29, 2004 at 8:41 PM to be exact), I wrote Flickr asking if they could add sha1 hashes of user emails (in an obvious attempt to be able to convert the data into FOAF). Here’s the original request email:

Hello,

Would it be possible to add a sha1 hash of a person’s email address to
the response of flickr.people.getInfo ? I understand that we don’t
want to give out email addresses, and it’s nice that the API doesn’t
expose them. But to help in uniquely identifying users across
systems, a good identifier is often their email address. To safe
guard against spam, creating a SHA1 hash is a good way to hide the
email, yet still provide a unique identifier for the user.

This sha1’ed email address becomes a candidate key to the user, so to speak.

Thoughts?

Thanks!
Seth

To which Stewart replied (and I have his permission to quote him):

Seth, I guarantee that the problem is not that we don’t know how to
provide the functionality - as you say, it’s easy.

It’s more that it has a lot of complications at the social level. How
do you know whether any of our users *wants* their Flickr profile
(potentially filled with cool, beautiful or emotionally important
family photos) to be associated with their Tribe profile (potentially
filled with descriptions of their kinky fetishes)? I know I don’t want
my professional profile on LinkedIn tied to my clownish profile on
Orkut.

Remember http://beta.plink.org/ ? … read about why it shut down. A
lot of those lessons apply to us. I think Dan Brickley is a super guy,
and I think FOAF is well intentioned. But I also think it has nothing
to do with Flickr (or even Tribe/Orkut/Friendster/whatever).

Last, since approximately 0% of users want or care about this
functionality, it’s not a good deal for us to implement it. It’d be
really neat if there were a machine-readable description of who I am
and what I’m up to online tied to a single identifier, enabling
software that could make all kinds of inferences about me and tie all
kinds of facts about me together. On the other hand, that would really
suck. If you know what I mean.

We don’t even want to get into explaining to people what this is, let
alone build a UI to allow them to opt out, etc., etc.

I appreciate your enthusiasm, and I know you’re coming from the right
place, but it’s just not something we’re willing to support right now.
(And you can quote me if you’d like ;)

- Stewart

So, at least back in 2004, Flickr was concerned about making it too easy to “connect the dots”. I wonder if this still holds true today? Is anyone else worried about this?

I can certainly see Stewart’s point. But I bet that with some solid privacy controls, whether the opt-out UI Stewart mentions or opt-in controls, a middle ground could be found. Like it or not, sooner or later there will be systems to tie it all together anyway. Might as well preempt it all and put the power into the hands of the users.

UPDATE: Looks like Flickr now exports mbox_sha1sum checksums from their flickr.people.getInfo API call. Someone saw the light. :)
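For reference, here is a minimal sketch of how such a checksum can be computed. FOAF defines mbox_sha1sum as the SHA1 of the mailbox URI, including the mailto: scheme. The method name and the address below are placeholders, and no normalization (such as lowercasing) is applied, so both sides have to hash exactly the same string for the values to match.

```ruby
require "digest/sha1"

# FOAF's mbox_sha1sum: the SHA1 hex digest of the full mailbox URI,
# i.e. "mailto:" followed by the address.
def mbox_sha1sum(email)
  Digest::SHA1.hexdigest("mailto:#{email}")
end

# Placeholder address, for illustration only.
puts mbox_sha1sum("seth@example.com")
```

Two systems that hold the same address arrive at the same 40-character value without either one exposing the address itself.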

This Post Is Ambiguous

Sunday, February 24th, 2008

When was the last time you had an unambiguous discussion?

In which Roy Fielding asks why the Semantic Web has a requirement that URIs identify a resource unambiguously.

I believe the whole attempt to make a distinction between a “document” and a “non-information resource” is just way too complicated for most users. All this business of redirecting the client from the non-information resource to a document that describes the resource seems like a hack. I understand the problem (does the URI refer to the Thing or the Document About The Thing?) and it’s complicated. Take the link I used to point you, the loyal reader, to more information about Roy Fielding. I used http://www.ics.uci.edu/~fielding/ which is some home page for Roy. Is that link pointing to The Man or The Home Page?

My answer? Both. And the semantic web needs to deal with that ambiguity. As far as I can tell, it can. Since anyone can publish metadata about anything on the web, there’s nothing un-web about saying that “http://www.ics.uci.edu/~fielding/ is a Home Page” and also “http://www.ics.uci.edu/~fielding/ is a representation for Roy Fielding.” It’s all about context.

Humans deal with this all the time. Take, for example, my name. You can look at “Seth Ladd” as either a string of characters, which form words that have some meaning, or you can look at “Seth Ladd” as an identifier for The Man. Humans are good at noticing when “Seth Ladd” means The Man or The String of Characters. The semantic web will need to deal with this very same ambiguity.

OpenSocial to Help With Cross Application Permissions?

Wednesday, October 31st, 2007

Hopefully, with Google’s OpenSocial API launching Thursday, I’ll finally be able to tell Flickr, “Just let anyone who is my friend on Facebook, or in my Gmail address book, view my photos.”

I’m tired of telling people, “Sure, you can view my photos. Just create an account at Yahoo, then log in to Flickr, then let me know your username.” I already know these people somehow, and that relationship is somewhere on the web (probably in my address book), so why can’t I tell Flickr to use that?

The idea of a portable set of relationships that enables cross-site permissions is really, really important to the scalability of the web. Our data is spread out; it’s in a web. Let’s use it on the web!

Radar Networks Ties It All Up With Twine

Friday, October 19th, 2007

Radar Networks has come out of stealth to announce Twine, their “revolutionary new way to share, organize, and find information.”

I’ve been waiting for these guys to release their application for a long time. Of course I just signed up to be a beta tester.

Semantic Web Use Case #32354343

Tuesday, September 25th, 2007

I should be able to tell Flickr to let any of my Facebook friends view my photos.

Semantic Web Doesn’t Have to Be Difficult

Thursday, September 20th, 2007

After reading Semantic Web: Difficulties with the Classic Approach, I am even more certain that we’re putting too many expectations on the semantic web. The semantic web doesn’t have to be difficult to build or use. It simply starts with resetting expectations and re-branding.

To start, the semantic web needs to be re-branded as the Data Web. Now take a deep breath. Doesn’t the air feel lighter and taste sweeter? That’s because the heaviness of the baggage brought along by the word “semantic” is gone. People see semantic and go all screwy: “Replace humans with computers?” or “How do you deal with uncertainty?” or “How do we agree on what we mean by agreement?” or “A.I. never worked.”

Even Tim B.L. thinks that the name “semantic web” isn’t very good:

I don’t think it’s a very good name but we’re stuck with it now. The word semantics is used by different groups to mean different things. But now people understand that the Semantic Web is the Data Web. I think we could have called it the Data Web. It would have been simpler.

What does it mean to have a data web? To me, it means that the underlying data that powers the web page/site/application is exposed to the web via URIs. The data web is about pulling up all those databases that live under a web application and placing them squarely on the web. Placing something on the web simply means giving it a URI and, often, making sure a representation is returned when you dereference the URI.

We already have databases, we already have web servers, we already have HTTP, we already have URIs. The pieces are in place. We just aren’t in the habit of publishing machine-readable data, because the data is often seen as heavily protected intellectual property. This is a mindset issue that will change over time as people and businesses figure out how to make money off of data. (Hey Google, figure out AdSense for RDF, or connect all the data together, expose it to end users, and place ads on it. Or wait, they already do that.)

Repeat after me: Data Web, Data Web, Data Web. Put my data on the web. Give it a URI. Create a Web of Data.

Semantics: old ‘n busted. Data: the new hotness.

(note to self, put money where mouth is)

GRDDL Is Out, How To Integrate With SPARQL

Tuesday, September 11th, 2007

GRDDL is out, providing a mechanism for specifying how to convert documents on the web into RDF. In short, GRDDL allows you to link an XSLT transform to your XHTML page, which converts the XHTML into an RDF document. For more information, start at the GRDDL Primer.

(The irony is that while you can use XSLT to convert into RDF, you can’t ever use XSLT to convert RDF into something else with complete certainty, because RDF/XML output is nondeterministic: the same graph can be serialized as many different XML documents.)

The primer includes a few examples of using SPARQL to query the RDF document generated by a GRDDL transform. Here’s an example from the primer:


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?rating ?name ?region ?hotelname

FROM <http://www.w3.org/TR/grddl-primer/hotel-data.rdf>

WHERE {
?x rev:hasReview ?review;
	vcard:ADR ?address;
	vcard:FN ?hotelname .
?review rev:rating ?rating .
?address vcard:Locality ?region.

FILTER (?rating > "2").

?review rev:reviewer ?reviewer.
?reviewer foaf:name ?name;
	foaf:homepage ?homepage

}

Looking at the FROM line, you see that we are referencing an RDF document by URI. However, if we are using GRDDL, that document doesn’t exist until after we perform any transforms.

This means we can’t use GRDDL directly in our SPARQL queries, as there isn’t a physical RDF document to reference.

However, the ever-useful GRDDL Service, an online web service (lower case web service :) that generates RDF from documents using GRDDL, lets us integrate GRDDL-enabled documents directly into our SPARQL queries.

Let’s replace the FROM clause in our original SPARQL query with a GRDDL Service URI that points directly at the source XHTML document (instead of the pre-generated "middle man" RDF document).


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?rating ?name ?region ?hotelname

FROM <http://www.w3.org/2007/08/grddl/?docAddr=http%3A%2F%2Fwww.w3.org%2FTR%2Fgrddl-primer%2Fhotel-data.html&output=rdfxml>

WHERE {
?x rev:hasReview ?review;
	vcard:ADR ?address;
	vcard:FN ?hotelname .
?review rev:rating ?rating .
?address vcard:Locality ?region.

FILTER (?rating > "2").

?review rev:reviewer ?reviewer.
?reviewer foaf:name ?name;
	foaf:homepage ?homepage

}

There, now isn’t that much more webby? I think the success of GRDDL depends on its integration into existing RDF toolkits. Otherwise, it’s a two-step process: get the document off the web and transform it into RDF, then load the result into RDF tools.
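Building that FROM URI by hand means percent-encoding the document address, which is easy to get wrong. Here is a small sketch using Ruby’s standard library. The endpoint and the docAddr and output parameters are taken from the query above; the constant and method names are my own.

```ruby
require "uri"

# Endpoint as it appears in the FROM clause of the second query.
GRDDL_SERVICE = "http://www.w3.org/2007/08/grddl/"

# Wrap a GRDDL-enabled document URI in a GRDDL Service URI.
# URI.encode_www_form percent-encodes the ":" and "/" characters
# of the document address so it survives as a query parameter.
def grddl_service_uri(doc_uri, output = "rdfxml")
  query = URI.encode_www_form("docAddr" => doc_uri, "output" => output)
  "#{GRDDL_SERVICE}?#{query}"
end

puts grddl_service_uri("http://www.w3.org/TR/grddl-primer/hotel-data.html")
```

For the primer’s hotel page this reproduces the URI used in the FROM clause above, so any GRDDL-enabled document can be dropped into a query the same way.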

For XHTML documents, though, my money is with RDFa. I think linking XSLT to XHTML is just too complicated and brittle (hmm, rhymes with GRDDL) for the masses OR for the tools. RDFa at least lets me directly embed the markup inside my XHTML documents, which makes it much easier to change when I change the XHTML. Plus, as my tools will dynamically generate the XHTML (think template languages for web frameworks) I can easily embed the RDFa right into the templates. Plus, I’m already using CSS and CSS classes, which RDFa encourages, so I can ride off of that investment.

e c30ac536947f7330943f8de9c33f70ef2d5994e7

Thursday, August 16th, 2007

e is a stack for the data web. It’s all in Ruby, it uses RDF, and it’s some of the most bare code I’ve seen in a while.

You had me at “data web”.

And +10 for using the file system as a data store instead of a database.