Archive for September, 2007

When will Google let us run our own Map/Reduce programs?

Tuesday, September 25th, 2007

Reading about Sam Ruby’s issues with JSON for Map/Reduce got me thinking: how long before Google lets us run our own Map/Reduce programs on their clusters?

We all know that one of the best ways to scale is to push the operation to the data. And who has all the data? Why, that’s Google, of course. They are in a perfect position to host and run applications that map/reduce the web for individuals or organizations.

Imagine that I want to start collecting results by microformats such as hCard. It would be nice to formulate a query which understands microformats as a map/reduce and send it off to Google.
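As a rough sketch of what such a query might look like, here are a map step and a reduce step that count pages containing an hCard per host. The `Page` record, the `HCardCount` object and the local `run` driver are all hypothetical, and the check for the `vcard` class name (the hCard root class) is a crude stand-in for real microformat parsing. None of this is an actual Google API.

```scala
// Hypothetical input record: one crawled page.
case class Page(url: String, html: String)

object HCardCount {
  // Map step: emit (host, 1) for each page that appears to contain an hCard.
  // Assumption: a plain substring test is good enough for illustration.
  def map(page: Page): Seq[(String, Int)] =
    if (page.html.contains("class=\"vcard\""))
      Seq((new java.net.URI(page.url).getHost, 1))
    else Seq.empty

  // Reduce step: sum the counts emitted for one key.
  def reduce(host: String, counts: Seq[Int]): (String, Int) = (host, counts.sum)

  // Local driver standing in for the cluster: map, group by key, reduce.
  def run(pages: Seq[Page]): Map[String, Int] =
    pages.flatMap(map)
      .groupBy(_._1)
      .map { case (host, pairs) => reduce(host, pairs.map(_._2)) }
}
```

On a real cluster the grouping between the two steps would be done by the framework; only `map` and `reduce` would be shipped to the data.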

Facebook got this right by creating a Platform. Developers can write applications which link directly into Facebook. I want to do the same with Google.

Semantic Web Use Case #32354343

Tuesday, September 25th, 2007

I should be able to tell Flickr to let any of my Facebook friends view my photos.

links for 2007-09-25

Monday, September 24th, 2007

I Second That Emotion

Monday, September 24th, 2007

So Tim Bray finds out that Erlang IO is slow. I can attest to this: my recent work on reading large files in Erlang has shown that IO and string manipulation are much slower than I would have wanted.

Yes, like Bray’s, my file reading is single-threaded (although what I do with each line is very multi-threaded), so I suppose using a single thread in Erlang isn’t very Erlang-like in the first place.

In the meantime, I’m porting my OLAP cube generator to Scala. The assumption (and soon, I hope, the proof) is that the JVM can do file IO much better than Erlang, while I can still use Scala’s Actors to retain my concurrency.
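A minimal sketch of that shape, one reader thread handing lines to concurrent workers. To stay dependency-free it uses a plain `java.util.concurrent` thread pool rather than Actors, and the `ConcurrentLines` object and the `work` function are hypothetical names, not code from the cube generator.

```scala
import java.io.BufferedReader
import java.util.concurrent.{Executors, TimeUnit}

object ConcurrentLines {
  // One thread reads lines; a fixed pool of workers processes them.
  // `work` is a hypothetical per-line function and must be thread-safe.
  def process(in: BufferedReader, workers: Int)(work: String => Unit): Unit = {
    val pool = Executors.newFixedThreadPool(workers)
    try {
      var line = in.readLine()
      while (line != null) {
        val current = line // stable reference captured by the task
        pool.execute(new Runnable { def run(): Unit = work(current) })
        line = in.readLine()
      }
    } finally {
      pool.shutdown()                          // stop accepting new tasks
      pool.awaitTermination(1, TimeUnit.HOURS) // wait for queued lines to finish
    }
  }
}
```

The pool’s queue is unbounded here, so a reader that is much faster than the workers could buffer a large part of the file in memory; a bounded queue would be the next refinement.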

Update: OK, some numbers and code. This is a benchmark for Erlang and Scala to read in a file line by line.

First, the Erlang code:


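	%% Open the file and read it line by line, discarding each line.
	process_file2(Filename) ->
		{ok, File} = file:open(Filename, [read]),
		process_lines2(File).

	%% Tail-recursive loop: read until eof (or an error), then close the file.
	process_lines2(File) ->
		case io:get_line(File, '') of
			eof -> file:close(File);
			{error, Reason} -> file:close(File), {error, Reason};
			_ -> process_lines2(File)
		end.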

Now the Scala code:


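import java.io.{BufferedReader, FileReader}

object LineReader {

  // Reads until EOF (readLine returns null), passing each line to f.
  // The recursive call is in tail position, so it compiles to a loop.
  def foreachline(in: BufferedReader, f: String => Unit): Unit = {
    val line = in.readLine()
    if (line != null) {
      f(line)
      foreachline(in, f)
    }
  }

  // Opens the file, applies f to every line, and always closes the reader.
  def forLines(filename: String, f: String => Unit): Unit = {
    val in = new BufferedReader(new FileReader(filename))
    try foreachline(in, f)
    finally in.close()
  }

}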

OK, so these aren’t exactly the same. The Scala example dispatches each line to a function, so if anything Scala is at a disadvantage.

Here are the timings, three runs each, on my MacBook Pro with a 2.2 GHz Intel Core 2 Duo. Erlang is the BEAM emulator 5.5.5 and Scala is 2.6 running on JDK 1.5 on Mac OS X. The Erlang code was compiled with HiPE.

I am reading a 1,028,071,833-byte file with 10,037,355 lines.

Code          Run 1         Run 2         Run 3
Erlang        205.830 sec   208.999 sec   207.454 sec
Scala (JVM)   36.094 sec    39.917 sec    34.337 sec
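To put those run times in perspective, here is a small sketch that converts them into throughput. The numbers are copied from the table and the file size above; the `Throughput` object and its helper names are made up for this example.

```scala
object Throughput {
  val fileBytes = 1028071833L // file size from the post

  // Run times in seconds, copied from the table above.
  val erlang = Seq(205.830, 208.999, 207.454)
  val jvm    = Seq(36.094, 39.917, 34.337)

  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Megabytes (10^6 bytes) read per second for a given run time.
  def mbPerSec(seconds: Double): Double = fileBytes / seconds / 1e6

  def main(args: Array[String]): Unit = {
    println(f"Erlang: ${mbPerSec(mean(erlang))}%.1f MB/s")
    println(f"JVM:    ${mbPerSec(mean(jvm))}%.1f MB/s")
    println(f"Ratio:  ${mean(erlang) / mean(jvm)}%.1f x")
  }
}
```

Averaged over the three runs, that works out to roughly 5 MB/s for Erlang against roughly 28 MB/s on the JVM, a gap of about 5.6 times.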

QOTD

Sunday, September 23rd, 2007

7 reasons I switched back to PHP after 2 years on Rails

PROGRAMMING LANGUAGES ARE LIKE GIRLFRIENDS: THE NEW ONE IS BETTER BECAUSE *YOU* ARE BETTER

Now Using OpenDNS

Thursday, September 20th, 2007

I’ve just configured my home router to use OpenDNS. It’s a pretty cool service. OpenDNS is a suite of value-added DNS services, like phishing detection and fine-grained control over what resolves.

I made the change because ever since I moved to DSL, my DNS lookups have been very slow. It’s not the kind of thing you notice until, well, you notice that it’s too slow. Switching to OpenDNS fixed that problem. Now I don’t notice the DNS lookups and everything continues to function.

links for 2007-09-21

Thursday, September 20th, 2007

No SMP Erlang for Windows?

Thursday, September 20th, 2007

I was trying to enable SMP on my Windows Erlang install, but I was shocked to find out that SMP is not supported in the official Windows Erlang distribution.

I confirmed this by reading the Errata for Programming Erlang:

#8893: “SMP Erlang has been enabled … since R11B-0 on all platforms where SMP Erlang is known to work.”
PLEASE mention that Windows is NOT among them.

What the heck? That seems like a horrible little secret of the Erlang world. I don’t want to compile my own Erlang on Windows just to get SMP (which, IMO, is the killer feature of Erlang).

Semantic Web Doesn’t Have to Be Difficult

Thursday, September 20th, 2007

After reading Semantic Web: Difficulties with the Classic Approach, I am even more certain that we’re putting too many expectations on the semantic web. The semantic web doesn’t have to be difficult to build or use. It simply starts with resetting expectations and re-branding.

To start, the semantic web needs to be re-branded as the Data Web. Now take a deep breath. Doesn’t the air feel lighter and taste sweeter? That’s because the heaviness of the baggage brought along by the word “semantic” is gone. People see semantic and go all screwy: “Replace humans with computers?” or “How do you deal with uncertainty?” or “How do we agree on what we mean by agreement?” or “A.I. never worked.”

Even Tim B.L. thinks that the name “semantic web” isn’t very good:

I don’t think it’s a very good name but we’re stuck with it now. The word semantics is used by different groups to mean different things. But now people understand that the Semantic Web is the Data Web. I think we could have called it the Data Web. It would have been simpler.

What does it mean to have a data web? To me, it means that the underlying data that powers the web page/site/application is exposed to the web via URIs. The data web is about pulling up all those databases that live under a web application and placing them squarely on the web. Placing something on the web simply means giving it a URI and, usually, making sure a representation is returned when you dereference the URI.
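In code, the idea is small. The sketch below is an in-memory toy, not a web server: the `DataWeb` object, the `people` rows and the `example.org` host are all placeholders, and the JSON rendering skips escaping for brevity.

```scala
object DataWeb {
  // Hypothetical rows that would normally live in the application's database.
  val people: Map[Int, Map[String, String]] = Map(
    1 -> Map("name" -> "Alice Example", "city" -> "Springfield")
  )

  // Every row gets its own URI (example.org is a placeholder host).
  def uriFor(id: Int): String = s"http://example.org/people/$id"

  private val PersonUri = "http://example\\.org/people/(\\d{1,9})".r

  // Dereferencing: given a URI, return a machine-readable representation, if any.
  def dereference(uri: String): Option[String] = uri match {
    case PersonUri(id) =>
      people.get(id.toInt).map { row =>
        row.toSeq.sorted
          .map { case (k, v) => "\"" + k + "\": \"" + v + "\"" }
          .mkString("{", ", ", "}")
      }
    case _ => None
  }
}
```

A real deployment would serve the same representation over HTTP from the URI itself; the point is only that each record has an address and something comes back when you ask for it.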

We already have databases, web servers, HTTP, and URIs. The pieces are in place. We just aren’t in the habit of publishing machine-readable data, because the data is often seen as heavily protected intellectual property. This is a mindset issue that will change over time as people and businesses figure out how to make money from data. (Hey Google: figure out AdSense for RDF, or connect all the data together, expose it to end users, and place ads on it. Or wait, they already do that.)

Repeat after me: Data Web, Data Web, Data Web. Put my data on the web. Give it a URI. Create a Web of Data.

Semantics: old ‘n busted. Data: the new hotness.

(Note to self: put money where mouth is.)

Blog, Meet SliceHost

Wednesday, September 19th, 2007

My little Shuttle PC, which serves this blog, has been acting strangely and hanging. A reboot fixes the problem, but a day or two later, *poof*, it’s gone off the network again.

Instead of tinkering with the hardware to find out what’s wrong, I just moved to SliceHost. They offer Linux VPS servers at very reasonable prices. I couldn’t find another host that offered the same specs for less.

A bonus of moving to SliceHost is that I’ll get a lot more bandwidth.