Archive for September, 2007

QOTD

Wednesday, September 19th, 2007

Joel points out:

… Ajax apps can be inconsistent, and have a lot of trouble working together — you can’t really cut and paste objects from one Ajax app to another, for example, so I’m not sure how you get a picture from Gmail to Flickr. Come on guys, Cut and Paste was invented 25 years ago.

Beautiful Programs

Monday, September 17th, 2007

If you’ve been reading Beautiful Code, and I hope you have, you should also read Donald Knuth’s Computer Programming as an Art. It actually makes you feel better about writing all that code and always trying to get it right.

The possibility of writing beautiful programs, even in assembly language, is what got me hooked on programming in the first place.

links for 2007-09-14

Thursday, September 13th, 2007

Tail Recursive Line Processing in Erlang

Wednesday, September 12th, 2007

When I was testing performance in Erlang using funs, I mentioned I wanted to see what would happen if I made the functions tail recursive. I just took a crack at it, and it looks like making this particular function tail recursive didn’t help performance. (As always, please let me know if my assumptions are incorrect.)

Tail Recursive Line Processing


process_file(Filename) ->
	{ok, File} = file:open(Filename, read),
	process_lines(File, first, 0).

%% Tokenize one tab-delimited line and convert the measures to integers.
process_line(eof) -> done;
process_line(Line) ->
	L = string:tokens(string:strip(Line, both, $\n), "\t"),
	{_Dimensions, Measures} = lists:split(10, L),
	lists:map(fun(X) -> {I,_} = string:to_integer(X), I end, Measures).

%% Stop once the previously read line was eof; otherwise read the next one.
process_lines(File, eof, _) -> file:close(File), done;
process_lines(File, _PreviousLine, LineNum) ->
	Line = io:get_line(File, ''),
	process_line(Line),
	process_lines(File, Line, LineNum + 1).

Running this code against a file of 10,037,355 lines (1,028,071,833 bytes) takes 401.40 seconds on average. This is only slightly better than the previous attempts, which are not tail recursive (and, IMO, easier to read).

Erlang Fun Results

Wednesday, September 12th, 2007

As I code more and more in Erlang, I’m very interested in how to achieve the best performance possible. I’m very new to the language, so I’m still unsure what the best practices are concerning performance. Tonight I was testing the effects of using a fun() in an algorithm.

My test concerns reading a tab delimited text file, tokenizing it, and converting any numbers into integers. I’ve split the program into two conceptual parts: 1) the file IO and line reading, and 2) the handling of the line. I wanted to test the performance differences between using a fun() for the line handling vs just including the line handling code directly.

My test text file is 10,037,355 lines long (1,028,071,833 bytes). I compiled my code using HiPE.

The quick answer is that using a fun() is slower than not using it (which is to be expected). For my particular test, using a fun() was approximately 20% slower.

test            Run 1 (sec)   Run 2 (sec)
fun()           484.631       485.380
without fun()   404.017       403.632
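
For anyone who wants to check the 20% figure, here is a minimal sketch of the arithmetic using the four timings from the table. The module and function names (fun_overhead, mean/1, slowdown_percent/2) are made up for illustration and are not part of the original test code.

```erlang
-module(fun_overhead).
-export([mean/1, slowdown_percent/2]).

%% Arithmetic mean of a list of run times in seconds.
mean(Times) -> lists:sum(Times) / length(Times).

%% How much slower (in percent) the first set of runs is than the second,
%% comparing the mean run times.
slowdown_percent(Slower, Faster) ->
    (mean(Slower) / mean(Faster) - 1) * 100.
```

Calling fun_overhead:slowdown_percent([484.631, 485.380], [404.017, 403.632]) gives a value just above 20, which is where the "approximately 20%" comes from.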

Here’s the code. I stole the erlang timing functions from David King (thanks David!).

Using a fun()


time_takes(Mod,Fun,Args) ->
  Start=erlang:now(),
  Result = apply(Mod,Fun,Args),
  Stop=erlang:now(),
  io:format("~p~n",[time_diff(Start,Stop)]),
  Result.

time_diff({A1,A2,A3}, {B1,B2,B3}) ->
  (B1 - A1) * 1000000 + (B2 - A2) + (B3 - A3) / 1000000.0 .

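%% Tokenize one tab-delimited line, skip the first SplitOn columns
%% (the dimensions) and convert the remaining measures to integers.
handle_line(Line, SplitOn) ->
	L = string:tokens(string:strip(Line, both, $\n), "\t"),
	{_Dimensions, Measures} = lists:split(SplitOn, L),
	lists:map(fun(X) -> {I,_} = string:to_integer(X), I end, Measures).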

process_file(Filename, Proc) ->
	{ok, File} = file:open(Filename, read),
	process_lines(File, Proc, 0).

process_lines(File, Proc, LineNum) ->
	case io:get_line(File, '') of
		eof -> file:close(File);
		Line ->
			Proc(Line),
			process_lines(File, Proc, LineNum + 1)
	end.
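
The listing above does not show how Proc is built. A minimal sketch of one way to wire it up, assuming the fun simply closes over handle_line/2 with a split position of 10 (the value used in the no-fun version). The module name fun_demo and run/1 are hypothetical, and unlike the listing above this version returns the number of lines read so the result is easy to check.

```erlang
-module(fun_demo).
-export([run/1, handle_line/2]).

%% Same line handling as in the post.
handle_line(Line, SplitOn) ->
    L = string:tokens(string:strip(Line, both, $\n), "\t"),
    {_Dimensions, Measures} = lists:split(SplitOn, L),
    lists:map(fun(X) -> {I, _} = string:to_integer(X), I end, Measures).

%% Hand every line of the file to Proc, a fun of one argument.
%% Returns the number of lines read (an addition for this sketch).
process_lines(File, Proc, LineNum) ->
    case io:get_line(File, '') of
        eof ->
            file:close(File),
            LineNum;
        Line ->
            Proc(Line),
            process_lines(File, Proc, LineNum + 1)
    end.

%% Assumption: the first 10 columns are dimensions, the rest are measures.
run(Filename) ->
    {ok, File} = file:open(Filename, [read]),
    Proc = fun(Line) -> handle_line(Line, 10) end,
    process_lines(File, Proc, 0).
```

With the timing helper from the post, a run would then look something like time_takes(fun_demo, run, ["data.tsv"]), where data.tsv is a placeholder file name.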

Including Code Directly (no fun())

(I’m just showing the difference.)


process_lines(File, Proc, LineNum) ->
	case io:get_line(File, '') of
		eof -> file:close(File);
		Line ->
			L = string:tokens(string:strip(Line, both, $\n), "\t"),
			{_Dimensions, Measures} = lists:split(10, L),
			lists:map(fun(X) -> {I,_} = string:to_integer(X), I end, Measures),
			process_lines(File, Proc, LineNum + 1)
	end.

Of course, it’s easy to argue that Erlang isn’t the best language for string manipulation. But this part of the application is hardly the bottleneck, so I’m willing to take the bloat in order to take advantage of the concurrency later on.

Next up, I’ll run timing experiments to test whether tail recursion speeds anything up.

Enabling SMP Support for Erlang on Mac OS X

Tuesday, September 11th, 2007

If you are working with Erlang on Mac OS X and have installed it via MacPorts, then you might not be running the SMP-enabled Erlang.

To check, start the Erlang shell with:


erl -smp

If you get:


Argument '-smp' not supported.

Then your Erlang was not compiled with SMP support. All you’ll need to do is:


sudo port uninstall erlang
sudo port install erlang +smp

Then when you run erl -smp, you’ll be dropped right into the Erlang shell. When you run anything with multiple processes, you’ll see both of your CPU cores active.
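
You can also ask a running system directly. A minimal sketch, assuming a runtime where erlang:system_info/1 accepts the smp_support and schedulers items; the module name smp_check is made up for illustration.

```erlang
-module(smp_check).
-export([report/0]).

%% Print and return whether the emulator has SMP support and how many
%% scheduler threads it is running.
report() ->
    Smp = erlang:system_info(smp_support),
    Schedulers = erlang:system_info(schedulers),
    io:format("SMP support: ~p, schedulers: ~p~n", [Smp, Schedulers]),
    {Smp, Schedulers}.
```

On a dual-core machine with an SMP build you would expect SMP support to be true and more than one scheduler, though the exact count depends on how the emulator was started.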

GRDDL Is Out, How To Integrate With SPARQL

Tuesday, September 11th, 2007

GRDDL is out, providing a mechanism for telling clients how to convert documents on the web into RDF. In short, GRDDL allows you to link an XSLT transform to your XHTML page, which converts the XHTML into an RDF document. For more information, start at the GRDDL Primer.

(The irony is that while you can use XSLT to convert into RDF, you can’t ever use XSLT to convert RDF into something else with complete certainty, because RDF/XML output is nondeterministic.)

The primer includes a few examples of using SPARQL to query the RDF document generated by a GRDDL transform. Here’s an example from the primer:


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?rating ?name ?region ?hotelname

FROM <http://www.w3.org/TR/grddl-primer/hotel-data.rdf>

WHERE {
?x rev:hasReview ?review;
	vcard:ADR ?address;
	vcard:FN ?hotelname .
?review rev:rating ?rating .
?address vcard:Locality ?region.

FILTER (?rating > "2").

?review rev:reviewer ?reviewer.
?reviewer foaf:name ?name;
	foaf:homepage ?homepage

}

Looking at the FROM line, you see that we are referencing an RDF document by URI. However, if we are using GRDDL, that document doesn’t exist until after we perform any transforms.

This means we can’t use GRDDL directly in our SPARQL queries, as there isn’t a physical RDF document to reference.

However, using the ever-useful GRDDL Service, an online web service (lower case web service :) that generates RDF from documents using GRDDL, we could integrate GRDDL-enabled documents directly into our SPARQL queries.

Let’s replace the FROM clause in our original SPARQL query with a GRDDL Service URI that points directly at the XHTML document (instead of at the generated "middle man" RDF document).


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rev: <http://www.purl.org/stuff/rev#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?rating ?name ?region ?hotelname

FROM <http://www.w3.org/2007/08/grddl/?docAddr=http%3A%2F%2Fwww.w3.org%2FTR%2Fgrddl-primer%2Fhotel-data.html&output=rdfxml>

WHERE {
?x rev:hasReview ?review;
	vcard:ADR ?address;
	vcard:FN ?hotelname .
?review rev:rating ?rating .
?address vcard:Locality ?region.

FILTER (?rating > "2").

?review rev:reviewer ?reviewer.
?reviewer foaf:name ?name;
	foaf:homepage ?homepage

}

There, now isn’t that much more webby? I think the success of GRDDL lies with the integration into existing RDF toolkits. Otherwise, it’s a two-step process to get documents off the web, transformed into RDF, and then into RDF tools.
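
The FROM URI in the second query is just the GRDDL Service address with the XHTML document's URL percent-encoded into the docAddr parameter. Here is a minimal sketch of building it, assuming ASCII-only URLs and the same output=rdfxml parameter used above; the module grddl_url and its functions are hypothetical helpers, not part of any GRDDL toolkit.

```erlang
-module(grddl_url).
-export([service_url/1]).

%% Address of the GRDDL Service, taken from the query above.
-define(SERVICE, "http://www.w3.org/2007/08/grddl/").

%% Wrap a document URL so the service returns its RDF/XML.
service_url(DocUrl) ->
    ?SERVICE ++ "?docAddr=" ++ percent_encode(DocUrl) ++ "&output=rdfxml".

%% Percent-encode everything except unreserved characters (ASCII only).
percent_encode(String) ->
    lists:flatmap(fun encode_char/1, String).

encode_char(C) when C >= $a, C =< $z -> [C];
encode_char(C) when C >= $A, C =< $Z -> [C];
encode_char(C) when C >= $0, C =< $9 -> [C];
encode_char(C) ->
    case lists:member(C, "-_.~") of
        true -> [C];
        false -> lists:flatten(io_lib:format("%~2.16.0B", [C]))
    end.
```

Calling grddl_url:service_url("http://www.w3.org/TR/grddl-primer/hotel-data.html") produces the URI used in the FROM clause of the second query.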

For XHTML documents, though, my money is with RDFa. I think linking XSLT to XHTML is just too complicated and brittle (hmm, rhymes with GRDDL) for the masses OR for the tools. RDFa at least lets me directly embed the markup inside my XHTML documents, which makes it much easier to change when I change the XHTML. Plus, as my tools will dynamically generate the XHTML (think template languages for web frameworks) I can easily embed the RDFa right into the templates. Plus, I’m already using CSS and CSS classes, which RDFa encourages, so I can ride off of that investment.

OLAP Cube Construction with Erlang

Monday, September 10th, 2007

I’ve managed to build the parallel OLAP cube constructor in Erlang. This program achieves parallelization through creating a process for every dimension in the OLAP cube. Each process manages the file that holds the dimension data. Messages are passed from the first dimension all the way down to the last dimension which stores the measures themselves. To further parallelize things, you can partition any dimension using a modulus, which creates another file and process. This helps get around the 2 GB limit for dets tables.
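
To make the process-per-dimension idea concrete, here is a rough, in-memory sketch. It is not the actual cube constructor: everything in it is an assumption made for illustration, including the module name cube_sketch, the message shapes, the set each dimension keeps in place of its dets file, and the dict the last process uses to store measures. Partitioning by modulus is left out.

```erlang
-module(cube_sketch).
-export([start/1, add/3, lookup/2]).

%% Start one process per dimension plus a final process for the measures.
%% Returns the pid of the first dimension process.
start(NumDimensions) ->
    Store = spawn(fun() -> store_loop(dict:new()) end),
    lists:foldl(fun(_, Next) ->
                        spawn(fun() -> dimension_loop(Next, sets:new()) end)
                end, Store, lists:seq(1, NumDimensions)).

%% Send one fact: a list of dimension values plus a list of measures.
add(First, Path, Measures) ->
    First ! {add, Path, [], Measures},
    ok.

%% Path-based query: the message travels through every dimension process.
lookup(First, Path) ->
    Ref = make_ref(),
    First ! {lookup, Path, [], self(), Ref},
    receive {Ref, Result} -> Result after 5000 -> timeout end.

%% Each dimension records the value it saw (a stand-in for its file)
%% and forwards the rest of the path to the next process.
dimension_loop(Next, Seen) ->
    receive
        {add, [Value | Rest], Done, Measures} ->
            Next ! {add, Rest, [Value | Done], Measures},
            dimension_loop(Next, sets:add_element(Value, Seen));
        {lookup, [Value | Rest], Done, From, Ref} ->
            Next ! {lookup, Rest, [Value | Done], From, Ref},
            dimension_loop(Next, Seen)
    end.

%% The last process sums measures per full path.
store_loop(Cells) ->
    receive
        {add, [], Key, Measures} ->
            Sum = fun(Old) ->
                          lists:zipwith(fun(X, Y) -> X + Y end, Old, Measures)
                  end,
            store_loop(dict:update(Key, Sum, Measures, Cells));
        {lookup, [], Key, From, Ref} ->
            From ! {Ref, dict:find(Key, Cells)},
            store_loop(Cells)
    end.
```

After P = cube_sketch:start(2), adding the path ["2007", "east"] twice with measures [1, 2] and [3, 4] and then looking it up returns the summed measures wrapped in an ok tuple. A single client's query still walks the chain one process at a time, which matches the limitation described below.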

I also have basic path based querying working, which is also parallelized through sending the query message through each dimension. While the querying itself isn’t parallel for a particular client, it will theoretically scale to handle many clients.

When I move to more traditional querying to generate a traditional tabular result set, I will be able to parallelize the query for a single client.

I’ll post the working code once I can choose a suitable license. I’m very interested to hear feedback, as I’m very much still an Erlang n00b.

Next up I’ll generate some performance numbers to see if this thing will actually perform in the real world.

I have to say, functional programming is great when your solution is algorithmic. Previous implementations of mine were done in Java or Ruby, which are object oriented. The classes and objects obscured the algorithm, which in the case of OLAP cubes is the primary focus.

Parallel OLAP Cube Construction

Sunday, September 9th, 2007

Still tilting at windmills here, with my quest to calculate an OLAP cube still trotting along. Lately I’ve been using Erlang to see if I can use its concurrent programming abilities to scale the OLAP cube generation. The initial progress is promising, with clear concurrency complete.

I hope to post the majority of the code soon, after I run a few more tests. For now, here’s how I add two lists of numbers in Erlang. Please let me know if there’s a better way.

Update:

The best way to do this is (thanks to Daniel Larsson):


lists:zipwith(fun(X,Y) -> X+Y end, [1,2],[3,4])

Old ‘n busted way:


add_measures(OldMeasures, NewMeasures) ->
  add_measures(OldMeasures, NewMeasures, []).

add_measures([Old|OldRest], [New|NewRest], Accum) ->
  add_measures(OldRest, NewRest, [Old+New|Accum]);

add_measures([], [], Accum) -> lists:reverse(Accum).

My Other Spring 2 Book is Finally Out

Saturday, September 8th, 2007

This is a blast from the past. Way back in January 2006 I wrote a chapter for what was then called Beginning Spring 2. I sent the chapter along in draft form for initial feedback. I never heard back, so I figured the author or publisher dropped the book.

Well, the other day I received a box of three copies of Building Spring 2 Enterprise Applications. And my name was squarely on the cover. Who knew?

First off, I want to thank whoever did the review and editing of the chapter. Those are tough jobs. Also thanks to the rest of the authors for pulling together to push the book out.

I’m still not sure what the story is behind the book or my chapter. I never had a chance to edit the chapter or review it before publication. The title has changed. The original author left the project. So I want to apologize if it doesn’t make any sense or has errors.

I thumbed through the book, and all that Spring came crashing back to me. See, the secret is, ever since I wrote Expert Spring MVC and Web Flow, I have been using Ruby on Rails. So checking out this new Spring book has made me realize how much easier it is to write web applications in Rails than Spring. Don’t get me wrong, the Spring Framework has the best engineered code I’ve ever seen (certainly much better than Rails core code), and I learned a lot about the right way to construct a framework. I still maintain that Spring is a better choice if you have to integrate with many different technologies (whether you like it or not). But as always, choose the right tool for the job.

That said, I’m happy Building Spring 2 Enterprise Applications finally made it out.