Archive for the ‘cube’ Category

OLAP Cube Construction with Erlang

Monday, September 10th, 2007

I’ve managed to build the parallel OLAP cube constructor in Erlang. The program achieves parallelization by creating a process for every dimension in the OLAP cube. Each process manages the file that holds its dimension’s data. Messages are passed from the first dimension all the way down to the last dimension, which stores the measures themselves. To parallelize things further, you can partition any dimension using a modulus, which creates another file and another process. This also helps get around the 2 GB limit for dets tables.
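To make the message flow concrete, here is a minimal sketch of a chain with one process per dimension. It is not the actual constructor: the module name cube_chain and every function in it are hypothetical, state is kept in an in-memory map instead of a dets file, and hashing a member before taking the modulus is an assumption about how a partition might be picked.

```erlang
-module(cube_chain).
-export([start/1, add_fact/3, totals/2, partition/2]).

%% Spawn one process per dimension and chain them together.
%% The first process spawned has no successor (Next =:= none); it is the
%% last dimension and stores the measures. Returns the first dimension's pid.
start(DimensionCount) when DimensionCount > 0 ->
    lists:foldl(fun(_Level, Next) ->
                        spawn(fun() -> loop(Next, #{}) end)
                end, none, lists:seq(1, DimensionCount)).

%% Send a fact (one member per dimension, plus its measures) down the chain.
add_fact(First, Path, Measures) ->
    First ! {add, Path, Path, Measures},
    ok.

%% Ask the chain for the accumulated measures stored under a full path.
totals(First, Path) ->
    Ref = make_ref(),
    First ! {totals, Path, self(), Ref},
    receive
        {Ref, Result} -> Result
    after 1000 -> timeout
    end.

%% Assumption: pick a partition by hashing the member and taking a modulus.
%% The result is always in 0..Partitions-1.
partition(Member, Partitions) when Partitions > 0 ->
    erlang:phash2(Member) rem Partitions.

loop(Next, State) ->
    receive
        {add, [Member | Rest], Path, Measures} when Next =/= none ->
            %% Intermediate dimension: remember the member, pass the fact on.
            Next ! {add, Rest, Path, Measures},
            loop(Next, State#{Member => true});
        {add, [_Member], Path, Measures} ->
            %% Last dimension: add the new measures to any stored ones.
            Sum = case maps:find(Path, State) of
                      {ok, Old} ->
                          lists:zipwith(fun(X, Y) -> X + Y end, Old, Measures);
                      error ->
                          Measures
                  end,
            loop(Next, State#{Path => Sum});
        {totals, Path, From, Ref} when Next =/= none ->
            Next ! {totals, Path, From, Ref},
            loop(Next, State);
        {totals, Path, From, Ref} ->
            From ! {Ref, maps:get(Path, State, undefined)},
            loop(Next, State)
    end.
```

Adding the same path twice with measures [1,2] and [3,4] should leave [4,6] stored in the last dimension. The partition/2 helper is not wired into the chain here; it only shows how a member could be mapped to one of several partition processes.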

I also have basic path-based querying working, which is parallelized by sending the query message through each dimension in turn. While the querying itself isn’t parallel for a particular client, it should in theory scale to handle many clients.
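A path query can be sketched along the same lines. The module below is a hypothetical illustration, not the posted code: cube_query, start/1 and query/2 are made-up names, and the cube is loaded from an in-memory map of cells instead of being read from dimension files. Each dimension checks its own member and forwards the rest of the path, so queries from different clients can be in flight at different dimensions at the same time.

```erlang
-module(cube_query).
-export([start/1, query/2]).

%% Cells maps a full path (one member per dimension) to its measures,
%% e.g. #{[2007, east] => [10, 2]}. Returns the first dimension's pid.
%% For simplicity every process receives Cells, but only the last uses it.
start(Cells) ->
    Paths = maps:keys(Cells),
    Depth = length(hd(Paths)),
    lists:foldr(fun(Level, Next) ->
                        Members = [lists:nth(Level, P) || P <- Paths],
                        spawn(fun() -> loop(Members, Cells, Next) end)
                end, none, lists:seq(1, Depth)).

%% Send a path query into the chain and wait for the answer.
query(First, Path) ->
    Ref = make_ref(),
    First ! {query, Path, Path, self(), Ref},
    receive
        {Ref, Result} -> Result
    after 1000 -> timeout
    end.

loop(Members, Cells, Next) ->
    receive
        {query, [], _Path, Client, Ref} ->
            %% The path is shorter than the chain: nothing to look up.
            Client ! {Ref, not_found};
        {query, [Member | Rest], Path, Client, Ref} ->
            case lists:member(Member, Members) of
                false ->
                    %% Unknown member: answer the client straight away.
                    Client ! {Ref, not_found};
                true when Next =:= none ->
                    %% Last dimension: look up the measures for the path.
                    Client ! {Ref, maps:get(Path, Cells, not_found)};
                true ->
                    Next ! {query, Rest, Path, Client, Ref}
            end
    end,
    loop(Members, Cells, Next).
```

A single query still walks the dimensions one after another, which matches the point above: the parallelism comes from many clients sharing the pipeline, not from splitting one query.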

When I move to more traditional querying, which generates a traditional tabular result set, I will be able to parallelize the query for a single client.

I’ll post the working code once I can choose a suitable license. I’m very interested to hear feedback, as I’m very much still an Erlang n00b.

Next up I’ll generate some performance numbers to see if this thing will actually perform in the real world.

I have to say, functional programming is great when your solution is algorithmic. My previous implementations were done in Java or Ruby, which are object oriented. The classes and objects obscured the algorithm, which in the case of OLAP cubes is the primary focus.

Parallel OLAP Cube Construction

Sunday, September 9th, 2007

Still tilting at windmills here, with my quest to calculate an OLAP cube still trotting along. Lately I’ve been using Erlang to see if I can use its concurrent programming abilities to scale the OLAP cube generation. The initial progress is promising, with the basic concurrency already working.

I hope to post the majority of the code soon, after I run a few more tests. For now, here’s how I add two lists of numbers in Erlang. Please let me know if there’s a better way.

Update:

The best way to do this is (thanks to Daniel Larsson):


lists:zipwith(fun(X,Y) -> X+Y end, [1,2],[3,4])

Old ‘n busted way:


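%% Add two equal-length lists of measures element by element.
add_measures(OldMeasures, NewMeasures) ->
  add_measures(OldMeasures, NewMeasures, []).

%% Build the sums in reverse, then flip them once both lists run out.
add_measures([Old|OldRest], [New|NewRest], Accum) ->
  add_measures(OldRest, NewRest, [Old+New|Accum]);
add_measures([], [], Accum) -> lists:reverse(Accum).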