<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: I Second That Emotion</title>
	<link>http://www.semergence.com/2007/09/24/i-second-that-emotion/</link>
	<description>Semantic Web, Ruby on Rails, and Massive Data</description>
	<pubDate>Fri, 21 Nov 2008 04:23:27 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.3</generator>

	<item>
		<title>By: Stephan Schmidt</title>
		<link>http://www.semergence.com/2007/09/24/i-second-that-emotion/#comment-270</link>
		<dc:creator>Stephan Schmidt</dc:creator>
		<pubDate>Wed, 26 Sep 2007 16:07:38 +0000</pubDate>
		<guid>http://www.semergence.com/2007/09/24/i-second-that-emotion/#comment-270</guid>
		<description>The real question seems to be: why is there no buffered implementation in io:get_line(File)? Did Tim not find it, does the standard Erlang library not provide one, or should one use a third-party lib?

Igwan's comment is not really a solution. Sometimes reading files into memory works, sometimes it doesn't. Sometimes your files are just too big to read into memory.

When comparing the in-memory solution to a Java/Scala solution, one should implement the Scala solution with NIO, which just uses memory-mapped files and OS memory management. This is much faster than reading files into a buffer by hand (I did use NIO to parse large log files for analysis).

And if one is supposed to develop a buffered or memory-mapped implementation oneself, then my main concern with all languages besides Java comes into play: with Java you usually do not develop applications but assemble them. There are so many great open source libraries around (Lucene, SVNKit, Spring, Seam, Camel, ...) that most of the time you only write plumbing code.

Peace
-stephan

-- 
Stephan Schmidt :: stephan@reposita.org
Reposita Open Source - Monitor your software development
http://www.reposita.org 
Blog at http://stephan.reposita.org - No signal. No noise.</description>
		<content:encoded><![CDATA[<p>The real question seems to be: why is there no buffered implementation in io:get_line(File)? Did Tim not find it, does the standard Erlang library not provide one, or should one use a third-party lib?</p>
<p>Igwan&#8217;s comment is not really a solution. Sometimes reading files into memory works, sometimes it doesn&#8217;t. Sometimes your files are just too big to read into memory.</p>
<p>When comparing the in-memory solution to a Java/Scala solution, one should implement the Scala solution with NIO, which just uses memory-mapped files and OS memory management. This is much faster than reading files into a buffer by hand (I did use NIO to parse large log files for analysis).</p>
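<p>A minimal Java sketch of the memory-mapped approach described above, assuming a file smaller than 2 GB (a single MappedByteBuffer is int-indexed) and '\n' line endings; the class name MappedLineCounter is hypothetical. It maps the file read-only through FileChannel.map and counts newline bytes, leaving paging to the operating system instead of copying chunks into a hand-managed buffer.</p>

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedLineCounter {
    // Maps the whole file read-only and counts '\n' bytes; the OS pages data in on demand.
    // Assumes the file is under 2 GB. A last line without a trailing newline is not counted.
    static long countLines(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long lines = 0;
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') {
                    lines++;
                }
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Path.of(args[0])));
    }
}
```

<p>Files larger than 2 GB would have to be mapped as several consecutive regions.</p>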
<p>And if one is supposed to develop a buffered or memory-mapped implementation oneself, then my main concern with all languages besides Java comes into play: with Java you usually do not develop applications but assemble them. There are so many great open source libraries around (Lucene, SVNKit, Spring, Seam, Camel, &#8230;) that most of the time you only write plumbing code.</p>
<p>Peace<br />
-stephan</p>
<p>&#8211;<br />
Stephan Schmidt :: <a href="mailto:stephan@reposita.org">stephan@reposita.org</a><br />
Reposita Open Source - Monitor your software development<br />
<a href="http://www.reposita.org" rel="nofollow">http://www.reposita.org</a><br />
Blog at <a href="http://stephan.reposita.org" rel="nofollow">http://stephan.reposita.org</a> - No signal. No noise.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Gibbs</title>
		<link>http://www.semergence.com/2007/09/24/i-second-that-emotion/#comment-266</link>
		<dc:creator>Simon Gibbs</dc:creator>
		<pubDate>Tue, 25 Sep 2007 14:11:31 +0000</pubDate>
		<guid>http://www.semergence.com/2007/09/24/i-second-that-emotion/#comment-266</guid>
		<description>Reading the comments on Tim Bray's post, it seems the difference is in the buffering.

Igwan advocates buffering the whole file before processing it and reports faster times; also, Java's BufferedReader uses (who'd have thought it) a reasonably sized buffer by default (ISTR it's 8K). This means each read Erlang performs fetches only a byte or two and is repeated until a line end is found, whereas Java hoovers up a few KB at a time and works out where the lines are afterwards.

With this in mind, the stats you posted are no great surprise: Erlang will be working the I/O subsystem pretty hard, and that will slow down the whole thing.</description>
		<content:encoded><![CDATA[<p>Reading the comments on Tim Bray&#8217;s post, it seems the difference is in the buffering.</p>
<p>Igwan advocates buffering the whole file before processing it and reports faster times; also, Java&#8217;s BufferedReader uses (who&#8217;d have thought it) a reasonably sized buffer by default (ISTR it&#8217;s 8K). This means each read Erlang performs fetches only a byte or two and is repeated until a line end is found, whereas Java hoovers up a few KB at a time and works out where the lines are afterwards.</p>
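<p>A minimal Java sketch of the &#8220;read a big chunk, then find the lines&#8221; strategy described above. The 8192-byte chunk size is an assumption chosen to mirror BufferedReader&#8217;s default buffer of 8192 chars, and ChunkedLineReader and LineHandler are hypothetical names. Each read() call pulls up to a full chunk, newline positions are located in memory, and a line that straddles two chunks is carried over to the next one.</p>

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedLineReader {
    // Hypothetical callback type, defined here so the sketch is self-contained.
    interface LineHandler {
        void onLine(String line);
    }

    // Assumed chunk size, mirroring BufferedReader's 8192-char default.
    static final int CHUNK_SIZE = 8192;

    // Reads up to CHUNK_SIZE bytes per read() call, then scans the chunk for '\n'.
    // Bytes after the last newline are kept in 'pending' until the next chunk arrives.
    static void forEachLine(InputStream in, LineHandler handler) throws IOException {
        byte[] chunk = new byte[CHUNK_SIZE];
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int n;
        while ((n = in.read(chunk)) != -1) {
            int start = 0;
            for (int i = 0; i < n; i++) {
                if (chunk[i] == '\n') {
                    // Complete the current line and hand it to the caller.
                    pending.write(chunk, start, i - start);
                    handler.onLine(pending.toString(StandardCharsets.UTF_8));
                    pending.reset();
                    start = i + 1;
                }
            }
            // Carry the unterminated tail over to the next chunk.
            pending.write(chunk, start, n - start);
        }
        // Emit a final line that has no trailing newline.
        if (pending.size() > 0) {
            handler.onLine(pending.toString(StandardCharsets.UTF_8));
        }
    }
}
```

<p>Splitting on the '\n' byte before decoding is safe for UTF-8, because that byte never occurs inside a multi-byte sequence.</p>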
<p>With this in mind, the stats you posted are no great surprise: Erlang will be working the I/O subsystem pretty hard, and that will slow down the whole thing.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
