<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CodeMaestro</title>
	<atom:link href="http://www.codemaestro.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.codemaestro.com</link>
	<description>The Coding Experience</description>
	<lastBuildDate>Fri, 05 Feb 2010 22:15:40 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Compile Time Assertions</title>
		<link>http://www.codemaestro.com/reviews/20</link>
		<comments>http://www.codemaestro.com/reviews/20#comments</comments>
		<pubDate>Fri, 05 Feb 2010 22:15:40 +0000</pubDate>
		<dc:creator>Ami Chayun</dc:creator>
				<category><![CDATA[Reviews]]></category>

		<guid isPermaLink="false">http://www.codemaestro.com/reviews/20</guid>
		<description><![CDATA[Standard preprocessor checks can be very useful to catch programming errors. Nevertheless, these checks can be very limited, as they can only evaluate preprocessor expressions, rather than compile time values like sizeof. Presented here is a small piece of code to create compile time checks]]></description>
				<content:encoded><![CDATA[<p><em>Standard preprocessor checks can be very useful to catch programming errors. Nevertheless, these checks can be very limited, as they can only evaluate preprocessor expressions, rather than compile time values like sizeof. Presented here is a small piece of code to create compile time checks</em></p>
<p>There are many useful cases where checks done at compile time can prevent runtime errors. Consider the following (non-compiling) code:</p>
<pre>
struct {
    int x;
    int y;
} a;
#if (sizeof(a) != 8)
#error "Struct a is not the right size!"
#endif
</pre>
<p>Unfortunately this code doesn&#8217;t compile. The &#8216;#if&#8217; directive is evaluated by the preprocessor, which knows nothing about types, while sizeof is evaluated later by the compiler. The solution is a compile time assertion: a little hack that forces the compiler to evaluate an expression during compilation.</p>
<p><strong>CCASSERT &#8211; Compile Time Assert Macro</strong><br />
In order to evaluate the expression, the macro defines an array type whose size is computed from the predicate: 1 if the predicate holds and -1, which is an error, if it does not. Note that this is legitimate ANSI-C code, so most compilers should support it.</p>
<pre>
#define CCASSERT(expr) \
           typedef char ASSERT_CONCAT(constraint_violated_on_line_, __LINE__) \
                               [2*((expr)!=0)-1];
#define ASSERT_CONCAT(a, b) ASSERT_CONCAT_(a, b)
#define ASSERT_CONCAT_(a, b) a##b
</pre>
<p>Example usage:</p>
<pre>
#define TOTAL 256
#define SIZE1 122

int arr1[SIZE1];
int arr2[TOTAL - SIZE1];

CCASSERT(sizeof(arr2) == 134 * sizeof(int))
</pre>
<p>In case the validation fails, the following compile time error will occur:</p>
<pre>
error: size of array ‘constraint_violated_on_line_8’ is negative
</pre>
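<p>As a further illustration, here is a minimal sketch that uses the macro to pin down the layout of a structure. The <em>packet_header</em> struct and the <em>packet_header_size</em> function are hypothetical names invented for this example, and the sketch assumes a compiler that adds no padding to a struct made only of unsigned char members.</p>

```c
#include <stddef.h>

/* Helper macros are defined first so the listing reads top to bottom. */
#define ASSERT_CONCAT_(a, b) a##b
#define ASSERT_CONCAT(a, b) ASSERT_CONCAT_(a, b)
#define CCASSERT(expr) \
    typedef char ASSERT_CONCAT(constraint_violated_on_line_, __LINE__) \
        [2*((expr)!=0)-1];

/* Hypothetical wire-format header, used only for this illustration. */
struct packet_header {
    unsigned char type;
    unsigned char flags;
    unsigned char length[2];
};

/* Both checks are evaluated by the compiler; no code is generated. */
CCASSERT(sizeof(struct packet_header) == 4)
CCASSERT(offsetof(struct packet_header, length) == 2)

/* Returns the size that the assertions above have already verified. */
size_t packet_header_size(void)
{
    return sizeof(struct packet_header);
}
```

<p>Changing either constant in the assertions should turn the typedef into an array of negative size and stop the build.</p>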
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/reviews/20/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Easy tricks for optimizing common string operations</title>
		<link>http://www.codemaestro.com/articles/21</link>
		<comments>http://www.codemaestro.com/articles/21#comments</comments>
		<pubDate>Fri, 05 Feb 2010 21:38:52 +0000</pubDate>
		<dc:creator>Tomer Margolin</dc:creator>
				<category><![CDATA[Articles]]></category>

		<guid isPermaLink="false">http://www.codemaestro.com/articles/21</guid>
		<description><![CDATA[Whether your data structure uses hashes, trees or any other pattern to store strings, it all boils down to comparing your query string against strings from the repository. The actual string comparison is an aspect sometimes overlooked. This article lists some easy tricks that could make your string comparisons run much faster.]]></description>
				<content:encoded><![CDATA[<p><em><br />
When implementing a data structure that contains strings, or a string based key, its most basic functionality is probably checking whether a query string exists. Whether this data structure uses hashes, trees or any other pattern, it all boils down to comparing your query string against strings from the repository. With each search we try to find our query string, or prove that it doesn&#8217;t already exist, with as few string comparisons as possible.<br />
<br />
The actual string comparison is an aspect sometimes overlooked. This article lists some easy tricks that could make your string comparisons run much faster.<br />
</em></p>
<p>
When approaching the string comparison optimization problem, what we would like to do is to provide effective and efficient ways to rule out most of the candidate strings. We may refer to it as a &#8220;disqualifying comparison&#8221; &#8211; it lets us move faster down the search tree or move faster along the hash bucket linked list, until reaching the final string comparison in the search, keeping in mind that even the most efficient hash structure would probably waste a substantial amount of its time and cycles in string comparison.<br />
 </p>
<p>
Note that this article assumes all strings in the world are composed of 1 byte characters, which is not true (unfortunately for our community, someone invented Unicode&#8230;). However, to demonstrate the principles let&#8217;s assume it is.
</p>
<h2>A. Different lengths is sometimes enough</h2>
<p>When the only need is to find out whether two strings are identical or not, checking that the string lengths differ is enough for ruling out most of the compared candidates. Of course, that doesn&#8217;t mean calculating the lengths of strings for each comparison &#8211; in common practice, all that needs to be done is storing the string lengths along with the actual strings in the data structures, and designing the software so that the query also contains both the actual string and its length. Therefore, at runtime, there will be no need to calculate lengths for most comparisons.<br />
Only when the string lengths are the same is there a need to actually compare the strings themselves.<br />
Obviously, the above optimization is effective only for data repositories that contain strings of different lengths, but in these cases most candidates are ruled out by a single integer comparison. When comparing user names, URLs, or other human readable resource names, this little optimization would prove itself.</p>
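<p>The idea can be sketched as follows. The <em>lstring</em> struct and the <em>lstring_equal</em> function are hypothetical names chosen for this example; the only assumption is that the length is stored next to the text when an entry is created.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical repository entry: the length is stored with the text. */
struct lstring {
    const char *data;
    size_t len;
};

/* Returns 1 when both strings are identical, 0 otherwise. */
int lstring_equal(const struct lstring *a, const struct lstring *b)
{
    /* Disqualifying comparison: different lengths can never match. */
    if (a->len != b->len)
        return 0;
    /* Lengths match, so the bytes themselves have to be compared. */
    return memcmp(a->data, b->data, a->len) == 0;
}
```

<p>Because the length is known, the final comparison can use memcmp rather than strcmp.</p>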
<h2>B. Why compare byte by byte anyway?</h2>
<p>It turns out that some (if not most) implementations of libc use a byte by byte comparison for strcmp and other string comparison functions. For example, a snippet from OpenBSD&#8217;s libc implementation (found with Google Code Search):</p>
<pre>
int
strcmp(const char *s1, const char *s2)
{
        while (*s1 == *s2++)
                if (*s1++ == 0)
                        return (0);
        return (*(unsigned char *)s1 - *(unsigned char *)--s2);
}
</pre>
<p>A much more efficient implementation would compare elements in blocks of the processor&#8217;s word size, usually sizeof(size_t) bytes. A 64 bit architecture is able to compare 8 ASCII characters with a single instruction &#8211; why not use it?<br />
In many cases, a simple solution would be to use memcmp instead of strcmp, which is possible once the string lengths are known. On many platforms, memcmp is implemented very efficiently &#8211; some use a block size comparison in C, as in this <a href="http://www.google.com/codesearch/p?hl=en#5ge3gHPB4K4/gnu/glibc/glibc-2.3.6.tar.bz2|992tyGMok7w/glibc-2.3.6/sysdeps/generic/memcmp.c&#038;q=memcmp">glibc implementation</a>, and some even implement it very carefully in assembly (such as the <a href="http://www.google.com/codesearch/p?hl=en#BuFT5TyPBak/pub/linux/kernel/v2.2/linux-2.2.26.tar.bz2|p4tPAkVsQ_c/linux-2.2.26/arch/sparc64/lib/memcmp.S&#038;q=memcmp">Sparc64 linux kernel implementation</a>). When you know your memcmp is not as efficient, you can implement your own function, modeled on implementations such as these from the newest glibc you can find. </p>
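<p>For illustration only, a block size equality test could look roughly like the sketch below. The function name <em>equal_blocks</em> is hypothetical, and the sketch is not taken from glibc; it loads words with memcpy to stay clear of alignment problems, which compilers typically turn into plain word loads.</p>

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 when the first len bytes of a and b are identical. */
int equal_blocks(const char *a, const char *b, size_t len)
{
    size_t i = 0;

    /* Compare one machine word (sizeof(size_t) bytes) per iteration. */
    for (; i + sizeof(size_t) <= len; i += sizeof(size_t)) {
        size_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            return 0;
    }
    /* Compare the remaining tail byte by byte. */
    for (; i < len; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```

<p>This only answers whether the strings are equal; it does not give the ordering that strcmp and memcmp return, which depends on byte order within the word.</p>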
<h2>C. Direction counts</h2>
<p>In some cases, all strings have similar characteristics, specifically &#8211; similar prefixes or suffixes. For example, a repository of phone numbers is likely to have many similar prefixes, whether the same country code for USA or even the same area code prefixes for all the phone numbers in the same state. In our example, the length comparison optimization described above would not be so effective since all the phone numbers in the same state have the same lengths.<br />
<br />
However, an effective optimization here is to implement a reverse order string comparison function. This would rule out most of the strings much faster than regular comparison methods.<br />
Going forward with the same optimization method, looking at the characteristics of the strings may suggest the optimal comparison function to use. As an example, let&#8217;s examine a repository of picture file names. Each file name is likely to begin with, let&#8217;s say, the prefix &#8220;pic&#8221; and end with the file extension, which is probably &#8220;.jpg&#8221;. Therefore, an effective disqualifying comparison would probably be a reverse order comparison that skips the last four characters.</p>
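<p>A reverse order comparison can be sketched in a few lines. The name <em>equal_reverse</em> is hypothetical, and the sketch assumes both strings are already known to have the same length <em>len</em>.</p>

```c
#include <stddef.h>

/* Returns 1 when the first len bytes of a and b are identical.
 * The comparison walks from the last byte towards the first, so
 * strings that share a long prefix are ruled out quickly. */
int equal_reverse(const char *a, const char *b, size_t len)
{
    while (len > 0) {
        len--;
        if (a[len] != b[len])
            return 0;
    }
    return 1;
}
```

<p>For the file name case, the same loop can be started four characters earlier to skip a common extension and finished with a check of the skipped bytes.</p>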
<h2>D. Boost your case insensitive comparisons</h2>
<p>Some string based repositories are required to be case insensitive. Therefore, given a search string, a naive implementation would first transform the string to lower case letters and only then search for it. However, this means copying the query string. Given that most of the operations to be done are queries, there must be a way to avoid these string copy operations.<br />
A nice solution found on one <a href="http://www.gamedev.net/community/forums/topic.asp?topic_id=83535&#038;whichpage=1&#420485">GameDev forum</a> is using the magic constant 0xDF. It relies on the fact that lower case and upper case ASCII letters differ only in the 6th bit (0x20). Clearing that bit maps a lower case letter to its upper case counterpart, so a simple bitwise operation for each character comparison makes the comparison case insensitive. Note that this holds for letters only: digits and much of the punctuation also have this bit set, so they need separate handling. A single case-insensitive character comparison would look like this, assuming your repository entries are already in upper case:</p>
<pre>
(query[i] &#038; 0xDF) == dbString[i]
</pre>
<p>When expanding this method to be used in conjunction with the optimization in section B above, a case insensitive comparison of 4 characters on a 32 bit architecture would be with a bitwise operation of &#038; 0xDFDFDFDF !</p>
<pre>
(query32bitValue[i] &#038; 0xDFDFDFDF) == dbString32bitValue[i]
</pre>
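<p>A complete loop built on this trick might look like the sketch below. The name <em>equal_nocase</em> is hypothetical. Unlike the one-liners above, this variant masks both sides, so it does not depend on how the repository entries are stored; the price is that a few non-letter pairs such as &#8216;[&#8217; and &#8216;{&#8217; also compare equal, so it is best used as a disqualifying comparison.</p>

```c
#include <stddef.h>

/* Returns 1 when the first len bytes of a and b are equal, ignoring
 * the case of ASCII letters. The mask 0xDF clears bit 0x20, which is
 * the only bit that differs between 'a'..'z' and 'A'..'Z'. */
int equal_nocase(const char *a, const char *b, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char ca = (unsigned char)a[i] & 0xDF;
        unsigned char cb = (unsigned char)b[i] & 0xDF;
        if (ca != cb)
            return 0;
    }
    return 1;
}
```

<p>Note the parentheses around each masking operation in the one-liners: in C the == operator binds tighter than the bitwise AND operator.</p>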
<h2>The optimal string comparison function</h2>
<p>Perhaps the most important conclusion here is that there is no such thing as an optimal string comparison function to copy-paste from this article. When implementing a string repository, special consideration should be given to finding your specific optimal string comparison function &#8211; suited to your own specific needs and your own specific data characteristics.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/articles/21/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Variadic arguments in C/C++</title>
		<link>http://www.codemaestro.com/reviews/18</link>
		<comments>http://www.codemaestro.com/reviews/18#comments</comments>
		<pubDate>Sun, 03 Aug 2008 08:00:41 +0000</pubDate>
		<dc:creator>Ami Chayun</dc:creator>
				<category><![CDATA[Reviews]]></category>

		<guid isPermaLink="false">http://www.codemaestro.com/reviews/18</guid>
		<description><![CDATA[Very few programming languages allow the user to define functions with variable number of parameters. Although this method is very powerful, it should be used carefully. In this article we will explore variadic functions and macros, as well as ways to better utilize this capability.]]></description>
				<content:encoded><![CDATA[<p><em>Very few programming languages allow the user to define functions with variable number of parameters. Although this method is very powerful, it should be used carefully. In this review we will explore some of the uses of variadic functions and macros, as well as common pitfalls.</em></p>
<p><strong>Variadic Functions in C</strong><br />
A classic usage of a function that accepts a variable number of arguments is:</p>
<pre>
#include &lt;stdarg.h&gt;
#include &lt;stdio.h&gt;

enum log_levels {
    ERROR_LOG,
    WARN_LOG,
    INFO_LOG,
    VERBOSE_LOG,
    DEBUG_LOG
};

const char *log_levels_str[] = {
    "Error",
    "Warning",
    "Info",
    "Verbose",
    "Debug",
};

int applog(int level, const char *fmt, ...)
{
    int ret;
    FILE *stream;
    va_list ap;
    va_start(ap, fmt);

    stream = (level &lt;= WARN_LOG) ? stderr : stdout;
    fprintf(stream, "%s: ", log_levels_str[level]);
    ret = vfprintf(stream, fmt, ap);
    va_end(ap);
    return ret;
}
</pre>
<p>The call to <i>va_start</i> initializes <i>ap</i> to refer to the first argument after <i>fmt</i>. The function can then use <i>ap</i> to access the arguments via <i>va_arg</i>, or pass it on to other functions such as <i>vfprintf</i>.</p>
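<p>The same pattern works with any of the standard <i>v*</i> functions. As a small sketch, the hypothetical helper <i>format_tagged</i> below (not part of the logging code above) forwards its arguments to <i>vsnprintf</i> so the result lands in a caller supplied buffer.</p>

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "tag: " followed by the formatted
 * message into buf. Returns the number of characters the complete
 * message needs, or a negative value on a formatting error. */
int format_tagged(char *buf, size_t size, const char *tag,
                  const char *fmt, ...)
{
    int prefix, body;
    va_list ap;

    prefix = snprintf(buf, size, "%s: ", tag);
    if (prefix < 0 || (size_t)prefix >= size)
        return prefix;  /* error, or no room left for the body */

    va_start(ap, fmt);  /* ap now refers to the arguments after fmt */
    body = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, ap);
    va_end(ap);

    return (body < 0) ? body : prefix + body;
}
```

<p>Every <i>va_start</i> is paired with a <i>va_end</i> in the same function, as the C standard requires.</p>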
<p><strong>GCC's 'format' __attribute__</strong><br />
GCC provides a nice protection against misuse of variadic functions at compile time. Just like the warnings issued for the 'printf' family of functions, it is possible to protect your own implementation. For example:</p>
<pre>
//If the compiler does not support attributes, disable them
#ifndef __GNUC__
#   define  __attribute__(x)
#endif

int applog(int level, const char *fmt, ...) 
    __attribute__ ((format(printf, 2, 3) ));
</pre>
<p>The attribute performs a compile time check for a printf style call: argument #2 (fmt) is treated as the format string, and argument #3 marks the beginning of the variadic arguments that are checked against it.</p>
<p>The format attribute also supports the following validation methods: scanf, strftime and strfmon.</p>
<p>This attribute is supported in all GCC versions 2.x and up. For more information about this attribute (and others) see [1].</p>
<p><strong>C++ quirks</strong><br />
When writing variadic functions in C++ you might think the following is correct:</p>
<pre>
class A {
...
    int myprintf(const char *fmt, ...)
        __attribute__((format(printf, 1, 2)));
};
</pre>
<p>The format attribute should use the first parameter as the format string and the second as the variable part. Compiling this in GCC will output the following error:</p>
<pre>
error: format string argument not a string type
</pre>
<p>This is of course due to the fact that C++ passes an extra implicit first argument, <i>this</i>, to all non-static member functions. So the correct declaration should be:</p>
<pre>
class A {
...
    int myprintf(const char *fmt, ...)
        __attribute__((<strong>format(printf, 2, 3)</strong>));
};
</pre>
<p><strong>Accessing Variables Manually</strong><br />
Besides passing the va_list struct to various supporting functions, it is possible to iterate on the function's variables manually. This can be done similar to:</p>
<pre>
/* The sentinel attribute requires the list to end with a null pointer,
 * so the values are passed as pointers to int rather than as ints. */
int sum_many(const int *first, ...)
    __attribute__((sentinel(0)));

int sum_many(const int *first, ...)
{
    const int *num;
    int ret = 0;
    va_list ap;
    va_start(ap, first);
    /* Walk the arguments until the terminating NULL is reached. */
    for (num = first; num != NULL; num = va_arg(ap, const int *)) {
       ret += *num;
    }
    va_end(ap);
    return ret;
}
</pre>
<p>Each call to <em>va_arg</em> returns the next argument. <em>va_arg</em> is a macro that takes the type of that argument, which must match what the caller actually passed.<br />
The <em>sentinel(0)</em> attribute enforces a compile time check that the argument at position 0 counting from the end (the last argument of the call) is an explicit NULL.</p>
<p><strong>Variadic Macros</strong><br />
Sometimes it can be very useful to have a macro with unknown number of variables. While this seems simple enough, it was only added as part of C99. Example usage follows:</p>
<pre>#define LOG_DEBUG(fmt, ...) (applog(DEBUG_LOG, "[%s at %s:%d]: " fmt,  \
__func__, __FILE__, __LINE__, __VA_ARGS__))
</pre>
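<p>Variadic macros are supported by any C99 compliant compiler, as well as by GCC 3.3 and above and Microsoft Visual Studio 2005 and up (see [3]). Note that with this definition at least one argument must follow the format string, otherwise the expansion ends with a dangling comma; GCC offers the ##__VA_ARGS__ extension for that case (see [2]).</p>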
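<p>To see the expansion at work without the logging function, here is a minimal self-contained sketch. The macro <i>FORMAT_AT</i> and the function <i>format_at_demo</i> are hypothetical names made up for this example; the macro prefixes a message with the name of the enclosing function and writes it into a buffer.</p>

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macro: the literal prefix is glued to fmt by string
 * literal concatenation, and __VA_ARGS__ carries the rest along.
 * At least one argument must follow fmt. */
#define FORMAT_AT(buf, size, fmt, ...) \
    snprintf((buf), (size), "[%s] " fmt, __func__, __VA_ARGS__)

/* Formats "[format_at_demo] answer=42" into buf. */
int format_at_demo(char *buf, size_t size)
{
    return FORMAT_AT(buf, size, "%s=%d", "answer", 42);
}
```

<p>Since the prefix is concatenated at compile time, <i>fmt</i> has to be a string literal when the macro is used.</p>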
<p>References:<br />
[1] http://unixwiz.net/techtips/gnu-c-attributes.html<br />
[2] http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html<br />
[3] http://msdn2.microsoft.com/en-us/library/ms177415(VS.80).aspx </p>
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/reviews/18/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Easy DWORD-Alignment of Binary Structures</title>
		<link>http://www.codemaestro.com/articles/19</link>
		<comments>http://www.codemaestro.com/articles/19#comments</comments>
		<pubDate>Wed, 11 Jun 2008 12:34:29 +0000</pubDate>
		<dc:creator>Eyal Itskovits</dc:creator>
				<category><![CDATA[Articles]]></category>

		<guid isPermaLink="false">http://www.codemaestro.com/uncategorized/19</guid>
		<description><![CDATA[Information stored in binary data structures is an area so fundamental that there is hardly any field in the industry that doesn’t relate to the parsing or extracting of binary memory sources. However, my intention in this article are so small and specific, that I really want to believe there’s someone else out there who [...]]]></description>
				<content:encoded><![CDATA[<p>Information stored in binary data structures is an area so fundamental that there is hardly any field in the industry that doesn’t relate to the parsing or extracting of binary memory sources.  However, my intentions in this article are so small and specific that I really want to believe there’s someone else out there who would ever find this useful :)<BR><br />
Binary structures tend to succeed one another.  And as long as each of them contains an indication of its size, there’s no problem just scrolling through the chain and reading (or memcpy-ing) them one after the other.<br />
Recently I came across a case in which I had to parse binary headers. There were potentially ten different headers, each of a different size.  Nothing special, so far.  <BR><br />
Examining the data, I found out that the headers were not really attached; there were “padding bytes” between any two headers. Closer examination showed that the headers were DWORD-aligned. That is, each header starts on the first byte of a DWORD, and in case it doesn’t end on the last byte of a DWORD, a few padding bytes are added for completion.<br />
This is a little annoying. Before any copying of data you have to make sure you’re pointing to an actual structure and not to a padding byte.  It’s really not a big challenge to calculate the position of the next structure (given the previous structure’s ending byte) – </p>
<p><PRE><code>BINARY_HEADER 	*pHeader		= NULL;<br />
<br />
//pCurByte points to the last byte of the previous structure.<br />
pCurByte += sizeof(DWORD) - ((UINT_PTR) pCurByte % sizeof(DWORD));<br />
pHeader = (BINARY_HEADER*) pCurByte;</code></PRE></p>
<p>However I came across this really cool code in some MS sources I was looking at. They add an Align() method to the structure which returns a pointer to the next DWORD, but the coolest thing about it, is this function implementation (!). </p>
<p><PRE><code>struct BINARY_HEADER<br />
{<br />
    byte mField[2];<br />
    byte mAnotherField[10];<br />
    const  BINARY_HEADER* Align()<br />
    {<br />
        <b>return( (BINARY_HEADER*) ( ( ( (UINT_PTR) this) + 3)  &amp; ~3) );</b><br />
    }<br />
};</code></PRE></p>
<p>All you have to do now is – </p>
<p><PRE><code>//pCurByte points just past the end of the previous structure.<br />
pHeader = (BINARY_HEADER*) ((BINARY_HEADER*) pCurByte)-&gt;Align();</code></PRE></p>
<p>Much nicer! :)</p>
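<p>For readers who want to play with the arithmetic outside of a class, here is a minimal sketch in plain C. The function names <em>align_up4</em> and <em>next_after_last_byte</em> are hypothetical, and addresses are modeled as <em>uintptr_t</em> values. It shows that the two calculations above agree when the first is given the last byte of a structure and the second is given the byte right after it.</p>

```c
#include <stdint.h>

/* Rounds addr up to the next multiple of 4; addresses that are
 * already aligned are returned unchanged. */
uintptr_t align_up4(uintptr_t addr)
{
    return (addr + 3u) & ~(uintptr_t)3;
}

/* Given the address of the last byte of a structure, returns the
 * first DWORD boundary after it (the modulo based calculation). */
uintptr_t next_after_last_byte(uintptr_t last)
{
    return last + 4 - (last % 4);
}
```

<p>Adding 3 pushes any unaligned address past the next boundary, and clearing the two low bits then snaps it back onto that boundary.</p>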
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/articles/19/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Implementing a Message Dispatcher in Multi-Core Environments</title>
		<link>http://www.codemaestro.com/articles/17</link>
		<comments>http://www.codemaestro.com/articles/17#comments</comments>
		<pubDate>Fri, 02 Nov 2007 20:26:01 +0000</pubDate>
		<dc:creator>Ami Chayun</dc:creator>
				<category><![CDATA[Articles]]></category>

		<guid isPermaLink="false">http://www.codemaestro.com/articles/17</guid>
		<description><![CDATA[Multi core programming presents different challenges than traditional parallel computing. In this article we will explore a programming paradigm called ‘the dispatcher’ and its implementation in multi-core environment.]]></description>
				<content:encoded><![CDATA[<p><strong>Multi core programming presents different challenges than traditional parallel computing. In this article we will explore a programming paradigm called ‘the dispatcher’ and its implementation in multi-core environment.</strong></p>
<p>This post will present the subject and discuss design considerations; code examples will be presented in a later post.</p>
<h2>Multi-Core vs. Multi Processor</h2>
<p>Multi-core environments tend to be a bit different from multi-processor ones. Here are the two major processor-specific factors we address in this article:<br />
<strong>Cache</strong> – Multi core CPUs usually have smaller L1 cache, and the L2 cache is shared between cores. This calls for small and specialized code. Large code will cause a lot of instruction cache misses, and will degrade performance.<br />
<strong>Bus</strong> – Inter core bus is an important factor. If the bus is slow, data transfer between cores will become a huge bottleneck.</p>
<h2>The Dispatcher</h2>
<p>The dispatcher model assumes that there is an incoming queue of messages (or tasks) that need some processing. Each processing unit (from here on referred to simply as a &#8216;core&#8217;) can be used in parallel to the others. Usually, at the end of the processing, the message is transported to a destination. Prime examples of the dispatcher model are packet processing devices (router, firewall etc.) and graphical processing units.</p>
<p>As an allegory we can see the dispatcher as a kitchen, where waiters bring in orders, and the kitchen team prepares dishes and delivers them back to the waiters for serving.</p>
<h2>Pipeline vs. Run-to-Completion</h2>
<p>There are two major models for the dispatcher. The <em>pipeline</em> model assigns a specific task to each processing unit, whereas in the <em>run-to-completion</em> model each processing unit handles a single message from start to end.</p>
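<p>The difference between the two models can be illustrated with a single-threaded sketch. Everything here is hypothetical and invented for the illustration: the <em>message</em> struct, the two stage functions and the two driver functions. A real dispatcher would run each worker on its own core.</p>

```c
#include <stddef.h>

/* Hypothetical message and processing stages, for illustration only. */
struct message { int value; };
typedef void (*stage_fn)(struct message *);

void stage_double(struct message *m)    { m->value *= 2; }
void stage_increment(struct message *m) { m->value += 1; }

/* Run-to-completion: one worker carries a single message through
 * every stage, from start to end. */
void run_to_completion(struct message *m, const stage_fn *stages, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        stages[i](m);
}

/* Pipeline: one worker owns a single stage and applies it to every
 * message that passes through. */
void pipeline_stage(stage_fn stage, struct message *msgs, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        stage(&msgs[i]);
}
```

<p>Both drivers produce the same results; what changes is which code runs on which core and how far each message has to travel.</p>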
<h2>The Pipeline Model</h2>
<p>We would like to implement a pipeline in our restaurant. We teach each person a specific task. One will be in charge of sauces, another on garnish etc. When an order arrives, the plate goes from one cook to another, each performing its relevant work. Finally the plate is returned to the counter to be transported by the waiter.<br />
In the packet processing world we can use one core to route packets, another to enforce an ACL (access list) etc.</p>
<h3>Analysis</h3>
<p>The pipeline model has several advantages:</p>
<ul>
<li><strong>Specialization</strong> – Each processing unit specializes at a specific task. In the restaurant model this means we have to do less training, we train each person only with the appropriate task. In the multi core model this means that less code runs on each core. When the size of the code is small, it can be optimized to fit in the instruction cache, increasing overall performance.</li>
<li><strong>Flexibility</strong> – The pipeline model is very flexible, if we see that a certain task slows down the entire process, we can assign another core to that task. For example if all the dishes are waiting for garnish for a long time, we assign another person to garnish from another task.<br />
<strong>Note</strong>: This is a very expensive operation at runtime, so it must not be done too often or we lose other benefits (see strong affiliation below).</li>
<li><strong>Shared Data Locks</strong> – This model usually needs only a few locks on shared data structures, since not all the cores access all the data structures.</li>
</ul>
<p>The pipeline model has several limitations:</p>
<ul>
<li><strong>Data duplication</strong> – Data should travel between different processing units. If the data is large, this will clog the bus, causing messages to wait on the bus most of the time.<br />
An important rule of thumb in the pipeline model is to transfer as little data as possible – preferably just a pointer between the cores.</li>
<li><strong>Strong affiliation</strong> – Since each task is assigned to a specific processing unit, the code is said to be affiliated with that core. If we decide during runtime to change the task of a core, we lose the contents of its instruction and data caches, and performance suffers for a significant amount of time while they refill.<br />
In the restaurant model this is similar to a person who is trained to cook fish, and whom we now need to train to prepare sauces.</li>
<li><strong>Message Locks</strong> – Since we need to exchange data between cores, if we write on the message itself, we will almost certainly need to lock the data in transport. This calls for multiple locks for each message.</li>
<li><strong>Robustness</strong> – What happens if one of the processing units gets stuck? If nothing supervises the cores, this will cause the entire process to fail. If data is corrupted in one core, the entire message will be corrupted. The pipeline therefore requires a strong watchdog to act when something goes wrong. See more in the Control Point section below.</li>
<li><strong>Unfair Work Division</strong> – Let’s assume that in our kitchen one person is in charge of fish, and another of desserts. If there is a slow fish day, our fish person is mostly unemployed. In a multi-core system this means that some cores might be idling while the system runs on full load. This can be handled by dynamically allocating cores to tasks, but as explained earlier, with some penalty.</li>
</ul>
<h3>Control Point</h3>
<p>The pipeline model requires a strong controlling process to make sure nothing goes wrong. The control will usually have a dedicated core to the task, or even an entire dedicated CPU for extra robustness (in case we need to restart an entire processor).<br />
Some of the control point roles are:</p>
<ul>
<li><strong>Message Handling Time Limit</strong> – Putting some upper limit to the amount of time a message can spend in a core is usually a good idea. This can help detect deadlocks and non uniform performance.</li>
<li><strong>Core Reassignment</strong> – A watchdog must be prepared to remove a core from the pipeline or change its task. This helps in dealing with major faults and in dividing tasks fairly.</li>
<li><strong>Command and Control Central Point</strong> – The control point is the central place where all control and configuration commands are processed. It is usually a bad idea to give user commands directly to processing cores. User commands can be erroneous and cause system instability. The control point must ensure the commands are safe, and track the control commands until they are completely executed. In case a control command failed or caused system instability, the watchdog must re-stabilize the system and notify the user of the error.</li>
</ul>
<h2>The Run-to-Completion Model</h2>
<p>Let&#8217;s assume that in our restaurant we chose a different model. Every person will handle a dish from beginning to end. Every person is well trained to do all the tasks that are involved, and from the moment an order arrives, that person prepares it with no interruptions until it’s finished.</p>
<p>This is very similar to the <em>thread-pool</em> model, but here we have a guarantee that a dedicated core runs from beginning to end uninterrupted.</p>
<h3>Analysis</h3>
<p>Let’s go over the advantages of the run-to-completion model:</p>
<ul>
<li><strong>Independence</strong> – Every processing unit is independent and no data is transferred between cores. Since there are no interruptions, it is easy to measure how much time each processing unit takes to complete the task and provide real-time assurances.</li>
<li><strong>Scalability</strong> – Adding more processing units is an easy task. Since all the cores are symmetric, adding a core to the game will just add another worker to the pool.</li>
<li><strong>Message Locks</strong> – The message does not need to be locked. From start to end it is accessed only by a single core.</li>
</ul>
<p>The model has several shortcomings: </p>
<ul>
<li><strong>Large code</strong> – All the processing units run all the tasks thus every processor needs to run a lot of code. If the instruction caching is not good enough, this will cause a lot of cache misses and performance penalty.</li>
<li><strong>Shared Data Locks</strong> – We will almost always need to share resources between all cores. When these resources are modified, they need to be locked, causing performance penalty.</li>
</ul>
<h3>Control Point</h3>
<p>The control point in the run-to-completion model should perform similar tasks to the ones in the previous model. Controlling the cores is usually easier, since there is no difference between the cores and there are fewer scenarios to deal with.</p>
<h2>Common Issues</h2>
<ul>
<li><strong>Core affiliation</strong> – In both models it is imperative that code runs on the same core, to benefit from the processor cache and to allow better control of the process. If the controller does not know that a specific task runs on core X it will have a hard time tracking its status.</li>
<li><strong>The Input Queue</strong> – The design of the input queue has a major influence on the overall process. The input queue is usually designed as a FIFO queue, or as a priority queue if QoS is required. The queue is a single entry point to the system, so it creates a natural bottleneck. An inefficient queue will limit the number of messages entering the system even before a single processing instruction has executed.</li>
<li><strong>Bus efficiency</strong> – Inter-core bus and I/O busses can create a bottleneck if the cores transfer data between them. Once again, data should be moved as little as possible while processing. Fast busses can sometimes compensate for little or no data cache.</li>
<li><strong>Instruction and data cache</strong> – Each processing unit usually has far less cache than a full-fledged CPU. Code that runs on each core should be optimized to get as many cache hits as possible, or performance will suffer. Measuring the performance of the instruction and data cache is an important system dependent factor.</li>
<li><strong>I/O and memory allocation</strong> – I/O operations and memory allocations are problematic for two reasons.<br />
First, it is obvious that if the processing unit spends a lot of time waiting for I/O and memory, it is idling.<br />
A second but no less important factor is the real-time factor. I/O access and memory managers are not deterministic. We would like to be able to measure the amount of processing time for each message as accurately as possible, and I/O interferes with our goal.<br />
As usual it is recommended to pre-allocate all the memory required for the message processing, and avoid any I/O operations while processing a message.</li>
</ul>
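<p>Several of the points above (a FIFO input queue, passing only pointers, and pre-allocated memory) can be combined in one small sketch. The names <em>msg_queue</em>, <em>queue_push</em> and <em>queue_pop</em> and the capacity are hypothetical, and the sketch is deliberately single-threaded: a real input queue shared between cores would also need locking or atomic operations.</p>

```c
#include <stddef.h>

#define QUEUE_CAPACITY 8  /* hypothetical size; must be a power of two */

/* Hypothetical message type; only pointers to it are queued. */
struct message { int id; };

/* Fixed size FIFO ring: all storage is allocated up front, so no
 * memory allocation happens while messages are being processed. */
struct msg_queue {
    struct message *slots[QUEUE_CAPACITY];
    size_t head;  /* index of the next message to read */
    size_t tail;  /* index of the next free slot to write */
};

/* Returns 1 on success, 0 when the queue is full. */
int queue_push(struct msg_queue *q, struct message *m)
{
    if (q->tail - q->head == QUEUE_CAPACITY)
        return 0;
    q->slots[q->tail & (QUEUE_CAPACITY - 1)] = m;
    q->tail++;
    return 1;
}

/* Returns the oldest message, or NULL when the queue is empty. */
struct message *queue_pop(struct msg_queue *q)
{
    struct message *m;

    if (q->head == q->tail)
        return NULL;
    m = q->slots[q->head & (QUEUE_CAPACITY - 1)];
    q->head++;
    return m;
}
```

<p>Because the capacity is a power of two, the slot index is computed with a bit mask instead of a division.</p>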
<h2>Summary</h2>
<p>When designing a complete system, you will probably need to mix-and-match the two methods to get best performance. Depending on implementation, some subsystems should have run-to-completion properties, while others should use the pipeline model as a whole.</p>
<p>It is a good idea to profile your requirements and split the work at hand into micro tasks. Once you have defined all the tasks and the relationships between them, a decision can be made.</p>
<p>If you are bound to a specific processor architecture and OS, it is imperative to research all the processor advantages and shortcomings to reach the best decision. On the other hand, on multi-platform systems you must decide and enforce a set of basic requirements, and be flexible on others you can&#8217;t control (like strong affiliation, real-time scheduler priority or fast locks, which are OS/hardware specific).</p>
<p>Pay special attention to the control point. Do not settle for designing only the data handling path. The control point is just as important!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/articles/17/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Site Update</title>
		<link>http://www.codemaestro.com/news/16</link>
		<comments>http://www.codemaestro.com/news/16#comments</comments>
		<pubDate>Sat, 27 Oct 2007 09:10:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://beta.codemaestro.com/?p=16</guid>
		<description><![CDATA[Recently we updated our site&#8217;s framework to use WordPress. This should allow easier content management, as well as several new features. Article comments are now enabled, as well as in-site search and new member registration. Hope you enjoy the changes, CodeMaestro team]]></description>
				<content:encoded><![CDATA[<p>Recently we updated our site&#8217;s framework to use <a href="http://www.wordpress.org">WordPress</a>. This should allow easier content management, as well as several new features. Article comments are now enabled, as well as in-site search and new member registration.<br />
Hope you enjoy the changes,<br />
CodeMaestro team</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/news/16/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New writers and readers</title>
		<link>http://www.codemaestro.com/news/3</link>
		<comments>http://www.codemaestro.com/news/3#comments</comments>
		<pubDate>Sat, 12 Aug 2006 21:18:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://beta.codemaestro.com/?p=3</guid>
		<description><![CDATA[New writers are joining us, as well as new readers. Lately one of the articles has even been translated to Chinese (full quote here)]]></description>
				<content:encoded><![CDATA[<p>New writers are joining us, as well as new readers. Lately one of the articles has even been translated to <a href="http://blog.donews.com/snailact/archive/2006/04/01/806368.aspx">Chinese</a> (full quote <a href="http://blog.donews.com/snailact/archive/2006/04/01/806370.aspx">here</a>) </p>
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/news/3/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shared Memory In Linux</title>
		<link>http://www.codemaestro.com/reviews/11</link>
		<comments>http://www.codemaestro.com/reviews/11#comments</comments>
		<pubDate>Sat, 12 Aug 2006 14:22:28 +0000</pubDate>
		<dc:creator>Ami Chayun</dc:creator>
				<category><![CDATA[Reviews]]></category>

		<guid isPermaLink="false">http://beta.codemaestro.com/?p=11</guid>
		<description><![CDATA[Linux, like most POSIX / System V compatible operating systems, prefers processes over threads. As a matter of fact, a POSIX thread is nothing but a process with a layer of abstraction.]]></description>
				<content:encoded><![CDATA[<p><b>Linux, like most POSIX / System V compatible operating systems, prefers processes over threads. As a matter of fact, a POSIX thread is nothing but a process with a layer of abstraction. The main motive for using threads is the ability to share memory between several running instances and components of the program in a transparent way. Since threads have a heavy performance penalty in Linux, the preferred method of sharing an object is a POSIX shared memory (SHM) object, which Linux exposes as a virtual file.</b></p>
<h2>Basic SHM setup</h2>
<p>Our goal is to write a C program that shares a struct with all running instances of the software, no matter when they run. The implementation will be something along the lines of:</p>
<pre>
#include &lt;fcntl.h>     /* O_* constants */
#include &lt;sys/stat.h>  /* S_I* mode constants */
#include &lt;sys/mman.h>  /* shm_open, mmap */
#include &lt;sys/types.h>
#include &lt;errno.h>
#include &lt;stdio.h>     /* printf */
#include &lt;stdlib.h>    /* exit */
#include &lt;unistd.h>    /* ftruncate, close */
#include &lt;string.h>    /* strerror */

/* This will be created under /dev/shm/ */
#define STATE_FILE "/program.shared"
#define NAMESIZE 1024
#define MAXNAMES 100

/* Define a struct we wish to share. Notice that we will allocate
 * only sizeof(SHARED_VAR), so all sizes are constant
 */
typedef struct
{
  char name[MAXNAMES][NAMESIZE];
  int flags;
} SHARED_VAR;

int main (void)
{
  int first = 0;
  int shm_fd;
  SHARED_VAR *conf;

  /* Try to create the shm object with O_EXCL.
   * This succeeds only if the object does not exist yet.
   * Note that 0 is a valid descriptor, so only negative values are errors
   */
  if((shm_fd = shm_open(STATE_FILE, (O_CREAT | O_EXCL | O_RDWR),
                       (S_IRUSR | S_IWUSR))) >= 0) {
    first = 1; /* We are the first instance */
  }
  else if((shm_fd = shm_open(STATE_FILE, (O_CREAT | O_RDWR),
                        (S_IRUSR | S_IWUSR))) < 0) {
    /* Opening the shm object normally, to share it with
     * existing clients, failed as well
     */
    printf("Could not create shm object. %s\n", strerror(errno));
    return EXIT_FAILURE;
  }

  /* Set the size of the SHM to be the size of the struct. */
  if(ftruncate(shm_fd, sizeof(SHARED_VAR)) < 0) {
    printf("Could not set the shm size. %s\n", strerror(errno));
    close(shm_fd);
    return EXIT_FAILURE;
  }

  /* Point conf at the shared memory area,
   * with the desired permissions
   */
  if((conf = mmap(0, sizeof(SHARED_VAR), (PROT_READ | PROT_WRITE),
                  MAP_SHARED, shm_fd, 0)) == MAP_FAILED) {
    printf("Could not map the shm object. %s\n", strerror(errno));
    close(shm_fd);
    return EXIT_FAILURE;
  }

  if(first) {
    /* Run a set up for the first time, fill some args */
    printf("First creation of the shm. Setting up default values\n");
    conf->flags = 4;
  }
  else {
    printf("Value of flags = %d\n", conf->flags);
  }

  /* Do some work... */

  munmap(conf, sizeof(SHARED_VAR));
  close(shm_fd);
  exit(0);
}
</pre>
<p>Note that the code should be linked against librt, with the library placed after the source file: <em>gcc -o shm shm.c -lrt</em></p>
<p>Once the shared memory has been set up, every time the program loads (until reboot or deletion of the virtual file), it can connect to the same shared memory location and read and store its data.</p>
<p>When running the binary, you will notice the shm object was created in /dev/shm:</p>
<pre>
ami@codemaestro:~$ ls -l /dev/shm/
-rw------- 1 ami ami 102404 Jul 27 09:34 program.shared
</pre>
<h2>Access Control and Synchronization</h2>
<p>It is clear that the shm mechanism provides only bare-bones tools for the user. All access control must be taken care of by the programmer. The kernel does not lock the shared memory itself. The system only supplies synchronization primitives (such as semaphores), and it is up to the programmer to use them to avoid race conditions. Note that this model provides only a symmetric way of sharing data between processes. If a process wishes to notify another process that new data has been inserted into the shared memory, it will have to use signals, message queues, pipes, sockets, or other types of IPC.</p>
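<p>As a hedged illustration of coordinating concurrent writers, the sketch below places a lock-free atomic counter in a shared mapping and lets a parent and a child process update it. To stay short it uses an anonymous shared mapping and <em>fork</em> rather than <em>shm_open</em>, but the same idea applies to a region obtained from a shm object. The names <em>SharedCounter</em> and <em>count_across_processes</em> are made up for the example. For anything larger than a flag or a counter, a process-shared mutex or a semaphore would be the more usual choice.</p>

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical shared structure: a counter that several processes update.
struct SharedCounter {
    std::atomic<int> hits;
};

// Maps an anonymous shared region, forks one child, and lets parent and
// child each add per_process increments. Returns the final count, or -1.
int count_across_processes(int per_process)
{
    void* mem = mmap(nullptr, sizeof(SharedCounter), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    // Construct the counter inside the shared region and start from zero.
    SharedCounter* shared = new (mem) SharedCounter;
    shared->hits.store(0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(SharedCounter));
        return -1;
    }

    // Both processes run this loop on the same physical memory.
    for (int i = 0; i < per_process; ++i)
        shared->hits.fetch_add(1);  // atomic read-modify-write, no lost updates

    if (pid == 0)
        _exit(0);                   // child: finished, exit without cleanup handlers

    int status = 0;
    waitpid(pid, &status, 0);       // parent: wait until the child is done
    int total = shared->hits.load();
    munmap(mem, sizeof(SharedCounter));
    return total;
}
```

<p>With a plain <em>int</em> and <em>++</em> instead of the atomic, increments from the two processes could overwrite each other and the final count could come out too low.</p>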
<h2>Shared Kernel memory</h2>
<p>Although it&#8217;s frighteningly more dangerous, sharing memory can also be done between kernel space and user space. For those of you wondering why on earth anyone would want to do this, the answer is simple: performance. Why copy memory when you can share it? For example, imagine the performance boost for a network appliance if its network interface doesn&#8217;t have to copy each and every packet&#8217;s data to user space, but only sends its memory address to a user space network application that can read from the shared kernel memory section.</p>
<p>Defining a shared memory space between kernel space and user space is not much different from the usual user space shared memory, but as mentioned before, the possible damage is far greater in case of a bug.</p>
<p>Describing how to write a kernel module that shares some kernel memory is outside the scope of this article. However, a complete detailed example of exposing shared memory from the kernel for writing a network driver can be found in an article &#8220;Network Buffers and memory management&#8221; [3].</p>
<h2>The Registry Model</h2>
<p>SHM allows simple management of a knowledge base to be shared between processes. The facilities supplied with SHM make it easy to implement the following model:</p>
<ul>
<li>The registry manager process loads the registry from a file and maps it to SHM.</li>
<li>Child / sibling processes read and write entries with managed permissions.</li>
<li>The registry manager saves the desired entries to the file (on every write, for example).</li>
</ul>
<p>Via SHM we can detect if we are the first running instance and manage permissions. If we combine this with a notification mechanism, we can create an efficient way to share structured data between processes. More on this in the referenced article.</p>
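<p>A minimal sketch of what such a registry could look like in memory follows. The layout, the sizes and the function names (<em>registry_set</em>, <em>registry_get</em>) are assumptions made for illustration and are not taken from the referenced article. The struct is fixed-size and contains no pointers, so it could be mapped with <em>mmap</em> in the same way as the struct in the basic setup. Locking and saving to file are left out.</p>

```cpp
#include <cassert>
#include <cstring>

// Assumed sizes, chosen only for the example.
const int MAX_ENTRIES = 16;
const int KEY_SIZE = 32;
const int VALUE_SIZE = 64;

struct RegistryEntry {
    char key[KEY_SIZE];
    char value[VALUE_SIZE];
    int in_use;   // 0 = free slot, which is what zero-filled memory gives
};

// Fixed-size, pointer-free layout, suitable for placing in shared memory.
struct Registry {
    RegistryEntry entries[MAX_ENTRIES];
};

// Adds or overwrites an entry. Returns false if the strings do not fit
// or the table is full.
bool registry_set(Registry* reg, const char* key, const char* value)
{
    if (std::strlen(key) >= KEY_SIZE || std::strlen(value) >= VALUE_SIZE)
        return false;

    RegistryEntry* slot = nullptr;
    for (int i = 0; i < MAX_ENTRIES; ++i) {
        RegistryEntry* e = &reg->entries[i];
        if (e->in_use && std::strcmp(e->key, key) == 0) {
            slot = e;          // existing key: overwrite it
            break;
        }
        if (!e->in_use && !slot)
            slot = e;          // remember the first free slot
    }
    if (!slot)
        return false;

    std::strcpy(slot->key, key);
    std::strcpy(slot->value, value);
    slot->in_use = 1;
    return true;
}

// Returns the stored value, or a null pointer if the key is not present.
const char* registry_get(const Registry* reg, const char* key)
{
    for (int i = 0; i < MAX_ENTRIES; ++i) {
        const RegistryEntry* e = &reg->entries[i];
        if (e->in_use && std::strcmp(e->key, key) == 0)
            return e->value;
    }
    return nullptr;
}
```

<p>Because a shm object that has just been sized with <em>ftruncate</em> is zero-filled, every slot starts out free without an explicit initialization pass.</p>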
<h2>A More Advanced Model</h2>
<p>Sachin Agrawal and Swati P. Udas from IBM published an article that provides an infrastructure for shared memory in high-level languages like C++. The referenced article [2] shows the use of SHM in an object-oriented environment, as well as a system for IPC.</p>
<h2>References</h2>
<ol>
<li><a href="http://www.opengroup.org/onlinepubs/007908799/xsh/mmap.html">Using mmap</a></li>
<li><a href="http://www-128.ibm.com/developerworks/linux/library/l-syncevent.html?ca=dgr-lnxw01LinuxMemory">Handle synchronous events from shared objects in Linux</a></li>
<li><a href="http://www.linuxjournal.com/article/1312">Network Buffers and Memory Management, By Alan Cox</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/reviews/11/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Optimizing Singletons With Double Checks</title>
		<link>http://www.codemaestro.com/reviews/10</link>
		<comments>http://www.codemaestro.com/reviews/10#comments</comments>
		<pubDate>Sat, 13 May 2006 14:16:07 +0000</pubDate>
		<dc:creator>Ofer Kapota</dc:creator>
				<category><![CDATA[Reviews]]></category>

		<guid isPermaLink="false">http://beta.codemaestro.com/?p=10</guid>
		<description><![CDATA[Writing a thread safe singleton has its difficulties. One of them is making sure that the dynamic creation of the inner singleton object is thread safe, while locking as few mutexes as possible.]]></description>
				<content:encoded><![CDATA[<p><b>Writing a thread safe singleton has its difficulties. One of them is making sure that the dynamic creation of the inner singleton object is thread safe, while locking as few mutexes as possible.</b></p>
<p>Let&#8217;s first look at a naive Instance static member function:</p>
<pre>
MyClass* MyClass::Instance() 
{ 
    if (!pInstance) { 
        pInstance = new .... 
    }
    return pInstance; 
}
</pre>
<p>Obviously, it is not thread safe. Two threads can cause the instance to be created twice. A natural solution is just to use a mutex:</p>
<pre>
MyClass* MyClass::Instance() 
{ 
    // Mutex is freed when the local lock object is destructed
    mutex lock("Mutex0");
    
    if (!pInstance) { 
        pInstance = new .... 
    }
    return pInstance; 
}
</pre>
<p>Assuming that MyClass::Instance() is called very frequently, the last implementation will slow down the execution, since locking (especially a mutex) can be very slow. However, locking is only needed while the instance is being created, not every time we fetch it. Following is a very simple improvement:</p>
<pre>
MyClass* MyClass::Instance() 
{ 
    if (!pInstance) {
        mutex lock("Mutex0");
    
        if (!pInstance) { 
            pInstance = new .... 
        }
    }
    return pInstance; 
}
</pre>
<p>Every caller who enters the function after pInstance was allocated will <strong>not</strong> lock the mutex. Callers who enter the function before the instance was created are still synchronized with the mutex.</p>
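<p>One caveat for C++11 and later: with a plain pointer, the unlocked first check is a data race, and the compiler or CPU may publish the pointer before the object is fully constructed. The following is a minimal sketch of the same double check using <em>std::atomic</em> and <em>std::mutex</em>. The member name <em>creationMutex</em> and the empty constructor are assumptions made for the example, not part of the original code.</p>

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

class MyClass {
public:
    static MyClass* Instance();

private:
    MyClass() {}
    static std::atomic<MyClass*> pInstance;
    static std::mutex creationMutex;   // hypothetical name
};

std::atomic<MyClass*> MyClass::pInstance(nullptr);
std::mutex MyClass::creationMutex;

MyClass* MyClass::Instance()
{
    // Fast path: the acquire load pairs with the release store below, so a
    // non-null pointer always refers to a fully constructed object.
    MyClass* p = pInstance.load(std::memory_order_acquire);
    if (!p) {
        // Slow path: only callers that saw a null pointer take the mutex.
        std::lock_guard<std::mutex> lock(creationMutex);
        p = pInstance.load(std::memory_order_relaxed);
        if (!p) {
            p = new MyClass;
            pInstance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

<p>Where C++11 is available, a function-local static object is often simpler still, since its initialization is guaranteed to be thread safe.</p>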
<h2>References</h2>
<ul>
<li><a href="http://www.cs.wustl.edu/~schmidt/ACE.html">An article about this issue by Douglas Schmidt, 1996</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/reviews/10/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Implementing Callback Functions Using Delegates in C++</title>
		<link>http://www.codemaestro.com/articles/15</link>
		<comments>http://www.codemaestro.com/articles/15#comments</comments>
		<pubDate>Fri, 16 Dec 2005 23:33:31 +0000</pubDate>
		<dc:creator>Slavik Birger</dc:creator>
				<category><![CDATA[Articles]]></category>

		<guid isPermaLink="false">http://beta.codemaestro.com/?p=15</guid>
		<description><![CDATA[Nice C++ object oriented code basically means writing everything in objects/classes. However, not everything we need in programming is classes. This article concentrates on callback functions, and presents a C++ implementation of delegates which doesn't use pointers to functions.]]></description>
				<content:encoded><![CDATA[<p><b>Nice C++ object oriented code basically means writing everything in objects/classes. However, not everything we need in programming is classes. This article concentrates on callback functions, and presents a C++ implementation of delegates which doesn&#8217;t use pointers to functions.</b></p>
<p>It&#8217;s not surprising that callback functions usually look ugly in C++ code. Pointers to member functions are usually a mess, mainly because they need the &#8220;this&#8221; pointer as a parameter. Observe the following example:</p>
<pre>
class A  {
public:
    int func ();
};

...

int (A::*pmf)();
pmf = &#038;A::func;
A a;
A *pa = &#038;a;
(a.*pmf)();
</pre>
<p>C++ was meant to be nicer than this, wasn&#8217;t it?</p>
<h2>Implementing Delegates in C++</h2>
<p>If you&#8217;re familiar with C# programming, one of the new bright ideas in C# is &#8220;Delegates&#8221;. If you get to the bottom of it, delegates can actually be easily implemented in C++ using regular classes, as shown below.</p>
<p>As a rule of thumb, use the ideas of the following code whenever you want to implement a callback function in C++ and you won&#8217;t be disappointed.</p>
<p>Implementing C++ delegates consists of a few simple tasks. The following example illustrates how to create and use a delegate of a function that takes a string and returns void.</p>
<p><strong>1. </strong>Declare a prototype of a &#8220;pointer to a function that takes a string and returns void&#8221; as an abstract class with one pure virtual member function:</p>
<pre>
#include &lt;string>

class StringDelegate
{
public:
    virtual ~StringDelegate() {}
    virtual void runTheFunction(std::string params) = 0;
};
</pre>
<p><strong>2. </strong>Implement your specific callback function</p>
<p>The callback function&#8217;s implementation is a class that inherits from StringDelegate. Optionally, it would be nicer if it contained some kind of a &#8220;Runner&#8221; class that is responsible for handling the received data. The code follows:</p>
<pre>
class OurDelegate : public StringDelegate
{
public:
    explicit OurDelegate(Runner&#038; runner); // The constructor should get the runner
    void runTheFunction(std::string data); // Implementation!
private:
    OurDelegate(); // No default constructor
    Runner m_runner; // A copy of the runner passed to the constructor
};

// The constructor
OurDelegate::OurDelegate(Runner&#038; runner) : m_runner(runner)
{
}

// The actual implementation
void OurDelegate::runTheFunction(std::string data)
{
    m_runner.run(data);
}
</pre>
<p>Now we can write the code that calls our &#8220;callback function&#8221;. The delegate is passed by reference, since an abstract class cannot be passed by value:</p>
<pre>
void callme(StringDelegate&#038; sd)
{
    sd.runTheFunction("Tralala");
}
</pre>
<p>Running the delegate is simple:</p>
<pre>
OurDelegate delegate(runner);
callme(delegate);
</pre>
<p><strong>No pointers needed! Now that&#8217;s C++. </strong></p>
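<p>The pieces above can be assembled into one self-contained sketch. The article does not define <em>Runner</em>, so the version below is hypothetical and simply remembers the last string it handled. As a design choice for this sketch, the delegate holds a reference to the runner and is itself passed by reference, so the caller can observe the effect of the callback.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical Runner: not defined in the article. This one simply
// remembers the last string it was asked to handle.
class Runner {
public:
    void run(const std::string& data) { m_last = data; }
    const std::string& last() const { return m_last; }

private:
    std::string m_last;
};

// The "prototype": anything that takes a string and returns void.
class StringDelegate {
public:
    virtual ~StringDelegate() {}
    virtual void runTheFunction(std::string params) = 0;
};

// A concrete callback that forwards the data to a Runner.
class OurDelegate : public StringDelegate {
public:
    explicit OurDelegate(Runner& runner) : m_runner(runner) {}
    void runTheFunction(std::string data) override { m_runner.run(data); }

private:
    Runner& m_runner;  // refers to the caller's runner instead of copying it
};

// Code that invokes the callback without knowing its concrete type.
void callme(StringDelegate& sd)
{
    sd.runTheFunction("Tralala");
}
```

<p>Usage is then a matter of constructing an <em>OurDelegate</em> around a <em>Runner</em> and handing it to <em>callme</em>. Afterwards the runner holds the string that was passed to the callback.</p>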
]]></content:encoded>
			<wfw:commentRss>http://www.codemaestro.com/articles/15/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
