<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Go Parallel &#187; Build</title>
	<atom:link href="http://goparallel.sourceforge.net/build/feed/" rel="self" type="application/rss+xml" />
	<link>http://goparallel.sourceforge.net</link>
	<description>Translating Multicore Power into Application Performance</description>
	<lastBuildDate>Thu, 23 May 2013 17:28:42 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>New Windows Parallel Dev Environment</title>
		<link>http://goparallel.sourceforge.net/new-windows-parallel-dev-environment/</link>
		<comments>http://goparallel.sourceforge.net/new-windows-parallel-dev-environment/#comments</comments>
		<pubDate>Wed, 22 May 2013 14:36:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Build]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3277</guid>
		<description><![CDATA[Intel is soliciting Windows workstation and server users to kick the tires on the beta-release of its Windows development environment for the 60-core Xeon Phi coprocessor. Buying a Xeon Phi co-processor card for your Xeon-based workstation or server gets you free access to all Intel’s parallel compilers, profilers, performance analyzers and debuggers, which run as [...]]]></description>
				<content:encoded><![CDATA[<p>Intel is soliciting Windows workstation and server users to kick the tires on the beta-release of its Windows development environment for the 60-core <span style="text-decoration: underline;"><a href="http://software.intel.com/en-us/mic-developer">Xeon Phi</a></span> coprocessor. Buying a Xeon Phi co-processor card for your <span style="text-decoration: underline;"><a href="http://www.intel.com/content/www/us/en/servers/server-products.html">Xeon</a></span>-based workstation or server gets you free access to all Intel’s <span style="text-decoration: underline;"><a href="http://intel.ly/UpKLdL">parallel</a></span> compilers, profilers, performance analyzers and debuggers, which run as plug-ins to Microsoft&#8217;s Visual Studio development environment.</p>
<p>&#8220;For the beta, we are allowing access to Intel&#8217;s parallel development tools even if new Xeon Phi buyers don&#8217;t, currently, own them,&#8221; explained James Reinders, Intel&#8217;s Software Products Director and Multi-core Evangelist. &#8220;But a lot of workstation users already own Parallel Studio for Windows, so all they have to do to access the Xeon Phi is make sure they are using the latest version.”</p>
<div id="attachment_3278" class="wp-caption alignleft" style="width: 650px"><a href="http://goparallel.sourceforge.net/wp-content/uploads/2013/05/Intel-Blog-4.jpg" rel="wp-prettyPhoto[g3277]"><img class="size-large wp-image-3278" alt="Intel Blog 4" src="http://goparallel.sourceforge.net/wp-content/uploads/2013/05/Intel-Blog-4-1024x764.jpg" width="640" height="477" /></a><p class="wp-caption-text">The configuration of Intel&#8217;s Many-Integrated-Core (MIC) Testing Lab for Windows OS</p></div>
<p>Intel unveiled the <span style="text-decoration: underline;"><a href="http://software.intel.com/en-us/mic-developer">Xeon Phi</a></span> 60-core coprocessor last year as the first member of its many-integrated core (<a href="http://software.intel.com/en-us/forums/intel-many-integrated-core/">MIC</a>) <span style="text-decoration: underline;"><a href="http://intel.ly/UpKLdL">parallel</a></span> processor architecture. Since then it has been marketing the Xeon Phi coprocessor almost exclusively to high-performance computer (HPC) users, who almost universally run the Linux operating system (OS).</p>
<p>&#8220;Today the Xeon Phi is mostly used by HPCs, which are already running highly parallel workloads in a Linux environment,&#8221; Reinders continued in an interview. &#8220;But now that Windows users are learning to take advantage of multi-core processors, they too can take advantage of the Xeon Phi&#8217;s massively parallel processors.&#8221;</p>
<p>With nearly a year of experience under its belt, Intel is confident its current beta-release is stable and that users will find they can get real work done using their <span style="text-decoration: underline;"><a href="http://software.intel.com/en-us/mic-developer">Xeon Phi</a></span> coprocessor with a Windows OS. In fact, Reinders estimates it will only be a few months before Intel is confident enough to do a general release of Parallel Studio XE for Windows.</p>
<p>&#8220;It is a beta, but we have a decade of experience in supporting Windows with our toolset, and we have gained a lot of experience supporting the Xeon Phi under Linux, which helped us in preparing the Windows release,&#8221; said Reinders. &#8220;However, we are releasing it as a beta to get feedback from users, who might even find a few bugs. Then probably this fall we will do a full release of our Xeon Phi toolset for Windows.&#8221;</p>
<p><b>Linux Inside Windows</b></p>
<p>The <span style="text-decoration: underline;"><a href="http://software.intel.com/en-us/mic-developer">Xeon Phi</a></span> co-processor card runs Linux&#8211;even when hosted on a Windows-based workstation or server&#8211;but the Windows user does not have to hassle with Linux, because it is downloaded to the card automatically when the user installs the device drivers for the Xeon Phi coprocessor. Thereafter, the programmer can offload any <span style="text-decoration: underline;"><a href="http://intel.ly/UpKLdL">parallel</a></span> program running on a Xeon host to the Xeon Phi coprocessor.</p>
<p>&#8220;The host Xeon processor can run Windows or Linux, but the Xeon Phi coprocessor card will still run Linux,&#8221; said Reinders. &#8220;However, our software allows Windows applications to off-load jobs to the coprocessor at any time.&#8221;</p>
<p>Intel expects Windows parallel programmers will make use of the new Xeon Phi coprocessor for applications in the fields of engineering, computer-aided design (CAD), virtual prototyping, modeling, stress analysis, life sciences, drug analysis, oil exploration, energy generation, financial analysis and big-data analytics.</p>
<p><span style="text-decoration: underline;"><a href="http://software.intel.com/en-us/intel-parallel-studio-xe-evaluation">Intel&#8217;s Parallel Studio XE </a></span>&nbsp;runs under Microsoft Windows 7 Enterprise SP1 (64-bit), Windows 8 Enterprise (64-bit), Windows Server 2008 R2 SP1 (64-bit) and Windows Server 2012 (64-bit). Reinders also claims that the programming techniques described in his new book, <span style="text-decoration: underline;"><a href="http://goparallel.sourceforge.net/book-review-intel-xeon-phi-coprocessor-high-performance-programming/">Intel Xeon Phi Coprocessor High-Performance Programming</a></span>, can be used by Windows programmers because its examples are in C++ or Fortran.</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/new-windows-parallel-dev-environment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Pain Of Parallel Programming</title>
		<link>http://goparallel.sourceforge.net/the-pain-of-parallel-programming/</link>
		<comments>http://goparallel.sourceforge.net/the-pain-of-parallel-programming/#comments</comments>
		<pubDate>Thu, 02 May 2013 16:00:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Build]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3147</guid>
		<description><![CDATA[If you’ve ever heard about parallel programming it probably sounded like a painful endeavor. But why does parallelism hurt? And does it really have to? Those who have experience with parallel programming know&#160;it’s mostly true with “traditional” approaches that incorporate parallel constructs into mainstream programming languages like C++ or FORTRAN. In light of decades of [...]]]></description>
				<content:encoded><![CDATA[<p>If you’ve ever heard about parallel programming it probably sounded like a painful endeavor. But why does parallelism hurt? And does it really have to?</p>
<p>Those who have experience with parallel programming know it’s mostly true with “traditional” approaches that incorporate parallel constructs into mainstream programming languages like C++ or FORTRAN. In light of decades of research in parallel computing, this is an irritating situation, and more so now that multi-core systems have been mainstream for years.</p>
<p>You probably guessed already that I don’t think it needs to be that painful, but let’s defer that for the moment. So what is it that makes traditional approaches so painful?</p>
<p><strong>Pain is Implicit </strong></p>
<p>The pain is in what they require the programmer to think about and more importantly what they leave implicit. The programmer is required to explicitly state what computations should go in parallel.</p>
<p>However, the programmer must implicitly make sure that no race-conditions can occur; it is up to the developer to guarantee that the computations s/he declared as parallel do not depend on each other.</p>
<p>Because these approaches offer no way to specify dependences (if one computation depends on the other we say there is a dependence between the two), it is simply required (or assumed) that none exist. Prominent examples of tools for C/C++ that make you guarantee the non-existence of dependences are OpenMP, MPI, TBB and Cilk. Without becoming too philosophical, it should be obvious that proving the non-existence of something is very hard (and in some cases impossible). Not uncommonly this approach brings programmers to the verge of despair by making them think about what doesn’t exist.</p>
<p><strong>The Secret? Inversion</strong></p>
<p>That said, the key to less pain with parallelism becomes apparent. It’s not in syntax, more powerful parallel constructs, automatic parallelization or new programming languages. As long as dependences are implicit, getting parallelism won’t become significantly simpler.</p>
<p>However, “inverting” the problem simplifies it: it’s much easier to tell what exists than to prove what doesn’t. If we define what can <strong>not</strong> go in parallel, i.e. which (dependent) computations need to go in order, then we also know what can go in parallel: everything else.</p>
<p>So why not just let the programmer define the dependences (or the required orderings) between computations? It is a simple thought process; simple mostly because the necessary information is known by the developer anyway: when writing a computation kernel the programmer knows what input/information it needs, so it is clear what it depends on. If all such dependences are declared, executing non-dependent pieces in parallel is an almost straightforward task.</p>
<p><strong>Make Relevant Information Explicit!</strong></p>
<p>In other words, the key is making the relevant information explicit rather than tediously working around implicit information. Serial languages (like C/C++) hide such information even though it is known to the programmer. They have no capability to express the necessary information directly and natively. Tools for automatic parallelization then try to uncover what’s not defined explicitly &#8211; in general that’s a futile endeavor.</p>
<p>So what’s needed is a program structure to explicitly express the dependences which are actually needed to find a semantically correct execution order of compute kernels. A runtime can then take care of the parallelization, and things like “parallel_for”, pipelining constructs etc. become superfluous. Expressing the dependences avoids the requirement to make parallelism explicit or even to think about it, but still exposes the available parallelism in the program.</p>
<p>Inverting the traditional approach of thinking about parallelism, i.e. thinking about what needs to be serial/ordered, not only makes parallelism easier but also uncovers more of it (because no explicit construct limits it to a specific type of parallelism).</p>
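<p>To make the inversion concrete, here is a minimal C++ sketch. It is not CnC and not any particular runtime; the function name <code>parallelWaves</code> and the idea of grouping steps into "waves" are assumptions made purely for illustration. The programmer only states what each step depends on, and the code derives which steps may run side by side.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CnC): dependsOn[s] lists the steps that
// step s needs as input. The result groups the steps into "waves". Every
// step in a wave has all of its dependences satisfied by earlier waves, so
// the steps inside one wave may run in parallel with each other.
std::vector<std::vector<std::size_t>> parallelWaves(
    const std::vector<std::vector<std::size_t>>& dependsOn) {
    const std::size_t n = dependsOn.size();
    std::vector<std::size_t> pending(n);             // unmet dependences per step
    std::vector<std::vector<std::size_t>> users(n);  // steps waiting on each step
    for (std::size_t s = 0; s < n; ++s) {
        pending[s] = dependsOn[s].size();
        for (std::size_t d : dependsOn[s]) {
            assert(d < n && "dependence index out of range");
            users[d].push_back(s);
        }
    }

    // Steps with no dependences form the first wave.
    std::vector<std::size_t> ready;
    for (std::size_t s = 0; s < n; ++s)
        if (pending[s] == 0) ready.push_back(s);

    std::vector<std::vector<std::size_t>> waves;
    while (!ready.empty()) {
        waves.push_back(ready);
        std::vector<std::size_t> next;
        // Finishing a wave releases every step whose last dependence was in it.
        for (std::size_t s : ready)
            for (std::size_t u : users[s])
                if (--pending[u] == 0) next.push_back(u);
        ready = next;
    }
    return waves;  // steps caught in a dependence cycle never appear
}
```

<p>For four steps where steps 1 and 2 both need step 0, and step 3 needs steps 1 and 2, the declared dependences are <code>{{}, {0}, {0}, {1, 2}}</code> and the function returns three waves: step 0, then steps 1 and 2 together, then step 3. Nobody had to say that steps 1 and 2 are parallel; it follows from what was declared.</p>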
<p>There are different approaches to tackle this, like streaming and functional languages. In a series of articles I will introduce <a href="http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc/">Intel’s C++ library/runtime implementation of CnC</a>* which directly implements the above idea. Without specializing to a certain domain it provides purity in design and proven scalability from single-core to multi-core systems up to clusters of multi-core workstations. I will demonstrate that the thought process is simple and even detecting errors is usually easy, because the runtime has the information to actually issue meaningful warnings and messages.</p>
<p>For the impatient: <a href="http://software.intel.com/en-us/articles/intel-concurrent-collections-for-cc/">Here is CnC’s homepage</a> with the free download, papers, talks and tutorials.</p>
<p><a href="http://software.intel.com/en-us/blogs/2013/01/31/about-the-pain-of-parallel-programming"><strong>http://software.intel.com/en-us/blogs/2013/01/31/about-the-pain-of-parallel-programming</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/the-pain-of-parallel-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel Prediction Retrains Neural Networks</title>
		<link>http://goparallel.sourceforge.net/parallel-prediction-retrains-neural-networks/</link>
		<comments>http://goparallel.sourceforge.net/parallel-prediction-retrains-neural-networks/#comments</comments>
		<pubDate>Wed, 01 May 2013 15:49:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Build]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3120</guid>
		<description><![CDATA[&#160; Here’s how it usually happens: Neural networks are trained, and an application uses them to make decisions. Unfortunately, this overlooks the real-life fact that circumstances change, invalidating network training. What makes the situation particularly serious is that retraining a network takes time and may not be practical for real-time applications. But the parallel processing [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p><span style="font-size: small;">Here’s how it usually happens: Neural networks are trained, and an application uses them to make decisions. Unfortunately, this overlooks the real-life fact that circumstances change, invalidating network training. What makes the situation particularly serious is that retraining a network takes time and may not be practical for real-time applications. But parallel processing technology offers a solution that makes retraining neural networks in real time an option.</span></p>
<p>Two examples:</p>
<p>Imagine a robot scanning the ocean floor to gather geological data. It makes movement and navigation decisions based on trained neural networks. But if it encounters situations for which it was not trained, the robot&#8217;s behavior may be unpredictable, or downright wrong. If such a robot could retrain in real time, as flesh-and-blood animals do, it would have a better chance of surviving and completing the mission.</p>
<p>Let&#8217;s say that a financial trading program has neural networks that have been trained for all known trends. It predicts investment decisions based on the trained neural networks it contains. Here again, if a new trend surfaces, the predictions may be inaccurate and may result in a loss of investment capital. What this program needs is the ability to respond to new trends, and retrain itself so that predictions can be accurate.</p>
<p><strong>Demo Program for a Smarter Solution</strong></p>
<p>To illustrate a possible solution, I wrote a demonstration program using Intel parallel processing technology. The program generates four pre-calculated data sets that can be used to train and test neural networks. Four neural network objects are intended to dovetail with the parallel processing technology. The following methodology is used whenever a new data set is selected.</p>
<p><strong>1. A derived data set for training is created.</strong> This is important because neural networks trained on sequential data do not learn as well as those trained on data that has been reordered. A network trained on sequential (or ordered) data is said to be over-fitted, and the results are usually less than optimal.</p>
<p><strong>2. Three neural networks are trained.</strong> The first is trained with only 100 training epochs so that it will be ready to use quickly. The reason for this is that in an authentic situation, even a poorly trained network is better than one that is trained for another purpose.</p>
<p><strong>3. The two other networks are then trained</strong> progressively more thoroughly, each with a greater number of training epochs. In this way, a network will eventually emerge that makes good decisions.</p>
<p><strong>The Magic Sauce</strong></p>
<p>The way to train the three networks without stalling the program is to spin up parallel threads. The program uses <strong>cilk_spawn</strong>. Using it could not be easier; the following code shows how to spin up the three training methods.</p>
<p>cilk_spawn TrainFirst();</p>
<p>cilk_spawn TrainSecond();</p>
<p>cilk_spawn TrainThird();</p>
<p>The three functions are extremely simple. They each train a neural network object, and then assign the values to a neural network object, which is being used to draw the predictions on the application. The first trains with 100 epochs, the second with 200 epochs, and the third with 300 epochs. When you run the application you can easily see the user interface reflect the status of each neural network.</p>
<p><strong>Gotchas and Caveats</strong></p>
<p>I discovered a few gotchas and caveats when writing the app. The major one is that you cannot put any user interface code within the function that <strong>cilk_spawn</strong> references. If you do, the function call will block, meaning that the calling code won&#8217;t get control back until the spawned method finishes (or I should say the allegedly spawned method). What I ended up doing was creating some semaphore variables, which are examined from an independent thread. When the semaphore variables are set, then the independent thread can make the user interface calls.</p>
<p>Another warning is in order. I originally wrote the application with three consecutive uses of <strong>cilk_spawn</strong>. While this worked, the CPU usage on my six-core development machine climbed to the point where the machine was sluggish and almost unresponsive. I would suggest that you experiment with spawning simultaneous threads that perform CPU-intensive tasks. You might find that it is not an issue, but you should make sure you have thoroughly tested your application. My final solution was to wait until each training function finished before launching another training function.</p>
<p>If you use <strong>cilk_spawn</strong> from within a spawned thread, you won&#8217;t gain any benefit. It is as if you simply performed a normal function call from within the spawned thread.</p>
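<p>The two workarounds above can be sketched together in portable C++. This is not the code from the demo application: it swaps <strong>cilk_spawn</strong> for a plain <code>std::thread</code>, and the names <code>Network</code>, <code>train</code> and <code>Trainer</code> are hypothetical stand-ins. One worker thread trains the three networks one after another, and sets a flag after each so that a separate user interface thread can poll for results without any user interface code running inside the worker.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the article's neural network object.
struct Network {
    int epochsTrained = 0;
};

// Hypothetical training routine; a real one would adjust weights each epoch.
Network train(int epochs) {
    assert(epochs > 0);
    Network net;
    for (int e = 0; e < epochs; ++e) ++net.epochsTrained;
    return net;
}

struct Trainer {
    Network results[3];
    std::atomic<bool> ready[3];  // flags polled by the user interface thread
    std::thread worker;

    Trainer() {
        for (auto& flag : ready) flag.store(false);
    }

    // Train with 100, 200, then 300 epochs, one after another, on a single
    // background thread. No user interface calls are made from here.
    void start() {
        worker = std::thread([this] {
            const int epochs[3] = {100, 200, 300};
            for (int i = 0; i < 3; ++i) {
                results[i] = train(epochs[i]);
                // Publish the result; pairs with the acquire load below.
                ready[i].store(true, std::memory_order_release);
            }
        });
    }

    // Safe to call from the user interface thread at any time: returns the
    // index of the most thoroughly trained network that is finished, or -1.
    int bestReady() const {
        for (int i = 2; i >= 0; --i)
            if (ready[i].load(std::memory_order_acquire)) return i;
        return -1;
    }

    void wait() {
        if (worker.joinable()) worker.join();
    }

    ~Trainer() { wait(); }
};
```

<p>A user interface timer would call <code>bestReady()</code> periodically and redraw with <code>results[index]</code> whenever the index changes. Running the three trainings back to back on one thread mirrors the final solution described above, at the cost of not using the other cores.</p>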
<p><strong>The Results</strong></p>
<p>The application (which can be <a href="http://goparallel.sourceforge.net/wp-content/uploads/2013/04/4-April-Blog-9-ParallelNeuralNetworksForPrediction_Project-2.zip">downloaded here</a>) allows users to select the data set, have the application retrain without missing a beat, and display a graphical representation as shown in Figure 1.</p>
<div id="attachment_3133" class="wp-caption aligncenter" style="width: 650px"><a href="http://goparallel.sourceforge.net/wp-content/uploads/2013/05/ParallelNeuralNetworkForPrediction.png" rel="wp-prettyPhoto[g3120]"><img class="size-large wp-image-3133" title="Parallel Neural Network For Prediction" src="http://goparallel.sourceforge.net/wp-content/uploads/2013/05/ParallelNeuralNetworkForPrediction-1024x365.png" alt="" width="640" height="228" /></a><p class="wp-caption-text">The application uses whichever neural network has the most thorough training to make predictions.</p></div>
<p>Now that I have explored neural networks within the parallel processing world, I am ready to look for new and innovative applications of neural networks. With real-time retraining as an option, the range of applications for neural networks has increased dramatically.</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/parallel-prediction-retrains-neural-networks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software without coding – Parallel Boon?</title>
		<link>http://goparallel.sourceforge.net/software-without-coding-parallel-boon/</link>
		<comments>http://goparallel.sourceforge.net/software-without-coding-parallel-boon/#comments</comments>
		<pubDate>Thu, 25 Apr 2013 23:46:33 +0000</pubDate>
		<dc:creator>gpmcarollo</dc:creator>
				<category><![CDATA[Build]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3097</guid>
		<description><![CDATA[&#160; For the last eight years, the Programming Without Coding Technology project has been working on ways of developing, well, programming without coding. The initiative’s General-Purpose Visual Programming tool is aimed for clue-free newbies, expert developers and everyone in between. Interesting, but doubly so because of how the effort might quicken adoption of parallel programming.&#160; [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>For the last eight years, the <a href="https://sourceforge.net/projects/doublesvsoop/">Programming Without Coding Technology</a> project has been working on ways of developing, well, programming without coding. The initiative’s General-Purpose Visual Programming tool is aimed at clue-free newbies, expert developers and everyone in between. Interesting, but doubly so because of how the effort might quicken adoption of parallel programming. Read how.</p>
<p>First, a quick explanation. Project head Mahmoud Fayed laid out the basics in a <a href="http://sourceforge.net/blog/guest-post-programming-without-coding-technology/?hubRefSrc=email#lf_comment=70232144" target="_blank">recent SourceForge guest post</a>:</p>
<p>&#8220;PWCT is not a Wizard for creating your application in 1-2-3 steps… A novice programmer can use PWCT to learn programming concepts like Data Structure, Control Structure, Programming Paradigm..etc. &nbsp;An expert programmer can use PWCT to develop large and/or complex software.&#8221;</p>
<p>He continues:</p>
<p>&#8220;Using PWCT, we developed a textual programming language Compiler and Virtual Machine without writing a single line of code where the programming process done using the PWCT visual components. This language called Supernova and it&#8217;s free-open source (<a href="http://supernova.sourceforge.net/">http://supernova.sourceforge.net</a>).   Many database, multi-media, network, AI, simulation &amp; math applications developed using PWCT.&#8221;</p>
<p>That sounded pretty interesting. So I contacted Fayed and asked what I hoped was not a dumb question: “Could the code produced by PWCT be parallelized?”</p>
<p>His rapid reply:</p>
<p>&#8220;Yes, PWCT comes with more than one VPL (Visual Programming Language) like HarbourPWCT, PythonPWCT, C#PWCT &amp; SupernovaPWCT and you can extended PWCT to support any textual programming language.&#8221;</p>
<p>And you can …</p>
<p>1 &#8211; Use Threads</p>
<p>2 &#8211; Use the Super Server Programming Paradigm (Embedded in HarbourPWCT) to develop network applications (Client-Server &amp; Distributed )</p>
<p>3 &#8211; Support new textual programming language for concurrent programming like ErLang.</p>
<p>&#8220;The idea of PWCT is to create new VPLs powerful as the textual programming languages we are using for programming tasks.  And we have 11 lessons to help developers extend PWCT.&#8221; <a href="http://sourceforge.net/blog/guest-post-programming-without-coding-technology/" target="_blank">You’ll find these here.</a></p>
<p>So… insofar as Go Parallel is devoted to helping people, well, go parallel, this seems like good news, right? After all, one of the problems with programming in general, and parallel programming in particular, is the lack of programmers.</p>
<p>So what if some basic, workaday programming could be handled by talented-enough non-specialists? Wouldn&#8217;t that free more expert developers to focus on higher-value tasks, including parallel programming?</p>
<p>Taking the idea further, what if the “commodity” programs created by non-experts could be easily parallelized? That would be a good thing too, right?</p>
<p>I frankly don’t know if trying to parallelize non-coded PWCT programs would be worth the trouble. Or what the quality would be. Or how/if they could be handed off into an Intel software environment for parallelization.</p>
<p>This last point is important not just because Intel sponsors the <em>Go Parallel</em> site, published in partnership with Slashdot Media, but also because the company’s parallel programming suite and tools represent the gold standard. That’s a fact that can’t be ignored.</p>
<p>Still, the whole idea raises provocative questions. A few weeks ago at the annual World Economic Forum in Davos, WWW Demigod <a href="http://goparallel.sourceforge.net/tim-berners-lee-developers-freed-from-tedium-do-incredible-things/" target="_blank">Tim Berners-Lee issued a passionate call for ordinary users to become more literate in basic programming concepts and skills</a>.</p>
<p>PWCT seems to answer that call pretty well. Could it, or similar approaches, end up climbing the evolutionary tree of programming and play (or adapt) well enough with deeply established apex companies like Intel? Or are we looking at a coding Cro-Magnon, at least where parallel is concerned?</p>
<p>I honestly don’t know the answer. I raise the question because part of our site’s mission is “thought leadership.” What’s <em>your</em> take? Please weigh in below. Not because I say so, but because Sir TB-L thinks it’s a topic worth whacking round.</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/software-without-coding-parallel-boon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating a thread-safe C++ Singleton Instance with TBB</title>
		<link>http://goparallel.sourceforge.net/creating-a-thread-safe-c-singleton-instance-with-tbb/</link>
		<comments>http://goparallel.sourceforge.net/creating-a-thread-safe-c-singleton-instance-with-tbb/#comments</comments>
		<pubDate>Fri, 19 Apr 2013 14:35:38 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Build]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3069</guid>
		<description><![CDATA[&#160; Singleton instances have their place in programming, but can present a problem in multithreaded programming. You only want a single thread to create the initial instance and to perform the initialization. With the help of the atomic type in Threading Building Blocks, you can safely create your instance. Last time we met up, we [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p><span style="font-size: 16px;">Singleton instances have their place in programming, but can present a problem in multithreaded programming. You only want a single thread to create the initial instance and to perform the initialization. With the help of the atomic type in Threading Building Blocks, you can safely create your instance.</span></p>
<p>Last time we met up, we talked about the inner workings of the atomic template in <a href="http://intel.ly/ShS7Qw" target="_blank">Threading Building Blocks</a>. That template gives you high-performance alternatives to mutexes for operations such as adding and swapping by making use of some assembler-level operations. These operations use what&#8217;s called fencing to make such operations possible. The end result is that you can create fully thread-safe classes that, for example, include a usage count.</p>
<p><strong>Quick Theory: Atomic Template</strong></p>
<p>When might this type of template be useful? If you want to have some initialization take place only after an object is accessed the first time, for example, then some cleanup when the object is no longer used, all while being multi-thread, multi-core safe. The good news is that you don&#8217;t have to write a single line of assembler code. The template already has done that part for you.</p>
<p>Without such a template, you might use operating-system level mutexes. But then you end up taking a performance hit. By letting the template work at the assembler level, time spent is minimal.</p>
<p>Here&#8217;s the idea: When you need to increment a value, you want to grab the current value, add one to it, and store it back in. Without any kind of mutex in place, two competing threads might both grab the current value at almost the exact same time, thus getting the same value. They both add one to it, and store the new value back in. So instead of incrementing by two as it should, the value only increments by one, resulting in a bug. At the assembler level, the processor allows a single thread to grab the value, add one to it, and write it back as an atomic operation (hence the name of the template, atomic). This fencing happens extremely fast, so the wait time for other threads is minimal. That&#8217;s far more efficient than creating a mutex at the operating-system level.</p>
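<p>The increment scenario can be tried out with a short sketch. It uses the standard library&#8217;s <code>std::atomic</code> and <code>std::thread</code> rather than <code>tbb::atomic</code>, purely so it compiles without TBB installed; the function name <code>countWithAtomic</code> is made up for this example.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each of `threads` workers adds one to a shared counter `perThread` times.
// fetch_add performs the read, the add, and the write back as one atomic
// step, so no increment is lost even when threads collide.
long countWithAtomic(int threads, int perThread) {
    assert(threads > 0 && perThread >= 0);
    std::atomic<long> counter(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i)
                counter.fetch_add(1);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

<p>With eight threads and 100,000 increments each, the function returns exactly 800,000. Replacing the atomic with a plain <code>long</code> and <code>++counter</code> would be a data race, and the total would typically come up short for the reason described above.</p>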
<p><strong>And Quick Practice…</strong></p>
<p>Let&#8217;s try it out. Remember, we&#8217;re dealing with templates, so we&#8217;ll be using the template as a starting point for our own class. We&#8217;ll create a singleton class. The first time the instance is needed, we&#8217;ll create the instance. To keep this simple, we won&#8217;t do a decrement or cleanup; feel free to try that part yourself and discuss it in the comments, and we&#8217;ll look at it in the future if there&#8217;s interest.</p>
<p>Here&#8217;s some code that does it. This isn&#8217;t the only approach, but it&#8217;s one way that works:</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="cpp"><div class="devcodeoverflow"><ol><li>#include &lt;iostream&gt;</li><li>#include &lt;tchar.h&gt;</li><li>#include &lt;cilk/cilk.h&gt;</li><li>#include &lt;tbb/atomic.h&gt;</li><li>&nbsp;</li><li>class Singleton</li><li>{</li><li>&nbsp;&nbsp;&nbsp;&nbsp;static tbb::atomic&lt;Singleton *&gt; inst;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;int x;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;int y;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;Singleton(): x(10), y(20) { }</li><li>public:</li><li>&nbsp;&nbsp;&nbsp;&nbsp;int getX() {</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return x;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;}</li><li>&nbsp;&nbsp;&nbsp;&nbsp;int getY() {</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return y;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;}</li><li>&nbsp;&nbsp;&nbsp;&nbsp;static Singleton &amp;getInst() {</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (inst == 0) {</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Singleton* temp = new Singleton();</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Store temp only if inst is still NULL; otherwise another thread won the race.</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (inst.compare_and_swap(temp, NULL) != NULL) {</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;delete temp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return *inst;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;}</li><li>};</li><li>// Definition of the static member; static storage starts out as 0.</li><li>tbb::atomic&lt;Singleton *&gt; Singleton::inst;</li><li>&nbsp;</li><li>int _tmain(int argc, _TCHAR* argv[])</li><li>{</li><li>&nbsp;&nbsp;&nbsp;&nbsp;// Eight parallel iterations all ask for the one shared instance.</li><li>&nbsp;&nbsp;&nbsp;&nbsp;cilk_for(int i=0; i&lt;8; i++) {</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Singleton&amp; s = Singleton::getInst();</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;std::cout &lt;&lt; s.getX();</li><li>&nbsp;&nbsp;&nbsp;&nbsp;}</li><li>&nbsp;&nbsp;&nbsp;&nbsp;return 0;</li><li>}</li></ol></div></pre><!--END_DEVFMTCODE--></p>
<p>Before diving into this code, I need to point out something about static initialization. In C++, the static line inside the class <em>declares</em> the member, but doesn&#8217;t actually define the member. (In other words, it sets up a name for it, but doesn&#8217;t create the storage for it.) You create the storage for it after the class with a definition like this:</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="cpp"><div class="devcodeoverflow"><ol><li>tbb::atomic&lt;Singleton *&gt; Singleton::inst;</li></ol></div></pre><!--END_DEVFMTCODE--></p>
<p>The <a href="http://intel.ly/PMfN3E" target="_blank">C++ compiler</a> will then allocate the storage. But we have a bit of a problem. I&#8217;d like to initialize this, but I can&#8217;t. The reason is that the static initialization is treated as a constructor call (and the atomic template doesn&#8217;t provide a suitable constructor), instead of using the overloaded = operator to assign the value.</p>
<p>The solution is a bit odd, but it&#8217;s the officially sanctioned solution from Intel. Objects with static storage duration are initialized to 0 before anything else happens to them, so that&#8217;s what we&#8217;ll go with. In fact, that&#8217;s exactly what we want, an initial value of 0 (which is the same as NULL). So it works as is, without us initializing the data. It seems a little odd, and for purists it might seem outright dangerous, but we&#8217;re fine. The C++ standard guarantees this zero-initialization, so it holds for Intel&#8217;s compiler and for any other conforming one.</p>
<p>Now for the important part of the code. If the instance exists, we just return it in the getInst function. If it doesn&#8217;t exist, we attempt to create it. This is where the thread-safe part comes in. We don&#8217;t just create it and shove it into the private inst variable, because there&#8217;s a slight possibility that another thread could be doing the same thing at the same time. Instead, we create an instance and save it in a temporary variable. Then in a single atomic operation, we check if the private inst variable is 0, and if so, save the new instance into the variable. We use the compare_and_swap function to do that. If another thread managed to create the instance between our check that inst is 0 and our call to compare_and_swap, we&#8217;re still safe: that other thread stored its instance in an atomic operation, and basically beat us to it. In that case compare_and_swap leaves inst alone and hands back the non-NULL value already stored there, so we just delete our temporary instance.</p>
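<p>The create-then-publish step described above can be sketched without TBB. This is only an illustration under stated assumptions: it swaps in C++11 std::atomic for tbb::atomic, uses compare_exchange_strong where the article uses compare_and_swap, and the helper allThreadsSeeSameInstance is a hypothetical name added for the demonstration.</p>

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the article's Singleton, using std::atomic
// (C++11) instead of tbb::atomic so it builds without TBB installed.
class Singleton {
    static std::atomic<Singleton*> inst;  // starts out null
    int x;
    Singleton() : x(10) {}
public:
    int getX() const { return x; }

    static Singleton& getInst() {
        Singleton* current = inst.load();
        if (current == nullptr) {
            Singleton* temp = new Singleton();
            // Atomically: if inst is still null, store temp. On failure,
            // current is overwritten with the pointer another thread stored.
            if (inst.compare_exchange_strong(current, temp)) {
                current = temp;  // we won the race
            } else {
                delete temp;     // we lost; discard our candidate
            }
        }
        return *current;
    }
};
std::atomic<Singleton*> Singleton::inst{nullptr};

// Calls getInst() from several threads and reports whether every thread
// received the same object.
bool allThreadsSeeSameInstance(int threadCount) {
    std::vector<Singleton*> seen(threadCount, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&seen, i] { seen[i] = &Singleton::getInst(); });
    }
    for (std::thread& t : threads) t.join();
    for (Singleton* p : seen) {
        if (p != seen[0]) return false;
    }
    return true;
}
```

<p>Several threads may each construct a candidate, but only one pointer is ever stored in inst, so every caller ends up with the same object.</p>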
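<p>Keep in mind that a thread that loses the race still runs the constructor before its instance gets deleted, so the constructor can run more than once. This means you won&#8217;t want to do any one-time initialization in the constructor, of course; put that in a separate function, add an else statement and do it there. (Other threads can already see the instance at that point, so that function has to be safe to run while they use it.)</p>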
<p>In the end, you&#8217;ll have a singleton instance that you can access in your threads with only a single thread guaranteed to create the final instance used.</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/creating-a-thread-safe-c-singleton-instance-with-tbb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel Intros Free Dev Platform for HTML5</title>
		<link>http://goparallel.sourceforge.net/intel-intros-free-dev-platform-for-html5/</link>
		<comments>http://goparallel.sourceforge.net/intel-intros-free-dev-platform-for-html5/#comments</comments>
		<pubDate>Thu, 18 Apr 2013 13:32:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Build]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3063</guid>
		<description><![CDATA[At this week’s Intel Developer Forum (IDF) in Beijing, Intel launched the Intel HTML5 App Development Environment. The company says the integrated system will help developers more quickly and economically create, test, debug and deploy applications across multiple operating systems including iOS, Android, Windows 8 and Windows Phone 8.&#160; Learn more here.&#160;]]></description>
				<content:encoded><![CDATA[<p>At this week’s Intel Developer Forum (IDF) in Beijing, Intel launched the <a href="http://software.intel.com/en-us/html5" target="_blank">Intel HTML5 App Development Environment</a>. The company says the integrated system will help developers more quickly and economically create, test, debug and deploy applications across multiple operating systems including iOS, Android, Windows 8 and Windows Phone 8.&nbsp;</p>
<p><strong><a href="http://software.intel.com/en-us/html5" target="_blank">Learn more here.&nbsp;</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/intel-intros-free-dev-platform-for-html5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Intel OpenCL Developers Kit 2013</title>
		<link>http://goparallel.sourceforge.net/new-intel-opencl-developers-kit-2013/</link>
		<comments>http://goparallel.sourceforge.net/new-intel-opencl-developers-kit-2013/#comments</comments>
		<pubDate>Wed, 10 Apr 2013 20:50:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Build]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3028</guid>
		<description><![CDATA[The Intel&#174; Software Development Kit (SDK) for OpenCL Applications 2013 is now available as a free download.&#160; The new SDK supports the OpenCL 1.2 standard on 3rd- and future 4th-generation Intel&#174; Core&#8482; processors running Microsoft Windows* 7 and 8 operating systems. Your application can get better performance and improved battery life from the combination of OpenCL [...]]]></description>
				<content:encoded><![CDATA[<p>The Intel&reg; Software Development Kit (SDK) for OpenCL Applications 2013 is now available as a <a href="http://software.intel.com/en-us/vcsource/tools/opencl-sdk-2013" target="_blank">free download</a>.&nbsp; The new SDK supports the OpenCL 1.2 standard on 3<sup>rd</sup>- and future 4<sup>th</sup>-generation Intel&reg; Core&trade; processors running Microsoft Windows* 7 and 8 operating systems. Your application can get better performance and improved battery life from the combination of OpenCL general-purpose programming coupled with the hardware acceleration capability of Intel HD Graphics on low-power Intel Core platforms.&nbsp;The SDK is ideal for content creation applications like video editing, music creation, and photo editing.&nbsp;</p>
<p><strong><a href="http://software.intel.com/en-us/vcsource/tools/opencl-sdk-2013" target="_blank">Download the&nbsp;Intel&reg; Software Development Kit (SDK) for OpenCL Applications here.&nbsp;</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/new-intel-opencl-developers-kit-2013/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>C++ Atomic Template: Inner workings in TBB</title>
		<link>http://goparallel.sourceforge.net/c-atomic-template-inner-workings-in-tbb/</link>
		<comments>http://goparallel.sourceforge.net/c-atomic-template-inner-workings-in-tbb/#comments</comments>
		<pubDate>Fri, 05 Apr 2013 19:58:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Build]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3013</guid>
		<description><![CDATA[&#160; Threading Building Blocks includes a template class called atomic that helps you accomplish some atomic-level tasks. Under the hood these tasks are actually implemented at the processor level using assembly language. But you don&#8217;t need to do any assembly programming yourself. Let&#8217;s take a look at this interesting class.&#160; Buried down inside the Threading [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>Threading Building Blocks includes a template class called atomic that helps you accomplish some atomic-level tasks. Under the hood these tasks are actually implemented at the processor level using assembly language. But you don&#8217;t need to do any assembly programming yourself. Let&#8217;s take a look at this interesting class.&nbsp;</p>
<p>Buried down inside the <a href="http://intel.ly/ShS7Qw" target="_blank">Threading Building Blocks</a> documentation is a template class called atomic. At first glance it appears simple and nothing special, providing “read, write, fetch-and-add, fetch-and-store, and compare-and-swap.” Not much there, but in fact, these operations are useful and highly optimized.</p>
<p>In my article about performing <a href="http://goparallel.sourceforge.net/prevent-deadlocks-with-lazy-initialization/" target="_blank">lazy initialization</a>, I briefly mentioned this little template. Let&#8217;s take a closer look at it and see why it&#8217;s optimized for parallel programming.</p>
<p><strong>Fencing Memory is Key</strong></p>
<p>At the heart of the atomic type is a technique called memory fencing. There are a lot of highly technical explanations across the web of what exactly memory fencing is. But for a quick explanation, just think of it as a forced ordering. When two threads run on separate cores, a fence keeps memory operations from being reordered across it, so the other core sees them in the order you intended. Typically this goes hand in hand with doing a couple of operations that must happen together as a single, atomic operation. For example, a thread might need to read the contents of some memory, add one to it, and write it back into memory, with no other thread changing the value in between. This can actually be accomplished at the assembler level, essentially implementing a very low-level mutex.</p>
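<p>To make the forced ordering idea concrete, here is a minimal sketch that uses the standard C++11 fences rather than anything from TBB (an assumption made so the example stands alone). The function name publishAndRead and the value 42 are made up for illustration. One thread writes plain data and then raises a flag; the other waits for the flag and then reads the data. The two fences are what stop those steps from being reordered.</p>

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: hand a plain int from one thread to another.
int publishAndRead() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that announces the data
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        // Release fence: the write to payload may not be reordered after
        // the store to ready that follows.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();  // wait for the flag
        }
        // Acquire fence: the read of payload may not be reordered before
        // the load of ready above.
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;
}
```

<p>Without the fences (or equivalent ordering on the flag itself), the consumer would have no guarantee of seeing the payload written before the flag.</p>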
<p>Threading Building Blocks uses a minimal amount of assembly language to accomplish parallel tasks. The atomic template is one such place, making use of fencing features of the Intel processor. Down in the source code are a few directories with processor-specific assembly code for accomplishing fences at the assembler level. The header file atomic.h includes several macros that make calls to this assembly code. What this means is that the code used in the atomic template is about as optimized as it can get. So I encourage you to make use of this deceptively simple template.&nbsp;</p>
<p>The atomic template includes a member function called compare_and_swap. I briefly mentioned this function in the article about lazy initialization. Here’s more of the story:</p>
<p><strong>compare_and_swap</strong></p>
<p>When you look at the lazy initialization algorithms, specifically the one that doesn&#8217;t use mutexes, you might get a bit concerned that by not using mutexes, you could still end up with a race condition. In fact, you don&#8217;t, because the template&#8217;s compare_and_swap function makes use of the assembly code that implements the fencing. (Check out the other blog for more info on what exactly compare_and_swap does.)</p>
<p>Intel processors provide built-in support for fencing through the use of a LOCK prefix that precedes certain other opcodes, including one called CMPXCHG. This CMPXCHG opcode does exactly what the compare_and_swap function does, but at the assembly level. TBB uses this opcode and adds in the LOCK prefix so that the compare_and_swap member function <em>runs at the assembly level with a single instruction</em>. And the LOCK prefix provides the hardware-level mutex we need. I&#8217;d say that&#8217;s pretty optimized! And it has minimal, if any, <a href="http://intel.ly/Szxj80" target="_blank">performance bottlenecks</a>.</p>
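<p>As a rough sketch of what compare_and_swap does, the helper below mimics its behavior with C++11 std::atomic, standing in for tbb::atomic. The name compareAndSwap is hypothetical. It stores the new value only when the current value equals the comparand, and it always returns the value that was there beforehand. On x86, compilers typically turn compare_exchange_strong into a LOCK CMPXCHG instruction, though that is up to the compiler.</p>

```cpp
#include <atomic>

// Hypothetical helper mirroring compare_and_swap: store newValue only if
// target currently equals comparand; always return the value seen before.
int compareAndSwap(std::atomic<int>& target, int newValue, int comparand) {
    int expected = comparand;
    // On success target becomes newValue and expected is left unchanged.
    // On failure expected is overwritten with the value actually found.
    target.compare_exchange_strong(expected, newValue);
    return expected;
}
```

<p>Starting from a value of 0, compareAndSwap(v, 5, 0) returns 0 and leaves 5 in v. A second call, compareAndSwap(v, 9, 0), returns 5 and changes nothing, which is exactly the signal a caller uses to detect that another thread got there first.</p>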
<p><strong>fetch_and_add</strong></p>
<p>Another function the atomic template provides is fetch_and_add. This is also implemented at the assembly level, as a single instruction (XADD on Intel processors) carrying a LOCK prefix.</p>
<p>The idea behind fetch_and_add is to retrieve the value of a variable, add something to it, and store it back in. This is important in parallel processing because you want the whole thing to happen without another thread intervening before the task is complete.</p>
<p>For example, if two threads are using the same variable, and both threads need to add to the variable, you want the final value to be the original value plus <em>both</em> added-on values. But if one thread grabs the value, and the other thread grabs it too, before the first thread can write the new value back, then the final value will be wrong.</p>
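<p>Here is a small sketch of that two-thread scenario, assuming C++11 std::atomic and its fetch_add in place of tbb::atomic and fetch_and_add. The function name addFromTwoThreads and its parameters are made up for illustration.</p>

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: two threads each add their own amount many times
// to one shared total.
int addFromTwoThreads(int start, int a, int b, int repeats) {
    std::atomic<int> total{start};
    auto addRepeatedly = [&total, repeats](int amount) {
        for (int i = 0; i < repeats; ++i) {
            total.fetch_add(amount);  // read, add, write back as one step
        }
    };
    std::thread first(addRepeatedly, a);
    std::thread second(addRepeatedly, b);
    first.join();
    second.join();
    return total.load();
}
```

<p>Because each addition is a single atomic step, the result is always the starting value plus every amount added by both threads. With a plain int and a separate read, add, and write, some of those additions could be lost.</p>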
<p>A common example of this is in reference counting. When a thread needs to access a variable, and the variable needs to keep a reference count by adding one to an internal integer, you don&#8217;t want a race condition to cause the increment to be wrong. The great thing here is that the processor can handle this at the assembler level, meaning it&#8217;s fast and easy to do. That&#8217;s how the fetch_and_add function works, and why it’s a key part of the atomic template. Again, this one also uses the LOCK prefix to implement a mutex of sorts.&nbsp;</p>
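<p>A reference counter along those lines might look like the sketch below. It is an assumption-laden illustration, not TBB code: the RefCount class and its member names are hypothetical, and std::atomic with fetch_add and fetch_sub stands in for tbb::atomic.</p>

```cpp
#include <atomic>

// Hypothetical reference counter built on std::atomic.
class RefCount {
    std::atomic<int> count{0};
public:
    // Adds a reference and returns the new count.
    int acquire() { return count.fetch_add(1) + 1; }
    // Drops a reference; returns true when the last one is gone.
    bool release() { return count.fetch_sub(1) == 1; }
    // Reports the current number of references.
    int current() const { return count.load(); }
};
```

<p>Both fetch_add and fetch_sub return the value from before the change, which is why release() compares against 1: the thread that sees 1 is the one that dropped the last reference and can safely clean up.</p>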
<p><strong>Assembly Level Locking = Great Thread Performance </strong></p>
<p>Next time we&#8217;ll look at some actual code examples of the atomic template, and how you can implement various techniques such as a reference counter. Since the atomic template makes use of assembly-level locking, it&#8217;s efficient and has excellent <a href="http://intel.ly/Qc5gim" target="_blank">thread performance</a>. And since it&#8217;s a template, you can use it to build your own types, resulting in code that&#8217;s both easy to use and highly optimized, all within a parallel environment.&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/c-atomic-template-inner-workings-in-tbb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OpenCL &#8211; Apply Parallel Compute Capabilities to 2D/3D Scenes</title>
		<link>http://goparallel.sourceforge.net/opencl-apply-parallel-compute-capabilities-to-2d3d-scenes/</link>
		<comments>http://goparallel.sourceforge.net/opencl-apply-parallel-compute-capabilities-to-2d3d-scenes/#comments</comments>
		<pubDate>Fri, 22 Mar 2013 21:38:45 +0000</pubDate>
		<dc:creator>gpmcarollo</dc:creator>
				<category><![CDATA[Build]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=2966</guid>
		<description><![CDATA[If you’re fortunate enough to attend the 2013 Game Developers Conference and Expo March 25-29 in San Francisco, don’t miss this session. Intel graphics SW aces will describe how to apply the heterogeneous parallel compute capabilities of standardized OpenCL into applications. Intel OpenCL lets applications utilize both CPU and GPU power while balancing the work [...]]]></description>
				<content:encoded><![CDATA[<p>If you’re fortunate enough to attend the 2013 <a href="http://www.gdconf.com" target="_blank">Game Developers Conference and Expo</a> March 25-29 in San Francisco, don’t miss <a href="http://schedule2013.gdconf.com/session-id/824196" target="_blank">this session</a>.</p>
<p>Intel graphics SW aces will describe how to apply the heterogeneous parallel compute capabilities of standardized OpenCL into applications.</p>
<p><img class="alignleft  wp-image-2968" title="Intel OpenCL" src="http://goparallel.sourceforge.net/wp-content/uploads/2013/03/Intel-OpenCL-SDK-300x254.jpg" alt="" width="216" height="183" />Intel OpenCL lets applications utilize both CPU and GPU power while balancing the workload between devices for better performance. At the same time, advanced OpenGL and DirectX interoperability extensions allow instant data sharing between rendering and compute domains, with efficient synchronization techniques. The Intel OpenCL SDK provides various samples to learn from. Adding OpenCL code to an application gets even easier with the CLU library.</p>
<p>We’ll resist the temptation to say “make the scene.” But you should. If you can’t, stay tuned here for session materials.</p>
<p>Explore the range of Intel OpenCL tools here: <a href="http://software.intel.com/en-us/vcsource/tools/opencl-sdk" target="_blank">http://software.intel.com/en-us/vcsource/tools/opencl-sdk</a></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/opencl-apply-parallel-compute-capabilities-to-2d3d-scenes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Train Software Engineers</title>
		<link>http://goparallel.sourceforge.net/how-to-train-software-engineers/</link>
		<comments>http://goparallel.sourceforge.net/how-to-train-software-engineers/#comments</comments>
		<pubDate>Fri, 22 Mar 2013 18:24:58 +0000</pubDate>
		<dc:creator>gpmcarollo</dc:creator>
				<category><![CDATA[Build]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[Home]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=2936</guid>
		<description><![CDATA[How do you build a successful curriculum for training new software engineers? This article takes a look at how to build an effective on-the-job training curriculum for new software engineers and developers. But be warned: one size does not fit all. As you design new training programs, you must take into account the skill level [...]]]></description>
				<content:encoded><![CDATA[<p>How do you build a successful curriculum for training new software engineers?</p>
<p><a href="http://slashdot.org/topic/bi/training-software-engineers-start-with-a-plan/" target="_blank">This article</a> takes a look at how to build an effective on-the-job training curriculum for new software engineers and developers. But be warned: one size does not fit all. As you design new training programs, you must take into account the skill level and background of each candidate.</p>
<p>Read the full article here: <a href="http://slashdot.org/topic/bi/training-software-engineers-start-with-a-plan/" target="_blank">http://slashdot.org/topic/bi/training-software-engineers-start-with-a-plan/</a></p>
<p><strong>About the Author</strong></p>
<p><em>Catherine has spent the last ten years working throughout engineering, including development, test, support, and product management. She focuses on agile team management and effective software delivery, building high-performance multi-functional teams that work effectively with business needs. Catherine&#8217;s projects also include non-dogmatic agile training for teams just starting up or looking to move to agile methods. Past experience includes an enterprise storage system, a tablet solution for restaurants, a mobile data synchronization platform, a marketing analytics platform, and several web-based applications.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/how-to-train-software-engineers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
