<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Go Parallel &#187; Verify</title>
	<atom:link href="http://goparallel.sourceforge.net/verify/feed/" rel="self" type="application/rss+xml" />
	<link>http://goparallel.sourceforge.net</link>
	<description>Translating Multicore Power into Application Performance</description>
	<lastBuildDate>Wed, 19 Jun 2013 02:15:36 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Vectorization, The Other Parallelism</title>
		<link>http://goparallel.sourceforge.net/vectorization-the-other-parallelism/</link>
		<comments>http://goparallel.sourceforge.net/vectorization-the-other-parallelism/#comments</comments>
		<pubDate>Thu, 02 May 2013 16:10:33 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3151</guid>
		<description><![CDATA[&#160; Moore&#8217;s Law correctly predicted that processor transistor density would increase year after year. All those extra transistors have been used to make powerful hardware capabilities like multiple cores and hardware extensions that improve performance. Intel processors have extensions that support SIMD (single instruction, multiple data) parallelism with Intel&#174; SSE and Intel&#174; AVX(2). These instructions [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>Moore&#8217;s Law correctly predicted that processor transistor density would increase year after year. All those extra transistors have been used to make powerful hardware capabilities like multiple cores and hardware extensions that improve performance. Intel processors have extensions that support SIMD (single instruction, multiple data) parallelism with Intel&reg; SSE and Intel&reg; AVX(2). These instructions operate on a vector of data in parallel. The vector width, and therefore the number of elements that can be accessed in parallel, continues to expand as new processor technologies are introduced. Applications need to be vectorized to take advantage of SIMD instructions that utilize the expanded vector width.</p>
<p><a href="http://goparallel.sourceforge.net/wp-content/uploads/2013/05/Intel-Vectorization.jpg" rel="wp-prettyPhoto[g3151]"><img class="aligncenter size-full wp-image-3152" title="Intel Vectorization" src="http://goparallel.sourceforge.net/wp-content/uploads/2013/05/Intel-Vectorization.jpg" alt="" width="459" height="292" /></a></p>
<p>There are some fairly easy ways to vectorize your code. The simplest is autovectorization, which requires no changes to code. Next is using libraries that employ both threading and vectorization to improve performance. For some applications, more advanced techniques may be needed to provide information to the compiler, including using special build logs and features such as Intel&reg; Cilk&trade; Plus.</p>
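<p>As a rough illustration of the kind of loop autovectorization looks for, the sketch below uses independent iterations over contiguous arrays. The function name saxpy and its parameters are hypothetical, chosen only for this example. Whether a given compiler actually emits SSE or AVX instructions for it depends on the compiler and the optimization flags in use.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: y[i] = a * x[i] + y[i].
// Each iteration touches only element i, so there is no dependence
// between iterations and a compiler is free to process several
// elements per instruction.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

<p>Loops that carry a value from one iteration to the next, or that call opaque functions inside the body, are usually much harder for a compiler to vectorize on its own.</p>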
<p>A great place to start is by downloading the <a href="http://www.makebettercode.com/vectorization">Vectorization CodeBook</a>. You will find simple, yet powerful vectorization techniques that can be used by just about any application developer using Intel&reg; Compilers and libraries.</p>
<p>Learn lots more about vectorization at this great resource site <a href="http://software.intel.com/en-us/intel-vectorization-tools/">http://software.intel.com/en-us/intel-vectorization-tools/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/vectorization-the-other-parallelism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speedy Agile QA Testing</title>
		<link>http://goparallel.sourceforge.net/speedy-agile-qa-testing/</link>
		<comments>http://goparallel.sourceforge.net/speedy-agile-qa-testing/#comments</comments>
		<pubDate>Thu, 11 Apr 2013 14:22:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=3037</guid>
		<description><![CDATA[The time to find latent errors is early in agile development—not after you ship product. Read how the new Intel Inspector XE 2013 helps meet QA and developer demands and enables you to deliver reliable code to market faster, under tight deadlines.&#160; Download Intel Inspector XE 2013 one-sheeter here (pdf.)]]></description>
				<content:encoded><![CDATA[<p>The time to find latent errors is early in agile development—not after you ship product. Read how the new Intel Inspector XE 2013 helps meet QA and developer demands and enables you to deliver reliable code to market faster, under tight deadlines.&nbsp;</p>
<p><strong><a href="http://goparallel.sourceforge.net/wp-content/uploads/2013/04/7277_InspectorXE_OneSheet_v3.pdf" target="_blank">Download Intel Inspector XE 2013 one-sheeter here (pdf.)</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/speedy-agile-qa-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Real-World Verification, Done Fluidly</title>
		<link>http://goparallel.sourceforge.net/real-world-verification-done-fluidly/</link>
		<comments>http://goparallel.sourceforge.net/real-world-verification-done-fluidly/#comments</comments>
		<pubDate>Fri, 29 Mar 2013 00:42:47 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=2990</guid>
		<description><![CDATA[&#160; While it&#8217;s important to master the tools used in parallel development such as Parallel Composer, Parallel Amplifier and Parallel Inspector, it&#8217;s good now and then to step back and let the work of others inspire your work. &#160;Today we’ll see how a company called Flow Science used the full Intel&#174; Cluster Studio XE tool [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>While it&#8217;s important to master the tools used in parallel development such as Parallel Composer, Parallel Amplifier and Parallel Inspector, it&#8217;s good now and then to step back and let the work of others inspire your work. &nbsp;Today we’ll see how a company called Flow Science used the full Intel&reg; Cluster Studio XE tool suite to enhance the serial and parallel performance of its FLOW-3D app.&nbsp; &nbsp;</p>
<p><img class="size-full wp-image-2991 alignleft" title="Untitled" src="http://goparallel.sourceforge.net/wp-content/uploads/2013/03/Untitled3.jpg" alt="" width="300" height="236" /><strong>Beyond Theory and Technique&nbsp;</strong></p>
<p>In parallel development, verification is important. While developing parallel algorithms might at first seem easy, it’s a skill that must be honed. Intel Parallel Studio helps out by making suggestions and verifying that your parallel code is correct.</p>
<p>Here on <em>Go Parallel, </em>we’ve provided lots of information about using various tools to perform verification at different levels. For example, <a href="http://goparallel.sourceforge.net/verifying-with-parallel-amplifier/" target="_blank">in this blog</a> I showed how to use Parallel Amplifier to do verification, even though most people think of using Parallel Inspector. Both tools can be used for different aspects of the development process. Inspector is usually used for verification and finding <a href="http://intel.ly/Szxj80" target="_blank">memory leaks</a>; Amplifier is used for other types of verification.</p>
<p>It’s easy to get wrapped up in theory and technique. But how well do these tools perform in the real world? And how can we get new ideas about ways to maximize our development efforts?</p>
<p>When a big company like Intel publishes case studies, they’re usually targeted at management. As a part-time writer, I find myself reading a lot of case studies to gather facts and figures for articles. But as a full-time software engineer, I’ve found that reading cases really helps my own development.&nbsp; &nbsp;</p>
<p>So let’s look at a real case focusing on parallel verification. We’ll start with a look at the basic problems. (Impatient types can skip to the next section.)</p>
<p><strong>The challenge: Fluid Dynamics and Parallel Programming </strong></p>
<p>If you’ve taken any physics classes or, better, are fortunate to work in a physics laboratory, you’ve learned about fluid dynamics, the study of how fluids flow. But more specifically, it’s about how fluids flow through and around things, relevant to many fields including hydraulics, aerodynamics, and even climatology and meteorology (since the Earth’s atmosphere is a gaseous fluid). The mathematics is pretty amazing, even using the “partial derivatives” we suffered in Calculus class.</p>
<p>Typically fluid dynamics occurs in three dimensions. Modeling such systems takes advanced computing power. Consider, for example, a weather system. Depending on the method of calculation (because Earth’s atmosphere doesn’t end abruptly), the effective height of the atmosphere is about 8.2 kilometers. For the U.S. alone, that means the atmosphere takes up about 80 million cubic kilometers. Suppose you want to model not just the U.S. atmosphere as a whole, but the motion at every cubic meter. You’re dealing with an enormous amount of data: 8.0 x 10<sup>16</sup> points, and that’s just positions. Add to that complete calculations at each point. It’s pretty obvious you’ll need more than a simple quad-core processor for this kind of work.</p>
<p>Similarly, suppose you’re dealing with a small enclosure with fluid flowing through it, say 1 meter wide, deep, and tall. You’re dealing with 1 billion data points if you’re only looking at the millimeter level (1,000 points per edge, cubed). Switch to the micron, which is one millionth of a meter, and you’re dealing with 1&#215;10<sup>18</sup> data points.</p>
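<p>A quick way to sanity-check point counts like these is to cube the number of cells along one edge. The sketch below does that at compile time; the helper cells_in_cube and the constant names are hypothetical, introduced only for this check.</p>

```cpp
#include <cstdint>

// Hypothetical helper: number of cubic cells in a cube when each edge
// is divided into cells_per_edge cells.
constexpr std::uint64_t cells_in_cube(std::uint64_t cells_per_edge) {
    return cells_per_edge * cells_per_edge * cells_per_edge;
}

// 1 m = 1,000 mm, so a 1 m cube holds 10^9 cubic millimeters.
constexpr std::uint64_t kMillimeterCells = cells_in_cube(1000);
// 1 m = 1,000,000 microns, so a 1 m cube holds 10^18 cubic microns.
constexpr std::uint64_t kMicronCells = cells_in_cube(1000000);
// 80 million cubic kilometers, each holding 10^9 cubic meters.
constexpr std::uint64_t kUsAtmosphereCubicMeters =
    80000000ULL * cells_in_cube(1000);

static_assert(kMillimeterCells == 1000000000ULL, "10^9 cells");
static_assert(kMicronCells == 1000000000000000000ULL, "10^18 cells");
static_assert(kUsAtmosphereCubicMeters == 80000000000000000ULL, "8 x 10^16");
```

<p>Even the micron-level count still fits in a 64-bit unsigned integer, but storing one value per point at that resolution is far beyond the memory of any single machine.</p>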
<p>The calculations in fluid dynamics are hard enough. But now take the established algorithms and split them out to run in parallel across 61 cores or more, using full <a href="http://intel.ly/SXMKY3" target="_blank">vectorization</a>&nbsp;on each core, and things get messy very quickly. &nbsp;Persist and you’ll get code that works. But then we must ask: Does it use reduction correctly? Does it accurately pull in all data for the final results?</p>
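<p>To make the reduction question concrete, here is a minimal sketch using standard C++ threads rather than any Intel tool. Each worker sums its own slice into a private slot, and the slots are combined only after every worker has finished. The function name parallel_sum and the worker-count parameter are assumptions made for this illustration.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sketch of a reduction: each worker sums its own slice
// into a private slot, and the slots are combined after all workers join.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &partial, chunk, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;  // private accumulator, no sharing
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[w] = local;   // one write per worker, to its own slot
        });
    }
    for (std::thread& t : pool) t.join();

    // Combine step: every partial result must be included exactly once.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

<p>The two classic mistakes are letting workers add directly into one shared total without synchronization, and dropping or double-counting a slice at the chunk boundaries. Comparing the result against a plain serial sum is a cheap first verification step.</p>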
<p><strong>Flow Science: Masterful Use of Verification Tools </strong></p>
<p>Now we look to the real-world answer. A company called Flow Science, which supports a worldwide customer base of commercial, academic, and government users, has created a fluid dynamics application using Parallel Studio (<a href="http://software.intel.com/en-us/articles/sdp-case-studies" target="_blank">see the full case study here</a>). Flow Science used the full Intel&reg; Cluster Studio XE tool suite to enhance the serial and parallel performance of its FLOW-3D app.&nbsp;</p>
<p>Although at Go Parallel we usually focus on the <a href="http://intel.ly/PMfN3E" target="_blank">C++ Compiler</a>, Flow Science used the Intel <a href="http://intel.ly/VIB2jB" target="_blank">Fortran Compiler</a>. Parallel Studio’s verification tools were key. Building such a product was certainly no easy task, no matter how large the team of programmers and engineers. The code has to be right, and there are only so many tests you can perform. That’s where Parallel Inspector and Parallel Amplifier both come to save the day.</p>
<p>Challenge: The company’s customers face ever-larger data sets. And they continue to demand accurate solutions in less time. Moreover, the introduction of multicore architectures makes parallelization difficult, since the computational load keeps changing throughout the simulation.</p>
<p>So the initial challenge was to maintain the accuracy and consistency of results, while greatly reducing simulation time. The team decided to extend the shared memory parallelism of FLOW-3D to a hybrid MPI*-OpenMP* version.</p>
<p>They soon discovered that introducing a distributed memory approach made debugging difficult. Furthermore, once errors were corrected, obtaining good speedups or scaling on a higher number of cores was difficult. So the next challenge was addressing scalability and parallel performance.</p>
<p>Developers used the Intel&reg; MPI Library to enable distributed memory performance. By just switching runtime environment variables, company engineers and their customers have been able to achieve maximum interconnect performance. This feature enabled Flow Science to provide the Intel MPI runtime toolkit as part of the user installation for a seamless user experience.</p>
<p>For Flow Science, the primary benefit of using the Intel Cluster Studio XE suite was improved customer satisfaction due to better speedups for larger, more complex problems. Other benefits included reduced development time and costs. Bottom line: Flow Science enabled faster, more accurate simulation with Intel&reg; software development tools, delivering improved results even as customer data sets grow larger.</p>
<p><strong>Conclusion </strong></p>
<p>I doubt every programmer on the fluid dynamics project had gigantic Xeon Phi processors for their desktop development machines, but that’s okay. With Parallel Studio, you can develop and debug on a simple quad-core, and then let the runtime automatically scale your program up to a Many Integrated Core (MIC) architecture. And with the help of the correct verification tests on your machine, you can be assured your code is correct. For real.</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/real-world-verification-done-fluidly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Benign Data Races: What Could Possibly Go Wrong?</title>
		<link>http://goparallel.sourceforge.net/benign-data-races-what-could-possibly-go-wrong/</link>
		<comments>http://goparallel.sourceforge.net/benign-data-races-what-could-possibly-go-wrong/#comments</comments>
		<pubDate>Thu, 10 Jan 2013 21:00:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Home]]></category>
		<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=2498</guid>
		<description><![CDATA[A data race occurs when two threads access the same variable concurrently and at least one of the accesses is a write. Data races are one of the most common and hardest to debug types of bugs in concurrent systems. Data races on complex data structures (like strings and hash maps) are undoubtedly harmful and [...]]]></description>
				<content:encoded><![CDATA[<p>A <em>data race</em> occurs when two threads access the same variable concurrently and at least one of the accesses is a write. Data races are among the most common and hardest-to-debug types of bugs in concurrent systems. Data races on complex data structures (like strings and hash maps) are undoubtedly harmful and can lead to crashes and memory corruption.</p>
<p style="text-align: center;"><img class=" wp-image-2499 aligncenter" title="Untitled" src="http://goparallel.sourceforge.net/wp-content/uploads/2013/01/Untitled.png" alt="" width="307" height="230" /></p>
<p>But what about “benign” data races? Are they actually harmful? How do you suppress them?</p>
<p>Consider this &#8220;innocent&#8221; code:</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="C++"><div class="devcodeoverflow"><ol><li><span style="color: #0000ff;">int</span> op_count<span style="color: #008080;">;</span></li><li>...</li><li><span style="color: #007788;">op_count</span><span style="color: #000040;">++</span><span style="color: #008080;">;</span>&nbsp;&nbsp;<span style="color: #666666;">// Executed by several threads, it’s OK if it’s not 100% precise.</span></li></ol></div></pre><!--END_DEVFMTCODE--></p>
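<p>One common way to remove a race like this, sketched here on the assumption that C++11 atomics are available, is to make the counter a std::atomic. The function count_operations and its parameters are hypothetical, and this is not necessarily the fix the linked post recommends.</p>

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch: count operations from several threads without a
// data race. Relaxed ordering is enough when only the total matters.
int count_operations(unsigned threads, int ops_per_thread) {
    std::atomic<int> op_count{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&op_count, ops_per_thread] {
            for (int i = 0; i < ops_per_thread; ++i) {
                // Atomic read-modify-write: no increment can be lost.
                op_count.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return op_count.load();
}
```

<p>With a plain int, concurrent increments are undefined behavior in C++11 and later, so the compiler is not obliged to produce a count that is merely "slightly off".</p>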
<p><strong><em><a href="http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong" target="_blank">To learn what could go wrong and what you can do, read the rest of the post here…</a></em></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/benign-data-races-what-could-possibly-go-wrong/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tracking Usability Bugs</title>
		<link>http://goparallel.sourceforge.net/tracking-usability-bugs/</link>
		<comments>http://goparallel.sourceforge.net/tracking-usability-bugs/#comments</comments>
		<pubDate>Mon, 10 Dec 2012 15:53:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=2337</guid>
		<description><![CDATA[A&#160;usability bug&#160;is any unintended behavior by the product, noticed by and impacting the user – and not in a good way. &#160;Find out how to track yours.&#160; Read more here.]]></description>
				<content:encoded><![CDATA[<p>A&nbsp;usability bug&nbsp;is any unintended behavior by the product, noticed by and impacting the user – and not in a good way. &nbsp;Find out how to track yours.&nbsp;</p>
<p><strong><a href="http://software.intel.com/en-us/blogs/2012/11/15/are-you-tracking-your-usability-bugs" target="_blank">Read more here.</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/tracking-usability-bugs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checklist for Programming Intel® Xeon Phi™ Coprocessors</title>
		<link>http://goparallel.sourceforge.net/checklist-programming-intel-xeon-phitm-coprocessors/</link>
		<comments>http://goparallel.sourceforge.net/checklist-programming-intel-xeon-phitm-coprocessors/#comments</comments>
		<pubDate>Tue, 20 Nov 2012 09:00:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=2234</guid>
		<description><![CDATA[&#160; Key tips for programming a high degree of parallelism, while using familiar programming methods and the latest Intel&#174; tools supporting the Intel&#174; Xeon Phi&#8482; coprocessor. Read the complete posting here.&#160;]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>Key tips for programming a high degree of parallelism, while using familiar programming methods and the latest Intel&reg; tools supporting the Intel&reg; Xeon Phi<strong>&trade;</strong> coprocessor.</p>
<p><strong><a href="http://goparallel.sourceforge.net/wp-content/uploads/2012/11/Asset-2-Checklist-for-Programming-.pdf" target="_blank">Read the complete posting here.&nbsp;</a></strong></p>
<p style="text-align: center;"><a href="http://goparallel.sourceforge.net/wp-content/uploads/2012/11/Asset-2-Checklist-for-Programming-.pdf"><img class="aligncenter  wp-image-2235" title="2" src="http://goparallel.sourceforge.net/wp-content/uploads/2012/11/2.png" alt="" width="540" height="398" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/checklist-programming-intel-xeon-phitm-coprocessors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pointer Checker: Easily Catch Out-of-Bounds Memory Accesses</title>
		<link>http://goparallel.sourceforge.net/pointer-checker-easily-catch-out-of-bounds-memory-accesses/</link>
		<comments>http://goparallel.sourceforge.net/pointer-checker-easily-catch-out-of-bounds-memory-accesses/#comments</comments>
		<pubDate>Mon, 10 Sep 2012 18:59:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=1765</guid>
		<description><![CDATA[This article introduces a powerful new feature called Pointer Checker, which precisely and easily isolates elusive bugs in programs. Found in the Intel&#174; C++ Composer XE 2013 product, its integration into the compiler adds powerful functionality in a way that slides seamlessly into build systems. Clever implementation and powerful error reporting provide precise information about [...]]]></description>
				<content:encoded><![CDATA[<p>This article introduces a powerful new feature called Pointer Checker, which precisely and easily isolates elusive bugs in programs. Found in the Intel&reg; C++ Composer XE 2013 product, its integration into the compiler adds powerful functionality in a way that slides seamlessly into build systems. Clever implementation and powerful error reporting provide precise information about latent program defects. We are excited that during beta testing of this new feature, customers reported that this tool found numerous defects.</p>
<p><strong><a href="http://d3f8ykwhia686p.cloudfront.net/1live/intel/Intel_PUMag_Issue11_Pointer_Checker.pdf" target="_blank">Read the full article from Intel’s&nbsp;<em>Parallel Universe</em>&nbsp;magazine here.</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/pointer-checker-easily-catch-out-of-bounds-memory-accesses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tools that Boost .NET Apps Reliability and Performance</title>
		<link>http://goparallel.sourceforge.net/tools-boost-net-apps-reliability-performance/</link>
		<comments>http://goparallel.sourceforge.net/tools-boost-net-apps-reliability-performance/#comments</comments>
		<pubDate>Thu, 28 Jun 2012 11:00:54 +0000</pubDate>
		<dc:creator>gpmcarollo</dc:creator>
				<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=1382</guid>
		<description><![CDATA[&#160; .NET developers can boost application performance and increase the code quality and reliability needed for high-performance computing and enterprise applications with the help of tools like Intel&#174; Inspector XE and Intel&#174; VTune&#8482;&#160;Amplifier XE. Read the full article from Intel’s&#160;Parallel Universe&#160;magazine here. You can also visit The Parallel Universe archive to access past issues here.]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>.NET developers can boost application performance and increase the code quality and reliability needed for high-performance computing and enterprise applications with the help of tools like Intel&reg; Inspector XE and Intel&reg; VTune&trade;&nbsp;Amplifier XE.</p>
<p><a href="http://d3f8ykwhia686p.cloudfront.net/1live/intel/Intel_Issue10_ToolsThatBoostNETapps.pdf" target="_blank">Read the full article from Intel’s&nbsp;<em>Parallel Universe</em>&nbsp;magazine here.</a></p>
<p><a href="http://software.intel.com/en-us/articles/intel-parallel-universe-magazine/?wapkw=%28parallel+universe%29" target="_blank">You can also visit The Parallel Universe archive to access past issues here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/tools-boost-net-apps-reliability-performance/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Verifying Parallelization of C++ Code with Parallel Inspector</title>
		<link>http://goparallel.sourceforge.net/verifying-parallelization-of-c-code-with-parallel-inspector/</link>
		<comments>http://goparallel.sourceforge.net/verifying-parallelization-of-c-code-with-parallel-inspector/#comments</comments>
		<pubDate>Thu, 10 May 2012 15:29:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=1060</guid>
		<description><![CDATA[When you start with a serial algorithm and convert it into a parallel algorithm, you want to use the tools in Intel Parallel Studio to do a full test on your running code to make sure there are no memory and thread errors. This is the “verify” stage of development, and where the Parallel Inspector [...]]]></description>
				<content:encoded><![CDATA[<p>When you start with a serial algorithm and convert it into a parallel algorithm, you want to use the tools in Intel Parallel Studio to do a full test on your running code to make sure there are no memory and thread errors. This is the “verify” stage of development, and where the Parallel Inspector tool can help.</p>
<p>Let’s suppose we have an algorithm that’s going to do a character-by-character comparison of two strings of the same length. It will build a third string of the same length as the other two, and fill in the letter T where the first two strings match, and a letter F where the first two strings differ. For example, given these two strings:</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="C++"><div class="devcodeoverflow"><ol><li>“parallel”</li><li>“paxallxl”</li></ol></div></pre><!--END_DEVFMTCODE--></p>
<p>the function will detect differences at the third and seventh positions, and generate the string</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="C++"><div class="devcodeoverflow"><ol><li>“TTFTTTFT”</li></ol></div></pre><!--END_DEVFMTCODE--></p>
<p>A function like this is certainly a good candidate for parallelism. Here’s a first take on the function in serial form:</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="C++"><div class="devcodeoverflow"><ol><li>std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> finddiffs<span style="color: #008000;">&#40;</span>std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> s1, std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> s2<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>s1.<span style="color: #007788;">length</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #000040;">!</span><span style="color: #000080;">=</span> s2.<span style="color: #007788;">length</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> <span style="color: #0000ff;">return</span> <span style="color: #FF0000;">&quot;&quot;</span><span style="color: #008080;">;</span></li><li> <span style="color: #008000;">&#125;</span></li><li>&nbsp;</li><li> <span style="color: #0000ff;">int</span> len <span style="color: #000080;">=</span> s1.<span style="color: #007788;">length</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></li><li> std<span style="color: #008080;">::</span><span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> len <span style="color: #000080;">&lt;&lt;</span> std<span style="color: #008080;">::</span><span style="color: #007788;">endl</span><span style="color: #008080;">;</span></li><li> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span> x <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> <span style="color: 
#0000ff;">char</span><span style="color: #008000;">&#91;</span>len <span style="color: #000040;">+</span> <span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span></li><li> <span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i<span style="color: #000080;">&lt;</span>len<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>s1<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span> <span style="color: #000080;">==</span> s2<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> x<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">'T'</span><span style="color: #008080;">;</span></li><li> <span style="color: #008000;">&#125;</span></li><li> <span style="color: #0000ff;">else</span> <span style="color: #008000;">&#123;</span></li><li> x<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">'F'</span><span style="color: #008080;">;</span></li><li> <span style="color: #008000;">&#125;</span></li><li> <span style="color: #008000;">&#125;</span></li><li> x<span style="color: #008000;">&#91;</span>len<span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span 
style="color: #008080;">;</span></li><li>&nbsp;</li><li> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> result <span style="color: #000080;">=</span> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span><span style="color: #008000;">&#40;</span>x<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></li><li> <span style="color: #0000dd;">delete</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> x<span style="color: #008080;">;</span></li><li> <span style="color: #0000ff;">return</span> result<span style="color: #008080;">;</span></li><li><span style="color: #008000;">&#125;</span> </li></ol></div></pre><!--END_DEVFMTCODE--></p>
<p>The function takes two strings as parameters. It checks that their lengths are the same; if not, it just returns an empty string. Next, it allocates an array of characters of the size in question. It begins looping through the two strings, comparing character-by-character. With each iteration, if the two strings have matching characters, it fills a T in the target array; otherwise it fills in an F. It then adds a null-terminator ‘\0’ to the end of the array, and finally creates a new std::string instance from the array.</p>
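<p>For comparison, the same logic can be written without the manual character array. The variant below is a sketch, not the version measured in this article; the name finddiffs_simple is hypothetical. It fills a std::string directly, so there is no new[], delete[], or null terminator to manage.</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant of finddiffs that writes into a std::string
// directly, so no manual allocation or null terminator is needed.
std::string finddiffs_simple(const std::string& s1, const std::string& s2) {
    if (s1.length() != s2.length()) {
        return "";
    }
    std::string result(s1.length(), 'F');  // start with all 'F'
    for (std::size_t i = 0; i < s1.length(); ++i) {
        if (s1[i] == s2[i]) {
            result[i] = 'T';  // mark matching positions
        }
    }
    return result;
}
```

<p>Calling finddiffs_simple("parallel", "paxallxl") yields "TTFTTTFT", matching the example above. Taking the parameters by const reference also avoids copying two very large strings on every call.</p>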
<p>I wanted to find out how long a function like this takes before creating the parallel version. To do that, I used the timing tools in Parallel Amplifier. Because I wanted to really test it out, I made two enormous strings (114 million characters each). After running the code, Parallel Amplifier told me the function took 333 milliseconds to run.</p>
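<p>If a profiler is not at hand, a rough wall-clock figure can be obtained with the standard library. The helper below is a sketch using std::chrono; the name time_ms is hypothetical, and a single run like this is far less informative than what a tool such as Parallel Amplifier reports.</p>

```cpp
#include <chrono>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

<p>It could be used by wrapping the call of interest in a lambda and passing that lambda to time_ms. Repeating the measurement several times and keeping the median helps smooth out noise from caching and other processes.</p>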
<p>Converting this to a parallel function is easy; initially all we need to do is try changing our for loop to a cilk_for, like so:</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="C++"><div class="devcodeoverflow"><ol><li>std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> finddiffs2<span style="color: #008000;">&#40;</span>std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> s1, std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> s2<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>s1.<span style="color: #007788;">length</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #000040;">!</span><span style="color: #000080;">=</span> s2.<span style="color: #007788;">length</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> <span style="color: #0000ff;">return</span> <span style="color: #FF0000;">&quot;&quot;</span><span style="color: #008080;">;</span></li><li> <span style="color: #008000;">&#125;</span></li><li>&nbsp;</li><li> <span style="color: #0000ff;">int</span> len <span style="color: #000080;">=</span> s1.<span style="color: #007788;">length</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></li><li> std<span style="color: #008080;">::</span><span style="color: #0000dd;">cout</span> <span style="color: #000080;">&lt;&lt;</span> len <span style="color: #000080;">&lt;&lt;</span> std<span style="color: #008080;">::</span><span style="color: #007788;">endl</span><span style="color: #008080;">;</span></li><li> <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span> x <span style="color: #000080;">=</span> <span style="color: #0000dd;">new</span> <span style="color: 
#0000ff;">char</span><span style="color: #008000;">&#91;</span>len <span style="color: #000040;">+</span> <span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span></li><li> cilk_for <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i<span style="color: #000080;">&lt;</span>len<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>s1<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span> <span style="color: #000080;">==</span> s2<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span></li><li> x<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">'T'</span><span style="color: #008080;">;</span></li><li> <span style="color: #008000;">&#125;</span></li><li> <span style="color: #0000ff;">else</span> <span style="color: #008000;">&#123;</span></li><li> x<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">'F'</span><span style="color: #008080;">;</span></li><li> <span style="color: #008000;">&#125;</span></li><li> <span style="color: #008000;">&#125;</span></li><li> x<span style="color: #008000;">&#91;</span>len<span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: 
#008080;">;</span></li><li>&nbsp;</li><li> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> result <span style="color: #000080;">=</span> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span><span style="color: #008000;">&#40;</span>x<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></li><li> <span style="color: #0000dd;">delete</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> x<span style="color: #008080;">;</span></li><li> <span style="color: #0000ff;">return</span> result<span style="color: #008080;">;</span></li><li><span style="color: #008000;">&#125;</span></li></ol></div></pre><!--END_DEVFMTCODE--></p>
<p>Running the code again under Parallel Amplifier with the same 114-million-character strings on a processor with four cores shows an improvement. The time has been reduced to 77 milliseconds. But is this code correct? I used Parallel Inspector’s Inspect Threading Errors function, with Analysis Time set all the way to the top. The test seemed to go okay, except that a warning appeared:</p>
<p><!--DEVFMTCODE--><pre class="devcodeblock" title="C++"><div class="devcodeoverflow"><ol><li>Warning<span style="color: #008080;">:</span> Cross<span style="color: #000040;">-</span>thread stack access</li></ol></div></pre><!--END_DEVFMTCODE--></p>
<p>A cross-thread stack access warning occurs when your threads are writing to data owned by other threads. We have exactly that situation happening here; we’re writing to the array that was allocated. This can sometimes be a problem, but not necessarily. It can especially be a problem if one iteration of a loop relies on the results of a previous iteration, where “previous” refers to the one that would run just before it serially. In parallel, there’s no guarantee of the order in which your iterations run.</p>
<p>For example, suppose we have an array like the one in our example, but each iteration reads the value of a different slot in the array, and that slot is calculated by an earlier iteration when the loop runs serially. Then we would have a problem when running in parallel, because the earlier iteration might not have run yet by the time its slot is read.</p>
<p>The fundamental test is this: Can the iterations of the loop be run in a different order and still give you the same results? In our case, even though we’re sharing data among the threads (resulting in the warning), we can see that we’re not relying on other iterations. As such, we can run the iterations of our loop in any order and get the same results. And that means we can safely ignore this warning. (It is, after all, just a warning, not an error.)</p>
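<p>The order-independence test described above can be sketched in plain, serial C++. This is only an illustration under stated assumptions: it does not use Cilk Plus, all function names here (<code>compare_in_order</code>, <code>running_count_in_order</code>, <code>forward_order</code>, <code>reverse_order</code>) are hypothetical, and the running-count loop is an invented counterexample, not code from this article. Each loop body is run over an explicit list of indices, so the same body can be replayed forward and backward.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Independent body: slot i depends only on s1[i] and s2[i].
// Assumes s1 and s2 have equal length and that order holds valid indices.
std::string compare_in_order(const std::string& s1, const std::string& s2,
                             const std::vector<std::size_t>& order) {
    std::string x(s1.size(), '?');
    for (std::size_t i : order) {
        x[i] = (s1[i] == s2[i]) ? 'T' : 'F';
    }
    return x;
}

// Dependent body (hypothetical counterexample): slot i reads slot i-1,
// so it relies on the previous iteration having already run.
std::vector<int> running_count_in_order(const std::string& s1,
                                        const std::string& s2,
                                        const std::vector<std::size_t>& order) {
    std::vector<int> count(s1.size(), 0);
    for (std::size_t i : order) {
        int prev = (i == 0) ? 0 : count[i - 1];
        count[i] = prev + ((s1[i] == s2[i]) ? 1 : 0);
    }
    return count;
}

// Indices 0, 1, ..., n-1: the order a serial for loop would use.
std::vector<std::size_t> forward_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    return order;
}

// Indices n-1, ..., 1, 0: one of many orders a parallel run might produce.
std::vector<std::size_t> reverse_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = n - 1 - i;
    return order;
}
```

<p>Calling <code>compare_in_order</code> with <code>forward_order(n)</code> and then with <code>reverse_order(n)</code> gives the same string, which is the property that makes the comparison loop a reasonable candidate for cilk_for. Doing the same with <code>running_count_in_order</code> gives different vectors, which is the kind of dependency the warning is worth investigating for. Passing with two orders is evidence, not proof; reading the loop body for dependencies is still needed.</p>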
<p>In summary, we have successfully converted our serial algorithm to parallel, and we received an improvement in performance.</p>
<p>Have you run into similar verification situations when coding? Please tell me about it in the Comments section below.</p>
<p>&nbsp;_______________________________________________________________</p>
<p>Jeff Cogswell is a Geeknet contributing editor, and is the author of several tech books including <em>C++ All-In-One Desk Reference For Dummies</em>, <em>C++ Cookbook</em>, and <em>Designing Highly Useable Software</em>. A software engineer for over 20 years, Jeff has written extensively on many different development topics. An expert in C++ and JavaScript, he has experience starting from low-level C development on Linux, up through modern web development in JavaScript and jQuery, PHP, and ASP.NET MVC.&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/verifying-parallelization-of-c-code-with-parallel-inspector/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Games and Formal Verification</title>
		<link>http://goparallel.sourceforge.net/games-and-formal-verification/</link>
		<comments>http://goparallel.sourceforge.net/games-and-formal-verification/#comments</comments>
		<pubDate>Wed, 09 May 2012 13:19:11 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Verify]]></category>

		<guid isPermaLink="false">http://goparallel.sourceforge.net/?p=1066</guid>
		<description><![CDATA[Intel Cluster Studio XE 2012 (see http://goparallel.sourceforge.net/intel-cluster-studio-xe-2012-overview-from-intel-software-conference-2012/ and http://software.intel.com/en-us/articles/intel-cluster-studio-xe/) is the latest iteration of Intel’s development suite for HPC clusters. Its powerful, wizard-based code-commenting and code-generating workflow interfaces make counterintuitively simple the task of parallelizing applications efficiently on grand-scale cluster and supercomputing platforms, using Cilk Plus and Threading Building Blocks to parallelize on local nodes, [...]]]></description>
				<content:encoded><![CDATA[<p>Intel Cluster Studio XE 2012 (see <a href="http://goparallel.sourceforge.net/intel-cluster-studio-xe-2012-overview-from-intel-software-conference-2012/">http://goparallel.sourceforge.net/intel-cluster-studio-xe-2012-overview-from-intel-software-conference-2012/</a> and <a href="http://software.intel.com/en-us/articles/intel-cluster-studio-xe/">http://software.intel.com/en-us/articles/intel-cluster-studio-xe/</a>) is the latest iteration of Intel’s development suite for HPC clusters. Its powerful, wizard-based code-commenting and code-generating workflow interfaces make counterintuitively simple the task of parallelizing applications efficiently on grand-scale cluster and supercomputing platforms, using Cilk Plus and Threading Building Blocks to parallelize on local nodes, and highly configurable MPI (Message-Passing Interface) patterns to deploy across nodes in the cluster. Huge clusters are supported – presently up to a theoretical maximum of 90,000 nodes.</p>
<p>Applications employing several types of parallelism are intrinsically very complex, and can be hard to debug. So Intel offers tools like Intel Inspector XE – an upgrade to Thread Checker – that are designed for use early in development cycles, to analyze source code and resolve threading and memory errors – both predictively and in early, small-scale test runs. At the node level, tuning can be improved, and issues quickly resolved with Intel VTune Amplifier XE, which does profiling, hotspot identification, and lets you drill down to graphically and numerically assess locks and waits, thread balance, and transitions, enabling resolution to single lines of code and individual functions. For performing similar graphically enhanced analysis of applications on the target cluster, Intel has also introduced a tool called Intel Trace Analyzer &amp; Collector, which instruments and traces execution across clustered machines, monitors all the MPI traffic, and offers a wide range of drill-down tools to help fix bugs, identify bottlenecks, and optimize MPI pattern choices (e.g. blocking, non-blocking, etc.) for best performance.</p>
<p>For some mission-critical applications, though – especially those aimed at healthcare, scientific, aerospace, military, and other markets where software faults can be catastrophic – further assurance is required, both that programs have been written according to specifications, and that they will perform reliably given all combinations of input data and conditions. This is the domain of “V&amp;V” – Verification and Validation. While a V&amp;V process exists around any coding project, ramping up criticality and scale necessitates more deliberate, formal, and reliable treatment.</p>
<p>Eventually – typically sooner in the domain of hardware than in software, at least today – you enter the zone of “formal verification”: a mathematical and logical discipline used in abstracting and describing systems (hardware, software, etc.) in ways that can be verified by theorem-proving. In effect, the program (or other system under test) becomes a theorem predicting its end-states, and proving this theorem complete and correct means the program works as advertised. A reasonable place to start exploring this area is a presentation by Intel’s John Harrison (<a href="http://www.cl.cam.ac.uk/~jrh13/slides/lics-22jun03.pdf">http://www.cl.cam.ac.uk/~jrh13/slides/lics-22jun03.pdf</a>), delivered in 2003 at the Ottawa LICS conference, which shows clearly why programs can be hard to test, why assumptions about them may be very wrong, why testing with normal inputs provides no assurance that any other input won’t break the software, and why formal verification is thus useful. The presentation goes remarkably deeply (and with virtually no math) into explaining why formal verification is hard, and how it can be made easier.</p>
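<p>The point about normal inputs can be illustrated with a small C++ sketch. This example is not taken from Harrison’s presentation; the function names are hypothetical, and it assumes a platform where <code>unsigned int</code> is 32 bits wide, so that the sum wraps around. The naive version returns the expected midpoint for everyday values, yet silently gives a wrong answer once the two inputs add up to more than a 32-bit value can hold.</p>

```cpp
#include <cstdint>

// Hypothetical example: midpoint of two unsigned 32-bit values.
// The sum a + b can exceed 2^32 - 1 and wrap around before the division
// (assuming unsigned int is 32 bits wide on the target platform).
std::uint32_t midpoint_naive(std::uint32_t a, std::uint32_t b) {
    return (a + b) / 2;
}

// Variant that avoids the wrapping sum; assumes a <= b.
std::uint32_t midpoint_safe(std::uint32_t a, std::uint32_t b) {
    return a + (b - a) / 2;
}
```

<p>A test suite that only tries small values such as 10 and 20 passes for both versions. Only an input pair near the top of the range, such as 4,000,000,000 and 4,000,000,002, separates them, and nothing about the passing tests hints that such a pair exists. Formal verification aims to rule out that kind of surprise for every input rather than a sample.</p>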
<p>One way Intel and others make it easier is by using automated and interactive theorem-proving via tools like HOL (which stands for Higher Order Logic) Light, a proof-checker written in OCaml, a member of the Caml (Categorical Abstract Machine Language) family, which is itself a dialect of ML (Meta Language). Caml and its sibling MLs are functional languages that help you map formal logic into code and treat programs as proofs – the underlying notion that programs actually <em>are</em> proofs is articulated in the famous Curry-Howard-Lambek correspondence, which shows the three-way isomorphism among intuitionistic logic, typed lambda calculus, and Cartesian closed categories. CH correspondence has also recently been recommended as a method for partitioning search-spaces explored by genetic algorithms, where each node (called a “species”) is indexed by its Curry-Howard isomorphic proof. Now hold that thought.</p>
<p>The trouble is that – as software and the parallel components and systems it runs on get individually denser and more numerous – all this austere and beautiful math gets bigger and harder for machines to model and exhaustively verify, given finite resources and time, and with the understanding that most formal verification is carried out interactively, with highly trained mathematicians tweaking the theorem-proving software every step of the way. This is a huge problem for organizations like the military, who want to develop and run perfectly reliable systems at epic scales.</p>
<p>So, late last year, DARPA initiated a program, and created a community around research into CSFV – Crowd Sourced Formal Verification. Basically, the DARPA CSFV program is looking for ways to take units of software, turn them into collections of theorem templates in a logical calculus, then programmatically create games based on these templates, have people play these games to explore the search space and statistically prove the individual theorems, then compile back the results of many plays to make progress towards verifying the program property under evaluation. If it works for protein-folding, why not for software?</p>
<p>If you’re interested in joining the effort, check out the program homepage at <a href="http://www.darpa.mil/Our_Work/I2O/Programs/Crowd_Sourced_Formal_Verification_(CSFV).aspx">http://www.darpa.mil/Our_Work/I2O/Programs/Crowd_Sourced_Formal_Verification_(CSFV).aspx</a>. The Proposer’s Day Briefing PDF by program manager Dr. Drew Dean of DARPA provides a full explanation of the program, its goals, and the range of skills and capabilities required to participate (plenty of room in this program, by the way, for people who aren’t Turing Fellows – they also need webdevs and cool kids to make these games fun, after all). Anyone qualified can join the program’s online community.</p>
<p>Meanwhile, folks with more of a hunger to grasp the mathematical and logical connections between game theory and problems in formal verification should check out the elegant and erudite review modules (<a href="http://pub.ist.ac.at/gametheory/">http://pub.ist.ac.at/gametheory/</a>) for the course in Game Theory in Formal Verification, taught in 2010 by Krishnendu Chatterjee at the new Institute of Science and Technology Austria (IST Austria) in Klosterneuburg, Austria.</p>
<p>Do you have an interesting verification anecdote? I’d love to hear about it so please let me know in the Comments section below.</p>
<p>&nbsp;______________________________________________________________</p>
<p>John Jainschigg is a Geeknet contributing editor, and is CEO of World2Worlds, Inc., a digital agency focused on immersive technology and gaming. John’s initial intro to concurrency was via interrupt and re-entrancy programming at the assembler level on Z80 and 68000-based systems. He wrote concurrent, time-critical packet-switching applications on HP-UX RISC machines in the late 1980s, and since then has worked up and down the client-server stack in Java, C++, PHP, and other conventional and scripting languages, and more recently, in task-specific, state-based, radically concurrent languages like LSL.</p>
]]></content:encoded>
			<wfw:commentRss>http://goparallel.sourceforge.net/games-and-formal-verification/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
