In this blog, we’re going to look at the issue presented in my last blog post (“Tracking Parallel Loops in Cilk Plus”) regarding the problem of writing data out to a stream and the data getting mixed together between threads. As an example, I want several threads to simultaneously write portions of XML. I want the final output to be valid XML, something like the below, but without the newlines and indentation (which I’m including here to make it easier to see what we’re building):
1 <doc>2 <loop>3 <totalcount>24</totalcount>4 <worker>1</worker>5 </loop>6 <loop> ... etc... </loop>7 </doc>
Here’s a first take at the code. I create a stringstream instance, and begin our loops, stuffing the XML snippets into the stringstream.
01 std::stringstream xml;02 xml << "<doc>";03 cilk_for (int i=0; i<10; i++) {04 xml << "<loop>" << std::endl;05 xml << "<totalcount>" << __cilkrts_get_total_workers() << "</totalcount>" << std::endl;06 xml << "<worker>" << __cilkrts_get_worker_number() << "</worker>" << std::endl;07 int accum = i;08 int multiplier = -1;09 for (int j = 0; j< 500000000; j++) {10 if (j < 100) {11 accum = accum + j;12 }13 else {14 accum = accum + (multiplier * j);15 multiplier = multiplier * -1;16 }17 }18 xml << "</loop>" << std::endl;19 }20 xml << "</doc>";
Normally, you wouldn’t include newlines in your XML because it doesn’t add anything other than bandwidth. But I included them here just so you can see more easily what we end up with:
01 <doc><loop>02 <totalcount>24</totalcount>03 <worker>8</worker>04 <loop>05 <totalcount>24</totalcount>06 <worker>6</worker>07 <loop>08 <totalcount>24</totalcount>09 <worker>3</worker>10 <loop>11 <loop>12 <totalcount>24</totalcount>13 <worker>2</worker>14 <loop>15 <totalcount>24</totalcount>16 <worker>5</worker>17 <loop>18 <totalcount>24</totalcount>19 <worker>7</worker>20 <loop>21 <totalcount>24</totalcount>22 <worker>1</worker>23 <totalcount>24</totalcount>24 <worker>0</worker>25 </loop>26 <loop>27 <totalcount>24</totalcount>28 <worker>3</worker>29 </loop>30 </loop>31 <loop>32 <totalcount>24</totalcount>33 <worker>5</worker>34 </loop>35 </loop>36 </loop>37 </loop>38 </loop>39 </loop>40 </loop>41 </doc>
In other words, we have a complete mess.
One solution would be to create a lock representing the stream, whereby each loop waits until the previous finishes. But then that defeats the purpose of doing this in parallel, causing each loop to run in sequence just like a regular old non-parallel program. Or, you could put a lock around each individual call that writes to the string, but that would take a lot of coding, especially in a large program.
There are better ways. The easiest is to give each loop its own string to fill in, and then push the constructed string into the main string like so:
1 cilk_for (int i=0; i<10; i++) {2 3 std::stringstream xmlchunk;4 5 xmlchunk << "<loop>" << std::endl;6 xmlchunk << "<totalcount>" << __cilkrts_get_total_workers() << "</totalcount>" << std::endl;7 xmlchunk << "<worker>" << __cilkrts_get_worker_number() << "</worker>" << std::endl;
… the middle is the same as before …
1 xmlchunk << "</loop>" << std::endl;2 xml << xmlchunk.str();3 4 }
This works. Here’s the first few lines of the resulting XML:
01 <doc><loop>02 <totalcount>24</totalcount>03 <worker>8</worker>04 </loop>05 <loop>06 <totalcount>24</totalcount>07 <worker>3</worker>08 </loop>09 <loop>10 <totalcount>24</totalcount>11 <worker>7</worker>12 </loop>
But is it the best way? In a real-world example where you’re dealing with more complex, sophisticated data, it might not be – especially if we’re pushing data out a pipe at the end of each loop, where we could still end up with contention. Let’s look at an alternate method that the Cilk Plus library allows for. Also, and this is important, technically we don’t know if this line is thread safe:
1 xml << xmlchunk.str();
If by chance two threads hit this line at exactly the same point, we could still end up with a mess.
Let’s take a different approach. Cilk Plus includes the concept of reducers, which are basically thread-safe global variables. Using one of these, we don’t need to bother with creating a new local variable with every iteration. Instead of declaring our xml variable as a string, we declare it as a reducer_string. Then we use the += operator, like so:
01 cilk::reducer_string xml;02 xml += "<doc>";03 cilk_for (int i=0; i<10; i++) {04 xml += "<loop>";05 xml += "<totalcount>";06 xml += intToString(__cilkrts_get_total_workers());07 xml += "</totalcount>";08 xml += "<worker>";09 xml += intToString(__cilkrts_get_worker_number());10 xml += "</worker>";
…the middle part is the same…
1 xml += "</loop>";2 3 }4 xml += "</doc>";5 6 std::cout << xml.get_value() << std::endl;
(The intToString function is a helper function I wrote that converts an integer to a string.) I took out the carriage returns, but running the result through an XML beautifier shows we get what we wanted:
1 <doc>2 <loop>3 <totalcount>24</totalcount>4 <worker>8</worker>5 </loop>6 <loop>7 <totalcount>24</totalcount>8 <worker>2</worker>9 </loop>
…etc.
Note that to retrieve the final string, I call get_value(). That performs the actual reduction, as well as retrieving the final string for us. Remember an important point here, however: The reducer function takes place after all the loops run in parallel. We’re assuming that running the loop in sequence would take longer than running the reducer function. If not, then again we’re defeating the purpose. (In general, the reducer functions that come with Cilk Plus are highly optimized, and indeed in our case they worked very quickly, giving us the parallel speed we wanted.)
As you can see, there are several factors that you need to consider when adding parallelism to your own programs. You’ll want to consider the time of each loop; the time of the all the loops together versus in parallel, and the time of any reductions that need to take place afterwards.
Jeff Cogswell is a Geeknet contributing editor, and is the author of several tech books including C++ All-In-One Desk Reference For Dummies, C++ Cookbook, and Designing Highly Useable Software. A software engineer for over 20 years, Jeff has written extensively on many different development topics. An expert in C++ and JavaScript, he has experience starting from low-level C development on Linux, up through modern web development in JavaScript and jQuery, PHP, and ASP.NET MVC.







My original comment seems to have vanished, so here's a retype in case anyone finds this.
re - non-convergent -- I was totally aware of the fact that Sum(-1)^n doesn't converge when I wrote this; in fact, that was intentional by design. I purposely wanted an example that would not converge so that I could run the code as long as I wanted to without it converging. Hence my choice of the -1^n.
And about the added bandwidth, no, I wouldn't personally be worried about it, but a lot of SourceForge readers would likely beat me up if I didn't say something about it. (You can't win either way. :-) )
Good Grief, CS student go back to math classes.
Sum (-1)^n n, is not convergent
Split you for loop so that you don't waste CPU cycles.
Which language reads for XML this? My code uses known trusted, (sometimes self written) XML parsers.
The XML reading syntax looks fantastic through, like++.
Halting/Looping, can strange XML,inf loop your code, side effect your code.
You're outputting xml and your worried about the added bandwidth of newlines?