This directory contains a simple example that sums values in a tree.

The example exhibits some speedup, but not much, because it quickly saturates the system bus on a multiprocessor. Good speedup requires more computation cycles per memory reference. The point of the example is to teach how to use the raw task interface, so the computation is deliberately trivial.
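To make the shape of the computation concrete, here is a minimal sketch of a serial and a parallel tree sum. It uses only the C++ standard library (std::async) and does not use the raw task interface that the example itself teaches. The names TreeNode, make_tree, serial_sum, and parallel_sum are hypothetical and chosen for illustration; the example's actual declarations are in common.h.

```cpp
#include <cstddef>
#include <future>
#include <memory>

// Hypothetical node type for illustration; not the declaration from common.h.
struct TreeNode {
    double value = 0.0;
    std::unique_ptr<TreeNode> left;
    std::unique_ptr<TreeNode> right;
};

// Builds a roughly balanced tree with exactly n nodes, each holding 1.0.
std::unique_ptr<TreeNode> make_tree(std::size_t n) {
    if (n == 0) return nullptr;
    auto node = std::make_unique<TreeNode>();
    node->value = 1.0;
    std::size_t left_count = (n - 1) / 2;
    node->left = make_tree(left_count);
    node->right = make_tree(n - 1 - left_count);
    return node;
}

// Sequential sum: one addition per node, so the work per memory reference is tiny.
double serial_sum(const TreeNode* node) {
    if (!node) return 0.0;
    return node->value + serial_sum(node->left.get()) + serial_sum(node->right.get());
}

// Parallel sum: runs the left subtree on another thread for the top `depth`
// levels of the tree, then falls back to the serial version.
double parallel_sum(const TreeNode* node, int depth) {
    if (!node) return 0.0;
    if (depth <= 0) return serial_sum(node);
    auto left = std::async(std::launch::async, parallel_sum, node->left.get(), depth - 1);
    double right = parallel_sum(node->right.get(), depth - 1);
    return node->value + left.get() + right;
}
```

Calling parallel_sum(root.get(), 3) would split the work into at most eight subtrees. Because each node contributes a single addition, the threads spend most of their time waiting on memory, which is the bandwidth-bound behavior described above.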

The performance of this example is better when objects are allocated by the scalable_allocator instead of the default "operator new". The reason is that the scalable_allocator typically packs small objects more tightly than the default "operator new", resulting in a smaller memory footprint, and thus more efficient use of cache and virtual memory. In addition, the scalable_allocator performs better for multi-threaded allocations.
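One way to make the allocator choice switchable is to write the tree-building code against the standard allocator interface. The sketch below is an assumption about how such code could look, not the example's actual source; TreeNode, build_tree, destroy_tree, and count_nodes are hypothetical names. It is shown with std::allocator, which goes through the default "operator new"; tbb::scalable_allocator<TreeNode> (from tbb/scalable_allocator.h) satisfies the same allocator requirements and could be passed in its place when TBB is available.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical node type for illustration only.
struct TreeNode {
    double value;
    TreeNode* left;
    TreeNode* right;
};

// Allocates and constructs n nodes through any standard-conforming allocator.
template <typename Alloc>
TreeNode* build_tree(Alloc& alloc, std::size_t n) {
    if (n == 0) return nullptr;
    using Traits = std::allocator_traits<Alloc>;
    TreeNode* node = Traits::allocate(alloc, 1);
    Traits::construct(alloc, node, TreeNode{1.0, nullptr, nullptr});
    std::size_t left_count = (n - 1) / 2;
    node->left = build_tree(alloc, left_count);
    node->right = build_tree(alloc, n - 1 - left_count);
    return node;
}

// Destroys and frees every node through the same allocator that created it.
template <typename Alloc>
void destroy_tree(Alloc& alloc, TreeNode* node) {
    if (!node) return;
    destroy_tree(alloc, node->left);
    destroy_tree(alloc, node->right);
    using Traits = std::allocator_traits<Alloc>;
    Traits::destroy(alloc, node);
    Traits::deallocate(alloc, node, 1);
}

// Counts the nodes reachable from `node`.
std::size_t count_nodes(const TreeNode* node) {
    return node ? 1 + count_nodes(node->left) + count_nodes(node->right) : 0;
}
```

With this structure, only the allocator object passed to build_tree changes between the two configurations; the tree code itself stays the same.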

System Requirements

For the most up-to-date system requirements, see the release notes.

Files
SerialSumTree.cpp
Sums sequentially.
SimpleParallelSumTree.cpp
Sums in parallel without any fancy tricks.
OptimizedParallelSumTree.cpp
Sums in parallel, using "recycling" and "continuation-passing" tricks. In this case, it is only slightly faster than the simple version.
common.h
Shared declarations.
main.cpp
Main program that parses command-line options and runs the algorithm.
Makefile
Makefile for building the example.
Directories
msvs
Contains the Microsoft* Visual Studio* workspace for building and running the example (Windows* systems only).
xcode
Contains the Xcode* IDE workspace for building and running the example (macOS* systems only).

For information about the minimum supported IDE versions, see the release notes.

Build instructions

General build directions can be found here.

Usage
tree_sum -h
Prints the help for command-line options.
tree_sum [n-of-threads=value] [number-of-nodes=value] [silent] [stdmalloc]
tree_sum [n-of-threads [number-of-nodes]] [silent] [stdmalloc]
n-of-threads is the number of threads to use, given as a range of the form low[:high], where low and the optional high are non-negative integers or 'auto' for the default.
number-of-nodes is the number of nodes in the tree.
silent - no output except elapsed time.
stdmalloc - causes the default "operator new" to be used for memory allocations instead of the scalable_allocator.
To run a short version of this example, e.g., for use with Intel® Parallel Inspector:
Build a debug version of the example (see the build instructions).
Run it with a small problem size and the desired number of threads, e.g., tree_sum 4 100000.
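The low[:high] form of n-of-threads described above can be parsed along the following lines. This is a sketch under stated assumptions, not the parser used by main.cpp: ThreadRange, parse_bound, and parse_thread_range are hypothetical names, a single value is assumed to mean low equals high, and the value substituted for 'auto' is supplied by the caller (for instance, std::thread::hardware_concurrency()).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical holder for a parsed thread range.
struct ThreadRange {
    unsigned low;
    unsigned high;
};

// Converts one bound; "auto" maps to the caller-supplied default (an assumption).
unsigned parse_bound(const std::string& text, unsigned auto_value) {
    if (text == "auto") return auto_value;
    if (text.empty() || text.find_first_not_of("0123456789") != std::string::npos)
        throw std::invalid_argument("expected a non-negative integer or 'auto': " + text);
    return static_cast<unsigned>(std::stoul(text));
}

// Parses "low" or "low:high"; a lone value is treated as low == high (an assumption).
ThreadRange parse_thread_range(const std::string& arg, unsigned auto_value) {
    std::size_t colon = arg.find(':');
    if (colon == std::string::npos) {
        unsigned n = parse_bound(arg, auto_value);
        return {n, n};
    }
    ThreadRange range{parse_bound(arg.substr(0, colon), auto_value),
                      parse_bound(arg.substr(colon + 1), auto_value)};
    if (range.low > range.high)
        throw std::invalid_argument("low exceeds high: " + arg);
    return range;
}
```

Under these assumptions, parse_thread_range("1:auto", 8) yields the range 1 to 8, and parse_thread_range("4", 8) yields the single thread count 4.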

Legal Information

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
© 2019, Intel Corporation