# Optimizing HPX applications

## Performance counters

Performance Counters in HPX provide information about how well the runtime system or an application is performing. The counter data can help determine system bottlenecks and fine-tune system and application performance. The HPX runtime system, its networking, and other layers provide counter data that an application can consume to provide users with information about how well the application is performing.

Applications can also use counter data to determine how much system resources to consume. For example, an application that transfers data over the network could consume counter data from a network switch to determine how much data to transfer without competing for network bandwidth with other network traffic. The application could use the counter data to adjust its transfer rate as the bandwidth usage from other network traffic increases or decreases.

Performance Counters are HPX parallel processes which expose a predefined interface. HPX exposes special API functions that allow one to create, manage, read the counter data, and release instances of Performance Counters. Performance Counter instances are accessed by name, and these names have a predefined structure which is described in the section Performance counter names. The advantage of this is that any Performance Counter can be accessed remotely (from a different locality) or locally (from the same locality). Moreover, since all counters expose their data using the same API, any code consuming counter data can be utilized to access arbitrary system information with minimal effort.

Counter data may be accessed in real time. More information about how to consume counter data can be found in the section Consuming performance counter data.

All HPX applications provide command line options related to performance counters, such as the ability to list available counter types, or periodically query specific counters to be printed to the screen or save them in a file. For more information, please refer to the section HPX Command Line Options.

### Performance counter names

All Performance Counter instances have a name uniquely identifying this instance. This name can be used to access the counter, retrieve all related meta data, and to query the counter data (as described in the section Consuming performance counter data). Counter names are strings with a predefined structure. The general form of a countername is:

/objectname{full_instancename}/countername@parameters


where full_instancename could be either another (full) counter name or a string formatted as:

parentinstancename#parentindex/instancename#instanceindex


Each separate part of a countername (i.e. objectname, countername, parentinstancename, instancename, and parameters) should start with a letter ('a'-'z', 'A'-'Z') or an underscore character ('_'), optionally followed by letters, digits ('0'-'9'), hyphen ('-'), or underscore characters. Whitespace is not allowed inside a counter name. The characters '/', '{', '}', '#' and '@' have a special meaning and are used to delimit the different parts of the counter name.

The parts parentindex and instanceindex are integers. If an index is not specified, HPX will assume a default of -1.

### Two simple examples

An example of a well-formed (and meaningful) simple counter name would be:

/threads{locality#0/total}/count/cumulative


This counter returns the current cumulative number of executed (retired) HPX-threads for the locality 0. The counter type of this counter is /threads/count/cumulative and the full instance name is locality#0/total. This counter type does not require an instanceindex or parameters to be specified.

In this case, the parentindex (the '0') designates the locality for which the counter instance is created. The counter will return the number of HPX-threads retired on that particular locality.

Another example of a well-formed (aggregate) counter name is:

/statistics{/threads{locality#0/total}/count/cumulative}/average@500


This counter takes the simple counter from the first example, samples its values every 500 milliseconds, and returns the average of the value samples whenever it is queried. The counter type of this counter is /statistics/average and the instance name is the full name of the counter for which the values have to be averaged. In this case, the parameters (the '500') specify the sampling interval for the averaging to take place (in milliseconds).
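
To make the sampling behaviour described above more concrete, here is a minimal sketch that uses only the C++ standard library. It is not the HPX implementation: the class name averaging_counter and its members are hypothetical, the manual call to sample() stands in for the timer that would fire every 500 milliseconds, and the sketch assumes the average is taken over all samples collected so far.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical illustration of an averaging aggregate counter; this is not
// an HPX class. It wraps a "simple" counter (any callable returning a value)
// and averages the samples taken from it.
class averaging_counter
{
public:
    explicit averaging_counter(std::function<std::int64_t()> underlying)
      : underlying_(std::move(underlying))
    {
        assert(underlying_ && "an underlying counter is required");
    }

    // In a real system a timer would call this once per sampling interval
    // (for example every 500 ms). Here the caller triggers it manually.
    void sample()
    {
        sum_ += static_cast<double>(underlying_());
        ++samples_;
    }

    // Return the average of all samples taken so far (0 if there are none).
    double query() const
    {
        return samples_ == 0 ? 0.0 : sum_ / static_cast<double>(samples_);
    }

private:
    std::function<std::int64_t()> underlying_;
    double sum_ = 0.0;
    std::uint64_t samples_ = 0;
};
```

Querying the aggregate never touches the underlying counter directly; it only reports on the samples gathered so far, which is why the sampling interval is a creation-time parameter of the aggregate rather than of the query.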

### Performance counter types

Every Performance Counter belongs to a specific Performance Counter type which classifies the counters into groups of common semantics. The type of a counter is identified by the objectname and the countername parts of the name.

/objectname/countername


At application start, HPX will register all available counter types on each of the localities. These counter types are held in a special Performance Counter registration database which can be later used to retrieve the meta data related to a counter type and to create counter instances based on a given counter instance name.
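
The relationship between a full counter name and its counter type can be illustrated with a small, standard-library-only sketch. The names counter_name_parts and split_counter_name are hypothetical helpers written for this illustration and are not part of the HPX API; the sketch also performs only minimal validation and assumes the name contains an instance part in braces.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper type for illustration; not part of the HPX API.
struct counter_name_parts
{
    std::string objectname;           // e.g. "threads"
    std::string full_instancename;    // text between the outermost braces
    std::string countername;          // e.g. "count/cumulative"
    std::string parameters;           // text after '@' (may be empty)

    // The counter type is formed from objectname and countername only.
    std::string type() const
    {
        return "/" + objectname + "/" + countername;
    }
};

// Split "/objectname{full_instancename}/countername@parameters" into parts.
// Braces may nest, because an aggregate counter embeds a full counter name.
counter_name_parts split_counter_name(std::string const& name)
{
    if (name.empty() || name[0] != '/')
        throw std::invalid_argument("counter name must start with '/'");

    counter_name_parts parts;
    std::size_t const open = name.find('{');
    if (open == std::string::npos)
        throw std::invalid_argument("missing '{' in counter name");
    parts.objectname = name.substr(1, open - 1);

    // Find the '}' matching the first '{' by tracking the nesting depth.
    std::size_t depth = 0;
    std::size_t close = std::string::npos;
    for (std::size_t i = open; i < name.size(); ++i)
    {
        if (name[i] == '{')
            ++depth;
        else if (name[i] == '}' && --depth == 0)
        {
            close = i;
            break;
        }
    }
    if (close == std::string::npos)
        throw std::invalid_argument("unbalanced braces in counter name");
    parts.full_instancename = name.substr(open + 1, close - open - 1);

    // What remains is "/countername", optionally followed by "@parameters".
    if (close + 1 >= name.size() || name[close + 1] != '/')
        throw std::invalid_argument("missing countername");
    std::string const rest = name.substr(close + 2);
    std::size_t const at = rest.find('@');
    parts.countername = rest.substr(0, at);
    if (at != std::string::npos)
        parts.parameters = rest.substr(at + 1);
    return parts;
}
```

Applied to the two example names from the previous section, type() yields /threads/count/cumulative and /statistics/average respectively, and the aggregate example reports 500 as its parameters.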

### Performance counter instances

The full_instancename distinguishes different counter instances of the same counter type. The formatting of the full_instancename depends on the counter type. There are two types of counters: simple counters which usually generate the counter values based on direct measurements, and aggregate counters which take another counter and transform its values before generating their own counter values. An example for a simple counter is given above: counting retired HPX-threads. An aggregate counter is shown as an example above as well: calculating the average of the underlying counter values sampled at constant time intervals.

While simple counters use instance names formatted as parentinstancename#parentindex/instancename#instanceindex, most aggregate counters have the full counter name of the embedded counter as their instance name.

Not all simple counter types require specifying all 4 elements of a full counter instance name; some of the parts (parentinstancename, parentindex, instancename, and instanceindex) are optional for specific counters. Please refer to the documentation of a particular counter for more information about the formatting requirements for the name of this counter (see Existing HPX performance counters).

The parameters are used to pass additional information to a counter at creation time. They are optional and they fully depend on the concrete counter. Even if a specific counter type allows additional parameters to be given, those usually are not required as sensible defaults will be chosen. Please refer to the documentation of a particular counter for more information about what parameters are supported, how to specify them, and what default values are assumed (see also Existing HPX performance counters).

Every locality of an application exposes its own set of Performance Counter types and Performance Counter instances. The set of exposed counters is determined dynamically at application start based on the execution environment of the application. For instance, this set is influenced by the current hardware environment for the locality (such as whether the locality has access to accelerators), and the software environment of the application (such as the number of OS-threads used to execute HPX-threads).

### Using wildcards in performance counter names

It is possible to use wildcard characters when specifying performance counter names. Performance counter names can contain 2 types of wildcard characters:

• Wildcard characters in the performance counter type
• Wildcard characters in the performance counter instance name

Wildcard characters have a meaning which is very close to the usual file name wildcard matching rules implemented by common shells (like bash).

Wildcard characters in the performance counter type:

| Wildcard | Description |
| --- | --- |
| * | This wildcard character matches any number (zero or more) of arbitrary characters. |
| ? | This wildcard character matches any single arbitrary character. |
| [...] | This wildcard character matches any single character from the list specified within the square brackets. |

Wildcard characters in the performance counter instance name:

| Wildcard | Description |
| --- | --- |
| * | This wildcard character matches any locality or any thread, depending on whether it is used for locality#* or worker-thread#*. No other wildcards are allowed in counter instance names. |
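
The matching rules for the counter type wildcards can be sketched in a few lines of standard C++. This is only an illustration of the rules listed above, not the matcher HPX uses internally; the function name matches_wildcard is hypothetical, and the sketch treats the text inside square brackets as a plain list of characters (no ranges or negation).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not the matcher HPX uses internally.
// Supports '*' (any run of characters), '?' (any single character) and
// '[...]' (any single character from the listed set).
bool matches_wildcard(std::string const& pattern, std::string const& text,
    std::size_t p = 0, std::size_t t = 0)
{
    while (p < pattern.size())
    {
        char const c = pattern[p];
        if (c == '*')
        {
            // Try every possible length for the run matched by '*'.
            for (std::size_t skip = t; skip <= text.size(); ++skip)
            {
                if (matches_wildcard(pattern, text, p + 1, skip))
                    return true;
            }
            return false;
        }
        if (t >= text.size())
            return false;    // pattern needs a character, text is exhausted
        if (c == '[')
        {
            std::size_t const close = pattern.find(']', p + 1);
            if (close == std::string::npos)
                return false;    // malformed pattern: unterminated set
            std::string const set = pattern.substr(p + 1, close - p - 1);
            if (set.find(text[t]) == std::string::npos)
                return false;
            p = close + 1;
        }
        else
        {
            if (c != '?' && c != text[t])
                return false;
            ++p;
        }
        ++t;
    }
    // The pattern is used up; it matches only if the text is used up too.
    return t == text.size();
}
```

For example, the pattern /agas/count/* matches /agas/count/bind as well as /agas/count/bind_gid, while /agas/count/bind on its own matches only the first of the two.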

### Consuming performance counter data

You can consume performance data using either the command line interface or, from within an HPX application, the HPX API. The command line interface is easier to use, but it is less flexible and does not allow you to adjust the behaviour of your application at runtime. The command line interface provides a convenient but simplified abstraction for querying and logging performance counter data for a set of performance counters.

### Consuming performance counter data from the command line

HPX provides a set of predefined command line options for every application that uses hpx::init for its initialization. While there are many more command line options available (see HPX Command Line Options), the set of options related to Performance Counters allows one to list existing counters and to query existing counters either once at application termination or repeatedly after a constant time interval.

The following table summarizes the available command line options:

| Command line option | Description |
| --- | --- |
| --hpx:print-counter | print the specified performance counter either repeatedly and/or at the times specified by --hpx:print-counter-at (see also option --hpx:print-counter-interval). |
| --hpx:print-counter-reset | print the specified performance counter either repeatedly and/or at the times specified by --hpx:print-counter-at, and reset the counter after the value is queried (see also option --hpx:print-counter-interval). |
| --hpx:print-counter-interval | print the performance counter(s) specified with --hpx:print-counter repeatedly after the time interval (specified in milliseconds) (default: 0, which means print once at shutdown). |
| --hpx:print-counter-destination | print the performance counter(s) specified with --hpx:print-counter to the given file (default: console). |
| --hpx:list-counters | list the names of all registered performance counters. |
| --hpx:list-counter-infos | list the description of all registered performance counters. |
| --hpx:print-counter-format | print the performance counter(s) specified with --hpx:print-counter in CSV format, with or without a header (see option --hpx:no-csv-header). Possible values: csv (prints counter values in CSV format with full names as header), csv-short (prints counter values in CSV format with short names as header; the short names are provided with --hpx:print-counter as --hpx:print-counter shortname,full-countername). |
| --hpx:no-csv-header | print the performance counter(s) specified with --hpx:print-counter in the csv or csv-short format specified with --hpx:print-counter-format, without a header. |
| --hpx:print-counter-at arg | print the performance counter(s) specified with --hpx:print-counter (or --hpx:print-counter-reset) at the given point in time. Possible argument values: startup, shutdown (default), noshutdown. |
| --hpx:reset-counters | reset all performance counter(s) specified with --hpx:print-counter after they have been evaluated. |

While the options --hpx:list-counters and --hpx:list-counter-infos give a short listing of all available counters, the full documentation for those can be found in the section Existing HPX performance counters.

### A simple example

All of the command line options mentioned above can be tested using, for instance, the hello_world_distributed example.

Listing all available counters with hello_world_distributed --hpx:list-counters yields:

List of available counter instances (replace * below with the appropriate
sequence number)
-------------------------------------------------------------------------
/agas/count/allocate
/agas/count/bind
/agas/count/bind_gid


Requesting more information about all available counters with hello_world_distributed --hpx:list-counter-infos yields:

Information about available counter instances (replace * below with the
appropriate sequence number)
------------------------------------------------------------------------------
fullname: /agas/count/allocate
helptext: returns the number of invocations of the AGAS service 'allocate'
type: counter_raw
version: 1.0.0
------------------------------------------------------------------------------

------------------------------------------------------------------------------
fullname: /agas/count/bind
helptext: returns the number of invocations of the AGAS service 'bind'
type: counter_raw
version: 1.0.0
------------------------------------------------------------------------------

------------------------------------------------------------------------------
fullname: /agas/count/bind_gid
helptext: returns the number of invocations of the AGAS service 'bind_gid'
type: counter_raw
version: 1.0.0
------------------------------------------------------------------------------

...


This command will not only list the counter names but also a short description of the data exposed by this counter.

Note

The list of available counters may differ depending on the concrete execution environment (hardware or software) of your application.

Requesting the counter data for one or more performance counters can be achieved by invoking hello_world_distributed with a list of counter names:

hello_world_distributed \
--hpx:print-counter=/agas{locality#0/total}/count/bind


which yields for instance:

hello world from OS-thread 0 on locality 0
/agas{locality#0/total}/count/bind,1,0.212790,[s],11


The first line is the normal output generated by hello_world_distributed and has no relation to the counter data listed. The remaining output contains the counter data as gathered at application shutdown, one line per counter. Each line has up to 6 fields: the counter name, the sequence number of the counter invocation, the time stamp at which this information was sampled, the unit of measure for the time stamp, the actual counter value, and an optional unit of measure for the counter value.

The actual counter value can be represented by a single number (for counters returning singular values) or a list of numbers separated by ':' (for counters returning an array of values, like for instance a histogram).

Note

The name of the performance counter will be enclosed in double quotes '"' if it contains one or more commas ','.
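
If the counter output is post-processed by another program, the field layout described above can be read back with a few lines of code. The following is a minimal sketch based only on that description; counter_sample and parse_counter_line are hypothetical names (not part of HPX), and the sketch assumes that a quoted counter name contains no embedded double quotes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for illustration; not part of the HPX API.
struct counter_sample
{
    std::string name;              // counter name (surrounding quotes removed)
    unsigned long sequence = 0;    // sequence number of the invocation
    double time = 0.0;             // time stamp of the sample
    std::string time_unit;         // unit of the time stamp, e.g. "[s]"
    std::vector<double> values;    // one entry, or several for array counters
    std::string value_unit;        // optional, empty if absent
};

counter_sample parse_counter_line(std::string const& line)
{
    counter_sample s;
    std::size_t pos = 0;

    // The name is enclosed in double quotes if it contains commas.
    if (!line.empty() && line[0] == '"')
    {
        std::size_t const end = line.find('"', 1);
        if (end == std::string::npos)
            throw std::invalid_argument("unterminated quoted counter name");
        s.name = line.substr(1, end - 1);
        pos = end + 1;
        if (pos < line.size() && line[pos] == ',')
            ++pos;    // skip the comma following the closing quote
    }
    else
    {
        std::size_t const end = line.find(',');
        s.name = line.substr(0, end);
        pos = (end == std::string::npos) ? line.size() : end + 1;
    }

    // Split the remaining text at commas.
    std::vector<std::string> fields;
    std::stringstream rest(line.substr(pos));
    for (std::string field; std::getline(rest, field, ',');)
        fields.push_back(field);
    if (fields.size() < 4)
        throw std::invalid_argument("expected at least 5 fields");

    s.sequence = std::stoul(fields[0]);
    s.time = std::stod(fields[1]);
    s.time_unit = fields[2];

    // Array-valued counters separate their values with ':'.
    std::stringstream values(fields[3]);
    for (std::string v; std::getline(values, v, ':');)
        s.values.push_back(std::stod(v));

    if (fields.size() > 4)
        s.value_unit = fields[4];
    return s;
}
```

Storing the value as a vector keeps the two cases (a single number, or a list separated by ':') behind one interface, at the cost of a one-element vector in the common case.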

Querying the counter data repeatedly at a constant time interval with this command line:

hello_world_distributed \
--hpx:print-counter=/agas{locality#0/total}/count/bind \
--hpx:print-counter-interval=20


yields for instance (leaving off the actual console output of the hello_world_distributed example for brevity):

threads{locality#0/total}/count/cumulative,1,0.002409,[s],22
agas{locality#0/total}/count/bind,1,0.002542,[s],9
agas{locality#0/total}/count/bind,2,0.023557,[s],10
agas{locality#0/total}/count/bind,3,0.038679,[s],10


The command line option --hpx:print-counter-destination=<file> will redirect all counter data gathered to the specified file name, which avoids cluttering the console output of your application.

The command line option --hpx:print-counter supports using a limited set of wildcards for a (very limited) set of use cases. In particular, all occurrences of #* as in locality#* and in worker-thread#* will be automatically expanded to the proper set of performance counter names representing the actual environment for the executed program. For instance, if your program is utilizing 4 worker threads for the execution of HPX threads (see command line option --hpx:threads) the following command line

hello_world_distributed \


will print the value of the performance counters monitoring each of the worker threads:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
hello world from OS-thread 3 on locality 0
hello world from OS-thread 2 on locality 0
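
The expansion of #* performed for --hpx:print-counter can be pictured with the following standard-library-only sketch. HPX performs this expansion itself; the helper expand_instance_wildcard is hypothetical, and the counter name used in the usage note is chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration; HPX performs this expansion itself.
// Replaces the first "<instance>#*" in 'name' by "<instance>#0" up to
// "<instance>#(count - 1)" and returns the resulting names.
std::vector<std::string> expand_instance_wildcard(std::string const& name,
    std::string const& instance, std::size_t count)
{
    std::string const needle = instance + "#*";
    std::size_t const pos = name.find(needle);
    if (pos == std::string::npos)
        return {name};    // nothing to expand

    std::vector<std::string> expanded;
    for (std::size_t i = 0; i != count; ++i)
    {
        std::string concrete = name;
        // Overwrite only the '*' (the last character of the needle).
        concrete.replace(pos + needle.size() - 1, 1, std::to_string(i));
        expanded.push_back(concrete);
    }
    return expanded;
}
```

Calling expand_instance_wildcard("/threads{locality#0/worker-thread#*}/count/cumulative", "worker-thread", 4) produces four names ending in worker-thread#0 through worker-thread#3, one per worker thread.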


The command line option --hpx:print-counter-format takes the values csv and csv-short to generate CSV formatted counter values with a header.

With format as csv:

hello_world_distributed \
--hpx:print-counter-format csv \


will print the values of performance counters in CSV format with full countername as header:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
39,93


With format csv-short:

hello_world_distributed \
--hpx:print-counter-format csv-short \


will print the values of performance counters in CSV format with short countername as header:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
cumulative,phases
39,93


With format csv and csv-short when used with --hpx:print-counter-interval:

hello_world_distributed \
--hpx:print-counter-format csv-short \
--hpx:print-counter-interval 5


will print the header only once, followed by the repeatedly sampled performance counter value(s):

cum,phases
25,42
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
44,95


The command line option --hpx:no-csv-header can be used with --hpx:print-counter-format to print performance counter values in CSV format without any header:

hello_world_distributed \
--hpx:print-counter-format csv-short \


will print:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
37,91


### Consuming performance counter data using the HPX API

HPX provides an API that allows any application to discover performance counters and to retrieve the current value of any existing performance counter.

### Retrieve the current value of any performance counter

Performance counters are specialized HPX components. In order to retrieve a counter value, the performance counter needs to be instantiated. HPX exposes a client component object for this purpose:

hpx::performance_counters::performance_counter counter(std::string const& name);


Instantiating an object of this type will create the performance counter identified by the given name. Only the first invocation for any given counter name will create a new instance of that counter; all following invocations for a given counter name will reference the initially created instance. This ensures that at any point in time there is never more than one active instance of any of the existing performance counters.

In order to access the counter value (or to invoke any of the other functionality related to a performance counter, like start, stop or reset), member functions of the created client component instance should be called:

// print the current number of threads created on locality 0
hpx::performance_counters::performance_counter count(
hpx::cout << count.get_value<int>().get() << hpx::endl;


Note

In the above example count.get_value() returns a future. In order to print the result we must append .get() to retrieve the value. You could write the above example like this for more clarity:

// print the current number of threads created on locality 0
hpx::performance_counters::performance_counter count(
hpx::future<int> result = count.get_value<int>();
hpx::cout << result.get() << hpx::endl;


### Providing performance counter data

HPX offers several ways by which you may provide your own data as a performance counter. This has the benefit of exposing additional, possibly application specific information using the existing Performance Counter framework, unifying the process of gathering data about your application.

An application that wants to provide counter data can implement a Performance Counter to provide the data. When a consumer queries performance data, the HPX runtime system calls the provider to collect the data. The runtime system uses an internal registry to determine which provider to call.

Generally, there are two ways of exposing your own Performance Counter data: a simple, function-based way and a more complex but more powerful way of implementing a full Performance Counter. Both alternatives are described in the following sections.

### Exposing performance counter data using a simple function

The simplest way to expose arbitrary numeric data is to write a function which will then be called whenever a consumer queries this counter. Currently, this type of Performance Counter can only be used to expose integer values. The expected signature of this function is:

std::int64_t some_performance_data(bool reset);


The argument bool reset (which is supplied by the runtime system when the function is invoked) specifies whether the counter value should be reset after evaluating the current value (if applicable).

For instance, here is such a function returning how often it was invoked:

// The atomic variable 'counter' ensures the thread safety of the counter.
boost::atomic<std::int64_t> counter(0);

std::int64_t some_performance_data(bool reset)
{
    std::int64_t result = ++counter;
    if (reset)
        counter = 0;
    return result;
}


This example function exposes a linearly increasing value as our performance data. The value is incremented on each invocation, i.e. each time a consumer requests the counter data of this Performance Counter.
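
As a possible variation (a sketch, not code taken from the HPX examples), the same function can be written with std::atomic from the standard library so that reading and resetting the value happen in one atomic step. In the two-step version shown above, an increment performed by another thread between ++counter and counter = 0 would be discarded by the reset. The variable name invocation_count is chosen for this illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Counts how often the function below has been invoked since the last reset.
std::atomic<std::int64_t> invocation_count(0);

std::int64_t some_performance_data(bool reset)
{
    if (reset)
    {
        // exchange() stores 0 and returns the previous value in one atomic
        // step; adding 1 accounts for the current invocation.
        return invocation_count.exchange(0) + 1;
    }
    // Pre-increment on std::atomic is an atomic read-modify-write and
    // returns the new value.
    return ++invocation_count;
}
```

For a single caller this returns the same sequence of values as the original function: 1, 2, 3, and so on, starting again from 1 after a call with reset set to true.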

The next step in exposing this counter to the runtime system is to register the function as a new raw counter type using the HPX API function hpx::performance_counters::install_counter_type. A counter type represents certain common characteristics of counters, like their counter type name, and any associated description information. The following snippet shows an example of how to register the function some_performance_data which is shown above for a counter type named "/test/data". This registration has to be executed before any consumer instantiates and queries an instance of this counter type:

#include <hpx/include/performance_counters.hpp>

void register_counter_type()
{
    // Call the HPX API function to register the counter type.
    hpx::performance_counters::install_counter_type(
        "/test/data",                                   // counter type name
        &some_performance_data,                         // function providing counter data
        "returns a linearly increasing counter value",  // description text (optional)
        ""                                              // unit of measure (optional)
    );
}


Now it is possible to instantiate a new counter instance based on the naming scheme "/test{locality#*/total}/data" where * is a zero-based integer index identifying the locality for which the counter instance should be accessed. The function hpx::performance_counters::install_counter_type allows exactly one counter instance to be instantiated for each locality. Repeated requests to instantiate such a counter will return the same instance, i.e. the instance created for the first request.

If this counter needs to be accessed using the standard HPX command line options, the registration has to be performed during application startup, before hpx_main is executed. The best way to achieve this is to register an HPX startup function using the API function hpx::register_startup_function before calling hpx::init to initialize the runtime system:

int main(int argc, char* argv[])
{
    // By registering the counter type we make it available to any consumer
    // who creates and queries an instance of the type "/test/data".
    //
    // This registration should be performed during startup. The
    // function 'register_counter_type' should be executed as an HPX thread right
    // before hpx_main is executed.
    hpx::register_startup_function(&register_counter_type);

    // Initialize and run HPX.
    return hpx::init(argc, argv);
}


Please see the code in examples/performance_counters/simplest_performance_counter.cpp for a full example demonstrating this functionality.

### Implementing a full performance counter

Sometimes, the simple way of exposing a single value as a Performance Counter is not sufficient. For that reason, HPX provides a means of implementing full Performance Counters which support:

• Retrieving the descriptive information about the Performance Counter
• Retrieving the current counter value
• Resetting the Performance Counter (value)
• Starting the Performance Counter
• Stopping the Performance Counter
• Setting the (initial) value of the Performance Counter

Every full Performance Counter will implement a predefined interface:

//  Copyright (c) 2007-2018 Hartmut Kaiser
//

#if !defined(HPX_PERFORMANCE_COUNTERS_PERFORMANCE_COUNTER_JAN_18_2013_0939AM)
#define HPX_PERFORMANCE_COUNTERS_PERFORMANCE_COUNTER_JAN_18_2013_0939AM

#include <hpx/config.hpp>
#include <hpx/lcos/future.hpp>
#include <hpx/runtime/components/client_base.hpp>
#include <hpx/runtime/launch_policy.hpp>
#include <hpx/util/bind_front.hpp>

#include <hpx/performance_counters/counters_fwd.hpp>
#include <hpx/performance_counters/stubs/performance_counter.hpp>

#include <string>
#include <utility>
#include <vector>

///////////////////////////////////////////////////////////////////////////////
namespace hpx { namespace performance_counters
{
    ///////////////////////////////////////////////////////////////////////////
    struct HPX_EXPORT performance_counter
      : components::client_base<performance_counter, stubs::performance_counter>
    {
        typedef components::client_base<
            performance_counter, stubs::performance_counter
        > base_type;

        performance_counter() {}

        performance_counter(std::string const& name);

        performance_counter(std::string const& name, hpx::id_type const& locality);

        performance_counter(future<id_type> && id)
          : base_type(std::move(id))
        {}

        performance_counter(hpx::future<performance_counter> && c)
          : base_type(std::move(c))
        {}

        ///////////////////////////////////////////////////////////////////////
        future<counter_info> get_info() const;
        counter_info get_info(launch::sync_policy,
            error_code& ec = throws) const;

        future<counter_value> get_counter_value(bool reset = false);
        counter_value get_counter_value(launch::sync_policy,
            bool reset = false, error_code& ec = throws);

        future<counter_value> get_counter_value() const;
        counter_value get_counter_value(launch::sync_policy,
            error_code& ec = throws) const;

        future<counter_values_array> get_counter_values_array(bool reset = false);
        counter_values_array get_counter_values_array(launch::sync_policy,
            bool reset = false, error_code& ec = throws);

        future<counter_values_array> get_counter_values_array() const;
        counter_values_array get_counter_values_array(launch::sync_policy,
            error_code& ec = throws) const;

        ///////////////////////////////////////////////////////////////////////
        future<bool> start();
        bool start(launch::sync_policy, error_code& ec = throws);

        future<bool> stop();
        bool stop(launch::sync_policy, error_code& ec = throws);

        future<void> reset();
        void reset(launch::sync_policy, error_code& ec = throws);

        future<void> reinit(bool reset = true);
        void reinit(
            launch::sync_policy, bool reset = true, error_code& ec = throws);

        ///////////////////////////////////////////////////////////////////////
        future<std::string> get_name() const;
        std::string get_name(launch::sync_policy, error_code& ec = throws) const;

    private:
        template <typename T>
        static T extract_value(future<counter_value> && value)
        {
            return value.get().get_value<T>();
        }

    public:
        template <typename T>
        future<T> get_value(bool reset = false)
        {
            return get_counter_value(reset).then(
                hpx::launch::sync,
                util::bind_front(
                    &performance_counter::extract_value<T>));
        }
        template <typename T>
        T get_value(launch::sync_policy, bool reset = false,
            error_code& ec = throws)
        {
            return get_counter_value(launch::sync, reset).get_value<T>(ec);
        }

        template <typename T>
        future<T> get_value() const
        {
            return get_counter_value().then(
                hpx::launch::sync,
                util::bind_front(
                    &performance_counter::extract_value<T>));
        }
        template <typename T>
        T get_value(launch::sync_policy, error_code& ec = throws) const
        {
            return get_counter_value(launch::sync).get_value<T>(ec);
        }
    };

    /// Return all counters matching the given name (with optional wildcards).
    HPX_API_EXPORT std::vector<performance_counter> discover_counters(
        std::string const& name, error_code& ec = throws);
}}

#endif


In order to implement a full Performance Counter you have to create an HPX component exposing this interface. To simplify this task, HPX provides a ready-made base class which handles all the boilerplate of creating a component for you. The remainder of this section explains the process of creating a full Performance Counter based on the Sine example, which you can find in the directory examples/performance_counters/sine/.

The base class is defined in the header file hpx/performance_counters/base_performance_counter.hpp as:

//  Copyright (c) 2007-2018 Hartmut Kaiser
//

#if !defined(HPX_PERFORMANCE_COUNTERS_BASE_PERFORMANCE_COUNTER_JAN_18_2013_1036AM)
#define HPX_PERFORMANCE_COUNTERS_BASE_PERFORMANCE_COUNTER_JAN_18_2013_1036AM

#include <hpx/config.hpp>
#include <hpx/performance_counters/counters.hpp>
#include <hpx/performance_counters/server/base_performance_counter.hpp>
#include <hpx/runtime/actions/component_action.hpp>
#include <hpx/runtime/components/component_type.hpp>
#include <hpx/runtime/components/server/component_base.hpp>

///////////////////////////////////////////////////////////////////////////////
//[performance_counter_base_class
namespace hpx { namespace performance_counters
{
    template <typename Derived>
    class base_performance_counter;
}}
//]

///////////////////////////////////////////////////////////////////////////////
namespace hpx { namespace performance_counters
{
    template <typename Derived>
    class base_performance_counter
      : public hpx::performance_counters::server::base_performance_counter,
        public hpx::components::component_base<Derived>
    {
    private:
        typedef hpx::components::component_base<Derived> base_type;

    public:
        typedef Derived type_holder;
        typedef hpx::performance_counters::server::base_performance_counter
            base_type_holder;

        base_performance_counter()
        {}

        base_performance_counter(hpx::performance_counters::counter_info const& info)
          : base_type_holder(info)
        {}

        // Disambiguate finalize() which is implemented in both base classes
        void finalize()
        {
            base_type_holder::finalize();
            base_type::finalize();
        }
    };
}}

#endif


The single template parameter is expected to receive the type of the derived class implementing the Performance Counter. In the Sine example this looks like:

//  Copyright (c) 2007-2012 Hartmut Kaiser
//

#if !defined(PERFORMANCE_COUNTERS_SINE_SEP_20_2011_0112PM)
#define PERFORMANCE_COUNTERS_SINE_SEP_20_2011_0112PM

#include <hpx/hpx.hpp>
#include <hpx/util/interval_timer.hpp>
#include <hpx/lcos/local/spinlock.hpp>
#include <hpx/performance_counters/base_performance_counter.hpp>

#include <cstdint>

namespace performance_counters { namespace sine { namespace server
{
    ///////////////////////////////////////////////////////////////////////////
    //[sine_counter_definition
    class sine_counter
      : public hpx::performance_counters::base_performance_counter<sine_counter>
    //]
    {
    public:
        sine_counter() : current_value_(0) {}
        sine_counter(hpx::performance_counters::counter_info const& info);

        /// This function will be called in order to query the current value of
        /// this performance counter
        hpx::performance_counters::counter_value get_counter_value(bool reset);

        /// The functions below will be called to start and stop collecting
        /// counter values from this counter.
        bool start();
        bool stop();

        /// finalize() will be called just before the instance gets destructed
        void finalize();

    protected:
        bool evaluate();

    private:
        typedef hpx::lcos::local::spinlock mutex_type;

        mutable mutex_type mtx_;
        double current_value_;
        std::uint64_t evaluated_at_;

        hpx::util::interval_timer timer_;
    };
}}}

#endif


That is, the type sine_counter is derived from the base class, passing its own type as a template argument (please see examples/performance_counters/sine/server/sine.hpp for the full source code of the counter definition). For more information about this technique (called the Curiously Recurring Template Pattern, or CRTP), please see for instance the corresponding Wikipedia article. This base class itself is derived from the performance_counter interface described above.

Additionally, each value returned by a full Performance Counter implementation carries the following information:

• The point in time a particular value was retrieved
• A (sequential) invocation count
• The actual counter value
• An optional scaling coefficient
• Information about the counter status
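The pieces of information listed above could be grouped into a single record. The sketch below is an illustration under assumptions, not the HPX counter_value type: the names counter_sample, sample_status, and scaled_value are hypothetical, and the scale_inverse flag (divide instead of multiply) is an assumption added to make the scaling coefficient usable in both directions.

```cpp
#include <cstdint>

// Hypothetical status codes; the names are illustrative only.
enum class sample_status
{
    valid,
    invalid
};

// Hypothetical record holding the information listed above.
struct counter_sample
{
    std::uint64_t time_ns;    // point in time the value was retrieved
    std::uint64_t count;      // (sequential) invocation count
    std::int64_t value;       // actual (raw) counter value
    std::int64_t scaling;     // optional scaling coefficient (1 means unscaled)
    bool scale_inverse;       // assumption: true divides by scaling, false multiplies
    sample_status status;     // information about the counter status
};

// Applies the scaling coefficient to the raw value.
// A coefficient of 0 or 1 is treated as "no scaling".
inline double scaled_value(counter_sample const& s)
{
    double const raw = static_cast<double>(s.value);
    if (s.scaling == 0 || s.scaling == 1)
        return raw;

    double const factor = static_cast<double>(s.scaling);
    return s.scale_inverse ? raw / factor : raw * factor;
}
```

For instance, a raw value of 250 with a scaling coefficient of 100 and scale_inverse set to true yields 2.5.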

### Existing HPX performance counters¶

The HPX runtime system exposes a wide variety of predefined Performance Counters. These counters expose critical information about different modules of the runtime system. They can help determine system bottlenecks and fine-tune system and application performance.

The AGAS counters below are listed with their counter type, counter instance formatting, parameters, and description.

Counter type: /agas/count/<agas_service>
where <agas_service> is one of the following:
• primary namespace services: route, bind_gid, resolve_gid, unbind_gid, increment_credit, decrement_credit, allocate, begin_migration, end_migration
• component namespace services: bind_prefix, bind_name, resolve_id, unbind_name, iterate_types, get_component_typename, num_localities_type
• locality namespace services: free, localities, num_localities, num_threads, resolve_locality, resolved_localities
• symbol namespace services: bind, resolve, unbind, iterate_names, on_symbol_namespace_event
Counter instance formatting: <agas_instance>/total
where <agas_instance> is the name of the AGAS service to query. Currently, this value will be locality#0, where 0 is the root locality (the id of the locality hosting the AGAS service). The value for * can be any locality id for the following <agas_service>: route, bind_gid, resolve_gid, unbind_gid, increment_credit, decrement_credit, bind, resolve, unbind, and iterate_names (only the primary and symbol AGAS service components live on all localities, whereas all other AGAS services are available on locality#0 only).
Parameters: None
Description: Returns the total number of invocations of the specified AGAS service since its creation.

Counter type: /agas/<agas_service_category>/count
where <agas_service_category> is one of the following: primary, locality, component, or symbol.
Counter instance formatting: <agas_instance>/total
where <agas_instance> is the name of the AGAS service to query. Currently, this value will be locality#0, where 0 is the root locality (the id of the locality hosting the AGAS service), except for the <agas_service_category> primary or symbol, for which the value for * can be any locality id (only the primary and symbol AGAS service components live on all localities, whereas all other AGAS services are available on locality#0 only).
Parameters: None
Description: Returns the overall total number of invocations of all AGAS services provided by the given AGAS service category since its creation.

Counter type: /agas/time/<agas_service>
where <agas_service> is one of the following:
• primary namespace services: route, bind_gid, resolve_gid, unbind_gid, increment_credit, decrement_credit, allocate, begin_migration, end_migration
• component namespace services: bind_prefix, bind_name, resolve_id, unbind_name, iterate_types, get_component_typename, num_localities_type
• locality namespace services: free, localities, num_localities, num_threads, resolve_locality, resolved_localities
• symbol namespace services: bind, resolve, unbind, iterate_names, on_symbol_namespace_event
Counter instance formatting: <agas_instance>/total
where <agas_instance> is the name of the AGAS service to query. Currently, this value will be locality#0, where 0 is the root locality (the id of the locality hosting the AGAS service). The value for * can be any locality id for the following <agas_service>: route, bind_gid, resolve_gid, unbind_gid, increment_credit, decrement_credit, bind, resolve, unbind, and iterate_names (only the primary and symbol AGAS service components live on all localities, whereas all other AGAS services are available on locality#0 only).
Parameters: None
Description: Returns the overall execution time of the specified AGAS service since its creation (in nanoseconds).

Counter type: /agas/<agas_service_category>/time
where <agas_service_category> is one of the following: primary, locality, component, or symbol.
Counter instance formatting: <agas_instance>/total
where <agas_instance> is the name of the AGAS service to query. Currently, this value will be locality#0, where 0 is the root locality (the id of the locality hosting the AGAS service), except for the <agas_service_category> primary or symbol, for which the value for * can be any locality id.
Parameters: None
Description: Returns the overall execution time of all AGAS services provided by the given AGAS service category since its creation (in nanoseconds).

Counter type: /agas/count/<cache_statistics>
where <cache_statistics> is one of the following: cache/evictions, cache/hits, cache/inserts, cache/misses
Counter instance formatting: locality#*/total
where * is the locality id of the locality whose AGAS cache should be queried. The locality id is a (zero-based) number identifying the locality.
Parameters: None
Description: Returns the number of cache events (evictions, hits, inserts, and misses) in the AGAS cache of the specified locality (see <cache_statistics>).

Counter type: /agas/count/<full_cache_statistics>
where <full_cache_statistics> is one of the following: cache/get_entry, cache/insert_entry, cache/update_entry, cache/erase_entry
Counter instance formatting: locality#*/total
where * is the locality id of the locality whose AGAS cache should be queried. The locality id is a (zero-based) number identifying the locality.
Parameters: None
Description: Returns the number of invocations of the specified cache API function of the AGAS cache.

Counter type: /agas/time/<full_cache_statistics>
where <full_cache_statistics> is one of the following: cache/get_entry, cache/insert_entry, cache/update_entry, cache/erase_entry
Counter instance formatting: locality#*/total
where * is the locality id of the locality whose AGAS cache should be queried. The locality id is a (zero-based) number identifying the locality.
Parameters: None
Description: Returns the overall time spent executing the specified API function of the AGAS cache.
 Counter type Counter instance formatting Description Parameters /data/count// where:  is one of the following: sent, received . The performance counters for the connection type mpi are available only if the compile time constant HPX_HAVE_PARCELPORT_MPI was defined while compiling the HPX core library (which is not defined by default, the corresponding cmake configuration constant is HPX_WITH_PARCELPORT_MPI. Please see CMake variables used to configure HPX for more details. None /data/time// where:  is one of the following: sent, received  the given locality (see / where:  is one of the following: sent, received , e.g. sent or received possibly compressed) for the specified  by the given locality. The performance counters for the connection type mpi are available only if the compile time constant HPX_HAVE_PARCELPORT_MPI was defined while compiling the HPX core library (which is not defined by default, the corresponding cmake configuration constant is HPX_WITH_PARCELPORT_MPI. Please see CMake variables used to configure HPX for more details. If the configure-time option -DHPX_WITH_PARCELPORT_ACTION_COUNTERS=On was specified, this counter allows to specify an optional action name as its parameter. In this case the counter will report the number of bytes transmitted for the given action only. /serialize/time// where:  is one of the following: sent, received  on the given locality (see / where:  is one of the following: sent, received , e.g. sent or received. The performance counters for the connection type mpi are available only if the compile time constant HPX_HAVE_PARCELPORT_MPI was defined while compiling the HPX core library (which is not defined by default, the corresponding cmake configuration constant is HPX_WITH_PARCELPORT_MPI. Please see CMake variables used to configure HPX for more details. 
None /messages/count// where:  is one of the following: sent, received  by the given locality (see / where:  is one of the following: cache/insertions, cache/evictions, cache/hits, cache/misses cache/misses  where:  is one of the following: send, receive locality#*/total where: * is the locality id of the locality the parcel queue should be queried. The locality id is a (zero based) number identifying the locality. Returns the current number of parcels stored in the parcel queue (see 
 Counter type Counter instance formatting Description Parameters /threads/count/cumulative locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the overall number of retired HPX-threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the overall number of retired HPX-threads should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the overall number of executed (retired) HPX-threads on the given locality since application start. If the instance name is total the counter returns the accumulated number of retired HPX-threads for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the overall number of retired HPX-threads for all worker threads separately. This counter is available only if the configuration time constant HPX_WITH_THREAD_CUMULATIVE_COUNTS is set to ON (default: ON). None /threads/time/average locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average time spent executing one HPX-thread should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the average time spent executing one HPX-thread should be queried for. 
The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the average time spent executing one HPX-thread on the given locality since application start. If the instance name is total the counter returns the average time spent executing one HPX-thread for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the average time spent executing one HPX-thread for all worker threads separately. This counter is available only if the configuration time constants HPX_WITH_THREAD_CUMULATIVE_COUNTS (default: ON) and HPX_WITH_THREAD_IDLE_RATES are set to ON (default: OFF). The unit of measure for this counter is nanosecond [ns]. None /threads/time/average-overhead locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average overhead spent executing one HPX-thread should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the average overhead spent executing one HPX-thread should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the average time spent on overhead while executing one HPX-thread on the given locality since application start. 
If the instance name is total the counter returns the average time spent on overhead while executing one HPX-thread for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the average time spent on overhead executing one HPX-thread for all worker threads separately. This counter is available only if the configuration time constants HPX_WITH_THREAD_CUMULATIVE_COUNTS (default: ON) and HPX_WITH_THREAD_IDLE_RATES are set to ON (default: OFF). The unit of measure for this counter is nanosecond [ns]. None /threads/count/cumulative-phases locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the overall number of executed HPX-thread phases (invocations) should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the overall number of executed HPX-thread phases (invocations) should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the overall number of executed HPX-thread phases (invocations) on the given locality since application start. If the instance name is total the counter returns the accumulated number of executed HPX-thread phases (invocations) for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the overall number of executed HPX-thread phases for all worker threads separately. 
This counter is available only if the configuration time constant HPX_WITH_THREAD_CUMULATIVE_COUNTS is set to ON (default: ON). The unit of measure for this counter is nanosecond [ns]. None /threads/time/average-phase locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average time spent executing one HPX-thread phase (invocation) should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the average time executing one HPX-thread phase (invocation) should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the average time spent executing one HPX-thread phase (invocation) on the given locality since application start. If the instance name is total the counter returns the average time spent executing one HPX-thread phase (invocation) for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the average time spent executing one HPX-thread phase for all worker threads separately. This counter is available only if the configuration time constants HPX_WITH_THREAD_CUMULATIVE_COUNTS (default: ON) and HPX_WITH_THREAD_IDLE_RATES are set to ON (default: OFF). The unit of measure for this counter is nanosecond [ns]. 
None /threads/time/average-phase-overhead locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average time overhead executing one HPX-thread phase (invocation) should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the average overhead executing one HPX-thread phase (invocation) should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the average time spent on overhead executing one HPX-thread phase (invocation) on the given locality since application start. If the instance name is total the counter returns the average time spent on overhead while executing one HPX-thread phase (invocation) for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the average time spent on overhead executing one HPX-thread phase for all worker threads separately. This counter is available only if the configuration time constants HPX_WITH_THREAD_CUMULATIVE_COUNTS (default: ON) and HPX_WITH_THREAD_IDLE_RATES are set to ON (default: OFF). The unit of measure for this counter is nanosecond [ns]. None /threads/time/overall locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the overall time spent running the scheduler should be queried for. The locality id (given by * is a (zero based) number identifying the locality. 
pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the overall time spent running the scheduler should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the overall time spent running the scheduler on the given locality since application start. If the instance name is total the counter returns the overall time spent running the scheduler for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the overall time spent running the scheduler for all worker threads separately. This counter is available only if the configuration time constant HPX_WITH_THREAD_IDLE_RATES is set to ON (default: OFF). The unit of measure for this counter is nanosecond [ns]. None /threads/time/cumulative locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the overall time spent executing all HPX-threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the overall time spent executing all HPX-threads should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. 
Returns the overall time spent executing all HPX-threads on the given locality since application start. If the instance name is total the counter returns the overall time spent executing all HPX-threads for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the overall time spent executing all HPX-threads for all worker threads separately. This counter is available only if the configuration time constants HPX_THREAD_MAINTAIN_CUMULATIVE_COUNTS (default: ON) and HPX_THREAD_MAINTAIN_IDLE_RATES are set to ON (default: OFF). None /threads/time/cumulative-overheads locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the overall overhead time incurred by executing all HPX-threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the the overall overhead time incurred by executing all HPX-threads should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the overall overhead time incurred executing all HPX-threads on the given locality since application start. If the instance name is total the counter returns the overall overhead time incurred executing all HPX-threads for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the overall overhead time incurred executing all HPX-threads for all worker threads separately. 
This counter is available only if the configuration time constants HPX_THREAD_MAINTAIN_CUMULATIVE_COUNTS (default: ON) and HPX_THREAD_MAINTAIN_IDLE_RATES are set to ON (default: OFF). The unit of measure for this counter is nanosecond [ns]. None threads/count/instantaneous/ where:  is one of the following: all, active, pending, suspended, terminated, staged locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the current number of threads with the given state should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the current number of threads with the given state should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. The staged thread state refers to registered tasks before they are converted to thread objects. Returns the current number of HPX-threads having the given thread state on the given locality. If the instance name is total the counter returns the current number of HPX-threads of the given state for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the current number of HPX-threads in the given state for all worker threads separately. 
None threads/wait-time/ where:  is one of the following: pending staged locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average wait time of HPX-threads (pending) or thread descriptions (staged) with the given state should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the average wait time for the given state should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. The staged thread state refers to the wait time of registered tasks before they are converted into thread objects, while the pending thread state refers to the wait time of threads in any of the scheduling queues. Returns the average wait time of HPX-threads (if the thread state is pending or of task descriptions (if the thread state is staged on the given locality since application start. If the instance name is total the counter returns the wait time of HPX-threads of the given state for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return the wait time of HPX-threads in the given state for all worker threads separately. These counters are available only if the compile time constant HPX_WITH_THREAD_QUEUE_WAITTIME was defined while compiling the HPX core library (default: OFF). The unit of measure for this counter is nanosecond [ns]. 
None /threads/idle-rate locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average idle rate of all (or one) worker threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the averaged idle rate should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the average idle rate for the given worker thread(s) on the given locality. The idle rate is defined as the ratio of the time spent on scheduling and management tasks and the overall time spent executing work since the application started. This counter is available only if the configuration time constant HPX_WITH_THREAD_IDLE_RATES is set to ON (default: OFF). None /threads/creation-idle-rate locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average creation idle rate of all (or one) worker threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the averaged idle rate should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. 
If no pool-name is specified the counter refers to the ‘default’ pool. Returns the average idle rate for the given worker thread(s) on the given locality which is caused by creating new threads. The creation idle rate is defined as the ratio of the time spent on creating new threads and the overall time spent executing work since the application started. This counter is available only if the configuration time constants HPX_WITH_THREAD_IDLE_RATES (default: OFF) and HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES are set to ON. None /threads/cleanup-idle-rate locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the average cleanup idle rate of all (or one) worker threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the averaged cleanup idle rate should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the average idle rate for the given worker thread(s) on the given locality which is caused by cleaning up terminated threads. The cleanup idle rate is defined as the ratio of the time spent on cleaning up terminated thread objects and the overall time spent executing work since the application started. This counter is available only if the configuration time constants HPX_WITH_THREAD_IDLE_RATES (default: OFF) and HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES are set to ON. 
None /threadqueue/length locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the current length of all thread queues in the scheduler for all (or one) worker threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the current length of all thread queues in the scheduler should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the overall length of all queues for the given worker thread(s) on the given locality. None /threads/count/stack-unbinds locality#*/total where: * is the locality id of the locality the unbind (madvise) operations should be queried for. The locality id is a (zero based) number identifying the locality. Returns the total number of HPX-thread unbind (madvise) operations performed for the referenced locality. Note that this counter is not available on Windows based platforms. None /threads/count/stack-recycles locality#*/total where: * is the locality id of the locality the recycling operations should be queried for. The locality id is a (zero based) number identifying the locality. Returns the total number of HPX-thread recycling operations performed. None /threads/count/stolen-from-pending locality#*/total where: * is the locality id of the locality the number of ‘stole’ threads should be queried for. The locality id is a (zero based) number identifying the locality. 
Returns the total number of HPX-threads ‘stolen’ from the pending thread queue by a neighboring thread worker thread (these threads are executed by a different worker thread than they were initially scheduled on). This counter is available only if the configuration time constant HPX_WITH_THREAD_STEALING_COUNTS is set to ON (default: ON). None /threads/count/pending-misses locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the number of pending queue misses of all (or one) worker threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the number of pending queue misses should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the total number of times that the referenced worker-thread on the referenced locality failed to find pending HPX-threads in its associated queue. This counter is available only if the configuration time constant HPX_WITH_THREAD_STEALING_COUNTS is set to ON (default: ON). None /threads/count/pending-accesses locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the number of pending queue accesses of all (or one) worker threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. 
worker-thread#* is defining the worker thread for which the number of pending queue accesses should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the total number of times that the referenced worker-thread on the referenced locality looked for pending HPX-threads in its associated queue. This counter is available only if the configuration time constant HPX_WITH_THREAD_STEALING_COUNTS is set to ON (default: ON). None /threads/count/stolen-from-staged locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* is defining the locality for which the number of HPX-threads stolen from the staged queue of all (or one) worker threads should be queried for. The locality id (given by * is a (zero based) number identifying the locality. pool#* is defining the pool for which the current value of the idle-loop counter should be queried for. worker-thread#* is defining the worker thread for which the number of HPX-threads stolen from the staged queue should be queried for. The worker thread number (given by the * is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool-name is specified the counter refers to the ‘default’ pool. Returns the total number of HPX-threads ‘stolen’ from the staged thread queue by a neighboring worker thread (these threads are executed by a different worker thread than they were initially scheduled on). This counter is available only if the configuration time constant HPX_WITH_THREAD_STEALING_COUNTS is set to ON (default: ON). 
None /threads/count/stolen-to-pending locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* defines the locality for which the number of HPX-threads stolen to the pending queue of all (or one) worker threads should be queried. The locality id (given by *) is a (zero based) number identifying the locality. pool#* defines the pool for which the counter should be queried. worker-thread#* defines the worker thread for which the number of HPX-threads stolen to the pending queue should be queried. The worker thread number (given by *) is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool name is specified, the counter refers to the ‘default’ pool. Returns the total number of HPX-threads ‘stolen’ to the pending thread queue of the worker thread (these threads are executed by a different worker thread than they were initially scheduled on). This counter is available only if the configuration time constant HPX_WITH_THREAD_STEALING_COUNTS is set to ON (default: ON). None /threads/count/stolen-to-staged locality#*/total or locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* defines the locality for which the number of HPX-threads stolen to the staged queue of all (or one) worker threads should be queried. The locality id (given by *) is a (zero based) number identifying the locality. pool#* defines the pool for which the counter should be queried. worker-thread#* defines the worker thread for which the number of HPX-threads stolen to the staged queue should be queried. The worker thread number (given by *) is a (zero based) number identifying the worker thread.
The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool name is specified, the counter refers to the ‘default’ pool. Returns the total number of HPX-threads ‘stolen’ to the staged thread queue of a neighboring worker thread (these threads are executed by a different worker thread than they were initially scheduled on). This counter is available only if the configuration time constant HPX_WITH_THREAD_STEALING_COUNTS is set to ON (default: ON). None /threads/count/objects locality#*/total or locality#*/allocator#* where: locality#* defines the locality for which the current (cumulative) number of all created HPX-thread objects should be queried. The locality id (given by *) is a (zero based) number identifying the locality. allocator#* defines the number of the allocator instance with which the threads have been created. HPX uses a varying number of allocators to create (and recycle) HPX-thread objects; most likely these counters are useful for debugging purposes only. The allocator id (given by *) is a (zero based) number identifying the allocator to query. Returns the total number of HPX-thread objects created. Note that thread objects are reused to improve system performance, so this number does not reflect the number of actually executed (retired) HPX-threads. None /scheduler/utilization/instantaneous locality#*/total where: locality#* defines the locality for which the current (instantaneous) scheduler utilization should be queried. The locality id (given by *) is a (zero based) number identifying the locality. Returns the total (instantaneous) scheduler utilization. This is the current percentage of scheduler threads executing HPX threads.
Percent /threads/idle-loop-count/instantaneous locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* defines the locality for which the current accumulated value of all idle-loop counters of all worker threads should be queried. The locality id (given by *) is a (zero based) number identifying the locality. pool#* defines the pool for which the current value of the idle-loop counter should be queried. worker-thread#* defines the worker thread for which the current value of the idle-loop counter should be queried. The worker thread number (given by *) is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool name is specified, the counter refers to the ‘default’ pool. Returns the current (instantaneous) idle-loop count for the given HPX worker thread or the accumulated value for all worker threads. None /threads/busy-loop-count/instantaneous locality#*/worker-thread#* or locality#*/pool#*/worker-thread#* where: locality#* defines the locality for which the current accumulated value of all busy-loop counters of all worker threads should be queried. The locality id (given by *) is a (zero based) number identifying the locality. pool#* defines the pool for which the current value of the busy-loop counter should be queried. worker-thread#* defines the worker thread for which the current value of the busy-loop counter should be queried. The worker thread number (given by *) is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. If no pool name is specified, the counter refers to the ‘default’ pool.
Returns the current (instantaneous) busy-loop count for the given HPX worker thread or the accumulated value for all worker threads. None /threads/time/background-work-duration locality#*/total or locality#*/worker-thread#* where: locality#* defines the locality for which the overall time spent performing background work should be queried. The locality id (given by *) is a (zero based) number identifying the locality. worker-thread#* defines the worker thread for which the overall time spent performing background work should be queried. The worker thread number (given by *) is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. Returns the overall time spent performing background work on the given locality since application start. If the instance name is total, the counter returns the overall time spent performing background work for all worker threads (cores) on that locality. If the instance name is worker-thread#*, the counter returns the overall time spent performing background work for each worker thread separately. This counter is available only if the configuration time constants HPX_WITH_BACKGROUND_THREAD_COUNTERS (default: OFF) and HPX_WITH_THREAD_IDLE_RATES (default: OFF) are set to ON. The unit of measure for this counter is nanosecond [ns]. None /threads/background-overhead locality#*/total or locality#*/worker-thread#* where: locality#* defines the locality for which the background overhead should be queried. The locality id (given by *) is a (zero based) number identifying the locality. worker-thread#* defines the worker thread for which the background overhead should be queried. The worker thread number (given by *) is a (zero based) number identifying the worker thread.
The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. Returns the background overhead on the given locality since application start. If the instance name is total the counter returns the background overhead for all worker threads (cores) on that locality. If the instance name is worker-thread#* the counter will return background overhead for all worker threads separately. This counter is available only if the configuration time constants HPX_WITH_BACKGROUND_THREAD_COUNTERS (default: OFF) and HPX_WITH_THREAD_IDLE_RATES are set to ON (default: OFF). The unit of measure displayed for this counter is 0.1%. None

| Counter type | Counter instance formatting | Description | Parameters |
|---|---|---|---|
| `/runtime/count/component` | `locality#*/total` where: * is the locality id of the locality for which the number of components should be queried. The locality id is a (zero based) number identifying the locality. | Returns the overall number of currently active components of the specified type on the given locality. | The type of the component. This is the string that was used when registering the component with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_COMPONENT. |
| `/runtime/count/action-invocation` | `locality#*/total` where: * is the locality id of the locality for which the number of action invocations should be queried. The locality id is a (zero based) number identifying the locality. | Returns the overall (local) invocation count of the specified action type on the given locality. | The action type. This is the string that was used when registering the action with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_ACTION or HPX_REGISTER_ACTION_ID. |
| `/runtime/count/remote-action-invocation` | `locality#*/total` where: * is the locality id of the locality for which the number of action invocations should be queried. The locality id is a (zero based) number identifying the locality. | Returns the overall (remote) invocation count of the specified action type on the given locality. | The action type. This is the string that was used when registering the action with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_ACTION or HPX_REGISTER_ACTION_ID. |
| `/runtime/uptime` | `locality#*/total` where: * is the locality id of the locality for which the system uptime should be queried. The locality id is a (zero based) number identifying the locality. | Returns the overall time since application start on the given locality in nanoseconds. | None |
| `/runtime/memory/virtual` | `locality#*/total` where: * is the locality id of the locality for which the allocated virtual memory should be queried. The locality id is a (zero based) number identifying the locality. | Returns the amount of virtual memory currently allocated by the referenced locality (in bytes). | None |
| `/runtime/memory/resident` | `locality#*/total` where: * is the locality id of the locality for which the allocated resident memory should be queried. The locality id is a (zero based) number identifying the locality. | Returns the amount of resident memory currently allocated by the referenced locality (in bytes). | None |
| `/runtime/memory/total` | `locality#*/total` where: * is the locality id of the locality for which the total available memory should be queried. The locality id is a (zero based) number identifying the locality. Note: only supported in Linux. | Returns the total available memory for use by the referenced locality (in bytes). This counter is available on Linux and Windows systems only. | None |
| `/runtime/io/read_bytes_issued` | `locality#*/total` where: * is the locality id of the locality for which the number of bytes read should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of bytes read by the process (aggregate of count arguments passed to the read() call or its analogues). This performance counter is available only on systems which expose the related data through the /proc file system. | None |
| `/runtime/io/write_bytes_issued` | `locality#*/total` where: * is the locality id of the locality for which the number of bytes written should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of bytes written by the process (aggregate of count arguments passed to the write() call or its analogues). This performance counter is available only on systems which expose the related data through the /proc file system. | None |
| `/runtime/io/read_syscalls` | `locality#*/total` where: * is the locality id of the locality for which the number of system calls should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of system calls that perform I/O reads. This performance counter is available only on systems which expose the related data through the /proc file system. | None |
| `/runtime/io/write_syscalls` | `locality#*/total` where: * is the locality id of the locality for which the number of system calls should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of system calls that perform I/O writes. This performance counter is available only on systems which expose the related data through the /proc file system. | None |
| `/runtime/io/read_bytes_transferred` | `locality#*/total` where: * is the locality id of the locality for which the number of bytes transferred should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of bytes retrieved from storage by I/O operations. This performance counter is available only on systems which expose the related data through the /proc file system. | None |
| `/runtime/io/write_bytes_transferred` | `locality#*/total` where: * is the locality id of the locality for which the number of bytes transferred should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of bytes transferred to storage by I/O operations. This performance counter is available only on systems which expose the related data through the /proc file system. | None |
| `/runtime/io/write_bytes_cancelled` | `locality#*/total` where: * is the locality id of the locality for which the number of bytes not being transferred should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of bytes accounted for by write_bytes_transferred that were ultimately not stored due to truncation or deletion. This performance counter is available only on systems which expose the related data through the /proc file system. | None |
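
The I/O counters above depend on per-process statistics exposed through the /proc file system. As a rough illustration of that data, the following sketch parses text in the format of Linux's `/proc/<pid>/io`. The mapping from /proc field names to HPX counter names is an assumption inferred from the descriptions above, not taken from the HPX sources, and `parse_proc_io` is a hypothetical helper.

```python
from pathlib import Path

# Assumed correspondence between /proc/<pid>/io fields and the HPX counters
# described above (inferred from the descriptions, not from the HPX sources).
ASSUMED_COUNTER_FOR_FIELD = {
    "rchar": "/runtime/io/read_bytes_issued",
    "wchar": "/runtime/io/write_bytes_issued",
    "syscr": "/runtime/io/read_syscalls",
    "syscw": "/runtime/io/write_syscalls",
    "read_bytes": "/runtime/io/read_bytes_transferred",
    "write_bytes": "/runtime/io/write_bytes_transferred",
    "cancelled_write_bytes": "/runtime/io/write_bytes_cancelled",
}


def parse_proc_io(text):
    """Parse 'name: value' lines into {assumed counter name: int value}."""
    counters = {}
    for line in text.splitlines():
        field, sep, value = line.partition(":")
        name = ASSUMED_COUNTER_FOR_FIELD.get(field.strip())
        # Skip lines without a colon and fields we have no mapping for.
        if sep and name is not None:
            counters[name] = int(value)
    return counters


# Made-up sample values in the /proc/<pid>/io layout.
SAMPLE = """rchar: 4096
wchar: 1024
syscr: 8
syscw: 2
read_bytes: 8192
write_bytes: 512
cancelled_write_bytes: 0
"""

if __name__ == "__main__":
    proc_io = Path("/proc/self/io")
    # Fall back to the sample text on systems without /proc/<pid>/io.
    text = proc_io.read_text() if proc_io.exists() else SAMPLE
    for name, value in sorted(parse_proc_io(text).items()):
        print(f"{name} = {value}")
```

On systems without this file the sketch falls back to the sample text, which mirrors why the corresponding HPX counters are unavailable there.
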

| Counter type | Counter instance formatting | Description | Parameters |
|---|---|---|---|
| `/papi/<papi_event>` where: `<papi_event>` is the name of the PAPI event to expose as a performance counter (such as PAPI_SR_INS). Note that the list of available PAPI events changes depending on the architecture used. For a full list of available PAPI events and their (short) descriptions, use the --hpx:list-counters and --papi-event-info=all command line options. | `locality#*/total` or `locality#*/worker-thread#*` where: `locality#*` defines the locality for which the counter should be queried. The locality id (given by *) is a (zero based) number identifying the locality. `worker-thread#*` defines the worker thread for which the counter should be queried. The worker thread number (given by *) is a (zero based) number identifying the worker thread. The number of available worker threads is usually specified on the command line for the application using the option --hpx:threads. | This counter returns the current count of occurrences of the specified PAPI event. This counter is available only if the configuration time constant HPX_WITH_PAPI is set to ON (default: OFF). | None |

| Counter type | Counter instance formatting | Description | Parameters |
|---|---|---|---|
| `/statistics/average` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current average (mean) value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to two comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/rolling_average` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current rolling average (mean) value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to three comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is the size of the rolling window (the number of latest values used to calculate the rolling average); the default is 10. The third is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/stddev` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current standard deviation (stddev) value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to two comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/rolling_stddev` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current rolling standard deviation (stddev) value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to three comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is the size of the rolling window (the number of latest values used to calculate the rolling standard deviation); the default is 10. The third is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/median` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current (statistically estimated) median value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to two comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/max` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current maximum value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to two comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/rolling_max` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current rolling maximum value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to three comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is the size of the rolling window (the number of latest values used to calculate the rolling maximum); the default is 10. The third is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/min` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current minimum value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to two comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
| `/statistics/rolling_min` | Any full performance counter name. The referenced performance counter is queried at fixed time intervals as specified by the first parameter. | Returns the current rolling minimum value calculated based on the values queried from the underlying counter (the one specified as the instance name). | A list of up to three comma separated (integer) values. The first is the time interval (in milliseconds) at which the underlying counter should be queried; if no value is specified, 1000 [ms] is assumed. The second is the size of the rolling window (the number of latest values used to calculate the rolling minimum); the default is 10. The third is either 0 or 1 and specifies whether the underlying counter should be reset during evaluation (1) or not (0); the default is 0. |
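
To make the parameter defaults and the rolling window concrete, here is a minimal sketch of how the parameter list of a rolling statistics counter could be interpreted and what a rolling average over the latest values looks like. Both function names are hypothetical, and two behaviors are assumptions rather than documented HPX behavior: an empty entry falls back to its default, and a window that is not yet full averages only the samples seen so far.

```python
from collections import deque


def parse_rolling_parameters(parameters=""):
    """Return (interval_ms, window_size, reset) using the documented defaults."""
    defaults = [1000, 10, 0]  # interval [ms], rolling window size, reset flag
    given = [p.strip() for p in parameters.split(",")] if parameters.strip() else []
    if len(given) > len(defaults):
        raise ValueError("expected at most three comma separated values")
    # Assumption: an empty entry (e.g. "500,,1") falls back to its default.
    values = [int(g) if g else d for g, d in zip(given, defaults)]
    values += defaults[len(values):]
    interval_ms, window_size, reset = values
    if reset not in (0, 1):
        raise ValueError("the reset flag must be 0 or 1")
    return interval_ms, window_size, bool(reset)


def rolling_average(samples, window_size):
    """Yield the mean of the latest `window_size` samples after each sample."""
    window = deque(maxlen=window_size)  # old samples drop out automatically
    for sample in samples:
        window.append(sample)
        yield sum(window) / len(window)


if __name__ == "__main__":
    print(parse_rolling_parameters())          # all defaults
    print(parse_rolling_parameters("500,3"))   # 500 ms interval, window of 3
    print(list(rolling_average([1, 2, 3, 4], window_size=2)))
```

The non-rolling counters take the same list without the window size entry.
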

| Counter type | Counter instance formatting | Description | Parameters |
|---|---|---|---|
| `/arithmetics/add` | None | Returns the sum calculated based on the values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/subtract` | None | Returns the difference calculated based on the values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/multiply` | None | Returns the product calculated based on the values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/divide` | None | Returns the result of division of the values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/mean` | None | Returns the average value of all values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/variance` | None | Returns the standard deviation of all values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/median` | None | Returns the median value of all values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/min` | None | Returns the minimum value of all values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/max` | None | Returns the maximum value of all values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |
| `/arithmetics/count` | None | Returns the count value of all values queried from the underlying counters (the ones specified as the parameters). | The parameter will be interpreted as a comma separated list of full performance counter names which are queried whenever this counter is accessed. Any wildcards in the counter names will be expanded. |

Note

The /arithmetics counters can consume an arbitrary number of other counters. For this reason, those counters have to be specified as parameters (a comma separated list of counters appended after a '@'). For instance:

./bin/hello_world_distributed -t2 \
hello world from OS-thread 0 on locality 0
hello world from OS-thread 1 on locality 0


Since all wildcards in the parameters are expanded, this example is fully equivalent to specifying both counters separately to /arithmetics/add:

./bin/hello_world_distributed -t2 \

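
The relationship between the wildcard form and the fully spelled-out form can be sketched as plain string manipulation. This is only an illustration of the naming convention, not of how HPX expands wildcards internally. The counter name used here is an illustrative choice assumed to follow the layout described in the section Performance counter names, and both helper functions are hypothetical.

```python
def expand_worker_threads(counter_name, num_threads):
    """Expand a 'worker-thread#*' wildcard into one name per worker thread."""
    wildcard = "worker-thread#*"
    if wildcard not in counter_name:
        return [counter_name]
    return [
        counter_name.replace(wildcard, f"worker-thread#{i}")
        for i in range(num_threads)
    ]


def arithmetics_counter(operation, counter_names):
    """Build '/arithmetics/<operation>@<comma separated counter names>'."""
    return f"/arithmetics/{operation}@" + ",".join(counter_names)


# Counter name chosen for illustration only.
wildcard_name = "/threads{locality#0/worker-thread#*}/count/cumulative"

if __name__ == "__main__":
    # With two worker threads (-t2) the wildcard stands for both threads,
    # so the two printed names would refer to the same set of counters.
    print(arithmetics_counter("add", [wildcard_name]))
    print(arithmetics_counter("add", expand_worker_threads(wildcard_name, 2)))
```
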

| Counter type | Counter instance formatting | Description | Parameters |
|---|---|---|---|
| `/coalescing/count/parcels` | `locality#*/total` where: * is the locality id of the locality for which the number of parcels for the given action should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of parcels handled by the message handler associated with the action which is given by the counter parameter. | The action type. This is the string that was used when registering the action with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_ACTION or HPX_REGISTER_ACTION_ID. |
| `/coalescing/count/messages` | `locality#*/total` where: * is the locality id of the locality for which the number of messages for the given action should be queried. The locality id is a (zero based) number identifying the locality. | Returns the number of messages generated by the message handler associated with the action which is given by the counter parameter. | The action type. This is the string that was used when registering the action with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_ACTION or HPX_REGISTER_ACTION_ID. |
| `/coalescing/count/average-parcels-per-message` | `locality#*/total` where: * is the locality id of the locality for which the number of messages for the given action should be queried. The locality id is a (zero based) number identifying the locality. | Returns the average number of parcels sent in a message generated by the message handler associated with the action which is given by the counter parameter. | The action type. This is the string that was used when registering the action with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_ACTION or HPX_REGISTER_ACTION_ID. |
| `/coalescing/time/average-parcel-arrival` | `locality#*/total` where: * is the locality id of the locality for which the average time between parcels for the given action should be queried. The locality id is a (zero based) number identifying the locality. | Returns the average time between arriving parcels for the action which is given by the counter parameter. | The action type. This is the string that was used when registering the action with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_ACTION or HPX_REGISTER_ACTION_ID. |
| `/coalescing/time/parcel-arrival-histogram` | `locality#*/total` where: * is the locality id of the locality for which the average time between parcels for the given action should be queried. The locality id is a (zero based) number identifying the locality. | Returns a histogram representing the times between arriving parcels for the action which is given by the counter parameter. This counter returns an array of values, where the first three values represent the three parameters used for the histogram, followed by one value for each of the histogram buckets. The first unit of measure displayed for this counter [ns] refers to the lower and upper boundary values in the returned histogram data only. The second unit of measure displayed [0.1%] refers to the actual histogram data. For each bucket the counter shows a value between 0 and 1000, which corresponds to a percentage value between 0% and 100%. | The action type and optional histogram parameters. The action type is the string that was used when registering the action with HPX, e.g. the second parameter passed to the macro HPX_REGISTER_ACTION or HPX_REGISTER_ACTION_ID. The action type may be followed by a comma separated list of up to three numbers: the lower and upper boundaries for the collected histogram, and the number of buckets for the histogram to generate. By default these three numbers are assumed to be 0 ([ns], lower bound), 1000000 ([ns], upper bound), and 20 (number of buckets to generate). |
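
The array returned by the parcel-arrival-histogram counter can be unpacked as sketched below. The sketch assumes that the three leading values appear in the same order as the counter parameters (lower bound, upper bound, number of buckets) and that the buckets are equally wide; neither detail is stated explicitly above. `decode_histogram` is a hypothetical helper, not part of HPX.

```python
def decode_histogram(values):
    """Split a histogram counter array into (lower_ns, upper_ns, percent) buckets.

    Assumes the layout [lower, upper, num_buckets, bucket_0, bucket_1, ...]
    with equally wide buckets and bucket values in units of 0.1%.
    """
    lower, upper, num_buckets = values[:3]
    buckets = values[3:]
    if len(buckets) != num_buckets:
        raise ValueError("bucket count does not match the histogram parameters")
    width = (upper - lower) / num_buckets
    return [
        # Dividing by 10 converts the 0..1000 scale into 0%..100%.
        (lower + i * width, lower + (i + 1) * width, raw / 10.0)
        for i, raw in enumerate(buckets)
    ]


if __name__ == "__main__":
    # Made-up data: 4 buckets covering 0..1000 ns.
    for low, high, percent in decode_histogram([0, 1000, 4, 250, 500, 250, 0]):
        print(f"[{low:.0f} ns, {high:.0f} ns): {percent:.1f}%")
```
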

Note

The performance counters related to parcel coalescing are available only if the configuration time constant HPX_WITH_PARCEL_COALESCING is set to ON (default: ON). Even then, they are available only for those actions that are enabled for parcel coalescing (see the macros HPX_ACTION_USES_MESSAGE_COALESCING and HPX_ACTION_USES_MESSAGE_COALESCING_NOTHROW).

 [1] A message can potentially consist of more than one parcel.

## APEX integration

HPX provides integration with APEX, which is a framework for application profiling using task timers and various performance counters. It can be added as a git submodule by turning on the option HPX_WITH_APEX:BOOL during CMake configuration. TAU is an optional dependency when using APEX.

To build HPX with APEX, add HPX_WITH_APEX=ON and, optionally, TAU_ROOT=$PATH_TO_TAU to your CMake configuration. In addition, you can override the tag used for APEX with the HPX_WITH_APEX_TAG option. Please see the APEX HPX documentation for detailed instructions on using APEX with HPX.