Namespaces
namespace	trait
	The warp traits.

Classes
struct	ConceptWarp

class	WarpSingleThread
	The single-threaded warp to emulate it on CPUs. More...

class	WarpUniformCudaHipBuiltIn
	The GPU CUDA/HIP warp. More...

Functions
template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	activemask (TWarp const &warp) -> decltype(trait::Activemask< interface::ImplementationBase< ConceptWarp, TWarp > >::activemask(warp))
	Returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if the Nth thread of the warp is active.

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	all (TWarp const &warp, std::int32_t predicate) -> std::int32_t
	Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for all of them.

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	any (TWarp const &warp, std::int32_t predicate) -> std::int32_t
	Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for any of them.

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	ballot (TWarp const &warp, std::int32_t predicate)
	Evaluates predicate for all non-exited threads in a warp and returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if predicate evaluates to non-zero for the Nth thread of the warp and the Nth thread is active.

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	getSize (TWarp const &warp) -> std::int32_t
	Returns warp size.

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	shfl (TWarp const &warp, T value, std::int32_t srcLane, std::int32_t width=0)
	Exchange data between threads within a warp.

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	shfl_down (TWarp const &warp, T value, std::uint32_t offset, std::int32_t width=0)
	Exchange data between threads within a warp. It copies from a lane with higher ID relative to caller. The lane ID is calculated by adding delta to the caller’s lane ID.

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	shfl_up (TWarp const &warp, T value, std::uint32_t offset, std::int32_t width=0)
	Exchange data between threads within a warp. It copies from a lane with lower ID relative to caller. The lane ID is calculated by subtracting delta from the caller’s lane ID.

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto	shfl_xor (TWarp const &warp, T value, std::int32_t mask, std::int32_t width=0)
	Exchange data between threads within a warp. It copies from a lane based on bitwise XOR of own lane ID. The lane ID is calculated by performing a bitwise XOR of the caller’s lane ID with mask.

Function Documentation

◆ activemask()

template<typename TWarp >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::activemask ( TWarp const & warp ) -> decltype(trait::Activemask<interface::ImplementationBase<ConceptWarp, TWarp>>::activemask(warp))

Returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if the Nth thread of the warp is active.

Note: decltype for return type is required there, otherwise compilcation with a CPU and a GPU accelerator enabled fails as it tries to call device function from a host-device one. The reason is unclear, but likely related to deducing the return type.

Note:

The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {activemask} else {activemask}.

Template Parameters

TWarp The warp implementation type.

Parameters

warp	The warp implementation.

Returns: 32-bit or 64-bit unsigned type depending on the accelerator.

Definition at line 90 of file Traits.hpp.

◆ all()

template<typename TWarp >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::all	(	TWarp const &	warp,
		std::int32_t	predicate
	)		-> std::int32_t

Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for all of them.

It follows the logic of __all(predicate) in CUDA before version 9.0 and HIP, the operation is applied for all active threads. The modern CUDA counterpart would be __all_sync(__activemask(), predicate).

Note:

The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {all} else {all}.

Template Parameters

TWarp The warp implementation type.

Parameters

warp	The warp implementation.
predicate	The predicate value for current thread.

Definition at line 114 of file Traits.hpp.

◆ any()

template<typename TWarp >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::any	(	TWarp const &	warp,
		std::int32_t	predicate
	)		-> std::int32_t

Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for any of them.

It follows the logic of __any(predicate) in CUDA before version 9.0 and HIP, the operation is applied for all active threads. The modern CUDA counterpart would be __any_sync(__activemask(), predicate).

Note:

The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {any} else {any}.

Template Parameters

TWarp The warp implementation type.

Parameters

warp	The warp implementation.
predicate	The predicate value for current thread.

Definition at line 137 of file Traits.hpp.

◆ ballot()

template<typename TWarp >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::ballot	(	TWarp const &	warp,
		std::int32_t	predicate
	)

Evaluates predicate for all non-exited threads in a warp and returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if predicate evaluates to non-zero for the Nth thread of the warp and the Nth thread is active.

It follows the logic of __ballot(predicate) in CUDA before version 9.0 and HIP, the operation is applied for all active threads. The modern CUDA counterpart would be __ballot_sync(__activemask(), predicate). Return type is 64-bit to fit all platforms.

Note:

The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {ballot} else {ballot}.

Template Parameters

TWarp The warp implementation type.

Parameters

warp	The warp implementation.
predicate	The predicate value for current thread.

Returns: 32-bit or 64-bit unsigned type depending on the accelerator.

Definition at line 164 of file Traits.hpp.

◆ getSize()

template<typename TWarp >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::getSize ( TWarp const & warp ) -> std::int32_t

Returns warp size.

Template Parameters

TWarp The warp implementation type.

Parameters

warp	The warp implementation.

Definition at line 65 of file Traits.hpp.

◆ shfl()

template<typename TWarp , typename T >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl	(	TWarp const &	warp,
		T	value,
		std::int32_t	srcLane,
		std::int32_t	width = `0`
	)

Exchange data between threads within a warp.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
return values[width*(threadIdx.x/width) + srcLane%width];

However, it does not use shared memory.

Notes:

The programmer must ensure that all threads calling this function (and the srcLane) are executing the same line of code. In particular it is not portable to write if(a) {shfl} else {shfl}.
Commonly used with width = warpsize (the default), (returns values[srcLane])
Width must be a power of 2.

Template Parameters

TWarp warp implementation type

Parameters

warp	warp implementation
value	value to broadcast (only meaningful from threadIdx == srcLane)
srcLane	source lane sending value
width	number of threads receiving a single value

Returns: val from the thread index srcLane.

Definition at line 198 of file Traits.hpp.

◆ shfl_down()

template<typename TWarp , typename T >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl_down	(	TWarp const &	warp,
		T	value,
		std::uint32_t	offset,
		std::int32_t	width = `0`
	)

Exchange data between threads within a warp. It copies from a lane with higher ID relative to caller. The lane ID is calculated by adding delta to the caller’s lane ID.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
return (threadIdx.x % width + delta < width) ? values[threadIdx.x + delta] : values[threadIdx.x];

However, it does not use shared memory.

Notes:

The programmer must ensure that all threads calling this function (and the srcLane) are executing the same line of code. In particular it is not portable to write if(a) {shfl} else {shfl}.
Commonly used with width = warpsize (the default), (returns values[threadIdx.x+delta] if threadIdx.x+delta < warpsize)
Width must be a power of 2.

Template Parameters

TWarp	warp implementation type
T	value type

Parameters

warp	warp implementation
value	value to broadcast
offset	corresponds to the delta used to compute the lane ID
width	size of the group participating in the shuffle operation

Returns: val from the thread index lane ID.

Definition at line 274 of file Traits.hpp.

◆ shfl_up()

template<typename TWarp , typename T >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl_up	(	TWarp const &	warp,
		T	value,
		std::uint32_t	offset,
		std::int32_t	width = `0`
	)

Exchange data between threads within a warp. It copies from a lane with lower ID relative to caller. The lane ID is calculated by subtracting delta from the caller’s lane ID.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
return (threadIdx.x % width >= delta) ? values[threadIdx.x - delta] : values[threadIdx.x];

However, it does not use shared memory.

Notes:

The programmer must ensure that all threads calling this function (and the srcLane) are executing the same line of code. In particular it is not portable to write if(a) {shfl} else {shfl}.
Commonly used with width = warpsize (the default), (returns values[threadIdx.x - delta] if threadIdx.x >= delta)
Width must be a power of 2.

Template Parameters

TWarp	warp implementation type
T	value type

Parameters

warp	warp implementation
value	value to broadcast
offset	corresponds to the delta used to compute the lane ID
width	size of the group participating in the shuffle operation

Returns: val from the thread index lane ID.

Definition at line 236 of file Traits.hpp.

◆ shfl_xor()

template<typename TWarp , typename T >

ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl_xor	(	TWarp const &	warp,
		T	value,
		std::int32_t	mask,
		std::int32_t	width = `0`
	)

Exchange data between threads within a warp. It copies from a lane based on bitwise XOR of own lane ID. The lane ID is calculated by performing a bitwise XOR of the caller’s lane ID with mask.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
int lane = threadIdx.x ^ mask;
return values[lane / width > threadIdx.x / width ? threadIdx.x : lane];

However, it does not use shared memory.

Notes:

The programmer must ensure that all threads calling this function (and the srcLane) are executing the same line of code. In particular it is not portable to write if(a) {shfl} else {shfl}.
Commonly used with width = warpsize (the default), (returns values[threadIdx.x^mask])
Width must be a power of 2.

Template Parameters

TWarp	warp implementation type
T	value type

Parameters

warp	warp implementation
value	value to broadcast
mask	corresponds to the mask used to compute the lane ID
width	size of the group participating in the shuffle operation

Returns: val from the thread index lane ID.

Definition at line 312 of file Traits.hpp.

Namespaces

Classes

Functions

Function Documentation

◆ activemask()

◆ all()

◆ any()

◆ ballot()

◆ getSize()

◆ shfl()

◆ shfl_down()

◆ shfl_up()

◆ shfl_xor()