alpaka
Abstraction Library for Parallel Kernel Acceleration
alpaka::warp Namespace Reference

Namespaces

 trait
 The warp traits.
 

Classes

struct  ConceptWarp
 
class  WarpSingleThread
 The single-threaded warp, used to emulate warp operations on CPUs. More...
 
class  WarpUniformCudaHipBuiltIn
 The GPU CUDA/HIP warp. More...
 

Functions

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto activemask (TWarp const &warp) -> decltype(trait::Activemask< interface::ImplementationBase< ConceptWarp, TWarp >>::activemask(warp))
 Returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if the Nth thread of the warp is active. More...
 
template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto all (TWarp const &warp, std::int32_t predicate) -> std::int32_t
 Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for all of them. More...
 
template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto any (TWarp const &warp, std::int32_t predicate) -> std::int32_t
 Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for any of them. More...
 
template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto ballot (TWarp const &warp, std::int32_t predicate)
 Evaluates predicate for all non-exited threads in a warp and returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if predicate evaluates to non-zero for the Nth thread of the warp and the Nth thread is active. More...
 
template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto getSize (TWarp const &warp) -> std::int32_t
 Returns the warp size. More...
 
template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto shfl (TWarp const &warp, T value, std::int32_t srcLane, std::int32_t width=0)
 Exchange data between threads within a warp. More...
 
template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto shfl_down (TWarp const &warp, T value, std::uint32_t offset, std::int32_t width=0)
 Exchange data between threads within a warp. It copies from a lane with a higher ID relative to the caller. The source lane ID is calculated by adding the offset (the delta) to the caller’s lane ID. More...
 
template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto shfl_up (TWarp const &warp, T value, std::uint32_t offset, std::int32_t width=0)
 Exchange data between threads within a warp. It copies from a lane with a lower ID relative to the caller. The source lane ID is calculated by subtracting the offset (the delta) from the caller’s lane ID. More...
 
template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto shfl_xor (TWarp const &warp, T value, std::int32_t mask, std::int32_t width=0)
 Exchange data between threads within a warp. It copies from a lane chosen by a bitwise XOR of the caller’s own lane ID: the source lane ID is calculated by XOR-ing the caller’s lane ID with mask. More...
 

Function Documentation

◆ activemask()

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::activemask ( TWarp const &  warp) -> decltype(trait::Activemask<interface::ImplementationBase<ConceptWarp, TWarp>>::activemask(warp))

Returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if the Nth thread of the warp is active.

Note: decltype for the return type is required here; otherwise, compilation with both a CPU and a GPU accelerator enabled fails, as it tries to call a device function from a host-device one. The reason is unclear, but is likely related to deducing the return type.

Note:

  • The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {activemask} else {activemask}.
Template Parameters
TWarp  The warp implementation type.
Parameters
warp  The warp implementation.
Returns
32-bit or 64-bit unsigned type depending on the accelerator.

Definition at line 90 of file Traits.hpp.
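The bitmask semantics described above can be sketched in host-side C++ (emulateActivemask is illustrative, not part of the alpaka API; it assumes at most 64 lanes):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side sketch of the activemask() semantics: bit N of the result is set
// if and only if lane N of the warp is active.
inline std::uint64_t emulateActivemask(std::vector<bool> const& laneIsActive)
{
    std::uint64_t mask = 0;
    for(std::size_t n = 0; n < laneIsActive.size(); ++n)
        if(laneIsActive[n])
            mask |= std::uint64_t{1} << n;
    return mask;
}
```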

◆ all()

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::all ( TWarp const &  warp,
std::int32_t  predicate 
) -> std::int32_t

Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for all of them.

It follows the logic of __all(predicate) in CUDA before version 9.0 and in HIP: the operation is applied to all active threads. The modern CUDA counterpart would be __all_sync(__activemask(), predicate).

Note:

  • The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {all} else {all}.
Template Parameters
TWarp  The warp implementation type.
Parameters
warp  The warp implementation.
predicate  The predicate value for the current thread.

Definition at line 114 of file Traits.hpp.
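The semantics of all() can be sketched in host-side C++ (emulateAll is illustrative, not part of the alpaka API; each vector element stands for one active lane's predicate value):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side sketch of the all() semantics: the result is non-zero
// if and only if every active lane's predicate is non-zero.
inline std::int32_t emulateAll(std::vector<std::int32_t> const& predicates)
{
    for(std::int32_t p : predicates)
        if(p == 0)
            return 0;
    return 1;
}
```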

◆ any()

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::any ( TWarp const &  warp,
std::int32_t  predicate 
) -> std::int32_t

Evaluates predicate for all active threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for any of them.

It follows the logic of __any(predicate) in CUDA before version 9.0 and in HIP: the operation is applied to all active threads. The modern CUDA counterpart would be __any_sync(__activemask(), predicate).

Note:

  • The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {any} else {any}.
Template Parameters
TWarp  The warp implementation type.
Parameters
warp  The warp implementation.
predicate  The predicate value for the current thread.

Definition at line 137 of file Traits.hpp.

◆ ballot()

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::ballot ( TWarp const &  warp,
std::int32_t  predicate 
)

Evaluates predicate for all non-exited threads in a warp and returns a 32- or 64-bit unsigned integer (depending on the accelerator) whose Nth bit is set if and only if predicate evaluates to non-zero for the Nth thread of the warp and the Nth thread is active.

It follows the logic of __ballot(predicate) in CUDA before version 9.0 and in HIP: the operation is applied to all active threads. The modern CUDA counterpart would be __ballot_sync(__activemask(), predicate). The return type is 64-bit to fit all platforms.

Note:

  • The programmer must ensure that all threads calling this function are executing the same line of code. In particular it is not portable to write if(a) {ballot} else {ballot}.
Template Parameters
TWarp  The warp implementation type.
Parameters
warp  The warp implementation.
predicate  The predicate value for the current thread.
Returns
32-bit or 64-bit unsigned type depending on the accelerator.

Definition at line 164 of file Traits.hpp.
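The bitmask semantics of ballot() can be sketched in host-side C++ (emulateBallot is illustrative, not part of the alpaka API; it assumes at most 64 lanes):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side sketch of the ballot() semantics: bit N of the result is set
// if and only if lane N is active and its predicate is non-zero.
inline std::uint64_t emulateBallot(
    std::vector<std::int32_t> const& predicates,
    std::vector<bool> const& laneIsActive)
{
    std::uint64_t result = 0;
    for(std::size_t n = 0; n < predicates.size(); ++n)
        if(laneIsActive[n] && predicates[n] != 0)
            result |= std::uint64_t{1} << n;
    return result;
}
```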

◆ getSize()

template<typename TWarp >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::getSize ( TWarp const &  warp) -> std::int32_t

Returns the warp size.

Template Parameters
TWarp  The warp implementation type.
Parameters
warp  The warp implementation.

Definition at line 65 of file Traits.hpp.

◆ shfl()

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl ( TWarp const &  warp,
T  value,
std::int32_t  srcLane,
std::int32_t  width = 0 
)

Exchange data between threads within a warp.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
return values[width*(threadIdx.x/width) + srcLane%width];

However, it does not use shared memory.

Notes:

  • The programmer must ensure that all threads calling this function (and the srcLane) are executing the same line of code. In particular it is not portable to write if(a) {shfl} else {shfl}.
  • Commonly used with width = warpsize (the default), in which case it returns values[srcLane].
  • Width must be a power of 2.
Template Parameters
TWarp  The warp implementation type.
T  The value type.
Parameters
warp  The warp implementation.
value  The value to broadcast (only meaningful from threadIdx == srcLane).
srcLane  The source lane sending the value.
width  The number of threads receiving a single value.
Returns
The value from the thread with index srcLane.

Definition at line 198 of file Traits.hpp.
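The "effectively executes" pseudocode above can be mirrored in host-side C++ (emulateShfl is illustrative, not part of the alpaka API; it assumes width > 0, whereas the real API treats the default width = 0 as the warp size):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side sketch of shfl(): within each group of `width` lanes,
// every lane reads the value held by lane srcLane % width of its group.
inline std::vector<std::int32_t> emulateShfl(
    std::vector<std::int32_t> const& values, // one element per lane
    std::int32_t srcLane,
    std::int32_t width)
{
    std::vector<std::int32_t> result(values.size());
    for(std::size_t lane = 0; lane < values.size(); ++lane)
    {
        std::int32_t const i = static_cast<std::int32_t>(lane);
        result[lane] = values[width * (i / width) + srcLane % width];
    }
    return result;
}
```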

◆ shfl_down()

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl_down ( TWarp const &  warp,
T  value,
std::uint32_t  offset,
std::int32_t  width = 0 
)

Exchange data between threads within a warp. It copies from a lane with a higher ID relative to the caller. The source lane ID is calculated by adding the offset (the delta in the pseudocode) to the caller’s lane ID.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
return (threadIdx.x % width + delta < width) ? values[threadIdx.x + delta] : values[threadIdx.x];

However, it does not use shared memory.

Notes:

  • The programmer must ensure that all threads calling this function (and the source lane) are executing the same line of code. In particular it is not portable to write if(a) {shfl_down} else {shfl_down}.
  • Commonly used with width = warpsize (the default), in which case it returns values[threadIdx.x + delta] if threadIdx.x + delta < warpsize.
  • Width must be a power of 2.
Template Parameters
TWarp  The warp implementation type.
T  The value type.
Parameters
warp  The warp implementation.
value  The value to broadcast.
offset  The delta used to compute the source lane ID.
width  The size of the group participating in the shuffle operation.
Returns
The value from the thread with the computed lane ID.

Definition at line 274 of file Traits.hpp.
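The pseudocode above can be mirrored in host-side C++ (emulateShflDown is illustrative, not part of the alpaka API; it assumes width > 0, whereas the real API treats the default width = 0 as the warp size):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side sketch of shfl_down(): a lane reads from lane + offset if that
// stays inside its width-sized group, otherwise it keeps its own value.
inline std::vector<std::int32_t> emulateShflDown(
    std::vector<std::int32_t> const& values, // one element per lane
    std::uint32_t offset,
    std::int32_t width)
{
    std::vector<std::int32_t> result(values.size());
    std::int32_t const delta = static_cast<std::int32_t>(offset);
    for(std::size_t lane = 0; lane < values.size(); ++lane)
    {
        std::int32_t const i = static_cast<std::int32_t>(lane);
        result[lane] = (i % width + delta < width) ? values[i + delta] : values[i];
    }
    return result;
}
```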

◆ shfl_up()

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl_up ( TWarp const &  warp,
T  value,
std::uint32_t  offset,
std::int32_t  width = 0 
)

Exchange data between threads within a warp. It copies from a lane with a lower ID relative to the caller. The source lane ID is calculated by subtracting the offset (the delta in the pseudocode) from the caller’s lane ID.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
return (threadIdx.x % width >= delta) ? values[threadIdx.x - delta] : values[threadIdx.x];

However, it does not use shared memory.

Notes:

  • The programmer must ensure that all threads calling this function (and the source lane) are executing the same line of code. In particular it is not portable to write if(a) {shfl_up} else {shfl_up}.
  • Commonly used with width = warpsize (the default), in which case it returns values[threadIdx.x - delta] if threadIdx.x >= delta.
  • Width must be a power of 2.
Template Parameters
TWarp  The warp implementation type.
T  The value type.
Parameters
warp  The warp implementation.
value  The value to broadcast.
offset  The delta used to compute the source lane ID.
width  The size of the group participating in the shuffle operation.
Returns
The value from the thread with the computed lane ID.

Definition at line 236 of file Traits.hpp.
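The pseudocode above can be mirrored in host-side C++ (emulateShflUp is illustrative, not part of the alpaka API; it assumes width > 0, whereas the real API treats the default width = 0 as the warp size):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side sketch of shfl_up(): a lane reads from lane - offset if that
// stays inside its width-sized group, otherwise it keeps its own value.
inline std::vector<std::int32_t> emulateShflUp(
    std::vector<std::int32_t> const& values, // one element per lane
    std::uint32_t offset,
    std::int32_t width)
{
    std::vector<std::int32_t> result(values.size());
    std::int32_t const delta = static_cast<std::int32_t>(offset);
    for(std::size_t lane = 0; lane < values.size(); ++lane)
    {
        std::int32_t const i = static_cast<std::int32_t>(lane);
        result[lane] = (i % width >= delta) ? values[i - delta] : values[i];
    }
    return result;
}
```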

◆ shfl_xor()

template<typename TWarp , typename T >
ALPAKA_NO_HOST_ACC_WARNING ALPAKA_FN_ACC auto alpaka::warp::shfl_xor ( TWarp const &  warp,
T  value,
std::int32_t  mask,
std::int32_t  width = 0 
)

Exchange data between threads within a warp. It copies from a lane chosen by a bitwise XOR of the caller’s own lane ID: the source lane ID is calculated by XOR-ing the caller’s lane ID with mask.

Effectively executes:

__shared__ int32_t values[warpsize];
values[threadIdx.x] = value;
__syncthreads();
int lane = threadIdx.x ^ mask;
return values[lane / width > threadIdx.x / width ? threadIdx.x : lane];

However, it does not use shared memory.

Notes:

  • The programmer must ensure that all threads calling this function (and the source lane) are executing the same line of code. In particular it is not portable to write if(a) {shfl_xor} else {shfl_xor}.
  • Commonly used with width = warpsize (the default), in which case it returns values[threadIdx.x ^ mask].
  • Width must be a power of 2.
Template Parameters
TWarp  The warp implementation type.
T  The value type.
Parameters
warp  The warp implementation.
value  The value to broadcast.
mask  The mask used to compute the source lane ID.
width  The size of the group participating in the shuffle operation.
Returns
The value from the thread with the computed lane ID.

Definition at line 312 of file Traits.hpp.
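The pseudocode above can be mirrored in host-side C++ (emulateShflXor is illustrative, not part of the alpaka API; it assumes width > 0, whereas the real API treats the default width = 0 as the warp size):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side sketch of shfl_xor(): a lane reads from lane ^ mask, unless the
// partner lane falls into a later width-sized group, in which case it keeps
// its own value.
inline std::vector<std::int32_t> emulateShflXor(
    std::vector<std::int32_t> const& values, // one element per lane
    std::int32_t mask,
    std::int32_t width)
{
    std::vector<std::int32_t> result(values.size());
    for(std::size_t lane = 0; lane < values.size(); ++lane)
    {
        std::int32_t const i = static_cast<std::int32_t>(lane);
        std::int32_t const src = i ^ mask;
        result[lane] = values[src / width > i / width ? i : src];
    }
    return result;
}
```

With mask = 1 this produces the classic butterfly exchange between adjacent lane pairs.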