`#[kernel]`

Available on crate feature `kernel` only.
Provides the `#[kernel]` attribute macro. When applied to a function, it compiles it as a CUDA kernel that can be safely called from Rust code on the host.
The annotated function must be public, must not be `const` or `async`, must not have an explicit ABI, must not be variadic, must not have a receiver (e.g. `&self`), and must return the unit type `()`. At the moment, the kernel function must also not use a `where` clause; use generic type bounds instead.
While the `#[kernel]` attribute supports functions with any number of arguments, `rust_cuda::kernel::TypedPtxKernel` only supports launching kernels with up to 12 parameters at the moment.
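The documentation does not say where the 12-parameter limit comes from. One common reason for such fixed limits in Rust APIs is that the language has no variadic generics, so a trait over "a list of arguments" is usually implemented once per tuple arity via a macro (the standard library, for example, implements traits such as `PartialEq` and `Debug` only for tuples of up to 12 elements). The sketch below shows that pattern with a hypothetical `LaunchParams` trait; it is an assumption for illustration, not rust-cuda's implementation.

```rust
/// Hypothetical trait for "a tuple of launch parameters". Without variadic
/// generics it must be implemented separately for each tuple arity.
pub trait LaunchParams {
    /// Number of parameters in the tuple.
    const COUNT: usize;
}

// Expands to `1` for any identifier; used to count macro arguments.
macro_rules! one {
    ($_x:ident) => {
        1
    };
}

// Implements `LaunchParams` for a single tuple arity.
macro_rules! impl_launch_params {
    ($($name:ident),*) => {
        impl<$($name),*> LaunchParams for ($($name,)*) {
            const COUNT: usize = 0 $(+ one!($name))*;
        }
    };
}

// Recursively implements `LaunchParams` for every arity from the full
// list of type parameters down to the empty tuple `()`.
macro_rules! impl_up_to {
    () => {
        impl_launch_params!();
    };
    ($head:ident $(, $tail:ident)*) => {
        impl_launch_params!($head $(, $tail)*);
        impl_up_to!($($tail),*);
    };
}

// Twelve type parameters: tuples of arity 0..=12 get an impl, arity 13 does not.
impl_up_to!(A, B, C, D, E, F, G, H, I, J, K, L);

/// Hypothetical "launch" entry point that accepts any supported tuple.
pub fn param_count<P: LaunchParams>(_params: &P) -> usize {
    P::COUNT
}

fn main() {
    println!("{}", param_count(&(1_u8, 2.0_f32, true)));
    // A 13-element tuple would not compile: no `LaunchParams` impl exists for it.
}
```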
The `#[kernel]` attribute uses the following syntax:

```rust
#[kernel(pub? use link! for impl)]
pub fn my_kernel(/* parameters */) {
    /* kernel code */
}
```
where `link` is the name of a macro that will be generated to manually link specific monomorphised instantiations of the (optionally generic) kernel function, and the optional `pub` controls whether this macro is public or private.
Note that all kernel parameters must implement the sealed `rust_cuda::kernel::CudaKernelParameter` trait.
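"Sealed" refers to a common Rust pattern in which a public trait has a supertrait that only the defining crate can name, so downstream crates can use the trait as a bound but cannot implement it for their own types. The sketch below shows that general pattern with a hypothetical `KernelParam` trait standing in for `CudaKernelParameter`; the trait name, its `describe` method, and the chosen implementing types are assumptions, not the real trait's contents.

```rust
mod private {
    /// Supertrait living in a private module: code outside this crate
    /// cannot name it, and therefore cannot implement it.
    pub trait Sealed {}
}

/// Hypothetical stand-in for a sealed kernel-parameter trait.
pub trait KernelParam: private::Sealed {
    /// Human-readable description of how the parameter is passed.
    fn describe(&self) -> String;
}

// Only the defining crate can add implementations, because only it can
// also implement `private::Sealed` for the type.
impl private::Sealed for u32 {}
impl KernelParam for u32 {
    fn describe(&self) -> String {
        format!("u32 passed by value: {}", self)
    }
}

impl private::Sealed for f32 {}
impl KernelParam for f32 {
    fn describe(&self) -> String {
        format!("f32 passed by value: {}", self)
    }
}

/// Generic code can still use the sealed trait as an ordinary bound.
pub fn describe_param<P: KernelParam>(param: &P) -> String {
    param.describe()
}

fn main() {
    println!("{}", describe_param(&7_u32));
    println!("{}", describe_param(&1.5_f32));
}
```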
To use a specific monomorphised instantiation of the kernel, the generated `link!` macro must be invoked with the following syntax:

```rust
struct KernelPtx;
link! { impl my_kernel for KernelPtx }
```

for the non-generic kernel function `my_kernel` and a non-generic marker type `KernelPtx`, which can be used as the generic `Kernel` type parameter for `rust_cuda::kernel::TypedPtxKernel` to instantiate and launch the kernel. Specifically, the `rust_cuda::kernel::CompiledKernelPtx` trait is implemented for the `KernelPtx` type.
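To make the marker-type mechanism concrete, here is a std-only sketch under stated assumptions: `CompiledPtx` is a hypothetical stand-in for `CompiledKernelPtx`, the local `link!` is a hypothetical `macro_rules!` stand-in for the generated macro, and `load_kernel` is a hypothetical stand-in for code that builds a `TypedPtxKernel`. Whereas the real macro links a compiled kernel, this sketch only embeds a placeholder string; what it does show is that invoking the macro implements a trait for the zero-sized marker, which generic code can then require.

```rust
/// Hypothetical stand-in for `CompiledKernelPtx`: a marker type that
/// implements it can hand out the compiled code of one kernel.
pub trait CompiledPtx {
    fn ptx() -> &'static str;
}

/// Hypothetical stand-in for the generated `link!` macro. It implements
/// the trait for the marker, returning a placeholder instead of real PTX.
macro_rules! link {
    (impl $kernel:ident for $marker:ident) => {
        impl CompiledPtx for $marker {
            fn ptx() -> &'static str {
                concat!("/* PTX for ", stringify!($kernel), " */")
            }
        }
    };
}

/// Zero-sized marker type, as in `struct KernelPtx;`.
struct KernelPtx;

link! { impl my_kernel for KernelPtx }

/// Hypothetical loader: it only needs *some* marker type `K` that has
/// been linked, mirroring how the `Kernel` type parameter is used.
fn load_kernel<K: CompiledPtx>() -> &'static str {
    K::ptx()
}

fn main() {
    println!("{}", load_kernel::<KernelPtx>());
}
```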
If the kernel function is generic, the following syntax is used instead:

```rust
#[kernel(pub? use link! for impl)]
pub fn my_kernel<'a, A, B: Bounded, const N: usize>(/* parameters */) {
    /* kernel code */
}

struct KernelPtx<'a, A, B: Bounded, const N: usize>(/* ... */);
link! { impl my_kernel<'a, u32, MyStruct, 42> for KernelPtx }
link! { impl my_kernel<'a, bool, MyOtherStruct, 24> for KernelPtx }
```
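For the generic case, the marker type mirrors the kernel's generic parameters, and each `link!` invocation provides the trait for exactly one combination of arguments. The std-only sketch below illustrates that idea with simplified, hypothetical names: `CompiledPtx` stands in for `CompiledKernelPtx`, the local `link!` is a hypothetical stand-in that only accepts one type and one const argument (the lifetime and bounded parameter from the example are omitted), and the returned strings are placeholders rather than PTX.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `CompiledKernelPtx`.
pub trait CompiledPtx {
    fn ptx() -> &'static str;
}

/// Generic marker mirroring a kernel `my_kernel<A, const N: usize>`.
/// `PhantomData` keeps the otherwise unused type parameter `A`.
pub struct KernelPtx<A, const N: usize>(PhantomData<A>);

/// Hypothetical stand-in for `link!`: one impl per requested instantiation.
macro_rules! link {
    (impl $kernel:ident<$a:ty, $n:literal> for $marker:ident) => {
        impl CompiledPtx for $marker<$a, $n> {
            fn ptx() -> &'static str {
                concat!(
                    "/* PTX for ", stringify!($kernel),
                    "<", stringify!($a), ", ", stringify!($n), "> */"
                )
            }
        }
    };
}

link! { impl my_kernel<u32, 42> for KernelPtx }
link! { impl my_kernel<bool, 24> for KernelPtx }

fn load_kernel<K: CompiledPtx>() -> &'static str {
    K::ptx()
}

fn main() {
    println!("{}", load_kernel::<KernelPtx<u32, 42>>());
    println!("{}", load_kernel::<KernelPtx<bool, 24>>());
    // `load_kernel::<KernelPtx<u32, 7>>()` would not compile: that
    // instantiation was never linked, so no impl exists for it.
}
```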
If the kernel generic space is closed, the `link!` macro can be made private and all instantiations must be requested in the same crate that defines the kernel function. If downstream code should be allowed to use and compile new specific monomorphised instantiations of the kernel, the `link!` macro should be publicly exported. Then, downstream code can define its own `MyKernelPtx` marker types for which the kernel is linked and which can be passed to `rust_cuda::kernel::CompiledKernelPtx`-generic code in the kernel-defining crate to construct the requested `rust_cuda::kernel::TypedPtxKernel`.
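The downstream workflow can be modelled in a single file by letting modules play the role of crates. In the sketch below, all names are hypothetical: `kernel_crate` stands in for the kernel-defining crate, `CompiledPtx` for `CompiledKernelPtx`, `TypedKernel` for `TypedPtxKernel`, `new_kernel` for the kernel crate's generic constructor code, and the top-level `link!` for a publicly exported link macro. The `downstream` module defines its own `MyKernelPtx` marker, links it, and hands it to the generic code, which never needs to know about that marker in advance.

```rust
/// Stands in for the kernel-defining crate.
mod kernel_crate {
    use std::marker::PhantomData;

    /// Hypothetical stand-in for `CompiledKernelPtx`.
    pub trait CompiledPtx {
        fn ptx() -> &'static str;
    }

    /// Hypothetical stand-in for `TypedPtxKernel<K>`.
    pub struct TypedKernel<K: CompiledPtx> {
        pub ptx: &'static str,
        _marker: PhantomData<K>,
    }

    /// Generic code in the kernel-defining crate: it accepts any linked
    /// marker type, including ones defined downstream.
    pub fn new_kernel<K: CompiledPtx>() -> TypedKernel<K> {
        TypedKernel { ptx: K::ptx(), _marker: PhantomData }
    }
}

/// Hypothetical stand-in for a publicly exported `link!` macro.
/// `$crate` keeps the trait path valid wherever the macro is invoked.
macro_rules! link {
    (impl $kernel:ident for $marker:ident) => {
        impl $crate::kernel_crate::CompiledPtx for $marker {
            fn ptx() -> &'static str {
                concat!("/* PTX for ", stringify!($kernel), " */")
            }
        }
    };
}

/// Stands in for a downstream crate that links its own instantiation.
mod downstream {
    use crate::kernel_crate::{new_kernel, TypedKernel};

    pub struct MyKernelPtx;
    link! { impl my_kernel for MyKernelPtx }

    pub fn build() -> TypedKernel<MyKernelPtx> {
        new_kernel::<MyKernelPtx>()
    }
}

fn main() {
    println!("{}", downstream::build().ptx);
}
```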
Inside the scope of the `#[kernel]` attribute, a helper `#[kernel(...)]` attribute can be applied to the kernel function:

- `#[kernel(crate = "<crate-path>")]` changes the path to the `rust-cuda` crate that the kernel compilation uses, which by default is `rust_cuda`.
- `#[kernel(allow/warn/deny/forbid(<lint>))]` checks the specified CUDA-specific lint for each kernel compilation, using default Rust semantics for allowing, warning on, denying, or forbidding a lint. The following lints are supported:
  - `ptx::double_precision_use`: checks for any uses of `f64` operations inside the compiled PTX binary, as they are often significantly less performant on NVIDIA GPUs than `f32` operations. By default, `#[kernel(warn(ptx::double_precision_use))]` is set.
  - `ptx::local_memory_use`: checks for any usage of local memory, which may slow down kernel execution. By default, `#[kernel(warn(ptx::local_memory_use))]` is set.
  - `ptx::register_spills`: checks for any spills of registers to local memory. While using fewer registers can allow more kernels to be run in parallel, register spills may also point to missed optimisations. By default, `#[kernel(warn(ptx::register_spills))]` is set.
  - `ptx::dynamic_stack_size`: checks whether the PTX compiler is unable to statically determine the size of the required kernel function stack. When the static stack size is known, the compiler may be able to keep it entirely within the fast register file. However, when the stack size is dynamic, more costly memory load and store operations are needed. By default, `#[kernel(warn(ptx::dynamic_stack_size))]` is set.
  - `ptx::verbose`: utility lint to output verbose PTX compiler messages as warnings (`warn`) or errors (`deny` or `forbid`), or to not output them (`allow`). By default, `#[kernel(allow(ptx::verbose))]` is set.
  - `ptx::dump_assembly`: utility lint to output the compiled PTX assembly code as a warning (`warn`) or an error (`deny` or `forbid`), or to not output it (`allow`). By default, `#[kernel(allow(ptx::dump_assembly))]` is set.
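To illustrate how a lint level turns a finding into a diagnostic, the std-only sketch below pairs a level-to-diagnostic mapping with a rough text heuristic for `ptx::double_precision_use`. Both are assumptions for illustration: the documentation does not describe how rust-cuda detects `f64` use, and the names `LintLevel`, `Diagnostic`, `uses_f64`, and `check` are hypothetical. The heuristic only inspects instruction opcodes such as `add.rn.f64` or `cvt.f64.f32`, so it ignores register declarations and would miss anything not visible in the PTX text.

```rust
/// Lint levels with the usual Rust meanings.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LintLevel {
    Allow,
    Warn,
    Deny,
    Forbid,
}

/// What a single lint check reports for one kernel compilation.
#[derive(Debug, PartialEq, Eq)]
pub enum Diagnostic {
    Silent,
    Warning(String),
    Error(String),
}

/// Rough heuristic: does any PTX instruction have an `f64` component in its
/// opcode? Empty lines, `//` comments and directives starting with `.`
/// (e.g. `.reg .f64 ...`) are skipped, as are guard predicates like `@%p1`.
pub fn uses_f64(ptx: &str) -> bool {
    ptx.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with("//") && !line.starts_with('.'))
        .filter_map(|line| line.split_whitespace().find(|tok| !tok.starts_with('@')))
        .any(|opcode| opcode.split('.').any(|part| part == "f64"))
}

/// Maps a lint level and whether the lint fired to a diagnostic. `deny` and
/// `forbid` both produce errors; in Rust, `forbid` additionally cannot be
/// lowered by a later `allow`, which this sketch does not model.
pub fn check(level: LintLevel, lint: &str, fired: bool) -> Diagnostic {
    if !fired {
        return Diagnostic::Silent;
    }
    let message = format!("{lint} fired for this kernel");
    match level {
        LintLevel::Allow => Diagnostic::Silent,
        LintLevel::Warn => Diagnostic::Warning(message),
        LintLevel::Deny | LintLevel::Forbid => Diagnostic::Error(message),
    }
}

fn main() {
    let ptx = ".reg .f64 %fd<3>;\nadd.rn.f64 %fd2, %fd1, %fd0;\nret;";
    // `warn` is the documented default level for this lint.
    let diag = check(LintLevel::Warn, "ptx::double_precision_use", uses_f64(ptx));
    println!("{diag:?}");
}
```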