The OpenCL SOP provides a general interface to create and run OpenCL kernels on geometry. It allows binding of constants, attributes, and volume data to OpenCL parameters in the kernel.
The name of the OpenCL kernel to execute with the loaded program.
Use Code Snippet
Use the code provided in the Kernel Code parameter rather than an external disk file. This makes for quicker editing and creation of OpenCL SOPs.
The path to the OpenCL program file to compile. This can be a path to an on-disk file or to an asset.
The OpenCL kernel to execute when in Code Snippet mode.
Specify any desired compile flags for the kernel. The most common is to use -D to provide #define directives for the pre-processor.
The Apple OSX OpenCL compiler requires only a single space between kernel options!
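As a rough illustration of how a -D flag reaches the code, the sketch below is plain C rather than a real kernel. The DAMPING name is hypothetical: it stands in for any option that might be passed as -DDAMPING=0.5f, and the fallback value is likewise only an assumption for the example.

```c
#include <stddef.h>

/* DAMPING is a hypothetical option for this sketch. On the node it would
   arrive through a compile flag such as -DDAMPING=0.5f; when the flag is
   absent we fall back to a default so the code still compiles. */
#ifndef DAMPING
#define DAMPING 0.9f
#endif

/* Scale every value by the compile-time constant. */
void apply_damping(float *vel, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        vel[i] *= DAMPING;
}
```

Because the value is baked in at compile time, changing it means a recompile, which is why constants that change often are usually better bound as parameters.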
Houdini defines additional flags while compiling kernels depending on the OpenCL device. Flags such as __H_CPU__ distinguish between GPU and CPU devices, and further flags signify the hardware vendor. You can set the environment variable HOUDINI_OCL_REPORT_BUILD_LOGS to 1 before running Houdini to get a dump of all kernels compiled along with their preprocessor flags.
Specify a detail string attribute to be added as a compile flag to the kernel. The node looks up the detail attribute of this name in the input geometry. If it is a string, it will be injected into the kernel options. The string should include the -D prefix if it specifies a define, for example.
The value of the string should not change frequently or the kernel may keep recompiling, which can be very expensive.
Use Write Back Kernel
After the kernel is executed, a second kernel may be immediately executed with the same set of parameters bound to it. You can avoid race conditions where multiple threads want to write to the same data by breaking it into a two-pass operation.
Write Back Kernel Name
The name of the kernel to use in the write back operation. It will have the same function signature as the main operation.
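The two-pass idea can be sketched in plain C, with each loop iteration standing in for one work item. The smoothing operation and the scratch buffer are assumptions chosen for illustration. Both passes take the same parameter list, mirroring the shared signature described above.

```c
#include <stddef.h>

/* Pass 1 (main kernel): every work item reads its neighbours from `pos`
   and stores the result in `scratch`. `pos` is never written here, so no
   work item can observe a half-updated neighbour. */
void smooth_pass(float *pos, float *scratch, size_t n)
{
    for (size_t i = 0; i < n; ++i) {   /* i stands in for get_global_id(0) */
        float left  = pos[i == 0 ? 0 : i - 1];
        float right = pos[i + 1 < n ? i + 1 : i];
        scratch[i] = 0.25f * left + 0.5f * pos[i] + 0.25f * right;
    }
}

/* Pass 2 (write back kernel): same parameters, copies the results back. */
void write_back_pass(float *pos, float *scratch, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        pos[i] = scratch[i];
}
```

Writing directly into `pos` in the first pass would make the result depend on the order in which work items run.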
Creates a prototype for the required kernel function taking all of your current selected parameters into account. This can be used as a starting point or to update your interface when new parameters are added or removed.
When loading kernels from disk the kernel is cached to avoid regenerating it every solve. Turning this on forces the re-loading and recompiling of the kernel. This is useful if #include files refer to code that has changed, or the kernel file is changed in an external text editor.
It should always be disabled when prototyping is complete.
The provided OpenCL kernel is invoked once. The number of global ids, however, is controlled by this setting. First Writeable attribute sets it to the size of the first bound attribute that is marked writeable. First Writeable Volume likewise sets it to the total voxels of the first bound volume that is marked writeable.
In workset mode, detail integer array attributes are used to determine the number of worksets and the sizes of them.
The global ids will be rounded up to ensure efficient processing on the GPU, so you should always compare get_global_id(0) with the actual length of the bound attribute.
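The effect of the rounding can be emulated in plain C. The exact rounding rule is not stated here, so the sketch assumes rounding up to a multiple of a workgroup size. The function names and the workgroup size are illustrative only.

```c
#include <stddef.h>

/* Round n up to the next multiple of the workgroup size (assumed rule). */
size_t round_up_global_size(size_t n, size_t workgroup)
{
    return ((n + workgroup - 1) / workgroup) * workgroup;
}

/* Emulate a launch over the rounded-up range. Ids at or beyond `length`
   exist but must do nothing. Returns how many ids actually did work. */
size_t run_guarded(float *attr, size_t length, size_t workgroup)
{
    size_t global_size = round_up_global_size(length, workgroup);
    size_t processed = 0;
    for (size_t idx = 0; idx < global_size; ++idx) {
        if (idx >= length)
            continue;              /* in a kernel this would be: return; */
        attr[idx] *= 2.0f;
        ++processed;
    }
    return processed;
}
```

Without the guard, the extra ids would read and write past the end of the bound attribute.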
The kernel can be re-executed a variable number of times. This avoids having to use more nodes to create a for loop, and ensures all data remains on the video card during the successive evaluations.
Worksets Begin Attr.
Detail integer array attribute specifying the start value of each workset.
Worksets Length Attr.
Detail integer array attribute specifying the length of each workset. The kernel will be invoked for each of these sizes. Zero sizes will be skipped.
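A plain C emulation of workset dispatch might look like the following. It assumes that each invocation covers the ids from the workset's start value up to start plus length. The function name and the `visits` counter are hypothetical and exist only to make the behaviour observable.

```c
/* Emulate workset dispatch: one "kernel invocation" per non-empty workset.
   Each invocation touches elements begin[w] .. begin[w] + length[w] - 1.
   Returns the number of invocations that were actually launched. */
int run_worksets(const int *begin, const int *length, int num_worksets,
                 int *visits)
{
    int launches = 0;
    for (int w = 0; w < num_worksets; ++w) {
        if (length[w] == 0)
            continue;                      /* zero sizes are skipped */
        for (int id = 0; id < length[w]; ++id)
            visits[begin[w] + id] += 1;    /* id stands in for get_global_id(0) */
        ++launches;
    }
    return launches;
}
```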
Use Single Workgroup if Possible
When running over worksets on a GPU, it can be faster to execute many small worksets within one kernel call, performing synchronization within the kernel after each workset, rather than executing a kernel for each separate workset. When this option is enabled, if the largest workset will fit within one workgroup on the OpenCL GPU device, a corresponding flag will be defined, and the entire Worksets Begin and Worksets Length arrays will be passed to the kernel. It is up to the kernel to synchronize at the end of each workset, usually using a workgroup barrier. The code generated by the Generate Kernel button shows one way of handling this.
When Finish Kernels is disabled, no attempt is made to wait for the OpenCL kernels to complete before continuing to the next solver. This lets them run in the background until their results are actually needed. To simplify debugging, it is useful to ensure kernels are finished so that errors are detected in the right spot.
Include the current time as a parameter.
Include the current timestep as a parameter.
A constant multiplier on the timestep.
For some operations you may wish to know a power of the timestep. Rather than recomputing it in the kernel, you can set this to the desired power and have the exponentiation pre-computed.
Include Simplex Noise Data
Include an opaque pointer that can be passed to the simplex noise functions in <xnoise.h> to generate simplex noise and curl noise from OpenCL kernels.
Controls the precision of this node. Floating-point types will be defined in the generated code to correspond with this specified precision. The vector variants will also be defined: fpreal4, etc. Additionally, a precision value is defined as 16 for half, 32 for float, or 64 for double.
Auto will use the preferred precision of the incoming geometry, as set by the Attribute Cast SOP.
16-bit cannot be used for computation in most drivers.
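The idea of precision-dependent types can be sketched in plain C. EXAMPLE_PRECISION is a hypothetical macro invented for this sketch, not the name the node defines, and only the 32-bit and 64-bit cases are modelled because standard C has no half type.

```c
#include <stddef.h>

/* EXAMPLE_PRECISION is a hypothetical stand-in for the precision value
   the node defines; it defaults to 32 in this sketch. */
#ifndef EXAMPLE_PRECISION
#define EXAMPLE_PRECISION 32
#endif

#if EXAMPLE_PRECISION == 64
typedef double fpreal;
#else
typedef float fpreal;
#endif

/* Code written against fpreal follows whichever precision is selected. */
fpreal sum_values(const fpreal *values, size_t count)
{
    fpreal total = 0;
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}

/* Size in bytes of the selected floating-point type. */
size_t fpreal_bytes(void)
{
    return sizeof(fpreal);
}
```

Kernels written against the precision-dependent types keep working when the node's precision changes, whereas hard-coded float would not.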
Each parameter can either be a fixed constant value, evaluated before kernel invocation, or read/write from a volume or geometry attribute.
The name of the parameter. This is used by the Generate Kernel button, but is otherwise only present as a comment. The actual binding to an OpenCL kernel is done by parameter order, not by name.
The type of parameter to create and bind.
A constant integer value, allowing you to bind channel references and expressions that are pre-computed.
A constant float value. Optionally you can scale it by the timestep.
A constant tuple of four floats, binding to a float4 OpenCL parameter.
A scalar ramp. Because evaluating a spline-based ramp inside of an OpenCL kernel is complex, the ramp is instead sampled into a uniform array of floats. The Ramp Size parameter controls the number of samples used.
Bind a geometry attribute.
Bind a volume.
Bind a VDB.
The integer value to use in Integer mode.
The float value to use in Float mode.
The float 4 value to use in Float 4 mode.
How to scale the provided float value by the timestep. Because timeinc may not be known at time of parameter evaluation, it can be computed as a constant prior to evaluating the kernel and applied to the float value.
The ramp data to provide as a list of float values.
The number of floating-point samples at which to evaluate the ramp.
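One way a kernel might read a ramp that has been sampled into a uniform float array is with a linear lookup, sketched here in plain C. It assumes the first sample sits at position 0 and the last at position 1. The function name and the clamping behaviour are illustrative choices, not the node's documented behaviour.

```c
/* Look up a ramp that has been pre-sampled into `size` evenly spaced
   floats covering positions 0..1 (assumed layout), interpolating
   linearly between neighbouring samples. Out-of-range positions clamp. */
float sample_ramp(const float *ramp, int size, float pos)
{
    if (size == 1 || pos <= 0.0f)
        return ramp[0];
    if (pos >= 1.0f)
        return ramp[size - 1];

    float scaled = pos * (float)(size - 1);
    int i = (int)scaled;
    if (i > size - 2)
        i = size - 2;                 /* keep i + 1 inside the array */
    float t = scaled - (float)i;      /* fractional distance to next sample */
    return ramp[i] + t * (ramp[i + 1] - ramp[i]);
}
```

A larger Ramp Size reduces the error this piecewise-linear lookup introduces relative to the original spline, at the cost of a bigger array.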
The name or number of the volume or VDB primitive to bind.
To simplify kernels one may often assume all volumes are aligned in resolution and transform. If Force Alignment is set, this is enforced and volumes that are misaligned generate errors.
Add the resolution of the volume as a parameter.
Add the size of the volume as a parameter, in SOP space.
Volume Transform to World
Add a matrix transform that converts from the volume’s voxel coordinates to the SOP coordinates.
Volume Transform to Voxel
Add a matrix transform that converts from SOP coordinates to the volume’s voxel coordinates.
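Applying such a transform to a position can be sketched in plain C. The layout below (16 floats, row-major, row-vector convention with the translation in the last row) is an assumption for illustration; check the generated kernel for the layout actually bound.

```c
/* Transform a point by a 4x4 matrix stored as 16 floats.
   Assumed layout: row-major, row-vector convention (out = p * M),
   with the translation in elements 12..14. */
void transform_point(const float m[16], const float p[3], float out[3])
{
    for (int j = 0; j < 3; ++j)
        out[j] = p[0] * m[0 + j] + p[1] * m[4 + j] + p[2] * m[8 + j] + m[12 + j];
}
```

With both matrices bound, a kernel could map a voxel index to SOP space, look up or compute something there, and map a SOP-space position back to voxel coordinates with the inverse matrix.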
Which attribute to bind. It is an error if it is missing, unless the optional flag is set.
Present for Attributes.
The type of the attribute. Since the first writeable attribute can determine the iteration order, this can determine the number of global ids processed by the OpenCL solver.
Not all bound attributes need to be the same type, or even come from the same geometry data.
Present for Attributes.
What sort of attribute to bind. Float and integer attributes are bound as single arrays containing all element values in order. Tuples are
P will be bound as
Array attributes are bound as two arrays. One array contains the offsets of each element's array data. Thus, the difference of a pair of consecutive offsets provides that element's array length. The second array is the data of all elements' arrays concatenated into a single array.
Present for Attributes.
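The two-array layout used for array attributes can be read as follows. This plain C sketch assumes the offsets array holds one more entry than there are elements, so that every element has a closing offset. The function names are illustrative.

```c
/* Length of element `elem`'s array: the difference of two consecutive
   offsets. Assumes offsets has (number of elements + 1) entries. */
int array_length(const int *offsets, int elem)
{
    return offsets[elem + 1] - offsets[elem];
}

/* Sum the values of one element's array inside the concatenated data. */
float array_sum(const int *offsets, const float *data, int elem)
{
    float total = 0.0f;
    for (int i = offsets[elem]; i < offsets[elem + 1]; ++i)
        total += data[i];
    return total;
}
```

An element with an empty array simply has two equal consecutive offsets, so the loop body never runs for it.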
Tuple size of the attribute to bind. If greater than zero, the attribute must be able to provide this tuple size. If zero, it will bind automatically and an extra parameter will be generated storing the tuplesize.
Present for Attributes.
Controls the precision the data of this parameter is bound with.
The Node option will use the node's precision, so it will vary depending on that setting, and the corresponding kernel code should use the precision-dependent types the node defines. This is the precision at which the data is stored on the video card, so using lower precision can save GPU memory. Note, however, that 16-bit data, which corresponds to half, often cannot be used for computation. The vload_half function can be used to promote it to float for computation.
If the same attribute ends up bound with different precisions it will fail the binding.
Currently volumes only bind with 32-bit data precision.
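To make the half-to-float promotion concrete, the plain C sketch below decodes a 16-bit IEEE 754 half value by hand. In a kernel, vload_half performs this conversion for you. The function here is only an illustration of what the promotion involves, and its name is made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Convert the raw bits of an IEEE 754 half into a 32-bit float. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                          /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            int shift = 0;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                ++shift;
            }
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(113 - shift) << 23) | (mant << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (mant << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); /* rebias 15 -> 127 */
    }

    float f;
    memcpy(&f, &bits, sizeof f);                  /* reinterpret the bits */
    return f;
}
```

Storing data as half and promoting on load halves the memory footprint compared with float, while the arithmetic itself still runs in 32-bit.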
Determines if the OpenCL kernel will read from this attribute or volume. If not set, the attribute's values will not be copied onto the GPU. This is useful for write-only attributes as it avoids an unnecessary copy, but requires care as uninitialized data will be present.
Determines if the OpenCL kernel will write back to this attribute or volume. Causes the CPU version of the attribute or volume to be marked out of date so the next time it is needed it will be copied back from the GPU.
Marks the attribute or volume as not necessary. If the attribute or volume isn’t present in the geometry, rather than erroring, a #define is set in the kernel options to disable the attribute. Note that this also changes the parameter signature, so the Generate Code button should be used to verify the syntax.
The parameter name is used in the #define, so changing the parameter name requires changing the code.