Since | 20.5 |
The synthetic data pipeline generates data adhering to the COCO (Common Objects in Context) format, a widely used standard for computer vision model training. This format includes several types of ground truth annotations: bounding boxes (bbox), segmentations (RLE and polygon), and keypoints. Its extensibility also allows us to enhance the base schema with additional signals made possible by synthetic data.
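For orientation, the sketch below shows roughly what a COCO-style file holds. The field names follow the public COCO annotation format. The concrete values, the file name, and the category are made up for illustration, and any extra synthetic-data fields added by the pipeline are not shown.

```python
import json

# Illustrative values only; the structure follows the public COCO format.
coco = {
    "info": {"description": "example dataset", "version": "ds3.2"},
    "images": [
        {"id": 1, "file_name": "example_0001.png", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "widget", "keypoints": ["top", "bottom"], "skeleton": [[1, 2]]},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # Bounding box as [x, y, width, height] in pixels.
            "bbox": [100.0, 120.0, 50.0, 80.0],
            "area": 4000.0,
            "iscrowd": 0,
            # Polygon segmentation: a list of flat [x1, y1, x2, y2, ...] rings.
            "segmentation": [[100.0, 120.0, 150.0, 120.0, 150.0, 200.0, 100.0, 200.0]],
            # Keypoints: flat [x, y, visibility] triplets
            # (0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible).
            "keypoints": [125.0, 125.0, 2, 125.0, 195.0, 1],
            "num_keypoints": 2,
        },
    ],
}

# Serialize the structure the way a coco.json file would store it.
text = json.dumps(coco, indent=2)
```

RLE segmentations use a dictionary with "size" and "counts" entries instead of the polygon list and are omitted here for brevity.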
This PDG node serves as a base template for dataset pipelines. It manages a dataset from content generation to annotation export. It sets up global PDG attributes that can be used for seeding content generation at various stages of a pipeline. It is designed to work with other Synthetics nodes like the Labs ML CV ROP Annotation Output SOP and the Labs ML CV Synthetics Karma ROP.
Tip
It will take longer to cook this node the first time you run it in a new directory, because it needs to set up the Python virtual environment. The Python virtual environment will be saved under $HIP/ml/labs/ by default. You can speed up the process next time by copying the existing /ml/labs/ folder and its contents to a new $HIP directory.
Note
The default attributes are:
@variant: The number of variants generated by the 'Variant Count' parameter.
@frame_index: The frame number specified in the 'Frame Range' parameter. This repeats per variant.
@seed: A random float generated for each frame. It is unique per frame/variant combination.
@v_seed: A random float generated per variant. It is the same for every frame of that variant.
@frame_name: The string used for file output naming.
@ds_dir: The path to the dataset render directory.
@ds_major: The major version number of the dataset.
@ds_minor: The minor version number of the dataset.
@res: The resolution of the images in the dataset.
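The relationship between @seed and @v_seed can be mimicked in plain Python. This is only a sketch of the idea, not how the node computes its values: the make_seeds helper, its arguments, and the use of random.Random are all assumptions made for illustration.

```python
import random

def make_seeds(dataset_seed, variant, frame_index):
    """Return (seed, v_seed) floats in [0, 1) for one work item (hypothetical helper)."""
    # v_seed depends only on the variant, so every frame of that variant shares it.
    v_seed = random.Random(f"{dataset_seed}:{variant}").random()
    # seed depends on the variant and the frame, so each frame/variant pair gets its own value.
    seed = random.Random(f"{dataset_seed}:{variant}:{frame_index}").random()
    return seed, v_seed
```

Driving per-variant choices (for example a light intensity) from the variant-level value and per-frame choices from the frame-level value keeps a variant visually consistent across its frames while still varying each image.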
The original ML Computer Vision tools were developed by the Synthetic Data team at Endava PLC.
Parameters ¶
Cook Controls ¶
Generate Static Work Items
Generates static work items in all nodes in the TOP network. None of the work items will be cooked, and dynamic nodes will do nothing.
Dirty All
Dirties all the work items in all of the TOP nodes inside the network and marks every node inside the network as needing to recook.
Delete All File Outputs From Disk
Deletes all previous cook files from disk and then dirties all the work items in all of the TOP nodes inside the network.
Dataset Controls ¶
Render Directory
The location where the dataset will be rendered.
Delivery Directory
Specifies where the processed dataset will be saved. This version contains only the files essential for training.
Debug Dataset and Skip Delivery
Skips dataset validation, does not create the delivery directory, and does not save HIP file backups to disk.
Annotations ROP
Path to the ML CV ROP Annotation Output node in SOP context.
Synth Render ROP
Path to the ML CV Synthetics Karma ROP LOP configured for the RGBA pass.
Synth GT Render ROP
Path to the ML CV Synthetics Karma ROP LOP configured for the GT pass.
FiftyOne ¶
View Data Set on Complete
Opens the current dataset in an instance of FiftyOne once it has completed rendering.
View Current Dataset
Opens the dataset version defined in the “Dataset Version” parameter in an instance of FiftyOne.
View Other Dataset
Opens a file browser to select the root directory of another COCO dataset for viewing. Once selected, the dataset will open in a FiftyOne instance.
Dive Targets ¶
Custom Per Variant
An internal subnet for additional TOP nodes to add variation per variant. Example: changing the light intensity per variant.
Custom Per Frame
An internal subnet for additional TOP nodes to add variation per frame.
Image Compositor
An internal COP network for additional control over the look of the image, for more variation between images in the dataset.
Dataset Settings ¶
Skip JSON
Skip JSON output, useful for testing and debugging renders.
Dataset Version Major
The major version of the dataset.
Example: if Dataset Version Major is 3 and Dataset Version Minor is 2, the dataset version is ds3.2.
Dataset Version Minor
The minor version (sub-version) of the dataset.
Example: if Dataset Version Major is 3 and Dataset Version Minor is 2, the dataset version is ds3.2.
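The version label in the example can be built and read back with a few lines of Python. The helper names below are hypothetical and only illustrate the naming scheme described above; they are not part of the node.

```python
import re

def format_version(major, minor):
    """Build a label such as 'ds3.2' from the major and minor version numbers."""
    return f"ds{int(major)}.{int(minor)}"

def parse_version(label):
    """Split a label such as 'ds3.2' back into (major, minor) integers."""
    match = re.fullmatch(r"ds(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"not a dataset version label: {label!r}")
    return int(match.group(1)), int(match.group(2))
```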
Variant ¶
Variant Count
Total number of scene variants to be generated for the dataset. Each variant will be associated with a @variant and @v_seed PDG attribute, which can be utilized to control parameters.
Render Variant Range
Enables the use of variant range.
Variant Range
Specifies the range of variants to render, useful when distributing renders across multiple computers.
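When distributing a render, each machine needs its own Variant Range. The sketch below splits a variant count into contiguous ranges. It assumes variants are numbered from 0 and that ranges are inclusive at both ends. Neither is confirmed by this page, so adjust the offsets to match what the node expects. The split_variants helper is hypothetical.

```python
def split_variants(variant_count, machines):
    """Split variants 0..variant_count-1 into contiguous inclusive (start, end) ranges."""
    if variant_count < 0 or machines < 1:
        raise ValueError("variant_count must be >= 0 and machines must be >= 1")
    base, extra = divmod(variant_count, machines)
    ranges = []
    start = 0
    for index in range(machines):
        # The first `extra` machines take one additional variant each.
        size = base + (1 if index < extra else 0)
        if size == 0:
            continue  # more machines than variants: leave this machine idle
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

For example, 10 variants over 3 machines gives the ranges (0, 3), (4, 6) and (7, 9).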
Info ¶
Description
Optional input to provide a brief overview of the dataset, including its purpose, content, and any key features or highlights. Written out to the “info” array of the coco.json file.
Contributor
Optional input for including the names, roles, affiliations, and contact information of individuals or organizations involved in creating the dataset. Written out to the “info” array of the coco.json file.
Notes
Optional input for dataset changelog information. Written out to the “info” array of the coco.json file.
Render Settings ¶
Skip Render
Toggle to skip Karma rendering, allowing re-cooking of the ROP Annotation output without re-rendering the dataset.
Start/End/Inc
Start and end frames of the sequence to be rendered for each variant. For non-temporal datasets, maintain the values at 1 for both the start and end frames.
Render Partial Sequence
Enables rendering of a specified percentage of a larger dataset. Useful for previewing a sample spread of variants to ensure desired distributions are achieved.
Percentage to Render
The percentage of the dataset to render when Render Partial Sequence is enabled.
Note: The value ranges from 0 to 100%. This is not an exact percentage: 1% of a 100-frame dataset may result in 0 or 2 frames.
Resolution
Resolution of the dataset’s images.
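The note on Percentage to Render (1% of 100 frames may give 0 or 2 frames) is the behavior you would see if each frame were kept independently with the given probability. The sketch below illustrates that idea. It is an assumption about the mechanism, not the node's actual implementation, and sample_frames is a hypothetical helper.

```python
import random

def sample_frames(frame_count, percentage, seed=0):
    """Keep each frame 1..frame_count independently with probability percentage / 100."""
    rng = random.Random(seed)
    probability = percentage / 100.0
    # The number of kept frames varies around frame_count * probability.
    return [frame for frame in range(1, frame_count + 1) if rng.random() < probability]
```

With this kind of sampling, only the expected number of frames equals the requested percentage. Individual runs land above or below it, which matters most for small datasets.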
Visualize ¶
Visualize Distributions
Plots the synth attribute with matplotlib and displays it in a new window, so you can check that the dataset distribution is as intended.
Filter by Category
Filters by category id from the COCO JSON.
Category Name
String name of the category that will be associated with the integer category ID.
Synth Attributes
The number of synthetic attributes.
Synth Attributes ¶
Synth Attribute
Name of the synth attribute to be exported to COCO JSON.
Validation Settings ¶
Skip Validation
Skips dataset validation.
Validate Keypoints
Enables keypoint validation, ensuring that all keypoints are correctly included in the dataset.
Categories
The number of categories.
Annotation Property ¶
Annotation
COCO Annotation name to validate.
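To give a sense of what keypoint validation can involve, the sketch below checks one annotation against the public COCO keypoint convention (flat [x, y, visibility] triplets plus a num_keypoints count). It is not the node's validation code, and the checks the node actually performs may differ. validate_keypoints is a hypothetical helper.

```python
def validate_keypoints(annotation, expected_count):
    """Return a list of problems found in one COCO annotation's keypoints."""
    problems = []
    keypoints = annotation.get("keypoints", [])
    if len(keypoints) != 3 * expected_count:
        problems.append(
            f"expected {3 * expected_count} values, found {len(keypoints)}"
        )
        return problems  # the remaining checks need well-formed triplets
    # Every third value is a visibility flag.
    visibilities = keypoints[2::3]
    if any(v not in (0, 1, 2) for v in visibilities):
        problems.append("visibility flags must be 0, 1 or 2")
    # num_keypoints counts the labeled keypoints (visibility > 0).
    labeled = sum(1 for v in visibilities if v > 0)
    if annotation.get("num_keypoints") != labeled:
        problems.append(
            f"num_keypoints is {annotation.get('num_keypoints')}, expected {labeled}"
        )
    return problems
```

An empty list means the annotation passed these particular checks.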
Image Compositing ¶
Use Shadow Matte
Enables the use of a Shadow Matte set up in the Synthetics Karma ROP, resulting in a shadow being composited to ground the object to its backplate.
Composite Background
Enables compositing each foreground image onto a random background image from a specified directory of backgrounds.
Background Images
When Composite Background is enabled, takes a file pattern (with filters) that selects the background images to be randomly composited.
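Random background selection from a file pattern can be pictured as follows. This is a sketch under stated assumptions: it filters a list of file names with a glob-style pattern and picks one deterministically from a seed. The pick_background helper is hypothetical, and the node's own pattern syntax and selection logic may differ.

```python
import fnmatch
import random

def pick_background(file_names, pattern, seed):
    """Pick one file name matching a glob-style pattern, repeatably for a given seed."""
    # Sort so the result does not depend on directory listing order.
    candidates = sorted(
        name for name in file_names if fnmatch.fnmatchcase(name, pattern)
    )
    if not candidates:
        raise ValueError(f"no background matches pattern {pattern!r}")
    return random.Random(seed).choice(candidates)
```

Using a per-image seed means re-cooking the same frame picks the same background, which keeps a dataset version reproducible.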
Post Comp ¶
Enable Clipping
Enables the clipping tab.
Enable Grain
Enables the grain controls tab.
Fill Alpha
Enables a process that ensures the output frames have no transparent pixels.
Grade ¶
Brightness Min/Max
Defines the allowable range for the brightness factor to be randomly applied per image.
Brightness
The scale factor to control how bright or dim the layer is. Higher values increase the brightness. Lower values decrease the brightness. Defaults to an expression controlled by the brightness min/max.
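The min/max range and the per-image brightness factor can be illustrated with a small sketch. It assumes the default expression maps a per-image random value linearly into the range; the actual expression on the node is not shown on this page. Both helpers are hypothetical.

```python
import random

def random_in_range(seed, minimum, maximum):
    """Map a seeded random value in [0, 1) linearly into [minimum, maximum)."""
    t = random.Random(str(seed)).random()
    return minimum + (maximum - minimum) * t

def apply_brightness(pixel, factor):
    """Scale every channel of an (r, g, b) pixel by the brightness factor."""
    return tuple(channel * factor for channel in pixel)
```

The same seeded-range pattern would apply to any other parameter randomized between a minimum and a maximum per image.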
Levels ¶
Input Levels
Adjusts black and white points to increase contrast.
Gamma
Adjusts midrange balance.
Output Levels
Remaps the result of Input Levels and Gamma to reduce contrast.
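The three Levels controls are usually chained in the order input levels, gamma, output levels. The sketch below shows one common formulation for a single channel value in the 0 to 1 range. It is an assumption that the compositing network uses this exact formula, and the parameter names are illustrative.

```python
def apply_levels(value, in_black=0.0, in_white=1.0, gamma=1.0,
                 out_black=0.0, out_white=1.0):
    """Apply input levels, gamma, then output levels to one channel value."""
    if in_white <= in_black:
        raise ValueError("in_white must be greater than in_black")
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    # Input levels: stretch [in_black, in_white] to [0, 1], which increases contrast.
    t = (value - in_black) / (in_white - in_black)
    t = min(max(t, 0.0), 1.0)
    # Gamma: values above 1 brighten the midrange, values below 1 darken it.
    t = t ** (1.0 / gamma)
    # Output levels: squeeze [0, 1] into [out_black, out_white], which reduces contrast.
    return out_black + (out_white - out_black) * t
```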
Blur ¶
Filter
Selects between box blur and Gaussian blur.
Read Pixels outside Image
Defines edge of frame behavior.
Units
Defines the units that the diameter of the blur is expressed in.
Blur Min/Max
Defines the allowable range for the amount of blur to be randomized per image.
Size
Defines the diameter of the blur. Defaults to an expression controlled by the blur min/max.
Scale Size
Disproportionately scales the blur on either axis.
Clipping ¶
Mask
Blends the modified image with the unclipped image.
Lower Limit
Clips the black point in the render.
Upper Limit
Clips the white point in the render.
Clamped Values
Specifies how clamped values should be handled.
Grain ¶
Grain Min/Max
Specifies the allowable range of grain amplitude to be randomized per image.
Grain Amplitude
Specifies the intensity of the grain. Defaults to an expression controlled by the Grain Min/Max.
Element Size
The size, in image coordinates, of the basic element of the grain.
Contrast
Used to make the noise appear more extreme without exceeding the 0 to 1 range.
Seed
Randomizes the grain pattern.
Advanced ¶
Base Frame Format
The base filename used to generate data. By default, it is derived from the frame number and variant number. Changing this is likely to cause issues with the automated portions of the PDG graph.
Dataset Seed
Seed that controls variation in a dataset. It is driven by the dataset major version.
Operating System
The current operating system.
Virtual Environment ¶
Environment Path
The path to the Python virtual environment in which the internal training script of this node is run.
Examples ¶
Tip
When viewing in Houdini’s Help Browser, please copy the example file’s URL to a regular browser to proceed with the download.