I have noticed that when I set up a VEX wrangle to run over primitives, it only executes on one thread.
I'm currently running this code as a test and just cranking up the count channel. Flipping between primitive and point mode while watching my processors in htop shows that point mode runs on all threads, while primitive mode runs on only one.
It's multithreaded for every run-over mode other than ‘detail.’ However, there is a minimum work-order size before it will ask for another thread. I believe around 1024 elements are required for each additional thread, so if your test produces far fewer primitives than points, primitive mode may well stay on a single thread.
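To make the batching idea concrete, here is a rough back-of-the-envelope sketch in Python. It is not Houdini's actual scheduler: the function name `estimated_threads`, the `available_threads` parameter, and the rounding behaviour are all assumptions for illustration, and the 1024 figure is only the approximate value mentioned above.

```python
import math


def estimated_threads(element_count, available_threads, min_batch=1024):
    """Guess how many threads a wrangle might use.

    Hypothetical model: one thread per `min_batch` elements (rounded up),
    never fewer than one and never more than the machine offers.
    """
    wanted = math.ceil(element_count / min_batch)
    return max(1, min(available_threads, wanted))


if __name__ == "__main__":
    # Compare a small primitive count with a large point count on 16 threads.
    for count in (100, 1024, 1025, 5000, 100_000):
        print(count, estimated_threads(count, available_threads=16))
```

Under this model, a wrangle running over a few hundred primitives would stay on one thread, while the same geometry with tens of thousands of points would saturate the machine, which would match what htop is showing.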