pbowmar
Negligible difference.
Wondering if there are other variables?
The basic idea is to minimize the amount of data that has to be copied from OpenCL memory back to regular memory (even though on the CPU they are the same *type* of memory, they are stored in different formats: dense grids for OpenCL vs. tiled for regular CPU).
That means turning off caching since that copies the data for the entire sim, and not displaying the results of each frame in the viewport. So for interactive work the memory transfer overhead mostly negates the increased CPU performance, as you've found. The main performance gain shows up for offline sims where caching is off and you're just writing density and maybe vel to file with Save In Background on.
Even then, the type of sim makes a difference. A resizing Pyro sim with several complicated sources and collisions is going to spend enough time generating those fields in SOPs and then transferring them to OpenCL that again you won't see as much of a speedup. A “pure” smoke sim on the other hand will show a big speedup. That's basically the difference in the two attached charts: one is a pure smoke sim, the other is a Pyro sim with a source and collision object. Still faster in OpenCL, but not by as much.
All of the above holds for GPU as well, or even more so: the processor gain is higher, but the memory transfer goes across the PCIe bus, so it is even more expensive.
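To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch in Python. It is not anything from Houdini itself: the function name `effective_speedup` and every timing below are made up for illustration, not measurements. It only assumes that the solve itself gets faster under OpenCL, while field generation in SOPs stays the same and copying data in and out of OpenCL memory adds time.

```python
def effective_speedup(solve_s, other_s, kernel_speedup, transfer_s):
    """Overall per-frame speedup when only the solve is accelerated.

    solve_s        -- solve time per frame on the regular CPU path (seconds)
    other_s        -- time outside the solver, e.g. building source and
                      collision fields in SOPs (unchanged by OpenCL)
    kernel_speedup -- how much faster the solve alone runs under OpenCL
    transfer_s     -- extra time spent copying fields to and from OpenCL memory
    All values are hypothetical inputs, not measured numbers.
    """
    baseline = solve_s + other_s
    accelerated = solve_s / kernel_speedup + other_s + transfer_s
    return baseline / accelerated


# Illustrative scenarios with invented timings:
scenarios = {
    "pure smoke, offline, caching off": (10.0, 0.5, 4.0, 0.5),
    "Pyro with sources and collisions": (10.0, 4.0, 4.0, 1.5),
    "interactive, caching and viewport on": (10.0, 0.5, 4.0, 7.0),
}

for name, args in scenarios.items():
    print(f"{name}: {effective_speedup(*args):.2f}x")
```

With these invented inputs the pure smoke case keeps most of the solver gain, the Pyro case loses a good part of it to SOP work and transfers, and the interactive case ends up close to 1x, which matches the "negligible difference" you saw.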