Hi Brian,
I have a similar setup with 2x 4090 (identical cards) and an AMD 64-core 7980X. Both 4090s are in x16 slots. I'm rendering GPU only (Embree device disabled via env. var). I'm also seeing reduced performance on one of the cards.
It manifests gradually as scenes get more complex. Simple scenes render 50:50 or 49:51, with both cards doing almost the same number of passes. Slightly more complex scenes render 45:55, and really complex scenes render 35:65. One card can do 100 passes and the other just 60.
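To make the ratios above easier to compare, here is a minimal sketch that turns two pass counts into a percentage split. The function name `pass_share` is just something made up for illustration, it is not part of Karma or Houdini.

```python
def pass_share(passes_a, passes_b):
    """Return each card's share of the total passes, in percent."""
    total = passes_a + passes_b
    if total <= 0:
        raise ValueError("need at least one rendered pass")
    return 100.0 * passes_a / total, 100.0 * passes_b / total


# The 100 vs 60 passes case from above
share_a, share_b = pass_share(100, 60)
print(f"{share_a:.1f}:{share_b:.1f}")  # prints 62.5:37.5
```

So 100 vs 60 passes works out to roughly a 62:38 split, which is in the same ballpark as the 35:65 I see on the heaviest scenes.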
I looked into the issue a fair bit, but could not find the cause. It's not OS (W11) related; the same issue happens on Linux. It's the same with different drivers. It's not a PCIe slot/mobo/hardware-level issue, because rendering on each card separately (with the OptiX device env. var) results in the same render time/passes. The issue only happens when both cards are rendering. Thermals are not an issue; both cards are water-cooled, hovering around 70°C under load. GPU utilisation is not an issue either; when I check with GPU-Z, both cards are fully utilized. It's a mystery to me how two cards both utilized at 100% can produce 100 and 60 passes. Is it possible that Karma stats reporting is broken? But why would that vary with scenes? Too many questions...
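One more thing that may be worth logging while both cards render: the utilisation figure only says a kernel was running during the sample period, not how much work got done, so comparing SM clocks, power draw and the current PCIe link per card could show a difference that the 100% figure hides. Below is a rough sketch, assuming `nvidia-smi` is on the PATH. The helper names `parse_gpu_rows` and `query_gpus` are made up for illustration; the query fields are standard `nvidia-smi --query-gpu` fields.

```python
import subprocess

# Standard nvidia-smi query fields; values are kept as strings because
# some fields can come back as "[N/A]" depending on the card and driver.
FIELDS = [
    "index",
    "utilization.gpu",
    "clocks.sm",
    "power.draw",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def parse_gpu_rows(csv_text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip anything that is not a full data row
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU rows."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_rows(out)
```

Calling `query_gpus()` every second or so during a render and printing the rows side by side would show whether the slower card is running at lower clocks or power, or on a narrower PCIe link, while still reporting 100% utilisation.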

This performance loss seems to be specific to the 20.5.x versions of Karma XPU. I can render a heavy scene on 20.0.653 with a 48:52 split, and the same file on 20.5.307 gives a 32:68 split.
Is there anything else worth trying that would help us locate what's causing this?
Thanks!