I have two computers rendering scenes in Karma using Houdini's PDG.

I'm running a queue worker, a CPU worker and a GPU worker on each workstation, so I can route tasks to either the CPU or the GPU, while the queue worker keeps managing the queue between the two computers.
(See the screenshot; I hope that's clear.)




I submit the tasks through the queue scheduler, and each task has a rule that determines whether it needs a CPU or a GPU worker.


Sometimes tasks fail with the following error on the workstation whose queue worker is not the one that received the main PDG job:

FranticX.Processes.ManagedProcessAbort: Failed RPC start_cook with error: PDGnet RPC send-get-reply failed. (error 268436688: MQ error #268436688)

When the task is retried, it MAY be picked up and render successfully, or it may keep failing.
It seems to fail ONLY on Karma tasks.

What could be the cause?
I can't reproduce the issue on demand: sometimes it works, sometimes it doesn't.