RFE: Houd license reservation/ticket scheme (continued)?

Staff
1448 posts
Joined: July 2005
Offline
continued from:
http://www.sidefx.com/index.php?option=com_forum&Itemid=172&page=viewtopic&t=9672&highlight= [sidefx.com]

“Error codes would be good, but that's not the point - I want to
avoid all this, but it requires SideFx to participate.”
But how would you avoid this if the ticketing system issues ticket 0 (when there are no more licenses)? Is it not equivalent to obtaining a return code that indicates there are no licenses? You still need to retry to obtain a ticket (instead of retrying to start a render).

Perhaps we are not envisioning the same mechanics of the rendering supervision system. What I was thinking about is that a render node starts a render and, if it fails to obtain a license, it informs the dispatcher about the license failure (much like obtaining a zero ticket). Then the dispatcher waits a second or two and retries the same (or the next) job (again, like waiting a second and polling the server for a ticket).

The only difference between the return code and ticketing systems is that the dispatcher obtains the licensing availability information from the render node rather than directly from the license server.
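
A minimal sketch of the return-code retry loop described above, in Python. The names `dispatch_with_retry` and `run_job` and the `NO_LICENSE` exit code are hypothetical; nothing in this thread says what code a Houdini process actually returns when it cannot get a license.

```python
import time
from typing import Callable

# Hypothetical exit code meaning "could not obtain a license" (an assumption).
NO_LICENSE = 3


def dispatch_with_retry(run_job: Callable[[], int],
                        retries: int = 5,
                        delay: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run the job; on a license failure, wait and retry up to `retries` times.

    Returns True if the job ran, False if every attempt failed to get a license.
    Any other non-zero code is treated as a real job failure and raised.
    """
    for attempt in range(retries):
        code = run_job()
        if code == 0:
            return True
        if code != NO_LICENSE:
            raise RuntimeError(f"job failed with exit code {code}")
        if attempt < retries - 1:
            sleep(delay)  # wait a second or two before trying again
    return False
```

Here `run_job` stands in for whatever the dispatcher does to start a render on a node and collect its exit code.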

Perhaps requeuing or resending the job consumes bandwidth (or other resources), in which case the render node could store the render data for the failed job and the dispatcher would only need to send a signal to “go ahead and try again” (or “abort altogether”, if need be).
Member
387 posts
Joined: July 2005
Offline
Executive Summary: Sorry Sir, That Table is Gone.

I wish this RFE & discussion was still on-line. This issue has come up again, now at Dr D.
“Is it not equivalent to obtaining a return code that indicates there are no licenses?”
They're not equivalent, because there is a “Race Condition”, solved by a reservation. The problem occurs when there's a return code saying there is a license available. You can't trust this, because if you then go start a job someone else might take the license while the job is being dispatched!
“Perhaps we are not envisioning the same mechanics of the rendering supervision system.”
Indeed. It's no good to think there is a license available, dispatch a job, and then have it fail because the license was taken by a user on a workstation. That's a waste of time and resources. In the current scenario, we don't see the failure until the end of the job, the next day, or until a Wrangler spots it.
“… the render node could store the render data for the failed job”
Oh no thanks. Leaving the job dispatched and having the render-blade then poll/plead for a license just makes the race condition worse. The farm will become locked up, idle, waiting and hammering the License Server.

I think the original post outlined the protocol request (in 2007). Essentially:

1. The Job Dispatcher asks for a license type, and specifies a time-out (say 5 minutes)
2. If there is a license available, the Houdini License server replies yes, issues a ticket-number, and holds the license as “reserved” for that time-out period.
3. The job dispatcher then sends the job to the render-blade with the ticket number
4. The render-blade starts the job, and takes the license by sending the ticket number to the License Server.

It's like reserving a table at a restaurant. I don't want to phone up a place asking if there's a table free, only to drive downtown and find out someone else took it.
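
The four steps above can be sketched as a toy in-memory model in Python. This is only an illustration of the requested behaviour, not sesinetd's real interface: the class `ReservingLicenseServer` and its methods are hypothetical, it models a single license type, and it borrows the "ticket 0 means none available" convention mentioned earlier in the thread.

```python
import itertools
import time
from typing import Callable, Dict


class ReservingLicenseServer:
    """Toy model of the reservation protocol (hypothetical, not a real API)."""

    def __init__(self, total: int, clock: Callable[[], float] = time.monotonic):
        self.total = total
        self.clock = clock
        self.reserved: Dict[int, float] = {}  # ticket -> expiry time
        self.in_use: Dict[int, str] = {}      # ticket -> hostname
        self._tickets = itertools.count(1)    # ticket 0 means "none available"

    def _expire(self) -> None:
        """Drop reservations whose time-out has passed."""
        now = self.clock()
        self.reserved = {t: e for t, e in self.reserved.items() if e > now}

    def reserve(self, timeout: float) -> int:
        """Steps 1-2: hold a license for `timeout` seconds, or return 0."""
        self._expire()
        if len(self.reserved) + len(self.in_use) >= self.total:
            return 0
        ticket = next(self._tickets)
        self.reserved[ticket] = self.clock() + timeout
        return ticket

    def claim(self, ticket: int, hostname: str) -> bool:
        """Step 4: the render-blade swaps its ticket for the license."""
        self._expire()
        if ticket not in self.reserved:
            return False  # unknown or expired ticket
        del self.reserved[ticket]
        self.in_use[ticket] = hostname
        return True

    def release(self, ticket: int) -> None:
        """The job finished; the license goes back into the pool."""
        self.in_use.pop(ticket, None)
```

The point of the model is that between `reserve` and `claim` the license counts as taken, so a workstation user asking in that window gets ticket 0 instead of the license the dispatcher was promised.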

cheers,
ben.
''You're always doing this: reducing it to science. Why can't it be real?'' – Jackie Tyler
Staff
1448 posts
Joined: July 2005
Offline
There is already an RFE (32843) for this ticket mechanism.

But the reality is that the sesinetd and hserver licensing code is quite sensitive and this RFE would require changes to the fundamental data structures, etc, and, thus, given the importance of licensing reliability, is high risk. Plus there is also an issue of synchronizing tickets among the redundant (distributed) license servers, which adds even more complexity. So, it may not be easy or quick to implement that RFE.
Member
1390 posts
Joined: July 2005
Offline
rafal
There is already an RFE (32843) for this ticket mechanism.

But the reality is that the sesinetd and hserver licensing code is quite sensitive and this RFE would require changes to the fundamental data structures, etc, and, thus, given the importance of licensing reliability, is high risk. Plus there is also an issue of synchronizing tickets among the redundant (distributed) license servers, which adds even more complexity. So, it may not be easy or quick to implement that RFE.

That's too bad, but let me add my vote for the ticketing system. We have constant problems with SGE failing jobs because of licensing issues: an mplay left running on a workstation that eats a Houdini Master license, or the situations ben mentioned, where the render dispatcher gives hscript a green light but meanwhile the license is taken by a user who fired up hython locally for whatever purpose (and hython from time to time eats an additional license besides the GUI one).

Moreover, use of the Houdini command-line toolset is limited, or rather avoided, because you can't be sure it won't hurt your render farm by eating a license at the least desired moment.
Member
4140 posts
Joined: July 2005
Offline
I thought mplay only checks to see if a valid license exists, but never actually eats a token? That's been my experience. Are you saying you have seen a bug scenario where mplay demands a token?

I haven't seen hython snag an additional license if Master is running, but then we use hscript options to set a workstation as GUI only, so hython only grabs a GUI token. Is that what's different with your setup?

Cheers,

J.C.
John Coldrick
Staff
1448 posts
Joined: July 2005
Offline
Yeah, I know.. licensing could be improved, and there are a lot of good RFEs waiting to be implemented.

In the meantime, you may need to create your own ticketing workaround. I assume it is only a problem with MASTER, ESCAPE, or BATCH licenses (since RENDER licenses are free and you can get as many as you need without ever running out).

One way would be for the dispatcher to start a dummy idle hbatch application on a render blade. If that succeeds, it can send the render job there, since the blade already holds the license token and is guaranteed to be able to start the render job (BATCH is per-machine). Sending the render job starts a new hbatch and quits (or kills) the dummy one.
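
A rough Python sketch of that placeholder idea, meant to run on the render blade. Everything here is an assumption for illustration: the function name, the `grace` and `overlap` delays, and above all the premise that an idle process which fails to get a license exits within the grace period. The actual commands are passed in by the caller, since the thread does not give any hbatch command lines.

```python
import subprocess
import time
from typing import Sequence


def render_behind_placeholder(placeholder_cmd: Sequence[str],
                              render_cmd: Sequence[str],
                              grace: float = 2.0,
                              overlap: float = 0.0) -> int:
    """Hold a license with an idle process, then start the real render.

    Returns the render's exit code, or -1 if the placeholder died during the
    grace period (assumed here to mean it could not get a license).
    """
    placeholder = subprocess.Popen(list(placeholder_cmd))
    time.sleep(grace)
    if placeholder.poll() is not None:
        return -1  # placeholder already exited: no license is being held

    # Start the render while the placeholder still holds the token.
    render = subprocess.Popen(list(render_cmd))
    time.sleep(overlap)      # optional hand-over window (an assumption)
    placeholder.terminate()  # the dummy process is no longer needed
    placeholder.wait()
    return render.wait()
```

How long the two processes need to overlap so that the token is never released in between is something to verify against the real license server; `overlap` is only a knob for that.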

Another way is to have wrappers around hmaster and hbatch and have your own control daemon manage the wrapper startup (given that the daemon can get the license count with ‘sesictrl -s’).
Member
1390 posts
Joined: July 2005
Offline
JColdrick
I thought mplay only checks to see if a valid license exists, but never actually eats a token? That's been my experience. Are you saying you have seen a bug scenario where mplay demands a token?

It doesn't take one, but it seems to keep it after you close Houdini itself. And indeed it looks like a bug. I saw this just this morning. One of our Master licenses was gone, and hkey pointed me to a workstation left running after a long night. The only item shown by “ps aux | grep houdini” was mplay-bin. Et voilà!

I haven't seen hython snag an additional license if Master is running, but then we use hscript options to set a workstation as GUI only, so hython only grabs a GUI token. Is that what's different with your setup?

Again, I apologize, as I can't give you a reproducible case, but I swear I see it constantly, and it causes us trouble in exactly the case we're discussing here. Yes, I know it shouldn't be like that, and even now I can fire up Master and hython and use only one license, as expected.

I can imagine a reason for that, though. For example, if your GUI session crashes, hscript, which does its job on a per-frame basis (restarting after every frame), requests a non-GUI license; then you start your session again and, boom, you request a second license. This is just a theory, of course, and it wouldn't be a bug but our own fault. Nevertheless, most of our folks tend to keep two or three sessions open plus some hscript processes, and quite often we see logs of missing frames due to a lack of licenses that really should be there.

The point is that all such cases are hard to nail down and solve without tools on the hserver side. As I remember, SGE has a plugin for FlexLM software to keep track of licenses, but I'm not sure how well it operates.


skk.

P.S. You say that whenever you run hscript on a workstation you set it to use a GUI license? Yes, that would solve some of our headaches. Thank you!
Member
4256 posts
Joined: July 2005
Offline
rafal wisely
Another way is to have wrappers around hmaster and hbatch and have your own control daemon manage the wrapper startup (given that the daemon can get the license count with ‘sesictrl -s’).

I agree with this approach.

FWIW, we used to have these kinds of issues quite a lot, and not just with Houdini. In the end we wrote our own daemon manager which acts as a layer between the license servers and the apps. So when Houdini, Maya, Nuke, etc. are launched, it's actually starting the app through our queue manager. That way we can see who or what is waiting for resources. Plus, by using a licensing middle manager for this, you can give certain tasks/artists different license priorities.
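
As an illustration of that middle-manager idea, here is a small Python sketch of a priority queue in front of a fixed license pool. It is not the daemon described above, whose design isn't given; the class `LicenseQueue`, its methods, and the "lower number means higher priority" rule are all assumptions.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class LicenseQueue:
    """Grants a fixed pool of licenses, highest-priority waiter first."""

    def __init__(self, total: int):
        self.free = total
        self._waiting: List[Tuple[int, int, str]] = []  # (priority, order, who)
        self._order = itertools.count()  # tie-breaker: first come, first served

    def request(self, who: str, priority: int = 10) -> bool:
        """Return True if `who` may launch now; otherwise queue the request.

        Lower numbers mean higher priority.
        """
        if self.free > 0 and not self._waiting:
            self.free -= 1
            return True
        heapq.heappush(self._waiting, (priority, next(self._order), who))
        return False

    def release(self) -> Optional[str]:
        """Return a license; hand it to the best waiting request, if any."""
        if self._waiting:
            _, _, who = heapq.heappop(self._waiting)
            return who  # the license passes straight to this requester
        self.free += 1
        return None

    def waiting(self) -> List[str]:
        """Who is waiting, in the order they would be served."""
        return [who for _, _, who in sorted(self._waiting)]
```

Because every launch goes through `request`, the `waiting` list gives the "who or what is waiting for resources" view, and priorities decide who gets the next license that comes back.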

TL;DR, I see this as more of a studio pipeline solution than a SESI thing. Because, no offense to SESI, no matter what they do, it won't be exactly what you want.
if(coffees<2,round(float),float)
Member
4140 posts
Joined: July 2005
Offline
SYmek
P.S. You say that whenever you run hscript on a workstation you set it to use a GUI license? Yes, that would solve some of our headaches. Thank you!

Yeppers, check this out:

http://www.sidefx.com/index.php?option=com_content&task=view&id=702&Itemid=9 [sidefx.com]

You'll want the hserver options. We have a standard location on all nodes that we search for hserver.opt files. We can make sure no GUI licenses try to run on render nodes and vice versa. There's some other useful settings in there too.

Cheers,

J.C.
John Coldrick
Member
387 posts
Joined: July 2005
Offline
@rafal - there's already an RFE? OK thanks!

@JC yes, sometimes users just start hscript or hython in a shell, plus we have written (HDK) bgeo conversion tools which take a license. We're looking at open source ways to handle bgeo, as a partial work-around.

imho the work-arounds with wrappers and “test jobs” aren't production solutions.
1) There are over 200 binaries in $HFS/bin - would we wrap them all, for each release?
2) user workstations are on the farm, so hserver can't be restricted to gui or non-gui.

I'd imagine tickets would be new and optional, even for redundant servers, i.e. sesinetd could still issue licenses without tickets. What's new is that a license can be issued to a ticket instead of a hostname. Then a render host asks to swap the ticket for its hostname. Oh, and the other new thing is that licenses with old/untaken tickets get released.
''You're always doing this: reducing it to science. Why can't it be real?'' – Jackie Tyler
Member
4256 posts
Joined: July 2005
Offline
ben simons
imho the work-arounds with wrappers and “test jobs” aren't production solutions.
1) There are over 200 binaries in $HFS/bin - would we wrap them all, for each release?

Awww. Wrappers are great production solutions. At least SESI seems to think so.

~ $ cat $HFS/bin/houdini
#!/bin/bash

# Initialize application environment.
APP_DIR=$(dirname $0)
source ${APP_DIR}/app_init.sh ${APP_DIR}

# Now run the requested binary.
exec $0-bin "$@"



Instead of writing wrappers for everything, a launcher app could be written, like drd_launch hbatch. Depending on whether a license was available, it would either put you in the queue or launch whatever app you passed as an argument.
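
A minimal Python sketch of such a launcher. The function `launch_or_queue` and its parameters are hypothetical; in particular, `license_free` is left as a callback because the thread doesn't show how availability would be determined (for example from ‘sesictrl -s’, whose output format isn't given here).

```python
import subprocess
from typing import Callable, List, Sequence


def launch_or_queue(argv: Sequence[str],
                    license_free: Callable[[], bool],
                    queue: List[List[str]],
                    run: Callable[[Sequence[str]], int] = subprocess.call) -> str:
    """Run `argv` if a license is free, otherwise append it to `queue`.

    `argv` is the app plus its arguments, e.g. ["hbatch", "scene.hip"].
    """
    if not argv:
        raise ValueError("usage: launcher <app> [args...]")
    if license_free():
        run(list(argv))  # blocks until the app exits
        return "launched"
    queue.append(list(argv))  # someone else decides when to retry these
    return "queued"
```

Note that a plain check-then-launch like this still has the race discussed earlier in the thread, unless `license_free` is backed by something that actually holds the license for the caller.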

Long story short, SESI has stated that this isn't going to happen soon. If you need something now, wrapper scripts or app launchers are just a couple of possible solutions.
if(coffees<2,round(float),float)
Staff
1448 posts
Joined: July 2005
Offline
“Yeppers, check this out:”
And this too: http://www.sidefx.com/index.php?option=com_content&task=view&id=1212&Itemid=273 [sidefx.com]