16.5 Hairgen slow ??

Member
15 posts
Joined: Aug. 2017
Hi, I've been very excited to try out the new fur features in 16.5. I really like the tools Houdini offers for this, but the performance is horrible.

I don't understand whether I'm doing something wrong: I set up a very simple mesh with a groom, added some curve advection, cached the groom node to a file, and then raised the density. That's where everything goes absolutely wrong: generating about 3.8 million fur strands takes about 40 sec on a Threadripper 1950X, and things get even worse when I add a simple Hair Clump to the hairgen: 1m15.

Please tell me I'm doing something wrong. I'm used to Yeti, which is also a procedural fur tool and can generate and display 4.1 million fur strands in less than 5 sec (for the same kind of graph as in Houdini), not to mention that the viewport keeps a decent 30 fps…

Houdini has awesome grooming tools compared to every other hair generator I've tried, and I would really like to switch to using Houdini fully, but the performance is a wall, and if this is considered ‘normal’, then I feel like I bought this software for nothing.

Tell me I'm doing something wrong, that I just have to check a magic button that will make everything go fast. I really want to use Houdini's grooming tools!

Thanks for your support.
Member
3760 posts
Joined: June 2012
Hi, it's quite common to groom in other packages and simulate in Houdini.

Yes, Houdini is that slow at generating curves and in the viewport. You can turn off Display as ‘Subdivision curves’ and ‘Shade Open Curves’ to speed up the viewport.

In effect you are simply confirming that Yeti is an excellent package too, which makes it a perfect complement to Houdini. A win-win situation.
Member
15 posts
Joined: Aug. 2017
Thanks for the answer.

SideFX put so much care into rebuilding their grooming workflow and added insanely good tools, so I just don't understand why the generation performance would be that low. The viewport display doesn't really matter, as you usually groom using a prune option or a lower density, but for the final generation, Houdini would take as much time to generate my fur as to render it. Will it ever become usable one day?
Member
3760 posts
Joined: June 2012
SciMunk
Will it ever become usable one day?

The best way to help Houdini become better is to send examples into Support that show how much faster something like Yeti is.
Member
18 posts
Joined: Feb. 2014
Hair generation is generally pretty quick; it's the Relax Iterations which cause the bottleneck. Relax Iterations basically smooth out the root placements, so by default it's like three smoothing iterations per strand. Since smoothing operations are not so well suited to parallelism, this gets quite expensive over millions of strands.

Disabling lighting is a HUGE help, as well as the optimizations aRtye pointed out.

To point it out, as my initial beliefs were incorrect: density (by default at least) does not equal the number of fur strands. If you dive into the Hair Generate object and MMB the output node, the primitive count is the actual hair strand count, if I'm not mistaken.
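
A minimal sketch of reading that count from Python instead of the MMB info panel. It assumes the geometry object exposes `intrinsicValue` with the detail intrinsics `primitivecount` and `pointcount`, as `hou.Geometry` does. The node path in the comment is hypothetical, and the stand-in class exists only so the snippet runs outside Houdini.

```python
def strand_stats(geo):
    """Return (strands, points, points_per_strand) for generated hair.

    Per the post above, each strand is one curve primitive, so the
    primitive count is the strand count whatever the density setting is.
    """
    strands = geo.intrinsicValue("primitivecount")
    points = geo.intrinsicValue("pointcount")
    per_strand = points / strands if strands else 0.0
    return strands, points, per_strand


class FakeGeometry:
    """Stand-in for hou.Geometry so the sketch runs without Houdini."""

    def __init__(self, prims, points):
        self._values = {"primitivecount": prims, "pointcount": points}

    def intrinsicValue(self, name):
        return self._values[name]


# Inside Houdini this would be something like (hypothetical node path):
#   geo = hou.node("/obj/hairgen1").displayNode().geometry()
geo = FakeGeometry(prims=1000, points=8000)
print(strand_stats(geo))
```

Reading the intrinsics avoids building a Python list of millions of primitives just to count them.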

With my Titan Xp and viewport lighting disabled I can render 4,645,656 strands (8 points per strand) at a stable 60 fps.

Generation is like 20 secs (without relax iterations) on my i7 6850K, which should be about 5-10 seconds with your CPU…

My understanding of Houdini's fur tools comes from some in-depth study of the underlying code I did earlier this year, not from an end-user perspective, so you may want to take this with a grain of salt.

Attachments:
simple_fur_test.hiplc (420.6 KB)

Member
83 posts
Joined: Aug. 2014
aRtye
Hi, it's quite common to groom in other packages and simulate in Houdini.

Yes, Houdini is that slow at generating curves and in the viewport. You can turn off Display as ‘Subdivision curves’ and ‘Shade Open Curves’ to speed up the viewport.

Actually, it's just fine at generating and modifying curves, at least in comparison to Softimage ICE. Of course, that holds as long as the SIMD-ish engine is left to do what it does best, without escapades like viewport subdivision or shading of open curves. The small example hair system I was running in ICE, in its last and most popular version, had literally no built-in loop created by me. That is, it had nothing comparable to the relaxing mentioned in the previous post; that part was elegantly left to the user, who would relax the emitter mesh in what was already a strong classic app for such tasks.
What is not so good in Houdini, looking from the point of view of building a custom hair system:

- Maya-ish modeling style, focused only on a specific workflow, where a tool suddenly changes its behavior based on some suspicious criteria. For example, a move tool suddenly refused to act as a tweak tool if a point of another geometry was surrounded by hair curves (or something like that). In apps like Softimage or Max, interaction is much more consistent, making it possible to use the standard modeling tools for such exotic tasks, too.

- generally slow response to viewport actions, say tweaking a guide curve with a lot of ‘post process’ on top (much, much slower than ICE with Softimage viewport interaction); Object Merge, a possible workaround for such interactions, is also not fast enough.

Anyway, as long as hair is mainly procedural, H is probably the best tool still in development. Well, for now there are two competitors in the field that are no longer developed…

To make it clear, I've commented on the general SOP - VOP system, not the Hair system in Houdini; I don't know anything about the latter since version 13.
Edited by amm - Nov. 13, 2017 07:04:02
Member
59 posts
Joined: Feb. 2008
Thomas Bishop
it's the Relax Iterations which cause the bottleneck
Do I understand this correctly? This is done once the fibers are planted, right? So as long as the fiber count doesn't change, grooming wouldn't have to touch this. Is this perhaps pointing to a progression that would enable freezing this value after density is established?
Member
18 posts
Joined: Feb. 2014
david_maas
This is done once the fibers are planted, right? So as long as the fiber count doesn't change, grooming wouldn't have to touch this.


Yes, that is the way it should behave. You definitely wouldn't want to scatter the hair roots more than once; the hair would be extremely jittery if it did that.
Member
18 posts
Joined: Feb. 2014
amm
What is not so good in Houdini, looking from the point of view of building a custom hair system:

Houdini can be such a pain with how it handles tools, subtly changing states and being very picky about when it allows tweaking. SESI definitely seems to undervalue the tweak tool. I would love to see changes to how Houdini handles tools; giving users control over how tools behave without resorting to C++ and quite a bit of hacking would be a very exciting change.

Maybe H17 will change how Tool OPerations are handled or maybe that's just wishful thinking…
Staff
441 posts
Joined: Aug. 2013
I'd appreciate any scene files for these cases - 40 seconds to generate 4 million hairs is longer than expected on a 1950X.

It takes about 28 seconds on my 8-Core i7 5960x with 8 segments and max guide count of 10. It's true that scatter relaxation is relatively slow, it accounts for 8 of those 28 seconds.

It's not as fast as I'd like it to be. Part of it is due to the fact that everything is implemented as HDAs and mostly using VEX. That allows you to dive in and modify things or learn from the system, but comes with many little overheads and isn't as flexible as a full C++ implementation, where we can do smarter caching for example.

You could try to reduce Max Guide Count or the guide segment count. You could also enable “Assume Uniform Segment Count” if all guides have the same number of segments; I found that this takes another 5 seconds off.

It's also true that you don't have to pay this cost all the time - hair isn't regenerated from scratch every time you edit something.

I can promise you that a future update (possibly next release) will make the system quite a bit faster.
Kai Stavginski
Senior Technical Director
SideFX
Member
15 posts
Joined: Aug. 2017
Thank you so much for your answer. I will try some of the tips you gave me, Kai Stavginski, when I get back home, and try to provide a scene file if it is still too slow.

I know relaxing fur has a huge impact on generation; Yeti can handle it very fast (2 relaxation iterations add just 2-3 sec). There is also the clumping tool, which adds 30 more seconds to the generation process (for a single iteration).

Knowing that a future version of Houdini will make it faster is a huge relief, as I really want to make it my main grooming tool!

Would it be possible to participate in the beta when available? I'm only doing 3D as a hobby, and I'll be happy to test every possible case with fur!

Thanks again
Member
15 posts
Joined: Aug. 2017
So, I tried some of your tips and noticed a slight improvement in speed: I get about 16 sec to generate a basic empty groom, and about 40 sec to generate a clumped fur.

I made a video to show the difference between Houdini and Yeti. I set the density in both engines to get approximately 4 million fur strands, and both use about 2 relax steps.

For the Yeti video, I tried to match the Houdini setup as closely as possible.

For Houdini, as soon as I add a clumping node, the generation time increases by about 20-25 sec.
For Yeti, clumping requires pre-generated guide hairs that attract the fur, but it only adds about 1-2 sec to the generation process.

I then demonstrate a bit more of the Yeti process: I can easily drop the density to 10% and get almost instant feedback on my grooming (less than 2 sec), then switch back to full density in a still very short amount of time.



PS: you can see the time Yeti needs to generate the fur at the bottom of the Yeti graph; a few more seconds are needed for the Maya viewport to display it.



It seems that in certain situations Houdini takes double the time to generate the same fur from the scene (about 1m30), for example when I open the scene I provided below.

If you need any more data from me, I'll be more than happy to provide as many tests as possible if that can help make Houdini better!
Edited by SciMunk - Nov. 14, 2017 15:36:10

Attachments:
furgen.hiplc (508.9 KB)

Member
3760 posts
Joined: June 2012
SciMunk
So, I tried some of your tips and noticed a slight improvement in speed: I get about 16 sec to generate a basic empty groom, and about 40 sec to generate a clumped fur.

With some further testing, Yeti doesn't actually look much better than Houdini, if at all. Your Houdini fur clumping looks way off, as Yeti only clumps guide grooms anyway.

I built a new scene with SOPs hair, and for 4mil prims/33mil points it takes only 22 sec on a Ryzen 7 1700.

Reducing generation to 10% is also very fast at 0.6 sec. Just use takes.

The viewport remains the weaker part at ~6fps when zoomed out but up to 20fps zoomed in, on a 1080 Ti.

See attached.

edit: fwiw more numbers - HairSOP2.hiplc took 23 sec on MacOS 2x X5680 @ 3.33GHz
Edited by fuos - Nov. 17, 2017 15:14:12

Attachments:
10 percent.png (1.5 MB)
4mil21sec.png (1.5 MB)
HairSOP2.hiplc (474.8 KB)

Member
15 posts
Joined: Aug. 2017
Thanks for the test, aRtye. I tried to generate the fur from your scene and was surprised that it took 30 sec. If a processor with double the number of cores actually performs worse, I just don't know what to think…




Edited by SciMunk - Nov. 16, 2017 18:19:07
Member
3760 posts
Joined: June 2012
You can try launching Houdini with fewer threads too, to see if threading is the issue.

Houdini -j8

will lock it to 8 threads. A Ryzen 7 @ 3GHz can do over 5 million in 30 secs; this is on Linux, which is usually approx 20% faster than Windows anyway.

Edit: New test only 16sec for 4mil when boosted to 3.6GHz
Edited by fuos - Nov. 16, 2017 19:05:14

Attachments:
3_6GHz.png (1.3 MB)
HairSOP4.hiplc (501.7 KB)

Member
15 posts
Joined: Aug. 2017
Alright, so I ran several cooking tests varying the number of threads, and the numbers tell me… well, above 4 threads, the performance doesn't seem to go up at all.


ps : second and third cook. morning mistake :s

Also, how come specifying the number of threads performs better than launching Houdini without args???
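
One possible reading of the plateau above 4 threads, not a diagnosis of this scene: if part of the cook is serial, Amdahl's law caps the speedup no matter how many threads are added. The sketch below assumes a hypothetical parallel fraction of 0.7; the real fraction for hair generation is not known from this thread, and this does not explain the no-arg run being slower.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup when only `parallel_fraction` of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


# Assumed value for illustration only, not measured on any scene here.
PARALLEL_FRACTION = 0.7

for threads in (1, 2, 4, 8, 16, 32):
    speedup = amdahl_speedup(PARALLEL_FRACTION, threads)
    print(f"{threads:>2} threads: {speedup:.2f}x")
```

With that assumed fraction, going from 4 to 32 threads only raises the ideal speedup from about 2.1x to about 3.1x, and the ceiling is 1 / (1 - 0.7), roughly 3.3x.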
Edited by SciMunk - Nov. 17, 2017 07:24:21
Staff
441 posts
Joined: Aug. 2013
I'm guessing the fact it took longer without args is just a coincidence? Did you run multiple times?

Did you run that first, maybe? Could be disk caching.

Just to throw in some more numbers - HairSOP2.hiplc took 16.75 seconds on my i7 5960x. It seems like we're having some trouble on Threadripper - or maybe on Windows. I'll see if I can find out more.
Kai Stavginski
Senior Technical Director
SideFX
Member
15 posts
Joined: Aug. 2017
I could try installing Linux and testing on that, assuming I can run the file on an Apprentice version, as I would run out of node-locked transfers for my Indie license.

About the test I did: I restarted Houdini on a fresh scene every time I changed the thread count, so I could launch the Performance Monitor before opening the scene.

I then forced a recook of the fur by changing the density by ‘1’ a few times per session, with every cook being faster (probably because of caching).

I was also surprised that the no-arg run was worse. I ran the test multiple times, and the generation times are mostly consistent.

PS: I ran the tests for each thread count in random order, so there can't be any correlation between the previous and next attempts.
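
The procedure above (a fresh session per thread count, randomized order, several recooks) could be summarized with a small script like this. It is only a sketch: the cook times are made-up placeholders, not the measurements from this thread, and treating the first cook as a warm-up to discard is an assumption based on the caching effect mentioned.

```python
import random
import statistics


def run_order(thread_counts, seed=None):
    """Shuffle the thread counts so run order can't bias the results."""
    order = list(thread_counts)
    random.Random(seed).shuffle(order)
    return order


def summarize(cook_times):
    """Median cook time per thread count, skipping the first (cold) cook."""
    summary = {}
    for threads, times in cook_times.items():
        warm = times[1:] if len(times) > 1 else times
        summary[threads] = statistics.median(warm)
    return summary


# Placeholder numbers in seconds, keyed by the -j thread count.
cook_times = {
    4: [21.0, 18.0, 17.0],
    8: [20.0, 18.5, 17.5],
    16: [22.0, 18.0, 18.0],
}

print(run_order(cook_times, seed=1))
for threads, median in sorted(summarize(cook_times).items()):
    print(f"-j{threads}: median {median:.2f} s over warm cooks")
```

Using the median of the warm cooks keeps one unusually slow or fast cook from dominating the comparison between thread counts.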
Edited by SciMunk - Nov. 17, 2017 15:53:51
Member
18 posts
Joined: Feb. 2014
HairSOP2.hiplc took 22.75 seconds with my i7 6850k on Windows.
Member
15 posts
Joined: Aug. 2017
Hello, so I installed Ubuntu on my workstation and tried generating HairSOP2.hiplc.

The results are… way better!!!
The very first cook took only 19 sec for 4,170,000 fur strands, and the second cook took just 15 sec!
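
To compare these cooks with the other timings in the thread, it helps to normalize to strands per second. The sketch below uses only figures quoted here (4,170,000 strands, 19 and 15 sec on Linux) and assumes the 30 sec Windows cook reported earlier was for the same scene and strand count.

```python
STRANDS = 4_170_000  # strand count reported for HairSOP2.hiplc


def strands_per_second(strands, seconds):
    """Generation throughput for one cook."""
    return strands / seconds


WINDOWS = "Windows (assumed same strand count)"
LINUX_SECOND = "Linux, second cook"

cooks = {
    WINDOWS: 30.0,
    "Linux, first cook": 19.0,
    LINUX_SECOND: 15.0,
}

for label, seconds in cooks.items():
    rate = strands_per_second(STRANDS, seconds)
    print(f"{label}: {rate:,.0f} strands/s")

# Ratio between the Windows cook and the second Linux cook.
ratio = cooks[WINDOWS] / cooks[LINUX_SECOND]
print(f"speedup: {ratio:.1f}x")
```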

So, I believe there is some problem with Windows?
I would like to switch fully to Linux, but I can't do that yet. Still, it's really reassuring to see that the problem can be solved.

Also, I'm currently running 3x8GB of cheap 2400MHz RAM, which is probably a big bottleneck for TR right now. I will receive the new Flare X 3200MHz quad-channel kit next month, so I'm pretty sure I'll get even better performance on Linux. I'll do another test once I receive it!

Now, if I can hope to get that kind of performance on Windows, that will be sick!