Houdini on Windows 10 with nvidia -- driver crashes?

Member
40 posts
Joined: Feb. 2015
EDIT: Appears to be fixed with the latest 361.75 drivers!

So after upgrading to Windows 10 I have been getting regular driver crashes on complex processing, unless I revert to fairly old drivers. It is a kernel level driver crash from the OpenGL driver.

I don't even think this is a Side Effects issue so much as a Windows 10 / nvidia issue, but I hope that by bringing this to your attention you might be able to work with nvidia to get the issue resolved. The same thing happened with Substance Designer, as seen here, and Allegorithmic was able to get nvidia to release drivers that fixed the issue with their software: https://forum.allegorithmic.com/index.php?topic=6160.0

The only driver version I can get to not crash is 347.88, which isn't even a proper Windows 10 driver, it's for Windows 8.1.

I believe it has something to do with the new WDDM 2.0 driver platform that was introduced with Windows 10, that nvidia started using with the 35x driver series.
Edited by - Jan. 30, 2016 14:42:03
Member
22 posts
Joined: Aug. 2014
Ever since I upgraded to Win10 on my ASUS ROG with 860M GTX graphics I get Nvidia driver crashes that I've never seen in Win7 or Win8. Though not always, I often get them (sometimes multiple instances) if I close the laptop with Houdini still running.

Nvidia stability seems to have slipped recently with both Win7 and Win10. A system I used to have as my main desktop has a 4K monitor and smaller monitor on it now. I had zero issues when I was running two 1920 monitors but now it will crash H15 if the monitors go into power saving mode.
Member
40 posts
Joined: Feb. 2015
I'd be willing to provide SES with a hip file that reliably causes the driver crashes for me. I'm an Indie user so I can't file it on the RFE server. Just let me know where I can send it…would love to be able to go back to using the latest drivers!!
Staff
4438 posts
Joined: July 2005
I have an ASUS ROG with a 660M card. I also experienced video driver crashes when I first upgraded to Windows 10, but it was just a bad driver version that installed with Win10. I am currently running 355.60, and having no issues whatsoever. So please give that version a try before submitting a bug. If it still crashes, you can post a hip file here (assuming you don't mind sharing it publicly).

Mark
Member
40 posts
Joined: Feb. 2015
Mark,

Yeah, the nvidia drivers that the Windows Updater installs have had problems. I was aware of that, and have been using the tool that Microsoft provided (https://support.microsoft.com/en-us/kb/3073930) to “hide” those updates and prevent Windows Update from overwriting my currently installed drivers.

I've tried 355.60 and many other drivers, doing a clean install each time. They all crash in the same way. I've even gone back and done a fresh installation of Windows 10, but to no effect. 347.88 seems to be the latest that doesn't crash. Again, this exact situation happened with Substance Painter (which is also OpenGL based) as I linked above. 347.88 worked fine, newer drivers would crash with TDR timeouts. Allegorithmic got in contact with nvidia and new drivers were released within a month that fixed the issue. Maybe some programming changes needed to be done in the software as well but they seemed to indicate it was entirely an nvidia driver issue.

I'll see about getting a file to reproduce the issue on here within the next few days.
Member
40 posts
Joined: Feb. 2015
Ok, here's a zip with a Houdini project that reliably causes the kernel exception crash on my nvidia driver. Running Houdini Indie 15.0.360 with the latest nvidia drivers (361.43) on a GTX 670. The crash happens after the node chain finishes computing, but before the output is displayed in the viewport. Note that the Update mode was set to manual.

You may recognize it: it's from the Side Effects Terrain Generation tutorial.

Attachments:
Houdini_Windows_10_nvidia_crash.zip (3.9 MB)

Staff
4438 posts
Joined: July 2005
That's a pretty heavy file… Good thing I got a new machine today because my 16GB laptop runs out of memory trying to cook the scene. On my new machine it maxes out around 45GB. Is it always big scenes that lead to crashes or is there a smaller setup that can show the issue?

Are you getting the full machine crash (blue screen of death)? Or is it just Houdini crashing? Are you aware of the driver “hang detection” on Windows, where if the driver takes more than ~2 seconds to complete an operation, the system assumes the driver is deadlocked and crashes the process responsible? There is a registry setting that lets you extend that timeout, if you think that might be part of the problem. If you're getting full system/kernel crashes this is not the problem…
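
For reference, a minimal sketch of what extending that timeout can look like. It assumes the setting in question is the documented `TdrDelay` DWORD under the `GraphicsDrivers` registry key (default 2 seconds); the helper name `tdr_reg_file` and the 10-second value are purely illustrative. The script only builds the text of a .reg file, it does not touch the registry itself.

```python
# Sketch only: generate the text of a .reg file that raises the Windows
# TDR (Timeout Detection and Recovery) delay. Nothing is written to the
# registry here; the returned text would have to be saved and imported
# by an administrator. The function name is hypothetical.

REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_file(delay_seconds: int) -> str:
    """Return .reg file text that sets TdrDelay to delay_seconds."""
    if not 1 <= delay_seconds <= 0xFFFFFFFF:
        raise ValueError("delay_seconds must fit in a positive 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{REG_KEY}]",
        # .reg files store DWORD values as 8 hexadecimal digits.
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: allow the GPU 10 seconds before Windows resets the driver.
    print(tdr_reg_file(10))
```

A change like this normally needs a reboot before it takes effect, and it only lengthens the window before the reset; it does not address whatever is making the driver take that long.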

Also if you have any crash logs generated by Houdini you could post here, that may help too.

Thanks,
Mark
Member
40 posts
Joined: Feb. 2015
Could you clarify whether you were able to reproduce the crash on your new system?

Indeed, it's fairly heavy, but I have 350GB of page file on a super fast m.2 SSD so it's no problem for me. With the 347.88 drivers I've rendered landscapes with hundreds of millions of polygons and 200GB+ of memory used no problem.


What seems to trigger it is the amount of time it takes between the end of computing the node chain and displaying it to the viewport. So, typically that would mean a lot of polygons but I'm sure there are other ways the viewport rendering could be slowed down. It was the same story in Substance Painter. If you made a complex layer change at high resolution, it takes a while for it to get updated in the viewport and would cause a driver crash-and-recover.

It's not a blue-screen-of-death; it's an nvidia kernel-level driver exception that the driver is able to recover from, but it causes Houdini to crash. I am aware of the hang detection; it's the TDR timeout I mentioned before. That was discussed on the Substance thread, and it's really only a band-aid, not a solution. I've tried messing with the TDR registry settings extensively and they didn't seem to help much. I think the way TDRs are handled was changed fairly significantly with the new WDDM 2.0 driver architecture introduced with Windows 10.
Staff
5158 posts
Joined: July 2005
We already do attempt to avoid TDRs by dicing large geometry up into smaller pieces. The ultra-high res mesh of ~30M polygons is divided into 30 smaller meshes of ~1M apiece to avoid doing a GL draw call which might take too long. As long as any individual draw or transfer is <2s, the OS should continue to allow the driver to operate even if the total draw time for the scene exceeds 2s.
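
To illustrate the dicing idea only (this is not Houdini's actual viewport code), here is a rough sketch that splits a polygon count into batches of at most ~1M, so that each batch could be submitted as its own draw call. The function name `dice_ranges` and the fixed 1M cap are assumptions made for the example.

```python
# Illustrative sketch of the "dice into ~1M-polygon pieces" idea.
# Not Houdini's implementation; names and the batch cap are hypothetical.

def dice_ranges(total_prims: int, max_per_batch: int = 1_000_000):
    """Split [0, total_prims) into half-open (start, end) ranges,
    each covering at most max_per_batch primitives."""
    if total_prims < 0 or max_per_batch <= 0:
        raise ValueError("total_prims must be >= 0 and max_per_batch > 0")
    return [
        (start, min(start + max_per_batch, total_prims))
        for start in range(0, total_prims, max_per_batch)
    ]


if __name__ == "__main__":
    batches = dice_ranges(30_000_000)
    # Each range would be drawn separately, so no single draw call has to
    # process more than max_per_batch primitives.
    print(len(batches), batches[0], batches[-1])
```

With 30M polygons and a 1M cap this yields 30 ranges, matching the numbers above. The point is that the watchdog timeout applies per submitted operation, so many short draws are safer than one long one.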

However, even on my 32GB workstation this scene consumes a huge amount of memory, enough to cause it to start paging to disk. I suspect that this is causing a lot of the memory pages used by the Nvidia driver to also be paged out, and when the Nvidia driver goes to fetch this memory and send it to the GPU, the fetch + transfer is taking too long and the driver is reset. I don't think there is much we can do in this case.

The high-res mesh (switch SOP = 1) works within the 32GB of my workstation, but the ultra high res mesh does not (goes off into swap land and never returns). I suspect the solution to this problem is to install more system RAM.
Member
40 posts
Joined: Feb. 2015
I was able to render it fine using driver 347.88, even on my older, much slower 120GB SSD hosting the page file.

Now I have a Samsung 512GB m.2 NVMe SSD with an insane 2.5 GB/s read speed and 1.5 GB/s write speed at ultra-low latencies (<2 ms). Its random IOPS are equally impressive. It's about as close as you'd be able to get to having everything in DRAM. It still doesn't work with the newer drivers, only 347.88.

Like I said, I've gone up to over 200GB of pagefile use on other projects on driver 347.88 no problem.
Edited by - Jan. 22, 2016 12:45:35
Staff
5158 posts
Joined: July 2005
Mike K
Like I said, I've gone up to over 200GB of pagefile use on other projects on driver 347.88 no problem.

Yep, and this clearly points to a regression with TDRs in the Nvidia driver. Hard for us to do much about that on our side, other than submit an issue to them.
Member
40 posts
Joined: Feb. 2015
Excellent, that's all I was hoping for. I'm sure you guys would have a lot more influence contacting nvidia to get this resolved than I ever could.
Member
40 posts
Joined: Feb. 2015
Reading over the release notes for the new nvidia 361.75 drivers (http://us.download.nvidia.com/Windows/361.75/361.75-win10-win8-win7-winvista-desktop-release-notes.pdf), I saw this line under Windows 10 fixes:

"Adobe Illustrator CC 2015 has TDR and subsequent crash with NVIDIA drivers higher than Release 353."

I know Illustrator is an OpenGL based program, and it sounded similar to the issue I was having with Houdini…

I'm happy to report that 361.75 seems to be the first proper Windows 10 driver that hasn't had TDR crashes when working on complex Houdini scenes!! I even tried increasing the multiplier on my "grid_ultra" SOP from 4 to 5, which added another 10-15 million polygons, and it could still render to the viewport just fine.
Staff
5158 posts
Joined: July 2005
Good to hear, it definitely sounds like the same issue. Nvidia's pretty good about fixing large issues like this promptly.
Member
17 posts
Joined: Nov. 2007
I've been running into constant crashes while flipbooking my scene, or trying to run an OpenGL render. Are other people still running into this with the latest Nvidia drivers?

I have some fairly heavy fractured geometry, nothing too crazy though, and it's packed geo being driven by a point deform node and simple driver mesh.

I'm on Houdini 15.5 Apprentice, Windows 10, Nvidia GeForce GTX 760, driver version 375.95.
~ Keith

Lead FX TD @ DNEG
FX Instructor @ CGMA
Producer/DJ/Performer @ Mechanical Vein
http://keithdigital.com
Member
17 posts
Joined: Nov. 2007
I just fixed this by reverting to that old version 347.88 driver. Thanks for the info Mike! It seems like it may have rendered my flipbook faster as well, but I'm not sure. Either way, I'll take it. :-)