Houdini can only load 1.5 TB of FLIP data, then crashes!

Member
14 posts
Joined: March 2018
I built a small supercomputer with 3.8 TB of RAM (like the cloud instances you see now from Azure and Amazon Web Services) for a coworker who works at Qualcomm. The latest 16.5 build of Houdini can only load about 1.4 to 1.5 TB of FLIP data into RAM for fast playback before it crashes; at that point it is only using about 30% of total system RAM. Houdini needs to be fixed so this does not happen! The operating system is Windows Server 2016, by the way; I'm not sure if the Linux version crashes too!
Member
387 posts
Joined: Nov. 2008
How to log bugs and rfe's for Houdini - READ THIS FIRST! [www.sidefx.com]

So, please file a bug report:

Submit Bugs [www.sidefx.com]
Member
527 posts
Joined: July 2005
Sounds like quite the setup. If you don't mind me asking, what is the hardware involved?


P.S. You'll probably have better luck with Linux, as it has better memory management.
soho vfx
Member
14 posts
Joined: March 2018
Yeah, Windows is garbage compared to Linux for stability; it's why all the big-name render farms use Linux. Hopefully SideFX fixes this. They can easily test it by running a 3.8 TB RAM instance from Amazon Web Services if they want access to that type of hardware quickly and affordably for testing and debugging.
Edited by mgood - July 3, 2018 14:49:10
Staff
6160 posts
Joined: July 2005
We don't have any known limits at that point, but there are always surprises….

The biggest Linux machine I've run on is 1.5 TB, interestingly enough; so while that worked right up to 1.5 TB, it doesn't answer the question of what happens beyond it :>

“Crash” can be a rather vague term. Does Windows report any interesting messages when it takes Houdini down?

If you can try on Linux, that would help swiftly separate whether this is an OS issue or a Houdini issue. The closest I can think of for a Houdini issue would be someone using an int32 to store a memory size in KB, but that would overflow around 2 TB rather than 1.5 TB.
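For concreteness, the arithmetic (just an illustration, not actual Houdini code):

```python
# Hypothetical illustration: where a signed int32 counting kilobytes wraps.
INT32_MAX = 2**31 - 1                # largest value a signed 32-bit int holds
first_bad_kb = INT32_MAX + 1         # first kilobyte count that overflows
print(first_bad_kb * 2**10 / 2**40, "TiB")  # 2^31 KiB == 2 TiB, not 1.5 TB
```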

I can't find anything around the 1 TB mark here: Server 2016 seems to be enabled right up to 24 TB.
https://docs.microsoft.com/en-gb/windows/desktop/Memory/memory-limits-for-windows-releases [docs.microsoft.com]

Attached is a .hip file that uses 4 GB per frame by initializing 1024^3 volumes (and making sure they aren't displayed, so you don't use more memory…). It should be a lot faster for hitting the 1.5 TB limit. It also might reveal whether it is *how* we are allocating the memory that is failing.
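For anyone who wants to sanity-check outside Houdini entirely, a rough stand-alone equivalent is sketched below (assuming Python with NumPy on the machine; this is *not* the attached file, just an approximation of what it does):

```python
import numpy as np

GIB = 2**30
chunks = []
try:
    while True:
        # Grab 4 GiB of float64 per iteration, mirroring the ~4 GB/frame volumes.
        a = np.zeros(4 * GIB // 8, dtype=np.float64)
        a[::512] = 1.0  # write one float per 4 KiB page so the OS actually commits it
        chunks.append(a)
        print(f"holding {len(chunks) * 4} GiB")
except MemoryError:
    print(f"allocation failed after holding {len(chunks) * 4} GiB")
```

If that plain loop also dies around 1.5 TB, the problem is below Houdini (OS, allocator, or machine configuration); if it sails past, then it is the allocation *pattern* that matters.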

A long while ago we had a 48 GB limit on Linux's default allocator: NVIDIA reserved the 2 GB address space, which caused sbrk() to fail and fall back to mmap(), which has a hardcoded limit of 64k handles…. There might be a similar thing we are hitting here…
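On Linux, the analogous thing to check nowadays would be the per-process mapping limit; a small sketch, assuming the standard /proc interface:

```python
# Linux-only sketch: compare mappings in use against the per-process limit
# that an allocator falling back to mmap() can exhaust (default ~65530).
with open("/proc/sys/vm/max_map_count") as f:
    limit = int(f.read())
with open("/proc/self/maps") as f:
    used = sum(1 for _ in f)
print(f"{used} mappings in use, limit {limit}")
```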

Attachments:
memory_use.hip (84.1 KB)

Member
14 posts
Joined: March 2018
Yeah, I thought Houdini had no limits too, until I used Windows LOLOLOLOL. Anyway, it always crashes at 1.4 to 1.5 TB of loaded RAM cache use!
Member
14 posts
Joined: March 2018
Has this been fixed yet?
Staff
6160 posts
Joined: July 2005
Awaiting more information….

madcat117
it always crashes at 1.4 to 1.5 TB of loaded RAM cache use!

Is this referring to memory_use.hip? Or your own FLIP loading tests?

jlait
Does Windows report any interesting messages when it takes Houdini down?

Ideally, if you can also try a Linux distro (can you run Apprentice off a thumb-stick Linux?), we can get a good idea of whether this is something in Houdini or in the OS.

Thanks,
Member
50 posts
Joined: May 2018
Wow, I am having RAM problems on my iMac (it is the late 2015 model with the lowest specs). How much did the supercomputer cost?
Member
14 posts
Joined: March 2018
Is over 1.5 TB of FLIP simulation RAM cache fixed yet?
Member
806 posts
Joined: Oct. 2016
I have no idea what is going on behind the scenes, but threads like this make me feel *very* uncomfortable. Jeff has asked you (“madcat117”, whatever that is supposed to mean) some questions in order to figure out what might be going wrong. In doing so, he is showing a lot more cooperation and interest in users' issues than most other (3D) companies I have been in contact with. Yet, from what can be read in this thread, you (“madcat117”) are completely ignoring those questions, making it almost impossible for SideFX to help you.
It may not have occurred to you, but your setup is *not* the typical use-case scenario. I am absolutely sure that SideFX is interested in solving the problem *with* you, but some decency on your part would go a long way.

I apologize sincerely for stepping in. This is not my place to say; the only excuse I have is that I *hope* SideFX can keep up their user orientation even *with* this user behavior.

Marc Albrecht
---
Out of here. Being called a dick after having supported Houdini users for years is over my paygrade.
I will work for money, but NOT for "you have to provide people with free products" Indie-artists.
Good bye.
https://www.marc-albrecht.de [www.marc-albrecht.de]
Staff
6160 posts
Joined: July 2005
madcat117
Is over 1.5 TB of FLIP simulation RAM cache fixed yet?

No. I have not reproduced it yet. I would appreciate more information about the nature of the crash.

We don't have any known 1.5 TB limits, and it is a weird number for most address problems that would be directly our fault. 2 GB and 4 GB are big red flags, along with 4x and 0.25x of those. But by the time you get to a TB, most of our counters would either be 64-bit or have long since overflowed.
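To show why the usual suspects don't land anywhere near 1.5 TB, here is the wrap point for common counter widths and units (illustrative arithmetic only):

```python
# Where a wrapping counter would bite, for common widths and units.
for bits in (31, 32):  # signed vs. unsigned 32-bit
    for unit_bytes, name in ((1, "bytes"), (2**10, "KiB"), (2**12, "4 KiB pages")):
        print(f"{bits}-bit count of {name}: wraps at {2**bits * unit_bytes / 2**30:.0f} GiB")
```

None of those wrap at 1536 GiB, which is what makes the reported ceiling so odd.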

madcat117
Hopefully SideFX fixes this. They can easily test it by running a 3.8 TB RAM instance from Amazon Web Services if they want access to that type of hardware quickly and affordably for testing and debugging.

I'd like some more guidance as to what I'm looking for before paying $32/hour to remotely debug the issue. My current theory is that it is some ulimit-style restriction in Windows Server 2016. This is why I'm most interested in knowing whether it hits a similar barrier on Linux.
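One check that would help rule the ulimit-style theory in or out on the Windows side: ask the OS what commit limit the process actually sees. A sketch using the Win32 GlobalMemoryStatusEx call via ctypes (run it on the affected machine; per the Microsoft docs, ullTotalPageFile reports the system or per-process commit limit, whichever is smaller, so a job-object cap would show up here):

```python
import ctypes

class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [("dwLength", ctypes.c_ulong),
                ("dwMemoryLoad", ctypes.c_ulong),
                ("ullTotalPhys", ctypes.c_ulonglong),
                ("ullAvailPhys", ctypes.c_ulonglong),
                ("ullTotalPageFile", ctypes.c_ulonglong),
                ("ullAvailPageFile", ctypes.c_ulonglong),
                ("ullTotalVirtual", ctypes.c_ulonglong),
                ("ullAvailVirtual", ctypes.c_ulonglong),
                ("ullAvailExtendedVirtual", ctypes.c_ulonglong)]

stat = MEMORYSTATUSEX()
stat.dwLength = ctypes.sizeof(stat)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
GIB = 2**30
print(f"physical RAM:     {stat.ullTotalPhys // GIB} GiB")
print(f"commit limit:     {stat.ullTotalPageFile // GIB} GiB")
print(f"commit available: {stat.ullAvailPageFile // GIB} GiB")
```

If the commit limit printed there is well under the 3.8 TB of physical RAM, that would point at a per-process restriction rather than anything in Houdini.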