Composite Bug!

User Avatar
Member
48 posts
Joined: July 2005
Offline
I'm using H7.0.493 (although it also happens with H8.0.440).

I'm doing some simple compositing on images that are 5092 x 768. The problem is that the blurring (to produce a glow) just stops at the left half of the screen … it simply stops, and the right half of the screen has no glow. It only happens on frames around 7300 and later. Very odd.

It's a simple blur, with a ramp color inside the blur. A separate blur and gamma of the file is used to “cut out” the center (so I'm only blurring the outside of translucent objects). Then the blur is composited with the original file.

Any ideas what might be causing this? I've tried adjusting cache sizes to no avail.

Thanks,
Robert
Adtech Communications Group
User Avatar
Member
4140 posts
Joined: July 2005
Offline
Under Preferences/Compositing/Cache - did you ensure your Resolution Limit is set higher than 5000? I'm pretty sure it would give you an error if it weren't, though.

You're talking about some pretty serious memory usage here - roughly 5K x 1K images over 7000 frames. What's your memory cache setting? Are you monitoring memory usage? Odds are you're at a less-than-optimal setting and your swap file's getting overused… That would explain why it takes a while to show up. Try setting a smaller cache.
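For a rough sense of scale, here's a back-of-the-envelope frame-size calculation (my numbers only, assuming RGBA at 32-bit float; 8-bit or 16-bit planes would be a quarter or half of this):

    # Back-of-the-envelope frame size, assuming 4 channels at 4 bytes each.
    width, height, channels, bytes_per_channel = 5092, 768, 4, 4
    frame_bytes = width * height * channels * bytes_per_channel
    print(frame_bytes / (1024.0 * 1024.0), "MB per frame")   # ~59.7 MB

So a few dozen frames' worth of data is already enough to fill a 1-2 GB cache, never mind 7000 of them.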

Cheers,

J.C.
John Coldrick
User Avatar
Member
48 posts
Joined: July 2005
Offline
Hi John,

Yes, the resolution limit is set to 10,000 x 10,000.

I've tried cache settings as low as 150 (and as high as 1250) on two PCs (one with 1.5 Gig memory and one with 4 Gig memory) … same problem.

When I've run tests I don't see it going into swap space, although it might overnight when I'm not looking. I consistently get the problem when I know it's not into swap.

On looking at more bad frames, I see that there is intermittent garbage on the right side, as well as the glows (blurs) not being present. I found that deselecting “Fast Blur” (Gaussian) helps remove the garbage, but doesn't fix the main problem.

The full-res network ends in some crop COPs that slice it into three screens, for a three-screen video projection. In short tests I found that bypassing the crop COPs eliminated the problem. So last night I set it up to output full res, then read it back in and crop. As I've frequently found in the past, I got “Fatal Error - Segmentation fault” after a couple hundred frames. I've seen this a lot with the COP2 nets, but never with the old COP nets, so it appears there is some underlying memory management issue with the COP2 nets. I get this same problem on two different PCs.

Right now I'm setting it up to crop the rendered files first, then apply the COP nets … hoping that it can handle each slice. It would have been a lot easier to use a compositing program (such as Combustion or After Effects) that is known to be reliable with large files.

If you have any more ideas I'd love to hear them!

Thanks,
Robert
Adtech Communications Group
User Avatar
Member
7025 posts
Joined: July 2005
Offline
Have you turned off multithreading?
User Avatar
Member
4140 posts
Joined: July 2005
Offline
Yah, smells like you're pushing the envelope and something not quite kosher is showing up. Make sure multithreading is off, in case you haven't already. Putting on my Old Fart Hat, back in the days when things would crap out every other day, my short-term workaround would be to set everything up so it works great with a Comp ROP, then run a script (shell, Python, whatever) that does the comp in bursts of, say, 200 frames. That way you're completely avoiding any display buffer issues, and hopefully each subrange will run well, shut down, and the next one fires up. The good news with comping is that, compared to a large render hip file, it tends not to be as inefficient to work this way - usually every frame is mostly new pixels anyway.
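Something like the following sketch is what I have in mind - strictly an illustration. The hip file and ROP path are made-up names, and the hrender flags are from memory, so check hrender's usage on your install (or swap in an hbatch/hscript render command) before relying on it:

    import subprocess

    HIP_FILE = "comp_shot.hip"    # hypothetical .hip file
    ROP_PATH = "/out/comp1"       # hypothetical Composite output driver
    START, END, BURST = 7001, 9000, 200

    for first in range(START, END + 1, BURST):
        last = min(first + BURST - 1, END)
        print("Rendering frames %d-%d" % (first, last))
        # Each burst runs in a fresh Houdini process, so whatever memory
        # the comp chews up is handed back to the OS between bursts.
        subprocess.check_call(["hrender", "-e", "-f", str(first), str(last),
                               "-d", ROP_PATH, HIP_FILE])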

Strictly a workaround, mind you, to let you go home for the night. When you're pushing things this hard, it makes it that much harder for SESI to be able to repro it. I do know that you can't get too waylaid by the large-memory systems - I'm unsure which OS you're running, but I believe Mark A. mentioned that at least on Linux you might have a 4 GB system, yet Houdini will only occupy a fraction of that, based on the limit on how much memory one app can hold. I'll admit to not being clear on *why* that is - I'm sure I've seen otherwise with other apps - but I recall him saying that.

Cheers,

J.C.
John Coldrick
User Avatar
Member
7025 posts
Joined: July 2005
Offline
There is definitely something flaky in 8.0.xxx with COPs, though. Mark is trying to track it down. What OS are you on? I have trouble rendering more than 200 frames without a crash, and I know of at least one other person (on Windows) with similar problems. Others have posted very similar-sounding problems on the forums here too. It seems like a really hard one to track down, though - reproducing it is tough for Mark!!

Cheers,

Peter B
User Avatar
Member
48 posts
Joined: July 2005
Offline
Multi-threading is turned off.

John: your suggestion of writing a script sounds the most sane, although I've already put time into cropping before processing the net. Until it crashed, the full-resolution frames did composite okay.

I'm running WinXP. I know that there is a 2 Gig memory limit on any single process … and it didn't appear to get close to that.

Peter: I'm using H7 for this project, although I downloaded H8 and tried that, with the same problems.

Thanks for your help, guys! The client I'm doing this animation for has After Effects in house, which they use to composite the entire 15-minute, three-projector show … and they have had no problem with AE. It's frustrating.

SESI: Please, Please take a serious look at fixing the ongoing memory issues in the COP2 Net!!!!!!!!
Adtech Communications Group
User Avatar
Staff
5161 posts
Joined: July 2005
Offline
I've been looking at some cases that Peter sent me, and I've ruled out Multithreaded Compositing as a cause. It will fail after a number of frames regardless of this setting (actually, more likely after a certain number of cache operations). So you may as well turn it back on.

Memory in COPs is very strictly regulated, and I'm about 99% sure there is no memory leak occurring (at least, no noticeable leak, so if there is one, it's very, very small). From the descriptions of the bug, it would appear that somehow the tile cache is being corrupted. This would result in a crash or garbage tiles, as you have observed. Why (or if) this is happening is still under investigation.

So far, I have been able to reproduce a crash while running Peter's test case through valgrind, a memory checker, but not normally (the memory checker took about 9 hours to run 250 frames due to the validation overhead). While it narrowed the problem down to a stack trace, it still hasn't revealed what the root cause is. So the search continues…


The 2 GB limit: you will never actually be able to allocate anywhere near 2 GB - usually the limit is around 1.3 GB to 1.6 GB, depending on the application's allocation/deallocation behaviour. This is due to memory fragmentation. As with a disk, allocating and freeing memory will gradually leave allocated blocks and freed holes scattered about your 2 GB memory area. Memory is allocated from the ‘holes’. Unlike a disk file, a block of memory cannot be broken up into pieces; it must be one contiguous block.

So, if I ask for a 1 MB block and no hole is big enough for this request, the allocation fails. The system can't reorganize your memory space, because the code holds pointers to the blocks which can't be changed (the system has no way of knowing which variables are actually pointers). This is why you normally run out of memory well before hitting the 2 GB mark.
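A toy illustration of the point (this has nothing to do with Houdini's actual allocator - it just shows how plenty of total free memory can hide behind small holes):

    # Model a 2 GB address space as 128 slots of 16 MB each.
    # Allocate everything, then free every other slot: half the space
    # is free, yet the largest contiguous hole is only one slot wide.
    SLOT_MB, SLOTS = 16, 128                 # 128 * 16 MB = 2048 MB
    heap = ["used"] * SLOTS
    for i in range(0, SLOTS, 2):             # free alternating slots
        heap[i] = "free"

    free_mb = heap.count("free") * SLOT_MB
    largest_run = run = 0
    for slot in heap:
        run = run + 1 if slot == "free" else 0
        largest_run = max(largest_run, run)

    print("total free:", free_mb, "MB")                 # 1024 MB free
    print("largest hole:", largest_run * SLOT_MB, "MB") # only 16 MB
    # A request for a contiguous 32 MB block fails here, despite 1 GB free.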

64-bit OSes effectively fix this problem, since the OS has over a terabyte of address space to give the application, and so it is extremely unlikely that you'll run out of virtual address space before you run out of real memory and swap.

Having said that, I still think something's broken in the COPs engine, since reducing the cache size to even a minuscule amount does not fix the problem. And whatever it is, it's damn subtle, unfortunately.
User Avatar
Member
48 posts
Joined: July 2005
Offline
Twod,

Thanks for the response!

I'd like a clarification: you say that memory becomes fragmented, so if you ask for 1 gig and no 1 gig block exists, you get an error. Is this the infamous “Fatal Error - Segmentation Fault”?

If so, why didn't the old COP nets have this problem (or ICE, or After Effects, for that matter)?

Also, just to be clear: you mention that the “tile cache” is getting corrupted. My problems occur when compositing from Hscript … so those COP image tiles are not involved? The cooking cache is involved, however. Or are we talking about different things?

Thanks,
Robert
Adtech Communications Group
User Avatar
Member
48 posts
Joined: July 2005
Offline
OOOppps. I meant to write: “if you ask for 1 Mb and no 1 Mb block exists” I can see how a 1 Gig block might be hard to find!
Adtech Communications Group
User Avatar
Staff
5161 posts
Joined: July 2005
Offline
If so, why didn't the old COP nets have this problem

Old COPs had this problem in a different form. Each COP would cache its last image by default - so large COPnets, with lots of nodes, had a tendency to run out of memory, because you'd use roughly (number of nodes) x (image size) memory to cook. By contrast, new COPs use a global cache with a specified size, which admittedly may be set a bit high. (Try 256 MB max.)
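To put rough numbers on the difference (mine, purely illustrative - the node count and frame size are assumptions):

    # Illustrative only: per-node caching vs. one capped global cache.
    frame_mb = 60                    # ~what a 5092 x 768 float RGBA frame costs
    nodes = 40                       # a moderately large COPnet
    old_cops_mb = nodes * frame_mb   # every node keeps its last cooked image
    new_cops_mb = 256                # a single global cache, capped by preference
    print(old_cops_mb, "MB vs", new_cops_mb, "MB")   # 2400 MB vs 256 MB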

Also, in the days of old COPs (pre-2001), you'd be lucky to have a machine with 512 MB of RAM - so you'd be more likely to run out of real memory before running out of virtual address space. We're currently at a turning point between the end of 32-bit and the beginning of 64-bit, and you're very much able to see the limitations of 32-bit.

Also just to be clear. You mention that the “tile cache” is getting corrupted

Tiles are used when cooking COPs - they're the data type that's passed from node to node (large images are cooked in pieces, or tiles). Tiles are what gets cached in the cook cache.
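As a mental model only (not SESI's implementation - the key layout and tile size below are made up), the cook cache behaves something like a dictionary of tiles keyed by which node, plane, frame and tile index produced them:

    # Conceptual sketch of a shared tile ("cook") cache.
    TILE = 128                          # tile edge in pixels - arbitrary here
    cache = {}

    def cooked_tile(node, plane, frame, tx, ty, cook_fn):
        key = (node, plane, frame, tx, ty)
        if key not in cache:            # only cook tiles we don't already hold
            cache[key] = cook_fn(tx * TILE, ty * TILE, TILE, TILE)
        return cache[key]

    # Dummy "cook" that just records which region it produced.
    blur = lambda x, y, w, h: ("blurred pixels", x, y, w, h)
    cooked_tile("blur1", "C", 7300, 0, 0, blur)   # cooks and caches the tile
    cooked_tile("blur1", "C", 7300, 0, 0, blur)   # second call hits the cache

If entries in a structure like that get corrupted, you'd see exactly the symptoms described earlier: garbage tiles, or a crash when a bad entry is handed back.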
User Avatar
Staff
5161 posts
Joined: July 2005
Offline
OOOppps. I meant to write: “if you ask for 1 Mb and no 1 Mb block exists” I can see how a 1 Gig block might be hard to find!

Yes, this causes the seg fault, because we generally don't handle this out-of-memory condition very well.
User Avatar
Staff
5161 posts
Joined: July 2005
Offline
I found one problem that may be causing crashes. I've fixed this for tomorrow's build (8.0.445). However, COPs still seems to use far more memory than is required when rendering to files, so I'm still investigating. In the meantime, please set the cache size to a small value to compensate (100 MB or less). Your cooks will still proceed even if they require more memory than that (it's just freed immediately afterwards).
User Avatar
Member
7025 posts
Joined: July 2005
Offline
Cool, thanks Mark, I'll give it a try.
User Avatar
Staff
5161 posts
Joined: July 2005
Offline
Actually, Peter, please wait until tomorrow's build (447), as I believe I found the root cause of the problem. Without the fix I'm adding today, it will likely still crash.

The problem is with the deform/composite COPs in your network, when the frame ranges don't align. On a frame where an input is out-of-range, requesting that input's image data fails but doesn't properly clean up the request, which eventually leads to the crash.

Also, the Deform COP itself is not freeing memory if it is operating on a constant image (say, solid alpha), which is where all the memory is going. So there is a leak, just not in the engine where I was looking.

So far, I have been able to run through your test case several times without incident, with ‘top’ reporting very constant memory usage (once the cache fills up).
User Avatar
Member
48 posts
Joined: July 2005
Offline
twod,

You might want to also check the Blur COP. I've found that it makes a difference in whether the “garbage” appears. Deselecting “Fast Blur” helps, but does not solve the problem. I've tried all types of blur (Gaussian, etc.) with no difference in the problem.

Thanks,
Robert
Adtech Communications Group
User Avatar
Staff
5161 posts
Joined: July 2005
Offline
The current 8.0 build 452 should have these problems fixed. I'm not sure about the blur problem, as I have not been able to reproduce it, but garbage tiles could definitely have been caused by the previous problems.

If you get the chance, please try out 8.0.452 or higher and let me know if the problems persist. Thanks for your patience!