View on GitHub

IRAF Community Distribution

IRAF maintained by the community

Home | Installation | Packages | X11IRAF | PyRAF | Forum ↗

iraf-v216 · Code · Issues (50) · Pull requests (81)

iraf.net Issue #61

[linux] imcopy (and many other operations) fail: Out of memory

closed closed olebole opened this issue on 2017-05-15 · 2 comments


olebole commented on 2017-05-15

On linux (32 bit), many command, like imcopy fail with an out-of-memory warning:

cl> imcopy dev$pix image.short  
dev$pix -> image.short  
ERROR: Out of memory  

or

cl> imhead dev$pix long+  
dev$pix: Out of memory  

This is a version built from source, with #60 as the latest patch. linux64 build succeeds from the same source.


olebole commented on 2017-05-16

Update from the debugging of unix/os/zmaloc.c: allocations with less then ~2^17 bytes succeed:
ZMALOC(buf, 68082, stat) gives bufptr=0x57d50730, *buf=2bea8398, stat=0.
However, with more than 2^17 bytes, the system is going to return addresses in a different space:
ZMALOC(buf, 132946, stat) gives bufptr=0xf7538008, *buf=fba9c004, stat=-1.

The conversion address –> buf, ADDR_TO_LOC just rightshifts the address. The leftmost byte is kept here. So, the final comparison

            if (*buf > 0)  
                *status = XOK;  
            else  
                *status = XERR;  

seems to be wrong and should be replaced by a *buf != 0.

Update: when the address is freed, the first step is that it is checked whether the pointer is negative. And then, there are some more consistency checks that will fail with negative values. So, if we don’t want to disable them, we need to make sure the address is not negative (by resetting the MSB in the address).


olebole commented on 2017-09-28

@iraf wrote:

[…] or those issues that report new bugs in tasks that have been unchanged in decades (e.g. #61) that may have arisen out of PRs but can’t be reviewed without adequate test code or platform/version details.

Maybe you still didn’t understand this bug: The cause of the “out of memory” failures is the following piece of code which has an implementation dependent behaviour according to the C standard:

#define	ADDR_TO_LOC(addr) (((XINT)((XCHAR *)(addr)))>>(sizeof(XCHAR)-1))  

This code is used in zmaloc.c to convert a pointer (returned from malloc()) to an integer that can later be used as an index for the mem common blocks.

The pointer returned from malloc() may have any integer value: whether the most significant bit is set or not is decided internally by the memory allocation strategy. If the most significant bit is set, this pointer is then converted to a negative signed integer within the #define statement shown above. This signed integer is then right-shifted by one (sizeof(XCHAR) is 2).

When a signed integer with a negative value is right shifted, the C standard makes it implementation depending whether a zero is shifted in (which would make the result positive), or a one (which would keep the result negative):

ISO/IEC 9899/1999, §6.5.8:

  1. The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

In that case, the C compiler has the (implementation defined) freedom what to do. gcc keeps the sign, and then returns a negative value (which is then returned as an error by ZMALOC).

The solution (#62) is simple: we need to convert the pointer to an unsigned type instead of a signed one:

#define	ADDR_TO_LOC(addr) (((unsigned XINT)((XCHAR *)(addr)))>>(sizeof(XCHAR)-1))  

For unsigned values, the right shift is defined to shift in a zero. This obviously works well independently what the compiler does for signed values. And the left shift does not have these problems; here the behaviour is well-defined. On error, malloc() returnes NULL, which is catched by ZMALOC and not a problem for the conversion at all.

The release notes of 2.12 describe basically the same problem:

unix/hlib/cl.csh [pcix]
Last-minute testing under RHEL showed a segfault in nearly all tasks. This was traced to a number of pointer values being returned by the system at addresses outside of the process stack space. This also affects Fedora systems and may well become a problem for other distributions using newer kernels. Until this is better understood, added a limit stacksize unlimited call to the cl.csh startup script as a workaround for processes started from the CL. This is still an issue for IMFORT tasks and will require the user to do the same in their .cshrc file, however it should be a (mostly) harmless change for most users. (2/5/04, MJF)

The only thing the patch #62 does is to ensure that a zero is shifted in, and this works on all systems.


Fixed in #62


Last updated on 2017-10-27