iraf-v216 · Code · Issues (50) · Pull requests (81)
iraf.net Issue #61
[linux] imcopy (and many other operations) fail: Out of memory
closed olebole opened this issue on 2017-05-15 · 2 comments
olebole commented on 2017-05-15
On linux (32 bit), many command, like imcopy
fail with an out-of-memory warning:
cl> imcopy dev$pix image.short
dev$pix -> image.short
ERROR: Out of memory
or
cl> imhead dev$pix long+
dev$pix: Out of memory
This is a version built from source, with #60 as the latest patch. linux64 build succeeds from the same source.
olebole commented on 2017-05-16
Update from the debugging of unix/os/zmaloc.c
: allocations with less then ~2^17 bytes succeed:
ZMALOC(buf, 68082, stat)
gives bufptr=0x57d50730
, *buf=2bea8398
, stat=0
.
However, with more than 2^17 bytes, the system is going to return addresses in a different space:
ZMALOC(buf, 132946, stat)
gives bufptr=0xf7538008
, *buf=fba9c004
, stat=-1
.
The conversion address –> buf, ADDR_TO_LOC
just rightshifts the address. The leftmost byte is kept here. So, the final comparison
if (*buf > 0)
*status = XOK;
else
*status = XERR;
seems to be wrong and should be replaced by a *buf != 0
.
Update: when the address is freed, the first step is that it is checked whether the pointer is negative. And then, there are some more consistency checks that will fail with negative values. So, if we don’t want to disable them, we need to make sure the address is not negative (by resetting the MSB in the address).
olebole commented on 2017-09-28
@iraf wrote:
[…] or those issues that report new bugs in tasks that have been unchanged in decades (e.g. #61) that may have arisen out of PRs but can’t be reviewed without adequate test code or platform/version details.
Maybe you still didn’t understand this bug: The cause of the “out of memory” failures is the following piece of code which has an implementation dependent behaviour according to the C standard:
#define ADDR_TO_LOC(addr) (((XINT)((XCHAR *)(addr)))>>(sizeof(XCHAR)-1))
This code is used in zmaloc.c
to convert a pointer (returned from malloc()
) to an integer that can later be used as an index for the mem
common blocks.
The pointer returned from malloc()
may have any integer value: whether the most significant bit is set or not is decided internally by the memory allocation strategy. If the most significant bit is set, this pointer is then converted to a negative signed integer within the #define
statement shown above. This signed integer is then right-shifted by one (sizeof(XCHAR)
is 2).
When a signed integer with a negative value is right shifted, the C standard makes it implementation depending whether a zero is shifted in (which would make the result positive), or a one (which would keep the result negative):
ISO/IEC 9899/1999, §6.5.8:
- The result of
E1 >> E2
isE1
right-shiftedE2
bit positions. IfE1
has an unsigned type or ifE1
has a signed type and a nonnegative value, the value of the result is the integral part of the quotient ofE1 / 2^E2
. IfE1
has a signed type and a negative value, the resulting value is implementation-defined.
In that case, the C compiler has the (implementation defined) freedom what to do. gcc keeps the sign, and then returns a negative value (which is then returned as an error by ZMALOC
).
The solution (#62) is simple: we need to convert the pointer to an unsigned type instead of a signed one:
#define ADDR_TO_LOC(addr) (((unsigned XINT)((XCHAR *)(addr)))>>(sizeof(XCHAR)-1))
For unsigned values, the right shift is defined to shift in a zero. This obviously works well independently what the compiler does for signed values. And the left shift does not have these problems; here the behaviour is well-defined. On error, malloc()
returnes NULL
, which is catched by ZMALOC
and not a problem for the conversion at all.
The release notes of 2.12 describe basically the same problem:
unix/hlib/cl.csh
[pcix]
Last-minute testing under RHEL showed a segfault in nearly all tasks. This was traced to a number of pointer values being returned by the system at addresses outside of the process stack space. This also affects Fedora systems and may well become a problem for other distributions using newer kernels. Until this is better understood, added alimit stacksize unlimited
call to thecl.csh
startup script as a workaround for processes started from the CL. This is still an issue for IMFORT tasks and will require the user to do the same in their.cshrc
file, however it should be a (mostly) harmless change for most users. (2/5/04, MJF)
The only thing the patch #62 does is to ensure that a zero is shifted in, and this works on all systems.
Fixed in #62
Last updated on 2017-10-27