Wednesday, October 26, 2005

floating-point assist fault

1) How to stop the program immediately at the site
of the fault?

There is a command "prctl" on the SGI Altix system, which lets
you specify how the kernel should handle two notorious emulated
conditions: unaligned access and the floating-point assist fault.
For example, if your Fortran program is run as

$ prctl --fpemu=signal your_program

the floating-point assist fault condition triggers SIGFPE. So,
if you compile your program with -g and run it under gdb as

$ prctl --fpemu=signal gdb your_program

you'll find which line of your code is causing the fault.

On ordinary Linux, there is a system call named "prctl", and the
command on Altix may simply be a wrapper around it. (I found that
Solaris 9 also has a command with this name.) I wonder what other
Linux systems on Itanium machines do, since this problem should be
common to all Linux systems on Itanium.
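To make the system-call guess concrete, here is a minimal sketch of the underlying call made from Python through ctypes. The constant values are the ones defined in <linux/prctl.h>; the function name set_fpemu is my own, and I am only assuming that the Altix command does something equivalent. On a machine that is not Itanium, the kernel is expected to reject the request with an error.

```python
import ctypes
import errno

# Values from <linux/prctl.h>. PR_SET_FPEMU is only meaningful on ia64.
PR_SET_FPEMU = 10
PR_FPEMU_NOPRINT = 1   # emulate silently, without logging
PR_FPEMU_SIGFPE = 2    # do not emulate, deliver SIGFPE instead


def set_fpemu(mode):
    """Ask the kernel to handle floating-point assist faults as `mode`.

    Hypothetical helper: returns 0 on success, an errno value on
    failure, or None when the C library has no prctl() (not Linux).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    prctl = getattr(libc, "prctl", None)
    if prctl is None:
        return None
    # prctl(int option, unsigned long arg2, ..., unsigned long arg5)
    prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    prctl.restype = ctypes.c_int
    if prctl(PR_SET_FPEMU, mode, 0, 0, 0) == 0:
        return 0
    return ctypes.get_errno()
```

A wrapper would call set_fpemu(PR_FPEMU_SIGFPE) and then exec the target program. That the setting survives the exec is my guess from how the command behaves, not something I have verified.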

2) What does the "ip" information in /var/log/messages mean?

I found that the listing produced by "-Wl,--print-map" does not match
what /var/log/messages says, unless I'm missing something.
The messages file says the problem occurs at "ip 400000000005d6e1",
and the listing from the linker has this:

[...]
 .text   0x400000000005ca00   0x3080 sfcng-delete-this.o
         0x400000000005db80          bdyflx_
         0x400000000005ca00          sfcflx_
 .text   0x400000000005fa80   0x7f40 libogcm.a(atmct.o)
         0x400000000005fa80          tmstup_
         0x4000000000065e80          tmstpc_
[...]

The address ...5d6e1 lies after the start of sfcflx_ (...5ca00) and
before the start of bdyflx_ (...5db80). That should mean the problem
is in sfcflx_. At least, so I thought.

In fact, the problem was in bdyflx_.
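For reference, the lookup I did by eye can be written as a short Python sketch. The symbol table is copied from the linker listing above; the function name nearest_symbol is just for illustration. It picks the symbol with the highest start address that does not exceed the ip, and that is exactly the reasoning that pointed me at sfcflx_.

```python
from bisect import bisect_right

# Start addresses copied from the linker map excerpt above.
SYMBOLS = [
    (0x400000000005DB80, "bdyflx_"),
    (0x400000000005CA00, "sfcflx_"),
    (0x400000000005FA80, "tmstup_"),
    (0x4000000000065E80, "tmstpc_"),
]


def nearest_symbol(ip, symbols):
    """Return (name, offset) for the last symbol starting at or before ip.

    Returns None if ip lies below every known symbol.
    """
    table = sorted(symbols)                 # order by start address
    starts = [addr for addr, _ in table]
    i = bisect_right(starts, ip) - 1        # last start address <= ip
    if i < 0:
        return None
    addr, name = table[i]
    return name, ip - addr


name, offset = nearest_symbol(0x400000000005D6E1, SYMBOLS)
print(name, hex(offset))
```

So the map-based lookup itself is simple. The trouble is that its answer did not agree with where the fault really was, which is why running under gdb as in 1) is the more reliable way to find the line.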

I hope this will be helpful for other people having the same
problem.

from Ryo.
