PDA

View Full Version : Column-major vs. Row-major: a puzzled user writes



Bernard Leak
07-14-2006, 02:14 PM
I can't well put this in as a support request, as I simply don't have a
licence for (nor a copy of) the IMSLIB FORTRAN library, but that's what it's
about. Any help welcome...

I've been asked to convert a mass of (truly repulsively written) FORTRAN
to C. It used (and the C will use) the IMSLIB library. This is fine for
the customer, who has the necessary licences, but I'm developing blind
from the documentation.

I'm assured that the code at least appeared to work once, though
upgrading the IMSLIB libraries has caused it to start failing inside
a call to LINDS(). I suspect this is simply because LINDS used
to accept an ill-conditioned matrix, but now checks for such
cases. The API for LINDS() does not return an estimated
condition number to the caller, which LFCDS() does, but the
new error message does cite a nasty-looking condition number
as it dies.

The effect of this is that the customer can't even supply
me with useful sample output to tell me what the code does
when it's working, and I suspect in some sense it never did
work as intended.

While I was trying to find why the problem arises, I spotted
some very suspicious-looking matrix mangling.
The following has a few irrelevant details removed, but the
critical index manipulations are exactly as in the original.

DIMENSION X1[10][10], X2[10][10]

CALL LINDS(NP, X1, 10, X2, 10)
DO 1 I=2,NP
DO 2 J=1,I-1
X2(I,J) = X2(J,I)
2 CONTINUE
1 CONTINUE

As I read the documentation, LINDS supposes
itself to have a symmetric matrix as input:
it actually reads values only from the upper
triangle. Consequently it has a symmetric
matrix as output (unless it fails), and only the
upper triangle of the output is written.

Since FORTRAN is column-major,
and we have J < I,
X2(I,J) has column > row (is above the main diagonal)
X2(J,I) has row > column (is below the main diagonal)

so the loops make the output matrix symmetrical
by copying the lower triangle onto the upper
triangle, which is SURELY not what was intended.
(The code is certainly not trying to do anything
amazingly subtle here: it's just filling out the
implicit entries to make the matrix complete, and
doing it the wrong way).

On the other hand, the code also contains

DIMENSION X3(4,4), X4(4,4)
DO 3 I=2,4
DO 4 J=1,I-1
X3(J,I) = X3(I,J)
4 CONTINUE
3 CONTINUE

CALL LFCDS(4,X3,4,X4,4,COND)
DO 5 I=2,4
DO 6 J=1,I-1
X4(I,J) = X4(J,I)
6 CONTINUE
5 CONTINUE

LFCDS() again assumes symmetrical input, which
it fills out from the upper triangle, and its output
is simply an upper triangular matrix. The docs
again say that the elements below the main
diagonal are left untouched.

The user code here seems to me to be performing
a redundant copy (you'll have to take my word for
it that the surrounding code doesn't require it)
from the upper to the lower triangle of X3.
The upper triangle of X3 is read by LFCDS(),
which writes its output to the upper triangle
of X4. The elements of X4 below the main
diagonal are implicitly zero, but are not
actually modified. Then the last pair of loops
copies whatever junk happens to be lying around
below the main diagonal of X4 to over-write
what was previously above the main diagonal.

Again, nothing subtle is happening here: it's
a straightforward Cholesky decomposition.
It's very much the story as for LINDS(), except
that the initial extra data copy confirms the
disagreement about which triangle is upper
and which is lower.

I think that in fact no values are EVER written
to the lower triangle of X2 or X4, and I suspect
that they will in fact always be zero (whether or
not this is guaranteed by the FORTRAN standard(s) -
the code purports to be FORTRAN 77, but in fact
it might as well be FORTRAN IV).
Thus, the effect in both cases is to zap everything
to zero except the main diagonal.

Am I making a huge mistake somewhere?