PDE2D for Windows or Linux?

03-19-2007, 01:01 AM
I have some questions about the choice of operating system for efficient PDE2D computations. At present I am using Windows.

The problems I have been solving for some time use really a lot of memory, for example a system of 50 time-dependent, coupled partial differential equations (which seems to be no problem at all for PDE2D itself). That demands huge memory resources. So when I want to perform longer runs I have to "cut them into pieces", since the arrays containing the solutions are simply too big :( .

The questions are the following:

Do you know how to force Windows to allocate more memory for PDE2D applications? It seems that Windows has some limits on memory allocation for applications... (increasing the virtual memory does not help)

Are there similar problems on Linux? If not, I should definitely install it!

That's it! Finally, I would like to recommend PDE2D to people who are mainly interested in solutions: you will probably never do the numerical work better than PDE2D does. However, the most important advantage of this software is that when you finish your theoretical work on the model and the time for implementation comes, with PDE2D it takes just a few days! And Granville Sewell is always there for you :)

03-19-2007, 07:09 AM
(Note: I have communicated directly with this user previously, and he has now sent me his PDE2D program)

Dear Dominik,

In the program you sent me, the two main PDE2D work arrays are 60 million words (the integer work array) and 40 million words (the double precision work array). These account for 60*4 + 40*8 = 560 Mbytes; apparently this plus the rest of the program just uses up the 1 GB of memory you have on your PC. If you increased the memory on your PC to 2 GB, that would obviously allow you to solve larger problems. However, increasing beyond 2 GB would NOT help with the Windows version, because it (both the LF90* version you have and the Intel Fortran version VNI sells) is a 32-bit version, so you can never access more than 2^31 bytes ~ 2 Gbytes of memory with it. The Linux version is 64-bit (a 32-bit version is also provided), which means you would be able to use much more than 2 Gbytes of memory IF you had more on your PC. Even with the 64-bit version, you would not be able to handle problems requiring a double precision work array larger than 2^31 = 2.1 billion words (17 GB), because it still does not allow integers (and therefore subscripts) larger than 2.1 billion; however, that is a lot larger than the 40 million you are currently able to handle. Thus even with the 64-bit version you cannot use more than about 20 GB of memory, but I guess not many PCs have more than that!
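
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the word sizes (4-byte integers, 8-byte double precision) and the array sizes come from the post, while the function and variable names are invented for illustration and are not part of PDE2D.

```python
INT_BYTES = 4      # bytes per integer word, as used in the post
DOUBLE_BYTES = 8   # bytes per double precision word


def work_array_mbytes(int_mwords, double_mwords):
    """Mbytes used by the two work arrays, given their sizes in millions of words."""
    return int_mwords * INT_BYTES + double_mwords * DOUBLE_BYTES


# The 60-million-word integer array and 40-million-word double precision array.
current_mbytes = work_array_mbytes(60, 40)

# Address limit quoted for the 32-bit version: 2^31 bytes.
limit_32bit_bytes = 2**31

# With 32-bit subscripts the largest double precision array has 2^31 words.
max_double_array_gbytes = 2**31 * DOUBLE_BYTES / 1e9

print(current_mbytes)                      # 560
print(round(max_double_array_gbytes, 1))   # 17.2
```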

For 2D and 3D problems, PDE2D has a "frontal method" solution option, in which only a very small part of the matrix is stored in memory at a given time. With this option you can solve extremely large problems before running out of memory, though it is much slower than the sparse direct and iterative options, so CPU time will be the main problem. However, your problem is 1D, and I do not have a frontal method option for 1D problems because memory is not often a serious concern for them; in your case, with 50+ PDEs, unfortunately it is.

You mentioned the possibility of using virtual memory. The LF90 compiler does have a virtual memory option, which apparently you are not able to use through PDE2D; I am not sure right now what is going on with that, so I need to investigate and get back to you. It may not be a practical option in any case: use of virtual memory often slows a program down so much as to be impractical, though I'm not sure whether that would be the case here. The "frontal method" option (unfortunately only available for 2D and 3D problems, as mentioned) is essentially like using virtual memory, except that the "page swaps" are controlled by PDE2D and are guaranteed to be efficient, whereas the compiler's virtual memory algorithm might be very inefficient for PDE2D usage.


Granville Sewell

*The LF90 Windows version, which this customer is using, will probably be available from VNI soon; currently it is only available directly from me (www.pde2d.com). The advantage of the LF90 version is that the LF90 compiler itself is included (I have a VAR agreement with Lahey, Inc. which allows me to distribute it with PDE2D), so you don't need any Fortran compiler before purchasing the LF90 version.

03-19-2007, 07:38 AM

I just thought of another approach that may work very well for your problem and won't require more memory or a 64-bit version.

As I mentioned in my previous post, the 2D and 3D programs (both Galerkin and collocation; you are using collocation) have a frontal method option, which dramatically reduces the core memory requirements, but the 1D programs do not: they always use a sparse direct solver. I did not anticipate that many 1D programs would run into memory limitations, but with your 50 PDEs I understand how that can happen. However, you can simply solve your problem as if it were a 2D problem, with NYGRID=1, and then you will be able to select the frontal solver. You will have to re-create your program using the interactive driver (do NOT try to convert your 1D program to 2D), but if you set NYGRID=1 you will not be prompted for boundary conditions in the y direction, and the problem will be solved as efficiently as if it were 1D (no additional unknowns compared to a 1D program with the same NXGRID). Be sure to set ITRANS=0 (rectangular region).

I realize that re-creating your program, which has some very complicated PDE definitions, from scratch through an interactive session seems like a formidable task (users normally make minor modifications and corrections directly to their program with an editor). But it will not be so hard, because you can simply copy and paste all your equations and parameter definitions over to the new program; the PDE and BC formats are the same for 1D, 2D and 3D collocation programs.

And I believe the new program will run almost as fast as the old one, and should require much less memory. For 2D and 3D problems the frontal method, which is basically an out-of-core band solver, is much slower than other options, but for 1D problems a band solver is about as fast as a sparse direct solver and faster than iterative solvers. So this should have about the same effect as using virtual memory, except that it will probably be much faster (virtual memory doesn't seem to work anyway; I am still investigating why). The total memory requirements for a 1D problem using the frontal solver will be almost independent of NXGRID, as the frontal solver only stores in memory the NB by NB "active" portion of the matrix, where NB is the bandwidth, and for 1D problems NB depends on NEQN (the number of PDEs) but NOT on NXGRID!
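
The scaling argument in this post can be illustrated with a toy memory model. This is a rough sketch under stated assumptions, not PDE2D's actual accounting: the constants `unknowns_per_point` and `band_factor` are made up, and the "full band" figure is only a comparison point for a solver that keeps the whole band in core. The only things taken from the post are that the frontal solver holds an NB by NB block and that NB depends on NEQN but not on NXGRID.

```python
def front_words(nb):
    """Words held in core by a frontal solver: the NB-by-NB active block."""
    return nb * nb


def full_band_words(n_unknowns, nb):
    """Words needed to hold the whole band in core at once (for comparison)."""
    return n_unknowns * (2 * nb + 1)


def toy_sizes(neqn, nxgrid, unknowns_per_point=2, band_factor=4):
    # ASSUMPTION: unknowns_per_point and band_factor are made-up constants,
    # not PDE2D's real values. Only the scaling is taken from the post:
    # the number of unknowns grows with NXGRID, NB depends on NEQN alone.
    n_unknowns = unknowns_per_point * neqn * nxgrid
    nb = band_factor * neqn
    return n_unknowns, nb


table = []
for nxgrid in (101, 1001, 10001):
    n_unknowns, nb = toy_sizes(neqn=50, nxgrid=nxgrid)
    table.append((nxgrid, full_band_words(n_unknowns, nb), front_words(nb)))

for nxgrid, band, front in table:
    print(f"NXGRID={nxgrid:6d}  full band: {band:>12,d} words  front: {front:,d} words")
```

In this model the full-band column grows linearly with NXGRID while the front column stays fixed, which is the behavior the post describes.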



03-19-2007, 08:20 AM
Dear Granville,

That sounds really promising! :) I am about to start work on a new version of my source code :)

Thank you very much!


03-19-2007, 02:54 PM

I created a couple of test programs to check my claims in the last post. I solved a 1D, time-dependent problem with NEQN=4 PDEs and NOUPDT=.TRUE. (no updating of the initial LU decomposition, as in your program). Running as a true 1D problem (which automatically uses a sparse direct solver) with NXGRID=1001 (1000 cubic elements), it required 1.4 Mwords of memory and took about 1.9 seconds. Running as a 2D problem with NXGRID=1001, NYGRID=1, and using the frontal method, it required 0.05 Mwords of memory and about 16 seconds. So it does dramatically cut the memory requirements (by a factor of 28) but requires about 8 times as much CPU time. I think this is because, even though in both cases the LU decomposition computed on the first time step is reused on every subsequent step (no updating is needed for this time-dependent problem), the frontal method has to read and write a scratch file every step, which is somewhat time-consuming. Results were exactly the same, of course.

So I expect my suggestion to dramatically decrease the memory requirements, but it may substantially increase the CPU time.
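
For reference, the two ratios quoted for the NOUPDT=.TRUE. test follow directly from the reported figures. The snippet below is just that arithmetic; the dictionary layout and names are illustrative only.

```python
# Figures reported for the NEQN=4, NXGRID=1001, NOUPDT=.TRUE. test.
runs = {
    "1D sparse direct": {"mwords": 1.4, "seconds": 1.9},
    "2D frontal, NYGRID=1": {"mwords": 0.05, "seconds": 16.0},
}

direct = runs["1D sparse direct"]
frontal = runs["2D frontal, NYGRID=1"]

# How much less memory the frontal run needs, and how much more CPU time.
memory_ratio = direct["mwords"] / frontal["mwords"]
cpu_ratio = frontal["seconds"] / direct["seconds"]

print(f"memory reduced by a factor of about {memory_ratio:.0f}")
print(f"CPU time increased by a factor of about {cpu_ratio:.1f}")
```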


03-19-2007, 11:55 PM

That is maybe less promising than I thought; it seems there is always something. However, it is good to have such a choice!

Anyway, how about the NOUPDT=.FALSE. option? In a few months, when I understand well enough the mechanisms behind the linearized MHD equations which I have to solve now, I will try to perform some nonlinear computations. What is the predicted CPU time of the 1D version in comparison to the 2D (NYGRID=1) version for such nonlinear computations?

Thank you for your investigations!


03-20-2007, 03:28 AM

I would have predicted that changing both the 1D and 2D programs to
NOUPDT=.FALSE. (thus recalculating the LU decomposition every time
step) would make the frontal method more competitive, because the
cost of reading and writing the matrix from the scratch file is then less
significant. I tried this, and in fact it made the frontal method even
less competitive: the 1D solver (which always uses the sparse
direct solver MA37 from Harwell) now takes 12 seconds and the frontal
method (basically an out-of-core band solver) takes 338 seconds, so
it is some 26 times slower! The two programs have exactly the same
number of unknowns, so the matrix sizes are the same; and of course
the frontal solver still takes 28 times less memory (NOUPDT doesn't
affect the memory requirements). I understand why the frontal solver
is slower than the fast direct and iterative solvers for 2D and 3D problems,
because there the matrix is sparse even inside the band. For 1D
problems, however, the matrix is dense inside the band, so I really don't
understand the large difference in speed in this case. It doesn't seem to
be because of the I/O either (since the frontal method is even less
competitive with NOUPDT=.FALSE.). The Harwell sparse direct solvers
MA27/MA37 (used with permission of Harwell Labs) are very fast compared
to other sparse direct solvers in all my tests, but I don't really
understand why they should be so much faster than a band solver when the
matrix is dense inside the band! Of course, my tests are with NEQN=4; you
have some 50 PDEs, which may significantly change the relative speeds,
but I'm not sure in whose favor.

So I'm afraid you are right: switching to the 2D program doesn't look as
promising as I thought initially (though it will definitely solve the memory
problem). I guess you just need to get more memory, but as stated
earlier, it won't do any good to buy more than 2 GB of memory for the
Windows 32-bit version; if you get more than 2 GB you'll need
to switch to the Linux 64-bit version to be able to take advantage of the
extra memory.


03-20-2007, 05:39 AM

Thank you again! I have been considering buying a new memory module, and now I have no doubts :) In the second phase I will have to start using Linux, and maybe change the PC in order to use the 64-bit system efficiently.


03-21-2007, 10:41 AM

I found a way to decrease the memory requirements by nearly 20 Mwords very easily. Notice that the output from PDE2D says the "Estimated" minimum value for the real workspace is about 40 million, and if you set IRWK8Z=1 you will get that default. But I found you can actually decrease IRWK8Z to 22,000,000 and it still works (and won't run any slower).

With the sparse solvers, the exact amount of memory required cannot be known until runtime because of pivoting for stability, so a default amount (40 million in this case) is allocated which is almost sure to be enough (in fact, for 1D problems I chose the default to be fairly high, since usually memory is not a problem for 1D programs).



03-22-2007, 08:30 AM

That's great! :) What is the recipe for estimating IRWK8Z? And how about the second parameter, IIWK8Z?



03-22-2007, 09:41 AM
The integer workspace allocation parameter cannot be decreased below the default; notice it says "minimum required is..", not "estimated minimum required...". To see if you can decrease the real workspace allocation size below the default (you can, for your problem), simply try a small value (but greater than "1", which gives the default) and you will get an error message saying "IRWK8Z must be increased to at least 20390700, suggested value is 40430700". The true minimum is almost always somewhere between these two values; you'll have to experiment to see how low you can set it without getting an error message (I tried 22000000 and it worked OK). For some of the solvers the exact amount of memory required can be calculated at allocation time, but for the sparse direct solvers it is not known exactly until halfway through the Gaussian elimination. So by default an amount that is almost sure to be enough is allocated; if you request a smaller amount, it will either work or you will get an error message saying you haven't allocated enough.
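
The trial-and-error search described here can be organized as a bisection between the two values in the error message. The sketch below is hypothetical: PDE2D provides no such function, and `runs_ok` stands in for "rebuild and run the program with this IRWK8Z and check whether the error message appears", which would be done by hand or by a script of your own. It also assumes that if one size works, every larger size works too. The threshold used in the demonstration is made up.

```python
def smallest_working_size(runs_ok, lower, upper, tolerance=100_000):
    """Bisect for a workspace size that works and is within `tolerance`
    words of the smallest working size.

    runs_ok(size) -> bool is a hypothetical callback: True if the program
    runs without the "must be increased" error at that size.
    """
    if not runs_ok(upper):
        raise ValueError("upper bound must be a size that is known to work")
    lo, hi = lower, upper  # invariant: hi always works
    while hi - lo > tolerance:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            hi = mid   # mid works, so the minimum is at or below mid
        else:
            lo = mid   # mid fails, so the minimum is above mid
    return hi


# Demonstration with a made-up true minimum; a real search would run PDE2D.
FAKE_TRUE_MINIMUM = 21_500_000


def fake_runs_ok(size):
    return size >= FAKE_TRUE_MINIMUM


found = smallest_working_size(fake_runs_ok, 20_390_700, 40_430_700)
print(found)
```

Each call to `runs_ok` costs one run of the program, so a coarse tolerance keeps the number of trials small: about eight runs for the bounds above.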