PDA

View Full Version : Trouble with NonlinearRegression?



grover11
05-04-2009, 02:10 PM
Hi,

I am using JMSL 5.0.1 on WinXP with SP2 and JRE 1.6.0_06. I am using the NonlinearRegression class to generate the coefficients a and b in the equation Y = a * e^(b * x). I generated the initial parameters values for these coefficients with some sample data, and provided them to the setGuess(), where a = 18.755497533160803 and b = 0.00022960312118070167.

I used the initial parameter values above with nls() in R, which by default uses a Gauss-Newton algorithm, and it generated the coefficients for the exponential equation as a = 850.7 and b = 0.0005322.

In addition, I used MS Excel to create a scatterplot with an exponential trendline. The exponential equation has a = 854.06 and c = 0.0005. I?m not sure what algorithm Excel is using to generate these coefficients.

When I generate the coefficients using JMSL, solve() returns a= 71.84244133504096 and b = 0.0020465827799000125. I am puzzled as to why the coefficients for a and b generated by JMSL differ from those generated by R and Excel. Would someone please let me know what is wrong with my implementation below?

My sample data is attached; it is a CSV file with the extension txt.

Thank you.



import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Vector;

import com.imsl.stat.NonlinearRegression;

public class ExponentialCoefficients {

private File csvFile;
private double[] xinput;
private double[] yinput;
private int nobs;
private double estimated_a;
private double estimated_b;

//calculate coefficients a and b in predicted y = a * e^(b * x)
public static void main(String[] args){
try{
ExponentialCoefficients exp = new ExponentialCoefficients();
exp.calculateCoefficients();
}
catch(Exception e){
e.printStackTrace();
}
}

public ExponentialCoefficients() throws Exception{
csvFile = new File("<File path>\\xy_values.txt");
parseCSVLines(readTextFileLines(csvFile));
nobs = xinput.length;
estimated_a = 18.755497533160803;
estimated_b = 0.00022960312118070167;
}

public void calculateCoefficients() throws Exception{

int nparam = 2;
double theta[] = new double[2];
theta[0] = estimated_a;
theta[1] = estimated_b;
NonlinearRegression regression = new NonlinearRegression(nparam);
regression.setGuess(theta);
double coef[] = regression.solve(f);
System.out.println("The computed regression coefficients are {" +
coef[0] + ", " + coef[1] + "}");
}

NonlinearRegression.Function f = new NonlinearRegression.Function() {

public boolean f(double theta[], int iobs, double frq[],
double wt[], double e[]){

boolean iend;

if(iobs < nobs){
wt[0] = 1.0;
frq[0] = 1.0;
iend = true;
//System.out.println("theta[0]: "+theta[0] + "\t theta[1]: " + theta[1]);
double predictedY = theta[0] * Math.exp(theta[1] * xinput[iobs]);
e[0] = yinput[iobs] - predictedY;
} else {
iend = false;
}
return iend;
}
};

//Read in lines from Text file and load lines (rows) into vector.
public Vector<String> readTextFileLines(File textFile)throws Exception {
Vector<String> csvLines = new Vector<String>();
//read file
BufferedReader reader = new BufferedReader(new FileReader(textFile));
//read in each line
String line = null;
while( (line = reader.readLine()) != null ){
csvLines.add(line);
}
reader.close();
return csvLines;
}

//Parse lines into individual values and store them.
public void parseCSVLines(Vector<String> csvLines)throws Exception{
// remove header row
csvLines.remove(0);
//initialize double[][]
xinput = new double[csvLines.size()];
yinput = new double[csvLines.size()];

//populate double[][] with values
for(int index = 0; index < csvLines.size(); index++){
String row = csvLines.get(index);
String[] xyvalues = row.split(",");
xinput[index] = Double.parseDouble(xyvalues[0]);
yinput[index] = Double.parseDouble(xyvalues[1]);
}
}

}//end

totallyunimodular
05-04-2009, 09:42 PM
Your code appears fine, though admittedly I do not know Java all that well. I took your test data (thanks much for including it) and wrote some quick Python/PyIMSL code to test things out (note that PyIMSL makes use of the IMSL C Numerical Libraries as opposed to JMSL):



from numpy import *
from pylab import *
from imsl.stat.nonlinearRegression import nonlinearRegression

def fcn1(nIndep, x, nParams, theta):
return (theta[0]*exp(x[0]*theta[1]))

x, y = load('xy_values.txt', unpack=True, delimiter=',', skiprows=1)
plot(x,y, 'bo')

a = 850.7
b = 0.0005322
domain = arange(min(x)-100, max(x)+100, 1)
plot(domain, a*exp(b*domain), 'r--') #Fit by R's nls()

a = 18.755497533160803
b = 0.00022960312118070167
thetaHat = nonlinearRegression(fcn1, 2, x,y)
print 'a,b fit by nonlinearRegression: ', thetaHat


When I execute this code I get the following warning:


Warning - nonlinearRegression: Five consecutive steps of length "max_step" have been taken; either the function is unbounded below, or has a finite asymptote in some direction or the maximum allowable step size "max_step" is too small.


And the answers are


[ 1.18705366e+02 1.11365266e-03]



Taking a hint from the warning, if I use the maxStep optional argument and bump it up to 1e4 then the warning goes away and I get as answers


[ 8.50735048e+02 5.32144931e-04]



Note that I did not need to give an initial guess, although using the values you suggested leads to the same answer.

So... I have not tested in Java yet but it appears that the issue is simply adjusting the default step size. Did you see the warning message I listed?

ed
05-05-2009, 06:44 AM
I can confirm that in the Java version that if you increase the maximum step size, you get the same answer.

The modification is to add one line (and you can remove the setGuess() method call here too):


NonlinearRegression regression = new NonlinearRegression(nparam);
regression.setMaxStepsize(10000);

which then gives the consistent result:

The computed regression coefficients are {850.7350531604258, 5.321449257469793E-4}

I can also confirm that there is no warning message generated from the JMSL Library. I will have someone from IMSL tech support look into this.

grover11
05-05-2009, 08:46 AM
Ed and Totallyunimodular,

Thank you for the help.

Regards,
Grover11

brian
05-05-2009, 10:59 AM
Please see NonLinearRegression.getErrorStatus() (http://www.vni.com/products/imsl/jmsl/v501/manual/api/com/imsl/stat/NonlinearRegression.html#getErrorStatus())

--brian

grover11
05-06-2009, 05:24 PM
Thanks for the tip, Brian.

grover11
05-11-2009, 01:58 PM
Hi,

Using the suggestions in the previous posts of this thread, I was able to get my code to work for my sample data and a few other sets of data. I've encountered a new issue with another set of data though.

Once again, I am using JMSL 5.0.1 on WinXP and am fitting data to y = a * e^(b * x). Using nls() in R, I get coefficients of a = 2.443e+09 and b = 5.547e-03 for my new sample data. Using MS Excel, I get coefficients of a = 2e+09 and b = 0.0056. On the other hand, JMSL gives me values of 0.0 for both coefficients. Likewise, I receive an error of zero from getErrorStatus(). I tried using an Absolute Tolerance of 0.0 and Relative Tolerance of 0.0, but that didn't resolve my issue. Increasing or decreasing the max iterations and max stepsize don't help either. Does anyone have any suggestions of what may fix my issue?

Attached is the new sample data that causes this issue, which is a CSV file with the extension TXT zipped up. In addition, my updated code follows below.

Thanks in advance for the help.



import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Vector;

import com.imsl.stat.NonlinearRegression;

public class ExponentialCoefficients {

private File csvFile;
private double[] xinput;
private double[] yinput;
private int nobs;

//calculate coefficients a and b in predicted y = a * e^(b * x)
public static void main(String[] args){
try{
ExponentialCoefficients exp = new ExponentialCoefficients();
exp.calculateCoefficients();
}
catch(Exception e){
e.printStackTrace();
}
}

public ExponentialCoefficients() throws Exception{
csvFile = new File("<Path_to_file>\\xy_values1.txt");
parseCSVLines(readTextFileLines(csvFile));
nobs = xinput.length;
}

public void calculateCoefficients() throws Exception{

int nparam = 2;
NonlinearRegression regression = new NonlinearRegression(nparam);
regression.setMaxStepsize(10000);
regression.setMaxIterations(10000);
//regression.setAbsoluteTolerance(0.0);
//regression.setRelativeTolerance(0.0);
double coef[] = regression.solve(f);
System.out.println("Error: " + regression.getErrorStatus());
System.out.println("The computed regression coefficients are {" +
coef[0] + ", " + coef[1] + "}");
}

NonlinearRegression.Function f = new NonlinearRegression.Function() {

public boolean f(double theta[], int iobs, double frq[],
double wt[], double e[]){

boolean iend;

if(iobs < nobs){
wt[0] = 1.0;
frq[0] = 1.0;
iend = true;
double predictedY = theta[0] * Math.exp(theta[1] * xinput[iobs]);
e[0] = yinput[iobs] - predictedY;
} else {
iend = false;
}
return iend;
}
};

//Read in lines from Text file and load lines (rows) into vector.
public Vector<String> readTextFileLines(File textFile)throws Exception {
Vector<String> csvLines = new Vector<String>();
//read file
BufferedReader reader = new BufferedReader(new FileReader(textFile));
//read in each line
String line = null;
while( (line = reader.readLine()) != null ){
csvLines.add(line);
}
reader.close();
return csvLines;
}

//Parse lines into individual values and store them.
public void parseCSVLines(Vector<String> csvLines)throws Exception{
// remove header row
csvLines.remove(0);
//initialize double[][]
xinput = new double[csvLines.size()];
yinput = new double[csvLines.size()];

//populate double[][] with values
for(int index = 0; index < csvLines.size(); index++){
String row = csvLines.get(index);
String[] xyvalues = row.split(",");
xinput[index] = Double.parseDouble(xyvalues[0]);
yinput[index] = Double.parseDouble(xyvalues[1]);
}
}

}//end

ed
05-11-2009, 03:43 PM
I dug a little into the source code and found that the algorithm is exiting due to the gradient tolerance. I don't have it giving the right answer yet because if I tweak that value smaller, then it gives an error code of 4 again.

ed
05-12-2009, 09:32 AM
I dug a little into the source code and found that the algorithm is exiting due to the gradient tolerance. I don't have it giving the right answer yet because if I tweak that value smaller, then it gives an error code of 4 again.

To follow up, the extremely large values of the Y input vector (2e9) are throwing off the gradient calculations. You'll need to either scale the input values or use the "setScale" method to bring values into the expected range of the algorithm.

For this case, expected values of the coefficients are O(1e9) and O(1e-3).

So if we modify the code with the following lines:
regression.setGuess(new double[] {1.e9, 1.e-3});
regression.setScale(new double[] {1.e9, 1.e-3});

Then the correct answers are returned:
The computed regression coefficients are {2.4433672401233315E9, 0.005547308974983051}

No need to change the max step size here.