Pearson Product Moment



jsteinfo
07-30-2009, 02:40 PM
Hi Folks,
Let me start off by apologizing for my ignorance. I am not well versed in statistics. I am a software developer supporting statisticians and engineers.

I have been asked to implement a Regression analysis that results in something called a "Pearson Product Moment". I have been looking through the JMSL reference manual and do not see such a thing.

Does JMSL support this analysis function? I was given this link for the definition of the analysis function: http://www.mnstate.edu/wasson/ed602pearsoncorr.htm but so far do not see such a thing in the JMSL docs.

Thanks!
Jesse

ed
07-31-2009, 08:16 AM
Disclaimer: I am also not a statistician, so I am not sure this applies.

Have a look at the JMSL class ContingencyTable (http://www.vni.com/products/imsl/jmsl/v501/manual/api/com/imsl/stat/ContingencyTable.html). This is the only reference to "Pearson" in the JMSL documentation, and it appears in the context of the chi-squared statistic. If this isn't what you're looking for, I can certainly file an enhancement request to have this functionality considered for a future release.

fik
08-02-2009, 09:37 AM
Jesse,

FYI, like you and Ed, I am not a statistician either. I am just a fan of Karl Pearson, who is credited for his work on Pearson's family of distributions, so I couldn't resist replying to this thread.

If you are just looking for the degree of association between the two variables in your regression analysis routine, below is a code example that calculates the Pearson Product Moment Correlation Coefficient, also called "Pearson's r".

Note: I used the data from the link you provided.
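For reference, here is a sketch of the two formulas the code in this post uses: the definition formula (method 1) and the computational formula that works from raw sums (method 2). The symbols are my own notation, not taken from the linked page: N is the number of (x, y) pairs, and the standard deviations are population values (dividing by N).

```latex
% Definition formula (method 1)
r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N \, \sigma_x \, \sigma_y},
\qquad
\sigma_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2},
\quad
\sigma_y = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2}

% Computational formula from raw sums (method 2)
r = \frac{N \sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{N \sum x_i^2 - \left(\sum x_i\right)^2}\;
          \sqrt{N \sum y_i^2 - \left(\sum y_i\right)^2}}
```

The two are algebraically equivalent, so any difference between them in practice comes only from floating-point rounding.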




public class PearsonProductMoment {

    public static void main(String[] args) {
        // Input data, taken from the example on the linked page
        double[] x = { 3, 7, 2, 9, 8, 4, 1, 10, 6, 5 };
        double[] y = { 11, 1, 19, 5, 17, 3, 15, 9, 15, 8 };
        System.out.println("PPM CC using definition formula: "
                + definition(x, y));
        System.out.println("PPM CC using raw score formula: "
                + rawScore(x, y));
    }

    /*
     * Method 1: calculates the Pearson Product Moment Correlation
     * Coefficient using the definition formula, i.e. the sum of products
     * of deviations divided by N times the two population standard
     * deviations.
     */
    public static double definition(double[] dataX, double[] dataY) {
        final int N = dataX.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < N; i++) {
            meanX += dataX[i];
            meanY += dataY[i];
        }
        meanX /= N;
        meanY /= N;

        double sumProducts = 0.0; // sum of (x - meanX) * (y - meanY)
        double sumSqX = 0.0;      // sum of squared deviations of x
        double sumSqY = 0.0;      // sum of squared deviations of y
        for (int i = 0; i < N; i++) {
            double dx = dataX[i] - meanX;
            double dy = dataY[i] - meanY;
            sumProducts += dx * dy;
            sumSqX += dx * dx;
            sumSqY += dy * dy;
        }

        return sumProducts
                / (N * Math.sqrt(sumSqX / N) * Math.sqrt(sumSqY / N));
    }

    /*
     * Method 2: calculates the Pearson Product Moment Correlation
     * Coefficient using the raw score formula, which needs only the raw
     * sums and no separate pass to compute the means.
     */
    public static double rawScore(double[] dataX, double[] dataY) {
        final int N = dataX.length;
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0, sumYY = 0.0;
        for (int i = 0; i < N; i++) {
            sumX += dataX[i];
            sumY += dataY[i];
            sumYY += dataY[i] * dataY[i];
            sumXX += dataX[i] * dataX[i];
            sumXY += dataX[i] * dataY[i];
        }
        double numerator = N * sumXY - sumX * sumY;
        double denominator = Math.sqrt(N * sumXX - sumX * sumX)
                * Math.sqrt(N * sumYY - sumY * sumY);
        return numerator / denominator;
    }
}


This calculation gives a result consistent with the original example shown at the link you gave.



PPM CC using definition formula: -0.3611811566150104
PPM CC using raw score formula: -0.3611811566150105
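One caveat with both methods above: they assume the two arrays have the same length and that neither variable is constant. With a constant series the denominator is zero and the result is NaN. Below is a minimal sketch of a guarded variant. The class name SafePearson and the choice to throw IllegalArgumentException are my own assumptions for illustration, not anything from JMSL.

```java
// Hypothetical helper (not part of JMSL): Pearson's r with basic input checks.
class SafePearson {

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two data points");
        }
        final int n = x.length;

        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of products and squares of the deviations from the means
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            throw new IllegalArgumentException("r is undefined when a variable is constant");
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = { 3, 7, 2, 9, 8, 4, 1, 10, 6, 5 };
        double[] y = { 11, 1, 19, 5, 17, 3, 15, 9, 15, 8 };
        System.out.println("r = " + pearson(x, y));
    }
}
```

Whether to throw or to return NaN for degenerate input is a design choice; throwing just makes the problem visible earlier.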

totallyunimodular
08-04-2009, 03:18 PM
The Pearson Product Moment is just the "usual" correlation, or r value, one might think of. fik was nice enough to provide code you could use; alternatively, use the JMSL CrossCorrelation class and take the lag = 0 correlation value from the results. For example, here is how to do this in Python using PyIMSL:

In [3]: from imsl.stat.crossCorrelation import crossCorrelation
In [4]: a = [3, 7, 2, 9, 8, 4, 1, 10, 6, 5]
In [5]: b = [11, 1, 19, 5, 17, 3, 15, 9, 15, 8]
In [6]: crossCorrelation(a, b, 1)
Out[6]: array([ 0.54774167, -0.36118116, 0.47072949])

So, at lag = -1 the cross correlation between a and b is 0.54774167.
At lag = +1 the cross correlation is 0.47072949.
And at lag = 0, i.e., the Pearson Product Moment, the cross correlation is -0.36118116 :)
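To make the lag = 0 point concrete without needing the library, here is a plain Java sketch of a lagged cross-correlation. It is not the JMSL CrossCorrelation implementation. The class name LagCorrelation and the convention used (deviations from the full-series means, x at time t paired with y at time t + k, normalized by the full-series sums of squares) are assumptions on my part, chosen because they are consistent with the three values posted above.

```java
import java.util.Arrays;

// Hypothetical illustration (not the JMSL class): cross-correlation at lags -maxLag..maxLag.
class LagCorrelation {

    // Element k + maxLag of the returned array holds the correlation at lag k.
    static double[] crossCorrelation(double[] x, double[] y, int maxLag) {
        final int n = x.length;
        double meanX = Arrays.stream(x).average().orElse(Double.NaN);
        double meanY = Arrays.stream(y).average().orElse(Double.NaN);

        // Full-series sums of squared deviations, used to normalize every lag
        double sumSqX = 0.0, sumSqY = 0.0;
        for (int t = 0; t < n; t++) {
            sumSqX += (x[t] - meanX) * (x[t] - meanX);
            sumSqY += (y[t] - meanY) * (y[t] - meanY);
        }
        double scale = Math.sqrt(sumSqX * sumSqY);

        double[] r = new double[2 * maxLag + 1];
        for (int k = -maxLag; k <= maxLag; k++) {
            double sum = 0.0;
            // Pair x[t] with y[t + k], keeping both indices inside the arrays
            for (int t = Math.max(0, -k); t < Math.min(n, n - k); t++) {
                sum += (x[t] - meanX) * (y[t + k] - meanY);
            }
            r[k + maxLag] = sum / scale;
        }
        return r;
    }

    public static void main(String[] args) {
        double[] a = { 3, 7, 2, 9, 8, 4, 1, 10, 6, 5 };
        double[] b = { 11, 1, 19, 5, 17, 3, 15, 9, 15, 8 };
        // Prints the correlations for lags -1, 0, +1 in that order
        System.out.println(Arrays.toString(crossCorrelation(a, b, 1)));
    }
}
```

At lag 0 every point is paired with its own partner, so the sum in the numerator is exactly the one in the definition formula for Pearson's r, which is why the middle value is the same as the r computed earlier in the thread.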