strange randomu behavior



hcrisp
04-21-2009, 10:30 AM
I thought I understood RANDOMU, but I am getting strange results.



num = 20000
s1 = 1.45
s2 = 5.60
x = 20*(randomu(s1,num)-0.5)
y = 10*(randomu(s2,num))
plot, x, y, psym=3


Note that the cross plot is not uniformly filled in. It should be. Changing the seed values slightly helps, but the pattern is still there:


num = 20000
s1 = 145.
s2 = 560.
x = 20*(randomu(s1,num)-0.5)
y = 10*(randomu(s2,num))
plot, x, y, psym=3


Why is this happening, and how can RANDOMU be called so this effect does not occur?

totallyunimodular
04-24-2009, 01:05 PM
In the first case you are creating two streams of pseudo-random numbers whose starting points are relatively close together. In the second case the two streams start further apart, so there is less cross-correlation between their values. Try experimenting with s1=1 and s2=1.1 and you'll get a single straight line. As the seeds move further apart, the plot of (x,y) will look more and more like white noise.

If you pass in s1 or s2 as undefined variables, the seed is set from the system clock time. This is the best way to avoid the behavior you are seeing, if in fact you want x and y to be uncorrelated. In general, setting the seed to a specific value is useful when you later want to repeat an analysis using the exact same sequence of random numbers. Chapter 13 of the IMSL Statistics Reference has a nice intro on random number generation.
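To see why nearby seeds can give related streams, here is a minimal Python sketch of a multiplicative congruential (Lehmer) generator. It is an illustration under stated assumptions, not RANDOMU itself: the constants are the classic "minimal standard" values (multiplier 16807, modulus 2^31 - 1), and the names `lehmer_states` and `to_unit_interval` are made up for this example. The thread does not say which constants RANDOMU actually uses.

```python
# Sketch of a multiplicative congruential (Lehmer) generator.
# ASSUMPTION: "minimal standard" constants; RANDOMU's real constants
# are not given in this thread.
M = 2**31 - 1  # modulus
A = 16807      # multiplier


def lehmer_states(seed, n):
    """Return the first n integer states produced from an integer seed in [1, M-1]."""
    state = seed
    states = []
    for _ in range(n):
        state = (A * state) % M
        states.append(state)
    return states


def to_unit_interval(states):
    """Scale integer states to floats in (0, 1), like a uniform generator's output."""
    return [s / M for s in states]


# Two streams started from adjacent seeds.
x_states = lehmer_states(1, 20000)
y_states = lehmer_states(2, 20000)

# Because the recurrence is purely multiplicative, the second stream is just
# the first one doubled modulo M: the two streams are not independent.
related = all(y == (2 * x) % M for x, y in zip(x_states, y_states))
print("second stream is 2 * first stream (mod M):", related)
```

With seeds 1 and 2 in this sketch, a cross plot of the two scaled streams falls on the lines y = 2x and y = 2x - 1 instead of filling the unit square. Whether RANDOMU shows exactly this structure depends on its real constants and on how it converts the seed.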

hcrisp
04-27-2009, 07:42 AM
Thanks, that kind of explains it. I can't use undefined seeds, however, since I am calling this in a procedure, and the clock-based seeds occur too closely to be useful. Instead I can pick hard-coded seeds that are far apart:



num = 20000
s1 = 0.
s2 = 1.e7
x = 20*(randomu(s1,num)-0.5)
y = 10*(randomu(s2,num))
plot, x, y, psym=3


That works.

For some reason, however, if I pick numbers too far apart, it does not work:



num = 20000
s1 = 0.
s2 = 1.e15
x = 20*(randomu(s1,num)-0.5)
y = 10*(randomu(s2,num))
plot, x, y, psym=3


Is there any extensive documentation on why this occurs for RANDOMU? I could not find it in ch. 13 of the IMSL doc. Which seed range makes the outputs most random?

totallyunimodular
04-27-2009, 02:29 PM
hcrisp wrote: "I can't use undefined seeds, however, since I am calling this in a procedure, and the clock-based seeds occur too closely to be useful."

I am not sure what you are seeing here, but try the following...



num = 20000
x = 20*(randomu(dne,num)-0.5) & y = 10*randomu(dne,num)
plot, x, y, psym=3

It's hard to imagine how calling your procedure would lead to closer clock times than in the example above, and those results look like white noise. Here is a key point I left out of the previous post: seeds are generally integer-valued, nonnegative, and range over 32-bit integers. I am not sure how RANDOMU deals with a seed of 0, but LONG(1e15) is a negative number. RANDOMU is a PV-WAVE kernel routine that calls underlying standard C libraries. The routines in Chapter 13 of IMSL Statistics are IMSL C routines, but the intro documentation may help explain the behavior of any random number generator.

I don't know if there is any documentation that discusses properties of RNGs with respect to seed ranges. I think you'd have to understand the mathematical properties of the particular RNG being used (in RANDOMU's case it's a multiplicative congruential generator). Again, I don't have any experience with finding "optimal" seeds because I use seeds merely to have reproducible random data. In your case, you may want to describe your application context in more detail, and maybe someone else can chime in on how best to pick seeds according to your criteria.
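The point about seeds ranging over 32-bit integers can be checked with a few lines of Python. This is only a sketch: `fits_int32` and `wrap_to_int32` are hypothetical helpers, and two's-complement wrap-around is an assumption about what an out-of-range conversion might do. The thread does not document how PV-WAVE's LONG() or RANDOMU actually handles such values.

```python
# Range of a signed 32-bit integer.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """True if value, truncated to an integer, fits in a signed 32-bit integer."""
    return INT32_MIN <= int(value) <= INT32_MAX


def wrap_to_int32(value):
    """Truncate to an integer and wrap into the signed 32-bit range.

    ASSUMPTION: two's-complement wrap-around. Real float-to-integer
    conversions of out-of-range values are platform dependent.
    """
    low_bits = int(value) & 0xFFFFFFFF
    return low_bits - 2**32 if low_bits >= 2**31 else low_bits


# The two hard-coded seeds tried earlier in the thread.
for seed in (1.0e7, 1.0e15):
    print(seed, "fits in 32 bits:", fits_int32(seed),
          "| wrapped value is negative:", wrap_to_int32(seed) < 0)
```

Under these assumptions 1.e7 is a valid seed as written, while 1.e15 falls outside the range, so the generator would not start from the value the caller intended.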

ed
04-27-2009, 03:52 PM
Note that the default undefined/clock-based seeds used sequentially actually work quite well. In the last example the variable "dne" is used both times. The first time, it's undefined, and RANDOMU creates a set of random numbers, and then updates the variable. You can re-use this variable and get another set of random numbers.

The results aren't duplicated from run to run, as they would be if you set a seed manually for each function call, but the output doesn't show the pattern seen in the earlier examples. After running the following many times, the results look fairly random.

For example:
WAVE> num = 20000
WAVE> print, blah
1496036564
WAVE> x = randomu(blah, num)
WAVE> print, blah
1091886500
WAVE> y = randomu(blah, num)
WAVE> print, blah
1328259481
WAVE> plot, x, y, psym=3

And here's the wave script:
num = 20000
print, blah
x = randomu(blah, num)
print, blah
y = randomu(blah, num)
print, blah
plot, x, y, psym=3
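The seed-update behavior described in this post can be modeled in Python. This is a sketch, not the RANDOMU implementation: `randomu_like` is a hypothetical stand-in, the generator constants are the "minimal standard" Lehmer values, and seeding an undefined (None) seed from the clock in whole seconds is an assumption made for illustration.

```python
import time

# ASSUMPTION: illustrative Lehmer constants, not RANDOMU's documented ones.
M = 2**31 - 1
A = 16807


def randomu_like(seed, n, clock=time.time):
    """Hypothetical stand-in for RANDOMU.

    Returns (values, updated_seed). Python cannot update the caller's
    variable in place, so the new seed is returned explicitly.
    """
    if seed is None:
        # Undefined seed: derive a starting state in [1, M-1] from the clock.
        seed = int(clock()) % (M - 1) + 1
    state = seed
    values = []
    for _ in range(n):
        state = (A * state) % M
        values.append(state / M)
    return values, state


# Reusing ONE seed variable: the second call continues the same stream.
seed = None
x, seed = randomu_like(seed, 5)
y, seed = randomu_like(seed, 5)
print("same seed variable reused, x == y:", x == y)

# Two SEPARATE undefined seeds initialized at the same clock reading
# start from the same state and repeat each other.
frozen_clock = lambda: 1000.0
x2, s1 = randomu_like(None, 5, clock=frozen_clock)
y2, s2 = randomu_like(None, 5, clock=frozen_clock)
print("separate undefined seeds, x2 == y2:", x2 == y2)
```

In this model the single reused seed yields two different segments of one stream, while two fresh undefined seeds taken within the same clock reading produce identical values.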

totallyunimodular
04-27-2009, 04:39 PM
Thanks for pointing this out, ed.

I should clarify my previous comments in response to hcrisp too: streams of values from RANDOMU will have the same large-sample statistical properties regardless of which seed is used. For hcrisp, the issue appears to be that he wants to pick seeds such that multiple streams from the same RNG have certain cross-correlation properties. With respect to this latter issue, again, I can't offer any advice, and hcrisp likely needs to flesh out his problem context before anyone else can help either.

hcrisp
04-28-2009, 08:55 AM
I beg your pardon, but I realize now that I was calling the undefined seeds incorrectly. When you do as I did and call the two RANDOMU statements with two different undefined seeds, you will see that they give a non-random cross-plot.



num = 20000
info, s1, s2
; S1 UNDEFINED = <Undefined>
; S2 UNDEFINED = <Undefined>
x = 20*(randomu(s1,num)-0.5) & y = 10*randomu(s2,num)
plot, x, y, psym=3


Thanks for the clarification. I now know how best to call it!