当前位置：网站首页>Solving a random number problem

Solving a random number problem

2022-07-25 08:40:00 【Four fires】

There is such an algorithm problem ：

Given one that can generate uniform 1~5 Function of random enumerator , Please design one that can generate uniform 1~7 Function of random enumerator .

That is to say , There is a function that generates random numbers rand5, May return 1、2、3、4、5 this 5 Enumeration values , The probability of each value being returned is strict 1/5, Now we need to design a similar random number function rand7, May return 1、2、3、4、5、6、7 These enumeration values , The probability of each value being returned is strict 1/7.

Cover your paper and think first , Thoughts that come to mind include ：

call rand5 Divided by 5, Multiplied by 7, The range of such results is 7/5~7, Not the desired result ;
Call again and again rand5 function 7 Time , The result is divided by 5, The range of such results is also 7/5 ~ 7, Not the desired result .

What if the title is reversed , It is known that rand7, seek rand5 Well ？

Then I can call rand7, Look at the results , If the result is 1~5, Go straight back to ; If the result is 6、7, Just keep trying again ？

Then come back to reality , How to base it on rand5 seek rand7？

If rand5 + rand5 Result , The scope is 2~10, You can only get 2~7 Value , Can't get 1, Not to the point .

([2019-4-5] Some people say , That can reduce the above result 1 Not to go ？ namely rand5 + rand5 – 1, The range of the number obtained is 1~9, Give up 8、9, It seems that we can , But it's still wrong to think about it carefully ： To abandon this 9 For example , You have to get it twice rand5 It has to be 5, So probability is 1/5*1/5=1/25, This probability is obviously lower than the average probability of a single value . in other words , This method yields 1~7 Every number in is not possible )

But still got a kind of inspiration , Call once rand5, The various possibilities of the result are 5 Kind of , To map to rand7 Of 7 There are two possible outcomes , It's unrealistic . But if you throw it twice , Without considering weight removal , There are 5*5=25 Maybe , Map in some way and keep the final 7 A possibility , But it is a thought worth trying .

Think of the 5*5, So try to build a two-dimensional array arr[5][5], Then every element of the array can represent the possibility of a result , Get the first in the array 7 Elements , Map to respectively 1~7：

[1, 2, 3, 4, 5]
[6, 7, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]

So call rand5 two , Get the abscissa respectively i And the vertical j, If arr[i][j]>0, The retention , Otherwise try again .

This method is not perfect , because 25 There are only 7 One is effective , In most cases, you can only try again , Efficiency is too low .

therefore , In this two-dimensional array, not only the former 7 position , But to keep as much as possible 7 Complete multiple of ：

[1, 2, 3, 4, 5]
[6, 7, 1, 2, 3]
[4, 5, 6, 7, 1]
[2, 3, 4, 5, 6]
[7, 0, 0, 0, 0]

thus , In most cases , Will hit more than 0 The elements of .

Then write the code ：

public class R {
	private static Random random = new Random();
	private static int[][] arr = new int[][]{
		{1,2,3,4,5},
		{6,7,1,2,3},
		{4,5,6,7,1},
		{2,3,4,5,6},
		{7,0,0,0,0}
	};

	public static int rand5(){
		random.setSeed(System.nanoTime());
		return random.nextInt(5) + 1;
	}
	
	public static int rand7() {
		int i = rand5() - 1;
		int j = rand5() - 1;
		
		if (i == 4 && j >= 1)
			return rand7();

		return arr[i][j];
	}
}

Write another main Function test ：

Map<Integer, Integer> counter = new HashMap<Integer, Integer>();

for (int i = 0; i < 10000000; i++) {
	int key = rand7();
	Integer val = counter.get(key);
	if (null == val)
		val = 0;
	val++;
	counter.put(key, val);
}

for (Map.Entry<Integer, Integer> entry : counter.entrySet()) {
	System.out.println(entry.getKey() + ": " + entry.getValue());
}

Repeat the test ten million times , But from the results , The distribution is not random enough ：

The test was repeated several times , It's all output 2 Mostly . In terms of this amount of data , I don't think it's a coincidence .

In order to make this possible difference more obvious , I will rand5 seek rand7 Changed from rand2 seek rand3：

public class R {
	private static Random random = new Random();
	private static int[][] arr = new int[2][2];
	static {
		arr[0][0] = 1;
		arr[0][1] = 2;
		arr[1][0] = 3;
		arr[1][1] = 0;
	}

	public static int rand2(){
		random.setSeed(System.nanoTime());
		return random.nextInt(2) + 1;
	}
	
	public static int rand3() {
		int i = rand2() - 1;
		int j = rand2() - 1;

		if (i == 1 && j == 1)
			return rand3();

		return arr[i][j];
	}
}

Test again 10 million times , It turns out to be a surprise ：

One: 5043264
Two: 2472499
Three: 2484237

There is a double difference .

What's going on ？ I began to wonder Java The result of this pseudo-random function of （ It is impossible to realize absolute randomness in the computer world ） Not random enough , So I wrote a program to call 10 million times Java From the pseudo-random function of ：

Map<Integer, Integer> counter = new HashMap<Integer, Integer>();

for (int i = 0; i < 10000000; i++) {
	random.setSeed(System.nanoTime());
	int key = random.nextInt(7) + 1;
	
	Integer val = counter.get(key);
	if (null == val)
		val = 0;
	val++;
	counter.put(key, val);
}

for (Map.Entry<Integer, Integer> entry : counter.entrySet()) {
	System.out.println(entry.getKey() + ": " + entry.getValue());
}

From the results, the distribution is very uniform ：

I suddenly feel puzzled .

So I reexamine my thinking , I still feel there is nothing wrong . Although the overall results initially have 25 individual , But before 21 The results of each are consistent , The last four lose and start again , Continuous testing can still ensure equal probability of results . In essence, this method is statistically called “Reject Sampling”.

I consulted a mathematician @ Wanjing ink green , He said there was nothing wrong with thinking , If there's a problem , It can only be a matter of code .

There is actually another way , It is similar in essence , According to ：

5 * (rand5()-1) + rand5()

The result of the above formula can be obtained from 1 To 25 All the results , And obviously this 25 When results appear , these two items. rand5() Can be uniquely determined by the return value , Therefore, the probability of their occurrence is equal to each other . So we can use the above formula , When the result is greater than 3*7=21 Recalculate when , Otherwise, it returns divided by 7 The remainder of ：

public class R {
	private static Random random = new Random();

	public static int rand5() {
		random.setSeed(System.nanoTime());
		return random.nextInt(5) + 1;
	}

	public static int rand7() {
		int i = rand5();
		int j = rand5();

		int res = 5 * (i - 1) + j;

		if (res > 21)
			return rand7();

		return res % 7 + 1;
	}
}

Also ran several tests , Test 10 million pieces of data each time , This time, I found that the larger number went to 3 It's gone. ：

In this way, I feel a little enlightened , I think it's because Java Pseudo random number generation method , The generated number is not random enough ？ Although it looks random , But that's just what it looks like . When used “ Small random ” To generate “ Big random ” When , Those non random defects are amplified . And compare rand2 Generate rand3, and rand5 Generate rand7, Obviously the former “ Zoom in ” The multiple of , So in the final result ,“ Randomness ” Look bad .

To further test this conjecture , I began to think about whether I could make the seeds of random numbers change more . Because the random number seed currently used is System.nanoTime(), This method looks like nanosecond , In fact, it's just ：

Returns the current value of the most precise available system timer, in nanoseconds.

I think it is much more accurate than milliseconds in my experiment , But it is only guaranteed to be as accurate as possible .

Good. , To verify or partially verify such conjectures , Now suppose that this conjecture is correct , Then we can draw such a conclusion ：

If the random number seed is replaced by System.currentTimeMillis(), in other words , Change to milliseconds , Then the final result should be less random ;
If I take a few milliseconds before each random number , Make the time seed difference between every two increases , It should be possible to see an increase in the randomness of the final result .

（ notes , The version I tested JDK The treatment method for the set seed is ：seed = (seed ^ 0x5DEECE66DL) & ((1L << 48) -1).）

ok , Now let's verify the first , In order to make the result as obvious as possible , Use rand2 Generate rand3 The plan of . Change the use of nanoseconds as random number seeds to milliseconds as random number seeds , It turned out to be ：

One: 10000000
Two: 0
Three: 0

In other words , The abscissa and ordinate of the two-dimensional array are actually in 10 million tests , All the results are the same , That is, in most cases i and j All operations are completed within the same millisecond .

Now let's verify the second , Before taking random numbers every time , Sleep 3 millisecond , Of course , This 3 Milliseconds must also be inaccurate 3 millisecond . In order to increase the sleep time , I can get the final result within my patient time , I can't test 10 million times , My test case is changed to test 10000 times , The result is ：

One: 3691
Two: 3103
Three: 3206

Sure enough , The uniformity of distribution is much better .

The problem is not over , In order to eliminate the pseudo randomness caused by this seed similarity as much as possible , In fact, you can only initialize the seed once , Then call continuously inside the iterative method Random Of nextInt Just the way , This is also a more reasonable way to use the random number generator in this case ：

public class R {
	private static Random random = new Random(System.nanoTime());
	private static int[][] arr = new int[][]{
		{1,2,3,4,5},
		{6,7,1,2,3},
		{4,5,6,7,1},
		{2,3,4,5,6},
		{7,0,0,0,0}
	};

	public static int rand5(){
		return random.nextInt(5) + 1;
	}
	
	public static int rand7() {
		int i = rand5() - 1;
		int j = rand5() - 1;
		
		if (i == 4 && j >= 1)
			return rand7();

		return arr[i][j];
	}

	public static void main(String[] args) {
		Map<Integer, Integer> counter = new HashMap<Integer, Integer>();

		for (int i = 0; i < 10000000; i++) {
			int key = rand7();
			Integer val = counter.get(key);
			if (null == val)
				val = 0;
			val++;
			counter.put(key, val);
		}

		for (Map.Entry<Integer, Integer> entry : counter.entrySet()) {
			System.out.println(entry.getKey() + ": " + entry.getValue());
		}
	}
}

This time, , Not only is it evenly distributed , And the execution speed is much faster ：

It was fun .

The articles are all original without special indication , It shall not be used for any commercial purpose without permission , Please keep the integrity of reprint and indicate the source link 《 Four fire's nagging 》

原网站

版权声明
本文为[Four fires]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/201/202207191408520294.html

当前位置：网站首页>Solving a random number problem

Solving a random number problem

边栏推荐

猜你喜欢

随机推荐