当前位置：网站首页>Snowflake algorithm implemented in go language

Snowflake algorithm implemented in go language

2022-06-24 16:14:00 【luozhiyun】

Please state the source of reprint ~, This article was published at luozhiyun The blog of ：https://www.luozhiyun.com/archives/527

Every time I take a long vacation at home , Always want to find some simple examples to see the implementation principle , Let's take a look at Go Language snowflake algorithm .

Introduce

Sometimes in business , You need to use some unique ID, To record the identification of one of our data . The most commonly used are the following ：UUID、 Database auto increment primary key 、Redis Of Incr Command to get a unique value . Now let's talk about their advantages and disadvantages , In order to introduce our distributed snowflake algorithm .

UUID

First of all UUID , It is from 128 Bit binary composition , It's usually converted to hexadecimal , And then use String Express . In order to ensure UUID Uniqueness , The specification defines including network card MAC Address 、 Time stamp 、 The name is empty （Namespace）、 Random or pseudorandom number 、 Time series and other elements , And generating from these elements UUID The algorithm of .

UUID There are five versions ：

edition 1： Based on timestamps and mac Address
edition 2： Based on timestamps ,mac Address and POSIX UID/GID
edition 3： be based on MD5 The hash algorithm
edition 4： Based on random numbers
edition 5： be based on SHA-1 The hash algorithm

UUID Advantages and disadvantages :

The advantage is that the code is simple , Good performance . The disadvantage is that there's no sort , There's no guarantee of sequential increments ; Second, it's too long. It's quite long , The storage database takes up a lot of space , Not conducive to retrieval and sorting .

Database auto increment primary key

If using mysql database , So by setting the primary key to auto_increment It's the only one that's easiest to achieve monotonic increment ID Methods , And it's also easy to sort and index .

But the disadvantages are obvious , Due to over reliance on databases , Then, limited by the performance of the database, the concurrency is not high ; Next, if the amount of data is too large, it will bring problems to sub database and sub table ; And if the database goes down , Then this function can't be used .

Redis

Redis At present, it is an indispensable existence in many projects , stay Redis There are two orders in Incr、IncrBy , because Redis It is single threaded, so these two instructions can guarantee atomicity and achieve the goal of generating unique values , And the performance is very good .

But in Redis in , Even if there is AOF and RDB , But there will still be data loss , It could cause ID repeat ; Then there is the need to rely on Redis , If it's unstable , Then it will affect ID Generate .

Snowflake

Through the above analysis , Finally, our distributed snowflake algorithm is introduced Snowflake , It was first twitter Internal use of the distributed environment under the unique ID generating algorithm . stay 2014 In open source . The open source version is created by scala To write , You can find this version at another address .

https://github.com/twitter-archive/snowflake/tags

It has the following characteristics ：

It can meet the requirements of high concurrency distributed system environment ID No repetition ;
Based on timestamps , It can ensure the basic orderly increase ;
Independent of third-party libraries or middleware ;

Realization principle

Snowflake The structure is a problem 64bit Of int64 Data of type . as follows ：

snowflake

Location	size	effect
0～11bit	12bits	Serial number , It is used to generate different ID, Recordable 4095 individual
12～21bit	10bits	10bit Used to record machines ID, It can be recorded in total 1024 Taiwan machine
22～62bit	41bits	Used to record timestamps , Here you can record 69 year
63bit	1bit	Sign bit , Don't deal with it

It's just a will 64bit The general standard of division , The general situation can be adjusted according to their own business situation . For example, there are only machines in business at present 10 Taiwan is expected to increase to three figures in the future , And the need for multi room deployment ,QPS In a few years it will grow to a million .

So for millions QPS Divide equally 10 On each machine, each machine can undertake a hundred level request ,12 bit Your serial number is enough . For machines that will grow to three digits in the future , And we need multi machine room deployment, we just need to 10 bits Of work id To break up , Division 3 bits To represent the number of computer rooms, a total of Representatives can be deployed 8 A computer room , other 7bits Represents the number of machines, represents that each computer room can be deployed 128 Taiwan machine . Then the data format will look like this ：

Code implementation

Implementation steps

In fact, after understanding the above data structure , Need to implement a snowflake algorithm is very simple , The steps are roughly as follows ：

Gets the current millisecond timestamp ;
Compare the current millisecond timestamp with the last saved timestamp ;
1. If it is equal to the last saved timestamp , So for the serial number sequence Add one ;
2. If it's not equal , So set it directly sequence by 0 that will do ;
Then, the snow algorithm needs to return by or operation int64 Return value .

Code implementation

First we need to define a Snowflake Structure ：

type Snowflake struct {
	sync.Mutex					//  lock 
	timestamp    int64	//  Time stamp  , millisecond 
	workerid     int64  //  Work node 
	datacenterid int64	//  Data center room id
	sequence     int64	//  Serial number 
}

Then we need to define some constants , It is convenient for us to carry out bit operation when using snowflake algorithm ：

const (
  epoch             = int64(1577808000000)                           //  Set the start time ( Time stamp / millisecond )：2020-01-01 00:00:00, The period of validity 69 year 
	timestampBits     = uint(41)                                       //  Number of timestamps occupied 
	datacenteridBits  = uint(2)                                        //  Data Center id The number of digits 
	workeridBits      = uint(7)                                        //  machine id The number of digits 
	sequenceBits      = uint(12)                                       //  The number of digits occupied by the sequence 
	timestampMax      = int64(-1 ^ (-1 << timestampBits))              //  Timestamp maximum 
	datacenteridMax   = int64(-1 ^ (-1 << datacenteridBits))           //  The largest data center supported id Number 
	workeridMax       = int64(-1 ^ (-1 << workeridBits))               //  The largest machine supported id Number 
	sequenceMask      = int64(-1 ^ (-1 << sequenceBits))               //  The largest sequence supported id Number 
	workeridShift     = sequenceBits                                   //  machine id Shift left digit 
	datacenteridShift = sequenceBits + workeridBits                    //  Data Center id Shift left digit 
	timestampShift    = sequenceBits + workeridBits + datacenteridBits //  The time stamp is shifted to the left by a few digits 
)

It needs to be noted that -1 In binary, it means ：

11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111

So I want to ask for it 41bits Of timestamp The maximum value can be -1 Shift left 41 position , obtain ：

11111111 11111111 11111110 00000000 00000000 00000000 00000000 00000000

Then talk to -1 Conduct ^ Exclusive or operation ：

00000000 00000000 00000001 11111111 11111111 11111111 11111111 11111111

That means 41bits Of timestamp Maximum .datacenteridMax、workeridMax Also in the same way .

Then we can set up a NextVal Function to get Snowflake Back to ID 了：

func (s *Snowflake) NextVal() int64 {
	s.Lock()
	now := time.Now().UnixNano() / 1000000 //  Turn millisecond 
	if s.timestamp == now {
		//  When the same timestamp （ precision ： millisecond ） Next generation id The serial number will be added 
		s.sequence = (s.sequence + 1) & sequenceMask
		if s.sequence == 0 {
			//  If the current sequence exceeds 12bit length , You have to wait for the next millisecond 
			//  The next millisecond will use sequence:0
			for now <= s.timestamp {
				now = time.Now().UnixNano() / 1000000
			}
		}
	} else {
		//  Different timestamps （ precision ： millisecond ） Use the serial number directly ：0
		s.sequence = 0
	}
	t := now - epoch
	if t > timestampMax {
		s.Unlock()
		glog.Errorf("epoch must be between 0 and %d", timestampMax-1)
		return 0
	}
	s.timestamp = now
	r := int64((t)<<timestampShift | (s.datacenterid << datacenteridShift) | (s.workerid << workeridShift) | (s.sequence))
	s.Unlock()
	return r
}

The above code is also very simple , Look at the notes and you should understand , I'm going to talk about the last one that came back r What does a series of bit operations mean .

First t It means the present distance epoch The time difference , We epoch What is set during initialization is 2020-01-01 00:00:00, So for 41bit Of timestamp It will be in 69 Years later . Yes t After moving to the left , lower than timestampShift All over the place 0 , from datacenterid、workerid、sequence Take or fill .

In the current example , If the current time is 2021/01/01 00:00:00, The result of bit operation is shown in the figure below ：

（ I think the picture is too small , You can see it here ：https://img.luozhiyun.com/20210502181513.png）

luozhiyun cool

原网站

版权声明
本文为[luozhiyun]所创，转载请带上原文链接，感谢
https://yzsam.com/2021/05/20210503205259117d.html