当前位置:网站首页>Snowflake algorithm implemented in go language
Snowflake algorithm implemented in go language
2022-06-24 16:14:00 【luozhiyun】
Please state the source of reprint ~, This article was published at luozhiyun The blog of :https://www.luozhiyun.com/archives/527
Every time I take a long vacation at home , Always want to find some simple examples to see the implementation principle , Let's take a look at Go Language snowflake algorithm .
Introduce
Sometimes in business , You need to use some unique ID, To record the identification of one of our data . The most commonly used are the following :UUID、 Database auto increment primary key 、Redis Of Incr Command to get a unique value . Now let's talk about their advantages and disadvantages , In order to introduce our distributed snowflake algorithm .
UUID
First of all UUID , It is from 128 Bit binary composition , It's usually converted to hexadecimal , And then use String Express . In order to ensure UUID Uniqueness , The specification defines including network card MAC Address 、 Time stamp 、 The name is empty (Namespace)、 Random or pseudorandom number 、 Time series and other elements , And generating from these elements UUID The algorithm of .
UUID There are five versions :
- edition 1: Based on timestamps and mac Address
- edition 2: Based on timestamps ,mac Address and
POSIX UID/GID - edition 3: be based on MD5 The hash algorithm
- edition 4: Based on random numbers
- edition 5: be based on SHA-1 The hash algorithm
UUID Advantages and disadvantages :
The advantage is that the code is simple , Good performance . The disadvantage is that there's no sort , There's no guarantee of sequential increments ; Second, it's too long. It's quite long , The storage database takes up a lot of space , Not conducive to retrieval and sorting .
Database auto increment primary key
If using mysql database , So by setting the primary key to auto_increment It's the only one that's easiest to achieve monotonic increment ID Methods , And it's also easy to sort and index .
But the disadvantages are obvious , Due to over reliance on databases , Then, limited by the performance of the database, the concurrency is not high ; Next, if the amount of data is too large, it will bring problems to sub database and sub table ; And if the database goes down , Then this function can't be used .
Redis
Redis At present, it is an indispensable existence in many projects , stay Redis There are two orders in Incr、IncrBy , because Redis It is single threaded, so these two instructions can guarantee atomicity and achieve the goal of generating unique values , And the performance is very good .
But in Redis in , Even if there is AOF and RDB , But there will still be data loss , It could cause ID repeat ; Then there is the need to rely on Redis , If it's unstable , Then it will affect ID Generate .
Snowflake
Through the above analysis , Finally, our distributed snowflake algorithm is introduced Snowflake , It was first twitter Internal use of the distributed environment under the unique ID generating algorithm . stay 2014 In open source . The open source version is created by scala To write , You can find this version at another address .
https://github.com/twitter-archive/snowflake/tags
It has the following characteristics :
- It can meet the requirements of high concurrency distributed system environment ID No repetition ;
- Based on timestamps , It can ensure the basic orderly increase ;
- Independent of third-party libraries or middleware ;
Realization principle
Snowflake The structure is a problem 64bit Of int64 Data of type . as follows :
Location | size | effect |
|---|---|---|
0~11bit | 12bits | Serial number , It is used to generate different ID, Recordable 4095 individual |
12~21bit | 10bits | 10bit Used to record machines ID, It can be recorded in total 1024 Taiwan machine |
22~62bit | 41bits | Used to record timestamps , Here you can record 69 year |
63bit | 1bit | Sign bit , Don't deal with it |
It's just a will 64bit The general standard of division , The general situation can be adjusted according to their own business situation . For example, there are only machines in business at present 10 Taiwan is expected to increase to three figures in the future , And the need for multi room deployment ,QPS In a few years it will grow to a million .
So for millions QPS Divide equally 10 On each machine, each machine can undertake a hundred level request ,12 bit Your serial number is enough . For machines that will grow to three digits in the future , And we need multi machine room deployment, we just need to 10 bits Of work id To break up , Division 3 bits To represent the number of computer rooms, a total of Representatives can be deployed 8 A computer room , other 7bits Represents the number of machines, represents that each computer room can be deployed 128 Taiwan machine . Then the data format will look like this :
Code implementation
Implementation steps
In fact, after understanding the above data structure , Need to implement a snowflake algorithm is very simple , The steps are roughly as follows :
- Gets the current millisecond timestamp ;
- Compare the current millisecond timestamp with the last saved timestamp ;
- If it is equal to the last saved timestamp , So for the serial number sequence Add one ;
- If it's not equal , So set it directly sequence by 0 that will do ;
- Then, the snow algorithm needs to return by or operation int64 Return value .
Code implementation
First we need to define a Snowflake Structure :
type Snowflake struct {
sync.Mutex // lock
timestamp int64 // Time stamp , millisecond
workerid int64 // Work node
datacenterid int64 // Data center room id
sequence int64 // Serial number
}Then we need to define some constants , It is convenient for us to carry out bit operation when using snowflake algorithm :
const ( epoch = int64(1577808000000) // Set the start time ( Time stamp / millisecond ):2020-01-01 00:00:00, The period of validity 69 year timestampBits = uint(41) // Number of timestamps occupied datacenteridBits = uint(2) // Data Center id The number of digits workeridBits = uint(7) // machine id The number of digits sequenceBits = uint(12) // The number of digits occupied by the sequence timestampMax = int64(-1 ^ (-1 << timestampBits)) // Timestamp maximum datacenteridMax = int64(-1 ^ (-1 << datacenteridBits)) // The largest data center supported id Number workeridMax = int64(-1 ^ (-1 << workeridBits)) // The largest machine supported id Number sequenceMask = int64(-1 ^ (-1 << sequenceBits)) // The largest sequence supported id Number workeridShift = sequenceBits // machine id Shift left digit datacenteridShift = sequenceBits + workeridBits // Data Center id Shift left digit timestampShift = sequenceBits + workeridBits + datacenteridBits // The time stamp is shifted to the left by a few digits )
It needs to be noted that -1 In binary, it means :
11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111
So I want to ask for it 41bits Of timestamp The maximum value can be -1 Shift left 41 position , obtain :
11111111 11111111 11111110 00000000 00000000 00000000 00000000 00000000
Then talk to -1 Conduct ^ Exclusive or operation :
00000000 00000000 00000001 11111111 11111111 11111111 11111111 11111111
That means 41bits Of timestamp Maximum .datacenteridMax、workeridMax Also in the same way .
Then we can set up a NextVal Function to get Snowflake Back to ID 了 :
func (s *Snowflake) NextVal() int64 {
s.Lock()
now := time.Now().UnixNano() / 1000000 // Turn millisecond
if s.timestamp == now {
// When the same timestamp ( precision : millisecond ) Next generation id The serial number will be added
s.sequence = (s.sequence + 1) & sequenceMask
if s.sequence == 0 {
// If the current sequence exceeds 12bit length , You have to wait for the next millisecond
// The next millisecond will use sequence:0
for now <= s.timestamp {
now = time.Now().UnixNano() / 1000000
}
}
} else {
// Different timestamps ( precision : millisecond ) Use the serial number directly :0
s.sequence = 0
}
t := now - epoch
if t > timestampMax {
s.Unlock()
glog.Errorf("epoch must be between 0 and %d", timestampMax-1)
return 0
}
s.timestamp = now
r := int64((t)<<timestampShift | (s.datacenterid << datacenteridShift) | (s.workerid << workeridShift) | (s.sequence))
s.Unlock()
return r
}The above code is also very simple , Look at the notes and you should understand , I'm going to talk about the last one that came back r What does a series of bit operations mean .
First t It means the present distance epoch The time difference , We epoch What is set during initialization is 2020-01-01 00:00:00, So for 41bit Of timestamp It will be in 69 Years later . Yes t After moving to the left , lower than timestampShift All over the place 0 , from datacenterid、workerid、sequence Take or fill .
In the current example , If the current time is 2021/01/01 00:00:00, The result of bit operation is shown in the figure below :
( I think the picture is too small , You can see it here :https://img.luozhiyun.com/20210502181513.png)
边栏推荐
- Detailed explanation of estab of Stata regression table output
- Build go command line program tool chain
- 中国产品经理的没落:从怀恋乔布斯开始谈起
- Here comes Wi Fi 7. How strong is it?
- 2021-05-02: given the path of a file directory, write a function
- 企业安全攻击面分析工具
- 一文详解JackSon配置信息
- 日志记录真没你想的那么简单
- Cap: multiple attention mechanism, interesting fine-grained classification scheme | AAAI 2021
- Efficient tools commonly used by individuals
猜你喜欢

一文理解OpenStack网络

The equipment is connected to the easycvr platform through the national standard gb28181. How to solve the problem of disconnection?

微信公众号调试与Natapp环境搭建

C. Three displays(动态规划)Codeforces Round #485 (Div. 2)

Understanding openstack network

Recommend several super practical data analysis tools

几种常见的DoS攻击

Cap: multiple attention mechanism, interesting fine-grained classification scheme | AAAI 2021

Build go command line program tool chain
MySQL Advanced Series: Locks - Locks in InnoDB
随机推荐
[log service CLS] Tencent cloud log4j/logback log collection best practices
Install the imagemagick7.1 library and the imageick extension for PHP
The decline of China's product managers: starting from the nostalgia for jobs
用 Oasis 开发一个跳一跳(一)—— 场景搭建
2021-04-22: given many line segments, each line segment has two numbers [start, end],
Cap: multiple attention mechanism, interesting fine-grained classification scheme | AAAI 2021
Global and Chinese markets of stainless steel barbecue ovens 2022-2028: Research Report on technology, participants, trends, market size and share
Golang+redis distributed mutex
Step by step import RHEL image to Tencent cloud
60 个神级 VS Code 插件!!
How to implement SQLSERVER database migration in container
Parameterized tests guide in junit5
Some experiences of project K several operations in the global template
Cap: multiple attention mechanism, interesting fine-grained classification scheme | AAAI 2021
Implement Domain Driven Design - use ABP framework - domain logic & application logic
MySQL進階系列:鎖-InnoDB中鎖的情况
C. Three displays(动态规划)Codeforces Round #485 (Div. 2)
One article explains Jackson configuration information in detail
Istio FAQ: region awareness does not take effect
Wechat official account debugging and natapp environment building