当前位置:网站首页>How to quickly convert stock code into int in quantitative trading?
How to quickly convert stock code into int in quantitative trading?
2022-06-24 01:11:00 【beyondma】
Recently, the author in the great God communication of quantitative trading , Receive such a demand , You need to quickly convert stock codes into shaping variables , That is, the newly received stock trading information , Quickly combine with historical stock information , So as to make rapid decision through trading strategy .
Because quantitative trading speed is the lifeline , Therefore, it is too slow to query the historical data of the same stock in the database directly through the stock code . At present, the common way is to map the stock code directly into an integer shape , The shape after mapping is the memory address of the historical data , This is the most effective way . Here's the picture :
Because quantitative transactions are generally armed to the teeth , So resources are basically not a problem , In a word, we must be fast .
Solution analysis
Generally speaking, the current practice is to use xxhash Wait for a high-speed hash algorithm to do this conversion , however xxhash The time complexity is still a little high for quantifying the scene , Then, before putting forward the optimization scheme, we will analyze this requirement :
1. The number of stock codes to be converted is only 20000 : in 、 beautiful 、 harbor 、 The total number of Listed Companies in mainstream markets such as Europe is about tens of thousands , But different markets generally use different quantitative models and Strategies , Stock code the first mock exam can follow and futures. 、 Generally, the number of options and other trading varieties will not exceed 10000 , Not more than 2 ten thousand .
2. Stock codes are mostly composed of 4 A composition , Also some such as io2111-c-4900 Such long code .
Solution design ideas
At present, such as xxhash The biggest characteristic of high-speed hash algorithm is stability , No matter how long the string is, it can be converted into a... In a stable time int value , however xxhash Some advantages of modern computer architecture such as cache are not fully utilized . In fact, the scheme of converting string into shaping is similar to the memory management strategy of modern operating system . So I plan to learn from the memory mapping scheme . Design a higher speed scheme from the following aspects .
1. Make the most of cache : We know that the current mainstream operating systems are hierarchical in memory address mapping , And through MMU To accelerate . And considering the upper limit of the number of stock codes we need to convert, that is 2 About ten thousand , Therefore, we also need to consider using the front... In the stock code 1 To 2 Bit to establish superior index , And try to compress the size of the index , So that it can be loaded into L1 In the L1 cache .
2. Try to avoid judging branches , That's what I said before 《 Why are modern programming languages so annoying if-else structure 》 It is stated in , Branch prediction is often an important culprit of performance degradation .
Solution and code
1. Save the strings of all stock codes into an array and sort , The array subscript is what you want to convert int
2. Index the first two digits of the code , The start and end sequence numbers recorded in the overall sort array , Such as szjc The first two of sz It's the index , adopt map Record sz Start and end serial numbers of all stock codes at the beginning , from A According to the stock market, the data of this index is about 300 Multiple , Consider that each member is composed of a two digit string (2byte) And a plastic surgeon (4byte) form , altogether 6byte, that 6*350=2.2k,map The storage space complexity is generally 3 Double redundancy ,2.2*3=6.6k. This size can basically enter L1 First level cache .
3 Search through the first two indexes , Copy the array members at the corresponding start and end positions into a new array , The members of this subarray generally do not exceed 1000 individual , Average each 5 Characters , that 5k About the size of the space , Transfer to level one L1 cache, It should also be relatively simple , Next, use binary search to confirm the serial number , So you get the corresponding int value .
package main
import (
"fmt"
"math/rand"
"sort"
"strings"
"time"
)
func binFind(data []string, item string) int {
if len(data) == 0 {
return -1
}
i, j := 0, len(data)-1 // Double pointer , Be careful j The value of is the end subscript , It's not the length
for {
mid := (i + j) / 2 // You need to round down here ,Go Default . Other languages need attention
result := strings.Compare(item, data[mid])
if result == 0 {
return mid
}
if result < 0 {
j = mid - 1 // Pay attention to the boundary value
} else {
i = mid + 1 // Pay attention to the boundary value
}
if i > j { // Out-of-service i And j
break
}
}
return -1
}
func main() {
var length int = 20000
codeGen := "abcdefghijklmnopgrstuvwxyz1234567890"
codeGroup := make([]string, length, length)
codeOrderedGroup := make([]string, length, length)
codeIndex := make(map[string]int)
for j := 0; j < length; j++ {
code := ""
randomLen := rand.Intn(8) + 2
for i := 0; i < randomLen; i++ {
random := rand.Intn(len(codeGen) - 1)
code = code + codeGen[random:random+1]
}
codeGroup[j] = code
codeOrderedGroup[j] = code
}
sort.Strings(codeOrderedGroup)
for k, v := range codeOrderedGroup {
prefix := v[0:2]
index, ok := codeIndex[prefix]
if ok {
codeIndex[prefix] = index&0xFFFF0000 + k
} else {
codeIndex[prefix] = k<<16 + k
}
}
now := time.Now().UnixNano()
// The following converts the strings in the random array to int The way of shape
for _, v := range codeGroup {
prefix := v[0:2]
index := codeIndex[prefix]
startIndex := index >> 16
endIndex := index & 0x0000FFFF
subcodeGroup := codeOrderedGroup[startIndex : endIndex+1]
result := binFind(subcodeGroup, v)
//binFind(subcodeGroup, v)
fmt.Println(result+startIndex, v)
}
fmt.Println(time.Now().UnixNano() - now)
}
Of course, there is room for further optimization , For example, most of the stock codes received in real-time trading data are similar in alphabetical order , So the generation of this subarray , Binary search may also be optimized in a more greedy way .
And at the high end CPU You can also consider using two variables to record the start and end indexes of the subarray , Avoid the current way of shift calculation , This will be more efficient .
边栏推荐
- [applet] realize the effect of double column commodities
- setfacl命令的基本用法
- 同行评议论文怎么写
- An accident caused by a MySQL misoperation, and the "high availability" cannot withstand it!
- Solve the problem that Base64 compressed files are extracted with spaces after post request
- 实时计算框架:Spark集群搭建与入门案例
- ShardingSphere-proxy-5.0.0容量范围分片的实现(五)
- What are the two types of digital factories
- 飞桨产业级开源模型库:加速企业AI任务开发与应用
- Arm learning (7) symbol table and debugging
猜你喜欢

Pad User Guide
![[applet] when compiling the preview applet, a -80063 error prompt appears](/img/4e/722d76aa0ca3576164fbed4e2c4db2.png)
[applet] when compiling the preview applet, a -80063 error prompt appears

这不会又是一个Go的BUG吧?

Error reported using worker: uncaught domexception: failed to construct 'worker': script at***

What are the two types of digital factories

Everything I see is the category of my precise positioning! Open source of a new method for saliency map visualization

C language: sorting with custom functions

分别用SVM、贝叶斯分类、二叉树、CNN实现手写数字识别

Real time computing framework: Flink cluster construction and operation mechanism
![[machine learning] linear regression prediction](/img/74/9b5067bb9057049c998898ff2457f1.png)
[machine learning] linear regression prediction
随机推荐
Installation and use of winscp and putty
C language: how to solve the problem of hundreds of horses and loads
[applet] realize the effect of double column commodities
What is memory out of order access?
Cross domain and jsonp
[CVPR 2020] conference version: a physics based noise formation model for extreme low light raw denoising
持续测试和质量保障的关系
[applet] indicator of relative path and absolute path
Graduation project - thesis writing notes [design topic type, thesis writing details, design materials]
【ICCV Workshop 2021】基于密度图的小目标检测:Coarse-grained Density Map Guided Object Detection in Aerial Images
[technical grass planting] take you to Tencent cloud's private cloud disk in ten minutes
Alibaba interview question: multi thread related
Open source model library of flying propeller industry: accelerating the development and application of enterprise AI tasks
JS stack memory
Application configuration management, basic principle analysis
2021-11-21: map[i][j] = = 0, which means that (I, J) is an ocean. If you cross it, the cost will be
【SPRS J P & RS 2022】小目标检测模块:A Normalized Gaussian Wasserstein Distance for Tiny Object Detection
An accident caused by a MySQL misoperation, and the "high availability" cannot withstand it!
Vs2022 save formatting plug-in
跨域和JSONP