当前位置:网站首页>How to quickly convert stock code into int in quantitative trading?

How to quickly convert stock code into int in quantitative trading?

2022-06-24 01:11:00 beyondma

Recently, the author in the great God communication of quantitative trading , Receive such a demand , You need to quickly convert stock codes into shaping variables , That is, the newly received stock trading information , Quickly combine with historical stock information , So as to make rapid decision through trading strategy .

Because quantitative trading speed is the lifeline , Therefore, it is too slow to query the historical data of the same stock in the database directly through the stock code . At present, the common way is to map the stock code directly into an integer shape , The shape after mapping is the memory address of the historical data , This is the most effective way . Here's the picture :

Because quantitative transactions are generally armed to the teeth , So resources are basically not a problem , In a word, we must be fast .

Solution analysis

Generally speaking, the current practice is to use xxhash Wait for a high-speed hash algorithm to do this conversion , however xxhash The time complexity is still a little high for quantifying the scene , Then, before putting forward the optimization scheme, we will analyze this requirement :

1. The number of stock codes to be converted is only 20000 : in 、 beautiful 、 harbor 、 The total number of Listed Companies in mainstream markets such as Europe is about tens of thousands , But different markets generally use different quantitative models and Strategies , Stock code the first mock exam can follow and futures. 、 Generally, the number of options and other trading varieties will not exceed 10000 , Not more than 2 ten thousand .

2. Stock codes are mostly composed of 4 A composition , Also some such as io2111-c-4900 Such long code .

Solution design ideas

At present, such as xxhash The biggest characteristic of high-speed hash algorithm is stability , No matter how long the string is, it can be converted into a... In a stable time int value , however xxhash Some advantages of modern computer architecture such as cache are not fully utilized . In fact, the scheme of converting string into shaping is similar to the memory management strategy of modern operating system . So I plan to learn from the memory mapping scheme . Design a higher speed scheme from the following aspects .

1. Make the most of cache : We know that the current mainstream operating systems are hierarchical in memory address mapping , And through MMU To accelerate . And considering the upper limit of the number of stock codes we need to convert, that is 2 About ten thousand , Therefore, we also need to consider using the front... In the stock code 1 To 2 Bit to establish superior index , And try to compress the size of the index , So that it can be loaded into L1 In the L1 cache .

2. Try to avoid judging branches , That's what I said before 《 Why are modern programming languages so annoying if-else structure 》 It is stated in , Branch prediction is often an important culprit of performance degradation .

Solution and code

1. Save the strings of all stock codes into an array and sort , The array subscript is what you want to convert int

2. Index the first two digits of the code , The start and end sequence numbers recorded in the overall sort array , Such as szjc The first two of sz It's the index , adopt map Record sz Start and end serial numbers of all stock codes at the beginning , from A According to the stock market, the data of this index is about 300 Multiple , Consider that each member is composed of a two digit string (2byte) And a plastic surgeon (4byte) form , altogether 6byte, that 6*350=2.2k,map The storage space complexity is generally 3 Double redundancy ,2.2*3=6.6k. This size can basically enter L1 First level cache .

3 Search through the first two indexes , Copy the array members at the corresponding start and end positions into a new array , The members of this subarray generally do not exceed 1000 individual , Average each 5 Characters , that 5k About the size of the space , Transfer to level one L1 cache, It should also be relatively simple , Next, use binary search to confirm the serial number , So you get the corresponding int value .

package main

import (
	"fmt"
	"math/rand"
	"sort"
	"strings"
	"time"
)

func binFind(data []string, item string) int {
	if len(data) == 0 {
		return -1
	}

	i, j := 0, len(data)-1 // Double pointer , Be careful j The value of is the end subscript , It's not the length 
	for {
		mid := (i + j) / 2 // You need to round down here ,Go Default . Other languages need attention 
		result := strings.Compare(item, data[mid])
		if result == 0 {
			return mid
		}
		if result < 0 {
			j = mid - 1 // Pay attention to the boundary value 
		} else {
			i = mid + 1 //  Pay attention to the boundary value 
		}
		if i > j { // Out-of-service i And j
			break
		}
	}
	return -1
}

func main() {

	var length int = 20000
	codeGen := "abcdefghijklmnopgrstuvwxyz1234567890"
	codeGroup := make([]string, length, length)
	codeOrderedGroup := make([]string, length, length)
	codeIndex := make(map[string]int)
	for j := 0; j < length; j++ {
		code := ""
		randomLen := rand.Intn(8) + 2
		for i := 0; i < randomLen; i++ {
			random := rand.Intn(len(codeGen) - 1)
			code = code + codeGen[random:random+1]
		}
		codeGroup[j] = code
		codeOrderedGroup[j] = code
	}
	sort.Strings(codeOrderedGroup)
	for k, v := range codeOrderedGroup {
		prefix := v[0:2]
		index, ok := codeIndex[prefix]
		if ok {
			codeIndex[prefix] = index&0xFFFF0000 + k
		} else {
			codeIndex[prefix] = k<<16 + k
		}
	}

	now := time.Now().UnixNano()
	// The following converts the strings in the random array to int The way of shape 
	for _, v := range codeGroup {
		prefix := v[0:2]
		index := codeIndex[prefix]
		startIndex := index >> 16
		endIndex := index & 0x0000FFFF
		subcodeGroup := codeOrderedGroup[startIndex : endIndex+1]
		result := binFind(subcodeGroup, v)
		//binFind(subcodeGroup, v)
		fmt.Println(result+startIndex, v)
	}

	fmt.Println(time.Now().UnixNano() - now)
}

Of course, there is room for further optimization , For example, most of the stock codes received in real-time trading data are similar in alphabetical order , So the generation of this subarray , Binary search may also be optimized in a more greedy way .

And at the high end CPU You can also consider using two variables to record the start and end indexes of the subarray , Avoid the current way of shift calculation , This will be more efficient .

原网站

版权声明
本文为[beyondma]所创,转载请带上原文链接,感谢
https://yzsam.com/2021/11/20211120132340432u.html