Share a powerful tool for factor Mining: genetic programming
2022-06-28 05:17:00 【Quantized cipher base】
How do you mine factors? From experience? Experience is limited, and sooner or later it runs out. From public material such as research reports or papers? Factors found that way inevitably become crowded, because anyone else can use an effective factor too.
So is there another way? The answer is yes.
Today, drawing on the research report "Artificial Intelligence Series: Stock Selection Factor Mining Based on Genetic Programming", we introduce a powerful tool for factor mining: genetic programming.
What is genetic programming?
"Genetic programming is a branch of evolutionary algorithms and a heuristic technique for evolving formulas. It starts from a randomly generated population of formulas and, by simulating the process of genetic evolution in nature, gradually produces formulas that fit a given objective. As a supervised learning method, genetic programming can discover, for a given objective, hidden mathematical formulas that are hard for the human mind to construct. Traditional supervised learning algorithms are mainly used to fit the relationship between features and labels, whereas genetic programming is applied more to feature mining (feature engineering)."
(From the research report "Artificial Intelligence Series: Stock Selection Factor Mining Based on Genetic Programming")
Traditional factor research goes "logic first, formula second", which is a deductive approach. Genetic programming goes "formula first, logic second", which is an inductive approach. Its advantage is that it exploits the computer's computing power to run a heuristic search while breaking through the limits of human thinking, digging out hidden factors that are hard for the human mind to construct and opening up more possibilities for factor research.
Genetic evolution in organisms involves inheritance of genes, mutation, and adaptation to the environment. Genetic programming has the same ingredients: crossover, subtree mutation, point mutation, hoist mutation, and fitness. For the details, please refer to the research report or the related papers.
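To make the operators above more concrete, here is a minimal sketch of crossover and hoist mutation on a toy formula tree. The nested-tuple representation and the helper names (subtrees, replace, crossover, hoist) are assumptions made for illustration only; they are not how gplearn stores or mutates its programs.

```python
import random

# Toy formula tree: (function_name, child, ...) for a function node,
# or a plain string for a feature. Illustrative representation only.
tree = ("div", ("log", "close"), ("log", "volume"))
donor = ("add", "open", ("sqrt", "vwap"))

def subtrees(node, path=()):
    """Yield (path, subtree) for every node in the tree."""
    yield path, node
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(node, path, new):
    """Return a copy of node with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]

def size(node):
    """Number of nodes (functions plus features) in the tree."""
    return sum(1 for _ in subtrees(node))

def crossover(parent, other, rng):
    """Replace a random subtree of parent with a random subtree of other."""
    path, _ = rng.choice(list(subtrees(parent)))
    _, graft = rng.choice(list(subtrees(other)))
    return replace(parent, path, graft)

def hoist(parent, rng):
    """Replace a random subtree with one of its own subtrees (never grows)."""
    path, sub = rng.choice(list(subtrees(parent)))
    _, inner = rng.choice(list(subtrees(sub)))
    return replace(parent, path, inner)

rng = random.Random(0)
print(crossover(tree, donor, rng))
print(hoist(tree, rng))
```

Subtree mutation can be thought of as crossover with a freshly generated random tree as the donor, and point mutation as swapping a single node for another of the same arity. Hoist mutation is the only one of these that can never make a formula larger, which is why it is often used to keep formulas from bloating.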
We use gplearn, a Python genetic programming package, for factor mining. The main model parameters are as follows:

The data used by the model are as follows:
- Test instrument: Shanghai Composite Index
- Backtest period: 2010-01-01 to 2022-05-31
- Initial factors: open, close, high, low, volume, return, volume-weighted average price (VWAP)
- Prediction target: 5-day forward return
- Function set: all gplearn built-in functions
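As a rough illustration of the prediction target, the 5-day forward return can be built from the closing prices as sketched below. The function name forward_return and the sample prices are hypothetical; the article does not show how its label was actually constructed.

```python
import numpy as np

def forward_return(close, horizon=5):
    """close[t + horizon] / close[t] - 1, with NaN where the future is unknown."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[:-horizon] = close[horizon:] / close[:-horizon] - 1.0
    return out

# Made-up closing prices, for illustration only (not real index data).
close = np.array([100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 110.0])
label = forward_return(close, horizon=5)

# The last `horizon` rows have no known future return and must be
# dropped from both the features and the label before training.
valid = ~np.isnan(label)
print(label[valid])
```

Dropping the trailing rows matters: leaving them in, or filling them with zeros, would feed the model labels that do not correspond to any real future return.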
Once the data is ready, you can start training the model :
from gplearn.genetic import SymbolicTransformer

# function_set is defined beforehand (all gplearn built-in functions)
gp1 = SymbolicTransformer(
    generations=10, population_size=1000, function_set=function_set,
    init_depth=(1, 4), tournament_size=20, metric='spearman',
    p_crossover=0.4, p_subtree_mutation=0.01, p_hoist_mutation=0,
    p_point_mutation=0.01, p_point_replace=0.40,
    warm_start=False, verbose=1, random_state=0, n_jobs=-1,
    feature_names=['open', 'close', 'high', 'low', 'volume', 'return_rate', 'vwap'])
...
gp1.fit(train, label)  # train the model
The model prints a training log automatically. The Fitness column is the fitness; here we use the Spearman rank correlation coefficient. The higher the coefficient, the more strongly the factor correlates with the 5-day forward return.
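For readers unfamiliar with the metric, the Spearman rank correlation is simply the Pearson correlation computed on ranks. The sketch below is a minimal numpy version that assumes no tied values; the helper names and the sample numbers are made up for illustration and are not taken from gplearn.

```python
import numpy as np

def ranks(x):
    """Ordinal ranks 0..n-1 of the values in x; assumes no ties."""
    x = np.asarray(x, dtype=float)
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

# Made-up factor values and forward returns, for illustration only.
factor = np.array([0.2, 0.5, 0.1, 0.9, 0.7])
future_return = np.array([0.01, 0.03, -0.02, 0.04, 0.05])

rho = spearman(factor, future_return)
print(round(rho, 4))

# Only the ordering matters: a monotone increasing transform of the
# factor leaves the coefficient unchanged.
print(round(spearman(factor ** 3, future_return), 4))
```

Because only the ordering of the values matters, the metric is insensitive to outliers and to the scale of the factor, which suits a search over arbitrary formulas whose outputs can vary wildly in magnitude.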

We also plot how the best factor's fitness evolves from generation to generation:

As the chart above shows, by roughly the fourth generation (on the x-axis, 0 is the first generation) the best factor's rank correlation coefficient has already reached a relatively high level, and later generations bring little improvement.
Finally, a tree diagram shows the best factor the model found:

Written as a formula, it is log(close) / log(volume). The model's top ten factors are:

As you can see, the model's output contains many duplicates. After removing them, only two factors remain: log(close) / log(volume) and log(volume) / log(close).
In fact these two are the same factor, one being the reciprocal of the other. Take log(close) / log(volume): it computes the logarithms of the closing price and of the volume and then divides one by the other, which can loosely be viewed as the closing price weighted by the reciprocal of volume. Interested readers can test this factor's performance further, or run factor mining on other indexes or commodity futures.
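The claim that the two formulas are the same factor can be checked numerically. When both logarithms are positive, taking the reciprocal exactly reverses the ranking, so any rank-based score has the same magnitude with the opposite sign. If fitness is scored by the magnitude of the rank correlation (which, as far as I can tell, is how gplearn's built-in 'spearman' metric behaves), the two formulas tie. The sketch below uses synthetic prices and volumes, not real index data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for index closing prices and volumes (assumption:
# both are well above 1, so both logarithms are positive).
close = rng.uniform(2000.0, 5000.0, size=200)
volume = rng.uniform(1e8, 5e8, size=200)

factor_a = np.log(close) / np.log(volume)
factor_b = np.log(volume) / np.log(close)   # reciprocal of factor_a

# argsort gives the row order from smallest to largest factor value.
order_a = np.argsort(factor_a)
order_b = np.argsort(factor_b)

# The ranking by factor_b is the ranking by factor_a read backwards.
ranking_is_reversed = np.array_equal(order_a, order_b[::-1])
print(ranking_is_reversed)
```

In practice this means one of the two can be dropped without losing information; only the sign convention of the factor changes.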
This article is only a first look at genetic programming, and a good deal remains unsolved. For example, the functions used here are all gplearn built-ins. How can the function set be extended, especially with time-series functions such as a 5-day historical mean? The test covers a single instrument. How can it be extended to multiple instruments, and how should the resulting three-dimensional data be handled? All of these still need to be solved.
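As one possible starting point for the time-series question, a trailing 5-day mean can be written as an ordinary array function that returns an output of the same length as its input. The sketch below is only an assumption about how such a function might look: the name ts_mean_5 is hypothetical, and it presumes the rows are in time order for a single instrument. A function of this shape could in principle be registered through gplearn's make_function helper, but that step is not shown or tested here.

```python
import numpy as np

def ts_mean_5(x):
    """Trailing 5-row mean with the same length as x.

    The first four rows average over however many rows exist so far,
    so the output contains no NaN for finite input.
    """
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))   # csum[k] = sum of x[:k]
    end = np.arange(1, len(x) + 1)           # window end (exclusive)
    start = np.maximum(end - 5, 0)           # window start (inclusive)
    return (csum[end] - csum[start]) / (end - start)

# Made-up series, for illustration only.
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
print(ts_mean_5(series))
```

Note that a rolling function like this mixes information across rows, so it is only meaningful when the rows really are consecutive trading days of one instrument; with several instruments stacked in one array it would need to be applied per instrument.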
A more advanced installment on genetic programming will follow, taking you further into factor mining!