A Powerful Tool for Factor Mining: Genetic Programming
2022-06-28 05:17:00 【Quantized cipher base】
How do you mine factors? From experience? Experience is limited, and sooner or later it runs out. From public material such as research reports or papers? Factors found this way inevitably suffer from crowding: if a factor works, other people can use it too.
So is there another way? The answer is yes.
Today, drawing on the research report 《Stock selection factor mining based on genetic programming in artificial intelligence series》, we introduce a powerful tool for factor mining: genetic programming.
What is genetic programming?
Genetic programming is a branch of evolutionary algorithms and a heuristic technique for evolving formulas. It starts from a randomly generated population of formulas and, by simulating genetic evolution in nature, gradually produces populations of formulas that fit a specific objective. As a supervised learning method, genetic programming can discover, for a given objective, hidden mathematical formulas that would be hard for a human to construct. Traditional supervised learning algorithms are mainly used to fit the relationship between features and labels, whereas genetic programming is used more for feature mining (feature engineering).
Source: 《Stock selection factor mining based on genetic programming in artificial intelligence series analysis report》
Traditional factor research works as "logic first, formula second", which is a deductive approach. Genetic programming works as "formula first, logic second", which is an inductive approach. Its advantage is that it uses the computer's raw computing power to run a heuristic search while also breaking through the limits of human thinking, so it can dig out hidden factors that a person would be unlikely to construct and open up more possibilities for factor research.
Genetic evolution in organisms involves inheritance, mutation, and adaptation to the environment. Genetic programming has the same ingredients: crossover, subtree mutation, point mutation, hoist mutation, and fitness. For the details, refer to the research report or the relevant papers.
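To make the operators above more concrete, here is a minimal sketch of formula trees with crossover and hoist mutation. It is not gplearn's implementation: the nested-tuple representation, the tiny `FUNCTIONS` table, the zero guards in `log` and `div`, and all helper names are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical mini function set: name -> (arity, implementation).
# The guards in "log" and "div" are illustrative, not gplearn's exact rules.
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, lambda a, b: a / b if abs(b) > 1e-9 else 1.0),
    "log": (1, lambda a: math.log(abs(a)) if abs(a) > 1e-9 else 0.0),
}

def evaluate(tree, row):
    """Evaluate a formula tree on one row of data (a dict of feature values)."""
    if isinstance(tree, str):          # a leaf is a feature name
        return row[tree]
    name, *children = tree
    _, func = FUNCTIONS[name]
    return func(*(evaluate(child, row) for child in children))

def paths(tree, prefix=()):
    """Yield the index path to every node in the tree, root first."""
    yield prefix
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following an index path."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new_subtree):
    """Return a copy of the tree with the node at `path` swapped out."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def crossover(parent, donor, rng):
    """Replace a random subtree of the parent with a random subtree of the donor."""
    target = rng.choice(list(paths(parent)))
    graft = get(donor, rng.choice(list(paths(donor))))
    return replace(parent, target, graft)

def hoist_mutation(parent, rng):
    """Replace a random subtree with one of its own subtrees, shrinking the formula."""
    target = rng.choice(list(paths(parent)))
    subtree = get(parent, target)
    inner = get(subtree, rng.choice(list(paths(subtree))))
    return replace(parent, target, inner)

# Usage: log(close) / log(volume) crossed with open + high.
rng = random.Random(0)
parent = ("div", ("log", "close"), ("log", "volume"))
child = crossover(parent, ("add", "open", "high"), rng)
smaller = hoist_mutation(parent, rng)
```

Because trees are immutable tuples, every operator returns a new formula and leaves the parent untouched, which keeps the population easy to reason about. Point and subtree mutation can be built on the same `replace` helper.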
We use gplearn, a Python genetic programming package, for factor mining. The main model parameters are as follows:

The data used in the model are as follows:
- Test instrument: Shanghai Composite Index
- Backtest period: 2010-01-01 to 2022-05-31
- Initial factors: opening price, closing price, highest price, lowest price, volume, return, volume-weighted average price (VWAP)
- Prediction target: return over the next 5 days
- Function set: all gplearn built-in functions
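The setup above can be sketched with pandas as follows. This is an assumption about how the inputs might be prepared, not the author's actual pipeline: the `bars` DataFrame, its column names (taken from the `feature_names` used later), and the `make_dataset` helper are hypothetical.

```python
import pandas as pd

# Column order mirrors the feature_names passed to the model.
FEATURES = ["open", "close", "high", "low", "volume", "return_rate", "vwap"]

def make_dataset(bars: pd.DataFrame, horizon: int = 5):
    """Build the feature matrix and the forward-return label from daily bars.

    `bars` is assumed to be sorted by date and to contain the columns
    open, close, high, low, volume and vwap.
    """
    data = bars.copy()
    # Daily return: today's close relative to yesterday's close.
    data["return_rate"] = data["close"] / data["close"].shift(1) - 1.0
    # Label: return from today's close to the close `horizon` days ahead.
    data["label"] = data["close"].shift(-horizon) / data["close"] - 1.0
    # The first row has no daily return and the last `horizon` rows have no label.
    data = data.dropna(subset=FEATURES + ["label"])
    return data[FEATURES], data["label"]
```

Dropping the last `horizon` rows matters: those days have no observed future return yet, so keeping them would feed missing labels into training.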
Once the data is ready, you can start training the model :
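from gplearn.genetic import SymbolicTransformer
# All gplearn built-in functions, as stated in the setup above
function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min', 'sin', 'cos', 'tan']
gp1 = SymbolicTransformer(generations=10, population_size=1000, function_set=function_set, init_depth=(1, 4), tournament_size=20, metric='spearman', p_crossover=0.4,
    p_subtree_mutation=0.01, p_hoist_mutation=0, p_point_mutation=0.01, p_point_replace=0.40,
    warm_start=False, verbose=1, random_state=0, n_jobs=-1, feature_names=['open', 'close', 'high', 'low', 'volume', 'return_rate', 'vwap'])
...
gp1.fit(train, label)  # train the model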
The model automatically prints a training log, in which Fitness is the fitness score. Here we chose the Spearman rank correlation coefficient: the higher it is, the more strongly the factor correlates with the return over the next 5 days.
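For readers who want to check a factor by hand, the Spearman rank correlation is the Pearson correlation of the two series after each is replaced by its ranks. The sketch below assumes pandas and numpy; `rank_ic` is a hypothetical helper name, and gplearn's internal metric may differ in details such as sample weights or how the sign is treated.

```python
import numpy as np
import pandas as pd

def rank_ic(factor, forward_return) -> float:
    """Spearman rank correlation between a factor and a forward return."""
    # Align the two series and drop rows where either value is missing.
    pair = pd.DataFrame({"f": factor, "r": forward_return}).dropna()
    # Replace values by their ranks (ties receive the average rank).
    ranks = pair.rank()
    # Pearson correlation of the ranks is the Spearman coefficient.
    return float(np.corrcoef(ranks["f"], ranks["r"])[0, 1])
```

Because only ranks enter the calculation, any strictly increasing transformation of the factor leaves the score unchanged, which is one reason rank correlation is a common fitness choice for formula search.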

We can also plot how the best factor evolves from generation to generation:

As the figure above shows, by roughly the fourth generation (on the x-axis, 0 is the first generation) the best factor's rank correlation has already reached a relatively high level, and later generations bring little improvement.
Finally, the tree diagram shows the best factor the model evolved:

Written as a formula, it is log(close)/log(volume). Looking at the model's top ten factors:

As you can see, the model's output contains many duplicates. After removing them, only two factors remain: log(close)/log(volume) and log(volume)/log(close).
In fact, these two are the same factor, one being the reciprocal of the other. Take log(close)/log(volume): it first takes the logarithm of the closing price and of the volume, then divides one by the other, which can be read as the closing price weighted by the reciprocal of volume. Interested readers can test this factor's performance further, or run factor mining on other indices or on commodity futures.
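A short numerical check illustrates why the two formulas carry the same information: when both logarithms are positive, taking the reciprocal exactly reverses the ordering of the factor values, so a rank-based score only changes sign. The prices and volumes below are made-up numbers, and `log_ratio_factor` is a hypothetical helper name.

```python
import numpy as np

def log_ratio_factor(close, volume):
    """Compute log(close) / log(volume) element-wise."""
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    return np.log(close) / np.log(volume)

# Made-up index levels and volumes, for illustration only.
close = np.array([3000.0, 3100.0, 2950.0, 3200.0])
volume = np.array([2.0e8, 3.5e8, 1.5e8, 2.5e8])

factor = log_ratio_factor(close, volume)   # log(close) / log(volume)
reciprocal = 1.0 / factor                  # log(volume) / log(close)

# Sorting by the reciprocal gives exactly the reverse of sorting by the factor.
order_factor = np.argsort(factor)
order_reciprocal = np.argsort(reciprocal)
```

This holds only while close and volume are both greater than 1, so that the ratio is positive; that is the case for index levels and traded volume.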
This article is only a first look at genetic programming, and a good deal remains unsolved. For example, the functions used here are all gplearn built-ins. How can the function set be extended, especially with time-series functions such as a 5-day historical mean? The current test covers a single instrument; how can it be extended to multiple instruments, and how should the resulting 3D data be handled? All of these still need to be solved.
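As one possible direction for these open questions (not the author's solution), a 5-day historical mean can be computed with pandas, and multiple instruments can be kept in a long-format table with one row per date and instrument. The `symbol` column and both helper names are assumptions; plugging such a function into gplearn is not shown here.

```python
import pandas as pd

def ts_mean(series: pd.Series, window: int = 5) -> pd.Series:
    """Rolling mean over the last `window` observations (NaN until enough history)."""
    return series.rolling(window).mean()

def ts_mean_by_symbol(panel: pd.DataFrame, column: str, window: int = 5) -> pd.Series:
    """Rolling mean computed separately for each instrument.

    `panel` is assumed to be in long format, sorted by date within each symbol,
    with a hypothetical "symbol" column identifying the instrument.
    """
    return panel.groupby("symbol")[column].transform(
        lambda s: s.rolling(window).mean()
    )
```

Grouping by instrument before rolling keeps one instrument's history from leaking into another's window, which is the main pitfall when a single-instrument factor is applied to stacked data.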
A more advanced installment on genetic programming will follow, taking factor mining a step further!