Small-Capital Stock Trading Log | Level1 Sampling of Real-Time A-Share Quotes Based on Multi-Task Crawler Technology
2022-07-24 02:02:00 【Master Yuanxiao】

Preface

Stock market data is vital for quantitative traders, whether they trade short-term or over medium and long horizons.
For short-term traders, the scheme used to obtain real-time quotes determines the timeliness of intraday analysis and of monitoring entry and exit points.
For medium- and long-term trading, the whole-market update after the close should also download as quickly as possible. If you rely on a third-party Python library such as tushare, you are constrained by its server, and fetching quotes for every stock in the market can take tens of minutes.
This article describes a program for acquiring real-time A-share quotes that fetches the quotes of every stock in the market in only about three seconds.
For short-term traders, one sample every 3 seconds is sufficient. Medium- and long-term traders can fetch once after each day's close and store the result in their own database.
Next, I will introduce this scheme in detail!

Characteristics of the real-time market data

The real-time A-share quotes discussed here are similar to Level1 updates: every 3 seconds, a crawler samples the quotes from a finance website.
As shown in the figure below, the fields are "latest price", "high" (so far today), "low" (so far today), "open" (today), "volume" (cumulative for the day), "turnover" (cumulative for the day), "turnover rate", and "P/E ratio". We then add a "current time" column indicating when the real-time data was fetched. Because the finance website updates these figures in real time, this column is necessary.

The trading periods during which we collect data are 9:29 to 11:31 and 12:59 to 15:01. We fetch data every 3 seconds and store each sample in csv format.
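For reference, here is a minimal sketch of such a 3-second sampling loop. The function sample_once() is a hypothetical placeholder standing in for the crawler described below, not the author's actual code:

import time
import datetime as dt

# Trading windows used for sampling (widened by a minute on each side)
SESSIONS = [(dt.time(9, 29), dt.time(11, 31)),
            (dt.time(12, 59), dt.time(15, 1))]

def in_session(now):
    # True if `now` falls inside one of the trading windows
    return any(start <= now <= end for start, end in SESSIONS)

def sample_once(now):
    # Placeholder: the real program would run the crawler here and save
    # the snapshot, tagged with the current time, to a csv file
    print("sampling at {:%H:%M:%S}".format(now))

while True:
    now = dt.datetime.now()
    if in_session(now.time()):
        sample_once(now)
    time.sleep(3)  # 3-second sampling interval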


Multithreaded crawler technology

Next, we focus on the implementation of the multithreaded crawler.
The site shows 232 pages in total, and the usual approach is to crawl them one by one with a for...in loop.
But with quote data for thousands of stocks, the download inevitably takes too long and overruns the 3-second sampling interval.

In my book "Python Quantitative Stock Trading: From Introduction to Practice", I introduce speed-up schemes based on multiprocessing and multithreading.
When complex calculations or various I/O operations are involved, you can run tasks in parallel to make full use of multi-core CPUs and improve the program's execution efficiency.
In Python, because of the GIL mechanism, multithreading and multiprocessing perform differently in compute-intensive and I/O-intensive scenarios: multithreading suits I/O-intensive applications, while multiprocessing performs better in CPU-intensive applications.
A routine in the book compares the for-loop, multithreaded, and multiprocess approaches by fetching 1 year of data for the first 500 stocks in the stock pool. The test results were:
for loop: 55 seconds
8 threads: 7.5 seconds
8 processes: 7.8 seconds
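To get a feel for this comparison, here is a toy benchmark in the same spirit; time.sleep() stands in for a network request, so this is an illustrative assumption, not the book's actual stock-download test:

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fetch(i):
    time.sleep(0.1)  # simulate waiting on I/O instead of a real request
    return i

def timed(label, fn):
    start = time.time()
    fn()
    print("{}: {:.2f} seconds".format(label, time.time() - start))

if __name__ == "__main__":
    tasks = list(range(80))
    timed("for loop", lambda: [fetch(i) for i in tasks])
    with ThreadPoolExecutor(max_workers=8) as ex:
        timed("8 threads", lambda: list(ex.map(fetch, tasks)))
    with ProcessPoolExecutor(max_workers=8) as ex:
        timed("8 processes", lambda: list(ex.map(fetch, tasks)))

On an I/O-bound task like this, both pools beat the plain loop by roughly the worker count, which matches the shape of the book's figures.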
Clearly, when we fetch several years or more of data for thousands of A-share stocks through an API interface with a for loop, it costs a great deal of time.
So, for a crawler, which is more suitable: multithreading or multiprocessing?
The crawler is built on a network request module such as urllib3, which plays the role of an HTTP client: it sends an HTTP request to a web server and then waits for the server's response. This kind of task is I/O-intensive. Unlike compute-intensive tasks, which consume CPU resources throughout their time slice, I/O-intensive tasks spend most of their time waiting for I/O operations to complete.
Next, taking real-time quotes fetched from the finance website as an example, we introduce the multithreaded speed-up scheme.
For the implementation of the crawler itself, please refer to the following topics:

We can then distribute the task across multiple threads instead of letting a single thread read page by page.
Python 3's standard library provides a thread pool, concurrent.futures.ThreadPoolExecutor, through which we implement the multithreaded processing.
For the crawler task, each page differs only in its URL. So, to fit the module's interface, the crawler task crawer_daily() is split into two parts: the execution function map_fun() and the iterable of arguments itr_arg.
The key code is as follows:

with ThreadPoolExecutor(max_workers=8) as executor:
    # crawer_daily: the function passed to map for execution
    # itr_arg: the iterable of arguments, one element per page
    # results: the return value is a generator
    results = executor.map(crawer_daily, itr_arg)

Each page holds data for only 20 stocks, so we need to merge the results into one DataFrame and finally save it as a local csv file.
The key code is as follows:

for ret in results:
    df_daily_stock = df_daily_stock.append(ret, ignore_index=True)
df_daily_stock.to_csv(Path(store_path + u"{}.csv".format(df_daily_stock["current time"].values[0])),
                      columns=df_daily_stock.columns, index=True, encoding='GBK')

The saved csv file is shown below:

Note the "current time" column: when real-time data is fetched during trading hours, this time reflects the timestamp of the data update.
Another important point is the file name. Here it is "2021-08-27 15/00/00.csv"; for real-time data the name should carry the hour/minute/second information.
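Putting the pieces together, here is a minimal, self-contained sketch of what crawer_daily() and itr_arg might look like. The endpoint URL, JSON layout, and field names are assumptions for illustration, not the actual finance-site interface the author crawls:

from concurrent.futures import ThreadPoolExecutor
import datetime as dt
import pandas as pd
import requests

BASE_URL = "https://example.com/quotes"  # placeholder endpoint

def crawer_daily(page):
    # Fetch one page (about 20 stocks) and return it as a DataFrame
    resp = requests.get(BASE_URL, params={"page": page}, timeout=5)
    df = pd.DataFrame(resp.json()["data"])  # assumes a JSON payload
    # Tag every row with the sampling timestamp
    df["current time"] = dt.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return df

itr_arg = range(1, 233)  # 232 pages in total

with ThreadPoolExecutor(max_workers=8) as executor:
    results = executor.map(crawer_daily, itr_arg)

# Merge all pages into one DataFrame
df_daily_stock = pd.concat(results, ignore_index=True)

Here pd.concat over the generator produces the same merged DataFrame as the append loop while avoiding a copy on every iteration, which matters at thousands of rows per sample.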
As for test results: with 8 threads, execution takes about 0.5 seconds. In other words, a full real-time update needs only about 0.5 seconds, far less than the 3-second sampling period.
Moreover, if we only update the daily data incrementally at the close, then about 1 second per day is enough to refresh the data of every stock in the A-share market.

Because test environments differ greatly, the results here are for reference only. You can also compare the efficiency of multithreading and multiprocessing yourself.

How to obtain real-time data

We will upload the source code to the Knowledge Planet "Playing with Quantitative Trading in Stocks", so you can collect the data locally. The data volume is fairly large: roughly 3 GB per day, so storing a full year is estimated to require about 800 GB of disk space.
I usually keep only about the most recent month of real-time data; earlier quotes are converted to daily bars for storage to reduce the space they occupy.
Meanwhile, we also keep the most recent week of real-time quotes on a cloud server we set up; if you only need the data for analysis, you can fetch it via FTP after the close.
If you need to track intraday price changes in order to place buy and sell orders, you can run this script on your own computer, add your own decision logic, and when a trade signal appears, push it to yourself through email, DingTalk, or other instant-messaging tools.
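As an example of that last point, here is a minimal sketch of pushing a signal by email with the standard library. The SMTP host, account, and password are placeholders you must replace with your own:

import smtplib
from email.mime.text import MIMEText

def send_signal(subject, body):
    # Send a plain-text notification to yourself via SMTP over SSL
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = "me@example.com"
    msg["To"] = "me@example.com"
    with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
        server.login("me@example.com", "app-password")
        server.send_message(msg)

# For example, inside the 3-second sampling loop:
# if latest_price > threshold:
#     send_signal("Buy signal", "600000 crossed the threshold")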
Notes
1. We will upload the complete source code to the Knowledge Planet "Playing with Quantitative Trading in Stocks" to help members master this method better.
2. If you want to join the Knowledge Planet "Playing with Quantitative Trading in Stocks", remember to contact me on WeChat first to get the benefits!

Master Yuanxiao's quantitative trading books are on sale!
Available on JD.com, Dangdang, and Tmall!