Small Retail Quant Trading Notes | Level-1 Sampling of A-Share Real-Time Quotes with a Multi-Task Crawler
2022-07-24 02:02:00 【Master Yuanxiao】

Preface

Market data is vital to quantitative traders, whether they trade short-term or over medium and long horizons.
For short-term traders, the scheme used to obtain real-time quotes determines the timeliness of intraday analysis and of buy/sell-point monitoring.
For medium- and long-term trading, the full-market data updated after the close should also be downloaded as quickly as possible. Third-party Python data libraries such as tushare are rate-limited on the server side, so fetching quotes for every stock in the market can take tens of minutes.
The program described in this article fetches real-time quotes for every individual A-share stock in only about three seconds.
For short-term traders, one sample every 3 seconds is sufficient; medium- and long-term traders can fetch once after each day's close and store the result in their own database.
Let me walk through the scheme in detail!

Characteristics of the real-time market data

The real-time A-share quotes referred to here are comparable to Level-1 updates: every 3 seconds, a snapshot is sampled from a financial website by a crawler.
As the figure below shows, the fields are "latest price", "high" (so far today), "low" (so far today), "change" (current), "volume" (cumulative for the day), "turnover" (cumulative for the day), "turnover rate", and "P/E ratio". We then add a "current time" column recording when each snapshot was taken; since the website updates these figures in real time, this column is necessary.

The trading windows in which we collect data are 9:29 to 11:31 and 12:59 to 15:01 (one minute of margin around each session). A snapshot is taken every 3 seconds and stored in csv format.
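The sampling schedule described above can be sketched with the standard library alone; the session boundaries and the 3-second interval come from the text, while `fetch_snapshot` is a hypothetical placeholder for the actual crawler call:

```python
import datetime as dt
import time

# Trading sessions with a one-minute margin on each side, as in the text.
SESSIONS = [(dt.time(9, 29), dt.time(11, 31)),
            (dt.time(12, 59), dt.time(15, 1))]

def in_session(now: dt.time) -> bool:
    """Return True if `now` falls inside one of the sampling windows."""
    return any(start <= now <= end for start, end in SESSIONS)

def run_sampler(fetch_snapshot, interval: float = 3.0) -> None:
    """Call fetch_snapshot() every `interval` seconds during trading hours."""
    while True:
        now = dt.datetime.now().time()
        if now > SESSIONS[-1][1]:
            break  # past the afternoon close: done for the day
        if in_session(now):
            fetch_snapshot()  # hypothetical crawler entry point
        time.sleep(interval)
```

Note that the fetch itself should finish well inside the 3-second budget, which is exactly what the multithreading below achieves.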


Multithreaded crawler technology

Now for the core of the scheme: the multithreaded crawler.
The site paginates the quotes into 232 pages, and the usual approach is to crawl them one by one with a for...in loop.
But with quotes for thousands of stocks, a sequential download inevitably takes too long, pushing the sampling interval past 3 seconds.

In my book "Python Quantitative Stock Trading: From Beginner to Practice" I introduced speed-up schemes based on multiprocessing and multithreading.
When a program involves heavy computation or various kinds of I/O, running tasks in parallel lets you exploit multi-core CPUs and improve execution efficiency.
In Python, because of the GIL, multithreading and multiprocessing perform differently on compute-intensive versus I/O-intensive workloads: multithreading suits I/O-intensive applications, while multiprocessing performs better on CPU-intensive ones.
The examples in the book compare the for-loop, multithreaded, and multiprocess approaches. Fetching 1 year of data for the first 500 stocks in the test pool gave:
- for loop: 55 seconds
- 8 threads: 7.5 seconds
- 8 processes: 7.8 seconds
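The shape of that comparison can be reproduced with a small timing harness. Here the network I/O is simulated with `time.sleep`, so the absolute numbers are illustrative only, not the book's measurements:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(code: str) -> str:
    """Simulate an I/O-bound download: the thread just waits, no CPU work."""
    time.sleep(0.01)
    return code

def timed(fn):
    """Run fn() and return (result, elapsed wall time in seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

codes = [f"{i:06d}" for i in range(100)]

# Sequential for-loop: the 100 waits happen one after another.
seq, t_seq = timed(lambda: [fake_fetch(c) for c in codes])

# 8 threads: the waits overlap, so wall time drops by roughly 8x.
with ThreadPoolExecutor(max_workers=8) as ex:
    par, t_par = timed(lambda: list(ex.map(fake_fetch, codes)))
```

`executor.map` preserves input order, so `seq` and `par` are identical lists; only the wall time differs.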
Clearly, when pulling several years of data for the thousands of stocks in the A-share market through an API, a plain for loop costs a great deal of time.
So for a crawler, which is the better fit: multithreading or multiprocessing?
The crawler is implemented on top of the HTTP request library urllib3, which plays the role of an HTTP client: it sends an HTTP request to the web server and then waits for the response. That makes crawling an I/O-intensive task. Unlike a compute-intensive task, which consumes CPU for its entire time slice, an I/O-intensive task spends most of its time waiting for I/O operations to complete.
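To make the I/O-bound nature concrete, here is a minimal sketch of a per-page fetch split into a waiting step and a parsing step. The URL and the JSON payload shape are hypothetical placeholders, not the actual financial site's API:

```python
import json
import urllib.request

def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """I/O-bound step: the thread mostly sleeps, waiting on the server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def parse_quotes(payload: dict) -> list:
    """CPU-light step: pull (code, price) pairs out of a hypothetical payload."""
    return [(row["code"], float(row["price"]))
            for row in payload.get("data", [])]
```

Because almost all the elapsed time sits inside `fetch_json`, threads blocked there release the GIL, which is why multithreading pays off here despite Python's GIL.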
Next, using the financial website's real-time quotes as the example, I'll walk through the multithreaded speed-up scheme.
For the implementation of the crawler itself, refer to the earlier articles in this series:

We can then distribute the task across multiple threads instead of having a single thread read the pages one by one.
Python 3 ships with a built-in thread pool module, ThreadPoolExecutor, which we use to implement the multithreading.
For this crawling task, the pages differ only in their URLs. So, to fit the module's interface, the crawling task crawer_daily() is split into two parts: the function to execute, map_fun(), and an iterable of arguments, itr_arg.
The key code is as follows:

    with ThreadPoolExecutor(max_workers=8) as executor:
        # crawer_daily is the function passed in for map to execute
        # itr_arg is the iterable of arguments (one per page)
        # results, the return value, is a generator
        results = executor.map(crawer_daily, itr_arg)

Each page carries only 20 stocks' data, so we need to merge the pages into one DataFrame and finally save it as a local csv file.
The key code is as follows:

    for ret in results:
        df_daily_stock = df_daily_stock.append(ret, ignore_index=True)
    df_daily_stock.to_csv(Path(store_path + u"{}.csv".format(df_daily_stock["current time"].values[0])),
                          columns=df_daily_stock.columns, index=True, encoding='GBK')

The csv file, when opened, looks like this:
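Putting the pieces together, here is a self-contained sketch of the merge-and-save step. `crawer_daily` is stubbed out with synthetic rows, since the real per-page crawler depends on the target site; note that `DataFrame.append` has been removed in recent pandas versions, so `pd.concat` is the durable spelling, and that `:` must be replaced in the filename to stay legal on Windows:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import pandas as pd

def crawer_daily(page: int) -> pd.DataFrame:
    """Stub for the per-page crawler: returns one synthetic quote row."""
    return pd.DataFrame({
        "code": [f"{page:06d}"],
        "latest price": [10.0 + page],
        "current time": ["2021-08-27 15:00:00"],
    })

with ThreadPoolExecutor(max_workers=8) as executor:
    results = executor.map(crawer_daily, range(1, 233))  # 232 pages

# Merge all pages into one DataFrame (pd.concat replaces the old .append loop).
df_daily_stock = pd.concat(results, ignore_index=True)

# ':' is illegal in Windows filenames, so swap it out of the timestamp.
stamp = df_daily_stock["current time"].iloc[0].replace(":", "-")
df_daily_stock.to_csv(Path(f"{stamp}.csv"), index=False, encoding="GBK")
```

Since `executor.map` yields page results in input order, the merged frame keeps the pages in page order without any extra sorting.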

Note the "current time" column: for intraday real-time snapshots, it reflects the timestamp at which the data was updated.
The filename also matters. The name here is "2021-08-27 15/00/00.csv"; for real-time data it should carry the hour/minute/second information.
As for the test results: with 8 threads, execution takes about 0.5 seconds. In other words, refreshing the full real-time snapshot takes only about 0.5 seconds, well under the 3-second sampling period.
Moreover, if we only update the daily data incrementally at the close, about 1 second per day is enough to refresh that day's data for every stock in the A-share market.

Since test environments vary widely, these results are for reference only. You can also compare the efficiency of multithreading and multiprocessing yourself.

How to obtain real-time data

We will upload the source code to the Knowledge Planet group "Playing with Quantitative Stock Trading", so you can collect the data locally. The volume is fairly large, though: about 3 GB per day, so storing a full year takes roughly 800 GB of disk space (about 3 GB x 250 trading days).
I usually keep only about a month of real-time data on hand; older data is converted to daily-bar storage to reduce the space it occupies.
We also keep the most recent week of real-time quotes on a cloud server we set up; if you only need the data for after-hours analysis, you can fetch it via FTP after the close.
If you need to track intraday price moves and trade on them, run this script on your own machine, add your own decision logic, and have operation signals pushed to you via email, DingTalk, or other instant-messaging tools.
Notes
1. The complete source code will be uploaded to the Knowledge Planet group "Playing with Quantitative Stock Trading" to help readers master this method.
2. If you want to join the group, remember to message me on WeChat first to get the perks!

Master Yuanxiao's book on quantitative trading is on sale now, available on JD.com, Dangdang, and Tmall!