Small Retail Quant Trading Notes | Level-1 Sampling of A-Share Real-Time Quotes with a Multi-Task Crawler
2022-07-24 02:02:00 【Master Yuanxiao】

Preface

Market data is vital to quantitative traders, whether they trade short-term or over medium and long horizons.
For short-term traders, the scheme used to obtain real-time quotes determines the timeliness of intraday analysis and of buy/sell-point monitoring.
For medium- and long-term trading, the full-market data updated after the close should also be downloaded as quickly as possible. Third-party Python data libraries such as tushare are rate-limited on the server side, so fetching quotes for every stock in the market can take tens of minutes.
The program described in this article fetches real-time quotes for every individual A-share stock in only about three seconds.
For short-term traders, one sample every 3 seconds is sufficient; medium- and long-term traders can fetch once after each day's close and store the result in their own database.
Let me walk through the scheme in detail!

Characteristics of the real-time market data

The real-time A-share quotes referred to here are comparable to Level-1 updates: every 3 seconds, a snapshot is sampled from a financial website by a crawler.
As the figure below shows, the fields are "latest price", "high" (so far today), "low" (so far today), "change" (current), "volume" (cumulative for the day), "turnover" (cumulative for the day), "turnover rate", and "P/E ratio". We then add a "current time" column recording when each snapshot was taken; since the website updates these figures in real time, this column is necessary.

The trading windows in which we collect data are 9:29 to 11:31 and 12:59 to 15:01 (one minute of margin around each session). A snapshot is taken every 3 seconds and stored in csv format.
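The sampling schedule described above can be sketched with the standard library alone; the session boundaries and the 3-second interval come from the text, while `fetch_snapshot` is a hypothetical placeholder for the actual crawler call:

```python
import datetime as dt
import time

# Trading sessions with a one-minute margin on each side, as in the text.
SESSIONS = [(dt.time(9, 29), dt.time(11, 31)),
            (dt.time(12, 59), dt.time(15, 1))]

def in_session(now: dt.time) -> bool:
    """Return True if `now` falls inside one of the sampling windows."""
    return any(start <= now <= end for start, end in SESSIONS)

def run_sampler(fetch_snapshot, interval: float = 3.0) -> None:
    """Call fetch_snapshot() every `interval` seconds during trading hours."""
    while True:
        now = dt.datetime.now().time()
        if now > SESSIONS[-1][1]:
            break  # past the afternoon close: done for the day
        if in_session(now):
            fetch_snapshot()  # hypothetical crawler entry point
        time.sleep(interval)
```

Note that the fetch itself should finish well inside the 3-second budget, which is exactly what the multithreading below achieves.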


Multithreaded crawler technology

Now for the core of the scheme: the multithreaded crawler.
The site paginates the quotes into 232 pages, and the usual approach is to crawl them one by one with a for...in loop.
But with quotes for thousands of stocks, a sequential download inevitably takes too long, pushing the sampling interval past 3 seconds.

In my book "Python Quantitative Stock Trading: From Beginner to Practice" I introduced speed-up schemes based on multiprocessing and multithreading.
When a program involves heavy computation or various kinds of I/O, running tasks in parallel lets you exploit multi-core CPUs and improve execution efficiency.
In Python, because of the GIL, multithreading and multiprocessing perform differently on compute-intensive versus I/O-intensive workloads: multithreading suits I/O-intensive applications, while multiprocessing performs better on CPU-intensive ones.
The examples in the book compare the for-loop, multithreaded, and multiprocess approaches. Fetching 1 year of data for the first 500 stocks in the test pool gave:
- for loop: 55 seconds
- 8 threads: 7.5 seconds
- 8 processes: 7.8 seconds
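The shape of that comparison can be reproduced with a small timing harness. Here the network I/O is simulated with `time.sleep`, so the absolute numbers are illustrative only, not the book's measurements:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(code: str) -> str:
    """Simulate an I/O-bound download: the thread just waits, no CPU work."""
    time.sleep(0.01)
    return code

def timed(fn):
    """Run fn() and return (result, elapsed wall time in seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

codes = [f"{i:06d}" for i in range(100)]

# Sequential for-loop: the 100 waits happen one after another.
seq, t_seq = timed(lambda: [fake_fetch(c) for c in codes])

# 8 threads: the waits overlap, so wall time drops by roughly 8x.
with ThreadPoolExecutor(max_workers=8) as ex:
    par, t_par = timed(lambda: list(ex.map(fake_fetch, codes)))
```

`executor.map` preserves input order, so `seq` and `par` are identical lists; only the wall time differs.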
Clearly, when pulling several years of data for the thousands of stocks in the A-share market through an API, a plain for loop costs a great deal of time.
So for a crawler, which is the better fit: multithreading or multiprocessing?
The crawler is implemented on top of the HTTP request library urllib3, which plays the role of an HTTP client: it sends an HTTP request to the web server and then waits for the response. That makes crawling an I/O-intensive task. Unlike a compute-intensive task, which consumes CPU for its entire time slice, an I/O-intensive task spends most of its time waiting for I/O operations to complete.
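To make the I/O-bound nature concrete, here is a minimal sketch of a per-page fetch split into a waiting step and a parsing step. The URL and the JSON payload shape are hypothetical placeholders, not the actual financial site's API:

```python
import json
import urllib.request

def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """I/O-bound step: the thread mostly sleeps, waiting on the server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def parse_quotes(payload: dict) -> list:
    """CPU-light step: pull (code, price) pairs out of a hypothetical payload."""
    return [(row["code"], float(row["price"]))
            for row in payload.get("data", [])]
```

Because almost all the elapsed time sits inside `fetch_json`, threads blocked there release the GIL, which is why multithreading pays off here despite Python's GIL.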
Next, using the financial website's real-time quotes as the example, I'll walk through the multithreaded speed-up scheme.
For the implementation of the crawler itself, refer to the earlier articles in this series:

We can then distribute the task across multiple threads instead of having a single thread read the pages one by one.
Python 3 ships with a built-in thread pool module, ThreadPoolExecutor, which we use to implement the multithreading.
For this crawling task, the pages differ only in their URLs. So, to fit the module's interface, the crawling task crawer_daily() is split into two parts: the function to execute, map_fun(), and an iterable of arguments, itr_arg.
The key code is as follows:

    with ThreadPoolExecutor(max_workers=8) as executor:
        # crawer_daily is the function passed in for map to execute
        # itr_arg is the iterable of arguments (one per page)
        # results, the return value, is a generator
        results = executor.map(crawer_daily, itr_arg)

Each page carries only 20 stocks' data, so we need to merge the pages into one DataFrame and finally save it as a local csv file.
The key code is as follows:

    for ret in results:
        df_daily_stock = df_daily_stock.append(ret, ignore_index=True)
    df_daily_stock.to_csv(Path(store_path + u"{}.csv".format(df_daily_stock["current time"].values[0])),
                          columns=df_daily_stock.columns, index=True, encoding='GBK')

The csv file, when opened, looks like this:
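Putting the pieces together, here is a self-contained sketch of the merge-and-save step. `crawer_daily` is stubbed out with synthetic rows, since the real per-page crawler depends on the target site; note that `DataFrame.append` has been removed in recent pandas versions, so `pd.concat` is the durable spelling, and that `:` must be replaced in the filename to stay legal on Windows:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import pandas as pd

def crawer_daily(page: int) -> pd.DataFrame:
    """Stub for the per-page crawler: returns one synthetic quote row."""
    return pd.DataFrame({
        "code": [f"{page:06d}"],
        "latest price": [10.0 + page],
        "current time": ["2021-08-27 15:00:00"],
    })

with ThreadPoolExecutor(max_workers=8) as executor:
    results = executor.map(crawer_daily, range(1, 233))  # 232 pages

# Merge all pages into one DataFrame (pd.concat replaces the old .append loop).
df_daily_stock = pd.concat(results, ignore_index=True)

# ':' is illegal in Windows filenames, so swap it out of the timestamp.
stamp = df_daily_stock["current time"].iloc[0].replace(":", "-")
df_daily_stock.to_csv(Path(f"{stamp}.csv"), index=False, encoding="GBK")
```

Since `executor.map` yields page results in input order, the merged frame keeps the pages in page order without any extra sorting.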

Note the "current time" column: for intraday real-time snapshots, it reflects the timestamp at which the data was updated.
The filename also matters. The name here is "2021-08-27 15/00/00.csv"; for real-time data it should carry the hour/minute/second information.
As for the test results: with 8 threads, execution takes about 0.5 seconds. In other words, refreshing the full real-time snapshot takes only about 0.5 seconds, well under the 3-second sampling period.
Moreover, if we only update the daily data incrementally at the close, about 1 second per day is enough to refresh that day's data for every stock in the A-share market.

Since test environments vary widely, these results are for reference only. You can also compare the efficiency of multithreading and multiprocessing yourself.

How to obtain real-time data

We will upload the source code to the Knowledge Planet group "Playing with Quantitative Stock Trading", so you can collect the data locally. The volume is fairly large, though: about 3 GB per day, so storing a full year takes roughly 800 GB of disk space (about 3 GB x 250 trading days).
I usually keep only about a month of real-time data on hand; older data is converted to daily-bar storage to reduce the space it occupies.
We also keep the most recent week of real-time quotes on a cloud server we set up; if you only need the data for after-hours analysis, you can fetch it via FTP after the close.
If you need to track intraday price moves and trade on them, run this script on your own machine, add your own decision logic, and have operation signals pushed to you via email, DingTalk, or other instant-messaging tools.
Notes
1. The complete source code will be uploaded to the Knowledge Planet group "Playing with Quantitative Stock Trading" to help readers master this method.
2. If you want to join the group, remember to message me on WeChat first to get the perks!

Master Yuanxiao's book on quantitative trading is on sale now, available on JD.com, Dangdang, and Tmall!