
He "painted" what a smart city should look like with his paddle

2022-06-24 18:02:00 Paddlepaddle


With the development of Internet technology in recent years, the transformation and upgrading of urban industries are accelerating, and a smart revolution is quietly taking place. How, then, will cities be zoned in the future, and what roles will different urban areas play? Fine-grained urban governance is not only about the speed of urban development; it is also about the quality of life of every city resident.

Fortunately, as AI technology continues to mature, building a functional classification model of a city has become feasible, and the steady growth of the open-source PaddlePaddle deep learning platform gives developers even more choices. Around these questions, from September to December 2019, PaddlePaddle held its first baseline challenge. Competitors used PaddlePaddle to build a functional classification model of urban areas: given a geographical region, the model takes the region's remote sensing image and user visit data as input, and predicts the functional category for each of the 100,000 samples in the test set.

After three months of fierce competition, team Expelliarmus won the championship with a score of 0.88767. In other words, their trained model correctly predicted the functional categories of nearly 89,000 urban regions, classifying schools, residential communities, airports and more accurately with a single model. The result approaches the first-place score of 0.90468 in the 2019 international big data competition.

Competition overview: functional classification of urban areas based on remote sensing images and user behavior


Earlier, the 2019 Baidu & Xi'an Jiaotong University Big Data Competition held an Urban Region Function Classification contest. Contestants were required to build a functional classification model of urban areas (residential area, school, industrial park, train station, airport, park, business district, government district, hospital, etc.): given a geographical area, the model takes the area's remote sensing image and user visit data as input and predicts the area's functional category.


The PaddlePaddle baseline challenge follows the problem above: contestants must design an urban area function classification model with PaddlePaddle, based on remote sensing images and Internet user behavior. Because there was prior work to build on, contestants could avoid a great deal of repetitive effort. The winning team, Expelliarmus, shared their experience from the competition. In their view, a team could reuse the usable parts of the previously open-sourced code to cut down repetitive work, and replace any models that did not meet the rules in earlier solutions with models built on PaddlePaddle.

In other words, in this competition Expelliarmus combined the official baseline model with the feature extraction code that the top-2 team of the earlier competition had open-sourced on GitHub, and trained an MLP model of their own, built with PaddlePaddle, on the extracted features.

In this competition, Expelliarmus's work covered the following four aspects:

1.      Built an MLP model on the PaddlePaddle framework and encapsulated it as MLPClassifier, providing fit(), predict_prob(), score(), save_model() and load_model() interfaces for convenient training and prediction calls. See the models.py file in the code for details.
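As an illustration of that interface, here is a minimal, hypothetical sketch. The team's real models.py builds the MLP with PaddlePaddle; this stand-in uses a single softmax layer trained by plain gradient descent, purely to show the fit()/predict_prob()/score()/save_model()/load_model() surface.

```python
import numpy as np

class MLPClassifierSketch:
    """Hypothetical stand-in for the team's MLPClassifier (models.py).
    A single softmax layer replaces the real Paddle-built MLP; only the
    interface mirrors the description in the article."""

    def __init__(self, n_classes, lr=0.5, epochs=300, seed=0):
        self.n_classes, self.lr, self.epochs = n_classes, lr, epochs
        self.rng = np.random.default_rng(seed)
        self.W = self.b = None

    def _softmax(self, z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def fit(self, X, y):
        n, d = X.shape
        self.W = self.rng.normal(0, 0.01, (d, self.n_classes))
        self.b = np.zeros(self.n_classes)
        Y = np.eye(self.n_classes)[y]            # one-hot targets
        for _ in range(self.epochs):
            P = self._softmax(X @ self.W + self.b)
            grad = (P - Y) / n                   # cross-entropy gradient
            self.W -= self.lr * (X.T @ grad)
            self.b -= self.lr * grad.sum(axis=0)
        return self

    def predict_prob(self, X):
        return self._softmax(X @ self.W + self.b)

    def score(self, X, y):
        return float((self.predict_prob(X).argmax(axis=1) == y).mean())

    def save_model(self, path):
        np.savez(path, W=self.W, b=self.b)

    def load_model(self, path):
        data = np.load(path)
        self.W, self.b = data["W"], data["b"]
        return self
```

Wrapping the model this way lets the same object be passed to the feature-selection, k-fold, and bagging steps described below without those steps knowing anything about the underlying framework.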

2.      Modified the official baseline model as follows:

a.      Modified the npy file generation code to use the multiprocessing module for multi-process processing, speeding things up;

b.      Modified the reader and infer functions to support batch prediction, speeding up inference;

c.      Added k-fold cross-validation code, and stacking code that generates baseline-model features.
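The multiprocessing speed-up in (a) follows a common pattern. The sketch below is an assumption about the shape of that code, not the actual get_npy.py: a hypothetical per-record converter is fanned out over a process pool and the results are stacked into one npy file.

```python
import numpy as np
from multiprocessing import Pool

def visit_to_array(record):
    """Hypothetical stand-in for the per-sample conversion in get_npy.py:
    turn one visit record (here just a list of hour stamps 0-23) into a
    24-bin hour-count array. The real code parses the raw visit text files."""
    counts = np.zeros(24, dtype=np.int64)
    for hour in record:
        counts[hour] += 1
    return counts

def build_npy(records, out_path, workers=4):
    # Fan the per-record work out over a process pool, then stack and
    # save a single .npy file -- the same speed-up pattern the team applied.
    with Pool(workers) as pool:
        arrays = pool.map(visit_to_array, records)
    arr = np.stack(arrays)
    np.save(out_path, arr)
    return arr
```

Because each record is converted independently, the work parallelizes cleanly; the only serial parts are stacking and the final save.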

3.      Used the MLP model for feature selection, as follows:

a.      Split the data into training and validation sets, and trained an MLP model on all features;

b.      Shuffled each column of the validation set in turn and ran prediction with the trained model; if the score stayed the same or increased, that column of features carried no signal and could be eliminated. See the train_select.py file in the code for details.
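The column-shuffle screening in (b) can be sketched as follows. The model and data here are hypothetical stand-ins; the real train_select.py uses the Paddle-based MLP.

```python
import numpy as np

def select_by_shuffle(model, X_val, y_val, seed=0):
    """Permute one validation column at a time; if the score does not
    drop, the feature carries no signal and is dropped. `model` only
    needs a score(X, y) method (any fitted classifier works)."""
    rng = np.random.default_rng(seed)
    base = model.score(X_val, y_val)
    keep = []
    for j in range(X_val.shape[1]):
        Xs = X_val.copy()
        rng.shuffle(Xs[:, j])            # destroy column j only
        if model.score(Xs, y_val) < base:
            keep.append(j)               # shuffling hurt -> feature matters
    return keep
```

This is essentially permutation importance used as a filter: no retraining is needed, only repeated scoring on the validation set.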

4.      Finally, used bagging to train multiple models: before each training run, both the samples and the features are subsampled, ensuring diversity across the trained models and improving the effect of model fusion.
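A minimal sketch of that bagging scheme, assuming a hypothetical train_fn that returns a model exposing predict_prob():

```python
import numpy as np

def bag_predict(train_fn, X, y, X_test, n_models=10, sample_frac=0.8,
                feat_frac=0.8, seed=0):
    """Each round samples rows (with replacement) and a subset of feature
    columns, trains one model, and the test-set probabilities of all
    rounds are averaged -- the fusion step described above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    prob_sum = None
    for _ in range(n_models):
        rows = rng.choice(n, size=int(sample_frac * n), replace=True)
        cols = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)
        model = train_fn(X[np.ix_(rows, cols)], y[rows])
        p = model.predict_prob(X_test[:, cols])
        prob_sum = p if prob_sum is None else prob_sum + p
    return prob_sum / n_models
```

Subsampling both rows and columns decorrelates the models, which is what makes averaging their probabilities worthwhile.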


Competition approach: feature extraction and MLP model training


Expelliarmus contributed a write-up of their approach to this competition. During the competition they also drew on the ideas of previous top teams.

Feature extraction covers two main aspects:

1.      Features extracted with the official baseline model; see the train_multimodel folder for the specific code;

2.      Features extracted with the open-source code of the Haifan Xixi team, which fall into three categories:

The first category: basic features

"Given a region's visit data, we extract statistical features over different time periods in the region (the eight statistics sum, mean, std, max, min, and the 25th, 50th and 75th percentiles). Features that do not distinguish users: the 24 hourly counts, the ratio of visitor counts in consecutive hours, holidays, working days, rest days, and so on. Features that distinguish users: 1) within a day, the earliest visit hour, the latest, the latest minus the earliest, and the maximum number of adjacent visit hours in a day; 2) across days, statistical features of each hour; and so on."

—— quoted from the Haifan Xixi blog.
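A rough sketch of how such basic statistics might be computed for one region's 24 hourly visit counts. The helper name and the exact feature layout are assumptions for illustration, not the team's code:

```python
import numpy as np

def basic_stats(hour_counts):
    """For one region's 24 hourly visit counts, compute the eight
    statistics named in the quote (sum, mean, std, max, min, and the
    25th/50th/75th percentiles), plus the per-hour share of all visits."""
    h = np.asarray(hour_counts, dtype=float)
    stats = np.array([h.sum(), h.mean(), h.std(), h.max(), h.min(),
                      np.percentile(h, 25), np.percentile(h, 50),
                      np.percentile(h, 75)])
    share = h / h.sum() if h.sum() > 0 else np.zeros_like(h)
    return np.concatenate([stats, share])   # 8 + 24 = 32 features
```

The same eight statistics would be repeated for each time slice of interest (holidays, working days, rest days, etc.), which is how a few raw counts expand into hundreds of features.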

The second category: local features

"Number of days on the user's timeline, hours, the earliest and latest visit times of the day and the gap between them, the maximum number of hours between adjacent visits in a day; and the corresponding holiday features (due to memory limits, only some holiday features were extracted: days and hours); the split into holidays here is a little rough."

—— quoted from the Haifan Xixi blog.

The third category: global features

These follow the local-feature extraction method, replacing some basic features with local feature variables (see the Haifan Xixi blog for details), and use the feature-selection method described above to pick features from the basic set. Before extracting the global features, 50 features are first selected from the basic features and used to construct the global features.

After feature extraction, the features from the official baseline model and the three categories of Haifan Xixi features are used together to train the MLP model. With 4-fold cross-validation, the final score is 0.885+; with bagging, fusing 50 trained MLP models, the final score reaches 0.887+. Note that the MLP layer sizes used above are (256, 128, 64).
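The k-fold scheme, together with using fold probabilities as stacking features (point 2c earlier), can be sketched like this. The train_fn hook and the fold layout are illustrative assumptions:

```python
import numpy as np

def kfold_oof_probs(train_fn, X, y, n_classes, k=4, seed=0):
    """Train on k-1 folds and predict probabilities on the held-out
    fold. The out-of-fold probabilities can then be appended as
    stacking features for a next-stage MLP. `train_fn(X, y)` must
    return an object with predict_prob(X)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    oof = np.zeros((len(X), n_classes))
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[tr], y[tr])
        oof[val] = model.predict_prob(X[val])
    return oof
```

Because every row's probabilities come from a model that never saw that row, the stacked features do not leak training labels into the next stage.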

Code directory and description


So how do you run the two methods above? Expelliarmus provides the code directory and instructions below.

code

├─data: Data storage directory

│ ├─test_image: Test images

│ ├─test_visit: Test visit text

│ ├─train_image: Training images

│ └─train_visit: Training visit text

└─work

├─data_processing: Data preprocessing

│ ├─get_basic_file: Record training/test files and training labels

│ └─get_npy: Generate npy files

├─feature_extracting: Feature extraction and filtering

│ ├─Basic_feature: basic features

│ │ ├─Code_Basic_feature_1

│ │ └─Code_Basic_feature_2

│ ├─UserID_feature_global: global features

│ └─UserID_feature_local: local features

├─train_all: Train the model with 4-fold cross-validation (score: 0.885)

├─train_bagging: Train models with bagging (score: 0.887)

└─train_multimodel: Official baseline model features

Note: the parts that reuse existing open-source code include:

A. Modified from the official baseline model:

work\data_processing\get_npy\get_npy.py

work\train_multimodel\multimodel.py

work\train_multimodel\train_utils.py

B. Open-source code from GitHub:

work\data_processing\get_basic_file\**

work\feature_extracting\Basic_feature\Code_Basic_feature_1\Config.py

work\feature_extracting\Basic_feature\Code_Basic_feature_1\feature.py

work\feature_extracting\Basic_feature\Code_Basic_feature_1\main.py

work\feature_extracting\Basic_feature\Code_Basic_feature_2\Config.py

work\feature_extracting\Basic_feature\Code_Basic_feature_2\feature.py

work\feature_extracting\Basic_feature\Code_Basic_feature_2\main.py

work\feature_extracting\UserID_feature_global\Config.py

work\feature_extracting\UserID_feature_global\function_global_feature.py

work\feature_extracting\UserID_feature_global\function.py

work\feature_extracting\UserID_feature_global\main.py

work\feature_extracting\UserID_feature_local\** 

Run the code in the following order:

Enter data_processing/get_basic_file

(1) python get_label.py: generate the training labels

(2) python get_train_test_csv.py: record the training/test visit files (csv)

(3) python get_train_test_txt.py: record the training/test visit and image files (txt)

Enter data_processing/get_npy

(1) python get_npy.py: generate the official baseline npy arrays

Enter work\feature_extracting\Basic_feature\Code_Basic_feature_1

(1) python main.py: generate the first group of basic features

(2) python merge20.py: combine half of this group's basic features for feature filtering

(3) python train_select.py: screen features with the MLP, generating select_index.npy

Enter work\feature_extracting\Basic_feature\Code_Basic_feature_2

(1) python main.py: generate the second group of basic features

(2) python merge20.py: combine half of this group's basic features for feature filtering

(3) python train_select.py: screen features with the MLP, generating select_index.npy

Enter work\feature_extracting\Basic_feature

(1) python train_select.py: screen the first two groups of features with the MLP

(2) python merge.py: merge the selected features to generate the final basic features

Enter work\feature_extracting\UserID_feature_local

(Run the following in sequence to generate eight groups of local features)

(1) python normal_local.py

(2) python normal_hour_local.py

(3) python normal_hour_local_std.py

(4) python normal_work_rest_fangjia_hour_local.py

(5) python normal_work_rest_fangjia_hour_local_std.py

(6) python normal_work_rest_fangjia_local.py

(7) python data_precessing_user_id_number_holiday.py

(8) python data_precessing_user_id_number_hour.py 

Enter work\feature_extracting\UserID_feature_global

(1) python train_select.py: select another 50 features from the basic features

(2) python user_place_visit_num.py: count user visits per location

(3) python main.py: generate the global features from the 50 selected features

(4) python merge.py: merge to obtain the final global features

Enter work\train_multimodel

(1) sh download_pretrain.sh: download the SE_ResNeXt50 pretrained model

(2) python train.py: k-fold train the official baseline model and output its predicted probabilities as features

Enter work\train_all

(1) python train4fold.py: train with 4-fold cross-validation using the MLP model and all the features generated above; the online score of the prediction is 0.885+

Enter work\train_bagging

(1) python train.py: train 50 MLP models with the bagging strategy

(2) python infer.py: predict the test set with the first 46 models and average their probabilities; the online score is 0.887+

Final thoughts

With the methods above, Expelliarmus, led by CChan of South China University of Technology, finished with a final score of 0.88767, which carried them to the top and won them the championship. It goes to show that whether you are a fledgling student or a long-time professional, as long as you are good at seizing opportunities, your own light can shine. The first PaddlePaddle baseline challenge was clearly such an opportunity, but it is not the only one: registration for the language and knowledge technology track of the China Artificial Intelligence Competition is also open!

The language and knowledge technology track of the China Artificial Intelligence Competition is a national-level AI competition guided by three state ministries. It features a machine reading comprehension task. Competitors receive expert-level AI training and long-term technical support free of charge, the Xiamen government provides strong supporting policies for competitors, Baidu has set up a prize pool for individual competitors, and AI Studio provides free GPU compute to help contestants compete.

Don't regret missed opportunities, and don't envy the brilliance of others. Every activity opens a door for those who are prepared. Click [Read the original] in the lower left corner to sign up; if you want to compete, seize this opportunity!



Original site

Copyright notice
This article was created by [Paddlepaddle]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202211511279726.html