
He "painted" what a smart city should look like with his paddle

2022-06-24 18:02:00 Paddlepaddle


With the development of Internet technology in recent years, the transformation and upgrading of urban industries are accelerating, and a smart revolution is quietly taking place. How, then, will cities be zoned in the future, and what roles will different urban areas play? Fine-grained urban governance is not only about the speed of urban development; it is also about the quality of life of every city resident.

Fortunately, as AI technology continues to mature, building a functional classification model of a city has become feasible, and the steady growth of the open-source PaddlePaddle deep learning platform gives developers even more choices. Around these questions, from September to December 2019, PaddlePaddle held its first baseline challenge. Competitors used PaddlePaddle to build a functional classification model of urban areas: given a geographical region, the model takes the region's remote sensing image and user visit data as input, and predicts the functional category for each of the 100,000 samples in the test set.

After three months of fierce competition, team Expelliarmus won the championship with a score of 0.88767. In other words, their trained model correctly predicted the functional categories of nearly 89,000 urban regions, classifying schools, residential communities, airports and more accurately with a single model. The result approaches the first-place score of 0.90468 in the 2019 international big data competition.

Competition overview: functional classification of urban areas based on remote sensing images and user behavior


Earlier, the 2019 Baidu & Xi'an Jiaotong University Big Data Competition held an Urban Region Function Classification contest. Contestants were required to build a functional classification model of urban areas (residential area, school, industrial park, train station, airport, park, business district, government district, hospital, etc.): given a geographical area, the model takes the area's remote sensing image and user visit data as input and predicts the area's functional category.


The PaddlePaddle baseline challenge follows the problem above: contestants must design an urban area function classification model with PaddlePaddle, based on remote sensing images and Internet user behavior. Because there was prior work to build on, contestants could avoid a great deal of repetitive effort. The winning team, Expelliarmus, shared their experience from the competition. In their view, a team could reuse the usable parts of the previously open-sourced code to cut down repetitive work, and replace any models that did not meet the rules in earlier solutions with models built on PaddlePaddle.

In other words, in this competition Expelliarmus combined the official baseline model with the feature extraction code that the top-2 team of the earlier competition had open-sourced on GitHub, and trained an MLP model of their own, built with PaddlePaddle, on the extracted features.

In this competition, Expelliarmus's work covered the following four aspects:

1.      Built an MLP model on the PaddlePaddle framework and encapsulated it as MLPClassifier, providing fit(), predict_prob(), score(), save_model() and load_model() interfaces for convenient training and prediction calls. See the models.py file in the code for details.
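As an illustration of that interface, here is a minimal, hypothetical sketch. The team's real models.py builds the MLP with PaddlePaddle; this stand-in uses a single softmax layer trained by plain gradient descent, purely to show the fit()/predict_prob()/score()/save_model()/load_model() surface.

```python
import numpy as np

class MLPClassifierSketch:
    """Hypothetical stand-in for the team's MLPClassifier (models.py).
    A single softmax layer replaces the real Paddle-built MLP; only the
    interface mirrors the description in the article."""

    def __init__(self, n_classes, lr=0.5, epochs=300, seed=0):
        self.n_classes, self.lr, self.epochs = n_classes, lr, epochs
        self.rng = np.random.default_rng(seed)
        self.W = self.b = None

    def _softmax(self, z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def fit(self, X, y):
        n, d = X.shape
        self.W = self.rng.normal(0, 0.01, (d, self.n_classes))
        self.b = np.zeros(self.n_classes)
        Y = np.eye(self.n_classes)[y]            # one-hot targets
        for _ in range(self.epochs):
            P = self._softmax(X @ self.W + self.b)
            grad = (P - Y) / n                   # cross-entropy gradient
            self.W -= self.lr * (X.T @ grad)
            self.b -= self.lr * grad.sum(axis=0)
        return self

    def predict_prob(self, X):
        return self._softmax(X @ self.W + self.b)

    def score(self, X, y):
        return float((self.predict_prob(X).argmax(axis=1) == y).mean())

    def save_model(self, path):
        np.savez(path, W=self.W, b=self.b)

    def load_model(self, path):
        data = np.load(path)
        self.W, self.b = data["W"], data["b"]
        return self
```

Wrapping the model this way lets the same object be passed to the feature-selection, k-fold, and bagging steps described below without those steps knowing anything about the underlying framework.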

2.      Modified the official baseline model as follows:

a.      Modified the npy file generation code to use the multiprocessing module for multi-process processing, speeding things up;

b.      Modified the reader and infer functions to support batch prediction, speeding up inference;

c.      Added k-fold cross-validation code, and stacking code that generates baseline-model features.
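The multiprocessing speed-up in (a) follows a common pattern. The sketch below is an assumption about the shape of that code, not the actual get_npy.py: a hypothetical per-record converter is fanned out over a process pool and the results are stacked into one npy file.

```python
import numpy as np
from multiprocessing import Pool

def visit_to_array(record):
    """Hypothetical stand-in for the per-sample conversion in get_npy.py:
    turn one visit record (here just a list of hour stamps 0-23) into a
    24-bin hour-count array. The real code parses the raw visit text files."""
    counts = np.zeros(24, dtype=np.int64)
    for hour in record:
        counts[hour] += 1
    return counts

def build_npy(records, out_path, workers=4):
    # Fan the per-record work out over a process pool, then stack and
    # save a single .npy file -- the same speed-up pattern the team applied.
    with Pool(workers) as pool:
        arrays = pool.map(visit_to_array, records)
    arr = np.stack(arrays)
    np.save(out_path, arr)
    return arr
```

Because each record is converted independently, the work parallelizes cleanly; the only serial parts are stacking and the final save.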

3.      Used the MLP model for feature selection, as follows:

a.      Split the data into training and validation sets, and trained an MLP model on all features;

b.      Shuffled each column of the validation set in turn and ran prediction with the trained model; if the score stayed the same or increased, that column of features carried no signal and could be eliminated. See the train_select.py file in the code for details.
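The column-shuffle screening in (b) can be sketched as follows. The model and data here are hypothetical stand-ins; the real train_select.py uses the Paddle-based MLP.

```python
import numpy as np

def select_by_shuffle(model, X_val, y_val, seed=0):
    """Permute one validation column at a time; if the score does not
    drop, the feature carries no signal and is dropped. `model` only
    needs a score(X, y) method (any fitted classifier works)."""
    rng = np.random.default_rng(seed)
    base = model.score(X_val, y_val)
    keep = []
    for j in range(X_val.shape[1]):
        Xs = X_val.copy()
        rng.shuffle(Xs[:, j])            # destroy column j only
        if model.score(Xs, y_val) < base:
            keep.append(j)               # shuffling hurt -> feature matters
    return keep
```

This is essentially permutation importance used as a filter: no retraining is needed, only repeated scoring on the validation set.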

4.      Finally, used bagging to train multiple models: before each training run, both the samples and the features are subsampled, ensuring diversity across the trained models and improving the effect of model fusion.
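A minimal sketch of that bagging scheme, assuming a hypothetical train_fn that returns a model exposing predict_prob():

```python
import numpy as np

def bag_predict(train_fn, X, y, X_test, n_models=10, sample_frac=0.8,
                feat_frac=0.8, seed=0):
    """Each round samples rows (with replacement) and a subset of feature
    columns, trains one model, and the test-set probabilities of all
    rounds are averaged -- the fusion step described above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    prob_sum = None
    for _ in range(n_models):
        rows = rng.choice(n, size=int(sample_frac * n), replace=True)
        cols = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)
        model = train_fn(X[np.ix_(rows, cols)], y[rows])
        p = model.predict_prob(X_test[:, cols])
        prob_sum = p if prob_sum is None else prob_sum + p
    return prob_sum / n_models
```

Subsampling both rows and columns decorrelates the models, which is what makes averaging their probabilities worthwhile.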


Competition approach: feature extraction and MLP model training


Expelliarmus contributed a write-up of their approach to this competition. During the competition they also drew on the ideas of previous top teams.

Feature extraction covers two main aspects:

1.      Features extracted with the official baseline model; see the train_multimodel folder for the specific code;

2.      Features extracted with the open-source code of the Haifan Xixi team, which fall into three categories:

The first category: basic features

"Given a region's visit data, we extract statistical features over different time periods in the region (the eight statistics sum, mean, std, max, min, and the 25th, 50th and 75th percentiles). Features that do not distinguish users: the 24 hourly counts, the ratio of visitor counts in consecutive hours, holidays, working days, rest days, and so on. Features that distinguish users: 1) within a day, the earliest visit hour, the latest, the latest minus the earliest, and the maximum number of adjacent visit hours in a day; 2) across days, statistical features of each hour; and so on."

—— quoted from the Haifan Xixi blog.
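A rough sketch of how such basic statistics might be computed for one region's 24 hourly visit counts. The helper name and the exact feature layout are assumptions for illustration, not the team's code:

```python
import numpy as np

def basic_stats(hour_counts):
    """For one region's 24 hourly visit counts, compute the eight
    statistics named in the quote (sum, mean, std, max, min, and the
    25th/50th/75th percentiles), plus the per-hour share of all visits."""
    h = np.asarray(hour_counts, dtype=float)
    stats = np.array([h.sum(), h.mean(), h.std(), h.max(), h.min(),
                      np.percentile(h, 25), np.percentile(h, 50),
                      np.percentile(h, 75)])
    share = h / h.sum() if h.sum() > 0 else np.zeros_like(h)
    return np.concatenate([stats, share])   # 8 + 24 = 32 features
```

The same eight statistics would be repeated for each time slice of interest (holidays, working days, rest days, etc.), which is how a few raw counts expand into hundreds of features.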

The second category: local features

"Number of days on the user's timeline, hours, the earliest and latest visit times of the day and the gap between them, the maximum number of hours between adjacent visits in a day; and the corresponding holiday features (due to memory limits, only some holiday features were extracted: days and hours); the split into holidays here is a little rough."

—— quoted from the Haifan Xixi blog.

The third category: global features

These follow the local-feature extraction method, replacing some basic features with local feature variables (see the Haifan Xixi blog for details), and use the feature-selection method described above to pick features from the basic set. Before extracting the global features, 50 features are first selected from the basic features and used to construct the global features.

After feature extraction, the features from the official baseline model and the three categories of Haifan Xixi features are used together to train the MLP model. With 4-fold cross-validation, the final score is 0.885+; with bagging, fusing 50 trained MLP models, the final score reaches 0.887+. Note that the MLP layer sizes used above are (256, 128, 64).
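The k-fold scheme, together with using fold probabilities as stacking features (point 2c earlier), can be sketched like this. The train_fn hook and the fold layout are illustrative assumptions:

```python
import numpy as np

def kfold_oof_probs(train_fn, X, y, n_classes, k=4, seed=0):
    """Train on k-1 folds and predict probabilities on the held-out
    fold. The out-of-fold probabilities can then be appended as
    stacking features for a next-stage MLP. `train_fn(X, y)` must
    return an object with predict_prob(X)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    oof = np.zeros((len(X), n_classes))
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[tr], y[tr])
        oof[val] = model.predict_prob(X[val])
    return oof
```

Because every row's probabilities come from a model that never saw that row, the stacked features do not leak training labels into the next stage.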

Code directory and description


So how do you run the two methods above? Expelliarmus provides the code directory and instructions below.

code

├─data: Data storage directory

│ ├─test_image: Test images

│ ├─test_visit: Test visit text

│ ├─train_image: Training images

│ └─train_visit: Training visit text

└─work

├─data_processing: Data preprocessing

│ ├─get_basic_file: Record training/test files and training labels

│ └─get_npy: Generate npy files

├─feature_extracting: Feature extraction and filtering

│ ├─Basic_feature: basic features

│ │ ├─Code_Basic_feature_1

│ │ └─Code_Basic_feature_2

│ ├─UserID_feature_global: global features

│ └─UserID_feature_local: local features

├─train_all: Train the model with 4-fold cross-validation (score: 0.885)

├─train_bagging: Train models with bagging (score: 0.887)

└─train_multimodel: Official baseline model features

Note: the parts that reuse existing open-source code include:

A. Modified from the official baseline model:

work\data_processing\get_npy\get_npy.py

work\train_multimodel\multimodel.py

work\train_multimodel\train_utils.py

B. Open-source code from GitHub:

work\data_processing\get_basic_file\**

work\feature_extracting\Basic_feature\Code_Basic_feature_1\Config.py

work\feature_extracting\Basic_feature\Code_Basic_feature_1\feature.py

work\feature_extracting\Basic_feature\Code_Basic_feature_1\main.py

work\feature_extracting\Basic_feature\Code_Basic_feature_2\Config.py

work\feature_extracting\Basic_feature\Code_Basic_feature_2\feature.py

work\feature_extracting\Basic_feature\Code_Basic_feature_2\main.py

work\feature_extracting\UserID_feature_global\Config.py

work\feature_extracting\UserID_feature_global\function_global_feature.py

work\feature_extracting\UserID_feature_global\function.py

work\feature_extracting\UserID_feature_global\main.py

work\feature_extracting\UserID_feature_local\** 

Run the code in the following order:

Enter data_processing/get_basic_file

(1) python get_label.py: generate the training labels

(2) python get_train_test_csv.py: record the training/test visit files (csv)

(3) python get_train_test_txt.py: record the training/test visit and image files (txt)

Enter data_processing/get_npy

(1) python get_npy.py: generate the official baseline npy arrays

Enter work\feature_extracting\Basic_feature\Code_Basic_feature_1

(1) python main.py: generate the first group of basic features

(2) python merge20.py: combine half of this group's basic features for feature filtering

(3) python train_select.py: screen features with the MLP, generating select_index.npy

Enter work\feature_extracting\Basic_feature\Code_Basic_feature_2

(1) python main.py: generate the second group of basic features

(2) python merge20.py: combine half of this group's basic features for feature filtering

(3) python train_select.py: screen features with the MLP, generating select_index.npy

Enter work\feature_extracting\Basic_feature

(1) python train_select.py: screen the first two groups of features with the MLP

(2) python merge.py: merge the selected features to generate the final basic features

Enter work\feature_extracting\UserID_feature_local

(Run the following in sequence to generate eight groups of local features)

(1) python normal_local.py

(2) python normal_hour_local.py

(3) python normal_hour_local_std.py

(4) python normal_work_rest_fangjia_hour_local.py

(5) python normal_work_rest_fangjia_hour_local_std.py

(6) python normal_work_rest_fangjia_local.py

(7) python data_precessing_user_id_number_holiday.py

(8) python data_precessing_user_id_number_hour.py 

Enter work\feature_extracting\UserID_feature_global

(1) python train_select.py: select another 50 features from the basic features

(2) python user_place_visit_num.py: count user visits per location

(3) python main.py: generate the global features from the 50 selected features

(4) python merge.py: merge to obtain the final global features

Enter work\train_multimodel

(1) sh download_pretrain.sh: download the SE_ResNeXt50 pretrained model

(2) python train.py: k-fold train the official baseline model and output its predicted probabilities as features

Enter work\train_all

(1) python train4fold.py: train with 4-fold cross-validation using the MLP model and all the features generated above; the online score of the prediction is 0.885+

Enter work\train_bagging

(1) python train.py: train 50 MLP models with the bagging strategy

(2) python infer.py: predict the test set with the first 46 models and average their probabilities; the online score is 0.887+

Final thoughts

With the methods above, Expelliarmus, led by CChan of South China University of Technology, finished with a final score of 0.88767, which carried them to the top and won them the championship. It goes to show that whether you are a fledgling student or a long-time professional, as long as you are good at seizing opportunities, your own light can shine. The first PaddlePaddle baseline challenge was clearly such an opportunity, but it is not the only one: registration for the language and knowledge technology track of the China Artificial Intelligence Competition is also open!

The language and knowledge technology track of the China Artificial Intelligence Competition is a national-level AI competition guided by three state ministries. It features a machine reading comprehension task. Competitors receive expert-level AI training and long-term technical support free of charge, the Xiamen government provides strong supporting policies for competitors, Baidu has set up a prize pool for individual competitors, and AI Studio provides free GPU compute to help contestants compete.

Don't regret missed opportunities, and don't envy the brilliance of others. Every activity opens a door for those who are prepared. Click [Read the original] in the lower left corner to sign up; if you want to compete, seize this opportunity!



Original site

Copyright notice
This article was created by [Paddlepaddle]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202211511279726.html