He used PaddlePaddle to "paint" what a smart city should look like
2022-06-24 18:02:00 【Paddlepaddle】

With the rapid development of Internet technology in recent years, the transformation and upgrading of urban industries has accelerated, and a smart-city revolution is quietly under way. How will cities be zoned in the future, and what role will each urban area play? Fine-grained urban governance affects not only how fast a city develops but also the quality of life of every resident. Fortunately, AI technology keeps maturing, which makes it possible to build a functional classification model of a city. The steady growth of PaddlePaddle, the open-source deep learning platform, also gives developers more options.
To address these questions, PaddlePaddle held its first Baseline Challenge from September to December 2019. Contestants used PaddlePaddle to build a functional classification model of urban areas: given a geographical area, the model takes the area's remote sensing image and user visit data as input and predicts the functional category of each of the 100,000 test samples. After three months of fierce competition, team Expelliarmus won the championship with a score of 0.88767. In other words, the team's model correctly predicted the functional category of nearly 89,000 urban areas; a single model is enough to tell schools, residential communities, airports, and other area types apart. This result approaches the first-place score of 0.90468 in the 2019 international big data competition.
Competition analysis: functional classification of urban areas based on remote sensing images and user behavior
Earlier, the 2019 Baidu & Xi'an Jiaotong University Big Data Competition included the Urban Region Function Classification task. Contestants were required to build a functional classification model of urban areas (residential areas, schools, industrial parks, train stations, airports, parks, shopping areas, government districts, hospitals, and so on): given a geographical area, the model takes the remote sensing image of the area and the user visit data as input and predicts the area's functional category.
PaddlePaddle's Baseline Challenge reuses this task: contestants design an urban area function classification model with PaddlePaddle, based on remote sensing images and Internet user behavior. Because there was prior work to follow, contestants could avoid a lot of repetitive work. The winning team, Expelliarmus, shared their experience. In their view, the reusable parts of the earlier open-source code can be adopted directly to reduce duplicated effort, while the models in the earlier competition solutions that do not comply with this competition's rules should be replaced with models built on PaddlePaddle.
In other words, Expelliarmus used the official baseline model and the feature extraction code that a top-2 team from the earlier competition had open-sourced on GitHub, and then trained an MLP model that they built themselves with PaddlePaddle on the extracted features.
In this competition, the work of Expelliarmus covered the following four aspects:
1. Built an MLP model on the PaddlePaddle framework and wrapped it in an MLPClassifier class that provides the fit(), predict_prob(), score(), save_model(), and load_model() interfaces, which makes training and prediction calls convenient. See models.py in the code for details.
2. Modified the official baseline model as follows:
a. Changed the npy file generation code to use multiprocessing, which speeds up processing;
b. Changed the reader function and the infer function to support batch prediction, which speeds up inference;
c. Added k-fold cross-validation code and code that uses stacking to generate features from the baseline model.
3. Used the MLP model for feature selection, as follows:
a. Split the data into a training set and a validation set, and train the MLP model on all features;
b. Shuffle each feature column of the validation set in turn and predict with the model trained in the previous step. If the score stays the same or increases, that feature column is not useful and can be removed. See train_select.py in the code for details.
4. Later, trained multiple models with bagging: before each training run, both the samples and the features are subsampled, which keeps the trained models diverse and improves the effect of model fusion.
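The shuffle-based feature selection in step 3 can be sketched as follows. This is a minimal illustration that uses only the standard library, not the code in train_select.py: the names `accuracy` and `select_features` are hypothetical, and `predict` stands in for an already trained model (the team used their MLP).

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def select_features(predict, val_rows, val_labels, seed=0):
    """Return the column indices whose shuffling lowers the validation score.

    Columns whose shuffling leaves the score unchanged or raises it are
    treated as not useful and are dropped, as described in step 3b.
    """
    rng = random.Random(seed)
    base = accuracy(predict, val_rows, val_labels)
    keep = []
    for col in range(len(val_rows[0])):
        # Shuffle one column while leaving every other column untouched.
        shuffled = [row[col] for row in val_rows]
        rng.shuffle(shuffled)
        permuted = [row[:col] + [v] + row[col + 1:]
                    for row, v in zip(val_rows, shuffled)]
        if accuracy(predict, permuted, val_labels) < base:
            keep.append(col)
    return keep
```

Each row is assumed to be a plain Python list of feature values. A single shuffle per column is noisy on small validation sets, so repeating the shuffle and averaging the score would be a reasonable refinement.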
Approach: feature extraction and MLP model training
Expelliarmus shared their approach to this competition; for details, please refer to:
During the competition, the team also drew on the approach of a top team from the earlier competition; for details, please refer to:
Feature extraction mainly includes two aspects:
1. Use the official baseline model to extract features. See the train_multimodel folder for the code;
2. Use the open-source code of the Haifan Xi Xi team to extract features. There are three types of features:
The first category: basic features
“Given the visit data of an area, we extract statistical features for different time periods in the area (eight statistics: sum, mean, std, max, min, and the 25th, 50th, and 75th percentiles). Features that do not distinguish between users: the 24 hours, the ratio of visitor counts between adjacent hours across the 24 hours, holidays, working days, rest days, and so on. Features that distinguish between users: 1) within a day, the earliest hour, the latest hour, the latest minus the earliest, and the maximum gap between adjacent hours in a day; 2) along the days, the statistical features of each hour, and so on.”
(Quoted from the Haifan Xi Xi blog.)
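The eight statistics named in the quote can be sketched as follows. This is a minimal illustration, not the Haifan Xi Xi team's code: it assumes the visit data has already been aggregated into a list of counts for one time period, the function names are hypothetical, and the percentile uses linear interpolation, which may differ from the original implementation.

```python
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def eight_stats(counts):
    """Compute sum, mean, std, max, min and the 25/50/75 percentiles."""
    s = sorted(counts)
    return {
        "sum": sum(s),
        "mean": statistics.fmean(s),
        "std": statistics.pstdev(s),  # population standard deviation
        "max": s[-1],
        "min": s[0],
        "p25": percentile(s, 25),
        "p50": percentile(s, 50),
        "p75": percentile(s, 75),
    }
```

Calling `eight_stats` once per time period (for example per hour of the day, or separately for working days and holidays) and concatenating the results gives one feature vector per area.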
The second category: local features
“For each user's timeline: the number of days, the number of hours, the earliest and latest hour of a day and the difference between them, and the maximum gap between adjacent hours in a day, plus the corresponding features for holidays (because of memory limits, only some holiday features are extracted: the number of days and the number of hours). Our division of holidays here is somewhat rough.” (Quoted from the Haifan Xi Xi blog.)
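The per-user timeline features in the quote can be sketched as follows. This is a minimal illustration under stated assumptions: a user's visits are represented as a dict that maps a date string to the list of hours (0 to 23) visited on that day, and the function name, the input layout, and the way per-day values are combined (minimum, maximum) are hypothetical choices, not taken from the original code.

```python
def user_timeline_features(visits):
    """Summarize one user's visits.

    visits: dict mapping a date string to a non-empty list of hours (0-23).
    """
    per_day = []
    for hours in visits.values():
        hs = sorted(hours)
        # Gaps between consecutive visited hours within the same day.
        gaps = [b - a for a, b in zip(hs, hs[1:])]
        per_day.append((hs[0], hs[-1], hs[-1] - hs[0], max(gaps, default=0)))
    return {
        "n_days": len(visits),
        "n_hours": sum(len(h) for h in visits.values()),
        "earliest": min(d[0] for d in per_day),
        "latest": max(d[1] for d in per_day),
        "max_span": max(d[2] for d in per_day),
        "max_gap": max(d[3] for d in per_day),
    }
```

The holiday variants mentioned in the quote would apply the same function to the subset of dates that fall on holidays.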
The third category: global features
Global features are extracted with the same method as local features, except that some basic features replace the feature variables used for local features (see the Haifan Xi Xi blog for details). These basic features are chosen with the feature selection method described above: before extracting global features, a further 50 features are selected from the basic features and used to construct the global features.
After feature extraction, the features from the official baseline model and the three types of features from the Haifan Xi Xi team are combined to train the MLP model. With 4-fold cross-validation, the final score is 0.885+. With bagging, training 50 MLP models and fusing them, the final score is 0.887+. Note that the layer sizes of the MLP model above are set to (256, 128, 64).
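The bagging strategy mentioned above (subsample the samples and the features before each training run, then fuse the models) can be sketched as follows. This is a minimal, standard-library illustration, not the team's code: `CentroidModel` is a hypothetical stand-in for the MLP, sampling with replacement and the default fractions are assumptions, and fusion is done by averaging the predicted probabilities.

```python
import math
import random


class CentroidModel:
    """Tiny stand-in for the MLP: probabilities from distances to class centroids."""

    def fit(self, rows, labels):
        sums, counts = {}, {}
        for row, y in zip(rows, labels):
            acc = sums.setdefault(y, [0.0] * len(row))
            for i, v in enumerate(row):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
        return self

    def predict_prob(self, row):
        scores = {y: math.exp(-math.dist(row, c)) for y, c in self.centroids.items()}
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()}


def bagging_fit(rows, labels, n_models=5, sample_frac=0.8, feat_frac=0.5, seed=0):
    """Train n_models models, each on a random subset of samples and features."""
    rng = random.Random(seed)
    n_feat = len(rows[0])
    models = []
    for _ in range(n_models):
        # Sample rows with replacement and feature columns without replacement.
        idx = [rng.randrange(len(rows)) for _ in range(int(sample_frac * len(rows)))]
        feats = sorted(rng.sample(range(n_feat), max(1, int(feat_frac * n_feat))))
        sub_rows = [[rows[i][f] for f in feats] for i in idx]
        sub_labels = [labels[i] for i in idx]
        models.append((feats, CentroidModel().fit(sub_rows, sub_labels)))
    return models


def bagging_predict_prob(models, row, classes):
    """Average the class probabilities predicted by all models."""
    avg = {c: 0.0 for c in classes}
    for feats, model in models:
        probs = model.predict_prob([row[f] for f in feats])
        for c in classes:
            avg[c] += probs.get(c, 0.0) / len(models)
    return avg
```

Each model remembers which feature columns it was trained on, so the same columns are selected again at prediction time before the probabilities are averaged.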
Code directory and description
How are the two approaches above run in practice? Expelliarmus provides a code directory and instructions.
code
├─data: data storage directory
│  ├─test_image: test images
│  ├─test_visit: test visit text
│  ├─train_image: training images
│  └─train_visit: training visit text
└─work
   ├─data_processing: data preprocessing
   │  ├─get_basic_file: record training/test files and training labels
   │  └─get_npy: generate npy files
   ├─feature_extracting: feature extraction and selection
   │  ├─Basic_feature: basic features
   │  │  ├─Code_Basic_feature_1
   │  │  └─Code_Basic_feature_2
   │  ├─UserID_feature_global: global features
   │  └─UserID_feature_local: local features
   ├─train_all: train the model with 4-fold cross-validation (score: 0.885)
   ├─train_bagging: train the model with bagging (score: 0.887)
   └─train_multimodel: features from the official baseline model
Note: the files based on existing open-source code are:
A. Modified from the official baseline model:
work\data_processing\get_npy\get_npy.py
work\train_multimodel\multimodel.py
work\train_multimodel\train_utils.py
B. From open-source code on GitHub:
(website: )
work\data_processing\get_basic_file\**
work\feature_extracting\Basic_feature\Code_Basic_feature_1\Config.py
work\feature_extracting\Basic_feature\Code_Basic_feature_1\feature.py
work\feature_extracting\Basic_feature\Code_Basic_feature_1\main.py
work\feature_extracting\Basic_feature\Code_Basic_feature_2\Config.py
work\feature_extracting\Basic_feature\Code_Basic_feature_2\feature.py
work\feature_extracting\Basic_feature\Code_Basic_feature_2\main.py
work\feature_extracting\UserID_feature_global\Config.py
work\feature_extracting\UserID_feature_global\function_global_feature.py
work\feature_extracting\UserID_feature_global\function.py
work\feature_extracting\UserID_feature_global\main.py
work\feature_extracting\UserID_feature_local\**
Run the code in the following order:
Enter data_processing/get_basic_file
(1) python get_label.py: generate the training labels
(2) python get_train_test_csv.py: record the training visit files (csv)
(3) python get_train_test_txt.py: record the training visit and test image files (txt)
Enter data_processing/get_npy
(1) python get_npy.py: generate the npy arrays for the official baseline
Enter work\feature_extracting\Basic_feature\Code_Basic_feature_1
(1) python main.py: generate the first group of basic features
(2) python merge20.py: merge half of this group's basic features for feature selection
(3) python train_select.py: select features with the MLP and generate select_index.npy
Enter work\feature_extracting\Basic_feature\Code_Basic_feature_2
(1) python main.py: generate the second group of basic features
(2) python merge20.py: merge half of this group's basic features for feature selection
(3) python train_select.py: select features with the MLP and generate select_index.npy
Enter work\feature_extracting\Basic_feature
(1) python train_select.py: select among the first two groups of features with the MLP
(2) python merge.py: merge the selected features to generate the final basic features
Enter work\feature_extracting\UserID_feature_local
(Run in sequence to generate eight groups of local features)
(1) python normal_local.py
(2) python normal_hour_local.py
(3) python normal_hour_local_std.py
(4) python normal_work_rest_fangjia_hour_local.py
(5) python normal_work_rest_fangjia_hour_local_std.py
(6) python normal_work_rest_fangjia_local.py
(7) python data_precessing_user_id_number_holiday.py
(8) python data_precessing_user_id_number_hour.py
Enter work\feature_extracting\UserID_feature_global
(1) python train_select.py: select a further 50 features from the basic features
(2) python user_place_visit_num.py: count the locations each user visits
(3) python main.py: generate global features from the 50 selected features
(4) python merge.py: merge the results to get the final global features
Enter work\train_multimodel
(1) sh download_pretrain.sh: download the SE_ResNeXt50 pretrained model
(2) python train.py: train the official baseline model with k-fold cross-validation and use the predicted probabilities as features
Enter work\train_all
(1) python train4fold.py: train with 4-fold cross-validation using the MLP model and all the features generated above; the online score of the predictions is 0.885+
Enter work\train_bagging
(1) python train.py: train 50 MLP models with the bagging strategy
(2) python infer.py: predict the test set with the first 46 models and average the probabilities; the online score of the result is 0.887+
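The train_multimodel step in the run order above trains the baseline model with k-fold cross-validation and uses the predicted probabilities as features. A minimal sketch of that out-of-fold idea follows, using only the standard library. The function names are hypothetical, `train_fn` stands in for training the baseline model, and the folds are contiguous for simplicity (shuffling the row order first is usually advisable).

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_features(rows, labels, train_fn, k=4):
    """Build one probability vector per row from a model that never saw that row.

    train_fn(train_rows, train_labels) must return a function that maps a
    row to a list of class probabilities.
    """
    oof = [None] * len(rows)
    for fold in kfold_indices(len(rows), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        predict = train_fn([rows[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        for i in fold:
            oof[i] = predict(rows[i])
    return oof
```

Because every training row is predicted by a model that did not train on it, the resulting probabilities can be fed to a second-stage model, such as the MLP, without leaking the labels.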
In closing
With the methods above, Expelliarmus, led by CChan from South China University of Technology, reached a final score of 0.88767, which put the team at the top of the leaderboard and won it the championship of this competition. Clearly, whether you are a student just starting out or a professional with years of experience, seizing the right opportunity lets your talent shine. The first Baseline Challenge was one such opportunity, but it is not the only one: registration is now open for the language and knowledge technology track of the China Artificial Intelligence Competition!
The China Artificial Intelligence Competition (language and knowledge technology track) is a national-level AI competition guided by three national ministries. This edition includes a machine reading comprehension task. Contestants get free expert-level AI training and long-term technical support, the Xiamen government provides strong supporting policies for contestants, Baidu has also set up an extra round for individual contestants with a 12 Grand prize pool, and AI Studio offers free GPU computing power to help contestants compete.
Don't regret missed opportunities or envy the success of others; every event opens the door to those who are prepared. If you want to compete, seize this opportunity and sign up!
Entry Guide : 