Reading datasets iteratively with XGBoost
2022-06-27 07:57:00 【Datawhale】
Datawhale practical content
Source: Coggle Data Science
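When reading and training on large-scale datasets, reading the data iteratively (batch by batch) is a good choice, because the whole dataset never has to be assembled into one in-memory matrix at once. PyTorch supports this kind of iterative reading (for example through its `DataLoader`). Below we introduce how XGBoost reads data iteratively.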
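At its core, iterative reading means cutting the data into consecutive batches and handing them over one at a time. The following is a minimal sketch of that batching step for a pandas DataFrame; `iter_batches` and the toy frame are hypothetical illustrations, not XGBoost APIs. It uses the same `ceil(n / batch_size)` slicing arithmetic as the in-memory loader below.

```python
import numpy as np
import pandas as pd


def iter_batches(df: pd.DataFrame, batch_size: int):
    """Yield consecutive row slices of ``df`` with at most ``batch_size`` rows each."""
    n_batches = int(np.ceil(len(df) / batch_size))
    for i in range(n_batches):
        start = i * batch_size
        stop = min((i + 1) * batch_size, len(df))  # last batch may be shorter
        yield df.iloc[start:stop]


# Toy frame: 10 rows split into batches of 4 gives slices of 4, 4 and 2 rows.
toy = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) % 2})
batch_sizes = [len(b) for b in iter_batches(toy, batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```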
Reading in-memory data iteratively
```python
import numpy as np
import pandas as pd
import xgboost as xgb


class IterLoadForDMatrix(xgb.DataIter):
    """Feed an in-memory pandas DataFrame to XGBoost in fixed-size row batches."""

    def __init__(self, df=None, features=None, target=None, batch_size=256 * 1024):
        self.features = features  # list of feature column names
        self.target = target  # name of the label column
        self.df = df
        self.batch_size = batch_size
        self.batches = int(np.ceil(len(df) / self.batch_size))
        self.it = 0  # set iterator to 0
        super().__init__()

    def reset(self):
        """Reset the iterator to the first batch."""
        self.it = 0

    def next(self, input_data):
        """Pass the next batch to XGBoost through the ``input_data`` callback."""
        if self.it == self.batches:
            return 0  # Return 0 when there are no more batches.
        a = self.it * self.batch_size
        b = min((self.it + 1) * self.batch_size, len(self.df))
        dt = pd.DataFrame(self.df.iloc[a:b])
        input_data(data=dt[self.features], label=dt[self.target])  # , weight=dt['weight'])
        self.it += 1
        return 1  # Return 1 while batches remain.
```

Usage (this approach is better suited to GPU training):
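```python
# `train`, `train_idx` and `FEATURES` are defined elsewhere in the training script (not shown).
Xy_train = IterLoadForDMatrix(train.loc[train_idx], FEATURES, 'target')
# XGBoost 1.7+ also provides xgb.QuantileDMatrix, the recommended replacement,
# which accepts CPU data as well.
dtrain = xgb.DeviceQuantileDMatrix(Xy_train, max_bin=256)
```

Reference documentation: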
https://xgboost.readthedocs.io/en/latest/python/examples/quantile_data_iterator.html
Reading external data iteratively (external memory)
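```python
import os
from typing import Callable, List

import xgboost
from sklearn.datasets import load_svmlight_file


class Iterator(xgboost.DataIter):
    def __init__(self, svm_file_paths: List[str]):
        self._file_paths = svm_file_paths
        self._it = 0
        # XGBoost writes its cache files with the prefix "cache" in the current directory.
        super().__init__(cache_prefix=os.path.join(".", "cache"))

    def next(self, input_data: Callable):
        """Load the next file and pass its data to XGBoost."""
        if self._it == len(self._file_paths):
            # return 0 to let XGBoost know this is the end of iteration
            return 0
        X, y = load_svmlight_file(self._file_paths[self._it])
        # input_data mirrors the DMatrix constructor and expects keyword arguments.
        input_data(data=X, label=y)
        self._it += 1
        return 1  # more files remain

    def reset(self):
        """Reset the iterator to its beginning"""
        self._it = 0
```

Usage (this approach is better suited to CPU training):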
```python
it = Iterator(["file_0.svm", "file_1.svm", "file_2.svm"])
Xy = xgboost.DMatrix(it)
# Other tree methods, including ``hist`` and ``gpu_hist``, also work but have some
# caveats (see the external memory tutorial referenced below).
booster = xgboost.train({"tree_method": "approx"}, Xy)
```

Reference documentation:
https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html
