Saving DataFrame data with PyMongo (insert_one, insert_many, multi-threaded saving)
2022-07-25 17:26:00 【Cute Dai Ma】
For the basic PyMongo operations (insert, delete, update, query), see the article "Connecting to MongoDB from Python: adding, deleting, and modifying data with pymongo".
1. Basic method: saving row by row
This is the most basic way to save data: each row can be adjusted individually before it is written.
from pymongo import MongoClient
import pandas as pd
import numpy as np


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def _save_or_update_mongodb(coll, dict_value):
    """Look up the document by _id: overwrite its fields if it exists, insert it otherwise."""
    record = coll.find_one({"_id": dict_value["_id"]})
    if not record:
        coll.insert_one(dict_value)
    else:
        # Filter on _id only instead of matching on the whole stored document
        coll.update_one({"_id": dict_value["_id"]}, {"$set": dict_value})


def save_dataframe_to_mongo(dataframe):
    coll = get_coll("test_db", "test_collection")
    for index, series in dataframe.iterrows():
        dict_value = series.to_dict()
        # Use the DataFrame index as the document _id
        dict_value.update({"_id": index})
        _save_or_update_mongodb(coll, dict_value)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(10, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
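The find-then-insert-or-update logic in _save_or_update_mongodb takes two round trips per row. As a minimal sketch of an alternative, the same effect can usually be had in one call by passing upsert=True to update_one. The helper name upsert_document is hypothetical, and coll is assumed to be a pymongo Collection; the sketch also assumes every document has at least one field besides _id.

```python
def upsert_document(coll, doc):
    """Insert doc, or overwrite the fields of the stored document with the same _id.

    `coll` is assumed to be a pymongo Collection (anything exposing the same
    update_one(filter, update, upsert=...) method will do).
    """
    # Keep _id in the filter only; the remaining fields go into $set.
    fields = {key: value for key, value in doc.items() if key != "_id"}
    # upsert=True makes MongoDB insert the document when no match is found.
    return coll.update_one({"_id": doc["_id"]}, {"$set": fields}, upsert=True)
```

Inside the row loop this would replace the call to _save_or_update_mongodb(coll, dict_value) with upsert_document(coll, dict_value).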
2. Saving in batches with insert_many
The insert_many method saves a whole batch of documents in a single call, so the DataFrame can be written in chunks instead of row by row.
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def save_dataframe_to_mongo(dataframe, step=20):
    coll = get_coll("test_db", "test_collection2")
    # Write the rows in consecutive batches of at most `step` rows
    for i in range(math.ceil(dataframe.shape[0] / step)):
        # orient="records" gives one dict per row (the "record" spelling is rejected by recent pandas)
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")
        coll.insert_many(dict_list)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
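The batching arithmetic above (math.ceil(rows / step) slices of at most step rows) can also be written as a small generator over a list of records, which keeps the slicing separate from the database call. This is only a sketch: iter_batches and save_in_batches are hypothetical helper names, and coll is assumed to be a pymongo Collection.

```python
def iter_batches(records, step=20):
    """Yield consecutive slices of at most `step` items from a list."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    for start in range(0, len(records), step):
        yield records[start:start + step]


def save_in_batches(coll, records, step=20):
    """Call coll.insert_many once per batch and return the number of batches."""
    count = 0
    for batch in iter_batches(records, step):
        # Every batch is non-empty, which insert_many requires
        coll.insert_many(batch)
        count += 1
    return count
```

With the 900-row DataFrame from the example, save_in_batches(coll, df.to_dict(orient="records")) would issue 45 insert_many calls of 20 documents each.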
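3. Saving data with multiple threads (threading)
PyMongo is thread-safe but not safe to share across processes (it is not fork-safe). A MongoClient and its collections can therefore be used freely from several threads at once, while each process should create its own client. The sample code is as follows: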
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math
import threading


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def save_dataframe_to_mongo(dataframe, step=20):
    coll = get_coll("test_db", "test_collection2")
    thread_list = []
    for i in range(math.ceil(dataframe.shape[0] / step)):
        # Data to be saved in this batch
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")
        # Start one thread per batch
        thread = threading.Thread(target=coll.insert_many, args=(dict_list,))
        thread.start()
        thread_list.append(thread)
    # Wait for all thread tasks to complete
    for _thr in thread_list:
        _thr.join()


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
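The code above starts one thread per batch, so the thread count grows with the size of the DataFrame. A possible variation, sketched here with the standard library's ThreadPoolExecutor, caps the number of worker threads and also surfaces exceptions raised inside them. The function name save_batches_threaded and the max_workers default are assumptions for illustration; insert_fn would typically be coll.insert_many.

```python
from concurrent.futures import ThreadPoolExecutor


def save_batches_threaded(insert_fn, batches, max_workers=4):
    """Run insert_fn(batch) for every batch on a bounded pool of threads.

    `insert_fn` is assumed to be thread-safe, e.g. the insert_many method
    of a pymongo Collection.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(insert_fn, batch) for batch in batches]
        # result() blocks until the task is done and re-raises any
        # exception that occurred in the worker thread
        return [future.result() for future in futures]
```

Results come back in the order the batches were submitted, even though the inserts themselves may finish in any order.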