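Saving DataFrame data with pymongo (insert_one, insert_many, multithreaded saving)
2022-07-25 15:46:00 [呆萌的代Ma]
For the basics of working with data in PyMongo (create, read, update, delete), see: Connecting to MongoDB from Python and performing CRUD operations with pymongo
1. Basic method: saving row by row
This is the most basic approach. It lets you adjust each row before it is saved.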
from pymongo import MongoClient
import pandas as pd
import numpy as np


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def _save_or_update_mongodb(coll, dict_value):
    """Look up the document by _id: update it if it exists, insert it otherwise."""
    id_filter = {"_id": dict_value["_id"]}
    record = coll.find_one(id_filter)
    if not record:
        coll.insert_one(dict_value)
    else:
        # Match on _id alone rather than on every field of the fetched document
        coll.update_one(id_filter, {"$set": dict_value})


def save_dataframe_to_mongo(dataframe):
    coll = get_coll("test_db", "test_collection")
    for index, series in dataframe.iterrows():
        # Each row becomes one document, with the DataFrame index as its _id
        dict_value = series.to_dict()
        dict_value.update({"_id": index})
        _save_or_update_mongodb(coll, dict_value)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(10, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
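The find-then-write helper above needs two round trips per row. As a possible alternative, `update_one` accepts `upsert=True`, which inserts the document when no match exists. The sketch below is only an illustration: `upsert_by_id` is a hypothetical helper name, and `InMemoryCollection` is a hypothetical stand-in that mimics just enough of a collection to run the example without a MongoDB server. With a real collection from `get_coll`, the same `upsert_by_id` call should apply.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a pymongo collection (demo only).

    It implements only the $set/upsert behaviour used below.
    """

    def __init__(self):
        self.docs = {}

    def update_one(self, filter_doc, update, upsert=False):
        _id = filter_doc["_id"]
        if _id not in self.docs:
            if not upsert:
                return
            self.docs[_id] = {"_id": _id}
        self.docs[_id].update(update["$set"])


def upsert_by_id(coll, dict_value):
    """Insert or update in a single call, keyed on _id."""
    # Keep _id out of $set; it is already supplied by the filter
    fields = {k: v for k, v in dict_value.items() if k != "_id"}
    coll.update_one({"_id": dict_value["_id"]}, {"$set": fields}, upsert=True)


demo_coll = InMemoryCollection()
upsert_by_id(demo_coll, {"_id": 0, "a": 1.0})            # inserted
upsert_by_id(demo_coll, {"_id": 0, "a": 2.0, "b": 3.0})  # updated in place
print(demo_coll.docs)
```

Like the original helper, this merges fields with `$set` rather than replacing the whole document.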
2. Batch saving with insert_many
The insert_many method saves a whole batch of documents in a single call.
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def save_dataframe_to_mongo(dataframe, step=20):
    coll = get_coll("test_db", "test_collection2")
    # Write the DataFrame in slices of `step` rows
    for i in range(math.ceil(dataframe.shape[0] / step)):
        # orient="records" yields one dict per row
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")
        coll.insert_many(dict_list)


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
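To make the slicing arithmetic above easier to check, here is a minimal sketch that applies the same `math.ceil` loop to a plain list of records. `iter_batches` is a hypothetical helper name, and the list stands in for what `dataframe.to_dict(orient="records")` would return.

```python
import math


def iter_batches(records, step=20):
    """Yield consecutive slices of at most `step` records."""
    for i in range(math.ceil(len(records) / step)):
        yield records[step * i:step * (i + 1)]


# 45 records with step=20: two full batches and one partial batch
records = [{"a": n} for n in range(45)]
batch_sizes = [len(batch) for batch in iter_batches(records, step=20)]
print(batch_sizes)
```

An empty input produces no batches at all, which matters because `insert_many` rejects an empty list.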
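3. Saving data with multiple threads (threading)
PyMongo is thread-safe but not fork-safe (a MongoClient should not be shared across processes), so one client can be used freely from several threads to save data concurrently. The example below starts one thread per batch: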
from pymongo import MongoClient
import pandas as pd
import numpy as np
import math
import threading


def get_coll(database, collection, host="127.0.0.1"):
    """Return the target collection."""
    mongo_conn = MongoClient(host=host, port=27017)
    mongo_db = mongo_conn.get_database(database)
    coll = mongo_db.get_collection(collection)
    return coll


def save_dataframe_to_mongo(dataframe, step=20):
    coll = get_coll("test_db", "test_collection2")
    thread_list = []
    for i in range(math.ceil(dataframe.shape[0] / step)):
        # Data to save in this batch
        dict_list = dataframe.iloc[step * i:step * (i + 1)].to_dict(orient="records")
        # One thread per batch
        thread = threading.Thread(target=coll.insert_many, args=(dict_list,))
        thread.start()
        thread_list.append(thread)
    # Wait for all threads to finish
    for _thr in thread_list:
        _thr.join()


if __name__ == '__main__':
    df = pd.DataFrame(np.random.randn(900, 4))
    df.columns = ['a', 'b', 'c', 'd']
    save_dataframe_to_mongo(df)
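The example above starts one thread per batch (45 threads for 900 rows with step=20), and an exception raised inside a plain `threading.Thread` does not reach the caller. One possible variation, sketched below, uses the standard library's `ThreadPoolExecutor` to cap the number of worker threads and surface errors. `save_batches_with_pool` and `RecordingCollection` are hypothetical names. The latter is a stand-in that only counts documents so the sketch runs without a MongoDB server, and a real collection from `get_coll` would take its place.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class RecordingCollection:
    """Hypothetical stand-in for a pymongo collection; it only counts documents."""

    def __init__(self):
        self._lock = threading.Lock()
        self.inserted = 0

    def insert_many(self, documents):
        with self._lock:
            self.inserted += len(documents)


def save_batches_with_pool(coll, batches, max_workers=4):
    """Submit each batch to a bounded pool and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(coll.insert_many, batch) for batch in batches]
        for future in futures:
            future.result()  # re-raises any exception from the worker thread


# Plain dicts stand in for dataframe.to_dict(orient="records")
records = [{"a": n} for n in range(900)]
batches = [records[i:i + 20] for i in range(0, len(records), 20)]
demo_coll = RecordingCollection()
save_batches_with_pool(demo_coll, batches)
print(demo_coll.inserted)
```

The `max_workers=4` value is an arbitrary choice for illustration; a suitable number depends on the server and the client's connection pool size.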