Merging PDF Files
2022-07-24 05:20:00 【滴滴滴'cv】
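import os

# Note: PdfFileReader and PdfFileWriter are the legacy PyPDF2 names (PyPDF2 < 3.0);
# newer releases renamed them to PdfReader and PdfWriter.
from PyPDF2 import PdfFileReader, PdfFileWriter


def getnumber(path):
    # Sort key: the integer before the first dot in the file name, e.g. "12.pdf" -> 12.
    # Names that do not start with an integer sort last.
    try:
        res = int(os.path.basename(path).split('.')[0])
    except ValueError:
        res = 10000000
    return res


def getFileName(filedir):
    # Use os.walk to search the given directory (including subfolders)
    # and collect the full path of every PDF file found
    file_list = [os.path.join(root, filespath)
                 for root, dirs, files in os.walk(filedir)
                 for filespath in files
                 if filespath.lower().endswith('.pdf')]
    # Order by numeric file name so that 2.pdf comes before 10.pdf
    return sorted(file_list, key=getnumber)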


# Merge all PDF files under the given directory
def MergePDF(filepath, outfile):
    # Refuse to overwrite an existing output file; checking first also keeps
    # an old output file from being merged into the new one
    outPath = os.path.join(filepath, outfile)
    if os.path.isfile(outPath):
        print(outPath, "already exists; delete it and try again!")
        return False
    pdf_fileName = getFileName(filepath)
    if not pdf_fileName:
        # No PDF files to merge
        return False
    output = PdfFileWriter()
    outputPages = 0
    openFiles = []  # source files must stay open until the output is written
    try:
        for pdf_file in pdf_fileName:
            print("Path: %s" % pdf_file)
            # Read the source PDF file
            stream = open(pdf_file, "rb")
            openFiles.append(stream)
            reader = PdfFileReader(stream)
            # Number of pages in the source PDF file
            pageCount = reader.getNumPages()
            outputPages += pageCount
            print("Pages: %d" % pageCount)
            # Add each page to the output in turn
            for iPage in range(pageCount):
                output.addPage(reader.getPage(iPage))
        print("Total pages after merging: %d." % outputPages)
        # Write the merged pages to the target PDF file
        with open(outPath, "wb") as outputStream:
            output.write(outputStream)
    finally:
        for stream in openFiles:
            stream.close()
    return True


if __name__ == "__main__":
    # Folder that holds the source PDF files
    file_dir = input('Enter the folder containing the PDFs: ').strip()
    outfile = "out.pdf"  # name of the merged output file
    flag = MergePDF(file_dir, outfile)
    if flag:
        print('PDF merge complete')
    else:
        print('PDF merge failed, please try again!')
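Usage

Give the PDF files numeric names (1.pdf, 2.pdf, and so on) and put them in one folder. Run the script and enter the folder path when prompted. The files are merged in numeric order, and the merged PDF is written to out.pdf in the same folder.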
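
To see why the script sorts by number instead of by plain file name, here is a minimal sketch. The file names are made up for illustration, and `numeric_key` is a hypothetical stand-alone copy of the idea behind `getnumber`, not part of the script itself.

```python
import os


def numeric_key(path):
    """Return the integer before the first dot in the file name.

    Names that do not start with an integer get a large value so they sort last.
    """
    try:
        return int(os.path.basename(path).split(".")[0])
    except ValueError:
        return 10000000


# Hypothetical file names, as they might appear in the input folder
names = ["10.pdf", "2.pdf", "cover.pdf", "1.pdf"]

by_name = sorted(names)                     # plain string comparison
by_number = sorted(names, key=numeric_key)  # numeric ordering

print(by_name)
print(by_number)
```

With plain string sorting, 10.pdf lands before 2.pdf because the characters are compared one at a time. The numeric key restores the expected page order and pushes non-numeric names such as cover.pdf to the end.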