How to package an OCR program correctly from PyCharm and keep the packaged exe as small as possible
2022-07-23 11:20:00 【py617】
I ran into many problems while packaging an OCR program. The whole process involves modifying paths and setting environment variables. I believe many people will hit the same problems I did, so I have tried to write down every problem I encountered during packaging. I hope it helps.
At first, my development environment was set up like this:

I chose the third option, the system environment. As a result, when PyInstaller packaged the program it pulled in every library installed on the system, including libraries my main program never uses, so the resulting file was naturally very large.
To solve this, I switched to the first option, Virtualenv Environment, and tried to create a virtual environment under the root directory of the main program. But when I clicked OK, I got the message: Permission denied.
The root cause is that Python had been installed with the default one-click installer, which installs it under your personal user account. When that interpreter is used, the system treats the operation as belonging to your account rather than to the whole computer, decides the permission is missing, and refuses.
The solution is to reinstall Python. I downloaded version 3.9.5 and double-clicked the installer.

At the bottom, check Add Python 3.9 to PATH, then click Customize installation. On the next screen, remember to check Install for all users, then click Next and follow the steps until the installation finishes.
After this, Python is installed for all users rather than under your personal account, and its permissions are no longer restricted.
Now the virtual environment can be created successfully.
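If you prefer to create the environment from code instead of the PyCharm dialog, the standard-library `venv` module can do it. This is only a minimal sketch: the helper name `make_venv` and the folder name `venv` are my own choices for illustration, not part of the original setup.

```python
import os
import tempfile
import venv


def make_venv(project_root, name="venv"):
    """Create a virtual environment folder inside project_root and return its path."""
    env_dir = os.path.join(project_root, name)
    # with_pip=False keeps the sketch fast; use with_pip=True to get pip inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so no real project is touched.
    with tempfile.TemporaryDirectory() as root:
        print(make_venv(root))
```

For a real project you would pass the root directory of the main program and set `with_pip=True`, so that the modules can then be installed into that environment.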
Converting a scanned PDF to Word basically requires an OCR tool.
Install the OCR tool (Tesseract-OCR) in advance; the default installation puts it on the C drive.
My main Python program uses the following modules:

Of these, os ships with Python and needs no installation; the others were installed from the Terminal at the bottom of PyCharm:

In the Terminal, install the modules you need with pip install xxx.
pip install PIL fails; use pip install image instead (PIL is distributed on PyPI as Pillow, so pip install Pillow also works).
If running the program reports No module named 'frontend', install PyMuPDF with pip install PyMuPDF.
If it reports Error: No module named 'exceptions', install python-docx with pip install python-docx.
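Most of the trouble above comes from the import name differing from the name you give to pip. The sketch below checks which modules are still missing and prints the matching pip command. The mapping is my assumption of what this program needs, based on the packages named in this article; `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Import name -> name to pass to "pip install" (the two often differ).
PIP_NAMES = {
    "PIL": "Pillow",
    "fitz": "PyMuPDF",
    "docx": "python-docx",
    "pytesseract": "pytesseract",
}


def missing_packages(pip_names=PIP_NAMES):
    """Return the pip names of all modules that cannot be found right now."""
    return [
        pip_name
        for module, pip_name in pip_names.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    for pip_name in missing_packages():
        print("pip install " + pip_name)
```

Note that this only tells you whether some module with that import name is present; it cannot tell you whether it came from the right package.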
Then an error about the pageCount attribute appeared.
This was solved with pip install PyMuPDF==1.19.0; the PyMuPDF version I had installed before did not have the pageCount attribute.
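An alternative to pinning the version is to read the page count defensively. This is a sketch, assuming the document object comes from PyMuPDF, where the attribute has been spelled both pageCount (the older camelCase name) and page_count; `get_page_count` is a hypothetical helper, not something from my program.

```python
def get_page_count(doc):
    """Return the number of pages, whichever attribute name this document exposes."""
    for name in ("page_count", "pageCount"):
        if hasattr(doc, name):
            return getattr(doc, name)
    # PyMuPDF documents also support len(), which gives the page count.
    return len(doc)
```

With this, `get_page_count(doc)` can replace a direct `doc.pageCount` in the main program, so the code no longer depends on one particular spelling.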
((
By the way, a warning appeared at run time: DeprecationWarning: ANTIALIAS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
It says ANTIALIAS is deprecated. I had used Image.ANTIALIAS when changing the pixel dimensions of the picture.
I simply changed that line, and the recognition error rate turned out to be even lower. This is an empirical observation.
))
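To make the resize call work across Pillow versions, one option is a small compatibility helper. This is a sketch, not the exact change I made: `lanczos_filter` is a hypothetical name, and it takes the `PIL.Image` module as an argument so that the helper itself does not need Pillow to be importable.

```python
def lanczos_filter(image_module):
    """Return the LANCZOS resampling constant from a PIL.Image-like module."""
    resampling = getattr(image_module, "Resampling", None)
    if resampling is not None:
        # Pillow 9.1 and later group the filters under Image.Resampling.
        return resampling.LANCZOS
    # Older Pillow releases expose the same filter directly as Image.LANCZOS.
    return image_module.LANCZOS


# Typical use (requires Pillow, so it is left as a comment here):
#   from PIL import Image
#   img = img.resize((new_width, new_height), lanczos_filter(Image))
```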
Continue the topic of packaging :
At run time, the program looks for the file site-packages\pytesseract\pytesseract.py.
There are two possibilities. If you use conda as the interpreter rather than a virtual environment, the system reads pytesseract.py from D:\ProgramData\Anaconda3\Lib\site-packages\pytesseract\pytesseract.py.
(This is also the file that gets packaged. So if the path it points to needs changing, this is the file to edit.)
If you set up a virtual environment, with venv created in the root directory of your main program, then the file found when the program runs or is packaged is the one under that root directory. Its location is
(root directory of the main program) + \venv\Lib\site-packages\pytesseract\pytesseract.py
Then this line inside pytesseract.py 
assigns the location of tesseract, so that the system finds the OCR tool through the environment variable you set earlier.
If you have not set the environment variable, change that line to

so that the system goes directly to the tool you installed.
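A different approach, which I did not use here, is to leave pytesseract.py untouched and set the path from the main program through `pytesseract.pytesseract.tesseract_cmd`. The sketch below first looks on PATH and then falls back to an install location. The default path is only an assumption about a typical Windows install, and both function names are hypothetical.

```python
import os
import shutil


def find_tesseract(default_path=r"C:\Program Files\Tesseract-OCR\tesseract.exe"):
    """Return tesseract from PATH, else default_path if that file exists, else None."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    return default_path if os.path.isfile(default_path) else None


def configure_pytesseract():
    """Point pytesseract at the located executable (pytesseract is imported lazily)."""
    import pytesseract

    exe = find_tesseract()
    if exe is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling `configure_pytesseract()` once at the start of the main program has the same effect as editing the assignment inside the library file, and it survives reinstalling the package.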
With the steps above, the packaged exe runs on your own computer but not on anyone else's, because their C drive does not have the Tesseract tool the way yours does.
So most of the modifications suggested online only work on your own machine; run the packaged exe on someone else's computer and it reports an error

saying that Tesseract is not installed, or that the environment variable was not set after installation.
The fix: copy the OCR tool folder out of the C drive and put it in the root directory of your main Python program.
Then, in pytesseract.py, change the path accordingly so that it points to the OCR folder under the root directory,

With this in place, put the Tesseract-OCR folder copied from the C drive in the same root directory as the packaged exe, send both to other people together, and it will work.
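One thing to watch with a folder placed next to the exe is that a plain relative path depends on the directory the program is started from. The sketch below instead anchors the path to the location of the exe itself; `app_dir` and `bundled_tesseract` are hypothetical helper names, and the code relies on PyInstaller setting `sys.frozen` in the packaged program.

```python
import os
import sys


def app_dir():
    """Folder containing the running exe (when frozen) or the main script."""
    if getattr(sys, "frozen", False):
        return os.path.dirname(os.path.abspath(sys.executable))
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def bundled_tesseract():
    """Path to tesseract.exe inside a Tesseract-OCR folder that sits next to the program."""
    return os.path.join(app_dir(), "Tesseract-OCR", "tesseract.exe")
```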
One more point deserves special attention: before running pyinstaller -F XX.py, be sure to install PyInstaller with pip install pyinstaller (from the PyCharm Terminal, so that it lands in the virtual environment). Without that step an exe can still be generated, but it will be many times larger.
After all this effort, the packaged file finally shrank from 347 MB to 66 MB, roughly one fifth of its previous size.

The next question: can the auxiliary Tesseract-OCR folder be packed in as well, so that a single exe is all that comes out?
Please read my previous article first. You must understand the principle, otherwise what follows will only confuse you.
Here is my solution to this problem.
Make the following changes in pytesseract.py:

The code is as follows :
import sys
import os

def resource_path(path_a):
    # Inside a PyInstaller bundle, bundled resources are unpacked under sys._MEIPASS
    if getattr(sys, 'frozen', False):
        base_path = sys._MEIPASS
    else:
        # Running from source: resolve relative to the current directory
        base_path = os.path.abspath(".")
    return os.path.join(base_path, path_a)

# Base folder that holds the auxiliary Tesseract-OCR folder
fuzhu_filename = resource_path(os.path.join("",))
tesseract_cmd = os.path.abspath(os.path.join(fuzhu_filename, 'Tesseract-OCR', 'tesseract.exe'))

Then open the spec file that was generated in the same root directory when the exe was first packaged, and make the following modifications:

This means: the file to package is tpdf_docx.py, located at xxx; when that file is packaged, the Tesseract-OCR folder is packaged along with it, and inside the package it is still named Tesseract-OCR.
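The actual modification is only visible in the screenshot, but the description matches PyInstaller's `datas` option, which takes pairs of (source on disk, destination inside the bundle). The sketch below builds such a list; `tesseract_datas` is a hypothetical helper, and how it plugs into the spec file is shown only as a comment.

```python
import os


def tesseract_datas(project_root):
    """Build a datas list: (source folder on disk, folder name inside the bundle)."""
    source = os.path.join(project_root, "Tesseract-OCR")
    return [(source, "Tesseract-OCR")]


# Inside the .spec file this would be passed along the lines of:
#   a = Analysis(['tpdf_docx.py'], pathex=[...], datas=tesseract_datas('.'), ...)
```

Keeping the destination name equal to Tesseract-OCR matters, because the code in pytesseract.py looks for a folder of exactly that name under the unpacked bundle.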
Finally, in PyCharm's Terminal, enter pyinstaller tpdf_docx.spec and press Enter. Wait for it to finish; the final file is generated in dist. The exe generated the first time will be replaced, so if you want to keep it, copy it somewhere else before running pyinstaller tpdf_docx.spec.
This problem takes repeated trial and error, and something will always go wrong along the way. But as long as you keep thinking it through, trust that although the road is tortuous, the future is bright.