当前位置:网站首页>Efficient office of fintech (I): automatic generation of trust plan specification

Efficient office of fintech (I): automatic generation of trust plan specification

2022-06-23 06:05:00 cjh_ hit

Efficient office of fintech : Automatically generate the trust plan specification

background

Computers have greatly improved people's work efficiency , But in addition to using mature software on the market , The financial industry also has to meet the actual business needs , Write your own gadgets to improve office efficiency .

The internship company gave me a task yesterday afternoon , Said he was in a hurry : According to two word The mapping relationship between file paragraphs automatically generates the trust plan specification . In particular , One document is the due diligence report , It contains relevant information about business participants , The information is filled in according to the specific template . Another document is the plan statement , There are also specific templates .

( So there is not much content in this project , Just use Python Will a word Copy the specified paragraph in the file to another word Specify the location in the file .)

Due diligence report :
 Insert picture description here
Instructions :
 Insert picture description here
The mapping table :
 Insert picture description here

demand

In the past, employees copied it manually 、 Paste , But the company has a large volume , Deal with a lot of contracts every day , Therefore, it is necessary to write a program to automatically generate plan specifications according to the mapping relationship to improve office efficiency .

To write

Finally, by checking the data , In the end use Python-docx The library has developed a program that can A The content between two paragraphs in a document ( Include Text and tables ) Copied to the B Procedure after the specified paragraph of the document .

Because the first contact Python-docx, Not very familiar with the principles and details of many interfaces .Python-docx The principle seems to be that Python-docx The structure of is transformed into xml. The data was used last year Java Handle xml The program , But I have forgotten for a long time …

The source code is given directly below :
Because for the first time Python-docx, If there is any non-standard or non Introduction , Please forgive me and point out .
Of course , Before use in adopt pip install Python-docx.

from docx import Document
from docx.text.paragraph import Paragraph
from docx.oxml.text.paragraph import CT_P
from docx.oxml.table import CT_Tbl
from docx.table import Table
from copy import deepcopy
import pandas  as pd

def copyText(filename,paratext,Para):
    document = Document(filename)
    paras=document.paragraphs
    index=0
    if type(paratext)==str:
        print('copy:',paratext,Para.text)
        for para in paras:
            if para.text==paratext:
                index=paras.index(para)+1
        para=paras[index]
    else:
        print('copy:',Para.text,paratext.text)
        for para in paras:
            if  para.text== paratext.text:
                index = paras.index(para) + 1
       paratext.runs[0].drawing_lst:
        para = paras[index]

    newPara=para.insert_paragraph_before()
    for run in Para.runs:
    # Copy content ( Include styles )
        newParaRun=newPara.add_run(run.text)
        newParaRun.bold = run.bold
        newParaRun.italic = run.italic
        newParaRun.underline = run.underline
        newParaRun.font.color.rgb = run.font.color.rgb
        newParaRun.style.name = run.style.name
   
    newPara.paragraph_format.alignment = Para.paragraph_format.alignment
    newPara.paragraph_format.first_line_indent = Para.paragraph_format.first_line_indent
    newPara.paragraph_format.left_indent = Para.paragraph_format.left_indent
    newPara.paragraph_format.right_indent = Para.paragraph_format.right_indent
    document.save(filename)

def copyTable(filename,paratext,table):
# Copy the form 
    document = Document(filename)
    paras = document.paragraphs
    if type(paratext)==str:
        for para in paras:
            #print(para.text)
            if paratext == para.text :
                paragraph=para
        tbl, p = table._tbl, paragraph._p
    else:
        for para in paras:
            # print(para.text)
            if paratext.text == para.text:
                paragraph = para
        tbl, p = table._tbl, paragraph._p
    new_tbl = deepcopy(tbl)
    p.addnext(new_tbl)
    document.save(filename)

def Copy_Contents_Between_ParaA_ParaB_to_ParaC(filename1, filename2,Paratext1,Paratext2,Paratext3):
    documentA = Document(filename1)
    paragraphs = documentA.paragraphs# All paragraphs 
    Paratext1 = Paratext1.encode('utf-8').decode('utf-8')
    for aPara in paragraphs:
        if Paratext1 == aPara.text :# Match to the beginning paragraph 
            ele = aPara._p.getnext()
            break
    while(True):# Traverse backward 
        if ele==None:
            break
        if ele.tag == '':
            break
        if isinstance(ele, CT_P):# It's a paragraph 
            para = Paragraph(ele, documentA)
            if Paratext2 == para.text:
                break
            copyText(filename2, Paratext3, para)# Copy the form 
            if para.text!='':
                Paratext3=para
        elif isinstance(ele, CT_Tbl):# It's a form 
            table=Table(ele,documentA)
            copyTable(filename2,Paratext3,table)# Copy the form 
        ele=ele.getnext()

if __name__ == '__main__':
    data = pd.read_excel(' To tune out - Plan specification mapping table .xlsx')
    for i in range(len(data[' Plan statement ( To generate table )'])):
        Copy_Contents_Between_ParaA_ParaB_to_ParaC(' Data sources - Due diligence report .docx',' The resulting trust plan statement .docx',data[' Due diligence report - Start paragraph '][i],
                                                             data[' Due diligence report - End paragraph '][i],data[' Plan statement ( To generate table )'][i])

Plan specification generated after program execution :
 Insert picture description here
You can see , The data specified in the due diligence report has been copied to the specified location in the plan specification .

Problems to be solved :
The above program can copy text, tables and styles , But you can't copy pictures . according to the understanding of ,Python-docx There is no interface to extract the picture at the specified location ( At least not in the official manual ), So secondary development is needed , It is necessary to study Python-docx And some xml The knowledge of the . But because time is limited ( Online classes still need to be watched , Homework still needs to be written ), I will leave this question to my internship director .
If the big guys know how to solve the problem of image processing , Please grant me your advice .

原网站

版权声明
本文为[cjh_ hit]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/174/202206230423075223.html