Data mining scenario: falsely issued invoices
2022-07-23 12:19:00 【Emperor Confucianism is supreme】
This post gives a brief introduction to the basic business background; details on taxation will be added later.
One. Business analysis
1. What counts as falsely issuing a VAT invoice
(1) Issuing special VAT invoices for others or for oneself, having others issue them for oneself, or introducing others to issue them, when no goods were bought or sold and no taxable services were provided or received;
(2) Issuing special VAT invoices with a false quantity or amount for others or for oneself, having others issue them for oneself, or introducing others to issue them, when goods were actually bought or sold or taxable services were actually provided or received;
(3) Having others issue special VAT invoices for oneself even though one carried out actual business activities.
2. Business analysis
By applying data models and machine learning algorithms to taxpayer risk profiles, we can find, in batches, high-risk enterprises suspected of falsely issuing VAT invoices. Qualitative and quantitative labels describe the salient characteristics of high-risk taxpayers and form a risk portrait that helps tax officers discover and identify them. The system also provides label and model management and a risk inventory, and supports several portrait modes, such as group portraits and individual portraits.
Data model analysis and machine learning algorithms also make comprehensive use of invoice relationships and other relationships, such as the cross-appointment of an enterprise's "three or four key persons" (legal representative, finance head, tax filer and, in the four-person version, the invoice purchaser), to identify whole gangs of false invoicers and to find, in batches, abnormal enterprises suspected of falsely issuing VAT invoices. Qualitative and quantitative labels describe the salient characteristics of these suspected groups and help tax officers discover and identify them.
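As a rough illustration of the relationship analysis above, the sketch below groups enterprises that share a key person or a registered phone number (union-find over shared identifiers) and then counts invoices issued inside each group, since a dense web of mutual invoicing among linked enterprises is one sign of a gang. The data shapes and the function name `find_suspect_groups` are hypothetical; this is a minimal sketch under those assumptions, not the author's system.

```python
from collections import defaultdict


def find_suspect_groups(officers, invoices):
    """Group enterprises that share key persons and count invoices inside each group.

    officers: dict mapping enterprise id -> set of identifiers for its key persons
              (legal representative, finance head, tax filer) and registered phone.
    invoices: list of (seller_id, buyer_id) pairs.
    Returns a sorted list of (sorted member list, internal invoice count)
    for every group with at least two enterprises.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Index enterprises by each person or phone number they are linked to.
    holders = defaultdict(list)
    for ent, ids in officers.items():
        find(ent)  # register enterprises even if they share nobody
        for pid in ids:
            holders[pid].append(ent)

    # Enterprises sharing any person or phone end up in the same group.
    for ents in holders.values():
        for other in ents[1:]:
            union(ents[0], other)

    members = defaultdict(set)
    for ent in officers:
        members[find(ent)].add(ent)

    result = []
    for group in members.values():
        if len(group) < 2:
            continue
        # Count invoices whose seller and buyer are both inside the group.
        internal = sum(1 for s, b in invoices if s in group and b in group)
        result.append((sorted(group), internal))
    return sorted(result)
```

Linking only on shared persons, rather than on every invoice edge, avoids merging the whole trading network into one component; invoice edges are then used as evidence within each candidate group.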
Three. Types and characteristics of falsely issued VAT invoices
Here I mainly cover the types I have analyzed:
1. Hit-and-run false invoicing
Hit-and-run false invoicing, also known as "violent false invoicing", means that after issuing the false invoices the perpetrator does not file a tax return, or files one but does not pay the tax. Such invoicers usually wage guerrilla warfare: fire a shot, then move on to another place.
Its characteristics: the perpetrators usually register multiple companies, often using other people's ID cards, and then issue false invoices intensively.
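The two behaviors just described (invoicing without filing or paying, and intensive invoicing in a short period) can be turned into simple rules. The sketch below assumes hypothetical monthly records per enterprise; the 80% burst share and 3-month window are illustrative assumptions, not values from the source. A zero tax payment can also be legitimate (for example, when input tax covers output tax), so these are flags for review, not verdicts.

```python
from dataclasses import dataclass


@dataclass
class MonthRecord:
    month: str              # e.g. "2022-05" (hypothetical record layout)
    invoiced_amount: float  # total amount on VAT invoices issued that month
    declared: bool          # whether a VAT return was filed for the month
    tax_paid: float         # VAT actually paid for the month


def hit_and_run_flags(records, burst_share=0.8, burst_months=3):
    """Return reason strings for one enterprise; thresholds are assumptions."""
    reasons = []
    issued = [r for r in records if r.invoiced_amount > 0]
    # Invoices issued in a month for which no return was filed.
    if any(not r.declared for r in issued):
        reasons.append("invoiced without filing a return")
    # Return filed for an invoicing month, but nothing paid.
    if any(r.declared and r.tax_paid == 0 for r in issued):
        reasons.append("filed but paid no tax on invoiced months")
    # Most of the invoiced amount falls into a few months.
    total = sum(r.invoiced_amount for r in records)
    if total > 0 and len(records) > burst_months:
        top = sorted((r.invoiced_amount for r in records), reverse=True)[:burst_months]
        if sum(top) / total >= burst_share:
            reasons.append("invoicing concentrated in a short burst")
    return reasons
```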
2. Invoice-goods separation false invoicing
Invoice-goods separation usually means that the transaction on the false invoice corresponds to a real transaction by the issuer, but the real buyer does not need an invoice. Since the issuer must declare and pay tax whether or not an invoice is issued, it passes the invoice for the real transaction to someone else who needs one. By substituting one buyer for another (the Chinese idiom 李代桃僵, "the plum tree dies in place of the peach tree") and by deception, this scheme avoids declaring tax on a falsely issued invoice while letting the downstream party deduct input tax and/or take a pre-tax deduction.
The typical pattern: company A sells goods to Li Si; Li Si does not need an invoice, so company A issues the invoice to company B instead. Take a popular comedy sketch as an analogy: I go to a restaurant and order a bowl of fried noodles, then, without eating it, swap it for a bowl of noodle soup. When the owner asks me to pay, I say I traded the fried noodles for the soup, and I don't have to pay for the fried noodles because I didn't eat them. So do I get the noodles for free? (Ha-ha, just an example.)
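One way to surface the pattern above is to compare the buyer named on each invoice with the party that actually received the goods. The sketch assumes a hypothetical delivery record linked to each invoice number (`Shipment.consignee`); real data sources and field names will differ.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_no: str
    seller: str
    buyer: str        # party named on the invoice


@dataclass
class Shipment:
    invoice_no: str   # hypothetical link from a delivery record to its invoice
    consignee: str    # party that actually received the goods


def separated_invoices(invoices, shipments):
    """Flag invoices whose named buyer differs from the goods recipient."""
    consignee_of = {s.invoice_no: s.consignee for s in shipments}
    flagged = []
    for inv in invoices:
        consignee = consignee_of.get(inv.invoice_no)
        if consignee is None:
            flagged.append((inv.invoice_no, "no delivery record"))
        elif consignee != inv.buyer:
            flagged.append((inv.invoice_no, f"goods to {consignee}, invoice to {inv.buyer}"))
    return flagged
```

In the Li Si example, the delivery record points to Li Si while the invoice names company B, so that invoice is flagged.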
3. Tax-preference false invoicing
Tax-preference false invoicing is false invoicing carried out by exploiting preferential tax policies or special policies that work similarly (for example, assessed taxation, fiscal subsidies, the bonded system, or agricultural product purchase invoices). Its characteristic is that, thanks to these policies, the perpetrator does not need to declare and pay tax in full on the falsely issued invoices.
Four. Characteristics of falsely issued VAT invoices
Based on the types and characteristics of falsely issued special VAT invoices and the corresponding data, we can list the following indicators:
(1) The company's name is often changed around the time of invoicing, and most such companies are commercial (trading) enterprises (this includes existing enterprises that change their names);
(2) A large number of invoices are voided after being issued (this also involves other factors);
(3) Most of the company's invoices are issued at the maximum per-invoice amount, with more than 90% of invoices at the full amount (as management tightens, the per-invoice ceiling is being lowered);
(4) The registration information is the same across enterprises: the legal representative, finance staff and tax staff are mostly the same people;
(5) The names of the goods a trading company buys deviate seriously from those it sells;
(6) The enterprise has applied to increase its invoice allotment many times;
(7) There are many red-ink ordinary invoices, and red-ink invoices are issued at will to offset blue-ink invoices from previous years (check whether the offset happens within the current month or across months; red-ink invoices carry negative amounts);
(8) Capital or inventory turns over more than five times per month;
(9) The amount on VAT invoices issued within a certain period increases suddenly;
(10) The enterprise was established recently, mostly within the past six months, yet its business scale has expanded rapidly;
(11) The registered address is usually an apartment on some floor of a residential building, obviously unsuitable for external business;
(12) The legal representative's household registration is not local, and the enterprises set up by these legal representatives are abnormally concentrated;
(13) Production energy consumption, such as electricity, is seriously inconsistent with sales (to be confirmed);
(14) The registered capital is mostly subscribed rather than paid in, or the paid-in capital is low;
(15) Several enterprises have the same registered legal representative, and the mobile phone number in their tax registration information is also the same;
(16) Several enterprises complete tax registration, or are recognized as general taxpayers, consecutively and at the same time;
(17) The company belongs to an industry at high risk of false invoicing;
(18) The legal representative or finance head once served as the head or finance head of an abnormal account, and legal representatives and finance heads serve crosswise across enterprises;
(19) Many labor-service invoices are issued (judge this together with individual income tax payments);
(20) Invoices are issued at night (criminals are also "progressing" and making themselves look more like normal enterprises).
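Several of these indicators can be computed directly from an enterprise's invoice records. The sketch below covers items (2), (3), (10) and (20). The 90% full-amount share and the six-month age limit come from the list; the 1% tolerance for "at the ceiling", the 30% void and night shares, and the 22:00 to 06:00 night window are assumptions. The record layout and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class IssuedInvoice:
    amount: float
    issued_at: datetime
    voided: bool = False


def indicator_flags(invoices, cap, established, as_of,
                    full_ratio=0.9, void_ratio=0.3, night_ratio=0.3,
                    young_days=183):
    """Boolean flags for one enterprise; `cap` is its per-invoice maximum amount."""
    flags = {}
    n = len(invoices)
    if n == 0:
        return flags
    # (3) Share of invoices at the ceiling (within an assumed 1% tolerance).
    near_cap = sum(1 for i in invoices if i.amount >= 0.99 * cap)
    flags["full_amount"] = near_cap / n > full_ratio
    # (2) Share of invoices voided after issue.
    flags["many_voided"] = sum(i.voided for i in invoices) / n > void_ratio
    # (20) Share of invoices issued between 22:00 and 06:00 (assumed window).
    night = sum(1 for i in invoices if i.issued_at.hour >= 22 or i.issued_at.hour < 6)
    flags["night_billing"] = night / n > night_ratio
    # (10) Established within roughly the last six months.
    flags["recently_established"] = (as_of - established).days < young_days
    return flags
```

In practice such flags would become labels in the risk portrait rather than a final judgment, since each one alone also matches some legitimate businesses.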
Five. Building the algorithm model
Among the various cases of tax evasion, the most obvious and easiest thing to check is a mismatch between the goods on an enterprise's purchase invoices and those on its sales invoices. The algorithm model here is therefore built for this scenario.
(1) Business understanding:
A normal enterprise carries out business and production, so it has purchase and sales records: it buys goods that fit its business scope (the input set) and sells goods that fit its business scope to the market (the output set). Seen this way, the input and output sets of a normal enterprise are related. If an enterprise's inputs and outputs have no or little correlation, the enterprise is likely operating abnormally, and the invoices it issues are likely false as well. For example, in taxation, some enterprises that falsely issue invoices or "swap" invoices use large numbers of special VAT invoices for goods that enjoy tax reduction or exemption, or illegally issue such invoices to downstream parties for deduction, evading tax. Another example is export tax rebate enterprises: the tax rate of the goods they declare for export differs from that of the goods their purchases suggest they should be exporting, which lets them fraudulently obtain tax rebates and exemptions.
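The input set and output set described above can be assembled from invoice line items: each line counts as output for the seller and input for the buyer. The sketch assumes a hypothetical line format of (seller, buyer, commodity, amount) and sums amounts per commodity.

```python
from collections import defaultdict


def build_sets(invoice_lines):
    """invoice_lines: iterable of (seller, buyer, commodity, amount).

    Returns (inputs, outputs), each a dict mapping
    enterprise -> {commodity: total amount}.
    """
    inputs = defaultdict(lambda: defaultdict(float))
    outputs = defaultdict(lambda: defaultdict(float))
    for seller, buyer, commodity, amount in invoice_lines:
        outputs[seller][commodity] += amount  # the seller sells this commodity
        inputs[buyer][commodity] += amount    # the buyer purchases it
    # Convert to plain dicts so callers do not create entries by accident.
    return ({e: dict(g) for e, g in inputs.items()},
            {e: dict(g) for e, g in outputs.items()})
```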
(2) Algorithm choice:
The Word2Vec algorithm maps the goods an enterprise buys and sells into semantic word vectors; on this basis, an improved similarity algorithm is used to find abnormal invoice-swapping enterprises. The algorithm models the correlation between an enterprise's set of purchased goods and its set of sold goods and scores the enterprise to judge whether its business is reasonable: the higher the score, the more normal the enterprise; the lower, the more abnormal. Both sets consist of the goods bought or sold and their amounts, so the commodity is the smallest unit of each set. We therefore start from the correlation between individual commodities and, building on it, derive the correlation between the purchase set and the sales set.
(3) Analysis:
Generally speaking, the goods a normal enterprise buys are closely related to those it sells. Based on this assumption, we use Word2Vec to represent each commodity as an n-dimensional real vector such that the correlation between vectors reflects the correlation between commodities. The original Word2Vec processes natural language and analyzes correlations between words, so here each commodity is treated as a word and commodity sequences are constructed.
Each enterprise is treated as a sentence, and its purchased and sold goods together form that sentence's commodity sequence. Once every enterprise's sequence is built, the sequences are fed into Word2Vec, which outputs an n-dimensional vector v for each commodity. Finally, the correlation between two different commodities p and q is measured with cosine similarity:
$$\mathrm{sim}(p,q)=\cos(\mathbf{v}_p,\mathbf{v}_q)=\frac{\mathbf{v}_p\cdot\mathbf{v}_q}{\lVert\mathbf{v}_p\rVert\,\lVert\mathbf{v}_q\rVert}$$
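A minimal sketch of the corpus construction and the cosine measure. Each enterprise becomes one "sentence": its purchased commodities followed by its sold commodities (sorted here only for reproducibility; the source does not specify an order). The training step itself is left out so the block runs without extra packages; with gensim 4.x one would typically call `Word2Vec(sentences=..., vector_size=n, window=..., min_count=1)` and read vectors from `model.wv[commodity]`. The parameter values there, and the function names below, are assumptions.

```python
import math


def enterprise_sentences(inputs, outputs):
    """One commodity 'sentence' per enterprise: purchased goods, then sold goods.

    inputs/outputs: dict enterprise -> {commodity: amount}.
    """
    sentences = []
    for ent in sorted(set(inputs) | set(outputs)):
        sentence = sorted(inputs.get(ent, {})) + sorted(outputs.get(ent, {}))
        sentences.append(sentence)
    return sentences


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)
```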
Once the correlation between commodities is known, it is combined with the transaction amounts to measure the correlation between each enterprise's purchased and sold goods. Let G be the enterprise's input set and X its output set. To pair them, for each commodity p in G find the most similar q in X, forming the set GX1 = {<p, q>}; likewise, for each commodity q in X find the most similar p in G, forming GX2 = {<p, q>}. The union of GX1 and GX2 gives GX. The correlation sim(G, X) is then computed over the pairs in GX, combining each commodity similarity with the ratio of the two amounts.
Here, sim(p, q) is the correlation between the vector of input commodity p and the vector of sales commodity q, min is the smaller of the purchase amount of p and the sales amount of q, and max is the larger of the two.
This yields the correlation between each enterprise's purchased and sold goods, which is used to judge whether the enterprise is abnormal: if sim(G, X) is below a given threshold, the enterprise is considered abnormal; otherwise it is normal. The correlation can also serve as a normality score for each enterprise.
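The pairing and scoring can be sketched as follows. The source's exact formula for sim(G, X) did not survive, so this sketch assumes a plain average over GX of sim(p, q) · min/max, which uses exactly the terms defined above but may differ from the author's normalization. The threshold of 0.5 and the function names are hypothetical; `sim` is any commodity similarity function, such as the cosine of Word2Vec vectors.

```python
def set_similarity(G, X, sim):
    """Assumed score: mean over <p, q> in GX of sim(p, q) * min(amount) / max(amount).

    G, X: dict commodity -> amount (input and output sets of one enterprise).
    sim:  function (p, q) -> similarity between input p and output q.
    """
    if not G or not X:
        return 0.0
    # GX1: each input commodity paired with its most similar output commodity.
    gx1 = {(p, max(X, key=lambda q: sim(p, q))) for p in G}
    # GX2: each output commodity paired with its most similar input commodity.
    gx2 = {(max(G, key=lambda p: sim(p, q)), q) for q in X}
    gx = gx1 | gx2
    total = 0.0
    for p, q in gx:
        lo, hi = min(G[p], X[q]), max(G[p], X[q])
        total += sim(p, q) * (lo / hi if hi > 0 else 0.0)
    return total / len(gx)


def is_abnormal(G, X, sim, threshold=0.5):
    """Flag the enterprise when its purchase/sales correlation is below the threshold."""
    return set_similarity(G, X, sim) < threshold
```

A bakery buying flour and sugar and selling bread scores high, while a company buying flour and selling steel scores low. Taking the union of both pairing directions means an unmatched commodity on either side still pulls the score down.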