A survey of copyright protection research on deep learning model
Abstract
With the rapid development of deep learning technology, deep learning models have been widely applied in image classification, speech recognition, and other fields. Training a deep learning model depends on large amounts of data and computing power and is therefore costly, so selling trained models or providing specific services (such as DLaaS) has become a business model. However, if a model is stolen by malicious users, the business interests of the model trainer may be damaged. Moreover, the design of the network topology and the training of its parameters embody the trainer's intellectual effort, so a well-trained model should be regarded as intellectual property of the model developer and be protected accordingly. In recent years, deep neural network watermarking has emerged as a new research topic: researchers introduce methods from multimedia content protection into the protection of deep learning models, attempting to embed watermarks into deep neural network models to verify model ownership. A large number of methods have been proposed, but they lack systematic organization and generalization. This paper surveys and summarizes existing work in the field of neural network watermarking and discusses future research directions. The general framework of neural network watermarking is given, and basic concepts such as classification models and model backdoors are introduced. According to the watermark embedding mechanism, existing methods are divided into two categories: the first embeds the watermark inside the network, using the network's internal information as the carrier; the second establishes a network backdoor and uses the backdoor's special mapping relationship as the watermark. Deep neural network watermarking methods based on these two ideas are comprehensively reviewed and summarized; the characteristics, advantages, and limitations of each method are discussed, along with the corresponding watermark attack methods. By analyzing white-box and black-box scenarios for watermarking, it is shown that models distributed in white-box form are difficult to protect effectively, and that the attack resistance of neural network watermarking under black-box distribution and black-box verification deserves further research.
Keywords: neural network security; neural network copyright protection; black-box watermarking; white-box watermarking; backdoor watermarking
0 Introduction
With the popularization and development of computing resources and big data, deep learning technology has achieved great success in many areas of society, providing a strong driving force for industrial upgrading and for scientific and technological development. However, successfully training a deep learning model usually requires a large investment of human and material resources. First, the training process relies on large amounts of accurately labeled, high-quality data. Acquiring big data is itself not easy, and cleaning and labeling the data effectively is even more laborious. For example, the well-known ImageNet [1] image classification database was manually annotated in crowdsourcing mode with the help of users all over the world and took three years to take shape. Second, training an effective neural network model often requires allocating substantial computing resources to tune the network topology and hyperparameters, and the number of weights in state-of-the-art deep neural networks keeps growing and can easily exceed the order of hundreds of millions. For example, GPT-3 [2], a model for natural language processing, has 175 billion parameters. Therefore, a well-trained high-performance neural network model should be regarded as the product of the labor of the data and model owners and should enjoy exclusive intellectual property rights.
In addition to high-quality annotated data and sufficient computing resources, training a deep neural network model also requires professional expertise, and not everyone can complete the task. Under these circumstances, selling trained models has become a business model. For example, IBM proposed the concept of deep learning as a service (DLaaS) [3], which uses common deep learning frameworks such as TensorFlow and PyTorch to deploy deep learning tasks for users on IBM Cloud, lowering the threshold for deep learning. In addition, Amazon and Alibaba Cloud provide corresponding DLaaS APIs for ordinary users. The rapid development of deep learning and the gradual popularization of DLaaS have also brought security risks [4-5]. For example, a buyer of a model may copy or tamper with the purchased model and redistribute it, or claim ownership of a stolen model, damaging the intellectual property rights and economic interests of the model owner. Therefore, a framework for copyright protection of neural network models is needed, so that the model owner can verify ownership of the model and thus protect their legitimate rights and interests.
In recent years, copyright protection of deep learning models has gradually attracted attention around the world. In July 2017, China issued the "Development Plan for the New Generation of Artificial Intelligence", which emphasizes establishing technical standards and an intellectual property system for artificial intelligence. In November 2018, the European Patent Office issued patent guidelines for artificial intelligence and machine learning. It can be seen that copyright protection for deep learning models has become an important research topic.
A deep neural network trained on high-quality big data with a carefully designed network structure is the intellectual property of its owner and has the characteristics of a digital product. Based on this, scholars have introduced digital watermarking methods used to protect multimedia content copyright [6-7] into deep learning, that is, embedding watermarks into trained deep neural network models. In 2017, Uchida et al. [8] proposed the concept of neural network watermarking for the first time, with a method that embeds the watermark through a regularization term of the network loss function. Since then, academia has carried out extensive research on deep learning model watermarking.
On the whole, methods for protecting deep neural network models are similar to those for protecting general digital products, relying on cryptography [9] or watermarking [10]. The main cryptographic approach is to encrypt the model's important data and distribute the decryption key only to authorized users. Its main limitation is that the behavior of authorized users after decryption cannot be controlled. Watermarking can make up for this limitation and effectively trace infringement. In addition, the overall ideas for embedding neural network watermarks differ between the generative and discriminative types of deep learning models. While summarizing all methods, this paper focuses on digital watermarking protection methods for multi-class classifiers, the most common discriminative models.
At present, watermarking methods used to protect deep network classifiers can be divided into two categories according to the watermark embedding mechanism: methods based on the network's internal information and methods based on backdoors. Methods based on internal information embed the watermark directly into the internal structure of the target model, including embedding the watermark into the weights, into the outputs of activation layers, or by adding a new layer to the network as the watermark. Backdoor-based methods are mainly aimed at image classification tasks: a backdoor is embedded in the deep learning model to introduce special input-output relationships. When model ownership needs to be verified, the model owner can feed the special samples embedded as the backdoor into the network and obtain the preset labels. Because the backdoor is known only to the model owner, ownership can be proved by demonstrating these exceptional input-output relationships.
1 Neural network watermarking framework and method classification
This section introduces the basic framework of neural network watermarking and related concepts, and further classifies existing research. Considering that most existing work targets discriminative models, this section takes the supervised deep neural network classifier as the main research model, while other models are briefly introduced.
1.1 Performance indicators
When embedding a watermark, the watermark quality must be balanced against the function of the model; a watermarked deep neural network model may suffer unintentional or malicious attacks after distribution, and the watermark needs to be extracted at the verification stage. Accordingly, neural network watermarking has the following performance indicators.
1) Function invariance: embedding the watermark should affect the performance of the original model as little as possible. This indicator applies both to the watermarking system and to attacks on it: neither embedding nor attacking the watermark may come at the expense of model function, otherwise embedding or attacking would be meaningless.
2) Robustness: the embedded watermark should withstand possible attacks, such as model fine-tuning, compression, or retraining, and should still be reliably extracted after an attack; watermarks embedded through a backdoor mechanism should also resist evasion attacks and ambiguity attacks.
3) Embedding capacity: for methods that modify the network's internal information, the maximum number of information bits that can be embedded.
4) Security: the watermark information embedded in the neural network, or the established backdoor mapping, should not be obtainable by an attacker; this is mainly achieved through an embedding key.
5) Computational complexity: the complexity of watermark embedding and verification, which can be evaluated separately. For example, backdoor-based watermarks have low computational complexity in the verification stage, since only the inference results need to be compared.
1.2 Classification of neural network watermarking methods
The classification of existing copyright protection methods for neural network models is shown in Figure 1. This section first gives a macro watermarking framework for the typical supervised classification model among discriminative models, and then briefly introduces methods for generative models and cryptography-based protection of deep neural network models.
1.2.1 Classification model
The training goal of a deep neural network classification model is to establish a mapping from the sample space $X$ to the label space $Y$. Suppose the training set consists of $N$ sample-label pairs, denoted $(x_i, y_i)$, where $i \in \{0,1,\ldots,N-1\}$, $x_i \in X$, $y_i \in Y$. For the model, the correct and effective input-output relationship is denoted $f: X \to Y$. If $M$ denotes the model structure and $W$ the model weights, the training process of the model is
$$\hat{W} = \arg\min_{W} \mathrm{Loss}\{M[W, X], Y\} \qquad (1)$$
where $\hat{W}$ denotes the weights after training. Model training is judged successful when the number of training samples satisfying the correct mapping exceeds $(1-\varepsilon)|X|$, where $\varepsilon$ is a small positive real number:
$$M[\hat{W}, x_i] = y_i \qquad (2)$$
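As a concrete illustration of Eq. (1) and Eq. (2), the following is a minimal sketch in PyTorch, assuming a generic classifier and data loader (both placeholders, not tied to any specific method in this survey):

```python
# Minimal sketch of the training objective in Eq. (1) and the success
# criterion in Eq. (2); model architecture and data loader are assumptions.
import torch
import torch.nn as nn
import torch.optim as optim

def train_classifier(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3):
    """W_hat = argmin_W Loss{M[W, X], Y}, via mini-batch gradient descent."""
    criterion = nn.CrossEntropyLoss()           # Loss{., .}
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:                     # (x_i, y_i) sample-label pairs
            optimizer.zero_grad()
            loss = criterion(model(x), y)       # Loss{M[W, x], y}
            loss.backward()
            optimizer.step()
    return model                                # parameters now hold W_hat

def training_succeeded(model: nn.Module, loader, eps: float = 0.05) -> bool:
    """Eq. (2): the fraction of correctly mapped samples must exceed 1 - eps."""
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total > 1.0 - eps
```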
Figure 1  Classification of copyright protection methods for neural network models
1.2.2 Protection methods for discriminative models
Embedding inside the network: the watermark is embedded by modifying the internal information of the trained network. The embedding process can be described as
$$\tilde{M}[\tilde{W}, \cdot] = \arg\min_{\hat{W}} \mathrm{Loss}\{M[\hat{W}, X], Y\} + \lambda\, \mathrm{Loss}\{\mathrm{Mark}\} \qquad (3)$$
where $\lambda$ is a parameter that balances the two losses. The left term of the loss function preserves the original function of the model, while the right term ensures the embedding of the watermark. Mark is a special key-dependent mapping relationship whose design is flexible and varies across methods. For watermark verification, a functional relationship is established between the preset key and the embedded Mark, and verification is considered successful when the mapping accuracy reaches a certain proportion.
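A minimal sketch of Eq. (3), assuming PyTorch and leaving the method-specific term `mark_loss` abstract (weights, activations, an added layer, etc.), is as follows:

```python
# Minimal sketch of Eq. (3): fine-tune a trained model with an extra
# watermark term; `mark_loss` is any key-dependent mapping chosen by the
# specific method. All names and hyperparameters are illustrative.
import torch.nn as nn
import torch.optim as optim

def embed_internal_watermark(model: nn.Module, loader, mark_loss,
                             lam: float = 0.01, epochs: int = 5, lr: float = 1e-4):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            task = criterion(model(x), y)            # preserves the original function
            loss = task + lam * mark_loss(model)     # + lambda * Loss{Mark}
            loss.backward()
            optimizer.step()
    return model                                     # weights now correspond to W_tilde
```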
Establishing a network backdoor: a trigger set is added to the training set so that the model learns a preset special mapping relationship, thereby establishing a backdoor [11-13]. Using the subscript $\{\cdot\}_T$ to denote the trigger set, a trigger sample is written $x_{T,i}$ with corresponding label $y_{T,i}$, and the weights with the embedded backdoor can be expressed as
$$\tilde{W} = \arg\min_{W} \mathrm{Loss}\{M[W, X], Y\} + \lambda\, \mathrm{Loss}\{M[W, X_T], Y_T\} \qquad (4)$$
where the left term of the loss function preserves the original function of the model and the right term ensures the establishment of the backdoor. For watermark verification, the pairs $(x_{T,i}, y_{T,i})$ serve as the key, and verification is judged successful when the number of correct model inferences exceeds $(1-\varepsilon)|X_T|$:
$$M[\tilde{W}, x_{T,i}] = y_{T,i} \qquad (5)$$
It is worth noting that both embedding approaches can be applied by fine-tuning a trained network, or the mapping relationships for normal and trigger samples can be learned by training from scratch. In addition, these methods are mostly used for image classification tasks, i.e., the input is an image and the output is an inferred class label.
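A minimal sketch of backdoor embedding and black-box verification, corresponding to Eq. (4) and Eq. (5), is given below. It assumes PyTorch and a secret trigger loader prepared by the model owner; hyperparameters are illustrative.

```python
# Minimal sketch of Eq. (4)-(5): train on clean data plus a secret trigger
# set, then verify ownership from input-output behaviour only.
from itertools import cycle
import torch
import torch.nn as nn
import torch.optim as optim

def embed_backdoor(model, clean_loader, trigger_loader,
                   lam: float = 1.0, epochs: int = 5, lr: float = 1e-4):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        # cycle() keeps supplying trigger batches alongside the clean batches
        for (x, y), (xt, yt) in zip(clean_loader, cycle(trigger_loader)):
            optimizer.zero_grad()
            loss = criterion(model(x), y) + lam * criterion(model(xt), yt)
            loss.backward()
            optimizer.step()
    return model

def verify_backdoor(model, trigger_loader, eps: float = 0.05) -> bool:
    """Eq. (5): claim ownership if M[W_tilde, x_T,i] = y_T,i for more than
    (1 - eps) of the trigger samples."""
    correct, total = 0, 0
    with torch.no_grad():
        for xt, yt in trigger_loader:
            correct += (model(xt).argmax(dim=1) == yt).sum().item()
            total += yt.numel()
    return correct / total > 1.0 - eps
```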
1.2.3 Protection methods for generative models
Discriminative models mainly learn the posterior relationship between data and labels, while generative models mainly learn the joint distribution of data and features, or the distribution of the data itself. Generative models such as generative adversarial networks (GAN) and variational autoencoders (VAE) can therefore generate new data, and their outputs are richer than those of discriminative models, which only output posterior scores. In this case, an attacker can use the model to generate large amounts of data for training their own network, so the new data produced by generative models also needs to be protected.
In the field of computer vision, generative models are generally used for various image processing tasks, and their outputs are no longer class labels but processed images. For this kind of model, Zhang et al. [14] first proposed a protection method that embeds a watermark in the spatial domain of the output image. The method was later extended to the problem of embedding multiple watermarks when the model has multiple released versions, and a scheme that embeds the watermark directly during network training was proposed at the same time [15].
However, a watermark embedded in this way is not directly tied to the network: if the attacker learns the internal information of the model, they can embed their own watermark, or remove the watermark from the output images with image editing tools. In response to these limitations, Wu et al. [16] designed an improved system in which the protected network and a watermark extraction network are trained together, and this extraction network is required to extract the watermark in the verification phase.
1.2.4 Methods based on cryptography
In addition to watermarking, other scholars use cryptography to protect the copyright of deep learning models. One such method uses a chaotic encryption algorithm to encrypt the arrangement of the model weights, so that only a model whose weights have been decrypted can perform normal inference. Stealing an encrypted model thus becomes far less valuable, as the attacker would need to retrain it [9].
1.3 White-box and black-box scenarios
Existing research divides copyright protection schemes for deep network models into white-box and black-box methods according to whether internal network parameters must be accessed in the verification stage. The white-box approach is represented by the watermark verification framework proposed by Uchida et al. [8], in which verification requires access to the model weights. Watermarking schemes based on the backdoor mechanism are typical black-box methods: ownership can be verified purely through input-output relationships, with no need to access the model's internal parameters or structure. However, designing a model copyright protection scheme is more complicated in practice: one must consider not only the conditions required for watermark verification, but also the ways in which models are distributed and the robustness of the watermark to various potential attacks.
2 Watermarking for neural network classification models
This section focuses on the two kinds of watermark embedding methods for discriminative models shown in Figure 1, namely embedding inside the network and establishing a network backdoor. Taking a deep neural network classifier based on a convolutional neural network (CNN) as an example, Figure 2 shows how these methods embed watermarks into the network.
Figure 2  Watermark embedding methods for a DNN classifier
2.1 Embedding inside the network
(1) Embedding in the model weights
Weight-embedding methods attempt to embed binary watermark bits $B$ into some of the network weights through a key $K$, corresponding to the green box in the lower right corner of Figure 2. Denote the selected part of the network weights by $V$, $V \subset W$; the embedding process then obtains new model parameters $\tilde{W}$ through training:
$$\tilde{W} = \arg\min_{\hat{W}} \mathrm{Loss}\{M[\hat{W}, X], Y\} + \lambda\, \mathrm{Loss}_V\{K \cdot V, B\} \qquad (6)$$
where $K \cdot V$ denotes the inner product of the key with the selected weights; Eq. (6) is a specific case of Eq. (3). To verify the watermark, part of the weights is first selected according to the embedding positions; since the network weights may have been attacked, these are denoted $V'$. The key $K$ and the embedded information $B$ are then provided, and when the comparison satisfies
$$\mathrm{sgn}\{\mathrm{sigmoid}(K \cdot V')\} = B \qquad (7)$$
or a sufficient proportion of the compared bits match, the watermark verification is judged successful, where $\mathrm{sgn}\{\cdot\}$ is the sign function. A minimal code sketch of this formulation is given below, followed by neural network watermarking methods based on Eq. (6) and Eq. (7).
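The sketch below illustrates the generic key-projection embedding and verification of Eq. (6)-(7), in the spirit of the regularizer-based methods surveyed next; the key generation, bit convention (0/1), and thresholds are assumptions for illustration only.

```python
# Minimal sketch of Eq. (6)-(7): the key K is a secret projection matrix,
# B a binary message; names and thresholds are illustrative.
import torch
import torch.nn as nn

def make_key(num_bits: int, dim: int, seed: int = 0) -> torch.Tensor:
    g = torch.Generator().manual_seed(seed)       # the secret seed acts as the key
    return torch.randn(num_bits, dim, generator=g)

def watermark_regularizer(v: torch.Tensor, key: torch.Tensor,
                          bits: torch.Tensor) -> torch.Tensor:
    """Loss_V{K·V, B}: binary cross-entropy between sigmoid(K·V) and B."""
    logits = key @ v                              # K · V, v = flattened selected weights
    return nn.functional.binary_cross_entropy_with_logits(logits, bits.float())

def verify_watermark(v_prime: torch.Tensor, key: torch.Tensor,
                     bits: torch.Tensor, threshold: float = 0.95) -> bool:
    """Eq. (7): compare sgn{sigmoid(K·V')} with B (0/1 bits here)."""
    extracted = (key @ v_prime > 0).long()        # sigmoid(z) > 0.5  <=>  z > 0
    return (extracted == bits).float().mean().item() >= threshold
```

This regularizer can be plugged into the generic embedding loop of Section 1.2.2 as `mark_loss`, with `v` taken from the flattened weights of the chosen layer.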
Uchida et al. [8] proposed the first framework for embedding a watermark inside the network, embedding the watermark bits into the weight distribution of an intermediate layer through a regularization term. This work was the first attempt to embed a watermark in a deep neural network and revealed the potential of watermarking models. However, the method is vulnerable to overwriting attacks: because the network has a limited number of layers, an attacker can retrain each layer and thereby destroy the original watermark. In addition, embedding the watermark through a regularization term may interfere with the normal training of the model.
Chen et al. [17] also proposed embedding the watermark into the model weights. To track model usage, each user is given a vector code when the model is distributed. These vectors are encoded by a projection matrix X secretly generated by the model owner and embedded into the model as the watermark during training. For verification, the user extracts the corresponding weights according to their own code, and the model owner multiplies them by X to recover the watermark embedded in the model in advance.
Rouhani et al. [18] embed the watermark into a dynamic part of the network, namely the probability distribution of the activation layers. In the embedding phase, the watermark is trained together with the network, and the outputs are the watermarked model and a key (the WM keys) storing the watermark location information. In the verification phase, this key is needed to trigger the model, obtain the probability distribution of the activation layer in which the watermark is embedded, and extract the embedded watermark signature. Because the watermark resides in a dynamic part of the network, the embeddable capacity is larger than for weight-embedding methods, and because the approach depends on both the data and the model, it is more flexible and harder to detect in practical applications.
Kuribayashi et al. [19] embed the watermark into the weights of the fully connected layer, using quantization index modulation to control the amount of change introduced by the embedded watermark, and random permutation and dither modulation to improve the confidentiality of the watermark.
Feng et al. [20] proposed a model tuning scheme with a compensation mechanism. Two keys are set: key K0 serves as the initial value of a pseudo-random algorithm that selects the n weight positions in which the watermark is embedded, and key K1 serves as the initial value of another pseudo-random generator that produces a specific noise pattern used to modulate the watermark after an orthogonal transform. The existence of K0 creates an obstacle for attackers attempting overwriting attacks, because the attacker cannot know the positions of the watermarked weights. Spread-spectrum modulation allows the watermark to be embedded dispersedly across the weights, enhancing its robustness.
Methods that embed the watermark into network parameters are vulnerable to fine-tuning attacks, so Tartaglione et al. [21] proposed another strategy: the watermarked weights do not participate in parameter updates during network training. By adjusting the loss function during training, the parameters carrying watermark information remain unchanged before and after training, which makes the method highly robust to fine-tuning attacks.
(2) Embedding a new layer
All the methods above retain the original model structure and only embed the watermark into the model parameters. Fan et al. [22] adopted a different idea: adding a new layer to the network to realize the embedding. The method adds a layer named the passport layer (shown as the green dotted box in the neural network of Figure 2), which is dedicated to embedding the watermark information B. If the weights of the passport layer are denoted $W_P$ and $f(W_P)$ denotes the mapping of these weights to binary values, the weights after embedding the watermark can be expressed as
$$\tilde{W} = \arg\min_{W} \mathrm{Loss}\{M[W, X], Y\} + \lambda\, \mathrm{Loss}_P\{f(W_P), B\} \qquad (8)$$
Similarly, verification is judged successful if $M[\hat{W}, W_P] = B' = B$. There are several ways to distribute the model and the passport. In the first, the model trained with the added passport layer is distributed to users; with this model, users can not only obtain the output of the original task, but also extract the corresponding passport layer to prove that they are legitimate users. In the second scheme, both the original network and the network with the added passport layer are trained, but only the original model is distributed to users, while the watermarked network is used solely for ownership verification and is not released. The third scheme adds a trigger set on top of the second: only the original network is distributed, while the model containing the added passport layer and the backdoor is retained, adding one more verification method compared with the second scheme.
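The following is a simplified, hedged sketch of a passport-style layer: the channel-wise scale and bias of a convolutional block are derived from secret passport tensors, and the signs of the scales encode the watermark bits B of Eq. (8). This follows the spirit of [22] only; the exact architecture, loss, and passport construction in the original work differ.

```python
# Simplified sketch of a passport-style layer (assumption: one conv layer,
# channel-wise affine derived from secret passports; illustrative only).
import torch
import torch.nn as nn

class PassportConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int,
                 passport_scale: torch.Tensor, passport_bias: torch.Tensor):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        # secret passport tensors of shape (1, in_ch, H, W); not trainable
        self.register_buffer("p_scale", passport_scale)
        self.register_buffer("p_bias", passport_bias)

    def channel_scale(self) -> torch.Tensor:
        # the scale depends on both the conv weights and the secret passport,
        # so tampering with either destroys the hidden signature
        return self.conv(self.p_scale).mean(dim=(0, 2, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.channel_scale()
        bias = self.conv(self.p_bias).mean(dim=(0, 2, 3))
        return self.conv(x) * scale.view(1, -1, 1, 1) + bias.view(1, -1, 1, 1)

    def extract_bits(self) -> torch.Tensor:
        # f(W_P) in Eq. (8): the signs of the passport-derived scales encode B
        return (self.channel_scale() > 0).long()
```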
(3) Embedding in the network output
Embedding the watermark into model parameters is vulnerable to model extraction attacks, so some scholars have proposed embedding the watermark into the model's outputs. Sebastian et al. [23] embed the watermark in a subset of API responses, proposing a dynamic watermarking method that changes the results of a small fraction of queries, with a negligible impact on classification accuracy. For machine translation tasks, Venugopal et al. [24] proposed using hash functions to generate fixed-length sequences that embed a watermark in the output, so as to distinguish machine translation results from human translations.
(4) Other methods
Lou et al. [25] embed the watermark into the network structure using neural architecture search. He et al. [26] select part of the network weights to share parameters with a single-layer perceptron, using the perceptron's input as the key and its output as the watermark information.
Some scholars embed a neural network itself as the watermark. Lyu et al. [27] proposed HufuNet, a neural network used as a watermark: half of its convolution kernels are embedded into the parameters of the target model, while the other half is retained for ownership verification.
2.2 Establishing a network backdoor
(1) Adding perturbations to original training images
A typical way to construct trigger samples is to select some images from the original training set, apply a specific interference pattern to them, and randomly assign them a label from the original label set (indicated by a blue box in Figure 2). If $K_{\mathrm{pattern}}$ denotes the interference pattern applied to the image content and $\mathrm{Per}(\cdot)$ denotes the perturbation algorithm, then
$$x_{T,i} = \mathrm{Per}(x_i, K_{\mathrm{pattern}}),\quad Y_T \subset Y,\quad x_{T,i} \mapsto y_{T,i} \neq f(x) \qquad (9)$$
The interference pattern can be a meaningful string or logo, or noise with a certain pattern [28]. However, such perturbed samples often differ considerably from normal samples in feature distribution and are easily recognized by the attacker at the watermark verification stage: an attacker can first pass each query through a backdoor detector and, if the detector judges it to be a query sample, refuse to answer or output a random label, thereby evading backdoor verification [4]. To improve robustness, Li et al. [29] use an autoencoder to generate an invisible logo and embed the resulting blind watermark into original images as the trigger set; the trigger images are visually indistinguishable from normal samples and consistent with their feature distribution, so they better resist the above evasion attacks. Some scholars embed an invisible logo in the frequency domain to generate the trigger set [30]; a frequency-domain watermark is better hidden and more robust to various signal processing operations. To reduce the false positive rate, Guo et al. [31] use a genetic evolutionary algorithm to determine the trigger pattern on which the backdoor depends. A minimal sketch of trigger-set construction by pattern stamping is given below.
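The sketch illustrates Eq. (9) in its simplest form; the pattern, its placement, the blending factor, and the target class are all assumptions for illustration.

```python
# Minimal sketch of Eq. (9): build a trigger set by stamping a fixed secret
# pattern (e.g., a small logo or noise patch) onto clean images and
# relabelling them with a preset class.
import torch

def make_trigger_set(images: torch.Tensor, pattern: torch.Tensor,
                     target_class: int, alpha: float = 1.0):
    """images: (N, C, H, W); pattern: (C, h, w), placed in the bottom-right corner."""
    triggered = images.clone()
    c, h, w = pattern.shape
    # Per(x_i, K_pattern): overwrite (alpha=1) or blend (alpha<1) a corner patch
    triggered[:, :, -h:, -w:] = (1 - alpha) * triggered[:, :, -h:, -w:] + alpha * pattern
    labels = torch.full((images.size(0),), target_class, dtype=torch.long)  # y_T,i
    return triggered, labels
```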
(2) Using image instances as the backdoor
Another kind of backdoor method does not construct the trigger set from a specific trigger pattern but instead uses image instances as the backdoor (the trigger-set samples represented by red triangles in Figure 2):
$$X_T \cap X = \emptyset,\quad Y_T \subset Y,\quad x_{T,i} \mapsto y_{T,i} \neq f(x) \qquad (10)$$
Zhang et al. [28] replace some images in the original training set with images unrelated to the task, and Yossi et al. [32] use a set of pictures downloaded from the Internet to build the trigger set. In addition, unlike most existing watermarking systems, which require a trusted third party to complete watermark verification, the latter work introduces a commitment mechanism that constrains both the model owner and the attacker, so that watermark verification can be completed without a trusted third party.
(3) Adding a new class label
Although neither of the above backdoor watermarking methods changes the structure of the model, Zhong et al. [33] argue that superimposing perturbations on images introduces erroneous mappings into the network, changing the decision boundary of the classification network and thereby affecting the accuracy and robustness of the model, whereas adding a new class does not interfere with the original classification boundaries. They therefore add one class to the original label space and let all backdoor images belong to this new class (yellow rectangle in Figure 2):
$$X_T \subset X,\quad Y_T \cap Y = \emptyset,\quad x_{T,i} \mapsto y_{T,i} \neq f(x) \qquad (11)$$
The trigger set corresponding to the new class consists of original training samples that are overlaid with a logo and then assigned the new label.
(4) Using adversarial examples
Adversarial examples are inputs formed by adding subtle perturbations to original images that cause the model to output a misclassification with high confidence [34-35]. Merrer et al. [36] use adversarial training to embed adversarial examples into the network as a backdoor. A subset of adversarial examples is selected as trigger samples, these are assigned their correct labels, and the model is fine-tuned on them. During fine-tuning, the model's decision boundary near these trigger samples changes. For a model that has undergone this fine-tuning, feeding in the adversarial examples of the trigger set again yields the correct classification results. In this case, the watermark embedding approach is unchanged, but the verification works in the opposite way to the backdoor schemes above. The trigger set and mapping mechanism of this method can be expressed as
$$x_{T,i} = A(x_i),\quad x_{T,i} \mapsto y_{T,i} = f(x) \qquad (12)$$
where $A(x_i)$ denotes the adversarial attack, yet the attacked samples are still classified correctly.
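A minimal sketch of generating such a trigger set with a one-step FGSM attack (an assumption; [36] does not prescribe this particular attack or epsilon) is given below; the resulting pairs keep their correct labels and can be fed to the backdoor fine-tuning loop sketched in Section 1.2.2.

```python
# Minimal sketch of Eq. (12): A(x_i) is a one-step gradient-sign (FGSM)
# perturbation, and labels stay correct (y_T,i = f(x)). Illustrative only.
import torch
import torch.nn as nn

def fgsm_trigger_set(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                     eps: float = 0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # A(x_i): perturb in the direction that increases the loss
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
    return x_adv, y          # correct labels are kept for fine-tuning
```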
Some scholars verify model copyright based on the characteristics of adversarial examples. Lukas et al. [37] and Zhao et al. [38] exploit the transferability of adversarial examples: adversarial examples often transfer well between identical or similar models, so a surrogate model will also exhibit high transferability. By comparing the attack success rate of the adversarial examples on a suspicious model against a threshold, one can judge whether the suspicious model is a surrogate derived from the original model. Chen et al. [39] proposed the first backdoor-based multi-bit watermarking scheme: using a model-dependent encoding scheme, the author's signature is embedded in the model's prediction results as a binary code. The scheme uses targeted adversarial examples as the backdoor trigger set, and the adversarial examples together with the corresponding classification confidence scores are distributed to legitimate users as a model fingerprint (such triggers are shown as red rectangles in Figure 2).
(5) Other methods
To make the backdoor scheme depend on both the model and the user, Zhu et al. [40] construct the trigger set as follows: two undisclosed hash functions generate an image chain as the trigger set and its corresponding labels. Since a one-way hash function cannot be inverted, it is difficult for an attacker to mount a forgery attack at the verification stage.
Another method is similar to a backdoor but requires no specific trigger set [41]. It adopts a learnable image encryption algorithm: some clean images are encrypted and transformed as the trigger set, and during verification the key is needed to process the image before the correct inference can be obtained.
In [42], a logo is embedded by steganography in both clean samples and trigger-set images, linking the watermark to the owner's identity and making it easy to distinguish the owner from other enterprises, products, or services.
The classification of deep model watermarking methods is shown in Table 1.
Table 1  Classification of DNN watermarking methods
| Category | Watermarking method | Verification scenario | Zero-/multi-bit | Robustness |
| --- | --- | --- | --- | --- |
| Embedded inside the network | Watermark embedded into the network weights via a regularization term [8] | White box | Multi-bit | Resists pruning and fine-tuning |
| | Vector codes issued to legitimate users for embedding; users can extract the weights only with their code [17] | White box | Multi-bit | Resists collusion attacks, pruning, and fine-tuning |
| | Watermark embedded into the probability density function of intermediate-/output-layer activations [18] | White box / black box | Multi-bit | Resists compression, fine-tuning, and overwriting attacks |
| | Fine-tuning with a compensation mechanism; two keys specify the embedding positions and the covering noise pattern [20] | White box | Multi-bit | Resists overwriting attacks |
| | Watermarked weights excluded from parameter updates during training [21] | White box | Multi-bit | Resists fine-tuning |
| | Insertion of a passport layer [22] | White box / black box | Multi-bit | Must be used together with the passport; verified individually by signature |
| Backdoor-based | Trigger set built by overlaying a logo or noise on images, or by using unrelated images [28] | Black box | Zero-bit | Resists pruning, fine-tuning, and distillation |
| | Autoencoder-generated images carrying a blind watermark used as the trigger set [29] | Black box | Zero-bit | Resists pruning and fine-tuning |
| | Invisible frequency-domain watermark used as the trigger set [30] | Black box | Zero-bit | Resists pruning and fine-tuning |
| | A set of abstract images used as the backdoor, with a cryptographic commitment protocol for security [32] | Black box | Zero-bit | Resists pruning, fine-tuning, and distillation; cannot resist ambiguity attacks |
| | A new class added to the original task, with backdoor images mapped to it [33] | Black box | Zero-bit | Resists fine-tuning and evasion attacks |
| | Adversarial training makes the network classify selected adversarial examples correctly, fine-tuning the decision boundary [36] | Black box | Zero-bit | Resists pruning, fine-tuning, and singular value decomposition attacks |
| | Trigger set generated and optimized with a genetic evolutionary algorithm [31] | Black box | Zero-bit | Reduced false positive rate; resists fine-tuning |
3 Attack methods
Many attack strategies are possible against existing watermarking frameworks for model protection. The main categories at present are removal attacks, evasion attacks, and ambiguity attacks. A removal attack fine-tunes, prunes, or compresses the model to remove the original watermark; an evasion attack means the attacker evades watermark verification by some means during black-box verification; an ambiguity attack makes another illegitimate watermark appear in the model to confuse the judgment of which watermark is authentic. Current attacks on backdoor watermarking schemes mainly exploit the following characteristics of neural networks.
Forgetting: a trained neural network inevitably memorizes the data used for training; if the network is to forget these data, it suffices to delete them and retrain [43-44]. Similarly, for a network with an embedded backdoor, the network can be fine-tuned with a large number of new samples so that it forgets the embedded backdoor, thereby removing it.
Lack of interpretability: deep learning models have long been regarded as black boxes. Although scholars continue to explore model interpretability, the task remains challenging [45-47]. When the generalization performance of the model is poor, under-fitting or over-fitting causes a mismatch between the features the model has really learned and their importance scores. As a result, attacks can be mounted that use adversarial examples as backdoors or that exploit the limitations of the sample space.
Over-parameterization: the over-parameterization of neural networks is also one of the important reasons adversarial examples exist. For example, an adversarial example can change the network's output away from the normal classification by modifying only one or a few pixels of a training image [48]. An attacker may generate adversarial examples of the model and use them as trigger images to mount an ambiguity attack at the watermark verification stage.
Limited sample space: for deep neural network models that rely heavily on data, the training sample space is limited, but for the attacker the sample space is effectively unlimited. An attacker can always find a sample outside the original sample space, or select some of the original samples and assign them labels unrelated to the model [49]. The sample space is illustrated in Figure 3.
This also makes ambiguity attacks possible. Specific research corresponding to these attack strategies is introduced in detail below.
Figure 3  Sample space of known, unknown, and adversarial examples
3.1 Watermark detection
The goal of watermark detection is to detect whether a watermark exists in a model and in what form. After detecting a watermark in the system, the attacker can decide which attack strategy to adopt. Wang et al. [50] point out that the method of [8] changes the weight distribution of the model when embedding the watermark and is therefore easy to detect; they also present a general white-box watermark detection method based on property inference [51]. Shafieinejad et al. [52] likewise demonstrate a property inference attack that uses some training data and feature vectors extracted from the network to effectively detect whether a backdoor-based watermark is embedded in the model. They further propose a watermarking scheme that is more robust to detection attacks: an adversarial training approach trains the target model and a watermark detection network simultaneously to obtain a measure of watermark concealment, which participates in watermark embedding as a regularization term [53].
3.2 Removal attacks
The most common attack on neural network watermarking systems is to try to remove the watermark embedded in the model. Removal attacks can be implemented by fine-tuning [54-55], pruning [56-57], or distillation [58]. Some scholars combine fine-tuning and pruning: neurons are pruned first and the model is then fine-tuned [55]. If the attacker knows the exact location of the watermark, the watermark in that layer can also be removed by reinitializing the layer's parameters and retraining [8]. A sketch of a pruning-plus-fine-tuning removal attack is given below.
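The sketch combines magnitude pruning with fine-tuning in the spirit of [55]; the pruning ratio, optimizer, and the amount of clean data available to the attacker are assumptions.

```python
# Minimal sketch of a removal attack: magnitude pruning followed by
# fine-tuning on the attacker's (small) clean dataset. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torch.optim as optim

def prune_then_finetune(model: nn.Module, small_loader,
                        amount: float = 0.3, epochs: int = 3, lr: float = 1e-4):
    # 1) prune the smallest-magnitude weights in every conv/linear layer
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
    # 2) fine-tune to recover task accuracy (and, hopefully for the attacker,
    #    to erase whatever watermark the pruning did not already destroy)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in small_loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    return model
```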
At present, most removal attacks target backdoor watermarks [52,54,59-61]. Reference [52] gives three ways of attacking backdoor watermarks and points out that in the white-box scenario the attacker can remove the watermark by combining a regularization algorithm with fine-tuning. Reference [54] successfully removes backdoor-based watermarks by fine-tuning with a carefully chosen initial learning rate and learning-rate decay. Although such fine-tuning can remove the watermark, it requires a certain amount of labeled training data, which attackers often lack in practice; reference [59] therefore further proposes a fine-tuning method using unlabeled data: predictions of the pre-trained model are used to label unlabeled data downloaded from the Internet, and these data are then used to fine-tune the model.
For watermarking methods embedded inside the network, reference [50] targets the method of Uchida et al. with a watermark removal algorithm that removes the originally embedded watermark from the network while embedding a new one.
3.3 Evasion attacks
Ryota et al. [57] proposed implementing an evasion attack by modifying the query samples. The method applies to backdoor samples generated by adding perturbations to original images: if the system judges that a query is a backdoor sample, an autoencoder is used to remove the interference information overlaid on the image, turning the trigger sample back into a normal sample.
In addition, Hitaj et al. [4] proposed two evasion attacks based on a stolen model, the ensemble attack and the detector attack, both of which can be mounted in black-box mode without access to the model's internal information. The attacker deploys the stolen model as a DLaaS system and provides services through an API for profit. The ensemble attack aggregates several stolen models and answers each classification query by voting over the models' inferences, which effectively disrupts the mapping of trigger samples. The detector attack evades verification by introducing a trigger-sample detection mechanism: before a sample submitted to the API reaches the model, it passes through a detector; when a trigger sample is detected, the system deliberately outputs a random label to disrupt the backdoor trigger mechanism, while normal samples are inferred as usual by the stolen model.
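A minimal sketch of the ensemble (voting) evasion described in [4] follows; the number of stolen models and the voting rule are assumptions.

```python
# Minimal sketch of an ensemble evasion attack: several stolen models with
# the same task are served behind one API, and the majority-vote label is
# returned, disrupting any single model's trigger mapping. Illustrative only.
import torch

def ensemble_predict(models, x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        votes = torch.stack([m(x).argmax(dim=1) for m in models])  # (K, N)
    # majority vote over the K stolen models for each input sample
    return votes.mode(dim=0).values
```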
3.4 Ambiguity attacks
The goal of an ambiguity attack is to make another illegitimate watermark appear in the model, destroying the uniqueness of the watermark. Because deep learning models are over-parameterized, a model can contain multiple backdoors at the same time. An attacker can use adversarial examples as additional backdoors, or embed a new backdoor in the model through fine-tuning. Guo et al. [62] point out that systems embedding the author's signature as a backdoor may suffer forged-signature attacks; for the scheme proposed by Yossi et al. [32], an attacker can generate another set of abstract images through a genetic algorithm and form a new set of special mappings in the model.
The classification of attacks on deep model watermarking is shown in Table 2. It is worth noting that removal attacks based on additional training require a small number of normal samples (far fewer than needed for the original training) and white-box access to the model; surrogate-model attacks require enough labeled data to make the substitute model converge under black-box conditions; evasion and ambiguity attacks can be completed without normal samples under both black-box and white-box conditions.
Table 2  Classification of attacks on DNN watermarking
| Attack category | Method | Attack conditions | Targeted watermarking methods |
| --- | --- | --- | --- |
| Removal attack | Retraining [52], fine-tuning [54,59,63], overwriting [50], distillation [58] | Small amount of data, white box | White-box methods embedded inside the network and black-box backdoor-based methods |
| | Training a surrogate model from the input-output pairs of the known network [52] | Sufficient data, black box | Black-box backdoor-based methods |
| Evasion attack | Removing the noise overlaid on trigger-set images [57] | Black box | Backdoor methods that overlay a trigger pattern on images |
| | Ensembling multiple models with the same function and inferring by voting [4] | Black box | Watermarking methods for models with the same functionality |
| | Evading the trigger set with a backdoor watermark detector [4] | Black box | Backdoor methods that overlay an interference pattern on images |
| Ambiguity attack | Embedding another watermark inside the model | White box | White-box methods that embed the watermark into internal network parameters |
| | Inserting another backdoor into the model (e.g., using adversarial examples) | White box | Black-box methods based on the backdoor mechanism |
4 Discussion
In the copyright protection of deep learning models, different distribution modes lead attackers to adopt different attack strategies, so these factors should be taken into account when designing a protection scheme. In chronological order: first, when embedding the watermark, the model owner only needs to consider the white-box scenario, since the owner can use any information related to the model (weights, structure, etc.) to assist embedding. Then, when the attacker obtains the distributed model, the setting is divided into white box and black box according to how much model-related knowledge the attacker has: the former assumes the attacker knows the same model information as the owner, while the latter assumes the attacker can only observe the relationship between given samples and their outputs. Finally, model verification also comes in white-box and black-box forms: the former requires access to the model's internal information, while the latter only needs to observe input-output relationships. The various possible situations and the countermeasures available to the model owner and the attacker are discussed below.
4.1 White-box distribution
In this situation the attacker knows the structure and parameters of the model and can attack with more flexible methods. Attackers usually steal models because they lack the ability to train one independently, or they attack the watermark protection framework in order to use the model illegally or to provide similar services for profit. When the model is distributed in white-box form, the attacker weighs the cost of the attack (fine-tuning or retraining the model) against the cost of training their own model. Only when the attacker has limited training data, or lacks sufficient computing power to retrain the model, do existing watermarking methods remain robust.
Based on Kerckhoffs' principle, the model can be absolutely secure only if both the model structure and the watermarking algorithm are kept confidential. From the perspective of stricter model security, the attacker should be assumed to have the same training conditions as the model owner (sufficient training data and computing power). Under such conditions, the attacker can mount ambiguity attacks with the same or a different watermark embedding method as the model owner, and can also damage or remove the watermark by fine-tuning, pruning, and other means [54-55]. Under this assumption, no matter what verification method is used, existing watermark-based copyright verification frameworks are ineffective for models distributed in white-box form. Therefore, whether verification is black box or white box, models distributed in white-box mode are difficult to protect effectively; black-box distribution with black-box verification is the mode most likely to be used in practical applications and deserves more research attention.
4.2 Black-box distribution
In practice, a more common case is that a company hires AI experts to design a model, spends considerable money and manpower on training data, and finally obtains a model file. Such companies provide machine learning services in two main ways: one is a cloud API, and the other is private deployment of the model on the user's device or on a data center server [5]. For the former, an attacker can use a traversal algorithm to repeatedly call the cloud API and then locally reconstruct a model with the same function as the original, providing similar services; for the latter, an attacker can recover the model for their own use, or resell it, through security techniques such as reverse engineering. On the whole, a neural network model can be effectively protected only under black-box distribution.
4.2.1 Black-box verification
Two kinds of methods support watermark verification in black-box scenarios: model watermarking based on the backdoor mechanism, and the watermarking scheme proposed by Rouhani et al. [18].
When the model is distributed as a black box, both methods can verify the copyright of the model. Because the attacker cannot access the internal parameters and structure of the model, it is impossible to fine-tune or retrain it, which makes removal and ambiguity attacks difficult to carry out. In this case, the model owner can design watermark embedding methods such as that of Zhu et al. [40], building backdoor mechanisms that are hard to replicate so that the attacker cannot mount effective ambiguity attacks. The attacker may still carry out evasion attacks, backdoor-based ambiguity attacks, or steal and ensemble multiple models with similar functions [4].
4.2.2 White-box verification
When the model is distributed as a black box, the robustness of the watermark against various attacks is enhanced. However, when the attacker deploys the stolen model online behind an API interface (a black box), white-box verification cannot be carried out. In addition, white-box verification is difficult to apply in embedded systems [62]. White-box verification schemes therefore have limited applicability.
5 Conclusion
This paper has reviewed recent research on neural network watermarking for protecting the copyright of deep neural networks, classified and discussed existing methods, and focused on watermark embedding and verification methods for the common multi-class classification networks among discriminative models, including embedding methods inside the model similar to traditional multimedia watermarking and backdoor-based methods peculiar to neural networks, with the characteristics, advantages, and disadvantages of each method introduced in detail. A series of existing attacks against neural network watermarking has been further discussed and summarized, and the challenges and possible strategies faced by model owners and attackers in white-box and black-box scenarios have been analyzed across the three stages of watermark embedding, attack, and verification.
Neural network watermarking is still a relatively new research field, and many problems remain worth studying. Future research can focus on the following issues.
(1) Improving the robustness of backdoor watermarks against ambiguity attacks. White-box verification cannot be used in embedded systems [62], nor can it handle the case where the attacker deploys the forged model remotely, so it has inherent limitations. Copyright protection schemes based on the backdoor mechanism that support verification in black-box scenarios are therefore a direction worth studying at present. For backdoor-based embedding methods, the most common attack is to find anomalous input-output relationships in the model and then introduce a new anomalous mapping, i.e., to mount an ambiguity attack. In addition, because training samples are limited and deep neural network models are over-parameterized and lack interpretability, large numbers of unknown samples and adversarial examples exist in practical applications and can serve as reference samples for ambiguity attacks. It is therefore necessary to study backdoor watermarking schemes that are more robust to ambiguity attacks.
(2) Extending the task domains of the target model. Most current backdoor-based model copyright protection schemes target image classification networks. Beyond classification models, image generation and processing networks are also very valuable, and scholars have begun to pay attention to the copyright of such models [15,64]. For watermark protection schemes for generative adversarial networks, in addition to verifying model ownership, GAN watermarks can be used to trace the source of deepfake content and attribute it to the model owner when the model is misused. For GAN models, some scholars use image steganography to embed watermarks in all training data, so that any image generated by the model carries a watermark [65]. In addition, watermarking has been applied to natural language processing models [66], speech recognition models [67], and other neural network protection settings [68]; however, model protection in other areas of deep learning still deserves further study.
(3) Reversible watermarking. Whether the watermark is embedded into the internal parameters of the model or a backdoor is embedded in the model, the process is irreversible. These watermarking techniques can only minimize the impact on the performance of the original network, but the model parameters are permanently changed. Reversible watermarking can restore the original parameters of the model after the watermark is extracted, protecting the integrity of the model, which is of great significance for model protection in fields such as the military and law. Work exists that applies reversible digital image watermarking to model protection [69], but further research is still needed.