Learn NLP with Transformer (Chapter 5)
2022-07-25 11:09:00 【Small black board】
Task05: BERT Code
This study follows the Datawhale open-source course: https://github.com/datawhalechina/learn-nlp-with-transformers
The content is largely derived from the original course, reorganized to fit my own learning path.
Personal summary: 1) HuggingFace implemented the BERT model in PyTorch, and the project has since grown into a large open-source community. 2) BERT consists of two parts: BertTokenizer and BertModel. BertTokenizer is the tokenizer; BertModel is the model body, which contains BertEmbeddings, BertEncoder, and BertPooler. 3) HuggingFace's BERT implementation uses several techniques to save GPU memory.
This chapter does not reproduce every line of code; it introduces the inputs and outputs of each parameter and each module. The code itself can be studied in HuggingFace/Transformers on GitHub. HuggingFace is a New York-based chatbot startup that caught the BERT wave very early and implemented a PyTorch-based BERT model. The project was originally called pytorch-pretrained-bert; while reproducing the original results, it provided an easy-to-use interface for all kinds of experiments and research built on this powerful model.
As the number of users grew, the project developed into a large open-source community, incorporated various pre-trained language models, added a TensorFlow implementation, and was renamed Transformers in the second half of 2019.
The main content of this chapter

It mainly includes:
- BERT Tokenization — the tokenizer (BertTokenizer)
- BERT Model — the model body (BertModel)
  - BertEmbeddings
  - BertEncoder
    - BertLayer
      - BertAttention
      - BertIntermediate
      - BertOutput
  - BertPooler
5. BERT Code
5.1 Tokenization — BertTokenizer
The BERT-related tokenizer is mainly implemented in the BertTokenizer class (see the Transformers source on GitHub).
BertTokenizer is a tokenizer built on BasicTokenizer and WordPieceTokenizer:
- BasicTokenizer handles the first step of processing: splitting on punctuation, whitespace, and so on, optionally lowercasing, and cleaning up illegal characters.
  - For Chinese characters, it splits by pre-processing (adding spaces around them);
  - Words listed in never_split are protected from splitting;
  - This step is optional (performed by default).
- WordPieceTokenizer further decomposes words into subwords.
  - A subword sits between a character and a word. It retains much of the word's meaning while handling English singular/plural forms and tenses, vocabulary explosion, and the OOV (Out-Of-Vocabulary) problem caused by unseen words. Separating the root from the affix reduces the vocabulary size and makes training easier;
  - For example, "tokenizer" can be decomposed into "token" and "##izer"; note that the "##" prefix marks a subword that attaches to the previous one.
BertTokenizer has the following commonly used methods:
- from_pretrained: initialize a tokenizer from a directory containing the vocabulary file (vocab.txt);
- tokenize: break text (a word or a sentence) into a list of subwords;
- convert_tokens_to_ids: convert a list of subwords into the list of their vocabulary indices;
- convert_ids_to_tokens: the inverse of the above;
- convert_tokens_to_string: join a subword list back into a word or sentence using the "##" markers;
- encode: for a single-sentence input, tokenize, add the special tokens to form "[CLS], x, [SEP]", and convert to the corresponding vocabulary indices; for a two-sentence input (only the first two are used if more are given), form "[CLS], x1, [SEP], x2, [SEP]" and convert to an index list;
- decode: turn the output of encode back into a complete sentence.
In addition, the tokenizer object itself is callable, as in the example below:
```python
from transformers import BertTokenizer

bt = BertTokenizer.from_pretrained('bert-base-uncased')
bt('I like natural language progressing!')
# {'input_ids': [101, 1045, 2066, 3019, 2653, 27673, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}
```
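For reference, a minimal sketch of some of the methods listed above, using the same bert-base-uncased tokenizer (the exact subword splits and indices depend on the vocabulary file):

```python
tokens = bt.tokenize('I like natural language progressing!')
# e.g. ['i', 'like', 'natural', 'language', 'progressing', '!']
ids = bt.convert_tokens_to_ids(tokens)        # vocabulary indices, no special tokens added
ids_with_special = bt.encode('I like natural language progressing!')  # adds [CLS] ... [SEP]
print(bt.convert_ids_to_tokens(ids_with_special))
print(bt.decode(ids_with_special))            # back to a sentence (special tokens included unless skip_special_tokens=True)
```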
5.2 Model — BertModel
The code for the BERT model itself is mainly in modeling_bert.py on GitHub, which contains the basic structure of the BERT model as well as the fine-tuning models built on top of it.
BertModel is essentially the transformer encoder structure and consists of three parts (a usage sketch follows this list):
- embeddings, i.e. the BertEmbeddings class, which maps token ids to vector representations;
- encoder, i.e. the BertEncoder class;
- pooler, i.e. the BertPooler class (this part is optional).
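As a quick sanity check of how these pieces fit together, a minimal usage sketch (assuming the bt tokenizer from the example above and the bert-base-uncased checkpoint):

```python
import torch
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')
inputs = bt('I like natural language progressing!', return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
print(outputs.pooler_output.shape)      # (batch_size, hidden_size)
```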
The parameters of BertModel's forward pass and their meanings are as follows:
```python
def forward(
    self,
    input_ids=None,
    attention_mask=None,
    token_type_ids=None,
    position_ids=None,
    head_mask=None,
    inputs_embeds=None,
    encoder_hidden_states=None,
    encoder_attention_mask=None,
    past_key_values=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
): ...
```
- input_ids: the list of subword indices produced by the tokenizer;
- attention_mask: in self-attention, this mask distinguishes real subwords from padding; padding positions are set to 0;
- token_type_ids: marks which sentence the current subword belongs to (first sentence / second sentence / padding);
- position_ids: marks the position of the current token within the sentence;
- head_mask: disables the attention computation of certain heads in certain layers;
- inputs_embeds: if provided, input_ids is not needed; the embeddings skip the embedding lookup and go straight into the encoder;
- encoder_hidden_states: only takes effect when BertModel is configured as a decoder, in which case cross-attention is performed instead of self-attention;
- encoder_attention_mask: same as above; used in cross-attention to mark the padding of the encoder input;
- past_key_values: this parameter appears to pass in precomputed K-V products to reduce the cost of cross-attention (which would otherwise be recomputed);
- use_cache: whether to save and return the previous parameter to speed up decoding;
- output_attentions: whether to return the attention output of each intermediate layer;
- output_hidden_states: whether to return the output of each intermediate layer;
- return_dict: whether to return the output as key-value pairs (a ModelOutput class, which can also be used as a tuple); defaults to True.
Note that head_mask, which disables the attention computation of some heads, is different from the attention-head pruning discussed below: it merely multiplies the computation results of some heads by this coefficient.
The output part is as follows:

```python
# The return part of BertModel's forward pass
if not return_dict:
    return (sequence_output, pooled_output) + encoder_outputs[1:]

return BaseModelOutputWithPoolingAndCrossAttentions(
    last_hidden_state=sequence_output,
    pooler_output=pooled_output,
    past_key_values=encoder_outputs.past_key_values,
    hidden_states=encoder_outputs.hidden_states,
    attentions=encoder_outputs.attentions,
    cross_attentions=encoder_outputs.cross_attentions,
)
```
As you can see, the return value contains not only the encoder and pooler outputs, but also the other outputs that were requested (hidden_states, attentions, etc., which live in encoder_outputs[1:]) for easy access:
```python
# The return part of BertEncoder's forward pass, i.e. the encoder_outputs above
if not return_dict:
    return tuple(
        v
        for v in [
            hidden_states,
            next_decoder_cache,
            all_hidden_states,
            all_self_attentions,
            all_cross_attentions,
        ]
        if v is not None
    )
return BaseModelOutputWithPastAndCrossAttentions(
    last_hidden_state=hidden_states,
    past_key_values=next_decoder_cache,
    hidden_states=all_hidden_states,
    attentions=all_self_attentions,
    cross_attentions=all_cross_attentions,
)
```
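To see these extra outputs in practice, a small sketch (again assuming the model and inputs from the earlier sketch):

```python
outputs = model(**inputs, output_hidden_states=True, output_attentions=True)

# 12 encoder layers plus the embedding output -> 13 hidden-state tensors for bert-base
print(len(outputs.hidden_states))       # 13
print(outputs.hidden_states[-1].shape)  # (batch_size, sequence_length, hidden_size)
print(len(outputs.attentions))          # 12
print(outputs.attentions[0].shape)      # (batch_size, num_heads, seq_len, seq_len)
```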
In addition, BertModel provides the following methods so that BERT users can perform various operations:
- get_input_embeddings: get the word_embeddings part of the embedding layer, i.e. the word vectors;
- set_input_embeddings: assign new values to the word_embeddings of the embedding layer;
- _prune_heads: prune attention heads. The input is a dictionary of the form {layer_num: list of heads to prune in this layer}, which prunes the given attention heads of the specified layers.
Pruning is a fairly involved operation: the query/key/value weights of the heads to keep, together with the weights of the fully connected layer that follows the concatenation, must be copied into new, smaller weight matrices (taking care not to copy the grad as well), and the pruned heads must be recorded so that later indices stay correct. See the prune_heads method in the BertAttention section below for details.
5.2.1 BertEmbeddings
The embedding is the sum of three parts:
- word_embeddings: the embedding of each subword, as described above.
- token_type_embeddings: indicates which sentence the current token belongs to, helping distinguish sentences from padding and the two sentences of a sentence pair.
- position_embeddings: the embedding of each token's position in the sentence, used to distinguish word order. Unlike the design in the Transformer paper, these are learned rather than computed with fixed sinusoidal functions; this is generally considered to hurt extensibility (it is hard to transfer directly to longer sentences).
The three embeddings are summed without weighting and passed through a LayerNorm + dropout layer before being output; the output size is (batch_size, sequence_length, hidden_size).
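A minimal sketch of this forward logic, simplified from the real BertEmbeddings (the class name, default sizes, and the omission of registered buffers and padding handling are my own simplifications):

```python
import torch
from torch import nn

class SimpleBertEmbeddings(nn.Module):
    def __init__(self, vocab_size=30522, hidden_size=768, max_position=512,
                 type_vocab_size=2, dropout=0.1, layer_norm_eps=1e-12):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.position_embeddings = nn.Embedding(max_position, hidden_size)
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
        self.LayerNorm = nn.LayerNorm(hidden_size, eps=layer_norm_eps)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input_ids, token_type_ids=None):
        seq_length = input_ids.size(1)
        position_ids = torch.arange(seq_length, device=input_ids.device).unsqueeze(0)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)
        # sum the three embeddings without weighting, then LayerNorm + dropout
        embeddings = (self.word_embeddings(input_ids)
                      + self.position_embeddings(position_ids)
                      + self.token_type_embeddings(token_type_ids))
        return self.dropout(self.LayerNorm(embeddings))
```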
5.2.2 BertEncoder
BertEncoder is a stack of BertLayer modules. The module itself needs little explanation, but one detail is worth noting: it uses gradient checkpointing to reduce memory usage during training.
Gradient checkpointing reduces memory by saving fewer computation-graph nodes; the values that were not stored are then recomputed during the backward pass. See the paper "Training Deep Nets with Sublinear Memory Cost" for an illustration of the process.
In BertEncoder, gradient checkpointing is implemented via torch.utils.checkpoint.checkpoint, which is easy to use; see the documentation: torch.utils.checkpoint - PyTorch 1.8.1 documentation.
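As a rough illustration of how torch.utils.checkpoint.checkpoint is used (run_layer is a hypothetical helper for this sketch, not the actual BertEncoder code):

```python
import torch
from torch.utils.checkpoint import checkpoint

def run_layer(layer, hidden_states, attention_mask, use_checkpoint=True):
    """Run one encoder layer, optionally without storing its intermediate activations."""
    if use_checkpoint and hidden_states.requires_grad:
        # Activations inside `layer` are recomputed during backward instead of being stored.
        return checkpoint(layer, hidden_states, attention_mask)
    return layer(hidden_states, attention_mask)
```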
5.2.2.1 BertAttention
The self member implements multi-head attention, and the output member implements the fully connected layer + dropout + residual + LayerNorm that follows the attention.
```python
class BertAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.self = BertSelfAttention(config)
        self.output = BertSelfOutput(config)
        self.pruned_heads = set()
```
Let's look at this layer first. Here is the pruning operation mentioned above, i.e. the prune_heads method. Its implementation can be summarized as follows:
find_pruneable_heads_and_indices determines which heads need to be pruned and which dimension indices need to be kept; prune_linear_layer takes the Wq/Wk/Wv weight matrices (together with their biases), keeps the unpruned dimensions according to those indices, and copies them into new, smaller matrices.
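From the user's side, this is exposed through the prune_heads method inherited from PreTrainedModel; a minimal sketch (the layer and head indices are chosen arbitrarily for illustration):

```python
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')
# Prune heads 0 and 2 of layer 0, and head 5 of layer 3.
model.prune_heads({0: [0, 2], 3: [5]})
# The query/key/value projections of layer 0 now output 10 heads * 64 dims = 640 features.
print(model.encoder.layer[0].attention.self.query.out_features)  # 640
```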
Next comes the main event: the concrete implementation of self-attention.
5.2.2.1.1 BertSelfAttention
This is arguably the core of the model, and the only place where the formulas come into play, so a fair amount of code is shown.
The initialization part:
```python
class BertSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                "The hidden size (%d) is not a multiple of the number of attention "
                "heads (%d)" % (config.hidden_size, config.num_attention_heads)
            )

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config.hidden_size, self.all_head_size)
        self.key = nn.Linear(config.hidden_size, self.all_head_size)
        self.value = nn.Linear(config.hidden_size, self.all_head_size)

        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
        self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
        if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
            self.max_position_embeddings = config.max_position_embeddings
            self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

        self.is_decoder = config.is_decoder
```
- Apart from the familiar query, key, and value weights and a dropout, there is a mysterious position_embedding_type, plus an is_decoder flag;
- Note that hidden_size and all_head_size are initially the same. Why keep a seemingly redundant variable? Clearly because of the pruning function above: once a few attention heads are pruned, all_head_size naturally becomes smaller;
- hidden_size must be an integer multiple of num_attention_heads. Taking bert-base as an example, each attention module has 12 heads and hidden_size is 768, so each head has size attention_head_size = 768 / 12 = 64;
- What is position_embedding_type? Keep reading.
Now for the key part: the forward pass.
First, recall the basic formulas of multi-head self-attention:
$$\mathrm{MHA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W^O$$

$$\mathrm{head}_i = \mathrm{SDPA}(QW_i^Q, KW_i^K, VW_i^V)$$

$$\mathrm{SDPA}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
As is well known, these attention heads are computed in parallel, which is why the query, key, and value weights above are each a single matrix: it is not that all heads share one weight, but that the per-head weights are "concatenated" together.
Now look at the forward method:
```python
def transpose_for_scores(self, x):
    new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
    x = x.view(*new_x_shape)
    return x.permute(0, 2, 1, 3)

def forward(
    self,
    hidden_states,
    attention_mask=None,
    head_mask=None,
    encoder_hidden_states=None,
    encoder_attention_mask=None,
    past_key_value=None,
    output_attentions=False,
):
    mixed_query_layer = self.query(hidden_states)

    # part of the cross-attention computation omitted
    key_layer = self.transpose_for_scores(self.key(hidden_states))
    value_layer = self.transpose_for_scores(self.value(hidden_states))
    query_layer = self.transpose_for_scores(mixed_query_layer)

    # Take the dot product between "query" and "key" to get the raw attention scores.
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    # ...
```
- transpose_for_scores splits hidden_size into the shape of multiple head outputs and transposes the middle two dimensions so that the matrix multiplication works per head;
- key_layer / value_layer / query_layer now have shape (batch_size, num_attention_heads, sequence_length, attention_head_size);
- attention_scores has shape (batch_size, num_attention_heads, sequence_length, sequence_length), which matches the attention map obtained by computing each head separately (a quick shape check follows this list).
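A small sketch to verify these shapes on dummy tensors (dimensions chosen to match bert-base; this mirrors transpose_for_scores rather than calling the real module):

```python
import torch

batch_size, seq_len, num_heads, head_size = 2, 8, 12, 64
hidden = torch.randn(batch_size, seq_len, num_heads * head_size)  # (2, 8, 768)

# same reshaping as transpose_for_scores
x = hidden.view(batch_size, seq_len, num_heads, head_size).permute(0, 2, 1, 3)
print(x.shape)  # torch.Size([2, 12, 8, 64])

scores = torch.matmul(x, x.transpose(-1, -2))
print(scores.shape)  # torch.Size([2, 12, 8, 8])
```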
This is where K and Q are multiplied to get the raw attention scores. According to the formula, the next step should be scaling by $\sqrt{d_k}$ and applying softmax. However, what appears first is an unfamiliar positional_embedding, and a pile of Einstein summations (einsum):
```python
    # ...
    if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
        seq_length = hidden_states.size()[1]
        position_ids_l = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(-1, 1)
        position_ids_r = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(1, -1)
        distance = position_ids_l - position_ids_r
        positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
        positional_embedding = positional_embedding.to(dtype=query_layer.dtype)  # fp16 compatibility

        if self.position_embedding_type == "relative_key":
            relative_position_scores = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores
        elif self.position_embedding_type == "relative_key_query":
            relative_position_scores_query = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
            relative_position_scores_key = torch.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
            attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key
    # ...
```
Depending on position_embedding_type, there are three behaviors (a small sketch of the distance matrix follows this list):
- absolute: the default; nothing is done in this branch;
- relative_key: multiply the positional_embedding with the query matrix and add the result to the scores as key-relative position encoding;
- relative_key_query: apply the same treatment to both query and key, adding both relative position score terms.
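To make the distance matrix in the snippet above concrete, a tiny sketch (seq_length chosen arbitrarily):

```python
import torch

seq_length, max_position_embeddings = 5, 512
position_ids_l = torch.arange(seq_length).view(-1, 1)
position_ids_r = torch.arange(seq_length).view(1, -1)
distance = position_ids_l - position_ids_r
print(distance)
# tensor([[ 0, -1, -2, -3, -4],
#         [ 1,  0, -1, -2, -3],
#         [ 2,  1,  0, -1, -2],
#         [ 3,  2,  1,  0, -1],
#         [ 4,  3,  2,  1,  0]])
# Shifting by max_position_embeddings - 1 maps all distances to non-negative embedding indices.
print((distance + max_position_embeddings - 1).min())  # tensor(507)
```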
Back to the normal attention flow:
```python
    # ...
    attention_scores = attention_scores / math.sqrt(self.attention_head_size)
    if attention_mask is not None:
        # Apply the attention mask (precomputed for all layers in BertModel forward() function)
        attention_scores = attention_scores + attention_mask  # why + instead of *?

    # Normalize the attention scores to probabilities.
    attention_probs = nn.Softmax(dim=-1)(attention_scores)

    # This is actually dropping out entire tokens to attend to, which might
    # seem a bit unusual, but is taken from the original Transformer paper.
    attention_probs = self.dropout(attention_probs)

    # Mask heads if we want to
    if head_mask is not None:
        attention_probs = attention_probs * head_mask

    context_layer = torch.matmul(attention_probs, value_layer)
    context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
    new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
    context_layer = context_layer.view(*new_context_layer_shape)

    outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

    # decoder return-value part omitted...
    return outputs
```
What is attention_scores = attention_scores + attention_mask doing here? Shouldn't masking be done by multiplication?
- Because the attention_mask has already been "tampered with": positions that were originally 1 become 0, and positions that were originally 0 (i.e. padding) become a large negative number, so adding it pushes the padding scores to large negative values.
- Why a large negative number? Because after softmax, those entries become decimals close to 0.
```
(Pdb) attention_mask
tensor([[[[    -0.,     -0.,     -0.,  ..., -10000., -10000., -10000.]]],
        [[[    -0.,     -0.,     -0.,  ..., -10000., -10000., -10000.]]],
        [[[    -0.,     -0.,     -0.,  ..., -10000., -10000., -10000.]]],
        ...,
        [[[    -0.,     -0.,     -0.,  ..., -10000., -10000., -10000.]]],
        [[[    -0.,     -0.,     -0.,  ..., -10000., -10000., -10000.]]],
        [[[    -0.,     -0.,     -0.,  ..., -10000., -10000., -10000.]]]],
       device='cuda:0')
```
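A quick numeric check of why a large negative number works (toy scores chosen arbitrarily):

```python
import torch

scores = torch.tensor([1.0, 2.0, -10000.0])
print(torch.softmax(scores, dim=-1))
# tensor([0.2689, 0.7311, 0.0000])  -> the masked position contributes essentially nothing
```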
So where does this transformation happen?
There is no answer in modeling_bert.py, but modeling_utils.py contains a special class, ModuleUtilsMixin, and the clue lies in its get_extended_attention_mask method:
```python
def get_extended_attention_mask(self, attention_mask: Tensor, input_shape: Tuple[int], device: device) -> Tensor:
    """
    Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

    Arguments:
        attention_mask (:obj:`torch.Tensor`):
            Mask with ones indicating tokens to attend to, zeros for tokens to ignore.
        input_shape (:obj:`Tuple[int]`):
            The shape of the input to the model.
        device: (:obj:`torch.device`):
            The device of the input to the model.

    Returns:
        :obj:`torch.Tensor` The extended attention mask, with the same dtype as :obj:`attention_mask.dtype`.
    """
    # part omitted...

    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
    # masked positions, this operation will create a tensor which is 0.0 for
    # positions we want to attend and -10000.0 for masked positions.
    # Since we are adding it to the raw scores before the softmax, this is
    # effectively the same as removing these entirely.
    extended_attention_mask = extended_attention_mask.to(dtype=self.dtype)  # fp16 compatibility
    extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
    return extended_attention_mask
```
So when is this function called, and what does it have to do with BertModel?
This involves the inheritance chain of BertModel: BertModel inherits from BertPreTrainedModel, which inherits from PreTrainedModel, and PreTrainedModel in turn inherits from the three base classes nn.Module, ModuleUtilsMixin, and GenerationMixin — quite a deep wrapper!
That means BertModel must, at some step, call get_extended_attention_mask on the original attention_mask, turning its values from [1, 0] into [0, -1e4].
Indeed, this call shows up in BertModel's forward pass (line 944):
```python
# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)
```
Problem solved: this method not only changes the mask values, but also broadcasts the mask into a shape that can be added directly to the attention map.
So it was you all along, HuggingFace.
Other details worth noting:
- The scores are scaled by the dimension of each head; for bert-base that is 64, whose square root is 8;
- attention_probs not only goes through softmax but also through a dropout, presumably out of concern that the attention matrix is too dense. The comment notes that this looks unusual, but it is what the original Transformer paper does;
- head_mask is the per-head mask mentioned earlier; if not set, it defaults to all ones and has no effect here;
- context_layer is the product of the attention matrix and the value matrix; its original shape is (batch_size, num_attention_heads, sequence_length, attention_head_size);
- After the permute and view operations, context_layer is restored to shape (batch_size, sequence_length, hidden_size).
5.2.2.1.2 BertSelfOutput
```python
class BertSelfOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states
```
Here again is the LayerNorm and dropout combination, except that dropout comes first and LayerNorm is applied after the residual connection. As for why a residual connection is used: its most direct purpose is to reduce the training difficulty caused by very deep networks and keep the model sensitive to the raw input.
5.2.2.2 BertIntermediate
With BertAttention covered, what follows the attention is a fully connected layer plus an activation:
```python
class BertIntermediate(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states
```
- The fully connected layer here is an expansion: for bert-base, the expanded dimension is 3072, four times the original dimension of 768;
- The default activation function is GELU (Gaussian Error Linear Units). Its exact form involves the Gaussian error function, so in practice an approximation containing tanh is often used (see below).
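For reference, the standard definition of GELU and its commonly used tanh approximation (general formulas, not specific to this codebase):

$$\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left(1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right) \approx \frac{x}{2}\left(1 + \tanh\!\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^3\right)\right)\right)$$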
5.2.2.3 BertOutput
This is another fully connected layer + dropout + LayerNorm, again with a residual connection:
```python
class BertOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states
```
The operations here are essentially identical to BertSelfOutput (only the input dimension differs), which makes the two components very easy to confuse.
The library also contains application models built on BERT, as well as BERT-related optimizers and their usage, which will be covered in detail in the next article.
5.2.3 BertPooler
This layer simply takes the first token of the sentence, i.e. the vector corresponding to [CLS], passes it through a fully connected layer and an activation function, and outputs the result. (This part is optional, since there are many possible pooling operations.)
```python
class BertPooler(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output
```