【AI4Code】《Pythia: AI-assisted Code Completion System》(KDD 2019)
2022-07-25 13:08:00 【chad_ lee】
Code completion
Code completion fills in an attribute or method by recommending items from a given candidate set. The simplest approach is to list the candidates in alphabetical order; the drawback is that scrolling through the drop-down menu may take longer than typing the code directly. Users can type more of the prefix to narrow the list.
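A minimal sketch of this alphabetical baseline with prefix narrowing is shown below. The function name `alphabetical_completions` and the member list are hypothetical, chosen only for illustration.

```python
def alphabetical_completions(members, prefix=""):
    """Return the members that start with `prefix`, sorted alphabetically.

    Matching and sorting are case-insensitive (an assumption of this sketch).
    """
    prefix = prefix.lower()
    matches = [m for m in members if m.lower().startswith(prefix)]
    return sorted(matches, key=str.lower)


# Hypothetical candidate set: a few methods of a string object.
members = ["upper", "strip", "split", "startswith", "lower", "join"]

all_items = alphabetical_completions(members)      # full drop-down list
narrowed = alphabetical_completions(members, "s")  # after the user types "s"
```

Typing one more character shrinks the list, but the order within it still ignores how likely each member is in the current context, which is the gap a learned model tries to close.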

Model-based code completion
- Based on the abstract syntax tree (AST): Pythia and others
- Based on code text: Deep TabNine, Galois and others
Data: AST and code text
An AST is an abstract representation of the syntactic structure of source code. It represents the syntax of a program as a tree, where each node stands for a construct in the source code. The syntax is called "abstract" because the tree does not record every detail of the concrete syntax. For example, nested parentheses are implied by the structure of the tree rather than appearing as nodes, and a conditional statement such as if-condition-then can be represented by a single node with three branches.

The first approach parses the code into an abstract syntax tree (AST). Each node has two attributes, type and value, so each node needs two embeddings. A depth-first traversal then flattens the AST nodes into a sequence.
The second approach processes the code directly as text, including spaces, newlines, indentation and so on.
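The AST route can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions, not Pythia's actual preprocessing: the helper `node_value` and its choice of which field counts as a node's "value" are hypothetical, and load/store context nodes are skipped to keep the sequence short.

```python
import ast


def node_value(node):
    """Pick a representative value for a node; empty string if it has none.

    Which field serves as the "value" is an assumption of this sketch.
    """
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        return node.name
    return ""


def flatten(node):
    """Depth-first (pre-order) traversal yielding (type, value) pairs."""
    yield type(node).__name__, node_value(node)
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.expr_context):
            continue  # skip Load/Store markers
        yield from flatten(child)


tokens = list(flatten(ast.parse("x = np.zeros(3)")))
```

Each pair in `tokens` would be mapped to a type embedding and a value embedding. Because the traversal is pre-order, the `Attribute` node for `zeros` appears before its child `Name` node for `np`.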
Pythia (KDD'19)
Pythia collected the code of the top 2,700 Python projects on GitHub by star count, containing roughly 16 million method calls, as training data.

The task: given a code snippet $C$ of length $T$, whose tokens are $c_t$, followed by the special token ".", predict the token $m^{*}$. In other words, the model receives a sequence and predicts one token from a representation of that sequence, which suits an LSTM well:

$$
\begin{aligned}
x_{t} &= L c_{t} \\
h_{t} &= f\left(x_{t}, h_{t-1}\right) \\
P(m \mid C) &= y_{t} = \operatorname{softmax}\left(W h_{t} + b\right) \\
m^{*} &= \operatorname{argmax}(P(m \mid C))
\end{aligned}
$$

Here $L$ is the token embedding matrix and $f$ is the LSTM cell.
In other words, the LSTM output is followed by a classifier. The model also ties the embeddings: the LSTM output passes through a linear layer, its inner product is taken with the embedding of each token in the candidate set, and a softmax is applied to the resulting scores.
Pythia has been released as a VS Code plug-in.
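The tied-embedding output step can be sketched in plain Python. This is a toy illustration, not the paper's implementation: the LSTM itself is omitted and its final hidden state `h_T` is simply given, and the names `proj`, `embeddings`, and `predict` as well as all numbers are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(h_T, proj, embeddings, candidates):
    """Score each candidate by the inner product between the projected
    hidden state and that candidate's (tied) input embedding."""
    q = matvec(proj, h_T)  # linear layer: hidden size -> embedding size
    scores = [sum(a * b for a, b in zip(q, embeddings[c])) for c in candidates]
    probs = softmax(scores)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs


# Toy values: 2-dimensional embeddings, 3-dimensional hidden state.
embeddings = {"append": [1.0, 0.0], "sort": [0.0, 1.0], "pop": [-1.0, 0.0]}
proj = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
h_T = [0.2, 1.5, -0.3]

best, probs = predict(h_T, proj, embeddings, ["append", "sort", "pop"])
```

Reusing the input embeddings as output weights means no separate vocabulary-sized output matrix has to be learned, and restricting `candidates` to the members valid at the call site keeps the softmax small.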

Code completion and session-based recommendation are therefore essentially the same task and can be solved with the same methods, except that the candidate set in code completion is smaller.
Deep TabNine and Galois
These methods are similar to Pythia but differ in data format and model. The input is raw code text, and the model is GPT (a decoder-only Transformer, whose blocks have one fewer attention layer because there is no encoder-decoder attention) instead of an LSTM:

However, both of these are paid plug-ins, and neither has published open-source technical details or a paper.