30 Common MCU Problems and Solutions!
2022-06-27 05:43:00 【SunMicro soc】
1. Reproducing the Problem
A problem that can be reproduced reliably can be correctly located, fixed, and verified. Generally speaking, the easier a problem is to reproduce, the easier it is to solve.
1.1 Simulate the triggering conditions
Some problems occur only under specific conditions, and simulating those conditions is enough to reproduce them. If the conditions depend on external input that is complex or hard to simulate, consider making the program enter the corresponding state directly by default.
1.2 Increase the frequency of the related task
For example, if an exception appears only after a task has been running for a long time, increase how often that task executes.
1.3 Increase the test sample size
If an exception appears only after the program has run for a long time and is hard to reproduce, build a test environment and run multiple units at the same time.
2. Locating the Problem
Narrow down the scope of the investigation to identify the task, function, or statement that introduces the problem.
2.1 Print logs
Based on the symptoms, add log output to the suspect code. This lets you trace the execution flow and the values of key variables and check whether they match expectations.
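A common way to make such logs cheap to add and easy to remove is a macro that tags each message with its source location. The sketch below is one possible shape, not the author's code. `LOG()`, `log_format()` and `LOG_ENABLE` are hypothetical names, and `fputs()` stands in for whatever output channel the board actually has (for example, a UART send routine).

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Build with -DLOG_ENABLE=0 to compile every LOG() call away in release builds. */
#ifndef LOG_ENABLE
#define LOG_ENABLE 1
#endif

/* Formats "[file:line] message" into buf, truncating to fit.
   Returns the length of the resulting string. */
int log_format(char *buf, size_t size, const char *file, int line,
               const char *fmt, ...)
{
    va_list args;
    int n;

    if (size == 0) {
        return 0;
    }
    n = snprintf(buf, size, "[%s:%d] ", file, line);
    if (n < 0) {
        buf[0] = '\0';   /* encoding error: emit an empty line */
        return 0;
    }
    if ((size_t)n < size) {
        va_start(args, fmt);
        vsnprintf(buf + n, size - (size_t)n, fmt, args);
        va_end(args);
    }
    return (int)strlen(buf);
}

#if LOG_ENABLE
/* fputs() is a placeholder for the board's real log output. */
#define LOG(...)                                                    \
    do {                                                            \
        char log_buf_[128];                                         \
        log_format(log_buf_, sizeof log_buf_, __FILE__, __LINE__,   \
                   __VA_ARGS__);                                    \
        fputs(log_buf_, stdout);                                    \
    } while (0)
#else
#define LOG(...) do { } while (0)
#endif
```

Typical use is `LOG("state=%d adc=%u\n", state, adc);` at the points where the execution flow or a key variable needs checking. The fixed-size buffer keeps stack usage predictable, which matters on small MCUs.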
2.2 Online debugging
Debugging with a debugger attached gives results similar to printing logs, and it is especially well suited to troubleshooting crashes. When the program falls into an exception handler (HardFault, watchdog interrupt, etc.), you can halt it directly and inspect the call stack and the core register values to quickly locate the problem.
2.3 Version rollback
When a version control tool is in use, you can find the first version that introduced the problem by repeatedly rolling back to earlier versions and testing each one. You can then review the code added or modified in that version.
2.4 Binary commenting
Binary commenting means commenting out parts of the code in a way similar to binary search, to determine whether the problem is caused by the commented-out code.
Specifically, comment out half of the code that is not directly related to the problem and check whether the problem disappears. If it does not, comment out the other half instead. If it does, keep halving the commented-out range, and so on, gradually narrowing down the scope of the problem.
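The halving procedure is a binary search. The sketch below assumes that exactly one code region causes the fault. It models "comment out everything except regions [lo, hi), rebuild and retest" as a callback. `fault_test_fn`, `bisect_culprit()` and `simulated_test()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: rebuild and run with only regions [lo, hi) enabled
   (everything else commented out); return true if the fault still appears. */
typedef bool (*fault_test_fn)(size_t lo, size_t hi, void *ctx);

/* Assumes exactly one of the n regions causes the fault; returns its index. */
size_t bisect_culprit(size_t n, fault_test_fn still_faults, void *ctx)
{
    size_t lo = 0, hi = n;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        /* Keep only the lower half enabled: if the fault persists, it is there. */
        if (still_faults(lo, mid, ctx)) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    return lo;
}

/* Simulated test for illustration: the fault appears iff the culprit is enabled. */
bool simulated_test(size_t lo, size_t hi, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    return culprit >= lo && culprit < hi;
}
```

With n regions, this takes about log2(n) rebuild-and-test rounds. That is why halving beats commenting out one block at a time.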
2.5 Save a snapshot of the core registers
When a Cortex-M core enters an exception handler, it automatically pushes several core registers onto the stack. From the lowest address upwards, these are R0, R1, R2, R3, R12, LR, PC and xPSR.
When the core falls into an exception handler, we can copy these stacked register values into a RAM area whose contents are preserved across a reset, and then reset the MCU. After restart, read the information back from that RAM and analyze it:
PC and LR identify the function that was executing at the time.
R0 to R3 show whether the values being processed were abnormal.
SP indicates whether a stack overflow may have occurred.
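A minimal sketch of the save-and-read-back logic is shown below, assuming the basic eight-word exception frame. The names `exc_frame_t`, `fault_snapshot_t`, `fault_save()`, `fault_load()` and `FAULT_MAGIC` are hypothetical. On a real target, a few lines of HardFault glue code would test bit 2 of EXC_RETURN in LR to choose MSP or PSP and pass that pointer to `fault_save()`. `g_fault_snapshot` must also be placed in a RAM section that the startup code does not zero. Both details are toolchain-specific and omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic frame a Cortex-M core pushes on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

#define FAULT_MAGIC 0x4641554Cu  /* arbitrary marker meaning "snapshot valid" */

typedef struct {
    uint32_t    magic;
    uint32_t    sp;     /* stack pointer at the fault, i.e. the frame's address */
    exc_frame_t frame;
} fault_snapshot_t;

/* On the target: linker-placed in RAM that is not initialized at startup. */
fault_snapshot_t g_fault_snapshot;

/* Called from the HardFault handler with the stacked frame and its SP. */
void fault_save(const exc_frame_t *frame, uint32_t sp)
{
    g_fault_snapshot.frame = *frame;
    g_fault_snapshot.sp = sp;
    g_fault_snapshot.magic = FAULT_MAGIC;  /* written last: record complete */
}

/* Called early after reset. Copies the record out if one was saved,
   then invalidates it so the fault is reported only once. */
bool fault_load(fault_snapshot_t *out)
{
    if (g_fault_snapshot.magic != FAULT_MAGIC) {
        return false;
    }
    *out = g_fault_snapshot;
    g_fault_snapshot.magic = 0u;
    return true;
}
```

After a fault-triggered reset, the saved `pc` and `lr` can be looked up in the map file or disassembly to find the faulting function. On a cold power-up the retained RAM holds random data, so the magic value is what distinguishes a real snapshot from noise.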
3. Analyzing and Fixing the Problem
Analyze the cause of the problem by combining the symptoms with the location of the problem code.
3.1 The program keeps running
3.1.1 Abnormal values
3.1.1.1 Software problems
1、 Array out of bounds
Writing to an array with an index beyond its length modifies whatever is stored at the corresponding address.
Such problems usually need to be analyzed together with the map file. Use the map file to find the arrays located near the address of the corrupted variable. Then check whether any code that writes to those arrays is unsafe (for example, writes with an unchecked index), and change it to safe code.
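The difference between an unsafe and a safe array write can be sketched as follows. `g_rx_buf`, `RX_BUF_LEN` and `rx_buf_write()` are hypothetical names, and returning -1 is just one possible way to report the rejected write.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_LEN 8u

uint8_t g_rx_buf[RX_BUF_LEN];

/* Bounds-checked write. The unsafe form, g_rx_buf[idx] = val with no check,
   lets idx >= RX_BUF_LEN silently overwrite whatever the linker placed after
   the array (visible as the neighbouring symbol in the map file). */
int rx_buf_write(size_t idx, uint8_t val)
{
    if (idx >= RX_BUF_LEN) {
        return -1;   /* reject instead of corrupting memory */
    }
    g_rx_buf[idx] = val;
    return 0;
}
```

Centralizing writes in one checked function also gives a single place to set a breakpoint or log when an out-of-range index shows up.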
2、 Stack overflow
| Address | Contents |
|---|---|
| 0x20001ff8 | g_val |
| 0x20002000 | Lower end of the stack region |
| ... | Stack space |
| 0x20002200 | Upper end of the stack region |
As the table above shows, such problems also need to be analyzed with the map file. Assuming the stack grows from high addresses to low addresses, a stack overflow overwrites g_val with values from the stack.
When a stack overflow occurs, analyze the maximum stack usage. Deep function call chains, function calls made inside interrupt service routines, and large local variables declared inside functions can all lead to stack overflow.
Such problems can be addressed in the following ways:
Allocate memory resources reasonably during the design stage and give the stack an appropriate size;
Add the "static" keyword to large local variables to turn them into static variables (note that this makes the function non-reentrant), or allocate them dynamically with malloc() so that they live on the heap;
Change how functions are called to reduce the call depth.
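One common way to measure maximum stack usage is stack painting, a technique not described in the original text. At startup, fill the stack region with a known pattern; later, count how much of the pattern has been overwritten. The sketch below simulates the stack region with an array. On a real target, the region bounds would come from linker-script symbols whose names depend on the toolchain. `stack_paint()`, `stack_peak_usage_words()` and `STACK_FILL` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u

/* Fill the stack region with a known pattern. On the target this must run
   before the region is used (e.g. in startup code); here it is an array. */
void stack_paint(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        region[i] = STACK_FILL;
    }
}

/* For a stack growing downwards, region[0] is the lowest address. Count the
   untouched words from the low end; everything above was used at some point. */
size_t stack_peak_usage_words(const uint32_t *region, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && region[untouched] == STACK_FILL) {
        untouched++;
    }
    return words - untouched;
}
```

Reading the peak usage after a long stress run shows how much headroom remains before the stack reaches g_val. If a used word happens to equal the fill pattern, the result can slightly underestimate the true peak.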
3、 Incorrectly written conditions

In a conditional statement it is easy to write the equality operator "==" as the assignment operator "=". The variable being tested is then modified, and the condition evaluates to the assigned value, so it is always true (or always false when the value is 0). The compiler does not report this as an error.
It is recommended to put the variable being tested on the right side of the operator, with the constant on the left. Then an accidental assignment produces a compile error. Static code analysis tools can also find such problems.
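The "constant on the left" style looks like the sketch below. `MODE_IDLE` and `mode_is_idle()` are hypothetical names. Independently of this style, GCC and Clang can warn about an assignment used as a condition when warnings such as -Wall are enabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODE_IDLE 0u

/* Constant on the left: mistyping this as "MODE_IDLE = mode" fails to compile,
   because a constant cannot be assigned to. Written the other way round,
   "mode = MODE_IDLE" would compile, overwrite mode and always be false. */
bool mode_is_idle(uint8_t mode)
{
    return MODE_IDLE == mode;
}
```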
4、 Synchronization problems
For example, suppose an interrupt (or task switch) occurs while a queue dequeue operation is in progress. If the interrupt handler (or the task switched to) then performs an enqueue, the queue structure may be corrupted. In such cases, disable interrupts during the operation, or use a mutex to synchronize between tasks.
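A minimal sketch of a queue whose enqueue and dequeue run inside a critical section is shown below. To keep it runnable on a host, `critical_enter()` and `critical_exit()` are no-op stubs. On a Cortex-M target using CMSIS, they could save PRIMASK with `__get_PRIMASK()`, call `__disable_irq()`, and restore PRIMASK with `__set_PRIMASK()`. The queue type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host stubs. On a CMSIS-based Cortex-M target, e.g.:
   enter: uint32_t m = __get_PRIMASK(); __disable_irq(); return m;
   exit:  __set_PRIMASK(saved);                                        */
static uint32_t critical_enter(void) { return 0u; }
static void critical_exit(uint32_t saved) { (void)saved; }

#define QUEUE_LEN 8u

typedef struct {
    uint8_t  data[QUEUE_LEN];
    uint32_t head;   /* next slot to read  */
    uint32_t tail;   /* next slot to write */
    uint32_t count;
} queue_t;

bool queue_push(queue_t *q, uint8_t v)
{
    bool ok = false;
    uint32_t saved = critical_enter();   /* no ISR can interleave from here */
    if (q->count < QUEUE_LEN) {
        q->data[q->tail] = v;
        q->tail = (q->tail + 1u) % QUEUE_LEN;
        q->count++;
        ok = true;
    }
    critical_exit(saved);
    return ok;
}

bool queue_pop(queue_t *q, uint8_t *out)
{
    bool ok = false;
    uint32_t saved = critical_enter();
    if (q->count > 0u) {
        *out = q->data[q->head];
        q->head = (q->head + 1u) % QUEUE_LEN;
        q->count--;
        ok = true;
    }
    critical_exit(saved);
    return ok;
}
```

Saving and restoring PRIMASK, rather than unconditionally re-enabling interrupts, keeps the functions safe to call from code that already runs with interrupts disabled. The critical sections should stay short, because interrupt latency grows with their length.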
5、 Compiler optimization problems
Consider a program intended to keep calling foo() until the irq interrupt sets flg, and then stop. After compiler optimization, flg may be loaded into a register once. Every subsequent check then tests the register instead of re-reading flg from RAM, so foo() keeps running even after the irq interrupt has occurred. The fix is to add the "volatile" keyword to the declaration of flg, which forces every access to read flg from RAM.
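The corrected pattern can be sketched as follows. `g_irq_flag` stands in for flg, and `foo()`, `irq_handler()` and `run_until_irq()` are hypothetical names. On a host there is no real interrupt, so the example only demonstrates the declaration and the polling loop.

```c
#include <stdbool.h>

/* Set by the interrupt handler, polled by the main loop. volatile forces every
   read of g_irq_flag to go to memory instead of a cached register copy. */
volatile bool g_irq_flag = false;

unsigned g_foo_calls = 0;

/* Hypothetical stand-in for foo() in the text. */
void foo(void)
{
    g_foo_calls++;
}

/* Interrupt handler; on the target it would be installed in the vector table. */
void irq_handler(void)
{
    g_irq_flag = true;
}

/* Keep calling foo() until the interrupt has fired. */
void run_until_irq(void)
{
    while (!g_irq_flag) {
        foo();
    }
}
```

volatile only guarantees that each access really happens. It does not make a read-modify-write sequence atomic, so data shared with an interrupt that is more than a simple flag still needs the synchronization described in item 4.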
3.1.1.2 Hardware problems
1、 Chip bugs
The chip itself has a bug and returns incorrect values to the MCU under specific conditions. The program needs to check the values it reads back and filter out outliers.
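One simple form of such filtering is a plausibility check: a range check plus a maximum step between consecutive samples. The thresholds below (a cell voltage in millivolts) and all names are hypothetical. Real limits must come from the chip's datasheet and the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plausibility limits for a cell-voltage reading in mV. */
#define CELL_MV_MIN      2000u
#define CELL_MV_MAX      4500u
#define CELL_MV_MAX_STEP  200u   /* largest change accepted between samples */

typedef struct {
    uint16_t last_good;
    bool     has_last;
} sample_filter_t;

/* Returns true and updates *out if raw looks plausible; otherwise leaves
   *out untouched and returns false so the caller can count rejects. */
bool filter_sample(sample_filter_t *f, uint16_t raw, uint16_t *out)
{
    if (raw < CELL_MV_MIN || raw > CELL_MV_MAX) {
        return false;
    }
    if (f->has_last) {
        uint16_t diff = (raw > f->last_good) ? (uint16_t)(raw - f->last_good)
                                             : (uint16_t)(f->last_good - raw);
        if (diff > CELL_MV_MAX_STEP) {
            return false;
        }
    }
    f->last_good = raw;
    f->has_last = true;
    *out = raw;
    return true;
}
```

A pure step limit can lock out a value that genuinely changed quickly. In practice, a counter of consecutive rejects is often added so that a persistent new value is eventually accepted or reported as a fault.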
2、 Communication timing error

Take the battery management chip ISL78600 as an example, with two chips connected in a daisy chain. When the voltage samples of both chips are read at the same time, the upper chip transfers its data to the lower chip through the daisy chain at a fixed period, and the lower chip has only one buffer.
If the MCU does not read the data from the lower chip within the specified time, the next data overwrites the current data when it arrives, and data is lost. Such problems require careful study of the chip's datasheet and strict compliance with its communication timing requirements.
3.1.2 Abnormal behavior
3.1.2.1 Software problems
1、 Design problems
The design contains errors or omissions, and the design documents need to be reviewed again.
2、 Implementation inconsistent with the design
When the code does not match the design document, add unit tests that cover all conditional branches and conduct cross code reviews.
3、 Corrupted state variables
For example, the variable that records the current state of a state machine has been overwritten. Such problems are analyzed in the same way as the abnormal values described above.
3.1.2.2 Hardware problems
1、 Hardware failure
The target IC has failed and does not act after receiving control commands, so the hardware needs to be checked.
2、 Abnormal communication
Communication with the target IC fails, so control commands cannot be executed correctly. Use an oscilloscope or logic analyzer to observe the communication timing, and determine whether the transmitted signal is wrong or is being disturbed by external interference.
3.2 Program crash
3.2.1 Program stops running
3.2.1.1 Software problems
1、HardFault
The following situations can cause a HardFault:
Accessing a peripheral's registers while the peripheral's clock gate is not enabled;
Jumping to an out-of-range function address, which usually happens when a function pointer has been corrupted (troubleshoot it in the same way as abnormal values);
Alignment problems when dereferencing a pointer:
Taking little-endian byte order as an example, suppose we declare a packed (1-byte aligned) structure a whose memory layout is as follows:

| Address | 0x00000000 | 0x00000001 | 0x00000002 | 0x00000003 |
|---|---|---|---|---|
| Field | val0 | val1 (low byte) | val1 (high byte) | val2 |
| Value | 0x12 | 0x56 | 0x34 | 0x78 |
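Here the address of a.val1 is 0x00000001. Dereferencing this address through a uint16_t pointer is an unaligned access. This enters HardFault on cores that do not support unaligned access (such as Cortex-M0/M0+), or when unaligned-access trapping is enabled. If the variable must be accessed through a pointer, copy it with memcpy() instead.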
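The memcpy() approach can be sketched as below, assuming GCC/Clang's `__attribute__((packed))` and a little-endian target. `packed_t` and `read_val1()` are hypothetical names. The field layout matches the table above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packed to 1-byte alignment (GCC/Clang syntax), laid out as in the table. */
typedef struct __attribute__((packed)) {
    uint8_t  val0;   /* offset 0 */
    uint16_t val1;   /* offset 1: not 2-byte aligned */
    uint8_t  val2;   /* offset 3 */
} packed_t;

/* Reads val1 byte-wise instead of via *(uint16_t *)&a->val1, which performs
   an unaligned halfword load and can fault on cores without unaligned support. */
uint16_t read_val1(const packed_t *a)
{
    uint16_t v;
    memcpy(&v, (const uint8_t *)a + offsetof(packed_t, val1), sizeof v);
    return v;
}
```

Compilers typically turn such a small memcpy() into byte loads or a single safe load, so the cost is usually negligible. Accessing `a.val1` directly as a struct member is also safe, because the compiler knows the member is packed; the danger comes from taking its address as a plain `uint16_t *`.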
2、 Interrupt flag not cleared in the interrupt service routine
If the interrupt flag is not cleared before the interrupt service routine exits, the interrupt is triggered again as soon as the routine returns. The program then appears to hang (a "feigned death").
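A handler that clears its flag before returning might look like the sketch below. The register `PERIPH_SR`, the bit `PERIPH_SR_IF` and the clearing method are all hypothetical and simulated with a plain variable. Depending on the MCU, flags are cleared by writing 0, writing 1, or reading a data register, so the reference manual is the authority.

```c
#include <stdint.h>

/* Simulated peripheral status register; name, bit and clear method
   (writing 0 to the bit) are assumptions for illustration. */
volatile uint32_t PERIPH_SR;
#define PERIPH_SR_IF (1u << 0)

unsigned g_irq_count;

void periph_irq_handler(void)
{
    if (PERIPH_SR & PERIPH_SR_IF) {
        PERIPH_SR &= ~PERIPH_SR_IF;   /* clear the flag before returning */
        g_irq_count++;
    }
}
```

On real registers where writing 0 clears a bit, a read-modify-write like `&= ~` can accidentally clear another flag that gets set between the read and the write. Writing a mask directly is often recommended for such registers instead.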
3、NMI interrupt
During debugging, we encountered an MCU whose SPI MISO pin was multiplexed with the NMI function. When the peripheral connected over SPI was damaged, MISO was pulled high. After a reset, the MCU entered the NMI interrupt before the pin had been configured for the SPI function, and the program hung in the NMI interrupt. In this case, disable the NMI function inside the NMI interrupt service routine so that the program can exit the NMI interrupt.
3.2.1.2 Hardware problems
1、 The crystal oscillator does not start
2、 Insufficient supply voltage
3、 The reset pin is pulled low
3.2.2 Reset
3.2.2.1 Software problems
1、 Watchdog reset
Besides resets caused by a watchdog feed timeout, pay attention to any special requirements of the watchdog configuration. Take the Freescale KEA MCU as an example. When its watchdog is configured, an unlock sequence must be executed (writing two different values to its register in succession). The sequence must complete within 16 bus clocks; otherwise the watchdog resets the MCU. Avoiding such problems requires familiarity with the MCU's reference manual and attention to details like this.
3.2.2.2 Hardware problems
1、 Unstable supply voltage
2、 Insufficient load capacity of the power supply
4. Regression Testing
After the problem is solved, run regression tests. On the one hand, they confirm that the problem no longer occurs; on the other, they make sure that the change has not introduced other problems.
5. Lessons Learned
Summarize the cause of the problem and how it was solved, and think about how to prevent similar problems in the future. Consider whether the lessons apply to other products on the same platform, so as to generalize from a single case and learn from failure.