Full-link service tracing implementation scheme
2022-06-24 19:27:00 【Cloudwise AIOps Community】
- Highly flexible deployment models: public cloud, private cloud, and hybrid cloud.
- Widely distributed business nodes: it is hard to know where the XaaS instances that support the business are located.
- Extremely complex call relationships: the number of call dependencies between microservices is exponentially higher than before.
- Production problems are not discovered in time: service call relationships between systems are not transparent, and the traditional "aggregate monitoring" model cannot quickly warn about or report the impact of a "problem service" on a transaction link, so operational monitoring lags behind.
- Troubleshooting is labor-intensive: monitoring means are limited and the operational data standards of the various systems are not unified, so resolving a production problem consumes a large amount of development and operations resources, and communication costs are high.
- Problem solving is inefficient: operational data from different systems carries no unified identifier that ties one request together, and recording standards differ, so the "problem service" cannot be located quickly.
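The missing "unified identifier" in the last point is what a trace ID provides. The following is a minimal sketch, not the article's implementation: the header name `x-trace-id`, the function `handle_request`, and the service names are hypothetical and chosen only for illustration. Each service reuses the incoming trace ID (or creates one at the entry point), stamps it on its own records, and forwards it downstream.

```python
import uuid

# Hypothetical header name used only for this illustration.
TRACE_HEADER = "x-trace-id"


def handle_request(service, headers, log):
    """Record one service hop and return the headers to forward downstream."""
    # Reuse the caller's trace ID if present; otherwise this is the entry
    # point of the transaction and a new trace ID is created.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    log.append({"trace_id": trace_id, "service": service})
    return {**headers, TRACE_HEADER: trace_id}


def services_for(trace_id, log):
    """Return the services that handled a given trace, in recorded order."""
    return [rec["service"] for rec in log if rec["trace_id"] == trace_id]


# Simulate one transaction passing through three hypothetical services.
log = []
headers = handle_request("gateway", {}, log)
headers = handle_request("order", headers, log)
headers = handle_request("payment", headers, log)

print(services_for(headers[TRACE_HEADER], log))
```

Because every record carries the same ID, the records of one transaction can be joined across systems even when each system keeps its own log format.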
Observability
- Metrics: a form of aggregated data. Commonly seen figures such as QPS, TP99, and TP999 all fall into this category, and metrics are generally designed and implemented on statistical principles.
- Logging: broadly speaking, a log is triggered by a business request or event and records a snapshot of the application's state. Unified collection, storage, and parsing of log data are affected by many factors; for example, handling both structured and unstructured logs often requires a high-performance parser and cache.
- Tracing: emerged in the SOA era. Servitization produces long call chains, and locating a problem from logs alone is hard, so additional means are needed to cope with the complexity. Tracing is therefore more expressive than metrics, and also more complex.
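To make the statistical nature of metrics concrete, here is a small sketch of how QPS and TP99/TP999 could be derived from raw request latencies. It assumes the nearest-rank percentile definition, which is only one of several in common use; the function names and the synthetic data are illustrative.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round(pct * len(ordered) / 100, 9))
    return ordered[max(rank, 1) - 1]


def qps(request_count, window_seconds):
    """Average queries per second over an observation window."""
    return request_count / window_seconds


# Synthetic latencies of 1..1000 ms, observed over a 10-second window.
latencies_ms = list(range(1, 1001))
tp99 = percentile(latencies_ms, 99)
tp999 = percentile(latencies_ms, 99.9)
rate = qps(len(latencies_ms), 10)
```

TP99 answers "how slow is the slowest 1% of requests", which an average would hide; that is why percentile metrics are usually tracked alongside throughput.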

- In cloud-native scenarios, virtualization is more thorough and the environment is more dynamic. Full-link tracing built on observability helps meet requirements such as high business availability and SLAs.
- Trace the whole transaction link visually in order to discover problems quickly, locate them, and assist in resolving them; generate and use observation data in a more intuitive and scientific way for real-time monitoring and analysis.
- Introduce AI techniques for automated anomaly detection, localization, and repair.
Problems and challenges
Overall solution for full-link tracing
Applicable scenarios
No monitoring tools in place
A few monitoring tools in place
A relatively complete set of tools in place
Unified management and intelligent use of operations data
Solution

- Fault prevention stage: plan and observe full-link tracing indicators, and convert those indicators into alarm thresholds, so that an impending fault is predicted and alerted on in advance and operations issues can be handled at the earliest moment;
- Fault discovery stage: alerts are delivered quickly to the operations team;
- Analysis and resolution stage: analyze and handle faults quickly on the basis of full-link service tracing, locating operations problems through link tracing and visualization so that they are measurable and observable;
- Review and summary stage: historical data analysis, full-link optimization and supplementation, root cause analysis, and optimization suggestions for business systems.
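As an illustration of the fault prevention stage, the sketch below shows one way indicators might be converted into alarm thresholds with an early-warning level below the critical level. The metric names and threshold values are invented for the example, and it assumes that a higher value is always worse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warn: float      # early-warning level, raised before a fault occurs
    critical: float  # level at which the fault is considered active


# Hypothetical thresholds; real values would come from indicator planning.
THRESHOLDS = {
    "tp99_ms": Threshold(warn=300, critical=800),
    "error_rate": Threshold(warn=0.01, critical=0.05),
}


def evaluate(value, threshold):
    """Map a metric value to an alert level (higher value means worse)."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warn:
        return "warning"
    return "ok"


def check(metrics):
    """Evaluate every metric that has a configured threshold."""
    return {
        name: evaluate(metrics[name], threshold)
        for name, threshold in THRESHOLDS.items()
        if name in metrics
    }


print(check({"tp99_ms": 450, "error_rate": 0.002}))
```

The separate warning level is what allows an alert to be raised before the service actually fails, matching the "predict and alert in advance" goal of this stage.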

Summary
Observation data is hard to use
- Combine AI algorithms with expert experience to achieve fast fault detection, root cause localization, and performance optimization in complex IT environments;
- Identify the global performance of key call chains in business scenarios to support business optimization;
- Provide traceable performance data to quantify the business value of the operations department.
Observation data is hard to link together
- Relying on the processing capacity of the operations data platform, collect, process, store, and analyze rich observation data in real time to build a unified observation data system;
- Present the whole process and analyze upstream and downstream impact through a multi-dimensional topology.
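Upstream and downstream impact analysis over a call topology can be sketched as a graph traversal. The example below is an assumption-laden illustration, not the product's algorithm: the service names and the `CALLS` adjacency map are hypothetical. Downstream services are those the faulty service depends on; upstream services are those that call it, directly or indirectly, and may therefore be affected.

```python
from collections import deque

# Hypothetical call topology: caller -> list of callees.
CALLS = {
    "gateway": ["order", "user"],
    "order": ["payment", "inventory"],
    "payment": ["bank-adapter"],
    "user": [],
    "inventory": [],
    "bank-adapter": [],
}


def reverse(graph):
    """Build the callee -> callers view of the same topology."""
    rev = {node: [] for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev


def reachable(graph, start):
    """Breadth-first search: every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# If "payment" misbehaves: what it depends on, and who may be affected.
downstream = reachable(CALLS, "payment")
upstream = reachable(reverse(CALLS), "payment")
```

In practice the topology would be derived from collected traces rather than written by hand, but the traversal idea stays the same.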
Observation data is hard to ingest
- Multiple sources: front end and back end, cross-cloud deployments, third-party tools, and so on; multiple data types: logs, metrics, call chains, network traffic, third-party topology, and so on;
- Multiple languages: Java, Go, and others;
- Multiple protocols: OpenTracing, OpenTelemetry, and others.
Open source benefits
Cloudwise has open-sourced FlyFish, a data visualization platform. Through data model configuration it offers users hundreds of visual graphic components, so a polished large-screen visualization that fits your business needs can be built with zero coding. FlyFish also provides flexible extension capabilities, supporting component development and the configuration of custom functions and global events, which keeps development and delivery efficient even in complex demand scenarios.
Follow the links below to visit FlyFish, and feel free to like it and give it a Star. Take part in component development: ten thousand yuan in cash rewards are up for grabs.
GitHub address: https://github.com/CloudWise-OpenSource/FlyFish
Gitee address: https://gitee.com/CloudWise/fly-fish
Ten-thousand-yuan cash activity: http://bbs.aiops.cloudwise.com/t/Activity