Bosun query
2022-06-24 15:20:00 【Wang Lei -ai Foundation】
Background
bosun is an open-source monitoring and alerting system from Stack Exchange; the closest comparable tool is Prometheus's Alertmanager. bosun is designed to work with a variety of TSDBs, and it also provides a DSL for querying and evaluating metrics, which makes it a TSDB-agnostic metric query language as well (supported backends currently include OpenTSDB, Prometheus, InfluxDB, Elasticsearch and others). To understand how bosun generates alerts, or simply to use its query capability together with a front end such as Grafana to display metrics, you need to understand this language.
bosun is not a particularly popular project (about 3.1k stars), and little documentation about it is available; most of what exists is a literal translation of the official docs. This article explains how to query with bosun (mainly against an OpenTSDB backend) and shares some query tips.
Concepts
First, some type concepts used in bosun queries:
- Scalar: just a number.
- NumberSet: essentially the same as a Scalar, but each number also carries a tag group; the empty tag group is written {}.
- SeriesSet: the most common format for a raw metric. Unlike a NumberSet, its value is not a single number but a set of timestamp/value pairs, for example 3.14 at time 100 and 3.28 at time 200.
- Results: not a term from the documentation, but the most common type in practice. It represents the typical result of a query: a collection of SeriesSets or NumberSets that differ by tag set. In the documentation, each distinct tag combination is called a group.
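To make these types concrete, here is a minimal Python model of them. It is only an illustration under our own assumptions: a tag group is represented as a frozenset of (key, value) pairs, and the names TagSet, NumberSet, SeriesSet and the tags() helper are hypothetical, not part of bosun. A Results value is then simply one of these dicts holding several groups.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical model of bosun's value types (illustration only).
# A tag group such as {host=a} is a frozenset of (key, value) pairs;
# the empty group {} is the empty frozenset.
TagSet = FrozenSet[Tuple[str, str]]

Scalar = float                              # a single number
NumberSet = Dict[TagSet, float]             # one number per tag group
SeriesSet = Dict[TagSet, Dict[int, float]]  # one {timestamp: value} series per tag group


def tags(spec: str) -> TagSet:
    """Hypothetical helper: parse "k1=v1,k2=v2" into a TagSet; "" gives {}."""
    if not spec:
        return frozenset()
    return frozenset(tuple(kv.split("=", 1)) for kv in spec.split(","))


# The example from the text: value 3.14 at time 100 and 3.28 at time 200.
series_set: SeriesSet = {tags(""): {100: 3.14, 200: 3.28}}
# A Results-like value: several groups, each with its own number.
number_set: NumberSet = {tags("host=a"): 1.5, tags("host=b"): 2.5}
scalar: Scalar = 42.0
```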
Queries
For OpenTSDB, bosun provides several query functions:
q
q(query string, startDuration string, endDuration string) seriesSet
This is the most commonly used query function, and most alerts are built on it. It is simple: query is an OpenTSDB query string, and startDuration and endDuration are the start and end of the query window, measured back from now. For example, q("sum:rate{counter}:sys.cpu.user", "5m", "1m") queries sum:rate of the sys.cpu.user metric from 5 minutes ago to 1 minute ago. Because metric collection is delayed, endDuration is generally recommended to be at least 1m.
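The relative window of q can be sketched in Python as below. This assumes durations are written with the simple units s/m/h/d/w (a simplification, not bosun's full duration grammar) and that both durations are subtracted from the current epoch time; parse_duration and q_window are hypothetical helper names.

```python
import re
import time
from typing import Tuple

# Hypothetical helpers; the unit table is a simplification of bosun durations.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Convert a duration such as "5m" or "1h1m" to seconds; "" means 0 (now)."""
    if not text:
        return 0
    parts = re.findall(r"(\d+)([smhdw])", text)
    # Reject anything the simple grammar above does not fully cover.
    if "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def q_window(now: int, start_duration: str, end_duration: str) -> Tuple[int, int]:
    """Absolute [start, end] epoch seconds covered by q(query, start, end)."""
    return now - parse_duration(start_duration), now - parse_duration(end_duration)


if __name__ == "__main__":
    # Window of q(..., "5m", "1m") relative to the current time.
    print(q_window(int(time.time()), "5m", "1m"))
```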
# Example
q("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "5m", "1m")
group result computations
{ }
{
"1620196950": 2.016666666666667,
"1620196980": 2.3666666666666667,
"1620197010": 1.0999999999999999,
"1620197040": 1.8333333333333335,
"1620197070": 2.7333333333333343,
"1620197100": 2.5,
"1620197130": 1.7000000000000002,
"1620197160": 0.9666666666666666
}
bandQuery/overQuery
bandQuery(query string, duration string, period string, eduration string, num scalar) seriesSet
band(query string, duration string, period string, num scalar) seriesSet
- bandQuery runs the query num times. Each sub-query spans duration, consecutive sub-queries are period apart, and the most recent one ends eduration ago.
- band is a special form of bandQuery with eduration = period. For example, band("avg:os.cpu", "1h", "1d", 3) is equivalent to the three queries q("avg:os.cpu", "25h", "1d"), q("avg:os.cpu", "49h", "2d") and q("avg:os.cpu", "73h", "3d"). Because eduration = period, the most recent window runs from period + duration ago to period ago.
overQuery(query string, duration string, period string, eduration string, num scalar) seriesSet
over(query string, duration string, period string, num scalar) seriesSet
shiftBand(query string, duration string, period string, num scalar) seriesSet
- overQuery is the general form of over and shiftBand. Unlike bandQuery, after querying it shifts each window forward in time and tags its results with the offset it was shifted by (the "shift" tag in the output below).
- over and shiftBand are special forms of overQuery in which eduration is fixed instead of passed in: set to period, or to the current time (i.e. left empty).
# Example: the other functions are only special forms of bandQuery and overQuery, so only these two are shown here
> bandQuery("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "5m", "60m", "1m", 2)
group result computations
{ }
{
"1620195120": 69.96666666666665,
"1620195150": 5.816666666666666,
"1620195180": 5.766666666666667,
"1620195210": 4.3,
"1620195240": 5.7666666666666675,
"1620195270": 3.7666666666666675,
"1620195300": 4.4,
"1620195330": 4.933333333333334,
"1620195360": 4.033333333333334,
"1620195390": 1.7000000000000002,
"1620198720": 69.93333333333334,
"1620198750": 11.7,
"1620198780": 1.2999999999999998,
"1620198810": 1.8500000000000008,
"1620198840": 2.766666666666667,
"1620198870": 4.633333333333333,
"1620198900": 4.833333333333334,
"1620198930": 2.366666666666667,
"1620198960": 2.366666666666667,
"1620198990": 2.2666666666666666
}
> overQuery("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "5m", "60m", "1m", 2)
group result computations
{ shift=1m0s }
{
"1620198780": 69.93333333333334,
"1620198810": 11.7,
"1620198840": 1.2999999999999998,
"1620198870": 1.8500000000000008,
"1620198900": 2.766666666666667,
"1620198930": 4.633333333333333,
"1620198960": 4.833333333333334,
"1620198990": 2.366666666666667,
"1620199020": 2.366666666666667,
"1620199050": 2.2666666666666666
}
{ shift=1h1m0s }
{
"1620198780": 69.96666666666665,
"1620198810": 5.816666666666666,
"1620198840": 5.766666666666667,
"1620198870": 4.3,
"1620198900": 5.7666666666666675,
"1620198930": 3.7666666666666675,
"1620198960": 4.4,
"1620198990": 4.933333333333334,
"1620199020": 4.033333333333334,
"1620199050": 1.7000000000000002
}
bandQuery and overQuery are very useful for looking at a metric in the same time slot across cycles (for example, this hour on each day). Interestingly, bandQuery does not produce unjoined groups; this is explained further in the tips below.
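The window arithmetic of bandQuery/band, and the shift tags of overQuery, can be reproduced with a short Python sketch. The layout (window i ends eduration + i*period ago and spans duration) is inferred from the description above and from the outputs shown, not taken from bosun's source; secs, band_query_windows, band_windows and over_query_shifts are hypothetical helper names.

```python
import re
from typing import List, Tuple

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def secs(text: str) -> int:
    """Seconds in a simple duration such as "5m" or "1d"; "" means 0."""
    return sum(int(n) * _UNITS[u] for n, u in re.findall(r"(\d+)([smhdw])", text))


def band_query_windows(duration: str, period: str, eduration: str,
                       num: int) -> List[Tuple[int, int]]:
    """Hypothetical: (start_ago, end_ago) in seconds for each of the num sub-queries.

    Window i ends eduration + i*period ago and spans duration.
    """
    d, p, e = secs(duration), secs(period), secs(eduration)
    return [(e + i * p + d, e + i * p) for i in range(num)]


def band_windows(duration: str, period: str, num: int) -> List[Tuple[int, int]]:
    """band() behaves like bandQuery() with eduration = period."""
    return band_query_windows(duration, period, period, num)


def over_query_shifts(duration: str, period: str, eduration: str, num: int) -> List[int]:
    """Seconds each window is shifted forward (the value of the "shift" tag)."""
    return [end_ago for _, end_ago in band_query_windows(duration, period, eduration, num)]
```

For the overQuery example above, over_query_shifts("5m", "60m", "1m", 2) gives [60, 3660], i.e. shift=1m0s and shift=1h1m0s. Likewise band_windows("4m", "1m", 1) gives [(300, 60)], the same 5m-to-1m window as q(..., "5m", "1m").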
window
window(query string, duration string, period string, num scalar, funcName string) seriesSet
Compared with bandQuery and overQuery, window is more useful for display purposes. window reduces the result of each sub-query with the reduction function funcName and builds a new time series from the returned values and their timestamps. For example, to see an hourly sum of request throughput over the past 6 hours, you can use:
> window("sum:rate{counter}:${service}.rpc.calledby.success.throughput", "60m", "60m", 6, "sum")
group result computations
{ }
{
"1620175620": 356260.0166666666,
"1620179220": 370473.99999999965,
"1620182820": 391460.0166666665,
"1620186420": 405893.36666666664,
"1620190020": 364280.9166666666,
"1620193620": 380179.3833333336
}
Together with Grafana, this can be drawn as a line chart or a bar chart.
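Below is a minimal Python sketch of what window computes. It assumes the sub-queries are laid out like band (window i ends (i+1)*period ago) and that each reduced value is stamped with its window's start time; both are assumptions for illustration, and window here is a hypothetical stand-in working on an in-memory {timestamp: value} dict rather than a TSDB query.

```python
from typing import Callable, Dict, Sequence


def window(series: Dict[int, float], now: int, duration: int, period: int,
           num: int, reduce: Callable[[Sequence[float]], float]) -> Dict[int, float]:
    """Hypothetical stand-in: reduce each of num windows to a single point.

    Window i covers [end - duration, end) with end = now - (i + 1) * period.
    """
    out: Dict[int, float] = {}
    for i in range(num):
        end = now - (i + 1) * period
        start = end - duration
        values = [v for ts, v in series.items() if start <= ts < end]
        if values:
            out[start] = reduce(values)  # assumption: stamp with the window start
    return dict(sorted(out.items()))
```

Passing Python's built-in sum as reduce plays the role of the "sum" funcName in the example above.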
count/change
count returns the length of the Results returned by a query (i.e. the number of groups), and change returns how much a metric changed over the window: change("avg:rate:net.bytes", "60m", "") = avg(q("avg:rate:net.bytes", "60m", "")) * 60 * 60
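The change identity is plain arithmetic: the average per-second rate over the window times the window length in seconds. A small Python sketch of it, plus count as the number of groups, follows; change and count_groups are hypothetical helpers operating on in-memory data, not bosun calls.

```python
from statistics import mean
from typing import Mapping, Sequence


def change(rate_samples: Sequence[float], window_seconds: int) -> float:
    """Hypothetical: average per-second rate times window length,
    mirroring avg(q(...)) * 60 * 60 for a 60m window."""
    return mean(rate_samples) * window_seconds


def count_groups(result: Mapping) -> int:
    """Hypothetical: count is the number of groups in a Results value."""
    return len(result)
```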
Calculation
The way bosun performs computations is probably the most confusing part. To understand it, combine the concepts from the first section with the following core points:
- Most query results are a set of SeriesSets or NumberSets, i.e. Results. For example, querying avg:rate:net.bytes{host=*} automatically produces one SeriesSet per group (per host). If you only want to filter without grouping, write avg:rate:net.bytes{}{host=1.2.3.4}.
- Most functions in the bosun documentation operate on the SeriesSet of a single group, so when a function is applied to a query result it is applied to each group separately. For example, in avg(q("avg:rate:net.bytes{host=*}", "60m", "")) the query returns {host=a}, {host=b} and so on, and avg is applied to each of these groups.
- When different Results are combined, for example with +, the operation is attempted for every combination of their groups, but not every combination can be computed: only groups whose tag sets are equal, or where one is a subset of the other, are computed together. Any group that takes part in no computation produces an unjoined group. This is rather abstract, so the examples below should help; try to predict each result before reading it to check your understanding.
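The join rule can be checked with a small Python model. It is a sketch, not bosun's implementation: tag groups are frozen sets, series/merge/add are hypothetical Python stand-ins for the bosun functions and the + operator, the joined group takes the union of both tag sets, and the loop order (right operand outer) was chosen only because it reproduces the output order shown in Example 2.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Illustrative model only, not bosun's implementation.
TagSet = FrozenSet[Tuple[str, str]]
Series = Dict[int, float]
Result = List[Tuple[TagSet, Series]]  # a list: the same group may appear more than once


def series(spec: str, *points: float) -> Result:
    """Hypothetical stand-in for series("X=a1,Y=b1", 100, 1, 200, 2)."""
    group: TagSet = frozenset(
        tuple(kv.split("=", 1)) for kv in spec.split(",")
    ) if spec else frozenset()
    values = {int(points[i]): points[i + 1] for i in range(0, len(points), 2)}
    return [(group, values)]


def merge(*results: Result) -> Result:
    """Concatenate Results, keeping their order."""
    return [item for result in results for item in result]


def add(left: Result, right: Result) -> Result:
    """left + right: groups join when equal or when one tag set contains the other."""
    out: Result = []
    joined_left, joined_right = set(), set()
    for j, (g2, s2) in enumerate(right):
        for i, (g1, s1) in enumerate(left):
            if g1 <= g2 or g2 <= g1:  # frozenset <= means "subset or equal"
                summed = {ts: s1[ts] + s2[ts] for ts in s1 if ts in s2}
                out.append((g1 | g2, summed))  # joined group keeps the larger tag set
                joined_left.add(i)
                joined_right.add(j)
    # Any group that never joined becomes an unjoined group filled with NaN.
    for idx, (g, s) in enumerate(left):
        if idx not in joined_left:
            out.append((g, {ts: math.nan for ts in s}))
    for idx, (g, s) in enumerate(right):
        if idx not in joined_right:
            out.append((g, {ts: math.nan for ts in s}))
    return out
```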
# How two Results are combined
for g1 in result1:
    for g2 in result2:
        if g1 == g2 or g1 is a subset of g2 or g2 is a subset of g1:
            compute g1 op g2
for g1 in result1:
    if g1 took part in no computation:
        emit an unjoined group for g1
for g2 in result2:
    if g2 took part in no computation:
        emit an unjoined group for g2
Example 1
$a = series("X=a1,Y=b1", 100, 1, 200, 2)
$b = series("X=a2,Y=b2", 100, 2, 200, 3)
$x = series("X=a1", 100, 2, 200, 1)
$y = series("X=a1,Y=b2", 100, 3, 200, 5)
$z = series("X=a2,Y=b2", 100, 3, 200, 2)
# {X=a1,Y=b1} {X=a2,Y=b2}
$ab = merge($a, $b)
# {X=a1} {X=a1,Y=b2} {X=a2,Y=b2}
$xyz = merge($x, $y, $z)
# The combinations that can be computed here are ({X=a1,Y=b1}, {X=a1}) and ({X=a2,Y=b2}, {X=a2,Y=b2}); {X=a1,Y=b2} takes part in no computation, so it produces an unjoined group
$ab+$xyz
-----------------------------------
group result computations
{ X=a1, Y=b1 }
{
"100": 3,
"200": 3
}
{ X=a2, Y=b2 }
{
"100": 5,
"200": 5
}
{ X=a1, Y=b2 }
{
"100": "NaN",
"200": "NaN"
}
merge(series("X=a1,Y=b1", 100, 1, 200, 2), series("X=a2,Y=b2", 100, 2, 200, 3)) + merge(series("X=a1", 100, 2, 200, 1), series("X=a1,Y=b2", 100, 3, 200, 5), series("X=a2,Y=b2", 100, 3, 200, 2)) unjoined group (NaN)
Example 2
$a = series("Y=b2", 100, 1, 200, 1)
$b = series("X=a1,Y=b1", 100, 3, 200, 5)
$c = series("X=a2,Y=b2", 100, 3, 200, 2)
$x = series("X=a2", 100, 2, 200, 1)
$y = series("X=a1,Y=b2", 100, 3, 200, 5)
$z = series("X=a2,Y=b2", 100, 3, 200, 2)
# {X=a1,Y=b1} {X=a2,Y=b2} {Y=b2}
$abc = merge($b, $c, $a)
# {X=a2,Y=b2} {X=a1,Y=b2} {X=a2}; {X=a2} is placed last here because an error is reported if the first combination cannot be computed
$xyz = merge($z, $y, $x)
$abc + $xyz
-----------------------------------
group result computations
{ X=a2, Y=b2 }
{
"100": 6,
"200": 4
}
{ X=a2, Y=b2 }
{
"100": 4,
"200": 3
}
{ X=a1, Y=b2 }
{
"100": 4,
"200": 6
}
{ X=a2, Y=b2 }
{
"100": 5,
"200": 3
}
{ X=a1, Y=b1 }
merge(series("X=a2,Y=b2", 100, 3, 200, 2), series("X=a1,Y=b2", 100, 3, 200, 5), series("X=a2", 100, 2, 200, 1)) + merge(series("X=a1,Y=b1", 100, 3, 200, 5), series("X=a2,Y=b2", 100, 3, 200, 2), series("Y=b2", 100, 1, 200, 1))
More examples
$aa=series("tagA=a", 0, 2, 60, 2)
$ab=series("tagA=a,tagB=b", 0, 2, 60, 1)
$ac=series("tagA=a,tagC=c", 0, 2, 60, 3)
$bb=series("tagB=b", 0, 2, 60, 2)
$cc=series("tagC=c", 0, 2, 60, 2)
# {tagA=a} {tagB=b} {tagC=c}
$abc=merge($aa,$bb,$cc)
# {tagA=a} {tagA=a,tagB=b}
$aab = merge($aa, $ab)
# {tagA=a} {tagA=a,tagC=c}
$aac = merge($aa, $ac)
# The combinations that can participate in the calculation here are ({tagA=a}, {tagA=a}) ({tagA=a},{tagA=a,tagC=c}) ({tagA=a,tagB=b},{tagA=a})
# $aab+$aac
# {tagA=a} {tagB=b}
$aabb = merge($aa, $bb)
# {tagA=a} {tagC=c}
$aacc = merge($aa, $cc)
# $aacc+$aabb
# {tagA=a} {tagC=c} + {tagA=a} {tagA=a,tagC=c}
# $aacc+$aac
# {tagA=a} {tagC=c} + {tagA=a} {tagB=b} {tagC=c}
# $aacc+$abc
Tips
Avoiding unjoined groups
A common approach is to use the functions that manipulate groups. For example, avoid generating groups at query time by using a filter instead, as in avg(q("sum:rate:metrics.notexist{}{status=500}", "1m", "0m")), or use functions such as addtags and remove after the query to adjust the tags so that groups stay compatible. There is also a neat trick that lets you ignore unjoined groups altogether: query with band.
For example, computing the request error rate:
$key_err = "sum:rate{counter}:${service}.rpc.calledby.error.throughput{method=*}"
$key_succ = "sum:rate{counter}:${service}.rpc.calledby.success.throughput{method=*}"
$err_now = avg(q($key_err, "5m", "1m"))
$succ_now = avg(q($key_succ, "5m", "1m"))
$rate_now = $err_now / ($err_now +$succ_now)
$rate_now
The query above produces a large number of unjoined groups, because the rpc.calledby.error.throughput metric has far fewer tag values than the success metric, while we still want the result to keep method as its grouping tag. With band, the query looks like this:
$key_err = "sum:rate{counter}:${service}.rpc.calledby.error.throughput{method=*}"
$key_succ = "sum:rate{counter}:${service}.rpc.calledby.success.throughput{method=*}"
$err_now = avg(band($key_err, "4m", "1m", 1))
$succ_now = avg(band($key_succ, "4m", "1m", 1))
$rate_now = $err_now / ($err_now +$succ_now)
$rate_now
Querying with band does not produce unjoined groups: when Results are combined, the step that would generate unjoined groups is skipped, so they are simply ignored.
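A simplified Python model of the difference: the per-method error rate is err / (err + succ), and a method present on only one side cannot join. With ignore_unjoined=False it becomes a NaN group (as with q), and with ignore_unjoined=True it is dropped (the band behaviour described above). error_rate is a hypothetical helper, and collapsing the two-step bosun expression into one loop is a simplification.

```python
import math
from typing import Dict


def error_rate(err: Dict[str, float], succ: Dict[str, float],
               ignore_unjoined: bool) -> Dict[str, float]:
    """Hypothetical helper: per-method err / (err + succ).

    A method present on only one side cannot join. Like a plain q() query it
    becomes a NaN "unjoined" group; with ignore_unjoined=True (the behaviour
    described for band()) it is dropped instead.
    """
    rates: Dict[str, float] = {}
    for method in err.keys() & succ.keys():
        total = err[method] + succ[method]
        rates[method] = err[method] / total if total else math.nan
    if not ignore_unjoined:
        for method in err.keys() ^ succ.keys():  # groups on only one side
            rates[method] = math.nan
    return rates
```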
Grafana bosun plugin
The Grafana bosun plugin provides two built-in variables:
- $ds: suggested downsampling interval. This variable is very useful: in a query such as q("avg:$ds-avg:os.disk.fs.space_free{disk=*,host=backup}", "$start", ""), it keeps queries efficient when the user selects a large time range.
- $start: the start time selected by the user.
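How $ds keeps queries cheap can be illustrated with a hypothetical Python sketch: pick a downsampling bucket large enough to cap the number of points, then substitute it (and $start) into the query string. The heuristic in suggest_downsample (whole minutes, at most 500 points) is purely an assumption for illustration; the plugin's actual algorithm is not described here.

```python
import math


def suggest_downsample(range_seconds: int, max_points: int = 500) -> str:
    """Hypothetical heuristic, not the plugin's algorithm: the smallest
    whole-minute bucket that keeps a series under max_points points."""
    minutes = max(1, math.ceil(range_seconds / max_points / 60))
    return f"{minutes}m"


def expand(query: str, ds: str, start: str) -> str:
    """Substitute $ds and $start the way a dashboard variable would be filled in."""
    return query.replace("$ds", ds).replace("$start", start)


template = 'q("avg:$ds-avg:os.disk.fs.space_free{disk=*,host=backup}", "$start", "")'
# A 7-day range gives a 21m bucket: about 480 points instead of roughly
# 20,000 raw samples at an assumed 30s collection interval.
print(expand(template, suggest_downsample(7 * 86400), "7d"))
```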
Using the t function
bosun has several functions for manipulating groups; here we introduce t, which turns a numberSet with many groups into a single series (one group) so that it can be fed to other aggregation functions. For example, computing the 60-minute weighted latency of an API:
$latency=avg(q("avg:${service}.calledby.success.latency.us.pct99{handle_method=*}", "60m", ""))
$count=sum(q("sum:rate{counter,,,diff}:${service}.calledby.success.throughput{handle_method=*}", "60m", ""))
$total=sum(q("sum:rate{counter,,,diff}:${service}.calledby.success.throughput{}", "60m", ""))
sum(t($latency*($count/$total), ""))
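The arithmetic behind this weighted latency can be sketched in Python: each method's latency is weighted by its share of the total request count, and the results are summed. Because $total is queried with the empty group {}, it joins with every handle_method group (the empty group is a subset of every group), which is why the division works per method. weighted_latency is a hypothetical helper on in-memory numbers.

```python
from typing import Dict


def weighted_latency(latency: Dict[str, float], count: Dict[str, float],
                     total: float) -> float:
    """Hypothetical helper mirroring sum(t($latency*($count/$total), "")):
    weight each method's latency by its share of requests, then add them up."""
    # Methods missing from either side are skipped rather than turned into NaN.
    return sum(latency[m] * (count[m] / total) for m in latency if m in count)
```

For example, latencies of 100 and 200 with 30 and 10 requests out of 40 give 100 * 0.75 + 200 * 0.25 = 125.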