A Deep Dive into High-Performance JSON Parsing Libraries in Go
2022-06-24 13:23:00 【luozhiyun】
Please credit the source when reprinting. This article was originally published on luozhiyun's blog: https://www.luozhiyun.com/archives/535
I didn't originally set out to study the performance of JSON libraries. But when I recently profiled my project with pprof, the flame graph showed that more than half of the CPU time spent in business logic went to JSON parsing, which is what led to this article.
This article walks through the source code to show how Go's standard library parses JSON, then looks at several popular JSON parsing libraries, what characterizes each of them, and in which scenarios they can help us.
The following libraries are introduced and analyzed:
| Library name | Stars |
|---|---|
| Standard library JSON Unmarshal | |
| valyala/fastjson | 1.2k |
| tidwall/gjson | 8.3k |
| buger/jsonparser | 4k |
json-iterator is also a well-known library, but in my tests its performance differed very little from the standard library's, so the standard library is the better choice;
Jeffail/gabs and bitly/go-simplejson use the standard library's Unmarshal directly for parsing, so their performance is the same as the standard library's, and this article does not cover them either;
easyjson, like protobuf, requires generating serialization code for every struct. It is highly invasive and I personally don't like it, so it is not covered either.
The libraries above are, as far as I could find, the well-known JSON parsing libraries with more than 1k stars that are still actively maintained. If I missed one, feel free to contact me and I'll add it.
Standard library JSON Unmarshal
Analysis
func Unmarshal(data []byte, v interface{}) error
The official JSON parsing function takes two arguments: data, the JSON bytes to decode, and v, a pointer to the value that receives the result (if v is nil or not a pointer, Unmarshal returns an InvalidUnmarshalError).
Before the actual parsing starts, it calls reflect.ValueOf to obtain the reflection value of v. It then uses the first non-whitespace character of data to decide which parsing method to use.
func (d *decodeState) value(v reflect.Value) error {
	switch d.opcode {
	default:
		panic(phasePanicMsg)
	// Array
	case scanBeginArray:
		...
	// Struct or map
	case scanBeginObject:
		...
	// Literal: string, number, true, false or null
	case scanBeginLiteral:
		...
	}
	return nil
}
If the data starts with [, it is an array and the scanBeginArray branch is taken; if it starts with {, the target is a struct or a map and the scanBeginObject branch is taken, and so on.
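The dispatch step can be pictured with a few lines of code. The sketch below is not encoding/json's code: the helper `classify` and the `Kind` type are hypothetical names. It skips JSON whitespace and decides from the first significant byte whether the input is an array, an object or a literal, mirroring the three branches above. The real decoder validates the whole input first (checkValid) and drives a full scanner state machine; this only illustrates the idea.

```go
package main

import "fmt"

// Kind mirrors the three branches of decodeState.value.
type Kind int

const (
	Invalid Kind = iota
	Array        // corresponds to scanBeginArray
	Object       // corresponds to scanBeginObject
	Literal      // corresponds to scanBeginLiteral
)

func (k Kind) String() string {
	return [...]string{"invalid", "array", "object", "literal"}[k]
}

// classify skips JSON whitespace and inspects the first significant byte.
func classify(data []byte) Kind {
	for _, c := range data {
		switch c {
		case ' ', '\t', '\n', '\r':
			continue // insignificant whitespace
		case '[':
			return Array
		case '{':
			return Object
		case '"', '-', 't', 'f', 'n', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9':
			return Literal // string, number, true, false or null
		default:
			return Invalid
		}
	}
	return Invalid // empty or whitespace-only input
}

func main() {
	for _, s := range []string{` [1,2]`, `{"a":1}`, `"hi"`, `42`, `?`} {
		fmt.Printf("%-10q -> %v\n", s, classify([]byte(s)))
	}
}
```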
Taking object parsing as an example:
func (d *decodeState) object(v reflect.Value) error {
	...
	var fields structFields
	// Check whether the target is a map or a struct
	switch v.Kind() {
	case reflect.Map:
		...
	case reflect.Struct:
		// Look up the struct's fields (cached per type)
		fields = cachedTypeFields(t)
		// ok
	default:
		d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
		d.skip()
		return nil
	}
	var mapElem reflect.Value
	origErrorContext := d.errorContext
	// Parse the key/value pairs of the JSON object one by one
	for {
		start := d.readIndex()
		d.rescanLiteral()
		item := d.data[start:d.readIndex()]
		// Get the key
		key, ok := unquoteBytes(item)
		if !ok {
			panic(phasePanicMsg)
		}
		var subv reflect.Value
		destring := false
		...
		// Set the value through reflection according to its type
		if destring {
			// The field has the ",string" tag option: the value is a quoted literal
			switch qv := d.valueQuoted().(type) {
			case nil:
				if err := d.literalStore(nullLiteral, subv, false); err != nil {
					return err
				}
			case string:
				if err := d.literalStore([]byte(qv), subv, true); err != nil {
					return err
				}
			default:
				d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
			}
		} else {
			// Otherwise recurse into value(), which handles arrays, objects and literals
			if err := d.value(subv); err != nil {
				return err
			}
		}
		...
		// Exit the loop on the closing '}'
		if d.opcode == scanEndObject {
			break
		}
		if d.opcode != scanObjectValue {
			panic(phasePanicMsg)
		}
	}
	return nil
}
The main steps are:
- The struct's fields are looked up and cached first;
- The key/value pairs of the JSON object are iterated over in a loop;
- For each key, the struct field with the matching name is found;
- value is called recursively to set the field's value through reflection;
- The loop ends when the closing } of the JSON object is reached.
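To make the reflection steps concrete, here is a heavily simplified sketch, not the standard library's code, of "find the matching field and set it through reflection". The function `setField` and the `User` type are hypothetical. Given one already-decoded key and value, it picks the field whose `json` tag (or, without a tag, whose name) matches the key case-insensitively and assigns the value with reflect. The real decoder also converts JSON text into the field's type (for example, literalStore turns a JSON number into an int); here the value is assumed to already have an assignable Go type.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// setField assigns val to the field of the struct pointed to by dst whose
// JSON name matches key. The name comes from the `json` tag if present,
// otherwise from the field name; matching is case-insensitive.
func setField(dst any, key string, val any) error {
	rv := reflect.ValueOf(dst)
	if rv.Kind() != reflect.Pointer || rv.IsNil() || rv.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a non-nil pointer to a struct")
	}
	sv := rv.Elem()
	st := sv.Type()
	for i := 0; i < st.NumField(); i++ {
		f := st.Field(i)
		if !f.IsExported() {
			continue // reflection cannot set unexported fields
		}
		name := f.Name
		tag, _, _ := strings.Cut(f.Tag.Get("json"), ",")
		if tag == "-" {
			continue // field explicitly excluded from JSON
		}
		if tag != "" {
			name = tag
		}
		if !strings.EqualFold(name, key) {
			continue
		}
		v := reflect.ValueOf(val)
		if !v.IsValid() || !v.Type().AssignableTo(f.Type) {
			return fmt.Errorf("cannot assign %T to field %s (%s)", val, f.Name, f.Type)
		}
		sv.Field(i).Set(v)
		return nil
	}
	return nil // unknown keys are ignored, as encoding/json does by default
}

type User struct {
	Name string `json:"name"`
	Age  int
}

func main() {
	var u User
	// In a real decoder these pairs would come from scanning the JSON input.
	for k, v := range map[string]any{"name": "li", "age": 18} {
		if err := setField(&u, k, v); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Printf("%+v\n", u)
}
```

Every field lookup and every assignment goes through reflect, which is exactly the per-field cost the summary below refers to.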
Summary
Reading the Unmarshal source shows that it relies heavily on reflection to obtain and set field values. For multi-level nested JSON it has to recurse and reflect at every level, so it is easy to see why its performance is poor.
That said, if performance is not critical, using it directly is a perfectly good choice: it is feature-complete, and the Go team keeps iterating on and optimizing it, so its performance may improve significantly in future versions.
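For reference, the two standard-library operations benchmarked later in this article, decoding into a struct and decoding into a map, look like this. The `Person` type and the `decodeBoth` helper are illustrative names, not part of encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Person struct {
	Name struct {
		First string `json:"first"`
		Last  string `json:"last"`
	} `json:"name"`
	Age int `json:"age"`
}

// decodeBoth decodes the same input twice: into a typed struct (fields are
// matched and set through reflection) and into a generic map (nested objects
// become map[string]any and JSON numbers become float64).
func decodeBoth(data []byte) (Person, map[string]any, error) {
	var p Person
	if err := json.Unmarshal(data, &p); err != nil {
		return p, nil, err
	}
	var m map[string]any
	if err := json.Unmarshal(data, &m); err != nil {
		return p, nil, err
	}
	return p, m, nil
}

func main() {
	p, m, err := decodeBoth([]byte(`{"name":{"first":"li","last":"dj"},"age":18}`))
	if err != nil {
		panic(err)
	}
	last := m["name"].(map[string]any)["last"]
	fmt.Println(p.Name.Last, p.Age, last, m["age"])
}
```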
fastjson
Repository: https://github.com/valyala/fastjson
As its name suggests, this library's defining feature is speed. Its README says:
Fast. As usual, up to 15x faster than the standard encoding/json.
It is also very simple to use:
package main

import (
	"fmt"
	"log"

	"github.com/valyala/fastjson"
)

func main() {
	var p fastjson.Parser
	v, err := p.Parse(`{
		"str": "bar",
		"int": 123,
		"float": 1.23,
		"bool": true,
		"arr": [1, "foo", {}]
	}`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("foo=%s\n", v.GetStringBytes("str"))
	fmt.Printf("int=%d\n", v.GetInt("int"))
	fmt.Printf("float=%f\n", v.GetFloat64("float"))
	fmt.Printf("bool=%v\n", v.GetBool("bool"))
	fmt.Printf("arr.1=%s\n", v.GetStringBytes("arr", "1"))
}

// Output:
// foo=bar
// int=123
// float=1.230000
// bool=true
// arr.1=foo
To use fastjson, first hand the JSON string to a Parser and call its Parse method, which returns the parsed *Value; values are then read with the Get* methods. For nested objects, simply pass the parent and child keys in order to the Get method, as in v.GetStringBytes("arr", "1") above.
Analysis
Unlike the standard library's Unmarshal, fastjson splits JSON handling into two phases: Parse and Get.
Parse parses the JSON string into a structure and returns it, and values are then read through that returned structure. Parsing does no locking, so a single Parser must not be used from multiple goroutines at the same time; to parse concurrently, use a ParserPool.
fastjson traverses the JSON from top to bottom and stores the parsed data in Value structs:
type Value struct {
	o Object
	a []*Value
	s string
	t Type
}
This structure is very simple:
- o Object: holds the parsed value when it is an object;
- a []*Value: holds the parsed value when it is an array;
- s string: if the value is neither an object nor an array, its content is stored in this field as a string;
- t Type: the type of the value, such as TypeObject, TypeArray, TypeString or TypeNumber.
type Object struct {
	kvs           []kv
	keysUnescaped bool
}

type kv struct {
	k string
	v *Value
}
Object holds the key/value pairs of an object, and each value is itself a *Value, which makes the structure recursive. After parsing, a JSON string is therefore represented as a tree of these structures.
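A stripped-down sketch of this tree shows why Get is cheap once Parse has run: a lookup is just a walk down the tree, with a linear scan over each object's kvs slice and an index into each array. The types below are simplified stand-ins modeled on, but not identical to, fastjson's Value, Object and kv; the `get` method, the `str`/`num` helpers and the hand-built tree are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// Simplified stand-ins for fastjson's Value/Object/kv (not the real types).
type Type int

const (
	TypeObject Type = iota
	TypeArray
	TypeString
	TypeNumber
)

type kv struct {
	k string
	v *Value
}

type Object struct{ kvs []kv }

type Value struct {
	o Object
	a []*Value
	s string // raw text of strings and numbers
	t Type
}

// get walks down the tree one key at a time: object members are found by a
// linear scan over kvs, array elements by treating the key as an index.
// It returns nil if the path does not exist.
func (v *Value) get(keys ...string) *Value {
	for _, key := range keys {
		if v == nil {
			return nil
		}
		switch v.t {
		case TypeObject:
			var next *Value
			for _, e := range v.o.kvs {
				if e.k == key {
					next = e.v
					break
				}
			}
			v = next
		case TypeArray:
			i, err := strconv.Atoi(key)
			if err != nil || i < 0 || i >= len(v.a) {
				return nil
			}
			v = v.a[i]
		default:
			return nil // scalars have no children
		}
	}
	return v
}

func str(s string) *Value { return &Value{s: s, t: TypeString} }
func num(s string) *Value { return &Value{s: s, t: TypeNumber} }

func main() {
	// Tree for {"str":"bar","arr":[1,"foo"]}, built by hand instead of parsed.
	root := &Value{t: TypeObject, o: Object{kvs: []kv{
		{"str", str("bar")},
		{"arr", &Value{t: TypeArray, a: []*Value{num("1"), str("foo")}}},
	}}}
	// The same tree can be queried any number of times without re-parsing.
	fmt.Println(root.get("str").s, root.get("arr", "1").s)
}
```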
Code
Because there is no reflection code, the whole parsing process is very clean. Let's look directly at the main parsing path:
func parseValue(s string, c *cache, depth int) (*Value, string, error) {
	if len(s) == 0 {
		return nil, s, fmt.Errorf("cannot parse empty string")
	}
	depth++
	// The nesting depth of the JSON must not exceed MaxDepth
	if depth > MaxDepth {
		return nil, s, fmt.Errorf("too big depth for the nested JSON; it exceeds %d", MaxDepth)
	}
	// Parse an object
	if s[0] == '{' {
		v, tail, err := parseObject(s[1:], c, depth)
		if err != nil {
			return nil, tail, fmt.Errorf("cannot parse object: %s", err)
		}
		return v, tail, nil
	}
	// Parse an array
	if s[0] == '[' {
		...
	}
	// Parse a string
	if s[0] == '"' {
		...
	}
	...
	return v, tail, nil
}
parseValue decides what to parse from the first non-whitespace character of the string. Let's take object parsing as an example:
func parseObject(s string, c *cache, depth int) (*Value, string, error) {
	...
	o := c.getValue()
	o.t = TypeObject
	o.o.reset()
	for {
		var err error
		// Get a kv entry from the Object
		kv := o.o.getKV()
		...
		// Parse the key
		kv.k, s, err = parseRawKey(s[1:])
		...
		// Recursively parse the value
		kv.v, s, err = parseValue(s, c, depth)
		...
		// On ',' continue with the next pair
		if s[0] == ',' {
			s = s[1:]
			continue
		}
		// On '}' the object is complete
		if s[0] == '}' {
			return o, s[1:], nil
		}
		return nil, s, fmt.Errorf("missing ',' after object value")
	}
}
parseObject is also very simple: in the loop body it reads the key, then calls parseValue to parse the value recursively, working through the JSON object from top to bottom until it reaches } and returns.
Summary
The analysis above shows that fastjson is much simpler than the standard library in implementation, and much faster. Once Parse has built the JSON tree, it can be queried many times, which avoids repeated parsing and improves performance.
However, its feature set is minimal: it lacks common operations such as converting JSON into a struct or a map. If you just want to read values out of JSON, this library is very convenient, but if you want to turn JSON into a struct, you have to set the fields one by one.
GJSON
Repository: https://github.com/tidwall/gjson
In my tests GJSON was not as extremely fast as fastjson, but it is feature-rich and its performance is still quite good. Here is a brief introduction to its features.
Using GJSON is about as simple as using fastjson: just pass in the JSON string and the path of the value you want:
json := `{"name":{"first":"li","last":"dj"},"age":18}`
lastName := gjson.Get(json, "name.last")
Besides this, it also supports simple wildcard matching in keys with * and ?: * matches any number of characters and ? matches a single character, for example:
json := `{
"name":{"first":"Tom", "last": "Anderson"},
"age": 37,
"children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("third child*:", gjson.Get(json, "child*.2"))
fmt.Println("first c?ild:", gjson.Get(json, "c?ildren.0"))
- child*.2: child* first matches children, then .2 reads the third element;
- c?ildren.0: c?ildren matches children, then .0 reads the first element.
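The * and ? semantics can be sketched with a small recursive matcher. This is a simplified stand-in, not the matcher GJSON uses (its code, shown later, calls match.Match); the name `wildMatch` is hypothetical, escaping is not supported, and the naive backtracking on * can be slow for patterns with many stars.

```go
package main

import "fmt"

// wildMatch reports whether str matches pattern, where '*' matches any run of
// bytes (including none) and '?' matches exactly one byte. Working on bytes
// means '?' does not match a multi-byte UTF-8 character as a unit.
func wildMatch(str, pattern string) bool {
	for len(pattern) > 0 {
		switch pattern[0] {
		case '*':
			// Try every possible length for the part matched by '*'.
			for i := 0; i <= len(str); i++ {
				if wildMatch(str[i:], pattern[1:]) {
					return true
				}
			}
			return false
		case '?':
			if len(str) == 0 {
				return false
			}
		default:
			if len(str) == 0 || str[0] != pattern[0] {
				return false
			}
		}
		str, pattern = str[1:], pattern[1:]
	}
	return len(str) == 0
}

func main() {
	keys := []string{"name", "age", "children"}
	for _, p := range []string{"child*", "c?ildren", "a*"} {
		for _, k := range keys {
			if wildMatch(k, p) {
				fmt.Printf("%-9s matches %s\n", p, k)
			}
		}
	}
}
```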
In addition to wildcard matching, it also supports modifiers:
json := `{
"name":{"first":"Tom", "last": "Anderson"},
"age": 37,
"children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("reversed children:", gjson.Get(json, "children|@reverse"))
children|@reverse first reads the children array, then applies the @reverse modifier to reverse it and returns the result.
nestedJSON := `{"nested": ["one", "two", ["three", "four"]]}`
fmt.Println(gjson.Get(nestedJSON, "nested|@flatten"))
@flatten flattens the inner array of nested into the outer array and returns:
["one","two","three","four"]
There are many other interesting features; see the official documentation for details.
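What these two modifiers do to an array can be sketched on plain Go slices. The helpers `reverse` and `flatten` are hypothetical and operate on []any values decoded by encoding/json, whereas GJSON applies its modifiers to the raw JSON text. This flatten only lifts nested arrays up one level, which is what the example above needs.

```go
package main

import "fmt"

// reverse returns a new slice with the elements in reverse order (like @reverse).
func reverse(arr []any) []any {
	out := make([]any, len(arr))
	for i, v := range arr {
		out[len(arr)-1-i] = v
	}
	return out
}

// flatten moves the elements of nested arrays up one level (like @flatten).
func flatten(arr []any) []any {
	var out []any
	for _, v := range arr {
		if inner, ok := v.([]any); ok {
			out = append(out, inner...)
		} else {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	nested := []any{"one", "two", []any{"three", "four"}}
	fmt.Println(flatten(nested))
	fmt.Println(reverse([]any{"Sara", "Alex", "Jack"}))
}
```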
Analysis
GJSON's Get takes two arguments: the JSON string, and a path describing which value to retrieve from it.
Because GJSON supports many kinds of path syntax, parsing happens in two steps: it first parses the path, then traverses the JSON string.
While traversing, it returns as soon as it finds a matching value, without scanning any further. If the path matches multiple values, it has to traverse the entire JSON string, and if the path matches nothing, it also ends up traversing the whole string.
Unlike fastjson, it does not keep the parsed content in a reusable structure. So when you call GetMany to retrieve several values, it actually traverses the JSON string multiple times, which is less efficient.
In addition, GJSON does not validate the JSON while parsing: even input that is not JSON is scanned the same way, so callers must make sure they pass in valid JSON.
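One simple guard is to reject malformed input before querying it. The sketch uses the standard library's json.Valid (GJSON also ships a Valid helper for this purpose); `checked` is a hypothetical wrapper, and its `query` argument is a placeholder for the actual non-validating lookup, such as a GJSON call.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

var errInvalidJSON = errors.New("invalid json")

// checked runs query only if data is syntactically valid JSON.
// query stands in for a non-validating lookup such as a GJSON call.
func checked(data string, query func(string) string) (string, error) {
	if !json.Valid([]byte(data)) {
		return "", errInvalidJSON
	}
	return query(data), nil
}

func main() {
	echo := func(s string) string { return s } // placeholder query
	for _, in := range []string{`{"age":18}`, `{"age":18`, `not json`} {
		_, err := checked(in, echo)
		fmt.Printf("%-12s err=%v\n", in, err)
	}
}
```

The trade-off is an extra full pass over the input, which eats into the performance advantage that motivated using a non-validating parser in the first place.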
Code
func Get(json, path string) Result {
	// Parse the path
	if len(path) > 1 {
		...
	}
	var i int
	var c = &parseContext{json: json}
	if len(path) >= 2 && path[0] == '.' && path[1] == '.' {
		c.lines = true
		parseArray(c, 0, path[2:])
	} else {
		// Scan forward until '{' or '[' is found, then parse accordingly
		for ; i < len(c.json); i++ {
			if c.json[i] == '{' {
				i++
				parseObject(c, i, path)
				break
			}
			if c.json[i] == '[' {
				i++
				parseArray(c, i, path)
				break
			}
		}
	}
	if c.piped {
		res := c.value.Get(c.pipe)
		res.Index = 0
		return res
	}
	fillIndex(json, c)
	return c.value
}
In Get, a long chunk of code (elided above) parses the various kinds of path; then a for loop scans the JSON until it finds '{' or '[' and dispatches to the corresponding logic.
func parseObject(c *parseContext, i int, path string) (int, bool) {
	var pmatch, kesc, vesc, ok, hit bool
	var key, val string
	rp := parseObjectPath(path)
	if !rp.more && rp.piped {
		c.pipe = rp.pipe
		c.piped = true
	}
	// Two nested for loops to find the key
	for i < len(c.json) {
		for ; i < len(c.json); i++ {
			if c.json[i] == '"' {
				i++
				var s = i
				for ; i < len(c.json); i++ {
					if c.json[i] > '\\' {
						continue
					}
					// Key found: jump to parse_key_string_done
					if c.json[i] == '"' {
						i, key, kesc, ok = i+1, c.json[s:i], false, true
						goto parse_key_string_done
					}
					...
				}
				key, kesc, ok = c.json[s:], false, false
				// Break out directly
			parse_key_string_done:
				break
			}
			if c.json[i] == '}' {
				return i + 1, false
			}
		}
		if !ok {
			return i, false
		}
		// Check whether this is a wildcard match
		if rp.wild {
			if kesc {
				pmatch = match.Match(unescape(key), rp.part)
			} else {
				pmatch = match.Match(key, rp.part)
			}
		} else {
			if kesc {
				pmatch = rp.part == unescape(key)
			} else {
				pmatch = rp.part == key
			}
		}
		// Parse the value
		hit = pmatch && !rp.more
		for ; i < len(c.json); i++ {
			switch c.json[i] {
			default:
				continue
			case '"':
				i++
				i, val, vesc, ok = parseString(c.json, i)
				if !ok {
					return i, false
				}
				if hit {
					if vesc {
						c.value.Str = unescape(val[1 : len(val)-1])
					} else {
						c.value.Str = val[1 : len(val)-1]
					}
					c.value.Raw = val
					c.value.Type = String
					return i, true
				}
			case '{':
				if pmatch && !hit {
					i, hit = parseObject(c, i+1, rp.path)
					if hit {
						return i, true
					}
				} else {
					i, val = parseSquash(c.json, i)
					if hit {
						c.value.Raw = val
						c.value.Type = JSON
						return i, true
					}
				}
			...
			}
			break
		}
	}
	return i, false
}
I'm not showing this parseObject code so that you can learn how to parse JSON or traverse strings, but to show you what a bad case looks like: for loops nested layer upon layer and if statements stacked one after another; reading it is enough to drain your sanity. Does this code look familiar? Isn't it a bit like code a colleague wrote at work?
Summary
Pros:
- Performance is considerably better than the standard library's;
- Very flexible: it supports many kinds of queries and customizable return values, which is very convenient;
Cons:
- It does not check whether the JSON is valid;
- The code smell is strong.
Note that if you need to retrieve several values, GetMany traverses the JSON string once for each specified key; parsing the JSON into a map first can reduce the number of traversals.
jsonparser
Repository: https://github.com/buger/jsonparser
This is another popular library that bills itself as high-performance, claiming to parse up to ten times faster than the standard library.
Analysis
jsonparser also takes the JSON as a byte slice, and you can pass in a sequence of keys to quickly locate the corresponding value and return it.
Like GJSON, it does not cache the parsed JSON in a data structure the way fastjson does. However, when you need several values, you can use the EachKey function, which retrieves them all in a single pass over the JSON string.
When it finds a matching value it returns immediately, without traversing further. If the path matches multiple values, it has to traverse the entire JSON string, and if the path matches nothing, it also ends up traversing the whole string.
It also traverses the JSON string with loops instead of recursion where possible, which keeps the call stack shallow and can improve performance to some extent.
In terms of features, ArrayEach, ObjectEach and EachKey all accept a user-defined callback function, so custom requirements can be implemented through callbacks, which greatly increases its practicality.
As for jsonparser's code, there is not much to analyze: it is very clear, and interested readers can read it themselves.
Summary
jsonparser's high performance relative to the standard library can be attributed to:
- Using for loops to reduce recursion;
- Not using reflection, unlike the standard library;
- Returning as soon as the matching key is found, without recursing further;
- Operating directly on the JSON bytes passed in rather than copying them into newly allocated memory, which reduces allocations;
Its API design is also very practical: ArrayEach, ObjectEach and EachKey all accept a user-defined function, which solves many problems that come up in real business development.
Its drawback is just as obvious: it does not validate the JSON, even if the input is not JSON at all.
Performance comparison
Parsing a small JSON string
A simple structure of about 190 bytes:
| Library | Operation | Time per iteration | Memory per iteration | Allocations per iteration | Performance |
|---|---|---|---|---|---|
| Standard library | Unmarshal into map | 724 ns/op | 976 B/op | 51 allocs/op | Slow |
| Standard library | Unmarshal into struct | 297 ns/op | 256 B/op | 5 allocs/op | Average |
| fastjson | get | 68.2 ns/op | 0 B/op | 0 allocs/op | Fastest |
| fastjson | parse | 35.1 ns/op | 0 B/op | 0 allocs/op | Fastest |
| GJSON | convert to map | 255 ns/op | 1009 B/op | 11 allocs/op | Average |
| GJSON | get | 232 ns/op | 448 B/op | 1 allocs/op | Average |
| jsonparser | get | 106 ns/op | 232 B/op | 3 allocs/op | Fast |
Parsing a medium-sized JSON string
A moderately complex structure of about 2.3 KB:
| Library | Operation | Time per iteration | Memory per iteration | Allocations per iteration | Performance |
|---|---|---|---|---|---|
| Standard library | Unmarshal into map | 4263 ns/op | 10212 B/op | 208 allocs/op | Slow |
| Standard library | Unmarshal into struct | 4789 ns/op | 9206 B/op | 259 allocs/op | Slow |
| fastjson | get | 285 ns/op | 0 B/op | 0 allocs/op | Fastest |
| fastjson | parse | 302 ns/op | 0 B/op | 0 allocs/op | Fastest |
| GJSON | convert to map | 2571 ns/op | 8539 B/op | 83 allocs/op | Average |
| GJSON | get | 1489 ns/op | 448 B/op | 1 allocs/op | Average |
| jsonparser | get | 878 ns/op | 2728 B/op | 5 allocs/op | Fast |
Parsing a large JSON string
A highly complex structure of about 2.2 MB:
| Library | Operation | Time per iteration | Memory per iteration | Allocations per iteration | Performance |
|---|---|---|---|---|---|
| Standard library | Unmarshal into map | 2292959 ns/op | 5214009 B/op | 95402 allocs/op | Slow |
| Standard library | Unmarshal into struct | 1165490 ns/op | 2023 B/op | 76 allocs/op | Average |
| fastjson | get | 368056 ns/op | 0 B/op | 0 allocs/op | Fast |
| fastjson | parse | 371397 ns/op | 0 B/op | 0 allocs/op | Fast |
| GJSON | convert to map | 1901727 ns/op | 4788894 B/op | 54372 allocs/op | Average |
| GJSON | get | 1322167 ns/op | 448 B/op | 1 allocs/op | Average |
| jsonparser | get | 233090 ns/op | 1788865 B/op | 376 allocs/op | Fastest |
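The article does not include its benchmark code. As a hedged starting point for measuring the standard-library rows on your own data, here is a sketch using testing.Benchmark, which can be called from an ordinary program and reports ns/op, B/op and allocs/op like `go test -bench`. The `payload` and `Doc` type are placeholders, not the author's test data, so the numbers will differ from the tables above; the third-party rows would need analogous functions calling each library's own API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// Placeholder payload; substitute the JSON you actually want to measure.
var payload = []byte(`{"id":1,"name":"li","tags":["a","b"],"profile":{"age":18,"city":"sz"}}`)

type Doc struct {
	ID      int      `json:"id"`
	Name    string   `json:"name"`
	Tags    []string `json:"tags"`
	Profile struct {
		Age  int    `json:"age"`
		City string `json:"city"`
	} `json:"profile"`
}

// benchMap measures decoding into a generic map.
func benchMap(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var m map[string]any
		if err := json.Unmarshal(payload, &m); err != nil {
			b.Fatal(err)
		}
	}
}

// benchStruct measures decoding into a typed struct.
func benchStruct(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var d Doc
		if err := json.Unmarshal(payload, &d); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	for _, bm := range []struct {
		name string
		fn   func(*testing.B)
	}{{"map", benchMap}, {"struct", benchStruct}} {
		r := testing.Benchmark(bm.fn)
		fmt.Printf("%-6s %s %s\n", bm.name, r.String(), r.MemString())
	}
}
```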
Summary
In this article I compared and analyzed a number of JSON parsing libraries, and found that the high-performance ones share several traits:
- They don't use reflection;
- They parse the JSON string byte by byte as they traverse it;
- They parse and traverse the input JSON string in place as much as possible, reducing memory allocation;
- They sacrifice some compatibility;
Even so, each has its own functional strengths: fastjson's API is the simplest to use; GJSON provides wildcard queries and is the most customizable; jsonparser, while parsing at high speed, also lets you plug in callback functions, which adds some convenience.
To sum up, and to return to the beginning of the article: my business only needs to extract some fields from the JSON returned by HTTP requests. The fields are fixed and no search is needed, but sometimes I need to run custom logic, so jsonparser is the best fit for me.
So if you have performance requirements, it is worth choosing a JSON parser that fits your business scenario.
Reference
https://github.com/buger/jsonparser
https://github.com/tidwall/gjson
https://github.com/valyala/fastjson
https://github.com/json-iterator/go
https://github.com/mailru/easyjson
https://github.com/Jeffail/gabs
https://github.com/bitly/go-simplejson