
A deep dive into high-performance JSON parsing libraries in Go

2022-06-24 13:23:00 luozhiyun

Please credit the source when reprinting. This article was first published on luozhiyun's blog: https://www.luozhiyun.com/archives/535

I did not originally plan to look into the performance of JSON libraries. But I recently ran pprof on one of my projects, and the flame graph below shows that more than half of the time spent in business logic goes to JSON parsing. Hence this article.

[Figure: pprof flame graph of the project]

This article first digs into the source code to see how the Go standard library parses JSON, then looks at several popular JSON parsing libraries, what distinguishes each of them, and in which scenarios each can serve us best.

This article mainly covers and analyzes the following libraries:

Library name                     Stars
-------------------------------  -----
Standard library JSON Unmarshal  -
valyala/fastjson                 1.2k
tidwall/gjson                    8.3k
buger/jsonparser                 4k

json-iterator is also a very well-known library, but in my tests its performance differed very little from the standard library, so by comparison the standard library is the better choice;

Jeffail/gabs and bitly/go-simplejson both call the standard library's Unmarshal directly, so their performance matches the standard library and they are not covered here either;

easyjson, like protobuf, requires generating serialization code for every struct. That is highly invasive and I personally dislike it, so it is not covered.

The libraries above are the well-known, still actively maintained JSON parsing libraries with more than 1k stars that I could find. If I missed any, let me know and I will add them.

Standard library JSON Unmarshal

Analysis

func Unmarshal(data []byte, v interface{}) error

The standard library's Unmarshal takes two parameters: data, the JSON bytes to deserialize, and v, a pointer to the value that receives the result.
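As a minimal sketch of that call, the snippet below decodes a small document into a struct. The user type and the decodeUser helper are made up for this illustration; only json.Unmarshal comes from the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// user is a hypothetical destination type for this example.
type user struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// decodeUser unmarshals data into a user. The destination is passed
// to Unmarshal as a pointer so that it can be filled in.
func decodeUser(data []byte) (user, error) {
	var u user
	err := json.Unmarshal(data, &u)
	return u, err
}

func main() {
	u, err := decodeUser([]byte(`{"name":"li","age":18}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(u.Name, u.Age)
}
```

The json struct tags tell the decoder which JSON key belongs to which field.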

Before the actual parsing starts, Unmarshal calls reflect.ValueOf to obtain the reflection object of the parameter v. It then looks at the first non-whitespace character of data to decide which method to parse with.

func (d *decodeState) value(v reflect.Value) error {
	switch d.opcode {
	default:
		panic(phasePanicMsg)
	// Array
	case scanBeginArray:
		...
	// Struct or map
	case scanBeginObject:
		...
	// Literals, including int, string, float, etc.
	case scanBeginLiteral:
		...
	}
	return nil
}

If the input starts with [, it is an array and the scanBeginArray branch is taken; if it starts with {, it is a struct or map and the scanBeginObject branch is taken, and so on.
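The dispatch idea can be sketched in a few lines. kindOf below is a hypothetical helper, not part of encoding/json (the real decoder drives a scanner state machine); it only shows how the first non-whitespace byte selects the branch.

```go
package main

import "fmt"

// kindOf is a hypothetical helper, not part of encoding/json. It reports
// which branch a decoder would take, based on the first non-whitespace byte.
func kindOf(data []byte) string {
	for _, c := range data {
		switch c {
		case ' ', '\t', '\r', '\n':
			continue // skip leading whitespace
		case '[':
			return "array"
		case '{':
			return "object"
		default:
			return "literal"
		}
	}
	return "empty"
}

func main() {
	for _, in := range []string{` [1,2]`, `{"a":1}`, `"text"`, `42`} {
		fmt.Printf("%-10s -> %s\n", in, kindOf([]byte(in)))
	}
}
```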

Take object parsing as an example:

func (d *decodeState) object(v reflect.Value) error {
	...  
	var fields structFields
	// Check whether the target type is a map or a struct
	switch v.Kind() {
	case reflect.Map: 
		...
	case reflect.Struct:
		// Cache the struct's field information in fields
		fields = cachedTypeFields(t)
		// ok
	default:
		d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
		d.skip()
		return nil
	}

	var mapElem reflect.Value
	origErrorContext := d.errorContext
	// Parse the key/value pairs of the JSON object one by one
	for {  
		start := d.readIndex()
		d.rescanLiteral()
		item := d.data[start:d.readIndex()]
		// Get the key
		key, ok := unquoteBytes(item)
		if !ok {
			panic(phasePanicMsg)
		} 
		var subv reflect.Value
		destring := false   
		... 
		// Set the value through reflection, according to the type of the value
		if destring {
			// Quoted literal values end up here
			switch qv := d.valueQuoted().(type) {
			case nil:
				if err := d.literalStore(nullLiteral, subv, false); err != nil {
					return err
				}
			case string:
				if err := d.literalStore([]byte(qv), subv, true); err != nil {
					return err
				}
			default:
				d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
			}
		} else {
			// Arrays and objects recurse through the value method
			if err := d.value(subv); err != nil {
				return err
			}
		}
		...
		// Exit the loop once the closing } is reached
		if d.opcode == scanEndObject {
			break
		}
		if d.opcode != scanObjectValue {
			panic(phasePanicMsg)
		}
	}
	return nil
}

The flow of object decoding can be summarized as:

  1. The struct's field information is cached first;
  2. The key/value pairs of the JSON object are parsed one by one in a loop;
  3. For each key, the field with the same name is looked up in the struct;
  4. The value method is called recursively, and the result is set on the matching field through reflection;
  5. The loop ends when the closing } of the JSON object is reached.
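Steps 3 and 4 are where reflection comes in. The sketch below is not the standard library's code: setFields and person are hypothetical names, the key/value pairs are assumed to be decoded already, and only string and int fields are handled. It shows the lookup-by-tag and set-by-reflection pattern in isolation.

```go
package main

import (
	"fmt"
	"reflect"
)

// person is a hypothetical target type.
type person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// setFields copies already-decoded key/value pairs into the struct that dst
// points to, matching keys against `json` tags. Hypothetical helper: only
// string and int fields are supported.
func setFields(dst interface{}, kv map[string]interface{}) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a non-nil pointer to a struct")
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name := t.Field(i).Tag.Get("json")
		val, ok := kv[name]
		if !ok {
			continue // no value for this field
		}
		f := v.Field(i)
		switch f.Kind() {
		case reflect.String:
			s, ok := val.(string)
			if !ok {
				return fmt.Errorf("field %s: want string", name)
			}
			f.SetString(s)
		case reflect.Int:
			n, ok := val.(int)
			if !ok {
				return fmt.Errorf("field %s: want int", name)
			}
			f.SetInt(int64(n))
		}
	}
	return nil
}

func main() {
	var p person
	err := setFields(&p, map[string]interface{}{"name": "li", "age": 18})
	fmt.Println(p, err)
}
```

Every field access goes through reflect.Value, which is the kind of per-field overhead the reflection-free libraries discussed later avoid.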

Summary

Reading the Unmarshal source shows that it relies heavily on reflection to locate and set field values. For deeply nested JSON, reflection has to be applied recursively at every level, so it is no surprise that performance suffers.

If performance is not critical, though, using it directly is a very good choice: it is feature-complete, and the Go team keeps iterating on and optimizing it, so a future release may well bring a big jump in performance.

fastjson

Repository: https://github.com/valyala/fastjson

As the name suggests, this library's selling point is speed. Its README says:

Fast. As usual, up to 15x faster than the standard encoding/json.

It is also simple to use:

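package main

import (
	"fmt"
	"log"

	"github.com/valyala/fastjson"
)

func main() {
	var p fastjson.Parser
	// Parse returns the root *fastjson.Value of the parsed tree.
	v, err := p.Parse(`{
                "str": "bar",
                "int": 123,
                "float": 1.23,
                "bool": true,
                "arr": [1, "foo", {}]
        }`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("foo=%s\n", v.GetStringBytes("str"))
	fmt.Printf("int=%d\n", v.GetInt("int"))
	fmt.Printf("float=%f\n", v.GetFloat64("float"))
	fmt.Printf("bool=%v\n", v.GetBool("bool"))
	// Nested values are addressed by a chain of keys; array indexes are decimal strings.
	fmt.Printf("arr.1=%s\n", v.GetStringBytes("arr", "1"))
}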
// Output:
// foo=bar
// int=123
// float=1.230000
// bool=true
// arr.1=foo

To use fastjson, first hand the JSON string to a Parser, then read values from the result returned by its Parse method. For nested values, simply pass the chain of parent and child keys to the Get method.

Analysis

fastjson differs from the standard library's Unmarshal in design: it splits JSON handling into two steps, Parse and Get.

Parse turns the JSON string into a structure and returns it; the data is then read from that returned structure. Parse does not take any locks, so a single Parser must not be shared between concurrent calls. To parse concurrently, use ParserPool.
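The borrow-and-return pattern behind a parser pool can be sketched with sync.Pool from the standard library. This is an illustration of the idea only, not fastjson's implementation: the parser type here is a hypothetical stand-in whose scratch buffer makes it unsafe to share between goroutines.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parser is a hypothetical stand-in for a parser that keeps reusable
// internal state and therefore must not be shared between goroutines.
type parser struct {
	fields []string // scratch space reused across calls
}

// countFields splits s on commas into the reused scratch slice.
func (p *parser) countFields(s string) int {
	p.fields = append(p.fields[:0], strings.Split(s, ",")...)
	return len(p.fields)
}

// pool hands out parsers so that no two goroutines use the same one at once.
var pool = sync.Pool{New: func() interface{} { return new(parser) }}

// countConcurrently processes every input in its own goroutine, borrowing
// a parser from the pool and returning it when done.
func countConcurrently(inputs []string) []int {
	out := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			p := pool.Get().(*parser)
			defer pool.Put(p)
			out[i] = p.countFields(in)
		}(i, in)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(countConcurrently([]string{"a,b,c", "x", "1,2"}))
}
```

Pooling keeps the benefit of reusing internal buffers while making sure each parser has a single owner at any moment.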

fastjson walks the JSON from top to bottom and stores the parsed data in Value structs:

type Value struct {
	o Object
	a []*Value
	s string
	t Type
}

The struct is very simple:

  • o Object: set when the parsed value is an object;
  • a []*Value: set when the parsed value is an array;
  • s string: if the parsed value is neither an object nor an array, the value of any other type is stored in this field as a string;
  • t Type: the type of this value, one of TypeObject, TypeArray, TypeString, TypeNumber, etc.

The Object type and its key/value entries are defined as:

type Object struct {
	kvs           []kv
	keysUnescaped bool
}

type kv struct {
	k string
	v *Value
}

Object holds the key/value pairs of an object, and each value is again a Value, which makes the structure recursive. Parsing the JSON string from the example above produces a structure like this:

[Figure: the Value tree that fastjson builds for the example JSON]
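To make the tree shape concrete, here is a simplified sketch of such a structure together with a key-path lookup. The names node, pair, kind and get are hypothetical and the tree is built by hand; fastjson's real Value type has more fields and methods.

```go
package main

import (
	"fmt"
	"strconv"
)

type kind int

const (
	kindObject kind = iota
	kindArray
	kindString
)

// node is a hypothetical, simplified counterpart of the Value struct above.
type node struct {
	t   kind
	kvs []pair  // used when t == kindObject
	a   []*node // used when t == kindArray
	s   string  // raw text for every other type
}

type pair struct {
	k string
	v *node
}

// get walks the tree along keys. Array elements are addressed by a
// decimal index such as "1". It returns nil if the path does not exist.
func (n *node) get(keys ...string) *node {
	for _, key := range keys {
		if n == nil {
			return nil
		}
		switch n.t {
		case kindObject:
			var next *node
			for _, p := range n.kvs {
				if p.k == key {
					next = p.v
					break
				}
			}
			n = next
		case kindArray:
			idx, err := strconv.Atoi(key)
			if err != nil || idx < 0 || idx >= len(n.a) {
				return nil
			}
			n = n.a[idx]
		default:
			return nil
		}
	}
	return n
}

// sampleTree builds the tree for {"str":"bar","arr":["1","foo"]} by hand.
func sampleTree() *node {
	arr := &node{t: kindArray, a: []*node{
		{t: kindString, s: "1"},
		{t: kindString, s: "foo"},
	}}
	return &node{t: kindObject, kvs: []pair{
		{k: "str", v: &node{t: kindString, s: "bar"}},
		{k: "arr", v: arr},
	}}
}

func main() {
	root := sampleTree()
	fmt.Println(root.get("str").s)
	fmt.Println(root.get("arr", "1").s)
	fmt.Println(root.get("missing") == nil)
}
```

Because the tree is kept after parsing, any number of lookups can be served without touching the original JSON text again.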

Code

Because the implementation contains no reflection code, the whole parsing process is very clean. Let's look directly at the main parsing path:

func parseValue(s string, c *cache, depth int) (*Value, string, error) {
	if len(s) == 0 {
		return nil, s, fmt.Errorf("cannot parse empty string")
	}
	depth++
	// The nesting depth of the JSON string must not exceed MaxDepth
	if depth > MaxDepth {
		return nil, s, fmt.Errorf("too big depth for the nested JSON; it exceeds %d", MaxDepth)
	}
	//  Parse object 
	if s[0] == '{' {
		v, tail, err := parseObject(s[1:], c, depth)
		if err != nil {
			return nil, tail, fmt.Errorf("cannot parse object: %s", err)
		}
		return v, tail, nil
	}
	//  Parsing arrays 
	if s[0] == '[' {
		...
	}
	//  Parse string 
	if s[0] == '"' {
		...
	} 
	...
	return v, tail, nil
}

parseValue decides which type to parse from the first non-whitespace character of the string. Here we follow the object case:

func parseObject(s string, c *cache, depth int) (*Value, string, error) {
	...
	o := c.getValue()
	o.t = TypeObject
	o.o.reset()
	for {
		var err error
		// Get a kv slot from the Object struct
		kv := o.o.getKV()
		... 
		// Parse the key
		kv.k, s, err = parseRawKey(s[1:])
		... 
		// Recursively parse the value
		kv.v, s, err = parseValue(s, c, depth)
		...
		// On ',' continue with the next key/value pair
		if s[0] == ',' {
			s = s[1:]
			continue
		}
		// End of the object
		if s[0] == '}' {
			return o, s[1:], nil
		}
		return nil, s, fmt.Errorf("missing ',' after object value")
	}
}

parseObject is also very simple: inside the loop it reads the key, then calls parseValue to parse the value recursively, working through the JSON object from top to bottom until it finally reaches } and returns.
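The same parse-value / parse-object recursion can be reproduced in a small runnable sketch. This is not fastjson's code: the value type is hypothetical, only objects and strings without escape sequences are supported, and the depth limit is left out. What it keeps is the shape of the algorithm, with each function returning the parsed value plus the unparsed tail.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// value is either a string (obj == nil) or an object.
type value struct {
	str string
	obj map[string]*value
}

func skipWS(s string) string { return strings.TrimLeft(s, " \t\r\n") }

// parseString reads a quoted string; escape sequences are not handled.
func parseString(s string) (string, string, error) {
	if len(s) == 0 || s[0] != '"' {
		return "", s, errors.New("expected opening quote")
	}
	end := strings.IndexByte(s[1:], '"')
	if end < 0 {
		return "", s, errors.New("unterminated string")
	}
	return s[1 : 1+end], s[end+2:], nil
}

// parseValue dispatches on the first non-whitespace byte.
func parseValue(s string) (*value, string, error) {
	s = skipWS(s)
	if len(s) == 0 {
		return nil, s, errors.New("unexpected end of input")
	}
	if s[0] == '{' {
		return parseObject(s[1:])
	}
	str, tail, err := parseString(s)
	if err != nil {
		return nil, tail, err
	}
	return &value{str: str}, tail, nil
}

// parseObject parses key/value pairs until the closing '}'.
func parseObject(s string) (*value, string, error) {
	o := &value{obj: map[string]*value{}}
	s = skipWS(s)
	if len(s) > 0 && s[0] == '}' {
		return o, s[1:], nil // empty object
	}
	for {
		key, tail, err := parseString(skipWS(s))
		if err != nil {
			return nil, tail, err
		}
		s = skipWS(tail)
		if len(s) == 0 || s[0] != ':' {
			return nil, s, errors.New("missing ':' after object key")
		}
		var v *value
		v, s, err = parseValue(s[1:]) // recurse into the value
		if err != nil {
			return nil, s, err
		}
		o.obj[key] = v
		s = skipWS(s)
		if len(s) == 0 {
			return nil, s, errors.New("unexpected end of object")
		}
		if s[0] == ',' {
			s = s[1:]
			continue
		}
		if s[0] == '}' {
			return o, s[1:], nil
		}
		return nil, s, errors.New("missing ',' after object value")
	}
}

func main() {
	v, tail, err := parseValue(`{"name": {"first": "li"}, "age": "18"}`)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(v.obj["name"].obj["first"].str, v.obj["age"].str, len(tail))
}
```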

Summary

The analysis above shows that fastjson is much simpler in implementation than the standard library, and much faster. The tree produced by Parse can be reused many times, which avoids repeated parsing and improves performance.

Its feature set is minimal, however: common operations such as converting JSON to a struct or JSON to a map are missing. If you only want to read values out of JSON, this library is very convenient, but if you want to convert the JSON into a struct you have to set the fields one by one yourself.

GJSON

Repository: https://github.com/tidwall/gjson

In my tests GJSON was not as extreme as fastjson in performance, but it is feature-rich and still performs reasonably well. Here is a brief introduction to what GJSON can do.

GJSON is about as easy to use as fastjson: just pass in the JSON string and the path of the value you want:

json := `{"name":{"first":"li","last":"dj"},"age":18}`
lastName := gjson.Get(json, "name.last")

Beyond this, it supports simple wildcard matching in keys with * and ?: * matches any number of characters and ? matches a single character:

json := `{
	"name":{"first":"Tom", "last": "Anderson"},
	"age": 37,
	"children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("third child*:", gjson.Get(json, "child*.2"))
fmt.Println("first c?ild:", gjson.Get(json, "c?ildren.0"))

In the two queries above:

  • child*.2: child* matches children, and .2 reads the third element;
  • c?ildren.0: c?ildren matches children, and .0 reads the first element.
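The two wildcards can be implemented with a short recursive matcher. The function below is a sketch written for this article, not GJSON's matcher, and it compares bytes rather than Unicode characters.

```go
package main

import "fmt"

// wildcardMatch reports whether s matches pattern, where '*' matches any
// run of bytes (including none) and '?' matches exactly one byte.
func wildcardMatch(s, pattern string) bool {
	for len(pattern) > 0 {
		switch pattern[0] {
		case '*':
			// Try to match the rest of the pattern at every suffix of s.
			for i := 0; i <= len(s); i++ {
				if wildcardMatch(s[i:], pattern[1:]) {
					return true
				}
			}
			return false
		case '?':
			if len(s) == 0 {
				return false
			}
		default:
			if len(s) == 0 || s[0] != pattern[0] {
				return false
			}
		}
		s, pattern = s[1:], pattern[1:]
	}
	return len(s) == 0
}

func main() {
	fmt.Println(wildcardMatch("children", "child*"))
	fmt.Println(wildcardMatch("children", "c?ildren"))
	fmt.Println(wildcardMatch("age", "child*"))
}
```

During traversal, each object key would be tested against the current path component with a function like this instead of a plain string comparison.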

Besides wildcard matching, it also supports modifiers:

json := `{
	"name":{"first":"Tom", "last": "Anderson"},
	"age": 37,
	"children": ["Sara", "Alex", "Jack"]
}`
fmt.Println("reversed children:", gjson.Get(json, "children|@reverse"))

children|@reverse first reads the children array, then applies the @reverse modifier and returns the array in reverse order, so the elements come back as Jack, Alex, Sara.

nestedJSON := `{"nested": ["one", "two", ["three", "four"]]}`
fmt.Println(gjson.Get(nestedJSON, "nested|@flatten"))

@flatten flattens the arrays nested inside nested into the outer array and returns:

["one","two","three", "four"]

There are more interesting features like these; see the official documentation.

Analysis

GJSON's Get method takes two parameters: the JSON string, and a Path that describes which value to extract from it.

Because GJSON has to support many kinds of path expressions, parsing is split into two steps: the Path is parsed first, and only then is the JSON string traversed.

If a matching value is found during traversal, it is returned right away and the rest of the JSON is not visited. If the path can match multiple values, the whole JSON string is traversed. If the Path matches nothing in the JSON string, the whole string is traversed as well.

Unlike fastjson, GJSON does not store the parsed content in a structure that can be reused. So when you call GetMany to fetch several values, the JSON string is in fact traversed several times, which lowers efficiency.

[Figure: GJSON]

In addition, GJSON does not validate the JSON while parsing. Even if the input is not valid JSON, it is processed the same way, so the caller has to make sure that what is passed in really is JSON.
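When the input comes from an untrusted source, one option is to validate it once before handing it to a non-validating accessor. The sketch below uses json.Valid from the standard library; checked and the lookup parameter are hypothetical names standing in for whatever accessor is actually used.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

var errInvalidJSON = errors.New("input is not valid JSON")

// checked runs lookup only after data has passed json.Valid. lookup stands
// in for any non-validating accessor; it is a hypothetical parameter.
func checked(data []byte, lookup func([]byte) string) (string, error) {
	if !json.Valid(data) {
		return "", errInvalidJSON
	}
	return lookup(data), nil
}

func main() {
	// A toy accessor that just returns the first byte of the input.
	firstByte := func(d []byte) string { return string(d[:1]) }

	fmt.Println(checked([]byte(`{"age":18}`), firstByte))
	fmt.Println(checked([]byte(`{"age":18`), firstByte))
}
```

Validation is a full extra pass over the input, so it gives back part of the speed advantage; whether that trade is worth it depends on how much the input can be trusted.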

Code

func Get(json, path string) Result {
	// Parse the path
	if len(path) > 1 {
		...
	}
	var i int
	var c = &parseContext{json: json}
	if len(path) >= 2 && path[0] == '.' && path[1] == '.' {
		c.lines = true
		parseArray(c, 0, path[2:])
	} else {
		// Loop until '{' or '[' is found, then parse according to the kind of value
		for ; i < len(c.json); i++ {
			if c.json[i] == '{' {
				i++
				 
				parseObject(c, i, path)
				break
			}
			if c.json[i] == '[' {
				i++
				parseArray(c, i, path)
				break
			}
		}
	}
	if c.piped {
		res := c.value.Get(c.pipe)
		res.Index = 0
		return res
	}
	fillIndex(json, c)
	return c.value
}

The Get method contains a long stretch of code for parsing the various kinds of Path, followed by a for loop that scans the JSON until it finds '{' or '[' and then hands over to the corresponding logic.

func parseObject(c *parseContext, i int, path string) (int, bool) {
	var pmatch, kesc, vesc, ok, hit bool
	var key, val string
	rp := parseObjectPath(path)
	if !rp.more && rp.piped {
		c.pipe = rp.pipe
		c.piped = true
	}
	// Two nested for loops look for the key
	for i < len(c.json) {
		for ; i < len(c.json); i++ {
			if c.json[i] == '"' { 
				i++
				var s = i
				for ; i < len(c.json); i++ {
					if c.json[i] > '\\' {
						continue
					}
					// Key found: jump to parse_key_string_done
					if c.json[i] == '"' {
						i, key, kesc, ok = i+1, c.json[s:i], false, true
						goto parse_key_string_done
					}
					...
				}
				key, kesc, ok = c.json[s:], false, false
			// Break out of the loop directly
			parse_key_string_done:
				break
			}
			if c.json[i] == '}' {
				return i + 1, false
			}
		}
		if !ok {
			return i, false
		}
		// Check whether this is a wildcard match
		if rp.wild {
			if kesc {
				pmatch = match.Match(unescape(key), rp.part)
			} else {
				pmatch = match.Match(key, rp.part)
			}
		} else {
			if kesc {
				pmatch = rp.part == unescape(key)
			} else {
				pmatch = rp.part == key
			}
		}
		// Parse the value
		hit = pmatch && !rp.more
		for ; i < len(c.json); i++ {
			switch c.json[i] {
			default:
				continue
			case '"':
				i++
				i, val, vesc, ok = parseString(c.json, i)
				if !ok {
					return i, false
				}
				if hit {
					if vesc {
						c.value.Str = unescape(val[1 : len(val)-1])
					} else {
						c.value.Str = val[1 : len(val)-1]
					}
					c.value.Raw = val
					c.value.Type = String
					return i, true
				}
			case '{':
				if pmatch && !hit {
					i, hit = parseObject(c, i+1, rp.path)
					if hit {
						return i, true
					}
				} else {
					i, val = parseSquash(c.json, i)
					if hit {
						c.value.Raw = val
						c.value.Type = JSON
						return i, true
					}
				}
			...
			break
		}
	}
	return i, false
}

The point of showing parseObject is not to teach you how to parse JSON or traverse strings, but to show what a bad case looks like: for loops nested layer upon layer and one if after another, enough to make your head spin. Doesn't this code look familiar? A bit like what a colleague writes at work?

Summary

Advantages:

  1. Better performance than the standard library;
  2. Very flexible: it supports many kinds of queries and customizable return values, which is very convenient.

Disadvantages:

  1. It does not check the correctness of the JSON;
  2. The code has a strong code smell.

Note that when you need several values from the same JSON, GetMany traverses the JSON string again for each specified key. Parsing the JSON into a map instead reduces the number of traversals.

jsonparser

Repository: https://github.com/buger/jsonparser

This is also a popular library. It is billed as high-performance and claims to parse up to ten times faster than the standard library.

Analysis

jsonparser also takes the JSON as a byte slice, and accepts multiple keys to quickly locate the corresponding value and return it.

Like GJSON, and unlike fastjson, it does not keep a data structure that caches the parsed JSON. But when you need several values, you can use the EachKey function, which retrieves multiple values in a single pass over the JSON.

If a matching value is found, it is returned right away and the rest of the JSON is not visited. If multiple values are matched, the whole JSON string is traversed. If a path matches nothing in the JSON string, the whole string is traversed as well.

It also replaces recursion with loops while traversing the JSON, which keeps the call stack shallow and can improve performance to some extent.

Functionally, ArrayEach, ObjectEach and EachKey each accept a user-defined function, so custom requirements can be met through callbacks, which makes the library much more practical.

As for the code of jsonparser, there is not much to analyze: it is very clear, and anyone interested can read it on their own.
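The callback style can be illustrated with the standard library alone. arrayEach below is a hypothetical function built on json.Decoder; it only mirrors the shape of a callback-per-element API and says nothing about how jsonparser implements its own functions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// arrayEach calls fn once per element of the top-level JSON array in data,
// passing the element index and its raw JSON text. Hypothetical helper.
func arrayEach(data []byte, fn func(index int, raw json.RawMessage)) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	tok, err := dec.Token() // consume the opening '['
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return errors.New("input is not a JSON array")
	}
	for i := 0; dec.More(); i++ {
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return err
		}
		fn(i, raw)
	}
	_, err = dec.Token() // consume the closing ']'
	return err
}

// collect gathers the raw text of every element, as a usage example.
func collect(data []byte) ([]string, error) {
	var out []string
	err := arrayEach(data, func(_ int, raw json.RawMessage) {
		out = append(out, string(raw))
	})
	return out, err
}

func main() {
	elems, err := collect([]byte(`[1,"two",{"k":3}]`))
	fmt.Println(elems, err)
}
```

The caller decides what to do with each element inside the callback, so filtering, early bookkeeping or custom conversion need no extra API surface.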

Summary

The reasons why jsonparser performs so much better than the standard library can be summarized as:

  1. It uses for loops to reduce recursion;
  2. Unlike the standard library, it does not use reflection;
  3. It returns as soon as the value for the requested key is found, without descending any further;
  4. It operates directly on the JSON that was passed in and does not allocate new space for it, which reduces memory allocation.
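Point 4 comes down to returning sub-slices of the input instead of copies. The sketch below uses a hypothetical rawString helper to show that the returned bytes share memory with the input.

```go
package main

import (
	"bytes"
	"fmt"
)

// rawString returns the bytes between the first pair of double quotes in
// data. The result aliases data: no new memory is allocated for it.
// Hypothetical helper; escape sequences are not handled.
func rawString(data []byte) []byte {
	start := bytes.IndexByte(data, '"')
	if start < 0 {
		return nil
	}
	end := bytes.IndexByte(data[start+1:], '"')
	if end < 0 {
		return nil
	}
	return data[start+1 : start+1+end]
}

func main() {
	data := []byte(`{"name":"li"}`)
	key := rawString(data)
	fmt.Println(string(key))

	// Changing the input changes the result too, because both share memory.
	data[2] = 'N'
	fmt.Println(string(key))
}
```

The flip side of this design is that the returned slice is only valid as long as the input buffer is neither modified nor reused.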

Beyond that, the API design is very practical: ArrayEach, ObjectEach and EachKey each accept a user-defined function, which solves many problems that come up in real business development.

The drawback is just as obvious: it does not validate the JSON, even if the input is not JSON at all.

Performance comparison

Parsing a small JSON string

Parsing a simple structure of about 190 bytes.

Library           Operation          Time per op    Memory per op  Allocations      Performance
----------------  -----------------  -------------  -------------  ---------------  -----------
Standard library  parse into map     724 ns/op      976 B/op       51 allocs/op     slow
Standard library  parse into struct  297 ns/op      256 B/op       5 allocs/op      average
fastjson          get                68.2 ns/op     0 B/op         0 allocs/op      fastest
fastjson          parse              35.1 ns/op     0 B/op         0 allocs/op      fastest
GJSON             to map             255 ns/op      1009 B/op      11 allocs/op     average
GJSON             get                232 ns/op      448 B/op       1 allocs/op      average
jsonparser        get                106 ns/op      232 B/op       3 allocs/op      fast
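The benchmark code behind these numbers is not shown in the article. As a rough sketch of how figures of this kind (ns/op, B/op, allocs/op) can be measured, the program below benchmarks only the two standard library cases with testing.Benchmark; the sample document, the doc type and the function names are assumptions made for this example, and the numbers it prints will differ from the table.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// sample is a small stand-in document, not the one used in the article.
var sample = []byte(`{"name":{"first":"li","last":"dj"},"age":18}`)

// doc is a hypothetical struct matching sample.
type doc struct {
	Name struct {
		First string `json:"first"`
		Last  string `json:"last"`
	} `json:"name"`
	Age int `json:"age"`
}

// benchMap measures unmarshalling into a map.
func benchMap(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var m map[string]interface{}
		if err := json.Unmarshal(sample, &m); err != nil {
			b.Fatal(err)
		}
	}
}

// benchStruct measures unmarshalling into a struct.
func benchStruct(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var d doc
		if err := json.Unmarshal(sample, &d); err != nil {
			b.Fatal(err)
		}
	}
}

// iterations runs f as a benchmark and returns how many iterations were timed.
func iterations(f func(*testing.B)) int {
	return testing.Benchmark(f).N
}

func main() {
	for name, f := range map[string]func(*testing.B){"map": benchMap, "struct": benchStruct} {
		r := testing.Benchmark(f)
		fmt.Println(name, r.String(), r.MemString())
	}
}
```

In a real project the same functions would usually live in a _test.go file as BenchmarkXxx functions and be run with go test -bench . -benchmem.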

Parsing a medium-sized JSON string

Parsing a moderately complex structure of about 2.3 KB.

Library           Operation          Time per op    Memory per op  Allocations      Performance
----------------  -----------------  -------------  -------------  ---------------  -----------
Standard library  parse into map     4263 ns/op     10212 B/op     208 allocs/op    slow
Standard library  parse into struct  4789 ns/op     9206 B/op      259 allocs/op    slow
fastjson          get                285 ns/op      0 B/op         0 allocs/op      fastest
fastjson          parse              302 ns/op      0 B/op         0 allocs/op      fastest
GJSON             to map             2571 ns/op     8539 B/op      83 allocs/op     average
GJSON             get                1489 ns/op     448 B/op       1 allocs/op      average
jsonparser        get                878 ns/op      2728 B/op      5 allocs/op      fast

Parsing a large JSON string

Parsing a highly complex structure of about 2.2 MB.

Library           Operation          Time per op    Memory per op  Allocations      Performance
----------------  -----------------  -------------  -------------  ---------------  -----------
Standard library  parse into map     2292959 ns/op  5214009 B/op   95402 allocs/op  slow
Standard library  parse into struct  1165490 ns/op  2023 B/op      76 allocs/op     average
fastjson          get                368056 ns/op   0 B/op         0 allocs/op      fast
fastjson          parse              371397 ns/op   0 B/op         0 allocs/op      fast
GJSON             to map             1901727 ns/op  4788894 B/op   54372 allocs/op  average
GJSON             get                1322167 ns/op  448 B/op       1 allocs/op      average
jsonparser        get                233090 ns/op   1788865 B/op   376 allocs/op    fastest

Summary

In this write-up I looked at a number of JSON parsing libraries and compared and analyzed each of them. These high-performance parsing libraries turn out to share some common traits:

  • They do not use reflection;
  • They parse by walking the bytes of the JSON string one by one;
  • They parse and traverse the input JSON string in place as far as possible, which reduces memory allocation;
  • They sacrifice some compatibility.

Even so, each has its own strengths in terms of features: fastjson has the simplest API; GJSON offers wildcard queries and the highest degree of customization; jsonparser achieves high-performance parsing while also letting you plug in callback functions, which adds a degree of convenience.

To sum up and return to where the article began: my own service only needs to parse a few fields out of the JSON returned by HTTP requests. The fields are fixed and no query features are needed, but sometimes some custom processing is required, so for me jsonparser is the best fit.

So if you have performance requirements, pick a JSON parser according to your own business situation.

References

https://github.com/buger/jsonparser

https://github.com/tidwall/gjson

https://github.com/valyala/fastjson

https://github.com/json-iterator/go

https://github.com/mailru/easyjson

https://github.com/Jeffail/gabs

https://github.com/bitly/go-simplejson
