当前位置:网站首页>JSON Library Tutorial from scratch (III): parsing strings, learning and sorting notes
JSON Library Tutorial from scratch (III): parsing strings, learning and sorting notes
2022-06-25 05:17:00 【zhongh58】
Original website : From scratch JSON Library Tutorial - You know
1. JSON String Syntax
JSON String Syntax and C The language is very similar , The characters are enclosed in double quotation marks , Such as "Hello". But the string is separated by double quotation marks , So how can I insert a double quotation mark into a string ? hold a"b It's written in "a"b" Definitely not , I don't know where the string ends .
therefore , We need to introduce escape characters (escape character),C Language and JSON All use \( Backslash ) As escape character , that " In the string, it is expressed as \",a"b Of JSON The string is written as "a\"b". As shown in the following String Syntax ,JSON In support of 9 Escape sequence :
string = quotation-mark *char quotation-mark
char = unescaped /
escape (
%x22 / ; " quotation mark U+0022
%x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 / ; t tab U+0009
%x75 4HEXDIG ) ; uXXXX U+XXXX
escape = %x5C ; \
quotation-mark = %x22 ; "
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
A simple translation ,JSON A string is composed of two double quotation marks with zero or more characters . Character separation no escape character or escape sequence . The escape sequence has 9 Kind of , It all starts with a backslash , As is common \n Represents a newline character . What's more special is \uXXXX, among XXXX by 16 Carry on UTF-16 code , This unit will not handle this escape sequence , Let's break it down later .
Non escape characters are ordinary characters , The syntax lists the legal code point range ( The code point is introduced in the next unit ). It should be noted that , This scope does not include 0 to 31、 Double quotes and backslashes , These code points must be expressed by escape .
2. String representation
stay C In language , Strings are generally represented as empty terminated strings (null-terminated string), Empty character ('\0') Represents the end of the string . However ,JSON Strings are allowed to contain empty characters , For example, this JSON "Hello\u0000World" Is a single string , After parsing, it is 11 Characters . If you simply use a null terminated character to indicate JSON Results after analysis , Can't handle empty characters .
therefore , We can allocate memory to store the parsed characters , And the number of recording characters ( The length of the string ). Because most C Programs assume that a string is an empty terminated string , Let's add an empty character at the end , Then there is no need to deal with \u0000 The application of this character can simply regard it as a null terminated string .
lept_value In fact, it is a variant type (variant type), We go through type To determine what type it is now , This also determines which members are valid . First, we simply add two members to the structure :
typedef struct {
char* s;
size_t len;
double n;
lept_type type;
}lept_value;
We know , A value cannot be both a number and a string , So we can use C Linguistic union To save memory :
typedef struct {
union {
struct { char* s; size_t len; }s; /* string */
double n; /* number */
}u;
lept_type type;
}lept_value;
These two designs are in 32 The memory layout of bit platform is as follows , It can be seen that... Is used on the right union Can save memory .
We're going to take the previous v->n Change to v->u.n. To access the string data , Use v->u.s.s and v->u.s.len. This kind of writing is troublesome , Actually C11 New anonymous struct/union grammar , You can use v->n、v->s、v->len To make an interview .
3. memory management
Because the length of the string is not fixed , We need to allocate memory dynamically . For the sake of simplicity , We use the standard library <stdlib.h> Medium malloc()、realloc() and free() To allocate / Free memory .
When setting a value as a string , We need to make a copy of the string in the parameter :
void lept_set_string(lept_value* v, const char* s, size_t len) {
assert(v != NULL && (s != NULL || len == 0));
lept_free(v);
v->u.s.s = (char*)malloc(len + 1);
memcpy(v->u.s.s, s, len);
v->u.s.s[len] = '\0';
v->u.s.len = len;
v->type = LEPT_STRING;
}
Setting this v Before , We need to call lept_free(v) Go empty v Memory that may be allocated . For example, there is already a string , We have to release it first . Then simply use malloc() Distribution and use memcpy() Copy , And fill in the blank character at the end .malloc(len + 1) Medium 1 Because the end is empty .
void lept_free(lept_value* v) {
assert(v != NULL);
if (v->type == LEPT_STRING)
free(v->u.s.s);
v->type = LEPT_NULL;
}
Because we will check v The type of , Before calling all access functions , We must initialize this type . So we joined in lept_init(v), Because it is very simple, we use macros to realize :
#define lept_init(v) do { (v)->type = LEPT_NULL; } while(0)
4. Buffer and stack
We parse the string ( And then the array 、 object ) when , You need to store the parsing results in a temporary buffer , Last but not least lept_set_string() Set the result of the buffer into the value . Before you finish parsing a string , The size of this buffer is unpredictable . therefore , We can use dynamic arrays (dynamic array) This data structure , That is, when the array space is insufficient , Can automatically expand .C++ Standard library std::vector It is also a dynamic array .
If every time you parse a string , Re create a dynamic array , It is time-consuming . We can reuse this dynamic array , Every time you parse JSON You just need to create one . And we will find , Whether it's parsing strings 、 Array or object , We only need to access this dynamic array in a first in, last out way . let me put it another way , We need a dynamic stack data structure .
We put data from a dynamic stack into lept_context in :
typedef struct {
const char* json;
char* stack;
size_t size, top;
}lept_context;
among size Is the current stack capacity ,top Is the top of the stack ( Because we will expand stack, So don't put top Store in pointer form ).
We're creating lept_context Initialization when stack And finally free memory :
int lept_parse(lept_value* v, const char* json) {
lept_context c;
int ret;
assert(v != NULL);
c.json = json;
c.stack = NULL; /* <- */
c.size = c.top = 0; /* <- */
lept_init(v);
lept_parse_whitespace(&c);
if ((ret = lept_parse_value(&c, v)) == LEPT_PARSE_OK) {
/* ... */
}
assert(c.top == 0); /* <- */
free(c.stack); /* <- */
return ret;
}
On release , Add assertions to ensure that all data is ejected .
then , We implement stack push and pop operations . It is different from the normal stack , Our stack is stored in bytes . Data of any size can be pressed every time , It returns a pointer to the beginning of the data ( Meeting C++ You can refer to [1]):
#ifndef LEPT_PARSE_STACK_INIT_SIZE
#define LEPT_PARSE_STACK_INIT_SIZE 256
#endif
static void* lept_context_push(lept_context* c, size_t size) {
void* ret;
assert(size > 0);
if (c->top + size >= c->size) {
if (c->size == 0)
c->size = LEPT_PARSE_STACK_INIT_SIZE;
while (c->top + size >= c->size)
c->size += c->size >> 1; /* c->size * 1.5 */
c->stack = (char*)realloc(c->stack, c->size);
}
ret = c->stack + c->top;
c->top += size;
return ret;
}
static void* lept_context_pop(lept_context* c, size_t size) {
assert(c->top >= size);
return c->stack + (c->top -= size);
}
5. Parse string
With the above tools , The task of parsing strings becomes very simple . We just need to back up the top of the stack first , Then put the parsed characters on the stack , Finally, calculate the length and pop up all the characters at once , Then set it to the value . The following is a partial implementation , The escape and some illegal character verification are not handled .
#define PUTC(c, ch) do { *(char*)lept_context_push(c, sizeof(char)) = (ch); } while(0)
static int lept_parse_string(lept_context* c, lept_value* v) {
size_t head = c->top, len;
const char* p;
EXPECT(c, '\"');
p = c->json;
for (;;) {
char ch = *p++;
switch (ch) {
case '\"':
len = c->top - head;
lept_set_string(v, (const char*)lept_context_pop(c, len), len);
c->json = p;
return LEPT_PARSE_OK;
case '\0':
c->top = head;
return LEPT_PARSE_MISS_QUOTATION_MARK;
default:
PUTC(c, ch);
}
}
}
about *p++, to *p, After the operation, proceed to ++ operation .
边栏推荐
- February 20ctf record
- Redis (17)
- ORA-00800: soft external error
- Penetration test - directory traversal vulnerability
- Vue uses keep alive to cache page optimization projects
- Makefile Foundation
- hr竟主动给这位测试小姐姐涨工资,她是怎么做到的?
- Everything is an object
- Dynamic programming full backpack
- What is Ethernet and how to connect the computer
猜你喜欢
How to use the Magic pig system reinstallation master
Working principle of asemi three-phase rectifier bridge
Array and simple function encapsulation cases
The construction and usage of wampserver framework
Read the general components of antd source code
H5 canvas drawing circle drawing fillet [detailed explanation]
Dynamic programming example 1 leetcode 322 coin change
What if win11 Bluetooth fails to connect? Solution of win11 Bluetooth unable to connect
Personalized Federated Learning with Moreau Envelopes
【FLink】access closed classloader classloader. check-leaked-classloader
随机推荐
buuctf(re)
Compatible with Internet Explorer
【FLink】access closed classloader classloader. check-leaked-classloader
How to make colleagues under the same LAN connect to their own MySQL database
Characteristics of ES6 arrow function
XSS (cross site script attack) summary (II)
Reading and writing of nodejs excel (.Xlsx) files
[keil] GPIO output macro definition of aducm4050 official library
Ctfhub eggs
PHP calls map API
Dynamic programming example 2 leetcode62 unique paths
H5 canvas drawing circle drawing fillet [detailed explanation]
渗透测试-提权专题
Google Earth engine (GEE) - Global jrc/gsw1_ 1 / batch download of yearlyhistory dataset (China region)
Understand JS high-order function and write a high-order function
Svg code snippet of loading animation
Various pits encountered in the configuration of yolov3 on win10
What if win11 Bluetooth fails to connect? Solution of win11 Bluetooth unable to connect
Apache+php uploading large files
基于SSH实现的学生成绩管理系统