From JsonPath and XPath to SPL

2022-06-28 12:45:00 【Big data dreamer】

XML and JSON are more than structured text: they are good at representing multi-layer data and can carry rich, general-purpose information, so they are widely used for data exchange and information transfer, for example in WebService/RESTful interfaces and microservices. But a multi-layer data structure is more complex than a traditional two-dimensional table, and the data is harder to process once it has been retrieved.

In the early days there was no dedicated post-processing technology for JSON/XML, so Java developers usually had to hard-code the calculation or load the data into a database and compute there. Hard coding offers little computing power, takes a lot of code, and is slow to develop. Loading into a database supplies some of the computing power, but it adds steps and latency and couples the Java application tightly to the database, which makes for a poor architecture; besides, databases are good at computing two-dimensional tables and weak at multi-layer structured data. Later, dedicated JSON/XML post-processing technologies appeared and greatly improved the efficiency of this work in Java. JsonPath and XPath are among the best of them.

JsonPath and XPath: a breakthrough in computing power

XPath is a widely used XML processing language, built into libraries such as XOM, Xerces-J, JDOM, and Dom4J. JsonPath imitates XPath's syntax, implements similar functionality, and comes with its own library; it is now a widely used JSON processing language. Compared with the earlier hard-coding approach, XPath/JsonPath code is much shorter, which was a breakthrough in computing power.

For example, use the arronlong HTTP library to fetch an XML string from a WebService, use the Dom4J library to parse the string into a Document, and use Dom4J's built-in XPath syntax to run a conditional query:

String path= "http://.../emp_orders.xml";
String XMLStr= com.arronlong.httpclientutil.HttpClientUtil.get(com.arronlong.httpclientutil.common.HttpConfig.custom().url(path));
Document doc = DocumentHelper.parseText(XMLStr);
List<Node> list=doc.selectNodes("/xml/row/Orders[Amount>1000 and Amount<=3000 and contains(Client,'bro')]");

Similarly, a conditional query with JsonPath:

String path= "http://.../emp_orders.json";
String JsonStr= com.arronlong.httpclientutil.HttpClientUtil.get(com.arronlong.httpclientutil.common.HttpConfig.custom().url(path));
Object document = Configuration.defaultConfiguration().jsonProvider().parse(JsonStr);
ArrayList l=JsonPath.read(document, "$[*].Orders[?(@.Amount>1000 && @.Amount<2000 && @.Client =~ /.*?business.*?/i )]");

JsonPath and XPath are used in similar ways, their syntax is closely related, and their computing power differs little, so the discussion below mainly uses JsonPath. JsonPath/XPath support for conditional queries is fairly complete. It includes relational operators, such as greater than and less than or equal to; logical operators, such as and, or, and not; regular expressions on strings, such as =~ /.*?business.*?/i; and string functions, such as the fuzzy match contains. JsonPath/XPath also allow mathematical operators (functions) in conditional queries, such as +, -, *, and div; position functions, such as position and last; and date functions, such as year-from-date and timezone-from-time.

Note that JsonPath/XPath can flexibly express the hierarchical scope of a conditional query, including absolute positions, relative positions, parent nodes, child nodes, attributes, and elements. This is what distinguishes a multi-layer data processing language from a two-dimensional data processing language (SQL). In the code above, $[*].Orders and /xml/row/Orders are examples.
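The hierarchical addressing described here can be tried without any third-party library. The following is a minimal sketch that uses the XPath 1.0 engine shipped with the JDK (javax.xml.xpath) rather than Dom4J; the sample XML, the class name XPathLevelsDemo, and the helper select are assumptions made for illustration and are not part of the original example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathLevelsDemo {
    // Hypothetical sample data, shaped like the rows and Orders queried above.
    static final String XML = "<xml>"
        + "<row><EId>1</EId>"
        + "<Orders><OrderID>10</OrderID><Client>Hydro bros</Client><Amount>1500</Amount></Orders>"
        + "<Orders><OrderID>11</OrderID><Client>Acme</Client><Amount>2500</Amount></Orders>"
        + "</row>"
        + "<row><EId>2</EId>"
        + "<Orders><OrderID>20</OrderID><Client>Big brother</Client><Amount>3500</Amount></Orders>"
        + "<Orders><OrderID>21</OrderID><Client>Two bros</Client><Amount>3000</Amount></Orders>"
        + "</row>"
        + "</xml>";

    // Evaluates an XPath expression and returns the text of every matched node.
    static List<String> select(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String cond = "[Amount>1000 and Amount<=3000 and contains(Client,'bro')]";
        // Absolute path down to the order level, then one step to a child element.
        System.out.println(select("/xml/row/Orders" + cond + "/OrderID")); // [10, 21]
        // From the matching orders, step up to the parent row and read its EId.
        System.out.println(select("/xml/row/Orders" + cond + "/../EId"));  // [1, 2]
    }
}
```

The second query is the kind of cross-level access that has no direct counterpart in a flat SQL table: the filter is applied at the order level, while the value returned comes from the parent row.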

Besides conditional queries, JsonPath/XPath also support aggregation. For example, summing with JsonPath:

Double d=JsonPath.read(document, "$.sum($[*].Orders[*].Amount)");

JsonPath/XPath also support aggregate functions such as average, maximum, minimum, and count.
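The XPath side of the same idea can be sketched with the JDK's built-in engine. Note the assumption: that engine implements XPath 1.0, which offers sum and count but no avg, max, or min functions, so the average below is derived as sum div count. The sample XML and the names XPathAggregateDemo and number are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathAggregateDemo {
    // Hypothetical sample data: two rows with two orders each.
    static final String XML = "<xml>"
        + "<row><Orders><Amount>1500</Amount></Orders><Orders><Amount>2500</Amount></Orders></row>"
        + "<row><Orders><Amount>3500</Amount></Orders><Orders><Amount>3000</Amount></Orders></row>"
        + "</xml>";

    // Evaluates an XPath expression whose result is a number.
    static double number(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("sum(/xml/row/Orders/Amount)"));   // 10500.0
        System.out.println(number("count(/xml/row/Orders)"));        // 4.0
        // XPath 1.0 has no avg(); derive it from sum and count.
        System.out.println(number("sum(/xml/row/Orders/Amount) div count(/xml/row/Orders)")); // 2625.0
    }
}
```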

These examples show that JsonPath/XPath syntax is intuitive and easy to understand. Conditional queries and aggregations take little code, multi-layer structures are easy to access, and the result is far more convenient than hard coding.

JsonPath and XPath still lack computing power

Compared with plain Java code, JsonPath and XPath are indeed a breakthrough in computing power. But for everyday calculations, and even for basic ones, they fall seriously short and are not nearly as capable as SQL. In fact, JsonPath/XPath support only two basic calculations, conditional query and aggregation; everything else needs complex hand-written code.

For example, grouping and aggregation with JsonPath:

ArrayList orders=JsonPath.read(document, "$[*].Orders[*]");
// Sort by Client, then by OrderID, so that records of the same client are adjacent
Comparator<HashMap> comparator = new Comparator<HashMap>() {
    public int compare(HashMap record1, HashMap record2) {
        if (!record1.get("Client").equals(record2.get("Client"))) {
            return ((String)record1.get("Client")).compareTo((String)record2.get("Client"));
        } else {
            return ((Integer)record1.get("OrderID")).compareTo((Integer)record2.get("OrderID"));
        }
    }
};
Collections.sort(orders, comparator);
ArrayList<HashMap> result=new ArrayList<HashMap>();
HashMap currentGroup=(HashMap)orders.get(0);
double sumValue=(double) currentGroup.get("Amount");
for(int i = 1;i < orders.size(); i ++){
    HashMap thisRecord=(HashMap)orders.get(i);
    if(thisRecord.get("Client").equals(currentGroup.get("Client"))){
        // Same client: keep accumulating
        sumValue=sumValue+(double)thisRecord.get("Amount");
    }else{
        // Client changed: store the finished group and start a new one
        HashMap newGroup=new HashMap();
        newGroup.put(currentGroup.get("Client"),sumValue);
        result.add(newGroup);
        currentGroup=thisRecord;
        sumValue=(double) currentGroup.get("Amount");
    }
}
// Store the last group, which the loop itself never adds
HashMap lastGroup=new HashMap();
lastGroup.put(currentGroup.get("Client"),sumValue);
result.add(lastGroup);

JsonPath/XPath do not support grouping and aggregation, so most calculations have to be hand-coded. The programmer must control every detail, and the code is verbose and error-prone. Changing the grouping field or the aggregated field means modifying the code in several places, grouping or aggregating on multiple fields requires heavy modification as well, and generic code is hard to write.
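To make the cost of going generic concrete, here is one possible sketch of a field-parameterized grouping helper written with the standard Java streams API. It is not part of JsonPath; the class GroupSumSketch, the helpers order and groupSum, and the sample records are all assumptions for illustration. It handles a single grouping field and a single summed field only, so multi-field grouping or other aggregates would still need further hand-written code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupSumSketch {
    // Hypothetical helper: builds one order record like those a JSON parser returns.
    static Map<String, Object> order(String client, double amount) {
        Map<String, Object> r = new HashMap<>();
        r.put("Client", client);
        r.put("Amount", amount);
        return r;
    }

    // Groups by groupField and sums sumField; the TreeMap keeps group keys sorted.
    static Map<String, Double> groupSum(List<Map<String, Object>> records,
                                        String groupField, String sumField) {
        return records.stream().collect(Collectors.groupingBy(
                r -> String.valueOf(r.get(groupField)),
                TreeMap::new,
                Collectors.summingDouble(r -> ((Number) r.get(sumField)).doubleValue())));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> orders = new ArrayList<>();
        orders.add(order("Acme", 100.0));
        orders.add(order("Bros", 50.0));
        orders.add(order("Acme", 200.0));
        System.out.println(groupSum(orders, "Client", "Amount")); // {Acme=300.0, Bros=50.0}
    }
}
```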

JsonPath/XPath are seriously short on computing power and leave most basic calculations unsupported. Besides grouping and aggregation, these include renaming, sorting, deduplication, joins, set operations, Cartesian products, merging, window functions, and order-based calculations. Nor do JsonPath/XPath offer a mechanism for decomposing a large computing goal into basic calculations, such as subqueries or multi-step calculation, so more complex calculations are hard to carry out.

Beyond computing power, JsonPath/XPath have another problem: they have no HTTP interface of their own, so you must write one yourself or use a third-party HTTP library such as JourWon or Arronlong (the earlier examples use the Arronlong library). Besides plain HTTP, MongoDB and Elasticsearch can also return multi-layer data, and each data source has its own interface protocol. JsonPath/XPath provide none of these interfaces, so you have to write them or bring in third-party libraries, which complicates the architecture, adds sources of instability, and lowers development efficiency.

The lack of computing power in JsonPath/XPath makes development inefficient. To improve development efficiency, you need a JSON/XML processing technology with sufficient computing power.

SPL is a better choice.

SPL has enough computing power

esProc SPL is an open-source structured/multi-layer data processing language that runs on the JVM. It has built-in professional multi-layer data objects and provides a rich set of calculation, string, and date functions. Its computing power is no less than SQL's, and it can improve the efficiency of WebService/RESTful post-processing development.

SPL's built-in multi-layer structured data objects give its calculation functions strong underlying support

For example, read an XML string from a file and parse it into an SPL table sequence:

	A
1	=file("d:\xml\emp_orders.xml").read()
2	=xml(A1,"xml/row")

Clicking A2 shows the structure of the multi-layer table sequence: fields such as EId and State hold simple data types, while the Orders field holds a set of records (a two-dimensional table). Click a row in Orders to expand it and inspect the data:

(Figure: the Orders field of one record, expanded)

The SPL table sequence is a professional data object that can represent multi-layer data of arbitrarily complex structure. Here is another example:

(Figure: a table sequence holding multi-layer data with a more complex structure)

The table sequence is also professional in that it can represent two-dimensional or multi-layer data from any source, including but not limited to XML/JSON and files/network services. For example, read a JSON string from a file (with the same structure as the earlier XML) and parse it into an SPL table sequence:

	A
1	=file("d:\xml\emp_orders.json").read()
2	=json(A1)

The table sequence here is no different from the one parsed from XML, and the subsequent calculation code is exactly the same. The rest of this article mainly uses JSON.

SPL has rich built-in calculation functions and completes a basic calculation in one statement

For example, the same conditional query on multi-layer JSON:

	A
2	…// Fetching and parsing omitted
3	=A2.conj(Orders)
4	=A3.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

As you can see, SPL's support for conditional queries is complete and covers what JsonPath/XPath offer, including relational operators, logical operators, regular expressions, and string functions such as the fuzzy match like. SPL also supports mathematical operators (functions), position functions, and date functions in conditional queries. SPL accesses different levels flexibly and with simpler code, as in A2.conj(Orders).

SPL also makes aggregations easy, for example a sum: =A3.sum(Amount)

SPL supports a rich set of basic calculations, with computing power no less than SQL's. For example, the grouping and aggregation that must be hard-coded with JsonPath/XPath takes a single statement in SPL:

=A2.conj(Orders).groups(Client;sum(Amount))

More examples:

	A	B
1	….
3	=A2.groups(State,Gender;avg(Salary),count(1))	Grouping and aggregation on multiple fields
4	=A1.new(Name,Gender,Dept,Orders.OrderID,Orders.Client,Orders.SellerId,Orders.Amount,Orders.OrderDate)	Join
5	=A1.sort(Salary)	Sort
6	=A1.id(State)	Deduplication
7	=A2.top(-3;Amount)	Top N
8	=A2.groups(Client;top(3,Amount))	Top N within each group (window function)

SPL provides many date and string functions for more efficient development

SPL supports a large number of date and string functions, far exceeding JsonPath/XPath, and even SQL, in both number and capability, so the code for the same task is shorter. For example:

Date functions. Shift a date: elapse("2020-02-27",5) // returns 2020-03-03

Day of the week: day@w("2020-02-27") // returns 5, that is, Thursday

Date n working days later: workday(date("2022-01-01"),25) // returns 2022-02-04

String functions. Check whether a string is all digits: isdigit("12345") // returns true

Take the part of a string before a substring: substr@l("abCDcdef","cd") // returns abCD

Split a string into an array by vertical bar: "aa|bb|cc".split("|") // returns ["aa","bb","cc"]
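For readers who think in Java, the date results quoted above can be cross-checked with java.time. This is only a rough sketch under stated assumptions: the day-of-week numbering is taken to start at Sunday = 1 (which is what the result 5 for a Thursday implies), and the working-day helper skips only Saturdays and Sundays, with no holiday calendar. The class and method names are hypothetical and are not SPL APIs.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

class DateFunctionsSketch {
    // Shifts a date by a number of calendar days.
    static LocalDate addDays(String isoDate, int days) {
        return LocalDate.parse(isoDate).plusDays(days);
    }

    // Day of week numbered Sunday = 1 ... Saturday = 7 (assumed numbering).
    static int dayOfWeekSundayFirst(String isoDate) {
        // java.time numbers Monday = 1 ... Sunday = 7, so rotate by one.
        return LocalDate.parse(isoDate).getDayOfWeek().getValue() % 7 + 1;
    }

    // Date n working days after start, skipping Saturdays and Sundays only.
    static LocalDate addWorkdays(LocalDate start, int n) {
        LocalDate d = start;
        int left = n;
        while (left > 0) {
            d = d.plusDays(1);
            DayOfWeek w = d.getDayOfWeek();
            if (w != DayOfWeek.SATURDAY && w != DayOfWeek.SUNDAY) {
                left--;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(addDays("2020-02-27", 5));                    // 2020-03-03
        System.out.println(dayOfWeekSundayFirst("2020-02-27"));          // 5
        System.out.println(addWorkdays(LocalDate.of(2022, 1, 1), 25));   // 2022-02-04
    }
}
```

Each helper takes several lines here, which is the contrast the SPL one-liners are meant to draw.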

SPL also supports functions for adding or subtracting years, getting the day of the year, getting the quarter, splitting a string by regular expression, extracting the where or select part of a SQL statement, extracting words, splitting HTML by tag, and so on.

SPL supports a better application architecture

SPL supports externalized scripts and hot swapping and computes multiple data sources in a consistent way, which helps build a better application architecture.

SPL provides a JDBC interface and supports externalized scripts and hot swapping

For example, save the earlier SPL code as a script file, then call it from Java by file name, as if it were a stored procedure:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery("call groupBy()");

The SPL script file is external to the Java program, which separates the computing code from the application and effectively reduces coupling.

SPL is an interpreted language. After a script is modified it can run directly, without restarting the Java application, which gives hot swapping of code, keeps the system stable, and makes maintenance easier.

SPL supports multiple data sources and computes multi-layer data in a consistent way

Besides files, SPL also supports multi-layer data from WebService and RESTful sources. For example, read multi-layer XML from a WebService and run a conditional query:

	A
1	=ws_client("http://127.0.0.1:6868/ws/RQWebService.asmx?wsdl")
2	=ws_call(A1,"RQWebService":"RQWebServiceSoap":"getEmp_orders")
3	=A2.conj(Orders)
4	=A3.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

Similarly, fetch multi-layer JSON from a RESTful service and run the same conditional query:

	A
1	=httpfile("http://127.0.0.1:6868/restful/emp_orders").read()
2	=json(A1)
3	=A2.conj(Orders)
4	=A3.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

Besides WebService and RESTful services, many special data sources also hold multi-layer data; common ones include MongoDB, Elasticsearch, and Salesforce. SPL supports many data sources and can fetch data directly from them and compute it.

For example, fetch multi-layer JSON from MongoDB and run a conditional query:

	A
1	=mongo_open("mongodb://127.0.0.1:27017/mongo")
2	=mongo_shell@x(A1,"data.find()")
3	=A2.conj(Orders)
4	=A3.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

Besides multi-layer data, SPL also supports databases, files such as txt/csv/xls, and NoSQL sources such as Hadoop, Redis, Kafka, and Cassandra.

Although the data sources differ, the data type inside SPL is always the table sequence, so multi-layer data can be computed in a consistent way. Consistent calculation code makes SPL code highly portable.

SPL's powerful computing can simplify complex business logic

SPL has more convenient built-in function syntax, is well suited to computing multi-layer data with complex structure, and can simplify complex business logic; its computing power exceeds SQL's.

SPL's more convenient function syntax provides powerful computing power

SPL provides a distinctive function option syntax: functions with similar functionality share one function name and are distinguished only by options. For example, the basic job of the select function is to filter. To return only the first record that satisfies the condition, use the @1 option:

Orders.select@1(Amount>1000)

When the data volume is large, use the @m option to improve performance with parallel computing:

Orders.select@m(Amount>1000)

To filter data that is already sorted, use the @b option for a fast binary search:

Orders.select@b(Amount>1000)

Function options can also be combined, for example:

Orders.select@1b(Amount>1000)

The parameters of structured operation functions are often complex. SQL, for example, uses various keywords to separate the parameters of a statement into groups, but this requires many keywords and makes statement structures inconsistent.

SPL supports hierarchical parameters: semicolons, commas, and colons divide the parameters into three levels, from high to low, which simplifies the expression of complex parameters in a general way:

join(Orders:o,SellerId ; Employees:e,EId)

SPL is highly expressive and well suited to computing multi-layer data with complex structure

For example, a RESTful service returns multi-layer JSON that contains several subdocuments and has a fairly complex structure. Part of the data is as follows:

[
  {
    "race": {
      "raceId":"1.33.1141109.2",
      "meetingId":"1.33.1141109"
    },
    ...
    "numberOfRunners": 2,
    "runners": [
      {
        "horseId":"1.00387464",
        "trainer": {
          "trainerId":"1.00034060"
        },
        "ownerColours":"Maroon,pink,dark blue."
      },
      {
        "horseId":"1.00373620",
        "trainer": {
          "trainerId":"1.00010997"
        },
        "ownerColours":"Black,Maroon,green,pink."
      }
    ]
  },
  ...
]

Now group and aggregate across levels (group by trainerId and count the members of ownerColours in each group). This is hard to do with ordinary methods but much simpler in SPL:

	A
1	…
2	=A1(1).runners
3	=A2.groups(trainer.trainerId; ownerColours.array().count():times)
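For comparison, here is a hand-written Java sketch of the same cross-level grouping on already-parsed JSON. It assumes the intent is to count the comma-separated colours in ownerColours and total them per trainerId, and that the parser yields nested Map objects; the class RunnerGroupSketch and its helpers are hypothetical, and the two runners are copied from the sample data above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RunnerGroupSketch {
    // Hypothetical helper: builds one runner with a nested trainer object.
    static Map<String, Object> runner(String trainerId, String ownerColours) {
        Map<String, Object> trainer = new HashMap<>();
        trainer.put("trainerId", trainerId);
        Map<String, Object> r = new HashMap<>();
        r.put("trainer", trainer);
        r.put("ownerColours", ownerColours);
        return r;
    }

    // Groups by the nested trainer.trainerId and totals the number of colours.
    @SuppressWarnings("unchecked")
    static Map<String, Integer> colourCountByTrainer(List<Map<String, Object>> runners) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> r : runners) {
            Map<String, Object> trainer = (Map<String, Object>) r.get("trainer");
            String trainerId = (String) trainer.get("trainerId");
            int members = ((String) r.get("ownerColours")).split(",").length;
            counts.merge(trainerId, members, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> runners = new ArrayList<>();
        runners.add(runner("1.00034060", "Maroon,pink,dark blue."));
        runners.add(runner("1.00010997", "Black,Maroon,green,pink."));
        System.out.println(colourCountByTrainer(runners)); // {1.00010997=4, 1.00034060=3}
    }
}
```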

SPL's strong computing power simplifies complex business logic

SPL supports calculations with complex logic, such as step-by-step calculation, order-based calculation, and post-grouping calculation. Many calculations that are hard to implement in SQL or stored procedures are easy in SPL. For example, find the top n major clients whose sales add up to half of the total, sorted by sales in descending order:

	A	B
1	…	/ Fetch the data
2	=A1.sort(amount:-1)	/ Sort by sales in descending order
3	=A2.cumulate(amount)	/ Compute the cumulative sequence
4	=A3.m(-1)/2	/ The last cumulative value is the total; halve it
5	=A3.pselect(~>=A4)	/ Position where the running total reaches half
6	=A2(to(A5))	/ Take the records up to that position
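The same steps (sort descending, accumulate, find where the running total first reaches half of the total, take the records up to that position) can be sketched in plain Java for comparison. The class TopHalfSketch, the method topHalfClients, and the sample figures are assumptions for illustration; the input is taken to be one sales total per client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopHalfSketch {
    // Returns the biggest clients, in descending order of sales, up to and
    // including the one at which the running total first reaches half of the total.
    static List<String> topHalfClients(Map<String, Double> salesByClient) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(salesByClient.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        double half = 0;
        for (Map.Entry<String, Double> e : sorted) {
            half += e.getValue();
        }
        half /= 2;

        List<String> result = new ArrayList<>();
        double running = 0;
        for (Map.Entry<String, Double> e : sorted) {
            result.add(e.getKey());
            running += e.getValue();
            if (running >= half) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> sales = new LinkedHashMap<>();
        sales.put("A", 30.0);
        sales.put("B", 35.0);
        sales.put("C", 20.0);
        sales.put("D", 15.0);
        // Total is 100, half is 50: B (35) then A (running total 65) reach it.
        System.out.println(topHalfClients(sales)); // [B, A]
    }
}
```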

From hard coding to JsonPath/XPath, JSON/XML processing technology went from nothing to something. From JsonPath/XPath to SPL, the computing power for multi-layer data goes from weak to strong. SPL has built-in professional data objects and rich calculation, string, and date functions, which give it enough computing power. SPL supports externalized scripts and hot swapping and computes multiple data sources in a consistent way, which helps build a better application architecture. SPL's more convenient function syntax suits multi-layer data with complex structure and can simplify complex business logic.