MongoDB -- extracting a substring from a field and computing grouped statistics

2022-06-26 06:05:00 【Big wind】

I have been busy lately, so apart from drafts written in advance I only have time for short posts.

Here is a quick write-up of a problem I ran into recently while computing metrics over data.

Computing metrics on part of a field

When computing metrics with MongoDB, you may run into documents like the following:

/* 1 */
{
    "_id" : ObjectId("5edf4b5c64574814bc8ae4ae"),
    "address" : "Henan,Xinyang",
    "state" : 0,
    "remark" : "Send successfully",
    "createAt" : NumberLong(1591199999000)
}

/* 2 */
{
    "_id" : ObjectId("5edf4ca064574814bc8ae4d5"),
    "address" : "Hubei,Wuhan",
    "state" : 0,
    "remark" : "Send successfully",
    "createAt" : NumberLong(1591199999000)
}

/* 3 */
{
    "_id" : ObjectId("5edf4cac64574814bc8ae4d9"),
    "address" : "Hubei,Yichang",
    "state" : 0,
    "remark" : "Send successfully",
    "createAt" : NumberLong(1591199999000)
}

Sometimes we need statistics by region, but the data is not entirely clean, so we may have to group by only part of a field.

In that case, $split or $substr can be used to derive the grouping key.

For example, suppose we need to count the records for each province in the data above.
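
As a rough illustration of the goal, the same grouping can be written in plain JavaScript over an in-memory array. This is only a sketch of the intended result, not MongoDB code; the `docs` array and the `countByProvince` helper are made up for this example.

```javascript
// Hypothetical in-memory copies of the sample documents (only address matters here).
const docs = [
  { address: "Henan,Xinyang" },
  { address: "Hubei,Wuhan" },
  { address: "Hubei,Yichang" },
];

// Hypothetical helper: treat the text before the first comma as the province
// and count how many documents fall into each province.
function countByProvince(documents) {
  const counts = new Map();
  for (const doc of documents) {
    const province = doc.address.split(",")[0].trim();
    counts.set(province, (counts.get(province) || 0) + 1);
  }
  // Same shape as the aggregation output: one { province, count } object per group.
  return [...counts].map(([province, count]) => ({ province, count }));
}

console.log(countByProvince(docs));
```

The aggregation pipelines in the following sections do the same work on the server, so the documents never have to be loaded into the application.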

Extracting the string with $split

db.getCollection('AreaDemoLog').aggregate([
    {
        "$project": {
            // First split the address field on "," to get an array named regions
            "regions": {
                "$split": ["$address", ","]
            }
        }
    },
    {
        "$project": {
            "regions": 1,
            // Then use $arrayElemAt to take the first element of regions and name it province
            "province": {
                "$arrayElemAt": ["$regions", 0]
            }
        }
    },
    {
        "$group": {
            // Group by province and count the documents in each group
            "_id": "$province",
            "count": {
                "$sum": 1
            }
        }
    },
    {
        "$project": {
            "count": 1,
            "_id": 0,
            "province": "$_id"
        }
    }
])
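
The two $project stages can also be folded into the $group key, because `_id` in `$group` accepts any aggregation expression. The following is a sketch of that variant, not the query used above. The `$trim` wrapper is an addition that assumes MongoDB 4.0 or later; it is only there to tolerate stray spaces around the province name.

```javascript
// Sketch: same grouping as above, with the province computed directly in the $group key.
const pipeline = [
  {
    $group: {
      _id: {
        // $trim (MongoDB 4.0+) is an assumption added here to strip surrounding whitespace.
        $trim: {
          input: { $arrayElemAt: [{ $split: ["$address", ","] }, 0] },
        },
      },
      count: { $sum: 1 },
    },
  },
  // Rename _id to province so the output matches the longer pipeline.
  { $project: { _id: 0, province: "$_id", count: 1 } },
];

// In the mongo shell this would be run as:
// db.getCollection('AreaDemoLog').aggregate(pipeline)
```

Keeping the intermediate `regions` field, as the longer pipeline does, can still be handy while debugging, since each stage's output can be inspected on its own.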

Extracting the string with $substrCP

db.getCollection('AreaDemoLog').aggregate([
    {
        "$project": {
            // Use $substrCP to take the first two code points of address as the province
            // (this assumes the province name is exactly two characters long)
            "province": {
                "$substrCP": ["$address", 0, 2]
            }
        }
    },
    {
        "$group": {
            // Group by province and count the documents in each group
            "_id": "$province",
            "count": {
                "$sum": 1
            }
        }
    },
    {
        "$project": {
            "count": 1,
            "_id": 0,
            "province": "$_id"
        }
    }
])

A note on string extraction

For strings made up only of English letters and digits, $substr works. For Chinese text, however, $substr can fail with the following error, depending on the offsets used:

$substrBytes:  Invalid range, ending index is in the middle of a UTF-8 character.

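This happens because $substr counts UTF-8 bytes (since MongoDB 3.4 it is an alias for $substrBytes), so its offsets are only safe for ASCII text. For other text, use $substrCP, introduced in MongoDB 3.4, which counts code points instead.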
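
The difference between byte offsets and code point offsets can be checked outside MongoDB. The sketch below uses the Node.js `Buffer` class to show that each of these Chinese characters takes three bytes in UTF-8, so a byte range of length 2 ends inside a character, while slicing by code points gives the expected province. The sample string `河南,信阳` is an assumed Chinese form of the first sample address.

```javascript
// Assumed Chinese form of the first sample address ("Henan,Xinyang").
const address = "河南,信阳";

// UTF-8 byte view: each of these CJK characters occupies 3 bytes, the comma 1 byte.
const bytes = Buffer.from(address, "utf8");
console.log(bytes.length); // 13

// A byte range of 0..2 covers only part of the first character, which is the
// situation $substrBytes rejects with "ending index is in the middle of a UTF-8 character".
const firstCharBytes = Buffer.byteLength("河", "utf8");
console.log(firstCharBytes); // 3

// Code point view, comparable to $substrCP: ["$address", 0, 2].
const province = Array.from(address).slice(0, 2).join("");
console.log(province); // 河南
```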

Both queries above return the correct result:

/* 1 */
{
    "count" : 16.0,
    "province" : "Hubei"
}

/* 2 */
{
    "count" : 1.0,
    "province" : "Henan"
}

Converting the queries to Java

In Java, the queries above take the following form.

Extracting the string with $split

    // Assumes this method lives in a Spring bean with an injected MongoTemplate field named mongoTemplate.
    // It is an instance method because a static method cannot use this.mongoTemplate.
    public String test() {
        List<AggregationOperation> lstOperations = new ArrayList<>(10);
        // Split the address into a regions array
        AggregationOperation splitAgg =
            Aggregation.project().andExpression("{ $split: {'$address', ','}}").as("regions");
        lstOperations.add(splitAgg);

        // Take the first element of regions as the province
        ProjectionOperation province =
            Aggregation.project("$regions").andExpression("{ $arrayElemAt: { '$regions', 0 }}").as("province");
        lstOperations.add(province);
        // Group by province and count
        AggregationOperation groupAgg = Aggregation.group("$province").count().as("count");
        lstOperations.add(groupAgg);
        // Shape the output fields
        ProjectionOperation projectionOperation =
            Aggregation.project("count").andExclude("_id").and("$_id").as("province");
        lstOperations.add(projectionOperation);

        AggregationOptions aggregationOptions = AggregationOptions.builder().allowDiskUse(true).build();
        // Run the aggregation
        Aggregation agg = Aggregation.newAggregation(lstOperations).withOptions(aggregationOptions);
        AggregationResults<Map> groupResult = this.mongoTemplate.aggregate(agg, "AreaDemoLog", Map.class);
        // The grouped rows are available through groupResult.getMappedResults()
        return "";
    }

Extracting the string with $substrCP

    // Assumes this method lives in a Spring bean with an injected MongoTemplate field named mongoTemplate.
    // It is an instance method because a static method cannot use this.mongoTemplate.
    public String test() {
        List<AggregationOperation> lstOperations = new ArrayList<>(10);
        // Take the first two code points of address as the province
        ProjectionOperation province =
            Aggregation.project().andExpression("{ $substrCP: { '$address', 0, 2 } }").as("province");
        lstOperations.add(province);
        // Group by province and count
        AggregationOperation groupAgg = Aggregation.group("$province").count().as("count");
        lstOperations.add(groupAgg);
        // Shape the output fields
        ProjectionOperation projectionOperation =
            Aggregation.project("count").andExclude("_id").and("$_id").as("province");
        lstOperations.add(projectionOperation);

        AggregationOptions aggregationOptions = AggregationOptions.builder().allowDiskUse(true).build();
        // Run the aggregation
        Aggregation agg = Aggregation.newAggregation(lstOperations).withOptions(aggregationOptions);
        AggregationResults<Map> groupResult = this.mongoTemplate.aggregate(agg, "AreaDemoLog", Map.class);
        // The grouped rows are available through groupResult.getMappedResults()
        return "";
    }

Points to note when converting the queries to Java

In the MongoDB queries we used the following expressions:

"$split": ["$address",","]

"$arrayElemAt": [ "$regions",0]

$substrCP:  [ '$address', 0, 2 ]

With MongoTemplate, if you write them the same way:

andExpression("{ $split: [ '$address', ',' ] }")

andExpression("{ $arrayElemAt: [ '$regions', 0] }")

andExpression("{ $substrCP: [ '$address', 0, 2 ] }")

the query fails with the following exception:

{
    "code": 1,
    "msg": "Expression [{ $split: ['$address', ',']}] @23: EL1043E: Unexpected token. Expected 'rsquare(])' but was 'comma(,)'"
}

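So when moving these expressions into Java, replace "[...]" with "{...}". The string passed to andExpression is parsed as a SpEL expression, where square brackets are the index operator and braces denote an inline list.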

My own knowledge is limited, so parts of this post may be unclear or wrong. If you spot a problem, please let me know and I will fix it as soon as I can. If this article helped you, please give it a like. Your support keeps me writing.

Original site

Copyright notice
This article was written by [Big wind]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/177/202206260559105710.html