MongoDB: extracting part of a string field and grouping statistics by it
2022-06-26 06:05:00 【Big wind】
I've been busy lately, so apart from the posts I had already drafted, I only have time to write up simple things.
Here is a short note on a problem I ran into recently while computing metrics over data.
Computing statistics on part of a field
When computing metrics with MongoDB, you may come across documents like these:
/* 1 */
{
"_id" : ObjectId("5edf4b5c64574814bc8ae4ae"),
"address" : "Henan,Xinyang",
"state" : 0,
"remark" : "Send successfully",
"createAt" : NumberLong(1591199999000)
}
/* 2 */
{
"_id" : ObjectId("5edf4ca064574814bc8ae4d5"),
"address" : "Hubei,Wuhan",
"state" : 0,
"remark" : "Send successfully",
"createAt" : NumberLong(1591199999000)
}
/* 3 */
{
"_id" : ObjectId("5edf4cac64574814bc8ae4d9"),
"address" : "Hubei,Yichang",
"state" : 0,
"remark" : "Send successfully",
"createAt" : NumberLong(1591199999000)
}
Sometimes we need statistics by region, but the data is not perfectly clean, and the value we want to group by is only part of a field.
In that case, use $split or $substrCP to extract that part and then group on it.
For example, with the data above we want to count the records for each province.
Extracting the province with $split
db.getCollection('AreaDemoLog').aggregate([
{
"$project": {
// First use $split to cut the address field on "," into an array named regions
"regions": {
"$split": ["$address",","]
}
}
},
{
"$project": {
"regions": 1,
// Then use $arrayElemAt to take the first element of regions and name it province
"province": {
"$arrayElemAt": [ "$regions",0]
}
}
},
{
"$group": {
// Group by province and count the documents in each group
"_id": "$province",
"count": {
"$sum": 1
}
}
},
{
"$project": {
"count": 1,
"_id": 0,
"province": "$_id"
}
}
])
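To make the three stages concrete, here is a minimal plain-Java sketch of what the pipeline computes, run over an in-memory list instead of a collection. The class name, method name, and sample addresses are illustrative assumptions, not part of the original query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProvinceCountSketch {

    // Mirrors the pipeline stages: $split on ",", $arrayElemAt index 0,
    // then $group with $sum: 1.
    public static Map<String, Integer> countByProvince(List<String> addresses) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String address : addresses) {
            String[] regions = address.split(",");    // like $split
            String province = regions[0];             // like $arrayElemAt: [ "$regions", 0 ]
            counts.merge(province, 1, Integer::sum);  // like $group with $sum: 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample values modelled on the documents above.
        List<String> addresses = List.of("Henan,Xinyang", "Hubei,Wuhan", "Hubei,Yichang");
        System.out.println(countByProvince(addresses)); // prints {Henan=1, Hubei=2}
    }
}
```

The final $project stage of the pipeline only renames _id to province, so it has no counterpart here: the map key already plays that role.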
Extracting the province with $substrCP
db.getCollection('AreaDemoLog').aggregate([
{
"$project": {
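// Use $substrCP to take the first two code points of address as the province directly
// This relies on the province being the first two characters of address, as with two-character Chinese province names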
"province": {
$substrCP: [ '$address', 0, 2 ]
}
}
},
{
"$group": {
// Group by province and count the documents in each group
"_id": "$province",
"count": {
"$sum": 1
}
}
},
{
"$project": {
"count": 1,
"_id": 0,
"province": "$_id"
}
}
])
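One caveat of the fixed-length approach: it assumes every province name has the same length. The plain-Java sketch below compares a two-code-point prefix with a cut at the first comma, using a hypothetical address whose province name has three characters (Heilongjiang, written 黑龙江). The class, the method names, and the sample value are assumptions for illustration only.

```java
public class PrefixVsSplitSketch {

    // Like $substrCP: [ "$address", 0, n ]: the first n code points of s.
    public static String firstCodePoints(String s, int n) {
        int count = Math.min(n, s.codePointCount(0, s.length()));
        int end = s.offsetByCodePoints(0, count);
        return s.substring(0, end);
    }

    // Like $split on "," followed by $arrayElemAt index 0.
    public static String beforeComma(String s) {
        int i = s.indexOf(',');
        return i < 0 ? s : s.substring(0, i);
    }

    public static void main(String[] args) {
        // Hypothetical address "黑龙江,哈尔滨" (Heilongjiang, Harbin),
        // written with Unicode escapes so the source file stays ASCII.
        String address = "\u9ED1\u9F99\u6C5F,\u54C8\u5C14\u6EE8";
        System.out.println(firstCodePoints(address, 2)); // 黑龙 (name cut short)
        System.out.println(beforeComma(address));        // 黑龙江 (full name)
    }
}
```

If province names in your data vary in length, the $split version is the safer choice; the $substrCP version is shorter when the length is known to be fixed.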
A note on string extraction
For strings made up only of English letters and digits you can use $substr, but on Chinese text $substr can fail with the following error, depending on where the cut falls:
$substrBytes: Invalid range, ending index is in the middle of a UTF-8 character.
This is because $substr (an alias for $substrBytes since MongoDB 3.4) counts UTF-8 bytes, so it is only safe for ASCII text. For other text, use $substrCP, introduced in MongoDB 3.4, which counts code points instead.
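The condition behind that error message can be reproduced in miniature without a database. This plain-Java sketch (class and method names are illustrative assumptions) checks whether a byte offset falls on a character boundary in UTF-8, which is what a byte-based cut has to satisfy.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BoundarySketch {

    // True when a byte-based cut at `index` would not split a multi-byte character.
    public static boolean isCharBoundary(byte[] utf8, int index) {
        if (index <= 0 || index >= utf8.length) {
            return true; // the start and end of the string are always boundaries
        }
        // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return (utf8[index] & 0xC0) != 0x80;
    }

    public static void main(String[] args) {
        byte[] ascii = "Henan".getBytes(StandardCharsets.UTF_8);
        // "河南" (Henan), written with Unicode escapes so the source stays ASCII.
        byte[] chinese = "\u6CB3\u5357".getBytes(StandardCharsets.UTF_8);

        System.out.println(ascii.length);               // 5: one byte per character
        System.out.println(chinese.length);             // 6: three bytes per character
        System.out.println(isCharBoundary(ascii, 2));   // true
        System.out.println(isCharBoundary(chinese, 2)); // false: middle of the first character
        System.out.println(isCharBoundary(chinese, 3)); // true: start of the second character
    }
}
```

A byte length of 2 therefore works on the ASCII string but lands inside the first character of the Chinese one, while a code-point length of 2 means "two characters" in both cases.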
Both queries above (the $split version and the $substrCP version) return the correct result:
/* 1 */
{
"count" : 16.0,
"province" : "Hubei"
}
/* 2 */
{
"count" : 1.0,
"province" : "Henan"
}
Converting the queries to Java
In Java code, the queries above take the following form.
Extracting the province with $split
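// Imports needed at the top of the enclosing class file
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationOperation;
import org.springframework.data.mongodb.core.aggregation.AggregationOptions;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.aggregation.ProjectionOperation;

// Not static: the method uses this.mongoTemplate, which needs an instance
public String test() {
    List<AggregationOperation> lstOperations = new ArrayList<>(10);
    // Split the address into a regions array
    AggregationOperation splitAgg =
        Aggregation.project().andExpression("{ $split: {'$address', ','}}").as("regions");
    lstOperations.add(splitAgg);
    // Take the first region as the province
    ProjectionOperation province =
        Aggregation.project("$regions").andExpression("{ $arrayElemAt: { '$regions', 0 }}").as("province");
    lstOperations.add(province);
    // Group by province and count
    AggregationOperation groupAgg = Aggregation.group("$province").count().as("count");
    lstOperations.add(groupAgg);
    // Shape the output: keep count, drop _id, expose the group key as province
    ProjectionOperation projectionOperation =
        Aggregation.project("count").andExclude("_id").and("$_id").as("province");
    lstOperations.add(projectionOperation);
    AggregationOptions aggregationOptions = AggregationOptions.builder().allowDiskUse(true).build();
    // Run the aggregation
    Aggregation agg = Aggregation.newAggregation(lstOperations).withOptions(aggregationOptions);
    AggregationResults<Map> groupResult = this.mongoTemplate.aggregate(agg, "AreaDemoLog", Map.class);
    return "";
}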
Extracting the province with $substrCP
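// Imports needed at the top of the enclosing class file
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationOperation;
import org.springframework.data.mongodb.core.aggregation.AggregationOptions;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.aggregation.ProjectionOperation;

// Not static: the method uses this.mongoTemplate, which needs an instance
public String test() {
    List<AggregationOperation> lstOperations = new ArrayList<>(10);
    // Take the first two code points of address as the province
    ProjectionOperation province =
        Aggregation.project().andExpression("{ $substrCP: { '$address', 0, 2 } }").as("province");
    lstOperations.add(province);
    // Group by province and count
    AggregationOperation groupAgg = Aggregation.group("$province").count().as("count");
    lstOperations.add(groupAgg);
    // Shape the output: keep count, drop _id, expose the group key as province
    ProjectionOperation projectionOperation =
        Aggregation.project("count").andExclude("_id").and("$_id").as("province");
    lstOperations.add(projectionOperation);
    AggregationOptions aggregationOptions = AggregationOptions.builder().allowDiskUse(true).build();
    // Run the aggregation
    Aggregation agg = Aggregation.newAggregation(lstOperations).withOptions(aggregationOptions);
    AggregationResults<Map> groupResult = this.mongoTemplate.aggregate(agg, "AreaDemoLog", Map.class);
    return "";
}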
Things to watch when converting the queries to Java
In the MongoDB queries we used the following expressions:
"$split": ["$address",","]
"$arrayElemAt": [ "$regions",0]
$substrCP: [ '$address', 0, 2 ]
With MongoTemplate, if you pass them to andExpression unchanged, like this:
andExpression("{ $split: [ '$address', ',' ] }")
andExpression("{ $arrayElemAt: [ '$regions', 0] }")
andExpression("{ $substrCP: [ '$address', 0, 2 ] }")
the query fails with the following exception:
{
"code": 1,
"msg": "Expression [{ $split: ['$address', ',']}] @23: EL1043E: Unexpected token. Expected 'rsquare(])' but was 'comma(,)'"
}
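This happens because andExpression parses its argument as a SpEL expression, and SpEL writes list literals with braces rather than square brackets. So when porting the expressions above to Java, change "[...]" to "{...}".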
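If you have many shell expressions to port, the bracket-to-brace rule can be applied mechanically. The helper below is a hypothetical sketch in plain Java and is not part of Spring Data; it only rewrites the string, and it assumes single-quoted string literals with no escaped quotes inside them.

```java
public class SpelListSyntaxSketch {

    // Hypothetical helper: rewrites [ ... ] to { ... } everywhere except
    // inside single-quoted string literals, so a delimiter such as '[' survives.
    public static String bracketsToBraces(String expr) {
        StringBuilder out = new StringBuilder(expr.length());
        boolean inQuote = false;
        for (char c : expr.toCharArray()) {
            if (c == '\'') {
                inQuote = !inQuote; // toggle on every single quote
            }
            if (!inQuote && c == '[') {
                out.append('{');
            } else if (!inQuote && c == ']') {
                out.append('}');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(bracketsToBraces("{ $split: [ '$address', ',' ] }"));
        // prints { $split: { '$address', ',' } }
    }
}
```

The rewritten string is what you would then pass to andExpression; whether a given operator is accepted still depends on your Spring Data MongoDB version, so treat the output as a starting point to test rather than a guarantee.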
My knowledge is limited, so some of the above may be unclear or wrong. If you spot a problem, please let me know and I will correct it as soon as possible. If this article helped you, please give it a like; your support is what keeps me writing.