Using es to implement fuzzy search and search suggestions for a personal blog
2022-07-24 23:55:00 【Brother Dei!】
Contents
Applying es to the blog
Requirements
Things to consider when connecting the blog system to es:
1. How to synchronize database data to es and handle the mapping
2. Technology selection for fuzzy search + smart suggestions
3. Pros and cons of the selected technologies
4. Highlighting (for each hit, highlight the title, summary, and article content, with 50-character excerpts)
5. Using the Java API
Result
Two query methods are provided: es and MySQL
The es version is shown below. It has search suggestions (matched by prefix), searches multiple fields (article title, summary, content), and supports fuzzy queries (typo correction and so on).


MySQL query: uses a LIKE statement

canal handles data synchronization (from MySQL to es)
Insert data


canal responds here:

The new data can now be found through es search


Design
1. How to synchronize database data to es and handle the mapping
1.1 Use Logstash for full synchronization; use Canal for incremental synchronization
1.2 Tokenizer: ik; filter: lowercase
1.3 Database tables to synchronize to es: the article table, the tag table, and the category table
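As a rough illustration of 1.2, the index could be created with a custom analyzer that combines an ik tokenizer with the lowercase filter. This is only a sketch: the analyzer name ik_lowercase, the choice of ik_max_word (rather than ik_smart), and the field types are assumptions, and the field names are borrowed from the sync sql later in this post. It builds the request body only and does not talk to es.

```python
import json


def build_article_index(tokenizer="ik_max_word"):
    """Build an index-creation body: ik tokenizer + lowercase filter.

    The analyzer name "ik_lowercase" and the field types are assumptions
    made for illustration; the ik plugin must be installed in es.
    """
    text_field = {"type": "text", "analyzer": "ik_lowercase"}
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "ik_lowercase": {
                        "type": "custom",
                        "tokenizer": tokenizer,   # ik_max_word or ik_smart
                        "filter": ["lowercase"],  # lowercase filtering
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "title": dict(text_field),
                "summary": dict(text_field),
                "content": dict(text_field),
                "category_name": {"type": "keyword"},
                "tag_name": {"type": "keyword"},
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON can be sent as the body of an index-creation request.
    print(json.dumps(build_article_index(), indent=2))
```

The body would be sent once when the index is created, before Logstash or canal writes any data into it.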
2. Technology selection for fuzzy search + smart suggestions
2.1 Fuzzy search uses: fuzzy
2.2 Smart suggestions: context suggester / completion suggester
3. Pros and cons of the selected technologies
3.1 The pros and cons of fuzzy have to be weighed against prefix and wildcard.
First, prefix performs poorly, has no cache, and only supports prefix matching, so it is weak functionally. wildcard queries are not very practical here, and the blog does not need a high level of error correction. On balance, fuzzy is good enough functionally and also performs well.
[Setting fuzziness to 1 is enough; with 2 too many wrong matches come back]
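A minimal sketch of what such a request body could look like, assuming the three searched fields are named title, summary, and content (as in the sync sql later in this post). It wraps one fuzzy clause per field in a bool/should. Note that fuzzy is a term-level query, so the keyword is not analyzed before matching; the function name and the default size are made up for the example.

```python
import json

SEARCH_FIELDS = ("title", "summary", "content")  # assumed field names


def build_fuzzy_query(keyword, fuzziness=1, size=10):
    """Build a search body that matches `keyword` fuzzily in any field."""
    should = [
        {"fuzzy": {field: {"value": keyword, "fuzziness": fuzziness}}}
        for field in SEARCH_FIELDS
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": 1,  # at least one field must match
            }
        },
    }


if __name__ == "__main__":
    # A one-character typo ("elastik") is within fuzziness 1 of "elastic".
    print(json.dumps(build_fuzzy_query("elastik"), indent=2))
```

Keeping fuzziness as a parameter makes it easy to compare 1 and 2 on real data before settling on a value.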
3.2 The context suggester is a suggestion scheme built on the completion suggester. It uses completion, a field type specific to es, which is memory-based and fast. It supports prefix suggestions and can also narrow suggestions by category and tag, which makes it more powerful. The drawback is that you have to decide on the input values when inserting data, so it is harder to use than the plain completion suggester.
[(For convenience) the input value can be taken from the title, and the category and tag can be inserted directly into the sub-fields of suggest; boost (weight) is not considered]
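The pieces involved can be sketched as follows: a completion field with two category-type contexts, the value stored in that field for each article, and the suggest request. The field name suggest, the context names category and tag, and the suggestion name title_suggest are assumptions for illustration. As far as I know, es 7 requires a context to be supplied when querying a context-enabled completion field, so the sketch makes the category mandatory.

```python
import json

# Mapping fragment for the completion field (names are assumptions).
SUGGEST_MAPPING = {
    "suggest": {
        "type": "completion",
        "contexts": [
            {"name": "category", "type": "category"},
            {"name": "tag", "type": "category"},
        ],
    }
}


def build_suggest_field(title, category, tags):
    """Value of the `suggest` field for one article: input comes from the title."""
    return {
        "input": [title],
        "contexts": {"category": [category], "tag": list(tags)},
    }


def build_suggest_query(prefix, category, size=5):
    """Suggest request: prefix match, restricted to one category."""
    return {
        "suggest": {
            "title_suggest": {
                "prefix": prefix,
                "completion": {
                    "field": "suggest",
                    "size": size,
                    "skip_duplicates": True,
                    "contexts": {"category": [category]},
                },
            }
        }
    }


if __name__ == "__main__":
    doc = build_suggest_field("Elasticsearch notes", "backend", ["es", "search"])
    print(json.dumps(doc))
    print(json.dumps(build_suggest_query("elas", "backend"), indent=2))
```

No boost is set on the contexts, in line with the note above that weight is not considered.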
4. Highlighting (for each hit, highlight the title, summary, and article content, with 50-character excerpts)
Problem encountered: at first I could not get this to work, because the highlight tags were displayed as literal strings once they reached the front end.

In the end I selected the corresponding elements in the finally block and handled them there.
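For reference, a sketch of the highlight part of the request, plus one possible server-side way to deal with the literal-tag problem. This is not the front-end fix described above: it is an alternative that escapes everything in a fragment except the highlight tags, so the result can be inserted as HTML. The em tags, the field names, and both function names are assumptions.

```python
import html

PRE_TAG, POST_TAG = "<em>", "</em>"  # assumed highlight tags


def build_highlight(fragment_size=50):
    """Highlight section of a search body: 50-character excerpts per field."""
    field_opts = {"fragment_size": fragment_size}
    return {
        "pre_tags": [PRE_TAG],
        "post_tags": [POST_TAG],
        "fields": {
            "title": dict(field_opts),
            "summary": dict(field_opts),
            "content": dict(field_opts),
        },
    }


def render_fragment(fragment):
    """Escape a highlighted fragment, then restore only the highlight tags."""
    escaped = html.escape(fragment)
    escaped = escaped.replace(html.escape(PRE_TAG), PRE_TAG)
    return escaped.replace(html.escape(POST_TAG), POST_TAG)


if __name__ == "__main__":
    print(render_fragment("use <em>es</em> with <b>canal</b>"))
```

Whatever markup was already in the article text stays escaped, and only the tags that es added are rendered as real elements.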

5. Using the Java API
5.1 Import the dependency that matches your es version
5.2 Create an es_service package in the project
5.3 Create the model class and the operation class for the es index
6. canal data synchronization details
1. deployer
This is the canal server. Configuration required:
Go to deployer\conf\example and edit instance.properties
# Database connection address
canal.instance.master.address=127.0.0.1:3306
# Fill in the database credentials
canal.instance.dbUsername=root
canal.instance.dbPassword=root
canal.instance.connectionCharset = UTF-8
# table regex
canal.instance.filter.regex = .*\\..*
2. adapter
2.1 Go to the root directory and edit application.yml
server:
  port: 8081
# Excerpt: in the stock application.yml, srcDataSources and canalAdapters sit under the canal.conf key
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/test?useUnicode=true
      username: root
      password: root
  canalAdapters:
  - instance: example # canal instance name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
      - name: es7
        hosts: http://127.0.0.1:9200
        properties:
          mode: rest # transport or rest
          # security.auth: test:123456 # only used for rest mode
          cluster.name: elasticsearch # cluster name
2.2 Go to the es7 directory and create a configuration file named after the database
For example, I want to sync the data of the database 'test' to es, so I create test.yml in this directory.
Contents of test.yml:
dataSourceKey: defaultDS # key of the source data source; matches an entry under srcDataSources
destination: example # canal instance
groupId: g1 # only data for the matching groupId is synced
esMapping:
  _index: my_table # es index name
  _id: _id # es _id; if this is not configured, pk below must be
  upsert: true
  # pk: id
  sql: "select id as _id,name from my_table"
  # in the sql statement, id must be aliased to _id
  # objFields:
  #   _labels: array:;
  etlCondition: "where a.c_time>={}"
  commitBatch: 3000 # commit batch size
Note:
The index must already exist in es, and its mapping should match the MySQL fields.
Pitfalls along the way:
1. The installation path contained Chinese characters, so the adapter exited immediately on startup.
2. In the adapter's application.yml, the es host was not written correctly. When this is wrong, the adapter log still reacts to database updates, but es never receives the data.
- name: es7
  hosts: http://127.0.0.1:9200 # without http:// the adapter reports: Illegal character in scheme name at index 0
3. Test whether an index with missing fields still works
SELECT id AS id, createtime,desc,lv,NAME,price FROM product
MySQL table:

es index:

Two fields were deliberately left out.
The experiment shows that this works: the es index does not need to have the same number of fields as the MySQL table. Field names are matched case-sensitively, however, so be careful about naming when writing the sql statement in the configuration file under the es7 folder.

4. Another pitfall (how to synchronize multiple tables)
Would it work to write several yml files that point to the same index?
Here is the description of the sql mapping:
The sql supports free combinations of multi-table joins, with some limitations:
1. The main table cannot be a subquery.
2. Only left outer join can be used, i.e. the leftmost table must be the main table.
3. If a joined table is a subquery, it cannot contain multiple tables.
4. The main sql cannot have a where condition (a subquery on a joined table may have one, but this is not recommended because it may make the synchronized data inconsistent, e.g. when a field used in the where condition is modified).
5. Join conditions only allow '=' comparisons and cannot contain other constant conditions, e.g.: on a.role_id=b.id and b.statues=1.
6. A field of the join condition must appear in the main select, e.g. for on a.role_id=b.id, either a.role_id or b.id must appear in the main select.
The properties of the Elasticsearch mapping correspond one-to-one to the columns of the sql query (select * is not supported). For example, in select a.id as _id, a.name, a.email as _email from user, name maps to the name field of the es mapping and _email maps to the _email field; the alias (if there is one) is used as the final mapping field. The _id here can be used in the _id: _id setting of the configuration file.
After reading this, I tried adding the id of the joined table to the main select, and it worked:
sql: "SELECT DISTINCT ma.id AS _id,mab.id as mab_id,ma.comment_counts,
ma.create_date,ma.summary,ma.title,mab.content,
ma.view_counts,ma.weight,ma.summary_img
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id"
Next, try adding the ids of all joined tables to the main select:
sql: "SELECT DISTINCT ma.id AS _id,ma.comment_counts,
ma.create_date,ma.summary,ma.title,mab.id as mab_id,
ma.view_counts,ma.weight,msu.account,mc.id as mc_id,
mab.content ,mc.category_name ,mat.article_id as mat_id,
ma.summary_img,mt.tag_name,mat.tag_id as mat_tag_id,
msu.id as msu_id
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id
LEFT JOIN ms_category mc ON mc.id=ma.category_id
LEFT JOIN ms_article_tag mat ON mat.article_id = ma.id
LEFT JOIN ms_tag mt ON mt.id=mat.tag_id
LEFT JOIN ms_sys_user msu ON ma.author_id= msu.id"
No errors are reported now. However, I still had to test whether inserts, updates, and deletes across the tables are synchronized correctly.
1. Deleting works.
2. Updating works.
3. Inserting failed at first. After investigating, I found that every column used in a join condition has to be added to the select list.
For example, in ma.body_id = mab.id, ma is my main table and mab is the joined table, so the main select has to include the main table's body_id. The sql above was therefore still wrong; after the fix it becomes:
sql: "SELECT DISTINCT ma.id AS _id,ma.comment_counts,
ma.create_date,ma.summary,ma.title,ma.body_id,
ma.view_counts,ma.weight,msu.account,ma.category_id,
mab.content ,mc.category_name ,mat.article_id as mat_id,
ma.summary_img,mt.tag_name,mat.tag_id as mat_tag_id,
ma.author_id as mau_id,mt.id as mt_it
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id
LEFT JOIN ms_category mc ON mc.id=ma.category_id
LEFT JOIN ms_article_tag mat ON mat.article_id = ma.id
LEFT JOIN ms_tag mt ON mt.id=mat.tag_id
LEFT JOIN ms_sys_user msu ON ma.author_id= msu.id"
Done.
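The lesson from the insert failure (select the main table's join columns) can be checked mechanically before deploying a mapping file. The helper below is a hypothetical, regex-based sketch, not part of canal: it only understands simple SELECT ... FROM ... LEFT JOIN ... ON a = b statements like the ones in this post, and it encodes my observation rather than an official canal rule.

```python
import re

_NOT_ALIAS = {"left", "right", "inner", "join", "where", "group", "order", "limit"}


def missing_main_join_columns(sql):
    """Return main-table columns used in ON conditions but absent from SELECT."""
    flat = " ".join(sql.split())  # collapse line breaks and extra spaces
    head = re.search(
        r"select\s+(?:distinct\s+)?(.*?)\s+from\s+(\w+)(?:\s+(?:as\s+)?(\w+))?",
        flat,
        re.IGNORECASE,
    )
    if head is None:
        raise ValueError("not a simple SELECT ... FROM statement")
    select_list, table, alias = head.groups()
    # Use the alias of the main table if there is one, else the table name.
    main = table if alias is None or alias.lower() in _NOT_ALIAS else alias
    prefix = main.lower() + "."
    # First token of each select item, i.e. the column without "AS alias".
    selected = {
        item.split()[0].lower() for item in select_list.split(",") if item.strip()
    }
    missing = []
    for pair in re.findall(r"\bon\s+([\w.]+)\s*=\s*([\w.]+)", flat, re.IGNORECASE):
        for col in pair:
            low = col.lower()
            if low.startswith(prefix) and low not in selected and col not in missing:
                missing.append(col)
    return missing


if __name__ == "__main__":
    sql = """SELECT ma.id AS _id, mab.id as mab_id, mab.content
             FROM ms_article ma
             LEFT JOIN ms_article_body mab ON ma.body_id = mab.id"""
    print(missing_main_join_columns(sql))
```

An empty list means every main-table column used in a join condition is also selected, which is the state the final sql above reaches.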
Gitee repository
This took some effort to put together, so likes and stars are appreciated~