Using ES for fuzzy search and search suggestions on a personal blog
2022-07-24 23:55:00 【Brother Dei!】
Contents
Applying it to the blog
Requirements
Work out how to integrate ES into the blog system.
1. How to synchronize database data to ES, and how to handle the mapping
2. Technology selection for fuzzy search + smart suggestions
3. Pros and cons of the selected technologies
4. Highlighting (title, summary, and article content of each hit, excerpted to 50 characters)
5. Using the Java API
Results
Two query methods are provided: ES and MySQL.
The first is ES. It offers search suggestions (matched by prefix), multi-field search (article title, summary, content), and fuzzy queries (typo correction and so on).

MySQL query: uses a LIKE statement.

Canal handles data synchronization (from MySQL to ES)
Insert data:

Canal responds here:

The new data can now be found through ES search.

Design
1. How to synchronize database data to ES, and how to handle the mapping
1.1 Use Logstash for full synchronization and Canal for incremental synchronization.
1.2 Analyzer choice: the ik tokenizer; filter: lowercase.
1.3 Database tables to synchronize to ES: the article table, the tag table, and the category table.
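As a rough illustration of 1.2, the analyzer choice could be expressed as the body of a create-index request like the one below. This is only a sketch: the analyzer name ik_lowercase is made up, the field names are taken from the article SQL later in this post, and ik_max_word assumes the IK plugin is installed on the ES node.

```python
import json


def build_article_index_body():
    """Build a create-index body: IK tokenizer followed by a lowercase filter."""
    settings = {
        "analysis": {
            "analyzer": {
                # "ik_lowercase" is a hypothetical name chosen for this sketch.
                "ik_lowercase": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",  # provided by the IK plugin
                    "filter": ["lowercase"],
                }
            }
        }
    }
    # The three text fields searched in this post all share the same analyzer.
    properties = {
        field: {"type": "text", "analyzer": "ik_lowercase"}
        for field in ("title", "summary", "content")
    }
    return {"settings": settings, "mappings": {"properties": properties}}


index_body = build_article_index_body()
print(json.dumps(index_body, indent=2))
```

The printed JSON is what would be sent when creating the index, before Canal starts writing to it.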
2. Technology selection for fuzzy search + smart suggestions
2.1 Fuzzy search: fuzzy
2.2 Smart suggestions: context suggester / completion suggester
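For 2.1, a multi-field fuzzy search could look roughly like the request body below. It is a sketch rather than the query used in the blog: it assumes a multi_match query over the title, summary, and content fields, and the helper name build_fuzzy_query is invented for illustration.

```python
import json


def build_fuzzy_query(keyword, fuzziness=1, size=10):
    """Build a search body that tolerates small typos across three fields."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keyword,
                # Field names assumed from the article SQL later in this post.
                "fields": ["title", "summary", "content"],
                # Maximum edit distance allowed per term.
                "fuzziness": fuzziness,
            }
        },
    }


search_body = build_fuzzy_query("elasticsaerch")
print(json.dumps(search_body, indent=2))
```

The body is plain JSON, so it can be sent with whichever client the project uses, including the Java API described below.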
3. Pros and cons of the selected technologies
3.1 The pros and cons of fuzzy search are best judged against prefix and wildcard queries.
First, prefix performs very poorly, has no cache, and only supports prefix matching, so it is functionally weak. wildcard queries are not very practical either. The level of typo correction needed here is modest, so taken together fuzzy is good enough functionally and performs well too.
[Setting fuzziness to 1 is enough; with 2 the rate of wrong matches is too high.]
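The remark about fuzziness can be made concrete with a small edit-distance experiment. The sketch below is not how ES evaluates fuzzy queries internally. It only computes the edit distance (with adjacent transpositions counted as one edit) between a mistyped term and a made-up vocabulary, to show how many more terms a distance of 2 lets in.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def terms_within(term, vocabulary, max_distance):
    """Return the vocabulary terms a fuzzy match would accept at this distance."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_distance]


# Hypothetical indexed terms; "serach" is a typo for "search".
vocabulary = ["search", "starch", "breach", "reach", "teach", "sketch"]
matches_1 = terms_within("serach", vocabulary, 1)
matches_2 = terms_within("serach", vocabulary, 2)
print(matches_1)  # ['search']
print(matches_2)  # ['search', 'starch', 'breach', 'reach', 'teach']
```

With a distance of 1 only the intended word matches; with 2, four unrelated words match as well, which is the kind of noise described above.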
3.2 The context suggester is a smart suggestion scheme built on the completion suggester. It uses completion, a field type specific to ES, which is memory-based and therefore fast. It handles prefix suggestions and can also suggest by category and tag, which makes it more powerful. The drawback is that you have to decide on the input values when inserting data, so it is harder to use than the plain completion suggester.
[(For convenience) the title can be used as the input value, with categories and tags inserted directly into the sub-fields of suggest; boost (weight) is left out of consideration.]
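The scheme in 3.2 could be sketched as three JSON bodies: the mapping, a document, and the suggest request. The names suggest, category, tag, and article_suggest, and all three helper functions, are assumptions made for this example and are not taken from the blog's code.

```python
import json


def build_suggest_mapping():
    """Mapping for a completion field with two category contexts."""
    return {
        "properties": {
            "suggest": {
                "type": "completion",
                "contexts": [
                    {"name": "category", "type": "category"},
                    {"name": "tag", "type": "category"},
                ],
            }
        }
    }


def build_suggest_document(title, category, tags):
    """Use the title as the input value; category and tags become contexts."""
    return {
        "title": title,
        "suggest": {
            "input": [title],
            "contexts": {"category": [category], "tag": list(tags)},
        },
    }


def build_suggest_request(prefix, category, size=5):
    """Prefix suggestion restricted to one category."""
    return {
        "suggest": {
            "article_suggest": {  # hypothetical suggestion name
                "prefix": prefix,
                "completion": {
                    "field": "suggest",
                    "size": size,
                    "skip_duplicates": True,
                    "contexts": {"category": [category]},
                },
            }
        }
    }


document = build_suggest_document("Canal sync notes", "backend", ["canal", "es"])
request = build_suggest_request("can", "backend")
print(json.dumps(request, indent=2))
```

No weight is set anywhere, matching the decision above to leave boost out.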
4. Highlighting (title, summary, and article content of each hit, excerpted to 50 characters)
Problem encountered: at first we could not get this to work, because the highlight tags were displayed as literal strings once they reached the front end.

It was solved later by picking out the corresponding tags in the finally block and handling them there.
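One way to avoid highlight tags showing up as text is sketched below. It is not the fix used in this blog, just an alternative under stated assumptions: the highlight section asks for 50-character fragments wrapped in em tags, and a hypothetical helper render_fragment escapes everything in a fragment except those tags, so the result can be inserted as HTML.

```python
import html

PRE_TAG = "<em>"
POST_TAG = "</em>"


def build_highlight_section():
    """Highlight settings: one fragment of about 50 characters per field."""
    options = {"fragment_size": 50, "number_of_fragments": 1}
    return {
        "pre_tags": [PRE_TAG],
        "post_tags": [POST_TAG],
        "fields": {name: dict(options) for name in ("title", "summary", "content")},
    }


def render_fragment(fragment):
    """Escape the whole fragment, then re-enable only the highlight tags."""
    escaped = html.escape(fragment)
    escaped = escaped.replace(html.escape(PRE_TAG), PRE_TAG)
    return escaped.replace(html.escape(POST_TAG), POST_TAG)


highlight = build_highlight_section()
safe = render_fragment("sync with <em>canal</em> & <script>")
print(safe)  # sync with <em>canal</em> &amp; &lt;script&gt;
```

The front end still has to insert the returned string as HTML rather than as text. If an article body itself contains a literal em tag, this helper would re-enable it too.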

5. Using the Java API
5.1 Import the dependency that matches your ES version.
5.2 Create an es_service package under the project.
5.3 Create the model class and the operation class for the ES index.
6. Canal data synchronization details
1. deployer
This is the Canal server. Configuration required:
Go to deployer\conf\example and edit instance.properties:
# database connection address
canal.instance.master.address=127.0.0.1:3306
# database credentials
canal.instance.dbUsername=root
canal.instance.dbPassword=root
canal.instance.connectionCharset = UTF-8
# table regex
canal.instance.filter.regex = .*\\..*
2. adapter
2.1 Go to the root directory and edit application.yml:
server:
  port: 8081
# excerpt: in the stock application.yml the two keys below are nested under canal.conf
srcDataSources:
  defaultDS:
    url: jdbc:mysql://127.0.0.1:3306/test?useUnicode=true
    username: root
    password: root
canalAdapters:
- instance: example # canal instance name or mq topic name
  groups:
  - groupId: g1
    outerAdapters:
    - name: logger
    - name: es7
      hosts: http://127.0.0.1:9200
      properties:
        mode: rest # transport or rest
        # security.auth: test:123456 # only used for rest mode
        cluster.name: elasticsearch # cluster name
2.2 Go to the es7 directory and create a configuration file named after the database.
For example, here I want to synchronize the data of the database 'test' to ES, so I create test.yml in that directory.
Write test.yml:
dataSourceKey: defaultDS # key of the source data source; matches an entry under srcDataSources
destination: example # canal instance
groupId: g1 # only data for the matching groupId is synchronized
esMapping:
  _index: my_table # ES index name
  _id: _id # ES _id; if this is not configured, pk below must be configured
  upsert: true
  # pk: id
  sql: "select id as _id,name from my_table"
  # when writing the sql, id must be aliased to _id
  # objFields:
  #   _labels: array:;
  etlCondition: "where a.c_time>={}"
  commitBatch: 3000 # commit batch size
Note:
The index must already exist in ES, and its mapping must match the MySQL fields.
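Since the index has to exist before Canal writes to it, the mapping can be derived from the column list instead of being typed by hand. The sketch below assumes a deliberately tiny MySQL-to-ES type table (MYSQL_TO_ES) and a hypothetical helper build_index_body; extend or correct both for a real schema.

```python
import json

# Assumed type correspondence for this sketch only; not an exhaustive table.
MYSQL_TO_ES = {
    "int": "integer",
    "bigint": "long",
    "varchar": "text",
    "text": "text",
}


def build_index_body(columns):
    """Turn {column name: MySQL type} into an ES create-index body."""
    properties = {}
    for name, mysql_type in columns.items():
        base_type = mysql_type.split("(")[0].strip().lower()
        if base_type not in MYSQL_TO_ES:
            raise ValueError(f"no ES type configured for {mysql_type!r}")
        properties[name] = {"type": MYSQL_TO_ES[base_type]}
    return {"mappings": {"properties": properties}}


# my_table from test.yml above: id becomes the document _id, so only name is mapped.
my_table_body = build_index_body({"name": "varchar(64)"})
print(json.dumps(my_table_body, indent=2))
```

The column type varchar(64) is an assumption; the post does not show the table definition.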
Pitfalls along the way:
1. The installation path contained Chinese characters, and the adapter exited immediately on startup.
2. In the adapter's application.yml, the ES host was not written correctly. When that happens, the adapter console still reacts when the database data is updated, but ES never receives the data.
- name: es7
  hosts: http://127.0.0.1:9200 # without http:// it fails with: Illegal character in scheme name at index 0
3. Testing whether an index with missing fields still works
SELECT id AS id, createtime,desc,lv,NAME,price FROM product
MySQL table:

ES index:

Two fields were deliberately left out.
The experiment shows that this works: the ES index does not need the same number of fields as the MySQL table. Field matching is case sensitive, though, so be careful with naming when writing the sql statement in the configuration file under the es7 folder.
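The case-sensitivity pitfall can be caught before starting the adapter with a small check like the one below. It is a sketch for simple single-table statements only; the mapping field set is hypothetical, and the regex-based parsing is not a real SQL parser.

```python
import re

SELECT_PATTERN = re.compile(
    r"select\s+(?:distinct\s+)?(.*?)\s+from\s", re.IGNORECASE | re.DOTALL
)


def select_output_names(sql):
    """Return the output names (alias if present) of a simple SELECT."""
    match = SELECT_PATTERN.search(sql)
    if match is None:
        raise ValueError("no SELECT ... FROM clause found")
    names = []
    for item in match.group(1).split(","):
        parts = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)
        names.append(parts[-1].strip().split(".")[-1])
    return names


def case_mismatches(sql, mapping_fields):
    """List (sql name, mapping field) pairs that differ only by letter case."""
    pairs = []
    for name in select_output_names(sql):
        if name in mapping_fields:
            continue
        for field in sorted(mapping_fields):
            if field.lower() == name.lower():
                pairs.append((name, field))
    return pairs


sql = "SELECT id AS _id, createtime,desc,lv,NAME,price FROM product"
mapping_fields = {"createtime", "lv", "name", "price"}  # hypothetical ES mapping
mismatches = case_mismatches(sql, mapping_fields)
print(mismatches)  # [('NAME', 'name')]
```

Fields that are simply absent from the mapping, such as desc here, are not reported, because the experiment above shows that missing fields are tolerated.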

4. Another pitfall (how to synchronize multiple tables)
Should I write several yml files that point to the same index?
Look at the sql mapping notes first:
The sql supports free combinations of multi-table joins, with some restrictions:
1. The main table cannot be a subquery.
2. Only left outer join can be used, i.e. the leftmost table must be the main table.
3. If a joined table is a subquery, it cannot contain multiple tables.
4. The main sql cannot have a where condition (a subquery on a joined table may have one, but this is not recommended because it can make the synchronized data inconsistent, for example when a field used in the where condition is modified).
5. Join conditions may only use '=' and cannot contain other constant checks, e.g. on a.role_id=b.id and b.statues=1.
6. A field from the join condition must appear in the main query, e.g. for on a.role_id=b.id, either a.role_id or b.id must appear in the main select statement.
The properties of the Elasticsearch mapping must correspond one-to-one with the values returned by the sql query (select * is not supported). For example, in select a.id as _id, a.name, a.email as _email from user, name maps to the name field of the ES mapping and _email maps to the _email field; the alias (if there is one) is used as the final mapping field. The _id here can be filled into the _id: _id mapping in the configuration file.
After reading these notes, I tried adding the id used in my join to the main select, and it worked:
sql: "SELECT DISTINCT ma.id AS _id,mab.id as mab_id,ma.comment_counts,
ma.create_date,ma.summary,ma.title,mab.content,
ma.view_counts,ma.weight,ma.summary_img
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id"
Next I tried adding the ids of all joined tables to the main select:
sql: "SELECT DISTINCT ma.id AS _id,ma.comment_counts,
ma.create_date,ma.summary,ma.title,mab.id as mab_id,
ma.view_counts,ma.weight,msu.account,mc.id as mc_id,
mab.content ,mc.category_name ,mat.article_id as mat_id,
ma.summary_img,mt.tag_name,mat.tag_id as mat_tag_id,
msu.id as msu_id
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id
LEFT JOIN ms_category mc ON mc.id=ma.category_id
LEFT JOIN ms_article_tag mat ON mat.article_id = ma.id
LEFT JOIN ms_tag mt ON mt.id=mat.tag_id
LEFT JOIN ms_sys_user msu ON ma.author_id= msu.id"
This no longer reports errors. However, I still had to test whether data changes across the tables are synchronized correctly:
1. Deletion works.
2. Modification works.
3. Insertion failed at first. Investigation showed that the fields used in the join conditions must all be added to the query, using the corresponding field of the main table.
For example, in ma.body_id = mab.id, ma is my main table and mab is the joined table, so the main select should contain the main table's body_id. The sql statement above was therefore still wrong; after the fix it becomes:
sql: "SELECT DISTINCT ma.id AS _id,ma.comment_counts,
ma.create_date,ma.summary,ma.title,ma.body_id,
ma.view_counts,ma.weight,msu.account,ma.category_id,
mab.content ,mc.category_name ,mat.article_id as mat_id,
ma.summary_img,mt.tag_name,mat.tag_id as mat_tag_id,
ma.author_id as mau_id,mt.id as mt_it
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id
LEFT JOIN ms_category mc ON mc.id=ma.category_id
LEFT JOIN ms_article_tag mat ON mat.article_id = ma.id
LEFT JOIN ms_tag mt ON mt.id=mat.tag_id
LEFT JOIN ms_sys_user msu ON ma.author_id= msu.id"
Done.
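The insert problem above comes down to one check: for every LEFT JOIN, the field on the side of the table that was already in the query should appear in the main select. The sketch below encodes that reading of the finding. It is not a documented Canal rule or API, and the regex parsing only handles statements shaped like the ones in this post.

```python
import re

SELECT_PATTERN = re.compile(r"select\s+(?:distinct\s+)?(.*?)\s+from\s", re.IGNORECASE)
JOIN_PATTERN = re.compile(
    r"left\s+join\s+\w+\s+(\w+)\s+on\s+([\w.]+)\s*=\s*([\w.]+)", re.IGNORECASE
)


def missing_parent_join_fields(sql):
    """Return join fields of already-joined tables that the select list lacks."""
    flat = " ".join(sql.split())  # collapse line breaks from the yml string
    select_match = SELECT_PATTERN.search(flat)
    if select_match is None:
        raise ValueError("no SELECT ... FROM clause found")
    selected = set()
    for item in select_match.group(1).split(","):
        column = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)[0]
        selected.add(column.strip().lower())
    missing = []
    for alias, left, right in JOIN_PATTERN.findall(flat):
        # The parent field is the side that does not belong to the new alias.
        parent = right if left.lower().startswith(alias.lower() + ".") else left
        if parent.lower() not in selected:
            missing.append(parent)
    return missing


# Shortened versions of the statements above.
before = """SELECT ma.id AS _id, ma.title, mab.id AS mab_id, mab.content
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id"""
after = """SELECT ma.id AS _id, ma.title, ma.body_id, mab.content
FROM ms_article ma
LEFT JOIN ms_article_body mab ON ma.body_id = mab.id"""

print(missing_parent_join_fields(before))  # ['ma.body_id']
print(missing_parent_join_fields(after))   # []
```

Running it against a new esMapping sql before deploying gives a quick hint about which main-table fields still need to be added.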
Gitee repository
This took some effort to put together, so likes and stars are appreciated~