I optimized the search on my janky website
2022-06-24 05:51:00 [Programmer Yupi]
Optimizing website search in practice with ES + cloud development
Hello everyone, I'm Yupi. Today we have a hands-on technical session: requirements analysis => technology selection => design and implementation. Starting from zero, I'll walk you through making website search more flexible.
Background
My Programming Navigation website has been online for six months, but ever since launch it has had a serious problem: search is hard to use.
To launch quickly, I originally implemented search as a simple fuzzy ("contains") query against the database. It was easy to develop, but very inflexible.
For example, the site has a resource named "Java 设计模式" (Java Design Patterns), with a space after "Java". A user who searches for "Java设计模式" finds nothing, because the resource name contains a space and the user's keyword does not.
Spaces are just one special case; there are many similar ones. For example, the site has a resource called "Java Concurrency in Practice", yet a search for "Java in Practice" finds nothing, even though the title clearly contains both "Java" and "in Practice".
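To make the failure concrete, here is a minimal sketch comparing a substring ("contains") match, which is roughly what a SQL `LIKE '%keyword%'` query does, with a naive token match. The tokenizer just splits on whitespace, so it is only an illustration; real ES analyzers (such as ik for Chinese) do proper word segmentation.

```javascript
// Substring ("contains") matching: roughly what a SQL LIKE '%keyword%' query does.
function containsMatch(title, keyword) {
  return title.toLowerCase().includes(keyword.toLowerCase());
}

// Naive token matching: split both strings on whitespace and require
// every query token to appear among the title tokens.
// (Illustrative only; it cannot segment Chinese text such as "Java设计模式".)
function tokenMatch(title, keyword) {
  const titleTokens = new Set(title.toLowerCase().split(/\s+/).filter(Boolean));
  return keyword
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .every((token) => titleTokens.has(token));
}

const title = 'Java Concurrency in Practice';
console.log(containsMatch(title, 'Java in Practice')); // false
console.log(tokenMatch(title, 'Java in Practice')); // true
```

Matching on tokens instead of raw substrings is the core idea behind the full-text search options discussed next.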
Keep in mind that search is crucial for an information aggregation site and directly affects the user experience. If users can't find resources on your site, who is going to use it?
I even got some "polite" suggestions from friends, like this one from a bald guy named Tom.
I hadn't optimized search before for two main reasons: I was broke, and I didn't want the hassle. But as the number of users grows, it's time to fill this hole!
Technology selection
To make search more flexible, we can use full-text search, which can be implemented on either the front end or the back end.
Front-end full-text search
Sometimes the data we want to search is limited, and all of it is stored on the client.
Take a personal blog: each article is usually stored as a file in a directory rather than in a back-end database. In that case there is no need to request dynamic data from the server, and the search can run directly in the front end.
There are ready-made search libraries for this, such as Lunr.js (7k+ stars on GitHub). First, add the content to be searched:
var idx = lunr(function () {
  // Fields to index ("id" is used as the document reference by default)
  this.field('title')
  this.field('body')
  // Content
  this.add({
    "title": "yupi",
    "body": "Search for Programmer Yupi on WeChat to read my original articles",
    "id": "1"
  })
})
Then search:
idx.search("yupi")
The advantage of pure front-end full-text search is that it needs no back end, is simple and convenient, and takes load off the server. It also needs no network request, so there is no extra network overhead and retrieval is faster.
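Under the hood, libraries like Lunr.js build an inverted index that maps each term to the documents containing it. Here is a toy version, assuming simple lowercase tokenization on non-alphanumeric characters and AND semantics across query terms; Lunr's real pipeline also does stemming and stop-word removal, and it ranks results by relevance rather than simply intersecting them.

```javascript
// Lowercase and split on non-alphanumeric characters (a stand-in for a real analyzer).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build an inverted index: term -> Set of ids of the documents containing it.
function buildIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      for (const term of tokenize(doc[field] || '')) {
        if (!index.has(term)) index.set(term, new Set());
        index.get(term).add(doc.id);
      }
    }
  }
  return index;
}

// Return the ids of documents that contain every query term.
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const ids = index.get(term) || new Set();
    result = result === null ? new Set(ids) : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result];
}

const docs = [
  { id: '1', title: 'yupi', body: 'Search for Programmer Yupi on WeChat' },
  { id: '2', title: 'Java Concurrency in Practice', body: 'A classic Java book' },
];
const idx = buildIndex(docs, ['title', 'body']);
console.log(search(idx, 'yupi')); // [ '1' ]
console.log(search(idx, 'java practice')); // [ '2' ]
```

Because the whole index lives in memory on the client, lookups need no server round trip, which is exactly the trade-off described above.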
Back-end full-text search
Unlike the front-end approach, back-end full-text search runs on the server: it finds the matching data in a remote data store and returns it directly to the front end.
The mainstream back-end full-text search technology today is Elasticsearch, a distributed, RESTful search and analytics engine.
It is powerful and flexible, but you have to set it up yourself, define the data, manage dictionaries, upload and maintain the data, and so on. It is operationally demanding and takes some skill: an ES search system designed by a novice and one designed by an expert can be worlds apart.
So if you are not familiar with Elasticsearch, you can use a ready-made full-text search service instead, such as Algolia: upload the data to be searched through its API, then search through its API. It offers a free tier that is enough for small websites and for learning.
Making the choice
So which approach should my Programming Navigation website use?
First, the number of resources on the site is not fixed, and they are updated dynamically and irregularly, so front-end full-text search is not a good fit.
Second, the site's data volume will keep growing, and I may need to tune the search system dynamically based on what users search for (for example, with a custom programming dictionary). So I decided to build my own search engine with Elasticsearch rather than use a ready-made full-text search service, which leaves me free to customize the system however I like later. In addition, the site's data doesn't have to be sent to a third-party platform, which keeps it secure.
Installing ES
Having settled on Elasticsearch, the first step is to set up the environment.
You can buy your own servers and install it step by step following the official documentation. For a personal website of some scale, the setup itself isn't hard, but the ongoing maintenance cost is high: performance analysis, monitoring, alerting, security, and so on all have to be configured yourself. And as the site's data grows, you also have to think about building clusters, horizontal scaling, and more.
So I chose a managed Elasticsearch service from a cloud provider, in this case Tencent Cloud. It automatically sets up a ready-to-use ES cluster and also provides visual cluster management, cluster monitoring, logs, advanced plugins, intelligent inspection, and other features.
The ES service is fairly expensive, but it saves a lot of time and effort, so it's worth it for me.
There is also a very convenient, customizable search service called Elastic App Search; try it if you're interested.
A shared ES service
Our goal is to optimize search over the site's resources, but instead of writing the specific business logic right away, the next step is to develop a shared ES service.
ES operations are actually fairly simple; you can think of ES as a database. So the shared ES service should provide the basic create, read, update, and delete (CRUD) operations for other functions to call.
Implementation
The back end of Programming Navigation uses Tencent CloudBase (cloud development), with services written in Node.js, so I chose the officially recommended @elastic/elasticsearch library to operate ES.
If you don't use cloud development, that's fine: for now just think of it as a back end. My earlier article gives an introduction to cloud development.
The code is simple. First, establish a connection to ES; to keep the data secure, use the intranet (private network) address:
const { Client } = require('@elastic/elasticsearch');

const client = new Client({
  // Intranet address
  node: 'http://10.0.61.1:9200',
  // User name and password
  auth: {
    username: esConfig.username,
    password: esConfig.password,
  },
});
Then write the create, delete, update, and search operations. Here I add a layer of abstraction: a switch statement dispatches on the request parameters (the operation, the data to operate on, and so on), so each operation doesn't need its own endpoint.
// Cloud function entry point
exports.main = async (event, context) => {
  // Accept request parameters
  const { op, index, id, params } = event;
  // Dispatch to add / delete / search / update according to the operation
  switch (op) {
    case 'add':
      return doAdd(index, id, params);
    case 'delete':
      return doDelete(index, id);
    case 'search':
      return doSearch(index, params);
    case 'update':
      return doUpdate(index, id, params);
    default:
      throw new Error(`Unsupported op: ${op}`);
  }
};
In cloud development, if a function hasn't been invoked for a long time, its resources are released. The next request then triggers a cold start that recreates them, which makes the response slow. Packing several operations into the same function therefore also lowers the chance of a cold start.
I won't go through the specific CRUD code; just follow the official documentation of the ES Node.js client. I'll open-source the code in the Programming Navigation repository later (https://github.com/liyupi/code-nav).
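For readers who want a starting point, here is a minimal sketch of what the four handlers might look like. It is not the author's code: `makeHandlers` and the fake client are hypothetical, and the parameter shapes assume the v7 @elastic/elasticsearch client, which takes a `body` argument (the v8 client accepts `document`, `query`, and `doc` at the top level instead). Injecting the client lets the handlers be exercised without a live cluster.

```javascript
// Hypothetical helper: builds the four CRUD handlers around an ES client.
// Parameter shapes follow the v7 @elastic/elasticsearch client (`body`).
function makeHandlers(client) {
  return {
    // Create (or overwrite) a document with the given id
    doAdd: (index, id, params) => client.index({ index, id, body: params }),
    // Delete a document by id
    doDelete: (index, id) => client.delete({ index, id }),
    // Run a search; params is a query DSL body such as { query: { match: { name: 'java' } } }
    doSearch: (index, params) => client.search({ index, body: params }),
    // Partial update: only the fields present in params are changed
    doUpdate: (index, id, params) => client.update({ index, id, body: { doc: params } }),
  };
}

// A fake client that records calls, so the handlers can be tried without a cluster
function createFakeClient() {
  const calls = [];
  const record = (method) => async (args) => {
    calls.push({ method, args });
    return { statusCode: 200 };
  };
  return {
    calls,
    index: record('index'),
    delete: record('delete'),
    search: record('search'),
    update: record('update'),
  };
}

const fake = createFakeClient();
const { doAdd, doUpdate } = makeHandlers(fake);
doAdd('resource', '1', { name: 'Java Design Patterns', likeNum: 0 });
doUpdate('resource', '1', { likeNum: 1 });
console.log(fake.calls.map((c) => c.method)); // [ 'index', 'update' ]
```

In the real function you would pass the `client` created above to `makeHandlers` and hand the results to the switch statement.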
Local debugging
Once the code is written, you can run the function locally with tcb, CloudBase's own command-line tool.
Remember to first switch the ES connection address to the public address, then run a command. For example, to insert a document into ES, pass in the function name, the request parameters, and the code path:
tcb fn run \
  --name <functionName> \
  --params "{\"op\": \"add\"}" \
  --path <functionPath>
After it runs successfully, you can see the newly inserted data in ES (via the Kibana dashboard or curl).
Remote testing
After testing the shared service locally, switch the ES connection address back to the intranet IP and deploy the function to the cloud.
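Editing the address by hand before every local run and every deployment is easy to forget. One possible alternative, sketched here under the assumption that the function can read an environment variable at runtime, is to resolve the address from a hypothetical `ES_NODE` variable and fall back to the intranet address when it is unset.

```javascript
// Hypothetical: read the ES address from the ES_NODE environment variable,
// falling back to the intranet address used by the deployed function.
const DEFAULT_INTRANET_NODE = 'http://10.0.61.1:9200';

function resolveEsNode(env = process.env) {
  return env.ES_NODE || DEFAULT_INTRANET_NODE;
}

// Local runs: set ES_NODE to the public address.
// Deployed function: leave ES_NODE unset so the intranet address is used.
console.log(resolveEsNode({})); // http://10.0.61.1:9200
```

The result would then be passed as the `node` option when constructing the `Client`, so the same source works in both environments.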
Next, try writing another function that calls the shared ES service, for example to insert a resource into ES via callFunction:
// Add a resource to ES
function addData(id, data) {
  // Call the shared ES service
  return app.callFunction({
    name: 'esService',
    data: {
      op: 'add',
      index: 'resource',
      id,
      params: data,
    },
  });
}
However, the data was not inserted; instead the call timed out. Why?
Private network configuration
The logs showed that ES could not be reached. Could it be that the machine running the shared service and the ES cluster are not on the same private network?
So the private network configuration of the shared ES service has to be changed in the CloudBase console to use the same subnet as the purchased ES instance.
After the change, I called the shared ES service remotely again, and the data was inserted successfully.
Indexing the data
With the shared ES service in place, we can write the specific business logic.
First, create an index in ES (similar to a table in a database) that specifies the data types, tokenization, and other settings, rather than letting arbitrary data be inserted.
For example, to make search more flexible, the resource name should be of type "text" so that it is tokenized, with the ik Chinese analyzer specified:
"name": {
  "type": "text",
  "analyzer": "ik_max_word",
  "search_analyzer": "ik_smart",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
The like count should be of type "long" so that only numbers are allowed:
"likeNum": {
  "type": "long"
}
It's also best to give the index an alias, which makes it easy to rebuild the index when fields change later:
"aliases": { "resource": {} }
Once the index JSON configuration is written, call the ES create-index API via curl or Kibana.
Data synchronization
Previously, the Programming Navigation resource data was stored in the database, and users queried the database. Now queries should go to ES, so ES can't be empty: we need a way to sync the resource data from the database into ES.
There are several synchronization strategies .
Dual writes
Previously, resources recommended by users were only inserted into the database. Dual writing means that whenever a resource is inserted into the database, it is also inserted into ES.
It sounds simple, but this approach has some problems:
- It changes existing code: every place that writes to the database must also write to ES.
- One side may fail while the other succeeds, leaving the database and ES inconsistent.
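The second problem can be illustrated with a minimal sketch using two in-memory stores. The stores and `addResource` are hypothetical stand-ins, and the ES write is made to throw to simulate an outage.

```javascript
// Two in-memory "stores" standing in for the database and ES.
const dbStore = new Map();
const esStore = {
  docs: new Map(),
  // Simulate an ES outage: every write fails
  save() {
    throw new Error('ES unavailable');
  },
};

// Dual write: database first, then ES
function addResource(id, doc) {
  dbStore.set(id, doc); // 1. database write succeeds
  try {
    esStore.save(id, doc); // 2. ES write fails
  } catch (err) {
    // Unless something reconciles them later, the two stores now disagree
    console.log(`ES write failed: ${err.message}`);
  }
}

addResource('1', { name: 'Java Design Patterns' });
console.log(dbStore.has('1'), esStore.docs.has('1')); // true false
```

The resource exists in the database but can never be found through ES search, which is exactly the inconsistency described above.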
Is there an approach that is less intrusive to the existing code?
Scheduled synchronization
If real-time freshness isn't critical, you can sync on a schedule: copy newly inserted or modified data from the database to ES at regular intervals.
There are many ways to do this, such as a Logstash data pipeline or your own scheduled job, and none of them require changing the existing code.
Real-time synchronization
If data freshness is critical and newly inserted data must be searchable immediately, you need real-time synchronization. Besides dual writes, you can listen to the database binlog, so that every change to the database is picked up.
Alibaba has an open-source project called Canal that monitors a MySQL database in real time and pushes change notifications downstream; take a look if you're interested.
Implementation
Search over programming resources doesn't need to be strictly real-time, so scheduled synchronization is fine.
Cloud development provides timer triggers out of the box, so I just wrote a cloud function that runs once a minute and, on each run, reads the data changed in the last 5 minutes from the database, so that changes are not lost if a previous run failed. I also configured a sufficiently long timeout, so that a run that takes a while doesn't fail.
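The overlapping window works because syncing the same record twice is harmless. Here is a minimal sketch of that logic, assuming each database record carries a hypothetical `updateTime` timestamp in milliseconds and an `_id`; a `Map` stands in for ES, and writing by id makes each sync an idempotent upsert.

```javascript
const SYNC_INTERVAL_MS = 60 * 1000; // the timer fires once per minute
const LOOKBACK_MS = 5 * 60 * 1000; // but each run re-reads the last 5 minutes

// Return the records changed within the lookback window ending at `now`.
// (Assumes a hypothetical `updateTime` field in milliseconds.)
function selectRecentlyChanged(records, now, lookbackMs = LOOKBACK_MS) {
  const since = now - lookbackMs;
  return records.filter((r) => r.updateTime >= since);
}

// Upsert by id: re-syncing a record overwrites it instead of duplicating it,
// which is what makes overlapping windows safe.
function upsertAll(esDocs, records) {
  for (const r of records) esDocs.set(r._id, r);
}

const now = Date.UTC(2022, 5, 24, 6, 0, 0);
const records = [
  { _id: 'a', name: 'Old resource', updateTime: now - 10 * 60 * 1000 },
  { _id: 'b', name: 'Edited 3 min ago', updateTime: now - 3 * 60 * 1000 },
];
const esDocs = new Map();
upsertAll(esDocs, selectRecentlyChanged(records, now));
// Next run one minute later picks up 'b' again, without creating a duplicate
upsertAll(esDocs, selectRecentlyChanged(records, now + SYNC_INTERVAL_MS));
console.log([...esDocs.keys()]); // [ 'b' ]
```

In the real function, the filter would be a database query on the update-time field and the upsert an ES index call with the record's id.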
This can be configured visually in the CloudBase cloud function console; the scheduled task needs a crontab expression.
After enabling scheduled sync, don't forget to write and run a one-off initial sync function to copy all of the historical data into ES.
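For the initial full sync, one common option is to page through the database and push each page to ES with the `_bulk` API (`POST /_bulk` with `Content-Type: application/x-ndjson`) instead of indexing documents one at a time. The sketch below only builds the NDJSON request body for one page; the record shape with an `_id` field is an assumption.

```javascript
// Build an Elasticsearch _bulk request body (NDJSON): for each document,
// an action line followed by a source line, with a trailing newline at the end.
function buildBulkBody(indexName, records) {
  const lines = [];
  for (const { _id, ...source } of records) {
    lines.push(JSON.stringify({ index: { _index: indexName, _id } }));
    lines.push(JSON.stringify(source));
  }
  return lines.join('\n') + '\n';
}

// Hypothetical page of records read from the database during the full sync
const page = [
  { _id: '1', name: 'Java Design Patterns', likeNum: 3 },
  { _id: '2', name: 'Java Concurrency in Practice', likeNum: 5 },
];
console.log(buildBulkBody('resource', page));
```

Using the database id as the ES `_id` means the full sync and the scheduled sync write to the same documents, so running them in either order does not create duplicates.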
Data retrieval
Now that ES has data, only one step is left: how do we query it?
First, you need to learn the ES query DSL (syntax), including how to select fields, search, filter, paginate, sort, and so on. For beginners it's a bit of a hassle, especially combining Boolean clauses in query conditions: one slip and no data comes back. So I suggest writing queries in the debugging tool that Kibana provides first.
Once the query returns the expected data, write the back-end search function. Its request parameters should match those of the original interface to minimize changes.
The query can be assembled dynamically from the front-end request, for example searching by resource name:
// Start from an empty bool query
const query = { bool: {} };
// A resource name was passed in
if (name) {
  // Add a match clause on the name field
  query.bool.should = [{ match: { name } }];
}
With that, the search optimization for the whole site is done.
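The snippet above only handles the name. Here is a sketch of a fuller request body with pagination and sorting, assuming hypothetical request parameters `current` (1-based page) and `pageSize`, and sorting on the `likeNum` field from the mapping. It also sets `minimum_should_match` explicitly: when a bool query also has `must` or `filter` clauses, `should` clauses become optional by default, which is an easy way to end up with surprising results.

```javascript
// Hypothetical request parameters: name (keyword), current (1-based page), pageSize
function buildSearchBody({ name, current = 1, pageSize = 10 } = {}) {
  const query = { bool: {} }; // an empty bool query matches all documents
  if (name) {
    // Full-text match on the analyzed name field
    query.bool.should = [{ match: { name } }];
    // Require at least one should clause to match, even if filters are added later
    query.bool.minimum_should_match = 1;
  }
  return {
    query,
    from: (current - 1) * pageSize, // pagination offset
    size: pageSize,
    // Most relevant first, then the most-liked resources first
    sort: [{ _score: 'desc' }, { likeNum: 'desc' }],
  };
}

console.log(JSON.stringify(buildSearchBody({ name: 'Java', current: 2, pageSize: 20 })));
```

Keeping the body construction in one pure function makes it easy to compare against the query you first tested in Kibana.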
Let's try it out: now the search still finds results even if I type a few extra "鱼" ("fish") characters!
How does ES make search this flexible? See this article.
Releasing the new ES search interface doesn't make the old database query interface obsolete; both can be kept. Use the new interface when searching resources by name, which is more flexible; when searching by review status or for resources published by a particular user, use the old interface and query the database. This spreads the load and separates responsibilities, letting the right technology do the right job!
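That routing rule can be sketched as a small function. Everything here is hypothetical: `searchEs` and `searchDb` stand in for the new and old query interfaces, and `reviewStatus` is an illustrative parameter name.

```javascript
// Hypothetical router: name searches go to ES, everything else to the database.
async function searchResources(params, { searchEs, searchDb }) {
  if (params.name) {
    return searchEs(params); // flexible full-text search
  }
  return searchDb(params); // exact filters such as review status or publisher
}

// Example with stub back ends
const backends = {
  searchEs: async () => 'es',
  searchDb: async () => 'db',
};
searchResources({ name: 'Java' }, backends).then(console.log); // logs: es
searchResources({ reviewStatus: 1 }, backends).then(console.log); // logs: db
```

Passing the two back ends in as arguments keeps the routing decision separate from the query code, so either side can change without touching the other.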
That's all for this post. If it helped, please give it a like!
I'm Yupi. Finally, here are some learning materials that helped me land an offer at a big tech company:
Go for it: 6 TB of resources to grab!
My experience of going from zero to teaching myself programming and joining Tencent: learning, job hunting, certifications, and writing a book. No more confusion!
My four years of studying computer science; let's keep going together!