I optimized the search function of my shabby website

2022-06-24 05:51:00 【Programmer fish skin】

Optimizing website search in practice with ES + cloud development

Hello everyone, I'm fish skin. Today's article is a hands-on technical walkthrough: requirements analysis => technology selection => design and implementation. Going from 0 to 1, I'll show you how to make a website's search more flexible.

Outline of this article: (figure: Fish skin - Website search optimization)

Background

The Programming Navigation website I developed has now been online for 6 months. But ever since launch, the site has had a serious problem: the search function is hard to use.

Originally, in order to launch quickly, search was implemented with a simple fuzzy (contains) query against the database. That was convenient to develop, but it is very inflexible.

For example, there is a resource on the site whose name is written with a space after "Java" ("Java Design Patterns"), but a user who types the same name without that space finds nothing. The reason is that the resource name contains a space, while the keyword the user typed does not.

Spaces are just one special case; there are many similar situations. For example, there is a resource called "Java Concurrent Programming Practice". When a user searches for "Java Practice", the resource name clearly contains both words, "Java" and "Practice", yet nothing is found.
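The gap between a contains query and term-based matching can be sketched in a few lines. This is only an illustration with made-up sample data and hypothetical function names (`containsSearch`, `termSearch`); a real full-text engine does much more, including the word segmentation needed for the missing-space case.

```javascript
// Hypothetical sample data; the names only illustrate the failure case.
const resources = ['Java Design Patterns', 'Java Concurrent Programming Practice'];

// Fuzzy "contains" match, roughly what the database query did.
function containsSearch(names, keyword) {
  return names.filter((name) => name.includes(keyword));
}

// Term-based match: split both sides into lowercase words and
// require every search term to appear among the name's words.
function termSearch(names, keyword) {
  const terms = keyword.toLowerCase().split(/\s+/).filter(Boolean);
  return names.filter((name) => {
    const words = new Set(name.toLowerCase().split(/\s+/));
    return terms.every((term) => words.has(term));
  });
}

console.log(containsSearch(resources, 'Java Practice')); // []
console.log(termSearch(resources, 'Java Practice'));     // [ 'Java Concurrent Programming Practice' ]
```

The contains query fails because "Java Practice" never appears as one continuous substring, while the term-based version only asks that each word be present.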

Keep in mind that search is critical for an information aggregation site and directly affects the user experience. If people can't find resources on your site, who is going to use it?

So I received some polite suggestions from friends, such as one from Tom, the bald guy.

There were two main reasons I hadn't optimized search earlier: I was poor, and I didn't want the hassle. But as the number of users grows, it's time to fill in this hole!

Technology selection

To make site search more flexible, we can use full-text search, which can be implemented on either the front end or the back end.

Front-end full-text search

Sometimes the data to be searched is limited in size and all of it is stored on the client.

A personal blog is a typical example: each article is usually stored as a file in a directory rather than in a back-end database. In that case there is no need to request dynamic data from the server, so the search can run directly on the front end.

There are ready-made search libraries for this, such as Lunr.js (7k+ stars on GitHub). First add the content to be searched:

var idx = lunr(function () {
  this.field('title')
  this.field('body')
  // Content to index
  this.add({
    "title": "yupi",
    "body": "Search WeChat for Programmer fish skin to read my original articles",
    "id": "1"
  })
})

Then search:

idx.search("fish skin")

The advantages of pure front-end full-text search are that it needs no back end, is simple and convenient, and takes load off the server. Since no network request is involved, there is no extra network overhead and retrieval is faster.

Back-end full-text search

Unlike the front-end approach, back-end full-text search runs on the server: it finds the matching data in a remote database and returns it directly to the front end.

The mainstream back-end full-text search technology today is Elasticsearch, a distributed, RESTful search and data analytics engine.

It is powerful and flexible, but you have to set it up yourself, define the data, manage dictionaries, upload and maintain data, and so on. That takes real operations skill and experience; an ES search system designed by a novice is very different from one designed by an expert.

So if you are not familiar with Elasticsearch, you can also use a ready-made full-text search service such as Algolia: upload the data to be searched through its API, then search through its API. It offers a free quota, which is enough for small websites and for learning.

(Figure: Algolia search service)

Choice

So which approach should I choose for my Programming Navigation website?

First, the number of resources on the site is not fixed and is updated dynamically at irregular intervals, so front-end full-text search is not a good fit.

Second, the site may hold a lot of data in the future, and I may need to tune the search system dynamically based on what users search for (for example, with a custom programming dictionary). So I decided to build my own search engine with Elasticsearch rather than use a ready-made full-text search service, which lets me customize the system however I like later. In addition, the site's data does not have to be sent to another platform, which helps keep it secure.

Installing ES

Having settled on Elasticsearch, the first step is to set up the environment.

You can buy your own servers and install it manually, step by step, following the official documentation. For a personal website of some scale, the setup itself is not difficult, but the later maintenance cost is huge: performance analysis, monitoring, alerting, security and so on all have to be configured by hand. And once the site's data grows, you also have to think about building a cluster, scaling horizontally, and so on.

So I chose to use the Elasticsearch service offered by a cloud provider, in this case Tencent Cloud. It automatically sets up a ready-made ES cluster for you and also provides visual architecture management, cluster monitoring, logs, advanced plugins, intelligent inspection, and other features.

(Figure: cloud ES cluster architecture)

The ES service is pricey, but it saves a great deal of time, so for me it's worth it.

There is also a very convenient customizable search service, Elastic App Search. Give it a try if you're interested.

The public ES service

Our goal is to optimize the search for website resources, but the next step is not to write the specific business logic directly. Instead, we first develop a public ES service.

Operating ES is actually fairly simple; you can think of it as a database. So the public ES service should provide the basic create, delete, update and search operations for other functions to call.

Implementation

Because the back end of Programming Navigation uses Tencent Cloud's cloud development, with services written in Node.js, I chose the officially recommended @elastic/elasticsearch library to operate ES.

If you haven't used cloud development, that's fine; for now, just think of it as a back end. You are welcome to read my earlier article: Learn about cloud development.

The code is simple. First establish the connection to ES. To keep the data secure, an intranet address is used here:

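const { Client } = require('@elastic/elasticsearch');

// esConfig holds the ES credentials and is assumed to be loaded elsewhere (for example from a config file)
const client = new Client({
  // Intranet address
  node: 'http://10.0.61.1:9200',
  // Username and password
  auth: {
    username: esConfig.username,
    password: esConfig.password,
  },
});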

Then write the create, delete, update and search operations. Here I added one layer of abstraction: a switch statement (or similar branching) picks the operation based on the request parameters, which also carry the data to operate on. That way each operation doesn't have to be written as a separate interface.

// Read the request parameters
const { op, index, id, params } = event;
// Dispatch to the matching operation
switch (op) {
  case 'add':
    return doAdd(index, id, params);
  case 'delete':
    return doDelete(index, id);
  case 'search':
    return doSearch(index, params);
  case 'update':
    return doUpdate(index, id, params);
  default:
    // Unknown operation: fail loudly instead of silently returning undefined
    throw new Error(`Unsupported op: ${op}`);
}

In cloud development, if a function has not been called for a long time, its resources are released. On the next request a cold start takes place and the resources are recreated, which makes the interface respond slowly. Packing several operations into the same function therefore also reduces the probability of a cold start.

I won't go through the specific create, delete, update and search code; just follow the official documentation of the ES Node.js client. The code will later be open-sourced in the Programming Navigation repository (https://github.com/liyupi/code-nav).
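For readers who want a starting point, the four operations might look roughly like the sketch below. It is not the author's code: `createEsOps` is a hypothetical helper, and the request shapes assume the 7.x version of @elastic/elasticsearch, where documents are passed in `body` (newer client versions may expect a different shape, so check the documentation for the version you install).

```javascript
// Sketch only: `client` is expected to be an @elastic/elasticsearch Client.
// Request shapes below assume the 7.x client, which takes documents in `body`.
function createEsOps(client) {
  return {
    // Create (or overwrite) the document with the given id.
    doAdd: (index, id, params) => client.index({ index, id, body: params }),
    // Remove the document with the given id.
    doDelete: (index, id) => client.delete({ index, id }),
    // Run a search; `params` is the query DSL body.
    doSearch: (index, params) => client.search({ index, body: params }),
    // Partially update the document: only the fields in `params` change.
    doUpdate: (index, id, params) => client.update({ index, id, body: { doc: params } }),
  };
}

// Usage with a real client would look like:
//   const { doAdd, doDelete, doSearch, doUpdate } = createEsOps(client);
```

Passing the client in as an argument keeps the functions easy to test with a stub, and the returned functions keep the same signatures as the ones used in the switch statement.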

Local debugging

After writing the code, you can run the function locally with tcb, the command-line tool that comes with cloud development.

Remember to first change the ES connection address to the public network address, then enter the command. For example, to insert a piece of data into ES, pass in the name of the function to run, the request parameters, and the code path:

tcb fn run \
  --name <functionName> \
  --params "{\"op\": \"add\"}" \
  --path <functionPath>

After it runs successfully, you can see the newly inserted data in ES (via the Kibana panel or curl).

Remote testing

Once the public service code has been tested locally, change the ES connection address back to the intranet IP and publish it to the cloud.

Next, try writing another function that accesses the public ES service, for example one that inserts a resource into ES via callFunction:

// Add a resource to ES
function addData(id, data) {
  // Call the public ES service
  return app.callFunction({
    name: 'esService',
    data: {
      op: 'add',
      index: 'resource',
      id,
      params: data,
    },
  });
}

However, the data was not inserted; the call returned an interface timeout instead. Why?

Intranet configuration

The logs showed that ES could not be reached. Could it be that the machine running the public ES service is not on the same intranet as ES?

So in the cloud development console, you need to change the private network configuration of the public ES service and select the same subnet that was chosen when purchasing ES.

(Figure: configuring the private network of the ES cloud function)

After the change, I called the public ES service remotely again, and the data was inserted successfully.

Data index

With the public ES service in place, we can write the specific business logic.

First, create an index in ES (similar to a database table) to fix the data types, tokenization settings and so on, rather than allowing arbitrary data to be inserted.

For example, to make search more flexible, the resource name should be declared as type "text" so that it is tokenized, with the ik Chinese analyzer specified:

"name": {
  "type": "text",
  "analyzer": "ik_max_word",
  "search_analyzer": "ik_smart",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

The number of likes should be set to type "long", so that only numbers can be stored:

"likeNum": {
  "type": "long"
}

It is also best to give the index an alias, which makes it easier to rebuild the index when fields are changed later:

"aliases": {
  "resource": {}
}

Once the index's JSON configuration is written, create the index by calling ES's create-index API through curl or Kibana.
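Put together, the field and alias fragments could form a complete create-index request along the lines of the sketch below. It assumes ES 7 or later (where mappings no longer carry a type name) and that the ik plugin is installed; the physical index name `resource_v1` is a hypothetical choice, picked so that the alias `resource` can later be pointed at a rebuilt index.

```javascript
// Hypothetical physical index name; the alias "resource" points at it.
const indexName = 'resource_v1';

const indexBody = {
  aliases: { resource: {} },
  mappings: {
    properties: {
      // Tokenized with ik for flexible matching, plus an exact-match keyword sub-field.
      name: {
        type: 'text',
        analyzer: 'ik_max_word',
        search_analyzer: 'ik_smart',
        fields: { keyword: { type: 'keyword', ignore_above: 256 } },
      },
      // Numeric field: only numbers are accepted.
      likeNum: { type: 'long' },
    },
  },
};

// Text that can be pasted into the Kibana console.
const kibanaRequest = `PUT /${indexName}\n${JSON.stringify(indexBody, null, 2)}`;
console.log(kibanaRequest);
```

Because application code reads and writes through the alias, a new index with changed fields can be created and filled first, and the alias switched over afterwards.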

Data synchronization

Previously, the resource data of Programming Navigation was stored in the database, and user queries went to the database. Now that queries go to ES, ES can't be empty, so we need a way to synchronize the resource data from the database to ES.

There are several synchronization strategies.

Double write

Previously, a resource recommended by a user was only inserted into the database. Double write means that whenever a resource is inserted into the database, it is inserted into ES at the same time.

It sounds simple, but this approach has some problems:

It changes existing code: every place that writes to the database has to be supplemented with a write to ES.
One write may fail while the other succeeds, leaving the database and ES inconsistent.
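The second problem is easy to reproduce in a toy model. The sketch below uses two in-memory Maps as stand-ins for the database and ES; `doubleWrite`, `missingInEs` and the `esAvailable` flag are all hypothetical and exist only to simulate a failed second write.

```javascript
// In-memory stand-ins for the database and ES (hypothetical, for illustration).
function doubleWrite(db, es, doc, esAvailable) {
  db.set(doc.id, doc); // first write: database
  if (!esAvailable) {
    // The second write fails after the first one already succeeded.
    throw new Error('ES write failed');
  }
  es.set(doc.id, doc); // second write: ES
}

// Ids present in the database but missing from ES.
function missingInEs(db, es) {
  return [...db.keys()].filter((id) => !es.has(id));
}

const db = new Map();
const es = new Map();
doubleWrite(db, es, { id: '1', name: 'Java Design Patterns' }, true);
try {
  doubleWrite(db, es, { id: '2', name: 'Lunr.js' }, false);
} catch (err) {
  // Without a retry or rollback here, the two stores now disagree.
}
console.log(missingInEs(db, es)); // [ '2' ]
```

Every call site that writes to the database would need this kind of failure handling, which is part of why double write is intrusive.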

Is there an approach that is less intrusive to the existing code?

Scheduled synchronization

If the data does not need to be very fresh, you can choose scheduled synchronization: at regular intervals, copy newly inserted or modified data from the database to ES.

There are many ways to implement it, for example using a Logstash data pipeline or writing your own scheduled job, so the existing code doesn't have to change at all.

Real-time synchronization

If freshness requirements are very high and data just inserted into the database must be searchable immediately, you need real-time synchronization. Besides double write, you can also listen to the database binlog, so that any change to the database is detected.

Alibaba has an open-source project called Canal that monitors a MySQL database in real time and pushes change notifications downstream. Take a look if you're interested.

(Figure: the Canal project)

Implementation

Searching programming resources does not require strong real-time guarantees, so scheduled synchronization is fine.

Cloud development provides scheduled triggers out of the box, so I just wrote a cloud function that runs once every minute. Each run reads the data changed in the last 5 minutes from the database, in case the previous run failed. A timeout should also be configured, so that the function does not fail because its execution runs long.
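Running every minute with a 5-minute look-back means consecutive runs overlap, so the same record is usually synchronized several times. That is harmless as long as each record is written to ES under its own id, because a repeated write simply overwrites the earlier one. Below is a minimal sketch of the idea, with a Map standing in for the ES index and a hypothetical `updateTime` field (epoch milliseconds) on each record.

```javascript
const WINDOW_MS = 5 * 60 * 1000; // look back 5 minutes on every run

// Records whose update time falls inside the look-back window.
// `updateTime` (epoch milliseconds) is a hypothetical field name.
function changedSince(records, now, windowMs = WINDOW_MS) {
  return records.filter((r) => r.updateTime > now - windowMs && r.updateTime <= now);
}

// Write each changed record under its id; writing the same id twice overwrites.
function syncOnce(records, esIndex, now) {
  const changed = changedSince(records, now);
  for (const r of changed) esIndex.set(r.id, r);
  return changed.length;
}

const minute = 60 * 1000;
const now = 100 * minute;
const records = [
  { id: 'a', updateTime: now - 10 * minute }, // too old, skipped
  { id: 'b', updateTime: now - 3 * minute },
  { id: 'c', updateTime: now - 30 * 1000 },
];
const esIndex = new Map();
syncOnce(records, esIndex, now);          // picks up b and c
syncOnce(records, esIndex, now + minute); // b and c fall in the window again
console.log(esIndex.size); // 2
```

If one run fails, the next few runs still cover its time range, which is the reason for reading 5 minutes of changes rather than 1.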

This can be configured visually in the cloud function console of cloud development; you need to specify a crontab expression for the scheduled task.

(Figure: configuring the cloud function's schedule and timeout)

After enabling scheduled synchronization, don't forget to write and run a separate first-time synchronization function that copies the full set of historical data to ES.

Data retrieval

Now that ES has data, only one last step remains: how do we search it?

First we need to learn the ES search DSL (its query syntax), including how to select fields, search, filter, paginate, sort, and so on. For beginners it is still a bit of a hassle, especially combining Boolean expressions in query conditions; one small mistake and no data comes back. So I suggest first writing the query in the debugging tool that Kibana provides.

(Figure: debugging in Kibana)

Once the query returns the expected data, write the back-end search function. The request parameters it accepts should stay consistent with the original interface, to minimize changes.

The query can be assembled dynamically from the front end's request, for example to search by resource name:

// A resource name was passed in
if (name) {
  // Build the query clause
  query.bool.should = [
    {
      match: {
        name
      }
    }
  ];
}
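A fuller version of this assembly step might combine the name match with pagination and sorting, as in the sketch below. The function name `buildSearchBody`, the `pageNum` and `pageSize` parameters, and the choice to sort by score and then by `likeNum` are assumptions for illustration, not the site's actual interface.

```javascript
// Build an ES search body from request parameters.
// `pageNum`, `pageSize` and sorting by `likeNum` are assumptions for illustration.
function buildSearchBody({ name, pageNum = 1, pageSize = 10 } = {}) {
  // Without a name, fall back to matching every document.
  const query = name
    ? { bool: { should: [{ match: { name } }], minimum_should_match: 1 } }
    : { match_all: {} };
  return {
    query,
    // Pagination: skip the earlier pages, then take one page.
    from: (pageNum - 1) * pageSize,
    size: pageSize,
    // Most relevant first; ties broken by number of likes.
    sort: [{ _score: 'desc' }, { likeNum: 'desc' }],
  };
}

console.log(JSON.stringify(buildSearchBody({ name: 'java', pageNum: 2 }), null, 2));
```

The printed JSON can be pasted into Kibana to check the query before wiring it into the search function.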

With that, the search optimization for the whole site is complete.

Trying it out again: now even if I type a few extra "fish" characters, the resource is still found!

How does ES achieve flexible search? You are welcome to read this article.

Releasing the new ES search interface does not mean the old database query interface is obsolete; both can be kept. Use the new interface when searching for resources by name, which is more flexible. When looking up resources by review status or by the user who published them, use the old interface and query the database. That shares the load and separates responsibilities: let the right technology do the right job!

That's all for this share. If it helped, give it a like.


Original site

Copyright notice
This article was written by [Programmer fish skin]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2021/08/20210802153022955z.html
