I Optimized the Search Function of My Shabby Website

2022-06-24 05:51:00 Programmer fish skin

Optimizing website search in practice with ES + cloud development

Hello everyone, I'm Fish Skin. Today we'll do a hands-on technical walkthrough: requirements analysis => technology selection => design and implementation. From 0 to 1, I'll take you through making website search more flexible.

Background

The programming navigation website I developed has been online for 6 months, but ever since launch it has had one serious problem: the search function is hard to use.

Previously, to get online quickly, search was implemented with a simple database fuzzy query (a "contains" match). That was convenient to develop, but it is very inflexible.

For instance, there is a resource on the site named "Java Design Patterns", with a space after "Java". A user who types the same name without that space finds nothing, because the resource name contains a space and the keyword does not.

Spaces are just one special case; there are many similar ones. For example, there is a resource on the site called "Java Concurrency in Practice", but when a user searches for "Java Practice", nothing is found, even though the name clearly contains both "Java" and "Practice".
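To make the failure concrete, here is a minimal sketch (not the site's actual code) comparing a "contains" match with a simple word-based match. The function names and the sample resource are made up for illustration, and splitting on whitespace is only a stand-in for real word segmentation.

```javascript
// Naive "contains" matching, similar in spirit to a database fuzzy query.
function containsMatch(resourceName, keyword) {
  return resourceName.toLowerCase().includes(keyword.toLowerCase());
}

// Split text into lowercase words (a stand-in for real word segmentation).
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Word-based matching: every word of the keyword must appear in the name.
function tokenMatch(resourceName, keyword) {
  const nameTokens = new Set(tokenize(resourceName));
  return tokenize(keyword).every((token) => nameTokens.has(token));
}

const resourceName = 'Java Concurrency in Practice'; // hypothetical resource
console.log(containsMatch(resourceName, 'Java Practice')); // false
console.log(tokenMatch(resourceName, 'Java Practice')); // true
```

The "contains" match fails because the keyword is not one contiguous substring of the name, while the word-based match succeeds. Full-text search engines build on the same idea, with much better tokenization and ranking.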

Keep in mind that search is critical for an information aggregation site and directly affects the user experience. If people can't find resources on your site, who will use it?

So I received some "polite" suggestions from friends, such as one from Tom, the bald guy.

There were two main reasons I hadn't optimized search earlier: I was poor, and I was afraid of the hassle. But as the number of users grows, it's time to fill in this hole!

Technology selection

To make website search more flexible, you can use full-text search technology, which can be implemented on either the front end or the back end.

Front-end full-text search

Sometimes the data to be searched is limited in size and is all stored on the client.

Take a personal blog, for example: each article is usually stored as a file in a directory rather than in a back-end database. In that case there's no need to request dynamic data from the server, and you can search the data directly on the front end.

There are ready-made search libraries for this, such as Lunr.js (7k+ stars on GitHub). First add the content to be indexed:

var idx = lunr(function () {
  this.field('title')
  this.field('body')
  // Content to index
  this.add({
    "title": "yupi",
    "body": "Search WeChat for Programmer Fish Skin to read my original articles",
    "id": "1"
  })
})

Then search :

idx.search("fish skin")

The advantages of pure front-end full-text search are that it needs no back end, is simple and convenient, and takes load off the server. It also needs no network requests, so there is no extra network overhead and retrieval is faster.
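To give a feel for what such a library does under the hood, here is a minimal inverted-index sketch in plain JavaScript. It is not Lunr.js's implementation: the function names and sample documents are assumptions for illustration, and there is no stemming or relevance scoring beyond counting matched words.

```javascript
// Split text into lowercase alphanumeric words.
function toTokens(text) {
  return String(text || '').toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build an inverted index: each word maps to the set of document ids containing it.
function buildIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      for (const token of toTokens(doc[field])) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(doc.id);
      }
    }
  }
  return index;
}

// Return ids of documents matching any query word, ordered by how many
// distinct query words each document contains.
function search(index, query) {
  const scores = new Map();
  for (const token of new Set(toTokens(query))) {
    for (const id of index.get(token) || []) {
      scores.set(id, (scores.get(id) || 0) + 1);
    }
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Hypothetical sample data
const docs = [
  { id: '1', title: 'yupi', body: 'Search WeChat for Programmer Fish Skin' },
  { id: '2', title: 'java', body: 'Java design patterns for programmers' },
];
const index = buildIndex(docs, ['title', 'body']);
console.log(search(index, 'fish skin')); // [ '1' ]
```

Because the whole index lives in memory on the client, every search is just a few map lookups, which is why this approach works well only when the data set is small and static.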

Back-end full-text search

Unlike the front-end approach, back-end full-text search is done on the server: it finds matching data in a remote database and returns it to the front end.

The mainstream back-end full-text search technology today is Elasticsearch, a distributed, RESTful search and data analytics engine.

It's powerful and flexible, but you have to set it up yourself, define the data, manage dictionaries, and upload and maintain the data. It demands real operational skill, and an ES search system designed by a novice is very different from one designed by an expert.

So if you're not familiar with Elasticsearch, you can use a ready-made full-text search service instead, such as Algolia: upload the data to be searched through its API, then search through its API. It offers a free quota, which is enough for small websites and for learning.

[Figure: the Algolia search service]

The choice

So which approach should I choose for my programming navigation website?

First, the number of resources on the site is not fixed and is updated dynamically at irregular intervals, so front-end full-text search is not a good fit.

Second, the amount of data on the site may grow large, and I may need to tune the search system based on what users search for (for example, with a custom programming dictionary). So I decided to build my own search engine with Elasticsearch rather than use a ready-made full-text search service, which leaves me free to customize the system however I want later. In addition, the site's data doesn't have to be sent to another platform, which helps keep it secure.

Installing ES

Having decided on Elasticsearch, the first step is to set up the environment.

You can buy your own server and install ES manually, step by step, following the official documentation. For a personal website of some scale, the setup itself is not difficult, but the later maintenance cost is high: performance analysis, monitoring, alerting, security, and so on all have to be configured yourself. And as the data grows, you also have to think about clustering and horizontal scaling.

So I chose to use the Elasticsearch service offered by a cloud provider, Tencent Cloud in this case. It automatically sets up a ready-made ES cluster for you and also provides visual architecture management, cluster monitoring, logs, advanced plugins, intelligent inspection, and other features.

[Figure: cloud ES cluster architecture]

The ES service is expensive, but it saves a lot of time, so for me it's worth it.

There is also a very convenient customizable search service, Elastic App Search. Give it a try if you're interested.

ES public service

Our goal is to optimize the site's resource search, but the next step is not to write the business logic directly. Instead, we first develop a public ES service.

Operating ES is actually fairly simple; you can think of it as a database. So the public ES service should offer basic add, delete, update, and search operations for other functions to call.

Implementation

Because the back end of the programming navigation site uses Tencent Cloud's cloud development and the services are written in Node.js, I chose the officially recommended @elastic/elasticsearch library to operate ES.

If you haven't used cloud development, that's fine; for now just think of it as a back end. You're welcome to read my earlier article: Learn about cloud development.

The code is simple. First establish a connection to ES; to keep the data secure, an intranet address is used here:

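// Official Elasticsearch client for Node.js
const { Client } = require('@elastic/elasticsearch');

const client = new Client({
  // Intranet address
  node: 'http://10.0.61.1:9200',
  // Username and password (esConfig comes from your own configuration)
  auth: {
    username: esConfig.username,
    password: esConfig.password,
  },
});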

Then write the add, delete, update, and search operations. There is one layer of abstraction here: a branching statement such as switch distinguishes the operation, the data to operate on, and so on based on the request parameters, so each operation doesn't have to be written as a separate interface.

// Read the request parameters (event is the cloud function's input)
const { op, index, id, params } = event;
// Dispatch to add, delete, search, or update based on the operation
switch (op) {
  case 'add':
    return doAdd(index, id, params);
  case 'delete':
    return doDelete(index, id);
  case 'search':
    return doSearch(index, params);
  case 'update':
    return doUpdate(index, id, params);
  default:
    // Reject unknown operations instead of silently returning undefined
    throw new Error(`Unsupported operation: ${op}`);
}

In cloud development, if a function hasn't been called for a long time, its resources are released. The next request triggers a cold start that recreates the resources, which makes the interface respond slowly. So wrapping multiple operations in one function also lowers the chance of a cold start.

I won't go through the specific add, delete, update, and search code; just follow the official documentation for the ES Node.js client. The code will later be open-sourced in the programming navigation repository (https://github.com/liyupi/code-nav).
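For readers who want a starting point, the four helpers might look roughly like the sketch below. This is not the author's code. It assumes the 7.x-style request shape of @elastic/elasticsearch (with a `body` field), which may differ in other major versions, and unlike the dispatcher above it takes the client as an explicit first argument so the helpers can be tried out with a stub.

```javascript
// Insert (or overwrite) a document with the given id.
function doAdd(client, index, id, params) {
  return client.index({ index, id, body: params });
}

// Delete a document by id.
function doDelete(client, index, id) {
  return client.delete({ index, id });
}

// Partial update: only the fields present in `params` are changed.
function doUpdate(client, index, id, params) {
  return client.update({ index, id, body: { doc: params } });
}

// `params` is expected to be a query DSL body,
// for example { query: { match: { name: 'java' } } }.
function doSearch(client, index, params) {
  return client.search({ index, body: params });
}

// Stub client that only records the requests it receives (for illustration).
function createRecordingClient() {
  const calls = [];
  const record = (method) => (request) => {
    calls.push({ method, request });
    return Promise.resolve({});
  };
  return {
    calls,
    index: record('index'),
    delete: record('delete'),
    update: record('update'),
    search: record('search'),
  };
}
```

Passing in a real `Client` instance instead of the stub sends the same requests to the cluster. Check the request shapes against the documentation for the client version you actually install.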

Local debugging

After writing the code, you can use tcb, the command-line tool that comes with cloud development, to run the function locally.

Remember to first change the ES connection address to the public one, then run the command. For example, to insert a piece of data into ES, pass in the name of the function to run, the request parameters, and the code path:

tcb fn run \
  --name <functionName> \
  --params "{\"op\": \"add\"}" \
  --path <functionPath>

After it runs successfully, you can see the newly inserted data in ES (via the Kibana panel or curl).

Remote testing

After the public service code passes local testing, change the ES connection address back to the intranet IP and publish it to the cloud.

Next, try writing another function that accesses the public ES service, for example one that inserts a resource into ES via a callFunction request:

// Add a resource to ES
function addData(id, data) {
  // Call the public ES service
  return app.callFunction({
    name: 'esService',
    data: {
      op: 'add',
      index: 'resource',
      id,
      params: data,
    },
  });
}

However, the data was not inserted; instead the call returned an interface timeout. Why?

Intranet configuration

The logs showed that ES could not be reached. Could it be that the machine running the public ES service is not on the same intranet as ES?

So you need to change the private network configuration of the public ES service in the cloud development console, selecting the same subnet as the purchased ES instance.

[Figure: configuring the private network for the ES cloud function]

After this change, call the public ES service remotely again, and the data is inserted successfully.

Data index

With the public ES service in place, you can write the specific business logic.

First, create an index in ES (similar to a table in a database) to fix the data types, word segmentation, and other settings, rather than allowing arbitrary data to be inserted.

For example, to make search more flexible, the resource name should be declared as the "text" type so that it is tokenized, with the ik Chinese analyzer specified:

"name": {
  "type": "text",
  "analyzer": "ik_max_word",
  "search_analyzer": "ik_smart",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

The number of likes should be set to the "long" type, so that only numbers are accepted:

"likeNum": {
  "type": "long"
}

It's also best to give the index an alias, which makes it easier to rebuild the index later when fields change:

"aliases": {
  "resource": {}
}

Once the index's JSON configuration is written, call the ES create-index API through curl or Kibana.
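Put together, the fragments above could form a create-index body along these lines. This is a sketch under a few assumptions: an ES version with typeless mappings (7 or later), the ik analysis plugin installed on the cluster, and a hypothetical physical index name such as resource_v1 that the "resource" alias points to. The helper name is made up for illustration.

```javascript
// Build the request body for creating the resource index.
// The alias lets callers keep using "resource" even if the physical
// index is later rebuilt under a new name.
function buildResourceIndexBody(aliasName) {
  return {
    aliases: { [aliasName]: {} },
    mappings: {
      properties: {
        name: {
          type: 'text',
          analyzer: 'ik_max_word', // fine-grained segmentation at index time
          search_analyzer: 'ik_smart', // coarser segmentation at search time
          fields: {
            // Exact-match sub-field for sorting or term queries
            keyword: { type: 'keyword', ignore_above: 256 },
          },
        },
        likeNum: { type: 'long' },
      },
    },
  };
}

// Print the JSON so it can be pasted into Kibana or a curl command,
// for example as the body of: PUT /resource_v1 (hypothetical index name)
console.log(JSON.stringify(buildResourceIndexBody('resource'), null, 2));
```

When a field later has to change, one common approach is to create a new physical index, copy the data over, and switch the alias, so the code that queries "resource" does not need to change.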

Data synchronization

Previously, the site's resource data was stored in the database and users queried the database. Now that queries will go to ES, ES can't be empty, so we need a way to synchronize the resource data from the database into ES.

There are several synchronization strategies .

Double write

Previously, a resource recommended by a user was only inserted into the database. Double write means that when a resource is inserted into the database, it is inserted into ES at the same time.

It sounds simple, but this approach has some problems:

  1. It requires changing existing code: every place that writes to the database also has to write to ES.
  2. One write may fail while the other succeeds, leaving the database and ES inconsistent.

Is there a method that is less intrusive to the existing code?

Scheduled synchronization

If the data doesn't need to be real-time, you can use scheduled synchronization: at regular intervals, copy newly inserted or modified data from the database to ES.

There are many ways to implement it, such as using the Logstash data pipeline or writing your own scheduled task, so the existing code doesn't need to change at all.

Real-time synchronization

If real-time requirements are very high and data just inserted into the database must be searchable immediately, you need real-time synchronization. Besides double write, you can also listen to the database binlog, so that any change to the database is noticed.

Alibaba has an open-source project called Canal that can monitor a MySQL database in real time and push change notifications downstream. Take a look if you're interested.

[Figure: the Canal project]

Implementation

Because searching programming resources doesn't need to be very real-time, scheduled synchronization is good enough.

Cloud development provides scheduled triggers by default, so I just wrote a cloud function that runs once every minute and each time reads the data changed in the database within the last 5 minutes, which covers the case where a previous run failed. In addition, configure a timeout so that a long-running execution does not cause the function to fail.
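The core of such a scheduled function can be sketched as follows. This is an illustration rather than the author's implementation: fetchChangedSince and upsertToEs are hypothetical callbacks standing in for the real database query and the call to the public ES service, and the `_id` field name is an assumption.

```javascript
const LOOKBACK_MS = 5 * 60 * 1000; // read changes from the last 5 minutes

// fetchChangedSince(date) -> Promise<Array<{ _id, ...fields }>>  (hypothetical database query)
// upsertToEs(id, doc)     -> Promise<void>                       (hypothetical ES write)
async function syncRecentChanges({ fetchChangedSince, upsertToEs, now = new Date() }) {
  const since = new Date(now.getTime() - LOOKBACK_MS);
  const changed = await fetchChangedSince(since);
  let synced = 0;
  for (const { _id, ...doc } of changed) {
    // Using the database id as the ES document id makes the overlapping
    // window harmless: re-syncing a record overwrites it instead of
    // creating a duplicate.
    await upsertToEs(_id, doc);
    synced += 1;
  }
  return { since, synced };
}
```

Since the function runs every minute but looks back 5 minutes, each record is normally written several times. That redundancy is what lets a later run pick up whatever a failed run missed.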

This can be configured visually in the cloud function console of cloud development; you need to specify a crontab expression for the scheduled task.

[Figure: configuring the cloud function's schedule and timeout]

After enabling scheduled synchronization, don't forget to write and run a one-time initial synchronization function that copies the full historical data into ES.

Data retrieval

Now that ES has data, only one last step remains: how do we search it?

First we need to learn ES's search DSL (its query syntax), including how to select fields, search, filter, paginate, sort, and so on. For beginners it's still a bit of a hassle, especially combining Boolean expressions in query conditions: one slip and the data isn't found. So I suggest writing the query first in the debugging tool that Kibana provides.

[Figure: debugging in Kibana]

Once the query returns the expected data, write the back-end search function. Its request parameters should stay consistent with the original interface to minimize changes.

The query can be assembled dynamically based on the front-end request, for example searching by resource name:

// A resource name was passed in
if (name) {
  // Assemble the query clause
  query.bool.should = [
    {
      match: {
        name
      }
    }
  ];
}
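A slightly fuller sketch of assembling a search body, with filtering, pagination, and sorting, might look like this. The function name, the reviewStatus field, and the paging parameters are assumptions for illustration; only name and likeNum come from the index shown earlier.

```javascript
// Build an ES search body from front-end request parameters.
function buildSearchBody({ name, reviewStatus, pageNum = 1, pageSize = 10 } = {}) {
  const bool = { filter: [] };
  if (name) {
    bool.should = [{ match: { name } }];
    // When a bool query also has filter (or must) clauses, should clauses
    // become optional by default, so require at least one to match.
    bool.minimum_should_match = 1;
  }
  if (reviewStatus !== undefined) {
    // Exact-match filter; does not affect the relevance score
    bool.filter.push({ term: { reviewStatus } });
  }
  return {
    query: { bool },
    from: (pageNum - 1) * pageSize, // offset of the first hit to return
    size: pageSize,
    // Most relevant first, then most liked
    sort: [{ _score: 'desc' }, { likeNum: 'desc' }],
  };
}

console.log(JSON.stringify(buildSearchBody({ name: 'java', pageNum: 2 }), null, 2));
```

The minimum_should_match line is an example of the Boolean pitfalls mentioned above: without it, adding a filter can quietly turn the name condition into an optional one, and the results no longer match what the user typed.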

With that, the search optimization of the whole site is complete.

Try it out again: now even if I type a few extra "fish" characters, the resource can still be found!

How does ES achieve flexible search? You're welcome to read this article.

Launching the new ES search interface doesn't mean the old database query interface is obsolete; you can keep both. Use the new interface when searching resources by name, which is more flexible; use the old interface, querying the database, when searching by review status or for resources published by a particular user. This shares the load and separates responsibilities: let the right technology do the right thing!
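As a rough sketch, that split could be expressed as a small routing helper. The function name and the parameter names are hypothetical; the rule itself is the one described above.

```javascript
// Decide which back end should serve a search request.
// Name searches need flexible full-text matching, so they go to ES;
// exact-condition queries (review status, publisher) stay on the database.
function chooseSearchBackend(params = {}) {
  const hasName = typeof params.name === 'string' && params.name.trim() !== '';
  return hasName ? 'es' : 'database';
}

console.log(chooseSearchBackend({ name: 'java' })); // es
console.log(chooseSearchBackend({ reviewStatus: 1 })); // database
```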


That's all for this post. If it helped, please give it a like!


Copyright notice
This article was written by [Programmer fish skin]. Please include the original link when reposting. Thank you.
https://yzsam.com/2021/08/20210802153022955z.html
