Full-Text Search in MySQL
2022-06-23 21:24:00 【eeaters】
MySQL full-text (word-segmentation) index
- Preface
- Scope and limitations of use
- Global configuration of full-text search
- Data preparation
- Full-text search metadata
- Full-text search queries
- Custom stopwords
- ngram full-text parser (Chinese text)
Preface
For the complete reference, see the official MySQL documentation on full-text search (fulltext).
When product requirements call for word-segmented search or fully fuzzy matching, the approaches I have seen so far are:
- If the data volume is small, send all of it to the front end and let the front end handle the matching
- Plenty of code runs LIKE queries with % on both sides. I dislike this kind of SQL, but most developers don't seem to mind, and it is usually hard to persuade them otherwise
- When such a requirement lands on me, I argue with the product manager: fuzzy search is reasonable, but the matching rules need to be adjusted a little (in most cases the product side backs down)
- If the argument fails (the product manager points to another product that works this way, and the boss wants the same), and the call chain is complex with a non-trivial request volume, we have to assess whether to move to Elasticsearch (ES)
This time a similar requirement is still at the design stage. There is plenty of time and the requirement is simple, so I studied MySQL full-text search from the official documentation; if it turns out to be suitable, it becomes one more option for the future.
Scope and limitations of use
- Only the InnoDB and MyISAM engines are supported, and they behave slightly differently; I did not test MyISAM
- Partitioned tables are not supported
- Most multibyte character sets work, but for Unicode only utf8 (utf8mb3) and utf8mb4 can be used: FULLTEXT indexes cannot be created on ucs2 columns, so avoid that character set
- By default, word segmentation does not handle Chinese, Japanese, and similar languages, because they have no word delimiters. MySQL provides two alternatives:
  - the character-based ngram full-text parser, which supports Chinese, Japanese, and Korean
  - the word-based MeCab parser plugin, for Japanese
- Although different columns of a table can use different character sets, all columns in one FULLTEXT index must use the same character set and collation
- % is a LIKE wildcard and is not supported by full-text search; the closest equivalent is the trailing * truncation operator in boolean mode, as in word*
- For DML (INSERT, UPDATE, DELETE), changes reach the full-text index only when the transaction commits, so uncommitted changes are never visible to full-text searches (no dirty reads)
Global configuration of full-text search
SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'innodb_ft%';
Variable_name Value
---
innodb_ft_aux_table
innodb_ft_cache_size 8000000
innodb_ft_enable_diag_print OFF
innodb_ft_enable_stopword ON
innodb_ft_max_token_size 84
innodb_ft_min_token_size 3
innodb_ft_num_word_optimize 2000
innodb_ft_result_cache_limit 2000000000
innodb_ft_server_stopword_table
innodb_ft_sort_pll_degree 2
innodb_ft_total_cache_size 640000000
innodb_ft_user_stopword_table
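Of these settings, innodb_ft_min_token_size (3), innodb_ft_max_token_size (84) and innodb_ft_enable_stopword (ON) decide which words get indexed at all. The Python sketch below is a simplified model, not MySQL's parser: it assumes a word is a run of letters, digits or underscores (the real delimiter rules are more detailed), and it uses InnoDB's default stopword list. The function name is hypothetical.

```python
import re

# Default InnoDB stopwords (as listed in INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD).
DEFAULT_STOPWORDS = frozenset(
    "a about an are as at be by com de en for from how i in is it la of on "
    "or that the this to was what when where who will with und www".split()
)
# Assumption: a "word" is a run of letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def indexed_tokens(text, min_token_size=3, max_token_size=84,
                   stopwords=DEFAULT_STOPWORDS):
    """Return the words this simplified model would put into the index."""
    tokens = []
    for match in WORD_RE.finditer(text):
        word = match.group().lower()
        # Words shorter than the minimum or longer than the maximum are skipped.
        if not min_token_size <= len(word) <= max_token_size:
            continue
        # Stopwords are skipped (innodb_ft_enable_stopword = ON).
        if word in stopwords:
            continue
        tokens.append(word)
    return tokens


print(indexed_tokens("1. Never run mysqld as root. 2. ..."))
print(indexed_tokens("When configured properly, MySQL ..."))
```

In this model the digits 1 and 2 and the word as are too short to be indexed, and When is dropped as a stopword. Note that innodb_ft_min_token_size and innodb_ft_max_token_size cannot be changed at runtime: changing them requires a server restart, after which the FULLTEXT indexes must be rebuilt.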
Data preparation
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR (200),
body TEXT,
FULLTEXT (title, body)
) ENGINE = INNODB;
INSERT INTO articles (title, body)
VALUES
(
'MySQL Tutorial',
'DBMS stands for DataBase ...'
),
(
'How To Use MySQL Well',
'After you went through a ...'
),
(
'Optimizing MySQL',
'In this tutorial we show ...'
),
(
'1001 MySQL Tricks',
'1. Never run mysqld as root. 2. ...'
),
(
'MySQL vs. YourSQL',
'In the following database comparison ...'
),
(
'MySQL Security',
'When configured properly, MySQL ...'
);
# To inspect this table's full-text index through the INFORMATION_SCHEMA tables below, first point innodb_ft_aux_table at it ('database/table')
SET GLOBAL innodb_ft_aux_table = 'test/articles';
Full-text search metadata
SHOW TABLES FROM INFORMATION_SCHEMA LIKE 'INNODB_FT%';
Tables_in_information_schema (INNODB_FT%)
---
INNODB_FT_CONFIG
INNODB_FT_BEING_DELETED
INNODB_FT_DELETED
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_CONFIG
Shows metadata about the FULLTEXT index of an InnoDB table and its related processing.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;
KEY VALUE
---
optimize_checkpoint_limit 180
synced_doc_id 8
stopword_table_name
use_stopword 1
INNODB_FT_BEING_DELETED
Intended only for monitoring and debugging. It holds a snapshot of INNODB_FT_DELETED taken while OPTIMIZE TABLE is running, so it is normally empty.
INNODB_FT_DELETED
Stores the document IDs of rows deleted from the table's FULLTEXT index. Reorganizing the index on every DELETE would be too expensive, so InnoDB records the deleted IDs here and filters them out of query results.
This data is not permanent: running OPTIMIZE TABLE articles; reorganizes the index and removes these rows.
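The record, filter, then purge cycle can be modeled with a toy inverted index. The class and method names below are hypothetical, and the model ignores caching and word positions; it only shows why deleted rows disappear from results immediately while their index entries remain until OPTIMIZE TABLE.

```python
class ToyFullTextIndex:
    """Hypothetical model of an inverted index plus a deleted-ID list."""

    def __init__(self):
        self.postings = {}    # word -> set of doc IDs
        self.deleted = set()  # plays the role of INNODB_FT_DELETED

    def add(self, doc_id, words):
        for word in words:
            self.postings.setdefault(word, set()).add(doc_id)

    def delete(self, doc_id):
        # Cheap: remember the ID instead of rewriting every posting list.
        self.deleted.add(doc_id)

    def search(self, word):
        # Deleted IDs are still in the postings, so filter them out per query.
        return sorted(self.postings.get(word, set()) - self.deleted)

    def optimize(self):
        # Like OPTIMIZE TABLE: purge deleted IDs from the postings, then clear the list.
        for ids in self.postings.values():
            ids -= self.deleted
        self.deleted.clear()


index = ToyFullTextIndex()
index.add(1, ["mysql", "tutorial"])
index.add(2, ["mysql", "security"])
index.delete(1)
print(index.search("mysql"))    # [2]: row 1 is filtered out at query time
print(index.postings["mysql"])  # {1, 2}: but it is still physically present
index.optimize()
print(index.postings["mysql"], index.deleted)  # {2} set()
```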
INNODB_FT_DEFAULT_STOPWORD
The default list of stopwords used when a FULLTEXT index is created on an InnoDB table.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
value
---
a, about, an, are, as, at, be, by, com, de, en, for, from, how, i, in, is, it, la, of, on, or, that, the, this, to, was, what, when, where, who, will, with, und, the, www
INNODB_FT_INDEX_CACHE
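Holds the tokens of newly inserted rows. To avoid expensive index reorganization during DML, new entries are kept in this cache and merged into the main index only when OPTIMIZE TABLE runs, when the server shuts down, or when the cache exceeds innodb_ft_cache_size or innodb_ft_total_cache_size.
After running OPTIMIZE TABLE articles; the cache is emptied and its entries appear in INNODB_FT_INDEX_TABLE.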
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE LIMIT 5;
WORD FIRST_DOC_ID LAST_DOC_ID DOC_COUNT DOC_ID POSITION
---
1001 5 5 1 5 0
after 3 3 1 3 22
comparison 6 6 1 6 44
configured 7 7 1 7 20
database 2 6 2 2 31
INNODB_FT_INDEX_TABLE
Right after the first INSERT this table is still empty; run OPTIMIZE TABLE articles; to move the cached entries into it.
It has the same columns as INNODB_FT_INDEX_CACHE.
OPTIMIZE TABLE articles;
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 5;
# Result of the SELECT above:
WORD FIRST_DOC_ID LAST_DOC_ID DOC_COUNT DOC_ID POSITION
---
1001 5 5 1 5 0
after 3 3 1 3 22
comparison 6 6 1 6 44
configured 7 7 1 7 20
database 2 6 2 2 31
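These columns can be reproduced from the six rows of articles. In the Python sketch below, the doc IDs 2 to 7 are taken from the output above (they come from InnoDB's hidden FTS_DOC_ID column, not from id), and two assumptions are made: the indexed columns are joined with a single space, and POSITION is the byte offset of the word in that string. FIRST_DOC_ID, LAST_DOC_ID and DOC_COUNT are computed per word across these six rows. The function name is hypothetical.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"\w+")

# FTS doc ID -> (title, body); IDs 2..7 are taken from the output above.
ARTICLES = {
    2: ("MySQL Tutorial", "DBMS stands for DataBase ..."),
    3: ("How To Use MySQL Well", "After you went through a ..."),
    4: ("Optimizing MySQL", "In this tutorial we show ..."),
    5: ("1001 MySQL Tricks", "1. Never run mysqld as root. 2. ..."),
    6: ("MySQL vs. YourSQL", "In the following database comparison ..."),
    7: ("MySQL Security", "When configured properly, MySQL ..."),
}


def index_rows(docs, min_token_size=3):
    """Build (WORD, FIRST_DOC_ID, LAST_DOC_ID, DOC_COUNT, DOC_ID, POSITION) rows."""
    postings = defaultdict(list)  # word -> [(doc_id, position), ...]
    for doc_id in sorted(docs):
        # Assumption: the indexed columns are joined with one space.
        text = " ".join(docs[doc_id])
        for m in WORD_RE.finditer(text):
            word = m.group().lower()
            if len(word) < min_token_size:
                continue
            # Assumption: POSITION is the UTF-8 byte offset of the word.
            position = len(text[:m.start()].encode("utf-8"))
            postings[word].append((doc_id, position))
    rows = []
    for word in sorted(postings):
        ids = [doc_id for doc_id, _ in postings[word]]
        for doc_id, position in postings[word]:
            rows.append((word, min(ids), max(ids), len(set(ids)), doc_id, position))
    return rows


for row in index_rows(ARTICLES)[:5]:
    print(*row)
```

The first five rows match the output above; for example "After" starts right after the 21-character title "How To Use MySQL Well" plus one space, hence position 22. Under the byte-offset assumption, each Chinese character counts as 3 bytes in UTF-8, which is consistent with the positions advancing in steps of 3 in the ngram example later.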
Full text search query
Query mode
MATCH (col1,col2,...) AGAINST (expr [search_modifier])
search_modifier:
{
IN NATURAL LANGUAGE MODE -- This is the default
| IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
| IN BOOLEAN MODE
| WITH QUERY EXPANSION
}
Simple query
The FULLTEXT index here covers two columns (title, body), so MATCH() must list both: the MATCH() column list has to match the column list of a FULLTEXT index exactly. To search a single column, create a separate FULLTEXT index on that column.
Full-text search ranks results by relevance. When MATCH() is used in a WHERE clause, rows are returned with the most relevant first, as long as all of the following hold:
- There is no explicit ORDER BY clause
- The search uses a full-text index scan rather than a table scan
- If the query joins tables, the full-text index scan is on the leftmost non-constant table in the join
SELECT count(*) count FROM articles WHERE MATCH(title,body) AGAINST('database')
count
---
2
# Results are sorted by relevance by default. This COUNT rewrite skips the sorting but scans the whole table, so it may be faster when the term matches many rows and slower when it matches few
SELECT COUNT(IF(MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE), 1, NULL)) AS count FROM articles;
By default, full-text search is not case-sensitive; to make it case-sensitive, use a case-sensitive or binary collation for the indexed columns.
Relevance score query
- The search string is tokenized too, so 'tutorial abdc esf' is scored as three separate words
- Without a WHERE clause every row gets a score (non-matching rows get 0); to keep only matching rows, add the WHERE clause that is commented out below
select id,MATCH(title,body) AGAINST ('tutorial abdc esf') as score FROM articles
# WHERE MATCH(title,body) AGAINST ('tutorial abdc esf' );
id score
---
1 0.22764469683170319
2 0
3 0.22764469683170319
4 0
5 0
6 0
Boolean full-text search
As mentioned earlier, natural language mode is the default. In boolean mode you control matching with operators: + before a word means it must be present, - means it must be absent.
select * FROM articles where MATCH(title,body) AGAINST ('+MYSQL -configured -tutorial' IN BOOLEAN MODE);
id title body
---
2 How To Use MySQL Well After you went through a ...
4 1001 MySQL Tricks 1. Never run mysqld as root. 2. ...
5 MySQL vs. YourSQL In the following database comparison ...
Some boolean-mode operators:
- MYSQL DBMS: find rows that contain at least one of the two words
- +MYSQL +DBMS: find rows that contain both words
- +MYSQL DBMS: find rows that contain MYSQL, ranking rows that also contain DBMS higher
- +MYSQL -DBMS: find rows that contain MYSQL but not DBMS
- '"MySQL Tutorial"': double quotes match the enclosed words as an exact phrase
- The official documentation lists further operators, such as ~, which lowers a row's relevance instead of excluding it; I haven't looked into these closely, so they are not covered here
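The +, - and trailing * (truncation) operators can be modeled as set logic over each row's words. This Python sketch is a simplification under stated assumptions: it ignores ranking, phrases and the other operators, and the function name is hypothetical. It does follow the documented rule that a query made only of - terms matches nothing, because - only excludes rows that other terms matched.

```python
import re

WORD_RE = re.compile(r"\w+")

# The six rows of the articles table: id -> (title, body).
DOCS = {
    1: ("MySQL Tutorial", "DBMS stands for DataBase ..."),
    2: ("How To Use MySQL Well", "After you went through a ..."),
    3: ("Optimizing MySQL", "In this tutorial we show ..."),
    4: ("1001 MySQL Tricks", "1. Never run mysqld as root. 2. ..."),
    5: ("MySQL vs. YourSQL", "In the following database comparison ..."),
    6: ("MySQL Security", "When configured properly, MySQL ..."),
}


def boolean_search(query, docs):
    """Return ids of rows matching a boolean-mode query (no ranking, no phrases)."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)

    def contains(words, term):
        # A trailing * is the truncation operator: match any word with that prefix.
        if term.endswith("*"):
            return any(w.startswith(term[:-1]) for w in words)
        return term in words

    hits = []
    for doc_id, columns in sorted(docs.items()):
        words = set(WORD_RE.findall(" ".join(columns).lower()))
        if not all(contains(words, t) for t in required):
            continue  # a +word is missing
        if any(contains(words, t) for t in excluded):
            continue  # a -word is present
        if not required and not any(contains(words, t) for t in optional):
            continue  # no + terms, so at least one plain word must match
        hits.append(doc_id)
    return hits


print(boolean_search("+MYSQL -configured -tutorial", DOCS))  # [2, 4, 5]
```

The result matches the boolean query shown earlier. Plain words only influence ranking once a + term is present, which this model does not compute.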
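Full-text search with query expansion (synonym-like effect)
WITH QUERY EXPANSION can produce an effect similar to synonyms in Elasticsearch. The search runs twice: the second search uses the original search phrase plus words from the most relevant rows found by the first search, so rows that never mention the search term can still match.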
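Per the MySQL manual, query expansion runs the search twice, adding words from the top rows of the first pass to the second pass. The Python sketch below models only that idea: it does not rank, it takes the first top_n hits of pass 1 in id order (top_n is a hypothetical parameter, not a MySQL setting), and it uses a regex tokenizer with a 3-character minimum.

```python
import re

WORD_RE = re.compile(r"\w+")

# The six rows of the articles table: id -> (title, body).
DOCS = {
    1: ("MySQL Tutorial", "DBMS stands for DataBase ..."),
    2: ("How To Use MySQL Well", "After you went through a ..."),
    3: ("Optimizing MySQL", "In this tutorial we show ..."),
    4: ("1001 MySQL Tricks", "1. Never run mysqld as root. 2. ..."),
    5: ("MySQL vs. YourSQL", "In the following database comparison ..."),
    6: ("MySQL Security", "When configured properly, MySQL ..."),
}


def words_of(columns, min_token_size=3):
    return {w for w in WORD_RE.findall(" ".join(columns).lower())
            if len(w) >= min_token_size}


def search_with_expansion(term, docs, top_n=2):
    term = term.lower()
    doc_words = {doc_id: words_of(cols) for doc_id, cols in docs.items()}
    # Pass 1: plain search for the original term.
    first_pass = [d for d in sorted(doc_words) if term in doc_words[d]]
    # Expand the query with every word of the top hits of pass 1.
    expanded = {term}
    for d in first_pass[:top_n]:
        expanded |= doc_words[d]
    # Pass 2: any row sharing at least one word with the expanded query matches.
    return [d for d in sorted(doc_words) if doc_words[d] & expanded]


print(search_with_expansion("database", DOCS))           # all six rows
print(search_with_expansion("database", DOCS, top_n=0))  # [1, 5]: no expansion
```

The first pass finds rows 1 and 5, both of which contain mysql, and every row contains mysql, so the second pass returns all six rows, as in the query result below. This is also why the manual warns that blind query expansion adds noise and is best used with short search phrases.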
SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' IN NATURAL LANGUAGE MODE);
id title body
---
1 MySQL Tutorial DBMS stands for DataBase ...
5 MySQL vs. YourSQL In the following database comparison ...
SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' WITH QUERY EXPANSION);
id title body
---
5 MySQL vs. YourSQL In the following database comparison ...
1 MySQL Tutorial DBMS stands for DataBase ...
3 Optimizing MySQL In this tutorial we show ...
6 MySQL Security When configured properly, MySQL ...
2 How To Use MySQL Well After you went through a ...
4 1001 MySQL Tricks 1. Never run mysqld as root. 2. ...
Custom stopwords
INNODB_FT_DEFAULT_STOPWORD, covered in the metadata section above, holds MySQL's default stopwords. You can define your own stopword list instead: it must be an InnoDB table with a single VARCHAR column named value.
Whether stopword matching is case-sensitive depends on the server collation: with latin1_swedish_ci it is case-insensitive, while latin1_general_cs and latin1_bin are case-sensitive.
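A stopword table set through innodb_ft_server_stopword_table is used instead of the default list, not in addition to it, so a default stopword such as was becomes indexable once the example's my_stopwords table (containing Ishmael and Ralph) is in use. A Python sketch with a simplified regex tokenizer; the helper names are hypothetical, and lower-casing stands in for a case-insensitive collation.

```python
import re

WORD_RE = re.compile(r"\w+")

DEFAULT_STOPWORDS = frozenset(
    "a about an are as at be by com de en for from how i in is it la of on "
    "or that the this to was what when where who will with und www".split()
)
# Contents of the example's my_stopwords table. Lower-casing models a
# case-insensitive (_ci) collation; _cs or _bin collations compare exactly.
CUSTOM_STOPWORDS = frozenset(w.lower() for w in ("Ishmael", "Ralph"))


def indexed_words(text, stopwords, min_token_size=3):
    words = (w.lower() for w in WORD_RE.findall(text))
    return [w for w in words if len(w) >= min_token_size and w not in stopwords]


line = "It was love at first sight."
print(indexed_words(line, DEFAULT_STOPWORDS))  # 'was' is a default stopword
print(indexed_words(line, CUSTOM_STOPWORDS))   # the custom list replaces the default one
print(indexed_words("Call me Ishmael.", CUSTOM_STOPWORDS))
```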
CREATE TABLE my_stopwords(value VARCHAR(25)) ENGINE INNODB;
INSERT into my_stopwords (value) values ('Ishmael'),('Ralph');
# Set the new table as the table used by the stop words
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
# Create a test table. Stopwords are applied when the FULLTEXT index is created, so the stopword table must be set first
CREATE TABLE `opening_lines` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`opening_line` text,
`author` varchar(200) DEFAULT NULL,
`title` varchar(200) DEFAULT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `ft_opening_lines` (`opening_line`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO opening_lines (opening_line, author, title)
VALUES
(
'Call me Ishmael.',
'Herman Melville',
'Moby-Dick'
),
(
'A screaming comes across the sky.',
'Thomas Pynchon',
'Gravity\'s Rainbow'
),
(
'I am an invisible man.',
'Ralph Ellison',
'Invisible Man'
),
(
'Where now? Who now? When now?',
'Samuel Beckett',
'The Unnamable'
),
(
'It was love at first sight.',
'Joseph Heller',
'Catch-22'
),
(
'All this happened, more or less.',
'Kurt Vonnegut',
'Slaughterhouse-Five'
),
(
'Mrs. Dalloway said she would buy the flowers herself.',
'Virginia Woolf',
'Mrs. Dalloway'
),
(
'It was a pleasure to burn.',
'Ray Bradbury',
'Fahrenheit 451'
);
ngram full-text parser (Chinese text)
The ngram token size (ngram_token_size) defaults to 2. It can only be set when MySQL starts, for example mysqld --ngram_token_size=n. The test below uses the default of 2.
Note that although the default stopwords are English, you can define your own stopword table as described above, so Chinese stopwords can be added as well.
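With ngram_token_size=2, the ngram parser splits text into every run of 2 consecutive characters and drops whitespace (the MySQL manual's example: "ab cd" becomes ab and cd). The Python sketch below models that tokenization and also computes UTF-8 byte offsets like the POSITION column, where each Chinese character takes 3 bytes; for example 数据库管理 ("database management") yields 数据, 据库, 库管 and 管理 at positions 0, 3, 6 and 9. The function name is hypothetical, and stopword handling is left out.

```python
def ngram_tokens(text, n=2):
    """Return (token, byte_position) pairs, splitting text into n-character grams."""
    # UTF-8 byte offset of each character, to mirror the POSITION column.
    offsets = []
    pos = 0
    for ch in text:
        offsets.append(pos)
        pos += len(ch.encode("utf-8"))
    tokens = []
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if any(ch.isspace() for ch in gram):
            continue  # grams that span whitespace are dropped
        tokens.append((gram, offsets[i]))
    return tokens


print(ngram_tokens("数据库管理"))
print(ngram_tokens("ab cd"))
```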
# Reuse the articles table. To build an ngram FULLTEXT index, drop the earlier FULLTEXT index first; otherwise the new one does not take effect
ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER ngram;
# insert data
# Row 1: title "database management", body "in this tutorial I will show you how to manage databases"
# Row 2: title "database application development", body "learn to develop database applications"
INSERT INTO articles (title,body) VALUES
('数据库管理','在本教程中我将向你展示如何管理数据库'),
('数据库应用开发','学习开发数据库应用程序');
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position LIMIT 15;
WORD FIRST_DOC_ID LAST_DOC_ID DOC_COUNT DOC_ID POSITION
---
数据库管理 9 9 1 9 0
数据 9 10 2 9 0
据库 9 10 2 9 3
库管 9 9 1 9 6
管理 9 9 1 9 9
在本教程中我将向你展示如何管理数据库 9 9 1 9 16
在本 9 9 1 9 16
本教 9 9 1 9 19
教程 9 9 1 9 22
程中 9 9 1 9 25
中我 9 9 1 9 28
我将 9 9 1 9 31
将向 9 9 1 9 34
向你 9 9 1 9 37
你展 9 9 1 9 40
SELECT * FROM articles WHERE MATCH(title,body) AGAINST('数据库应用');
id title body
---
8 数据库应用开发 学习开发数据库应用程序
7 数据库管理 在本教程中我将向你展示如何管理数据库
Matching differs subtly between search modes. The official documentation gives a small example (with ngram_token_size=2):
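- In natural language mode, the search term is converted to the union of its ngrams: abc becomes ab bc. Given one document containing ab and another containing abc, searching for abc finds both
- In boolean mode, the search term is converted to an ngram phrase: abc becomes "ab bc". With the same two documents, only the document containing abc is found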
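The difference between the two conversions can be sketched in Python: natural language mode as an OR over the term's ngrams, boolean mode as a phrase whose ngrams must appear consecutively and in order. This is a model of the documented behavior, not MySQL's code; the function names are hypothetical, and whitespace and stopword handling are left out.

```python
def ngrams(text, n=2):
    """All contiguous n-character substrings (ngram_token_size = n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def natural_mode(term, docs, n=2):
    # The term becomes a union (OR) of its ngrams.
    grams = set(ngrams(term, n))
    return [d for d, text in sorted(docs.items()) if grams & set(ngrams(text, n))]


def boolean_mode(term, docs, n=2):
    # The term becomes a phrase: its ngrams must appear consecutively, in order.
    phrase = ngrams(term, n)
    k = len(phrase)
    hits = []
    for d, text in sorted(docs.items()):
        grams = ngrams(text, n)
        if any(grams[i:i + k] == phrase for i in range(len(grams) - k + 1)):
            hits.append(d)
    return hits


docs = {1: "ab", 2: "abc"}
print(natural_mode("abc", docs))  # [1, 2]
print(boolean_mode("abc", docs))  # [2]
```

Applied to the two Chinese rows, natural mode with 数据库应用 matches both (they share 数据 and 据库), which is consistent with the query result above, while boolean mode would match only 数据库应用开发.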