
Full-text search in MySQL

2022-06-23 21:24:00 eeaters

MySQL word-segmentation (full-text) indexes

Preface

You can also go straight to the official docs: MySQL official documentation - fulltext

These days product requirements keep asking for word-segmented search or fully fuzzy queries. Our previous approaches were:

  1. When the data set is small, send all of it to the front end and let the front end filter it
  2. Lots of code uses LIKE with % on both sides. I dislike this kind of SQL, but most developers don't seem to mind, and it is usually hard to persuade them otherwise
  3. When such a requirement lands on me, I argue with product: fuzzy search is reasonable, but the matching rules need to change a little (in most cases product backs down)
  4. If the argument fails (product says product X works this way, and the boss wants it too), and the call chain is complex with a significant request volume, assess whether to move to ES (Elasticsearch)

This time a similar requirement is still in the design stage. Since there is plenty of time and the requirement is simple, I took the chance to study MySQL full-text search from the official documentation. If it turns out to fit, we will have one more option in the future.

Scope and limitations

  1. Only the InnoDB and MyISAM engines support full-text indexes. Their behavior differs slightly; I did not test MyISAM
  2. Partitioned tables are not supported
  3. Among the Unicode character sets, utf8mb3/utf8mb4 work but ucs2 does not, so avoid ucs2 for full-text columns
  4. The default (built-in) parser cannot segment Chinese, Japanese, or Korean text, because these languages do not separate words with spaces
    • The character-based ngram full-text parser supports Chinese, Japanese, and Korean
    • For Japanese there is also the MeCab full-text parser plugin
  5. Although a table may use a different character set for each column, all columns in one FULLTEXT index must use the same character set and collation
  6. % is the wildcard for LIKE queries; full-text search does not support it. Use the truncation operator instead, as in word* (boolean mode only)
  7. For InnoDB, DML operations (INSERT, UPDATE, DELETE) on full-text indexed columns reach the full-text index only when the transaction commits, so a full-text search never sees uncommitted changes (no dirty reads)
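To make limitation 6 concrete, here is a small Python sketch (not MySQL code). It contrasts a LIKE '%sql%' substring test with the word-prefix matching that the boolean-mode * operator performs. The row texts come from the articles table used later in this post. The tokenizer is an assumed simplification of the built-in parser: ASCII word characters, lower-cased.

```python
import re

# title + " " + body of the six sample rows in the articles table
ROWS = {
    1: "MySQL Tutorial DBMS stands for DataBase ...",
    2: "How To Use MySQL Well After you went through a ...",
    3: "Optimizing MySQL In this tutorial we show ...",
    4: "1001 MySQL Tricks 1. Never run mysqld as root. 2. ...",
    5: "MySQL vs. YourSQL In the following database comparison ...",
    6: "MySQL Security When configured properly, MySQL ...",
}


def like_contains(fragment):
    """Approximates LIKE '%fragment%': a case-insensitive substring test."""
    return [i for i, text in ROWS.items() if fragment.lower() in text.lower()]


def prefix_match(prefix):
    """Approximates AGAINST('prefix*' IN BOOLEAN MODE): some whole word
    must start with the prefix; the middle of a word never matches."""
    hits = []
    for i, text in ROWS.items():
        words = re.findall(r"\w+", text.lower(), re.ASCII)
        if any(w.startswith(prefix.lower()) for w in words):
            hits.append(i)
    return hits


print(like_contains("sql"))   # [1, 2, 3, 4, 5, 6]  ("MySQL" contains "sql")
print(prefix_match("sql"))    # []  (no word starts with "sql")
print(prefix_match("tutor"))  # [1, 3]
```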

Global full-text search settings

SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'innodb_ft%';

Variable_name	Value
---
innodb_ft_aux_table
innodb_ft_cache_size	8000000
innodb_ft_enable_diag_print	OFF
innodb_ft_enable_stopword	ON
innodb_ft_max_token_size	84
innodb_ft_min_token_size	3
innodb_ft_num_word_optimize	2000
innodb_ft_result_cache_limit	2000000000
innodb_ft_server_stopword_table
innodb_ft_sort_pll_degree	2
innodb_ft_total_cache_size	640000000
innodb_ft_user_stopword_table

Data preparation

CREATE TABLE articles (
	id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
	title VARCHAR (200),
	body TEXT,
	FULLTEXT (title, body)
) ENGINE = INNODB;

INSERT INTO articles (title, body)
VALUES
	(
		'MySQL Tutorial',
		'DBMS stands for DataBase ...'
	),
	(
		'How To Use MySQL Well',
		'After you went through a ...'
	),
	(
		'Optimizing MySQL',
		'In this tutorial we show ...'
	),
	(
		'1001 MySQL Tricks',
		'1. Never run mysqld as root. 2. ...'
	),
	(
		'MySQL vs. YourSQL',
		'In the following database comparison ...'
	),
	(
		'MySQL Security',
		'When configured properly, MySQL ...'
	);

# To inspect this table's index through the INFORMATION_SCHEMA tables below, point innodb_ft_aux_table at it (format: database_name/table_name; the database here is test)
SET GLOBAL innodb_ft_aux_table = 'test/articles';

Full-text search metadata

SHOW TABLES FROM INFORMATION_SCHEMA LIKE 'INNODB_FT%';

Tables_in_information_schema (INNODB_FT%)
---
INNODB_FT_CONFIG
INNODB_FT_BEING_DELETED
INNODB_FT_DELETED
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE

INNODB_FT_CONFIG

Provides metadata about an InnoDB table's FULLTEXT index and its related processing.

select * from INFORMATION_SCHEMA.INNODB_FT_CONFIG

KEY	VALUE
---
optimize_checkpoint_limit	180
synced_doc_id	8
stopword_table_name	
use_stopword	1

INNODB_FT_BEING_DELETED

Intended for monitoring and debugging. It is normally empty: it only holds a snapshot of INNODB_FT_DELETED while OPTIMIZE TABLE is running.

INNODB_FT_DELETED

Records the document IDs of rows deleted from an InnoDB table with a full-text index. Reorganizing the index on every DML operation would be too expensive, so MySQL records which rows were deleted and filters them out of search results at query time.

These records are not permanent: running OPTIMIZE TABLE articles; rebuilds the index and purges them from this table.
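The delete-then-purge behavior described above can be modeled with a toy index. This is a hypothetical Python sketch of the idea only. ToyFulltextIndex and its methods are made-up names, and InnoDB's real structures are far more involved.

```python
class ToyFulltextIndex:
    """Hypothetical model of a full-text index with a deleted-document list."""

    def __init__(self):
        self.postings = {}    # word -> set of doc ids
        self.deleted = set()  # plays the role of INNODB_FT_DELETED

    def add(self, doc_id, words):
        for word in words:
            self.postings.setdefault(word, set()).add(doc_id)

    def delete(self, doc_id):
        # Cheap: remember the id instead of rewriting every posting list.
        self.deleted.add(doc_id)

    def search(self, word):
        # Deleted ids are filtered out of the result at query time.
        return sorted(self.postings.get(word, set()) - self.deleted)

    def optimize(self):
        # Like OPTIMIZE TABLE: purge deleted ids from the postings,
        # then empty the deleted list.
        for doc_ids in self.postings.values():
            doc_ids -= self.deleted
        self.deleted.clear()


idx = ToyFulltextIndex()
idx.add(2, ["mysql", "tutorial", "database"])
idx.add(6, ["mysql", "database", "comparison"])
idx.delete(6)
print(idx.search("database"), idx.deleted)  # [2] {6}
idx.optimize()
print(idx.search("database"), idx.deleted)  # [2] set()
```

Before optimize() runs, the result is already correct but posting lists still carry the deleted id. After it runs, the postings themselves are clean and the deleted list is empty again.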

INNODB_FT_DEFAULT_STOPWORD

The default stopword list used when a FULLTEXT index is created on an InnoDB table.

select * from INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD

value
---
a
about
an
are
as
at
be
by
com
de
en
for
from
how
i
in
is
it
la
of
on
or
that
the
this
to
was
what
when
where
who
will
with
und
the
www

INNODB_FT_INDEX_CACHE

When new rows are inserted, their tokens are first buffered in this in-memory cache instead of being written to the on-disk index right away. This avoids reorganizing the index on every insert.

Running OPTIMIZE TABLE articles; flushes the cache: INNODB_FT_INDEX_CACHE becomes empty and its entries move to INNODB_FT_INDEX_TABLE.

select * from INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE limit 5

WORD	FIRST_DOC_ID	LAST_DOC_ID	DOC_COUNT	DOC_ID	POSITION
---
1001	5	5	1	5	0
after	3	3	1	3	22
comparison	6	6	1	6	44
configured	7	7	1	7	20
database	2	6	2	2	31

INNODB_FT_INDEX_TABLE

Right after the first INSERT this table is still empty, because the entries are still in the cache. Run OPTIMIZE TABLE articles; to move them here.

Its columns are the same as those of INNODB_FT_INDEX_CACHE.

OPTIMIZE TABLE articles;
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 5;

Result set of the SELECT above:
WORD	FIRST_DOC_ID	LAST_DOC_ID	DOC_COUNT	DOC_ID	POSITION
---
1001	5	5	1	5	0
after	3	3	1	3	22
comparison	6	6	1	6	44
configured	7	7	1	7	20
database	2	6	2	2	31
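As a cross-check of what these tables contain, the following Python sketch rebuilds the five rows above from the sample data. It rests on several assumptions, none of them taken from InnoDB's source; the sketch only reproduces the observed output:

- The built-in parser is simplified to ASCII word characters, lower-cased, 3 to 84 characters, with the default stopwords removed.
- Title and body are joined with one space.
- POSITION is the byte offset in that joined text.
- The hidden document IDs are 2 to 7, as shown in the output.

```python
import re

# Default InnoDB stopwords (INNODB_FT_DEFAULT_STOPWORD above).
STOPWORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en",
    "for", "from", "how", "i", "in", "is", "it", "la", "of", "on", "or",
    "that", "the", "this", "to", "was", "what", "when", "where", "who",
    "will", "with", "und", "www",
}
MIN_TOKEN, MAX_TOKEN = 3, 84  # innodb_ft_min_token_size / innodb_ft_max_token_size

# (doc_id, title, body); the doc ids 2..7 are taken from the output above.
DOCS = [
    (2, "MySQL Tutorial", "DBMS stands for DataBase ..."),
    (3, "How To Use MySQL Well", "After you went through a ..."),
    (4, "Optimizing MySQL", "In this tutorial we show ..."),
    (5, "1001 MySQL Tricks", "1. Never run mysqld as root. 2. ..."),
    (6, "MySQL vs. YourSQL", "In the following database comparison ..."),
    (7, "MySQL Security", "When configured properly, MySQL ..."),
]


def build_index(docs):
    """Map each word to its (doc_id, byte_position) occurrences."""
    index = {}
    for doc_id, title, body in docs:
        text = f"{title} {body}"
        for m in re.finditer(r"\w+", text, re.ASCII):
            word = m.group().lower()
            if MIN_TOKEN <= len(word) <= MAX_TOKEN and word not in STOPWORDS:
                position = len(text[:m.start()].encode("utf-8"))
                index.setdefault(word, []).append((doc_id, position))
    return index


def index_rows(index):
    """Rows shaped like INNODB_FT_INDEX_TABLE, one per word occurrence."""
    rows = []
    for word in sorted(index):
        doc_ids = [doc_id for doc_id, _ in index[word]]
        first, last, count = min(doc_ids), max(doc_ids), len(set(doc_ids))
        for doc_id, position in index[word]:
            rows.append((word, first, last, count, doc_id, position))
    return rows


for row in index_rows(build_index(DOCS))[:5]:
    print(row)
```

For example, "database" starts at byte 31 of "MySQL Tutorial DBMS stands for DataBase ..." and occurs in docs 2 and 6. That gives FIRST_DOC_ID 2, LAST_DOC_ID 6 and DOC_COUNT 2.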

Full-text search queries

Examples from the official MySQL documentation

Search modifiers

MATCH (col1,col2,...) AGAINST (expr [search_modifier])

search_modifier:
  {
       IN NATURAL LANGUAGE MODE         -- the default
     | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
     | IN BOOLEAN MODE
     | WITH QUERY EXPANSION
  }

Basic queries

MATCH() must list exactly the columns of one FULLTEXT index. The index here covers (title, body), so both columns have to be used together. To search a single column, create a separate FULLTEXT index on just that column.

When MATCH() is used in the WHERE clause, results are automatically sorted by relevance (most relevant first) if all of the following hold:

  1. There is no explicit ORDER BY
  2. The search uses a full-text index scan rather than a table scan
  3. If the query joins tables, the full-text index scan is on the leftmost non-constant table in the join

SELECT COUNT(*) AS count FROM articles WHERE MATCH(title, body) AGAINST('database');

count
---
2

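# With MATCH() in WHERE, COUNT(*) also pays for sorting the rows by relevance. The query below skips that sort but scans the whole table.
# Per the MySQL docs it tends to be faster only when the term matches most rows; when few rows match, the WHERE form above can use the index and win.
SELECT COUNT(IF(MATCH(title, body) AGAINST ('database' IN NATURAL LANGUAGE MODE), 1, NULL)) AS count FROM articles;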

Full-text search is case-insensitive by default. To make it case-sensitive, use a case-sensitive or binary collation for the indexed columns.

Relevance score query

  • The search string is split into words, and each word contributes to the score
  • Without a WHERE clause every row is scored, and rows that match nothing get 0. To reduce the noise, add the WHERE MATCH(...) AGAINST(...) condition so only matching rows are returned
select id,MATCH(title,body) AGAINST ('tutorial abdc esf') as score FROM articles
# WHERE MATCH(title,body) AGAINST ('tutorial abdc esf' );

id	score
---
1	0.22764469683170319
2	0
3	0.22764469683170319
4	0
5	0
6	0
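InnoDB's relevance ranking is a variant of TF-IDF. The MySQL documentation gives the per-word rank as TF * IDF * IDF, where IDF = log10(total rows / matching rows), and sums it over the search words. The Python sketch below applies that simplified formula to the six articles. 'tutorial' appears once in each of 2 of the 6 rows, so rows 1 and 3 score log10(3)² ≈ 0.2276, which agrees with the MySQL output up to floating-point rounding. 'abdc' and 'esf' match nothing and add 0. The tokenizer is an assumed simplification, and it skips stopword handling.

```python
import math
import re

# title + " " + body of the six rows (ids 1..6)
ROWS = {
    1: "MySQL Tutorial DBMS stands for DataBase ...",
    2: "How To Use MySQL Well After you went through a ...",
    3: "Optimizing MySQL In this tutorial we show ...",
    4: "1001 MySQL Tricks 1. Never run mysqld as root. 2. ...",
    5: "MySQL vs. YourSQL In the following database comparison ...",
    6: "MySQL Security When configured properly, MySQL ...",
}


def words(text):
    # Simplified parser: ASCII word characters, lower-cased, at least
    # 3 characters. Stopwords are skipped for brevity: none of the
    # query words used below is a stopword.
    return [w for w in re.findall(r"\w+", text.lower(), re.ASCII) if len(w) >= 3]


def relevance(query, rows):
    """Sum of TF * IDF * IDF over the query words, IDF = log10(total / matching)."""
    docs = {row_id: words(text) for row_id, text in rows.items()}
    scores = {row_id: 0.0 for row_id in rows}
    for term in set(words(query)):
        matching = [row_id for row_id, ws in docs.items() if term in ws]
        if not matching:
            continue  # a word no row contains adds nothing
        idf = math.log10(len(rows) / len(matching))
        for row_id in matching:
            scores[row_id] += docs[row_id].count(term) * idf * idf
    return scores


for row_id, score in relevance("tutorial abdc esf", ROWS).items():
    print(row_id, round(score, 6))  # rows 1 and 3: 0.227645, others: 0.0
```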

Boolean full-text search

As mentioned above, queries use natural language mode by default. In boolean mode you can control matching with operators placed before each word: + means the word must be present, and - means it must be absent.

select * FROM articles where MATCH(title,body) AGAINST ('+MYSQL -configured -tutorial' IN BOOLEAN MODE);

id	title	body
---
2	How To Use MySQL Well	After you went through a ...
4	1001 MySQL Tricks	1. Never run mysqld as root. 2. ...
5	MySQL vs. YourSQL	In the following database comparison ...

Some boolean-mode search strings and what they do:

  1. MYSQL DBMS: find rows that contain at least one of the two words
  2. +MYSQL +DBMS: find rows that contain both words
  3. +MYSQL DBMS: find rows that contain MYSQL; rows that also contain DBMS rank higher
  4. +MYSQL -DBMS: find rows that contain MYSQL but not DBMS
  5. '"MySQL Tutorial"': double quotes match the exact phrase
  6. The official docs describe more operators (such as ~, >, <, parentheses, and the * truncation operator). I did not study them closely, so they are not covered here
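The matching rules in items 1 to 5 can be sketched in Python as follows. This is a simplified, hypothetical model with several limits:

- It only decides whether a row matches; it does no ranking.
- It handles only +, -, plain words, and double-quoted phrases.
- It does not apply stopwords or the minimum token length as MySQL would.

```python
import re

# title + " " + body of the six rows (ids 1..6)
ROWS = {
    1: "MySQL Tutorial DBMS stands for DataBase ...",
    2: "How To Use MySQL Well After you went through a ...",
    3: "Optimizing MySQL In this tutorial we show ...",
    4: "1001 MySQL Tricks 1. Never run mysqld as root. 2. ...",
    5: "MySQL vs. YourSQL In the following database comparison ...",
    6: "MySQL Security When configured properly, MySQL ...",
}


def words(text):
    return re.findall(r"\w+", text.lower(), re.ASCII)


def contains(ws, seq):
    """True if the word sequence seq appears consecutively in ws."""
    n = len(seq)
    return any(ws[i:i + n] == seq for i in range(len(ws) - n + 1))


def boolean_search(query, rows):
    required, excluded, optional = [], [], []
    # Each term: optional +/- sign, then a "quoted phrase" or a single word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\w+))', query):
        seq = words(phrase or word)
        {"+": required, "-": excluded, "": optional}[sign].append(seq)
    hits = []
    for row_id, text in rows.items():
        ws = words(text)
        if not all(contains(ws, s) for s in required):
            continue  # a +term is missing
        if any(contains(ws, s) for s in excluded):
            continue  # a -term is present
        if not required and not any(contains(ws, s) for s in optional):
            continue  # no + terms: at least one plain term must match
        hits.append(row_id)
    return hits


print(boolean_search("+MYSQL -configured -tutorial", ROWS))  # [2, 4, 5]
print(boolean_search('"MySQL Tutorial"', ROWS))               # [1]
```

The first query returns the same rows (2, 4, 5) as the MySQL output above.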

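Full-text search with query expansion (synonym-like effect)

With WITH QUERY EXPANSION, MySQL runs the search twice. The second search uses the original search phrase concatenated with the few most relevant documents from the first search. This gives an effect somewhat like synonyms in ES: below, a search for 'database' also returns rows that only share other words (for example 'MySQL') with the top matches.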

SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE);

id	title	body
---
1	MySQL Tutorial	DBMS stands for DataBase ...
5	MySQL vs. YourSQL	In the following database comparison ...
SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' WITH QUERY EXPANSION);

id	title	body
---
5	MySQL vs. YourSQL	In the following database comparison ...
1	MySQL Tutorial	DBMS stands for DataBase ...
3	Optimizing MySQL	In this tutorial we show ...
6	MySQL Security	When configured properly, MySQL ...
2	How To Use MySQL Well	After you went through a ...
4	1001 MySQL Tricks	1. Never run mysqld as root. 2. ...
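Blind query expansion can be sketched as two passes of a natural-language search. In this hypothetical Python model, relevance is just the number of shared words. The top_n parameter (how many first-pass rows feed the second pass) is an assumption of the sketch, not a MySQL setting. The sketch reproduces which rows come back for 'database', though not MySQL's exact order.

```python
import re

# Default InnoDB stopwords (INNODB_FT_DEFAULT_STOPWORD above).
STOPWORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en",
    "for", "from", "how", "i", "in", "is", "it", "la", "of", "on", "or",
    "that", "the", "this", "to", "was", "what", "when", "where", "who",
    "will", "with", "und", "www",
}

# title + " " + body of the six rows (ids 1..6)
ROWS = {
    1: "MySQL Tutorial DBMS stands for DataBase ...",
    2: "How To Use MySQL Well After you went through a ...",
    3: "Optimizing MySQL In this tutorial we show ...",
    4: "1001 MySQL Tricks 1. Never run mysqld as root. 2. ...",
    5: "MySQL vs. YourSQL In the following database comparison ...",
    6: "MySQL Security When configured properly, MySQL ...",
}


def words(text):
    """Distinct indexable words: ASCII word characters, lower-cased,
    at least 3 characters, default stopwords removed."""
    return {w for w in re.findall(r"\w+", text.lower(), re.ASCII)
            if len(w) >= 3 and w not in STOPWORDS}


def natural_search(terms, rows):
    """Rows sharing at least one word with terms, most shared words first."""
    scored = []
    for row_id, text in rows.items():
        shared = len(terms & words(text))
        if shared:
            scored.append((-shared, row_id))
    return [row_id for _, row_id in sorted(scored)]


def expanded_search(query, rows, top_n=2):
    """Two passes: the words of the top_n first-pass rows are added to the query."""
    terms = words(query)
    for row_id in natural_search(terms, rows)[:top_n]:
        terms = terms | words(rows[row_id])
    return natural_search(terms, rows)


print(natural_search(words("database"), ROWS))  # [1, 5]
print(expanded_search("database", ROWS))        # all six rows
```

The second pass pulls in every row because the first-pass matches contribute the word "mysql", which every title contains. This also shows how query expansion can add noise.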

Custom stopwords

The INNODB_FT_DEFAULT_STOPWORD table covered above holds MySQL's default stopwords, but you can define your own list. The stopword table must be an InnoDB table with a single VARCHAR column named value, and it must be configured before the FULLTEXT index is created.

Whether stopwords are case-sensitive depends on the server collation. With latin1_swedish_ci, lookups are case-insensitive; with latin1_general_cs or latin1_bin, they are case-sensitive.

CREATE TABLE my_stopwords(value VARCHAR(25)) ENGINE INNODB;
INSERT into my_stopwords (value) values ('Ishmael'),('Ralph');

# Use the new table as the server-wide stopword list (set it before creating the FULLTEXT index)
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';

# Create another table to test with
CREATE TABLE `opening_lines` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `opening_line` text,
  `author` varchar(200) DEFAULT NULL,
  `title` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `ft_opening_lines` (`opening_line`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;


INSERT INTO opening_lines (opening_line, author, title)
VALUES
	(
		'Call me Ishmael.',
		'Herman Melville',
		'Moby-Dick'
	),
	(
		'A screaming comes across the sky.',
		'Thomas Pynchon',
		'Gravity\'s Rainbow'
	),
	(
		'I am an invisible man.',
		'Ralph Ellison',
		'Invisible Man'
	),
	(
		'Where now? Who now? When now?',
		'Samuel Beckett',
		'The Unnamable'
	),
	(
		'It was love at first sight.',
		'Joseph Heller',
		'Catch-22'
	),
	(
		'All this happened, more or less.',
		'Kurt Vonnegut',
		'Slaughterhouse-Five'
	),
	(
		'Mrs. Dalloway said she would buy the flowers herself.',
		'Virginia Woolf',
		'Mrs. Dalloway'
	),
	(
		'It was a pleasure to burn.',
		'Ray Bradbury',
		'Fahrenheit 451'
	);
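The effect of the custom list can be sketched in Python. The sketch makes two assumptions. First, as the MySQL docs describe, the table named by innodb_ft_server_stopword_table replaces the default list rather than extending it. Second, comparisons are case-insensitive (a _ci collation). Under those assumptions 'Ishmael' is never indexed, while a default stopword such as 'where' now is. The tokenizer is a simplification, not MySQL's parser.

```python
import re

# opening_line column of the opening_lines table (ids 1..8)
OPENING_LINES = {
    1: "Call me Ishmael.",
    2: "A screaming comes across the sky.",
    3: "I am an invisible man.",
    4: "Where now? Who now? When now?",
    5: "It was love at first sight.",
    6: "All this happened, more or less.",
    7: "Mrs. Dalloway said she would buy the flowers herself.",
    8: "It was a pleasure to burn.",
}
# Rows of my_stopwords, lower-cased because a _ci collation is assumed.
CUSTOM_STOPWORDS = {"ishmael", "ralph"}


def indexed_words(text, stopwords, min_len=3):
    """Words that would enter the index: simplified parser, custom stopwords only."""
    return {w for w in re.findall(r"\w+", text.lower(), re.ASCII)
            if len(w) >= min_len and w not in stopwords}


INDEX = {row_id: indexed_words(text, CUSTOM_STOPWORDS)
         for row_id, text in OPENING_LINES.items()}


def search(word):
    # A stopword never reaches the index, so searching for it finds nothing.
    return [row_id for row_id, ws in INDEX.items() if word.lower() in ws]


print(search("Ishmael"))  # []
print(search("sky"))      # [2]
print(search("where"))    # [4]: a default stopword, but not in my_stopwords
```

Note that 'Ralph' only appears in the author column, which is not part of ft_opening_lines, so that stopword has no visible effect in this example.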

ngram full-text parser (Chinese word segmentation)

The ngram parser splits text into tokens of ngram_token_size characters; the default is 2. The value can only be changed at server startup, e.g. mysqld --ngram_token_size=n. The tests below use the default of 2.

Note that although the default stopword list is English, you can define a custom stopword list as described above, so Chinese stopwords can be added as well.

# Same articles table as before. To build an ngram full-text index, drop the existing full-text index first; otherwise the ngram index does not take effect
ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER ngram;


# Insert data. The Chinese text is needed here, because ngram splits it into 2-character tokens:
# '数据库管理' = "database management", '在本教程中我将向你展示如何管理数据库' = "in this tutorial I will show you how to manage databases"
# '数据库应用开发' = "database application development", '学习开发数据库应用程序' = "learn to develop database applications"
INSERT INTO articles (title,body) VALUES
    ('数据库管理','在本教程中我将向你展示如何管理数据库'),
    ('数据库应用开发','学习开发数据库应用程序');


SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position LIMIT 15;

WORD	FIRST_DOC_ID	LAST_DOC_ID	DOC_COUNT	DOC_ID	POSITION
---
数据库管理	9	9	1	9	0
数据	9	10	2	9	0
据库	9	10	2	9	3
库管	9	9	1	9	6
管理	9	9	1	9	9
在本教程中我将向你展示如何管理数据库	9	9	1	9	16
在本	9	9	1	9	16
本教	9	9	1	9	19
教程	9	9	1	9	22
程中	9	9	1	9	25
中我	9	9	1	9	28
我将	9	9	1	9	31
将向	9	9	1	9	34
向你	9	9	1	9	37
你展	9	9	1	9	40
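The 2-character tokens above can be reproduced with a short Python sketch of n-gram splitting. The sketch assumes:

- ngram_token_size=2.
- Title and body are joined with a single space.
- Grams containing whitespace are dropped (the docs say the ngram parser eliminates spaces).
- POSITION is a UTF-8 byte offset, at 3 bytes per Chinese character.

The whole-string rows (the full title and the full body) are not produced by this sketch.

```python
def ngram_tokens(text, n=2):
    """Overlapping n-character tokens with their UTF-8 byte offsets.
    Grams that contain whitespace are skipped."""
    tokens = []
    for start in range(len(text) - n + 1):
        gram = text[start:start + n]
        if any(ch.isspace() for ch in gram):
            continue
        position = len(text[:start].encode("utf-8"))
        tokens.append((gram, position))
    return tokens


title, body = "数据库管理", "在本教程中我将向你展示如何管理数据库"
for gram, position in ngram_tokens(f"{title} {body}")[:8]:
    print(gram, position)
# 数据 0, 据库 3, 库管 6, 管理 9, 在本 16, 本教 19, 教程 22, 程中 25
```

The body starts at byte 16 because the 5-character title takes 15 bytes, plus 1 byte for the joining space.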


SELECT * FROM articles where MATCH(title,body) AGAINST('数据库应用')

id	title	body
---
8	数据库应用开发	学习开发数据库应用程序
7	数据库管理	在本教程中我将向你展示如何管理数据库

The search modes differ subtly. The official docs give a small example (assuming ngram_token_size=2) with one document containing ab and another containing abc:

  • In natural language mode, the search term abc is converted to the union ab bc, so both documents match
  • In boolean mode, abc is converted to the phrase "ab bc", so only the document containing abc matches
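The difference between the two modes can be sketched in Python with the same two documents, assuming ngram_token_size=2. Natural language mode turns 'abc' into the union of 'ab' and 'bc'. Boolean mode turns it into the phrase "ab bc", whose grams must appear consecutively. This is a simplified model, not the parser's code.

```python
def ngrams(text, n=2):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def contains_run(doc_grams, query_grams):
    """True if query_grams appear consecutively in doc_grams."""
    k = len(query_grams)
    return any(doc_grams[i:i + k] == query_grams
               for i in range(len(doc_grams) - k + 1))


DOCS = {1: "ab", 2: "abc"}


def natural_mode(query):
    # 'abc' -> union of 'ab' and 'bc': sharing any gram is a match
    q = set(ngrams(query))
    return [d for d, text in DOCS.items() if q & set(ngrams(text))]


def boolean_mode(query):
    # 'abc' -> phrase "ab bc": the grams must appear consecutively
    q = ngrams(query)
    return [d for d, text in DOCS.items() if contains_run(ngrams(text), q)]


print(natural_mode("abc"))  # [1, 2]
print(boolean_mode("abc"))  # [2]
```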

Copyright notice
This article was written by [eeaters]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2021/12/202112241253351040.html