MySQL case: analysis of full-text indexing
2022-06-24 07:33:00 【[email protected]】
Preface
Full-text indexing speeds up matching on document content by building an inverted index. Like a B+ tree index, an inverted index is an index structure. It consists of all the distinct tokens (words) that occur in the documents, each mapped to the documents that contain it. Inverted indexes generally come in two forms: the inverted file index and the full inverted index.
(1) Inverted file index: stores the mapping {token, (IDs of the documents that contain the token)}
Number | Text | Documents |
|---|---|---|
1 | how | (1,3) |
2 | are | (1,3) |
3 | you | (1,3) |
4 | fine | (2,4) |
5 | thanks | (2,4) |
(2) Full inverted index: stores the mapping {token, (ID of the document that contains the token : position of the token within that document)}
Number | Text | Documents |
|---|---|---|
1 | how | (1:1),(3:1) |
2 | are | (1:2),(3:2) |
3 | you | (1:3),(3:3) |
4 | fine | (2:1),(4:1) |
5 | thanks | (2:2),(4:2) |
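The two structures above can be reproduced with a short Python sketch. It assumes the four sample documents implied by the tables (documents 1 and 3 contain "how are you", documents 2 and 4 contain "fine thanks") and splits on whitespace only; the function names are illustrative and this is not how InnoDB itself is implemented.

```python
from collections import defaultdict

# Sample documents implied by the tables above (an assumption: IDs 1-4).
DOCUMENTS = {
    1: "how are you",
    2: "fine thanks",
    3: "how are you",
    4: "fine thanks",
}


def build_inverted_file_index(documents):
    """Map each token to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[token].add(doc_id)
    return {token: sorted(doc_ids) for token, doc_ids in index.items()}


def build_full_inverted_index(documents):
    """Map each token to (document ID, 1-based word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in sorted(documents.items()):
        for position, token in enumerate(text.split(), start=1):
            index[token].append((doc_id, position))
    return dict(index)


print(build_inverted_file_index(DOCUMENTS)["how"])     # [1, 3]
print(build_full_inverted_index(DOCUMENTS)["thanks"])  # [(2, 2), (4, 2)]
```

The full inverted index costs more space, but the stored positions are what make phrase and proximity matching possible.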
Implementation principles
Auxiliary tables
In MySQL InnoDB, creating a full-text index also creates a set of auxiliary tables that store the inverted index data.
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200),
FULLTEXT idx (opening_line)
) ENGINE=InnoDB;
mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name | space |
+----------+----------------------------------------------------+-------+
| 333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 | 289 |
| 334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 | 290 |
| 335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 | 291 |
| 336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 | 292 |
| 337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 | 293 |
| 338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 | 294 |
| 330 | test/FTS_0000000000000147_BEING_DELETED | 286 |
| 331 | test/FTS_0000000000000147_BEING_DELETED_CACHE | 287 |
| 332 | test/FTS_0000000000000147_CONFIG | 288 |
| 328 | test/FTS_0000000000000147_DELETED | 284 |
| 329 | test/FTS_0000000000000147_DELETED_CACHE | 285 |
| 327 | test/opening_lines | 283 |
+----------+----------------------------------------------------+-------+
(1) FTS_0000000000000147_00000000000001c9_INDEX_1 through INDEX_6: these six auxiliary tables store the inverted index itself, that is, the tokens, document IDs, and positions. In other words, InnoDB uses a full inverted index.
(2) FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE: FTS_0000000000000147_DELETED records documents that have been deleted but whose data has not yet been removed from the full-text index. FTS_0000000000000147_DELETED_CACHE is its cache table.
(3) FTS_0000000000000147_BEING_DELETED / FTS_0000000000000147_BEING_DELETED_CACHE: FTS_0000000000000147_BEING_DELETED records documents that have been deleted and whose data is currently being removed from the full-text index. FTS_0000000000000147_BEING_DELETED_CACHE is its cache table.
(4) FTS_0000000000000147_CONFIG: stores internal state of the full-text index. Most importantly, it stores FTS_SYNCED_DOC_ID, which identifies the documents that have been parsed and flushed to disk. During crash recovery, FTS_SYNCED_DOC_ID is used to determine which documents had not yet been flushed and must be re-parsed and added back to the full-text index cache.
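The names in the listing follow a visible pattern: the first hex field, 0x147, equals 327, the table_id of test/opening_lines in the same listing. The sketch below decodes such names. The regular expression is inferred from this listing only, and the reading of the second hex field (present only on the INDEX_n tables) as the ID of the full-text index is an assumption of this example; `parse_aux_table_name` is a hypothetical helper, not a MySQL API.

```python
import re

# Pattern inferred from the listing above (an assumption, not an official grammar):
#   FTS_<table_id: 16 hex digits>[_<index_id: 16 hex digits>]_<suffix>
AUX_NAME = re.compile(
    r"^FTS_(?P<table_id>[0-9a-f]{16})"
    r"(?:_(?P<index_id>[0-9a-f]{16}))?"
    r"_(?P<suffix>[A-Z0-9_]+)$"
)


def parse_aux_table_name(name):
    """Decode 'schema/FTS_...' into decimal IDs; return None for ordinary tables."""
    table_part = name.split("/")[-1]  # drop the 'test/' schema prefix if present
    match = AUX_NAME.match(table_part)
    if match is None:
        return None
    index_hex = match.group("index_id")
    return {
        "table_id": int(match.group("table_id"), 16),
        "index_id": int(index_hex, 16) if index_hex else None,
        "suffix": match.group("suffix"),
    }


print(parse_aux_table_name("test/FTS_0000000000000147_00000000000001c9_INDEX_1"))
print(parse_aux_table_name("test/FTS_0000000000000147_CONFIG"))
print(parse_aux_table_name("test/opening_lines"))
```

Decoding the names this way makes it easy to tell which user table a given FTS_ table belongs to when several tables in a schema carry full-text indexes.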
Data insertion
If every document insert had to tokenize the text and update the auxiliary tables immediately, inserts could become very expensive. To avoid this, InnoDB introduces a full-text index cache that holds recently inserted data; the data is written to the auxiliary tables in a batch only once the cache is full. Recently inserted data can be inspected through INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE. The innodb_ft_cache_size and innodb_ft_total_cache_size parameters control the full-text index cache size for a single table and for all tables, respectively. Note that the full-text index cache holds only recently inserted data, not data from the auxiliary tables, so a query has to merge the auxiliary-table data with the recently inserted data in the cache before returning its result.
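The buffering and merge-on-read behaviour described above can be modelled in a few lines of Python. This is a toy model under stated assumptions: the class name, the `cache_limit` parameter (counted in postings rather than bytes), and the whitespace tokenizer are all hypothetical and do not mirror InnoDB's internals.

```python
from collections import defaultdict


class FullTextIndexModel:
    """Toy model: a bounded in-memory cache in front of an 'auxiliary' index."""

    def __init__(self, cache_limit=4):
        self.cache_limit = cache_limit      # hypothetical limit, in cached postings
        self.cache = defaultdict(list)      # token -> [(doc_id, position)], recent inserts
        self.auxiliary = defaultdict(list)  # token -> [(doc_id, position)], flushed data
        self.cached_postings = 0

    def insert(self, doc_id, text):
        """Tokenize into the cache only; flush in one batch when the cache is full."""
        for position, token in enumerate(text.lower().split(), start=1):
            self.cache[token].append((doc_id, position))
            self.cached_postings += 1
        if self.cached_postings >= self.cache_limit:
            self.flush()

    def flush(self):
        """Move every cached posting into the auxiliary index."""
        for token, postings in self.cache.items():
            self.auxiliary[token].extend(postings)
        self.cache.clear()
        self.cached_postings = 0

    def search(self, token):
        """Merge flushed postings with cached ones, as a query has to."""
        token = token.lower()
        return sorted(self.auxiliary.get(token, []) + self.cache.get(token, []))


index = FullTextIndexModel(cache_limit=4)
index.insert(1, "how are you")   # 3 postings: stays in the cache
index.insert(2, "fine thanks")   # 5 postings: cache is flushed
index.insert(3, "how are you")   # cached again
print(index.search("how"))       # combines auxiliary data and cached data
```

The design trades a little extra work at query time (the merge) for far fewer writes to the auxiliary tables at insert time.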
Data deletion
Updating the auxiliary tables on every document deletion would also be expensive. To avoid this, InnoDB only records the deleted document in the FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE tables and does not remove its data from the auxiliary index tables. To purge the deleted data completely, run OPTIMIZE TABLE to rebuild the full-text index.
mysql> set GLOBAL innodb_optimize_fulltext_only=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> OPTIMIZE TABLE opening_lines;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| test.opening_lines | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (0.01 sec)
Data update
For an update, InnoDB first deletes the old data and then inserts the new data; both steps work as described above.
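Lazy deletion and update-as-delete-plus-insert can be sketched the same way. In this toy model the `deleted` set stands in for the DELETED table and `optimize()` stands in for OPTIMIZE TABLE; the class, its method names, and the choice to give an updated document a fresh ID are assumptions made for illustration only.

```python
class LazyDeleteIndex:
    """Toy model of lazy deletion in a full-text index (not InnoDB's real code)."""

    def __init__(self):
        self.postings = {}    # token -> set of doc IDs (stands in for the auxiliary tables)
        self.deleted = set()  # doc IDs marked as deleted (stands in for the DELETED table)
        self.next_doc_id = 1

    def insert(self, text):
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        for token in text.lower().split():
            self.postings.setdefault(token, set()).add(doc_id)
        return doc_id

    def delete(self, doc_id):
        # Cheap: only record the ID; the postings are left untouched.
        self.deleted.add(doc_id)

    def update(self, doc_id, new_text):
        # An update is a delete of the old document followed by an insert.
        self.delete(doc_id)
        return self.insert(new_text)

    def search(self, token):
        # Hide documents that are marked as deleted.
        return sorted(self.postings.get(token.lower(), set()) - self.deleted)

    def optimize(self):
        # Expensive: physically purge deleted documents, then clear the marks.
        for token in list(self.postings):
            self.postings[token] -= self.deleted
            if not self.postings[token]:
                del self.postings[token]
        self.deleted.clear()


idx = LazyDeleteIndex()
first = idx.insert("how are you")
second = idx.insert("fine thanks")
idx.delete(first)
print(idx.search("how"), first in idx.postings["how"])  # hidden from search, still stored
idx.update(second, "fine thank you")
idx.optimize()
print(sorted(idx.postings))  # only tokens of live documents remain
```

Until `optimize()` runs, deleted documents keep occupying space in the index, which is why periodic optimization matters for tables with heavy delete or update traffic.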
Monitoring
As mentioned earlier, creating a full-text index also creates a set of auxiliary tables that store the index data. These auxiliary tables cannot be queried directly, however. The state of a full-text index can only be monitored through the temporary tables that information_schema exposes for this purpose:
INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED
Basic syntax
The syntax for full-text indexes differs little from that of ordinary indexes:
(1) Create a full-text index
alter table $table_name add fulltext index $index_name($column_name);
create fulltext index $index_name on $table_name($column_name);
(2) Drop a full-text index
alter table $table_name drop index $index_name;
(3) Query
select xxx from $table_name where match($column_name) against('xxx');
Summary
In certain scenarios full-text indexing is very useful and can speed up queries considerably. However, MySQL's full-text index has significant limitations. For example, the word delimiter cannot be specified (by default, text is split on spaces), and although the ngram parser can split text into fixed-length tokens, it is still of limited practical use. For scenarios with demanding full-text search requirements, a dedicated product such as ES (Elasticsearch) is still the recommended choice.
Copyright notice
This article was written by [[email protected]]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/06/20210630195005941p.html