对MySQL数据库特定字段进行全文检索

一、背景

现有数据库某字段保存有大量文本,现在需要对多条数据进行全文检索,以确定哪些条数据包含关键字。此时可使用MySQL全文检索。在搜索之前,需要为响应的字段增加索引,这一步会产生较大的系统资源消耗,请谨慎操作。

使用方法如下文。

二、确认要全文检索的字段

首先查看表结构。

use wordpress;
desc wp_posts;

返回结果如下。

+-----------------------+-----------------+------+-----+---------------------+----------------+
| Field                 | Type            | Null | Key | Default             | Extra          |
+-----------------------+-----------------+------+-----+---------------------+----------------+
| ID                    | bigint unsigned | NO   | PRI | NULL                | auto_increment |
| post_author           | bigint unsigned | NO   | MUL | 0                   |                |
| post_date             | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_date_gmt         | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_content          | longtext        | NO   | MUL | NULL                |                |
| post_title            | text            | NO   |     | NULL                |                |
| post_excerpt          | text            | NO   |     | NULL                |                |
| post_status           | varchar(20)     | NO   |     | publish             |                |
| comment_status        | varchar(20)     | NO   |     | open                |                |
| ping_status           | varchar(20)     | NO   |     | open                |                |
| post_password         | varchar(255)    | NO   |     |                     |                |
| post_name             | varchar(200)    | NO   | MUL |                     |                |
| to_ping               | text            | NO   |     | NULL                |                |
| pinged                | text            | NO   |     | NULL                |                |
| post_modified         | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_modified_gmt     | datetime        | NO   |     | 0000-00-00 00:00:00 |                |
| post_content_filtered | longtext        | NO   |     | NULL                |                |
| post_parent           | bigint unsigned | NO   | MUL | 0                   |                |
| guid                  | varchar(255)    | NO   |     |                     |                |
| menu_order            | int             | NO   |     | 0                   |                |
| post_type             | varchar(20)     | NO   | MUL | post                |                |
| post_mime_type        | varchar(100)    | NO   |     |                     |                |
| comment_count         | bigint          | NO   |     | 0                   |                |
+-----------------------+-----------------+------+-----+---------------------+----------------+
23 rows in set (0.06 sec)

执行SQL语句,确认内容是在正确的字段。

select * from wp_posts where post_status='publish' limit 1

查询后,发现内容是存储在名为post_content字段。

三、查询字段是否有索引

以表明wp_posts为例,执行如下命令:

show index from wp_posts;

返回结果如下:

mysql> show index from wp_posts;
+----------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| wp_posts |          0 | PRIMARY          |            1 | ID          | A         |        3731 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| wp_posts |          1 | post_name        |            1 | post_name   | A         |        1907 |      191 |   NULL |      | BTREE      |         |               | YES     | NULL       |
| wp_posts |          1 | type_status_date |            1 | post_type   | A         |          11 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| wp_posts |          1 | type_status_date |            2 | post_status | A         |          12 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| wp_posts |          1 | type_status_date |            3 | post_date   | A         |        3731 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| wp_posts |          1 | type_status_date |            4 | ID          | A         |        3731 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| wp_posts |          1 | post_parent      |            1 | post_parent | A         |         342 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| wp_posts |          1 | post_author      |            1 | post_author | A         |           2 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+----------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
8 rows in set (0.04 sec)

mysql> 

可看到这个索引列表中Column_name字段名称位置,没有上一步搜索的post_content字段。因此就需要为其添加索引。

四、为字段增加索引

现在给post_content字段添加索引。命令如下。

ALTER TABLE wp_posts
ADD FULLTEXT INDEX blog_text (post_content);

根据数据量,这一步消耗的时间可能很长,对CPU消耗很大,需要在非生产高峰、维护窗口执行。CPU使用情况可以通过RDS界面进行监控。

本例中数据量很小,因此几十秒执行完毕,返回结果如下:

Query OK, 0 rows affected, 1 warning (31.13 sec)
Records: 0  Duplicates: 0  Warnings: 1

五、使用全文检索

查询语句如下,其中post_content是列的名字,查询内容是关键字video。命令如下。统计有多少条记录匹配。

SELECT count(post_title)
FROM wp_posts
WHERE MATCH(post_content) AGAINST('video') 
	and post_status='publish';

返回结果有49条。

+---------------------+
| count(post_content) |
+---------------------+
|                  49 |
+---------------------+
1 row in set (0.06 sec)

查询其标题。

SELECT post_title
FROM wp_posts
WHERE MATCH(post_content) AGAINST('video')
	and post_status='publish'
ORDER by post_modified desc
limit 10;

由此确认全文检索正常。


最后修改于 2025-09-04