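A simple use of Elasticsearch in an e-commerce project: searching products (Laravel)
This article shows how typical e-commerce product data can be stored in and queried from Elasticsearch (ES). As a search engine, ES supports much richer filtering than SQL search statements against a database.
The usual pattern is to query ES with the user's search conditions to get the key information (such as ids), then either return that list to the user directly or use the ES result as a SQL condition to load the rows from the database.
A link to the sample code is given at the end of the article.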
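The second pattern above (using the ES result as a SQL condition) could be sketched roughly as follows. The helper names extractProductIds and sortRowsByIds are made up for illustration, and the response and rows are hand-written stand-ins rather than real output.

```php
<?php
/**
 * Collect document ids from a decoded Elasticsearch search response,
 * keeping the relevance order in which Elasticsearch returned them.
 * (Hypothetical helper, not part of the sample project.)
 */
function extractProductIds(array $response): array
{
    $hits = $response['hits']['hits'] ?? [];
    return array_map(static fn (array $hit): int => (int) $hit['_id'], $hits);
}

/**
 * Re-order database rows so they follow the order of $ids.
 * Rows whose id is not in $ids are dropped.
 */
function sortRowsByIds(array $rows, array $ids): array
{
    $byId = array_column($rows, null, 'id'); // index the rows by their id
    $sorted = [];
    foreach ($ids as $id) {
        if (isset($byId[$id])) {
            $sorted[] = $byId[$id];
        }
    }
    return $sorted;
}

// Assumed inputs, trimmed to the fields used here.
$response = ['hits' => ['hits' => [['_id' => '93'], ['_id' => '91']]]];
$rows = [['id' => 91, 'title' => 'B'], ['id' => 93, 'title' => 'A']];

$ids = extractProductIds($response);
$sorted = sortRowsByIds($rows, $ids);
```

In a Laravel project the ids would typically feed a whereIn('id', $ids) query. The re-sorting step matters because SQL does not return rows in the order of the id list.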
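Required installation files
Installing Elasticsearch
1. Download and extract
My local installation directory is /opt/program.
Once the download finishes, extract the archive: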
tar -zxvf elasticsearch-8.6.1-linux-x86_64.tar.gz
sudo mv elasticsearch-8.6.1 elasticsearch
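2. First start
cd elasticsearch/bin
./elasticsearch
On first start, Elasticsearch writes a default configuration into the config directory. Adjust it as needed, for example the HTTPS settings.
Note: Elasticsearch cannot be started as the root user.
Common startup errors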
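1. check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
This error means the kernel limit on memory map areas is too low for Elasticsearch. Raise it as follows:
# Edit sysctl.conf
vim /etc/sysctl.conf
# Append the following line to the end of the file
vm.max_map_count=262144
# Reload the settings so the new limit takes effect without a reboot
sudo sysctl -p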
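2. exception during geoip databases update org.elasticsearch.ElasticsearchException: not all primary shards of [.geoip_databases] index are active
This error comes from the GeoIP database downloader, not from cross-origin requests. Disabling the downloader in the ES configuration file makes it go away; the CORS settings below are a separate change for browser-based clients.
# Go to the config directory
cd /opt/program/elasticsearch/config
# Edit elasticsearch.yml
vim elasticsearch.yml
# Append the following to the end of the file
# Fixes the "[.geoip_databases] index are active" error
ingest.geoip.downloader.enabled: false
# Allow cross-origin requests
http.cors.enabled: true
http.cors.allow-origin: "*"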
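3. If localhost:9200 cannot be reached, change the configuration to turn off the HTTPS check
# Go to the config directory
cd /opt/program/elasticsearch/config
# Edit elasticsearch.yml
vim elasticsearch.yml
# Change the existing settings below; they are generated on first start
# Turn off X-Pack security (authentication)
xpack.security.enabled: true  (change to false)
# Whether HTTP connections from clients are encrypted; leave it off for now
under xpack.security.http.ssl, change enabled: true to enabled: false
Resetting the password
## Note: Elasticsearch must be running when this command is executed
./elasticsearch-reset-password -u elastic -i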
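Installing the IK analyzer
What is an analyzer?
Tokenization is the process of splitting a piece of text into individual terms according to a set of rules. An analyzer is what carries out that process, and good tokenization raises the chance that a search keyword hits.
An analyzer is generally built from three kinds of components:
1. Character filters:
Pre-process the text before it is tokenized. The most common example is stripping HTML tags, so that hello wrapped in markup becomes plain hello; another is character replacement, e.g. I & you --> I and you.
2. Tokenizer:
By default, English text is split into words on whitespace and Chinese text is split into single characters; machine-learning-based tokenization can also be used.
3. Token filters:
Post-process the tokens: change case, remove stop words (such as "a", "and", "the"), add synonyms (such as "jump" and "leap").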
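The three stages can be imitated in a few lines of plain PHP. This is only a toy to show the order of the steps; the function name analyze and the stop-word list are assumptions, and it is not how Elasticsearch or IK is implemented internally.

```php
<?php
/**
 * Toy analyzer: character filter -> tokenizer -> token filters.
 * Illustrative only; real analyzers are far more involved.
 */
function analyze(string $text): array
{
    // 1. Character filter: strip HTML tags and spell out "&".
    $text = strip_tags($text);
    $text = str_replace('&', ' and ', $text);

    // 2. Tokenizer: split on whitespace.
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // 3. Token filters: lowercase, then drop stop words.
    $stopWords = ['a', 'and', 'the'];
    $tokens = array_map('strtolower', $tokens);

    return array_values(array_diff($tokens, $stopWords));
}

$tokens = analyze('<b>The</b> Quick fox & a dog');
// $tokens is ['quick', 'fox', 'dog']
```

A whitespace tokenizer like this one would return a Chinese sentence as a single token, which is why a dedicated Chinese analyzer such as IK is installed next.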
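Installing the Chinese analyzer (IK)
The plugin can be installed straight from the command line. First copy the download link of the release package: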
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v8.6.1/elasticsearch-analysis-ik-8.6.1.zip
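Note: stop Elasticsearch before running this command, otherwise it will not succeed.
A message like the following means the plugin version does not match the Elasticsearch version: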
Plugin [analysis-ik] was built for Elasticsearch version 8.5.0 but version 8.6.1 is running
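Creating the Laravel project and requiring the ES Composer package
I won't go through creating a Laravel project here. Require the client with Composer:
composer require elasticsearch/elasticsearch
After it installs, create the table structure. The SQL is below; in the sample code these are written as migration files.
# Main product table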
CREATE TABLE `products` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'normal',
`category_id` bigint unsigned DEFAULT NULL,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`long_title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` text COLLATE utf8mb4_unicode_ci NOT NULL,
`image` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`on_sale` tinyint(1) NOT NULL DEFAULT '1',
`rating` double(8,2) NOT NULL DEFAULT '5.00',
`sold_count` int unsigned NOT NULL DEFAULT '0',
`review_count` int unsigned NOT NULL DEFAULT '0',
`price` decimal(10,2) NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `products_category_id_foreign` (`category_id`),
KEY `products_type_index` (`type`)
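) ENGINE=InnoDB AUTO_INCREMENT=100 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
# Product SKU table, one row per stock keeping unit. For example, each variant of the same phone model counts as its own SKU and may have its own price. It is mainly used to determine the unit price and the stock.
# References the main table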
CREATE TABLE `product_skus` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`price` decimal(10,2) NOT NULL,
`stock` int unsigned NOT NULL,
`product_id` bigint unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `product_skus_product_id_foreign` (`product_id`),
CONSTRAINT `product_skus_product_id_foreign` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`) ON DELETE CASCADE
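) ENGINE=InnoDB AUTO_INCREMENT=295 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
# Detailed product properties, one row per property. Different products may have properties with the same name, e.g. phone A has the property `内存` (memory) with the value `8G`. It is mainly used to display properties and to filter products.
# References the main table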
CREATE TABLE `product_properties` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`product_id` bigint unsigned NOT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`value` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `product_properties_product_id_foreign` (`product_id`),
CONSTRAINT `product_properties_product_id_foreign` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`) ON DELETE CASCADE
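) ENGINE=InnoDB AUTO_INCREMENT=27 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Running the database migrations in Laravel
# Migrate the database schema
php artisan migrate
# Seed the data
php artisan db:seed
Writing the ES service class, including create, read, update and delete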
<?php
namespace App\Services;
use Elastic\Elasticsearch\ClientBuilder;
use Elastic\Elasticsearch\Exception\ClientResponseException;
use Elastic\Elasticsearch\Exception\MissingParameterException;
use Elastic\Elasticsearch\Exception\ServerResponseException;
class Elastic
{
public \Elastic\Elasticsearch\Client $client;
public string $index;
/**
* @Desc Create the ES client in the constructor
* @param string $index
* @throws \Elastic\Elasticsearch\Exception\AuthenticationException
*/
public function __construct(string $index)
{
$this->index = $index;
$this->client = ClientBuilder::create()
->setHosts(['http://localhost:9200'])
->setBasicAuthentication('elastic', '37LVAY8qAt8p8NncxuUg')
// ->setCABundle('path/to/http_ca.crt')
->build();
}
/**
* @Desc Create the index with its mapping
* @return void
* @throws ClientResponseException
* @throws MissingParameterException
* @throws ServerResponseException
*/
public function createIndex()
{
$str = '{"properties": {
"type": { "type": "keyword" } ,
"title": { "type": "text", "analyzer": "ik_smart" },
"long_title": { "type": "text", "analyzer": "ik_smart" },
"category_id": { "type": "integer" },
"category": { "type": "keyword" },
"category_path": { "type": "keyword" },
"description": { "type": "text", "analyzer": "ik_smart" },
"price": { "type": "scaled_float", "scaling_factor": 100 },
"on_sale": { "type": "boolean" },
"rating": { "type": "float" },
"sold_count": { "type": "integer" },
"review_count": { "type": "integer" },
"skus": {
"type": "nested",
"properties": {
"title": { "type": "text", "analyzer": "ik_smart", "copy_to": "skus_title" },
"description": { "type": "text", "analyzer": "ik_smart", "copy_to": "skus_description" },
"price": { "type": "scaled_float", "scaling_factor": 100 }
}
},
"properties": {
"type": "nested",
"properties": {
"name": { "type": "keyword" },
"value": { "type": "keyword", "copy_to": "properties_value" },
"search_value": { "type": "keyword"}
}
}
}
}';
// Use the indices API: index() would store a document instead of creating the index
$this->client->indices()->create([
'index' => $this->index,
'body' => ['mappings' => json_decode($str, true)]
]);
}
/**
* @Desc Add a document (a 409 conflict for an existing id is ignored)
* @param int $id
* @param string $body
* @return void
* @throws ClientResponseException
* @throws MissingParameterException
* @throws ServerResponseException
*/
public function addData(int $id, string $body)
{
$index = $this->index;
try {
$this->client->create(compact('id', 'index', 'body'));
} catch (\Exception $e) {
if ($e->getCode() != 409) {
throw $e;
}
}
}
/**
* @Desc Search
* @param string|null $body
* @return array
* @throws ClientResponseException
* @throws ServerResponseException
*/
public function search(?string $body = null)
{
$index = $this->index;
return $this->client->search(compact('index', 'body'))->asArray();
}
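/**
* @Desc Partially update a document
* @param int $id
* @param string $body JSON-encoded fields of the document
* @return array
* @throws ClientResponseException
* @throws MissingParameterException
* @throws ServerResponseException
*/
public function update(int $id, string $body)
{
$index = $this->index;
// The update API expects the changed fields wrapped in a "doc" key
$body = ['doc' => json_decode($body, true)];
return $this->client->update(compact('id', 'index', 'body'))->asArray();
}
/**
* @Desc Delete a document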
* @param int $id
* @return array
* @throws ClientResponseException
* @throws MissingParameterException
* @throws ServerResponseException
*/
public function delete(int $id)
{
$index = $this->index;
return $this->client->delete(compact('id', 'index'))->asArray();
}
}
"analyzer": "ik_smart"
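means the field is tokenized with the IK Chinese analyzer, which was introduced in the section above.
Some fields have the type keyword. This is a string type that tells Elasticsearch not to analyze the field; it is typically used for e-mail addresses, tags, attributes and similar values.
scaled_float is a floating-point field with a fixed number of decimal places, similar to MySQL's decimal type. The scaling_factor that follows sets the precision: 100 means two decimal places.
skus and properties have the type nested, meaning the field is a complex object whose fields are defined by the properties key one level down. You might ask why "product SKUs" and "product properties", which are clearly arrays of objects, can be mapped as a single object. This is another Elasticsearch trait: every field can hold multiple values, which is also why Elasticsearch has no array type. It isn't needed, because any field can be an array.
Note that the definition of skus.title carries a copy_to parameter with the value skus_title. Elasticsearch copies the field's value into skus_title, so skus_title can be listed in the fields of a multi_match query. The same applies to skus.description and properties.value.
Make sure Elasticsearch returns "acknowledged" : true when the index is created; otherwise check the submitted mapping for problems.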
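To make the scaling_factor explanation concrete, here is a small sketch of the arithmetic behind scaled_float: the value is multiplied by the factor and rounded to an integer. The function names toScaled and fromScaled are invented for this illustration; Elasticsearch does this internally and exposes no such functions.

```php
<?php
/**
 * Mimic how a scaled_float value is stored: multiply by the
 * scaling factor and round to the nearest integer.
 * (Illustrative helper, not an Elasticsearch API.)
 */
function toScaled(string $price, int $scalingFactor = 100): int
{
    return (int) round((float) $price * $scalingFactor);
}

/** Convert the stored integer back into a decimal value. */
function fromScaled(int $stored, int $scalingFactor = 100): float
{
    return $stored / $scalingFactor;
}

$stored = toScaled('239.00');   // two decimals fit exactly with a factor of 100
$lossy  = toScaled('239.999');  // the third decimal is rounded away
$back   = fromScaled($lossy);
```

With a factor of 100 anything beyond the second decimal place is lost, which is fine for prices stored as decimal(10,2).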
Data initialization and query code
<?php
namespace App\Http\Controllers;
use App\Models\Product;
use App\Services\Elastic;
class ElasticDemo extends Controller
{
/**
* Initialize the data
* @return void
* @throws \Elastic\Elasticsearch\Exception\ClientResponseException
* @throws \Elastic\Elasticsearch\Exception\MissingParameterException
* @throws \Elastic\Elasticsearch\Exception\ServerResponseException
*/
public function initData()
{
$elastic = new Elastic('products');
$elastic->createIndex();
$data = Product::with([
'skus:title,description,price,product_id',
'properties:product_id,name,value'
])->get()->toArray();
foreach ($data as $datum) {
// Iterate by reference so search_value is written back into $datum
foreach ($datum['properties'] as &$property) {
$property['search_value'] = "{$property['name']}:{$property['value']}";
}
unset($property);
$datum['on_sale'] = (bool)$datum['on_sale'];
$elastic->addData($datum['id'], json_encode($datum));
}
}
/**
* @Desc Search
* @return void
* @throws \Elastic\Elasticsearch\Exception\ClientResponseException
* @throws \Elastic\Elasticsearch\Exception\ServerResponseException
*/
public function search()
{
$bodyArr = [
'query' => [
'bool' => [
'filter' => [
[
'term' => ['on_sale' => true]
]
],
'must' => [
[
'multi_match' => [
'query' => "金士顿",
'type' => 'best_fields',
'fields' => [
"title^3",
"long_title^2",
"category^2",
"description",
"skus_title",
"skus_description",
"properties_value",
]
]
]
]
]
],
'aggs' => [
'properties_count' => [
'nested' => ['path' => 'properties'],
'aggs' => [
'properties_name' => [
'terms' => ['field' => 'properties.name'],
'aggs' => [
'properties_value' => [
'terms' => ['field' => 'properties.value']
]
]
],
]
]
]
];
$res = (new Elastic('products'))->search(json_encode($bodyArr));
dd($res);
}
}
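Some notes on the DSL above:
- 金士顿 (Kingston) is the keyword searched across several fields; title^3 boosts the weight of matches that come from the title field
- properties_count is a user-chosen name for the aggregation result, and the same goes for the other properties_* names that follow
- The properties_count aggregation steps into the nested properties objects of all products matched by the query
- properties_name then groups those objects by properties.name, i.e. by property name
- properties_value works the same way, grouping each name bucket by property value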
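The query part of that DSL could be produced by a small builder instead of a hard-coded array. The sketch below is an assumption on top of the article: the function name buildSearchBody is invented, and the property filter on properties.search_value is one plausible use of that field, which the article indexes but never queries.

```php
<?php
/**
 * Build a search body: keyword match plus optional property filters.
 * (Hypothetical helper; the nested filter on search_value is an assumption.)
 *
 * @param array<string,string> $propertyFilters e.g. ['内存容量' => '8GB']
 */
function buildSearchBody(string $keyword, array $propertyFilters = []): array
{
    // Only products that are on sale, as in the controller above.
    $filter = [['term' => ['on_sale' => true]]];

    foreach ($propertyFilters as $name => $value) {
        // One nested query per selected property, matched on "name:value".
        $filter[] = [
            'nested' => [
                'path'  => 'properties',
                'query' => ['term' => ['properties.search_value' => "{$name}:{$value}"]],
            ],
        ];
    }

    return [
        'query' => [
            'bool' => [
                'filter' => $filter,
                'must'   => [[
                    'multi_match' => [
                        'query'  => $keyword,
                        'type'   => 'best_fields',
                        'fields' => [
                            'title^3', 'long_title^2', 'category^2', 'description',
                            'skus_title', 'skus_description', 'properties_value',
                        ],
                    ],
                ]],
            ],
        ],
    ];
}

$body = buildSearchBody('金士顿', ['内存容量' => '8GB']);
```

The result can be passed through json_encode() and handed to the search() method of the Elastic service shown earlier; the aggs section would be added alongside query in the same way as in the controller.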
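Keeping ES in sync with database inserts, updates and deletes
For this I use a Laravel model observer: every create, update or delete on Product triggers the sync code.
php artisan make:observer ProductObserver --model=Product
Once it is created, edit app/Observers/ProductObserver.php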
<?php
namespace App\Observers;
use App\Models\Product;
use App\Services\Elastic;
class ProductObserver
{
/**
* Handle the Product "created" event.
*
* @param \App\Models\Product $product
* @return void
*/
public function created(Product $product)
{
$data = Product::with([
'skus:title,description,price,product_id',
'properties:product_id,name,value'
])->whereId($product->id)->first()->toArray();
// Iterate by reference so search_value is written back into $data
foreach ($data['properties'] as &$property) {
$property['search_value'] = "{$property['name']}:{$property['value']}";
}
unset($property);
$data['on_sale'] = (bool)$data['on_sale'];
(new Elastic('products'))->addData($data['id'], json_encode($data));
}
/**
* Handle the Product "updated" event.
*
* @param \App\Models\Product $product
* @return void
*/
public function updated(Product $product)
{
$data = Product::with([
'skus:title,description,price,product_id',
'properties:product_id,name,value'
])->whereId($product->id)->first()->toArray();
// Iterate by reference so search_value is written back into $data
foreach ($data['properties'] as &$property) {
$property['search_value'] = "{$property['name']}:{$property['value']}";
}
unset($property);
$data['on_sale'] = (bool)$data['on_sale'];
(new Elastic('products'))->update($data['id'], json_encode($data));
}
/**
* Handle the Product "deleted" event.
*
* @param \App\Models\Product $product
* @return void
*/
public function deleted(Product $product)
{
(new Elastic('products'))->delete($product->id);
}
/**
* Handle the Product "restored" event.
*
* @param \App\Models\Product $product
* @return void
*/
public function restored(Product $product)
{
//
}
/**
* Handle the Product "force deleted" event.
*
* @param \App\Models\Product $product
* @return void
*/
public function forceDeleted(Product $product)
{
//
}
}
Then register the observer in the boot() method of App\Providers\EventServiceProvider:
Product::observe(ProductObserver::class);
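The created and updated handlers of the observer build the same payload, so that step could live in one place. A minimal sketch, assuming the array shape produced by toArray() in the observer; the function name buildProductDocument is made up for illustration.

```php
<?php
/**
 * Turn a product array (as produced by toArray()) into the document
 * that is sent to Elasticsearch.
 * (Hypothetical helper, not part of the sample project.)
 */
function buildProductDocument(array $product): array
{
    // Iterate by reference so search_value is written back into the array.
    foreach ($product['properties'] as &$property) {
        $property['search_value'] = "{$property['name']}:{$property['value']}";
    }
    unset($property); // break the reference before it can be reused by accident

    // MySQL returns tinyint(1) as 0/1; the mapping declares a boolean.
    $product['on_sale'] = (bool) $product['on_sale'];

    return $product;
}

$doc = buildProductDocument([
    'id'         => 93,
    'on_sale'    => 1,
    'properties' => [['name' => '品牌名称', 'value' => '金士顿']],
]);
```

The by-reference loop is the important detail: iterating by value changes only a temporary copy, and search_value would never reach the indexed document.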
Query result
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.9678952,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "93",
"_score" : 1.9678952,
"_source" : {
"id" : 93,
"type" : "normal",
"category_id" : 13,
"title" : "Kingston/金士顿 金士顿DDR3 1600 8GB",
"long_title" : "Kingston/金士顿 DDR3 1600 8G 台式机电脑 三代 内存条 兼容1333",
"on_sale" : true,
"rating" : 5,
"sold_count" : 0,
"review_count" : 0,
"price" : "239.00",
"category" : [
"电脑配件",
"内存"
],
"category_path" : "-10-",
"description" : "",
"skus" : [
{
"title" : "DDR3 1600 8G",
"description" : "DDR3 1600 8G",
"price" : "439.00"
},
{
"title" : "DDR3 1600 4G",
"description" : "DDR3 1600 4G",
"price" : "239.00"
},
{
"title" : "DDR3 1333 4G",
"description" : "DDR3 1333 4G",
"price" : "259.00"
}
],
"properties" : [
{
"name" : "品牌名称",
"value" : "金士顿",
"search_value" : "品牌名称:金士顿"
},
{
"name" : "传输类型",
"value" : "DDR3",
"search_value" : "传输类型:DDR3"
},
{
"name" : "内存容量",
"value" : "4GB",
"search_value" : "内存容量:4GB"
},
{
"name" : "内存容量",
"value" : "8GB",
"search_value" : "内存容量:8GB"
}
]
}
},
{
"_index" : "products",
"_type" : "_doc",
"_id" : "91",
"_score" : 1.6602381,
"_source" : {
"id" : 91,
"type" : "normal",
"category_id" : 13,
"title" : "Kingston/金士顿 HX424C15FB/8",
"long_title" : "金士顿 骇客神条 ddr4 2400 8g 台式机 电脑 四代内存条 吃鸡内存",
"on_sale" : true,
"rating" : 5,
"sold_count" : 0,
"review_count" : 0,
"price" : "399.00",
"category" : [
"电脑配件",
"内存"
],
"category_path" : "-10-",
"description" : "",
"skus" : [
{
"title" : "8GB 黑色",
"description" : "8GB 2400 DDR4 黑色",
"price" : "549.00"
},
{
"title" : "8GB 绿色",
"description" : "8GB 2400 DDR4 绿色",
"price" : "529.00"
},
{
"title" : "16GB",
"description" : "2400 16GB",
"price" : "1299.00"
},
{
"title" : "4GB",
"description" : "2400 4GB",
"price" : "399.00"
}
],
"properties" : [
{
"name" : "品牌名称",
"value" : "金士顿",
"search_value" : "品牌名称:金士顿"
},
{
"name" : "内存容量",
"value" : "8GB",
"search_value" : "内存容量:8GB"
},
{
"name" : "传输类型",
"value" : "DDR4",
"search_value" : "传输类型:DDR4"
},
{
"name" : "内存容量",
"value" : "4GB",
"search_value" : "内存容量:4GB"
},
{
"name" : "内存容量",
"value" : "16GB",
"search_value" : "内存容量:16GB"
}
]
}
}
]
},
"aggregations" : {
"properties_count" : {
"doc_count" : 9,
"properties_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "内存容量",
"doc_count" : 5,
"properties_value" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "4GB",
"doc_count" : 2
},
{
"key" : "8GB",
"doc_count" : 2
},
{
"key" : "16GB",
"doc_count" : 1
}
]
}
},
{
"key" : "传输类型",
"doc_count" : 2,
"properties_value" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "DDR3",
"doc_count" : 1
},
{
"key" : "DDR4",
"doc_count" : 1
}
]
}
},
{
"key" : "品牌名称",
"doc_count" : 2,
"properties_value" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "金士顿",
"doc_count" : 2
}
]
}
}
]
}
}
}
}
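For a filter sidebar, the aggregations part of this response usually has to be flattened into a list of values per property name. A rough sketch follows; the function name extractPropertyFacets is an assumption, and the input is a hand-trimmed copy of the response above.

```php
<?php
/**
 * Flatten the nested terms aggregation into [propertyName => [values...]].
 * (Hypothetical helper, not part of the sample project.)
 */
function extractPropertyFacets(array $response): array
{
    $buckets = $response['aggregations']['properties_count']['properties_name']['buckets'] ?? [];
    $facets = [];
    foreach ($buckets as $bucket) {
        // Each name bucket carries its own sub-aggregation of values.
        $facets[$bucket['key']] = array_column($bucket['properties_value']['buckets'], 'key');
    }
    return $facets;
}

// Trimmed copy of the aggregation part of the response shown above.
$response = ['aggregations' => ['properties_count' => ['properties_name' => ['buckets' => [
    ['key' => '内存容量', 'properties_value' => ['buckets' => [
        ['key' => '4GB', 'doc_count' => 2],
        ['key' => '8GB', 'doc_count' => 2],
        ['key' => '16GB', 'doc_count' => 1],
    ]]],
    ['key' => '品牌名称', 'properties_value' => ['buckets' => [
        ['key' => '金士顿', 'doc_count' => 2],
    ]]],
]]]]];

$facets = extractPropertyFacets($response);
```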
Sample code (Laravel)
GitHub: https://github.com/dyjh/laravel-demo
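Appendix 1: ES data types
1. text/keyword: both can be thought of as storing strings. ES 2.x had a single string type, which later evolved into text and keyword.
2. Numeric: stores numeric data. It covers long, integer, short, byte, double, float, half_float and scaled_float, eight types in all, enough that any MySQL numeric column can be mapped to one of them.
3. Date: a date/time type. Values can be given in one of three forms (a formatted date string, milliseconds since the epoch, or seconds since the epoch), and a custom format can be declared.
4. Boolean
5. binary
6. Range types: integer_range, float_range, long_range, double_range, date_range
The most commonly used are still text and keyword.
1. The text type
Use case: indexing full-text values such as the body of an e-mail or a product description. These fields are analyzed, and the analysis is what lets Elasticsearch search for individual words inside each full-text field. text fields are not used for sorting and are seldom used for aggregations.
2. The keyword type
Use case: indexing structured content such as e-mail addresses, hostnames, status codes, zip codes or tags.
keyword fields are typically used for filtering (find all my blog posts whose status is published), sorting and aggregations. A keyword field can only be searched by its exact value.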
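In query terms, the difference shows up as term queries on keyword fields and match queries on text fields. The two helper functions below are invented for illustration; the field names type and title come from the mapping earlier in the article.

```php
<?php
/** Exact-value condition for a keyword field (the value is not analyzed). */
function termQuery(string $field, string $value): array
{
    return ['term' => [$field => $value]];
}

/** Full-text condition for a text field (the text is analyzed first). */
function matchQuery(string $field, string $text): array
{
    return ['match' => [$field => $text]];
}

// "type" is mapped as keyword, "title" as text with the ik_smart analyzer.
$body = [
    'query' => [
        'bool' => [
            'filter' => [termQuery('type', 'normal')],
            'must'   => [matchQuery('title', '金士顿 内存')],
        ],
    ],
];
```

Putting the keyword condition under filter keeps it out of relevance scoring, while the match clause under must decides how well each product ranks.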