本文會用到的連結
Beginner’s Crash Course to Elastic Stack - Part 1: Intro to Elasticsearch and Kibana
我的主¡管底下有三個員工,我是其中一位,今天主管說有一個新需求要交代,派給了另外兩個同事,但沒派給我,為了展現本人一心向學的個性(快打考績了,要狗腿一點,並且希望可以苟過適用期),我便很假掰的問了一下
「不好意思,請問等等的會議,雖然我沒有被分配到工作,但我也想參加,不知道可以嗎?」
中午參加會議完後,主管便打電話給我
「Hoxton,剛剛的會議你有參加齁?現在有多了一個新的需求OOO,再請你處理一下」
ㄏㄏ ,結果新的需求會用到Elastic Search 我完全不懂,死定
安裝Elastic Search Kibana
本人強烈建議,一率安裝8.1.0的版本!!!
Elastic Search是一個程式,而Kibana則是它的GUI
相關的安裝可以到Udemy上查看,這也是我最推薦的方式,安裝的部分是免費可以看的
教學連結Udemy
到這邊下載對應版本的Elastic Search,再次強調,安裝8.1.0
https://www.elastic.co/downloads/elasticsearch
2024/4/27補充
下載完後,進到根目錄中,本人強烈建議編輯 config/elasticsearch.yml 中的兩個值
關閉ssl認證
1
2
| xpack.security.http.ssl:
enabled: false
|
關閉帳號密碼認證
1
| xpack.security.enabled: false
|
關閉這兩個會讓你少很多痛苦,相信我
解壓鎖完後進入到Elastic Search的根目錄中,輸入以下指令,就可以啟用ElasticSearch了
在啟動完成後,ElasticSearch會為我們創建一個超級使用者,並會把它的密碼輸出在terminal中
如果你不幸忘記了密碼,可以輸入以下指令來重置
1
| bin/elasticsearch-reset-password -u elastic
|
還會產生一組Token用作Kibana的連接,這組Token會存續30分鐘
當然,如果你不幸又忘記了,可以輸入以下指令來重新獲取
1
| bin/elasticsearch-create-enrollment-token --scope kibana
|
接著安裝Kibana
再次再次強調,請安裝8.1.0的版本
安裝解壓縮完,到Kibana的根目錄下執行
就可以啟動Kibana,並且訪問對應的頁面
URL : http://localhost:5601/app/home#/
接著按照他的要求,把Elastic Search所提供的token填入Kibana中
然後帳號密碼的部分也是請查看Elastic Search的Terminal
這樣就完成登入了!真的是操你媽的8.11版本,我搞超久,後來降成8.1.0就沒問題了,我真的是幹你媽的Elastic Search
如何使用Kibana
URL : http://localhost:5601/app/home#/
進入Kibana後點選這邊
按下這個賤,就可以送出請求囉!
Elastic Search 介紹
常見的使用場景有:Uber在搜尋駕駛、搜尋附近的餐廳、遊戲的數據搜集、Tinder配對、火星好奇心號的數據收集、Log紀錄、安全性分析等等
並不是一個Database,他更像是一個搜尋與分析的工具
Index
Document
Field
ElasticSearch的Request格式
本章節參考至
Beginner’s Crash Course to Elastic Stack - Part 1: Intro to Elasticsearch and Kibana
文章內容來自:Part-1-Intro-to-Elasticsearch-and-Kibana
編輯一下elastic/congig/elasticsearch.yml的檔案,將這cluster跟node命名如下
查看cluster狀態
大概是長這樣,比如說以下的格式就是去查cluster的健康狀況
回應的結果就是這樣
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| {
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 11,
"active_shards" : 11,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
|
查看node狀態
再發一個請求,查看node的狀態,確認我們對node的重新命名有生效
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
| {
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "elasticsearch",
"nodes" : {
"yz_GM-vITti54G1nS6GiEg" : {
"timestamp" : 1698688524532,
"name" : "curd_node",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1:9300",
"roles" : [
"data",
"data_cold",
"data_content",
"data_frozen",
"data_hot",
"data_warm",
"ingest",
"master",
"ml",
"remote_cluster_client",
"transform"
...以下略
}
|
從這個Response我們也可以確定,我們對elasticsearch.yml的修改(更改cluster以及node名稱,確實是有生效的)
創建index
儲存Document到index中(Create)
有兩種方式,一種是透過PUT、另一種是透過POST,兩者的差別如下
- 當使用POST時,elastic search會自動為你的document創建id
- 使用PUT時,代表你要自己指定document的id是什麼
Post方式
Reqeust
1
2
3
4
| POST favorite_candy/_doc
{"first_name":"Lisa",
"candy":"Sour Skittles"
}
|
Response
1
2
3
4
5
6
7
8
9
10
11
12
13
| {
"_index" : "favorite_candy",
"_id" : "RXvLgYsBNuMe8RFD4lGF",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
|
Put方式
那個1代表我要Assign的Id名稱
1
2
3
4
| PUT favorite_candy/_doc/1
{"first_name":"Lisa",
"candy":"Sour Skittles"
}
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| {
"_index" : "favorite_candy",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
|
我們在試著多塞一些資料進去,因為我們等等要retrive這些資料
1
2
3
4
5
6
7
8
9
| PUT favorite_candy/_doc/2
{"first_name":"Rachel",
"candy":"Rolos"
}
PUT favorite_candy/_doc/3
{"first_name":"Tom",
"candy":"Sweet Tarts"
}
|
這邊特別注意一下,像現在的情況,id 1已經有一組資料了,如果重複PUT資料到id 1的情況,返回的結果會是這樣,version會變成2,result會變成update
也有另一種方式,類似SQL中的 craete if not exist…的用法,就是這樣
1
2
3
4
| PUT favorite_candy/_create/1
{"first_name":"Rachel",
"candy":"Rolos"
}
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
| {
"error" : {
"root_cause" : [
{
"type" : "version_conflict_engine_exception",
"reason" : "[1]: version conflict, document already exists (current version [2])",
"index_uuid" : "PscoViAvQF6UvIEvN35Cpg",
"shard" : "0",
"index" : "favorite_candy"
}
],
"type" : "version_conflict_engine_exception",
"reason" : "[1]: version conflict, document already exists (current version [2])",
"index_uuid" : "PscoViAvQF6UvIEvN35Cpg",
"shard" : "0",
"index" : "favorite_candy"
},
"status" : 409
}
|
這樣子,如果id已經存在,就不會overwrite這個document
查看Ducument(Retrive)
GET {index名稱}/_doc/{id}
1
| GET favorite_candy/_doc/1
|
更新Document(Update)
1
2
3
4
5
6
7
| POST Name-of-the-Index/_update/id-of-the-document-you-want-to-update
{
"doc": {
"field1": "value",
"field2": "value",
}
}
|
1
2
3
4
5
6
| POST favorite_candy/_update/1
{
"doc": {
"candy": "M&M's"
}
}
|
刪除Document(Delete)
1
| DELETE Name-of-the-Index/_doc/id-of-the-document-you-want-to-delete
|
1
| DELETE favorite_candy/_doc/1
|
Precision And Recall
使用實際資料來練習Elastic Search
Part-2-Understanding-the-relevance-of-your-search-with-Elasticsearch-and-Kibana
資料下載連結:Kaggle
資料大概長這樣,就是一個新聞網站的json資料,
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
| [
{
"link": "https://www.huffpost.com/entry/covid-boosters-uptake-us_n_632d719ee4b087fae6feaac9",
"headline": "Over 4 Million Americans Roll Up Sleeves For Omicron-Targeted COVID Boosters",
"category": "U.S. NEWS",
"short_description": "Health experts said it is too early to predict whether demand would match up with the 171 million doses of the new boosters the U.S. ordered for the fall.",
"authors": "Carla K. Johnson, AP",
"date": "2022-09-23"
},
{
"link": "https://www.huffpost.com/entry/american-airlines-passenger-banned-flight-attendant-punch-justice-department_n_632e25d3e4b0e247890329fe",
"headline": "American Airlines Flyer Charged, Banned For Life After Punching Flight Attendant On Video",
"category": "U.S. NEWS",
"short_description": "He was subdued by passengers and crew when he fled to the back of the aircraft after the confrontation, according to the U.S. attorney's office in Los Angeles.",
"authors": "Mary Papenfuss",
"date": "2022-09-23"
},
{
"link": "https://www.huffpost.com/entry/funniest-tweets-cats-dogs-september-17-23_n_632de332e4b0695c1d81dc02",
"headline": "23 Of The Funniest Tweets About Cats And Dogs This Week (Sept. 17-23)",
"category": "COMEDY",
"short_description": "\"Until you have a dog you don't understand what could be eaten.\"",
"authors": "Elyse Wanshel",
"date": "2022-09-23"
}
]
|
匯入資料
index name : news_headlines
查看有哪些index
查看Index底下有哪些Document
如果數量太多,會顯示預設的10個,relation也會顯示 gte 代表 great than
為了在大型數據集上提高響應速度,Elasticsearch默認限制了總計數為10,000。如果您想要確切的總命中數,請使用以下查詢。
1
2
3
4
| GET news_headlines/_search
{
"track_total_hits": true
}
|
兩種不同的搜尋 Query & Aggregation
Query用來搜尋一些符合特定指標的Document
Aggregation是將數據總結為指標、統計數據和其他分析的過程,更接近分析數據
使用Query依照關鍵字來做搜尋
match中搭配filedname即可達到搜尋的效果-
1
2
3
4
5
6
7
8
| GET user/_search
{
"query": {
"match": {
"username": "abc123"
}
}
}
|
1
2
3
4
5
6
| GET news_headlines/_search
{
"query": {"match": {
"headline": "Khloe Kardashian Kendall Jenner"
}}
}
|
值得注意的是,這個不像是SQL的
1
| select * from news_headlines where headline = 'Khloe Kardashian Kendall Jenner'
|
而是像這樣,match query是一種全文檢索(Fulltext query)
1
| select * from news_headlines where headline like '%Khloe%' or headline like '%Kardashian%' or headline like '%Kendall%' or headline like '%Jenner%'
|
使用operator 提升 Query精準度
如果我們是希望提升精准度,要找出包含 Khloe、Kardashian、Kendall、Jenner的,我們可以在搜尋加上operator
1
2
3
4
5
6
7
8
9
| GET news_headlines/_search
{
"query": {"match": {
"headline": {
"query": "Khloe Kardashian Kendall Jenner",
"operator":"AND"
}
}}
}
|
使用minium_should_match
1
2
3
4
5
6
7
8
9
| GET news_headlines/_search
{
"query": {"match": {
"headline": {
"query": "Khloe Kardashian Kendall Jenner",
"minium_should_match": 3
}
}}
}
|
使用Query依照時間來做搜尋
範例:
gte : greater than
lte : less than
1
2
3
4
5
6
7
8
9
10
11
| GET enter_name_of_the_index_here/_search
{
"query": {
"Specify the type of query here": {
"Enter name of the field here": {
"gte": "Enter lowest value of the range here",
"lte": "Enter highest value of the range here"
}
}
}
}
|
1
2
3
4
5
6
7
8
9
10
11
| GET news_headlines/_search
{
"query": {
"range": {
"date": {
"gte": "2015-06-20",
"lte": "2015-09-22"
}
}
}
}
|
使用Query依照時間來做搜尋並排序
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| GET news_headlines/_search
{
"query": {
"range": {
"date": {
"gte": "2015-06-20",
"lte": "2015-09-22"
}
}
},
"sort": [
{
"date": {
"order": "asc"
}
}
]
}
|
各種不同的Match 語法
Match_Phrase
假設我們要找一首歌「Shape of You」,如果只用以下的搜尋,會出現很多不相關的結果,因為Match Quert會把 Shape 、 of 、 You 當成三個不同的詞彙去找,想要避免這樣,就需要使用
1
2
3
4
5
6
7
8
| GET news_headlines/_search
{
"query": {
"match": {
"headline": "shape of you"
}
}
}
|
1
2
3
4
5
6
7
8
| GET news_headlines/_search
{
"query": {
"match_phrase": {
"headline": "shape of you"
}
}
}
|
Multi_Match
在不同的Field中搜尋,效果就有點像SQL的
Select * from news_headlines where headline like ‘%Michelle Obama%’ or short_description like ‘%Michelle Obama%’ or authors like like ‘%Michelle Obama%’
1
2
3
4
5
6
7
8
9
10
11
12
13
| GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "Michelle Obama",
"fields": [
"headline",
"short_description",
"authors"
]
}
}
}
|
但這樣其實有一些問題,我們可能最主要是想找Michelle Obama的文章,但只要short_description裡面有提到Michelle Obama就會被包含進來,我們可能更 Focus 在 headline 的權重上面
加權型Multi_Match
就是在Field後多加一個次方,就能將 headline 有 Michelle Obama 的結果先列出來了(Per Field Boosting)
1
2
3
4
5
6
7
8
9
10
11
12
13
| GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "Michelle Obama",
"fields": [
"headline^2",
"short_description",
"authors"
]
}
}
}
|
將Multi-Match跟Match_Pharse結合
集百家之長,結合出來的搜尋
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| GET news_headlines/_search
{
"query": {
"multi_match": {
"query": "Michelle Obama",
"fields": [
"headline^2",
"short_description",
"authors"
],
"type":"phrase"
}
}
}
|
使用Bool Query來進行搜尋
TimeCode標好了 Beginner’s Crash Course to Elastic Stack - Part 3: Full text queries
文章內容 :https://github.com/LisaHJung/Part-3-Running-full-text-queries-and-combined-queries-with-Elasticsearch-and-Kibana
所謂的bool query其實就是將不同條件整合再一起的一種搜尋,相當於SQL不是會有那種 where id = 1 and status=1的那種where and or語句嗎,bool query就是在做這個部分
常見的bool query有以下幾種
- must:相當於AND 代表一定要有
- filter:表示過濾條件,類似where的用法
- should:不會影響到搜尋的結果,但會影響到排序,符合的會靠上
- Must_not:表示不匹配的,相當於NOT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
| GET name_of_index/_search
{
"query": {
"bool": {
"must": [
{One or more queries can be specified here. A document MUST match all of these queries to be considered as a hit.}
],
"must_not": [
{A document must NOT match any of the queries specified here. It it does, it is excluded from the search results.}
],
"should": [
{A document does not have to match any queries specified here. However, it if it does match, this document is given a higher score.}
],
"filter": [
{These filters(queries) place documents in either yes or no category. Ones that fall into the yes category are included in the hits. }
]
}
}
}
|
這些bool也會跟基本查詢
做搭配
比如說像這樣
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| {
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2019-01-01" }}}
],
"should": [
{ "match": { "author": "John" }},
{ "match": { "author": "Doe" }}
],
"must_not": [
{ "match": { "category": "Marketing" }}
]
}
}
}
|
而一般Query是長這樣,可以稍微比較感覺到Query 跟Bool Query的差異
1
2
3
4
5
6
7
8
9
10
11
| {
"query": {
"range": {
"date": {
"gte": "2015-06-20",
"lte": "2015-09-22"
}
}
}
}
|
以下示範一些常見的Bool Query的寫法
Bool Must查詢
1
2
3
4
5
6
7
8
9
10
11
12
13
| {
"query": {
"bool": {
"must": [
{
"match": {
"username": "Hoxton"
}
}
]
}
}
}
|
Bool Filter過濾
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
| {
"query": {
"bool": {
"filter": [
{
"match": {
"status": "1"
}
},{
"range":{
"loginTime":{"gte":"2023-01-01","lte":"2024-12-31"}
}
}
]
}
}
}
|
Bool Should
1
2
3
4
5
6
7
8
9
10
11
12
| GET /my_index/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "Elasticsearch" }},
{ "match": { "content": "search" }}
]
}
}
}
|
Bool MustNot
1
2
3
4
5
6
7
8
9
10
11
| {
"query": {
"bool": {
"must_not": [
{ "match": { "title": "Elasticsearch" }},
{ "match": { "content": "search" }}
]
}
}
}
|
Bool 多重搜尋(Match)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| {
"query":{
"bool":{
"must":[
{
"match":{
"userName":"Hoxton"
}
},
{
"match":{
"gender":1
}
}
]
}
}
}
|
Bool Terms 搜尋
2024/05/25更新
找出某個Filed中符合條件的document
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
| GET news_headlines/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"category": [
"U.S. NEWS",
"COMEDY"
]
}
}
]
}
}
}
|
使用Aggregation做搜尋
Kaggle 網址 :Kaggle E-Commerce Data
1
2
3
4
5
6
7
8
9
10
11
| GET enter_name_of_the_index_here/_search
{
"aggs": {
"name your aggregation here": {
"specify aggregation type here": {
"field": "name the field you want to aggregate here",
"size": state how many buckets you want returned here
}
}
}
}
|
Aggs: 代表你要送一個Aggregation
By_category: 你Aggregation出來的東西要叫什麼
Terms: 以字段做分析
Filed:字段具體的key是什麼
Size:Filed最大上限,假設現在你的原始資料,key其實有1000個,但你只想要看100個,就可以在這邊限制,底下我有附圖片,可以看差距
1
2
3
4
5
6
7
8
9
10
11
| GET news_headlines/_search
{
"aggs": {
"by_category": {
"terms": {
"field": "category",
"size": 100
}
}
}
}
|
原始資料
聚合出來的結果
Size參數的影響
又使用Query 又使用Aggregation做搜尋
搜尋某一類別中最重要的詞語
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| GET enter_name_of_the_index_here/_search
{
"query": {
"match": {
"Enter the name of the field": "Enter the value you are looking for"
}
},
"aggregations": {
"Name your aggregation here": {
"significant_text": {
"field": "Enter the name of the field you are searching for"
}
}
}
}
|
以下的搜尋可以這樣讀
「我想要找到,在ENTERTAINMENT這個目錄中,headline的中,最具重要性詞語有哪些
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| GET news_headlines/_search
{
"query": {
"match": {
"category": "ENTERTAINMENT"
}
},
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
|
Aggregation 實戰
Kaggle 網址 :Kaggle E-Commerce Data
Youtube 網址 :Beginner’s Crash Course to Elastic Stack - Part 4: Aggregations
資料的樣子
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
| [
{
"_index": "ecommerce_data",
"_id": "U3hHL48B6YD2b_RB4N-A",
"_score": 1,
"_source": {
"UnitPrice": 0.55,
"Description": "AGED GLASS SILVER T-LIGHT HOLDER",
"Quantity": 144,
"Country": "United Kingdom",
"InvoiceNo": "543456",
"InvoiceDate": "2/8/2011 12:41",
"CustomerID": 15753,
"StockCode": "21326"
}
},
{
"_index": "ecommerce_data",
"_id": "VHhHL48B6YD2b_RB4N-A",
"_score": 1,
"_source": {
"UnitPrice": 1.25,
"Description": "SET OF SALT AND PEPPER TOADSTOOLS",
"Quantity": 12,
"Country": "United Kingdom",
"InvoiceNo": "543457",
"InvoiceDate": "2/8/2011 12:47",
"CustomerID": 17428,
"StockCode": "22892"
}
},
]
|
三種Aggregation
- Metric Aggregation:用來計算數字型的資料,比如說計算最大值、最小值、平均值等等
- Bucket Aggregation:想要聚合複數Subset的資料時,使用Bucket Aggregation (一桶一桶的)
- Combined Aggregation
Metric Aggregation-求總和
1
2
3
4
5
6
7
8
9
10
11
| GET ecommerce_data/_search
{
"size":0,
"aggs": {
"sum_unit_price": {
"sum": {
"field": "UnitPrice"
}
}
}
}
|
Size:0 影響的是會不會顯示hint的資料,如果0就只會顯示聚合結果
Metric Aggregation-求最低
Size:0 影響的是會不會顯示hint的資料,如果0就只會顯示聚合結果
1
2
3
4
5
6
7
8
9
10
11
| GET ecommerce_data/_search
{
"size":0,
"aggs": {
"sum_unit_price": {
"min": {
"field": "UnitPrice"
}
}
}
}
|
Metric Aggregation-求最高
1
2
3
4
5
6
7
8
9
10
11
| GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"highest_unit_price": {
"max": {
"field": "UnitPrice"
}
}
}
}
|
Size:0 影響的是會不會顯示hint的資料,如果0就只會顯示聚合結果
Metric Aggregation-求平均
Size:0 影響的是會不會顯示hint的資料,如果0就只會顯示聚合結果
1
2
3
4
5
6
7
8
9
10
11
| GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"average_unit_price": {
"avg": {
"field": "UnitPrice"
}
}
}
}
|
Mertic Aggregation - 求所有(總和、最低、最高、平均)
Size:0 影響的是會不會顯示hint的資料,如果0就只會顯示聚合結果
1
2
3
4
5
6
7
8
9
10
11
| GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"all_stats_unit_price": {
"stats": {
"field": "UnitPrice"
}
}
}
}
|
使用Query條件來限制Aggregation的範圍
限制只統計Country為Germany的平均單位售價
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| GET ecommerce_data/_search
{
"size": 0,
"query": {
"match": {
"Country": "Germany"
}
},
"aggs": {
"germany_average_unit_price": {
"avg": {
"field": "UnitPrice"
}
}
}
}
|
Cardinality Aggregation
關鍵字:去重複、去重
離散數學中的Cardinality,指的是一個集合中,不重複元素的個數,例如集合A={1,2,7,7,7},那麼A的Cardinality就是3,寫作|A|=3。
在ES中的Cardinality Aggregation也是類似的概念
統計不重複的 CustomerID
1
2
3
4
5
6
7
8
9
10
11
| GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"number_unique_customers": {
"cardinality": {
"field": "CustomerID"
}
}
}
}
|
共有4359個不重複的CustomerId
Bucket Aggregation
四種不同的Bucket Aggregation
- Date Historgram Aggregation:使用時間區間來 Group 資料
- Histogram Aggregation
- Range Aggregation
- Terms Aggregation
Historgram :直方圖
Date Historgram Aggregation
按InvoiceDate,把資料每八個小時分成一桶
1
2
3
4
5
6
7
8
9
10
11
12
| GET ecommerce_data/_search
{
"size": 0,
"aggs": {
"transactions_by_8_hrs": {
"date_histogram": {
"field": "InvoiceDate",
"fixed_interval": "8h"
}
}
}
}
|
精確與召回(Precision And Recall)
幹,這真的好難翻,精確就有點像是AND,召回就有點像是OR,使用召回,有可能會找到不太相干的資料,比如找臘腸,卻找到臘腸狗那樣,搜尋結果較多,但較不準確
Increase Recall
1
2
3
4
5
6
7
8
9
10
| GET enter_name_of_index_here/_search
{
"query": {
"match": {
"Specify the field you want to search": {
"query": "Enter search terms"
}
}
}
}
|
1
2
3
4
5
6
7
8
9
10
| GET news_headlines/_search
{
"query": {
"match": {
"headline": {
"query": "Khloe Kardashian Kendall Jenner"
}
}
}
}
|
Increase Precision
這個搜尋是精確搜尋,會找到更符合預期的結果,語法其實就是多增加一個operator and就可以了啦
唉,好累,不知道為啥10/31半夜還在這邊卷
1
2
3
4
5
6
7
8
9
10
11
| GET enter_name_of_index_here/_search
{
"query": {
"match": {
"Specify the field you want to search": {
"query": "Enter search terms",
"operator": "and"
}
}
}
}
|
1
2
3
4
5
6
7
8
9
10
11
| GET news_headlines/_search
{
"query": {
"match": {
"headline": {
"query": "Khloe Kardashian Kendall Jenner",
"operator": "and"
}
}
}
}
|
最小匹配搜尋(minimum_should_match)
minimum_should_match是用來微調(fined tuning)匹配量的,我們的搜尋最初是包含Khloe Kardashian Kendall Jenner這四個詞的搜尋,但我們可以選擇到底要匹配幾個,比如說可以匹配3個,那就是C4取3,可以看下面的範例
1
2
3
4
5
6
7
8
9
10
11
| GET enter_name_of_index_here/_search
{
"query": {
"match": {
"headline": {
"query": "Enter search term here",
"minimum_should_match": Enter a number here
}
}
}
}
|
1
2
3
4
5
6
7
8
9
10
11
| GET news_headlines/_search
{
"query": {
"match": {
"headline": {
"query": "Khloe Kardashian Kendall Jenner",
"minimum_should_match": 3
}
}
}
}
|
不同最小匹配量的值所影響到的結果