ES 中Join的使用

引出问题

在mysql中,可以使用join来实现表与表之间的数据连接,在es中如何实现这个问题?

相对于mysql来说,es有几个不同的地方

  1. 不支持跨index的join
  2. 一个index只能包含一个类型
  3. 分布式的存储方式,对于数据的搜寻造成障碍

对于上面的几个问题,es的解决方案是**在一个索引下,借助父子关系,实现类似Mysql中多表关联的操作**

定义类型和join索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
PUT myorder
{
"mappings": {
"_doc": {
"properties": {
"order_join": {
"type": "join",
"relations": {
"order": "suborder"
}
}
}
}
}
}

定义join关系为order_join,其中order是父文档,suborder是子文档。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
put  myorder/_mapping/_doc
{
"properties": {
"orderId": {
"type": "keyword"
},
"shortTime": {
"type": "date"
},
"name": {
"type": "keyword"
},
"amount": {
"type": "double"
},
"desc": {
"type": "text"
}
}
}

插入主单数据

1
2
3
4
5
6
7
8
9
PUT myorder/_doc/10001
{
"shortTime": "2019-01-05",
"orderId": "10001",
"name": "user2",
"amount": 123.09,
"desc": "其他收入",
"order_join": "order"
}

order_join定义为order类型

插入子单数据

使用自定义ID用PUT方法

1
2
3
4
5
6
7
8
9
10
11
12
POST myorder/_doc?routing=1
{
"shortTime": "2019-01-05",
"orderId": "10001",
"name": "user2",
"amount": 12.09,
"desc": "收入",
"order_join": {
"name": "suborder",
"parent":"10001"
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13

POST myorder/_doc?routing=1
{
"shortTime": "2019-01-05",
"orderId": "10002",
"name": "user2",
"amount": 122.09,
"desc": "收入",
"order_join": {
"name": "suborder",
"parent":"10001"
}
}

建立父子关系索引,routing 参数是必须的,因为父子文档必须在同一个分片上

查询主单

1
2
3
4
5
6
7
8
9
10
11
12
13
GET myorder/_search
{
"query": {
"has_child" : {
"type" : "suborder",
"query" : {
"match_all" : {

}
}
}
}
}

查询子单

1
2
3
4
5
6
7
8
9
10
11
12
13
GET myorder/_search
{
"query": {
"has_parent" : {
"parent_type" : "order",
"query" : {
"match_all" : {

}
}
}
}
}

聚合查询

  • 主单聚合

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    GET myorder/_search
    {
    "query": {
    "parent_id": {
    "type": "suborder",
    "id": "10001"
    }
    },
    "aggs": {
    "parents12312": {
    "terms": {
    "field": "order_join#order"
    },
    "aggs": {
    "sumAmount": {
    "stats": {
    "field": "amount"
    }
    }
    }
    }
    }
    }

  • 子单聚合

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19

    GET myorder/_search
    {
    "size": 0,
    "aggs": {
    "parent": {
    "children": {
    "type": "suborder"
    },
    "aggs": {
    "sumAmount": {
    "stats": {
    "field": "amount"
    }
    }
    }
    }
    }
    }
  • 聚合加筛选:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    GET myorder/_search
    {
    "query": {
    "has_child" : {
    "type" : "suborder",
    "query" : {
    "match_all" : {

    }
    }
    }
    },
    "aggs": {
    "parent": {
    "children": {
    "type": "suborder"
    },
    "aggs": {
    "fields": {
    "terms": {
    "field": "orderId"
    },
    "aggs": {
    "sumAmount": {
    "sum": {
    "field": "amount"
    }
    },
    "having": {
    "bucket_selector": {
    "buckets_path": {
    "orderCount": "_count",
    "sumAmount": "sumAmount"
    },
    "script": {
    "source": "params.sumAmount >= 100 && params.orderCount >=0"
    }
    }
    }
    }
    }
    }
    }
    }
    }

定义一对多的索引


一对一的索引模型很难满足日常业务的数据处理,es也支持一对多的join

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
PUT myorder
{
"mappings": {
"_doc": {
"properties": {
"order_join": {
"type": "join",
"relations": {
"order": ["suborder1", "suborder2"],
"suborder2":"suborder3"
}
}
}
}
}
}

上面的索引的关联的关系如下:

1
2
3
4
5
      order
/ \
suborder1 suborder2
\
suborder3