How to query data imported into ClickHouse within a specified time period

2022-06-24 05:08:00 jasong

One Purpose

  1. Data query
  2. Data migration and import

Two Why discuss ClickHouse data migration

  1. clickhouse-copier does not support incremental import
  2. The remote table function is slower, and it targets ClickHouse internal tables
  3. The dimensions available for filtering data are limited

Three ClickHouse MergeTreeData

QueryPlanPtr MergeTreeDataSelectExecutor::readFromParts(
    MergeTreeData::DataPartsVector parts,
    const Names & column_names_to_return,
    const StorageMetadataPtr & metadata_snapshot,
    const SelectQueryInfo & query_info,
    const Context & context,
    const UInt64 max_block_size,
    const unsigned num_streams,
    const PartitionIdToMaxBlock * max_block_numbers_to_read) const
{
    for (const String & name : column_names_to_return)
    {
        if (name == "_part")
        {
            part_column_queried = true;
            virt_column_names.push_back(name);
        }
        else if (name == "_part_index")
        {
            virt_column_names.push_back(name);
        }
        else if (name == "_partition_id")
        {
            virt_column_names.push_back(name);
        }
        else if (name == "_part_uuid")
        {
            part_uuid_column_queried = true;
            virt_column_names.push_back(name);
        }
        else if (name == "_sample_factor")
        {
            sample_factor_column_queried = true;
            virt_column_names.push_back(name);
        }
        else
        {
            real_column_names.push_back(name);
        }
    }
    // ... (the rest of the function is omitted in this excerpt)
}

3.1 How to use it

  • ClickHouse MergeTree tables expose the virtual columns listed above (_part, _part_index, _partition_id, _part_uuid, _sample_factor)
  • So, without modifying any code, we can simply and directly restrict the data at part granularity
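
The loop in the excerpt separates the requested column names into virtual and real ones. Below is a minimal standalone sketch of that idea, not ClickHouse code: `SplitColumns`, `isVirtualColumn`, and `splitColumns` are hypothetical names introduced here for illustration, and the list of virtual column names is copied from the excerpt above.

```cpp
#include <string>
#include <vector>

// Virtual column names as they appear in the excerpt above.
static const char * const VIRTUAL_COLUMNS[] = {
    "_part", "_part_index", "_partition_id", "_part_uuid", "_sample_factor"};

// Hypothetical result type (not part of ClickHouse).
struct SplitColumns
{
    std::vector<std::string> virtual_columns;
    std::vector<std::string> real_columns;
};

// Returns true if `name` is one of the virtual columns listed above.
bool isVirtualColumn(const std::string & name)
{
    for (const char * candidate : VIRTUAL_COLUMNS)
        if (name == candidate)
            return true;
    return false;
}

// Splits the requested column names the same way the excerpt's loop does:
// virtual names go to one list, everything else is treated as a stored column.
SplitColumns splitColumns(const std::vector<std::string> & requested)
{
    SplitColumns result;
    for (const std::string & name : requested)
    {
        if (isVirtualColumn(name))
            result.virtual_columns.push_back(name);
        else
            result.real_columns.push_back(name);
    }
    return result;
}
```

For a query that selects id, value, dt, and _part, this sketch would report three real columns and one virtual column, which is why _part can be selected even though it does not appear in the table definition.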

Four Operations

4.1 Create tables and import

## 1  View table fields 
DESCRIBE TABLE db_1.test_26

Query id: 856af95b-cb07-43d9-a776-5e6fd3d3c456

┌─name──┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id    │ UInt16 │              │                    │         │                  │                │
│ value │ UInt32 │              │                    │         │                  │                │
│ dt    │ Date   │              │                    │         │                  │                │
└───────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

3 rows in set. Elapsed: 0.004 sec.

##  Data insertion is omitted here

4.2 Query

## 2  View all data 
SELECT *
FROM db_1.test_26

Query id: 6211055b-02af-482e-bc55-ccd765b0b929

┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘

6 rows in set. Elapsed: 0.148 sec. 

4.3 The hidden virtual column _part

## 3  View the corresponding data part

SELECT
    id,
    value,
    dt,
    _part
FROM db_1.test_26

Query id: b7d81a80-089a-4434-b82e-a0e27c60c8ac

┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_5_5_0 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_6_6_0 │
└────┴───────┴────────────┴──────────────┘

6 rows in set. Elapsed: 0.111 sec. 

4.4 Using system.parts

DESCRIBE TABLE system.parts

Query id: 2dea5ab6-6857-4708-8919-a09f2382f059

┌─name──────────────────────────────────┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ partition                             │ String          │              │                    │         │                  │                │
│ name                                  │ String          │              │                    │         │                  │                │
│ uuid                                  │ UUID            │              │                    │         │                  │                │
│ part_type                             │ String          │              │                    │         │                  │                │
│ active                                │ UInt8           │              │                    │         │                  │                │
│ marks                                 │ UInt64          │              │                    │         │                  │                │
│ rows                                  │ UInt64          │              │                    │         │                  │                │
│ bytes_on_disk                         │ UInt64          │              │                    │         │                  │                │
│ data_compressed_bytes                 │ UInt64          │              │                    │         │                  │                │
│ data_uncompressed_bytes               │ UInt64          │              │                    │         │                  │                │
│ marks_bytes                           │ UInt64          │              │                    │         │                  │                │
│ modification_time                     │ DateTime        │              │                    │         │                  │                │
│ remove_time                           │ DateTime        │              │                    │         │                  │                │
│ refcount                              │ UInt32          │              │                    │         │                  │                │
│ min_date                              │ Date            │              │                    │         │                  │                │
│ max_date                              │ Date            │              │                    │         │                  │                │
│ min_time                              │ DateTime        │              │                    │         │                  │                │
│ max_time                              │ DateTime        │              │                    │         │                  │                │
│ partition_id                          │ String          │              │                    │         │                  │                │
│ min_block_number                      │ Int64           │              │                    │         │                  │                │
│ max_block_number                      │ Int64           │              │                    │         │                  │                │
│ level                                 │ UInt32          │              │                    │         │                  │                │
│ data_version                          │ UInt64          │              │                    │         │                  │                │
│ primary_key_bytes_in_memory           │ UInt64          │              │                    │         │                  │                │
│ primary_key_bytes_in_memory_allocated │ UInt64          │              │                    │         │                  │                │
│ is_frozen                             │ UInt8           │              │                    │         │                  │                │
│ database                              │ String          │              │                    │         │                  │                │
│ table                                 │ String          │              │                    │         │                  │                │
│ engine                                │ String          │              │                    │         │                  │                │
│ disk_name                             │ String          │              │                    │         │                  │                │
│ path                                  │ String          │              │                    │         │                  │                │
│ hash_of_all_files                     │ String          │              │                    │         │                  │                │
│ hash_of_uncompressed_files            │ String          │              │                    │         │                  │                │
│ uncompressed_hash_of_compressed_files │ String          │              │                    │         │                  │                │
│ delete_ttl_info_min                   │ DateTime        │              │                    │         │                  │                │
│ delete_ttl_info_max                   │ DateTime        │              │                    │         │                  │                │
│ move_ttl_info.expression              │ Array(String)   │              │                    │         │                  │                │
│ move_ttl_info.min                     │ Array(DateTime) │              │                    │         │                  │                │
│ move_ttl_info.max                     │ Array(DateTime) │              │                    │         │                  │                │
│ default_compression_codec             │ String          │              │                    │         │                  │                │
│ recompression_ttl_info.expression     │ Array(String)   │              │                    │         │                  │                │
│ recompression_ttl_info.min            │ Array(DateTime) │              │                    │         │                  │                │
│ recompression_ttl_info.max            │ Array(DateTime) │              │                    │         │                  │                │
│ group_by_ttl_info.expression          │ Array(String)   │              │                    │         │                  │                │
│ group_by_ttl_info.min                 │ Array(DateTime) │              │                    │         │                  │                │
│ group_by_ttl_info.max                 │ Array(DateTime) │              │                    │         │                  │                │
│ rows_where_ttl_info.expression        │ Array(String)   │              │                    │         │                  │                │
│ rows_where_ttl_info.min               │ Array(DateTime) │              │                    │         │                  │                │
│ rows_where_ttl_info.max               │ Array(DateTime) │              │                    │         │                  │                │
│ bytes                                 │ UInt64          │ ALIAS        │ bytes_on_disk      │         │                  │                │
│ marks_size                            │ UInt64          │ ALIAS        │ marks_bytes        │         │                  │                │
└───────────────────────────────────────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

51 rows in set. Elapsed: 0.006 sec. 

## 4  View the modification time of each part
SELECT
    name,
    modification_time
FROM system.parts
WHERE (database = 'db_1') AND (table = 'test_26')

Query id: 3e8b8a92-cfbe-4a87-bdc3-8a3b420a29a4

┌─name─────────┬───modification_time─┐
│ 197506_1_4_1 │ 2021-08-14 23:39:19 │
│ 197506_5_5_0 │ 2021-08-17 09:55:16 │
│ 197506_6_6_0 │ 2021-08-24 16:54:11 │ ### The data in this part will be filtered out later
└──────────────┴─────────────────────┘

3 rows in set. Elapsed: 0.020 sec.

4.5 Filter

### 5  Filter the data we want
### e.g. data in parts modified before 2021-08-24 16:00:00
### Done by joining the original table with the system table system.parts
### The data in part 197506_6_6_0 is filtered out
SELECT
    id,
    value,
    dt,
    _part
FROM db_1.test_26 AS A
INNER JOIN system.parts AS B ON A._part = B.name
WHERE B.modification_time < '2021-08-24 16:00:00'

Query id: 8f9345dd-3529-4d80-beaf-bc0457d64dc9

┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_5_5_0 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
└────┴───────┴────────────┴──────────────┘
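
The effect of the join can be modeled outside ClickHouse as a simple cutoff filter over (name, modification_time) pairs. This is only an illustrative sketch: `PartRow` and `partsModifiedBefore` are hypothetical names, and it relies on timestamps in the fixed `YYYY-MM-DD hh:mm:ss` format, for which plain string comparison matches chronological order.

```cpp
#include <string>
#include <vector>

// Hypothetical row type mirroring the two system.parts columns used above.
struct PartRow
{
    std::string name;
    std::string modification_time; // "YYYY-MM-DD hh:mm:ss"
};

// Returns the names of parts whose modification_time is strictly before `cutoff`.
// Fixed-width timestamps compare chronologically as plain strings.
std::vector<std::string> partsModifiedBefore(
    const std::vector<PartRow> & parts, const std::string & cutoff)
{
    std::vector<std::string> selected;
    for (const PartRow & part : parts)
        if (part.modification_time < cutoff)
            selected.push_back(part.name);
    return selected;
}
```

Fed the three rows from section 4.4 and the cutoff 2021-08-24 16:00:00, the sketch keeps 197506_1_4_1 and 197506_5_5_0 and drops 197506_6_6_0, matching the query result. Note that a merge writes a new part, so the modification_time of a merged part generally reflects the merge rather than the original insert.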

4.6 Get our data

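### 6  The SQL to execute in the end
### Note: part names are only unique within one table, so if other tables exist, also filter on B.database and B.table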
SELECT
    id,
    value,
    dt
FROM db_1.test_26 AS A
INNER JOIN system.parts AS B ON A._part = B.name
WHERE B.modification_time < '2021-08-24 16:00:00'

Query id: 29794880-0ccb-43c9-8618-65b8c438086a

┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘

5 rows in set. Elapsed: 0.138 sec. 

Five CDW-ClickHouse

ETL of Tencent Cloud CDW-ClickHouse data to Oceanus

Oceanus connects to and operates on ClickHouse through ClickHouse-JDBC

So we can control the time range through Oceanus

This enables full and incremental imports into ClickHouse, as well as ClickHouse-to-ClickHouse migration

Oceanus ClickHouse data warehouse

Oceanus ClickHouse import documentation

clickhouse format

Original site

Copyright notice
This article was written by [jasong]. Please include the original link when reposting. Thanks.
https://yzsam.com/2021/08/20210824190016479W.html