How ClickHouse queries data imported within a specified time period
2022-06-24 05:08:00 【jasong】
One purpose
- Data query
- Data migration and import
Two Why discuss ClickHouse data migration
- clickhouse-copier does not support incremental import
- The ClickHouse remote table function is slow, and it only works with ClickHouse internal tables
- The dimensions available for filtering data are limited
Three ClickHouse MergeTreeData
QueryPlanPtr MergeTreeDataSelectExecutor::readFromParts(
    MergeTreeData::DataPartsVector parts,
    const Names & column_names_to_return,
    const StorageMetadataPtr & metadata_snapshot,
    const SelectQueryInfo & query_info,
    const Context & context,
    const UInt64 max_block_size,
    const unsigned num_streams,
    const PartitionIdToMaxBlock * max_block_numbers_to_read) const
{
    // ... (declarations of the local variables used below are omitted in this excerpt)

    // Split the requested columns into virtual columns and real columns.
    for (const String & name : column_names_to_return)
    {
        if (name == "_part")
        {
            part_column_queried = true;
            virt_column_names.push_back(name);
        }
        else if (name == "_part_index")
        {
            virt_column_names.push_back(name);
        }
        else if (name == "_partition_id")
        {
            virt_column_names.push_back(name);
        }
        else if (name == "_part_uuid")
        {
            part_uuid_column_queried = true;
            virt_column_names.push_back(name);
        }
        else if (name == "_sample_factor")
        {
            sample_factor_column_queried = true;
            virt_column_names.push_back(name);
        }
        else
        {
            real_column_names.push_back(name);
        }
    }
    // ... (rest of the function omitted in this excerpt)
}
3.1 How to use it
- ClickHouse MergeTree tables expose the virtual columns listed above (_part, _part_index, _partition_id, _part_uuid, _sample_factor)
- So, without modifying any code, we can filter the data simply and directly at part granularity
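To make the loop in the excerpt concrete, here is a minimal Python sketch of the same classification: the requested column names are split into virtual columns and real columns. It only illustrates the excerpt and is not ClickHouse code. The names `split_columns` and `VIRTUAL_COLUMNS` are hypothetical, and the set contains just the five names that appear in the excerpt.

```python
# Virtual column names handled explicitly in the readFromParts excerpt.
# (Hypothetical constant, for illustration only.)
VIRTUAL_COLUMNS = {"_part", "_part_index", "_partition_id", "_part_uuid", "_sample_factor"}


def split_columns(column_names):
    """Split requested column names into (virtual, real), preserving order."""
    virtual, real = [], []
    for name in column_names:
        if name in VIRTUAL_COLUMNS:
            virtual.append(name)
        else:
            real.append(name)
    return virtual, real


virtual, real = split_columns(["id", "value", "dt", "_part"])
print(virtual)  # ['_part']
print(real)     # ['id', 'value', 'dt']
```

Because `_part` is resolved per data part rather than read from disk like `id` or `value`, it can be selected or filtered on like any other column, which is what the following sections rely on.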
Four operations
4.1 Create tables and import
## 1 View table fields
DESCRIBE TABLE db_1.test_26
Query id: 856af95b-cb07-43d9-a776-5e6fd3d3c456
┌─name──┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id    │ UInt16 │              │                    │         │                  │                │
│ value │ UInt32 │              │                    │         │                  │                │
│ dt    │ Date   │              │                    │         │                  │                │
└───────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
3 rows in set. Elapsed: 0.004 sec.
## The data insertion step is omitted here
4.2 Query
## 2 View all data
SELECT *
FROM db_1.test_26
Query id: 6211055b-02af-482e-bc55-ccd765b0b929
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
6 rows in set. Elapsed: 0.148 sec.
4.3 The hidden virtual column _part
## 3 View the data part each row belongs to
SELECT
    id,
    value,
    dt,
    _part
FROM db_1.test_26
Query id: b7d81a80-089a-4434-b82e-a0e27c60c8ac
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_5_5_0 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_6_6_0 │
└────┴───────┴────────────┴──────────────┘
6 rows in set. Elapsed: 0.111 sec.
4.4 Using system.parts
DESCRIBE TABLE system.parts
Query id: 2dea5ab6-6857-4708-8919-a09f2382f059
┌─name──────────────────────────────────┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ partition │ String │ │ │ │ │ │
│ name │ String │ │ │ │ │ │
│ uuid │ UUID │ │ │ │ │ │
│ part_type │ String │ │ │ │ │ │
│ active │ UInt8 │ │ │ │ │ │
│ marks │ UInt64 │ │ │ │ │ │
│ rows │ UInt64 │ │ │ │ │ │
│ bytes_on_disk │ UInt64 │ │ │ │ │ │
│ data_compressed_bytes │ UInt64 │ │ │ │ │ │
│ data_uncompressed_bytes │ UInt64 │ │ │ │ │ │
│ marks_bytes │ UInt64 │ │ │ │ │ │
│ modification_time │ DateTime │ │ │ │ │ │
│ remove_time │ DateTime │ │ │ │ │ │
│ refcount │ UInt32 │ │ │ │ │ │
│ min_date │ Date │ │ │ │ │ │
│ max_date │ Date │ │ │ │ │ │
│ min_time │ DateTime │ │ │ │ │ │
│ max_time │ DateTime │ │ │ │ │ │
│ partition_id │ String │ │ │ │ │ │
│ min_block_number │ Int64 │ │ │ │ │ │
│ max_block_number │ Int64 │ │ │ │ │ │
│ level │ UInt32 │ │ │ │ │ │
│ data_version │ UInt64 │ │ │ │ │ │
│ primary_key_bytes_in_memory │ UInt64 │ │ │ │ │ │
│ primary_key_bytes_in_memory_allocated │ UInt64 │ │ │ │ │ │
│ is_frozen │ UInt8 │ │ │ │ │ │
│ database │ String │ │ │ │ │ │
│ table │ String │ │ │ │ │ │
│ engine │ String │ │ │ │ │ │
│ disk_name │ String │ │ │ │ │ │
│ path │ String │ │ │ │ │ │
│ hash_of_all_files │ String │ │ │ │ │ │
│ hash_of_uncompressed_files │ String │ │ │ │ │ │
│ uncompressed_hash_of_compressed_files │ String │ │ │ │ │ │
│ delete_ttl_info_min │ DateTime │ │ │ │ │ │
│ delete_ttl_info_max │ DateTime │ │ │ │ │ │
│ move_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ move_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ move_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ default_compression_codec │ String │ │ │ │ │ │
│ recompression_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ recompression_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ recompression_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ group_by_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ group_by_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ group_by_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ rows_where_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ rows_where_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ rows_where_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ bytes │ UInt64 │ ALIAS │ bytes_on_disk │ │ │ │
│ marks_size │ UInt64 │ ALIAS │ marks_bytes │ │ │ │
└───────────────────────────────────────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
51 rows in set. Elapsed: 0.006 sec.
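The part names returned by _part, such as 197506_1_4_1, carry several of the columns listed in system.parts. The following Python sketch splits a name into those pieces. It assumes the naming scheme `<partition_id>_<min_block_number>_<max_block_number>_<level>` with an optional trailing mutation version, and it does not handle other naming schemes. `PartInfo` and `parse_part_name` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class PartInfo(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int
    mutation: Optional[int] = None  # present only in five-field names


def parse_part_name(name: str) -> PartInfo:
    """Parse '<partition_id>_<min_block>_<max_block>_<level>[_<mutation>]'."""
    fields = name.split("_")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected part name: {name!r}")
    # The first field stays a string; the remaining fields are integers.
    numbers = [int(field) for field in fields[1:]]
    return PartInfo(fields[0], *numbers)


for name in ["197506_1_4_1", "197506_5_5_0", "197506_6_6_0"]:
    info = parse_part_name(name)
    print(name, info.partition_id, info.min_block, info.max_block, info.level)
```

Read this way, 197506_1_4_1 covers blocks 1 to 4 at level 1, which indicates a merged part, while 197506_5_5_0 and 197506_6_6_0 are level 0 parts that each hold a single block.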
## 4 View the modification time of each part
SELECT
    name,
    modification_time
FROM system.parts
WHERE (database = 'db_1') AND (table = 'test_26')
Query id: 3e8b8a92-cfbe-4a87-bdc3-8a3b420a29a4
┌─name─────────┬───modification_time─┐
│ 197506_1_4_1 │ 2021-08-14 23:39:19 │
│ 197506_5_5_0 │ 2021-08-17 09:55:16 │
│ 197506_6_6_0 │ 2021-08-24 16:54:11 │ ### This part's data will be filtered out later
└──────────────┴─────────────────────┘
3 rows in set. Elapsed: 0.020 sec.
4.5 Filter
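### 5 Filter the data we want
### e.g. keep only the data in parts modified before 2021-08-24 16:00:00
### Join the original table with the system table system.parts on the part name
### The data in part 197506_6_6_0 is filtered out
### Note: system.parts covers every table, so if other tables exist, also filter on B.database = 'db_1' AND B.table = 'test_26'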
SELECT
    id,
    value,
    dt,
    _part
FROM db_1.test_26 AS A
INNER JOIN system.parts AS B ON A._part = B.name
WHERE B.modification_time < '2021-08-24 16:00:00'
Query id: 8f9345dd-3529-4d80-beaf-bc0457d64dc9
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_5_5_0 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │  2013 │ 1975-06-12 │ 197506_1_4_1 │
└────┴───────┴────────────┴──────────────┘
4.6 Get our data
### 6 The SQL that finally needs to be executed
SELECT
    id,
    value,
    dt
FROM db_1.test_26 AS A
INNER JOIN system.parts AS B ON A._part = B.name
WHERE B.modification_time < '2021-08-24 16:00:00'
Query id: 29794880-0ccb-43c9-8618-65b8c438086a
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
5 rows in set. Elapsed: 0.138 sec.
Five CDW-ClickHouse
Tencent Cloud CDW-ClickHouse data is ETL'd to Oceanus
Oceanus connects to ClickHouse through ClickHouse-JDBC
We can then control the time range through Oceanus
This enables full and incremental import into ClickHouse, as well as migration from ClickHouse to ClickHouse
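As a rough sketch of how a job could control the time range, the Python helper below builds the query from section 4.6 for a half-open window [start, end) on the part modification time. The function name `build_window_query` and its parameters are hypothetical. The extra conditions on B.database, B.table and B.active are additions not present in the original query; they are included because system.parts lists the parts of every table. Nothing here is specific to Oceanus or ClickHouse-JDBC, and the values are interpolated without escaping, so only trusted input should be passed.

```python
def build_window_query(database, table, columns, start=None, end=None):
    """Build a SELECT that keeps rows from parts modified in [start, end)."""
    select_list = ", ".join(f"A.{column}" for column in columns)
    # Restrict system.parts to active parts of the table being read.
    conditions = [f"B.database = '{database}'", f"B.table = '{table}'", "B.active"]
    if start is not None:
        conditions.append(f"B.modification_time >= '{start}'")
    if end is not None:
        conditions.append(f"B.modification_time < '{end}'")
    return (
        f"SELECT {select_list} "
        f"FROM {database}.{table} AS A "
        "INNER JOIN system.parts AS B ON A._part = B.name "
        "WHERE " + " AND ".join(conditions)
    )


# Full import up to a cutoff, then an incremental import for the next window.
print(build_window_query("db_1", "test_26", ["id", "value", "dt"],
                         end="2021-08-24 16:00:00"))
print(build_window_query("db_1", "test_26", ["id", "value", "dt"],
                         start="2021-08-24 16:00:00", end="2021-08-25 16:00:00"))
```

Using the end of one window as the start of the next avoids reading a part twice. Background merges combine parts written at different times into one part, so the part-level timestamp is only an approximation of when individual rows were imported.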
Oceanus ClickHouse Data warehouse