
Flink's Table API & SQL and an example module

2022-06-21 19:38:00 Dragon man who can't fly

Official introduction

APIs in Flink

Flink provides different levels of abstraction for developing streaming/batch applications.

  • The lowest level of abstraction in the Flink API is stateful real-time stream processing. This abstraction is implemented as the Process Function, which the Flink framework integrates into the DataStream API. It lets users freely process events (data) from one or more streams in an application, and provides state with global consistency and fault-tolerance guarantees. In addition, users can register callbacks on event time and processing time at this layer, which allows programs to perform complex computations.
  • The second level of abstraction in the Flink API is the Core APIs. In practice, many applications do not need the lowest-level abstraction described above and can be written against the Core APIs instead. These consist of two parts: the DataStream API (for bounded/unbounded stream scenarios) and the DataSet API (for bounded dataset scenarios). These fluent APIs provide general building blocks for data processing, such as various forms of user-defined transformations, joins, aggregations, windows, and state operations. The data types processed at this API layer have corresponding classes in each programming language.
    The integration of the low-level Process Function abstraction with the DataStream API lets users drop down to the lower-level API when their needs require it. The DataSet API additionally offers some extra primitives, such as loop/iteration operations.
  • The third level of abstraction in the Flink API is the Table API. The Table API is a declarative, table-centered DSL; in a streaming scenario, for example, a table can represent a dynamically changing table. The Table API follows the (extended) relational model: tables have a schema (similar to a schema in a relational database), and the API offers operations familiar from the relational model, such as select, project, join, group-by, and aggregate. A Table API program declaratively defines which logical operations should be performed, rather than specifying exactly what code the program should execute. Although the Table API is easy to use and can be extended with various kinds of user-defined functions, it is less expressive than the Core APIs. In addition, before execution, Table API programs apply the optimizer's optimization rules to the expressions written by the user.
    Tables and DataStream/DataSet can be converted seamlessly, and Flink allows applications to mix the Table API with the DataStream/DataSet API.
  • The top-level abstraction in the Flink API is SQL. This layer is similar to the Table API both semantically and in expressiveness, but programs are expressed as SQL queries. The SQL abstraction is closely related to the Table API abstraction, and SQL queries can be executed on tables defined with the Table API.
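As a sketch of how the two top layers relate: the same aggregation can be written either as a SQL query or with Table API calls. The `Orders` table and its columns here are hypothetical, used only for illustration:

```sql
-- Hypothetical table with columns `user` and `amount`.
-- The rough Table API equivalent would be:
--   orders.groupBy($("user")).select($("user"), $("amount").sum().as("total"))
SELECT `user`, SUM(amount) AS total
FROM Orders
GROUP BY `user`;
```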

Table API and SQL 

Apache Flink provides two relational APIs for unified stream and batch processing: the Table API and SQL.

  • The Table API is a query API for the Scala and Java languages that lets you compose relational operators such as select, filter, and join in a very intuitive way. Flink SQL is standard SQL implemented on top of Apache Calcite. Queries in both APIs have the same semantics for batch (DataSet) and streaming (DataStream) input and produce the same results.
  • The Table API and SQL are tightly integrated with each other, as well as with the DataStream and DataSet APIs. You can easily switch between these APIs and between libraries built on top of them. For example, you can use CEP to do pattern matching on a DataStream and then analyze the matched results with the Table API; or you can use SQL to scan, filter, and aggregate a batch table and then run a Gelly graph algorithm on the preprocessed data.
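The scan/filter/aggregate step mentioned above can be sketched as a single SQL query; the `Sales` table and its columns are hypothetical:

```sql
-- Scan Sales, filter out small orders, then aggregate per product.
SELECT product, COUNT(*) AS cnt, SUM(amount) AS revenue
FROM Sales
WHERE amount > 100
GROUP BY product;
```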

Note: The Table API and SQL are still under active development, and not all features have been fully implemented. Not all combinations of [Table API, SQL] and [stream, batch] are supported.

Official documentation

https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/dev/table/

Table API & SQL development

Starting with this article, Table API & SQL demonstrations are added. On top of the original project, a tableapi module is added; this module will demonstrate basic Table API and SQL usage with the following components:

  • elasticsearch
  • kafka
  • jdbc (mysql)
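As a preview of what the connector demos look like in Flink 1.11's SQL DDL, the sketches below define a Kafka-backed source table and a MySQL (JDBC) sink table; an Elasticsearch sink is declared the same way with `'connector' = 'elasticsearch-6'`. All table names, columns, and connection settings here are hypothetical placeholders:

```sql
-- Hypothetical Kafka source table (Flink 1.11 connector options)
CREATE TABLE user_events (
  user_id STRING,
  amount  DOUBLE,
  ts      TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'tableapi-demo',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Hypothetical JDBC (MySQL) sink table
CREATE TABLE user_totals (
  user_id STRING,
  total   DOUBLE,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/demo',
  'table-name' = 'user_totals',
  'username' = 'demo',
  'password' = 'demo'
);
```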

Adding the tableapi module

In the current project, create a Maven module named tableapi.

pom.xml

   <artifactId>tableapi</artifactId>    

   <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-jdbc -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-jdbc_2.12</artifactId>
            <version>1.11.1</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>1.11.1</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <!-- flink-connector-kafka -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>1.11.1</version>
        </dependency>

        <!-- MySQL JDBC driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.47</version>
        </dependency>

        <!-- Elasticsearch 6 connector -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch6_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!--<dependency>-->
            <!--<groupId>org.apache.flink</groupId>-->
            <!--<artifactId>flink-sql-connector-elasticsearch6_2.11</artifactId>-->
            <!--<version>${flink.version}</version>-->
        <!--</dependency>-->
    </dependencies>

Refresh the project's Maven configuration to download the required dependencies.

Project module

Subsequent Table API & SQL demo examples will be developed on top of this tableapi module.

Source download

Gitee: flink-examples — a project example based on Flink 1.11.1. It covers most operators, windows, middleware connectors, and Table/SQL usage, and is suitable for newcomers learning Flink.


Copyright notice
This article was written by [Dragon man who can't fly]. When reposting, please include the original link:
https://yzsam.com/2022/172/202206211805564436.html
