Release Note 3.0.3 #44522

gavinchou · 2024-11-25T03:22:34Z

Behavioral Changes

Prohibited column updates on MOW tables with synchronous materialized views. #40190
Adjusted the default parameters of RoutineLoad to improve import efficiency. #42968
When StreamLoad fails, the return value of LoadedRows is adjusted to 0. #41946 #42291
Adjusted the default memory limit of Segment cache to 5%. #42308 #42436

New Features

Introduced the session variable enable_cooldown_replica_affinity to control the affinity of cold and hot tiered replicas. #42677

Lakehouse

Added table$partition syntax for querying partition information of Hive tables. #40774
- View Documentation
Supported creation of Hive tables in Text format. #41860 #42175
- View Documentation

Asynchronous Materialized Views

Introduced new materialized view attribute use_for_rewrite. When use_for_rewrite is set to false, the materialized view does not participate in transparent rewriting. #40332

Query Optimizer

Supported correlated non-aggregate subqueries. #42236

Query Execution

Added functions ngram_search, normal_cdf, to_iso8601, from_iso8601_date, SESSION_USER(), last_query_id. #38226 #40695 #41075 #41600 #39575 #40739
The aes_encrypt and aes_decrypt functions support GCM mode. #40004
Profile outputs the changed session variable values. #41016 #41318
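
As a rough illustration of what a function like normal_cdf computes, the sketch below evaluates the cumulative distribution function of a normal distribution using the error function. The Python function name, its argument order (mean, stddev, value), and the error handling are assumptions made for this example; they are not taken from the Doris implementation.

```python
import math


def normal_cdf(mean: float, stddev: float, value: float) -> float:
    """CDF of N(mean, stddev^2) evaluated at `value`.

    Hypothetical helper: the argument order is an assumption for this sketch.
    """
    if stddev <= 0:
        raise ValueError("stddev must be positive")
    # Phi(x) = 0.5 * (1 + erf((x - mean) / (stddev * sqrt(2))))
    z = (value - mean) / (stddev * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # Half of the probability mass lies below the mean.
    print(normal_cdf(0.0, 1.0, 0.0))
    # About 97.5% of the mass lies below mean + 1.96 * stddev.
    print(round(normal_cdf(0.0, 1.0, 1.96), 4))
```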

Semi-structured Data Management

Added array functions array_match_all and array_match_any. #40605 #43514
The array function array_agg supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
Added approximate aggregate statistical functions approx_top_k and approx_top_sum. #44082
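
The intent of array_match_all and array_match_any can be sketched as "every element satisfies a predicate" and "at least one element satisfies a predicate". The Python helpers below are hypothetical stand-ins that only illustrate that idea; NULL handling and empty-array behavior in Doris may differ and are not modeled here.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def array_match_all(pred: Callable[[T], bool], arr: Iterable[T]) -> bool:
    """True when every element satisfies `pred` (illustrative only)."""
    return all(pred(x) for x in arr)


def array_match_any(pred: Callable[[T], bool], arr: Iterable[T]) -> bool:
    """True when at least one element satisfies `pred` (illustrative only)."""
    return any(pred(x) for x in arr)


if __name__ == "__main__":
    values = [1, 2, 3]
    print(array_match_all(lambda x: x > 0, values))  # every element is positive
    print(array_match_any(lambda x: x > 2, values))  # one element exceeds 2
    print(array_match_all(lambda x: x > 1, values))  # 1 fails the predicate
```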

Improvements

Storage

Supported bitmap_empty as the default value. #40364
Introduced the session variable insert_timeout to control the timeout of DELETE statements. #41063
Improved some error message prompts. #41048 #39631
Improved the priority scheduling of replica repair. #41076
Enhanced the robustness of timezone handling when creating tables. #41926 #42389
Checked the validity of partition expressions when creating tables. #40158
Supported Unicode-encoded column names in DELETE operations. #39381

Compute-Storage Decoupled

Supported ARM architecture deployment in compute-storage decoupled mode. #42467 #43377
Optimized the eviction strategy and lock competition of file cache, improving hit rate and high concurrency point query performance. #42451 #43201 #41818 #43401
S3 storage vault supported use_path_style, solving the problem of using custom domain names for object storage. #43060 #43343 #43330
Optimized compute-storage decoupled configuration and deployment, preventing misoperations in different modes. #43381 #43522 #43434 #40764 #43891
Optimized observability and provided an interface for deleting specified segment file cache. #38489 #42896 #41037 #43412
Optimized Meta-service operation and maintenance interface: RPC rate limiting and tablet metadata correction. #42413 #43884 #41782 #43460
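
For context on the use_path_style item above, S3-compatible object stores are generally addressed in one of two URL styles: virtual-hosted style, where the bucket is part of the host name, and path style, where the bucket is the first path segment. The sketch below shows the difference; the function name, the endpoint, and the bucket are hypothetical values for illustration and do not describe how Doris builds requests.

```python
def object_url(endpoint: str, bucket: str, key: str, use_path_style: bool) -> str:
    """Build an object URL in path style or virtual-hosted style (sketch)."""
    key = key.lstrip("/")
    if use_path_style:
        # Path style: the bucket is the first path segment.
        return f"https://{endpoint}/{bucket}/{key}"
    # Virtual-hosted style: the bucket is a subdomain of the endpoint.
    return f"https://{bucket}.{endpoint}/{key}"


if __name__ == "__main__":
    # Hypothetical custom domain fronting an object store.
    print(object_url("storage.example.com", "my-bucket", "data/seg_0.dat", True))
    print(object_url("storage.example.com", "my-bucket", "data/seg_0.dat", False))
```

With a custom domain, the virtual-hosted form needs DNS and certificates for a bucket-specific subdomain, which is one reason path style can be the more practical choice in such setups.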

Lakehouse

Paimon Catalog supported Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585
- View Documentation
Supported reading of Hive tables in OpenCSV format. #42257 #42942
Optimized the performance of accessing the information_schema.columns table in External Catalog. #41659 #41962
Used the new MaxCompute open storage API to access MaxCompute data sources. #41614
Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310
Optimized the read performance of small ORC files. #42004 #43467
Supported reading of parquet files in brotli compressed format. #42177
Added the file_cache_statistics table to the information_schema database to view metadata cache statistics. #42160

Query Optimizer

Optimization: When queries only differ in comments, the same SQL Cache can be reused. #40049
Optimization: Improved the stability of statistical information when data is frequently updated. #43865 #39788 #43009 #40457 #42409 #41894
Optimization: Enhanced the stability of constant folding. #42910 #41164 #39723 #41394 #42256 #40441
Optimization: Column pruning can generate better execution plans. #41719 #41548
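
One way to picture the SQL Cache item above is a cache key computed from the query text after comments are removed, so that two queries differing only in comments map to the same entry. The sketch below is a simplified illustration under that assumption and is not the Doris implementation: the function name is hypothetical, whitespace collapsing is an extra simplification, and the regular expressions do not account for comment markers inside string literals.

```python
import hashlib
import re

# Simplified patterns: they ignore comment markers inside string literals.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"--[^\n]*")


def cache_key(sql: str) -> str:
    """Hash the SQL text after stripping comments and collapsing whitespace."""
    stripped = _BLOCK_COMMENT.sub(" ", sql)
    stripped = _LINE_COMMENT.sub(" ", stripped)
    normalized = " ".join(stripped.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = cache_key("SELECT id FROM t /* dashboard A */ WHERE id > 10")
    b = cache_key("SELECT id FROM t WHERE id > 10 -- dashboard B")
    c = cache_key("SELECT id FROM t WHERE id > 11")
    print(a == b)  # the two queries differ only in comments
    print(a == c)  # the predicate differs, so the keys differ
```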

Query Execution

Optimized the memory usage of the sort operator. #39306
Optimized the performance of computations on ARM. #38888 #38759
Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
Used SSE instructions to optimize the performance of the match_ipv6_subnet function. #38755
Supported automatic creation of new partitions during insert overwrite. #38628 #42645
Added the status of each PipelineTask in Profile. #42981
IP type supported runtime filter. #39985

Semi-structured Data Management

Output the real SQL of prepared statements in audit logs. #43321
The filebeat doris output plugin supports fault tolerance and progress reporting. #36355
Optimized the performance of inverted index queries. #41547 #41585 #41567 #41577 #42060 #42372
The array function arrays_overlap supports acceleration using inverted indexes. #41571
The IP function is_ip_address_in_range supports acceleration using inverted indexes. #41571
Optimized the CAST performance of the VARIANT data type. #41775 #42438 #43320
Optimized the CPU resource consumption of the Variant data type. #42856 #43062 #43634
Optimized the metadata and execution memory resource consumption of the Variant data type. #42448 #43326 #41482 #43093 #43567 #43620
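
To make the semantics of a check like is_ip_address_in_range concrete, the sketch below tests whether an address falls inside a CIDR block using Python's standard ipaddress module. The Python function reuses the name only for readability; its signature and its behavior on mixed IPv4/IPv6 input are assumptions of this example, not a description of the Doris function or of how the inverted index accelerates it.

```python
import ipaddress


def is_ip_address_in_range(addr: str, cidr: str) -> bool:
    """Return True when `addr` lies inside the CIDR block `cidr` (sketch)."""
    # strict=False accepts CIDR strings whose host bits are set.
    network = ipaddress.ip_network(cidr, strict=False)
    # An IPv4 address is never contained in an IPv6 network, and vice versa.
    return ipaddress.ip_address(addr) in network


if __name__ == "__main__":
    print(is_ip_address_in_range("192.168.1.5", "192.168.0.0/16"))
    print(is_ip_address_in_range("10.0.0.1", "192.168.0.0/16"))
    print(is_ip_address_in_range("2001:db8::1", "2001:db8::/32"))
```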

Permissions

Added a new configuration item ldap_group_filter in LDAP for custom group filtering. #43292

Other

Supported displaying connection count information by user in FE monitoring items. #39200

Bug Fixes

Storage

Fixed the issue with using IPv6 hostnames. #40074
Fixed the inaccurate display of broker/s3 load progress. #43535
Fixed the issue where queries sent from FE might hang. #41303 #42382
Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
Fixed occasional NPE issues with groupcommit. #43635
Fixed the inaccurate calculation of auto bucket. #41675 #41835
Fixed the issue where FE might not correctly plan multi-table flows after restart. #41677 #42290

Compute-Storage Decoupled

Fixed the issue that MOW primary key tables with large delete bitmaps might cause coredump. #43088 #43457 #43479 #43407 #43297 #43613 #43615 #43854 #43968 #44074 #41793 #42142
Fixed the issue that object uploads failed when the segment file size was an exact multiple of 5MB. #43254
Fixed the issue that the default retry policy of aws sdk did not take effect. #43575 #43648
Fixed the issue that altering storage vault could continue execution even when the wrong type was specified. #43489 #43352 #43495
Fixed the issue that tablet_id might be 0 during the delayed commit process of large transactions. #42043 #42905
Fixed the issue that constant folding RPC and FE forwarding SQL might not be executed in the expected computation group. #43110 #41819 #41846
Fixed the issue that meta-service did not strictly check instance_id upon receiving RPC. #43253 #43832
Fixed the issue that FE follower information_schema version did not update in time. #43496
Fixed the issue of atomicity in file cache rename and inaccurate metrics. #42869 #43504 #43220

Lakehouse

Prohibited implicit conversion predicates from being pushed down to JDBC data sources to avoid inconsistent query results. #42102
Fixed some read issues with high-version Hive transactional tables. #42226
Fixed the issue that the Export command might cause deadlocks. #43083 #43402
Fixed the issue of being unable to query Hive views created by Spark. #43552
Fixed the issue that Hive partition paths containing special characters led to incorrect partition pruning. #42906
Fixed the issue that Iceberg Catalog could not use AWS Glue. #41084

Asynchronous Materialized Views

Fixed the issue that asynchronous materialized views might not refresh after the base table is rebuilt. #41762

Query Optimizer

Fixed the issue that partition pruning results might be incorrect when using multi-column range partitioning. #43332
Fixed the issue of incorrect calculation results in some limit offset scenarios. #42576

Query Execution

Fixed the issue that hash join with array types larger than 4G could cause a BE core dump. #43861
Fixed the issue that is null predicate operations might yield incorrect results in some scenarios. #43619
Fixed the issue that bitmap types might produce incorrect output results in hash join. #43718
Fixed some issues where function results were calculated incorrectly. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
Fixed some issues with JSON type parsing. #39937
Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
Fixed the issue that arrow flight reported Reach limit of connections errors upon connection. #39127
Fixed the issue of incorrect memory usage statistics for BE in k8s environments. #41123

Semi-structured Data Management

Adjusted the default values of segment_cache_fd_percentage and inverted_index_fd_number_limit_percent. #42224
logstash now supports group_commit. #40450
Fixed the issue of coredump when building index. #43246 #43298
Fixed issues with variant index. #43375 #43773
Fixed potential fd and memory leaks under abnormal compaction circumstances. #42374
Inverted index match null now correctly returns null instead of false. #41786
Fixed the issue of coredump when ngram bloomfilter index bf_size is set to 65536. #43645
Fixed the issue of potential coredump during complex data type JOINs. #40398
Fixed the issue of coredump with TVF JSON data. #43187
Fixed the precision issue of bloom filter calculations for dates and times. #43612
Fixed the issue of coredump with row storage of the IPv6 type. #43251
Fixed the issue of coredump when using VARIANT type with light_schema_change disabled. #40908
Improved cache performance for high-concurrency point queries. #44077
Fixed the issue that bloom filter indexes were not synchronized when columns were deleted. #43378
Fixed instability issues with es catalog under special circumstances such as mixed array and scalar data. #40314 #40385 #43399 #40614
Fixed coredump issues caused by abnormal regular pattern matching. #43394

Permissions

Fixed several issues where permissions were not properly restricted after authorization. #43193 #41723 #42107 #43306
Enhanced several permission checks. #40688 #40533 #41791 #42106

Other

Supplemented missing audit log fields in audit log tables and files. #43303
- View Documentation


gavinchou · 2024-11-26T08:37:32Z

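Behavioral Changes

Prohibited column updates on MOW tables with synchronous materialized views. #40190
Adjusted the default parameters of RoutineLoad to improve import efficiency. #42968
When StreamLoad fails, the return value of LoadedRows is adjusted to 0. #41946 #42291
Adjusted the default memory limit of Segment cache to 5%. #42308 #42436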

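New Features

Introduced the session variable enable_cooldown_replica_affinity to control the affinity of cold and hot tiered replicas. #42677

Lakehouse

Added table$partition syntax for querying partition information of Hive tables. #40774
- View Documentation
Supported creation of Hive tables in Text format. #41860 #42175
- View Documentation

Asynchronous Materialized Views

Introduced the new materialized view attribute use_for_rewrite. When use_for_rewrite is set to false, the materialized view does not participate in transparent rewriting. #40332

Query Optimizer

Supported correlated non-aggregate subqueries. #42236

Query Execution

Added the functions ngram_search, normal_cdf, to_iso8601, from_iso8601_date, SESSION_USER(), and last_query_id. #38226 #40695 #41075 #41600 #39575 #40739
The aes_encrypt and aes_decrypt functions support GCM mode. #40004
Profile outputs the changed session variable values. #41016 #41318

Semi-structured Data Management

Added the array functions array_match_all and array_match_any. #40605 #43514
The array function array_agg supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
Added the approximate aggregate statistical functions approx_top_k and approx_top_sum. #44082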

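Improvements

Storage

Supported bitmap_empty as the default value. #40364
Introduced the session variable insert_timeout to control the timeout of DELETE statements. #41063
Improved some error messages. #41048 #39631
Improved the priority scheduling of replica repair. #41076
Enhanced the robustness of timezone handling when creating tables. #41926 #42389
Checked the validity of partition expressions when creating tables. #40158
Supported Unicode-encoded column names in DELETE operations. #39381

Compute-Storage Decoupled

Supported ARM architecture deployment in compute-storage decoupled mode. #42467 #43377
Optimized the eviction strategy and lock contention of file cache, improving hit rate and high-concurrency point query performance. #42451 #43201 #41818 #43401
S3 storage vault supports use_path_style, solving the problem of using custom domain names for object storage. #43060 #43343 #43330
Optimized compute-storage decoupled configuration and deployment, preventing misoperations in different modes. #43381 #43522 #43434 #40764 #43891
Optimized observability and provided an interface for deleting the file cache of a specified segment. #38489 #42896 #41037 #43412
Optimized the Meta-service operation and maintenance interfaces: RPC rate limiting and tablet metadata correction. #42413 #43884 #41782 #43460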

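Lakehouse

Paimon Catalog supports Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585
- View Documentation
Supported reading of Hive tables in OpenCSV format. #42257 #42942
Optimized the performance of accessing the information_schema.columns table in External Catalog. #41659 #41962
Used the new MaxCompute open storage API to access MaxCompute data sources. #41614
Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310
Optimized the read performance of small ORC files. #42004 #43467
Supported reading of parquet files in brotli compressed format. #42177
Added the file_cache_statistics table to the information_schema database to view metadata cache statistics. #42160

Query Optimizer

Optimization: When queries differ only in comments, the same SQL Cache can be reused. #40049
Optimization: Improved the stability of statistical information when data is frequently updated. #43865 #39788 #43009 #40457 #42409 #41894
Optimization: Enhanced the stability of constant folding. #42910 #41164 #39723 #41394 #42256 #40441
Optimization: Column pruning can generate better execution plans. #41719 #41548

Query Execution

Optimized the memory usage of the sort operator. #39306
Optimized the performance of computations on ARM. #38888 #38759
Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
Used SSE instructions to optimize the performance of the match_ipv6_subnet function. #38755
Supported automatic creation of new partitions during insert overwrite. #38628 #42645
Added the status of each PipelineTask in Profile. #42981
The IP type supports runtime filter. #39985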

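Semi-structured Data Management

Output the real SQL of prepared statements in audit logs. #43321
The filebeat doris output plugin supports fault tolerance, progress reporting, and more. #36355
Optimized the performance of inverted index queries. #41547 #41585 #41567 #41577 #42060 #42372
The array function arrays_overlap supports acceleration using inverted indexes. #41571
The IP function is_ip_address_in_range supports acceleration using inverted indexes. #41571
Optimized the CAST performance of the VARIANT data type. #41775 #42438 #43320
Optimized the CPU resource consumption of the Variant data type. #42856 #43062 #43634
Optimized the metadata and execution memory resource consumption of the Variant data type. #42448 #43326 #41482 #43093 #43567 #43620

Permissions

Added a new configuration item ldap_group_filter in LDAP for custom group filtering. #43292

Other

Supported displaying connection count information by user in FE monitoring items. #39200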

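Bug Fixes

Storage

Fixed the issue with using IPv6 hostnames. #40074
Fixed the inaccurate display of broker/s3 load progress. #43535
Fixed the issue where queries sent from FE might hang. #41303 #42382
Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
Fixed occasional NPE issues with groupcommit. #43635
Fixed the inaccurate calculation of auto bucket. #41675 #41835
Fixed the issue where FE might not correctly plan multi-table flows on restart. #41677 #42290

Compute-Storage Decoupled

Fixed the issue that MOW primary key tables with large delete bitmaps might cause coredump. #43088 #43457 #43479 #43407 #43297 #43613 #43615 #43854 #43968 #44074 #41793 #42142
Fixed the issue that object uploads failed when the segment file size was an exact multiple of 5MB. #43254
Fixed the issue that the default retry policy of aws sdk did not take effect. #43575 #43648
Fixed the issue that alter storage vault could continue execution even when the wrong type was specified. #43489 #43352 #43495
Fixed the issue that tablet_id might be 0 during the delayed commit process of large transactions. #42043 #42905
Fixed the issue that constant folding RPC and FE forwarding SQL might not be executed in the expected computation group. #43110 #41819 #41846
Fixed the issue that meta-service did not strictly check instance_id upon receiving RPC. #43253 #43832
Fixed the issue that FE follower information_schema version did not update in time. #43496
Fixed the issue of atomicity in file cache rename and inaccurate metrics. #42869 #43504 #43220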

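Lakehouse

Prohibited predicates with implicit conversions from being pushed down to JDBC data sources to avoid inconsistent query results. #42102
Fixed some read issues with high-version Hive transactional tables. #42226
Fixed the issue that the Export command might cause deadlocks. #43083 #43402
Fixed the issue of being unable to query Hive views created by Spark. #43552
Fixed the issue that Hive partition paths containing special characters led to incorrect partition pruning. #42906
Fixed the issue that Iceberg Catalog could not use AWS Glue. #41084

Asynchronous Materialized Views

Fixed the issue that asynchronous materialized views might not refresh after the base table is rebuilt. #41762

Query Optimizer

Fixed the issue that partition pruning results might be incorrect when using multi-column range partitioning. #43332
Fixed the issue of incorrect calculation results in some limit offset scenarios. #42576

Query Execution

Fixed the issue that hash join with array types larger than 4G could cause a BE core dump. #43861
Fixed the issue that is null predicate operations might yield incorrect results in some scenarios. #43619
Fixed the issue that bitmap types might produce incorrect output results in hash join. #43718
Fixed some issues where function results were calculated incorrectly. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
Fixed some issues with JSON type parsing. #39937
Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
Fixed the issue that arrow flight reported Reach limit of connections errors upon connection. #39127
Fixed the issue of incorrect available memory statistics for BE in k8s environments. #41123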

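Semi-structured Data Management

Adjusted the default values of segment_cache_fd_percentage and inverted_index_fd_number_limit_percent. #42224
logstash now supports group_commit. #40450
Fixed the issue of coredump when building index. #43246 #43298
Fixed issues with variant index. #43375 #43773
Fixed potential fd and memory leaks when background compaction encounters abnormal conditions. #42374
Inverted index match null now correctly returns null instead of false. #41786
Fixed the issue of coredump when ngram bloomfilter index bf_size is set to 65536. #43645
Fixed the issue of potential coredump during complex data type JOINs. #40398
Fixed the issue of coredump with TVF JSON data. #43187
Fixed the precision issue of bloom filter calculations for dates and times. #43612
Fixed the issue of coredump with row storage of the IPv6 type. #43251
Fixed the issue of coredump when using VARIANT type with light_schema_change disabled. #40908
Improved cache performance for high-concurrency point queries. #44077
Fixed the issue that bloom filter indexes were not synchronized when columns were deleted. #43378
Fixed instability issues with es catalog under special circumstances such as mixed array and scalar data. #40314 #40385 #43399 #40614
Fixed coredump issues caused by abnormal regular expression matching. #43394

Permissions

Fixed several issues where permissions were not properly restricted after authorization. #43193 #41723 #42107 #43306
Enhanced several permission checks. #40688 #40533 #41791 #42106

Other

Supplemented missing audit log fields in audit log tables and files. #43303
- View Documentation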
