Release Note 3.0.3 #44522

Open
gavinchou opened this issue Nov 25, 2024 · 1 comment

gavinchou commented Nov 25, 2024

Behavioral Changes

  • Prohibited column updates on MOW tables with synchronous materialized views. #40190
  • Adjusted the default parameters of RoutineLoad to improve import efficiency. #42968
  • When StreamLoad fails, the return value of LoadedRows is adjusted to 0. #41946 #42291
  • Adjusted the default memory limit of Segment cache to 5%. #42308 #42436
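Clients that read the row count from a StreamLoad response may want to check the load status first, since a failed load now reports zero loaded rows. The following is a minimal sketch, assuming the response body is a JSON object with `Status` and `NumberLoadedRows` fields; those field names and the helper `loaded_rows` are assumptions for illustration and are not taken from this note.

```python
import json


def loaded_rows(response_text: str) -> int:
    """Return the loaded-row count from a StreamLoad response body.

    Assumes a JSON body with "Status" and "NumberLoadedRows" keys;
    both key names are assumptions made for this sketch.
    """
    result = json.loads(response_text)
    # In this sketch, anything other than "Success" counts as a failed load.
    if result.get("Status") != "Success":
        return 0
    return int(result.get("NumberLoadedRows", 0))


# Hypothetical response bodies, used only to exercise the helper.
ok = '{"Status": "Success", "NumberLoadedRows": 100}'
failed = '{"Status": "Fail", "NumberLoadedRows": 0}'
print(loaded_rows(ok), loaded_rows(failed))
```

Checking the status before the count keeps client code correct on both sides of this behavioral change.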

New Features

  • Introduced the session variable enable_cooldown_replica_affinity to control the affinity of cold and hot tiered replicas. #42677

Lakehouse

Asynchronous Materialized Views

  • Introduced new materialized view attribute use_for_rewrite. When use_for_rewrite is set to false, the materialized view does not participate in transparent rewriting. #40332

Query Optimizer

  • Supported correlated non-aggregate subqueries. #42236

Query Execution

  • Added functions ngram_search, normal_cdf, to_iso8601, from_iso8601_date, SESSION_USER(), last_query_id. #38226 #40695 #41075 #41600 #39575 #40739
  • The aes_encrypt and aes_decrypt functions support GCM mode. #40004
  • Profile outputs the changed session variable values. #41016 #41318
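As a rough reference for what some of the new scalar functions compute, here is a plain Python sketch of a normal CDF and an ISO 8601 round trip. It only illustrates the underlying math and formats. The argument order `(mean, sd, v)` and the exact string format Doris returns are assumptions here, so treat the SQL functions' documentation as authoritative.

```python
import math
from datetime import date, datetime


def normal_cdf(mean: float, sd: float, v: float) -> float:
    """P(X <= v) for X ~ Normal(mean, sd**2), computed with the error function."""
    return 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))


def to_iso8601(ts: datetime) -> str:
    """Format a datetime as an ISO 8601 string (Python's default layout)."""
    return ts.isoformat()


def from_iso8601_date(text: str) -> date:
    """Parse a calendar date written as YYYY-MM-DD."""
    return date.fromisoformat(text)


print(normal_cdf(0.0, 1.0, 0.0))
print(to_iso8601(datetime(2024, 11, 25, 8, 30, 0)))
print(from_iso8601_date("2024-11-25"))
```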

Semi-structured Data Management

  • Added array functions array_match_all and array_match_any. #40605 #43514
  • The array function array_agg supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
  • Added approximate aggregate statistical functions approx_top_k and approx_top_sum. #44082
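The intent of the two array matching functions can be pictured with Python's `all` and `any`. This is only a sketch of the general idea: NULL handling and the result for empty arrays in Doris are not modeled and may differ, and the Python signatures below are assumptions for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def array_match_all(pred: Callable[[T], bool], values: Iterable[T]) -> bool:
    """True when every element satisfies the predicate."""
    return all(pred(v) for v in values)


def array_match_any(pred: Callable[[T], bool], values: Iterable[T]) -> bool:
    """True when at least one element satisfies the predicate."""
    return any(pred(v) for v in values)


print(array_match_all(lambda x: x > 0, [1, 2, 3]))
print(array_match_any(lambda x: x > 2, [1, 2, 3]))
```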

Improvements

Storage

  • Supported bitmap_empty as the default value. #40364
  • Introduced the session variable insert_timeout to control the timeout of DELETE statements. #41063
  • Improved some error message prompts. #41048 #39631
  • Improved the priority scheduling of replica repair. #41076
  • Enhanced the robustness of timezone handling when creating tables. #41926 #42389
  • Checked the validity of partition expressions when creating tables. #40158
  • Supported Unicode-encoded column names in DELETE operations. #39381

Compute-Storage Decoupled

  • Supported ARM architecture deployment in compute-storage decoupled mode. #42467 #43377
  • Optimized the eviction strategy and lock competition of file cache, improving hit rate and high concurrency point query performance. #42451 #43201 #41818 #43401
  • S3 storage vault supported use_path_style, solving the problem of using custom domain names for object storage. #43060 #43343 #43330
  • Optimized the configuration and deployment of compute-storage decoupled mode, preventing misoperations in different modes. #43381 #43522 #43434 #40764 #43891
  • Optimized observability and provided an interface for deleting specified segment file cache. #38489 #42896 #41037 #43412
  • Optimized Meta-service operation and maintenance interface: RPC rate limiting and tablet metadata correction. #42413 #43884 #41782 #43460
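For readers unfamiliar with path-style addressing, the difference between the two S3 URL layouts can be sketched as follows. The helper `object_url` and the example endpoint are hypothetical. The sketch shows only the general S3 addressing convention, not how the storage vault builds requests.

```python
def object_url(endpoint: str, bucket: str, key: str, use_path_style: bool) -> str:
    """Build an object URL in path style or virtual-hosted style."""
    if use_path_style:
        # Path style: the bucket is the first path segment.
        return f"https://{endpoint}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint.
    return f"https://{bucket}.{endpoint}/{key}"


# "storage.example.com" stands in for a custom object storage domain.
print(object_url("storage.example.com", "my-bucket", "data/seg.dat", True))
print(object_url("storage.example.com", "my-bucket", "data/seg.dat", False))
```

Path style avoids the need for per-bucket DNS names, which is typically why it helps with custom domains.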

Lakehouse

  • Paimon Catalog supported Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585
  • Supported reading of Hive tables in OpenCSV format. #42257 #42942
  • Optimized the performance of accessing the information_schema.columns table in External Catalog. #41659 #41962
  • Used the new MaxCompute open storage API to access MaxCompute data sources. #41614
  • Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310
  • Optimized the read performance of small ORC files. #42004 #43467
  • Supported reading of parquet files in brotli compressed format. #42177
  • Added the file_cache_statistics table under the information_schema database to view metadata cache statistics. #42160

Query Optimizer

Query Execution

  • Optimized the memory usage of the sort operator. #39306
  • Optimized the performance of computations on ARM. #38888 #38759
  • Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
  • Used SSE instructions to optimize the performance of the match_ipv6_subnet function. #38755
  • Supported automatic creation of new partitions during insert overwrite. #38628 #42645
  • Added the status of each PipelineTask in Profile. #42981
  • IP type supported runtime filter. #39985

Semi-structured Data Management

  • Output the real SQL of prepared statements in audit logs. #43321
  • The filebeat doris output plugin supports fault tolerance and progress reporting. #36355
  • Optimized the performance of inverted index queries. #41547 #41585 #41567 #41577 #42060 #42372
  • The array function arrays_overlap supports acceleration using inverted indexes. #41571
  • The IP function is_ip_address_in_range supports acceleration using inverted indexes. #41571
  • Optimized the CAST performance of the VARIANT data type. #41775 #42438 #43320
  • Optimized the CPU resource consumption of the Variant data type. #42856 #43062 #43634
  • Optimized the metadata and execution memory resource consumption of the Variant data type. #42448 #43326 #41482 #43093 #43567 #43620
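The two predicates that gained inverted index acceleration have simple reference semantics, sketched below with Python's standard `ipaddress` module and sets. This describes only what the predicates test, under the assumption that the range is given in CIDR notation. It says nothing about how the inverted index evaluates them, and the Python function names are chosen for illustration.

```python
import ipaddress
from typing import Hashable, Iterable


def is_ip_address_in_range(addr: str, cidr: str) -> bool:
    """True when addr falls inside the CIDR block (IPv4 or IPv6)."""
    network = ipaddress.ip_network(cidr, strict=False)
    ip = ipaddress.ip_address(addr)
    # An IPv4 address is never inside an IPv6 network, and vice versa.
    return ip.version == network.version and ip in network


def arrays_overlap(left: Iterable[Hashable], right: Iterable[Hashable]) -> bool:
    """True when the two arrays share at least one element."""
    return not set(left).isdisjoint(right)


print(is_ip_address_in_range("192.168.1.5", "192.168.1.0/24"))
print(arrays_overlap([1, 2], [2, 3]))
```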

Permissions

  • Added a new configuration item ldap_group_filter in LDAP for custom group filtering. #43292

Other

  • Supported displaying connection count information by user in FE monitoring items. #39200

Bug Fixes

Storage

  • Fixed the issue with using IPv6 hostnames. #40074
  • Fixed the inaccurate display of broker/s3 load progress. #43535
  • Fixed the issue where queries issued from FE might hang. #41303 #42382
  • Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
  • Fixed occasional NPE issues with group commit. #43635
  • Fixed the inaccurate calculation of auto bucket. #41675 #41835
  • Fixed the issue where FE might not correctly plan multi-table flows after restart. #41677 #42290

Compute-Storage Decoupled

  • Fixed the issue that MOW primary key tables with large delete bitmaps might cause coredump. #43088 #43457 #43479 #43407 #43297 #43613 #43615 #43854 #43968 #44074 #41793 #42142
  • Fixed the issue that uploading a segment file failed when its size was an exact multiple of 5MB. #43254
  • Fixed the issue that the default retry policy of aws sdk did not take effect. #43575 #43648
  • Fixed the issue that altering storage vault could continue execution even when the wrong type was specified. #43489 #43352 #43495
  • Fixed the issue that tablet_id might be 0 during the delayed commit process of large transactions. #42043 #42905
  • Fixed the issue that constant folding RPC and FE forwarding SQL might not be executed in the expected computation group. #43110 #41819 #41846
  • Fixed the issue that meta-service did not strictly check instance_id upon receiving RPC. #43253 #43832
  • Fixed the issue that FE follower information_schema version did not update in time. #43496
  • Fixed the issue of atomicity in file cache rename and inaccurate metrics. #42869 #43504 #43220

Lakehouse

  • Prohibited implicit conversion predicates from being pushed down to JDBC data sources to avoid inconsistent query results. #42102
  • Fixed some read issues with high-version Hive transactional tables. #42226
  • Fixed the issue that the Export command might cause deadlocks. #43083 #43402
  • Fixed the issue of being unable to query Hive views created by Spark. #43552
  • Fixed the issue that Hive partition paths containing special characters led to incorrect partition pruning. #42906
  • Fixed the issue that Iceberg Catalog could not use AWS Glue. #41084

Asynchronous Materialized Views

  • Fixed the issue that asynchronous materialized views might not refresh after the base table is rebuilt. #41762

Query Optimizer

  • Fixed the issue that partition pruning results might be incorrect when using multi-column range partitioning. #43332
  • Fixed the issue of incorrect calculation results in some limit offset scenarios. #42576

Query Execution

  • Fixed the issue that hash join with array types larger than 4G could cause a BE core dump. #43861
  • Fixed the issue that is null predicate operations might yield incorrect results in some scenarios. #43619
  • Fixed the issue that bitmap types might produce incorrect output results in hash join. #43718
  • Fixed some issues where function results were calculated incorrectly. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
  • Fixed some issues with JSON type parsing. #39937
  • Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
  • Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
  • Fixed the issue that arrow flight reported Reach limit of connections errors upon connection. #39127
  • Fixed the issue of incorrect memory usage statistics for BE in k8s environments. #41123

Semi-structured Data Management

  • Adjusted the default values of segment_cache_fd_percentage and inverted_index_fd_number_limit_percent. #42224
  • logstash now supports group_commit. #40450
  • Fixed the issue of coredump when building index. #43246 #43298
  • Fixed issues with variant index. #43375 #43773
  • Fixed potential fd and memory leaks under abnormal compaction circumstances. #42374
  • Inverted index match null now correctly returns null instead of false. #41786
  • Fixed the issue of coredump when ngram bloomfilter index bf_size is set to 65536. #43645
  • Fixed the issue of potential coredump during complex data type JOINs. #40398
  • Fixed the issue of coredump with TVF JSON data. #43187
  • Fixed the precision issue of bloom filter calculations for dates and times. #43612
  • Fixed the issue of coredump with IPv6 type storage. #43251
  • Fixed the issue of coredump when using VARIANT type with light_schema_change disabled. #40908
  • Improved cache performance for high-concurrency point queries. #44077
  • Fixed the issue that bloom filter indexes were not synchronized when columns were deleted. #43378
  • Fixed instability issues with es catalog under special circumstances such as mixed array and scalar data. #40314 #40385 #43399 #40614
  • Fixed coredump issues caused by abnormal regular expression pattern matching. #43394

Permissions

Other


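Behavioral Changes

  • Prohibited column updates on MOW tables with synchronous materialized views. #40190
  • Adjusted the default parameters of RoutineLoad to improve import efficiency. #42968
  • When StreamLoad fails, the return value of LoadedRows is adjusted to 0. #41946 #42291
  • Adjusted the default memory limit of Segment cache to 5%. #42308 #42436

New Features

  • Introduced the session variable enable_cooldown_replica_affinity to control the affinity of cold and hot tiered replicas. #42677

Lakehouse

Asynchronous Materialized Views

  • Introduced a new materialized view attribute use_for_rewrite. When use_for_rewrite is set to false, the materialized view does not participate in transparent rewriting. #40332

Query Optimizer

  • Supported correlated non-aggregate subqueries. #42236

Query Execution

  • Added functions ngram_search, normal_cdf, to_iso8601, from_iso8601_date, SESSION_USER(), last_query_id. #38226 #40695 #41075 #41600 #39575 #40739
  • The aes_encrypt and aes_decrypt functions support GCM mode. #40004
  • Profile outputs the changed session variable values. #41016 #41318

Semi-structured Data Management

  • Added array functions array_match_all and array_match_any. #40605 #43514
  • The array function array_agg supports nesting ARRAY/MAP/STRUCT within ARRAY. #42009
  • Added approximate aggregate statistical functions approx_top_k and approx_top_sum. #44082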

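Improvements

Storage

  • Supported bitmap_empty as the default value. #40364
  • Introduced the session variable insert_timeout to control the timeout of DELETE statements. #41063
  • Improved some error message prompts. #41048 #39631
  • Improved the priority scheduling of replica repair. #41076
  • Enhanced the robustness of timezone handling when creating tables. #41926 #42389
  • Checked the validity of partition expressions when creating tables. #40158
  • Supported Unicode-encoded column names in DELETE operations. #39381

Compute-Storage Decoupled

Lakehouse

  • Paimon Catalog supported Alibaba Cloud DLF and OSS-HDFS storage. #41247 #42585
  • Supported reading of Hive tables in OpenCSV format. #42257 #42942
  • Optimized the performance of accessing the information_schema.columns table in External Catalog. #41659 #41962
  • Used the new MaxCompute open storage API to access MaxCompute data sources. #41614
  • Optimized the scheduling policy of the JNI part of Paimon tables, making scan tasks more balanced. #43310
  • Optimized the read performance of small ORC files. #42004 #43467
  • Supported reading of parquet files in brotli compressed format. #42177
  • Added the file_cache_statistics table under the information_schema database to view metadata cache statistics. #42160

Query Optimizer

Query Execution

  • Optimized the memory usage of the sort operator. #39306
  • Optimized the performance of computations on ARM. #38888 #38759
  • Optimized the computational performance of a series of functions. #40366 #40821 #40670 #41206 #40162
  • Used SSE instructions to optimize the performance of the match_ipv6_subnet function. #38755
  • Supported automatic creation of new partitions during insert overwrite. #38628 #42645
  • Added the status of each PipelineTask in Profile. #42981
  • IP type supported runtime filter. #39985

Semi-structured Data Management

Permissions

  • Added a new configuration item ldap_group_filter in LDAP for custom group filtering. #43292

Other

  • Supported displaying connection count information by user in FE monitoring items. #39200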

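Bug Fixes

Storage

  • Fixed the issue with using IPv6 hostnames. #40074
  • Fixed the inaccurate display of broker/s3 load progress. #43535
  • Fixed the issue where queries issued from FE might hang. #41303 #42382
  • Fixed the issue of duplicate auto-increment IDs under exceptional circumstances. #43774 #43983
  • Fixed occasional NPE issues with group commit. #43635
  • Fixed the inaccurate calculation of auto bucket. #41675 #41835
  • Fixed the issue where FE might not correctly plan multi-table flows after restart. #41677 #42290

Compute-Storage Decoupled

Lakehouse

  • Prohibited predicates with implicit conversion from being pushed down to JDBC data sources to avoid inconsistent query results. #42102
  • Fixed some read issues with high-version Hive transactional tables. #42226
  • Fixed the issue that the Export command might cause deadlocks. #43083 #43402
  • Fixed the issue of being unable to query Hive views created by Spark. #43552
  • Fixed the issue that Hive partition paths containing special characters led to incorrect partition pruning. #42906
  • Fixed the issue that Iceberg Catalog could not use AWS Glue. #41084

Asynchronous Materialized Views

  • Fixed the issue that asynchronous materialized views might not refresh after the base table is rebuilt. #41762

Query Optimizer

  • Fixed the issue that partition pruning results might be incorrect when using multi-column range partitioning. #43332
  • Fixed the issue of incorrect calculation results in some limit offset scenarios. #42576

Query Execution

  • Fixed the issue that hash join with array types larger than 4G could cause a BE core dump. #43861
  • Fixed the issue that is null predicate operations might yield incorrect results in some scenarios. #43619
  • Fixed the issue that bitmap types might produce incorrect output results in hash join. #43718
  • Fixed some issues where function results were calculated incorrectly. #40710 #39358 #40929 #40869 #40285 #39891 #40530 #41948 #43588
  • Fixed some issues with JSON type parsing. #39937
  • Fixed issues with varchar and char types in runtime filter operations. #43758 #43919
  • Fixed some issues with the use of decimal256 in scalar and aggregate functions. #42136 #42356
  • Fixed the issue that arrow flight reported Reach limit of connections errors upon connection. #39127
  • Fixed the issue of incorrect available memory statistics for BE in k8s environments. #41123

Semi-structured Data Management

Permissions

Other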
