Metadata
- Author: Databricks
- Full Title:: Cost Based Optimizer in Apache Spark 2.2
- Category:: 🗞️Articles
- URL:: https://www.databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html
- Finished date:: 2023-04-30
Highlights
Note that this probably doesn't work on bare Parquet files read by path. The statistics are stored in the Hive metastore, so the data needs to be registered as a table there. Not sure about the interplay between this and Delta tables.
CBO relies on detailed statistics to optimize a query plan. To collect these statistics, users can issue these new SQL commands described below:
ANALYZE TABLE table_name COMPUTE STATISTICS
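A rough sketch of how the statistics collection might look in a Spark SQL session. The table name `sales` and the columns `customer_id` and `amount` are hypothetical placeholders, and this assumes the table is registered in the metastore:

```sql
-- CBO is off by default in Spark 2.2; enable it for the session.
SET spark.sql.cbo.enabled=true;

-- Table-level statistics (row count, size in bytes).
-- `sales` is a hypothetical table registered in the metastore.
ANALYZE TABLE sales COMPUTE STATISTICS;

-- Column-level statistics for the (hypothetical) columns used in
-- filters and joins.
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS customer_id, amount;

-- Inspect the collected table-level statistics.
DESCRIBE EXTENDED sales;
```

Statistics are not refreshed automatically in this sketch, so the ANALYZE commands would need to be rerun after the table changes significantly.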