databricks - Databricks with Iceberg Glue catalog - Stack Overflow


Hi, I am using Databricks to read and write Iceberg tables using the Glue catalog. This is my configuration:

Databricks Runtime 16.1 ML (includes Apache Spark 3.5.0, Scala 2.12)

org.apache.iceberg:iceberg-aws:1.7.1
org.apache.hadoop:hadoop-aws:3.3.4
org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1

This is my configuration code:

from pyspark.sql import SparkSession

# s3_path is defined elsewhere
spark = SparkSession.builder \
    .config("spark.sql.catalog.test1", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.test1.type", "glue") \
    .config("spark.sql.catalog.test1.warehouse", s3_path) \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.files.maxRecordsPerFile", 2000000) \
    .config("spark.sql.shuffle.partitions", 100) \
    .config("spark.sql.adaptive.enabled", "true") \
    .config("spark.sql.defaultCatalog", "test1") \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider") \
    .getOrCreate()

But I keep getting the error below:

: org.apache.spark.SparkException: Plugin class for catalog 'test1' does not implement CatalogPlugin: org.apache.iceberg.aws.glue.GlueCatalog.

If I set the catalog type to hadoop, everything works fine, but with glue I get this error. I have tried many approaches with no luck. Does anyone have a suggestion?

asked Jan 13 at 7:59 by Dilip

1 Answer


The error you're seeing (Plugin class for catalog 'test1' does not implement CatalogPlugin: org.apache.iceberg.aws.glue.GlueCatalog) happens because org.apache.iceberg.aws.glue.GlueCatalog is an Iceberg catalog implementation, not a Spark CatalogPlugin. The class registered under spark.sql.catalog.test1 must be org.apache.iceberg.spark.SparkCatalog, as specified in the Iceberg AWS documentation.

Here’s the corrected configuration:

spark = (
    SparkSession.builder
    # Use Iceberg's SparkCatalog (a Spark CatalogPlugin), backed by AWS Glue
    .config("spark.sql.catalog.test1", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.test1.type", "glue")
    # Use S3FileIO for S3 operations
    .config("spark.sql.catalog.test1.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    # Enable Iceberg extensions
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Control file size
    .config("spark.sql.files.maxRecordsPerFile", 2000000)
    # Tune shuffle partitions
    .config("spark.sql.shuffle.partitions", 100)
    # Enable adaptive query execution
    .config("spark.sql.adaptive.enabled", "true")
    # Set default catalog
    .config("spark.sql.defaultCatalog", "test1")
    # Use S3A for "s3://" paths
    .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    # AWS credentials provider
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
    # Faster commit algorithm
    .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    # Enable the magic committer for all buckets
    .config("spark.hadoop.fs.s3a.committer.name", "magic")
    .config("spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled", "true")
    .getOrCreate()
)
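
Once the session comes up, a quick check confirms the CatalogPlugin error is gone (this assumes your Glue databases and AWS credentials are already in place):

# List Glue databases through the Iceberg catalog; with the wrong plugin
# class this statement would fail before ever reaching Glue.
spark.sql("SHOW NAMESPACES IN test1").show()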

By setting spark.sql.catalog.test1.type to glue, you're already telling Iceberg to use AWS Glue as the catalog backend: under SparkCatalog, the type=glue shorthand resolves to org.apache.iceberg.aws.glue.GlueCatalog internally, a shorthand introduced in Iceberg PR #9647.
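
If you prefer to name the implementation class explicitly, the equivalent form uses the catalog-impl property instead of the type shorthand; a minimal sketch:

from pyspark.sql import SparkSession

# catalog-impl names the Iceberg catalog implementation directly, while
# SparkCatalog remains the Spark-facing CatalogPlugin.
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.test1", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.test1.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.test1.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)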

Another thing to note is that the warehouse configuration is not required when using AWS Glue as your catalog. With Glue, the location for each table is specified individually when the Iceberg table is created, rather than being tied to a central warehouse path.
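
For example, a table created with its own explicit location (the table name test1.db.events and the bucket path below are placeholders):

# Each table carries its own S3 location in Glue, set at creation time
spark.sql("""
    CREATE TABLE IF NOT EXISTS test1.db.events (
        id BIGINT,
        ts TIMESTAMP,
        payload STRING
    )
    USING iceberg
    LOCATION 's3://my-bucket/warehouse/db/events'
""")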

It can also be better to set spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem so that "s3://..." paths are handled by the S3A client directly, which is more intuitive and widely used in AWS environments than writing "s3a://..." paths.

Additionally, you can improve performance and compatibility by using the S3A filesystem with the magic committer, which avoids costly rename operations during writes to S3 (see the S3A Magic Committer documentation).
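
As an end-to-end smoke test, assuming the corrected session above and a pre-existing Glue database named db (the table name is a placeholder):

# Write a couple of rows through the Glue-backed Iceberg catalog
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "payload"])
df.writeTo("test1.db.events_smoke").createOrReplace()

# Read back to confirm catalog metadata and S3 data files line up
spark.table("test1.db.events_smoke").show()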
