Apache Iceberg: The Definitive Guide, Chapter 3

SELECT Query

A SQL command used to retrieve data from an Iceberg table based on specified conditions.

Merge Query

A command that combines both update and insert operations depending on the existence of matching records.
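
The update-or-insert behavior described above can be sketched in plain Python. This is a toy model rather than Iceberg's or any engine's implementation; the `merge` helper and the row layout are hypothetical names chosen for illustration.

```python
def merge(target, source, key):
    """Toy upsert: update rows whose key matches, insert the rest."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matching key: overwrite the existing columns. No match: add a new row.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())

target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
source = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]
result = merge(target, source, "id")
print(result)
```

Here row 2 exists in both inputs, so it is updated, while row 3 exists only in the source, so it is inserted.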

Insert Query

A command to add new rows to an Iceberg table.

Manifest List

A file that lists the manifest files belonging to a snapshot; those manifest files in turn reference the actual datafiles.

Merge-on-Read (MOR)

A strategy where only the changes are written to new files (for example, delete files that mark removed rows), and those changes are merged with the existing datafiles at read time.

Copy-on-Write (COW)

A strategy where any datafile containing a row affected by a row-level update or delete is rewritten in full.
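
The contrast between the two strategies can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: files are plain dictionaries, and `update_cow`, `update_mor`, `read`, and the file names are all hypothetical, not part of any Iceberg API.

```python
def update_cow(files, file_name, row_id, new_value):
    """Copy-on-write: rewrite the whole datafile that holds the row."""
    rewritten = {rid: (new_value if rid == row_id else val)
                 for rid, val in files[file_name].items()}
    new_files = {k: v for k, v in files.items() if k != file_name}
    new_files[file_name + ".v2"] = rewritten  # the old file is replaced
    return new_files

def update_mor(files, deletes, file_name, row_id, new_value):
    """Merge-on-read: keep the datafile, record a delete, write only the change."""
    new_deletes = deletes | {(file_name, row_id)}
    new_files = dict(files)
    new_files["delta-1"] = {row_id: new_value}  # small file with the new row only
    return new_files, new_deletes

def read(files, deletes=frozenset()):
    """Readers apply the recorded deletes while scanning."""
    out = {}
    for name, rows in files.items():
        for rid, val in rows.items():
            if (name, rid) not in deletes:
                out[rid] = val
    return out

files = {"f1": {1: "a", 2: "b"}}
cow_files = update_cow(files, "f1", 2, "B")
mor_files, mor_deletes = update_mor(files, frozenset(), "f1", 2, "B")
print(read(cow_files), read(mor_files, mor_deletes))
```

Both approaches return the same rows to the reader. The difference is where the cost lands: copy-on-write pays at write time, merge-on-read pays at read time.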

Metadata File

Contains information about the table's schema, partitioning, and snapshots, including which snapshot is current.

Manifest File

Contains metadata about datafiles, such as file paths, statistics, and partition information.

Partition Pruning

Filtering out partitions that are not relevant to the query, improving read performance.
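
As a rough illustration, partition pruning amounts to comparing each file's partition value with the query filter before opening any data. The entries and the `prune_partitions` helper below are hypothetical and much simpler than real manifest metadata.

```python
# Hypothetical datafile entries, each tagged with its partition value.
datafiles = [
    {"path": "d1.parquet", "partition": {"event_date": "2024-01-01"}},
    {"path": "d2.parquet", "partition": {"event_date": "2024-01-02"}},
    {"path": "d3.parquet", "partition": {"event_date": "2024-01-02"}},
]

def prune_partitions(datafiles, column, wanted):
    """Keep only files whose partition value matches the query filter."""
    return [f["path"] for f in datafiles if f["partition"][column] == wanted]

kept = prune_partitions(datafiles, "event_date", "2024-01-02")
print(kept)
```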

Concurrent Writes

Handling multiple write operations simultaneously while maintaining data consistency.
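
Iceberg handles this with optimistic concurrency: a writer's commit succeeds only if the table still points at the metadata the writer started from. The sketch below models that check as a compare-and-swap on a toy `Catalog` class; the class, method, and file names are assumptions for illustration, not a real catalog API.

```python
class Catalog:
    """Toy catalog that stores a single pointer to the current metadata file."""

    def __init__(self, metadata_location):
        self.metadata_location = metadata_location

    def commit(self, expected, new):
        # Swap the pointer only if nobody else committed in the meantime.
        if self.metadata_location != expected:
            return False
        self.metadata_location = new
        return True

catalog = Catalog("v1.metadata.json")
base_a = catalog.metadata_location  # writer A starts from v1
base_b = catalog.metadata_location  # writer B also starts from v1

a_ok = catalog.commit(base_a, "v2-a.metadata.json")  # A commits first
b_ok = catalog.commit(base_b, "v2-b.metadata.json")  # B's base is now stale

# B re-reads the current pointer, reapplies its changes, and retries.
base_b = catalog.metadata_location
b_retry_ok = catalog.commit(base_b, "v3-b.metadata.json")
print(a_ok, b_ok, b_retry_ok, catalog.metadata_location)
```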

Time-Travel Query

Querying historical states of a table using snapshots by specifying timestamps or snapshot IDs.

Snapshot

Represents the state of the table at a specific point in time.
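
A time-travel lookup by timestamp can be thought of as picking the latest snapshot committed at or before the requested time. The sketch below assumes a simplified snapshot log of `(timestamp_ms, snapshot_id)` pairs; the values and the `snapshot_as_of` helper are made up for illustration.

```python
# Hypothetical snapshot log: (commit timestamp in ms, snapshot ID).
snapshots = [(1000, 11), (2000, 22), (3000, 33)]

def snapshot_as_of(snapshots, ts_ms):
    """Return the ID of the latest snapshot committed at or before ts_ms."""
    candidates = [s for s in snapshots if s[0] <= ts_ms]
    if not candidates:
        raise ValueError("no snapshot exists at or before the requested time")
    return max(candidates)[1]

chosen = snapshot_as_of(snapshots, 2500)
print(chosen)
```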

Create Table Statement

SQL command used to create a new Iceberg table with specified schema and partitioning.

File Pruning

Skipping irrelevant files during query execution based on metadata to improve performance.

Null Value Counts

Statistics indicating the number of null values in each column of a datafile.

Upper and Lower Bounds

Statistics stored in manifest files indicating the range of values for each column in a datafile.
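
These bounds are what make file pruning possible: if the value a query looks for falls outside a file's range for that column, the file can be skipped without being opened. A minimal sketch, assuming simplified per-file stats and a hypothetical `files_for_equals` helper:

```python
# Hypothetical per-file column statistics, as a manifest might record them.
datafiles = [
    {"path": "a.parquet", "lower": {"price": 1}, "upper": {"price": 10}},
    {"path": "b.parquet", "lower": {"price": 50}, "upper": {"price": 90}},
]

def files_for_equals(datafiles, column, value):
    """Keep files whose [lower, upper] range could contain the value."""
    return [f["path"] for f in datafiles
            if f["lower"][column] <= value <= f["upper"][column]]

matches = files_for_equals(datafiles, "price", 60)
print(matches)
```

A file that passes this check may still contain no matching rows; the bounds only rule files out, they never confirm a match.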

Datafile

The actual file where table data is stored, typically in formats like Parquet.

Catalog Interaction

The process where the query engine retrieves the current metadata file location from the catalog.

Write Query Lifecycle

The sequence of steps to insert, update, or delete data in an Iceberg table.

Read Query Lifecycle

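The sequence of steps to retrieve data from an Iceberg table: the engine gets the current metadata file from the catalog, then follows the manifest list and manifest files to the datafiles, using metadata to skip files that cannot match the query.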
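
The read path can be sketched as a walk from the catalog down to the datafiles. The dictionaries below stand in for the catalog and for files in storage; all names and paths are hypothetical, and real metadata carries far more detail.

```python
# Toy catalog: table name -> location of the current metadata file.
catalog = {"db.orders": "v2.metadata.json"}

# Toy storage: file path -> simplified contents.
storage = {
    "v2.metadata.json": {"current_snapshot_id": 22,
                         "snapshots": {22: "snap-22.avro"}},
    "snap-22.avro": ["m1.avro", "m2.avro"],  # manifest list
    "m1.avro": ["d1.parquet"],               # manifest file
    "m2.avro": ["d2.parquet", "d3.parquet"], # manifest file
}

def plan_scan(table):
    """Catalog -> metadata file -> manifest list -> manifests -> datafiles."""
    metadata = storage[catalog[table]]
    manifest_list_path = metadata["snapshots"][metadata["current_snapshot_id"]]
    manifest_paths = storage[manifest_list_path]
    return [datafile for m in manifest_paths for datafile in storage[m]]

scan = plan_scan("db.orders")
print(scan)
```

A real engine would apply partition and file pruning while walking the manifests, so that only the surviving datafiles are read.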
