Apache Iceberg: Definitive Guide Ch 3
SELECT Query
A SQL command used to retrieve data from an Iceberg table based on specified conditions.
Merge Query
A command that updates rows in the target table that match a source record and inserts the source records that have no match (an upsert).
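The matched/not-matched behavior of a merge can be sketched in plain Python. This is only an illustration of the semantics, not how any engine implements it; `merge_rows`, the row dictionaries, and the `id` key are hypothetical names chosen for the example.

```python
def merge_rows(target, source, key):
    """Upsert: update target rows that match on `key`, insert the rest."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: update the existing row
        else:
            merged[row[key]] = dict(row)   # not matched: insert a new row
    return list(merged.values())

# Hypothetical sample data
target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
source = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]
result = merge_rows(target, source, "id")
```

Here the row with `id` 2 is updated and the row with `id` 3 is inserted, while the row with `id` 1 is left unchanged.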
Insert Query
A command to add new rows to an Iceberg table.
Manifest List
A file that lists the manifest files belonging to a snapshot; each of those manifest files in turn references the actual datafiles.
Merge-on-Read (MOR)
A strategy where row-level changes are written to new files (such as delete files) instead of rewriting existing datafiles, and readers merge those changes with the original data at query time.
Copy-on-Write (COW)
A strategy where every datafile containing an affected row is rewritten in full for any row-level update or delete.
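The difference between the two strategies can be sketched with a toy in-memory "datafile". This is a simplified illustration under stated assumptions: a datafile is modeled as a list of rows, a delete file as a set of row positions, and the function names are hypothetical.

```python
def cow_delete(datafile, predicate):
    """Copy-on-write: produce a rewritten file without the deleted rows."""
    return [row for row in datafile if not predicate(row)]

def mor_delete(datafile, predicate):
    """Merge-on-read: keep the datafile as-is, record deleted row positions."""
    return {pos for pos, row in enumerate(datafile) if predicate(row)}

def mor_read(datafile, deleted_positions):
    """Apply the recorded deletes while reading."""
    return [row for pos, row in enumerate(datafile)
            if pos not in deleted_positions]

# Hypothetical sample data
datafile = [{"id": 1}, {"id": 2}, {"id": 3}]
is_id_2 = lambda row: row["id"] == 2

rewritten = cow_delete(datafile, is_id_2)    # write cost: the whole file
delete_file = mor_delete(datafile, is_id_2)  # write cost: one position
merged_view = mor_read(datafile, delete_file)
```

Both approaches return the same rows to a reader; copy-on-write pays the cost when writing, merge-on-read pays it when reading.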
Metadata File
Contains information about the schema, partitioning, and snapshots (current and previous) of an Iceberg table.
Manifest File
Contains metadata about datafiles, such as file paths, statistics, and partition information.
Partition Pruning
Filtering out partitions that are not relevant to the query, improving read performance.
Concurrent Writes
Handling multiple write operations simultaneously while maintaining data consistency.
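Iceberg handles this with optimistic concurrency: a writer prepares its changes against the metadata it read, then asks the catalog to swap the metadata pointer only if nobody else has changed it in the meantime. The following is a minimal sketch of that idea; `ToyCatalog`, `commit`, and the file names are hypothetical and do not correspond to any real catalog API.

```python
import threading

class ToyCatalog:
    """Hypothetical in-memory catalog holding one table's metadata pointer."""

    def __init__(self, pointer):
        self._pointer = pointer
        self._lock = threading.Lock()

    def current(self):
        return self._pointer

    def compare_and_swap(self, expected, new):
        """Atomically replace the pointer only if it still equals `expected`."""
        with self._lock:
            if self._pointer != expected:
                return False
            self._pointer = new
            return True

def commit(catalog, make_new_pointer, max_retries=3):
    """Retry the commit against the latest pointer when a swap is rejected."""
    for _ in range(max_retries):
        base = catalog.current()
        if catalog.compare_and_swap(base, make_new_pointer(base)):
            return True
    return False

def next_version(base):
    # "v1.metadata.json" -> "v2.metadata.json" (naming is an assumption)
    return f"v{int(base[1:].split('.')[0]) + 1}.metadata.json"

catalog = ToyCatalog("v1.metadata.json")
stale_base = catalog.current()               # writer B reads the pointer
writer_a_ok = commit(catalog, next_version)  # writer A commits first
# Writer B's swap is based on a stale pointer, so it is rejected
writer_b_ok = catalog.compare_and_swap(stale_base, "v2-b.metadata.json")
```

A rejected writer does not corrupt the table; it re-reads the current pointer and tries again, which is what the loop in `commit` does.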
Time-Travel Query
Querying historical states of a table using snapshots by specifying timestamps or snapshot IDs.
Snapshot
Represents the state of the table at a specific point in time.
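Resolving a timestamp-based time-travel query to a snapshot can be sketched as picking the most recent snapshot created at or before the requested time. This is an illustrative sketch; the `snapshot_as_of` helper and the `snapshot_id` / `timestamp_ms` field names are assumptions made for the example.

```python
def snapshot_as_of(snapshots, timestamp_ms):
    """Return the latest snapshot committed at or before `timestamp_ms`."""
    candidates = [s for s in snapshots if s["timestamp_ms"] <= timestamp_ms]
    if not candidates:
        raise ValueError("no snapshot at or before the requested timestamp")
    return max(candidates, key=lambda s: s["timestamp_ms"])

# Hypothetical snapshot log
snapshots = [
    {"snapshot_id": 1, "timestamp_ms": 100},
    {"snapshot_id": 2, "timestamp_ms": 200},
    {"snapshot_id": 3, "timestamp_ms": 300},
]
chosen = snapshot_as_of(snapshots, 250)
```

A request for time 250 resolves to snapshot 2, since snapshot 3 did not exist yet at that point.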
Create Table Statement
SQL command used to create a new Iceberg table with specified schema and partitioning.
File Pruning
Skipping irrelevant files during query execution based on metadata to improve performance.
Null Value Counts
Statistics indicating the number of null values in each column of a datafile.
Upper and Lower Bounds
Statistics stored in manifest files indicating the range of values for each column in a datafile.
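File pruning with these bounds amounts to a range-overlap check: a datafile can be skipped when its value range for a column cannot intersect the range the query asks for. The sketch below assumes simplified manifest entries held as dictionaries; the `may_contain` helper, the field names, and the file paths are hypothetical.

```python
def may_contain(entry, column, low, high):
    """True if the file's [lower, upper] range overlaps the query's [low, high]."""
    lower = entry["lower_bounds"][column]
    upper = entry["upper_bounds"][column]
    return lower <= high and upper >= low

# Hypothetical manifest entries with per-column bounds
manifest_entries = [
    {"path": "a.parquet",
     "lower_bounds": {"order_id": 1}, "upper_bounds": {"order_id": 100}},
    {"path": "b.parquet",
     "lower_bounds": {"order_id": 101}, "upper_bounds": {"order_id": 200}},
    {"path": "c.parquet",
     "lower_bounds": {"order_id": 201}, "upper_bounds": {"order_id": 300}},
]

# Query: WHERE order_id BETWEEN 150 AND 180
files_to_scan = [e["path"] for e in manifest_entries
                 if may_contain(e, "order_id", 150, 180)]
```

Only `b.parquet` survives the check, so the other two files are never opened. The bounds can only rule files out; a file that passes may still contain no matching rows.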
Datafile
The actual file where table data is stored, typically in formats like Parquet.
Catalog Interaction
The process where the query engine retrieves the current metadata file location from the catalog.
Write Query Lifecycle
The sequence of steps to insert, update, or delete data in an Iceberg table.
Read Query Lifecycle
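The sequence of steps a query engine follows to retrieve data from an Iceberg table: it gets the current metadata file location from the catalog, then reads the metadata file, the manifest list, and the manifest files to find only the datafiles the query needs.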
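The read path can be sketched as a walk from the catalog down to the datafiles. This is a toy model under stated assumptions: the catalog and file store are plain dictionaries, and every table name, file name, and field name is hypothetical.

```python
# Hypothetical catalog: table name -> current metadata file location
catalog = {"db.orders": "metadata/v2.metadata.json"}

# Hypothetical file store: path -> parsed contents
files = {
    "metadata/v2.metadata.json": {
        "current_snapshot_id": 2,
        "snapshots": {1: "snap-1.avro", 2: "snap-2.avro"},  # id -> manifest list
    },
    "snap-1.avro": ["manifest-a.avro"],
    "snap-2.avro": ["manifest-a.avro", "manifest-b.avro"],
    "manifest-a.avro": ["data/a.parquet"],
    "manifest-b.avro": ["data/b.parquet", "data/c.parquet"],
}

def plan_scan(table, snapshot_id=None):
    """Catalog -> metadata file -> manifest list -> manifests -> datafiles."""
    metadata = files[catalog[table]]                  # catalog interaction
    if snapshot_id is None:
        snapshot_id = metadata["current_snapshot_id"]
    manifest_list = files[metadata["snapshots"][snapshot_id]]
    datafiles = []
    for manifest in manifest_list:
        datafiles.extend(files[manifest])             # pruning would happen here
    return datafiles

current_files = plan_scan("db.orders")
older_files = plan_scan("db.orders", snapshot_id=1)
```

Passing an older snapshot ID walks the same path from a different manifest list, which is how a time-travel read differs from a normal one in this model.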