CCDAK Kafka Theory (Need to know)

max.poll.records = 500 (default) does what?

(Consumer Poll Behavior) Controls the maximum number of records returned by a single poll() call. Increase it if your messages are very small and you have plenty of available RAM.

fetch.min.bytes = 1 (default) does what?

(Consumer Poll Behavior) Controls the minimum amount of data the broker should return for each fetch request. Raising it improves throughput and reduces the number of requests, at the cost of latency.

What is records-lag-max for a Consumer in Kafka?

(monitoring metrics) The maximum lag in terms of number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.
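
As a rough illustration of what the metric measures: per-partition lag is the log end offset minus the consumer's current position, and records-lag-max is the largest such value. The sketch below uses made-up offsets; the function name and the dictionaries are hypothetical and are not part of any Kafka client API.

```python
def max_records_lag(log_end_offsets, consumer_positions):
    """Return (largest lag, per-partition lags), where lag = end offset - position."""
    lags = {
        partition: log_end_offsets[partition] - consumer_positions.get(partition, 0)
        for partition in log_end_offsets
    }
    return max(lags.values()), lags


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}
positions = {0: 1490, 1: 700, 2: 900}

worst, per_partition = max_records_lag(end_offsets, positions)
print(worst)          # 500
print(per_partition)  # {0: 10, 1: 500, 2: 0}
```

If the worst value keeps growing between samples, the group is falling behind on at least one partition.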

Segments come with two index files. What are they?

1. An offset-to-position index (.index file): tells Kafka where in the segment to read to find a message.
2. A timestamp-to-offset index (.timeindex file): lets Kafka find messages by timestamp.
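
To make the offset-to-position idea concrete, here is a minimal sketch of a lookup in a sparse index: find the last entry at or before the wanted offset, then scan the log forward from that byte position. The index entries and the `lookup` helper are hypothetical; the real on-disk format is binary and is not reproduced here.

```python
import bisect

# Hypothetical sparse index entries: (offset, byte position in the .log file).
index = [(0, 0), (35, 4096), (71, 8192), (110, 12288)]


def lookup(index, target_offset):
    """Return the last index entry whose offset is <= target_offset."""
    offsets = [entry[0] for entry in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    return index[max(i, 0)]


print(lookup(index, 80))  # (71, 8192)
print(lookup(index, 35))  # (35, 4096)
```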

If a producer writes at 1 GB/sec and each consumer consumes at 250 MB/sec, how many partitions are required?

4
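
The arithmetic behind this answer can be sketched as follows, assuming 1 GB = 1000 MB and one consumer per partition (both are simplifying assumptions, and the function name is hypothetical):

```python
import math


def partitions_for_throughput(producer_mb_per_sec, consumer_mb_per_sec):
    """Minimum partitions so that one consumer per partition keeps up."""
    return math.ceil(producer_mb_per_sec / consumer_mb_per_sec)


print(partitions_for_throughput(1000, 250))  # 4
```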

With replication factor = 3 and partitions = 2, how many partition replicas in total are distributed across the Kafka cluster?

6 partition replicas. Each partition has 1 leader and 2 follower replicas (which are part of the ISR while they stay in sync).
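
A small sketch of the count, using a simplified round-robin placement. Kafka's real assignment logic is more involved; the broker IDs and the `replica_layout` helper are hypothetical.

```python
def replica_layout(partitions, replication_factor, brokers):
    """Place each partition's replicas on consecutive brokers (first one = leader)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(partitions)
    }


layout = replica_layout(partitions=2, replication_factor=3, brokers=[101, 102, 103])
total_replicas = sum(len(replicas) for replicas in layout.values())
print(layout)          # {0: [101, 102, 103], 1: [102, 103, 101]}
print(total_replicas)  # 6
```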

What is the Schema Registry Port?

8081

What is the REST Proxy port?

8082

What is the KSQL Port?

8088

List Broker Port:

9092

Start Consuming messages from kafka topic my-first-topic

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-first-topic --from-beginning
hello drew
learning kafka

Start Consuming messages in a consumer group from kafka topic my-first-topic

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-first-topic --group my-first-consumer-group --from-beginning

Produce messages to Kafka topic my-first-topic

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-first-topic --producer-property acks=all
>hello drew
>learning kafka
>^C

Shift offsets by 2 (backward) as another strategy

> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-first-consumer-group --reset-offsets --shift-by -2 --execute --topic my-first-topic

Shift offsets by 2 (forward) as another strategy

> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-first-consumer-group --reset-offsets --shift-by 2 --execute --topic my-first-topic

Describe consumer group

> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-first-consumer-group

Reset offset of consumer group to replay all messages

> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-first-consumer-group --reset-offsets --to-earliest --execute --topic my-first-topic

List all consumer groups

> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

acks=all must be used in conjunction with min.insync.replicas. Where can min.insync.replicas be set?

Broker or topic level

Who has defaults for all topic configuration parameters?

Brokers

Is every broker in Kafka a bootstrap server? If so what does it know?

Yes. Every broker in Kafka is a "bootstrap server" that knows about all brokers, topics, and partitions (metadata), so a Kafka client (e.g. producer, consumer) only needs to connect to one broker in order to connect to the entire cluster. At all times, exactly one broker in the cluster acts as the controller.

When a consumer in a group has processed the data received from Kafka, it commits the offset to a Kafka topic named _consumer_commit, so that if the consumer dies it can resume reading from where it left off. True or False?

False. The Kafka topic is named __consumer_offsets.

When do old segments get deleted?

Depending on the log.retention.hours or log.retention.bytes rule

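What happens when a producer sends data with a null key?

It is distributed round-robin across partitions. (Since Kafka 2.4 the default partitioner is "sticky": it fills a batch for one partition before moving to the next.)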

Where is min.insync.replicas = 2 set from?

Set at broker or topic level (Safe Producer Config)

The replication factor cannot be greater than the number of brokers in the Kafka cluster. If a topic has a replication factor of 3, then each partition will live on 3 different brokers. True or False?

True

Will adding a partition to a topic lose the guarantee that the same key goes to the same partition?

Yes

With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down. True or False?

True

min.insync.replicas = 2 implies that at least 2 brokers that are ISR (including leader) must acknowledge. True or False?

True

What do you expect with enable.auto.commit=true & synchronous processing of batches? (Consumer Offset commit strategy)

With auto commit, offsets are committed automatically at a regular interval (auto.commit.interval.ms=5000 by default) when you call .poll(). If you process each batch synchronously before the next .poll(), you get "at least once" behavior. If you don't use synchronous processing, you get "at most once" behavior, because offsets may be committed before your data is processed.

Create a kafka topic with name my-first-topic

bin/kafka-topics.sh --zookeeper localhost:2181 --topic my-first-topic --create --replication-factor 1 --partitions 1

Delete kafka topic my-first-topic

bin/kafka-topics.sh --zookeeper localhost:2181 --topic my-first-topic --delete (Note: This will have no impact if delete.topic.enable is not set to true)

Describe kafka topic my-first-topic

bin/kafka-topics.sh --zookeeper localhost:2181 --topic my-first-topic --describe

Start a zookeeper at default port 2181

bin/zookeeper-server-start.sh config/zookeeper.properties

Deleted records can still be seen by consumers for a period determined by what?

delete.retention.ms=24 hours (default)

The offset of message is _________.

immutable

What configuration makes the cleaner check for work every 15 seconds?

log.cleaner.backoff.ms

Which configuration deletes data based on keys for that topic, removing older duplicates of a key after the active segment is committed? (Kafka default for topic __consumer_offsets)

log.cleanup.policy=compact

What configuration is used to delete data based on the age of data (default is 1 week).

log.cleanup.policy=delete

What configuration is used for the max size in bytes for each partition?

log.retention.bytes = -1 (infinite default)

What configuration is used for the number of hours to keep data for?

log.retention.hours = 168 (1 week, default)

What is the configuration for time kafka will wait before closing the segment if not full?

log.roll.ms (or log.roll.hours) at broker level, segment.ms at topic level = 1 week (default)

What is the configuration for max size of a single segment in bytes

log.segment.bytes = 1 GB (default)

As long as the number of partitions remains constant for a topic (no new partitions), will the same key always go to the same partition?

yes

Each partition has its own offsets, starting from _.

0

max.partition.fetch.bytes = 1MB (default)

Maximum amount of data returned by the broker per partition. If you read from 100 partitions, you will need a lot of memory (RAM).

Delivery semantics: what are the three types, and how is each defined?

At most once: Offsets are committed as soon as the message batch is received. If the processing goes wrong, the message will be lost (it won't be read again).
At least once (default): Offsets are committed after the message is processed. If the processing goes wrong, the message will be read again. This can result in duplicate processing of messages, so make sure your processing is idempotent (i.e. re-processing a message won't impact your systems). Most applications use this and ensure processing is idempotent.
Exactly once: Can only be achieved for Kafka=>Kafka workflows using the Kafka Streams API. For Kafka=>Sink workflows, use an idempotent consumer.
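
The difference between the first two modes comes down to whether the commit happens before or after processing. The following is a toy in-memory simulation, not Kafka client code: the function names are hypothetical, and the "crash" is modeled as happening between the two steps for one record.

```python
def consume(records, start, crash_at, commit_first):
    """Process records from `start`; crash between commit and process at `crash_at`.

    Returns (records processed in this run, committed offset to resume from).
    """
    processed = []
    committed = start
    for offset in range(start, len(records)):
        if commit_first:                       # at most once: commit, then process
            committed = offset + 1
            if offset == crash_at:
                return processed, committed    # crashed before processing
            processed.append(records[offset])
        else:                                  # at least once: process, then commit
            processed.append(records[offset])
            if offset == crash_at:
                return processed, committed    # crashed before committing
            committed = offset + 1
    return processed, committed


def run_with_one_crash(records, crash_at, commit_first):
    """Run once with a crash, then restart from the committed offset."""
    before, committed = consume(records, 0, crash_at, commit_first)
    after, _ = consume(records, committed, None, commit_first)
    return before + after


records = ["a", "b", "c", "d"]
print(run_with_one_crash(records, 2, commit_first=True))   # ['a', 'b', 'd']
print(run_with_one_crash(records, 2, commit_first=False))  # ['a', 'b', 'c', 'c', 'd']
```

Committing first loses "c"; committing last processes "c" twice, which is why idempotent processing matters for at least once.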

What is heartbeat.interval.ms=3 seconds(default)

A heartbeat is sent every 3 seconds. Usually set to 1/3 of session.timeout.ms.

What does the consumer heartbeat thread do?

Heartbeat mechanism is used to detect if consumer application is dead.

What is session.timeout.ms=10 seconds (default)

If no heartbeat is received within this period, the consumer is considered dead. Set a lower value for faster consumer rebalances. (The default was 10 seconds in older releases and is 45 seconds since Kafka 3.0.)

ZooKeeper servers are deployed on multiple nodes. This is called an ensemble. An ensemble is a set of 2n + 1 ZooKeeper servers, where n is any number greater than 0. The odd number of servers allows ZooKeeper to perform majority elections for leadership. At any given time, up to n servers in an ensemble can fail and the ZooKeeper cluster will keep quorum. If quorum is lost at any time, the ZooKeeper cluster goes down. In a multi-node ZooKeeper configuration, initLimit and syncLimit govern how long follower ZooKeeper servers can take to initialize with the current leader and how long they can be out of sync with the leader.

If tickTime=2000, initLimit=5 and syncLimit=2 then a follower can take (tickTime*initLimit) = 10000ms to initialize and may be out of sync for up to (tickTime*syncLimit) = 4000ms
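
The two calculations above (timeouts from tickTime, and how many failures an ensemble tolerates) can be sketched as below. The function names are hypothetical helpers, not ZooKeeper APIs.

```python
def zookeeper_limits(tick_time_ms, init_limit, sync_limit):
    """Convert initLimit/syncLimit (counted in ticks) into milliseconds."""
    return {
        "init_timeout_ms": tick_time_ms * init_limit,
        "sync_timeout_ms": tick_time_ms * sync_limit,
    }


def tolerated_failures(ensemble_size):
    """A majority must stay up, so an ensemble of 2n + 1 servers tolerates n failures."""
    return (ensemble_size - 1) // 2


print(zookeeper_limits(2000, 5, 2))                # {'init_timeout_ms': 10000, 'sync_timeout_ms': 4000}
print([tolerated_failures(n) for n in (3, 5, 7)])  # [1, 2, 3]
```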

auto.offset.reset=none does what?

It will throw an exception if no offset is found

When producing to a topic that does not exist and auto.create.topics.enable = true, how does it get created?

Kafka creates the topic automatically with the broker settings num.partitions and default.replication.factor.

What is batch.size=32KB or 64KB and what is its purpose?

Maximum number of bytes that will be included in a batch (default 16KB). Any message bigger than the batch size will not be batched. (High Throughput Producer using compression and batching)

Which errors does the producer recover from automatically (retriable errors)?

LEADER_NOT_AVAILABLE, NOT_LEADER_FOR_PARTITION, REBALANCE_IN_PROGRESS

What are the non retriable errors for a Producer?

MESSAGE_TOO_LARGE

What does max.poll.interval.ms = 5 minute (default) do?

Maximum amount of time between two .poll() calls before the consumer is declared dead. If processing a batch generally takes longer in your application, increase this interval.

What is linger.ms=20 and what is its purpose?

Number of milliseconds a producer is willing to wait before sending a batch out (default 0). Increasing linger.ms increases the chance of batching. (High Throughput Producer using compression and batching)

Describe acks=0

Producer does not wait for ack (possible data loss)

Describe acks=1

Producer waits for leader ack (limited data loss)

Describe acks=all

Producer waits for leader and replica acks (no data loss)

auto.offset.reset=latest does what?

Read from the end of the log (consumer offset)

A consumer group's offsets can be lost if it hasn't read new data in 7 days. What controls this?

The broker setting offsets.retention.minutes

If a key is sent, will all messages for that key always go to the same partition? If so, why?

Yes. This can be used to order the messages for a specific key, since order is guaranteed within a partition. By default, keys are hashed using the murmur2 algorithm and the result is taken modulo the number of partitions.
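
The hash-then-modulo step can be sketched in plain Python. The `murmur2` function below is a port written from memory of the 32-bit MurmurHash2 variant used by Kafka's Java client, so treat it as illustrative rather than byte-for-byte verified; `partition_for` is a hypothetical helper name.

```python
MASK = 0xFFFFFFFF  # keep intermediate values in 32 bits, like Java int arithmetic


def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 (unverified port; seed and constants assumed from Kafka)."""
    m, r = 0x5BD1E995, 24
    length = len(data)
    h = (0x9747B28C ^ length) & MASK
    for i in range(length // 4):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & MASK
        k ^= k >> r
        k = (k * m) & MASK
        h = (h * m) & MASK
        h ^= k
    tail = data[length & ~3:]  # the 0-3 bytes left over after full 4-byte blocks
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK
    h ^= h >> 13
    h = (h * m) & MASK
    h ^= h >> 15
    return h


def partition_for(key: bytes, num_partitions: int) -> int:
    """Hash the key, force it non-negative, and take it modulo the partition count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


key = b"user-42"
print(partition_for(key, 3) == partition_for(key, 3))  # True
```

Because the partition count is part of the formula, changing it can move a key to a different partition, which is why adding partitions breaks the per-key ordering guarantee.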

Are keys hashed by using "murmur2" algorithm by default?

True

Consumers read messages in the order they are stored in a topic-partition. True or False?

True

Example: replication.factor = 3, min.insync.replicas = 2, acks = all can only tolerate 1 broker going down, otherwise the producer will receive an exception NOT_ENOUGH_REPLICAS on send. True or False?

True
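
The tolerance in this example follows from simple arithmetic: with acks=all, writes succeed only while the number of in-sync replicas is at least min.insync.replicas. A minimal sketch, with hypothetical function names:

```python
def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Brokers that can be lost before acks=all writes start failing."""
    return replication_factor - min_insync_replicas


def can_produce_with_acks_all(in_sync_replicas, min_insync_replicas):
    """False corresponds to the producer receiving NOT_ENOUGH_REPLICAS."""
    return in_sync_replicas >= min_insync_replicas


print(tolerated_broker_failures(3, 2))  # 1
print(can_produce_with_acks_all(2, 2))  # True  (one broker down)
print(can_produce_with_acks_all(1, 2))  # False (two brokers down)
```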

Messages are appended to a topic-partition in the order they are sent. True or False?

True

What is compression.type=snappy?

Value can be none (default), gzip, lz4, snappy, or zstd (Kafka 2.1+). Compression is enabled at the producer level and doesn't require any config change in the broker or consumer. Compression is more effective when bigger batches of messages are sent to Kafka. (High Throughput Producer using compression and batching)

Can a producer choose to send a key with a message?

Yes

Can a topic have one or more partition?

Yes

Is the poll mechanism also used to detect if the consumer application is dead?

Yes

Is enable.auto.commit=false with manual commit of offsets the recommended approach? (Consumer Offset commit strategy)

Yes

Is order guaranteed within a partition and once data is written into the partition is it immutable?

Yes

Is it impossible to delete a partition of a topic once it is created? Yes or No?

Yes. The number of partitions can be increased but not decreased.

Kafka takes bytes as input without even loading them into memory. What is this called?

Zero Copy

List of the Default Ports for Zookeeper

ZooKeeper client port: 2181. ZooKeeper peer port (followers connecting to the leader): 2888. ZooKeeper leader election port: 3888.

Start a kafka server at default port 9092

bin/kafka-server-start.sh config/server.properties

Find out all the partitions without a leader

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --unavailable-partitions

List all kafka topics

bin/kafka-topics.sh --zookeeper localhost:2181 --list

What is max.in.flight.requests.per.connection?

Number of producer requests that can be in flight in parallel per connection (default is 5). (Safe Producer Config)

What is retries = MAX_INT

Number of retries by the producer in case of a transient failure/exception (default is 0 before Kafka 2.1, MAX_INT since then). (Safe Producer Config)

Per thread ___ consumer is the rule.

one. The consumer is not thread-safe, so a consumer instance must not be shared across threads.

At a time only ___ segment is active in a ________

one, partition

What is enable.idempotence = true?

The producer sends a producer ID (and sequence number) with each message so that Kafka can detect duplicates. When Kafka receives a duplicate of a message that is already committed, it does not commit it again but still sends an ack to the producer (default is false; true since Kafka 3.0). (Safe Producer Config)

auto.offset.reset=earliest does what?

reads from the start of the log

Each partition has one broker that acts as its leader, and only the leader can ________ and _____ data for that partition.

receive, serve

Partitions are made of ________ (.log files)

segments

The log cleanup happens on partition ________. Smaller/more segments mean the log cleanup will happen more often!

segments

