kafka specific

offsets

Unique, incremental IDs of messages within a partition are called offsets.
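
A toy sketch of where offsets come from, assuming a partition can be modeled as an append-only Python list. The `PartitionLog` class is hypothetical and not part of any Kafka client:

```python
class PartitionLog:
    """Toy append-only log standing in for a single partition."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return the offset it was assigned."""
        offset = len(self._records)  # next free position in the log
        self._records.append(value)
        return offset

    def read_from(self, offset):
        """Return (offset, value) pairs starting at the given offset."""
        return list(enumerate(self._records))[offset:]


log = PartitionLog()
first = log.append("order created")
second = log.append("order paid")
print(first, second)      # offsets grow by one per appended record
print(log.read_from(1))   # reading resumes at a chosen offset
```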

consumers

A consumer is an application (for example, a Java application) that reads data from a topic. Like producers, consumers recover automatically from broker failures. Data is read in order within each partition.

zookeeper

Keeps track of the brokers in the cluster and is used in electing leaders and tracking ISRs (in-sync replicas) for partitions.

what's the distributed part of kafka

Each broker holds only some of the topic partitions. When you create a topic, its partitions are automatically distributed across multiple brokers.

exactly once

The holy grail of message delivery. It can only be achieved for Kafka-to-Kafka workflows, using the Kafka Streams API. For Kafka-to-external-system workflows, you must use an idempotent consumer to avoid duplicates in the final DB.

acks strategy

How the producer confirms that its writes reached Kafka. acks can be 0, 1, or all.
0 = no acknowledgement: the producer does not wait for one, so data loss is possible.
1 = the producer waits for the leader's acknowledgement, so data loss is limited (this was the default before Kafka 3.0).
all = the leader and all in-sync replicas acknowledge, so no data is lost as long as one in-sync replica survives.
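
A toy model of the three settings, assuming a write can be described by whether the leader and each in-sync follower stored it. The function name and its parameters are hypothetical; a real producer learns this from broker responses:

```python
def write_is_acknowledged(acks, leader_wrote, in_sync_followers_wrote):
    """Return True if the producer would treat the write as successful."""
    if acks == "0":
        # Fire and forget: the producer never waits, so it always moves on.
        return True
    if acks == "1":
        # Only the leader has to confirm the write.
        return leader_wrote
    if acks == "all":
        # The leader and every in-sync follower have to confirm.
        return leader_wrote and all(in_sync_followers_wrote)
    raise ValueError(f"unknown acks setting: {acks!r}")


# One follower has not stored the record yet.
for setting in ("0", "1", "all"):
    print(setting, write_is_acknowledged(setting, True, [True, False]))
```

With acks=0 the producer reports success even if nothing was stored, which is where the possible data loss comes from.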

cluster

Multiple brokers form a cluster. If you connect to a single broker, you are connected to the entire cluster.

at most once

Not a great method of committing consumer offsets. Offsets are committed as soon as the message is received. If processing goes wrong (e.g., the consumer goes down), the message is lost, because it will not be read again.

consumer groups

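A consumer group represents an application. The consumers in a group divide the topic's partitions among themselves, so each partition is read by only one consumer in the group.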

broker

think about this as the server that holds the topics and their associated partitions.

at least once

The preferred method of committing consumer offsets. Offsets are committed after the message is processed: you read the data, do something with it, then commit the offset. If processing goes wrong (e.g., the consumer goes down), the message will be read again.
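
The difference from at most once comes down to whether the commit happens before or after processing. A small simulation, assuming a partition is a Python list and a crash hits between the two steps (all names here are hypothetical, and no Kafka client is involved):

```python
def run_consumer(records, start, commit_first, processed, crash_at=None):
    """Consume from `start`; return the committed offset (next offset to read)."""
    committed = start
    for offset in range(start, len(records)):
        # Step 1: commit (at most once) or process (at least once).
        if commit_first:
            committed = offset + 1
        else:
            processed.append(records[offset])
        if offset == crash_at:
            return committed  # the consumer dies between the two steps
        # Step 2: the remaining half of the work.
        if commit_first:
            processed.append(records[offset])
        else:
            committed = offset + 1
    return committed


def simulate(commit_first):
    """Crash once at offset 1, then restart from the committed offset."""
    records = ["a", "b", "c"]
    processed = []
    committed = run_consumer(records, 0, commit_first, processed, crash_at=1)
    run_consumer(records, committed, commit_first, processed)
    return processed


print(simulate(commit_first=True))   # ['a', 'c'] -> "b" was lost
print(simulate(commit_first=False))  # ['a', 'b', 'b', 'c'] -> "b" was read again
```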

how is kafka fault tolerant

Through the topic replication factor. Example: topic A with 2 partitions and a replication factor of 2. Topic A, partition 0 is replicated on two brokers. If we lose one of those brokers, the remaining broker can still serve the data.
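
A rough sketch of that example, assuming replicas are placed on consecutive brokers. This placement rule is a simplification chosen for illustration, not Kafka's actual assignment algorithm, and the broker ids are made up:

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Map each partition to the list of brokers holding a copy of it."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def available_partitions(assignment, failed_brokers):
    """Partitions that still have at least one replica on a live broker."""
    return [
        p for p, replicas in assignment.items()
        if any(b not in failed_brokers for b in replicas)
    ]


topic_a = assign_replicas(num_partitions=2, replication_factor=2,
                          brokers=[101, 102, 103])
print(topic_a)                                     # {0: [101, 102], 1: [102, 103]}
print(available_partitions(topic_a, {102}))        # [0, 1]
print(available_partitions(topic_a, {101, 102}))   # [1]
```

Losing one broker leaves every partition readable; losing both brokers that hold partition 0 does not.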

Caveat to 'at least once'

To avoid duplicate processing of messages, make sure your processing is idempotent (i.e., processing the same message twice does not impact your system).
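
One common way to get idempotent processing is to write with an upsert keyed by a unique message id. A minimal sketch, assuming each message carries such an id and using a plain dict as a stand-in for the database (both are assumptions for illustration):

```python
def apply_idempotent(db, message):
    """Upsert keyed by message id: replaying a message leaves db unchanged."""
    db[message["id"]] = message["amount"]


def apply_non_idempotent(totals, message):
    """Running total: replaying a message counts it twice."""
    totals["sum"] = totals.get("sum", 0) + message["amount"]


m1 = {"id": "order-1", "amount": 10}
m2 = {"id": "order-2", "amount": 5}
delivered = [m1, m2, m2]  # m2 is redelivered after a consumer restart

db, totals = {}, {}
for message in delivered:
    apply_idempotent(db, message)
    apply_non_idempotent(totals, message)

print(db)      # {'order-1': 10, 'order-2': 5}
print(totals)  # {'sum': 20}, inflated by the duplicate
```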

topics

particular stream of data (sort of like a table in a database). Topics are split into partitions.

kafka guarantees

1. Messages are appended to a topic-partition in the order they are sent.
2. Consumers read messages in the order they are stored in a topic-partition.
3. With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down.
4. As long as the number of partitions remains constant for a topic (no new partitions), the same key will always go to the same partition.
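
The key-to-partition guarantee follows from hashing the key modulo the partition count. A sketch of the idea, using `zlib.crc32` as a stand-in hash (an assumption for illustration; Kafka's default partitioner uses murmur2) and a hypothetical `partition_for` helper:

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from a deterministic hash of the key."""
    return zlib.crc32(key) % num_partitions


key = b"user-42"
# Same key, same partition count: always the same partition.
print(partition_for(key, 3) == partition_for(key, 3))  # True
# With a different partition count the modulus changes,
# so the key may land on a different partition.
print(partition_for(key, 3), partition_for(key, 4))
```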

To produce data to a topic, a producer must provide the Kafka client with...

any broker from the cluster and the topic name. kafka clients will route your data to the appropriate brokers and partitions for you.

consumer offsets

Kafka stores the offsets up to which a consumer group has read. They are committed live to a Kafka topic named __consumer_offsets, so if a consumer dies, it will read back from where it left off. Consumers choose when to commit offsets.
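
Conceptually, committed offsets behave like a map keyed by group, topic, and partition. A toy model, using a dict in place of the __consumer_offsets topic and assuming a group with no commit starts at offset 0 (in a real client that starting point is configurable):

```python
committed_offsets = {}  # (group, topic, partition) -> next offset to read


def commit(group, topic, partition, next_offset):
    """Record how far this group has read in one partition."""
    committed_offsets[(group, topic, partition)] = next_offset


def resume_position(group, topic, partition):
    """Where a restarted consumer in this group picks up again."""
    return committed_offsets.get((group, topic, partition), 0)


commit("billing-app", "orders", 0, 42)
print(resume_position("billing-app", "orders", 0))    # 42
print(resume_position("analytics-app", "orders", 0))  # 0, tracked separately
```

Each group has its own position, so two applications can read the same topic independently.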

producer : round robin

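When no key is provided, messages are distributed round robin across the topic's partitions, and therefore across the brokers that host them. Each message goes to a single partition, not to every broker.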
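
A minimal sketch of round-robin assignment, assuming the producer simply rotates over partition numbers for keyless messages (`round_robin_assign` is a hypothetical helper, not a client API):

```python
from itertools import cycle


def round_robin_assign(messages, num_partitions):
    """Pair each keyless message with the next partition in rotation."""
    partitions = cycle(range(num_partitions))
    return [(next(partitions), message) for message in messages]


for partition, message in round_robin_assign(["m1", "m2", "m3", "m4", "m5"], 3):
    print(partition, message)
# 0 m1
# 1 m2
# 2 m3
# 0 m4
# 1 m5
```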

messages

Within each partition you have messages, which are like individual entries of data. Each message is an atomic unit (a single entry) in a log (a partition of a topic).

producer

Writes data to topics (gets data into Kafka). Producers automatically recover when a broker fails.

when you create a topic what requirements must be met?

You must specify the number of partitions and a replication factor. Note that the replication factor cannot exceed the number of brokers in your cluster.

