Kafka Test Questions

Name properties of EVERY broker.

1) Every broker knows the metadata for all topics and partitions in the cluster. 2) Each broker hosts only a subset of the topics and partitions at any one time. 3) Every broker is a bootstrap broker. Note that while every broker is a bootstrap broker, only one broker is elected controller at a time.

What is a custom partitioner?

A Custom Partitioner allows you to customize how the partition number gets computed from a source message. For example, you may want a special customer to go to a specific partition.
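
A minimal sketch of a custom partitioner in Java (the VipPartitioner class name, the "vip-customer" key, and the partition numbers are all illustrative; it assumes non-null keys and a topic with at least 2 partitions):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class VipPartitioner implements Partitioner {
    // Route a hypothetical VIP customer key to partition 0; hash the rest elsewhere.
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("vip-customer".equals(key)) {
            return 0; // dedicated partition for the special customer
        }
        // Spread all other keys over the remaining partitions.
        return 1 + (key.hashCode() & 0x7fffffff) % (numPartitions - 1);
    }
    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}

You would register it on the producer via the partitioner.class configuration, using the fully qualified class name.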

What is a Kafka partition made of?

A file and 2 indexes per segment. Kafka partitions are made of segments (usually each segment is 1 GB), and each segment has 2 corresponding indexes (offset index and time index).

In Avro, adding an element to an enum (enumerated type) without a default value represents what kind of schema evolution?

Breaking

How do you get a Kafka consumer to shutdown immediately but gracefully?

Call consumer.wakeup() and catch a WakeupException. To break from the poll loop, call the consumer's wakeup() method from a separate thread. This raises a WakeupException in the thread blocking in poll().
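
A minimal sketch of the pattern (assumes a consumer that is already subscribed; the 100 ms poll timeout is arbitrary):

// In a shutdown hook or another thread:
Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

// In the polling thread:
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        // ... process records
    }
} catch (WakeupException e) {
    // Expected during shutdown; nothing to do.
} finally {
    consumer.close(); // leave the group cleanly
}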

A consumer is configured with enable.auto.commit=false. What happens when close() is called on the consumer object?

Calling close() on the consumer immediately triggers a partition rebalance as the consumer will not be available anymore.

Which offset commit strategy would you recommend for an at most once scenario?

Call commitSync() immediately after the poll. For at most once:
var records = consumer.poll(Duration.ofMillis(100));
consumer.commitSync();
// ... process records
For at least once:
var records = consumer.poll(Duration.ofMillis(100));
// ... process records
consumer.commitSync();

What does the auto.offset.reset property of KSQL do?

Consumers can set the auto.offset.reset property to earliest to start consuming from the beginning: SET 'auto.offset.reset'='earliest';

What is the disadvantage of request/response communication?

Coupling. Point-to-point (request-response) style will couple a client to the server.

A consumer failed to process record #10 but succeeded in processing record #11. What do you need to do to guarantee at least once processing?

Do not commit until successfully processing record #10

What client communication protocols are supported by the schema registry?

HTTP, HTTPS

What is the protocol used by Kafka clients to securely connect to the Confluent REST Proxy?

HTTPS (SSL/TLS). Technically it is TLS, but it is still commonly called SSL.

A consumer has auto.offset.reset=latest, and the topic partition currently has data for offsets going from 45 to 2311. The consumer group never committed any offsets at all. Where will the consumer read from?

The end of the log. With auto.offset.reset=latest and no committed offsets, the consumer starts at the latest offset (after offset 2311) and reads only newly produced messages.

A consumer starts and has auto.offset.reset=latest. The consumer group has committed the offset 743 for the topic. Where will this consumer read from?

Offset 743. Because the group has committed offsets, auto.offset.reset is ignored and the consumer resumes from the committed offset.

You are sending messages with keys to a topic. To increase throughput, you add new partitions. What happens to old and new data?

Old data stays in their original partitions. New data may hash differently and go to different partitions than before.

What types of exceptions could a producer get while trying to send a message?

SerializationException BufferExhaustedException

How do you prevent network-induced duplicates while producing to Kafka?

Set enable.idempotence=true. Idempotent delivery ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.

When using plain JSON data with Kafka Connect, you see the following error message: org.apache.kafka.connect.errors.DataException: JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields. How will you fix this error?

Set key.converter.schemas.enable and value.converter.schemas.enable to false. Unless your JSON has "schema" and "payload" fields in this structure, you must turn off schemas.enable on the key and value converters.

Your manager would like to have topic availability over consistency. Which setting do you need to change in order to enable that?

Set the broker configuration unclean.leader.election.enable=true. Setting this parameter to true allows non-ISR replicas to become leader, ensuring availability but losing consistency, as data loss will occur.

To allow consumers in a group to resume at the previously committed offset, what do you need to do?

Set the group.id for the consumer. Consumers of the same group will resume reading from where offsets were last committed for the group.

How do you set topic retention to 1 hour?

Set topic config retention.ms = 3600000

Partition leader election is done by whom?

The Kafka Broker that is the Controller is the broker that is responsible for electing partition leaders.

Define max.in.flight.requests.per.connection. What is the default?

The term "in-flight request" means a produce request that has not been acknowledged by the broker. The default is 5.

Define the different types of Kafka Streams windows? We want the average of all events in every 5 minute window updated every minute Which type of Kafka Streams window would we use?

Tumbling time window (time-based): fixed-size, non-overlapping, gap-less windows.
Hopping time window (time-based): fixed-size, overlapping windows.
Sliding time window (time-based): fixed-size, overlapping windows that work on differences between record timestamps.
Session window (session-based): dynamically-sized, non-overlapping, data-driven windows.
In this case, we want a hopping window.
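
A sketch of the hopping window in Kafka Streams (assumes an existing KGroupedStream named grouped; TimeWindows.ofSizeWithNoGrace is the Kafka 3.x factory method, older releases use TimeWindows.of):

// 5-minute windows, advancing every 1 minute (windows overlap).
TimeWindows hopping = TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5))
                                 .advanceBy(Duration.ofMinutes(1));
KTable<Windowed<String>, Long> counts = grouped.windowedBy(hopping).count();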

What does the broker.rack parameter do?

broker.rack indicates the rack of the broker. It is used in rack-aware replica assignment for fault tolerance. Examples: `RACK1`, `us-east-1d`. Partitions for newly created topics are assigned in a rack-alternating manner; this is the only behavior broker.rack affects.

Give some example syntax for a consumer to subscribe to multiple topics at the same time.

consumer.subscribe(Pattern.compile("topic\\..*")); // subscribe using a regex
consumer.subscribe(Arrays.asList("topic.history", "topic.sports", "topic.politics")); // subscribe using a list

Name Kafka's stateful transformations.

count, join, reduce, aggregate, windowing

Which is not an Avro primitive type? null boolean int long float double bytes string date

date The remaining are valid Avro primitive types.

Give some examples of stateless transformation operations.

filter, map, mapValues, flatMap, flatMapValues, groupBy, branch, foreach

Which consumer setting do you use to associate similar topic consumers?

group.id

What is the syntax to find all the partitions without a leader?

kafka-topics.sh --zookeeper localhost:2181 --describe --unavailable-partitions

Name all the Kafka ports.

Kafka default port: 9092 (can be changed in server.properties). Zookeeper default ports: 2181 for client connections; 2888 for follower connections to the leader; 3888 for leader election between nodes. KSQL default port: 8088 for the server listener.

Which Kafka CLI do you use to consume a topic?

kafka-console-consumer

Which command do you issue to increase the number of partitions to a topic?

kafka-topics.sh with the --alter flag, for example (topic name and partition count here are illustrative): kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic my_topic --partitions 10

What are 2 requirements for a Kafka broker to connect to a Zookeeper ensemble?

1) All brokers must share the same zookeeper.connect parameter. 2) All brokers must have a unique value for broker.id.

A cluster has 5 brokers and a topic with 10 partitions and a replication factor of 3. A client has a producer byte-rate quota of 1 MB/sec. What is the maximum throughput allowed for this client?

Kafka currently supports quotas by data volume. Clients that produce or fetch messages at a byte rate that exceeds their quota are throttled by delaying the response by an amount that brings the byte rate within the configured quota. 5 MB/s is the max throughput (1 MB/sec * 5 brokers).

Producers and Consumers, which are thread-safe?

Producers are thread-safe. Consumers are NOT thread-safe; so, one consumer needs to run in one thread.

What is a recommended heap size for Kafka?

RAM: In most cases, Kafka can run optimally with 6 GB of RAM for heap space. For especially heavy production loads, use machines with 32 GB or more. Extra RAM will be used to bolster OS page cache and improve client throughput.

What does the leader replica do?

Follower replicas are passive; they don't handle produce or consume requests. The leader replica handles all the produce and consume requests for its partition.

Which is NOT a valid authentication mechanism in Kafka? SAML SASL/GSSAPI SSL SASL/SCRAM

SAML

What is the producer's default max.request.size? What happens if you send a message of size 3 MB to a topic with the default size?

The default max.request.size is 1 MB. Sending a 3 MB message will fail: the producer rejects it with a RecordTooLargeException rather than splitting it.

Describe the code producer.send(producerRecord).get()

This is a synchronous send: get() blocks until the broker acknowledges the record, which decreases throughput.

You run a ecommerce website which manages customers, products and transactions. How would you model this data within Kafka?

Transactions as a stream. Customers and products as a table.

A topic has 3 replicas and you set min.insync.replicas to 2. If 2 of the 3 replicas are not available, what happens when a producer request with acks=all is sent to the broker?

You will receive a NotEnoughReplicasException. With only a single in-sync replica remaining, the partition becomes read-only for acks=all producers, and their requests receive NotEnoughReplicasException.

What do you lose when you enable an SSL endpoint in Kafka?

Zero copy Kafka consumers are unable to take advantage of zero-copy optimization over SSL encryption. This is because SSL requires copying of data for encryption before sending it over the wire.

How do you create a topic named test with 3 partitions and 3 replicas using the Kafka CLI?

bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test (entered as a single command on one line).

A Zookeeper configuration has a tickTime of 2000, initLimit of 20 and a syncLimit of 5. What is the timeout value for followers to connect to Zookeeper?

tickTime * initLimit = timeout, so 2000 * 20 = 40000 ms = 40 seconds. tickTime is the basic time unit in milliseconds used by ZooKeeper; it is used for heartbeats, and the minimum session timeout is twice the tickTime. initLimit limits how long the ZooKeeper servers in the quorum have to connect to a leader. syncLimit limits how far out of date a server can be from the leader.

How can you uniquely identify a Kafka message?

topic + partition + offset

A Zookeeper ensemble contains 3 servers. Over which ports should the members of the ensemble be able to communicate in the default configuration?

2181 - client port
2888 - peer port
3888 - leader election port

Your streams application is reading from an input topic that has 5 partitions. You run 5 instances of your application, each with num.streams.threads set to 5. How many stream tasks will be created and how many will be active?

5 tasks will be created, one per partition, and all 5 will be active. The 5 instances provide 25 threads in total, but only 5 threads will have a task assigned; the other 20 remain idle.

A topic has 5 partitions and is consumed by a group of 10 consumers. How many consumers will be idle?

5 Each partition is assigned to 1 and only 1 consumer; hence, 5 consumers will be sitting idle.

You have a Zookeeper cluster that needs to be able to withstand the loss of 2 servers and still be able to function. What size should your Zookeeper cluster have to be?

5 Your Zookeeper cluster needs an odd number of servers and must maintain a majority of servers up in order to vote. A 2n+1 Zookeeper cluster can survive n servers being down; here n=2, so 2n+1 = 5.

What's the difference between forward compatibility and backward compatibility in schema evolution?

Backward compatible: the consumer (new schema) can delete fields or add optional fields. Forward compatible: the producer (new schema) can add fields or delete optional fields. Full compatible: either the consumer or the producer can add or delete optional fields.

A kafka topic has a replication factor of 3 and a min.insync.replicas setting of 2. Also, acks=all is set. How many brokers can fail before the producer can no longer produce?

1 Consumers, on the other hand, can continue to read if 2 brokers go down.

You are using the JDBC source connector to copy data from a table to a Kafka topic. There is one connector created with tasks.max = 2 on a cluster of 3 workers. How many tasks are launched?

1 JDBC connector allows only 1 task per table.

Which actions will trigger a partition rebalance for a consumer group?

1) Increase the number of partitions 2) A consumer in the consumer group shuts down 3) You add a new consumer to the consumer group

When auto.create.topics.enable is set to true, under what circumstances will a Kafka broker automatically create a topic?

1) Producer sends a message to a topic 2) Consumer reads messages from a topic 3) Client requests metadata from a topic

What are the functions of Zookeeper?

1) Store dynamic topic configurations 2) ACL information 3) Controller registration 4) Broker registration

Describe the broker Controller.

1) The Controller is a broker elected by the Zookeeper ensemble. 2) It is responsible for partition leader election. 3) It also performs the usual broker functions. 4) There can be only one Controller at a time.

What authentication protocols are used by Kafka?

1. HTTP clients can authenticate themselves to the REST APIs of the Schema Registry, Kafka Connect workers, REST Proxy or KSQL Server via:
◦ HTTP Basic Auth
◦ SASL
2. Intra-broker and client-to-broker authentication happens via:
◦ mTLS (mutual TLS/SSL)
◦ SASL/PLAIN
◦ SASL/GSSAPI (Kerberos)
◦ SASL/SCRAM-SHA-256 and SASL/SCRAM-SHA-512
◦ OAuth2
3. Brokers authenticate themselves to the ZooKeeper ensemble via SASL.

A Kafka topic has the following: 1) a replication factor of 3, 2) min.insync.replicas of 1, 3) acks=all. How many brokers can go down without the topic going offline?

2 A replication factor of 3 means we start with 3 replicas, but only 1 in-sync replica is required; so 2 brokers can crash while the topic stays online.

You have a Connect sink set up with a topic with 2 partitions. What is the maximum number of tasks that you can setup and why?

2 You can't have more sink tasks than the number of partitions.

You have a JDBC connector created with Kafka Connect cluster with 3 workers. The parameter tasks.max is set to 2. How many total tasks are launched and why?

2 tasks.max is a connector parameter, not a worker parameter. With one connector, a maximum of 2 tasks are launched.

A Zookeeper ensemble contains 5 servers. What is the maximum number of servers that can go missing and the ensemble still run?

2 can fail You need to keep a majority which is 3 in this case; so, 2 can fail.

A Kafka topic has a replication factor of 3 and min.insync.replicas setting of 2. How many brokers can go down before a producer with acks=1 can't produce.

2. min.insync.replicas does NOT impact producers when acks=1 (only when acks=all)

What is the output of a KStream - KTable join?

A KStream. The KStream is joined against the KTable to produce another KStream.

You use Mirror Maker to replicate a topic for (reading) analytical purposes. What type of mirroring is this?

Active-passive

What kind of delivery guarantee does this consumer offer?
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    try {
        consumer.commitSync();
    } catch (CommitFailedException e) {
        log.error("commit failed", e);
    }
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("topic = %s, partition = %s, offset = %d, customer = %s, country = %s ",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
    }
}

At-most-once Here the offset is committed before processing the message. If the consumer crashes while processing the message, the message is lost.

What happens if the producer uses compression.type=snappy?

Both the producers and consumers will have to compress/decompress the data. Kafka transfers data with zero copy and no transformation. Any transformation (including compression) is the responsibility of the clients.

What does the broker setting auto.create.topics.enable=true do?

By default, Kafka brokers allow automatic topic creation. If a topic is automatically created, it is configured according to the defaults in the brokers' server.properties file. The default setting for default.replication.factor is 1 - change it to an appropriate value for your environment.

Create Stream, show streams, explain, count and join: which of these commands write to the Kafka cluster?

Create Stream

How often is log compaction evaluated?

Every time a segment is closed.

A consumer application is using KafkaAvroDeserializer to deserialize Avro messages. What happens if the message schema is not present in the AvroDeserializer local cache?

First local cache is checked for the message schema. In case of a cache miss, the schema is pulled from the schema registry. An exception will be thrown if the Schema Registry does not have the schema.
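
A sketch of the consumer configuration involved (the addresses, group id, and GenericRecord value type are placeholder assumptions):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "avro-consumers");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
// Where KafkaAvroDeserializer fetches a schema on a local-cache miss:
props.put("schema.registry.url", "http://localhost:8081");
KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props);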

How to turn on exactly once semantics?

For a single partition, idempotent producer sends remove the possibility of duplicate messages due to producer or broker errors. To turn on this feature and get exactly-once semantics per partition (no duplicates, no data loss, in-order delivery), configure your producer with enable.idempotence=true.
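
A sketch of the producer setup (the broker address is a placeholder):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Implies acks=all and retries > 0, and requires max.in.flight <= 5.
props.put("enable.idempotence", "true");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);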

In Avro, removing or adding a field that has a default is what type of schema evolution?

Full Clients with new schema will be able to read records with old schema and clients with old schema will be able to read records saved with the new schema.

What is returned by a producer.send() call in the Java API?

A Future<RecordMetadata> object, which contains the record's partition and offset.

Define high throughput and low latency. Which parameters should you set for either?

High throughput (get more data in one batch): large fetch.min.bytes, reasonable fetch.max.wait.ms. Low latency (get data as quickly as possible): fetch.min.bytes=1 (the default).

What is the risk of increasing max.in.flight.requests.per.connection while also enabling retries in the producer?

If retries > 0 and max.in.flight.requests.per.connection > 1, then out-of-order messages may happen. For example, assume you send message batches m1, m2, and m3 with max.in.flight.requests.per.connection=3. If all three message batches are in-flight but only m2 fails, then only m2 is retried. However, since m1 and m3 are already in-flight they will likely arrive first. The message batches will arrive in the order m1, m3, m2 — out of order. Note that retries > 0 is "at least once" delivery and may introduce duplicates, especially if used with acks=all.
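
Two ways to keep ordering while retrying, as a configuration sketch (assumes a producer Properties object named props):

// Option 1: allow only one in-flight batch (lower throughput):
props.put("max.in.flight.requests.per.connection", "1");
// Option 2: enable idempotence, which preserves order with up to 5 in-flight batches:
props.put("enable.idempotence", "true");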

Where are Avro schemas stored?

In the _schemas topic

Which producer option increases the chance of batching?

Increasing linger.ms. The linger.ms option makes the producer wait before sending messages, increasing the chance of creating batches.

What are good candidates for KTables? A transaction stream Store items returned Inventory contents right now Money made up until now

Inventory contents right now Money made until now These represent aggregations of streams.

How does a consumer commit offsets in Kafka?

It interacts with the Group Coordinator broker. Consumers do not write directly to the __consumer_offsets topic; instead, they interact with the Group Coordinator broker that was elected to manage that topic.

What is the default value for the consumer group id with the kafka-console-consumer CLI?

It uses a random consumer group.

What kind of Kafka Streams joins are ALWAYS windowed joins?

KStream-KStream joins

Which component of Kafka should you use if you intend to transform data?

Kafka Streams Kafka Streams is a library for building streaming applications, specifically applications that transform input Kafka topics into output Kafka topics.

What is meant by zero-copy?

Kafka makes use of the zero-copy principle by asking the kernel to move data directly to sockets rather than moving it via the application. Zero-copy greatly improves application performance and reduces the number of context switches between kernel and user mode.

What Java library is KSQL based on?

Kafka Streams KSQL is based on Kafka Streams and allows you to express transformations in SQL, which is automatically converted into a Kafka Streams program on the backend.

Of the following errors, which are retriable from a producer perspective: NOT_LEADER_FOR_PARTITION TOPIC_AUTHORIZATION_FAILED NOT_ENOUGH_REPLICAS MESSAGE_TOO_LARGE INVALID_REQUIRED_ACKS

NOT_LEADER_FOR_PARTITION NOT_ENOUGH_REPLICAS

Is KSQL fully compliant to ANSI SQL?

No KSQL is a dialect inspired by ANSI SQL. It has some differences because it is geared at processing streaming data.

A consumer sends a request to commit offset 2000. There is a temporary communication problem, so the broker never gets the request and therefore never responds. Meanwhile, the consumer continues to process another batch and, this time, successfully committed offset 3000. What should you do to deal with the missed commit?

Nothing. The next commit at 3000 will take care of the previous commit at 2000.

How many controllers are in a Kafka cluster at any one time?

Only 1 The Controller is a thread off the main broker software which maintains the list of partition information, such as leaders and ISR lists. The Controller runs on exactly one broker in the cluster at any point in time. The Controller monitors the health of every broker by monitoring their interaction with ZooKeeper; if a broker's registration in ZK goes away, that indicates a broker failure. The Controller elects new leaders for partitions whose leaders are lost due to failures and then broadcasts the metadata changes to all the brokers so that clients can contact any broker for metadata updates. The Controller also stores this data in ZooKeeper, so that if the Controller itself fails, the new Controller elected by ZooKeeper will have a set of data to start with. If the Controller fails, ZooKeeper will assign the role to the next broker to check in with ZooKeeper after the failure.

How can you change the producer's batching configuration?

Producers can adjust batching configuration parameters:
◦ batch.size: message batch size in bytes (default: 16 KB)
◦ linger.ms: time to wait for messages to batch together (default: 0, i.e., send immediately)
▪ High throughput: large batch.size and linger.ms, or flush manually
▪ Low latency: small batch.size and linger.ms
The internal thread which pushes data from the producer to the brokers is triggered by two thresholds: batch.size and linger.ms. Batching provides higher throughput due to larger data transfers and fewer RPCs, but at the cost of higher latency due to the time it takes to accumulate the batches.
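
For example (the values are purely illustrative, not recommendations; assumes a producer Properties object named props):

props.put("batch.size", "65536"); // 64 KB batches for higher throughput
props.put("linger.ms", "20");     // wait up to 20 ms for a batch to fill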

What does stateless message processing mean?

Stateless means the processing of each message depends only on that message. For example, converting from JSON to Avro or filtering a stream are both stateless operations.

What is the consequence of setting unclean.leader.election.enable=TRUE?

Setting unclean.leader.election.enable to TRUE means that we allow out-of-sync replicas to become leaders. We can lose records when this parameter is set to true.

What is the Confluent Schema Registry?

The Confluent Schema Registry is your safeguard against incompatible schema changes and is the component that ensures no breaking schema evolution is possible. Kafka brokers do not look at your payload or its schema, and therefore will not reject data.

A producer application on one machine was able to send messages to a Kafka topic. Later, this application was moved to another machine, but it ran into authorization issues. What is the likely cause?

The Kafka ACL does not allow the other machine IP.

Describe the acks settings.

The acks setting is a producer setting. It represents the number of acknowledgments the Producer requires the leader to have before considering the request complete. This controls the durability of records. • acks=0: Producer will not wait for any acknowledgment from the server • acks=1: Producer will wait until the leader has written the record to its local log • acks=all: Producer will wait until all in-sync replicas have acknowledged receipt of the record. The acks config parameter determines behavior of Producer when sending messages. Use this to configure the durability of messages being sent.

A topic has 3 replicas and you set the min.insync.replicas to 2. If 2 of the 3 replicas are not available, what happens when a consumer request is sent to the broker?

The consumer can still read the one replica, but the producer will be unable to write if the producer is using acks=all.

A consumer client sends a fetch request for a partition in a topic. It gets an exception NotLeaderForPartitionException in the response. How does the client handle this exception?

The consumer should issue a retry. Produce and fetch requests can only be sent to the node hosting the partition leader. If the consumer has the wrong leader for a partition, the client automatically issues a metadata request. The metadata request can be handled by any node, so clients learn which broker is the designated leader for each topic partition.

Your consumers often take about 6 minutes to process a record batch. The consumer will enter a rebalance situation even though it's still running. How can you fix this?

The default value for max.poll.interval.ms is 300000. The consumer enters rebalance because it's taking so long for the consumer to process the batch. Therefore, the solution is to increase the max.poll.interval.ms to 600000.

Name Internal Kafka Streams Topics

The following three topics are examples of internal topics used by Kafka Streams. The first two are regular join information, the third one is actually a RocksDB persistent StateStore: {consumer-group}--KSTREAM-JOINOTHER-000000000X-store-changelog {consumer-group}--KSTREAM-JOINTHIS-000000000X-store-changelog {consumer-group}--incompleteMessageStore-changelog

Producing messages with a key does what?

The key determines which topic partition the message goes to. Keys are necessary if you require strong ordering or grouping for messages that share the same key. Attaching a key ensures that messages with the same key always go to the same partition in a topic; Kafka guarantees order within a partition, but not across partitions in a topic. Not providing a key results in round-robin distribution across partitions, which would likely break message order.
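
A sketch (the topic and key names are made up; assumes an existing producer):

// All messages keyed "customer-42" land in the same partition,
// so their relative order is preserved.
ProducerRecord<String, String> record =
        new ProducerRecord<>("orders", "customer-42", "order payload");
producer.send(record);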

Describe the min.insync.replicas configuration parameter

The min.insync.replicas parameter is a topic parameter. Use min.insync.replicas with acks=all. min.insync.replicas combined with acks=all defines the minimum number of replicas in the ISR needed to satisfy a produce request.

Generally, the same key goes into the same partition, unless what happens?

The number of partitions changes.

You have a topic configured with log.retention.hours, log.retention.minutes and log.retention.ms. Which takes precedence?

The parameter with the smallest units, in this case log.retention.ms.

A producer is sending messages to a broker which is communicating to the partition leader. What happens if this broker goes down?

The producer client is automatically connected to a new broker which communicates to the newly elected partition leader.

If you want to send binary data through the REST proxy, it first needs to be base64 encoded. Who does this encoding?

The producer. The REST Proxy requires the data it receives over REST to already be base64 encoded; hence, it is the responsibility of the producer.

Name Internal Kafka Cluster Topics

There are several types of internal Kafka topics: __consumer_offsets is used to store offset commits per topic/partition. __transaction_state is used to keep state for Kafka producers and consumers using transactional semantics. _schemas is used by Schema Registry to store all the schemas, metadata and compatibility configuration.

How many leader replicas can there be for a topic? How many partition numbers of the same topic can be managed by the same broker on disk? What is the relationship of the number of brokers to the number of partitions?

There can be only 1 leader replica per partition. A broker can manage more than 1 partition of the same topic at the same time on the same disk (e.g. 12 partitions with 1 broker). There is no relationship between the number of brokers and the number of partitions.

When is the onCompletion() method called?

This method is called by the producer, and it is a method of the callback interface. A callback method the user can implement to provide asynchronous handling of request completion. This method will be called when the record sent to the server has been acknowledged. Exactly one of the arguments will be non-null.
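
A sketch of the callback (assumes an existing producer and record):

producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            exception.printStackTrace(); // the send failed
        } else {
            System.out.printf("acked: partition=%d offset=%d%n",
                    metadata.partition(), metadata.offset());
        }
    }
});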

If you want to send binary data to the broker (aka "produce") via the REST api, what must you do as you send the data? What must the consumer do once it receives the data?

To send binary data through REST, you must base64 encode it before (or as) you send it. The REST Proxy decodes the base64 into bytes, so the broker stores and the consumer receives plain binary data; there is no need to decode on the consumer side.

You're creating a topic to calculate word counts. What is an important configuration parameter to set?

cleanup.policy=compact In this case the key is the word and the value is its count; we only want to keep the latest record per key.
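
One way to create such a topic from Java, as a sketch (the topic name, partition/replica counts, and address are placeholders):

Admin admin = Admin.create(Map.of("bootstrap.servers", "localhost:9092"));
NewTopic topic = new NewTopic("word-counts", 3, (short) 3)
        .configs(Map.of("cleanup.policy", "compact")); // keep latest value per key
admin.createTopics(List.of(topic)).all().get();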

Where are the ACLs stored in Kafka as a default?

Under the Zookeeper node /kafka-acl/

Using Java, how do you force a consumer to read from a specific topic partition?

Use "assign" For example, >>> # manually assign the partition list for the consumer >>> from kafka import TopicPartition >>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234') >>> consumer.assign([TopicPartition('foobar', 2)]) >>> msg = next(consumer)

A producer just sent a message to the leader broker for a topic partition. The producer used acks=1, and the data has not yet been replicated to the followers. Under which conditions will the consumer see the message?

When the high watermark has advanced. The log end offset is the offset of the last message written to a log. The high watermark offset is the offset of the last message that was successfully copied to all of the log's replicas.

How are Avro SpecificRecord classes generated?

With an Avro schema + a Maven/Gradle plugin

Name Internal Kafka Connect Topics

connect-configs stores connector configurations, connect-status tracks the status of connectors and tasks, and connect-offsets stores source offsets for source connectors

What is a worker in the context of Kafka connect?

Workers can function as both producers and consumers, depending on the type of connector you install. Workers are processes running one or more tasks, each in a different thread.

You have two topics. One has 5 partitions, one has 3 partitions. You would like to do a stream-table join? How would you proceed?

You cannot do a KStream-KTable join because the topics are not co-partitioned. Instead, you would have to create a GlobalKTable. GlobalKTables have their datasets replicated on each Kafka Streams instance, so no repartitioning is required.

What is wrong with this code?
consumer.subscribe(Arrays.asList("topic1"));
List<TopicPartition> pc = new ArrayList<>();
pc.add(new TopicPartition("topic1", 0));
pc.add(new TopicPartition("topic1", 1));
consumer.assign(pc);

You cannot use subscribe() and assign() with the same consumer. subscribe() is used to leverage the consumer group mechanism, while assign() is used to manually control partition assignment and reading.

Your system takes too long to decide to rebalance when a consumer dies, and it takes too long to rebalance. What do you do?

You must decrease session.timeout.ms on the consumer so that a dead consumer is detected (and a rebalance triggered) sooner, AND you must decrease max.poll.interval.ms to speed up the rebalance operation itself.

In Kafka Streams, by what value are internal topics prefixed by?

application.id

What does auto.offset.reset=none mean?

auto.offset.reset=none means that the consumer will crash if the offsets it's recovering from have already been deleted from Kafka. For example, say the topic has offsets from 45 to 2311, but the consumer has a committed offset of 10. In this case, the consumer will crash.

In Avro, removing a field that does NOT have a default is called what?

backward schema evolution

Which data formats are natively available with the Confluent REST Proxy? binary protobuf avro json

binary, avro, json (not protobuf)

Which 3 properties are required when creating a producer?

bootstrap.servers key.serializer value.serializer

What information is needed for a consumer to connect to a topic?

broker and topic name

What is the syntax for determining which partitions have one or more replicas that are not in-sync with the leader?

kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions The important thing to remember here is that we query zookeeper (port 2181) and use the option --under-replicated-partitions.

What is the difference between producer.send() and producer.send.get?

producer.send() is asynchronous; you get back a Future with the record metadata. producer.send().get() is synchronous and blocking.
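
Side by side, as a sketch (assumes an existing producer and record):

// Asynchronous: returns immediately with a Future for the metadata.
Future<RecordMetadata> future = producer.send(record);
// Synchronous: get() blocks until the broker responds (lower throughput).
RecordMetadata metadata = producer.send(record).get();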

What is the most important metric for consumer processing?

records-lag-max Records lag is the calculated difference between a consumer's current log offset and a producer's current log offset. Records lag max is the maximum observed value of records lag.

