Apache Kafka
What are the components of Splunk architecture?
1) Search head - provides the GUI for searching 2) Indexer - indexes machine data 3) Forwarder - forwards logs to the indexer 4) Deployment server - manages Splunk components in a distributed environment
What is the recommended minimum number of replicas to have in Kafka?
3 replicas
What is an in-sync replica?
A replica that contains recent "enough" messages to become a leader is called an "in-sync replica".
What is the risk of retrying to send a failed message?
Duplicate messages. For example, if network issues prevented the broker's acknowledgement from reaching the producer, but the message was successfully written and replicated, the producer will treat the lack of acknowledgement as a temporary network issue and retry sending the message (since it can't know that it was received).
Why should you keep on retrying?
Because things like lack of a leader or network connectivity issues often take a few seconds to resolve, and if you just let the producer keep trying until it succeeds, you don't need to handle these issues in any other way.
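As a sketch of this in code: with the current Java producer the retry behavior is driven by the retries and retry.backoff.ms settings (the values, broker address, and topic name below are illustrative, not prescriptive).

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");             // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("retries", Integer.toString(Integer.MAX_VALUE));    // keep retrying retriable errors
        props.put("retry.backoff.ms", "500");                         // wait between attempts, not a tight loop

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "key", "value")); // hypothetical topic
        }
    }
}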
What happens when a broker encounters a disk problem in Kafka?
The broker will fail completely (hard crash). If you are running under a supervisor it will restart, and if the issue persists it will fail again.
How will Kafka place replicas by default?
By default, Kafka will make sure each replica for a partition is on a separate broker.
What is one of the dangerous side effects of Kafka replication?
Cascading failures.
How does Kafka ensure consistency?
Each partition has a single replica designated as the leader. All produce and consume requests go through the leader, in order to guarantee consistency. Keyword: leader replica.
What does a topic replication factor of 3 imply?
Each partition is replicated 3 times on 3 different brokers.
How do you ensure that your replicas are protected against failure?
Do not put all replicas on the same rack or AZ. Use Kafka's rack awareness property (broker.rack).
What is the ramification of auto.commit.interval.ms?
Duplicate processing: the longer the commit interval, the more messages may be reprocessed after a crash or rebalance, because consumption resumes from the last committed offset.
How should you configure retries for critical data, or if your goal is to never lose a message?
In general, if your goal is to never lose a message, your best approach is to configure the producer to keep trying to send the messages when it encounters a retriable error.
When is data considered safe in Kafka?
It is safe when you have it on enough replicas; the data will get to disk in its own time. This implies you want to make sure you have enough replicas.
What is the goal of the configuration controlled.shutdown.enable=true?
It will migrate any partitions the server is the leader for to other replicas prior to shutting down. This will make the leadership transfer faster and minimize the time each partition is unavailable to a few milliseconds.
What is the implication of setting unclean.leader.election.enable to true?
It means we allow out-of-sync replicas to become leaders (known as "unclean election"), knowing that we will lose messages when this occurs.
How do you increase the HA of the offsets topic in Kafka?
Increase the offsets topic replication factor (offsets.topic.replication.factor).
How should you configure the number of retries for critical data?
Set the retry setting to the maximum. Space retries out; don't retry in a tight loop.
What are some of the things we can do when the retry buffer is full?
Use block.on.buffer.full so the producer blocks until space frees up, or log the messages to a persistent queue for later retry.
What does auto.commit.interval.ms do?
It controls how frequently offsets will be committed automatically.
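A minimal consumer sketch showing both settings together (group id, topic, and interval are made up for illustration); the larger the interval, the more messages can be reprocessed after a crash.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // placeholder broker address
        props.put("group.id", "example-group");                  // hypothetical consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "true");                 // turn automatic commits on
        props.put("auto.commit.interval.ms", "5000");            // commit offsets every 5 seconds

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));                // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}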
What are the implications of more partitions in Kafka on files?
A higher number of open file handles, and the OS must be tuned accordingly.
What are the consequences of disabling unclean leader election?
If a broker containing the leader replica for a partition becomes unavailable, and no in-sync replica exists to replace it, the partition becomes unavailable until the leader replica or another in-sync replica is back online.
What are the conditions for controlled shutdown to succeed?
All the partitions hosted on the broker must have replicas elsewhere (i.e. the replication factor is greater than 1 and at least one of these replicas is alive). This is generally what you want, since shutting down the last replica would make that topic partition unavailable.
When will you get the LeaderNotAvailable error?
When the leader for the partition we are writing to has just crashed and a new one is still being elected.
What does acks=0 mean and what is the implication of this?
It means that a message is considered to be written successfully to Kafka if the producer managed to send it over the network. Implication: even in the expected case of a clean leader election, your producer will lose messages, because it won't know that the leader is unavailable while a new leader is being elected.
What does acks=1 mean and what is the implication of this? (implications on speed, message loss, leader failure)
It means that the leader will send either an acknowledgment or an error the moment it gets the message and writes it to the partition data file (but not necessarily synced to disk). Implication: under normal circumstances of leader election, your producer will get LeaderNotAvailableException while a leader is getting elected, and if the producer handles this error correctly, it will retry sending the message and it will arrive safely at the new leader. You can still lose data in the scenario where the message was successfully written to the leader, but the leader crashed before the message was propagated to the replicas.
What happens when we let an out-of-sync replica become the partition leader?
We risk data loss and data inconsistencies. If we don't allow them to become leaders, we face lower availability, as we must wait for the original leader to become available before the partition is back online.
What does it mean for a message to be committed in Kafka?
A message is committed when all in-sync replicas have gotten that message. Only committed messages are visible to consumers.
When is data considered committed in Kafka?
When it is written to all in-sync replicas, even when "all" means just one and the data could be lost if that replica is unavailable.
When will a producer receive a NotEnoughReplicasException error?
When there are fewer in-sync replicas than the number specified in the min.insync.replicas setting. Consumers can continue reading existing data, so the partition effectively becomes read-only.
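A hedged sketch of how this surfaces in the Java producer: with acks=all, the send callback receives a NotEnoughReplicasException (a retriable error) when min.insync.replicas is not met; the topic name and broker address are made up.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class MinIsrAwareProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");                            // min.insync.replicas only matters for acks=all

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "key", "value"), (metadata, exception) -> {
                if (exception instanceof NotEnoughReplicasException) {
                    // Fewer in-sync replicas than min.insync.replicas: writes are rejected
                    // until enough replicas catch up, so the partition is effectively read-only.
                    System.err.println("Not enough in-sync replicas: " + exception.getMessage());
                } else if (exception != null) {
                    System.err.println("Send failed: " + exception);
                }
            });
            producer.flush();
        }
    }
}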
Why don't we allow consumers to see messages until all in-sync replicas have gotten them?
For consistency: otherwise a consumer could read a message that is later lost if the leader fails before the followers have replicated it.
When is a replica considered in-sync in Kafka?
It has an active session with ZooKeeper, meaning it sent a heartbeat to ZooKeeper in the last 6 seconds (configurable); it fetched messages from the leader in the last 10 seconds (configurable); and it fetched the most recent messages from the leader in the last 10 seconds. That is, it isn't enough that the follower is still getting messages from the leader; it must have almost no lag.
What is the first and key reason for having more partitions in Kafka?
Higher throughput.
What makes a replica out of sync, or what are the conditions for a replica to fall out of sync?
If the replica did not contact the leader in the last 10 seconds, if it is fetching messages but did not catch up to the most recent message within 10 seconds, or if the broker loses its connection to ZooKeeper.
When is the default Kafka replica placement not safe enough?
If all replicas for a partition are placed on brokers that are on the same rack or AZ, and that top-of-rack switch or AZ misbehaves, you will lose availability of the partition regardless of the replication factor.
What happens when a Kafka broker is shut down gracefully? (two optimizations)
1) It will sync all its logs to disk to avoid needing to do any log recovery when it restarts (i.e. validating the checksum for all messages in the tail of the log); log recovery takes time, so this speeds up intentional restarts. 2) It will migrate any partitions the server is the leader for to other replicas prior to shutting down; this makes the leadership transfer faster and minimizes the time each partition is unavailable to a few milliseconds.
What are some of the ramifications or consequences of slow replication in a Kafka cluster from the consumer angle?
It will take longer for messages to be available for consumption, because messages have to be replicated to the in-sync replica set before they can be consumed.
Consequences of unclean leader election?
Messages that were not synced to the new leader are lost
How do you deal with memory swapping issues in Kafka?
One way to avoid swapping is just to not configure any swap space at all. Having swap is not a requirement, but it does provide a safety net if something catastrophic happens on the system. The recommendation is to set the vm.swappiness parameter to a very low value, such as 1. It is preferable to reduce the size of the page cache rather than swap.
Unclean Leader Election
Allowing an out-of-sync replica to become the leader.
When can an unclean leader election occur? (2 scenarios)
The partition had 3 replicas and the two followers became unavailable (say two brokers crashed). In this situation, as producers continue writing to the leader, all the messages are acknowledged and committed (since the leader is the one and only in-sync replica). If the leader then becomes unavailable and one of the out-of-sync followers starts first, the only replica available to elect is out of sync.
What is the first word that comes to mind when you hear partition?
Partition = parallelism.
What happens when we have fewer in-sync replicas than the minimum we configured, from a producer standpoint?
Producers will not get acks. The producer will continue to buffer messages until its buffer is full.
What are the advantages of running with acks=0?
Running with acks=0 is very fast (which is why you see a lot of benchmarks with this configuration); you can get amazing throughput and utilize most of your bandwidth, but you are guaranteed to lose some messages if you choose this route.
What is the implication of a higher replication factor?
A higher replication factor leads to higher availability, higher reliability, and fewer disasters.
What are the performance implications of memory swapping for Kafka?
The cost incurred by having pages of memory swapped to disk will show up as a noticeable impact on all aspects of Kafka performance. Kafka makes heavy use of the system page cache, and if the VM system is swapping to disk, there is not enough memory being allocated to the page cache.
More partitions require more open file handles
The more partitions, the higher the open file handle limit one needs to configure in the underlying operating system.
How does disk affect Kafka performance?
The performance of producer clients will be most directly influenced by the throughput of the broker disk that is used for storing log segments. Kafka relies on disk I/O performance to provide a good response time to producers; faster disk writes mean lower produce latency. Solid state disks have drastically lower seek and access times and will provide the best performance. Spinning disks, on the other hand, are more economical and provide more capacity per unit.
How do we protect against rack-level or AZ misfortunes in Kafka?
To protect against rack-level misfortune, we recommend placing brokers in multiple racks and using the broker.rack broker configuration parameter to configure the rack name for each broker. If rack names are configured, Kafka will make sure replicas for a partition are spread across multiple racks, in order to guarantee even higher availability.
What is the implication of setting unclean.leader.election.enable to false?
We choose to wait for the original leader to come back online, resulting in lower availability. We typically see unclean leader election disabled (configuration set to false) in systems where data quality and consistency are critical; banking systems are a good example, as most banks would rather be unable to process credit card payments for a few minutes or even hours than risk processing a payment incorrectly.
On a topic with 3 replicas, setting min.insync.replicas to 2 means?
You can only write to a partition if at least 2 of the 3 replicas are in sync.
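A sketch of setting this per topic with the AdminClient (topic name, partition count, and broker address are illustrative); min.insync.replicas can also be set as a broker-wide default.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateSafeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("payments", 6, (short) 3)   // 6 partitions, replication factor 3
                    .configs(Map.of("min.insync.replicas", "2"));     // require 2 in-sync replicas for writes
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}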
What setting do you use to ensure that your data is written to more than one replica?
You need to set the minimum number of in-sync replicas (min.insync.replicas) to a higher value.
What is the safest acknowledgement scheme?
acks=all. This is the safest option: the producer won't stop trying to send the message before it is fully committed.
What are the 3 different acknowledgement modes a producer can choose from?
acks=0, acks=1, acks=all
How do you deal with possible message duplication issues in Kafka, given that Kafka guarantees at-least-once but not exactly-once delivery?
Many real-world applications add a unique identifier to each message to allow detecting duplicates and cleaning them when consuming the messages (Elasticsearch is an example). Other applications make the messages idempotent, meaning that even if the same message is sent twice, it has no negative impact on correctness. For example, the message "Account value is 110$" is idempotent, since sending it several times makes no difference to the result, while the message "Add 10$ to the account" is not.
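A minimal consumer-side dedup sketch, assuming producers put a unique identifier in the record key (group, topic, and the in-memory set are illustrative; a real system would use a persistent store).

import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DedupingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "dedup-example");             // hypothetical consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Set<String> seenIds = new HashSet<>();               // sketch only: not durable across restarts
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("transactions"));      // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    if (!seenIds.add(record.key())) {
                        continue;                             // same unique ID seen before: skip duplicate
                    }
                    System.out.println("processing " + record.value());
                }
            }
        }
    }
}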
What does acks=all mean and what is the implication of this? (implications on speed, message loss, leader failure)
It means that the leader will wait until all in-sync replicas get the message before sending back an acknowledgement or an error. Keyword: all in-sync replicas. In conjunction with the min.insync.replicas configuration on the broker, this lets you control how many replicas get the message before it is acknowledged. This is also the slowest option: the producer waits for all replicas to get all the messages before it can mark the message batch as "done" and carry on.
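A hedged sketch of the safest producer configuration described in this deck (values and broker address are illustrative); on newer clients, enable.idempotence additionally removes the duplicates that retries can introduce.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class SafestProducerConfig {
    public static KafkaProducer<String, String> build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");                                      // wait for all in-sync replicas
        props.put("retries", Integer.toString(Integer.MAX_VALUE));     // keep retrying retriable errors
        props.put("enable.idempotence", "true");                       // avoid duplicates from those retries
        return new KafkaProducer<>(props);
    }
}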
What setting configures how many times you want to retry a message?
message.send.max.retries (3 by default).
What controls the amount of time a follower can be inactive or behind before it is considered out of sync?
The replica.lag.time.max.ms configuration parameter.
I frequently get asked: "How many times should I configure the producer to retry?"
The answer really depends on what you are planning on doing after the producer throws an exception saying that it retried N times and gave up.
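One common pattern, sketched here: send synchronously (or use a callback) so the final failure is visible, then hand the record to a fallback such as local storage or a dead-letter topic; the retry values and names are illustrative.

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class GiveUpHandler {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");
        props.put("retries", "5");                           // illustrative: retry a few times, then give up
        props.put("retry.backoff.ms", "1000");               // space retries out, not a tight loop

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("events", "key", "value");
            try {
                producer.send(record).get();                 // block so we see the final outcome
            } catch (ExecutionException e) {
                // The producer retried and gave up: decide what happens to the record now,
                // e.g. write it to local storage or a dead-letter topic for later replay.
                System.err.println("Giving up on record: " + e.getCause());
            }
        }
    }
}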