In this tutorial, we will learn about Apache Kafka. Apache Kafka is a highly scalable, distributed messaging system. Kafka is useful when a large amount of data has to be moved from one end to another in a scalable, fast, and reliable way, and it can move data between many different sources and sinks. Kafka was developed at LinkedIn and was open-sourced in early 2011.
Kafka Terminology
Producer: A producer is an application that sends messages or data to Kafka. To Kafka, a message is simply an array of bytes.
Consumer: A consumer is an application that reads data from Kafka.
Broker: A broker is the Kafka server, a software service running on a virtual or physical machine. Producers and consumers never communicate with each other directly; they always communicate through a broker.
Topic: A topic is a logical entity, or in simple words a unique name for a Kafka stream. Producers write messages to a topic and consumers read messages from it.
Partitions: A topic is divided into a number of partitions. Partitions let you parallelize a topic by splitting its data across multiple brokers; each partition can be placed on a separate machine, so multiple consumers can read from a topic in parallel.
Offsets: An offset is simply a sequential id given to each message as it arrives in a partition. These ids are immutable, and offsets are local to a partition. A short example follows this list.
ZooKeeper: Apache ZooKeeper is a distributed, open-source configuration and synchronization service that also provides a naming registry for distributed applications. Kafka uses ZooKeeper to store shared metadata about Kafka brokers and consumers.
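To make partitions and offsets concrete, here is a small sketch using the command-line tools that ship with Kafka. The topic name demo-topic, the partition count, and the offset value are only examples, and a broker is assumed to be running on localhost:9092 as in the quickstart below. The first command creates a topic with three partitions; the second reads partition 0 starting from offset 5.
$ bin/kafka-topics.sh --create --topic demo-topic --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092
$ bin/kafka-console-consumer.sh --topic demo-topic --partition 0 --offset 5 --bootstrap-server localhost:9092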
All Kafka messages are organized into topics: messages are written to a specific topic and read from a specific topic. Producers push messages into a Kafka topic, while consumers pull messages off of it. Kafka runs as a cluster, and each node in the cluster is called a Kafka broker.
Kafka retains all published messages, whether or not they have been consumed. The retention period is configurable and is defined on a per-topic basis; the default retention time is 168 hours, i.e. 7 days.
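As an illustration, here is one way to override the retention period for a single topic with the kafka-configs.sh tool that ships with Kafka. The topic name quickstart-events and the 24-hour (86400000 ms) value are only examples, and this assumes a broker running on localhost:9092 as in the quickstart below.
$ bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name quickstart-events --alter --add-config retention.ms=86400000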
Let's play with Kafka
NOTE: Your local environment must have Java 8+ installed.
STEP 1: GET KAFKA
Download the latest Kafka release and extract it:
$ tar -xzf kafka_2.13-2.6.0.tgz
$ cd kafka_2.13-2.6.0
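If you do not have the archive yet, one way to fetch the 2.6.0 release used in this tutorial is shown below; the URL points at the Apache archive, and you should check the official Kafka downloads page for the latest release.
$ wget https://archive.apache.org/dist/kafka/2.6.0/kafka_2.13-2.6.0.tgz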
STEP 2: START THE KAFKA ENVIRONMENT
Terminal-1: Start the ZooKeeper service
$ bin/zookeeper-server-start.sh config/zookeeper.properties
Terminal-2: Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties
STEP 3: CREATE A TOPIC TO STORE YOUR EVENTS
Terminal-3: Create a topic
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
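Optionally, you can verify that the topic exists by listing all topics on the broker with the same kafka-topics.sh tool:
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092
quickstart-events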
STEP 4: DESCRIBE TOPIC
Terminal-3: Describe the topic
$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092
Topic: quickstart-events  PartitionCount: 1  ReplicationFactor: 1  Configs:
    Topic: quickstart-events  Partition: 0  Leader: 0  Replicas: 0  Isr: 0
STEP 5: WRITE SOME EVENTS INTO THE TOPIC
Terminal-3: Start the console producer and type a few events, one per line (press Ctrl-C to stop)
$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
> This is test event1.
> test event2.
> My test event3.
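The console producer can also send keyed messages; Kafka uses the key to decide which partition a message goes to. A minimal sketch, assuming the same topic and broker, with ':' chosen arbitrarily as the key separator:
$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
> user1:This event has a key.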
STEP 6: READ THE EVENTS
Terminal-4: Start the console consumer and read the events from the beginning (press Ctrl-C to stop)
$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is test event1.
test event2.
My test event3.
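You can also attach the consumer to a named consumer group; consumers that share a group id split the topic's partitions between them, so each message is handled by only one member of the group. A sketch, with my-group as an example group name:
$ bin/kafka-console-consumer.sh --topic quickstart-events --group my-group --bootstrap-server localhost:9092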
TERMINATE KAFKA
Stop the producer and consumer clients with Ctrl-C, stop the Kafka broker and then ZooKeeper with Ctrl-C in their terminals, and finally delete the local data:
$ rm -rf /tmp/kafka-logs /tmp/zookeeper
Kafka Use Cases
Kafka Messaging
Website Activity Tracking
Kafka Log Aggregation
Stream Processing
Kafka Event Sourcing
Companies using Kafka
LinkedIn, Netflix, Twitter, Yahoo, Uber, Coursera, Microsoft, Mozilla, Bing, Oracle, Goldman Sachs, and many more...
LinkedIn - Kafka
High Volume
- 1.4 trillion messages/day
- 175 TB/day
- 433 million users
Velocity
- peak of 13 million messages/second
Variety
- multiple RDBMS sources
- multiple NoSQL DBs - Hadoop, Spark, etc.
Advantages of Kafka
High-throughput
Low Latency
Fault-Tolerant
Durability
Scalability
High Concurrency
Real-time processing