Apache Kafka in a Nutshell

shaun Big Data Enthusiast and lover of all things distributed and scalable.

 

The Gentlest Introduction to Apache Kafka

While “Chaos Theory” says the universe trends to complexity, a good engineer should strive to tame that complexity by providing the simplest solution possible. But even “simple” solutions can be too complex for the people whose job it is to provide usable information to end users and stakeholders.

If Stephen Hawking can explain The Universe in a Nutshell, then surely we can explain a good software solution without putting the non-engineering population to sleep.

In this series we will give a gentle introduction to some modern technology solutions in the simplest way possible.

What is Apache Kafka?

Kafka is a scalable messaging system. In modern software architecture we break the components of an application into “micro-services”. Those micro-services usually exchange information by calling each other directly through an API. Kafka gives them a shared place to leave messages for one another instead.

Who is using Apache Kafka?

Apache Kafka is widely used across tech companies and the Fortune 500. The technology is utilized in two main areas.

  • Microservice Architecture
  • Big Data Processing

How does Apache Kafka work?

Let’s think about those services as a teacher and a student taking notes.

 

What happens if the student falls asleep?

When he wakes up, he has missed some important stuff!

But what if he had a machine taking notes for him, and when he woke up he could see what had taken place while he was “offline”?

 

We have “decoupled” the teacher and the student, allowing them both to work at their own speed.

Let’s think about a separate case with a good student who doesn’t fall asleep in class. In fact, he is such a good student that he’s taking notes for a friend who does not speak English and translating them in real-time to Spanish.

So now there are two steps that are taking place in the processing of the message.

  • Write down his notes
  • Translate and write down in Spanish

Figure: Kafka two-stage processing

At some point the teacher may well deliver at a speed that causes the student to get backed up in the “processing” of the data. Luckily, he is not transcribing directly from the teacher. He is pulling his notes from the note-taking machine!

Where the teacher is pushing her words into the ether in real time, the note-taking machine is queuing the messages up for the student to pull down when he needs them, at his discretion! In fact, the notes in the machine are at the disposal of any student who wants to use them.
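To make the push and pull idea concrete, here is a toy note-taking machine in Python. This is only a sketch and not the Kafka client API: `NoteMachine`, `append`, and `read_from` are names made up for this example. The teacher appends at her own speed, and each student keeps track of his own position in the notes.

```python
class NoteMachine:
    """Toy stand-in for the note-taking machine: an append-only list of notes."""

    def __init__(self):
        self._notes = []

    def append(self, note):
        # The teacher "pushes": each note goes on the end and is never changed.
        self._notes.append(note)

    def read_from(self, position, max_notes=10):
        # A student "pulls": fetch up to max_notes starting at his own position,
        # and get back the position to continue from next time.
        batch = self._notes[position:position + max_notes]
        return batch, position + len(batch)


machine = NoteMachine()
for sentence in ["Kafka is a log.", "Writers append.", "Readers pull."]:
    machine.append(sentence)

# The sleepy student missed all three sentences, then catches up from position 0.
position = 0
missed, position = machine.read_from(position)
print(missed)

# Another student reads the same notes independently, two at a time.
first_two, other_position = machine.read_from(0, max_notes=2)
print(first_two, other_position)
```

Notice that reading does not remove anything from the machine, which is why a second student can read the same notes at a different pace.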

Why Apache Kafka?

Apache Kafka allows us to decouple our processes, allowing for scaling of services and rapid message processing.

So now that we understand the analog process, let’s drop in the digital pieces.  First, we will replace the note-taking machine with Kafka and we will automate what the teacher is doing with a teaching micro-service and what the student is doing with a Spanish translation service.

We can create a Kafka “topic” to serve as a queue for the messages coming from the teaching micro-service. The Spanish translation service can “subscribe” to that topic and “pull” messages down when it is ready to process them.
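Here is a rough sketch of that topic, subscribe, and pull flow using a tiny in-memory broker. Everything in it is an assumption for illustration: `ToyBroker`, the `"lecture"` topic name, the `"spanish-service"` subscriber name, and the phrasebook are invented, and a real Kafka client looks different. The shape of the interaction is the part to focus on.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory broker: topic name -> list of messages."""

    def __init__(self):
        self._topics = defaultdict(list)
        self._offsets = defaultdict(int)  # (subscriber, topic) -> next index to read

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def poll(self, subscriber, topic, max_messages=1):
        # Return the next unread messages for this subscriber and advance its offset.
        start = self._offsets[(subscriber, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(subscriber, topic)] = start + len(batch)
        return batch


# Made-up phrasebook so the example runs without a real translation service.
PHRASEBOOK = {"hello": "hola", "thank you": "gracias", "goodbye": "adiós"}

broker = ToyBroker()

# Teaching micro-service: publishes to the "lecture" topic as fast as it likes.
for phrase in ["hello", "thank you", "goodbye"]:
    broker.publish("lecture", phrase)

# Spanish translation service: pulls one message at a time when it is ready.
translated = []
while True:
    batch = broker.poll("spanish-service", "lecture")
    if not batch:
        break
    translated.extend(PHRASEBOOK[message] for message in batch)

print(translated)
```

The teaching service never waits on the translator. It only talks to the topic, and the translator only talks to the topic.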

If your Spanish service gets overwhelmed, you can call in some backup by running more copies of it!
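Calling in backup works because a Kafka topic is split into partitions, and the copies of a service that share a consumer group divide those partitions among themselves. The sketch below shows the idea with a simple round-robin split. The function name and the worker names are made up, and Kafka's real assignment strategies are configurable and more involved than this.

```python
def assign_partitions(partitions, workers):
    """Spread partition ids across workers round-robin (simplified illustration)."""
    assignment = {worker: [] for worker in workers}
    for index, partition in enumerate(partitions):
        worker = workers[index % len(workers)]
        assignment[worker].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# One overwhelmed translator handles every partition.
alone = assign_partitions(partitions, ["translator-1"])
print(alone)

# Calling in backup: the same partitions are now split between two translators.
with_backup = assign_partitions(partitions, ["translator-1", "translator-2"])
print(with_backup)
```

One consequence of this design is that adding more workers than there are partitions does not help, since each partition is read by only one worker in the group at a time.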

 

 

If you want to bring other services on in the future to translate to other languages, you have the notes queued up for as long as your retention policy allows.
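A retention policy can be pictured as a sliding time window over the notes. The sketch below is a simplification with made-up timestamps and a hypothetical `retained` helper. In Kafka, retention is configured per topic by time or by size, and old data is cleaned up in larger chunks rather than message by message.

```python
def retained(messages, now, retention_seconds):
    """Keep only messages that are still inside the retention window."""
    return [(ts, text) for ts, text in messages if now - ts <= retention_seconds]


# (timestamp in seconds, message) pairs; the values are made up for the example.
lecture = [(0, "hello"), (60, "thank you"), (120, "goodbye")]

# A French service that shows up at t=130 with a 100-second retention window
# can still replay everything newer than t=30, but not the first message.
available = retained(lecture, now=130, retention_seconds=100)
print([text for _, text in available])
```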

 

Next Steps: Deploying Kafka in Production

And that’s it! Kafka is that simple. The next step is to learn the components in detail and how to deploy it in production.

If you are an engineer wanting to learn more about deploying Kafka into your system, a good start could be The Cloud Natvz Kafka Essentials Course.

As always, our trainings are taught by a Big Data Architect.
