A bit about me
- I'm from Poland (Europe)
- 9 years of commercial exp in IT
- 7 years with Ruby and Rails
- I find Ruby quite useful
- I love open-source
- Interested in quality-assurance automation tools
- Running a blog mostly about Ruby related stuff
Please notify me if...
- I speak too fast
- I should repeat something
- I should explain something better
- You have any questions
What is Apache Kafka?
- Kafka is a high-throughput distributed messaging system
- Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization
- It can be elastically and transparently expanded without downtime
- It provides broadcasting to many applications
- It allows building event-based systems
Who uses Apache Kafka?
- LinkedIn
- Yahoo
- Twitter
- Netflix
- Square
- Spotify
- Pinterest
- Uber
- Tumblr
- Cisco
- Foursquare
- Shopify
- Oracle
- Urban Airship
- OVH
- And many more...
What is Karafka?
- Karafka = Kafka + Ruby => KaR(uby)afka
- It is a microframework 
- It was designed to simplify Kafka based applications development
- It allows developers to build "Rails like" apps that consume and produce messages
Why did we develop Karafka?
- We needed a tool that would allow us to build applications faster
- We needed a tool that would allow us to process messages faster
- We needed a tool that could handle events and messages from many sources and process them the same way
- Because a single message can be automatically delivered to many Karafka applications
Why even bother with messaging when there is HTTP and REST?
- HTTP does not provide broadcasting
- We often need to trigger many actions based on a single event
- We don't want to maintain internal API clients
- With a message broker you can replace microservices transparently
- You can obtain better microservices isolation
- You can create new microservices that combine events from many different sources
It really is about messaging
Real life is asynchronous
Microservices without broadcasting
Without a broker you need to add code to both ends of your SOA system
Microservices with broadcasting
With a broker all you need to know is the topic you want to listen on and the message format
Karafka builds on tools that are already well known
- Ruby-Kafka
- Celluloid to introduce socket clustering inside threads
- Sidekiq to support background data processing
- Rails app structure concept for bigger apps
- Sinatra app structure concept for small apps
Karafka ecosystem
Each part can be used independently
- Karafka Framework - Engine to process incoming messages
- WaterDrop - Ruby-Kafka based library for outgoing messages
- Worker Glass - Worker wrapper that provides optional timeouts and after-failure handling (reentrancy)
Karafka framework components
Implementation details aside, Karafka is composed of a few logical parts:
- Messages Consumer (Karafka::Connection::Consumer)
- Router (Karafka::Routing::Router)
- Base Controller (Karafka::BaseController)
- Base Worker (Karafka::BaseWorker)
- CLI (Karafka::Cli)
Karafka framework components
How can I start using it?
# Gemfile
source 'https://rubygems.org'
gem 'karafka', github: 'karafka/karafka'
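# Then, from the command line: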
bundle install
bundle exec karafka install
Then open app.rb and update the configuration settings
All the configuration options are described here: github.com/karafka/karafka
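For orientation, here is a minimal sketch of what that app.rb setup block might look like; the option names below are assumptions that may differ between Karafka versions, so treat the linked documentation as the source of truth.
# app.rb (sketch only; option names are illustrative)
class App < Karafka::App
  setup do |config|
    # Name under which this application is registered (assumed option name)
    config.name = 'example_app'
    # Kafka broker addresses (assumed option name)
    config.kafka.hosts = ['localhost:9092']
    # Redis settings used by the Sidekiq workers (assumed option name)
    config.redis = { url: 'redis://localhost:6379' }
  end
end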
Karafka conventions and features
Karafka has a routing engine similar to the Rails one (just much smaller)
App.routes.draw do
  topic :incoming_messages do
    group :composed_application
    controller Videos::DetailsController
    worker Workers::DetailsWorker
    parser Parsers::BinaryToJson
    interchanger Interchangers::Binary
  end
  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
  end
end
Karafka conventions and features
NewVideosController  #=> NewVideosWorker
Users::PaymentsController #=> Users::PaymentsWorker
By default Karafka builds a worker class per controller, based on the controller name. This allows you to prioritize Sidekiq workers if needed
Karafka conventions and features
You can override all of the default behaviours
App.routes.draw do
  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
    # Instead of the default Videos::NewVideosWorker
    worker Videos::DifferentWorker
  end
end
Karafka conventions and features
Karafka controllers are simple. All you need is a #perform method that will be executed asynchronously in response to an incoming message
class CreateVideosController < Karafka::BaseController
  def perform
    Video.create!(params[:video])
  end
end
Karafka conventions and features
There is a #before_enqueue filter that acts in a similar way to Rails #before_action
class CreateVideosController < Karafka::BaseController
  before_enqueue -> {
    # Reject old incoming messages
    # When before_enqueue throws :abort,
    # the task won't be sent to Sidekiq
    throw(:abort) if params[:sent_at] < 1.minute.ago
  }
end
It can be used to provide a first layer of data filtering. If the filter throws :abort, the Sidekiq task won't be scheduled
Karafka conventions and features
There are also a few useful CLI commands available:
bundle exec karafka [COMMAND]
console # Start the Karafka console (short-cut alias: "c")
flow    # Print application data flow (incoming => outgoing)
help    # Describe available commands or one specific command
info    # Print configuration details and other options
install # Install all required things for Karafka application
routes  # Print out all defined routes in alphabetical order
server  # Start the Karafka server (short-cut alias: "s")
worker  # Start the Karafka Sidekiq worker (short-cut alias: "w")
Karafka performance
- It is strongly dependent on what you do in your code
- Redis performance (for Sidekiq) is a factor as well
- Message size is a factor
- A single process can handle around 30 000 messages/sec
- Less than a millisecond to send a message in the slowest (secure) mode (Kafka request.required.acks -1)
- Less than 1/10 of a millisecond to send a message in the 0 mode (Kafka request.required.acks 0); both modes are illustrated below
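To make the two acknowledgement modes above concrete, here is an illustrative sketch using the Ruby-Kafka producer API directly (Karafka and WaterDrop handle the producer internally; the broker address, topic, and payloads are made up):
require 'kafka'

kafka = Kafka.new(seed_brokers: ['localhost:9092'])

# Secure mode: wait for all in-sync replicas to acknowledge the write
# (corresponds to request.required.acks -1)
safe_producer = kafka.producer(required_acks: :all)

# Fastest mode: fire and forget, no acknowledgement at all
# (corresponds to request.required.acks 0)
fast_producer = kafka.producer(required_acks: 0)

safe_producer.produce('{"id": 1}', topic: 'new_videos')
safe_producer.deliver_messages

fast_producer.produce('{"id": 2}', topic: 'new_videos')
fast_producer.deliver_messages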
Karafka framework scalability
Each scaling strategy targets a different problem
Scaling strategies can be combined
The following strategies are available:
- Scaling using multiple Karafka threads
- Scaling using Kafka partitions
- Scaling using Karafka clusterization (in progress)
Scaling using multiple threads
- Good when you have multiple topics that are not 100% utilized
- Good when you want to provide parallelism but still have a single process running
- Generally the easiest way to have multiple controllers listening at the same time (sketched below)
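As a sketch of this strategy (topic and controller names are made up, and the one-listener-per-routed-topic behaviour is assumed from the description above), routing several topics in one application lets a single Karafka process serve all of them concurrently:
App.routes.draw do
  # Assumption: each routed topic is consumed in its own thread,
  # so all three controllers listen at the same time within one process
  topic :new_videos do
    controller Videos::NewVideosController
  end

  topic :new_comments do
    controller Comments::NewCommentsController
  end

  topic :mailing_events do
    controller Mailing::EventsController
  end
end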
Scaling using multiple threads
Scaling using Kafka partitions
- Topic partition is the unit of parallelism in Kafka
- Partitions are an answer to heavy-duty topics
- Karafka processes automatically rebalance between available partitions
- Karafka requires topic partitioning when you want to handle more than 30 000 messages per second per topic
Scaling using Kafka partitions
Scaling using Karafka processes clustering
- A single Karafka process can handle up to 30 000 messages per second (total)
- It means that the bigger your application is, the slower it gets (per controller)
- Thanks to process clustering, each Karafka process will listen only to a selected subset of topics
- That way, with a 10-process cluster, we can increase throughput to more than 300 000 messages per second
Scaling using Karafka processes clustering
I want to integrate it with my Rails/Sinatra app
- The best approach is to start generating messages from your current applications via WaterDrop
- With WaterDrop you can tell your Karafka apps what your other Ruby components are doing
def create
  video = Video.create!(params[:video])
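  # Broadcast a :video_created message to Kafka so any interested Karafka app can react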
  WaterDrop::Message.new(:video_created, video.to_json).send!
  respond_with video
end
- Once you start sending messages, you can extract functionalities and responsibilities and move them to Karafka-based applications (a consuming side is sketched below)
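As an illustration of that consuming side (topic, controller, and attribute names here are made up), a matching Karafka application only needs a route and a controller with a #perform method:
App.routes.draw do
  topic :video_created do
    controller CreatedVideosController
  end
end

class CreatedVideosController < Karafka::BaseController
  def perform
    # params holds the parsed JSON payload sent by the Rails app above
    Video.find_or_create_by!(external_id: params[:id])
  end
end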
WANT TO CONTRIBUTE?
github.com/karafka
- The more people star it, the more people use it
- The more people use it, the more people star it
- There are many issues you can help us fix
- We use Code Climate and Travis with many QA tools to maintain the quality
Karafka – Ruby framework for building Kafka message-based applications
Maciej Mensfeld
twitter: @maciejmensfeld
www: mensfeld.pl
e-mail: maciej@mensfeld.pl