PyCon Pune 2018

Real-time log analytics using Probabilistic Data Structures in Redis

Submitted by Srinivasan Rangarajan (@cnu) on Friday, 15 September 2017

Technical level: Advanced

Abstract

There are two ways to solve any problem: accurately or approximately. Accurate data structures have their disadvantages: they use too much memory and do not scale to real-time data. In this talk we will see how to take advantage of the newly released Redis 4.0, with its pluggable modules, to build a data pipeline that uses probabilistic data structures to get real-time insights.

Outline

Many different insights and metrics can be obtained from log event data. Processing the data in real time and getting accurate results is possible in theory. In practice, it is not so easy.

Not all results and metrics need to be exact. There are places where the tradeoff between accuracy and memory usage or scalability is worth it. That is where probabilistic data structures (PDS) come in. In this talk I will explain different PDSs and how they work. I will also talk about how to use Redis and its pluggable module system to work with these data structures much more efficiently.
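To make the accuracy-versus-memory tradeoff concrete, here is a minimal pure-Python sketch of a Bloom filter. It is an illustration only, not the API of Redis or of any Redis module: the class name, method names, sizes and sample IP addresses are all assumptions chosen for the example. The filter answers "have I seen this item?" from a fixed 128 bytes regardless of how many items are added, at the cost of occasional false positives.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical names, not a Redis API)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Memory use is fixed up front: num_bits / 8 bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Assumed sample data: client IPs pulled from log lines.
seen_ips = BloomFilter()
for ip in ["10.0.0.1", "10.0.0.2", "192.168.1.7"]:
    seen_ips.add(ip)
```

An item that was added is always reported as present. An item that was never added is usually reported as absent, but may occasionally collide with bits set by other items, which is the accuracy given up in exchange for constant memory.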

  1. Introduction
  2. Log Analysis
    1. Problem: Parsing high volume & velocity log event data.
    2. Various metrics to be measured.
  3. Redis 4.0
    1. New Features in Redis 4.0
    2. Using the new modules system for accessing these data structures
  4. Difference between accurate data structures and probabilistic data structures
    1. HyperLogLog - Cardinality of sets
    2. Top-K - Getting the top k items from a data set
    3. Count-Min Sketch - Get item counts
    4. Bloom Filters - Check for membership
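
As an illustration of one structure from the list above, the following is a minimal pure-Python sketch of a Count-Min Sketch for approximate item counts. It is not taken from the talk or from any Redis module: the class, its parameters (`width`, `depth`) and the sample HTTP status codes are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch (hypothetical names, not a Redis API)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        # A fixed depth x width grid of counters, independent of item count.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One salted hash per row picks the counter for this item.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest available estimate.
        return min(
            self.rows[row][self._index(row, item)]
            for row in range(self.depth)
        )


# Assumed sample data: HTTP status codes parsed from log events.
status_counts = CountMinSketch()
for status in ["200", "200", "404", "200", "500", "404"]:
    status_counts.add(status)
```

Calling `status_counts.estimate("200")` returns a value that is never below the true count of 3. It may be higher if another item happens to hash to the same counters in every row, which becomes less likely as `width` and `depth` grow.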

Speaker bio

I have been using Python professionally for more than 10 years and have worked with numerous startups, building their engineering platforms to solve problems at large scale. Currently I manage the entire engineering team at Mad Street Den and am responsible for building and scaling the platform on which its Computer Vision based Retail Automation products are built. The products we have built are used by millions of users every day all over the globe.

I am a regular speaker at PyCon India and gave talks in 2009, 2013 and 2016. Apart from speaking at other local meetups, I have also been on the editorial board for the Fifth Elephant Conference 2017, identifying speakers and helping them fine-tune their talks. I also occasionally contribute to a few open source projects and maintain a few of my own.

Slides

https://speakerdeck.com/cnu/probabilistic-data-structures

Comments

  • Kushal Das (@kushaldas) Reviewer a year ago

    Thank you for submitting the talk to PyCon Pune. The talk selection team will contact you here in case of any queries. Meanwhile, please make sure that you provide a link to the presentation slides.
