Important Production bugs and fixes for Storm and Kafka integration

Here I describe a few details of the Storm and Kafka integration modules, a few important bugs that you should be aware of, and how to overcome some of them (especially in production installations).

I use Apache Storm heavily in production installations, with Kafka as my main input source (Spout).

Storm integration modules with Kafka and versions:

Recently, I upgraded to Storm 1.0.3 (from 0.9.6) and to Kafka 0.9.0.1 (from 0.8.2.2).
Unfortunately, Storm 1.0.3 has 2 major bugs that you have to resolve before you can use it in a production environment.

Major bugs (related to Kafka):

  1. “New Kafka spout crashes if partitions are reassigned while tuples are in-flight” [JIRA-2104]
    This is fixed in the 1.0.x branch (Pull-1980)
  2. “Storm-kafka-client: Failed tuples are not always replayed” [JIRA-2087]
    This is fixed in the 1.x branch (Pull-1826)

I ran into the above bugs when I started the migration from Storm 0.9.6 to 1.0.3. When I stress-tested my topologies, various things stopped working, and I saw stalled Workers that had stopped processing data.
After reading many logs and running many tests, we finally traced the problem to the KafkaSpout bugs. We paused the migration and looked for a way to fix them.
Luckily, the Storm committers had already fixed these bugs, so a solution was already available.
A big thanks to the Storm community!!!!

To resolve these issues, I ported the two fixes to a forked version of “storm-kafka-client” and released the customized module under a new Maven version (1.0.3-<custom>1.0). Then I simply referenced the new custom version in my projects.
Afterwards, we ran the stress tests again and everything worked as expected.
Be aware that bug “2087” is fixed only in the 1.x branch, but it is very easy to port the fix to version 1.0.3.

Fortunately, Storm 1.1.0 was released a few days ago. This release already fixes these bugs and many others. I have not tested it yet, but I will try it soon.
Storm 1.1.0 had not been released yet when I backported these fixes to the 1.0.3 release line.

If you plan to stay on the Storm 1.0.3 release, you should be aware of a few additional bugs in this release that you may want to fix in your “custom” release:

  • “Kafka outage can lead to lockup of topology” [STORM-2440] [FIX]
  • “ReportErrorAndDie doesn’t always die” [STORM-2194] [FIX]
  • “Utils.sleep method doesn’t set interrupted flag after catching InterruptedException” [STORM-2396] [FIX]
  • “Event Logger bolt is instantiated even if topology.eventlogger.executors=0” [STORM-2389] [FIX]
  • “Fail-back Blob deletion also fails in BlobSynchronizer.syncBlobs” [STORM-2386] [FIX] (related to Nimbus HA)
  • “Storm-HDFS’s listFilesByModificationTime is broken” [STORM-2350] [FIX]
  • “Type mismatch in ReadClusterState’s ProfileAction processing Map” [STORM-2345] [FIX]
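
To illustrate the kind of problem named in STORM-2396, here is a minimal sketch in plain Java. It is not Storm's code and not the actual fix; the class and method names (InterruptAwareSleep, sleepSwallowingInterrupt, sleepPreservingInterrupt) are hypothetical and exist only for this example. It shows why a sleep helper that catches InterruptedException should restore the thread's interrupted flag, so that callers can still notice the interruption and shut down.

```java
// Hypothetical illustration only; this is NOT the code of Storm's Utils.sleep.
class InterruptAwareSleep {

    // Catches InterruptedException and does nothing else.
    // The JVM clears the interrupted flag when the exception is thrown,
    // so the caller can no longer tell that an interrupt happened.
    static void sleepSwallowingInterrupt(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Interruption is silently lost here.
        }
    }

    // Catches InterruptedException and sets the interrupted flag again,
    // so code further up the stack can react to the interruption.
    static void sleepPreservingInterrupt(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Interrupt the current thread, then sleep with the lossy helper.
        Thread.currentThread().interrupt();
        sleepSwallowingInterrupt(10);
        // Thread.interrupted() reads and clears the flag.
        System.out.println("swallowing: interrupted=" + Thread.interrupted()); // prints false

        // Same scenario with the helper that restores the flag.
        Thread.currentThread().interrupt();
        sleepPreservingInterrupt(10);
        System.out.println("preserving: interrupted=" + Thread.interrupted()); // prints true
    }
}
```

In a long-running worker loop, the first variant can keep a thread alive after it was asked to stop, because the loop never sees the interrupt. The second variant keeps that signal visible.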

Most of the above bugs (except 2440 & 2194) are already resolved in the Storm 1.1.0 release. The new release also contains new features that you might be interested in (Streaming SQL, Druid and OpenTSDB integration, and more).

Best regards,
Adrianos Dadis.

Real Democracy requires Free Software


Stream All the Things in Athens Big Data Meetup – 28 Feb 2017

I have some good news 🙂

I am very pleased to announce that Dean Wampler, Big Data Architect for fast data products at Lightbend, will speak about stream processing concepts, problems and solutions at our Athens Big Data Meetup!!!

The main subject is: Stream All the Things

Lightbend (formerly Typesafe) is the company behind Akka and Scala.

I am sure that Dean will give a great talk, as his experience is really amazing.

The event will be held at ALBA college at 19:00 (28/2/2017). The venue has around 110 seats, but there is space for people to stand as well (please RSVP).

See you there!!!

I am organizing the Athens Big Data Meetup along with Euangelos Linardos and Stavros Kontopoulos.
We are always looking for speakers for our meetups. If you would like to give a talk, please contact me.

Regards,
Adrianos Dadis.


Thoughts about Agile Greece Summit 2016

Last Friday, I was at Agile Greece Summit 2016.
It was a great conference with amazing speakers and very well organized.

Last year I attended Agile Greece Summit 2015, which was great, but I believe this year's conference was better in many respects.
This year there were 400+ attendees compared with about 200 last year, which suggests that companies are starting to invest in Agile!!!
The speakers were well balanced, and there were 2 main tracks well aligned with the sessions.

In my opinion, the most valuable sessions this year were:

  • “Managing for Happiness” (by Jurgen Appelo)
  • “15 Teams, 1 Continuous Delivery Pipeline” (by Fredrik Wendt)
  • “Spotify Running: Lessons learned from building a ‘Lean Startup’ inside a big tech company” (by Brendan Marsh)
  • “Improving Agility: Learning from Maersk Line Journey” (by Özlem Yüce)

The other sessions were great too, but these 4 were fantastic 🙂

Here are 16 Agile rules that help me every day, all inspired by the Agile Greece Summit conferences (2015 & 2016):

Agile Advices

There is much more to say about Agile, and there are many good books and people that can help you further.

Agile coaching is fundamental for teams or companies that are new to the Agile world.

Many thanks to all speakers, organizers and volunteers!!!

Best regards,
Adrianos Dadis.


Big Data Meetup 2016 – Apache Storm – Slides and Demo code

Hi all,

Last Tuesday I talked about stream processing using Apache Storm and Apache Kafka at the 4th Athens Big Data Meetup.

I really enjoyed the talk. The audience was really engaged, and their questions helped me focus on and explain various tricky points of stream processing. Many thanks to all, and especially to Konstantinos for his kind invitation and help.

In addition to our talk, there was an interesting academic presentation by Dr. Nikolaos Papahristou about Artificial Intelligence agents for backgammon (and related games). It had great ideas that made me rethink AI from a Big Data perspective. Combining AI agents with Big Data techniques and frameworks seems to fit some interesting use cases that are worth investing some time in.

The demo code is available on GitHub: sentiment-analysis-storm

The presentation slides are available on Slideshare.


Regards,
Adrianos Dadis.


Speaking about Apache Storm and Apache Kafka at Big Data Meetup 24 May

Hello again.

Patroclos Christou and I will speak about Big Data stream processing using Apache Storm and Apache Kafka on Tuesday 24 May at the 4th Big Data Meetup.

I will be glad to meet you there 🙂

Adrianos Dadis.
