Storm event processor – GC log file per worker

For the last three months, I have been working with a new team building a Big Data analytics product for the telecom domain.

The Storm event processor is one of the main frameworks we use, and it is really great. You can read more details in its official documentation (which has been improved).

Storm uses Workers to do your job. Each Worker is a single JVM that Storm manages internally (starting it, restarting it if it becomes unresponsive, moving it to another node of the cluster, etc.). A single job can run many Workers on your cluster, and Storm decides how to distribute them across the cluster nodes. By “node” I mean a running OS, whether it runs on a VM or on a physical machine.

The tricky point here is that all Workers on a node read the same configuration file (STORM_HOME/conf/storm.yaml), even if they are running different kinds of jobs. Additionally, there is a single parameter in this file (worker.childopts) that all Workers on the same node use to initialize their JVMs (that is, to set their JVM options).

Since we want to know how GC performs in each Worker, we need to monitor the GC log of each Worker/JVM.

As I said, the problem is that all Workers on a node read the same parameter from the same configuration file to initialize their JVMs, so it is not trivial to use a different GC log file for each Worker/JVM.

Fortunately, the Storm developers have exposed a “variable” that solves this problem. This variable is named “ID” and it is unique for each Worker on a given node (the same Worker ID can exist on different nodes).

For the Worker JVM options, we use this entry in our “storm.yaml” file:

worker.childopts: "-Xmx1024m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/opt/storm/logs/gc-storm-worker-%ID%.log"

Be aware that you have to add “%” before and after the “ID” string (so that Storm recognizes it as an internal variable).
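To make the effect of the placeholder concrete, here is a minimal Python sketch of the substitution. It is only an illustration and not Storm's actual code: the function names “expand_childopts” and “gc_log_path” are hypothetical, and the IDs 6700 and 6701 are made-up sample values.

```python
# Illustration only: mimics replacing the %ID% placeholder in worker.childopts.
# expand_childopts, gc_log_path and the sample IDs are hypothetical,
# they are not part of Storm.

CHILDOPTS = (
    "-Xmx1024m -XX:MaxPermSize=256m -XX:+PrintGCDetails "
    "-XX:+PrintGCTimeStamps -verbose:gc "
    "-Xloggc:/opt/storm/logs/gc-storm-worker-%ID%.log"
)


def expand_childopts(template, worker_id):
    """Return the JVM options with every %ID% replaced by worker_id."""
    return template.replace("%ID%", str(worker_id))


def gc_log_path(jvm_opts):
    """Extract the path given to -Xloggc, or None if the flag is absent."""
    for opt in jvm_opts.split():
        if opt.startswith("-Xloggc:"):
            return opt[len("-Xloggc:"):]
    return None


if __name__ == "__main__":
    # Two Workers on the same node end up with two different GC log files.
    for worker_id in (6700, 6701):
        print(gc_log_path(expand_childopts(CHILDOPTS, worker_id)))
```

The point is simply that the same line in “storm.yaml” yields a different “-Xloggc” path for each Worker, so the JVMs on one node do not write to the same file.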

Additionally, for the Supervisor JVM options (one process on each node), we use this entry in our “storm.yaml” file:

supervisor.childopts: "-Xmx512m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/opt/storm/logs/gc-storm-supervisor.log"

I have also included some memory settings (“-Xmx” and “-XX:MaxPermSize”), but these are just example values.
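Once every Worker writes its own file, the logs can be compared side by side. The following Python sketch sums the GC pause time per log file. It rests on assumptions that may not hold for your collector or JDK: each collection is reported on a single line, and the total pause is the last “<seconds> secs]” value before the optional “[Times: ...]” part. The function names and the default glob pattern (taken from the “-Xloggc” path above) are hypothetical.

```python
# Sketch: total GC pause time per Worker log file.
# Assumes one collection per line in the -XX:+PrintGCDetails layout, where the
# total pause is the last "<seconds> secs]" before the optional "[Times: ...]".
import glob
import re

PAUSE = re.compile(r"(\d+\.\d+) secs\]")


def pause_seconds(line):
    """Return the total pause reported on one GC log line, or None."""
    body = line.split("[Times:")[0]  # drop the user/sys/real timings
    matches = PAUSE.findall(body)
    return float(matches[-1]) if matches else None


def summarize(lines):
    """Return (number of collections, total pause in seconds)."""
    pauses = [p for p in map(pause_seconds, lines) if p is not None]
    return len(pauses), sum(pauses)


def summarize_logs(pattern="/opt/storm/logs/gc-storm-worker-*.log"):
    """Map each matching log file to its (count, total pause) summary."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            result[path] = summarize(handle)
    return result


if __name__ == "__main__":
    for path, (count, total) in summarize_logs().items():
        print("%s: %d collections, %.3f s paused" % (path, count, total))
```

Lines that do not match the expected layout are skipped, so check the result against a real log from your own Workers before relying on the numbers.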

Please keep in mind that Storm requires Oracle HotSpot JDK 6 (JDK 7 and 8 are not yet supported). This is a significant drawback, but we hope it will be fixed soon.

Hope it helps,
Adrianos Dadis.

Democracy Requires Free Software

About Adrianos Dadis

Adrianos works as a senior software engineer in the telecom business domain. He is particularly interested in distributed systems, enterprise integration and middleware services. He mainly works with JBoss, Weblogic, Apache Storm, Apache HBase, Tomcat, Java EE, Spring, Drools, Oracle SOA Suite and various ESBs.