For the last three months, I have been working with a new team building a Big Data analytics product for the telecom domain.
The Storm event processor is one of the main frameworks we use, and it is really great. You can read more details in its official documentation (which has been improved).
Storm uses Workers to run your job. Each Worker is a single JVM and is managed internally by Storm (start, restart if unresponsive, move the Worker to another node of the cluster, etc.). For a single job you can run many Workers on your cluster (Storm decides how to distribute them across the cluster nodes). By “node” I mean a running OS, either on a VM or on a physical machine.
The tricky point here is that all Workers on a node read the same configuration file (STORM_HOME/conf/storm.yaml), even if they are processing different kinds of jobs. Additionally, a single parameter in this file (worker.childopts) is used by all Workers on the same node to set their JVM options.
Since we want to know how GC performs in each Worker, we need to monitor the GC log of each Worker/JVM.
As I said, the problem is that all Workers on a node read the same parameter from the same configuration file to initialize their JVMs, so it is not trivial to use a different GC log file for each Worker/JVM.
Fortunately, the Storm developers have exposed a “variable” that solves this problem. This variable is named “ID”, and it is unique for each Worker on a node (the same Worker ID could exist on different nodes).
For the Workers' JVM options, we use this entry in our “storm.yaml” file:
worker.childopts: "-Xmx1024m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/opt/storm/logs/gc-storm-worker-%ID%.log"
Be aware that you have to add “%” before and after the “ID” string (so that it is recognized as an internal Storm variable).
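To make the effect of the placeholder concrete, here is a minimal Python sketch of what the expansion amounts to. It is not Storm's own code: it assumes the substitution is a plain text replacement of “%ID%”, and the worker IDs “6700” and “6701” are hypothetical values chosen only for illustration.

```python
# Illustration only: this mimics a plain text substitution of %ID%.
# It is NOT Storm's implementation, and the worker IDs are hypothetical.

WORKER_CHILDOPTS = (
    "-Xmx1024m -XX:MaxPermSize=256m -XX:+PrintGCDetails "
    "-XX:+PrintGCTimeStamps -verbose:gc "
    "-Xloggc:/opt/storm/logs/gc-storm-worker-%ID%.log"
)


def expand_childopts(childopts, worker_id):
    """Replace every %ID% with the worker's ID and split into JVM arguments."""
    return childopts.replace("%ID%", worker_id).split()


def gc_log_path(jvm_args):
    """Return the file given to -Xloggc, or None if the option is absent."""
    prefix = "-Xloggc:"
    for arg in jvm_args:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


if __name__ == "__main__":
    # Two workers on the same node share one config line,
    # yet each ends up with its own GC log file.
    for worker_id in ("6700", "6701"):
        print(worker_id, gc_log_path(expand_childopts(WORKER_CHILDOPTS, worker_id)))
```

The point of the sketch is only that one shared configuration line can yield a distinct log file per Worker once the placeholder is filled in.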
Additionally, for the Supervisor JVM options (one process on each node), we use this entry in our “storm.yaml” file:
supervisor.childopts: "-Xmx512m -XX:MaxPermSize=256m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/opt/storm/logs/gc-storm-supervisor.log"
I have also included some memory settings (“-Xmx” and “-XX:MaxPermSize”), but they are just examples.
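Once the Workers are running, each one writes its own GC log, so a small helper can collect them per Worker for monitoring. The following Python sketch assumes the file naming used in the entries above (“gc-storm-worker-<ID>.log” under /opt/storm/logs); the function name worker_gc_logs is hypothetical and not part of Storm.

```python
import glob
import os
import re

# Matches the file names produced by the worker.childopts entry above.
WORKER_LOG_PATTERN = re.compile(r"^gc-storm-worker-(.+)\.log$")


def worker_gc_logs(log_dir):
    """Map each worker ID to the path of its GC log inside log_dir."""
    logs = {}
    for path in glob.glob(os.path.join(log_dir, "gc-storm-worker-*.log")):
        match = WORKER_LOG_PATTERN.match(os.path.basename(path))
        if match:
            logs[match.group(1)] = path
    return logs


if __name__ == "__main__":
    # The directory is the one used in this post; adjust it to your setup.
    for worker_id, path in sorted(worker_gc_logs("/opt/storm/logs").items()):
        print(worker_id, path, os.path.getsize(path), "bytes")
```

The Supervisor log (gc-storm-supervisor.log) does not match the worker pattern, so it is left out of the result.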
Please keep in mind that Storm requires Oracle Hotspot JDK 6 (JDK 7 and 8 are not yet supported). This is a serious drawback, but we hope it will be fixed soon.
Hope it helps.