Fixing NodeManager crashes caused by OOM
Original blog post
**hackershell**
After we switched from JDK1.6_25 to JDK1.7_45, NodeManagers (NM) in the cluster started dying frequently. The NM log showed the following:
2015-08-12 16:35:06,662 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,10,system] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.lang.UNIXProcess$ProcessPipeInputStream.drainInputStream(UNIXProcess.java:267)
    at java.lang.UNIXProcess$ProcessPipeInputStream.processExited(UNIXProcess.java:280)
    at java.lang.UNIXProcess.processExited(UNIXProcess.java:187)
    at java.lang.UNIXProcess$3.run(UNIXProcess.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
and a similar error:
2015-08-12 16:37:56,893 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,10,system] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Java heap space
    at java.lang.UNIXProcess$ProcessPipeInputStream.drainInputStream(UNIXProcess.java:267)
    at java.lang.UNIXProcess$ProcessPipeInputStream.processExited(UNIXProcess.java:280)
    at java.lang.UNIXProcess.processExited(UNIXProcess.java:187)
    at java.lang.UNIXProcess$3.run(UNIXProcess.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Searching Google for the keywords hadoop UNIXProcess drainInputStream turned up several JDK 7 bugs that cause an OOM when the NM is under heavy load. See HADOOP-10146 for details,
along with these related explanations:
JDK-8027348
JDK-8024521
After we later switched to JDK1.7_67, the OOM problem no longer occurred.
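
To check whether other nodes died for the same reason, one option is to scan their NM logs for the signature shown above. The following is only a minimal sketch: the function name `find_reaper_oom` is hypothetical, and it assumes the log layout matches the excerpts in this post, with the `OutOfMemoryError` on the FATAL line itself or on the line right after it.

```python
import re

# Header of the FATAL entry, as it appears in the log excerpts above.
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) FATAL "
    r"org\.apache\.hadoop\.yarn\.YarnUncaughtExceptionHandler: "
    r"Thread Thread\[process reaper,"
)
# The error itself; the reason is whatever follows on the same line.
OOM = re.compile(r"java\.lang\.OutOfMemoryError: (?P<reason>.+)")


def find_reaper_oom(lines):
    """Return (timestamp, reason) for each 'process reaper' OOM entry.

    Assumption: the OutOfMemoryError is on the FATAL line or the next line.
    """
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        header = HEADER.match(line)
        if not header:
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        oom = OOM.search(line + "\n" + next_line)
        if oom:
            hits.append((header.group("ts"), oom.group("reason").strip()))
    return hits
```

It can be fed any iterable of lines, for example an open log file (`with open(path) as f: find_reaper_oom(f)`, where `path` is whatever NM log location your cluster uses). A match only shows that the node hit the same symptom; it does not by itself confirm the JDK bug as the cause.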