Datastax Spark jobs killed for no apparent reason

We are using DSE Spark on a 3-node cluster running 5 jobs. We see SIGTERM signals being logged in /var/log/spark/worker/worker-0/worker.log, and they stop our jobs. We don't see any memory or CPU constraints at those times, and no one is sending these signals manually.
I have seen a few similar issues with YARN or Mesos that turned out to be heap size problems, but since we are using DSE, those didn't seem relevant.
ERROR [SIGTERM handler] 2016-03-26 00:43:28,780 SignalLogger.scala:57 - RECEIVED SIGNAL 15: SIGTERM
ERROR [SIGHUP handler] 2016-03-26 00:43:28,788 SignalLogger.scala:57 - RECEIVED SIGNAL 1: SIGHUP
INFO [Spark Shutdown Hook] 2016-03-26 00:43:28,795 Logging.scala:59 - Killing process!
ERROR [File appending thread for /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stderr] 2016-03-26 00:43:28,848 Logging.scala:96 - Error writing stream to file /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stderr
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[na:1.8.0_71]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:283) ~[na:1.8.0_71]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_71]
at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[na:1.8.0_71]
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) ~[spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
ERROR [File appending thread for /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout] 2016-03-26 00:43:28,892 Logging.scala:96 - Error writing stream to file /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[na:1.8.0_71]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:283) ~[na:1.8.0_71]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_71]
at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[na:1.8.0_71]
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) ~[spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) [spark-core_2.10-1.4.1.3.jar:1.4.1.3]
ERROR [SIGTERM handler] 2016-03-26 00:43:29,070 SignalLogger.scala:57 - RECEIVED SIGNAL 15: SIGTERM
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,079 Logging.scala:59 - Disassociated [akka.tcp://[email protected]:44131] -> [akka.tcp://[email protected]:7077] Disassociated !
ERROR [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,080 Logging.scala:75 - Connection to master failed! Waiting for master to reconnect...
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,081 Logging.scala:59 - Connecting to master akka.tcp://[email protected]:7077/user/Master...
WARN [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,091 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,101 Logging.scala:59 - Disassociated [akka.tcp://[email protected]:44131] -> [akka.tcp://[email protected]:7077] Disassociated !
ERROR [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,102 Logging.scala:75 - Connection to master failed! Waiting for master to reconnect...
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,102 Logging.scala:59 - Not spawning another attempt to register with the master, since there is an attempt scheduled already.
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,323 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:49943] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,330 Logging.scala:59 - Executor app-20160325132151-0004/0 finished with state EXITED message Command exited with code 129 exitStatus 129
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,414 Logging.scala:59 - Killing process!
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,415 Logging.scala:59 - Executor app-20160325131848-0001/0 finished with state EXITED message Command exited with code 129 exitStatus 129
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,417 Logging.scala:59 - Killing process!
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,422 Logging.scala:59 - Unknown Executor app-20160325132151-0004/0 finished with state EXITED message Worker shutting down exitStatus 129
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,425 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:32874] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,433 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:56212] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,441 Logging.scala:59 - Executor app-20160325131918-0002/1 finished with state EXITED message Command exited with code 129 exitStatus 129
INFO [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,448 Logging.scala:59 - Unknown Executor app-20160325131918-0002/1 finished with state EXITED message Worker shutting down exitStatus 129
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,448 Logging.scala:59 - Shutdown hook called
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,449 Logging.scala:59 - Deleting directory /var/lib/spark/rdd/spark-28fa2f73-d2aa-44c0-ad4e-3ccfd07a95d2
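A useful detail in the log above: at 00:43:28 the worker receives both SIGTERM (15) and SIGHUP (1), and the executors then finish with exitStatus 129. Under the usual POSIX convention, an exit status above 128 means the process was terminated by signal (status - 128), so 129 is consistent with signal 1, SIGHUP, rather than SIGTERM. The sketch below pulls the "RECEIVED SIGNAL" and "Executor ... exitStatus" entries out of worker.log and decodes the exit statuses, so the timestamps can be compared across the three nodes and against other system events. It is a minimal sketch under stated assumptions: the regular expressions assume the line layout shown in the log above, the default path is the one from the question, and the function names are hypothetical. The `signal.Signals` lookup assumes a Linux/POSIX host.

```python
import os
import re
import signal
import sys
from typing import Iterable, Iterator, Optional, Tuple

# Default location taken from the question; pass another path as argv[1].
DEFAULT_LOG = "/var/log/spark/worker/worker-0/worker.log"

# Assumed layout: "LEVEL [thread name] YYYY-MM-DD HH:MM:SS,mmm Source - message"
PREFIX = r"\s*(\w+) \[[^\]]*\] (\S+ \S+) "
SIGNAL_RE = re.compile(PREFIX + r".*RECEIVED SIGNAL (\d+): (\w+)")
EXIT_RE = re.compile(PREFIX + r".*Executor (\S+) finished with state (\w+) .*exitStatus (\d+)")


def signal_name_from_exit_status(status: int) -> Optional[str]:
    """Map a shell-style exit status (128 + N) to the name of signal N, if any."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:  # no signal with that number on this platform
        return None


def find_signals(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (timestamp, signal_number, signal_name) for each 'RECEIVED SIGNAL' line."""
    for line in lines:
        m = SIGNAL_RE.match(line)
        if m:
            yield m.group(2), int(m.group(3)), m.group(4)


def find_executor_exits(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, Optional[str]]]:
    """Yield (timestamp, executor_id, exit_status, decoded_signal) for executor exit lines."""
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            status = int(m.group(5))
            yield m.group(2), m.group(3), status, signal_name_from_exit_status(status)


def main(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for ts, num, name in find_signals(lines):
        print(f"{ts}  received signal {num} ({name})")
    for ts, executor, status, sig in find_executor_exits(lines):
        print(f"{ts}  executor {executor} exitStatus {status} -> {sig or 'not a signal'}")


if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG
    if os.path.exists(log_path):
        main(log_path)
    else:
        print(f"log file not found: {log_path}")
```

Running it on each node (for example `python3 scan_worker_log.py /var/log/spark/worker/worker-0/worker.log`, where the script name is hypothetical) lists when the signals arrived and which executors exited with a signal-derived status, which makes it easier to check whether all three workers were signalled at the same moment.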