2016-03-28

Why are our Datastax Spark jobs being killed?

We are using DSE Spark with a 3-node cluster running 5 jobs. We are seeing SIGTERM commands arrive in /var/log/spark/worker/worker-0/worker.log that stop our jobs. At these times we see no memory or CPU pressure, and nobody is issuing these calls manually.

I have seen a few similar issues that came down to a heap-size problem with YARN or Mesos, but since we are using DSE those did not seem relevant.

ERROR [SIGTERM handler] 2016-03-26 00:43:28,780 SignalLogger.scala:57 - RECEIVED SIGNAL 15: SIGTERM 
ERROR [SIGHUP handler] 2016-03-26 00:43:28,788 SignalLogger.scala:57 - RECEIVED SIGNAL 1: SIGHUP 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:28,795 Logging.scala:59 - Killing process! 
ERROR [File appending thread for /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stderr] 2016-03-26 00:43:28,848 Logging.scala:96 - Error writing stream to file /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stderr 
java.io.IOException: Stream closed 
     at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read1(BufferedInputStream.java:283) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_71] 
     at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[na:1.8.0_71] 
     at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) ~[spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
ERROR [File appending thread for /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout] 2016-03-26 00:43:28,892 Logging.scala:96 - Error writing stream to file /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout 
java.io.IOException: Stream closed 
     at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read1(BufferedInputStream.java:283) ~[na:1.8.0_71] 
     at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_71] 
     at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[na:1.8.0_71] 
     at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70) ~[spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
     at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38) [spark-core_2.10-1.4.1.3.jar:1.4.1.3] 
ERROR [SIGTERM handler] 2016-03-26 00:43:29,070 SignalLogger.scala:57 - RECEIVED SIGNAL 15: SIGTERM 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,079 Logging.scala:59 - Disassociated [akka.tcp://[email protected]:44131] -> [akka.tcp://[email protected]:7077] Disassociated ! 
ERROR [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,080 Logging.scala:75 - Connection to master failed! Waiting for master to reconnect... 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,081 Logging.scala:59 - Connecting to master akka.tcp://[email protected]:7077/user/Master... 
WARN [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,091 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,101 Logging.scala:59 - Disassociated [akka.tcp://[email protected]:44131] -> [akka.tcp://[email protected]:7077] Disassociated ! 
ERROR [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,102 Logging.scala:75 - Connection to master failed! Waiting for master to reconnect... 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,102 Logging.scala:59 - Not spawning another attempt to register with the master, since there is an attempt scheduled already. 
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,323 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:49943] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,330 Logging.scala:59 - Executor app-20160325132151-0004/0 finished with state EXITED message Command exited with code 129 exitStatus 129 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,414 Logging.scala:59 - Killing process! 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,415 Logging.scala:59 - Executor app-20160325131848-0001/0 finished with state EXITED message Command exited with code 129 exitStatus 129 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,417 Logging.scala:59 - Killing process! 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,422 Logging.scala:59 - Unknown Executor app-20160325132151-0004/0 finished with state EXITED message Worker shutting down exitStatus 129 
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,425 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:32874] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
WARN [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,433 Slf4jLogger.scala:71 - Association with remote system [akka.tcp://[email protected]:56212] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
INFO [sparkWorker-akka.actor.default-dispatcher-3] 2016-03-26 00:43:29,441 Logging.scala:59 - Executor app-20160325131918-0002/1 finished with state EXITED message Command exited with code 129 exitStatus 129 
INFO [sparkWorker-akka.actor.default-dispatcher-4] 2016-03-26 00:43:29,448 Logging.scala:59 - Unknown Executor app-20160325131918-0002/1 finished with state EXITED message Worker shutting down exitStatus 129 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,448 Logging.scala:59 - Shutdown hook called 
INFO [Spark Shutdown Hook] 2016-03-26 00:43:29,449 Logging.scala:59 - Deleting directory /var/lib/spark/rdd/spark-28fa2f73-d2aa-44c0-ad4e-3ccfd07a95d2 
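As an aside on reading the log above: the "exited with code 129" lines and the SIGTERM/SIGHUP lines are two views of the same event. In POSIX shells, an exit status above 128 conventionally encodes death by signal (status minus 128), so 129 means signal 1, i.e. SIGHUP. A quick sketch:

```shell
# An exit status above 128 means "terminated by signal (status - 128)".
# 129 = 128 + 1, and signal 1 is SIGHUP -- matching the SIGHUP in the worker log.
sh -c 'kill -HUP $$'   # child sends SIGHUP to itself and dies
echo $?                # 129
```

This is why the executors report exitStatus 129 right after the worker logs RECEIVED SIGNAL 1: SIGHUP.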

Answer

The log above is a sample from one server that was running 2 of the jobs. The error writing the stream to /var/lib/spark/worker/worker-0/app-20160325131848-0001/0/stdout (java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)) looks straightforward to me:

There is a network problem between your data source (Cassandra) and Spark. Remember that Spark on node1 can, and does, pull data from Cassandra on node2, although it tries to minimize this. Alternatively, your serialization may be the problem; to switch to Kryo, add this parameter to your Spark configuration:

spark.serializer=org.apache.spark.serializer.KryoSerializer
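A minimal sketch of where this setting can live, assuming a standard spark-defaults.conf; the buffer setting is an illustrative addition beyond the original answer, not something it prescribes:

```
# conf/spark-defaults.conf
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# Optional: raise the Kryo buffer cap if large objects fail to serialize
spark.kryoserializer.buffer.max  256m
```

The same setting can also be passed per job, e.g. `--conf spark.serializer=org.apache.spark.serializer.KryoSerializer` on `spark-submit` or `dse spark-submit`.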