Python betikleri ile Hadoop Akışı üzerinde bir Harita Azaltma işi çalıştırmaya ve Hadoop Streaming Job failed error in python ile aynı hataları almaya çalışıyorum ancak bu çözümler benim için işe yaramadı.Hadoop Akış İşi Python'da Başarısız (Başarılı Değil)
./bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
-input "p1input/*" \
-output p1output \
-mapper "python p1mapper.py" \
-reducer "python p1reducer.py" \
-file /Users/Tish/Desktop/HW1/p1mapper.py \
-file /Users/Tish/Desktop/HW1/p1reducer.py
(:
Ama çalıştırdığınızda aşağıdaki Ben "| ./p1mapper.py | | sıralama ./p1reducer.py kedi örnek.txt" çalıştırdığınızda
Komut dosyalarım iyi çalışır Not:
packageJobJar: [/Users/Tish/Desktop/HW1/p1mapper.py, /Users/Tish/Desktop/CS246/HW1/p1reducer.py, /Users/Tish/Documents/workspace/hadoop-0.20.2/tmp/hadoop-unjar4363616744311424878/] [] /var/folders/Mk/MkDxFxURFZmLg+gkCGdO9U+++TM/-Tmp-/streamjob3714058030803466665.jar tmpDir=null
11/01/18 03:02:52 INFO mapred.FileInputFormat: Total input paths to process : 1
11/01/18 03:02:52 INFO streaming.StreamJob: getLocalDirs(): [tmp/mapred/local]
11/01/18 03:02:52 INFO streaming.StreamJob: Running job: job_201101180237_0005
11/01/18 03:02:52 INFO streaming.StreamJob: To kill this job, run:
11/01/18 03:02:52 INFO streaming.StreamJob: /Users/Tish/Documents/workspace/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201101180237_0005
11/01/18 03:02:52 INFO streaming.StreamJob: Tracking URL: http://www.glassdoor.com:50030/jobdetails.jsp?jobid=job_201101180237_0005
11/01/18 03:02:53 INFO streaming.StreamJob: map 0% reduce 0%
11/01/18 03:03:05 INFO streaming.StreamJob: map 100% reduce 0%
11/01/18 03:03:44 INFO streaming.StreamJob: map 50% reduce 0%
11/01/18 03:03:47 INFO streaming.StreamJob: map 100% reduce 100%
11/01/18 03:03:47 INFO streaming.StreamJob: To kill this job, run:
11/01/18 03:03:47 INFO streaming.StreamJob: /Users/Tish/Documents/workspace/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201101180237_0005
11/01/18 03:03:47 INFO streaming.StreamJob: Tracking URL: http://www.glassdoor.com:50030/jobdetails.jsp?jobid=job_201101180237_0005
11/01/18 03:03:47 ERROR streaming.StreamJob: Job not Successful!
11/01/18 03:03:47 INFO streaming.StreamJob: killJob...
Streaming Job Failed!
: Ben "piton" kaldırmak veya -mapper ve -reducer için tam yol adını yazın bile, sonuç
En fazla bu çıkışı) aynıdır Her Başarısız/Öldürülen Görev Girişimi İçin
: p1mapper.py
#!/usr/bin/env python
from operator import itemgetter
import sys
word2count = {}
for line in sys.stdin:
word, count = line.split(' ', 1)
try:
count = int(count)
word2count[word] = word2count.get(word, 0) + count
except ValueError: # count was not a number
pass
# sort
sorted_word2count = sorted(word2count.items(), key=itemgetter(1), reverse=True)
# write the top 3 sequences
for word, count in sorted_word2count[0:3]:
print '%s\t%s'% (word, count)
herhangi bir yardım takdir gerçekten misiniz
#!/usr/bin/env python
import sys
import re
SEQ_LEN = 4
eos = re.compile('(?<=[a-zA-Z])\.') # period preceded by an alphabet
ignore = re.compile('[\W\d]')
for line in sys.stdin:
array = re.split(eos, line)
for sent in array:
sent = ignore.sub('', sent)
sent = sent.lower()
if len(sent) >= SEQ_LEN:
for i in range(len(sent)-SEQ_LEN + 1):
print '%s 1' % sent[i:i+SEQ_LEN]
p1reducer.py:
Map output lost, rescheduling: getMapOutput(attempt_201101181225_0001_m_000000_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201101181225_0001/attempt_201101181225_0001_m_000000_0/output/file.out.index in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
İşte benim Python komut dosyaları Teşekkürler!
GÜNCELLEME:
HDF'ler-site.xml dosyasını:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml dosyasını:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>
için sıkışmış ve mapred bu noktada oldu -site.xml yapılandırmaları lütfen. –
Yukarıda yapıştırıldı. Çok teşekkürler Joe! – sirentian