Hive - Weird behaviour with spark-submit
I am running the following code in the pyspark shell:
In [14]: conf = SparkConf()

In [15]: conf.getAll()
[(u'spark.eventLog.enabled', u'true'),
 (u'spark.eventLog.dir', u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'),
 (u'spark.master', u'local[*]'),
 (u'spark.yarn.historyServer.address', u'http://ip-10-0-0-220.ec2.internal:18088'),
 (u'spark.executor.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'),
 (u'spark.app.name', u'pyspark-shell'),
 (u'spark.driver.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native')]

In [16]: sc
<pyspark.context.SparkContext at 0x7fab9dd8a750>

In [17]: sc.version
u'1.4.0'

In [19]: sqlContext
<pyspark.sql.context.HiveContext at 0x7fab9de785d0>

In [20]: access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")
and it runs smoothly (I can create tables in the Hive metastore, etc.). But when I try to run the following code with spark-submit:
# -*- coding: utf-8 -*-
from __future__ import print_function

import re

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import Row
from pyspark.conf import SparkConf

if __name__ == "__main__":
    sc = SparkContext(appName="Minimal example 2")
    conf = SparkConf()
    print(conf.getAll())
    print(sc)
    print(sc.version)

    sqlContext = HiveContext(sc)
    print(sqlContext)

    # ## Read the access log file
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")

    sc.stop()
I run the code with:
$ spark-submit --master yarn-cluster --deploy-mode cluster minimal-example2.py
and it apparently runs without error. But if I check the logs with:
$ yarn logs -applicationId application_1435696841856_0027
they read:
15/07/01 16:55:10 info client.rmproxy: connecting resourcemanager @ ip-10-0-0-220.ec2.internal/10.0.0.220:8032 container: container_1435696841856_0027_01_000001 on ip-10-0-0-36.ec2.internal_8041 ===================================================================================== logtype: stderr loglength: 21077 log contents: slf4j: class path contains multiple slf4j bindings. slf4j: found binding in [jar:file:/yarn/nm/usercache/nanounanue/filecache/133/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/staticloggerbinder.class] slf4j: found binding in [jar:file:/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/staticloggerbinder.class] slf4j: see http://www.slf4j.org/codes.html#multiple_bindings explanation. slf4j: actual binding of type [org.slf4j.impl.log4jloggerfactory] 15/07/01 16:54:00 info yarn.applicationmaster: registered signal handlers [term, hup, int] 15/07/01 16:54:01 info yarn.applicationmaster: applicationattemptid: appattempt_1435696841856_0027_000001 15/07/01 16:54:02 info spark.securitymanager: changing view acls to: yarn,nanounanue 15/07/01 16:54:02 info spark.securitymanager: changing modify acls to: yarn,nanounanue 15/07/01 16:54:02 info spark.securitymanager: securitymanager: authentication disabled; ui acls disabled; users view permissions: set(yarn, nanounanue); users modify permissions: set(yarn, nanounanue) 15/07/01 16:54:02 info yarn.applicationmaster: starting user application in separate thread 15/07/01 16:54:02 info yarn.applicationmaster: waiting spark context initialization 15/07/01 16:54:02 info yarn.applicationmaster: waiting spark context initialization ... 15/07/01 16:54:03 info spark.sparkcontext: running spark version 1.4.0 15/07/01 16:54:03 info spark.securitymanager: changing view acls to: yarn,nanounanue 15/07/01 16:54:03 info spark.securitymanager: changing modify acls to: yarn,nanounanue 15/07/01 16:54:03 info spark.securitymanager: securitymanager: authentication disabled; ui acls disabled; users view permissions: set(yarn, nanounanue); users modify permissions: set(yarn, nanounanue) 15/07/01 16:54:03 info slf4j.slf4jlogger: slf4jlogger started 15/07/01 16:54:03 info remoting: starting remoting 15/07/01 16:54:03 info remoting: remoting started; listening on addresses :[akka.tcp://sparkdriver@10.0.0.36:41190] 15/07/01 16:54:03 info util.utils: started service 'sparkdriver' on port 41190. 15/07/01 16:54:04 info spark.sparkenv: registering mapoutputtracker 15/07/01 16:54:04 info spark.sparkenv: registering blockmanagermaster 15/07/01 16:54:04 info storage.diskblockmanager: created local directory @ /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-14127054-19b1-4cfe-80c3-2c5fc917c9cf 15/07/01 16:54:04 info storage.diskblockmanager: created local directory @ /data0/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-c8119846-7f6f-45eb-911b-443cb4d7e9c9 15/07/01 16:54:04 info storage.memorystore: memorystore started capacity 245.7 mb 15/07/01 16:54:04 info spark.httpfileserver: http file server directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/httpd-c4abf72b-2ee4-45d7-8252-c68f925bef58 15/07/01 16:54:04 info spark.httpserver: starting http server 15/07/01 16:54:04 info server.server: jetty-8.y.z-snapshot 15/07/01 16:54:04 info server.abstractconnector: started socketconnector@0.0.0.0:56437 15/07/01 16:54:04 info util.utils: started service 'http file server' on port 56437. 
15/07/01 16:54:04 info spark.sparkenv: registering outputcommitcoordinator 15/07/01 16:54:04 info ui.jettyutils: adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.amipfilter 15/07/01 16:54:04 info server.server: jetty-8.y.z-snapshot 15/07/01 16:54:04 info server.abstractconnector: started selectchannelconnector@0.0.0.0:37958 15/07/01 16:54:04 info util.utils: started service 'sparkui' on port 37958. 15/07/01 16:54:04 info ui.sparkui: started sparkui @ http://10.0.0.36:37958 15/07/01 16:54:04 info cluster.yarnclusterscheduler: created yarnclusterscheduler 15/07/01 16:54:04 info util.utils: started service 'org.apache.spark.network.netty.nettyblocktransferservice' on port 49759. 15/07/01 16:54:04 info netty.nettyblocktransferservice: server created on 49759 15/07/01 16:54:05 info storage.blockmanagermaster: trying register blockmanager 15/07/01 16:54:05 info storage.blockmanagermasterendpoint: registering block manager 10.0.0.36:49759 245.7 mb ram, blockmanagerid(driver, 10.0.0.36, 49759) 15/07/01 16:54:05 info storage.blockmanagermaster: registered blockmanager 15/07/01 16:54:05 info scheduler.eventlogginglistener: logging events hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationhistory/application_1435696841856_0027_1 15/07/01 16:54:05 info cluster.yarnschedulerbackend$yarnschedulerendpoint: applicationmaster registered akkarpcendpointref(actor[akka://sparkdriver/user/yarnam#-1566924249]) 15/07/01 16:54:05 info client.rmproxy: connecting resourcemanager @ ip-10-0-0-220.ec2.internal/10.0.0.220:8030 15/07/01 16:54:05 info yarn.yarnrmclient: registering applicationmaster 15/07/01 16:54:05 info yarn.yarnallocator: request 2 executor containers, each 1 cores , 1408 mb memory including 384 mb overhead 15/07/01 16:54:05 info yarn.yarnallocator: container request (host: any, capability: <memory:1408, vcores:1>) 15/07/01 16:54:05 info yarn.yarnallocator: container request (host: any, capability: <memory:1408, vcores:1>) 15/07/01 16:54:05 info yarn.applicationmaster: started progress reporter thread - sleep time : 5000 15/07/01 16:54:11 info impl.amrmclientimpl: received new token : ip-10-0-0-99.ec2.internal:8041 15/07/01 16:54:11 info impl.amrmclientimpl: received new token : ip-10-0-0-37.ec2.internal:8041 15/07/01 16:54:11 info yarn.yarnallocator: launching container container_1435696841856_0027_01_000002 on host ip-10-0-0-99.ec2.internal 15/07/01 16:54:11 info yarn.yarnallocator: launching executorrunnable. driverurl: akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler, executorhostname: ip-10-0-0-99.ec2.internal 15/07/01 16:54:11 info yarn.yarnallocator: launching container container_1435696841856_0027_01_000003 on host ip-10-0-0-37.ec2.internal 15/07/01 16:54:11 info yarn.executorrunnable: starting executor container 15/07/01 16:54:11 info yarn.yarnallocator: launching executorrunnable. driverurl: akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler, executorhostname: ip-10-0-0-37.ec2.internal 15/07/01 16:54:11 info yarn.yarnallocator: received 2 containers yarn, launching executors on 2 of them. 
15/07/01 16:54:11 info impl.containermanagementprotocolproxy: yarn.client.max-cached-nodemanagers-proxies : 0 15/07/01 16:54:11 info yarn.executorrunnable: starting executor container 15/07/01 16:54:11 info yarn.executorrunnable: setting containerlaunchcontext 15/07/01 16:54:11 info impl.containermanagementprotocolproxy: yarn.client.max-cached-nodemanagers-proxies : 0 15/07/01 16:54:11 info yarn.executorrunnable: setting containerlaunchcontext 15/07/01 16:54:11 info yarn.executorrunnable: preparing local resources 15/07/01 16:54:11 info yarn.executorrunnable: preparing local resources 15/07/01 16:54:11 info yarn.executorrunnable: prepared local resources map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s ize: 162896305 timestamp: 1435784032445 type: file visibility: private, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 032613 type: file visibility: private, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: fil e visibility: private, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: file visibility: priva te) 15/07/01 16:54:11 info yarn.executorrunnable: prepared local resources map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s ize: 162896305 timestamp: 1435784032445 type: file visibility: private, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 032613 type: file visibility: private, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: fil e visibility: private, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: file visibility: priva te) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor environment: map(classpath -> {{pwd}}<cps>{{pwd}}/__spark__.jar<cps>$hadoop_client_conf_dir<cps>$hadoop_conf_dir<cps>$hadoop_common_home/*<cps>$hadoop_common_home/lib/*<cps>$hadoop_hdfs_home/*<cps>$hadoo p_hdfs_home/lib/*<cps>$hadoop_yarn_home/*<cps>$hadoop_yarn_home/lib/*<cps>$hadoop_mapred_home/*<cps>$hadoop_mapred_home/lib/*<cps>$mr2_classpath, spark_log_url_stderr -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanounan ue/stderr?start=0, spark_yarn_staging_dir -> .sparkstaging/application_1435696841856_0027, spark_yarn_cache_files_file_sizes -> 162896305,281333,37562,2448, spark_user -> nanounanue, 
spark_yarn_cache_files_visibilities -> private,private,private,private, spark_yarn_mode -> true, spark_yarn_cache_files_time_stamps -> 1435784032445,1435784032613,1435784032652,1435784032692, pythonpath -> pyspark.zip:py4j-0.8.2.1-src.zip, spark_log_url_stdout -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanou nanue/stdout?start=0, spark_yarn_cache_files -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/applic ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_14 35696841856_0027/minimal-example2.py#minimal-example2.py) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor environment: map(classpath -> {{pwd}}<cps>{{pwd}}/__spark__.jar<cps>$hadoop_client_conf_dir<cps>$hadoop_conf_dir<cps>$hadoop_common_home/*<cps>$hadoop_common_home/lib/*<cps>$hadoop_hdfs_home/*<cps>$hadoo p_hdfs_home/lib/*<cps>$hadoop_yarn_home/*<cps>$hadoop_yarn_home/lib/*<cps>$hadoop_mapred_home/*<cps>$hadoop_mapred_home/lib/*<cps>$mr2_classpath, spark_log_url_stderr -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanounan ue/stderr?start=0, spark_yarn_staging_dir -> .sparkstaging/application_1435696841856_0027, spark_yarn_cache_files_file_sizes -> 162896305,281333,37562,2448, spark_user -> nanounanue, spark_yarn_cache_files_visibilities -> private,private,private,private, spark_yarn_mode -> true, spark_yarn_cache_files_time_stamps -> 1435784032445,1435784032613,1435784032652,1435784032692, pythonpath -> pyspark.zip:py4j-0.8.2.1-src.zip, spark_log_url_stdout -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanou nanue/stdout?start=0, spark_yarn_cache_files -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/applic ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_14 35696841856_0027/minimal-example2.py#minimal-example2.py) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor commands: list(ld_library_path="/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$ld_library_path", {{java_home}}/bin/java, -server, -xx:onoutofmemoryerror='kill %p', -xms1024m, -xmx 1024m, -djava.io.tmpdir={{pwd}}/tmp, '-dspark.ui.port=0', '-dspark.driver.port=41190', -dspark.yarn.app.container.log.dir=<log_dir>, org.apache.spark.executor.coarsegrainedexecutorbackend, --driver-url, akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler, --e xecutor-id, 1, --hostname, ip-10-0-0-99.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$pwd/__app__.jar, 1>, <log_dir>/stdout, 2>, <log_dir>/stderr) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor commands: 
list(ld_library_path="/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$ld_library_path", {{java_home}}/bin/java, -server, -xx:onoutofmemoryerror='kill %p', -xms1024m, -xmx 1024m, -djava.io.tmpdir={{pwd}}/tmp, '-dspark.ui.port=0', '-dspark.driver.port=41190', -dspark.yarn.app.container.log.dir=<log_dir>, org.apache.spark.executor.coarsegrainedexecutorbackend, --driver-url, akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler, --e xecutor-id, 2, --hostname, ip-10-0-0-37.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$pwd/__app__.jar, 1>, <log_dir>/stdout, 2>, <log_dir>/stderr) 15/07/01 16:54:11 info impl.containermanagementprotocolproxy: opening proxy : ip-10-0-0-37.ec2.internal:8041 15/07/01 16:54:14 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-99.ec2.internal:43176 15/07/01 16:54:15 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-37.ec2.internal:58472 15/07/01 16:54:15 info cluster.yarnclusterschedulerbackend: registered executor: akkarpcendpointref(actor[akka.tcp://sparkexecutor@ip-10-0-0-99.ec2.internal:49047/user/executor#563862009]) id 1 15/07/01 16:54:15 info cluster.yarnclusterschedulerbackend: registered executor: akkarpcendpointref(actor[akka.tcp://sparkexecutor@ip-10-0-0-37.ec2.internal:36122/user/executor#1370723906]) id 2 15/07/01 16:54:15 info cluster.yarnclusterschedulerbackend: schedulerbackend ready scheduling beginning after reached minregisteredresourcesratio: 0.8 15/07/01 16:54:15 info cluster.yarnclusterscheduler: yarnclusterscheduler.poststarthook done 15/07/01 16:54:15 info storage.blockmanagermasterendpoint: registering block manager ip-10-0-0-99.ec2.internal:59769 530.3 mb ram, blockmanagerid(1, ip-10-0-0-99.ec2.internal, 59769) 15/07/01 16:54:16 info storage.blockmanagermasterendpoint: registering block manager ip-10-0-0-37.ec2.internal:48859 530.3 mb ram, blockmanagerid(2, ip-10-0-0-37.ec2.internal, 48859) 15/07/01 16:54:16 info hive.hivecontext: initializing execution hive, version 0.13.1 15/07/01 16:54:17 info metastore.hivemetastore: 0: opening raw store implemenation class:org.apache.hadoop.hive.metastore.objectstore 15/07/01 16:54:17 info metastore.objectstore: objectstore, initialize called 15/07/01 16:54:17 info spark.sparkcontext: invoking stop() shutdown hook 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/metrics/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/stage/kill,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/api,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/static,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors/threaddump/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors/threaddump,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/environment/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped 
o.s.j.s.servletcontexthandler{/environment,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage/rdd/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage/rdd,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/pool/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/pool,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/stage/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/stage,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs/job/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs/job,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs,null} 15/07/01 16:54:17 info ui.sparkui: stopped spark web ui @ http://10.0.0.36:37958 15/07/01 16:54:17 info scheduler.dagscheduler: stopping dagscheduler 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: shutting down executors 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: asking each executor shut down 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-99.ec2.internal:49047 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-37.ec2.internal:36122 15/07/01 16:54:17 info ui.sparkui: stopped spark web ui @ http://10.0.0.36:37958 15/07/01 16:54:17 info scheduler.dagscheduler: stopping dagscheduler 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: shutting down executors 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: asking each executor shut down 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-99.ec2.internal:49047 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-37.ec2.internal:36122 15/07/01 16:54:17 info spark.mapoutputtrackermasterendpoint: mapoutputtrackermasterendpoint stopped! 15/07/01 16:54:17 info storage.memorystore: memorystore cleared 15/07/01 16:54:17 info storage.blockmanager: blockmanager stopped 15/07/01 16:54:17 info storage.blockmanagermaster: blockmanagermaster stopped 15/07/01 16:54:17 info spark.sparkcontext: stopped sparkcontext 15/07/01 16:54:17 info scheduler.outputcommitcoordinator$outputcommitcoordinatorendpoint: outputcommitcoordinator stopped! 15/07/01 16:54:17 info remote.remoteactorrefprovider$remotingterminator: shutting down remote daemon. 15/07/01 16:54:17 info remote.remoteactorrefprovider$remotingterminator: remote daemon shut down; proceeding flushing remote transports. 
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
15/07/01 16:54:17 INFO impl.AMRMClientImpl: Waiting for application to be unregistered.
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1435696841856_0027
15/07/01 16:54:17 INFO util.Utils: Shutdown hook called
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/pyspark-215f5c19-b1cb-47df-ad43-79da4244de61
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/tmp/spark-c96dc9dc-e6ee-451b-b09e-637f5d4ca990

LogType: stdout
LogLength: 2404
Log Contents:
[(u'spark.eventLog.enabled', u'true'), (u'spark.submit.pyArchives', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.yarn.app.container.log.dir', u'/var/log/hadoop-yarn/container/application_1435696841856_0027/container_1435696841856_0027_01_000001'), (u'spark.eventLog.dir', u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), (u'spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS', u'ip-10-0-0-220.ec2.internal'), (u'spark.yarn.historyServer.address', u'http://ip-10-0-0-220.ec2.internal:18088'), (u'spark.ui.port', u'0'), (u'spark.yarn.app.id', u'application_1435696841856_0027'), (u'spark.app.name', u'minimal-example2.py'), (u'spark.executor.instances', u'2'), (u'spark.executorEnv.PYTHONPATH', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.submit.pyFiles', u''), (u'spark.executor.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.master', u'yarn-cluster'), (u'spark.ui.filters', u'org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter'), (u'spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES', u'http://ip-10-0-0-220.ec2.internal:8088/proxy/application_1435696841856_0027'), (u'spark.driver.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.yarn.app.attemptId', u'1')]
<pyspark.context.SparkContext object at 0x3fd53d0>
1.4.0
<pyspark.sql.context.HiveContext object at 0x40a9110>
Traceback (most recent call last):
  File "minimal-example2.py", line 53, in <module>
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")
  File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 591, in read
  File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 39, in __init__
  File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 619, in _ssql_ctx
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o53))
The important part is the last line: "You must build Spark with Hive."
Why? What am I doing wrong?
I got the same issue. It turned out the message from Spark was misleading; there were no missing jars. The problem for me was that the Java class HiveContext, which is called from PySpark, parses hive-site.xml when it is constructed, and an exception was being raised during that construction. (PySpark catches the exception and incorrectly suggests it is due to a missing jar.) It ended up being an error with the property hive.metastore.client.connect.retry.delay, which was set to 2s. The HiveContext class tries to parse that value as an integer and fails. Change it to 2, and likewise remove the unit characters from hive.metastore.client.socket.timeout and hive.metastore.client.socket.lifetime, as in the sketch below.
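For illustration only, the corrected hive-site.xml entries could look like the following. The property names are the ones mentioned above; the numeric values for the two socket settings are placeholders, not values taken from the original cluster:

<!-- hive-site.xml: plain integer values (seconds) with no "s" suffix,
     so the Spark 1.x HiveContext can parse them as integers -->
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>2</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>300</value>  <!-- placeholder value -->
</property>
<property>
  <name>hive.metastore.client.socket.lifetime</name>
  <value>0</value>    <!-- placeholder value -->
</property>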
Note that you can get a more descriptive error by calling sqlContext._get_hive_ctx() directly.
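A minimal sketch of that diagnostic, assuming Spark 1.x PySpark (the app name is arbitrary):

# Diagnostic sketch for Spark 1.x PySpark: calling _get_hive_ctx() directly
# forces construction of the Java HiveContext, so the real Py4J error
# (e.g. the integer-parsing failure caused by a value like "2s" in
# hive-site.xml) surfaces instead of the misleading
# "You must build Spark with Hive" message.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-ctx-debug")  # arbitrary app name
sqlContext = HiveContext(sc)
try:
    sqlContext._get_hive_ctx()  # construct the underlying Java HiveContext now
except Exception as e:
    print(e)  # the underlying cause, not the generic "build Spark with Hive" hint
sc.stop()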