hive - Weird behaviour with spark-submit


I am running the following code in the PySpark shell:

In [14]: conf = SparkConf()

In [15]: conf.getAll()
[(u'spark.eventLog.enabled', u'true'),
 (u'spark.eventLog.dir',
  u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'),
 (u'spark.master', u'local[*]'),
 (u'spark.yarn.historyServer.address',
  u'http://ip-10-0-0-220.ec2.internal:18088'),
 (u'spark.executor.extraLibraryPath',
  u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'),
 (u'spark.app.name', u'pyspark-shell'),
 (u'spark.driver.extraLibraryPath',
  u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native')]

In [16]: sc
<pyspark.context.SparkContext at 0x7fab9dd8a750>

In [17]: sc.version
u'1.4.0'

In [19]: sqlContext
<pyspark.sql.context.HiveContext at 0x7fab9de785d0>

In [20]: access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")

and it runs smoothly (I can create tables in the Hive metastore, etc.).
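For instance, something along these lines works from that shell (illustrative only; the table name is made up):

# Illustrative only: persist the DataFrame read above as a table in the Hive metastore.
access.write.saveAsTable("access_logs")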

But when I try to run the following code with spark-submit:

# -*- coding: utf-8 -*-
from __future__ import print_function

import re

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import Row
from pyspark.conf import SparkConf

if __name__ == "__main__":

    sc = SparkContext(appName="minimal example 2")
    conf = SparkConf()
    print(conf.getAll())
    print(sc)
    print(sc.version)

    sqlContext = HiveContext(sc)
    print(sqlContext)

    # ## Read the access log file
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")

    sc.stop()

I run the code with:

$ spark-submit --master yarn-cluster  --deploy-mode cluster minimal-example2.py 

and it runs without error (apparently), but if I check the logs:

$ yarn logs -applicationId application_1435696841856_0027

it reads as follows:

15/07/01 16:55:10 info client.rmproxy: connecting resourcemanager @ ip-10-0-0-220.ec2.internal/10.0.0.220:8032   container: container_1435696841856_0027_01_000001 on ip-10-0-0-36.ec2.internal_8041 ===================================================================================== logtype: stderr loglength: 21077 log contents: slf4j: class path contains multiple slf4j bindings. slf4j: found binding in [jar:file:/yarn/nm/usercache/nanounanue/filecache/133/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/staticloggerbinder.class] slf4j: found binding in [jar:file:/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/staticloggerbinder.class] slf4j: see http://www.slf4j.org/codes.html#multiple_bindings explanation. slf4j: actual binding of type [org.slf4j.impl.log4jloggerfactory] 15/07/01 16:54:00 info yarn.applicationmaster: registered signal handlers [term, hup, int] 15/07/01 16:54:01 info yarn.applicationmaster: applicationattemptid: appattempt_1435696841856_0027_000001 15/07/01 16:54:02 info spark.securitymanager: changing view acls to: yarn,nanounanue 15/07/01 16:54:02 info spark.securitymanager: changing modify acls to: yarn,nanounanue 15/07/01 16:54:02 info spark.securitymanager: securitymanager: authentication disabled; ui acls disabled; users view permissions: set(yarn, nanounanue); users modify permissions: set(yarn, nanounanue) 15/07/01 16:54:02 info yarn.applicationmaster: starting user application in separate thread 15/07/01 16:54:02 info yarn.applicationmaster: waiting spark context initialization 15/07/01 16:54:02 info yarn.applicationmaster: waiting spark context initialization ...  15/07/01 16:54:03 info spark.sparkcontext: running spark version 1.4.0 15/07/01 16:54:03 info spark.securitymanager: changing view acls to: yarn,nanounanue 15/07/01 16:54:03 info spark.securitymanager: changing modify acls to: yarn,nanounanue 15/07/01 16:54:03 info spark.securitymanager: securitymanager: authentication disabled; ui acls disabled; users view permissions: set(yarn, nanounanue); users modify permissions: set(yarn, nanounanue) 15/07/01 16:54:03 info slf4j.slf4jlogger: slf4jlogger started 15/07/01 16:54:03 info remoting: starting remoting 15/07/01 16:54:03 info remoting: remoting started; listening on addresses :[akka.tcp://sparkdriver@10.0.0.36:41190] 15/07/01 16:54:03 info util.utils: started service 'sparkdriver' on port 41190. 15/07/01 16:54:04 info spark.sparkenv: registering mapoutputtracker 15/07/01 16:54:04 info spark.sparkenv: registering blockmanagermaster 15/07/01 16:54:04 info storage.diskblockmanager: created local directory @ /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-14127054-19b1-4cfe-80c3-2c5fc917c9cf 15/07/01 16:54:04 info storage.diskblockmanager: created local directory @ /data0/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-c8119846-7f6f-45eb-911b-443cb4d7e9c9 15/07/01 16:54:04 info storage.memorystore: memorystore started capacity 245.7 mb 15/07/01 16:54:04 info spark.httpfileserver: http file server directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/httpd-c4abf72b-2ee4-45d7-8252-c68f925bef58 15/07/01 16:54:04 info spark.httpserver: starting http server 15/07/01 16:54:04 info server.server: jetty-8.y.z-snapshot 15/07/01 16:54:04 info server.abstractconnector: started socketconnector@0.0.0.0:56437 15/07/01 16:54:04 info util.utils: started service 'http file server' on port 56437. 
15/07/01 16:54:04 info spark.sparkenv: registering outputcommitcoordinator 15/07/01 16:54:04 info ui.jettyutils: adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.amipfilter 15/07/01 16:54:04 info server.server: jetty-8.y.z-snapshot 15/07/01 16:54:04 info server.abstractconnector: started selectchannelconnector@0.0.0.0:37958 15/07/01 16:54:04 info util.utils: started service 'sparkui' on port 37958. 15/07/01 16:54:04 info ui.sparkui: started sparkui @ http://10.0.0.36:37958 15/07/01 16:54:04 info cluster.yarnclusterscheduler: created yarnclusterscheduler 15/07/01 16:54:04 info util.utils: started service 'org.apache.spark.network.netty.nettyblocktransferservice' on port 49759. 15/07/01 16:54:04 info netty.nettyblocktransferservice: server created on 49759 15/07/01 16:54:05 info storage.blockmanagermaster: trying register blockmanager 15/07/01 16:54:05 info storage.blockmanagermasterendpoint: registering block manager 10.0.0.36:49759 245.7 mb ram, blockmanagerid(driver, 10.0.0.36, 49759) 15/07/01 16:54:05 info storage.blockmanagermaster: registered blockmanager 15/07/01 16:54:05 info scheduler.eventlogginglistener: logging events hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationhistory/application_1435696841856_0027_1 15/07/01 16:54:05 info cluster.yarnschedulerbackend$yarnschedulerendpoint: applicationmaster registered akkarpcendpointref(actor[akka://sparkdriver/user/yarnam#-1566924249]) 15/07/01 16:54:05 info client.rmproxy: connecting resourcemanager @ ip-10-0-0-220.ec2.internal/10.0.0.220:8030 15/07/01 16:54:05 info yarn.yarnrmclient: registering applicationmaster 15/07/01 16:54:05 info yarn.yarnallocator: request 2 executor containers, each 1 cores , 1408 mb memory including 384 mb overhead 15/07/01 16:54:05 info yarn.yarnallocator: container request (host: any, capability: <memory:1408, vcores:1>) 15/07/01 16:54:05 info yarn.yarnallocator: container request (host: any, capability: <memory:1408, vcores:1>) 15/07/01 16:54:05 info yarn.applicationmaster: started progress reporter thread - sleep time : 5000 15/07/01 16:54:11 info impl.amrmclientimpl: received new token : ip-10-0-0-99.ec2.internal:8041 15/07/01 16:54:11 info impl.amrmclientimpl: received new token : ip-10-0-0-37.ec2.internal:8041 15/07/01 16:54:11 info yarn.yarnallocator: launching container container_1435696841856_0027_01_000002 on host ip-10-0-0-99.ec2.internal 15/07/01 16:54:11 info yarn.yarnallocator: launching executorrunnable. driverurl: akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler,  executorhostname: ip-10-0-0-99.ec2.internal 15/07/01 16:54:11 info yarn.yarnallocator: launching container container_1435696841856_0027_01_000003 on host ip-10-0-0-37.ec2.internal 15/07/01 16:54:11 info yarn.executorrunnable: starting executor container 15/07/01 16:54:11 info yarn.yarnallocator: launching executorrunnable. driverurl: akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler,  executorhostname: ip-10-0-0-37.ec2.internal 15/07/01 16:54:11 info yarn.yarnallocator: received 2 containers yarn, launching executors on 2 of them. 
15/07/01 16:54:11 info impl.containermanagementprotocolproxy: yarn.client.max-cached-nodemanagers-proxies : 0 15/07/01 16:54:11 info yarn.executorrunnable: starting executor container 15/07/01 16:54:11 info yarn.executorrunnable: setting containerlaunchcontext 15/07/01 16:54:11 info impl.containermanagementprotocolproxy: yarn.client.max-cached-nodemanagers-proxies : 0 15/07/01 16:54:11 info yarn.executorrunnable: setting containerlaunchcontext 15/07/01 16:54:11 info yarn.executorrunnable: preparing local resources 15/07/01 16:54:11 info yarn.executorrunnable: preparing local resources 15/07/01 16:54:11 info yarn.executorrunnable: prepared local resources map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s ize: 162896305 timestamp: 1435784032445 type: file visibility: private, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 032613 type: file visibility: private, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: fil e visibility: private, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: file visibility: priva te) 15/07/01 16:54:11 info yarn.executorrunnable: prepared local resources map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s ize: 162896305 timestamp: 1435784032445 type: file visibility: private, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 032613 type: file visibility: private, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: fil e visibility: private, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkstaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: file visibility: priva te) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor environment: map(classpath -> {{pwd}}<cps>{{pwd}}/__spark__.jar<cps>$hadoop_client_conf_dir<cps>$hadoop_conf_dir<cps>$hadoop_common_home/*<cps>$hadoop_common_home/lib/*<cps>$hadoop_hdfs_home/*<cps>$hadoo p_hdfs_home/lib/*<cps>$hadoop_yarn_home/*<cps>$hadoop_yarn_home/lib/*<cps>$hadoop_mapred_home/*<cps>$hadoop_mapred_home/lib/*<cps>$mr2_classpath, spark_log_url_stderr -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanounan ue/stderr?start=0, spark_yarn_staging_dir -> .sparkstaging/application_1435696841856_0027, spark_yarn_cache_files_file_sizes -> 162896305,281333,37562,2448, spark_user -> nanounanue, 
spark_yarn_cache_files_visibilities -> private,private,private,private, spark_yarn_mode ->  true, spark_yarn_cache_files_time_stamps -> 1435784032445,1435784032613,1435784032652,1435784032692, pythonpath -> pyspark.zip:py4j-0.8.2.1-src.zip, spark_log_url_stdout -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanou nanue/stdout?start=0, spark_yarn_cache_files -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/applic ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_14 35696841856_0027/minimal-example2.py#minimal-example2.py) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor environment: map(classpath -> {{pwd}}<cps>{{pwd}}/__spark__.jar<cps>$hadoop_client_conf_dir<cps>$hadoop_conf_dir<cps>$hadoop_common_home/*<cps>$hadoop_common_home/lib/*<cps>$hadoop_hdfs_home/*<cps>$hadoo p_hdfs_home/lib/*<cps>$hadoop_yarn_home/*<cps>$hadoop_yarn_home/lib/*<cps>$hadoop_mapred_home/*<cps>$hadoop_mapred_home/lib/*<cps>$mr2_classpath, spark_log_url_stderr -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanounan ue/stderr?start=0, spark_yarn_staging_dir -> .sparkstaging/application_1435696841856_0027, spark_yarn_cache_files_file_sizes -> 162896305,281333,37562,2448, spark_user -> nanounanue, spark_yarn_cache_files_visibilities -> private,private,private,private, spark_yarn_mode ->  true, spark_yarn_cache_files_time_stamps -> 1435784032445,1435784032613,1435784032652,1435784032692, pythonpath -> pyspark.zip:py4j-0.8.2.1-src.zip, spark_log_url_stdout -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanou nanue/stdout?start=0, spark_yarn_cache_files -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/applic ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkstaging/application_14 35696841856_0027/minimal-example2.py#minimal-example2.py) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor commands: list(ld_library_path="/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$ld_library_path", {{java_home}}/bin/java, -server, -xx:onoutofmemoryerror='kill %p', -xms1024m, -xmx 1024m, -djava.io.tmpdir={{pwd}}/tmp, '-dspark.ui.port=0', '-dspark.driver.port=41190', -dspark.yarn.app.container.log.dir=<log_dir>, org.apache.spark.executor.coarsegrainedexecutorbackend, --driver-url, akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler, --e xecutor-id, 1, --hostname, ip-10-0-0-99.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$pwd/__app__.jar, 1>, <log_dir>/stdout, 2>, <log_dir>/stderr) 15/07/01 16:54:11 info yarn.executorrunnable: setting executor commands: 
list(ld_library_path="/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$ld_library_path", {{java_home}}/bin/java, -server, -xx:onoutofmemoryerror='kill %p', -xms1024m, -xmx 1024m, -djava.io.tmpdir={{pwd}}/tmp, '-dspark.ui.port=0', '-dspark.driver.port=41190', -dspark.yarn.app.container.log.dir=<log_dir>, org.apache.spark.executor.coarsegrainedexecutorbackend, --driver-url, akka.tcp://sparkdriver@10.0.0.36:41190/user/coarsegrainedscheduler, --e xecutor-id, 2, --hostname, ip-10-0-0-37.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$pwd/__app__.jar, 1>, <log_dir>/stdout, 2>, <log_dir>/stderr) 15/07/01 16:54:11 info impl.containermanagementprotocolproxy: opening proxy : ip-10-0-0-37.ec2.internal:8041 15/07/01 16:54:14 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-99.ec2.internal:43176 15/07/01 16:54:15 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-37.ec2.internal:58472 15/07/01 16:54:15 info cluster.yarnclusterschedulerbackend: registered executor: akkarpcendpointref(actor[akka.tcp://sparkexecutor@ip-10-0-0-99.ec2.internal:49047/user/executor#563862009]) id 1 15/07/01 16:54:15 info cluster.yarnclusterschedulerbackend: registered executor: akkarpcendpointref(actor[akka.tcp://sparkexecutor@ip-10-0-0-37.ec2.internal:36122/user/executor#1370723906]) id 2 15/07/01 16:54:15 info cluster.yarnclusterschedulerbackend: schedulerbackend ready scheduling beginning after reached minregisteredresourcesratio: 0.8 15/07/01 16:54:15 info cluster.yarnclusterscheduler: yarnclusterscheduler.poststarthook done 15/07/01 16:54:15 info storage.blockmanagermasterendpoint: registering block manager ip-10-0-0-99.ec2.internal:59769 530.3 mb ram, blockmanagerid(1, ip-10-0-0-99.ec2.internal, 59769) 15/07/01 16:54:16 info storage.blockmanagermasterendpoint: registering block manager ip-10-0-0-37.ec2.internal:48859 530.3 mb ram, blockmanagerid(2, ip-10-0-0-37.ec2.internal, 48859) 15/07/01 16:54:16 info hive.hivecontext: initializing execution hive, version 0.13.1 15/07/01 16:54:17 info metastore.hivemetastore: 0: opening raw store implemenation class:org.apache.hadoop.hive.metastore.objectstore 15/07/01 16:54:17 info metastore.objectstore: objectstore, initialize called 15/07/01 16:54:17 info spark.sparkcontext: invoking stop() shutdown hook 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/metrics/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/stage/kill,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/api,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/static,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors/threaddump/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors/threaddump,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/executors,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/environment/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped 
o.s.j.s.servletcontexthandler{/environment,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage/rdd/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage/rdd,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/storage,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/pool/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/pool,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/stage/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/stage,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/stages,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs/job/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs/job,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs/json,null} 15/07/01 16:54:17 info handler.contexthandler: stopped o.s.j.s.servletcontexthandler{/jobs,null} 15/07/01 16:54:17 info ui.sparkui: stopped spark web ui @ http://10.0.0.36:37958 15/07/01 16:54:17 info scheduler.dagscheduler: stopping dagscheduler 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: shutting down executors 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: asking each executor shut down 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-99.ec2.internal:49047 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-37.ec2.internal:36122 15/07/01 16:54:17 info ui.sparkui: stopped spark web ui @ http://10.0.0.36:37958 15/07/01 16:54:17 info scheduler.dagscheduler: stopping dagscheduler 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: shutting down executors 15/07/01 16:54:17 info cluster.yarnclusterschedulerbackend: asking each executor shut down 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-99.ec2.internal:49047 15/07/01 16:54:17 info yarn.applicationmaster$amendpoint: driver terminated or disconnected! shutting down. ip-10-0-0-37.ec2.internal:36122 15/07/01 16:54:17 info spark.mapoutputtrackermasterendpoint: mapoutputtrackermasterendpoint stopped! 15/07/01 16:54:17 info storage.memorystore: memorystore cleared 15/07/01 16:54:17 info storage.blockmanager: blockmanager stopped 15/07/01 16:54:17 info storage.blockmanagermaster: blockmanagermaster stopped 15/07/01 16:54:17 info spark.sparkcontext: stopped sparkcontext 15/07/01 16:54:17 info scheduler.outputcommitcoordinator$outputcommitcoordinatorendpoint: outputcommitcoordinator stopped! 15/07/01 16:54:17 info remote.remoteactorrefprovider$remotingterminator: shutting down remote daemon. 15/07/01 16:54:17 info remote.remoteactorrefprovider$remotingterminator: remote daemon shut down; proceeding flushing remote transports. 
15/07/01 16:54:17 info yarn.applicationmaster: final app status: succeeded, exitcode: 0, (reason: shutdown hook called before final status reported.) 15/07/01 16:54:17 info yarn.applicationmaster: unregistering applicationmaster succeeded (diag message: shutdown hook called before final status reported.) 15/07/01 16:54:17 info impl.amrmclientimpl: waiting application unregistered. 15/07/01 16:54:17 info remote.remoteactorrefprovider$remotingterminator: remoting shut down. 15/07/01 16:54:17 info yarn.applicationmaster: deleting staging directory .sparkstaging/application_1435696841856_0027 15/07/01 16:54:17 info util.utils: shutdown hook called 15/07/01 16:54:17 info util.utils: deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/pyspark-215f5c19-b1cb-47df-ad43-79da4244de61 15/07/01 16:54:17 info util.utils: deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/tmp/spark-c96dc9dc-e6ee-451b-b09e-637f5d4ca990  logtype: stdout loglength: 2404 log contents: [(u'spark.eventlog.enabled', u'true'), (u'spark.submit.pyarchives', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.yarn.app.container.log.dir', u'/var/log/hadoop-yarn/container/application_1435696841856_0027/container_1435696841856_0027_01_000001'), (u'spark.eventlog.dir',  u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationhistory'), (u'spark.org.apache.hadoop.yarn.server.webproxy.amfilter.amipfilter.param.proxy_hosts', u'ip-10-0-0-220.ec2.internal'), (u'spark.yarn.historyserver.address', u'http://ip-10-0-0-220.ec2.internal:18088' ), (u'spark.ui.port', u'0'), (u'spark.yarn.app.id', u'application_1435696841856_0027'), (u'spark.app.name', u'minimal-example2.py'), (u'spark.executor.instances', u'2'), (u'spark.executorenv.pythonpath', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.submit.pyfiles', u''),  (u'spark.executor.extralibrarypath', u'/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.master', u'yarn-cluster'), (u'spark.ui.filters', u'org.apache.hadoop.yarn.server.webproxy.amfilter.amipfilter'), (u'spark.org.apache.hadoop.yarn.server.w ebproxy.amfilter.amipfilter.param.proxy_uri_bases', u'http://ip-10-0-0-220.ec2.internal:8088/proxy/application_1435696841856_0027'), (u'spark.driver.extralibrarypath', u'/opt/cloudera/parcels/cdh-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.yarn.app.attemptid', u '1')] <pyspark.context.sparkcontext object @ 0x3fd53d0> 1.4.0 <pyspark.sql.context.hivecontext object @ 0x40a9110> traceback (most recent call last):   file "minimal-example2.py", line 53, in <module>     access = sqlcontext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")   file "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 591, in read   file "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 39, in __init__   file "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 619, in _ssql_ctx exception: ("you must build spark hive. export 'spark_hive=true' , run build/sbt assembly", py4jjavaerror(u'an error occurred while calling none.org.apache.spark.sql.hive.hivecontext.\n', javaobject id=o53)) 

The important part is the last line: "You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly". Why? What am I doing wrong?

I got the same issue. It turned out the message from Spark is misleading; there are no missing jars. The problem for me was that the Java HiveContext class, which PySpark calls into, parses hive-site.xml when it is constructed, and an exception was being raised during that construction. (PySpark catches the exception and incorrectly suggests it is due to a missing jar.) The culprit ended up being the property hive.metastore.client.connect.retry.delay, which was set to 2s. The HiveContext class tries to parse it as an integer and fails. Change it to 2, and also remove the unit characters from hive.metastore.client.socket.timeout and hive.metastore.client.socket.lifetime.
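If you want to double-check your own hive-site.xml, a small script along the lines below can flag the offending values. This is only a sketch: the /etc/hive/conf/hive-site.xml path is an assumption, so point it at wherever your cluster keeps its client configuration.

# Sketch: scan hive-site.xml for the metastore client properties whose values
# must be bare integers for the Hive 0.13.1 client that Spark initializes here.
# HIVE_SITE is an assumed path; adjust it for your cluster.
import xml.etree.ElementTree as ET

HIVE_SITE = "/etc/hive/conf/hive-site.xml"
SUSPECT_KEYS = {
    "hive.metastore.client.connect.retry.delay",
    "hive.metastore.client.socket.timeout",
    "hive.metastore.client.socket.lifetime",
}

root = ET.parse(HIVE_SITE).getroot()
for prop in root.findall("property"):
    name = prop.findtext("name")
    value = prop.findtext("value")
    if name in SUSPECT_KEYS:
        verdict = "OK" if value is not None and value.isdigit() else "FIX: use a bare integer"
        print("{0} = {1}  ({2})".format(name, value, verdict))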

Note that you can get a more descriptive error by calling sqlContext._get_hive_ctx() directly.
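For example, from the PySpark shell (this invokes the same private helper mentioned above, so it may change between Spark versions):

# Construct the underlying Java HiveContext directly; the exception raised here
# shows the real cause (e.g. the failure to parse a value like "2s" as an integer)
# instead of the misleading "You must build Spark with Hive" message.
sqlContext._get_hive_ctx()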

