python - Scrapy: ERROR: Spider error processing -

I am new to Python and Scrapy. I tried to run some existing code, but I get an error on every address:

2015-07-02 01:52:19 [scrapy] DEBUG: Crawled (200) <GET http://www.tripadvisor.com/showuserreviews-g187147-d197524-r281927613-hotel_mirific_opera-paris_ile_de_france.html> (referer: http://www.tripadvisor.com/hotel_review-g187147-d197524-reviews-hotel_mirific_opera-paris_ile_de_france.html)
2015-07-02 01:52:19 [scrapy] ERROR: Spider error processing <GET http://www.tripadvisor.com/showuserreviews-g187147-d197524-r281927613-hotel_mirific_opera-paris_ile_de_france.html> (referer: http://www.tripadvisor.com/hotel_review-g187147-d197524-reviews-hotel_mirific_opera-paris_ile_de_france.html)

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/crawl.py", line 67, in _parse_response
    cb_res = callback(response, **cb_kwargs) or ()
  File "/home/talmosko/documents/scrapy/tripadvisor/spiders/tripadvisor.py", line 30, in parse_item
    item['state'] = hxs.xpath('//*[@id="page"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
IndexError: list index out of range

This is the code: http://pastebin.com/xzm5drdd

What is the problem? It seems the spider didn't get an answer.

Thanks!

You are trying to access an element that doesn't exist. The error is in this line:

item['state'] =  hxs.xpath('//*[@id="page"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')

Probably the result of

item['state'] =  hxs.xpath('//*[@id="page"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()

is an empty list, and you are trying to access its first element. You have 2 options:

Modify the selector so that it returns data. A good idea is to test it in the Scrapy shell.
Use try and catch the IndexError.
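
A minimal sketch of the second option, assuming you only want a fallback value when the XPath matches nothing. The helper names `first_or_default` and `to_ascii` are made up for this example and are not part of Scrapy. The helpers work on the plain list that `.extract()` returns, so they can be tried without running the spider:

```python
def first_or_default(values, default=u''):
    """Return the first extracted string, or `default` if the list is empty.

    `values` is expected to be the list returned by `.extract()`.
    """
    try:
        return values[0]
    except IndexError:
        # The selector matched nothing on this page.
        return default


def to_ascii(text):
    """Drop non-ASCII characters, as the original spider line does."""
    return text.encode('ascii', errors='ignore')


# Hypothetical use inside parse_item (hxs comes from the spider code):
#   values = hxs.xpath('//*[@id="page"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()
#   item['state'] = to_ascii(first_or_default(values))
```

With this the spider keeps running on pages where the element is missing, and `item['state']` ends up empty instead of raising. If you are on Scrapy 1.0 or later, `.extract_first()` on the selector result should give similar behaviour without a helper, since it returns `None` (or a default you pass) when nothing matches. Either way it is still worth checking the XPath in the Scrapy shell, because an empty result on every page usually means the selector itself is wrong.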
