python - Scrapy: ERROR: Spider error processing
I am new to Python and Scrapy. I tried to run some existing code and got this error on every address:
2015-07-02 01:52:19 [scrapy] DEBUG: Crawled (200) <GET http://www.tripadvisor.com/showuserreviews-g187147-d197524-r281927613-hotel_mirific_opera-paris_ile_de_france.html> (referer: http://www.tripadvisor.com/hotel_review-g187147-d197524-reviews-hotel_mirific_opera-paris_ile_de_france.html)
2015-07-02 01:52:19 [scrapy] ERROR: Spider error processing <GET http://www.tripadvisor.com/showuserreviews-g187147-d197524-r281927613-hotel_mirific_opera-paris_ile_de_france.html> (referer: http://www.tripadvisor.com/hotel_review-g187147-d197524-reviews-hotel_mirific_opera-paris_ile_de_france.html)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/crawl.py", line 67, in _parse_response
    cb_res = callback(response, **cb_kwargs) or ()
  File "/home/talmosko/documents/scrapy/tripadvisor/spiders/tripadvisor.py", line 30, in parse_item
    item['state'] = hxs.xpath('//*[@id="page"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
IndexError: list index out of range
This is the code: http://pastebin.com/xzm5drdd
What is the problem? It seems the spider didn't get an answer.
Thanks!
You are trying to access an element that doesn't exist. The error is in this line:
item['state'] = hxs.xpath('//*[@id="page"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
Probably
item['state'] = hxs.xpath('//*[@id="page"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()
is empty, and you are trying to access its first element. You have two options:
- Modify the selector so that it returns data. A good idea is to test it in the Scrapy shell.
- Wrap the access in try/except and catch the IndexError.
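The second option can be sketched as below. This is only a minimal illustration, not the code from the pastebin: the helper name `first_or_default` is hypothetical, and the two lists stand in for what `hxs.xpath(...).extract()` returns when the selector matches and when it matches nothing. Recent Scrapy versions also offer `extract_first()` on selector lists, which returns `None` instead of raising when nothing matches.

```python
def first_or_default(values, default=u''):
    """Return the first item of a list, or `default` if the list is empty.

    Hypothetical helper: it wraps the [0] access that raised IndexError.
    """
    try:
        return values[0]
    except IndexError:
        # The XPath matched nothing, so extract() gave an empty list.
        return default


# Stand-ins (assumed data) for the result of hxs.xpath(...).extract():
matched = [u'Ile-de-France']   # selector matched one text node
unmatched = []                 # selector matched nothing

state = first_or_default(matched).encode('ascii', errors='ignore')
missing = first_or_default(unmatched).encode('ascii', errors='ignore')
```

In the spider, the same pattern would be `item['state'] = first_or_default(hxs.xpath(...).extract()).encode('ascii', errors='ignore')`. An empty value in the item is then a hint that the XPath no longer matches the page and should be checked in the Scrapy shell.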