scrapy

Installation

sudo apt-get install gcc python-virtualenv python-dev libxml2-dev libxslt-dev

pip install Scrapy

Crawling

scrapy crawl douban_book

Arguments

scrapy crawl myspider -a category=electronics -a domain=system

Spiders receive arguments in their constructors:

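import scrapy

class MySpider(scrapy.Spider):  # BaseSpider was renamed to Spider in later Scrapy releases
    name = 'myspider'

    def __init__(self, category='', domain=None, *args, **kwargs):
        # Let the base class finish its own setup before using the arguments.
        super(MySpider, self).__init__(*args, **kwargs)
        # Values passed with -a arrive here as strings.
        self.start_urls = ['http://www.example.com/categories/%s' % category]
        self.domain = domain
    ...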
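
The path from the command line to the constructor can be sketched without Scrapy installed. The helper parse_spider_args and the class DemoSpider below are hypothetical names used only for illustration; they are not part of Scrapy, and this is not how Scrapy is implemented internally. The sketch only mimics the observable behaviour: each -a NAME=VALUE pair ends up as a string keyword argument of the spider's constructor.

```python
def parse_spider_args(pairs):
    """Turn ['category=electronics', 'domain=system'] into a kwargs dict."""
    kwargs = {}
    for pair in pairs:
        # Split on the first '=' only, so a value may itself contain '='.
        name, sep, value = pair.partition('=')
        if not sep or not name:
            raise ValueError('expected NAME=VALUE, got %r' % pair)
        # Values stay strings; convert them inside the spider if needed.
        kwargs[name] = value
    return kwargs


class DemoSpider(object):
    """Hypothetical stand-in for a spider; it does not depend on Scrapy."""

    def __init__(self, category='', domain=None):
        self.start_urls = ['http://www.example.com/categories/%s' % category]
        self.domain = domain


# Equivalent of: scrapy crawl myspider -a category=electronics -a domain=system
spider = DemoSpider(**parse_spider_args(['category=electronics', 'domain=system']))
print(spider.start_urls[0])
print(spider.domain)
```

Because every value arrives as a string, a numeric or boolean argument would need an explicit conversion inside the constructor.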

selector
