Settings in a Scrapy crawler's settings.py


1. What the auto-generated settings.py content means

# -*- coding: utf-8 -*-

# Scrapy settings for taoCarTest project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://doc.scrapy.org/en/latest/topics/settings.html
#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'taoCarTest'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'taoCarTest (+http://www.yourdomain.com)'

# Obey robots.txt rules
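
As a minimal sketch of the two settings described above, the fragment below turns on a custom user agent and robots.txt compliance. The contact URL is the template's own placeholder, and `ROBOTSTXT_OBEY = True` is an assumption here, since the original value is not shown in this listing.

```python
# settings.py fragment (sketch): identify the crawler and respect robots.txt.
BOT_NAME = 'taoCarTest'

# Placeholder URL from the template; replace it with a page describing your crawler.
USER_AGENT = 'taoCarTest (+http://www.yourdomain.com)'

# Assumed value: True makes Scrapy fetch and honor each site's robots.txt.
ROBOTSTXT_OBEY = True
```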

'''Maximum number of concurrent requests performed by the Scrapy downloader'''

# Configure maximum concurrent requests performed by Scrapy (default: 16)

#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# The download delay setting will honor only one of:

# Disable cookies (enabled by default)

# Disable Telnet Console (enabled by default)
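
The comments above refer to setting lines that are missing from this listing. The sketch below shows how they usually look when uncommented. The setting names are real Scrapy settings, but every value is illustrative rather than the author's, and `randomized_delay_range` is a hypothetical helper added only to show the effect of `RANDOMIZE_DOWNLOAD_DELAY`, which is on by default. Newer Scrapy releases may deprecate the per-IP setting, so check the version you run.

```python
# settings.py fragment (sketch, illustrative values)
CONCURRENT_REQUESTS = 32             # global cap on requests in flight
DOWNLOAD_DELAY = 3                   # base delay in seconds between requests to one site
CONCURRENT_REQUESTS_PER_DOMAIN = 16  # per-domain cap
CONCURRENT_REQUESTS_PER_IP = 0       # 0 disables the per-IP cap, so the per-domain cap is used
COOKIES_ENABLED = False              # do not store or send cookies
TELNETCONSOLE_ENABLED = False        # do not start the telnet console


def randomized_delay_range(delay):
    """Return the (min, max) wait Scrapy picks from when delay randomization is on."""
    return (0.5 * delay, 1.5 * delay)
```

With a base delay of 3 seconds, the actual wait between two requests to the same site falls somewhere between 1.5 and 4.5 seconds.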

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept-Language': 'en',
#}
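
For reference, an uncommented version of this setting might look like the sketch below. The `Accept` value is the one the standard Scrapy project template ships with and is assumed here, because that line does not appear in this listing. Headers set on an individual request take precedence over these defaults.

```python
# settings.py fragment (sketch): default headers sent with every request.
DEFAULT_REQUEST_HEADERS = {
    # Assumed from the standard template; adjust to what the target site expects.
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}
```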

# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'taoCarTest.middlewares.TaocartestSpiderMiddleware': 543,
#}
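
The number 543 is an ordering value. Scrapy merges this dictionary with its built-in defaults, drops entries whose value is `None`, and sorts the rest so that lower numbers sit closer to the engine and higher numbers closer to the spider. The function below is a simplified sketch of that merge, not Scrapy's actual implementation. `effective_middlewares` is a hypothetical name, and the base dictionary lists only a few built-in entries for illustration.

```python
def effective_middlewares(base, custom):
    """Merge built-in and project middlewares, drop disabled ones, sort by order."""
    merged = {**base, **custom}  # project entries override built-in ones
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


# Illustrative subset of built-in spider middlewares with their usual orders.
BASE = {
    'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware': 50,
    'scrapy.spidermiddlewares.referer.RefererMiddleware': 700,
    'scrapy.spidermiddlewares.depth.DepthMiddleware': 900,
}

SPIDER_MIDDLEWARES = {
    'taoCarTest.middlewares.TaocartestSpiderMiddleware': 543,
    # Setting a built-in middleware to None disables it.
    'scrapy.spidermiddlewares.referer.RefererMiddleware': None,
}

order = effective_middlewares(BASE, SPIDER_MIDDLEWARES)
```

Here `order` places the project middleware between `HttpErrorMiddleware` and `DepthMiddleware`, and the disabled `RefererMiddleware` is gone.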

# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'taoCarTest.middlewares.TaocartestDownloaderMiddleware': 543,
#}
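
To show what such an entry points at, here is a minimal sketch of a downloader middleware. The class name `CrawlerTagMiddleware` and the `X-Crawler` header are hypothetical and are not part of the original project. Scrapy only requires the `process_request` hook shown here, and returning `None` tells it to keep processing the request.

```python
class CrawlerTagMiddleware:
    """Hypothetical downloader middleware: tags each outgoing request with the spider name."""

    def process_request(self, request, spider):
        # Only set the header when the request has not set it already.
        request.headers.setdefault('X-Crawler', spider.name)
        return None  # let Scrapy continue with the next middleware


# settings.py: register the middleware (hypothetical module path).
DOWNLOADER_MIDDLEWARES = {
    'taoCarTest.middlewares.CrawlerTagMiddleware': 543,
}
```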

# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# Note: if one project has multiple spiders, you have to specify the matching pipeline here before each run
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
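
One way to avoid editing this setting before every run is to let the pipeline check which spider produced the item. The sketch below assumes a hypothetical spider named `taocar` and a hypothetical `TaoCarPipeline` class. It is one possible approach, not the author's. Scrapy spiders also accept a `custom_settings` class attribute, which can override `ITEM_PIPELINES` for a single spider.

```python
class TaoCarPipeline:
    """Hypothetical pipeline that only handles items from one spider."""

    handled_spider = 'taocar'  # hypothetical spider name

    def __init__(self):
        self.items = []

    def process_item(self, item, spider):
        # Pass items from other spiders through untouched.
        if spider.name != self.handled_spider:
            return item
        self.items.append(dict(item))
        return item


# settings.py: enable the pipeline once for the whole project (hypothetical path).
ITEM_PIPELINES = {
    'taoCarTest.pipelines.TaoCarPipeline': 300,
}
```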

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
# The initial download delay
# The maximum download delay to be set in case of high latencies
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
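
The AutoThrottle comments above map to the settings sketched below. The values are the ones the standard template suggests and are assumed here, since most of those lines are missing from this listing. `next_delay` is a hypothetical helper that follows the adjustment rule described in the Scrapy AutoThrottle documentation. It is a simplified illustration, not the extension's code.

```python
# settings.py fragment (sketch, values assumed from the standard template)
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5           # initial download delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60            # upper bound under high latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests per remote server
AUTOTHROTTLE_DEBUG = False             # True logs throttling stats for every response


def next_delay(current, latency, status=200, min_delay=0.0,
               max_delay=AUTOTHROTTLE_MAX_DELAY,
               target_concurrency=AUTOTHROTTLE_TARGET_CONCURRENCY):
    """Estimate the next download delay from the latest response latency."""
    target = latency / target_concurrency
    # Move halfway toward the target, but never drop below the target itself.
    new = max(target, (current + target) / 2.0)
    # Keep the delay between the configured minimum and maximum.
    new = min(max(min_delay, new), max_delay)
    # Non-200 responses are not allowed to decrease the delay.
    if status != 200 and new <= current:
        return current
    return new
```

Starting from the 5 second initial delay, a fast response with 1 second latency brings the delay down to 3 seconds, while a very slow server pushes it up to the 60 second cap.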

# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
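
The HTTP cache lines themselves are missing from this listing. In the standard template they look roughly like the sketch below, with values assumed from that template. The cache is mostly useful during development, because repeated runs can reuse stored responses instead of hitting the site again.

```python
# settings.py fragment (sketch, values assumed from the standard template)
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0       # 0 means cached responses never expire
HTTPCACHE_DIR = 'httpcache'         # stored under the project's .scrapy directory
HTTPCACHE_IGNORE_HTTP_CODES = []    # status codes that should not be cached
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
```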

# The following fields are not in the default settings.py file, but you can add them yourself
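
The fields the author added are not shown here, so the sketch below uses a purely hypothetical custom field, `EXPORT_DIR`, to illustrate the idea. Any uppercase name defined in settings.py becomes available through `crawler.settings`, and a component can read it in its `from_crawler` class method. `ExportDirPipeline` is likewise a made-up name.

```python
# settings.py: a hypothetical custom field (not a built-in Scrapy setting).
EXPORT_DIR = 'output'


class ExportDirPipeline:
    """Hypothetical pipeline that reads the custom EXPORT_DIR setting."""

    def __init__(self, export_dir):
        self.export_dir = export_dir

    @classmethod
    def from_crawler(cls, crawler):
        # get() returns the default when the project does not define the field.
        return cls(crawler.settings.get('EXPORT_DIR', 'output'))
```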
