Python Scrapy: rewriting the FilesPipeline (filespipe rewrite)

Main reason: the files need to be downloaded with their original names and extensions preserved, but Scrapy's built-in files pipeline has no option for this (by default it names files after a hash of the URL), so the FilesPipeline behaviour has to be redefined. The code below is adapted from other people's examples.

import time
from urllib import parse

from scrapy.pipelines.files import FilesPipeline


class FileRenamePipeline(FilesPipeline):
    """FilesPipeline that keeps the original file name and appends a millisecond timestamp."""

    def file_path(self, request, response=None, info=None, *, item=None):  # item is passed by Scrapy 2.4+
        print('_' * 100)  # debug marker so the rename step is visible in the log
        timest = str(int(time.time() * 1000))
        # The download URL carries the original name as ...;filename="xxx.ext"...,
        # so decode it twice and take the part between the quotes.
        name = parse.unquote(parse.unquote(request.url).split(';')[1]).split('"')[1]
        if '.' in name:
            base, ext = name.rsplit('.', 1)  # split on the last dot so dotted names keep their real extension
            file_name = base + '_' + timest + '.' + ext
        else:
            file_name = name + '_' + timest
        return 'full/' + file_name
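
For reference, the item class used further down (SpiderFileItem) only needs the two fields the FilesPipeline works with; a minimal sketch of what would sit in the project's items.py:

import scrapy

class SpiderFileItem(scrapy.Item):
    file_urls = scrapy.Field()  # URLs to download (read by the FilesPipeline)
    files = scrapy.Field()      # filled in by the FilesPipeline with the download results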
Note: custom_settings has no effect on a pipeline class; it belongs on the spider class, where it enables the renamed pipeline and points FILES_STORE at the download directory:

    custom_settings = {
        'ITEM_PIPELINES': {
            'spider_dataPlat.pipelines.FileRenamePipeline': 2,
        },
        'FILES_STORE': r'E:\下载',  # file download path (raw string so the backslash stays literal)
    }
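
If the pipeline should apply to the whole project rather than one spider, the equivalent entries go in the project's settings.py instead (a sketch assuming the same project name spider_dataPlat as above):

ITEM_PIPELINES = {
    'spider_dataPlat.pipelines.FileRenamePipeline': 2,
}
FILES_STORE = r'E:\下载'  # same download directory, applied project-wide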
In the spider's parse callback, the item handed to the pipeline is built like this (final_url and name come from the page being parsed):

        items = SpiderFileItem()
        items['file_urls'] = [final_url]     # the FilesPipeline downloads every URL in file_urls
        items['files'] = name.split('.')[0]  # note: the FilesPipeline overwrites 'files' with the download results
        yield items
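
A possible refinement not in the original code: since Scrapy 2.4, file_path also receives the item, so instead of re-parsing the URL the spider could put the desired name on the item and the pipeline could read it from there. A sketch, assuming a hypothetical file_name field is added to SpiderFileItem and filled in by the spider:

import time
from scrapy.pipelines.files import FilesPipeline

class FileRenameFromItemPipeline(FilesPipeline):
    def file_path(self, request, response=None, info=None, *, item=None):
        timest = str(int(time.time() * 1000))
        # prefer the name stored on the item; fall back to the last URL segment
        name = item.get('file_name') if item else None
        name = name or request.url.rsplit('/', 1)[-1]
        if '.' in name:
            base, ext = name.rsplit('.', 1)
            return 'full/' + base + '_' + timest + '.' + ext
        return 'full/' + name + '_' + timest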