1.

Can I Use Scrapy With Beautifulsoup?

Answer»

Yes, you can. As mentioned above, BEAUTIFULSOUP can be used for parsing HTML responses in Scrapy CALLBACKS. You just have to feed the response’s body into a BeautifulSoup object and extract whatever DATA you need from it.

Here’s an example spider using BeautifulSoup API, with lxml as the HTML parser:

from bs4 import BeautifulSoup

import scrapy

class ExampleSpider(scrapy.Spider):
name = "example"
allowed_domains = ["example.com"]
start_urls = (
'http://www.example.com/',
)

def parse(self, response):
# USE lxml to get decent HTML parsing SPEED
soup = BeautifulSoup(response.text, 'lxml')
yield {
"url": response.url,
"title": soup.h1.string
}

Yes, you can. As mentioned above, BeautifulSoup can be used for parsing HTML responses in Scrapy callbacks. You just have to feed the response’s body into a BeautifulSoup object and extract whatever data you need from it.

Here’s an example spider using BeautifulSoup API, with lxml as the HTML parser:

from bs4 import BeautifulSoup

import scrapy

class ExampleSpider(scrapy.Spider):
name = "example"
allowed_domains = ["example.com"]
start_urls = (
'http://www.example.com/',
)

def parse(self, response):
# use lxml to get decent HTML parsing speed
soup = BeautifulSoup(response.text, 'lxml')
yield {
"url": response.url,
"title": soup.h1.string
}



Discussion

No Comment Found