InterviewSolution
1. What's the best way to parse big XML/CSV data feeds?
Answer» Parsing big feeds with XPath selectors can be problematic, since they need to build the DOM of the entire feed in memory, which can be quite slow and consume a lot of memory. To avoid parsing the entire feed in memory at once, you can use the xmliter and csviter functions from the scrapy.utils.iterators module. In fact, this is what the feed spiders use under the hood.
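The same streaming idea behind xmliter and csviter can be sketched with Python's standard library alone. The helper names below are illustrative, not Scrapy's API; the point is that nodes and rows are yielded one at a time instead of loading the whole feed into a DOM:

```python
import csv
import io
import xml.etree.ElementTree as ET

def iter_xml_nodes(fileobj, nodename):
    """Yield elements named `nodename` one at a time, clearing each
    after use so the full DOM is never held in memory."""
    for event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == nodename:
            yield elem
            elem.clear()  # release the subtree we just processed

def iter_csv_rows(fileobj):
    """Yield each CSV row as a dict keyed by the header row."""
    yield from csv.DictReader(fileobj)

# Small in-memory feed for demonstration; in practice fileobj
# would be an open file or a streamed response body.
feed = io.BytesIO(b"<items><item>a</item><item>b</item></items>")
names = [el.text for el in iter_xml_nodes(feed, "item")]
```

In a Scrapy spider you would instead pass the response and node name to xmliter (or the response to csviter) and get selectors or row dicts back incrementally, which keeps memory usage flat regardless of feed size.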