
How to Scrape Amazon Product Data Using Python?

12/10/2019
This tutorial explains how to extract product data from amazon.com using a Python Amazon scraper.

Have you heard about Amazon data scraping? It is a way to scrape Amazon product data from amazon.com in an automated way using Python.

An Amazon data scraper provides up-to-date product information, including changing prices, reviews, and more.

With a Python Amazon data scraper we can perform Amazon data scraping and extract the following data:

  • Product title
  • URL
  • ASIN
  • UPC
  • Item Model Number
  • No Of Reviews
  • Sales Rank Final
  • No Of Ratings
  • product Description
  • Product Dimensions
  • Best Seller Rank
  • Shipping Weight
  • Category
  • Price

Screenshot of a product page from which we can extract data.

Inspecting elements for data extraction from amazon.com

To find the appropriate data on the website, we first have to inspect and understand the HTML tag associated with each piece of data.

Please follow the steps below to find the tags:

  • Open a browser (Google Chrome, Mozilla Firefox).
  • Copy and paste the URL you want to scrape.
  • Press F12 to view the HTML structure of the site.
  • Find the HTML tags for the required data and use them in the Python code.

Here we have shown how to find the price tag; other tags can be found in the same way. A quick way to verify a tag you found in the inspector is sketched just below.
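As a minimal sketch, this snippet assumes the price element uses the a-color-price class (the same selector used in the full code later in this tutorial); Amazon changes its markup from time to time, so adjust the XPath to whatever you see in DevTools.

import requests
from lxml import html

# Download the product page (a User-Agent header helps avoid being served a blocked page).
page = requests.get('https://www.amazon.com/Natrol-5-HTP-Release-Tablets-200mg/dp/B001HCHGPC/',
                    headers={'User-Agent': 'Mozilla/5.0'})
tree = html.fromstring(page.text)

# Try the XPath copied from the browser inspector.
price_nodes = tree.xpath('//span[@class="a-color-price"]')
if price_nodes:
    print('Price:', price_nodes[0].text.strip())
else:
    print('Price tag not found - re-check the XPath in DevTools')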

How to set up your computer for web scraper development

We will use Python 3 for this tutorial. The code will not run if you are using Python 2.7. To start, you need a computer with Python 3 and PIP installed.

Let’s check your Python version. Open a terminal (on Linux and macOS) or Command Prompt (on Windows) and type

python --version

and press Enter. If the output looks something like Python 3.x.x, you have Python 3 installed. If it says Python 2.x.x, you have Python 2. If it prints an error, you probably don’t have Python installed.

If you don’t have Python 3, install it first.

Install Python 3 and Pip

Linux – https://www.python.org/downloads/source/

Mac Users can follow this guide – https://www.python.org/downloads/mac-osx/

Windows Users go here – https://www.python.org/downloads/windows/

For PIP installation, visit this link – https://www.liquidweb.com/kb/install-pip-windows/

Install Packages

  • Python Requests, to make requests and download the HTML content of the pages ( http://docs.python-requests.org/en/master/user/install/).
  • Python LXML, for parsing the HTML Tree Structure using Xpath (Learn how to install that here – http://lxml.de/installation.html)
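Assuming pip is already installed, both packages can typically be installed with a single command from the terminal (package names as published on PyPI):

pip install requests lxml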

Python Code to Scrape amazon.com

import requests
from lxml import html
import urllib3
from urllib3.exceptions import InsecureRequestWarning


# The function below sends an HTTP GET request to amazon.com
# and returns the page content as a string.
# Library reference:
# 1 :- requests

def getRequest(url):
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'
    }
    # verify=False disables SSL verification, so suppress the related warning.
    urllib3.disable_warnings(InsecureRequestWarning)
    response = requests.get(url, verify=False, headers=headers)
    return response.text


# This method parses the product data out of the HTML string
# and returns a dictionary with the data.
# Library reference:
# 1 :- lxml

def parseData(strHtml):
    parser = html.fromstring(strHtml)

    # Parse data using XPath
    URL = parser.xpath('//link[@rel="canonical"]')[0].attrib['href']
    ASIN = parser.xpath('//div[@id="cerberus-data-metrics"]')[0].attrib['data-asin']
    nodes = parser.xpath('//div[@class="content"]/ul/li')
    for node in nodes:
        if 'UPC' in ''.join(node.itertext()):
            UPC = ''.join(node.itertext()).replace('UPC:', '').strip()
    for node in nodes:
        if 'Item model number' in ''.join(node.itertext()):
            ItemModelNumber = ''.join(node.itertext()).replace('Item model number:', '').strip()
    for node in nodes:
        if 'Average Customer Review' in ''.join(node.itertext()):
            NoofReviews = ''.join(node.itertext()).replace('Average Customer Review:', '').strip()

    SalesrankFinal = ''.join(parser.xpath('//ul[@class="zg_hrsr"]')[0].itertext()).strip()
    NoofRatings = parser.xpath('//span[@id="acrPopover"]/span[1]/a/i[1]/span')[0].text.strip()
    productDescription = ''.join(parser.xpath('//div[@id="productDescription"]')[0].itertext()).strip()
    for node in nodes:
        if 'Product Dimensions' in ''.join(node.itertext()):
            Productdimensions = ''.join(node.itertext()).replace('Product Dimensions:', '').strip()
    # Availability = parser.xpath('//span[@id="availability"]')[0].text.strip()
    BSR = parser.xpath('//li[@id="SalesRank"]/text()')[1].strip()
    for node in nodes:
        if 'Shipping Weight' in ''.join(node.itertext()):
            ShippingWeight = ''.join(node.itertext()).replace('Shipping Weight:', '').strip()
    Category = parser.xpath('//span[@id="productTitle"]')[0].text.strip()
    Price = parser.xpath('//span[@class="a-color-price"]')[0].text.strip()
    product_title = parser.xpath('//span[@id="productTitle"]')[0].text.strip()
    # Shippingcost = parser.xpath('//span[@id="ourprice_shippingmessage"]/span')[0].text.strip()

    return {
        'URL': URL,
        'ASIN': ASIN,
        'UPC': UPC,
        'Item Model Number': ItemModelNumber,
        'No Of Reviews': NoofReviews,
        'Sales Rank Final': SalesrankFinal,
        'No Of Ratings': NoofRatings,
        'product Description': productDescription,
        'Product Dimensions': Productdimensions,
        # 'Availability': Availability,
        'Best Seller Rank': BSR,
        'Shipping Weight': ShippingWeight,
        'Category': Category,
        'Price': Price,
        'product_title': product_title
        # 'Shippingcost': Shippingcost
    }


if __name__ == "__main__":
    print('Scraping product data from amazon.com')
    url = 'https://www.amazon.com/Natrol-5-HTP-Release-Tablets-200mg/dp/B001HCHGPC/'
    print('Url :- ' + url)
    strHtml = getRequest(url)
    result = parseData(strHtml)
    print(result)

The above code is written for Python 3.x. Run it in any IDE such as PyCharm or Sublime Text. Here we get the result as a JSON-like dictionary; with small modifications the data can also be stored in a SQL database or exported to CSV or Excel, as sketched below.
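As a minimal sketch using Python's built-in csv module, and assuming the result dictionary returned by parseData() above, a single product record can be written to a CSV file like this:

import csv

# 'result' is the dictionary returned by parseData()
result = parseData(strHtml)

# Write one product record to CSV; the dictionary keys become the header row.
with open('amazon_product.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=list(result.keys()))
    writer.writeheader()
    writer.writerow(result)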

Here the data is extracted using the lxml library; you can also do it with BeautifulSoup 4, and the same approach can be used to extract data from any website. A rough BeautifulSoup equivalent for a couple of the fields is sketched below.
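This is only a rough sketch of the same idea with BeautifulSoup 4 (install it with pip install beautifulsoup4); the CSS selectors below mirror the XPath used above and may need adjusting if Amazon changes its markup.

from bs4 import BeautifulSoup

def parse_with_bs4(strHtml):
    soup = BeautifulSoup(strHtml, 'html.parser')

    # Same title and price fields as the lxml version, located with CSS selectors instead of XPath.
    title_tag = soup.select_one('span#productTitle')
    price_tag = soup.select_one('span.a-color-price')

    return {
        'product_title': title_tag.get_text(strip=True) if title_tag else None,
        'Price': price_tag.get_text(strip=True) if price_tag else None,
    }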

Run the above code in any Python IDE and you will get the result in JSON-like form. For testing, you can use another product URL from amazon.com.

Clarification: The code in this tutorial is provided for learning purposes only. We are not responsible for how it is used and assume no liability for any detrimental usage of the source code. It is intended only to expand your programming knowledge. This tutorial does not encourage Amazon scraping or web scraping in general, but it will help you understand scraping. We also do not provide any support for this code; users may modify it for learning purposes.
