Issue
I am trying to download all the CSV files listed at https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices. I have managed to do that with the following code:
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
# Build absolute URLs from the links in the CSV column
csv_links = ['https://emi.ea.govt.nz'+a['href'] for a in soup.select('td.csv a')]
contents = []
for i in csv_links:
    req = requests.get(i)
    # Decode the response bytes and parse them as CSV
    s = str(req.content, 'utf-8')
    df = pd.read_csv(StringIO(s))
    contents.append(df)
# Combine all downloaded files into a single DataFrame
final_price = pd.concat(contents)
I'd like to streamline this process if possible. The files on the website are updated every day, and I don't want to download all of them each time I run the script. Instead, I only want to download the files that were modified yesterday and append them to the existing files in my folder. To do that, I need to scrape the Date Modified column along with the file URLs. I'd be grateful if someone could tell me how to get the dates when the files were updated.
Solution
You can use list comprehensions to collect the CSV links and the modified dates from the table, then combine them in a DataFrame:
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices'
r = requests.get(url)
print(r)  # shows the response status
soup = BeautifulSoup(r.text, 'html.parser')

# Absolute URLs of the CSV files
csv_links = ['https://emi.ea.govt.nz'+a['href'] for a in soup.select('td[class="expand-column csv"] a')]
# Text of the Date Modified cells, skipping the first match
modified_date = [a.text for a in soup.select('td[class="two"] a')[1:]]

# Pair each link with its modified date
df = pd.DataFrame(data=list(zip(csv_links, modified_date)), columns=['csv_links', 'modified_date'])
print(df)
Output:
csv_links modified_date
0 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 22 Mar 2022
1 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 22 Mar 2022
2 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 22 Mar 2022
3 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 22 Mar 2022
4 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 22 Mar 2022
.. ... ...
107 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 20 Dec 2021
108 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 20 Dec 2021
109 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 20 Dec 2021
110 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 20 Dec 2021
111 https://emi.ea.govt.nz/Wholesale/Datasets/Fina... 20 Dec 2021
[112 rows x 2 columns]
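To pick out only the files modified on a given day, the scraped dates can be parsed and compared. The following is a minimal sketch, not part of the original answer: it assumes the dates always look like "22 Mar 2022" (as in the output above), and the helper names `parse_modified` and `links_modified_on` and the example.com URLs are hypothetical, used only for illustration. It works on (link, date) pairs such as those zipped together in the answer.

```python
from datetime import date, datetime, timedelta


def parse_modified(text):
    """Parse a date string such as '22 Mar 2022' (format assumed from the output above)."""
    return datetime.strptime(text.strip(), "%d %b %Y").date()


def links_modified_on(rows, day):
    """Return the links whose modified date equals `day`.

    `rows` is an iterable of (csv_link, modified_date_text) pairs.
    """
    return [link for link, text in rows if parse_modified(text) == day]


# Hypothetical rows standing in for the scraped table.
rows = [
    ("https://example.com/a.csv", "22 Mar 2022"),
    ("https://example.com/b.csv", "20 Dec 2021"),
]

# Files modified yesterday, relative to the day the script runs.
yesterday = date.today() - timedelta(days=1)
print(links_modified_on(rows, yesterday))

# Files modified on a fixed date, to show the filter on the sample rows.
print(links_modified_on(rows, date(2022, 3, 22)))
```

The filtered links can then be passed to the download loop from the question. One way to append the result to an existing file is `DataFrame.to_csv(path, mode='a', header=False)`, assuming the new rows have the same columns as the file.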
Answered By - F.Hoque
Answer Checked By - David Marino (JavaFixing Volunteer)