Send Email Attachment from Redshift query (PostgreSQL) with Python
Back in early days of the pandemic I was once experiencing a rather small calamity where the incoming API data source to the data warehouse was corrupted due to external factors, hence I could not relied on the validity of the data and subsequently I had to inform the business users to temporarily stop using and looking at the downstream BI reporting system. Furthermore, the team had to prioritize fixing an issue in the production environment only if it was a business critical issue due to resource constraint as many employees in the team had been stood down, so I had to find another way on the creation of regular daily and weekly reports to my stakeholders with minimum manual efforts required. Below is what I have coded to automate the process to derive the query from Amazon Redshift (or other similar PostgreSQL based environment) and saved the query results to a csv or excel file onto a local or shared network drive and then afterwards send the final file to the recipient using SMTP. Although the solution below was a short term, it helped me immensely on focusing on other task at the time!
First, these are what you need for the python modules in order to be able to run the program.
Next, I setup my files path where I saved the csv files as a result of the Redshift query to a local or shared drive within your company VM or Virtual desktop for schedule provisioning if you are thinking to automate and scale-up the task on a longer term. For the purpose of this article I will use local drive as an example.
Then the next thing I did was to declare the connection string to Redshift environment using psycopg2 module and then create a function where I pass my parameters and query from the data warehouse. I have included an example of getting top 10 products on the past week as an example of the query generation.
The next part of the code is the actual function to access the query result I have saved to my local drive and send it to the required recipient. I am using 587 TLS ports connection on my SMTP session as my email domain was using office365, if you are using gmail for example then you should use port 465 for SSL connection.
The final part of the code is a module to call both of the functions I have created which is the Redshift query function and the send email function. I have also included Python Schedule library which let you runs Python functions (or any other callable) periodically at pre-determined intervals using a simple, human-friendly syntax.
And that’s it, simple data engineering project which at the time saved me a lot of time on had to manually refresh and re-saving my query. Hope this helps you out there (FYI I am not a data engineer, I am a data analyst :D so the code might not be the most sufficient).