If you are an electrical engineer, and want to know how to use Python to get data from the Internet and display it, this post is for you.
(This is the third part of a four part series. By the end we’ll have written a Python script to display a chart of electricity market prices. Enter your email → on the right so that you don’t miss the final post.)
- All you need to analyse the electricity market pt 1
- All you need to analyse the electricity market pt 2
- All you need to analyse the electricity market pt 3
- All you need to analyse the electricity market final
Australian electricity prices are high – let’s analyse
|Very high electricity prices [Larger Size]|
Previously I mentioned that the Australian electricity prices have gone through the roof (more than doubling) since the introduction of the carbon tax.
This series of posts is exploring how to analyse market data accessible from the internet. The methods described can be adapted to your country’s data or any sort of data available on the internet.
We began the series with a post detailing how to obtain a CSV file that contains the latest electricity market prices.
Then we unzipped the price data CSV file that was downloaded in Part 1 and had a brief look at its contents.
Now we will teach you how to pull that CSV file apart using Python. You will master the ability to highlight the columns and rows that you are interested in. Just like top Japanese chefs are qualified to cut the good meat from the fugu fish, you too will learn to slice the good data from the CSV file.
The code we have developed over this series so far:
- downloads a zipped file;
- unzips it; and
- reads the file contents as a CSV.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
The example CSV file downloaded earlier had over 30 columns of information and many thousands of rows. We’ll use Python to get exactly the columns and rows that we want.
Which rows are important? Knowing what is in the CSV file is paramount at this stage. To this end, the site that provides the data may also provide a specification of the file structure. Failing that, you may have to get intimate with the data and spend a bit of time working out the format for yourself.
For the data provided by the Australian electricity market operator, the first
CSV column is a label. Each label describes the purpose of that row. There are
D. Shown below is an example of the data stored
in the first column,
1 2 3 4 5 6 7 8 9 10 11
Rows marked with a
C are comment rows, they give further information about
the file itself but aren’t necessary for us to worry about.
Rows marked with an
I are header rows. The header row is just like a header
row you use in a normal Microsoft Excel spreadsheet, it indicates what data is
stored in that column. For our goal of finding the price of electricity in
different regions of Australia over time, the columns that we are looking for
SETTLEMENTDATE (date and time),
REGIONID (price region) and
(electricity price $/MWh).
Rows marked with a
D are the data rows. We’ll take these rows for the
RRP then plot them on a chart.
Multiple header rows in a CSV file?
Immediately though, we run into a problem. Notice in the
CSV file structure
figure shown above that there are multiple
I header rows? There are no less
than four in the CSV file we downloaded. You can think of it as four CSV files
crammed into a single file.
We are only interested in one of these four sections, the section with the
columns we mentioned before,
analysis of the file structure, we know that all the data rows we are
interested in all begin with:
Master technique one:
Here is how to update the program to print the rows beginning with
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
filter function will only return the rows for which the function
Master technique 2:
Having correctly isolated the rows that interest us, slice up those
columns and get the
RRP columns (column
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
map function will return a list of the result of
get_date_region_and_rrp called with each row in
halfhourly_data as an argument.
1 2 3 4
The above section of code using
map is the equivalent of this code using
1 2 3 4 5 6 7 8
map is an extremely versatile function. Those four lines of
for loop code
are replaced with one
map line of code.
And now only the good data remains
date_region_price variable will have these contents:
1 2 3 4 5
Only the columns that we are interested in. Nice work!
However there is still a small problem. All of the data values are still text.
get_date_region_and_rrp to convert the first column to a date,
keep the second as a string and the third to a floating point value.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
Great, now our data is in a format that Python can understand and plot. This is
what is contained in the
date_region_price value now:
1 2 3 4 5
Perfect. We now have all the data formatted exactly the way we want it.
map to quickly and efficiently sort and slice the
CSV data. Just like a master Japanese chef, I’m sure that you will not poison
your patrons with bad slices of data.
map are advanced level
functions that are often used to replace
for loops. Please practice using
filter. Experiment with them so that you understand how they work!
In the next, and final, blog post of this series we’ll show you how to plot the results using matplotlib.