Python for Power Systems

A blog for power systems engineers to learn Python.

All You Need to Analyse the Electricity Market: Final Part

If you are an electrical engineer who wants to learn how to use Python to get data from the internet and display it, this post is for you.

This is the final part of a four-part series. By the end of this post we'll have a Python script that displays a chart of electricity market prices.

Australian electricity prices are high – let’s analyse

Very high electricity prices

In an earlier post I mentioned that Australian electricity prices have gone through the roof (more than doubling) since the introduction of the carbon tax.

This series explores how to analyse market data available on the internet. The methods described can be adapted to your own country's data, or to any other data published online.

We began the series with a post detailing how to obtain a CSV file that contains the latest electricity market prices.

Next we unzipped the file downloaded in Part 1 to get the price data CSV file, and had a brief look at its contents.

Then we pulled that CSV file apart using Python.

Now we'll plot the prices for each region using matplotlib.

The code we have developed over this series so far:

  1. Downloads a zipped file;
  2. Unzips it;
  3. Reads the file contents as a CSV; and
  4. Extracts the half-hourly settlement date, region and price (RRP) values.
converted_data.py
from __future__ import with_statement
import csv
import datetime
from urllib2 import urlopen
from StringIO import StringIO
from zipfile import ZipFile

PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'

# zippedfile is now one long string.
zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()

# StringIO turns the string into a real file-like object.
opened_zipfile = ZipFile(StringIO(zippedfile))

# assuming there is only one CSV in the zipped file.
csv_filename = opened_zipfile.namelist()[0]

prices_csv_file = opened_zipfile.open(csv_filename)

prices_csv_reader = csv.reader(prices_csv_file)

def is_halfhourly_data(row):
    """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
    return row[:4] == ["D", "TREGION", "", "1"]

halfhourly_data = filter(is_halfhourly_data, prices_csv_reader)

def get_date_region_and_rrp(row):
    """
    Returns the SETTLEMENTDATE, REGION and RRP from the given
    PUBLIC_PRICES CSV data row.

    SETTLEMENTDATE is converted to a Python date (the time is discarded);
    REGION is left as a string; and
    RRP is converted to a floating point.
    """
    return (datetime.datetime.strptime(row[4], '%Y/%m/%d %H:%M:%S').date(),
            row[6],
            float(row[7]))

date_region_price = map(get_date_region_and_rrp, halfhourly_data)
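The script above is Python 2 (urllib2, StringIO, print statements). If you are following along on Python 3, a few things move: urlopen lives in urllib.request, the downloaded zip is bytes so io.BytesIO takes the place of StringIO, and csv.reader needs text, so io.TextIOWrapper decodes the CSV inside the zip. The sketch below shows the unzip, filter and convert steps on a small zip built in memory so it runs without a network connection. The rows, the zip member name PUBLIC_PRICES_SAMPLE.CSV and the encoding are assumptions made up for illustration; only the column positions follow the indices used by get_date_region_and_rrp.

```python
import csv
import datetime
import io
from zipfile import ZipFile

# Made-up rows laid out like the TREGION rows this series filters on.
# Only the column positions matter; the values are invented and
# column 5 is a placeholder that is never read.
SAMPLE_CSV = (
    'C,NEMP.WORLD,PUBLIC_PRICES\n'
    'D,TREGION,,1,2012/07/04 04:30:00,1,NSW1,31.5\n'
    'D,TREGION,,1,2012/07/04 04:30:00,1,QLD1,29.8\n'
    'D,OTHER,,1,2012/07/04 04:30:00,1,NSW1,99.9\n'
)

# Build a zip file in memory. In the real script these bytes would come
# from urllib.request.urlopen(PRICE_REPORTS_URL + ZIP_URL).read().
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('PUBLIC_PRICES_SAMPLE.CSV', SAMPLE_CSV)
zippedfile = buffer.getvalue()


def is_halfhourly_data(row):
    """Returns True if the given row starts with 'D', 'TREGION', '', '1'."""
    return row[:4] == ["D", "TREGION", "", "1"]


def get_date_region_and_rrp(row):
    """Returns (SETTLEMENTDATE as a datetime, REGION, RRP as a float)."""
    return (datetime.datetime.strptime(row[4], '%Y/%m/%d %H:%M:%S'),
            row[6],
            float(row[7]))


# BytesIO plays the role StringIO had in the Python 2 script.
opened_zipfile = ZipFile(io.BytesIO(zippedfile))

# Assuming there is only one CSV in the zipped file.
csv_filename = opened_zipfile.namelist()[0]

# ZipFile.open yields bytes; TextIOWrapper decodes them for csv.reader.
with io.TextIOWrapper(opened_zipfile.open(csv_filename),
                      encoding='utf-8', newline='') as text:
    # filter() and map() are lazy in Python 3, so build the list
    # while the file is still open.
    date_region_price = [get_date_region_and_rrp(row)
                         for row in csv.reader(text)
                         if is_halfhourly_data(row)]

print(date_region_price)
```

The list comprehension does the same job as the filter and map pair in converted_data.py; it is used here simply because it gives a plain list in Python 3.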

The completed example

Here is the complete code to download and plot the electricity prices with Python. We'll step through the most important parts and introduce two of Python's more advanced features: defaultdict and yield.

If you aren't interested in the advanced Python code, you can skip to the plotting section at the end. The matplotlib code that creates the chart is very short and easy to follow.

final.py
from __future__ import with_statement
from collections import defaultdict
import csv
import datetime
from urllib2 import urlopen
from StringIO import StringIO
from zipfile import ZipFile

import matplotlib.pyplot as plt

PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'
REGIONS = ("QLD1", "NSW1", "VIC1", "SA1", "TAS1")

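# Reuse a local copy of the zip file if an earlier run saved one;
# otherwise download it and save it for next time.
# zippedfile is now one long string.
zip_filename = ZIP_URL.replace('/', '')
try:
    with open(zip_filename, 'rb') as f:
        zippedfile = f.read()
except IOError:
    zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()
    with open(zip_filename, 'wb') as f:
        f.write(zippedfile)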

# StringIO turns the string into a real file-like object.
opened_zipfile = ZipFile(StringIO(zippedfile))

# assuming there is only one CSV in the zipped file.
csv_filename = opened_zipfile.namelist()[0]

prices_csv_file = opened_zipfile.open(csv_filename)

prices_csv_reader = csv.reader(prices_csv_file)

def is_halfhourly_data(row):
    """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
    return row[:4] == ["D", "TREGION", "", "1"]

halfhourly_data = filter(is_halfhourly_data, prices_csv_reader)

def get_date_region_and_rrp(row):
    """
    Returns the SETTLEMENTDATE, REGION and RRP from the given
    PUBLIC_PRICES CSV data row.

    SETTLEMENTDATE is converted to a Python datetime (the time is kept);
    REGION is left as a string; and
    RRP is converted to a floating point.
    """
    return (datetime.datetime.strptime(row[4], '%Y/%m/%d %H:%M:%S'),
            row[6],
            float(row[7]))

prices = map(get_date_region_and_rrp, halfhourly_data)

def get_region_price(date_region_prices, regions):
    """
    Yields the dates and prices for each region as two lists,
    suitable for plotting with matplotlib.

    The regions are yielded in the same order as the `regions`
    argument. A region with no rows yields two empty lists.

    Args:
      date_region_prices: a list of (date, region, price) tuples
       [(datetime(2012, 8, 9), "QLD1", 45.6),
        (datetime(2012, 8, 9, 1), "NSW1", 46.0),
        ...
       ]
      regions: A list of the regions to return.

    Example (datetime here is datetime.datetime):

      get_region_price([(datetime(2012, 9, 9), "QLD1", 43.2),
                        (datetime(2012, 9, 9), "NSW1", 45.5),
                        (datetime(2012, 9, 10), "NSW1", 44.2)],
                       ("NSW1", "QLD1"))

      yields, one region at a time:

      ([datetime(2012, 9, 9), datetime(2012, 9, 10)], [45.5, 44.2])
      ([datetime(2012, 9, 9)], [43.2])
    """
    region_prices = defaultdict(list)
    for date, region, price in date_region_prices:
        region_prices[region + 'd'].append(date)
        region_prices[region + 'p'].append(price)

    for region in regions:
        yield region_prices[region + 'd'], region_prices[region + 'p']

figure = plt.figure()

for dates, prices in get_region_price(prices, REGIONS):
    plt.plot(dates, prices, '-')

plt.legend(REGIONS)
plt.grid()
plt.xlabel("Time of day")
plt.ylabel("Electricity Price A$/MWh")
figure.autofmt_xdate()

plt.show()
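One change from converted_data.py is easy to miss: final.py no longer calls .date() on the parsed SETTLEMENTDATE. The short Python 3 check below, using two made-up settlement times in the same '%Y/%m/%d %H:%M:%S' format, shows why. .date() drops the time of day, so every half-hour price on a given day would land on the same x value.

```python
import datetime

FORMAT = '%Y/%m/%d %H:%M:%S'

# Two made-up half-hourly settlement times on the same day.
first = datetime.datetime.strptime('2012/07/04 04:00:00', FORMAT)
second = datetime.datetime.strptime('2012/07/04 04:30:00', FORMAT)

# Keeping the full datetime preserves the 30 minute spacing...
print(second - first)                 # 0:30:00

# ...but .date() maps both onto the same calendar day.
print(first.date() == second.date())  # True
```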

Grouping the regions together

I want to plot each of the five Australian regions' prices as a separate series. But I don't have the data organised into separate x-axis and y-axis values. Instead, there is one long Python list containing the rows for every region.

one long region list
[
('2012-01-01 00:00', 'NSW1', 34.5),
('2012-01-01 00:00', 'VIC1', 33.2),
('2012-01-01 00:00', 'SA1',  36.1),
('2012-01-01 00:00', 'TAS1', 38.2),
('2012-01-01 00:00', 'QLD1', 37.4),
('2012-01-01 00:30', 'NSW1', 34.6),
...
]

Here is a way to use defaultdict to collect the dates and prices for each region, for example the 'NSW1' and 'VIC1' regions. A defaultdict, from Python's standard collections module, is just like a normal dictionary, except that it has one additional powerful feature:

`defaultdict` automatically creates a new value, by calling the function you gave it (such as `list`), whenever you access a `key` that is missing.

Confused? Here is a concrete example: grouping power station names by generator category (Nuclear, Wind, ...) using a normal Python dict:

with a normal dict
gens = {}
gens['NUCLEAR'] = ['NUCLEAR-1', 'HILLVIEW-2', 'NUCLEAR-2']

# we can add a new nuclear generator name.
gens['NUCLEAR'].append('HILLVIEW-1')

# we can't add a new wind farm name - yet.
gens['WINDFARM'].append('WINDY-HILL-1')
# raises KeyError: 'WINDFARM' (the script stops here)

# first must make a new empty wind farm list
gens['WINDFARM'] = []
# now this works.
gens['WINDFARM'].append('WINDY-HILL-1')

A KeyError exception is raised by the first gens['WINDFARM'].append(...) call, because 'WINDFARM' is not yet a key in the gens dictionary. Only after gens['WINDFARM'] = [] adds the key can the first wind farm be appended.
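A plain dict can also avoid the KeyError without defaultdict. The Python 3 sketch below first catches the error to show exactly what is raised, then uses dict.setdefault(key, default), a standard dict method that returns the value stored under key, inserting default first if the key is missing. The generator names are the same made-up ones as above.

```python
gens = {'NUCLEAR': ['NUCLEAR-1', 'HILLVIEW-2', 'NUCLEAR-2']}

try:
    gens['WINDFARM'].append('WINDY-HILL-1')
except KeyError as missing:
    print('KeyError:', missing)   # KeyError: 'WINDFARM'

# setdefault inserts the empty list only when 'WINDFARM' is missing,
# then returns the list stored under that key so we can append to it.
gens.setdefault('WINDFARM', []).append('WINDY-HILL-1')
gens.setdefault('WINDFARM', []).append('WINDY-HILL-2')

print(gens['WINDFARM'])           # ['WINDY-HILL-1', 'WINDY-HILL-2']
```

One small cost of setdefault is that a new empty list is built on every call, even when the key already exists; defaultdict only calls its factory for missing keys.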

Here is the same code using defaultdict to initialise an empty list when there is a missing key. Notice that there is no need to create a key with an empty list before appending.

with defaultdict
from collections import defaultdict

gens = defaultdict(list)

# empty list created for 'NUCLEAR' and straight away we extend it.
gens['NUCLEAR'].extend(['NUCLEAR-1', 'HILLVIEW-2', 'NUCLEAR-2'])
gens['NUCLEAR'].append('HILLVIEW-1')

# empty list created for 'WINDFARM' and straight away we append.
gens['WINDFARM'].append('WINDY-HILL-1')
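How does defaultdict know to create the list? When a [] lookup fails, a dict subclass's __missing__ method is called if the class defines one, and defaultdict uses that hook to call its factory (list here) and store the result. The Python 3 class below is a simplified, hypothetical stand-in (ListDict is not a standard library name) that behaves like defaultdict(list) for [] lookups. It also shows a side effect worth knowing: merely reading a missing key adds it, while the `in` test does not.

```python
class ListDict(dict):
    """Hypothetical, simplified stand-in for defaultdict(list)."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when `key` is absent.
        value = []
        self[key] = value
        return value


gens = ListDict()
gens['NUCLEAR'].extend(['NUCLEAR-1', 'HILLVIEW-2', 'NUCLEAR-2'])
gens['WINDFARM'].append('WINDY-HILL-1')

# The membership test does not create a key...
print('SOLAR' in gens)   # False
# ...but reading a missing key does, leaving an empty list behind.
print(gens['SOLAR'])     # []
print('SOLAR' in gens)   # True
```

The same behaviour is why get_region_price in final.py yields a pair of empty lists, rather than raising a KeyError, for a region that has no rows in the file.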

Now that you've seen defaultdict, take another look at this section of final.py:

using defaultdict to group things
region_prices = defaultdict(list)
for date, region, price in date_region_prices:
    region_prices[region + 'd'].append(date)
    region_prices[region + 'p'].append(price)

It makes two lists for every region. The key for the first list, region + 'd', looks like 'NSW1d' or 'VIC1d'. The key for the second list, region + 'p', looks like 'NSW1p' or 'VIC1p'.

The d stands for date, our x axis, and the p stands for price, our y axis. It's time to plot those x and y values.
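Running that loop over the six sample rows from the "one long region list" shown earlier (dates left as strings) makes the naming scheme concrete. This is a Python 3 sketch of the same grouping code: each region ends up with a date list and a price list, in the order the rows appear.

```python
from collections import defaultdict

# The sample rows from "one long region list" (dates left as strings).
date_region_prices = [
    ('2012-01-01 00:00', 'NSW1', 34.5),
    ('2012-01-01 00:00', 'VIC1', 33.2),
    ('2012-01-01 00:00', 'SA1', 36.1),
    ('2012-01-01 00:00', 'TAS1', 38.2),
    ('2012-01-01 00:00', 'QLD1', 37.4),
    ('2012-01-01 00:30', 'NSW1', 34.6),
]

region_prices = defaultdict(list)
for date, region, price in date_region_prices:
    region_prices[region + 'd'].append(date)   # x axis values
    region_prices[region + 'p'].append(price)  # y axis values

print(sorted(region_prices))
print(region_prices['NSW1d'])  # ['2012-01-01 00:00', '2012-01-01 00:30']
print(region_prices['NSW1p'])  # [34.5, 34.6]
```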

Making your own iterator

Using the yield keyword in a function turns it into a generator function: calling it returns a generator, an iterator that can be used in a for loop.

    for region in regions:
        yield region_prices[region+'d'], region_prices[region+'p']

I used the yield keyword in the get_region_price function to hand back the date and price (x-axis and y-axis) lists that were grouped using defaultdict. The for loop receives them one region at a time.

using a function with yield
for dates, prices in get_region_price(prices, REGIONS):
    plt.plot(dates, prices, '-')
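To see exactly what that loop receives, you can drain the generator into a list. This Python 3 sketch repeats the grouping and yield logic of get_region_price from final.py (without its docstring) and feeds it three made-up rows. Nothing inside the function runs until list() starts asking for values, and the extra 'SA1' region, which has no rows, comes back as two empty lists.

```python
import datetime
from collections import defaultdict


def get_region_price(date_region_prices, regions):
    """Same grouping and yield logic as get_region_price in final.py."""
    region_prices = defaultdict(list)
    for date, region, price in date_region_prices:
        region_prices[region + 'd'].append(date)
        region_prices[region + 'p'].append(price)

    for region in regions:
        yield region_prices[region + 'd'], region_prices[region + 'p']


# Made-up rows: two NSW1 half hours and one QLD1 half hour.
rows = [
    (datetime.datetime(2012, 9, 9, 0, 0), 'QLD1', 43.2),
    (datetime.datetime(2012, 9, 9, 0, 0), 'NSW1', 45.5),
    (datetime.datetime(2012, 9, 9, 0, 30), 'NSW1', 44.2),
]

pairs = list(get_region_price(rows, ('NSW1', 'QLD1', 'SA1')))
for dates, prices in pairs:
    # prints "2 [45.5, 44.2]", then "1 [43.2]", then "0 []"
    print(len(dates), prices)
```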

yield will take some getting used to if you’ve never seen it before. Try working with this script on your computer so you can see what is happening:

yield demo
names_age = [("NUCLEAR1", 10), ("NUCLEAR2", 11), ("WINDYPEAK", 2)]

def generator_names(names_age):

    print '[generator_names] started the generator_names generator'

    for name, age in names_age:
        print '[generator_names] about to yield', name
        yield name

for name in generator_names(names_age):
    print 'I got name: ', name

yield output
[generator_names] started the generator_names generator
[generator_names] about to yield NUCLEAR1
I got name:  NUCLEAR1
[generator_names] about to yield NUCLEAR2
I got name:  NUCLEAR2
[generator_names] about to yield WINDYPEAK
I got name:  WINDYPEAK
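Behind the scenes, a for loop calls the built-in next() on the generator for you and stops when the generator raises StopIteration. Stepping through a cut-down generator_names by hand shows each step. This version is Python 3, so print is a function, unlike the Python 2 demo above.

```python
names_age = [("NUCLEAR1", 10), ("NUCLEAR2", 11), ("WINDYPEAK", 2)]


def generator_names(names_age):
    for name, age in names_age:
        yield name


names = generator_names(names_age)   # nothing inside has run yet
print(next(names))   # NUCLEAR1
print(next(names))   # NUCLEAR2
print(next(names))   # WINDYPEAK

try:
    next(names)
except StopIteration:
    # This is the signal a for loop uses to finish.
    print('generator exhausted')
```

A generator can only be walked through once: after StopIteration, looping over the same generator object again produces nothing, so call generator_names again if you need a fresh pass.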

Plotting professionally in only 8 lines of code

Plotting the dates and prices is very easy once you have them in two lists: one for the x axis and one for the y axis.

The plot commands are similar to MATLAB plotting routines:

displaying a chart
figure = plt.figure()

for dates, prices in get_region_price(prices, REGIONS):
    plt.plot(dates, prices, '-')

plt.legend(REGIONS)
plt.grid()
plt.xlabel("Time of day")
plt.ylabel("Electricity Price A$/MWh")
figure.autofmt_xdate()

plt.show()

Here is the list of steps in the code above.

  1. Create an empty figure;
  2. Plot each region’s prices;
  3. Display a legend;
  4. Enable grid lines;
  5. Set the x-axis label;
  6. Set the y-axis label;
  7. Auto-rotate the x axis date labels; and
  8. Show the plot.

Conclusion

The final program is short, about 100 lines of code, but it covers a wide range of tasks:

  • Downloading files from the internet;
  • Unzipping files;
  • Reading CSV files;
  • Filtering, grouping and transposing data; and
  • Displaying data on a chart.

If any part of this post is unclear, or you would like us to write about something in more detail, tell us using the form below.

Fill out my online form.