Using the statistics module in PyPSA

Note

You can download this example as a Jupyter notebook or start it in interactive mode.

Using the statistics module in PyPSA#

The statistics module is used to easily extract information from your networks. This is useful when inspecting your solved networks and creating first visualizations of your results.

With the statistics module, you can look at different metrics of your network. A list of the implemented metrics are:

  • Capital expenditure

  • Operational expenditure

  • Installed capacities

  • Optimal capacities

  • Supply

  • Withdrawal

  • Curtailment

  • Capacity Factor

  • Revenue

  • Market value

  • Energy balance

Now lets look at an example.

[1]:
import matplotlib.pyplot as plt
import numpy as np

import pypsa
ERROR 1: PROJ: proj_create_from_database: Open of /home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/latest/share/proj failed

First, we open an example network we want to investigate.

[2]:
n = pypsa.examples.scigrid_de()
WARNING:pypsa.io:Importing network from PyPSA version v0.17.1 while current version is v0.28.1. Read the release notes at https://pypsa.readthedocs.io/en/latest/release_notes.html to prepare your network for import.
INFO:pypsa.io:Imported network scigrid-de.nc has buses, generators, lines, loads, storage_units, transformers

Lets run an overview of all statistics by calling:

[3]:
n.statistics().dropna()
[3]:
Optimal Capacity Installed Capacity Supply Withdrawal Energy Balance Transmission Capacity Factor Curtailment Capital Expenditure Operational Expenditure Revenue Market Value
Generator Brown Coal 0.0 20879.500000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Gas 0.0 23913.130000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Geothermal 0.0 31.700000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Hard Coal 0.0 25312.600000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Multiple 0.0 152.700000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nuclear 0.0 12068.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Oil 0.0 2710.200000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Other 0.0 3027.800000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Run of River 0.0 3999.100000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Solar 0.0 37041.524779 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Storage Hydro 0.0 1445.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Waste 0.0 1645.900000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Wind Offshore 0.0 2973.500000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Wind Onshore 0.0 37339.895329 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Line - 0.0 961101.136714 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
StorageUnit Pumped Hydro 0.0 9179.500000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Transformer - 0.0 192000.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

So far the statistics are not so interesting, because we have not solved the network yet. We can only see that the network already has some installed capacities for different components.

You can see that statistics returns a pandas.DataFrame. The MultiIndex of the DataFrame provides the name of the network component (i.e. first entry of the MultiIndex, like Generator, Line,…) on the first index level. The carrier index level provides the carrier name of the given component. For example, in n.generators, we have the carriers Brown Coal, Gas and so on.

Now lets solve the network.

[4]:
n.optimize(n.snapshots[:4])
WARNING:pypsa.consistency:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
       '32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
       '87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
       '120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
       '159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
       '233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
       '267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
       '315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
       '362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
       '458'],
      dtype='object', name='Transformer')
WARNING:pypsa.consistency:The following lines have carriers which are not defined:
Index(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
       ...
       '850', '851', '852', '853', '854', '855', '856', '857', '858', '859'],
      dtype='object', name='Line', length=852)
WARNING:pypsa.consistency:The following generators have carriers which are not defined:
Index(['1 Gas', '1 Hard Coal', '102 Gas', '108 Run of River', '108 Waste',
       '111 Gas', '112 Gas', '112 Run of River', '114 Hard Coal',
       '115 Brown Coal',
       ...
       '382_220kV Solar', '384_220kV Solar', '385_220kV Solar',
       '391_220kV Solar', '403_220kV Solar', '404_220kV Solar',
       '413_220kV Solar', '421_220kV Solar', '450_220kV Solar',
       '458_220kV Solar'],
      dtype='object', name='Generator', length=1423)
WARNING:pypsa.consistency:The following storage_units have carriers which are not defined:
Index(['100_220kV Pumped Hydro', '114 Pumped Hydro', '121 Pumped Hydro',
       '140 Pumped Hydro', '141 Pumped Hydro', '158 Pumped Hydro',
       '166 Pumped Hydro', '205 Pumped Hydro', '228 Pumped Hydro',
       '235 Pumped Hydro', '25 Pumped Hydro', '266 Pumped Hydro',
       '268 Pumped Hydro', '279_220kV Pumped Hydro', '293 Pumped Hydro',
       '320 Pumped Hydro', '32_220kV Pumped Hydro', '333 Pumped Hydro',
       '342 Pumped Hydro', '343 Pumped Hydro', '362_220kV Pumped Hydro',
       '368 Pumped Hydro', '374 Pumped Hydro', '379 Pumped Hydro',
       '389 Pumped Hydro', '397 Pumped Hydro', '423 Pumped Hydro',
       '424 Pumped Hydro', '479 Pumped Hydro', '481 Pumped Hydro',
       '493 Pumped Hydro', '51 Pumped Hydro', '63 Pumped Hydro',
       '66 Pumped Hydro', '72 Pumped Hydro', '78_220kV Pumped Hydro',
       '8 Pumped Hydro', '98 Pumped Hydro'],
      dtype='object', name='StorageUnit')
WARNING:pypsa.consistency:The following buses have carriers which are not defined:
Index(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
       ...
       '382_220kV', '384_220kV', '385_220kV', '391_220kV', '403_220kV',
       '404_220kV', '413_220kV', '421_220kV', '450_220kV', '458_220kV'],
      dtype='object', name='Bus', length=585)
WARNING:pypsa.consistency:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
       '32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
       '87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
       '120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
       '159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
       '233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
       '267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
       '315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
       '362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
       '458'],
      dtype='object', name='Transformer')
WARNING:pypsa.consistency:The following lines have carriers which are not defined:
Index(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
       ...
       '850', '851', '852', '853', '854', '855', '856', '857', '858', '859'],
      dtype='object', name='Line', length=852)
WARNING:pypsa.consistency:The following generators have carriers which are not defined:
Index(['1 Gas', '1 Hard Coal', '102 Gas', '108 Run of River', '108 Waste',
       '111 Gas', '112 Gas', '112 Run of River', '114 Hard Coal',
       '115 Brown Coal',
       ...
       '382_220kV Solar', '384_220kV Solar', '385_220kV Solar',
       '391_220kV Solar', '403_220kV Solar', '404_220kV Solar',
       '413_220kV Solar', '421_220kV Solar', '450_220kV Solar',
       '458_220kV Solar'],
      dtype='object', name='Generator', length=1423)
WARNING:pypsa.consistency:The following storage_units have carriers which are not defined:
Index(['100_220kV Pumped Hydro', '114 Pumped Hydro', '121 Pumped Hydro',
       '140 Pumped Hydro', '141 Pumped Hydro', '158 Pumped Hydro',
       '166 Pumped Hydro', '205 Pumped Hydro', '228 Pumped Hydro',
       '235 Pumped Hydro', '25 Pumped Hydro', '266 Pumped Hydro',
       '268 Pumped Hydro', '279_220kV Pumped Hydro', '293 Pumped Hydro',
       '320 Pumped Hydro', '32_220kV Pumped Hydro', '333 Pumped Hydro',
       '342 Pumped Hydro', '343 Pumped Hydro', '362_220kV Pumped Hydro',
       '368 Pumped Hydro', '374 Pumped Hydro', '379 Pumped Hydro',
       '389 Pumped Hydro', '397 Pumped Hydro', '423 Pumped Hydro',
       '424 Pumped Hydro', '479 Pumped Hydro', '481 Pumped Hydro',
       '493 Pumped Hydro', '51 Pumped Hydro', '63 Pumped Hydro',
       '66 Pumped Hydro', '72 Pumped Hydro', '78_220kV Pumped Hydro',
       '8 Pumped Hydro', '98 Pumped Hydro'],
      dtype='object', name='StorageUnit')
WARNING:pypsa.consistency:The following buses have carriers which are not defined:
Index(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
       ...
       '382_220kV', '384_220kV', '385_220kV', '391_220kV', '403_220kV',
       '404_220kV', '413_220kV', '421_220kV', '450_220kV', '458_220kV'],
      dtype='object', name='Bus', length=585)
INFO:linopy.model: Solve problem using Highs solver
INFO:linopy.io: Writing time: 0.2s
INFO:linopy.solvers:Log file at /tmp/highs.log
Running HiGHS 1.7.1 (git hash: 0c240d8): Copyright (c) 2024 HiGHS under MIT licence terms
Coefficient ranges:
INFO:linopy.constants: Optimization successful:
Status: ok
Termination condition: optimal
Solution: 9940 primals, 23828 duals
Objective: 1.04e+06
Solver model: available
Solver message: optimal

INFO:pypsa.optimization.optimize:The shadow-prices of the constraints Generator-fix-p-lower, Generator-fix-p-upper, Line-fix-s-lower, Line-fix-s-upper, Transformer-fix-s-lower, Transformer-fix-s-upper, StorageUnit-fix-p_dispatch-lower, StorageUnit-fix-p_dispatch-upper, StorageUnit-fix-p_store-lower, StorageUnit-fix-p_store-upper, StorageUnit-fix-state_of_charge-lower, StorageUnit-fix-state_of_charge-upper, Kirchhoff-Voltage-Law, StorageUnit-energy_balance were not assigned to the network.
  Matrix [1e-02, 2e+02]
  Cost   [3e+00, 1e+02]
  Bound  [0e+00, 0e+00]
  RHS    [7e-08, 7e+03]
Presolving model
3039 rows, 6678 cols, 17943 nonzeros  0s
2265 rows, 5896 cols, 16781 nonzeros  0s
2069 rows, 5078 cols, 15895 nonzeros  0s
Presolve : Reductions: rows 2069(-21759); columns 5078(-4862); elements 15895(-27623)
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 0s
       2582     1.0389598073e+06 Pr: 0(0); Du: 0(2.96473e-12) 0s
Solving the original LP from the solution after postsolve
Model   status      : Optimal
Simplex   iterations: 2582
Objective value     :  1.0389598073e+06
HiGHS run time      :          0.31
[4]:
('ok', 'optimal')

Now we can look at the statistics of the solved network.

[5]:
n.statistics().round(1)
[5]:
Optimal Capacity Installed Capacity Supply Withdrawal Energy Balance Transmission Capacity Factor Curtailment Capital Expenditure Operational Expenditure Revenue Market Value
Generator Brown Coal 20879.5 20879.5 42859.0 0.0 42859.0 0.0 0.1 0.0 0.0 428590.4 588266.6 13.7
Gas 23913.1 23913.1 79.4 0.0 79.4 0.0 0.0 0.0 0.0 3971.9 3971.9 50.0
Geothermal 31.7 31.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Hard Coal 25312.6 25312.6 10535.7 0.0 10535.7 0.0 0.0 0.0 0.0 263391.6 272172.0 25.8
Multiple 152.7 152.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nuclear 12068.0 12068.0 31473.0 0.0 31473.0 0.0 0.1 0.0 0.0 251783.9 527108.2 16.7
Oil 2710.2 2710.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Other 3027.8 3027.8 336.0 0.0 336.0 0.0 0.0 0.0 0.0 10752.0 11503.2 34.2
Run of River 3999.1 3999.1 13723.3 0.0 13723.3 0.0 0.1 0.0 0.0 41170.0 301812.3 22.0
Solar 37041.5 37041.5 0.0 0.0 0.0 0.0 0.0 47391.0 0.0 0.0 0.0 0.0
Storage Hydro 1445.0 1445.0 3580.0 0.0 3580.0 0.0 0.1 0.0 0.0 10740.0 72904.4 20.4
Waste 1645.9 1645.9 4760.0 0.0 4760.0 0.0 0.1 0.0 0.0 28560.0 97621.8 20.5
Wind Offshore 2973.5 2973.5 5197.6 0.0 5197.6 0.0 0.1 64276.1 0.0 0.0 14034.2 2.7
Wind Onshore 37339.9 37339.9 83734.8 0.0 83734.8 0.0 0.1 389705.4 0.0 0.0 406030.6 4.8
Line AC 961101.1 961101.1 840871.5 840871.5 0.0 -16691.9 0.0 0.0 0.0 0.0 1045149.1 1.2
Load - 0.0 0.0 0.0 195357.1 -195357.1 0.0 NaN 0.0 0.0 0.0 -3396077.6 NaN
StorageUnit Pumped Hydro 9179.5 9179.5 0.0 921.7 -921.7 0.0 0.0 0.0 0.0 0.0 4013.3 NaN
Transformer - 192000.0 192000.0 103545.5 103545.5 0.0 13422.4 0.0 0.0 0.0 0.0 51489.9 0.5

As you can see there is now much more information available. There are still no capital expenditures in the network, because we only performed an operational optimization with this example network.

If you are interested in a specific metric, e.g. curtailment, you can run

[6]:
curtailment = n.statistics.curtailment()
curtailment[curtailment != 0]
[6]:
component  carrier
Generator  Solar             47391.043207
           Wind Offshore     64276.060744
           Wind Onshore     389705.444529
dtype: float64

Note that when calling a specific metric the statistics module returns a pandas.Series. To find the unit of the data returned by statistics, you can call attrs on the DataFrame or Series.

[7]:
curtailment.attrs
[7]:
{'name': 'Curtailment', 'unit': 'MWh'}

So the unit of curtailment is given in MWh. You can also customize your request.

For this you have various options: 1. You can select the component from which you want to get the metric with the attribute comps. Careful, comps has to be a list of strings.

[8]:
n.statistics.supply(comps=["Generator"])
[8]:
component  carrier
Generator  Brown Coal       42859.043049
           Gas                 79.438015
           Hard Coal        10535.663101
           Nuclear          31472.992981
           Other              336.000000
           Run of River     13723.318246
           Storage Hydro     3580.000000
           Waste             4760.000000
           Wind Offshore     5197.588558
           Wind Onshore     83734.810140
Name: generators, dtype: float64
  1. For metrics which have a time dimension, you can choose the aggregation method or decide to not aggregate them at all. Just use the aggregate_time attribute to specify what you want to do.

For example calculate the mean supply/generation per time step is

[9]:
n.statistics.supply(comps=["Generator"], aggregate_time="mean")
[9]:
component  carrier
Generator  Brown Coal       1785.793460
           Gas                 3.309917
           Hard Coal         438.985963
           Nuclear          1311.374708
           Other              14.000000
           Run of River      571.804927
           Storage Hydro     149.166667
           Waste             198.333333
           Wind Offshore     216.566190
           Wind Onshore     3488.950423
Name: generators, dtype: float64

Or retrieve the supply time series by not aggregating the time series.

[10]:
n.statistics.supply(comps=["Generator"], aggregate_time=False).iloc[:, :4]
[10]:
snapshot 2011-01-01 00:00:00 2011-01-01 01:00:00 2011-01-01 02:00:00 2011-01-01 03:00:00
component carrier
Generator Brown Coal 12563.905418 10848.565700 10006.392573 9440.179359
Gas 35.826849 23.721730 13.082968 6.806467
Geothermal NaN NaN NaN NaN
Hard Coal 4778.482386 3124.546674 1722.868548 909.765493
Multiple NaN NaN NaN NaN
Nuclear 7842.041331 7863.184880 7882.532030 7885.234740
Oil NaN NaN NaN NaN
Other 84.000000 84.000000 84.000000 84.000000
Run of River 3458.909012 3419.332843 3413.465753 3431.610638
Solar NaN NaN NaN NaN
Storage Hydro 895.000000 895.000000 895.000000 895.000000
Waste 1167.500000 1240.500000 1184.500000 1167.500000
Wind Offshore 2259.649281 980.561669 979.115422 978.262187
Wind Onshore 18933.199813 21284.006505 21640.382705 21877.221117
  1. You can choose how you want to group the components of the network and how to aggregate the groups. By default the components are grouped by their carriers and summed. However, you can change this by providing different groupby and aggregate_groups attributes.

[11]:
n.statistics.supply(comps=["Generator"], groupby=["bus"], aggregate_groups="max")
[11]:
component  bus
Generator  101          1392.590187
           102           970.749996
           103             0.127076
           104_220kV       0.896977
           105_220kV      76.955791
                           ...
           94_220kV      164.614902
           95_220kV      186.429288
           96_220kV       53.032760
           97              0.002514
           98            335.603727
Name: generators, Length: 434, dtype: float64

Now you obtained the maximal supply in one time step for every bus in the network.

Often it is better when inspecting your network to visualize the tables. Therefore, you can easily make plots to analyze your results. For example the supply of the generators.

[12]:
n.statistics.supply(comps=["Generator"]).droplevel(0).div(1e3).plot.bar(
    title="Generator in GWh"
)
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/latest/lib/python3.12/site-packages/IPython/core/pylabtools.py:77: DeprecationWarning: backend2gui is deprecated since IPython 8.24, backends are managed in matplotlib and can be externally registered.
  warnings.warn(
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/latest/lib/python3.12/site-packages/IPython/core/pylabtools.py:77: DeprecationWarning: backend2gui is deprecated since IPython 8.24, backends are managed in matplotlib and can be externally registered.
  warnings.warn(
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/latest/lib/python3.12/site-packages/IPython/core/pylabtools.py:77: DeprecationWarning: backend2gui is deprecated since IPython 8.24, backends are managed in matplotlib and can be externally registered.
  warnings.warn(
[12]:
<Axes: title={'center': 'Generator in GWh'}, xlabel='carrier'>
../_images/examples_statistics_23_2.png

Or you could plot the generation time series of the generators.

[13]:
fig, ax = plt.subplots()
n.statistics.supply(comps=["Generator"], aggregate_time=False).droplevel(0).iloc[
    :, :4
].div(1e3).T.plot.area(
    title="Generation in GW",
    ax=ax,
    legend=False,
    linewidth=0,
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
[13]:
<matplotlib.legend.Legend at 0x7f9c29213d70>
../_images/examples_statistics_25_1.png

Finally, we want to look at the energy balance of the network. The energy balance is not included in the overview of the statistics module. To calculate the energy balance, you can do

[14]:
n.statistics.energy_balance()
[14]:
component    carrier        bus_carrier
Load         -              AC            -195357.120000
Generator    Brown Coal     AC              42859.043049
             Gas            AC                 79.438015
             Hard Coal      AC              10535.663101
             Nuclear        AC              31472.992981
             Other          AC                336.000000
             Run of River   AC              13723.318246
             Storage Hydro  AC               3580.000000
             Waste          AC               4760.000000
             Wind Offshore  AC               5197.588558
             Wind Onshore   AC              83734.810140
StorageUnit  Pumped Hydro   AC               -921.734091
dtype: float64

Note that there is now an additional index level called bus carrier. This is because an energy balance is defined for every bus carrier. The bus carriers you have in your network you can find by looking at n.buses.carrier.unique(). For this network, there is only one bus carrier which is AC and corresponds to electricity. However, you can have further bus carriers for example when you have a sector coupled network. You could have heat or CO \(_2\) as bus carrier. Therefore, for many statistics functions you have to be careful about the units of the values and it is not always given by the attr object of the DataFrame or Series.

Finally, we want to plot the energy balance and the energy balance time series for electrcity which has the bus carrier AC. In a sector coupled network, you could also choose other bus carriers like H2 or heat. Note that in this example “-” represents the load in the system.

[15]:
fig, ax = plt.subplots()
n.statistics.energy_balance().loc[:, :, "AC"].groupby(
    "carrier"
).sum().to_frame().T.plot.bar(stacked=True, ax=ax, title="Energy Balance")
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
[15]:
<matplotlib.legend.Legend at 0x7f9c295d54c0>
../_images/examples_statistics_29_1.png
[16]:
fig, ax = plt.subplots()
n.statistics.energy_balance(aggregate_time=False).loc[:, :, "AC"].droplevel(0).iloc[
    :, :4
].groupby("carrier").sum().where(lambda x: np.abs(x) > 1).fillna(0).T.plot.area(
    ax=ax, title="Energy Balance Timeseries"
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
[16]:
<matplotlib.legend.Legend at 0x7f9c2961e7e0>
../_images/examples_statistics_30_1.png