Note

You can download this example as a Jupyter notebook or start it in interactive mode.

Using the statistics module in PyPSA#

The statistics module is used to easily extract information from your networks. This is useful when inspecting your solved networks and creating first visualizations of your results.

With the statistics module, you can look at different metrics of your network. A list of the implemented metrics are:

  • Capital expenditure

  • Operational expenditure

  • Installed capacities

  • Optimal capacities

  • Supply

  • Withdrawal

  • Curtailment

  • Capacity Factor

  • Revenue

  • Market value

  • Energy balance

Now lets look at an example.

[1]:
import pypsa
import matplotlib.pyplot as plt

First, we open an example network we want to investigate.

[2]:
n = pypsa.examples.scigrid_de()
WARNING:pypsa.io:Importing network from PyPSA version v0.17.1 while current version is v0.25.1. Read the release notes at https://pypsa.readthedocs.io/en/latest/release_notes.html to prepare your network for import.
INFO:pypsa.io:Imported network scigrid-de.nc has buses, generators, lines, loads, storage_units, transformers

Lets run an overview of all statistics by calling:

[3]:
n.statistics()
[3]:
Capacity Factor Capital Expenditure Curtailment Dispatch Installed Capacity Market Value Operational Expenditure Optimal Capacity Revenue Supply Withdrawal
carrier
Generator Brown Coal NaN 0.0 0.0 0.0 20879.500000 NaN 0.0 0.0 0.0 0.0 0.0
Gas NaN 0.0 0.0 0.0 23913.130000 NaN 0.0 0.0 0.0 0.0 0.0
Geothermal NaN 0.0 0.0 0.0 31.700000 NaN 0.0 0.0 0.0 0.0 0.0
Hard Coal NaN 0.0 0.0 0.0 25312.600000 NaN 0.0 0.0 0.0 0.0 0.0
Multiple NaN 0.0 0.0 0.0 152.700000 NaN 0.0 0.0 0.0 0.0 0.0
Nuclear NaN 0.0 0.0 0.0 12068.000000 NaN 0.0 0.0 0.0 0.0 0.0
Oil NaN 0.0 0.0 0.0 2710.200000 NaN 0.0 0.0 0.0 0.0 0.0
Other NaN 0.0 0.0 0.0 3027.800000 NaN 0.0 0.0 0.0 0.0 0.0
Run of River NaN 0.0 0.0 0.0 3999.100000 NaN 0.0 0.0 0.0 0.0 0.0
Solar NaN 0.0 0.0 0.0 37041.524779 NaN 0.0 0.0 0.0 0.0 0.0
Storage Hydro NaN 0.0 0.0 0.0 1445.000000 NaN 0.0 0.0 0.0 0.0 0.0
Waste NaN 0.0 0.0 0.0 1645.900000 NaN 0.0 0.0 0.0 0.0 0.0
Wind Offshore NaN 0.0 0.0 0.0 2973.500000 NaN 0.0 0.0 0.0 0.0 0.0
Wind Onshore NaN 0.0 0.0 0.0 37339.895329 NaN 0.0 0.0 0.0 0.0 0.0
Line - NaN 0.0 NaN NaN 961101.136714 NaN NaN 0.0 0.0 NaN NaN
Load - NaN NaN NaN 0.0 NaN NaN NaN NaN 0.0 0.0 0.0
StorageUnit Pumped Hydro NaN 0.0 0.0 0.0 9179.500000 NaN 0.0 0.0 0.0 0.0 0.0
Transformer - NaN 0.0 NaN NaN 192000.000000 NaN NaN 0.0 0.0 NaN NaN

So far the statistics are not so interesting, because we have not solved the network yet. We can only see that the network already has some installed capacities for different components.

You can see that statistics returns a pandas.DataFrame. The MultiIndex of the DataFrame provides the name of the network component (i.e. first entry of the MultiIndex, like Generator, Line,…) on the first index level. The carrier index level provides the carrier name of the given component. For example, in n.generators, we have the carriers Brown Coal, Gas and so on.

Now lets solve the network.

[4]:
n.optimize(n.snapshots[:4])
WARNING:pypsa.components:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
       '32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
       '87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
       '120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
       '159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
       '233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
       '267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
       '315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
       '362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
       '458'],
      dtype='object', name='Transformer')
WARNING:pypsa.components:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
       '32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
       '87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
       '120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
       '159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
       '233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
       '267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
       '315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
       '362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
       '458'],
      dtype='object', name='Transformer')
INFO:linopy.model: Solve problem using Glpk solver
INFO:linopy.io:Writing objective.
Writing constraints.: 100%|██████████| 15/15 [00:00<00:00, 72.93it/s]
Writing continuous variables.: 100%|██████████| 7/7 [00:00<00:00, 202.57it/s]
INFO:linopy.io: Writing time: 0.26s
GLPSOL--GLPK LP/MIP Solver 5.0
Parameter(s) specified in the command line:
 --lp /tmp/linopy-problem-z_9z8m1_.lp --output /tmp/linopy-solve-dulfvg_o.sol
Reading problem data from '/tmp/linopy-problem-z_9z8m1_.lp'...
23828 rows, 9940 columns, 46170 non-zeros
129508 lines were read
GLPK Simplex Optimizer 5.0
23828 rows, 9940 columns, 46170 non-zeros
Preprocessing...
3772 rows, 5708 columns, 21882 non-zeros
Scaling...
 A: min|aij| =  1.485e-02  max|aij| =  1.974e+02  ratio =  1.329e+04
GM: min|aij| =  1.965e-01  max|aij| =  5.090e+00  ratio =  2.591e+01
EQ: min|aij| =  3.882e-02  max|aij| =  1.000e+00  ratio =  2.576e+01
Constructing initial basis...
Size of triangular part is 3670
      0: obj =   3.201467608e+09 inf =   5.855e+08 (2860)
   6723: obj =   4.472917695e+06 inf =   0.000e+00 (0) 41
*  9338: obj =   1.038959807e+06 inf =   0.000e+00 (0) 12
OPTIMAL LP SOLUTION FOUND
Time used:   2.3 secs
Memory used: 17.8 Mb (18617364 bytes)
Writing basic solution to '/tmp/linopy-solve-dulfvg_o.sol'...
INFO:linopy.constants: Optimization successful:
Status: ok
Termination condition: optimal
Solution: 9940 primals, 23828 duals
Objective: 1.04e+06
Solver model: not available
Solver message: optimal

INFO:pypsa.optimization.optimize:The shadow-prices of the constraints Generator-fix-p-lower, Generator-fix-p-upper, Line-fix-s-lower, Line-fix-s-upper, Transformer-fix-s-lower, Transformer-fix-s-upper, StorageUnit-fix-p_dispatch-lower, StorageUnit-fix-p_dispatch-upper, StorageUnit-fix-p_store-lower, StorageUnit-fix-p_store-upper, StorageUnit-fix-state_of_charge-lower, StorageUnit-fix-state_of_charge-upper, Kirchhoff-Voltage-Law, StorageUnit-energy_balance were not assigned to the network.
[4]:
('ok', 'optimal')

Now we can look at the statistics of the solved network.

[5]:
n.statistics().round(1)
[5]:
Capacity Factor Capital Expenditure Curtailment Dispatch Installed Capacity Market Value Operational Expenditure Optimal Capacity Revenue Supply Withdrawal
carrier
Generator Brown Coal 0.1 0.0 0.0 42859.0 20879.5 13.7 428590.3 20879.5 588266.5 42859.0 0.0
Gas 0.0 0.0 0.0 79.4 23913.1 50.0 3971.9 23913.1 3971.9 79.4 0.0
Geothermal 0.0 0.0 0.0 0.0 31.7 NaN 0.0 31.7 0.0 0.0 0.0
Hard Coal 0.0 0.0 0.0 10535.7 25312.6 25.8 263391.6 25312.6 272172.0 10535.7 0.0
Multiple 0.0 0.0 0.0 0.0 152.7 NaN 0.0 152.7 0.0 0.0 0.0
Nuclear 0.1 0.0 0.0 31473.0 12068.0 16.7 251784.0 12068.0 527108.2 31473.0 0.0
Oil 0.0 0.0 0.0 0.0 2710.2 NaN 0.0 2710.2 0.0 0.0 0.0
Other 0.0 0.0 0.0 336.0 3027.8 34.2 10752.0 3027.8 11503.2 336.0 0.0
Run of River 0.1 0.0 0.0 13723.3 3999.1 22.0 41170.0 3999.1 301812.2 13723.3 0.0
Solar 0.0 0.0 47391.0 0.0 37041.5 NaN 0.0 37041.5 0.0 0.0 0.0
Storage Hydro 0.1 0.0 0.0 3580.0 1445.0 20.4 10740.0 1445.0 72904.4 3580.0 0.0
Waste 0.1 0.0 0.0 4760.0 1645.9 20.5 28560.0 1645.9 97621.8 4760.0 0.0
Wind Offshore 0.1 0.0 61905.2 7568.5 2973.5 1.9 0.0 2973.5 14034.2 7568.5 0.0
Wind Onshore 0.1 0.0 391519.8 81920.5 37339.9 5.0 0.0 37339.9 406030.6 81920.5 0.0
Line AC 0.0 0.0 NaN 17280.9 961101.1 60.5 NaN 961101.1 1045149.1 842740.0 -842740.0
Load - NaN NaN NaN -195357.1 NaN -17.4 NaN NaN 3396077.6 0.0 -195357.1
StorageUnit Pumped Hydro 0.0 0.0 0.0 -1478.3 9179.5 -2.7 0.0 9179.5 4013.3 0.0 -1478.3
Transformer - 0.0 0.0 NaN -13293.1 192000.0 -3.9 NaN 192000.0 51490.2 103485.6 -103485.6

As you can see there is now much more information available. There are still no capital expenditures in the network, because we only performed an operational optimization with this example network.

If you are interested in a specific metric, e.g. curtailment, you can run

[6]:
curtailment = n.statistics.curtailment()
curtailment
[6]:
             carrier
Generator    Brown Coal            0.000000
             Gas                   0.000000
             Geothermal            0.000000
             Hard Coal             0.000000
             Multiple              0.000000
             Nuclear               0.000000
             Oil                   0.000000
             Other                 0.000000
             Run of River          0.000000
             Solar             47391.043207
             Storage Hydro         0.000000
             Waste                 0.000000
             Wind Offshore     61905.157344
             Wind Onshore     391519.829741
StorageUnit  Pumped Hydro          0.000000
dtype: float64

Note that when calling a specific metric the statistics module returns a pandas.Series. To find the unit of the data returned by statistics, you can call attrs on the DataFrame or Series.

[7]:
curtailment.attrs
[7]:
{'name': 'Curtailment', 'unit': 'MWh'}

So the unit of curtailment is given in MWh. You can also customize your request.

For this you have various options: 1. You can select the component from which you want to get the metric with the attribute comps. Careful, comps has to be a list of strings.

[8]:
n.statistics.curtailment(comps=["Generator"])
[8]:
           carrier
Generator  Brown Coal            0.000000
           Gas                   0.000000
           Geothermal            0.000000
           Hard Coal             0.000000
           Multiple              0.000000
           Nuclear               0.000000
           Oil                   0.000000
           Other                 0.000000
           Run of River          0.000000
           Solar             47391.043207
           Storage Hydro         0.000000
           Waste                 0.000000
           Wind Offshore     61905.157344
           Wind Onshore     391519.829741
Name: generators, dtype: float64
  1. For metrics which have a time dimension, you can choose the aggregation method or decide to not aggregate them at all. Just use the aggregate_time attribute to specify what you want to do.

For example calculate the mean curtailment per time step is

[9]:
n.statistics.curtailment(comps=["Generator"], aggregate_time="mean")
[9]:
           carrier
Generator  Brown Coal           0.000000
           Gas                  0.000000
           Geothermal           0.000000
           Hard Coal            0.000000
           Multiple             0.000000
           Nuclear              0.000000
           Oil                  0.000000
           Other                0.000000
           Run of River         0.000000
           Solar             1974.626800
           Storage Hydro        0.000000
           Waste                0.000000
           Wind Offshore     2579.381556
           Wind Onshore     16313.326239
dtype: float64

Or retrieve the curtailment time series by not aggregating the time series.

[10]:
n.statistics.curtailment(comps=["Generator"], aggregate_time=False).iloc[:, :3]
[10]:
snapshot 2011-01-01 00:00:00 2011-01-01 01:00:00 2011-01-01 02:00:00
carrier
Generator Brown Coal 0.000000 0.000000 0.000000
Gas 0.000000 0.000000 0.000000
Geothermal 0.000000 0.000000 0.000000
Hard Coal 0.000000 0.000000 0.000000
Multiple 0.000000 0.000000 0.000000
Nuclear 0.000000 0.000000 0.000000
Oil 0.000000 0.000000 0.000000
Other 0.000000 0.000000 0.000000
Run of River 0.000000 0.000000 0.000000
Solar 0.000000 0.000000 0.000000
Storage Hydro 0.000000 0.000000 0.000000
Waste 0.000000 0.000000 0.000000
Wind Offshore 566.574857 755.663609 1301.079806
Wind Onshore 2676.177114 4182.915807 4256.623685
  1. You can choose how you want to group the components of the network and how to aggregate the groups. By default the components are grouped by their carriers and summed. However, you can change this by providing different groupby and aggregate_groups attributes.

[11]:
n.statistics.curtailment(comps=["Generator"], groupby=["bus"], aggregate_groups="max")
[11]:
           bus
Generator  1            2144.049844
           100_220kV     114.649818
           101          4849.953672
           102          3630.015750
           103             1.365577
                           ...
           95_220kV      498.291928
           96_220kV      137.857645
           97            121.489352
           98           1133.688722
           99_220kV       41.972185
Name: generators, Length: 489, dtype: float64

Now you obtained the maximal curtailment during one time step for every bus in the network.

Often it is better when inspecting your network to visualize the tables. Therefore, you can easily make plots to analyze your results. For example the generation/supply of the generators.

[12]:
n.statistics.supply(comps=["Generator"]).div(1e3).plot.bar(title="Generator in GWh")
[12]:
<Axes: title={'center': 'Generator in GWh'}, xlabel='None,carrier'>
../_images/examples_statistics_23_1.png

Or you could plot the generation time series of the generators.

[13]:
fig, ax = plt.subplots()
n.statistics.supply(comps=["Generator"], aggregate_time=False).iloc[:, :4].div(
    1e3
).T.plot.area(
    title="Generation in GW",
    ax=ax,
    legend=False,
    linewidth=0,
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
[13]:
<matplotlib.legend.Legend at 0x7fad65882f10>
../_images/examples_statistics_25_1.png

Finally, we want to look at the energy balance of the network. The energy balance is not included in the overview of the statistics module. To calculate the energy balance, you can do

[14]:
n.statistics.energy_balance()
[14]:
             carrier        bus_carrier
Load         -              AC            -1.953571e+05
Line         AC             AC             1.136868e-12
Generator    Brown Coal     AC             4.285903e+04
             Gas            AC             7.943797e+01
             Geothermal     AC             0.000000e+00
             Hard Coal      AC             1.053566e+04
             Multiple       AC             0.000000e+00
             Nuclear        AC             3.147299e+04
             Oil            AC             0.000000e+00
             Other          AC             3.360000e+02
             Run of River   AC             1.372332e+04
             Solar          AC             0.000000e+00
             Storage Hydro  AC             3.580000e+03
             Waste          AC             4.760000e+03
             Wind Offshore  AC             7.568492e+03
             Wind Onshore   AC             8.192045e+04
StorageUnit  Pumped Hydro   AC            -1.478300e+03
Transformer  -              AC             3.410605e-13
dtype: float64

Note that there is now an additional index level called bus carrier. This is because an energy balance is defined for every bus carrier. The bus carriers you have in your network you can find by looking at n.buses.carrier.unique(). For this network, there is only one bus carrier which is AC. AC corresponds to electricity in the regarded network. However, you can have further bus carriers for example when you have a sector coupled network. You could for example have heat or CO \(_2\) as carrier. Therefore, for many statistics functions you have to be careful about the units of the values and it is not always given by the attr object of the DataFrame or Series.