Note
You can download this example as a Jupyter notebook or start it in interactive mode.
Using the statistics module in PyPSA#
The statistics
module is used to easily extract information from your networks. This is useful when inspecting your solved networks and creating first visualizations of your results.
With the statistics
module, you can look at different metrics of your network. A list of the implemented metrics are:
Capital expenditure
Operational expenditure
Installed capacities
Optimal capacities
Supply
Withdrawal
Curtailment
Capacity Factor
Revenue
Market value
Energy balance
Now lets look at an example.
[1]:
import pypsa
import matplotlib.pyplot as plt
import numpy as np
First, we open an example network we want to investigate.
[2]:
n = pypsa.examples.scigrid_de()
WARNING:pypsa.io:Importing network from PyPSA version v0.17.1 while current version is v0.25.2. Read the release notes at https://pypsa.readthedocs.io/en/latest/release_notes.html to prepare your network for import.
INFO:pypsa.io:Imported network scigrid-de.nc has buses, generators, lines, loads, storage_units, transformers
Lets run an overview of all statistics by calling:
[3]:
n.statistics().dropna()
[3]:
Capacity Factor | Capital Expenditure | Curtailment | Dispatch | Installed Capacity | Market Value | Operational Expenditure | Optimal Capacity | Revenue | Supply | Withdrawal | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
carrier |
So far the statistics
are not so interesting, because we have not solved the network yet. We can only see that the network already has some installed capacities for different components.
You can see that statistics
returns a pandas.DataFrame
. The MultiIndex of the DataFrame
provides the name of the network component (i.e. first entry of the MultiIndex, like Generator, Line,…) on the first index level. The carrier
index level provides the carrier name of the given component. For example, in n.generators
, we have the carriers Brown Coal, Gas and so on.
Now lets solve the network.
[4]:
n.optimize(n.snapshots[:4])
WARNING:pypsa.components:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
'32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
'87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
'120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
'159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
'233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
'267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
'315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
'362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
'458'],
dtype='object', name='Transformer')
WARNING:pypsa.components:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
'32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
'87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
'120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
'159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
'233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
'267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
'315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
'362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
'458'],
dtype='object', name='Transformer')
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/v0.25.2/lib/python3.11/site-packages/linopy/expressions.py:176: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'Generator' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
ds = self.data.assign_coords({group_dim: idx})
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/v0.25.2/lib/python3.11/site-packages/linopy/expressions.py:176: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'StorageUnit' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
ds = self.data.assign_coords({group_dim: idx})
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/v0.25.2/lib/python3.11/site-packages/linopy/expressions.py:176: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'StorageUnit' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
ds = self.data.assign_coords({group_dim: idx})
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/v0.25.2/lib/python3.11/site-packages/linopy/expressions.py:176: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'Line' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
ds = self.data.assign_coords({group_dim: idx})
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/v0.25.2/lib/python3.11/site-packages/linopy/expressions.py:176: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'Line' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
ds = self.data.assign_coords({group_dim: idx})
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/v0.25.2/lib/python3.11/site-packages/linopy/expressions.py:176: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'Transformer' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
ds = self.data.assign_coords({group_dim: idx})
/home/docs/checkouts/readthedocs.org/user_builds/pypsa/conda/v0.25.2/lib/python3.11/site-packages/linopy/expressions.py:176: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'Transformer' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
ds = self.data.assign_coords({group_dim: idx})
INFO:linopy.model: Solve problem using Glpk solver
INFO:linopy.io:Writing objective.
Writing constraints.: 100%|██████████| 15/15 [00:00<00:00, 94.07it/s]
Writing continuous variables.: 100%|██████████| 7/7 [00:00<00:00, 262.13it/s]
INFO:linopy.io: Writing time: 0.2s
GLPSOL--GLPK LP/MIP Solver 5.0
Parameter(s) specified in the command line:
--lp /tmp/linopy-problem-azm1pcnu.lp --output /tmp/linopy-solve-sj8h_4wx.sol
Reading problem data from '/tmp/linopy-problem-azm1pcnu.lp'...
23828 rows, 9940 columns, 43654 non-zeros
126992 lines were read
GLPK Simplex Optimizer 5.0
23828 rows, 9940 columns, 43654 non-zeros
Preprocessing...
3772 rows, 5708 columns, 19366 non-zeros
Scaling...
A: min|aij| = 1.485e-02 max|aij| = 1.974e+02 ratio = 1.329e+04
GM: min|aij| = 1.907e-01 max|aij| = 5.245e+00 ratio = 2.751e+01
EQ: min|aij| = 3.688e-02 max|aij| = 1.000e+00 ratio = 2.711e+01
Constructing initial basis...
Size of triangular part is 3669
0: obj = -5.986899755e+08 inf = 1.551e+08 (2823)
6642: obj = 4.445245001e+06 inf = 4.547e-13 (0) 39
* 9066: obj = 1.038959807e+06 inf = 0.000e+00 (0) 10
OPTIMAL LP SOLUTION FOUND
Time used: 2.1 secs
Memory used: 17.2 Mb (18079220 bytes)
Writing basic solution to '/tmp/linopy-solve-sj8h_4wx.sol'...
INFO:linopy.constants: Optimization successful:
Status: ok
Termination condition: optimal
Solution: 9940 primals, 23828 duals
Objective: 1.04e+06
Solver model: not available
Solver message: optimal
INFO:pypsa.optimization.optimize:The shadow-prices of the constraints Generator-fix-p-lower, Generator-fix-p-upper, Line-fix-s-lower, Line-fix-s-upper, Transformer-fix-s-lower, Transformer-fix-s-upper, StorageUnit-fix-p_dispatch-lower, StorageUnit-fix-p_dispatch-upper, StorageUnit-fix-p_store-lower, StorageUnit-fix-p_store-upper, StorageUnit-fix-state_of_charge-lower, StorageUnit-fix-state_of_charge-upper, Kirchhoff-Voltage-Law, StorageUnit-energy_balance were not assigned to the network.
[4]:
('ok', 'optimal')
Now we can look at the statistics
of the solved network.
[5]:
n.statistics().round(1)
[5]:
Capacity Factor | Capital Expenditure | Curtailment | Dispatch | Installed Capacity | Market Value | Operational Expenditure | Optimal Capacity | Revenue | Supply | Withdrawal | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
carrier | ||||||||||||
Generator | Brown Coal | 0.1 | 0.0 | 0.0 | 42859.0 | 20879.5 | 13.7 | 428590.3 | 20879.5 | 588266.5 | 42859.0 | 0.0 |
Gas | 0.0 | 0.0 | 0.0 | 79.4 | 23913.1 | 50.0 | 3971.9 | 23913.1 | 3971.9 | 79.4 | 0.0 | |
Geothermal | 0.0 | 0.0 | 0.0 | 0.0 | 31.7 | NaN | 0.0 | 31.7 | 0.0 | 0.0 | 0.0 | |
Hard Coal | 0.0 | 0.0 | 0.0 | 10535.7 | 25312.6 | 25.8 | 263391.6 | 25312.6 | 272172.0 | 10535.7 | 0.0 | |
Multiple | 0.0 | 0.0 | 0.0 | 0.0 | 152.7 | NaN | 0.0 | 152.7 | 0.0 | 0.0 | 0.0 | |
Nuclear | 0.1 | 0.0 | 0.0 | 31473.0 | 12068.0 | 16.7 | 251784.0 | 12068.0 | 527108.2 | 31473.0 | 0.0 | |
Oil | 0.0 | 0.0 | 0.0 | 0.0 | 2710.2 | NaN | 0.0 | 2710.2 | 0.0 | 0.0 | 0.0 | |
Other | 0.0 | 0.0 | 0.0 | 336.0 | 3027.8 | 34.2 | 10752.0 | 3027.8 | 11503.2 | 336.0 | 0.0 | |
Run of River | 0.1 | 0.0 | 0.0 | 13723.3 | 3999.1 | 22.0 | 41170.0 | 3999.1 | 301812.2 | 13723.3 | 0.0 | |
Solar | 0.0 | 0.0 | 47391.0 | 0.0 | 37041.5 | NaN | 0.0 | 37041.5 | 0.0 | 0.0 | 0.0 | |
Storage Hydro | 0.1 | 0.0 | 0.0 | 3580.0 | 1445.0 | 20.4 | 10740.0 | 1445.0 | 72904.4 | 3580.0 | 0.0 | |
Waste | 0.1 | 0.0 | 0.0 | 4760.0 | 1645.9 | 20.5 | 28560.0 | 1645.9 | 97621.8 | 4760.0 | 0.0 | |
Wind Offshore | 0.1 | 0.0 | 62904.9 | 6568.8 | 2973.5 | 2.1 | 0.0 | 2973.5 | 14034.2 | 6568.8 | 0.0 | |
Wind Onshore | 0.1 | 0.0 | 391045.1 | 82395.2 | 37339.9 | 4.9 | 0.0 | 37339.9 | 406030.6 | 82395.2 | 0.0 | |
Line | AC | 0.0 | 0.0 | NaN | 17530.5 | 961101.1 | 59.6 | NaN | 961101.1 | 1045149.1 | 843709.7 | -843709.7 |
Load | - | NaN | NaN | NaN | -195357.1 | NaN | -17.4 | NaN | NaN | 3396077.6 | 0.0 | -195357.1 |
StorageUnit | Pumped Hydro | 0.0 | 0.0 | 0.0 | -953.3 | 9179.5 | -4.2 | 0.0 | 9179.5 | 4013.3 | 0.0 | -953.3 |
Transformer | - | 0.0 | 0.0 | NaN | -14003.7 | 192000.0 | -3.7 | NaN | 192000.0 | 51490.2 | 103414.7 | -103414.7 |
As you can see there is now much more information available. There are still no capital expenditures in the network, because we only performed an operational optimization with this example network.
If you are interested in a specific metric, e.g. curtailment, you can run
[6]:
curtailment = n.statistics.curtailment()
curtailment[curtailment != 0]
[6]:
carrier
Generator Solar 47391.043207
Wind Offshore 62904.857344
Wind Onshore 391045.130624
dtype: float64
Note that when calling a specific metric the statistics
module returns a pandas.Series
. To find the unit of the data returned by statistics
, you can call attrs
on the DataFrame
or Series
.
[7]:
curtailment.attrs
[7]:
{'name': 'Curtailment', 'unit': 'MWh'}
So the unit of curtailment is given in MWh
. You can also customize your request.
For this you have various options: 1. You can select the component from which you want to get the metric with the attribute comps
. Careful, comps
has to be a list of strings.
[8]:
n.statistics.supply(comps=["Generator"])
[8]:
carrier
Generator Brown Coal 42859.029000
Gas 79.437970
Geothermal 0.000000
Hard Coal 10535.663500
Multiple 0.000000
Nuclear 31472.994000
Oil 0.000000
Other 336.000000
Run of River 13723.318200
Solar 0.000000
Storage Hydro 3580.000000
Waste 4760.000000
Wind Offshore 6568.792000
Wind Onshore 82395.152709
Name: generators, dtype: float64
For metrics which have a time dimension, you can choose the aggregation method or decide to not aggregate them at all. Just use the
aggregate_time
attribute to specify what you want to do.
For example calculate the mean supply/generation per time step is
[9]:
n.statistics.supply(comps=["Generator"], aggregate_time="mean")
[9]:
carrier
Generator Brown Coal 1785.792875
Gas 3.309915
Geothermal 0.000000
Hard Coal 438.985979
Multiple 0.000000
Nuclear 1311.374750
Oil 0.000000
Other 14.000000
Run of River 571.804925
Solar 0.000000
Storage Hydro 149.166667
Waste 198.333333
Wind Offshore 273.699667
Wind Onshore 3433.131363
dtype: float64
Or retrieve the supply time series by not aggregating the time series.
[10]:
n.statistics.supply(comps=["Generator"], aggregate_time=False).iloc[:, :4]
[10]:
snapshot | 2011-01-01 00:00:00 | 2011-01-01 01:00:00 | 2011-01-01 02:00:00 | 2011-01-01 03:00:00 | |
---|---|---|---|---|---|
carrier | |||||
Generator | Brown Coal | 12563.897000 | 10848.560000 | 10006.395000 | 9440.17700 |
Gas | 35.826800 | 23.721700 | 13.083000 | 6.80647 | |
Geothermal | 0.000000 | 0.000000 | 0.000000 | 0.00000 | |
Hard Coal | 4778.482000 | 3124.547000 | 1722.869000 | 909.76550 | |
Multiple | 0.000000 | 0.000000 | 0.000000 | 0.00000 | |
Nuclear | 7842.041000 | 7863.186000 | 7882.532000 | 7885.23500 | |
Oil | 0.000000 | 0.000000 | 0.000000 | 0.00000 | |
Other | 84.000000 | 84.000000 | 84.000000 | 84.00000 | |
Run of River | 3458.909000 | 3419.332800 | 3413.465800 | 3431.61060 | |
Solar | 0.000000 | 0.000000 | 0.000000 | 0.00000 | |
Storage Hydro | 895.000000 | 895.000000 | 895.000000 | 895.00000 | |
Waste | 1167.500000 | 1240.500000 | 1184.500000 | 1167.50000 | |
Wind Offshore | 2489.407000 | 1040.562000 | 1672.360000 | 1366.46300 | |
Wind Onshore | 18735.003302 | 21223.999338 | 20947.135229 | 21489.01484 |
You can choose how you want to group the components of the network and how to aggregate the groups. By default the components are grouped by their carriers and summed. However, you can change this by providing different
groupby
andaggregate_groups
attributes.
[11]:
n.statistics.supply(comps=["Generator"], groupby=["bus"], aggregate_groups="max")
[11]:
bus
Generator 1 0.000000
100_220kV 0.000000
101 1392.589000
102 970.750000
103 0.127076
...
95_220kV 186.429300
96_220kV 53.032800
97 0.002514
98 335.603700
99_220kV 0.000000
Name: generators, Length: 489, dtype: float64
Now you obtained the maximal supply in one time step for every bus in the network.
Often it is better when inspecting your network to visualize the tables. Therefore, you can easily make plots to analyze your results. For example the supply of the generators.
[12]:
n.statistics.supply(comps=["Generator"]).droplevel(0).div(1e3).plot.bar(
title="Generator in GWh"
)
[12]:
<Axes: title={'center': 'Generator in GWh'}, xlabel='carrier'>
Or you could plot the generation time series of the generators.
[13]:
fig, ax = plt.subplots()
n.statistics.supply(comps=["Generator"], aggregate_time=False).droplevel(0).iloc[
:, :4
].div(1e3).T.plot.area(
title="Generation in GW",
ax=ax,
legend=False,
linewidth=0,
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
[13]:
<matplotlib.legend.Legend at 0x7fe3497bd050>
Finally, we want to look at the energy balance of the network. The energy balance is not included in the overview of the statistics module. To calculate the energy balance, you can do
[14]:
n.statistics.energy_balance()
[14]:
carrier bus_carrier
Transformer - AC -5.684342e-13
Line AC AC -6.821210e-13
StorageUnit Pumped Hydro AC -9.533000e+02
Generator Brown Coal AC 4.285903e+04
Gas AC 7.943797e+01
Geothermal AC 0.000000e+00
Hard Coal AC 1.053566e+04
Multiple AC 0.000000e+00
Nuclear AC 3.147299e+04
Oil AC 0.000000e+00
Other AC 3.360000e+02
Run of River AC 1.372332e+04
Solar AC 0.000000e+00
Storage Hydro AC 3.580000e+03
Waste AC 4.760000e+03
Wind Offshore AC 6.568792e+03
Wind Onshore AC 8.239515e+04
Load - AC -1.953571e+05
dtype: float64
Note that there is now an additional index level called bus carrier. This is because an energy balance is defined for every bus carrier. The bus carriers you have in your network you can find by looking at n.buses.carrier.unique()
. For this network, there is only one bus carrier which is AC and corresponds to electricity. However, you can have further bus carriers for example when you have a sector coupled network. You could have heat or CO \(_2\) as bus carrier. Therefore, for many
statistics
functions you have to be careful about the units of the values and it is not always given by the attr
object of the DataFrame
or Series
.
Finally, we want to plot the energy balance and the energy balance time series for electrcity which has the bus carrier AC. In a sector coupled network, you could also choose other bus carriers like H2 or heat. Note that in this example “-” represents the load in the system.
[15]:
fig, ax = plt.subplots()
n.statistics.energy_balance().loc[:, :, "AC"].groupby(
"carrier"
).sum().to_frame().T.plot.bar(stacked=True, ax=ax, title="Energy Balance")
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
[15]:
<matplotlib.legend.Legend at 0x7fe349886f90>
[16]:
fig, ax = plt.subplots()
n.statistics.energy_balance(aggregate_time=False).loc[:, :, "AC"].droplevel(0).iloc[
:, :4
].groupby("carrier").sum().where(lambda x: np.abs(x) > 1).fillna(0).T.plot.area(
ax=ax, title="Energy Balance Timeseries"
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
[16]:
<matplotlib.legend.Legend at 0x7fe3499bbf10>