Note

You can download this example as a Jupyter notebook or start it in interactive mode.

# Using the statistics module in PyPSA#

The `statistics`

module is used to easily extract information from your networks. This is useful when inspecting your solved networks and creating first visualizations of your results.

With the `statistics`

module, you can look at different metrics of your network. A list of the implemented metrics are:

Capital expenditure

Operational expenditure

Installed capacities

Optimal capacities

Supply

Withdrawal

Curtailment

Capacity Factor

Revenue

Market value

Energy balance

Now lets look at an example.

```
[1]:
```

```
import pypsa
import matplotlib.pyplot as plt
```

First, we open an example network we want to investigate.

```
[2]:
```

```
n = pypsa.examples.scigrid_de()
```

```
WARNING:pypsa.io:Importing network from PyPSA version v0.17.1 while current version is v0.25.1. Read the release notes at https://pypsa.readthedocs.io/en/latest/release_notes.html to prepare your network for import.
INFO:pypsa.io:Imported network scigrid-de.nc has buses, generators, lines, loads, storage_units, transformers
```

Lets run an overview of all statistics by calling:

```
[3]:
```

```
n.statistics()
```

```
[3]:
```

Capacity Factor | Capital Expenditure | Curtailment | Dispatch | Installed Capacity | Market Value | Operational Expenditure | Optimal Capacity | Revenue | Supply | Withdrawal | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|

carrier | ||||||||||||

Generator | Brown Coal | NaN | 0.0 | 0.0 | 0.0 | 20879.500000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

Gas | NaN | 0.0 | 0.0 | 0.0 | 23913.130000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Geothermal | NaN | 0.0 | 0.0 | 0.0 | 31.700000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Hard Coal | NaN | 0.0 | 0.0 | 0.0 | 25312.600000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Multiple | NaN | 0.0 | 0.0 | 0.0 | 152.700000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Nuclear | NaN | 0.0 | 0.0 | 0.0 | 12068.000000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Oil | NaN | 0.0 | 0.0 | 0.0 | 2710.200000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Other | NaN | 0.0 | 0.0 | 0.0 | 3027.800000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Run of River | NaN | 0.0 | 0.0 | 0.0 | 3999.100000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Solar | NaN | 0.0 | 0.0 | 0.0 | 37041.524779 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Storage Hydro | NaN | 0.0 | 0.0 | 0.0 | 1445.000000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Waste | NaN | 0.0 | 0.0 | 0.0 | 1645.900000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Wind Offshore | NaN | 0.0 | 0.0 | 0.0 | 2973.500000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Wind Onshore | NaN | 0.0 | 0.0 | 0.0 | 37339.895329 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |

Line | - | NaN | 0.0 | NaN | NaN | 961101.136714 | NaN | NaN | 0.0 | 0.0 | NaN | NaN |

Load | - | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 |

StorageUnit | Pumped Hydro | NaN | 0.0 | 0.0 | 0.0 | 9179.500000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

Transformer | - | NaN | 0.0 | NaN | NaN | 192000.000000 | NaN | NaN | 0.0 | 0.0 | NaN | NaN |

So far the `statistics`

are not so interesting, because we have not solved the network yet. We can only see that the network already has some installed capacities for different components.

You can see that `statistics`

returns a `pandas.DataFrame`

. The MultiIndex of the `DataFrame`

provides the name of the network component (i.e. first entry of the MultiIndex, like *Generator, Line,…*) on the first index level. The `carrier`

index level provides the carrier name of the given component. For example, in `n.generators`

, we have the carriers *Brown Coal, Gas* and so on.

Now lets solve the network.

```
[4]:
```

```
n.optimize(n.snapshots[:4])
```

```
WARNING:pypsa.components:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
'32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
'87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
'120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
'159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
'233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
'267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
'315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
'362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
'458'],
dtype='object', name='Transformer')
WARNING:pypsa.components:The following transformers have zero r, which could break the linear load flow:
Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
'32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
'87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
'120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
'159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
'233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
'267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
'315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
'362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
'458'],
dtype='object', name='Transformer')
INFO:linopy.model: Solve problem using Glpk solver
INFO:linopy.io:Writing objective.
Writing constraints.: 100%|██████████| 15/15 [00:00<00:00, 72.93it/s]
Writing continuous variables.: 100%|██████████| 7/7 [00:00<00:00, 202.57it/s]
INFO:linopy.io: Writing time: 0.26s
```

```
GLPSOL--GLPK LP/MIP Solver 5.0
Parameter(s) specified in the command line:
--lp /tmp/linopy-problem-z_9z8m1_.lp --output /tmp/linopy-solve-dulfvg_o.sol
Reading problem data from '/tmp/linopy-problem-z_9z8m1_.lp'...
23828 rows, 9940 columns, 46170 non-zeros
129508 lines were read
GLPK Simplex Optimizer 5.0
23828 rows, 9940 columns, 46170 non-zeros
Preprocessing...
3772 rows, 5708 columns, 21882 non-zeros
Scaling...
A: min|aij| = 1.485e-02 max|aij| = 1.974e+02 ratio = 1.329e+04
GM: min|aij| = 1.965e-01 max|aij| = 5.090e+00 ratio = 2.591e+01
EQ: min|aij| = 3.882e-02 max|aij| = 1.000e+00 ratio = 2.576e+01
Constructing initial basis...
Size of triangular part is 3670
0: obj = 3.201467608e+09 inf = 5.855e+08 (2860)
6723: obj = 4.472917695e+06 inf = 0.000e+00 (0) 41
* 9338: obj = 1.038959807e+06 inf = 0.000e+00 (0) 12
OPTIMAL LP SOLUTION FOUND
Time used: 2.3 secs
Memory used: 17.8 Mb (18617364 bytes)
Writing basic solution to '/tmp/linopy-solve-dulfvg_o.sol'...
```

```
INFO:linopy.constants: Optimization successful:
Status: ok
Termination condition: optimal
Solution: 9940 primals, 23828 duals
Objective: 1.04e+06
Solver model: not available
Solver message: optimal
INFO:pypsa.optimization.optimize:The shadow-prices of the constraints Generator-fix-p-lower, Generator-fix-p-upper, Line-fix-s-lower, Line-fix-s-upper, Transformer-fix-s-lower, Transformer-fix-s-upper, StorageUnit-fix-p_dispatch-lower, StorageUnit-fix-p_dispatch-upper, StorageUnit-fix-p_store-lower, StorageUnit-fix-p_store-upper, StorageUnit-fix-state_of_charge-lower, StorageUnit-fix-state_of_charge-upper, Kirchhoff-Voltage-Law, StorageUnit-energy_balance were not assigned to the network.
```

```
[4]:
```

```
('ok', 'optimal')
```

Now we can look at the `statistics`

of the solved network.

```
[5]:
```

```
n.statistics().round(1)
```

```
[5]:
```

Capacity Factor | Capital Expenditure | Curtailment | Dispatch | Installed Capacity | Market Value | Operational Expenditure | Optimal Capacity | Revenue | Supply | Withdrawal | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|

carrier | ||||||||||||

Generator | Brown Coal | 0.1 | 0.0 | 0.0 | 42859.0 | 20879.5 | 13.7 | 428590.3 | 20879.5 | 588266.5 | 42859.0 | 0.0 |

Gas | 0.0 | 0.0 | 0.0 | 79.4 | 23913.1 | 50.0 | 3971.9 | 23913.1 | 3971.9 | 79.4 | 0.0 | |

Geothermal | 0.0 | 0.0 | 0.0 | 0.0 | 31.7 | NaN | 0.0 | 31.7 | 0.0 | 0.0 | 0.0 | |

Hard Coal | 0.0 | 0.0 | 0.0 | 10535.7 | 25312.6 | 25.8 | 263391.6 | 25312.6 | 272172.0 | 10535.7 | 0.0 | |

Multiple | 0.0 | 0.0 | 0.0 | 0.0 | 152.7 | NaN | 0.0 | 152.7 | 0.0 | 0.0 | 0.0 | |

Nuclear | 0.1 | 0.0 | 0.0 | 31473.0 | 12068.0 | 16.7 | 251784.0 | 12068.0 | 527108.2 | 31473.0 | 0.0 | |

Oil | 0.0 | 0.0 | 0.0 | 0.0 | 2710.2 | NaN | 0.0 | 2710.2 | 0.0 | 0.0 | 0.0 | |

Other | 0.0 | 0.0 | 0.0 | 336.0 | 3027.8 | 34.2 | 10752.0 | 3027.8 | 11503.2 | 336.0 | 0.0 | |

Run of River | 0.1 | 0.0 | 0.0 | 13723.3 | 3999.1 | 22.0 | 41170.0 | 3999.1 | 301812.2 | 13723.3 | 0.0 | |

Solar | 0.0 | 0.0 | 47391.0 | 0.0 | 37041.5 | NaN | 0.0 | 37041.5 | 0.0 | 0.0 | 0.0 | |

Storage Hydro | 0.1 | 0.0 | 0.0 | 3580.0 | 1445.0 | 20.4 | 10740.0 | 1445.0 | 72904.4 | 3580.0 | 0.0 | |

Waste | 0.1 | 0.0 | 0.0 | 4760.0 | 1645.9 | 20.5 | 28560.0 | 1645.9 | 97621.8 | 4760.0 | 0.0 | |

Wind Offshore | 0.1 | 0.0 | 61905.2 | 7568.5 | 2973.5 | 1.9 | 0.0 | 2973.5 | 14034.2 | 7568.5 | 0.0 | |

Wind Onshore | 0.1 | 0.0 | 391519.8 | 81920.5 | 37339.9 | 5.0 | 0.0 | 37339.9 | 406030.6 | 81920.5 | 0.0 | |

Line | AC | 0.0 | 0.0 | NaN | 17280.9 | 961101.1 | 60.5 | NaN | 961101.1 | 1045149.1 | 842740.0 | -842740.0 |

Load | - | NaN | NaN | NaN | -195357.1 | NaN | -17.4 | NaN | NaN | 3396077.6 | 0.0 | -195357.1 |

StorageUnit | Pumped Hydro | 0.0 | 0.0 | 0.0 | -1478.3 | 9179.5 | -2.7 | 0.0 | 9179.5 | 4013.3 | 0.0 | -1478.3 |

Transformer | - | 0.0 | 0.0 | NaN | -13293.1 | 192000.0 | -3.9 | NaN | 192000.0 | 51490.2 | 103485.6 | -103485.6 |

As you can see there is now much more information available. There are still no capital expenditures in the network, because we only performed an operational optimization with this example network.

If you are interested in a specific metric, e.g. curtailment, you can run

```
[6]:
```

```
curtailment = n.statistics.curtailment()
curtailment
```

```
[6]:
```

```
carrier
Generator Brown Coal 0.000000
Gas 0.000000
Geothermal 0.000000
Hard Coal 0.000000
Multiple 0.000000
Nuclear 0.000000
Oil 0.000000
Other 0.000000
Run of River 0.000000
Solar 47391.043207
Storage Hydro 0.000000
Waste 0.000000
Wind Offshore 61905.157344
Wind Onshore 391519.829741
StorageUnit Pumped Hydro 0.000000
dtype: float64
```

Note that when calling a specific metric the `statistics`

module returns a `pandas.Series`

. To find the unit of the data returned by `statistics`

, you can call `attrs`

on the `DataFrame`

or `Series`

.

```
[7]:
```

```
curtailment.attrs
```

```
[7]:
```

```
{'name': 'Curtailment', 'unit': 'MWh'}
```

So the unit of curtailment is given in `MWh`

. You can also customize your request.

For this you have various options: 1. You can select the component from which you want to get the metric with the attribute `comps`

. Careful, `comps`

has to be a list of strings.

```
[8]:
```

```
n.statistics.curtailment(comps=["Generator"])
```

```
[8]:
```

```
carrier
Generator Brown Coal 0.000000
Gas 0.000000
Geothermal 0.000000
Hard Coal 0.000000
Multiple 0.000000
Nuclear 0.000000
Oil 0.000000
Other 0.000000
Run of River 0.000000
Solar 47391.043207
Storage Hydro 0.000000
Waste 0.000000
Wind Offshore 61905.157344
Wind Onshore 391519.829741
Name: generators, dtype: float64
```

For metrics which have a time dimension, you can choose the aggregation method or decide to not aggregate them at all. Just use the

`aggregate_time`

attribute to specify what you want to do.

For example calculate the mean curtailment per time step is

```
[9]:
```

```
n.statistics.curtailment(comps=["Generator"], aggregate_time="mean")
```

```
[9]:
```

```
carrier
Generator Brown Coal 0.000000
Gas 0.000000
Geothermal 0.000000
Hard Coal 0.000000
Multiple 0.000000
Nuclear 0.000000
Oil 0.000000
Other 0.000000
Run of River 0.000000
Solar 1974.626800
Storage Hydro 0.000000
Waste 0.000000
Wind Offshore 2579.381556
Wind Onshore 16313.326239
dtype: float64
```

Or retrieve the curtailment time series by not aggregating the time series.

```
[10]:
```

```
n.statistics.curtailment(comps=["Generator"], aggregate_time=False).iloc[:, :3]
```

```
[10]:
```

snapshot | 2011-01-01 00:00:00 | 2011-01-01 01:00:00 | 2011-01-01 02:00:00 | |
---|---|---|---|---|

carrier | ||||

Generator | Brown Coal | 0.000000 | 0.000000 | 0.000000 |

Gas | 0.000000 | 0.000000 | 0.000000 | |

Geothermal | 0.000000 | 0.000000 | 0.000000 | |

Hard Coal | 0.000000 | 0.000000 | 0.000000 | |

Multiple | 0.000000 | 0.000000 | 0.000000 | |

Nuclear | 0.000000 | 0.000000 | 0.000000 | |

Oil | 0.000000 | 0.000000 | 0.000000 | |

Other | 0.000000 | 0.000000 | 0.000000 | |

Run of River | 0.000000 | 0.000000 | 0.000000 | |

Solar | 0.000000 | 0.000000 | 0.000000 | |

Storage Hydro | 0.000000 | 0.000000 | 0.000000 | |

Waste | 0.000000 | 0.000000 | 0.000000 | |

Wind Offshore | 566.574857 | 755.663609 | 1301.079806 | |

Wind Onshore | 2676.177114 | 4182.915807 | 4256.623685 |

You can choose how you want to group the components of the network and how to aggregate the groups. By default the components are grouped by their carriers and summed. However, you can change this by providing different

`groupby`

and`aggregate_groups`

attributes.

```
[11]:
```

```
n.statistics.curtailment(comps=["Generator"], groupby=["bus"], aggregate_groups="max")
```

```
[11]:
```

```
bus
Generator 1 2144.049844
100_220kV 114.649818
101 4849.953672
102 3630.015750
103 1.365577
...
95_220kV 498.291928
96_220kV 137.857645
97 121.489352
98 1133.688722
99_220kV 41.972185
Name: generators, Length: 489, dtype: float64
```

Now you obtained the maximal curtailment during one time step for every bus in the network.

Often it is better when inspecting your network to visualize the tables. Therefore, you can easily make plots to analyze your results. For example the generation/supply of the generators.

```
[12]:
```

```
n.statistics.supply(comps=["Generator"]).div(1e3).plot.bar(title="Generator in GWh")
```

```
[12]:
```

```
<Axes: title={'center': 'Generator in GWh'}, xlabel='None,carrier'>
```

Or you could plot the generation time series of the generators.

```
[13]:
```

```
fig, ax = plt.subplots()
n.statistics.supply(comps=["Generator"], aggregate_time=False).iloc[:, :4].div(
1e3
).T.plot.area(
title="Generation in GW",
ax=ax,
legend=False,
linewidth=0,
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)
```

```
[13]:
```

```
<matplotlib.legend.Legend at 0x7fad65882f10>
```

Finally, we want to look at the energy balance of the network. The energy balance is not included in the overview of the statistics module. To calculate the energy balance, you can do

```
[14]:
```

```
n.statistics.energy_balance()
```

```
[14]:
```

```
carrier bus_carrier
Load - AC -1.953571e+05
Line AC AC 1.136868e-12
Generator Brown Coal AC 4.285903e+04
Gas AC 7.943797e+01
Geothermal AC 0.000000e+00
Hard Coal AC 1.053566e+04
Multiple AC 0.000000e+00
Nuclear AC 3.147299e+04
Oil AC 0.000000e+00
Other AC 3.360000e+02
Run of River AC 1.372332e+04
Solar AC 0.000000e+00
Storage Hydro AC 3.580000e+03
Waste AC 4.760000e+03
Wind Offshore AC 7.568492e+03
Wind Onshore AC 8.192045e+04
StorageUnit Pumped Hydro AC -1.478300e+03
Transformer - AC 3.410605e-13
dtype: float64
```

Note that there is now an additional index level called bus carrier. This is because an energy balance is defined for every bus carrier. The bus carriers you have in your network you can find by looking at `n.buses.carrier.unique()`

. For this network, there is only one bus carrier which is AC. AC corresponds to electricity in the regarded network. However, you can have further bus carriers for example when you have a sector coupled network. You could for example have heat or CO \(_2\) as
carrier. Therefore, for many `statistics`

functions you have to be careful about the units of the values and it is not always given by the `attr`

object of the `DataFrame`

or `Series`

.