Calculating and plotting VOD

import gnssvod as gv
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
import matplotlib.dates as mdates

calc_vod()

The first step is to calculate VOD for different bands.

One problem is that the observations labels (‘S1C’,’S1X’, etc.) are not always associated to the same frequency depending on which constellation is observed. The labeling also changes according to the RINEX versions.

The calc_vod() function calculates VOD by first evaluating the SNR difference between the individual pairs of observations. The SNR from the clear-sky reference is subtracted from the SNR at the subcanopy station. The attenuation is corrected for the elevation of the satellite, yielding a VOD value.

# set which data files should be loaded
pattern = 'data_RINEX3.03/Laeg_paired/nc/*.nc'
# define how to associate stations together. Always put reference station first.
pairings = {'Laeg':('CH-Laeg_ref','CH-Laeg_grn')}
# define if some observables with similar frequencies should be combined together. In the future, this should be replaced by the selection of frequency bands.
bands = {'VOD1':['S1','S1X','S1C'],'VOD7':['S7','S7X','S7C']}

vod = gv.calc_vod(pattern,pairings,bands)
vod = vod['Laeg']
vod

		VOD1	VOD7	Azimuth	Elevation
Epoch	SV
2023-08-01 00:07:45	C19	NaN	NaN	-97.0	26.1
	C20	NaN	NaN	-144.2	10.6
	C23	NaN	NaN	97.5	3.1
	C27	NaN	NaN	153.1	29.2
	C28	NaN	NaN	92.9	42.9
...	...	...	...	...	...
2023-08-09 23:59:45	R14	2.327782	NaN	-42.8	57.4
	R23	NaN	NaN	NaN	NaN
	R24	-0.010382	NaN	149.7	26.8
	S23	NaN	NaN	NaN	NaN
	S36	NaN	NaN	NaN	NaN

1495088 rows × 4 columns

Hemispheric plot of VOD

Same as in the previous example, we will use the Hemi class to calculate and plot a gridded representation of VOD

# intialize hemispheric grid
hemi = gv.hemibuild(2)
# get patches for plotting later
patches = hemi.patches()
# classify vod into grid cells, drop azimuth and elevation afterwards as we don't need it anymore
vod = hemi.add_CellID(vod).drop(columns=['Azimuth','Elevation'])
# get average value per grid cell
vod_avg = vod.groupby(['CellID']).agg(['mean', 'std', 'count'])
# flatten the columns
vod_avg.columns = ["_".join(x) for x in vod_avg.columns.to_flat_index()]
vod_avg

	VOD1_mean	VOD1_std	VOD1_count	VOD7_mean	VOD7_std	VOD7_count
CellID
0	0.879183	0.507725	184	1.810452	0.509661	57
1	0.757022	0.561797	374	1.330002	0.449482	120
2	0.572596	0.531027	158	2.221168	0.778867	70
3	2.128975	1.155671	58	1.266589	0.548697	58
4	0.834365	0.637610	39	0.907239	0.424226	39
...	...	...	...	...	...	...
6420	0.020885	0.019975	26	0.034187	0.015323	26
6431	1.572118	1.512187	37	NaN	NaN	0
6432	0.086603	0.033302	18	NaN	NaN	0
6438	0.070679	0.056207	32	NaN	NaN	0
6439	0.039912	0.015835	8	NaN	NaN	0

5124 rows × 6 columns

fig, ax = plt.subplots(1,2,figsize=(17,14),subplot_kw=dict(projection='polar'))

vod_names = ['VOD1','VOD7']
for i, iname in enumerate(vod_names):
    # associate the mean values to the patches, join inner will drop patches with no data, making plotting slightly faster
    ipatches = pd.concat([patches,vod_avg[f"{iname}_mean"]],join='inner',axis=1)
    # plotting with colored patches
    pc = PatchCollection(ipatches.Patches,array=ipatches[f"{iname}_mean"],edgecolor='face', linewidth=1)
    pc.set_clim([-0.1,3])
    ax[i].add_collection(pc)
    ax[i].set_rlim([0,90])
    ax[i].set_theta_zero_location("N")
    ax[i].set_theta_direction(-1)
    ax[i].set_title(iname)

plt.colorbar(pc, ax=ax, location='bottom', shrink=.5, pad=0.05, label='GNSS-VOD')

<matplotlib.colorbar.Colorbar at 0x7f772615c520>

../_images/00060153bfd571a86fcd6a3743582c75646de3e3eebe9dcf111783498b4fe5fd.png

It can be a good idea to only plot the average if a minimum number of values were used to calculate an average. Sometimes results at the edges are only based on a few data points and are not very reliable.

fig, ax = plt.subplots(1,2,figsize=(17,14),subplot_kw=dict(projection='polar'))

vod_names = ['VOD1','VOD7']
for i, iname in enumerate(vod_names):
    # associate the mean values to the patches, join inner will drop patches with no data, making plotting slightly faster
    ivod_data = vod_avg[f"{iname}_mean"].where(vod_avg[f"{iname}_count"]>40)
    ipatches = pd.concat([patches,ivod_data],join='inner',axis=1)
    # plotting with colored patches
    pc = PatchCollection(ipatches.Patches,array=ipatches[f"{iname}_mean"],edgecolor='face', linewidth=1)
    pc.set_clim([-0.1,3])
    ax[i].add_collection(pc)
    ax[i].set_rlim([0,90])
    ax[i].set_theta_zero_location("N")
    ax[i].set_theta_direction(-1)
    ax[i].set_title(iname)

plt.colorbar(pc, ax=ax, location='bottom', shrink=.5, pad=0.05, label='GNSS-VOD')

<matplotlib.colorbar.Colorbar at 0x7f7725ba6df0>

../_images/861dd06f0080ad8c64b82f8c2e849a2eef8dea303b353d36822b301fdd456ead.png

Calculation of VOD time series

Because the tracks of the satellites are not random but tend to repeat every day, one has to be a bit careful when computing temporal averages of VOD. For instance, satellites of the GPS constellation repeat twice every sidereal day (1 sidereal day = 23h 56m 4s). Other constellations (below) have slightly more complex repeat times.

Imagine what would happen if most GNSS satellites happen to repeatedly go over slightly denser areas of the canopy every day at around 3:30pm. We would tend to see a peak in VOD each day at that time. However, the average VOD of the forest is not actually increasing, the peak exists only because we happen to be measuring parts of the canopy that are on average slightly denser. Especially when looking at very small changes in VOD (for instance diurnal variability), these repeating orbits create spurious signals and obfuscate the real variability. Check Humphrey & Frankenberg 2023, Figure 7 to see how this affects time series.

Processing a more accurate VOD time series involves three steps:

Calculating a baseline (long-term mean) hemispheric VOD over a sufficiently long time period (which we just did)
Calculate VOD anomalies by subtracting the baseline from each instantaneous VOD measurement
Calculate a time series based on the VOD anomalies

Calculating VOD anomalies

Here we merge back the VOD average to the original dataframe.

# merge statistics with the original VOD measurements
vod_anom = vod.join(vod_avg,on='CellID')
vod_anom

		VOD1	VOD7	CellID	VOD1_mean	VOD1_std	VOD1_count	VOD7_mean	VOD7_std	VOD7_count
Epoch	SV
2023-08-01 00:07:45	C19	NaN	NaN	3733	0.485518	0.242357	208	0.753132	0.240639	58
	C20	NaN	NaN	5444	0.416537	0.147121	102	0.385905	0.158380	62
	C23	NaN	NaN	6052	0.250855	0.050505	183	NaN	NaN	0
	C27	NaN	NaN	3267	0.051724	0.334082	54	0.867515	0.387465	38
	C28	NaN	NaN	2130	1.501628	0.634895	250	1.260002	0.435578	71
...	...	...	...	...	...	...	...	...	...	...
2023-08-09 23:59:45	R04	1.116509	NaN	3128	0.541466	0.430518	406	0.899058	0.428780	79
	R12	0.274069	NaN	6048	0.295512	0.070216	190	NaN	NaN	0
	R13	2.020736	NaN	1795	1.501337	0.760002	462	1.455787	0.513682	62
	R14	2.327782	NaN	1044	2.584283	0.825547	198	NaN	NaN	0
	R24	-0.010382	NaN	3668	-0.022531	0.266441	200	0.552734	0.203059	95

1329259 rows × 9 columns

Anomalies are simply obtained with:

vod_anom['VOD1_anom'] = vod_anom['VOD1']-vod_anom['VOD1_mean']
vod_anom['VOD7_anom'] = vod_anom['VOD7']-vod_anom['VOD7_mean']

Calculating the anomaly time series

Here we calculate an hourly average

vod_ts = vod_anom.groupby(pd.Grouper(freq='1h', level='Epoch')).mean()
vod_ts

	VOD1	VOD7	CellID	VOD1_mean	VOD1_std	VOD1_count	VOD7_mean	VOD7_std	VOD7_count	VOD1_anom	VOD7_anom
Epoch
2023-08-01 00:00:00	1.045700	0.857200	3161.386060	1.000563	0.432741	298.288867	0.975316	0.368189	67.404259	-0.014699	0.028733
2023-08-01 01:00:00	1.097543	0.906498	3125.360222	1.068297	0.440577	298.365611	1.010482	0.369235	76.660843	-0.007610	-0.021655
2023-08-01 02:00:00	1.007393	1.047247	3176.665345	0.988260	0.426496	286.419311	0.986190	0.358648	70.151769	-0.034646	0.010036
2023-08-01 03:00:00	0.969277	1.096157	2696.083424	1.026343	0.484076	331.838911	1.086580	0.405147	85.611051	0.004872	-0.000651
2023-08-01 04:00:00	1.104497	1.075735	2315.294587	1.080343	0.523946	336.100631	1.030475	0.411516	84.555629	0.024695	0.018333
...	...	...	...	...	...	...	...	...	...	...	...
2023-08-09 19:00:00	1.148011	1.177217	2475.636711	1.151665	0.518803	352.278738	1.122829	0.416244	81.463123	-0.020028	-0.066742
2023-08-09 20:00:00	0.987982	1.222887	2506.069470	1.025942	0.484028	321.785275	1.046329	0.407231	77.058833	-0.023637	-0.028753
2023-08-09 21:00:00	1.132818	1.001172	2610.778375	1.093195	0.537218	270.739590	1.073045	0.405216	68.214070	-0.051414	-0.114403
2023-08-09 22:00:00	0.941130	0.924344	2945.734087	0.920403	0.465508	295.216509	0.927142	0.354828	80.147437	0.014564	-0.033115
2023-08-09 23:00:00	0.922004	0.900242	2964.178333	0.969349	0.451153	317.742046	0.997615	0.381067	79.576440	-0.017199	-0.023575

216 rows × 11 columns

We should not forget to add the mean VOD back, otherwise the anomaly time series will be centred around zero

vod_ts['VOD1_anom'] = vod_ts['VOD1_anom']+vod_ts['VOD1'].mean()
vod_ts['VOD7_anom'] = vod_ts['VOD7_anom']+vod_ts['VOD7'].mean()

Now let’s compare our time series, with and without subtracting the baseline.

vod_names = ['VOD1','VOD1_anom']

fig, ax = plt.subplots(1,figsize=(10,5))
for i, iname in enumerate(vod_names): 
    # plot each measurement and color by signal to noise ratio
    hs = ax.plot(vod_ts.index.get_level_values('Epoch'),vod_ts[iname],label=iname)
    
myFmt = mdates.DateFormatter('%d-%b')
ax.xaxis.set_major_formatter(myFmt)
ax.set_ylabel('GNSS-VOD (L1)')
ax.legend()
plt.title('GNSS-VOD at Lägeren Tower (CH-Laeg, Switzerland), August 2023')
plt.savefig('figures/illustration_vod.png',facecolor='white', transparent=False,bbox_inches='tight')

../_images/cdce8b62dc11222b2ccf0aa3d573c831e0f21194fbcba6c5c1e3f8abe6335308.png

We clearly see how subtracting the gridded hemispheric average before calculating the time series has substantially reduced the noise in our VOD time series. Here we see short spikes usually corresponding to rain events and water interception by the canopy. A very small diurnal cycle is somewhat visible on August 3 to 6, with dehydration during day and rehydration overnight.

Longer time series

In principle, when calculating a time series over several months, the baseline should be regularly updated. One option could be to go for a “moving” baseline, with a window of a few weeks.

In the next release, there should be some functions to do exactly that with a batch logic (i.e. automatically).