Merging processed data

This notebook uses the data produced by the previous notebook, but you do not need to run the previous notebook for this one to work.

import gnssvod as gv
import pandas as pd

Merge

In the previous notebook, we processed raw RINEX observation files individually for each receiver and saved the results in corresponding NetCDF files.

In a GNSS-VOD setup, receivers are analysed in pairs. One receiver sits above the forest canopy and provides a clear-sky reference; the other sits below the canopy and measures the attenuation caused by the forest.

Here we merge the data from these two receivers before making any plots. We also save the merged data in regular temporal chunks (for example, daily chunks). This makes the data easier to manipulate and removes any dependence on the chunks in which the data was originally logged (here, hourly log files that span from xx:07 to xx+1:06).
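To see why regular chunks help, consider a single hourly log file that straddles midnight. The sketch below assumes a hypothetical file starting at 23:07 with one epoch per minute (the sampling rate is an assumption made only for illustration) and counts how many epochs fall on each calendar day.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical hourly log file: starts at 23:07 and ends at 00:06 the next day.
# One epoch per minute is an assumption for illustration only.
file_start = datetime(2021, 4, 28, 23, 7)
epochs = [file_start + timedelta(minutes=i) for i in range(60)]

# Count the epochs that belong to each calendar day (i.e. each daily chunk)
counts = Counter(epoch.date().isoformat() for epoch in epochs)
print(dict(counts))
```

The epochs of this one file end up in two different daily chunks, which is why the merged output is organised by time interval rather than by input file.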

Function gather_stations()

The function gnssvod.gather_stations() does several things:

It will read processed observation files that were saved in NetCDF format (output of “preprocess”).
It will combine data from the various receivers/stations according to user-specified pairing rules.
It will only process data belonging to the requested time interval.
It will save paired data in temporal chunks specified by the time interval.
If requested, it will also return the paired data as an object.

Specifying input files

# first let's indicate where to find the data for each receiver
pattern={'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/nc/*.nc',
         'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/nc/*.nc'}
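Before running the merge, it can help to check that each pattern actually matches some files. The sketch below assumes the patterns behave like ordinary glob patterns; count_matches is a hypothetical helper written for this example, not part of gnssvod.

```python
import glob

def count_matches(patterns):
    """Return the number of files matched by each glob pattern (hypothetical helper)."""
    return {name: len(glob.glob(path)) for name, path in patterns.items()}

pattern = {'Dav2_Twr': 'data_RINEX2.11/Dav2_Twr/nc/*.nc',
           'Dav1_Grnd': 'data_RINEX2.11/Dav1_Grnd/nc/*.nc'}

# A count of 0 for a receiver suggests a wrong path or a missing preprocessing step
print(count_matches(pattern))
```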

Specifying time interval

Next we define the overall time interval to process and the temporal chunks we want for the output data.

Here we process all data from ‘28-04-2021’ to ‘29-04-2021’ inclusive, that is, 2 days starting on ‘28-04-2021’.

startday = pd.to_datetime('28-04-2021',format='%d-%m-%Y')
timeintervals=pd.interval_range(start=startday, periods=2, freq='D', closed='left')
timeintervals

IntervalIndex([[2021-04-28 00:00:00, 2021-04-29 00:00:00), [2021-04-29 00:00:00, 2021-04-30 00:00:00)], dtype='interval[datetime64[ns], left]')

Using the timeintervals above will save the results in chunks of 1 day. If we wanted the results in hourly chunks, we could have written instead:

timeintervals=pd.interval_range(start=startday, periods=48, freq='H', closed='left')
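Because the intervals are closed on the left, each timestamp belongs to exactly one chunk. The short check below rebuilds the daily intervals and tests which of them contains midnight on 29 April. It uses only pandas and makes no assumption about how gnssvod assigns epochs internally.

```python
import pandas as pd

startday = pd.to_datetime('28-04-2021', format='%d-%m-%Y')
timeintervals = pd.interval_range(start=startday, periods=2, freq='D', closed='left')

# Midnight on 29 April is the excluded right edge of the first interval
# and the included left edge of the second interval
midnight = pd.Timestamp('2021-04-29 00:00:00')
membership = list(timeintervals.contains(midnight))
print(membership)
```

Only the second interval contains that timestamp, so an observation logged exactly at midnight is counted once, on the later day.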

Now the only thing left is to define how to combine the stations, using the same dictionary keys as in ‘pattern’.

# define how to make pairs, always give reference station first, matching the dictionary keys of 'pattern'
pairings={'Dav':('Dav2_Twr','Dav1_Grnd')}

# run function
out = gv.gather_stations(pattern,pairings,timeintervals,outputresult=True)

Extracting Epochs from files
----- Processing Dav
-- Processing interval [2021-04-28 00:00:00, 2021-04-29 00:00:00)
Found 3 file(s) for Dav2_Twr
Reading
Found 3 file(s) for Dav1_Grnd
Reading
Concatenating stations
-- Processing interval [2021-04-29 00:00:00, 2021-04-30 00:00:00)
Found 4 file(s) for Dav2_Twr
Reading
Found 4 file(s) for Dav1_Grnd
Reading
Concatenating stations

If outputresult is set to ‘True’ (the default is ‘False’), the function returns a dictionary with one DataFrame per key, of the form:

out = {key1: pd.DataFrame,
       key2: pd.DataFrame}

In our case, something like:

out = {'Dav': pd.DataFrame}

out

{'Dav':                                      S1    S2    S7  Azimuth  Elevation
 Station   Epoch               SV                                       
 Dav2_Twr  2021-04-28 21:07:00 C06  38.0  38.0  31.0     36.6       10.1
                               C09  41.0  41.0  36.0     49.0       32.7
                               C11  43.4  43.4  41.0    177.2       35.1
                               C14  45.0  45.0  42.3    -96.4       76.8
                               C16  38.0  38.0  33.0     38.3       15.2
 ...                                 ...   ...   ...      ...        ...
 Dav1_Grnd 2021-04-29 03:07:00 R16  32.3  31.8   NaN   -173.5       68.9
                               R23  27.7   NaN   NaN      NaN        NaN
                               S23  36.0   NaN   NaN      NaN        NaN
                               S27  29.1   NaN   NaN      NaN        NaN
                               S36  35.0   NaN   NaN      NaN        NaN
 
 [89349 rows x 5 columns]}

We can see that a new MultiIndex level named ‘Station’ has been added. Data from both stations now appear in the same table, with aligned Epochs and SV numbers.
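With the stations stacked in one table, standard pandas indexing is enough to pull out one receiver or to put both receivers side by side. The sketch below uses a small toy frame with the same index levels (‘Station’, ‘Epoch’, ‘SV’); the S1 values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for out['Dav']; the S1 values are made up for illustration
epoch = pd.Timestamp('2021-04-28 21:07:00')
index = pd.MultiIndex.from_tuples(
    [('Dav2_Twr', epoch, 'C06'),
     ('Dav2_Twr', epoch, 'C09'),
     ('Dav1_Grnd', epoch, 'C06'),
     ('Dav1_Grnd', epoch, 'C09')],
    names=['Station', 'Epoch', 'SV'])
toy = pd.DataFrame({'S1': [38.0, 41.0, 30.0, 35.0]}, index=index)

# Select the rows of a single station; the 'Station' level is dropped
tower = toy.xs('Dav2_Twr', level='Station')

# Move 'Station' to the columns so both receivers share one row per (Epoch, SV)
side_by_side = toy['S1'].unstack('Station')
side_by_side['Twr_minus_Grnd'] = side_by_side['Dav2_Twr'] - side_by_side['Dav1_Grnd']
print(side_by_side)
```

The same calls should apply to the real table, since they rely only on the index level names shown in the output above.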

Specifying output destination

Rather than only returning the result, the function can also save it to a location that we specify. Again, it can be useful to drop variables that are not needed, in order to reduce file size.

# define where to save output data, matching the dictionary keys in 'pairings'
outputdir = {'Dav':'data_RINEX2.11/Dav_paired/'}
# define which variables to keep
keepvars = ['S*','Azimuth','Elevation']

# run function
out = gv.gather_stations(pattern,pairings,timeintervals,keepvars=keepvars,outputdir=outputdir)

Extracting Epochs from files
----- Processing Dav
-- Processing interval [2021-04-28 00:00:00, 2021-04-29 00:00:00)
Found 3 file(s) for Dav2_Twr
Reading
Found 3 file(s) for Dav1_Grnd
Reading
Concatenating stations
Saving result in data_RINEX2.11/Dav_paired/
Saved 43172 observations in Dav_20210428000000_20210429000000.nc
-- Processing interval [2021-04-29 00:00:00, 2021-04-30 00:00:00)
Found 4 file(s) for Dav2_Twr
Reading
Found 4 file(s) for Dav1_Grnd
Reading
Concatenating stations
Saving result in data_RINEX2.11/Dav_paired/
Saved 46177 observations in Dav_20210429000000_20210430000000.nc
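The ‘S*’ entry in keepvars is a wildcard. As a rough mental model, it can be read like a shell-style pattern, as sketched below with the standard fnmatch module. This is an assumption about the matching semantics, not gnssvod's actual implementation; select_vars and the column names ‘C1’ and ‘L1’ are hypothetical.

```python
from fnmatch import fnmatchcase

def select_vars(columns, patterns):
    """Keep the columns that match at least one shell-style pattern (hypothetical helper)."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]

# 'C1' and 'L1' are hypothetical extra columns added for illustration
columns = ['S1', 'S2', 'S7', 'C1', 'L1', 'Azimuth', 'Elevation']
keepvars = ['S*', 'Azimuth', 'Elevation']

kept = select_vars(columns, keepvars)
print(kept)
```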

As requested, the results have been saved as daily files, even though the input files are hourly. Each file name is built from the key in the ‘pairings’ argument (here ‘Dav’) and the bounds of the corresponding time interval.
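The naming scheme visible in the log can be reproduced with a few lines, which is handy for locating a given chunk later. The sketch below is inferred from the file names printed above; chunk_filename is a hypothetical helper, not a gnssvod function.

```python
from datetime import datetime

def chunk_filename(key, start, end):
    """Build a name like <key>_<start>_<end>.nc, mimicking the names in the log (hypothetical helper)."""
    fmt = '%Y%m%d%H%M%S'
    return f"{key}_{start.strftime(fmt)}_{end.strftime(fmt)}.nc"

name = chunk_filename('Dav', datetime(2021, 4, 28), datetime(2021, 4, 29))
print(name)
```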