Merging processed data
This notebook uses the data from the previous notebook, but there is no need to run the previous notebook for this one to work.
import gnssvod as gv
import pandas as pd
Merge
In the previous notebook, we processed raw RINEX observation files individually for each receiver and saved the results in corresponding NetCDF files.
In a GNSS-VOD setup, receivers are analysed in pairs. One receiver sits above the forest canopy and provides a clear-sky reference, and the other sits below the canopy and measures the attenuation caused by the forest.
Here we merge the data from these two receivers before making any plots. We also save the merged data in regular chunks of fixed length (for example, daily chunks). This makes the data easier to manipulate and avoids relying on the temporal chunks in which the data was originally logged (here, hourly log files that each span from xx:07 to xx+1:06).
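To illustrate why re-chunking is convenient, here is a minimal sketch using only pandas and synthetic log windows (not the actual files of this notebook). It assumes a full day of hourly logs, each running from xx:07 to xx+1:06, and counts how many of them contribute to a single daily chunk. The variable names are made up for this illustration.

```python
import pandas as pd

# Synthetic hourly log windows, each running from xx:07 to (xx+1):06 inclusive
log_starts = pd.date_range("2021-04-27 23:07", periods=25, freq="60min")
log_ends = log_starts + pd.Timedelta(minutes=59)

# One daily output chunk, closed on the left: [2021-04-28 00:00, 2021-04-29 00:00)
day = pd.Interval(pd.Timestamp("2021-04-28"), pd.Timestamp("2021-04-29"), closed="left")

# A log file contributes to the chunk if its window overlaps the day at all
overlapping = (log_starts < day.right) & (log_ends >= day.left)
# A log file lies entirely inside the chunk if both of its ends fall within the day
fully_inside = (log_starts >= day.left) & (log_ends < day.right)

n_overlapping = int(overlapping.sum())
n_fully_inside = int(fully_inside.sum())
print(n_overlapping, n_fully_inside)  # 25 23
```

With this logging scheme, the two files that straddle midnight each contribute only part of their content to the day, which is why fixed daily chunks are easier to work with than the original log files.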
Function gather_stations()
The function gnssvod.gather_stations() does several things:
It will read processed observation files that were saved in NetCDF format (output of “preprocess”).
It will combine data from the various receivers/stations according to user-specified pairing rules.
It will only process data belonging to the requested time interval.
It will save paired data in temporal chunks specified by the time interval.
If requested, it will also return the paired data as an object.
Specifying input files
# first let's indicate where to find the data for each receiver
pattern={'Dav2_Twr':'data_RINEX2.11/Dav2_Twr/nc/*.nc',
'Dav1_Grnd':'data_RINEX2.11/Dav1_Grnd/nc/*.nc'}
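Before running the merge, it can help to check that each pattern actually matches some files. The sketch below assumes the patterns behave like ordinary shell-style glob patterns (which the '*.nc' suffix suggests, although this notebook does not state it), and uses Python's standard glob module. The helper count_matches is hypothetical and not part of gnssvod.

```python
import glob

def count_matches(pattern):
    """Map each station name to the number of files its glob pattern matches."""
    return {station: len(glob.glob(path)) for station, path in pattern.items()}

pattern = {'Dav2_Twr': 'data_RINEX2.11/Dav2_Twr/nc/*.nc',
           'Dav1_Grnd': 'data_RINEX2.11/Dav1_Grnd/nc/*.nc'}

counts = count_matches(pattern)
# stations whose pattern matches nothing (wrong path, or data not yet processed)
missing = [station for station, n in counts.items() if n == 0]
print(counts, missing)
```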
Specifying time interval
Then we need to define the overall time interval and the temporal chunks we want for the output data.
Here we process all data from ‘28-04-2021’ to ‘29-04-2021’ inclusive, that is, 2 days starting on ‘28-04-2021’.
startday = pd.to_datetime('28-04-2021',format='%d-%m-%Y')
timeintervals=pd.interval_range(start=startday, periods=2, freq='D', closed='left')
timeintervals
IntervalIndex([[2021-04-28 00:00:00, 2021-04-29 00:00:00), [2021-04-29 00:00:00, 2021-04-30 00:00:00)], dtype='interval[datetime64[ns], left]')
Using the timeintervals above will save the results in chunks of 1 day. If we wanted the results in hourly chunks, we could have written instead:
timeintervals=pd.interval_range(start=startday, periods=48, freq='H', closed='left')
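As a quick check, the sketch below confirms that the daily and hourly definitions cover the same total span, and shows what closed='left' means at a chunk boundary. It uses freq='60min' rather than 'H' only to sidestep differences between pandas versions (recent versions prefer the lowercase alias 'h' and warn about 'H'). The variable names are chosen for this illustration.

```python
import pandas as pd

startday = pd.to_datetime('28-04-2021', format='%d-%m-%Y')
daily = pd.interval_range(start=startday, periods=2, freq='D', closed='left')
hourly = pd.interval_range(start=startday, periods=48, freq='60min', closed='left')

# Both definitions start and end at the same instants
same_span = (daily[0].left == hourly[0].left) and (daily[-1].right == hourly[-1].right)

# With closed='left', midnight belongs to the chunk it starts, not the one it ends
midnight = pd.Timestamp('2021-04-29 00:00:00')
in_first_day = midnight in daily[0]
in_second_day = midnight in daily[1]
print(same_span, in_first_day, in_second_day)  # True False True
```

Because the intervals are closed on the left only, an observation at exactly 00:00:00 falls in one chunk and is never assigned to two.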
Now the only thing left is to define how to combine the stations, using the same dictionary keys as in ‘pattern’.
# define how to make pairs, always give reference station first, matching the dictionary keys of 'pattern'
pairings={'Dav':('Dav2_Twr','Dav1_Grnd')}
# run function
out = gv.gather_stations(pattern,pairings,timeintervals,outputresult=True)
Extracting Epochs from files
----- Processing Dav
-- Processing interval [2021-04-28 00:00:00, 2021-04-29 00:00:00)
Found 3 file(s) for Dav2_Twr
Reading
Found 3 file(s) for Dav1_Grnd
Reading
Concatenating stations
-- Processing interval [2021-04-29 00:00:00, 2021-04-30 00:00:00)
Found 4 file(s) for Dav2_Twr
Reading
Found 4 file(s) for Dav1_Grnd
Reading
Concatenating stations
If outputresult is set to True (the default is False), the function returns a dictionary with one DataFrame per key of ‘pairings’, of the form
out = {key1: pd.DataFrame,
key2: pd.DataFrame}
In our case, something like:
out = {'Dav': pd.DataFrame}
out
{'Dav': S1 S2 S7 Azimuth Elevation
Station Epoch SV
Dav2_Twr 2021-04-28 21:07:00 C06 38.0 38.0 31.0 36.6 10.1
C09 41.0 41.0 36.0 49.0 32.7
C11 43.4 43.4 41.0 177.2 35.1
C14 45.0 45.0 42.3 -96.4 76.8
C16 38.0 38.0 33.0 38.3 15.2
... ... ... ... ... ...
Dav1_Grnd 2021-04-29 03:07:00 R16 32.3 31.8 NaN -173.5 68.9
R23 27.7 NaN NaN NaN NaN
S23 36.0 NaN NaN NaN NaN
S27 29.1 NaN NaN NaN NaN
S36 35.0 NaN NaN NaN NaN
[89349 rows x 5 columns]}
We can see that a new MultiIndex level named ‘Station’ has been added. Data from both stations now appear in the same table, with aligned Epochs and SV numbers.
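The sketch below shows how a table with this layout can be built and queried. It uses two tiny synthetic tables rather than the real data, and the helper fake_station is hypothetical. Stacking with pd.concat is used here only to reproduce the index layout, and is not necessarily how gather_stations() works internally.

```python
import pandas as pd

def fake_station(values, svs):
    """Build a tiny synthetic table indexed by (Epoch, SV), for illustration only."""
    epoch = pd.Timestamp('2021-04-28 21:07:00')
    index = pd.MultiIndex.from_product([[epoch], svs], names=['Epoch', 'SV'])
    return pd.DataFrame({'S1': values}, index=index)

twr = fake_station([38.0, 41.0], ['C06', 'C09'])
grnd = fake_station([30.0, 35.5], ['C06', 'C09'])

# Stacking with dictionary keys adds an outer index level, named 'Station' here
paired = pd.concat({'Dav2_Twr': twr, 'Dav1_Grnd': grnd}, names=['Station'])

# Select the rows of a single station (the 'Station' level is dropped)
ref = paired.xs('Dav2_Twr', level='Station')

# Put the two stations side by side, one column per station
side_by_side = paired['S1'].unstack('Station')
print(side_by_side)
```

The same xs and unstack calls should apply to out['Dav'], since it carries the same three index levels.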
Specifying output destination
Instead of only returning the result as an output of the function, we can specify where to save it. Again, it may be useful to drop variables that are not needed in order to reduce file size.
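The 'S*' entry passed to keepvars in the next cell looks like a shell-style wildcard. The sketch below shows how such patterns select column names, assuming matching similar to Python's standard fnmatch module; the exact matching rules used by gnssvod are not shown in this notebook. The helper select_columns and the column list are hypothetical.

```python
from fnmatch import fnmatchcase

def select_columns(columns, keepvars):
    """Return the columns matching at least one shell-style pattern, in original order."""
    return [c for c in columns if any(fnmatchcase(c, p) for p in keepvars)]

# Hypothetical set of columns in a processed observation file
columns = ['C1', 'L1', 'S1', 'S2', 'S7', 'Azimuth', 'Elevation']
kept = select_columns(columns, ['S*', 'Azimuth', 'Elevation'])
print(kept)  # ['S1', 'S2', 'S7', 'Azimuth', 'Elevation']
```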
# define where to save output data, matching the dictionary keys in 'pairings'
outputdir = {'Dav':'data_RINEX2.11/Dav_paired/'}
# define which variables to keep
keepvars = ['S*','Azimuth','Elevation']
# run function
out = gv.gather_stations(pattern,pairings,timeintervals,keepvars=keepvars,outputdir=outputdir)
Extracting Epochs from files
----- Processing Dav
-- Processing interval [2021-04-28 00:00:00, 2021-04-29 00:00:00)
Found 3 file(s) for Dav2_Twr
Reading
Found 3 file(s) for Dav1_Grnd
Reading
Concatenating stations
Saving result in data_RINEX2.11/Dav_paired/
Saved 43172 observations in Dav_20210428000000_20210429000000.nc
-- Processing interval [2021-04-29 00:00:00, 2021-04-30 00:00:00)
Found 4 file(s) for Dav2_Twr
Reading
Found 4 file(s) for Dav1_Grnd
Reading
Concatenating stations
Saving result in data_RINEX2.11/Dav_paired/
Saved 46177 observations in Dav_20210429000000_20210430000000.nc
As requested, the results have been saved as daily files (even though the input files are hourly). The file names are built from the key of the ‘pairings’ argument (here ‘Dav’) and the bounds of each time interval.
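The naming scheme seen in the log above can be reproduced with a few lines of pandas, which is handy for locating a given chunk later. This is a sketch inferred from the two file names printed above, not taken from the gnssvod source, and the helper chunk_filename is hypothetical.

```python
import pandas as pd

def chunk_filename(key, interval):
    """Build '<key>_<start>_<end>.nc' with timestamps formatted as YYYYmmddHHMMSS."""
    fmt = '%Y%m%d%H%M%S'
    return f"{key}_{interval.left.strftime(fmt)}_{interval.right.strftime(fmt)}.nc"

startday = pd.to_datetime('28-04-2021', format='%d-%m-%Y')
timeintervals = pd.interval_range(start=startday, periods=2, freq='D', closed='left')

names = [chunk_filename('Dav', interval) for interval in timeintervals]
print(names)
# ['Dav_20210428000000_20210429000000.nc', 'Dav_20210429000000_20210430000000.nc']
```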