Seasat – Technical Challenges – 4. Data Cleaning (Part 2)

4.3 Prep_Raw.sh

SEASAT Data Flow Prep Cleaning

After development of each of the software pieces described previously in this section, the entire data cleaning process was driven by the program prep_raw.sh. This procedure was run on all of the swaths that were output from SyncPrep to create the first version of the ASF online Seasat raw data archive (the fixed_ files). Analysis of these results is covered in the next section. What follows here are examples of the prep_raw.sh process and intermediate outputs.

Time Filtering in Stages: Each plot shows range line number versus MSEC metadata value. Top row is before filtering; bottom is after. From left to right: (a) Stage 1 — attempt to fix all time values > 513 from the linear trend. (b) Stage 2 — fix stair steps resulting from sticking clock on satellite. (c) Stage 3 — final linear fix before discontinuity removal.

Time Filtering in Stages: Each plot shows range line number versus MSEC metadata value. Top row is before filtering; bottom is after. From left to right: (a) Stage 1 — attempt to fix all time values > 513 from the linear trend. (b) Stage 2 — fix stair steps resulting from sticking clock on satellite. (c) Stage 3 — final linear fix before discontinuity removal.

Data Set #1

Decoded Signal Data
Data contain lots of bit errors, dropouts and stair steps.
Time Gap Corrections
Time Gap Corrections: A few large bit errors and some fill fixes have been applied.
Stair Corrections
Stair Corrections: A few of the stairs in this image have been partially corrected.
Time Value Corrections
Time Value Corrections: The time values for this file have all been brought into a reasonable range. No discontinuities needed to be filled in this area of the image.

Data Set #2

Decoded Signal Data
This shows a typical discontinuity situation. The data at the crossover from before to after the gap is spotty, filled with bit errors and dropouts.
Time Gap Corrections
Time Gap Corrections: Gaps in the data have mostly been filled in. Some are incorrect – particularly the green values that match the lower line but are on the right side of this plot.
Stair Corrections
Stair Corrections: Since no stair step anomaly exists in this data, there is no visual change here.
Linear Time Restored
Linear Time Restored: After the discontinuity is filled, linear time is restored in this file.
Decoded Time Data from Section 2.1 Examples
Cleaned Time Data from 2.1 Examples
Cleaned Time Data from 2.1 Examples

4.4 Results of Prep_Raw.sh

By November 15, 2012, the beta version of prep_raw.sh was delivered to ASF operations. It was run on individual swaths at first, with results spot-checked. Once confidence in the programs increased, all remaining Seasat swaths were processed through this decoding and cleaning software en masse. Overall, from the 1,840 files that SyncPrep created, 1,470 were successfully decoded, and 1,399 of those made it through the prep_raw.sh procedure to create a set of fixed decoded files. The files that failed comprise 242 GB of data, while 3,318 GB of decoded swath data files were created.

Processing Stage #files Size (GB)
Capture 38 2160
SyncPrep 1840 2431
Original Decoded 1470 3585
Fixed Decoded 1399 3318
Good Decoded 1346 3160
Bad Decoded 53 157

Summary of Data Cleaning

93% of data made it through SyncPrep
92% of that data decoded (assume 1.6 expansion)
93% of that data was “fixed”
95% of that data considered “good”
OVERALL: ~80% of SyncPrep’d data is “good”

Reasons for Failures

  • SyncPrep: Not all captured files could be interpreted by SyncPrep
  • Original Decoded: Because the decoder needs to interpret the subcommutated headers, it is more stringent on maintaining a “sync lock” than SyncPrep
  • Fixed Decoded: Some files are so badly mangled that a reasonable time sequence could not be recovered
  • Bad Decoded: Several subcategories of remaining data errors are discussed in section 5

4.5 Addition of fix_start_times

During analysis of the fixed metadata files, it was discovered that bad times occurred at the beginning of many files. This problem was not a big surprise; the nature of the sync code search is such that many errors occur in places where the sync codes cannot be found. This is why the files were broken in the first place. So, it is expected that the beginning of a lot of the swath files will have bad metadata, which means bad times. When bad times are linearly trended, the results are unpredictable.

Bad Start Time Examples

To fix this problem, yet another level of filtering was added to the processing flow – this time a post-filter to fix the start times. The code fix_start_times replaces the first 5,000 times in a file with the linear trend resulting from the next 10,000 lines in the file. This code was added as a post-processing step to follow prep_raw.sh and run on all of the decoded swath files.

Final Processing Flow: With the addition of the fix_start_times code, the processing flow for data decode and cleaning is finally completed.

Final Processing Flow: With the addition of the fix_start_times code, the processing flow for data decode and cleaning is finally completed.

Fixed Start Time Examples

Written by Tom Logan, July 2013

Comments are closed.