Seasat – Technical Challenges – 4. Data Cleaning (Part 2)

4.3 Prep_Raw.sh

After development of each of the software pieces described previously in this section, the entire data cleaning process was driven by the program prep_raw.sh. This procedure was run on all of the swaths that were output from SyncPrep to create the first version of the ASF online Seasat raw data archive (the fixed_ files). Analysis of these results is covered in the next section. What follows here are examples of the prep_raw.sh process and intermediate outputs.

Time Filtering in Stages: Each plot shows range line number versus MSEC metadata value. Top row is before filtering; bottom is after. From left to right: (a) Stage 1 — attempt to fix all time values > 513 from the linear trend. (b) Stage 2 — fix stair steps resulting from sticking clock on satellite. (c) Stage 3 — final linear fix before discontinuity removal.

Figure panels, Data Set #1: Decoded Signal Data; Time Gap Corrections; Stair Corrections; Time Value Corrections.
Figure panels, Data Set #2: Decoded Signal Data; Time Gap Corrections; Stair Corrections; Linear Time Restored.
Additional figure panels: Decoded Time Data from the Section 2.1 Examples; Cleaned Time Data from the 2.1 Examples.

4.4 Results of Prep_Raw.sh

By November 15, 2012, the beta version of prep_raw.sh was delivered to ASF operations. It was run on individual swaths at first, with results spot-checked. Once confidence in the programs increased, all remaining Seasat swaths were processed through this decoding and cleaning software en masse. Overall, from the 1,840 files that SyncPrep created, 1,470 were successfully decoded, and 1,399 of those made it through the prep_raw.sh procedure to create a set of fixed decoded files. The files that failed comprise 242 GB of data, while 3,318 GB of decoded swath data files were created.

Processing Stage     # Files   Size (GB)
Capture                   38        2610
SyncPrep                1840        2431
Original Decoded        1470        3585
Fixed Decoded           1399        3318
Good Decoded            1346        3160
Bad Decoded               53         157

Summary of Data Cleaning

93% of data made it through SyncPrep
92% of that data decoded (assume 1.6 expansion)
93% of that data was “fixed”
95% of that data considered “good”
OVERALL: ~80% of SyncPrep’d data is “good”

Reasons for Failures

  • SyncPrep: Not all captured files could be interpreted by SyncPrep
  • Original Decoded: Because the decoder needs to interpret the subcommutated headers, it is more stringent on maintaining a “sync lock” than SyncPrep
  • Fixed Decoded: Some files are so badly mangled that a reasonable time sequence could not be recovered
  • Bad Decoded: Several subcategories of remaining data errors are discussed in section 5

4.5 Addition of fix_start_times

During analysis of the fixed metadata files, it was discovered that bad times occurred at the beginning of many files. This was not a big surprise: by the nature of the sync code search, many errors occur in places where the sync codes cannot be found, which is why the files were broken apart in the first place. It is therefore expected that the beginnings of many swath files will have bad metadata, and bad metadata means bad times. When bad times are linearly trended, the results are unpredictable.

Bad Start Time Examples

To fix this problem, yet another level of filtering was added to the processing flow – this time a post-filter to fix the start times. The code fix_start_times replaces the first 5,000 times in a file with the linear trend resulting from the next 10,000 lines in the file. This code was added as a post-processing step to follow prep_raw.sh and run on all of the decoded swath files.
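A minimal sketch of this filter, assuming a simple least-squares fit (the constants follow the text; the function itself is illustrative, not the original fix_start_times source):

    # Sketch only: replace the first 5,000 times with the trend fitted to
    # the next 10,000 lines. Assumes the file has at least 15,000 lines.
    N_REPLACE = 5_000   # leading time values to replace
    N_FIT = 10_000      # following lines used to fit the linear trend

    def fix_start_times(times):
        """Overwrite times[0:5000] with the trend of times[5000:15000]."""
        xs = list(range(N_REPLACE, N_REPLACE + N_FIT))
        ys = times[N_REPLACE:N_REPLACE + N_FIT]
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        for i in range(N_REPLACE):
            times[i] = my + slope * (i - mx)   # extrapolate trend backward
        return times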

Final Processing Flow: With the addition of the fix_start_times code, the processing flow for data decode and cleaning is finally completed.

Fixed Start Time Examples

Written by Tom Logan, July 2013

Seasat – Technical Challenges – 2. Decoder Development

Starting in the summer of 2012, ASF undertook the significant challenge of developing a Seasat telemetry decoder in order to create raw data files suitable for focusing by a synthetic aperture radar (SAR) correlator. In this case, that means processible by ROI, the Repeat Orbit Interferometry package developed at Jet Propulsion Laboratory. In addition to creating the range lines out of minor frames, the decoder must interpret the 18 fields in the headers to create a metadata file describing the state of the satellite when the data was collected.

The main challenges in decoding the raw telemetry were:

  1. Overcoming bit error problems
  2. Properly forming major lines from a variable number of minor frames
  3. Maintaining sync lock
  4. Discovering sentinels marking data collection boundaries

2.1 Problems with Bit Fields

All 10 of the bit fields proved to be unreliable, and, thus, with the exception of the fill flag, they are ignored by all of the software developed during this project. This section describes the ways in which the bit fields are unreliable.

Each Seasat minor frame contains 8 bits to record time and status. These bits encode 18 metadata fields, subcommutated in the first 10 minor frames of each range line. There are 10 bit fields, four fields that should be constant for a data take, two fields that should change rarely and two fields that should change steadily. Unfortunately, due to the high bit error rate (BER) of the telemetry data, even fields that should be constant show high variability. The following plots, showing decoded metadata values plotted over 800,000 range lines, drive home the enormity of the bit error problems in these data.

Bits per Sample

Bits Per Sample: This value should always be 5, as the parameter never changed during the entire Seasat mission.

PRF rate code

PRF Rate Code: The PRF rate code should be a 4 for the entire mission, since this satellite parameter never changed.

Last Digit of Year

Last Digit of Year: The Seasat mission occurred entirely in 1978, so the last digit of the year should always be 8.

Station Code

Station Code: The Station Code should be a constant for any given datatake. 5: Fairbanks, Alaska; 6: Goldstone, Calif.; 7: Merritt Island, Fla.; 9: Oakhanger, United Kingdom; 10: Shoe Cove, Newfoundland.

Day of Year

Day of Year: For a given datatake, the day of year should change at most once, since any single datatake cannot exceed even an hour in duration, much less an entire day.

Delay to Digitization

Delay to Digitization: The delay to digitization parameterizes the time between emission of a pulse from the satellite and recording of return echoes. Used to calculate the slant range to the first pixel, the delay should change only a handful of times in any given raw data file based upon changes in orbital altitude.

Clock Drift

Clock Drift: The spacecraft clock drift records the timing error of the spacecraft clock. This should be a smoothly changing field, generally in the 2,000- to 3,000-millisecond range. It is not known how this field was originally created, only that it is vital in getting reasonable geolocations for processed imagery.

MSEC

MSEC: This field records the millisecond of the day when the data were acquired and should, therefore, be a linearly increasing field with an exact slope of 1/PRF.

Given all of the fallout from these truly bizarre plots, it is no surprise that attempts to use the fill flag quickly proved difficult; the bit errors are so pervasive that the field is unreliable.

2.2 Determining Minor Frame Numbers

Several oddities in the raw data are exacerbated by the high BER. First, the data are organized into 1,180-bit minor frames, meaning that each is 147.5 bytes long. Although the 0.5-byte offset was easy to deal with, it turns out that sync codes may actually appear 147, 147.5, or 148 bytes apart at seemingly random places in the raw data file – a topic addressed in section 2.3.

Moreover, a variable number of minor frames need to be combined to create a single range line. Some lines contain 59 minor frames and some contain 60. Considering that the frame number in the minor frames is only 7 bits, and no major frame numbers exist in the telemetry, the “simple” task of finding the start of each major line was at times quite difficult. Synchronization codes can be either byte-aligned or non-byte-aligned, and partial lines occur on a regular basis. As a result, the minor frame numbers eventually had to be determined by context.

The current frame number is determined using three previous minor frame numbers and the next frame number — along with a handful of heuristics. For example, if a gap is found in consecutive minor frame numbers, the following rules are applied:

  • If the next frame is 1 and the last was either 59 or 60, assume this is frame zero
  • Else if (next_frame - last_frame) == 2, put this frame in sequence
  • Else if the last two frames are in sequence, try to put this one in sequence:
    • If the last frame < 59, put this frame in sequence
    • If the last frame was 59 and the last line was length 60, then this HAS to be frame zero, since we never have two length-60 lines in a row
  • Else if the last2 and last3 frames are in sequence and the last frame is 0, then set this frame to 1
  • If we got to here, then we did not fix the error!

Even beyond these rules, additional checks for a bad frame number 0 and major frames that spuriously showed more than 60 minor frames still had to be performed.
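In sketch form, the rules above look roughly as follows (the function and argument names are illustrative, not the original decoder source):

    # Illustrative sketch of the frame-number context rules.
    def fix_frame_number(last3, last2, last, next_frame, last_line_len):
        """Infer the current minor-frame number from three previous frame
        numbers, the next frame number, and the previous line length."""
        if next_frame == 1 and last in (59, 60):
            return 0                       # this must start a new major line
        if next_frame - last == 2:
            return last + 1                # slot this frame into the sequence
        if last - last2 == 1:              # last two frames are in sequence
            if last < 59:
                return last + 1
            if last == 59 and last_line_len == 60:
                return 0   # two length-60 lines never occur in a row
        if last2 - last3 == 1 and last == 0:
            return 1
        return None                        # the error was not fixed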

Aside: Bit Errors in Frame Numbers

Random bit errors in the middle of a line: In these decoded minor frame numbers, we see that lines 1, 2, 4 and 5 have no frame number errors; they are in sequence starting from 1 and going up to 59 or 60 by ones. Meanwhile, line 3 has 24 frame numbers in a row that are in error.

Partial lines: The first line is missing minor frames 5-9; the second line is complete; the third line has repeated minor frames 15-19; and the fourth line is the decoder’s attempt to enforce the fact that at most 60 minor frames form a range line. The fifth line is missing minor frames 23-26; the sixth line is complete; the seventh line has repeated minor frames 32-36; and, again, the eighth line is the decoder’s attempt to enforce the fact that at most 60 minor frames form a range line. 

Multiple lines of bit errors: This example shows how bad random bit errors can be, even if no minor frames are actually missing. Incredibly, in five lines, 122 minor frames are in error out of 298 total, giving a 40 percent error rate for these frame numbers. Perhaps even more incredibly, the ASF Seasat decoder managed to fix all of these frame numbers.

  • Original non-fixed frame numbers: 

  • Frame numbers fixed by the ASF Seasat decoder:

2.3 Maintaining Sync Lock

One very important aspect of decoding telemetry data is maintaining a sync lock: The decoding program must be able to find the synchronization codes that occur at the beginning of each minor frame.

Early in the development of the decoder, it was determined that the sync codes are just as susceptible to bit errors as the rest of the data. Initially, finding sync codes required a considerable amount of searching in the file, with the hope that no false positives would be encountered. After much development and testing, it was determined that in order to maintain sync lock, some number of bit errors had to be allowed in the sync code. Therefore, the code was configured to allow 7 bit errors per sync code out of 24 bits. Values less than this needlessly split datatakes (single passes of data over a given ground station). Values greater than this showed too many “false positive” matches for sync codes.

As a result of this extensive analysis, a pattern was determined in the location of sync codes: a byte-aligned, 147.5-byte frame followed by a non-byte-aligned, 147.5-byte frame, repeated 14,217 times, followed by a single instance of a 147-byte frame.
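In code form, this pattern can be sketched as follows (the constants come from the pattern just described; the generator itself is illustrative, not the original source):

    # Sketch of the observed sync-code spacing; constants follow the text.
    FRAME_BITS = 1180            # one 147.5-byte (1,180-bit) minor frame
    PAIRS = 14_217               # aligned/non-aligned frame pairs per cycle
    SHORT_FRAME_BITS = 147 * 8   # the single 147-byte frame closing a cycle

    def expected_sync_bit_offsets():
        """Yield the bit offset of each expected sync code."""
        pos = 0
        while True:
            for _ in range(2 * PAIRS):
                yield pos                # byte-aligned when pos % 8 == 0
                pos += FRAME_BITS
            yield pos                    # the lone 147-byte frame
            pos += SHORT_FRAME_BITS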

Once this pattern was established, most problems with locating sync codes were abated.

2.4 Data Sentinel Values – Breaking Datatakes

The next problem involved bad sections of data that defied attempts to match frame numbers. The only solution is to break the datatake into multiple pieces, closing the current output file when problems arise, and creating a new output file when sync is regained. This is much like what SyncPrep does, except that the ASF decoder has to be more stringent in its rules for maintaining sync since it must be able to properly build range lines in addition to just finding sync codes.

In addition to losing sync lock as a result of BER, two additional cases arose that break a datatake into segments: 60 occurrences of the fill flag in a row, or the repeated occurrence of frame number 127. The fill flag is a valid field but is so unreliable that it can only be trusted after many consecutive hits. Frame number 127 proved to be a sentinel for no data; it occurred thousands of times in areas where no valid SAR data was being collected. Either condition also causes the ASF Seasat decoder to close the current output file and create a new one.
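These checks can be sketched roughly as follows (the run threshold for frame number 127 is an assumption, as are the names; this is not the decoder source):

    FILL_RUN = 60        # consecutive fill flags required before trusting them
    NO_DATA_FRAME = 127  # frame number observed as a no-data sentinel

    def datatake_breaks(minor_frames):
        """Yield each index at which the current output file should be
        closed: 60 fill flags in a row, or a run of frame number 127."""
        fill_run = 0
        sentinel_run = 0
        for i, (frame_no, fill_flag) in enumerate(minor_frames):
            fill_run = fill_run + 1 if fill_flag else 0
            sentinel_run = sentinel_run + 1 if frame_no == NO_DATA_FRAME else 0
            if fill_run >= FILL_RUN or sentinel_run >= 2:  # assumed threshold
                yield i
                fill_run = sentinel_run = 0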

2.5 Results of Decoding

seasat_decoder:

  • Decodes raw signal data into unpacked byte signal data (.dat file)
    • 13680 unsigned bytes of signal data per line
    • File size is always lines * 13680 bytes in length
  • Decodes all headers to ASCII (.hdr file)
    • 20 columns of integer numbers per line
    • One line entry per line of decoded signal data
  • Additional features:
    • Allows both byte-aligned and non-byte-aligned minor frames
    • Deals with variable-length lines and partial lines
    • Fixes frame numbers from context if possible
    • Creates one or more output files per input based on sentinels
    • Assembles headers spread across 10 minor frames

In spite of all of the challenges and problems in the raw data, the ASF Seasat decoder is able to decode raw telemetry SAR data. Using five frame numbers in sequence and a handful of heuristics, telemetry data is decoded into byte-aligned, 8-bit samples. Concurrently, all of the metadata stored in the headers is decoded and placed in an external file.

The current strategy tried to err on the side of only allowing valid SAR data to be decoded. Still, up to 7 bit errors had to be allowed in each sync code match just to get through the raw data. In addition, the decoded header information is simply not reliable. For example, early in development, the ASF Seasat decoder broke one 7-GB chunk of raw data into 24 segments of decoded data, dumping a header at the beginning and end of each segment. Analysis of the decoded times in these headers showed that, of the 48 dumped, 3 were completely zero and an additional 12 were in error. In other words, the decoded times did not make sense in context with the surrounding time values.

Thus, even after completing the decoder development with bit error tolerance, frame number heuristics, proper sync code detection, and known sentinel values for good data boundaries, the decoded Seasat archives were still nearly unusable in any reliable fashion.

Processing Stage     # Files   Size (GB)
Capture                   38        2610
SyncPrep                1840        2431
Original Decoded        1470        3585

Initial Data Recovery: 93 percent of the data captured from tape made it through SyncPrep; Approximately 92 percent of that data was decoded (assuming a 1.6-expansion factor).

Aside: ASF Tape Archive File Names

When the tapes were captured onto disk, files were named based upon tape number and section of tape read. For example, the first part of tape1 was initially named SEASAT_tape1_01Kto287K.

This file was run through SyncPrep, which created multiple subfiles based upon its ability to maintain a sync lock, sometimes creating over 100 such numbered files, e.g. SEASAT_tape1_01Kto287K.000 to SEASAT_tape1_280Kto668K.020

Next, the files go through the ASF Seasat decoder, gaining yet another subfile number, but the prefix “SEASAT_” is removed. Note that this stage creates a file pair of {.dat, .hdr}, e.g. tape1_01Kto287K.018_000, tape1_01Kto287K.018_001, and tape1_01Kto287K.018_002 file pairs were all created from a single decode of SEASAT_tape1_280Kto668K.018.

Thus, for a single captured file, SyncPrep could make tens to a few hundred data segments, while the ASF Seasat decoder could break each of these files into even more sub-segments.

Written by Tom Logan, July 2013

Seasat – Technical Challenges – 4. Data Cleaning (Part 1)

In order to create a synthetic aperture for a radar system, one must combine many returns over time. For Seasat, a typical azimuth reference function — the number of returns combined into a single focused range line — is 5,600 samples. Each of these samples is actually a range line of radar echoes from the ground. Properly combining all of these lines requires knowing precisely when a range line was received by the satellite.

In practice, SAR systems transmit pulses of energy equally spaced in time. This time is set by the pulse repetition frequency (PRF); for Seasat, 1,647 pulses are transmitted every second. Alternatively, it can be said a pulse is transmitted every 0.607165 milliseconds, an interval commonly referred to as the pulse repetition interval or PRI. Without this constant time between pulses, the SAR algorithm would break down and data would not be focused to imagery.
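In equation form:

    \mathrm{PRI} = \frac{1}{\mathrm{PRF}} = \frac{1}{1647\ \mathrm{Hz}} \approx 0.000607165\ \mathrm{s} = 0.607165\ \mathrm{ms}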

Many errors existed in the Seasat raw data decoded at ASF. As a result, multiple levels of filtering were required to deal with issues present in the raw telemetry data, particularly with time values. Only after this filtering were the raw SAR data processible to images.

4.1 Median Filtering and Linear Regression

The first attempt at cleaning the data involved a simple one-pass filter of the pertinent metadata parameters. The following seven parameters were median filtered to pull out the most commonly occurring value: station code, day of year, clock drift, delay to digitization, least significant digit of year, bits per sample and PRF rate code. A linear regression was used to clean the MSEC of Day metadata field. This logic is encapsulated in the program fix_headers, which is discussed in more detail in “Final Form of Fix_Headers.”

Implementing the median filter was straightforward:

  1. Read through the header file and maintain histograms of the relevant parameters.
  2. Use the local median value to replace the decoded value and create a cleaned header file. Here “local” refers to the 400 values preceding the value to be replaced.
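A minimal sketch of this windowed median, assuming small integer field values (the 400-value window follows the text; the code is illustrative, not the fix_headers source):

    # Histogram-based local median over the preceding 400 values.
    from collections import deque

    def local_median_filter(values, window=400, nbins=256):
        """Replace each value with the median of the `window` values that
        precede it; early values pass through until history accumulates."""
        hist = [0] * nbins
        recent = deque()
        out = []
        for v in values:
            if len(recent) < window:
                out.append(v)                 # not enough history yet
            else:
                count = 0
                for bin_val, c in enumerate(hist):
                    count += c
                    if count >= window // 2:  # walk histogram to the median
                        out.append(bin_val)
                        break
            recent.append(v)
            hist[v] += 1
            if len(recent) > window:
                hist[recent.popleft()] -= 1
        return out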

This scheme works well for cleaning the constant and rarely changing fields. It also seems to work quite well for the smoothly changing clock drift field. At the end of this section are the examples from “Problems with Bit Fields,” along with the corresponding median-filtered versions of the same metadata parameters. In each case, the median filter created clean usable metadata files.

Performing the linear regression on the MSEC of Day field was also straightforward. Unfortunately, the results were far from expected or optimal. The sheer volume of bit errors combined with discontinuities and timing dropouts made the line slopes and offsets highly variable inside a single swath. These issues will be examined in the next section.

Table of Filtered Parameters

Parameter                         Filter Applied      Value
station_code                      Median              Constant per datatake
day_of_year                       Median              Varies
clock_drift                       Median              Varies
delay_to_digitization             Median              Varies
least_significant_digit_of_year   Median              8
bits_per_sample                   Median              5
prf_rate_code                     Median              4
msec_of_day                       Linear Regression   Varies

Data Cleaning Examples

Figure panels: RAW versus NEW (median-filtered) plots for Station Code, Day of Year, Last Digit of the Year, Clock Drift, Bits Per Sample, PRF Rate Code, and Delay to Digitization.

4.2 Time Cleaning

Creating a SAR image requires combining many radar returns over time. This requires that very accurate times are known for every SAR sample recorded. In the decoded Seasat data, the sheer volume of bit errors, combined with discontinuities and timing dropouts, resulted in highly variable times inside a single swath.

4.2.1 Restricting the Time Slope

One of the big problems with applying a simple linear regression to the Seasat timing information was that the local slope often changed drastically from one section of a file to another based upon bit errors, stair steps and discontinuities. Since a pulse is transmitted every 0.607165 milliseconds, it seemed that the easiest way to clean all of the MSEC times would be to simply find the fixed offset for a given swath file and then apply the known time slope to generate new time values for a cleaned header file.

This restricted slope regression was implemented when it became obvious that a simple linear regression was failing. By restricting the time slope of a file to be near the 0.607-msec/line known value, it was assumed that timing issues other than discontinuities could be removed. The discontinuities would still have to be found and fixed separately in order for the SAR focusing algorithm to work properly. Otherwise, the precise time of each line would not be known.
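A sketch of such a restricted-slope fit, assuming the slope is pinned at the nominal value so that only the offset is estimated (illustrative, not the original code):

    # With the slope fixed, the least-squares offset reduces to the
    # mean residual.
    NOMINAL_SLOPE = 0.6071645   # msec per line at a 1647-Hz PRF

    def fixed_slope_fit(times, slope=NOMINAL_SLOPE):
        """Return the offset b minimizing sum((t[i] - (slope*i + b))**2)."""
        n = len(times)
        offset = sum(t - slope * i for i, t in enumerate(times)) / n
        return offset   # cleaned time for line i is slope * i + offset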

4.2.2 Removing Bit Errors from Times

fix_time

  • Crude time filtering, trying to fix all values that are > 513 from the local linear trend:
    • Bit fixes – replace values that are powers of 2 off from the trend
    • Fill fixes – fill gaps between constant consecutive values
    • Linear fixes – replace values with the local linear trend
  • Reads and writes a file of headers

Even with linear regressions and time slope limitations, times still were not being brought into reasonable ranges. Too many values were in error in some files, and a suitable linear trend could not be obtained. So, another layer of time cleaning was added as a pre-filter to the final linear regression done in fix_headers. The program fix_time was initially created just to look for bit errors, but was later expanded to incorporate each of three different filters at the gross level (i.e. only values > 513 from a local linear trend are changed):

  1. If the value is an exact power of 2 off from the local linear trend, then add that power of 2 into the value. This fix attempts to first change values that are wrong simply because of bit errors. The idea is that this is a common known error type and should be assumed as the first cause.
  2. Else if the value is between two values that are the same, make it the same as its neighbors. This fix takes advantage of the stair steps found in the timing fields. It was only added in conjunction with the fix_stairs program discussed below. The idea is to take advantage of the fact that the stair steps are easily corrected using the known satellite PRI.
  3. Else just replace the value with the local linear trend. At this point, it is better to bring the points close to the line than to leave them with very large errors.
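A sketch of these three rules applied to a single out-of-range value (the names are illustrative, not the original fix_time source):

    def is_power_of_two(n):
        return n > 0 and (n & (n - 1)) == 0

    def fix_gross_value(value, prev, nxt, trend):
        """Correct one MSEC value more than 513 counts off the local trend."""
        delta = round(trend) - value
        if is_power_of_two(abs(delta)):   # 1. bit fix: one flipped bit
            return value + delta
        if prev == nxt:                   # 2. fill fix: match equal neighbors
            return prev
        return round(trend)               # 3. linear fix: use the trend itself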

4.2.3 Removing Stair Steps from Times

fix_stairs

  • Fix for sticky time field – Turns “stairs” into “lines” by replacing repeated time values with linear approximation for better linear trend
  • Reads and writes a file of headers

For yet another pre-filter, it was determined that the stair steps time anomaly should be removed before fitting points to a final linear trend. This task is relatively straightforward: If several time values in a row are the same, replace them with values that fit the known time slope of the satellite. The program fix_stairs was developed to deal with this problem.
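A sketch of the repair, assuming repeated values are re-spread along the known slope (illustrative, not the fix_stairs source):

    PRI_MSEC = 0.6071645   # nominal time step between range lines

    def fix_stairs(times):
        """Replace each run of repeated time values with values that climb
        at the known slope from the start of the run."""
        out = list(times)
        i = 0
        while i < len(out):
            j = i
            while j + 1 < len(out) and out[j + 1] == out[i]:
                j += 1                       # [i..j] is a run of equal values
            for k in range(i + 1, j + 1):
                out[k] = out[i] + PRI_MSEC * (k - i)
            i = j + 1
        return out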

Extreme Stair Step: raw0.headers (blue) are the original unfiltered MSEC time values; newest.headers (red) shows the result of fixing the “stairs” that result from the sticky clock, presumably an artifact of the Seasat hardware, not the result of bit rot, transcription or other errors

4.2.4 Final Form of fix_headers

fix_headers

  • Miscellaneous header cleaning using median filters:
    • Station Code
    • Least Significant Digit of Year
    • Day of Year
    • Clock Drift
    • Bits Per Sample
    • PRF Rate Code
    • Delay to Digitization
  • Time Discontinuity Location and Additional Filtering
    • Replace all values > 2 from the linear trend with the linear trend
    • Locates discontinuities in time, making an annotated file for later use:
      • 5 bad values with the same offset from the trend identify a discontinuity
      • A +1 discontinuity is forward in time and can be fixed:
        • if > 4000 lines, the gap is too large and cannot be fixed
        • otherwise, slide the time to fit the discontinuity
        • if the offset is > 5 time values, save this discontinuity in a file
      • A -1 discontinuity is backwards in time and cannot be fixed
  • Reads and writes a file of headers

Although it started out as the main cleaning program, fix_headers is currently the final link in the cleaning process. Metadata going through fix_headers has already been partially fixed by reducing bit errors, removing stair steps, and bringing all other values that show very large offsets into a rough linear fit. So in addition to performing median filtering on important metadata fields (see section 4.1), this program performs the final linear fit on the time data.

Initially, a regression is performed on the first window of 400 points and used to fix the first 200 time values of that window. Any values that are more than 5 msec from their predecessors are replaced by the linear fit. After this, a new fit is calculated every window/2 samples, but never within 100 samples of an actual discontinuity. The final task for fix_headers was to locate the rough locations of all real discontinuities that occur in the files. At least, it was designed to only find real discontinuities – those being the final problem hindering the placement of reasonable linear times in the swath files.

Identifying discontinuities was challenging. Much trial and error resulted in a code that worked for nearly all cases and was able to be configured to work for the other cases as well. The basic idea is that if a gap is found in the data, and if after the gap no other gaps occur within 5 values, then it is possible that a discontinuity exists. If so, the program determines the number of lines that would have to be missing to create such a gap and records the location and size in an external discontinuity file. Note that only forward discontinuities can be fixed in this manner and only discontinuities less than a certain size. In practice, the procedure attempts to locate gaps of up to 4,000 lines, discarding any datasets that show gaps larger than this.
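A sketch of the gap-size test (the PRI constant and the 4,000-line limit follow the text; the function itself is illustrative):

    PRI_MSEC = 0.6071645    # nominal msec per line
    MAX_GAP_LINES = 4000    # larger gaps cause the dataset to be discarded

    def classify_gap(t_before, t_after):
        """Estimate how many range lines are missing across a forward time
        gap; return None when the gap cannot be fixed."""
        missing = round((t_after - t_before) / PRI_MSEC) - 1
        if missing <= 0:
            return None            # backwards in time: cannot be fixed
        if missing > MAX_GAP_LINES:
            return None            # too large: discard this dataset
        return missing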

Seasat - Extreme Discontinuity: This decoded signal data shows a 5.3-hour gap in time. This cannot be fixed.

In the end, all of the gaps in the data were identified, and there is high confidence that any such discontinuities found are real and not just the result of bit errors or other problems. Unfortunately, this method was not able to pinpoint the start of each problem, only that it existed, as shown in the following set of graphs.

Figure panels: Decoded Signal Data with No Y-range Clipping; Decoded Signal Data with Y-range Clipping; Linear Trend of Decoded Signal Data; Comparison of Decoded Signal Data and Linear Fit; False Discontinuity #1; False Discontinuity #2.

4.2.5 Removing Discontinuities

dis_search

  • Fixes all time discontinuities in the raw swath files
    • For each entry in the previously generated discontinuity file:
      • Search backwards from the discontinuity, looking for points that don't fit the new trend line
      • When 20 consecutive values that are not within 1.5 of the new line are found, the start of the discontinuity has been located
      • For the length of the discontinuity:
        • Repeat the header line in the .hdr file (fixing the time only)
        • Fill the .dat file with random values
  • Reads the discontinuity file, the original .dat and .hdr files, and the cleaned .hdr file; creates the final cleaned .dat and .hdr files, ready for processing

The first task in removing the discontinuities is locating them. The rough area of each real discontinuity can be found using the fix_headers code as described in the previous section.

Finding the exact start and length of each discontinuity still remains to be done. This search, and the act of filling each gap thus discovered, is performed by the program dis_search. The discontinuity search is performed backward: 3,000 lines after each discontinuity area are cached and then searched for the point where the time values stop fitting the post-gap trend. That location is marked as the actual start of the discontinuity. The gap in the raw data between the time before the discontinuity and the time after must then be filled in; random values were used for fill, these being the best way to avoid impacting the usefulness of the real SAR data.
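A sketch of the backward search, following the rules listed in the dis_search summary above (20 consecutive values more than 1.5 msec off the new trend mark the start; the code is illustrative, not the original source):

    def find_discontinuity_start(times, disc_loc, slope, offset,
                                 run=20, tol=1.5):
        """Search backwards from a rough discontinuity location for the
        first run of values that no longer fit the post-gap trend line."""
        consecutive = 0
        for i in range(disc_loc, -1, -1):
            if abs(times[i] - (slope * i + offset)) > tol:
                consecutive += 1
                if consecutive == run:
                    return i        # actual start of the discontinuity
            else:
                consecutive = 0
        return None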

Discontinuity Fills: Each plot shows range line number versus MSEC metadata value. Original decoded metadata (left) is spotty and contains an obvious time discontinuity. After the discontinuity is found and corrected, linear time is restored (right).

Seasat – Technical Challenges – 6. Slope Issues

During decoding and cleaning, it was assumed that the time slope of the files would be roughly guided by the Pulse Repetition Interval (PRI) of the satellite, i.e. a Pulse Repetition Frequency (PRF) of 1647 Hz means that 1,647 lines are transmitted and received per second, giving a PRI of 0.60716 msec. Based upon this, each 1,000 lines of Seasat data should span 0.60716 seconds.

In other words, in milliseconds, the time slope for these files should always be 0.60716. It was discovered that this is not the case with much of the actual data, as shown in the following table and graphs:

Line     Time       Time Diff   Calculated Slope
1        13851543
500      13851790      247          0.4950
10000    13856399     4856          0.4856
15000    13858818     7275          0.4850
20000    13861260     9717          0.4859
30000    13866139    14596          0.4865
35000    13868569    17026          0.4865
40000    13870998    19455          0.4864
45000    13873447    21904          0.4868

Seasat Times: PRF = 1647 Hz, so the PRI is 0.0006071645 sec. In MSEC, the time slope should always be 0.6071645. Yet, for this datatake, the time slope is consistently only about 0.486!

Original Data

Original Data: This example shows a dataset that is relatively clean before any filtering is applied. It seems that this file should have been extremely easy to clean.

Filtered Data

Filtered Data: After the dataset went through the prep_raw.sh procedure, this was the resulting time plot. It is, quite obviously, very wrong.

Comparison

Comparison of Original with Filtered: Although the times look fine in the first (unfiltered) plot, they are wrong for this satellite based upon the known PRI. The ASF cleaning software tried to fix these wrong time values using a known slope of 0.607. This introduced a discontinuity into the data and resulted in incorrect times.

These results, wherein the time slope of the raw data does not match the known PRI of the satellite, were incredibly perplexing. At first, it was assumed that these data could not be processed reliably, and they were simply sorted into the large time-slope error and wrong-fix error categories.

Analysis of the time slopes in the original unfiltered data only pointed out how extreme the problem really was. Well over 100 files showed slopes that were either less than 0.606 or more than 0.608, with the lowest in the 0.48 range. The highest reliable estimate showed a slope of well over 0.62.

6.1 Slope Issues Explained

Eventually, through conversation with original Seasat engineers at the Jet Propulsion Lab (JPL), it was discovered that the Seasat metadata field MSEC of Day actually contains not only the time of imaging but also the time to transmit data from the spacecraft to the ground station. This adds a variable time offset to the metadata field. Once this was understood, it was readily obvious that using the known PRF as a guide for filtering was an incorrect solution.

Thus, the entire cleaning process was revisited, with all of the codes allowing more relaxed slope values during linear regression. This worked considerably better than the previous cleaning attempt. However, it did not solve the problems entirely.

6.2 Final Results of Data Cleaning

The final set of cleaned Seasat raw swaths was assembled using three main passes through the archives with different search parameters, along with a few files that were fixed on a case-by-case basis. Basically, the final version of the code was run and the results examined for remaining time gaps. Any files with large or many time gaps were reprocessed using different parameters. In the end, 1,346 swaths were cleaned, 2 by hand, 14 from the first pass, 25 from the second pass, and the remainder in the final cleaning pass. These then are the final cleaned Seasat archives for the initial release of ASF’s Seasat products.

Date      Total Datasets   Datasets with Time Gaps   Largest Time Gap   Largest Number of Gaps in a File   Files with >10 msec Gap
1/31/13   1,399            728                       54260282           1820
4/9/13    1,298            263                       180                34
4/9/13    1,299            122                       95                 33
4/9/13    1,299            55                        2113               17
4/10/13   1,299            34                        50
FINAL     1,346            49                        34                 26                                 28

Final Cleaned Seasat Swaths: Approximately one year after the project started, 1,346 raw Seasat swaths were cleaned and ready to be processed into SAR image products.

Written by Tom Logan, July 2013

Seasat – Technical Challenges – 3. Decoded Data Analysis

With the Seasat archives decoded into range line format along with an auxiliary header file full of metadata, the next step is to focus the data into synthetic aperture radar (SAR) imagery. Focusing is the transformation of raw signal data into a spatial image. Unfortunately, pervasive bit errors, data dropouts, partial lines, discontinuities and many other irregularities were still present in the decoded data.

3.1 Important Metadata Fields

In order for the decoded SAR data to be focused properly, the satellite position at the time of data collection must be known. The position and velocity of the satellite are derived from the timestamp in each decoded data segment, making it imperative that the timestamps are correct in each of the decoded data frames.

Slant range is the line of sight distance from the satellite to the ground. This distance must be known for focusing reasons and for geolocation purposes. As the satellite distance from the ground changes during an orbit, the change is quantified using the delay-to-digitization field. During focusing, the slant range to the first pixel is calculated using these quantified values. More specifically, the slant range to the first pixel (srf) is determined using the delay to digitization (delay), the pulse repetition frequency (PRF) and the speed of light (c):

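As a sketch, assuming the delay field counts whole pulse repetition intervals (the exact constants used in the ASF processing code may differ):

    srf = \frac{c}{2} \cdot \frac{\mathrm{delay}}{\mathrm{PRF}}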

It turns out that the clock drift is also an important metadata field. Clock drift records the timing error of the spacecraft clock. Although it is not known how this field was originally created, adding this offset to the day of year and millisecond of day yielded more accurate geolocations in the focused Seasat products.

Finally, although not vital to the processing of images, the station code provides information about where the data was collected and may be useful for future analysis of the removal of systematic errors.

3.2 Bit Errors

It is assumed that the vast majority of the problems in the original data are due to bit errors resulting from the long dormancy of the raw data on magnetic tapes. The plots in section 2.1 showed typical examples of the extreme problems introduced by these errors, as do the following time plots.

Time Plot: Very regular errors occur in much of the data, almost certainly some of which are due strictly to bit errors. Note that this plot should show a slope, but the many errors make it look flat instead.
Time Plot: This plot shows a typical occurrence in the Seasat raw data: Some areas of the data are completely fraught with random errors; other areas are fairly “calm” in comparison.

3.3 Systematic Errors in Timing

Beyond the bit errors, other, more systematic errors affect the Seasat timing fields. These include box patterns, stair steps and data dropouts.

To top off the problems with the time fields, discontinuities occur on a regular basis in these files. Some files have none; some have hundreds. Some discontinuities are small — only a few lines. Other discontinuities are very large — hundreds to thousands of lines. Focusing these data required identifying and dealing with discontinuities.

Box Errors: Regular patterns of errors occurred in many datatakes. This is most likely the result of faulty hardware, either on the platform or in the ground station. Note that this plot also shows the cleaned times in green. See the next section for details on how this was accomplished.

Stair Steps: This plot shows a small section of time data from lines 218500-218700 of one Seasat header file. Readily obvious are some random bit errors and the fairly typical “stair step” error. It is assumed that the stair steps are the result of a sticking clock either on the satellite or in the receiving hardware.

Additional figure panels: Data Dropouts; Forward Time Discontinuity; Backward Time Discontinuity; Double Discontinuity; Time Discontinuity.

Aside: Initial Data Quality Assessment

Of the 1,470 original decoded data swaths:

  • Datasets with Time Gaps (>5 msec): 728
  • Largest Time Gap: 54260282
  • Largest Number of Gaps in a Single File: 1,820
  • Number of files with stair steps: 295
  • Largest percentage of valid repeated times: 63%
  • Number of files with more than one partial line: 1,170
  • Largest percentage of partial lines: 42%
  • Number of files with bad frame numbers: 1,470
  • Largest percentage of bad frame numbers: 17%