Eye-Tracking TSV Normalization: Dots vs Empty Strings

Decision: Use Empty Strings (Not NaNs or Dots)

Why Not NaNs?

  • NaN is an in-memory floating-point value (IEEE 754; float('nan') in Python), not a text convention

  • In TSV/CSV files (plain text), NaN cannot be represented directly

  • When data is written to text, it becomes a string such as "nan" (3 characters)

  • This violates BIDS/TSV standards for missing data
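
The point about NaN turning into text can be seen with the standard library alone. The following is a minimal sketch, not the PRISM implementation; the helper name to_cell is hypothetical and only illustrates mapping missing values to empty cells before writing.

```python
import csv
import io
import math


def to_cell(value):
    """Hypothetical helper: map None or NaN to an empty cell, pass other values through."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return value


row = [float("nan"), 963.20]

# Writing the row unchanged: NaN ends up as the literal text "nan".
raw = io.StringIO()
csv.writer(raw, delimiter="\t", lineterminator="\n").writerow(row)
raw_line = raw.getvalue()

# Writing the row after mapping missing values: the first cell is left empty.
clean = io.StringIO()
csv.writer(clean, delimiter="\t", lineterminator="\n").writerow(
    [to_cell(v) for v in row]
)
clean_line = clean.getvalue()

print(repr(raw_line))
print(repr(clean_line))
```

Mapping missing values explicitly before writing avoids depending on how a particular writer happens to stringify NaN.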

Why Not Dots?

  • SR Research EyeLink exports use . to indicate missing values, but that is a vendor-specific convention

  • A dot is not a recognized missing-value marker in BIDS tabular files

  • Dots could be ambiguous - are they missing values or the string “.”?

  • Empty cells are read as missing by default by pandas and, for numeric columns, by R

What We Do Instead: Empty Strings

BEFORE (EyeLink format):
AVERAGE_ACCELERATION_X  AVERAGE_GAZE_X
.                       963.20
-497.78                 965.30

AFTER (PRISM/BIDS format):
AVERAGE_ACCELERATION_X  x
                        963.20
-497.78                 965.30

In the first data row, the AVERAGE_ACCELERATION_X cell is completely empty - no dot, no “nan”, no “NA”.
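
The dot-to-empty step shown above can be sketched at the text level as follows. This is an illustration under assumptions, not the actual PRISM code: the function name dots_to_empty is hypothetical, and the sketch assumes cells never contain quoted tabs or embedded newlines. It leaves column names untouched; renaming is a separate step.

```python
def dots_to_empty(tsv_text):
    """Replace every cell that is exactly "." with an empty cell.

    All other cells are copied verbatim, so numeric formatting such as
    "963.20" is preserved.
    """
    out_lines = []
    for line in tsv_text.splitlines():
        cells = line.split("\t")
        out_lines.append("\t".join("" if c.strip() == "." else c for c in cells))
    return "\n".join(out_lines) + "\n"


before = (
    "AVERAGE_ACCELERATION_X\tAVERAGE_GAZE_X\n"
    ".\t963.20\n"
    "-497.78\t965.30\n"
)
after = dots_to_empty(before)
print(after)
```

Only cells that consist of a single dot are changed, so values such as -497.78 that merely contain a dot are not affected.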


BIDS/TSV Standard for Missing Values

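The BIDS specification's section on tabular files states:

Missing values: Missing and non-applicable values MUST be coded as n/a.

PRISM output uses an empty cell instead, so files that must pass strict BIDS validation need their empty cells converted to n/a first.

Representations PRISM uses for missing values in TSV: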

  • ✅ Empty cell (nothing between tabs)

  • ✅ Column not present (column dropped entirely)

Invalid representations:

  • "." (dot)

  • "NA" (string)

  • "NaN" (string representation of NaN)

  • "null" (JSON-style)
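
A file can be scanned mechanically for the string markers listed above. The sketch below is one possible check, assuming a header row and tab-separated cells without quoting; the names DISALLOWED_MARKERS and find_disallowed_cells are hypothetical.

```python
# Markers taken from the list of invalid representations above.
DISALLOWED_MARKERS = {".", "NA", "NaN", "null"}


def find_disallowed_cells(tsv_text):
    """Return (line_number, column_name, value) for each cell holding a disallowed marker.

    Line numbers are 1-based and count the header as line 1.
    """
    lines = tsv_text.splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    hits = []
    for line_number, line in enumerate(lines[1:], start=2):
        for name, cell in zip(header, line.split("\t")):
            if cell.strip() in DISALLOWED_MARKERS:
                hits.append((line_number, name, cell))
    return hits


sample = "TRIAL_INDEX\tx\n1\t.\n1\t965.30\n2\tNaN\n3\t\n"
problems = find_disallowed_cells(sample)
print(problems)
```

Empty cells are not reported, since they are the representation this document settles on.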


How to Handle Missing Values When Reading

When your analysis code reads these TSV files:

Python (pandas)

import pandas as pd

df = pd.read_csv('sub-17_ses-1_task-gaze_eyetrack.tsv', sep='\t')

# read_csv already parses empty cells as missing (NaN) by default,
# so no replace('', ...) step is needed after loading.

# To work with them:
print(df['x'].isna())  # Shows True for empty cells
print(df['x'].dropna())  # The x column without its missing entries
print(df.dropna(subset=['x']))  # Removes rows with missing x

R

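df <- read.delim('sub-17_ses-1_task-gaze_eyetrack.tsv', na.strings = c('', 'NA'))

# Empty cells in numeric columns are NA by default; adding '' to na.strings
# also covers character columns such as SAMPLE_MESSAGE
which(is.na(df$x))  # Row indices of missing x values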

NumPy/SciPy

import numpy as np

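# names=True takes field names from the header row; dtype=float parses every
# column as float, so empty (and non-numeric) cells become NaN
data = np.genfromtxt('sub-17_ses-1_task-gaze_eyetrack.tsv',
                     delimiter='\t',
                     names=True,
                     dtype=float,
                     encoding='utf-8',
                     missing_values='',  # Treat empty as missing
                     filling_values=np.nan)  # Replace with NaN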

Summary of Changes

The updated _process_eyetracking_tsv() function now:

  1. Drops RECORDING_SESSION_LABEL column

    • Redundant: filename already encodes sub-17_ses-1

    • Saves ~30 bytes per row × millions of rows

  2. Converts dots to empty strings

    • Removes a vendor-specific marker that BIDS tools do not recognize

    • Makes data more portable

    • Lets pandas and R read missing samples as NaN/NA without extra options

  3. Renames columns to BIDS-style

    • AVERAGE_GAZE_X → x

    • AVERAGE_GAZE_Y → y

    • AVERAGE_PUPIL_SIZE → pupil_size

    • TIMESTAMP → timestamp

  4. Preserves all other columns

    • Kinematic data (accelerations, velocities)

    • Blink/saccade flags

    • Trial indices

    • Metadata (SAMPLE_MESSAGE)
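
The four steps above can be sketched together as follows. This is not the body of _process_eyetracking_tsv(), which is not shown here; normalize_eyetracking_text is a hypothetical stand-in that works on text, assumes rectangular tab-separated input without quoting, and uses a placeholder session label ("s17") in its sample data.

```python
# Column renames and drops as listed in the summary above.
RENAMES = {
    "AVERAGE_GAZE_X": "x",
    "AVERAGE_GAZE_Y": "y",
    "AVERAGE_PUPIL_SIZE": "pupil_size",
    "TIMESTAMP": "timestamp",
}
DROPPED = {"RECORDING_SESSION_LABEL"}


def normalize_eyetracking_text(tsv_text):
    """Drop, rename, and clean columns; every other cell is copied verbatim."""
    lines = tsv_text.splitlines()
    header = lines[0].split("\t")
    # Indices of the columns that survive step 1 (dropping).
    keep = [i for i, name in enumerate(header) if name not in DROPPED]
    # Step 3: rename known columns, step 4: leave all others as they are.
    out = ["\t".join(RENAMES.get(header[i], header[i]) for i in keep)]
    for line in lines[1:]:
        cells = line.split("\t")
        # Step 2: a cell that is exactly "." becomes an empty cell.
        out.append("\t".join("" if cells[i].strip() == "." else cells[i] for i in keep))
    return "\n".join(out) + "\n"


raw = (
    "RECORDING_SESSION_LABEL\tTRIAL_INDEX\tAVERAGE_ACCELERATION_X\tAVERAGE_GAZE_X\tTIMESTAMP\n"
    "s17\t1\t.\t963.20\t5529512.00\n"  # "s17" is a made-up placeholder label
    "s17\t1\t-497.78\t965.30\t5529521.00\n"
)
normalized = normalize_eyetracking_text(raw)
print(normalized)
```

Working on cell text rather than parsed floats keeps values such as 5529512.00 byte-for-byte identical to the export.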


Example: Before and After

AFTER (PRISM Normalized)

TRIAL_INDEX	AVERAGE_ACCELERATION_X	x	timestamp
1		963.20	5529512.00
1	-497.78	965.30	5529521.00

Configuration in JSON Sidecar

The JSON sidecar documents this normalization:

{
  "Technical": {
    "FileFormat": "tsv",
    "ProcessingLevel": "parsed",
    "NormalizationApplied": {
      "DroppedColumns": ["RECORDING_SESSION_LABEL"],
      "RenamedColumns": {
        "AVERAGE_GAZE_X": "x",
        "AVERAGE_GAZE_Y": "y",
        "AVERAGE_PUPIL_SIZE": "pupil_size",
        "TIMESTAMP": "timestamp"
      },
      "MissingValueNormalization": {
        "From": "dots (.)",
        "To": "empty strings",
        "Standard": "BIDS-compatible"
      }
    }
  }
}
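
Because the sidecar records the rename mapping, analysis code can recover the original EyeLink column names from it. The sketch below assumes the sidecar has exactly the structure shown above and embeds a shortened copy as a string; the helper original_name is hypothetical.

```python
import json

# Shortened copy of the sidecar shown above, embedded for a self-contained example.
sidecar_text = """
{
  "Technical": {
    "NormalizationApplied": {
      "DroppedColumns": ["RECORDING_SESSION_LABEL"],
      "RenamedColumns": {
        "AVERAGE_GAZE_X": "x",
        "AVERAGE_GAZE_Y": "y",
        "AVERAGE_PUPIL_SIZE": "pupil_size",
        "TIMESTAMP": "timestamp"
      }
    }
  }
}
"""

sidecar = json.loads(sidecar_text)
normalization = sidecar["Technical"]["NormalizationApplied"]

# Invert the mapping: normalized name -> original EyeLink name.
to_original = {new: old for old, new in normalization["RenamedColumns"].items()}


def original_name(column):
    """Return the EyeLink name for a normalized column, or the name itself if it was not renamed."""
    return to_original.get(column, column)


print(original_name("x"))
print(original_name("TRIAL_INDEX"))
print(normalization["DroppedColumns"])
```

In practice the text would come from the .json file next to the TSV rather than from an embedded string.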

References