Eye-Tracking TSV Normalization: Dots vs Empty Strings
Decision: Use Empty Strings (Not NaNs or Dots)
Why Not NaNs?
NaN is an IEEE 754 floating-point value (float('nan') in Python), not a text convention.
Plain-text TSV/CSV files cannot represent it directly.
When a NaN is written to text, it becomes the string "nan" (3 characters).
This violates BIDS/TSV standards for missing data.
Why Not Dots?
SR Research EyeLink exports use . to indicate missing values, but the BIDS-style TSV output described here uses empty cells (no value between tabs) for missing data.
Dots are ambiguous: a lone "." could be a missing value or the literal string ".".
Empty cells are more portable: common readers such as pandas and R treat them as missing by default.
What We Do Instead: Empty Strings
BEFORE (EyeLink format):
AVERAGE_ACCELERATION_X    AVERAGE_GAZE_X
.                         963.20
-497.78                   965.30
AFTER (PRISM/BIDS format):
AVERAGE_ACCELERATION_X    x
                          963.20
-497.78                   965.30
In the first data row, the AVERAGE_ACCELERATION_X cell is completely empty: no dot, no "nan", no "NA".
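One way to perform this conversion is a cell-by-cell pass with Python's csv module. The sketch below is only an illustration: the function name dots_to_empty is hypothetical, and it assumes the input is tab-separated text in which a missing value is a cell containing exactly ".".

```python
import csv
import io

def dots_to_empty(tsv_text):
    """Replace cells that are exactly "." with empty cells (hypothetical helper)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Compare the whole cell, so the decimal point in "963.20" is never touched
        writer.writerow(["" if cell == "." else cell for cell in row])
    return out.getvalue()

raw = "AVERAGE_ACCELERATION_X\tAVERAGE_GAZE_X\n.\t963.20\n-497.78\t965.30\n"
print(dots_to_empty(raw))
```

Because the comparison is on whole cells, numeric values keep their decimal points. One caveat: Python's csv writer quotes an empty cell when it is the only field in a row, so a single-column file would need separate handling.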
BIDS/TSV Standard for Missing Values
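The convention this pipeline follows for missing values in TSV files:
Missing values are left empty and not represented as a string.
Note that the BIDS specification's own rules for tabular files code missing and non-applicable values as n/a, so files that must pass strict BIDS validation may need that token instead of an empty cell.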
Valid representations for missing values in TSV:
✅ Empty cell (nothing between tabs)
✅ Column not present (column dropped entirely)
Invalid representations:
❌ "." (dot)
❌ "NA" (string)
❌ "NaN" (string representation of NaN)
❌ "null" (JSON-style)
How to Handle Missing Values When Reading
When your analysis code reads these TSV files:
Python (pandas)
import pandas as pd

# read_csv parses empty cells as NaN by default, so no extra conversion is needed
df = pd.read_csv('sub-17_ses-1_task-gaze_eyetrack.tsv', sep='\t')

# To work with the missing values:
print(df['x'].isna())    # True for empty cells
print(df['x'].dropna())  # x values with the missing entries removed
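The practical difference between an empty cell and a dot shows up in the column type pandas infers. The sketch below parses two in-memory samples built from the example rows in this document; the sample strings and variable names are illustrative only.

```python
import io
import pandas as pd

# Two-row samples: one with an empty cell, one with an EyeLink-style dot
with_empty = "AVERAGE_ACCELERATION_X\tx\n\t963.20\n-497.78\t965.30\n"
with_dot = "AVERAGE_ACCELERATION_X\tx\n.\t963.20\n-497.78\t965.30\n"

empty_col = pd.read_csv(io.StringIO(with_empty), sep="\t")["AVERAGE_ACCELERATION_X"]
dot_col = pd.read_csv(io.StringIO(with_dot), sep="\t")["AVERAGE_ACCELERATION_X"]

print(empty_col.isna().tolist())                  # [True, False]
print(pd.api.types.is_numeric_dtype(empty_col))   # True
print(pd.api.types.is_numeric_dtype(dot_col))     # False
```

With the empty cell, the column stays numeric and the gap becomes NaN. With the dot, pandas falls back to a text column, so arithmetic on it would need an extra cleaning step.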
R
df <- read.delim('sub-17_ses-1_task-gaze_eyetrack.tsv', na.strings = '')
# Empty cells become NA; na.strings = '' also covers text columns such as SAMPLE_MESSAGE
which(is.na(df$x))  # Row indices where x is missing
NumPy/SciPy
import numpy as np

data = np.genfromtxt('sub-17_ses-1_task-gaze_eyetrack.tsv',
                     delimiter='\t',
                     names=True,             # Use the header row as field names
                     dtype=None,             # Infer a type for each column
                     encoding='utf-8',
                     missing_values='',      # Treat empty as missing
                     filling_values=np.nan)  # Replace with NaN
Summary of Changes
The updated _process_eyetracking_tsv() function now:
Drops the RECORDING_SESSION_LABEL column
  Redundant: the filename already encodes sub-17_ses-1
  Saves the repeated label on every row (s17_nr_1 plus its tab is 9 bytes), which adds up over millions of rows
Converts dots to empty strings
  Follows the TSV missing-value convention described above
  Makes data more portable
  Lets pandas and R read the cells as missing without extra configuration
Renames columns to BIDS-style
  AVERAGE_GAZE_X → x
  AVERAGE_GAZE_Y → y
  AVERAGE_PUPIL_SIZE → pupil_size
  TIMESTAMP → timestamp
Preserves all other columns
  Kinematic data (accelerations, velocities)
  Blink/saccade flags
  Trial indices
  Metadata (SAMPLE_MESSAGE)
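The four steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's _process_eyetracking_tsv(): the function name normalize_eyetracking is hypothetical, and reading every cell as text (so that values are passed through unchanged) is an assumption about how the real function behaves.

```python
import io
import pandas as pd

# Mapping taken from the rename list above
RENAMES = {
    "AVERAGE_GAZE_X": "x",
    "AVERAGE_GAZE_Y": "y",
    "AVERAGE_PUPIL_SIZE": "pupil_size",
    "TIMESTAMP": "timestamp",
}

def normalize_eyetracking(tsv_text):
    """Hypothetical stand-in for the normalization steps listed above."""
    # dtype=str and keep_default_na=False keep each cell as its literal text
    df = pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype=str, keep_default_na=False)
    # errors="ignore" tolerates files where the label column is already absent
    df = df.drop(columns=["RECORDING_SESSION_LABEL"], errors="ignore")
    df = df.replace(".", "")          # whole-cell match; "963.20" is untouched
    return df.rename(columns=RENAMES) # unlisted columns keep their names

raw = (
    "RECORDING_SESSION_LABEL\tTRIAL_INDEX\tAVERAGE_ACCELERATION_X\tAVERAGE_GAZE_X\tTIMESTAMP\n"
    "s17_nr_1\t1\t.\t963.20\t5529512.00\n"
    "s17_nr_1\t1\t-497.78\t965.30\t5529521.00\n"
)
normalized = normalize_eyetracking(raw)
print(normalized.to_csv(sep="\t", index=False))
```

Keeping cells as text avoids reformatting numbers on the way through (for example, 963.20 is not rewritten as 963.2). A real implementation working on millions of rows might prefer chunked or streaming processing instead.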
Example: Before and After
BEFORE (Raw EyeLink Export)
RECORDING_SESSION_LABEL    TRIAL_INDEX    AVERAGE_ACCELERATION_X    AVERAGE_GAZE_X    TIMESTAMP
s17_nr_1                   1              .                         963.20            5529512.00
s17_nr_1                   1              -497.78                   965.30            5529521.00
AFTER (PRISM Normalized)
TRIAL_INDEX    AVERAGE_ACCELERATION_X    x         timestamp
1                                        963.20    5529512.00
1              -497.78                   965.30    5529521.00
Configuration in JSON Sidecar
The JSON sidecar documents this normalization:
{
  "Technical": {
    "FileFormat": "tsv",
    "ProcessingLevel": "parsed",
    "NormalizationApplied": {
      "DroppedColumns": ["RECORDING_SESSION_LABEL"],
      "RenamedColumns": {
        "AVERAGE_GAZE_X": "x",
        "AVERAGE_GAZE_Y": "y",
        "AVERAGE_PUPIL_SIZE": "pupil_size",
        "TIMESTAMP": "timestamp"
      },
      "MissingValueNormalization": {
        "From": "dots (.)",
        "To": "empty strings",
        "Standard": "BIDS-compatible"
      }
    }
  }
}
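If the sidecar is generated alongside the TSV, building it from the same inputs that drive the normalization keeps the two in sync. The sketch below assembles the structure shown above with Python's json module; build_sidecar is a hypothetical helper, and how the real pipeline writes its sidecar is not documented here.

```python
import json

def build_sidecar(dropped_columns, renamed_columns):
    """Assemble the sidecar dictionary from the normalization inputs (hypothetical helper)."""
    return {
        "Technical": {
            "FileFormat": "tsv",
            "ProcessingLevel": "parsed",
            "NormalizationApplied": {
                "DroppedColumns": list(dropped_columns),
                "RenamedColumns": dict(renamed_columns),
                "MissingValueNormalization": {
                    "From": "dots (.)",
                    "To": "empty strings",
                    "Standard": "BIDS-compatible",
                },
            },
        }
    }

sidecar = build_sidecar(
    ["RECORDING_SESSION_LABEL"],
    {
        "AVERAGE_GAZE_X": "x",
        "AVERAGE_GAZE_Y": "y",
        "AVERAGE_PUPIL_SIZE": "pupil_size",
        "TIMESTAMP": "timestamp",
    },
)
sidecar_text = json.dumps(sidecar, indent=2)
print(sidecar_text)
```

The resulting text can be written next to the TSV under whatever sidecar filename the project uses.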