Annotation types

Datasets may include the following annotations:
  • Temporal action segments
  • Hand keypoints and pose
  • Object interaction states
  • Task-level natural language descriptions
  • Visual question answering (VQA) pairs
Annotations are provided in JSON format with frame-level references.
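
As an illustration of what frame-level references might look like in practice, the sketch below parses a small JSON annotation and looks up which action segments cover a given frame. The schema is an assumption made for this example only: the field names (`video_id`, `action_segments`, `start_frame`, `end_frame`, `vqa`) and the sample values are hypothetical and are not taken from the dataset specification.

```python
import json

# Hypothetical annotation content. Every field name and value here is an
# assumption for illustration, not the dataset's documented schema.
RAW = """
{
  "video_id": "example_0001",
  "action_segments": [
    {"label": "pick_up_cup", "start_frame": 12, "end_frame": 45},
    {"label": "pour_water", "start_frame": 46, "end_frame": 120}
  ],
  "vqa": [
    {"frame": 50, "question": "What is being poured?", "answer": "water"}
  ]
}
"""


def segments_at_frame(annotation, frame):
    """Return labels of action segments whose inclusive frame range covers `frame`."""
    return [
        seg["label"]
        for seg in annotation.get("action_segments", [])
        if seg["start_frame"] <= frame <= seg["end_frame"]
    ]


annotation = json.loads(RAW)
print(segments_at_frame(annotation, 50))
```

Treating the frame range as inclusive at both ends is a choice made for this sketch; the real convention should be checked against the dataset's own documentation.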

Design principles

  • Minimal ambiguity
  • Consistent ontology
  • Compatibility with supervised and self-supervised learning
  • Easy filtering and indexing
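
To make the filtering and indexing principle concrete, here is a minimal sketch that filters annotation records by type and groups them by frame. It assumes a flat list of records with hypothetical `type`, `frame`, and `label` keys; the actual layout of the annotations may differ.

```python
from collections import defaultdict

# Hypothetical flat annotation records. The "type", "frame", and "label"
# keys are assumptions for illustration only.
records = [
    {"type": "action_segment", "frame": 12, "label": "pick_up_cup"},
    {"type": "object_state", "frame": 12, "label": "cup:held"},
    {"type": "action_segment", "frame": 46, "label": "pour_water"},
]


def filter_by_type(records, annotation_type):
    """Keep only the records of one annotation type."""
    return [r for r in records if r["type"] == annotation_type]


def index_by_frame(records):
    """Group records by frame number for direct per-frame lookup."""
    index = defaultdict(list)
    for r in records:
        index[r["frame"]].append(r)
    return dict(index)


actions = filter_by_type(records, "action_segment")
by_frame = index_by_frame(records)
print([r["label"] for r in actions])
print(sorted(by_frame))
```

A consistent set of type names across the dataset is what keeps a filter like this a one-line comparison.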