(Floe) Accessibility Metadata Automatic Generation by Content Type

Knowing a resource's content type and context makes it possible to generate some metadata values automatically, which simplifies metadata authoring.

Table: Metadata by Content Type

This table attempts to organize the Access for All metadata according to content type - the idea is that certain types of content only require certain metadata.

Example 1: metadata for a video file will likely have Auditory and Visual metadata specified.

Example 2: metadata for an audio file will not require visual metadata - thus those metadata fields can be hidden from the user.

Thus it is possible to:

  1. Show only the metadata fields that a given media type requires, and hide the rest.
  2. Provide sensible defaults based on content type.
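
The two points above can be sketched as a small lookup. This is only an illustration, not the Floe implementation: the field names (`auditory`, `tactile`, `textual`, `visual`), the `RULES` table, and both function names are assumptions, and the table encodes nothing beyond the video and audio examples given on this page.

```python
# Hypothetical field names, taken from the top-level table columns.
ALL_FIELDS = ("auditory", "tactile", "textual", "visual")

# Hypothetical rules covering only the two examples in the text:
# a video likely has auditory and visual metadata; an audio file
# does not need visual metadata, so that field is hidden.
RULES = {
    "video": {"defaults": {"auditory": True, "visual": True}, "hidden": set()},
    "audio": {"defaults": {"auditory": True}, "hidden": {"visual"}},
}


def visible_fields(content_type):
    """Return the fields to show in the authoring form, in column order."""
    hidden = RULES.get(content_type, {}).get("hidden", set())
    return [field for field in ALL_FIELDS if field not in hidden]


def default_metadata(content_type):
    """Return the values that can be filled in without asking the author."""
    return dict(RULES.get(content_type, {}).get("defaults", {}))
```

With these assumptions, `visible_fields("audio")` omits `visual`, and an unknown content type falls back to showing every field with no defaults.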

AC says: I've added some "questions" to the table as follows:
[x] = I don't think this should be available
[?] = Why is this available? I either don't understand why, or I'm not sure about it


Column headings: Content Type, Auditory, Tactile, Textual, Visual, "on Visual" (five columns), Flashing Hazard, No Flashing Hazard, Motion Simulation Hazard, No Motion Simulation Hazard, Sound Hazard, No Sound Hazard

Music / Dialog(tick)         (warning)(warning)(warning)   (tick)  
Video(tick) [x: use text-on-visual instead](tick)       (warning)(warning)      
Captions  (tick)(tick)?     [x]      (tick) (tick)
Image   (tick)      (warning)       (tick)
Image: diagram   (tick)  (tick)   (warning)[x: here, "captions" have a very specific meaning: "text for the audio portion"]    (tick)[?: what about animated gifs?] (tick)
Image: text  (warning)(tick)     (tick)(warning)[x: see above]    (tick) [?: see above]
Image: math  (warning)(tick)   (tick)  (warning)[x: see above]    (tick) [?: see above] (tick)
Image: chart  (warning)(tick) (tick)    (warning)[x: see above]    (tick) [?: see above] (tick)
ChemML  (tick)(tick)?            (tick) (tick)
MathML  (tick)(tick)?            (tick) (tick)
Sign Language [?][?](tick)[x]          [x]  (tick)
Braille[?](tick)[x][?][?]           (tick) (tick)
Transcript (text)[?][?](tick)[x]            (tick) (tick)
Long Description (text)  (tick)(tick)?            (tick) (tick)
Audio Description(tick)           (warning)    (tick) [x] 

(tick) = Automatically generate this default value (e.g. if the content is music, automatically specify hasAuditory).

grey = Option is not presented to the end user (i.e. don't show it if it isn't needed).

white = Option is available to the end user.

(warning) = Option is available to the end user, and we recommend they specify it (e.g. if the content is an image, we recommend alt text).

(tick)? = Unsure whether this should be an automatically generated default value.
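
One way to carry the legend into code is to give every table cell exactly one state. The sketch below assumes that simplification (in the table itself a tick can sit on a grey or white background), and the `CellState` enum, `group_fields` helper, and `image_row` data are all hypothetical. Only the "alt text is recommended for an image" entry comes from the legend; the other entries in `image_row` are placeholders, not values read from the table.

```python
from enum import Enum


class CellState(Enum):
    AUTO_DEFAULT = "(tick)"      # value is generated automatically
    UNSURE_DEFAULT = "(tick)?"   # may be generated automatically; undecided
    RECOMMENDED = "(warning)"    # shown, and the author is urged to fill it in
    AVAILABLE = "white"          # shown, optional
    HIDDEN = "grey"              # not presented to the end user


def group_fields(row):
    """Group a row's field names by cell state, keeping the row's order."""
    groups = {state: [] for state in CellState}
    for field, state in row.items():
        groups[state].append(field)
    return groups


# Hypothetical row for an image; only the alt text entry is grounded
# in the legend above.
image_row = {
    "visual": CellState.AUTO_DEFAULT,
    "alt text": CellState.RECOMMENDED,
    "long description": CellState.AVAILABLE,
    "auditory": CellState.HIDDEN,
}

groups = group_fields(image_row)
```

An authoring form could then pre-fill the `AUTO_DEFAULT` group, highlight the `RECOMMENDED` group, and skip the `HIDDEN` group entirely.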



Modalities and their alternatives

  • Captions
  • Transcript
  • Tactile
  • Visual equivalent (image or video)
  • If video: Transcript, captions
  • If image: Alt text, Audio description
  • Tactile equivalent
  • Long description
  • Captions
  • Transcript
  • Long description
  • Audio description
  • Visual equivalent (image or video)
  • Audio description
  • Visual equivalent (image or video)
  • Tactile equivalent


Question for Inclusion: "Is a particular modality critical to understanding the content?"

  • If YES, then consider an alternative
  • If NO, then everything is fine - alternatives are not needed.

This brings up the notion of "primary" and "secondary" modalities.

  • Are we primarily concerned with "primary" modalities, and not "secondary"?
  • Primary modalities can be determined by asking the author which modalities are important to consuming the content.
  • Conclusion: There is no need to distinguish between "primary" and "secondary" modalities.
    • Reason 1: Classifying what is primary or secondary may be subjective, so the author's opinion may differ from the end user's.
    • Reason 2: In the process of specifying alternatives to content, the author will implicitly declare which modalities are important.

What about the case where there is a combination of primary modalities important to consuming the content? How is this handled?

  • Example: content requires ability to see AND hear
  • Example: content requires ability to see OR hear
  • Does being an AND or OR really change the alternative modalities?

Thinking aloud - a video requires a user to be able to see AND hear

  • If the user prefers visuals, but not audio - the alternative would be captions or transcripts
  • If the user prefers auditory, but not visuals - the alternative would be captions, transcripts, or tactile
  • If the user prefers neither visuals nor audio - the alternative would be tactile

Now for the "OR" scenario - a video requires that a user be able to see OR hear

  • If the user prefers visuals, but not audio - then the content is okay.
  • If the user prefers auditory, but not visuals - then the content is okay.
  • If the user prefers neither visuals nor audio - the alternative would be tactile
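
The two scenarios above can be written out as a small check. This is a sketch under stated assumptions rather than a settled design: the function names and the `ALTERNATIVES` table are hypothetical, the table simply copies the alternatives listed in these notes for the video example, and combinations the notes do not cover raise an error instead of guessing.

```python
# Alternatives the notes list for the video example, keyed by which of the
# required modalities the user can still use. Copied from the bullets above.
ALTERNATIVES = {
    frozenset({"visual"}): ["captions", "transcript"],
    frozenset({"auditory"}): ["captions", "transcript", "tactile"],
    frozenset(): ["tactile"],
}


def content_is_usable(required, combine, usable):
    """AND: every required modality must be usable. OR: any one is enough."""
    required, usable = set(required), set(usable)
    if combine == "AND":
        return required <= usable
    if combine == "OR":
        return bool(required & usable)
    raise ValueError(f"combine must be 'AND' or 'OR', got {combine!r}")


def alternatives_for(required, combine, usable):
    """Return [] when the content is fine as is, else the listed alternatives."""
    if content_is_usable(required, combine, usable):
        return []
    still_usable = frozenset(required) & frozenset(usable)
    if still_usable not in ALTERNATIVES:
        raise LookupError("combination not covered by the notes")
    return list(ALTERNATIVES[still_usable])


VIDEO = {"visual", "auditory"}
print(alternatives_for(VIDEO, "AND", {"visual"}))  # ['captions', 'transcript']
print(alternatives_for(VIDEO, "OR", {"visual"}))   # []
print(alternatives_for(VIDEO, "OR", set()))        # ['tactile']
```

In this sketch, AND versus OR only decides whether an alternative is needed at all. Once one is needed, the list depends on what the user can use, which is one possible answer to the question of whether AND or OR changes the alternative modalities.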