Jake Teton‑Landis

Perfection enthusiast and byte craftsman splitting time between Miami, FL and New York, NY.

Interested in human productivity, big levers, and design.


macOS Text Substitution

In which we look for the elusive data behind an everyday feature.

May 2022

Apple’s system text views automatically replace input characters like " with “smart” versions like “ and ”. In any native app, you can toggle these on and off from Edit > Substitutions.

You can configure custom substitutions, and the way to read those and back them up is already well-documented online.

I’ve always wanted to know the full list of the system’s built-in replacements.

I started by looking at the archived documentation for the Cocoa text system, but none of the pages I read seemed to have any clues.

I decided to snoop around in /System/Library/Frameworks to see if I could find any Unicode characters from known substitutions, like … or those smart quotes.

cd /System/Library/Frameworks
rg --binary '…'
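For anyone without ripgrep handy, here is a rough Python sketch of the same idea: walk a directory and report files whose raw bytes contain the UTF-8 encoding of the ellipsis. The function name `files_containing` is made up for illustration, and the sketch only finds UTF-8 occurrences, so strings stored in other encodings (UTF-16, for example) would be missed.

```python
import os

# UTF-8 encoding of U+2026 HORIZONTAL ELLIPSIS
ELLIPSIS_UTF8 = "…".encode("utf-8")  # b"\xe2\x80\xa6"


def files_containing(root, needle=ELLIPSIS_UTF8):
    """Return sorted paths under `root` whose raw bytes contain `needle`.

    Hypothetical helper: reads each file fully into memory, which is
    fine for a sketch but not for very large files.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # skip unreadable files and broken symlinks
            if needle in data:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for path in files_containing("."):
        print(path)
```

Run from the directory you want to search. It is far slower than rg, but it makes the "search the bytes, ignore the file type" approach explicit.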

This found a ton of stuff – but mostly matched .nib/.xib files that define menu elements! So many menu items in macOS end with “…”.

One hit that seemed promising was /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/DictionaryServices.framework/Versions/A/Resources/SubstituteCharacters.plist. But this dictionary points in the wrong direction! It maps from Unicode characters back to ASCII.

Maybe the substitution process loads this dictionary and then inverts it? But probably this is used internally when looking up dictionary definitions of words. Oh well.
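To see what "load it and invert it" would involve, here is a minimal sketch using Python's `plistlib`. It assumes the plist is a flat dictionary of string keys to string values, which is a guess based on the description above, and the sample entries are invented rather than copied from SubstituteCharacters.plist.

```python
import plistlib


def invert_substitutions(plist_bytes):
    """Parse a plist mapping Unicode -> ASCII and return ASCII -> [Unicode, ...].

    Assumes a flat dict of strings (an assumption, not the verified
    layout of the real file).
    """
    mapping = plistlib.loads(plist_bytes)
    inverted = {}
    for fancy, plain in mapping.items():
        # Several Unicode characters can share one ASCII fallback,
        # so each ASCII key collects a list.
        inverted.setdefault(plain, []).append(fancy)
    return inverted


# Hypothetical sample data standing in for the real plist.
sample = plistlib.dumps({"“": '"', "”": '"', "…": "..."})

if __name__ == "__main__":
    for plain, fancy in invert_substitutions(sample).items():
        print(repr(plain), "->", fancy)
```

With this sample, the inverted map sends " to both “ and ”. Inversion alone cannot say which one to insert, which is one more hint that a table like this is probably not what drives smart quotes.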

The search continues.

Eventually, I gave up on /System/Library/Frameworks, and decided to search the whole of /System/Library. I also improved my regular expression search to exclude common UI text patterns:

cd /System/Library
rg --binary '[^\w  @%]…[^\w]' -g '!*.{nib,xib,js}'
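To make the filter concrete, here is a small Python sketch that applies roughly the same pattern to a few invented sample lines. The character class is transcribed with a single space, Python's `re` is not identical to ripgrep's regex engine, and `looks_like_data` is a made-up name, so treat this as an approximation.

```python
import re

# An ellipsis that is NOT preceded by a word character, space, "@" or "%",
# and IS followed by a non-word character.
PATTERN = re.compile(r"[^\w @%]…[^\w]")


def looks_like_data(line):
    """True if the line contains an ellipsis in a non-UI-looking context."""
    return PATTERN.search(line) is not None


# Invented examples, not real hits from /System/Library.
samples = [
    "Save As… ",      # menu-style text: ellipsis follows a letter
    "Loading %@… ",   # format string: ellipsis follows "@"
    "wait … what",    # prose: ellipsis follows a space
    'x = "…";',       # quoted on both sides: looks like data
    "(…)",            # bracketed: looks like data
]

if __name__ == "__main__":
    for s in samples:
        print(f"{s!r:20} {looks_like_data(s)}")
```

Only the last two samples match. The idea is that an ellipsis glued to a word or a format specifier is almost certainly UI text, while one wrapped in punctuation is more likely to be a value in some table.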

Here are a few hits that looked promising:

Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3:
binary file matches (found "\0" byte around offset 15)

PrivateFrameworks/TextInput.framework/Versions/A/Resources/Keyboard-default.plist
73:             <string>… .</string>
75:             <string>… .</string>

PrivateFrameworks/CoreSuggestionsInternals.framework/Versions/A/Resources/Assets.suggestionsassets/AssetData/CompiledPatterns.pldat:
binary file matches (found "\0" byte around offset 642)
  • PrivateFrameworks/TextInput.framework/Versions/A/Resources/Keyboard-default.plist

    This looks like UI strings for a keyboard view somewhere. Doesn’t seem to have the kind of mapping we’re looking for.

  • PrivateFrameworks/CoreSuggestionsInternals.framework/Versions/A/Resources/Assets.suggestionsassets/AssetData/CompiledPatterns.pldat

    Running file on it reports “SVR2 pure executable (Amdahl-UTS) not stripped - version 92485465”... sure.

    The file is binary, but contains frequent runs of regular expression syntax.

    These regexes look like they’re trying to extract things like phone numbers from arbitrary text, possibly for macOS’s “Data Detectors” substitution. No luck here either.

  • Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3

    Well, this isn’t what we’re looking for, but I thought it was interesting! It looks like this SQLite database could help power the “Emoji and Symbols” menu (Cmd-Ctrl-Space).

    It is hard to tell what’s going on with this database from the command line. It has only two columns in the schema... but rows are longer? Or do they just contain literal | characters?

    Yes, it’s just | characters. Weird.
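The column confusion has a simple cause: the sqlite3 command-line shell prints rows in "list" mode by default, joining columns with |, so a literal | inside a value looks like an extra column. Here is a small sketch with Python's `sqlite3` module. The table name and row contents are invented, not taken from CharacterDB.sqlite3.

```python
import sqlite3


def apparent_columns(row, separator="|"):
    """Count the columns a reader would see if the row were printed
    joined by `separator`, the way the sqlite3 shell's list mode does."""
    return separator.join(str(v) for v in row).count(separator) + 1


# Hypothetical two-column table with a value that contains pipes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (k TEXT, v TEXT)")
conn.execute(
    "INSERT INTO demo VALUES (?, ?)",
    ("arrow", "→|rightwards arrow|U+2192"),
)
row = conn.execute("SELECT k, v FROM demo").fetchone()

if __name__ == "__main__":
    print("real columns:    ", len(row))
    print("apparent columns:", apparent_columns(row))
    print("as the CLI shows:", "|".join(row))
```

Reading rows through a library returns each value as its own field, so the ambiguity goes away. In the sqlite3 shell itself, changing the output mode (for example `.mode quote`) should have a similar effect.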

Dang – still nothing.