When Fonts Obfuscate: The Accessibility Trade‑off of Anti‑Scraping Font Scrambling

Web‑scraping has become a double‑edged sword: while it powers data‑driven services, it also fuels content theft and privacy violations. Some sites have turned to an unconventional defense—scrambling the glyph order in their web fonts—to make automated extraction harder. The result? A site that looks normal to humans but becomes a nightmare for screen readers, search engines, and even legitimate crawlers.

The Scramble Script in Action

Below is the core of the project’s Python script. It loads a TrueType font, selects the ASCII glyphs, shuffles them, rewrites the cmap table, and finally emits a mapping from original to scrambled characters. The mapping is then applied to the HTML body, preserving the visual appearance while breaking the textual relationship.

# /// script
# requires-python = ">=3.12"
# dependencies = ["bs4", "fonttools"]
# ///

import random
import string
from typing import Dict

from bs4 import BeautifulSoup
from fontTools.ttLib import TTFont


def scramble_font(seed: int = 1234) -> Dict[str, str]:
    random.seed(seed)
    font = TTFont("src/fonts/Mulish-Regular.ttf")

    # Pick a Unicode cmap (Windows BMP preferred)
    cmap_table = None
    for table in font["cmap"].tables:
        if table.isUnicode() and table.platformID == 3:
            cmap_table = table
            break
    if cmap_table is None:
        raise ValueError("no Windows Unicode cmap subtable found")

    cmap = cmap_table.cmap

    # Filter codepoints for a-z and A-Z
    codepoints = [cp for cp in cmap.keys() if chr(cp) in string.ascii_letters]
    glyphs = [cmap[cp] for cp in codepoints]

    shuffled_glyphs = glyphs[:]
    random.shuffle(shuffled_glyphs)

    # Create new mapping
    scrambled_cmap = dict(zip(codepoints, shuffled_glyphs, strict=True))
    cmap_table.cmap = scrambled_cmap

    # For each original character, find the codepoint that now renders its glyph
    glyph_to_new_cp = {glyph: cp for cp, glyph in scrambled_cmap.items()}
    translation_mapping = {
        chr(cp): chr(glyph_to_new_cp[glyph])
        for cp, glyph in zip(codepoints, glyphs, strict=True)
    }

    font.save("src/fonts/Mulish-Regular-scrambled.ttf")

    return translation_mapping


def scramble_html(html: str, translation_mapping: Dict[str, str]) -> str:
    def apply_cipher(text: str) -> str:
        # Substitute each letter via the mapping; leave other characters alone
        return "".join(translation_mapping.get(c, c) for c in text)

    # Parse the HTML document
    soup = BeautifulSoup(html, "html.parser")

    # Find all main elements
    main_elements = soup.find_all("main")
    skip_tags = {"code", "h1", "h2"}

    # Apply cipher only to text within main
    for main in main_elements:
        for elem in main.find_all(string=True):
            if elem.parent.name not in skip_tags:
                elem.replace_with(apply_cipher(elem))

    return str(soup)

The script’s output font is visually identical to the original, but the underlying Unicode codepoints no longer match the displayed glyphs. Scrapers that rely on the DOM text will see gibberish, while browsers render the correct characters because the font file has been remapped.
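The effect is easy to demonstrate with a toy substitution table (the values here are hypothetical; the real mapping depends on the seed and the shuffle):

```python
# Toy three-letter cipher standing in for the script's full a-z/A-Z mapping
# (hypothetical values; the real table depends on the random seed).
translation_mapping = {"c": "t", "a": "c", "t": "a"}

def apply_cipher(text: str) -> str:
    # The same substitution the script applies to the HTML body
    return "".join(translation_mapping.get(ch, ch) for ch in text)

dom_text = apply_cipher("cat")   # what a scraper reading the DOM sees
print(dom_text)                  # -> "tca"

# The scrambled font maps each codepoint back to the original glyph,
# so the browser still *displays* "cat". Rendering is equivalent to
# applying the inverse mapping:
inverse = {v: k for k, v in translation_mapping.items()}
rendered = "".join(inverse.get(ch, ch) for ch in dom_text)
print(rendered)                  # -> "cat"
```

The DOM carries "tca" while the screen shows "cat" — exactly the mismatch that confuses screen readers and crawlers alike.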

Accessibility Fallout

Screen readers and other assistive technologies depend on the mapping between visible glyphs and their semantic Unicode values. When that mapping is broken, the spoken output becomes meaningless:

A screen reader will read a string of nonsense characters, rendering the content inaccessible to visually impaired users.

This violates the Web Content Accessibility Guidelines (WCAG) 2.1 at Level AA, which require that content be perceivable by all users. Moreover, search engines crawl the DOM text, not the rendered glyphs, so a scrambled font can also hurt SEO: crawlers will index garbled content.

Legal and Ethical Considerations

Beyond the technical drawbacks, there are legal implications. In jurisdictions with strong accessibility laws—such as the Americans with Disabilities Act (ADA) in the U.S.—websites must provide equivalent access to all users. A font‑scramble defense could be challenged as a barrier to access.

Additionally, anti‑scraping measures that rely on obfuscation can be circumvented by more sophisticated bots, leading to a cat‑and‑mouse game that consumes server resources and may still fail to protect proprietary content.
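In fact, once a bot recovers the substitution table (for example by diffing the scrambled font's cmap against the original, or through simple frequency analysis), undoing the obfuscation is trivial. A minimal sketch, assuming the mapping has already been recovered; the values are hypothetical:

```python
# Substitution table a bot might recover, e.g. by comparing the scrambled
# font's cmap with the original font (hypothetical values).
recovered_mapping = {"a": "q", "b": "w", "c": "e"}

# Reversing the cipher is a one-line dict inversion ...
descramble = {scrambled: original for original, scrambled in recovered_mapping.items()}

# ... followed by the same character-wise substitution the site itself uses.
def descramble_text(text: str) -> str:
    return "".join(descramble.get(ch, ch) for ch in text)

print(descramble_text("qwe 123"))  # non-letters pass through unchanged
```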

Alternatives Worth Considering

  1. Rate Limiting & IP Blocking – Throttle requests from suspicious IP ranges.
  2. CAPTCHAs & Honeypots – Verify human interaction before rendering content.
  3. Content Delivery Network (CDN) Rules – Block or throttle bots at the edge.
  4. API‑First Approach – Serve data through authenticated endpoints rather than public scraping.
  5. Legal Deterrence – Publish a clear terms‑of‑service and enforce it through DMCA takedown notices.

These strategies preserve accessibility and SEO while still deterring casual scrapers.
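To make the first alternative concrete, here is a minimal in-memory token-bucket limiter (an illustrative sketch; a production deployment would keep state in a shared store such as Redis and enforce the limit at the web server or CDN):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allow up to `rate` requests per second per client, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)  # per-client token count
        self.updated = defaultdict(time.monotonic)   # last refill timestamp

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity
        elapsed = now - self.updated[client_ip]
        self.tokens[client_ip] = min(self.capacity, self.tokens[client_ip] + elapsed * self.rate)
        self.updated[client_ip] = now
        if self.tokens[client_ip] >= 1:
            self.tokens[client_ip] -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)  # 2 req/s sustained, bursts of 5
results = [bucket.allow("203.0.113.7") for _ in range(10)]
print(results)  # the burst is allowed, then requests are throttled
```

Unlike font scrambling, this leaves the served text (and thus assistive technology and search indexing) completely untouched.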

Bottom Line

Scrambling a font is an elegant hack that can momentarily thwart automated scraping, but it comes at a steep price: loss of accessibility, potential legal exposure, and degraded search engine visibility. Developers should weigh the trade‑offs carefully and consider more sustainable, user‑friendly defenses. The goal should be to protect content without compromising the inclusive nature of the web.

Source: https://tilschuenemann.de/projects/sacrificing-accessibility-for-not-getting-web-scraped