An in‑depth guide covering the essential HTML basics, SEO, accessibility, security, performance, privacy, resilience, and internationalisation practices that every website should implement to be robust, user‑friendly, and future‑proof.
A Comprehensive Checklist for Modern Web Development

Creating a website that works reliably for humans, search engines, and autonomous agents is no longer a matter of sprinkling a few meta tags onto a page. It requires a disciplined set of conventions that touch every layer of the stack, from the first line of the HTML document to the DNS records that announce the site’s identity. The following checklist organises those conventions into logical groups, explains why each item matters, and points out the broader implications for maintainability, security, and user experience.
1. Foundations – The HTML Document Skeleton
| Requirement | Reasoning |
|---|---|
| `<!doctype html>` | Signals standards mode; without it browsers fall back to quirks mode, causing layout inconsistencies. |
| `<html lang="…">` (BCP‑47) | Enables screen readers, translators, and search engines to determine the page language, improving accessibility and SEO. |
| `<meta charset="utf-8">` within first 1024 bytes | Guarantees correct character decoding before any non‑ASCII content appears, preventing mojibake. |
| `<meta name="viewport" content="width=device-width, initial-scale=1">` | Prevents mobile browsers from emulating a desktop width; essential for responsive design. |
| Exactly one non‑empty `<title>` | Used by browsers, search results, social previews, and AI agents to identify the page. |
| `<meta name="description">` (recommended) | Provides a concise summary for search snippets; a well‑written description reduces the chance of Google rewriting it. |
| `<link rel="canonical" href="…">` (recommended) | Consolidates ranking signals when multiple URLs serve the same content, so duplicate URLs do not dilute one another. |
| Favicons & app icons (SVG, ICO fallback, Apple touch, maskable PWA) | Guarantees a recognizable icon across browsers, OS taskbars, and home‑screen shortcuts. |
| `<meta name="theme-color">` (recommended) | Tints the browser UI to match branding; using the `media` attribute lets you supply separate colours for light and dark modes. |
| `<meta name="color-scheme">` (recommended) | Declares supported colour schemes, preventing the white‑flash on dark‑mode devices and allowing native styling of scrollbars and form controls. |
| Open Graph tags (`og:title`, `og:description`, `og:image`, `og:url`, `og:type`) | Controls how the page appears when shared on social platforms, improving click‑through rates. |
| Feed discovery (`<link rel="alternate" type="application/rss+xml" href="/feed.xml">`) | Allows browsers, feed readers, and crawlers to locate the site’s RSS/Atom/JSON feed without guessing the URL. |
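
As a rough illustration of how the required items in the table fit together, the sketch below assembles a minimal document head as a string. The `PageMeta` shape, the `renderHead` function, and the example URL are hypothetical names chosen for this sketch; a real site would more likely use its templating system.

```typescript
// Hypothetical input shape; the field names are illustrative only.
interface PageMeta {
  lang: string; // BCP-47 tag such as "en-GB"
  title: string; // must be non-empty
  description: string;
  canonicalUrl: string;
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the opening of a document with the checklist's core head elements.
function renderHead(meta: PageMeta): string {
  if (meta.title.trim() === "") {
    throw new Error("title must be non-empty");
  }
  return [
    "<!doctype html>",
    `<html lang="${escapeHtml(meta.lang)}">`,
    "<head>",
    // The charset declaration comes first so it sits within the first 1024 bytes.
    '<meta charset="utf-8">',
    '<meta name="viewport" content="width=device-width, initial-scale=1">',
    `<title>${escapeHtml(meta.title)}</title>`,
    `<meta name="description" content="${escapeHtml(meta.description)}">`,
    `<link rel="canonical" href="${escapeHtml(meta.canonicalUrl)}">`,
    "</head>",
  ].join("\n");
}

const exampleHead = renderHead({
  lang: "en-GB",
  title: "Checklist",
  description: "Basics & beyond",
  canonicalUrl: "https://example.com/checklist",
});
```

Escaping every interpolated value keeps a stray quote or angle bracket in a title from breaking the markup.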
Feed Hygiene
If you publish a feed, ensure it is well‑formed: include an atom:link rel="self", give each item a stable guid, declare an update cadence using the Syndication module, and validate the feed before deployment. A malformed feed can break aggregators and hurt SEO.
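
The feed conventions above can be sketched as a small RSS 2.0 generator. This is an illustrative outline rather than a complete feed implementation: `FeedItem`, `renderFeed`, the daily cadence, and the example URLs are assumptions made for the sketch, and the output should still be run through a feed validator.

```typescript
// Hypothetical item shape for this sketch.
interface FeedItem {
  title: string;
  link: string;
  guid: string; // stable identifier that never changes once published
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function renderFeed(
  feedUrl: string,
  siteTitle: string,
  siteUrl: string,
  description: string,
  items: FeedItem[],
): string {
  const itemXml = items.map((item) =>
    [
      "    <item>",
      `      <title>${escapeXml(item.title)}</title>`,
      `      <link>${escapeXml(item.link)}</link>`,
      `      <guid isPermaLink="false">${escapeXml(item.guid)}</guid>`,
      "    </item>",
    ].join("\n"),
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">',
    "  <channel>",
    `    <title>${escapeXml(siteTitle)}</title>`,
    `    <link>${escapeXml(siteUrl)}</link>`,
    `    <description>${escapeXml(description)}</description>`,
    // Self-reference so aggregators know the feed's own address.
    `    <atom:link href="${escapeXml(feedUrl)}" rel="self" type="application/rss+xml"/>`,
    // Update cadence via the Syndication module (assumed here: once per day).
    "    <sy:updatePeriod>daily</sy:updatePeriod>",
    "    <sy:updateFrequency>1</sy:updateFrequency>",
  ]
    .concat(itemXml)
    .concat(["  </channel>", "</rss>"])
    .join("\n");
}

const exampleFeed = renderFeed(
  "https://example.com/feed.xml",
  "Example Site",
  "https://example.com/",
  "Posts from Example Site",
  [{ title: "Hello & welcome", link: "https://example.com/hello", guid: "post-0001" }],
);
```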
2. SEO – Making the Site Visible to Search Engines and Agents
| Item | Why it matters |
|---|---|
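| `robots.txt` (RFC 9309) | Tells compliant crawlers which paths not to crawl. It is advisory, not access control: a mis‑configured file can block important pages, and listing private paths advertises them rather than protecting them. |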
| XML sitemap (and optional sitemap index) | Provides a definitive list of canonical URLs, enabling fast discovery by search engines. |
| Image & video sitemap extensions | Useful when media is loaded via JavaScript or resides on a CDN that crawlers cannot follow. |
| Clean URL structure (lowercase, hyphenated, shallow) | URLs act as public APIs for content; stability aids caching, linking, and AI‑agent reference. |
| Proper redirects (301/308 for permanent, 302/307 for temporary) | Prevents link‑juice loss and avoids redirect chains that degrade performance. |
| Avoid soft 404s | Returning 200 OK for a “not found” page confuses crawlers and can lead to de‑indexing. |
| Explicit indexing policy (`<meta name="robots">` or `X-Robots-Tag`) | Guarantees that staging, admin, or thin pages are not inadvertently indexed. |
| Logical heading hierarchy (`<h1>`–`<h6>` nesting) | Provides a semantic outline that both assistive technologies and AI agents use to understand page structure. |
| Internal linking strategy | Strengthens topical relevance signals and distributes PageRank throughout the site. |
| Structured data (JSON‑LD, schema.org) | Supplies machine‑readable facts; search engines and LLMs rely on it for rich results and knowledge‑graph entries. |
| Breadcrumb markup (`BreadcrumbList` JSON‑LD) | Improves navigation for users and gives search engines a clear path hierarchy. |
| IndexNow (optional) | Notifies participating search engines of URL changes instantly; Google does not yet support it. |
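
To make the structured-data rows more concrete, here is a minimal sketch that emits `BreadcrumbList` JSON‑LD using the schema.org vocabulary. The `Crumb` type, the `breadcrumbJsonLd` function, and the example URLs are hypothetical names introduced for illustration.

```typescript
// Hypothetical breadcrumb entry for this sketch.
interface Crumb {
  name: string;
  url: string;
}

// Serialise a breadcrumb trail as a JSON-LD script element.
function breadcrumbJsonLd(crumbs: Crumb[]): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, index) => ({
      "@type": "ListItem",
      position: index + 1, // positions are 1-based
      name: crumb.name,
      item: crumb.url,
    })),
  };
  // Escape "<" so a value cannot close the surrounding script element early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

const breadcrumbMarkup = breadcrumbJsonLd([
  { name: "Home", url: "https://example.com/" },
  { name: "Guides", url: "https://example.com/guides/" },
  { name: "Checklist", url: "https://example.com/guides/checklist" },
]);
```

Generating the markup from the same data that drives the visible breadcrumb keeps the two from drifting apart.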
Agent‑Readiness Add‑ons
- `/llms.txt` – a curated index of high‑value pages for large language models.
- `/llms-full.txt` – an optional concatenated markdown dump for small sites.
- Expose raw Markdown source for documentation pages (e.g., `page.md`).
- Use `Link` response headers to advertise these resources directly.
- `robots.txt` entries for AI crawlers, optionally with content‑signal directives (AI‑Preferences draft).
3. Accessibility – WCAG‑Aligned Practices
| Requirement | Implementation tip |
|---|---|
| Minimum colour contrast | Use tools like the WebAIM Contrast Checker; aim for 4.5:1 for normal text, 3:1 for large text. |
| `alt` attribute on every `<img>` | Provide a concise description of purpose; leave empty (`alt=""`) only for decorative images. |
| Form labels (`<label for="…">`) | Associate each control with a label; placeholders are not substitutes. |
| Keyboard‑only navigation | Ensure all interactive elements are reachable via Tab, have a visible focus indicator, and do not trap focus. |
| Skip‑link (`<a href="#main" class="skip-link">Skip to main content</a>`) | Place as the first focusable element to let keyboard users bypass repeated navigation. |
| Semantic landmarks (`<header>`, `<nav>`, `<main>`, `<footer>`) | Allows screen readers to jump between sections efficiently. |
| ARIA rule of thumb | Don’t use ARIA unless native HTML cannot achieve the goal. |
| Descriptive link text | Avoid generic "click here"; link text should convey destination. |
| No empty links or buttons | Every interactive element must have an accessible name. |
| Form error handling | Associate error messages with the offending input via aria-describedby and ensure they are announced. |
| `lang` attribute on inline foreign text | Enables correct pronunciation by screen readers. |
| Respect `prefers-reduced-motion` | Disable non‑essential animations for users who have opted out. |
| Avoid accessibility overlays | Third‑party scripts that claim to “fix WCAG” often break native semantics and can expose sites to legal risk. |
| Captions & transcripts for media | Provide synchronized captions for video and full transcripts for audio‑only content. |
| Accessible data tables | Use `<table>` with a `<caption>` and header cells marked `<th scope="col">` or `<th scope="row">`. |
| Minimum touch target size (24 × 24 px, 44 × 44 px for enhanced) | Meets WCAG 2.2 recommendations and improves tap accuracy on mobile. |
| `hidden="until-found"` for collapsible sections | Allows browsers’ find‑in‑page and assistive technologies to reveal hidden content when searched. |
| Prefer native interactive elements (`<button>`, `<details>`, `<dialog>`) over div‑based widgets | Native elements bring built‑in keyboard support and ARIA roles. |
| CSS state selectors (`:has()`, `:user-invalid`, `:focus-within`) | Replace JavaScript class toggling for form validation and component state, reducing race conditions. |
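
The contrast thresholds in the table (4.5:1 for normal text, 3:1 for large text) come from the WCAG contrast-ratio formula, which compares the relative luminance of two sRGB colours. The sketch below shows that calculation; the function names and the `Rgb` tuple type are illustrative choices, not part of any standard API.

```typescript
// An sRGB colour as [red, green, blue], each channel in the range 0-255.
type Rgb = [number, number, number];

// Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x definition).
function channelToLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance(colour: Rgb): number {
  return (
    0.2126 * channelToLinear(colour[0]) +
    0.7152 * channelToLinear(colour[1]) +
    0.0722 * channelToLinear(colour[2])
  );
}

// Contrast ratio between two colours, from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Check the AA thresholds quoted in the table.
function meetsAA(foreground: Rgb, background: Rgb, largeText: boolean = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

const white: Rgb = [255, 255, 255];
const grey: Rgb = [118, 118, 118]; // #767676
const greyOnWhitePasses = meetsAA(grey, white);
```

A check like this can run in a design-token build step, although it does not replace testing rendered pages, where opacity, gradients, and background images affect the result.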
4. Security – Defending Visitors and Your Reputation
| Header / Policy | What it does |
|---|---|
| HTTPS with TLS 1.2/1.3 + redirect from HTTP | Encrypts all traffic; prevents man‑in‑the‑middle attacks. |
| `Strict-Transport-Security` (HSTS) | Forces browsers to use HTTPS for the domain for a defined period; include `includeSubDomains` and consider `preload`. |
| Content‑Security‑Policy (CSP) | Whitelists script, style, image, and frame sources; mitigates XSS and data‑exfiltration. |
| `/.well-known/security.txt` | Provides a standard contact address for vulnerability disclosure, encouraging responsible reporting. |
| `X-Content-Type-Options: nosniff` | Stops browsers from MIME‑type sniffing, blocking a class of content‑type confusion attacks. |
| Clickjacking protection (`frame-ancestors` in CSP or `X-Frame-Options`) | Restricts which sites may embed your pages in an iframe. |
| `Referrer-Policy: strict-origin-when-cross-origin` | Limits leakage of full URLs when navigating away from the site. |
| Permissions‑Policy | Disables unused powerful features (camera, microphone, geolocation, etc.) for both the page and any embedded iframes. |
| Subresource Integrity (SRI) | Adds a cryptographic hash to external scripts/styles so tampered resources are rejected. |
| Secure cookie attributes (`Secure`, `HttpOnly`, `SameSite`, `__Host-`/`__Secure-` prefixes) | Reduces risk of session hijacking and cross‑site request forgery. |
| DNS CAA records | Declares which certificate authorities (CAs) may issue certificates for the domain, reducing the risk of mis‑issuance. |
| DNSSEC (optional) | Cryptographically signs DNS responses, protecting against cache poisoning. |
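
Several rows in the table reduce to response headers that can be set in one place. The following sketch collects a conservative starting set in a plain object; the function names, the one-year HSTS lifetime, the CSP value, and the cookie name are assumptions for illustration and would need tuning for a real site (a strict CSP in particular tends to break inline scripts until they are refactored).

```typescript
// Return a baseline set of security headers as a name-to-value map.
function securityHeaders(hstsMaxAgeSeconds: number = 31536000): Record<string, string> {
  return {
    // Keep browsers on HTTPS for the given period, including subdomains.
    "Strict-Transport-Security": `max-age=${hstsMaxAgeSeconds}; includeSubDomains`,
    // Same-origin resources only, and no framing by any other page (assumed policy).
    "Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    // Switch off powerful features this hypothetical site does not use.
    "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
  };
}

// Build a session cookie using the __Host- prefix, which requires
// Secure, Path=/ and no Domain attribute.
function sessionCookie(value: string): string {
  return `__Host-session=${encodeURIComponent(value)}; Path=/; Secure; HttpOnly; SameSite=Lax`;
}

const baselineHeaders = securityHeaders();
const exampleCookie = sessionCookie("abc 123");
```

Centralising the headers in one function makes it easier to assert on them in CI than when they are scattered across server configuration files.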
5. Performance – Delivering Fast, Fluid Experiences
| Metric / Technique | Target / Guidance |
|---|---|
| Core Web Vitals (LCP ≤ 2.5 s, INP ≤ 200 ms, CLS ≤ 0.1) | Measure on real‑user data; aim for 75th‑percentile compliance. |
| Image optimisation (WebP/AVIF, correct dimensions, explicit `width`/`height`) | Reduces payload and layout shift. |
| Native lazy loading (`loading="lazy"`) | Defer off‑screen images and iframes (the attribute applies to `<img>` and `<iframe>`; for video, `preload="none"` serves a similar purpose); never apply it to the LCP element. |
| Resource hints (`preload`, `prefetch`, `preconnect`) | Prioritise critical assets (e.g., LCP image, critical fonts) and establish early connections to third‑party origins. |
| `Cache-Control` (`immutable`, `max-age=31536000` for fingerprinted assets) | Enables long‑term CDN caching; use short/no‑cache for HTML to ensure freshness. |
| `No-Vary-Search` header (recommended) | Tells caches that certain query parameters do not affect the response, reducing duplicate fetches. |
| Compression (Brotli for HTTPS, fallback gzip) | Shrinks text payloads; avoid compressing already compressed media (JPEG, WebP). |
| Self‑hosted WOFF2 fonts, subset, `font-display: swap` | Guarantees text remains readable while the font loads, avoiding FOIT. |
| Critical CSS inlined, rest deferred | Eliminates render‑blocking CSS that stalls first paint. |
| Script loading (`defer` for app code, `async` for independent third‑party, `type=module` for modern bundles) | Prevents blocking the parser; avoid bare `<script>` in `<head>`. |
| HTTP/2 (minimum) and HTTP/3 (where possible) | Multiplexed streams reduce latency; QUIC removes TCP handshake overhead. |
| Speculation Rules (prefetch/prerender) | Anticipate user navigation for instant page loads, but monitor bandwidth impact. |
| View Transitions (CSS opt‑in) | Provides smooth page‑to‑page animations without JavaScript frameworks. |
| Back/Forward Cache (BFCache) eligibility | Keep pages cache‑friendly (no unload listeners, no cache-control: no-store) so back navigation restores instantly. |
| `content-visibility` + `contain-intrinsic-size` | Skips layout/paint for off‑screen sections, dramatically improving initial paint time. |
| CSS containment (`contain: layout paint style`) | Limits reflow/repaint to the affected subtree, aiding complex layouts. |
| Scroll‑driven animations (`scroll-timeline`, `view-timeline`) | Run animations on the compositor thread, avoiding main‑thread jank. |
| `scrollbar-gutter: stable` | Reserves space for scrollbars, preventing layout shift when pages toggle overflow. |
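
The Core Web Vitals row asks for compliance at the 75th percentile of real‑user data. A minimal sketch of that check follows; the `FieldData` shape and function names are hypothetical, and the nearest‑rank percentile used here is one common definition, chosen as an assumption because analytics tools may interpolate differently.

```typescript
// Thresholds for a "good" rating, as listed in the table.
const GOOD_THRESHOLDS = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

// Hypothetical container for real-user samples, one array per metric.
interface FieldData {
  lcpMs: number[];
  inpMs: number[];
  cls: number[];
}

// 75th percentile using the nearest-rank method (an assumption of this sketch).
function percentile75(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// True when all three metrics are within the "good" thresholds at p75.
function passesCoreWebVitals(data: FieldData): boolean {
  return (
    percentile75(data.lcpMs) <= GOOD_THRESHOLDS.lcpMs &&
    percentile75(data.inpMs) <= GOOD_THRESHOLDS.inpMs &&
    percentile75(data.cls) <= GOOD_THRESHOLDS.cls
  );
}

const sampleFieldData: FieldData = {
  lcpMs: [1800, 2100, 2400, 9000],
  inpMs: [80, 120, 190, 600],
  cls: [0.01, 0.02, 0.05, 0.4],
};
const sampleFieldDataPasses = passesCoreWebVitals(sampleFieldData);
```

Using the 75th percentile rather than the mean keeps a few extreme outliers (such as the 9000 ms LCP sample) from dominating the result, while still reflecting the slower quarter of visits.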
6. Privacy – Respecting Visitor Choice and Data Minimisation
- Privacy policy – Must be publicly accessible, describing data collection, legal basis, retention, and user rights.
- Cookie consent – In the EU/UK, non‑essential cookies require an explicit opt‑in; implement a clear, dismissible banner.
- Global Privacy Control (GPC) – Honour the browser‑level signal that the user opts out of data sale/sharing.
- Third‑party script audit – Every external script can read cookies and URLs; limit usage, sandbox where possible, and lock down with CSP.
- Privacy‑first analytics – Prefer cookieless, EU‑hosted tools (e.g., Plausible, Umami) that aggregate data without profiling.
- Data minimisation – Collect only what is needed, retain for the shortest reasonable period, and redact unnecessary fields from logs.
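
Honouring Global Privacy Control on the server starts with reading the `Sec-GPC` request header, which participating browsers send with the value `1`. The sketch below shows one way to detect it; `HeaderMap` and `hasGpcOptOut` are hypothetical names, and what a site does after detecting the signal depends on the applicable law.

```typescript
// Hypothetical header container: header name mapped to its value.
type HeaderMap = Record<string, string | undefined>;

// True when the request carries the Global Privacy Control opt-out signal.
function hasGpcOptOut(headers: HeaderMap): boolean {
  for (const name in headers) {
    // HTTP header names are case-insensitive.
    if (name.toLowerCase() === "sec-gpc") {
      return (headers[name] || "").trim() === "1";
    }
  }
  return false;
}

const optedOut = hasGpcOptOut({ "Sec-GPC": "1", "User-Agent": "ExampleBrowser/1.0" });
const noSignal = hasGpcOptOut({ "User-Agent": "ExampleBrowser/1.0" });
```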
7. Resilience – Handling Failure Gracefully
- Custom error pages (404, 500, etc.) – Return the correct status code, explain the issue in plain language, and provide navigation options.
- Maintenance mode (503 with `Retry-After`) – Signals temporary unavailability; include a friendly message and the expected time of return.
- Service workers (optional) – Cache a fallback page for offline use, turning network failures into a usable experience.
- Web app manifest – Enables installable PWAs, giving users a native‑like entry point and offline capabilities.
- External monitoring – Use synthetic checks and real‑user monitoring from a separate host; publish a status page that remains reachable even when the primary site is down.
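
The error-page and maintenance-mode items above boil down to pairing a human-readable body with the correct status code and headers. The sketch below models that with a plain object rather than any particular server framework; `SimpleResponse`, `maintenanceResponse`, and `notFoundResponse` are hypothetical names.

```typescript
// Framework-agnostic response description used only in this sketch.
interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// 503 response that tells clients and crawlers when to try again.
function maintenanceResponse(retryAfterSeconds: number): SimpleResponse {
  if (!(retryAfterSeconds >= 0)) {
    throw new Error("retryAfterSeconds must be a non-negative number");
  }
  return {
    status: 503,
    headers: {
      // Retry-After accepts a delay in whole seconds (or an HTTP date).
      "Retry-After": String(Math.ceil(retryAfterSeconds)),
      "Content-Type": "text/html; charset=utf-8",
    },
    body: "<h1>Back soon</h1><p>We are carrying out maintenance. Please try again shortly.</p>",
  };
}

// 404 response: a real "not found" status, not a 200 with an error message.
function notFoundResponse(): SimpleResponse {
  return {
    status: 404,
    headers: { "Content-Type": "text/html; charset=utf-8" },
    body: '<h1>Page not found</h1><p><a href="/">Return to the home page</a></p>',
  };
}

const maintenance = maintenanceResponse(1800);
```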
8. Internationalisation – Serving a Global Audience
| Aspect | Best practice |
|---|---|
| URL strategy (ccTLD, subdomain, subdirectory) | Choose one pattern and apply consistently; consider localisation of slugs for SEO benefit. |
| `hreflang` annotations | Declare language/region variants in the HTML head (or XML sitemap) using BCP‑47 codes; ensure reciprocal links across all alternates. |
| Localised metadata | Translate title, meta description, Open Graph fields, JSON‑LD name/description, and image alt text, not just body copy. |
| Avoid IP‑based redirects | Let users choose language; automatic redirects break sharing and crawlability. |
| Language switcher UI | List each language in its own language (Deutsch, 日本語), use proper lang attributes, and avoid flag icons. |
| RTL support | Set dir="rtl" for right‑to‑left scripts and employ CSS logical properties (margin-inline-start, etc.) to avoid hard‑coded left/right values. |
| CJK line‑break handling | Apply word-break: keep-all (chiefly useful for Korean, where words are separated by spaces) and appropriate line-break values to respect script‑specific wrapping rules. |
| Locale‑aware formatting | Use the Intl API for dates, numbers, currencies, and pluralisation (Intl.PluralRules). |
| Internationalised Domain Names (IDN) – optional | Allows Unicode domain labels; ensure Punycode conversion on the wire and be aware of anti‑spoofing policies. |
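
For the locale‑aware formatting row, the sketch below combines `Intl.PluralRules` and `Intl.NumberFormat` to build a count label. The `MESSAGES` catalogue, its `{n}` placeholder syntax, and the `resultsLabel` function are hypothetical; a production site would normally rely on a message-format library and translator-supplied strings.

```typescript
// Hypothetical message catalogue keyed by locale, then by plural category.
const MESSAGES: Record<string, Record<string, string>> = {
  en: { one: "{n} result", other: "{n} results" },
  de: { one: "{n} Ergebnis", other: "{n} Ergebnisse" },
};

// Build a label such as "12,345 results" using the locale's own rules.
function resultsLabel(count: number, locale: "en" | "de"): string {
  // The plural category ("one", "other", ...) is chosen by the locale's rules.
  const category = new Intl.PluralRules(locale).select(count);
  const templates = MESSAGES[locale];
  const template = templates[category] || templates["other"];
  // Digit grouping also follows the locale.
  const formatted = new Intl.NumberFormat(locale).format(count);
  return template.replace("{n}", formatted);
}

const englishOne = resultsLabel(1, "en");
const englishMany = resultsLabel(12345, "en");
const germanMany = resultsLabel(12345, "de");
```

Selecting the template by plural category, instead of testing `count === 1`, is what allows the same code to serve languages with more than two plural forms once their catalogues are added.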
9. Emerging Standards and Agent‑Centric Enhancements
- `/.well-known/` URIs – Publish well‑known resources such as `security.txt`, `change-password`, `openid-configuration`, `api-catalog`, `webfinger`, `assetlinks.json`, and `apple-app-site-association` to aid browsers, password managers, and federated platforms.
- Model Context Protocol (MCP) – A JSON‑RPC‑style interface that lets autonomous agents query site‑specific tools; useful for highly structured content.
- Agent‑to‑Agent (A2A) cards – Expose `/.well-known/agent-card.json` to advertise callable agent capabilities.
- Agent Skills discovery – Publish a well‑known list of short, scoped instructions that LLMs can load to interact more intelligently with the site.
- DNS‑based AI discovery (SVCB/HTTPS records under `_agents.example.com`) – Enables agents to locate services before any HTTP request, especially when combined with DNSSEC.
- NLWeb and WebMCP – Conventions for conversational endpoints (`/ask`) and in‑browser AI tool registration (`navigator.modelContext`).
- Schemamap (`/schemamap.xml`) – Indexes per‑resource JSON‑LD endpoints, allowing agents to fetch structured data directly.
10. Putting It All Together
The checklist may appear exhaustive, but each item addresses a concrete risk or opportunity. Skipping the <!doctype> leads to layout bugs; omitting CSP invites XSS; ignoring lang harms screen‑reader users; neglecting a sitemap slows discovery; failing to compress text inflates bandwidth costs. By treating the list as a living document—reviewed on every major release, integrated into CI pipelines, and validated with automated tools (HTML validators, Lighthouse, axe, CSP‑reporters)—teams can ensure that every page they ship is semantic, discoverable, accessible, secure, performant, privacy‑aware, resilient, and ready for the next generation of AI agents.
For further reading, see the official specifications linked throughout the checklist, such as the HTML Living Standard, the WCAG 2.2 guidelines, and the CSP Level 3 spec.
