
# Preventing XSS (Cross-Site Scripting) in PHP and Node.js

By Domain India Security Team
22 Apr 2026 | 9 min read
Cross-site scripting (XSS) is one of the most common web application flaws; the OWASP Top 10 (2021) folds it into Injection, which is ranked third (A03). The attack surface is wider than most developers realise: every user input that eventually reaches a browser is a potential vector. This guide covers the three XSS types, the correct output-encoding defences in PHP and Node.js, and how Content Security Policy acts as a second line of defence.

## What XSS is and why it matters

XSS happens when user-supplied content ends up in a page's HTML, CSS, or JavaScript without being properly encoded for that context, letting the attacker inject code that the victim's browser executes. A single working XSS on a site with session cookies can:

- Steal session cookies → full account takeover
- Read any data the user can read (private messages, admin panels)
- Perform any action as the user (transfer funds, change email, delete posts)
- Bypass CSRF protection by reading CSRF tokens from the page
- Serve malware to visitors

XSS and SQL injection share a root cause: mixing code with data. The layer differs, and so does the defence.

## The three types

### Stored XSS (the worst)

The attacker's payload is saved to the database and served to every visitor who views the page. Typical entry points: a comment field, forum post, user profile, or product review.

Impact: one submission poisons every view of the page until the payload is removed.

### Reflected XSS

The payload travels in a URL query parameter or form field and is reflected back in the response unescaped. The attacker has to trick the victim into clicking a link.

Example vulnerability:

```php
// Vulnerable: $_GET['q'] is echoed without encoding
echo "Search results for: " . $_GET['q'];
```

Attack URL: `/search?q=<script>alert(1)</script>`

### DOM-based XSS

Pure client-side: JavaScript reads untrusted input and writes it to the DOM:

```javascript
// Vulnerable: the URL fragment is parsed as HTML
document.getElementById('greeting').innerHTML = 'Hello ' + location.hash.substring(1);
```

Attack URL: `https://yoursite.com/page#<img src=x onerror=alert(1)>`

The server never sees the payload, because browsers do not send the URL fragment with the request. The attack is entirely client-side.

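
The DOM-based case has a direct fix: assign untrusted text through `textContent`, which the browser treats as plain text, instead of `innerHTML`. The following is a minimal sketch rather than a definitive implementation. The helper name `renderGreeting` is hypothetical (it does not appear in the original example), and the element is passed in as a parameter so the function can be exercised without a browser.

```javascript
// Hypothetical helper (an assumption for illustration): writes the URL
// fragment into an element as text, never as markup.
function renderGreeting(element, rawHash) {
  // location.hash includes the leading '#', so drop it if present.
  const name = rawHash.startsWith('#') ? rawHash.slice(1) : rawHash;
  // textContent assigns a plain string; the browser does not parse it as HTML.
  element.textContent = 'Hello ' + name;
  return element;
}
```

In a page this could be called as `renderGreeting(document.getElementById('greeting'), location.hash)`. Any markup in the fragment is then displayed as literal characters instead of being executed.
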
## Defence #1: output encoding (the primary defence)

The fix for XSS is: **encode output for the context it goes into.** Different contexts need different encoders.

| Context | Encoder |
|---|---|
| HTML body text | HTML-escape (`<` → `&lt;` etc.) |
| HTML attribute value | HTML-escape including quotes; always quote the attribute |
| Inside a URL (`href`, `src`) | URL-encode + validate scheme |
| Inside JavaScript | JSON-encode |
| Inside CSS | CSS-escape (rare; use only for user-controlled colour / size values) |

### PHP: using `htmlspecialchars` properly

```php
// Bad: $userName is emitted as raw HTML, so an attacker can inject markup
echo '<div>' . $userName . '</div>';

// Correct: encode every dynamic value for its context
echo '<div>' . htmlspecialchars($userName, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '</div>';
```

Flags matter:

- `ENT_QUOTES`: encodes both `"` and `'` (before PHP 8.1 the default encoded only `"`)
- `ENT_HTML5`: treat the output as HTML5 and use its named entities
- `'UTF-8'`: state the character set explicitly instead of relying on configuration defaults

Template engines typically do this automatically:

- **Twig:** `{{ var }}` auto-escapes; `{{ var|raw }}` does NOT
- **Blade (Laravel):** `{{ $var }}` auto-escapes; `{!! $var !!}` does NOT, so audit every `{!! !!}`
- **Smarty:** `{$var}` is escaped only when `$smarty->escape_html = true` is set

Rule: if your template syntax requires extra effort to DISABLE escaping, you're safe by default.

### Node.js: by template engine

**React / Next.js:** JSX auto-encodes any value:

```jsx
<div>{userName}</div>                    // safe
<span title={userName}>{userName}</span> // safe
```

The one dangerous API is `dangerouslySetInnerHTML` (correctly named):

```jsx
<div dangerouslySetInnerHTML={{ __html: userHtml }} /> // DANGEROUS
```

If you need to render user-provided HTML (rare), sanitise first with DOMPurify:

```jsx
import DOMPurify from 'dompurify';

<div dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(userHtml) }} />
```

**Vue:** `v-text` safe, `{{ expr }}` safe, `v-html` dangerous (same principle as React).

**Express + template engines:**

- EJS: `<%= var %>` escapes; `<%- var %>` does NOT
- Handlebars: `{{var}}` escapes; `{{{var}}}` does NOT
- Pug: `#{var}` escapes; `!{var}` does NOT

Each of these engines follows a "safe by default, explicit opt-out" pattern. Use the safe form.

### Plain Node.js string building

If you must build HTML yourself:

```javascript
function escapeHtml(unsafe) {
  return unsafe
    .replace(/&/g, '&amp;') // must run first so later entities are not re-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

const safe = `<p>${escapeHtml(userInput)}</p>`;
```

But really, use a template engine. String-building HTML invites bugs.

## Defence #2: Content Security Policy

Even with proper encoding, one missed escape is a vulnerability. CSP is the second line: a browser-enforced policy that blocks inline scripts and external-origin JavaScript that you didn't explicitly allow.

Minimal starter CSP (place in an HTTP header or a `<meta>` tag):

```
Content-Security-Policy: default-src 'self'; script-src 'self' 'nonce-RANDOM'; object-src 'none'; base-uri 'self'
```

- `default-src 'self'`: only same-origin resources by default
- `script-src 'self' 'nonce-RANDOM'`: scripts must be same-origin or carry a nonce you generated (a fresh random value for every response)
- `object-src 'none'`: block `<object>` and `<embed>` (legacy attack vectors)
- `base-uri 'self'`: prevent `<base>` tag hijack

Deploy CSP in report-only mode first with `Content-Security-Policy-Report-Only: ...`, monitor the `report-uri` endpoint for violations, fix what breaks, then switch to enforcing.

See our [Security Headers Explained](https://domainindia.com/support/kb/security-headers-explained-csp-hsts) for full CSP setup with nonces.

## Defence #3: input validation (secondary layer)

Not a substitute for output encoding, but a useful reject-early pattern. Validate inputs against strict allowlists:

```php
// Laravel validation
$request->validate([
    'username' => 'required|string|alpha_dash|max:50',
    'email' => 'required|email|max:255',
    'age' => 'required|integer|between:18,120',
]);
```

```javascript
// Node.js with zod
import { z } from 'zod';

const schema = z.object({
  username: z.string().regex(/^[a-zA-Z0-9_]{3,50}$/),
  email: z.string().email().max(255),
});

const data = schema.parse(req.body); // throws on invalid
```

For rich-text user content (blog post bodies, comments with formatting), use an allowlist sanitiser:

- PHP: `HTMLPurifier`
- JavaScript: `DOMPurify` or `sanitize-html`

Each takes user HTML and strips anything not in an allowlist (like `<p>`, `<b>`, `<a>` with safe attributes), returning safe HTML.

## Framework-specific notes

### WordPress

- `esc_html($value)`: inside HTML body
- `esc_attr($value)`: inside HTML attributes
- `esc_url($value)`: inside `href`, `src`
- `esc_js($value)`: inside inline JavaScript (rare)
- `wp_kses($value, $allowed_tags)`: allow a limited set of HTML tags

WordPress core is well-audited; problems tend to come from third-party plugins that echo unsanitised input. Audit plugin code before deploying to production sites.

### Laravel

- `{{ $var }}`: Blade auto-escapes
- `{!! $var !!}`: explicit opt-out, use only with trusted content
- `e($value)`: the `htmlspecialchars` shortcut, already applied by Blade `{{ }}`
- Raw HTML with an allowlist: sanitise with HTMLPurifier first, then output with `{!! !!}`

### Express + Pug/EJS

Template engines default to escaping. Audit any template that uses the raw-output syntax (`<%-`, `!{`, `{{{`).

## Testing for XSS

### Safe payloads to try

On a staging copy of your app, enter these into every text field:

```
<script>alert(1)</script>
"><script>alert(1)</script>
javascript:alert(1)
"onmouseover=alert(1)"
```

If any of these produce an alert, you have XSS. If they render as text, encoding is working.

### Automated tools

- **OWASP ZAP:** active scan includes XSS payload injection
- **Burp Suite:** the automated scanner is part of the Professional edition; the Community edition supports manual testing with Proxy and Repeater
- **XSStrike:** command-line XSS tester; use on staging / your own sites only

### Static analysis

- PHP: `phpcs` with a security ruleset, `psalm --taint-analysis`
- Node.js: `eslint-plugin-security`, `semgrep` with XSS rulesets

## Common pitfalls

1. **Encoding at the wrong layer.** Database-level escaping is for SQL injection. HTML-level escaping is for XSS. They are different: encode at output, not at storage.
2. **Double-encoding.** If your template engine auto-escapes and you also escape manually, values get encoded twice: `<` is emitted as `&amp;lt;`, so the page shows the literal text `&lt;` instead of `<`.
3. **Missing `ENT_QUOTES`.** Without it, `htmlspecialchars` on PHP versions before 8.1 doesn't encode single quotes, and an attacker can break out of a single-quoted attribute.
4. **Using `strip_tags` for XSS defence.** It removes tags but doesn't encode: `strip_tags("<b>foo</b>")` returns `"foo"`, but `strip_tags("foo & bar")` returns `"foo & bar"` with the `&` still unencoded. Quotes pass through untouched as well, so the result is not safe to place in an attribute.
5. **URL validation without scheme check.** `FILTER_VALIDATE_URL` does not restrict the scheme, so a `javascript:` URL such as `javascript://comment%0Aalert(1)` can pass `filter_var($url, FILTER_VALIDATE_URL)`. Always check that `parse_url($url, PHP_URL_SCHEME)` is `http` or `https`.
6. **Assuming JSON is safe in HTML contexts.** In `<script>var data = <?= $json ?>;</script>`, if `$json` contains `</script>`, the browser closes the script tag early. Use `json_encode($data, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT)`.
7. **Blocking `<script>`** …

… Never `eval()` user input: use `JSON.parse()` for data, proper parsers for expressions.

**What if my framework auto-escapes but I still need to output raw HTML (e.g., blog post body)?** Two safe paths: (1) pre-sanitise with DOMPurify / HTMLPurifier at save time and store only sanitised HTML; or (2) accept Markdown from users and sanitise the rendered HTML. Current versions of `marked` do not sanitise the output of `marked.parse()`, so pass the result through DOMPurify. Never un-escape user input and display it as HTML.

---

Need help auditing your own code for XSS? Our team can review specific templates or template-rendering patterns as a standard support request.
