session.fetch() doesn't inherit cookies injected via ctx.add_cookies(),
so _check_login_state always fails even with valid cookies. Trust the
captured cookie directly since it comes from a successful login.
Cookies are captured from the browser context after Playwright form
submission succeeds, so they are guaranteed valid.
Direct goto('/login') gets blocked by Cloudflare (CF) in the container. Instead:
1. goto('/') to establish CF clearance
2. window.location.href = '/login' for SPA-internal navigation
3. Wait for #login-account-name form to appear
4. Fill and submit
Also increase wait_for_timeout to 8s for login completion.
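The four steps above can be sketched roughly as follows. This is a minimal illustration written against Playwright's sync page API, not the plugin's actual code: the function names, the `base_url` parameter, and the 15s form timeout are assumptions made for the example. Setting `window.location.href` inside `evaluate()` usually returns before the navigation starts, but that timing is not guaranteed.

```python
from typing import Any

LOGIN_FORM_SELECTOR = "#login-account-name"


def open_login_form(page: Any, base_url: str, form_timeout_ms: int = 15000) -> None:
    """Reach the login form without a direct goto('/login')."""
    # 1. Load the home page first so CF clearance is established.
    page.goto(base_url + "/")
    # 2. Navigate from inside the page instead of issuing a fresh goto().
    page.evaluate("window.location.href = '/login'")
    # 3. Block until the username field has rendered.
    page.wait_for_selector(LOGIN_FORM_SELECTOR, timeout=form_timeout_ms)


def submit_login(page: Any, username: str, password: str, settle_ms: int = 8000) -> None:
    """Fill and submit the form, then give the login time to complete."""
    # 4. Fill and submit using the id selectors.
    page.fill("#login-account-name", username)
    page.fill("#login-account-password", password)
    page.click("#login-button")
    # Fixed 8s wait for login completion, matching the timeout above.
    page.wait_for_timeout(settle_ms)
```

Usage would look like `open_login_form(page, "https://linux.do")` followed by `submit_login(page, user, pw)`, with `page` taken from the same browser context that later supplies the cookies.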
The fetch('/login') approach returns 200 but doesn't actually log in
(Discourse SPA requires real form submission). Use Playwright's
page.fill() + page.click() with id selectors:
- #login-account-name (not input[name=username] which is hidden)
- #login-account-password (not input[name=password])
- #login-button (not button[type=submit])
Confirmed working: _forum_session cookie is set after form submission.
Forum session cookie is HttpOnly (invisible to JS document.cookie).
ctx.cookies() works because patchright reads cookies from the browser
engine directly, bypassing the JS visibility restriction.
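As a small illustration of reading the cookie from the context rather than from JS: ctx.cookies() returns a list of cookie dicts, so the session value can be picked out by name. The helper name below is hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence


def find_forum_session(cookies: Sequence[Mapping[str, Any]]) -> Optional[str]:
    """Return the _forum_session value from a ctx.cookies()-style list, if any."""
    for cookie in cookies:
        # HttpOnly cookies are included here even though document.cookie hides them.
        if cookie.get("name") == "_forum_session" and cookie.get("value"):
            return str(cookie["value"])
    return None
```

A caller would pass the result of `ctx.cookies()` directly and treat `None` as "login did not produce a session".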
Confirmed: login POST to /login returns 200 and sets _forum_session
cookie correctly with real credentials.
/session.json returns 403 'invalid_access' on linux.do (likely
restricted by the forum). /login POST works correctly as the standard
Discourse login endpoint.
Changes:
- POST to /login with username/password form data (no CSRF needed)
- /login returns 200 + HTML page on success (not JSON)
- Extract _forum_session cookie from document.cookie after login
- Simplify error handling (no more list/dict type confusion)
Two issues found from container logs:
1. Discourse returns 403 'BAD CSRF' when POST /session.json lacks a
valid CSRF token. Fix: fetch CSRF token from /session/csrf.json
(or meta tag), then send it as X-CSRF-Token header + authenticity_token
body field + X-Requested-With header.
2. The error data can be a list (e.g. ['BAD CSRF']) not a dict.
Fix: isinstance check before calling .get() on data.
Also remove debug logging now that the root cause is identified.
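Both fixes can be sketched as two small helpers. This is an illustration under assumptions, not the plugin's code: the function names are hypothetical, and the "login" body field and the "error" key in dict responses are assumed rather than taken from this log.

```python
from typing import Any, Dict, Tuple


def build_login_request(
    username: str, password: str, csrf_token: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, body) for POST /session.json with the CSRF token attached."""
    headers = {
        "X-CSRF-Token": csrf_token,
        "X-Requested-With": "XMLHttpRequest",
    }
    body = {
        "login": username,  # assumed field name
        "password": password,
        "authenticity_token": csrf_token,
    }
    return headers, body


def extract_error(data: Any) -> str:
    """Turn an error payload into a message without assuming it is a dict."""
    if isinstance(data, dict):
        # Only dicts support .get(); the "error" key is an assumption.
        return str(data.get("error") or "")
    if isinstance(data, list):
        # e.g. ['BAD CSRF']
        return "; ".join(str(item) for item in data)
    return "" if data is None else str(data)
```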
The Playwright form submission approach failed in the AstrBot container
because the /login SPA didn't render the form within the timeout. Replace
with a direct fetch POST to /session.json from within the page context:
1. Navigate to linux.do/ (any page) to establish CF clearance + session cookie
2. Use page.evaluate(fetch('/session.json', {method: 'POST', ...})) to log in
3. Discourse validates CSRF via the _forum_session cookie (no XHR headers needed)
4. Capture the updated _forum_session cookie from browser context
This approach:
- Doesn't depend on SPA rendering or form selectors
- Works in headless containers
- Is faster (no waiting for form rendering)
When linuxdo_username and linuxdo_password are configured, the plugin
automatically logs in via Playwright and captures the _forum_session
cookie from the browser context. No manual cookie copy needed.
Auth priority:
1. Manual linuxdo_session_cookie (if configured)
2. Auto-login via username+password (if configured)
3. Anonymous access (fallback)
Also update README and CHANGELOG with dual-mode auth documentation.
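The auth priority listed above amounts to a simple ordered check. A minimal sketch follows, assuming the config is a plain mapping keyed by the option names in this log; the function name and the returned mode strings are hypothetical.

```python
from typing import Any, Mapping


def choose_auth_mode(config: Mapping[str, Any]) -> str:
    """Pick the auth mode in priority order: cookie, then login, then anonymous."""
    # 1. A manually configured session cookie wins.
    if str(config.get("linuxdo_session_cookie") or "").strip():
        return "cookie"
    # 2. Auto-login needs both username and password.
    if config.get("linuxdo_username") and config.get("linuxdo_password"):
        return "login"
    # 3. Otherwise fall back to anonymous access.
    return "anonymous"
```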
linux.do doesn't provide user-level API keys, and the Playwright form
login approach was fragile (depends on SPA rendering, CSRF handling,
selector stability). Replace with a simpler cookie injection approach:
- User copies _forum_session cookie from browser DevTools
- Plugin injects it into StealthySession's browser context
- Validates via /session/current_user.json
- Falls back to anonymous if cookie is invalid/expired
Config changes:
- Add linuxdo_session_cookie (string, optional)
- Remove linuxdo_username and linuxdo_password (no longer needed)
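A rough sketch of the injection and validation steps described above. Playwright's add_cookies() takes a list of cookie dicts shaped like the one built here. The helper names, the default domain value, and the assumption that /session/current_user.json answers with a "current_user" object when logged in are illustrative, not confirmed by this log.

```python
from typing import Any, Dict


def build_session_cookie(value: str, domain: str = "linux.do") -> Dict[str, Any]:
    """Build a cookie dict suitable for ctx.add_cookies([...])."""
    return {
        "name": "_forum_session",
        "value": value.strip(),
        "domain": domain,
        "path": "/",
        "httpOnly": True,
        "secure": True,
    }


def is_logged_in(payload: Any) -> bool:
    """Interpret a parsed /session/current_user.json body (assumed shape)."""
    return isinstance(payload, dict) and isinstance(payload.get("current_user"), dict)
```

With these, the flow would be `ctx.add_cookies([build_session_cookie(cookie)])`, then a request to /session/current_user.json, then falling back to anonymous access when `is_logged_in()` returns False.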
When linuxdo_username and linuxdo_password are configured, the plugin
automatically logs in via Playwright's form POST to /login on Discourse.
The session cookies persist in the StealthySession context, enabling
authenticated API calls to access restricted categories, private
messages, and other non-public content.
The login flow:
1. _ensure_authenticated() is called once per StealthySession lifetime
2. _check_login_state() detects if already logged in (avoids re-login)
3. If not logged in, _do_login() navigates to /login, fills the form
(username/password), and clicks submit (no CSRF token needed for
HTML form POST — Discourse only enforces CSRF on XHR)
4. If login fails (wrong credentials, 2FA, rate limit), the plugin
gracefully degrades to anonymous access and proceeds normally
Config additions:
- linuxdo_username: string, optional
- linuxdo_password: string, optional (sensitive, plaintext in config)
Version: 1.1.3 → 1.2.0 (new feature)
The previous adaptive fix only kicked in when screenshot_full_page=True.
When a user had full_page=False, the path fell through to page.screenshot(),
which captures the fixed viewport (820x1200 CSS pixels, 1640x2400 device
pixels at 2x scale), leaving huge empty space for short posts and truncating
long posts mid-content.
Drop the full_page gate. .card element screenshot is now the default path
regardless of config. The flag is only used as a fallback when the element
screenshot fails (e.g. .card not present in HTML).
This makes short and long posts render correctly for every config:
- a3a7a0d (short, 47 views): 1640x2400 with empty gray -> 1520x698
- 3e6e0454 (long, 555 views + images): 1640x2400 cut off -> 1520x2596 full
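The element-first path with a page-level fallback can be sketched like this, using Playwright's locator.screenshot() and page.screenshot(). The function name, the return values, and the 5s element timeout are assumptions for the example.

```python
from typing import Any


def capture_preview(page: Any, path: str, full_page: bool = False) -> str:
    """Screenshot the .card element; fall back to a page screenshot on failure."""
    try:
        # Default path: the card's bounding box drives the output size.
        page.locator(".card").screenshot(path=path, timeout=5000)
        return "element"
    except Exception:
        # Fallback only: the full_page flag matters here and nowhere else.
        page.screenshot(path=path, full_page=full_page)
        return "page"
```

Catching a broad Exception keeps the fallback simple: a missing .card surfaces as a timeout error, and any other element-level failure is treated the same way.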
Previously full_page=True captured document.body.scrollHeight which on
short posts left ~2000px of empty viewport padding at the bottom
(screenshot was 1640×2400 for a 400px-tall card). For long posts it
also included the body's max-height padding.
Switch to element-level screenshot via page.locator('.card').screenshot()
when full_page is enabled. The card's bounding box drives the output
size, so short posts produce compact previews and long posts still
capture the entire card with no empty space.
Element screenshot failure (rare) falls back to page.screenshot() with
the original full_page logic.
- _normalize_cooked_urls: when stripping lightbox-wrapper, only keep <img>
and drop <div class="meta"> (filename, dimensions, download button).
Also strip <span class="filename">, codeblock-buttons, pre-actions,
and download anchors.
- _render_html_screenshot: after page load, remove <img> elements that
failed to load (naturalWidth == 0) so broken images don't leave huge
blank space in the screenshot.
- Avatar handling: only substitute {size} when the template actually
contains the placeholder; emit a fallback circle with the user's
initial when avatar URL is missing or fails to load (onerror).
- Add CSS for .avatar-wrap / .avatar-fallback to render the initial.
Fixes the case shown in screenshot 450de4b where a 988x703 broken image
left massive vertical whitespace along with the visible "image 988x703
46.8 KB" alt text.
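The avatar handling from the list above could look roughly like the following. The function name, the default size, and the exact markup are hypothetical. The onerror handler simply removes the broken image so the fallback initial stays visible, and it is assumed that the .avatar-wrap / .avatar-fallback CSS layers the two.

```python
import html
from typing import Optional


def avatar_html(template: Optional[str], username: str, size: int = 96) -> str:
    """Render an avatar <img> with an initial-letter fallback behind it."""
    initial = html.escape((username.strip()[:1] or "?").upper())
    fallback = f'<span class="avatar-fallback">{initial}</span>'
    if not template:
        # No avatar URL at all: show only the initial.
        return f'<span class="avatar-wrap">{fallback}</span>'
    # Substitute {size} only when the placeholder is actually present.
    url = template.replace("{size}", str(size)) if "{size}" in template else template
    src = html.escape(url, quote=True)
    return (
        '<span class="avatar-wrap">'
        f'{fallback}<img src="{src}" onerror="this.remove()">'
        "</span>"
    )
```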
Replace the "goto + JS-hide + wait-networkidle" screenshot path with a
cleaner pipeline that fetches /t/{id}.json through the same StealthySession
(reuses cf_clearance) and then renders a self-contained HTML card via
page.set_content().
Benefits:
- Complete OP content (no truncation, no lazy-loading issues)
- No dependence on Discourse DOM/JS state
- Custom styling adapts to the chat-platform preview aspect ratio
- API failure falls back to the existing page-screenshot path
New methods:
- _fetch_topic_data, _safe_title, _extract_content_from_topic_data
- _build_preview_html, _format_count, _normalize_cooked_urls
- _render_html_screenshot
Also localize _conf_schema.json to Chinese and bump version to 1.1.0.
The full_page screenshot was capturing the entire thread including all
replies. Now hides all .topic-post elements except the first one,
so the screenshot shows only the OP's complete post cleanly.
The viewport-only mode cut off long posts. Keep full page as default
while retaining all other optimizations (networkidle, element hiding,
smart wait).
- Use networkidle instead of load — waits for Discourse JS rendering
- Wait for .cooked selector — smart content detection with fallback
- Hide fixed/sticky elements via JS injection:
  - .d-header (top navigation bar)
  - .sidebar-wrapper (left sidebar)
  - .topic-navigation-wrapper, .footer-nav
- Scroll to first post content for clean framing
- Default to viewport screenshot (not full_page) — practical preview
- Add screenshot_full_page config option for users who want full thread
- Adjust viewport to 1280×900 (16:9-ish)
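A sketch of the page preparation described above, written against Playwright's sync page API. The function names, the 10s content timeout, and the choice to hide elements with display:none are assumptions for illustration; the selectors are the ones listed in this change.

```python
import json
from typing import Any, Sequence

HIDE_SELECTORS = [
    ".d-header",
    ".sidebar-wrapper",
    ".topic-navigation-wrapper",
    ".footer-nav",
]


def build_hide_script(selectors: Sequence[str]) -> str:
    """Build a JS function that hides every element matching the selectors."""
    joined = json.dumps(", ".join(selectors))  # safely quoted JS string literal
    return (
        "() => { for (const el of document.querySelectorAll(" + joined + ")) "
        "{ el.style.display = 'none'; } }"
    )


def prepare_topic_page(page: Any, url: str, content_timeout_ms: int = 10000) -> bool:
    """Load a topic, wait for post content, and hide fixed/sticky chrome."""
    # networkidle waits for the Discourse JS rendering to settle.
    page.goto(url, wait_until="networkidle")
    try:
        page.wait_for_selector(".cooked", timeout=content_timeout_ms)
        found = True
    except Exception:
        found = False  # fallback: continue with whatever has rendered
    page.evaluate(build_hide_script(HIDE_SELECTORS))
    if found:
        # Scroll to the first post content for clean framing.
        page.locator(".cooked").first.scroll_into_view_if_needed()
    return found
```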
Old 29KB blank screenshots from the previously broken code were being
served from cache (30min TTL) instead of being retaken. Cached screenshots
under 50KB are now considered invalid and trigger a fresh screenshot.
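The validity rule can be expressed as a size check next to the TTL check. A minimal sketch, assuming the cache is a file on disk whose mtime marks when it was taken; the function and constant names are hypothetical.

```python
import os
import time
from typing import Optional

MIN_VALID_BYTES = 50 * 1024   # smaller files are treated as blank screenshots
CACHE_TTL_SECONDS = 30 * 60   # 30min TTL


def is_cache_usable(path: str, now: Optional[float] = None) -> bool:
    """Return True only for cached screenshots that are fresh and large enough."""
    try:
        stat = os.stat(path)
    except OSError:
        return False  # missing or unreadable file: retake
    if stat.st_size < MIN_VALID_BYTES:
        return False  # e.g. an old 29KB blank screenshot
    current = time.time() if now is None else now
    return (current - stat.st_mtime) <= CACHE_TTL_SECONDS
```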
- Root cause: StealthySession.fetch() recycles page to about:blank
after returning, so screenshot captured blank page.
- Fix: Two-step approach - first fetch() to solve Cloudflare and
get cookies, then new_page() in same context (with cf_clearance)
to navigate and screenshot.
- Text extraction: switched from Scrapling CSS selectors to lxml
.cooked extraction, with regex fallback. Cleans HTML tags and
normalizes whitespace.
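The cleanup half of the text extraction (stripping tags and normalizing whitespace) can be sketched with the standard library alone. The lxml .cooked lookup is not shown, and the function name is hypothetical.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def clean_cooked_html(fragment: str) -> str:
    """Strip HTML tags from a .cooked fragment and collapse whitespace."""
    # Replace tags with a space so adjacent blocks do not run together.
    without_tags = _TAG_RE.sub(" ", fragment)
    # Decode entities such as &amp; after the tags are gone.
    text = html.unescape(without_tags)
    return _WS_RE.sub(" ", text).strip()
```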