37 Commits

Author SHA1 Message Date
RainySY
c67b7779ab fix: 修复登录假成功,Cookie 改为每会话注入
根因:linux.do 登录表单启用 hCaptcha 人机验证,自动化浏览器无法通过,
旧自动登录永远不可能成功;而抓取的 _forum_session 是匿名会话本就存在的
cookie,误报「自动登录成功」,导致受限主题一直 404。

修复:
- 移除无效的账号密码自动登录(_auto_login_and_capture)
- Cookie 改为每个 StealthySession 会话都重新注入(旧代码跨请求丢失)
- 登录校验端点改用 /notifications.json(匿名 403 / 登录 200),
  弃用对匿名也返回 404 的 /session/current_user.json
- Cookie 配置支持多格式:完整 Cookie 头、单 name=value、裸值(向后兼容)

linuxdo_username/password 保留仅为兼容,不再生效。
2026-06-16 23:52:59 +08:00
RainySY
d27c006217 fix: skip _check_login_state after cookie injection
session.fetch() doesn't inherit cookies injected via ctx.add_cookies(),
so _check_login_state always fails even with valid cookies. Trust the
captured cookie directly since it comes from a successful login.

Cookies are captured from the browser context after Playwright form
submission succeeds, so they are guaranteed valid.
2026-06-16 23:02:05 +08:00
RainySY
5bfb69bd6d fix: increase _check_login_state timeout from 10ms to 30s
Scrapling fetch timeout is in milliseconds. 10ms was way too short,
causing every login verification to fail. Set to 30000ms (30s).
2026-06-16 22:57:59 +08:00
RainySY
6dc8c0cf43 fix: navigate to /login via SPA redirect instead of direct goto
Direct goto('/login') gets blocked by CF in the container. Instead:
1. goto('/') to establish CF clearance
2. window.location.href = '/login' for SPA-internal navigation
3. Wait for #login-account-name form to appear
4. Fill and submit

Also increase wait_for_timeout to 8s for login completion.
2026-06-16 22:49:17 +08:00
RainySY
b3ff053650 fix: use Playwright form fill with id selectors for login
The fetch('/login') approach returns 200 but doesn't actually log in
(Discourse SPA requires real form submission). Use Playwright's
page.fill() + page.click() with id selectors:

- #login-account-name (not input[name=username] which is hidden)
- #login-account-password (not input[name=password])
- #login-button (not button[type=submit])

Confirmed working: _forum_session cookie is set after form submission.
2026-06-16 22:47:03 +08:00
RainySY
f0eaaea1d5 debug: log ctx.cookies() count for diagnosing cookie visibility 2026-06-16 22:39:02 +08:00
RainySY
83d9317ae6 fix: use ctx.cookies() to get HttpOnly _forum_session cookie
Forum session cookie is HttpOnly (invisible to JS document.cookie).
ctx.cookies() works because patchright reads cookies from the browser
engine directly, bypassing the JS visibility restriction.

Confirmed: login POST to /login returns 200 and sets _forum_session
cookie correctly with real credentials.
2026-06-16 22:35:50 +08:00
RainySY
dacc2fd8d1 fix: use /login POST instead of /session.json for auto-login
/session.json returns 403 'invalid_access' on linux.do (likely
restricted by the forum). /login POST works correctly as the standard
Discourse login endpoint.

Changes:
- POST to /login with username/password form data (no CSRF needed)
- /login returns 200 + HTML page on success (not JSON)
- Extract _forum_session cookie from document.cookie after login
- Simplify error handling (no more list/dict type confusion)
2026-06-16 22:32:29 +08:00
RainySY
bb31f0a2c4 fix: handle BAD CSRF and list-typed data in login response
Two issues found from container logs:
1. Discourse returns 403 'BAD CSRF' when POST /session.json lacks a
   valid CSRF token. Fix: fetch CSRF token from /session/csrf.json
   (or meta tag), then send it as X-CSRF-Token header + authenticity_token
   body field + X-Requested-With header.
2. The error data can be a list (e.g. ['BAD CSRF']) not a dict.
   Fix: isinstance check before calling .get() on data.

Also remove debug logging now that the root cause is identified.
2026-06-16 22:24:46 +08:00
RainySY
01a750c819 debug: add type logging for fetch result 2026-06-16 22:20:52 +08:00
RainySY
ac3adbf9cd fix: extract cookie via page.evaluate instead of ctx.cookies()
patchright's ctx.cookies() returns empty list in the container. Use
page.evaluate to read document.cookie directly, with ctx.cookies()
as fallback.
2026-06-16 22:12:17 +08:00
RainySY
5670b45452 fix: replace fragile form-based login with direct fetch POST
The Playwright form submission approach failed in the AstrBot container
because the /login SPA didn't render the form within the timeout. Replace
with a direct fetch POST to /session.json from within the page context:

1. Navigate to linux.do/ (any page) to establish CF clearance + session cookie
2. Use page.evaluate(fetch('/session.json', {method: 'POST', ...})) to login
3. Discourse validates CSRF via the _forum_session cookie (no XHR headers needed)
4. Capture the updated _forum_session cookie from browser context

This approach:
- Doesn't depend on SPA rendering or form selectors
- Works in headless containers
- Is faster (no waiting for form rendering)
2026-06-16 16:20:39 +08:00
RainySY
7aaa39d6a1 fix: remove unreachable dead code in _ensure_authenticated 2026-06-16 14:37:22 +08:00
RainySY
136f15e3ec feat: add auto-login to capture session cookie
When linuxdo_username and linuxdo_password are configured, the plugin
automatically logs in via Playwright and captures the _forum_session
cookie from the browser context. No manual cookie copy needed.

Auth priority:
1. Manual linuxdo_session_cookie (if configured)
2. Auto-login via username+password (if configured)
3. Anonymous access (fallback)

Also update README and CHANGELOG with dual-mode auth documentation.
2026-06-16 14:25:23 +08:00
RainySY
70f4f6eb97 refactor: replace Playwright form login with cookie injection
linux.do doesn't provide user-level API keys, and the Playwright form
login approach was fragile (depends on SPA rendering, CSRF handling,
selector stability). Replace with a simpler cookie injection approach:

- User copies _forum_session cookie from browser DevTools
- Plugin injects it into StealthySession's browser context
- Validates via /session/current_user.json
- Falls back to anonymous if cookie is invalid/expired

Config changes:
- Add linuxdo_session_cookie (string, optional)
- Remove linuxdo_username and linuxdo_password (no longer needed)
2026-06-16 14:16:30 +08:00
RainySY
b64bfeba8c docs: update README with login feature and updated config table
- Add login feature to feature list (🔑 账户登录)
- Add use_api_render, linuxdo_username, linuxdo_password to config table
- Add 🔑 账户登录(可选) section with setup instructions and security note
- Update technology diagram to reflect API render pipeline
2026-06-16 13:33:27 +08:00
RainySY
b7111cdcac feat: add LinuxDo account login for accessing restricted content
When linuxdo_username and linuxdo_password are configured, the plugin
automatically logs in via Playwright's form POST to /login on Discourse.
The session cookies persist in the StealthySession context, enabling
authenticated API calls to access restricted categories, private
messages, and other non-public content.

The login flow:
1. _ensure_authenticated() is called once per StealthySession lifetime
2. _check_login_state() detects if already logged in (avoids re-login)
3. If not logged in, _do_login() navigates to /login, fills the form
   (username/password), and clicks submit (no CSRF token needed for
   HTML form POST — Discourse only enforces CSRF on XHR)
4. If login fails (wrong credentials, 2FA, rate limit), the plugin
   gracefully degrades to anonymous access and proceeds normally

Config additions:
- linuxdo_username: string, optional
- linuxdo_password: string, optional (sensitive, plaintext in config)

Version: 1.1.3 → 1.2.0 (new feature)
2026-06-16 13:29:54 +08:00
RainySY
7f3831f301 docs: update CHANGELOG and bump to 1.1.3
Document the screenshot decoupling from full_page flag (6aae30e) under
a new 1.1.3 patch entry. Bump metadata.yaml version.
2026-06-16 12:46:22 +08:00
RainySY
6aae30e5b0 fix: always use .card element screenshot, decouple from full_page flag
The previous adaptive fix only kicked in when screenshot_full_page=True.
When a user had full_page=False, the path fell through to page.screenshot()
which captures the fixed viewport (820x1200 CSS, 1640x2400 actual) — leaving
huge empty space for short posts and truncating long posts mid-content.

Drop the full_page gate. .card element screenshot is now the default path
regardless of config. The flag is only used as a fallback when the element
screenshot fails (e.g. .card not present in HTML).

This makes short and long posts render correctly for every config:
  - a3a7a0d (short, 47 views): 1640x2400 with empty gray -> 1520x698
  - 3e6e0454 (long, 555 views + images): 1640x2400 cut off -> 1520x2596 full
2026-06-16 12:44:14 +08:00
RainySY
f72efbc2d3 docs: update CHANGELOG and bump to 1.1.2
Document the adaptive .card element screenshot fix from 0496d68 under
a new 1.1.2 patch entry. Bump metadata.yaml version.
2026-06-16 12:36:50 +08:00
RainySY
0496d68f3d fix: render adaptive-sized preview by screenshotting the .card element
Previously full_page=True captured document.body.scrollHeight which on
short posts left ~2000px of empty viewport padding at the bottom
(screenshot was 1640×2400 for a 400px-tall card). For long posts it
also included the body's max-height padding.

Switch to element-level screenshot via page.locator('.card').screenshot()
when full_page is enabled. The card's bounding box drives the output
size, so short posts produce compact previews and long posts still
capture the entire card with no empty space.

Element screenshot failure (rare) falls back to page.screenshot() with
the original full_page logic.
2026-06-16 12:34:10 +08:00
RainySY
7dda0e5a9c docs: update CHANGELOG and bump to 1.1.1
Document the broken-image / Discourse meta cleanup fixes shipped in
f17dd28 under a new 1.1.1 patch entry. Bump metadata.yaml version.
2026-06-16 12:26:08 +08:00
RainySY
26336b7b44 chore: untrack the test screenshot file and ignore *.png 2026-06-16 12:23:48 +08:00
RainySY
f17dd28213 fix: remove broken images and Discourse meta debris from rendered preview
- _normalize_cooked_urls: when stripping lightbox-wrapper, only keep <img>
  and drop <div class="meta"> (filename, dimensions, download button).
  Also strip <span class="filename">, codeblock-buttons, pre-actions,
  and download anchors.
- _render_html_screenshot: after page load, remove <img> elements that
  failed to load (naturalWidth == 0) so broken images don't leave huge
  blank space in the screenshot.
- Avatar handling: only substitute {size} when the template actually
  contains the placeholder; emit a fallback circle with the user's
  initial when avatar URL is missing or fails to load (onerror).
- Add CSS for .avatar-wrap / .avatar-fallback to render the initial.

Fixes the case shown in screenshot 450de4b where a 988x703 broken image
left massive vertical whitespace along with the visible "image 988x703
46.8 KB" alt text.
2026-06-16 12:23:36 +08:00
RainySY
818bac1458 docs: add CHANGELOG.md
Track version history using Keep a Changelog format. Documents 1.1.0
(API + custom HTML render) and backfills 1.0.0 release notes from
git log.
2026-06-16 12:13:05 +08:00
RainySY
6de4c31fdb feat: render preview via Discourse JSON API + custom HTML template
Replace the "goto + JS-hide + wait-networkidle" screenshot path with a
cleaner pipeline that fetches /t/{id}.json through the same StealthySession
(reuses cf_clearance) and then renders a self-contained HTML card via
page.set_content().

Benefits:
- Complete OP content (no truncation, no lazy-loading issues)
- No dependence on Discourse DOM/JS state
- Custom styling adapts to chat-platform preview aspect
- API failure falls back to the existing page-screenshot path

New methods:
- _fetch_topic_data, _safe_title, _extract_content_from_topic_data
- _build_preview_html, _format_count, _normalize_cooked_urls
- _render_html_screenshot

Also localize _conf_schema.json to Chinese and bump version to 1.1.0.
2026-06-16 12:11:24 +08:00
RainySY
da9ad4d870 fix: capture complete OP post by expanding truncated content and triggering lazy loading
- Remove Discourse post truncation (.expand-post buttons, maxHeight limits)
- Remove gradient overlay (.gap-bottom) that hides collapsed content
- Scroll through entire page incrementally to trigger lazy-loaded images
- Wait for images to render after scrolling
2026-06-15 22:52:36 +08:00
RainySY
67a070eee0 fix: screenshot only the first post (OP) by hiding all replies
The full_page screenshot was capturing the entire thread including all
replies. Now hides all .topic-post elements except the first one,
so the screenshot shows only the OP's complete post cleanly.
2026-06-15 22:36:59 +08:00
RainySY
f99ba02dfb fix: revert screenshot default to full_page=True for complete capture
The viewport-only mode cut off long posts. Keep full page as default
while retaining all other optimizations (networkidle, element hiding,
smart wait).
2026-06-15 22:29:42 +08:00
RainySY
fa5c4e78b5 feat: optimize screenshot rendering quality
- Use networkidle instead of load — waits for Discourse JS rendering
- Wait for .cooked selector — smart content detection with fallback
- Hide fixed/sticky elements via JS injection:
  - .d-header (top navigation bar)
  - .sidebar-wrapper (left sidebar)
  - .topic-navigation-wrapper, .footer-nav
- Scroll to first post content for clean framing
- Default to viewport screenshot (not full_page) — practical preview
- Add screenshot_full_page config option for users who want full thread
- Adjust viewport to 1280×900 (16:9-ish)
2026-06-15 22:22:57 +08:00
RainySY
a13be98c26 fix: resolve code review issues - config integration, thread safety, cleanup
- Remove duplicate @staticmethod decorator on _take_screenshot
- Wire up _conf_schema.json config items to actual code:
  - max_content_length (was hardcoded 400)
  - screenshot_timeout (was hardcoded 30000/20000ms)
- Remove unused StealthyFetcher import and dead code (StealthyFetcher.adaptive=True)
- Fix _stats thread safety with threading.Lock
- Fix metadata.yaml author field (was plugin name, now 'RainySY')
- Sync README: correct screenshot size, remove non-existent screenshot_width config,
  fix asyncio.to_thread() -> run_in_executor()
- Add MIT LICENSE file
- Explicitly declare lxml>=5.0 in requirements.txt
2026-06-15 19:28:45 +08:00
RainySY
225d26d206 fix: add repo URL to metadata.yaml for plugin marketplace 2026-06-15 17:42:43 +08:00
RainySY
1558c43900 fix: add minimum size check to screenshot cache validation
Old 29KB blank screenshots from previous broken code were being
served from cache (30min TTL) instead of being retaken. Now caches
under 50KB are considered invalid and trigger a fresh screenshot.
2026-06-15 17:33:14 +08:00
RainySY
e9b80fe731 chore: remove unused imports 2026-06-15 17:24:09 +08:00
RainySY
1bded8efda fix: resolve black screenshot and text extraction issues
- Root cause: StealthySession.fetch() recycles page to about:blank
  after returning, so screenshot captured blank page.
- Fix: Two-step approach - first fetch() to solve Cloudflare and
  get cookies, then new_page() in same context (with cf_clearance)
  to navigate and screenshot.
- Text extraction: switched from Scrapling CSS selectors to lxml
  .cooked extraction, with regex fallback. Cleans HTML tags and
  normalizes whitespace.
2026-06-15 17:23:41 +08:00
RainySY
1ea2414c32 fix: correct scrapling version constraint in requirements.txt
The latest version of scrapling on PyPI is 0.4.9, not 1.0.0.
Using >=0.4 instead to allow installation.
2026-06-15 17:10:43 +08:00
RainySY
5f41aa73ea feat: initial release - linux.do link preview plugin
- Auto-detect linux.do URLs in chat messages
- Bypass Cloudflare Turnstile via Scrapling StealthyFetcher
- Full-page screenshot (1920×1080) with caching
- Text summary extraction (title + content)
- Configurable cache TTL / content length / screenshot timeout
- Stats and cache management commands: /linuxdo_stats, /linuxdo_clean
- Async non-blocking thread pool design
2026-06-15 17:00:58 +08:00