astrbot_plugin_linuxdo

Author	SHA1	Message	Date
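RainySY	c67b7779ab	fix: fix false login success; inject Cookie per session Root cause: the linux.do login form enables hCaptcha human verification, which an automated browser cannot pass, so the old auto-login could never succeed; the captured _forum_session is a cookie that anonymous sessions already have, so "auto-login succeeded" was reported falsely and restricted topics kept returning 404. Fix: - Remove the ineffective username/password auto-login (_auto_login_and_capture) - Re-inject the Cookie into every StealthySession session (the old code lost it across requests) - Switch the login check endpoint to /notifications.json (anonymous 403 / logged in 200), dropping /session/current_user.json, which returns 404 for anonymous users too - Cookie config supports multiple formats: full Cookie header, single name=value, bare value (backward compatible) linuxdo_username/password are kept only for compatibility and no longer take effect.	2026-06-16 23:52:59 +08:00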
RainySY	d27c006217	fix: skip _check_login_state after cookie injection session.fetch() doesn't inherit cookies injected via ctx.add_cookies(), so _check_login_state always fails even with valid cookies. Trust the captured cookie directly since it comes from a successful login. Cookies are captured from the browser context after Playwright form submission succeeds, so they are guaranteed valid.	2026-06-16 23:02:05 +08:00
RainySY	5bfb69bd6d	fix: increase _check_login_state timeout from 10ms to 30s Scrapling fetch timeout is in milliseconds. 10ms was way too short, causing every login verification to fail. Set to 30000ms (30s).	2026-06-16 22:57:59 +08:00
RainySY	6dc8c0cf43	fix: navigate to /login via SPA redirect instead of direct goto Direct goto('/login') gets blocked by CF in the container. Instead: 1. goto('/') to establish CF clearance 2. window.location.href = '/login' for SPA-internal navigation 3. Wait for #login-account-name form to appear 4. Fill and submit Also increase wait_for_timeout to 8s for login completion.	2026-06-16 22:49:17 +08:00
RainySY	b3ff053650	fix: use Playwright form fill with id selectors for login The fetch('/login') approach returns 200 but doesn't actually log in (Discourse SPA requires real form submission). Use Playwright's page.fill() + page.click() with id selectors: - #login-account-name (not input[name=username] which is hidden) - #login-account-password (not input[name=password]) - #login-button (not button[type=submit]) Confirmed working: _forum_session cookie is set after form submission.	2026-06-16 22:47:03 +08:00
RainySY	f0eaaea1d5	debug: log ctx.cookies() count for diagnosing cookie visibility	2026-06-16 22:39:02 +08:00
RainySY	83d9317ae6	fix: use ctx.cookies() to get HttpOnly _forum_session cookie Forum session cookie is HttpOnly (invisible to JS document.cookie). ctx.cookies() works because patchright reads cookies from the browser engine directly, bypassing the JS visibility restriction. Confirmed: login POST to /login returns 200 and sets _forum_session cookie correctly with real credentials.	2026-06-16 22:35:50 +08:00
RainySY	dacc2fd8d1	fix: use /login POST instead of /session.json for auto-login /session.json returns 403 'invalid_access' on linux.do (likely restricted by the forum). /login POST works correctly as the standard Discourse login endpoint. Changes: - POST to /login with username/password form data (no CSRF needed) - /login returns 200 + HTML page on success (not JSON) - Extract _forum_session cookie from document.cookie after login - Simplify error handling (no more list/dict type confusion)	2026-06-16 22:32:29 +08:00
RainySY	bb31f0a2c4	fix: handle BAD CSRF and list-typed data in login response Two issues found from container logs: 1. Discourse returns 403 'BAD CSRF' when POST /session.json lacks a valid CSRF token. Fix: fetch CSRF token from /session/csrf.json (or meta tag), then send it as X-CSRF-Token header + authenticity_token body field + X-Requested-With header. 2. The error data can be a list (e.g. ['BAD CSRF']) not a dict. Fix: isinstance check before calling .get() on data. Also remove debug logging now that the root cause is identified.	2026-06-16 22:24:46 +08:00
RainySY	01a750c819	debug: add type logging for fetch result	2026-06-16 22:20:52 +08:00
RainySY	ac3adbf9cd	fix: extract cookie via page.evaluate instead of ctx.cookies() patchright's ctx.cookies() returns empty list in the container. Use page.evaluate to read document.cookie directly, with ctx.cookies() as fallback.	2026-06-16 22:12:17 +08:00
RainySY	5670b45452	fix: replace fragile form-based login with direct fetch POST The Playwright form submission approach failed in the AstrBot container because the /login SPA didn't render the form within the timeout. Replace with a direct fetch POST to /session.json from within the page context: 1. Navigate to linux.do/ (any page) to establish CF clearance + session cookie 2. Use page.evaluate(fetch('/session.json', {method: 'POST', ...})) to login 3. Discourse validates CSRF via the _forum_session cookie (no XHR headers needed) 4. Capture the updated _forum_session cookie from browser context This approach: - Doesn't depend on SPA rendering or form selectors - Works in headless containers - Is faster (no waiting for form rendering)	2026-06-16 16:20:39 +08:00
RainySY	7aaa39d6a1	fix: remove unreachable dead code in _ensure_authenticated	2026-06-16 14:37:22 +08:00
RainySY	136f15e3ec	feat: add auto-login to capture session cookie When linuxdo_username and linuxdo_password are configured, the plugin automatically logs in via Playwright and captures the _forum_session cookie from the browser context. No manual cookie copy needed. Auth priority: 1. Manual linuxdo_session_cookie (if configured) 2. Auto-login via username+password (if configured) 3. Anonymous access (fallback) Also update README and CHANGELOG with dual-mode auth documentation.	2026-06-16 14:25:23 +08:00
RainySY	70f4f6eb97	refactor: replace Playwright form login with cookie injection linux.do doesn't provide user-level API keys, and the Playwright form login approach was fragile (depends on SPA rendering, CSRF handling, selector stability). Replace with a simpler cookie injection approach: - User copies _forum_session cookie from browser DevTools - Plugin injects it into StealthySession's browser context - Validates via /session/current_user.json - Falls back to anonymous if cookie is invalid/expired Config changes: - Add linuxdo_session_cookie (string, optional) - Remove linuxdo_username and linuxdo_password (no longer needed)	2026-06-16 14:16:30 +08:00
RainySY	b64bfeba8c	docs: update README with login feature and updated config table - Add login feature to feature list (🔑 账户登录, "Account login") - Add use_api_render, linuxdo_username, linuxdo_password to config table - Add 🔑 账户登录（可选） ("Account login (optional)") section with setup instructions and security note - Update technology diagram to reflect API render pipeline	2026-06-16 13:33:27 +08:00
RainySY	b7111cdcac	feat: add LinuxDo account login for accessing restricted content When linuxdo_username and linuxdo_password are configured, the plugin automatically logs in via Playwright's form POST to /login on Discourse. The session cookies persist in the StealthySession context, enabling authenticated API calls to access restricted categories, private messages, and other non-public content. The login flow: 1. _ensure_authenticated() is called once per StealthySession lifetime 2. _check_login_state() detects if already logged in (avoids re-login) 3. If not logged in, _do_login() navigates to /login, fills the form (username/password), and clicks submit (no CSRF token needed for HTML form POST — Discourse only enforces CSRF on XHR) 4. If login fails (wrong credentials, 2FA, rate limit), the plugin gracefully degrades to anonymous access and proceeds normally Config additions: - linuxdo_username: string, optional - linuxdo_password: string, optional (sensitive, plaintext in config) Version: 1.1.3 → 1.2.0 (new feature)	2026-06-16 13:29:54 +08:00
RainySY	7f3831f301	docs: update CHANGELOG and bump to 1.1.3 Document the screenshot decoupling from full_page flag (`6aae30e`) under a new 1.1.3 patch entry. Bump metadata.yaml version.	2026-06-16 12:46:22 +08:00
RainySY	6aae30e5b0	fix: always use .card element screenshot, decouple from full_page flag The previous adaptive fix only kicked in when screenshot_full_page=True. When a user had full_page=False, the path fell through to page.screenshot() which captures the fixed viewport (820x1200 CSS, 1640x2400 actual) — leaving huge empty space for short posts and truncating long posts mid-content. Drop the full_page gate. .card element screenshot is now the default path regardless of config. The flag is only used as a fallback when the element screenshot fails (e.g. .card not present in HTML). This makes short and long posts render correctly for every config: - a3a7a0d (short, 47 views): 1640x2400 with empty gray -> 1520x698 - 3e6e0454 (long, 555 views + images): 1640x2400 cut off -> 1520x2596 full	2026-06-16 12:44:14 +08:00
RainySY	f72efbc2d3	docs: update CHANGELOG and bump to 1.1.2 Document the adaptive .card element screenshot fix from `0496d68` under a new 1.1.2 patch entry. Bump metadata.yaml version.	2026-06-16 12:36:50 +08:00
RainySY	0496d68f3d	fix: render adaptive-sized preview by screenshotting the .card element Previously full_page=True captured document.body.scrollHeight which on short posts left ~2000px of empty viewport padding at the bottom (screenshot was 1640×2400 for a 400px-tall card). For long posts it also included the body's max-height padding. Switch to element-level screenshot via page.locator('.card').screenshot() when full_page is enabled. The card's bounding box drives the output size, so short posts produce compact previews and long posts still capture the entire card with no empty space. Element screenshot failure (rare) falls back to page.screenshot() with the original full_page logic.	2026-06-16 12:34:10 +08:00
RainySY	7dda0e5a9c	docs: update CHANGELOG and bump to 1.1.1 Document the broken-image / Discourse meta cleanup fixes shipped in `f17dd28` under a new 1.1.1 patch entry. Bump metadata.yaml version.	2026-06-16 12:26:08 +08:00
RainySY	26336b7b44	chore: untrack the test screenshot file and ignore *.png	2026-06-16 12:23:48 +08:00
RainySY	f17dd28213	fix: remove broken images and Discourse meta debris from rendered preview - _normalize_cooked_urls: when stripping lightbox-wrapper, only keep <img> and drop <div class="meta"> (filename, dimensions, download button). Also strip <span class="filename">, codeblock-buttons, pre-actions, and download anchors. - _render_html_screenshot: after page load, remove <img> elements that failed to load (naturalWidth == 0) so broken images don't leave huge blank space in the screenshot. - Avatar handling: only substitute {size} when the template actually contains the placeholder; emit a fallback circle with the user's initial when avatar URL is missing or fails to load (onerror). - Add CSS for .avatar-wrap / .avatar-fallback to render the initial. Fixes the case shown in screenshot 450de4b where a 988x703 broken image left massive vertical whitespace along with the visible "image 988x703 46.8 KB" alt text.	2026-06-16 12:23:36 +08:00
RainySY	818bac1458	docs: add CHANGELOG.md Track version history using Keep a Changelog format. Documents 1.1.0 (API + custom HTML render) and backfills 1.0.0 release notes from git log.	2026-06-16 12:13:05 +08:00
RainySY	6de4c31fdb	feat: render preview via Discourse JSON API + custom HTML template Replace the "goto + JS-hide + wait-networkidle" screenshot path with a cleaner pipeline that fetches /t/{id}.json through the same StealthySession (reuses cf_clearance) and then renders a self-contained HTML card via page.set_content(). Benefits: - Complete OP content (no truncation, no lazy-loading issues) - No dependence on Discourse DOM/JS state - Custom styling adapts to chat-platform preview aspect - API failure falls back to the existing page-screenshot path New methods: - _fetch_topic_data, _safe_title, _extract_content_from_topic_data - _build_preview_html, _format_count, _normalize_cooked_urls - _render_html_screenshot Also localize _conf_schema.json to Chinese and bump version to 1.1.0.	2026-06-16 12:11:24 +08:00
RainySY	da9ad4d870	fix: capture complete OP post by expanding truncated content and triggering lazy loading - Remove Discourse post truncation (.expand-post buttons, maxHeight limits) - Remove gradient overlay (.gap-bottom) that hides collapsed content - Scroll through entire page incrementally to trigger lazy-loaded images - Wait for images to render after scrolling	2026-06-15 22:52:36 +08:00
RainySY	67a070eee0	fix: screenshot only the first post (OP) by hiding all replies The full_page screenshot was capturing the entire thread including all replies. Now hides all .topic-post elements except the first one, so the screenshot shows only the OP's complete post cleanly.	2026-06-15 22:36:59 +08:00
RainySY	f99ba02dfb	fix: revert screenshot default to full_page=True for complete capture The viewport-only mode cut off long posts. Keep full page as default while retaining all other optimizations (networkidle, element hiding, smart wait).	2026-06-15 22:29:42 +08:00
RainySY	fa5c4e78b5	feat: optimize screenshot rendering quality - Use networkidle instead of load — waits for Discourse JS rendering - Wait for .cooked selector — smart content detection with fallback - Hide fixed/sticky elements via JS injection: - .d-header (top navigation bar) - .sidebar-wrapper (left sidebar) - .topic-navigation-wrapper, .footer-nav - Scroll to first post content for clean framing - Default to viewport screenshot (not full_page) — practical preview - Add screenshot_full_page config option for users who want full thread - Adjust viewport to 1280×900 (16:9-ish)	2026-06-15 22:22:57 +08:00
RainySY	a13be98c26	fix: resolve code review issues - config integration, thread safety, cleanup - Remove duplicate @staticmethod decorator on _take_screenshot - Wire up _conf_schema.json config items to actual code: - max_content_length (was hardcoded 400) - screenshot_timeout (was hardcoded 30000/20000ms) - Remove unused StealthyFetcher import and dead code (StealthyFetcher.adaptive=True) - Fix _stats thread safety with threading.Lock - Fix metadata.yaml author field (was plugin name, now 'RainySY') - Sync README: correct screenshot size, remove non-existent screenshot_width config, fix asyncio.to_thread() -> run_in_executor() - Add MIT LICENSE file - Explicitly declare lxml>=5.0 in requirements.txt	2026-06-15 19:28:45 +08:00
RainySY	225d26d206	fix: add repo URL to metadata.yaml for plugin marketplace	2026-06-15 17:42:43 +08:00
RainySY	1558c43900	fix: add minimum size check to screenshot cache validation Old 29KB blank screenshots from previous broken code were being served from cache (30min TTL) instead of being retaken. Now caches under 50KB are considered invalid and trigger a fresh screenshot.	2026-06-15 17:33:14 +08:00
RainySY	e9b80fe731	chore: remove unused imports	2026-06-15 17:24:09 +08:00
RainySY	1bded8efda	fix: resolve black screenshot and text extraction issues - Root cause: StealthySession.fetch() recycles page to about:blank after returning, so screenshot captured blank page. - Fix: Two-step approach - first fetch() to solve Cloudflare and get cookies, then new_page() in same context (with cf_clearance) to navigate and screenshot. - Text extraction: switched from Scrapling CSS selectors to lxml .cooked extraction, with regex fallback. Cleans HTML tags and normalizes whitespace.	2026-06-15 17:23:41 +08:00
RainySY	1ea2414c32	fix: correct scrapling version constraint in requirements.txt The latest version of scrapling on PyPI is 0.4.9, not 1.0.0. Using >=0.4 instead to allow installation.	2026-06-15 17:10:43 +08:00
RainySY	5f41aa73ea	feat: initial release - linux.do link preview plugin - Auto-detect linux.do URLs in chat messages - Bypass Cloudflare Turnstile via Scrapling StealthyFetcher - Full-page screenshot (1920×1080) with caching - Text summary extraction (title + content) - Configurable cache TTL / content length / screenshot timeout - Stats and cache management commands: /linuxdo_stats, /linuxdo_clean - Async non-blocking thread pool design	2026-06-15 17:00:58 +08:00
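
Commit c67b7779ab says the Cookie config accepts a full Cookie header, a single name=value pair, or a bare value. The sketch below shows one way such normalization could look. It is not the plugin's actual code: the helper name `parse_cookie_config` is hypothetical, and it assumes a bare value contains no `=` character (for example because it is URL-encoded) and belongs to the `_forum_session` cookie named in the log.

```python
from typing import Dict

# Cookie name taken from the commit log; used when only a bare value is configured.
DEFAULT_COOKIE_NAME = "_forum_session"


def parse_cookie_config(raw: str, default_name: str = DEFAULT_COOKIE_NAME) -> Dict[str, str]:
    """Hypothetical helper: turn a configured cookie string into a name -> value dict.

    Accepted shapes (per the commit message):
      - full Cookie header:  "Cookie: a=1; b=2"  or  "a=1; b=2"
      - single pair:         "name=value"
      - bare value:          "value"  (assumed to contain no '=')
    """
    text = raw.strip()
    # Tolerate a pasted "Cookie:" header prefix.
    if text.lower().startswith("cookie:"):
        text = text[len("cookie:"):].strip()
    if not text:
        return {}
    # No '=' anywhere: treat the whole string as the session cookie's value.
    if "=" not in text:
        return {default_name: text}
    cookies: Dict[str, str] = {}
    for part in text.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name.strip():
            cookies[name.strip()] = value.strip()
    return cookies
```

The resulting dict could then be converted into whatever per-session cookie structure the browser context expects each time a new session is opened, which is the behavior the commit describes.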

37 Commits
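
Commit 1558c43900 describes a cache check that rejects screenshots under 50KB in addition to the 30-minute TTL. A minimal sketch of that rule follows, assuming "50KB" means 50 * 1024 bytes and that freshness is judged by file modification time; the function names are hypothetical and the plugin's real implementation may differ.

```python
import os
import time
from typing import Optional

CACHE_TTL_SECONDS = 30 * 60      # 30 min TTL mentioned in the log
MIN_VALID_BYTES = 50 * 1024      # assumption: "50KB" read as 50 KiB


def is_entry_usable(size_bytes: int, age_seconds: float,
                    ttl: float = CACHE_TTL_SECONDS,
                    min_bytes: int = MIN_VALID_BYTES) -> bool:
    """Pure rule: an entry is usable only if it is large enough and not expired."""
    if size_bytes < min_bytes:
        return False             # likely a blank or broken screenshot
    return 0 <= age_seconds <= ttl


def is_cached_file_usable(path: str, now: Optional[float] = None) -> bool:
    """Apply the rule to a file on disk; a missing file counts as a cache miss."""
    try:
        st = os.stat(path)
    except OSError:
        return False
    current = time.time() if now is None else now
    return is_entry_usable(st.st_size, current - st.st_mtime)
```

Keeping the size-and-age rule separate from the filesystem lookup makes it easy to check without creating files; a 29KB file such as the blank screenshots mentioned in the commit would be rejected regardless of its age.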