netbird

Author	SHA1	Message	Date
Pascal Fischer	38ad2b67e8	[proxy] fix context for udprelay (#6444 )	2026-06-16 14:41:17 +02:00
Pascal Fischer	01aa49433e	[management] delete targets when deleting exposed service (#6442 )	2026-06-16 14:33:24 +02:00
Zoltan Papp	08a2b63675	[client] propagate exit-node deselect to synthesized v6 (::/0) route (#6296 ) * [client] propagate exit-node deselect to synthesized v6 (::/0) route When a client deselects an IPv4 exit node, the auto-generated IPv6 default route (::/0) was still selected and pushed onto the tunnel interface, even though the user disabled the exit node. On an exit node without a real IPv6 egress this blackholes IPv6 traffic, and because clients prefer IPv6 (happy eyeballs) it can break general connectivity. Root cause: the synthesized v6 route gets a different NetID than its v4 base (base + "-v6"). The route selector keys deselects by NetID and defaults unknown NetIDs to selected, so the "-v6" entry was never matched by the v4 deselect. The effectiveNetID() mirror that solves exactly this is used by HasUserSelectionForRoute and FilterSelectedExitNodes, but categorizeUserSelection called the raw IsSelected(), bypassing it and mis-categorizing the v6 pair as user-selected. Add RouteSelector.IsSelectedForExitNode(), which applies effectiveNetID before the selection check, and use it in categorizeUserSelection. IsSelected() is left untouched so non-exit code paths don't make unrelated "-v6" routes inherit v4 state. Adds regression tests for the v4/v6 deselect mirror and explicit-v6 override. [client] add DIAG logging to trace exit-node v6 (::/0) route filtering Temporary diagnostics to find why a deselected v4 exit node's synthesized ::/0 route still reaches the tunnel. Logs the full install path: incoming client networks, route-selector state before/after the management-driven update, what updateExitNodeSelections deselects/selects, and per-route KEEP/SKIP/DROP decisions in FilterSelectedExitNodes and applyExitNodeFilter. To be reverted once the real root cause is confirmed from a client log. 
* [client] clear orphaned v6 exit selection when v4 pair is toggled Root cause of the leaking ::/0 route, confirmed from client logs: the synthesized "-v6" exit route could stay explicitly selected in the persisted route-selector state while its v4 base was deselected (selected=[...-v6], deselected=[...v4base]). Because the v6 entry then has its own explicit state, effectiveNetID stops mirroring the v4 base, so FilterSelectedExitNodes keeps ::/0 and it is installed on the tunnel even though the user disabled the exit node. This happened because the iOS SDK's deselect only pairs the "-v6" sibling via ExpandV6ExitPairs when the v6 route is present in the current routesMap; a deselect at a moment it wasn't expanded left the v6 selection orphaned. Fix at the selector write path so it is independent of routesMap timing: when a v4 exit NetID is selected or deselected, clear any orphaned explicit state on its "-v6" sibling (clearPairedV6Locked), unless the sibling is part of the same batch (the deliberate ExpandV6ExitPairs case). The v6 then falls back to inheriting the v4 base via effectiveNetID, so a v4 deselect also drops ::/0 and a v4 select brings both back. Adds regression tests: a stale explicit v6 selection is cleared by a later v4 deselect, and an explicit v6 select made in the same batch is preserved. * [ios] compute route connection status in the bridge The iOS bridge exposed a route's Network as a possibly comma-joined string ("0.0.0.0/0, ::/0" for a merged exit node) but no connection status, forcing the UI to infer status by string-matching that joined value against peer routes — which never matched for the merged exit node, leaving it stuck as not-connected. Android already computes status in the core (findBestRoutePeer). 
Mirror that here: add a Status field to RoutesSelectionInfo and compute it from the connected peers' route tables, matching the route's primary prefix, a merged exit node's extra v6 prefix, or a dynamic route's domain pattern (the key the route manager records). The UI can now read the status directly. * [client] remove exit-node v6 DIAG logging and tidy routeselector Drop the temporary DIAG diagnostics added to trace the leaking ::/0 route (the root cause is fixed and confirmed). Also reorganize routeselector.go so the exit-node helpers (clearPairedV6Locked, isExitNode) sit next to the exit-node code paths and MarshalJSON/UnmarshalJSON are grouped together. * [client] mirror v4 exit selection onto v6 pair at write time The synthesized "-v6" exit route shares its v4 base's NetID plus a "-v6" suffix. Selection state was reconciled at read time via effectiveNetID, a mirror that could only be applied on exit-node code paths, which forced a parallel IsSelectedForExitNode() alongside IsSelected() and a clearPairedV6Locked() orphan cleanup on every toggle. That machinery still missed the case observed in the field: a persisted state with the v4 base deselected but its "-v6" sibling explicitly selected (orphaned). Because effectiveNetID returns the v6 entry itself once it carries explicit state, and clearPairedV6Locked only fires on a live toggle, the loaded orphan survived and the ::/0 route leaked onto the tunnel despite the exit node being disabled, breaking IPv6 (happy eyeballs). Treat the v4/v6 exit pair as a single toggle and keep state consistent at write time instead. RouteSelector.SyncPairedSelection forces the "-v6" entry to match its v4 base unconditionally, resetting any orphaned explicit state. 
The route manager, which knows the route prefixes, computes the pairs (V6ExitMergeSet) and calls it from updateRouteSelectorFromManagement before selection is read, so both collectExitNodeInfo and FilterSelectedExitNodes see consistent state, including pairs loaded from persisted selector state. This removes effectiveNetID, IsSelectedForExitNode and clearPairedV6Locked; the selector is literal again and no longer needs the "exit-node paths only" caveat. HasUserSelectionForRoute and applyExitNodeFilter use the raw NetID. Adds a selector test for SyncPairedSelection (including the orphaned-v6 case) and a route-manager test reproducing the persisted-orphan scenario from the field log. * [client] add DIAG logging to trace v6 exit-pair mirror The write-time mirror did not eliminate the leak in field testing. Re-add the DIAG diagnostics around the exit-node selection flow to capture a fresh trace: - UpdateRoutes: incoming client networks, selector state before/after the management update, and the networks remaining after FilterSelectedExitNodes. - mirrorV6ExitPairSelections: the NetIDs present in this update and the v6 pairs V6ExitMergeSet derives from them (reveals whether the v4 base and its ::/0 pair are present in the same update so the pair can be matched). - SyncPairedSelection: the base/paired state before and after the sync. - FilterSelectedExitNodes / applyExitNodeFilter: per-route SKIP/KEEP/DROP and the selection lookups behind each decision. - updateExitNodeSelections / logExitNodeUpdate: categorization and deselect set. Temporary; to be removed once the root cause is confirmed. * [client] remove v6 exit-pair mirror DIAG logging Drop the temporary DIAG diagnostics added to trace the v4/v6 exit-pair mirror. The field log confirmed the write-time mirror keeps the pair consistent (the ::/0 route is only ever applied alongside its v4 base and is dropped on deselect), so the diagnostics are no longer needed.	2026-06-16 12:27:58 +02:00
Maycon Santos	b3f9e6588a	[management] sync openapi spec and test for diff on workflows (#6437 ) * [management] sync openapi spec and test for diff on workflows * [management] pin oapi-codegen version to v2.7.1	2026-06-15 17:53:25 +02:00
Pascal Fischer	967e2d6864	[management] network map for affected peers (#6105 )	2026-06-15 17:43:22 +02:00
Zoltan Papp	e7c1d364c3	[management] treat ci- builds as development for remote jobs (#6436 ) * fix(management): treat ci- builds as development for remote jobs CI snapshot builds use a "ci-<sha>" version string that did not match IsDevelopmentVersion, so the remote-jobs minimum-version gate rejected them. Recognize the "ci-" prefix as a development build. * fix(management): treat dev- builds as development for remote jobs Dev snapshot builds use a "dev-<sha>" version string that did not match IsDevelopmentVersion, so the remote-jobs minimum-version gate rejected them. Recognize the "dev-" prefix as a development build, alongside the existing "ci-" prefix.	2026-06-15 17:22:40 +02:00
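The prefix rule described in the commit above can be sketched as follows. This is a minimal illustration, not the project's actual IsDevelopmentVersion: the helper name isDevelopmentVersion, the prefix list, and the sample version strings are assumptions drawn only from the commit messages in this log ("development", "ci-<sha>", "dev-<sha>").

```go
package main

import (
	"fmt"
	"strings"
)

// isDevelopmentVersion is a hypothetical stand-in for the check described in
// the commit message: a version counts as a development build when it starts
// with "development", "ci-" or "dev-". The real function may differ.
func isDevelopmentVersion(version string) bool {
	for _, prefix := range []string{"development", "ci-", "dev-"} {
		if strings.HasPrefix(version, prefix) {
			return true
		}
	}
	return false
}

func main() {
	// Sample strings are illustrative; "abc123" stands in for a commit sha.
	versions := []string{
		"development",
		"development-abc123-dirty",
		"ci-abc123",
		"dev-abc123",
		"0.72.4",
		"v0.31.1-dev", // a "-dev" suffix is a pre-release, not a development build
	}
	for _, v := range versions {
		fmt.Printf("%-26s development=%v\n", v, isDevelopmentVersion(v))
	}
}
```

Matching on prefixes rather than on a substring keeps release-style versions that merely contain "dev" out of the development category.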
Viktor Liu	a44198fd77	[client] Add dialWebSocket method to WASM client (#5980 )	2026-06-15 16:43:24 +02:00
Viktor Liu	b57f714350	[client] Drop signaling-side ICE candidate filter, drop overlay STUN at mux read-side instead (#6142 )	2026-06-15 16:37:03 +02:00
Viktor Liu	f893abc41d	[client] Recover from tun device read/write panics and restart the client (#6419 )	2026-06-15 16:36:00 +02:00
Lee Sang Hoon	60067619a1	[proxy] Keep custom TCP listeners alive after mapping batches (#6415 )	2026-06-15 12:21:24 +02:00
Bethuel Mmbaga	cd777395f2	[management] Skip JWT group evaluation for embedded-IdP local users (#6422 ) When JWT group sync is enabled with a restrictive JWTAllowGroups list, the local owner of an embedded-IdP (Dex) deployment can get locked out. The allow-groups check runs account-wide but local password users do not receive external IdP group claims, so they can't satisfy the allowed list. This skips JWT group evaluation for local Dex users so the restriction and JWT group sync continue to apply to external-IdP users as intended.	2026-06-15 12:01:54 +03:00
Viktor Liu	b19467e3af	[client] Answer NODATA when a host resolves without addresses of the requested family (#6418 )	2026-06-12 14:50:46 +02:00
Riccardo Manfrin	2bcea9d582	[client] add MDM configuration profile support (Windows registry + macOS plist) (#6374 ) * Initial scaffolding * Applies MDM override * Unit tests * Helpers business logic * Return error if trying to modify any config that is gated by MDM * Add ManagedFields to returned config over GetConfig * Adds initial 101 MDM policy business logic testing * gRPC MDM changes * MDM Name scoping for clarity * Implements windows loading of MDM policy * Adds missing WGPort config * Cleanup setupKey to align to linear * Align split tunnel code * Adds some log * Prefix every log with MDM * Adds debug config cobra command This can be useful for troubleshooting and checking config now that its resolution is not trivial: defaults > config > env vars > CLI/UI > MDM * Adds MDM 1m diff checker & reloader * Adds also up/start after cancel * Publishes event for UI to sync upon MDM changes * Add events to resync UI to actual config This also provides a fixup for the UI not aligning to changed config when coming from cli up with config flags. * UI behavior conflicts relaxation UI sends full config snapshot with all values. It doesn't make sense to block it if the values are aligned with the values constrained by the MDM policy. It's just simpler to allow values that are compliant. (this goes for the CLI as well at this point) * Lock toggle Settings * Advanced Settings locking * Fixup presharedkey * Apply MDM locks * Toggle gray in/out for Advanced Settings * Adds support for disabling of Profiles and UpdateSettings feature flags * Adds Gate Login as well when --disable-update-settings=true is given to service This commit tries to settle things with an old PR-4237 which had relaxed the case where the SetConfig returned an `Unavailable` code error. Under this circumstance the PR allowed the upFunc to just emit a warning and progress further with the login gRPC. 
Since the login call is consuming the --management-url coming from the `up` command, it might be possible to abuse the "Unavailable" code to inject a management URL that is different from the configured one even though the --disable-update-settings is set to true (?) * Evaluate disable-update-settings errors only when there's an actual override * [UI] Fixup advanced Settings * [UI] Fixup for preshared key * [UI] Fixup for profile enable/disable toggle We need to align the initial state to evaluate the delta in case. The initial state has to be "true" since the profile starts visible. Then we receive MDM and transition the cache bool value to the actual MDM imposed state * Enforces disable networks * [UI] Aligns to "enable/disable once on change only" * Fixup: MDM wins. always * Removes --disable-advanced-settings It was a typo in our meetings. the actual thing is --disable-update-settings * [PROTO] Removes --disable-advanced-settings * [UI] Removes --disable-advanced-settings * Pins feat profile retrieval to notif event * [UI] Fix for "hide" not working when propagating to parent with children * Adds dep for reading plist files * Introduces support for darwin plist loading * Tests MDM config reload via ticker * [PROVISIONING] ADMX/ADML/PS/bash scripts/templates * CI fixes - Add docstrings to `mdm_integration` - refactor for cognitive complexity - mod tidy * Linting * Add docstrings to `mdm_integration` * nil,nil is no policy and no error. Allow it * nil,nil is no policy and no error. Allow it * exclude MDM profile administered keys data from debug bundle * Fixes Rosenpass left disable after MDM unlock * Partial revert coderabbit added docstrings * Renaming fix * Avoid locking on clientRunning bool when the connection is aborted for whatever reason We want to just signal this through the giveUpChan, we will manage the signal from the waiter side and in case set it to false there. 
This way we avoid locking, which should allow the MDM down+wait_for_term_chan_signal_+up procedure. clientRunning is used to signal two different conditions here: 1. the initialization procedure is over (we have an engine) 2. the connection being up (or being attempted) Probably these two functionalities should not alias, and the failure of the second condition (because of any error) should just drive a reconnection (currently it's not happening, and we silently go idle). OR, more probably, the two things are the SAME and there should not exist a case where we did the "Up" initialization and connection attempt but we are not still attempting it. * Moves test helper at the very bottom * Addresses github comments * No lock no copy * Prevents engine not stopping within 10 secs from being paired by another instance We instead just SKIP updating the policy, so 1. the MDM ticker will kick in 1 minute's time, 2. find the policy misaligned, 3. enter the onMDMPolicyChange, 4. find the s.clientRunning == true (because it is set to false only in server cleanupConnection, and not by s.actCancel()) 5. call s.actCancel() again if not nil 6. immediately return from <-s.clientGiveUpChan 7. finally call s.restartEngineForMDMLocked() * Since we ARE running there should be a config If the config was cancelled midflight, connect will abort later on * DisableAutoConnect should not stop a running connection. DisableAutoConnect should just avoid the connection attempts when the service starts. If we are started and we are up and running, DisableAutoConnect should not kick in. Another PR will follow about this topic * Removes unused vars * Moves callback into Run method arg * align comment to removal of DisableAutoConnect DisableAutoConnect should just avoid the connection attempts when the service starts. If we are started and we are up and running, DisableAutoConnect should not kick in * Removes unused managed_fields data. 
This was initially used to drive the UI but approach changed to reload config/features upon notifications which makes this data redundant. * Reorder stuff * Unexport unrequired vars/functions PoliciesEqual → policiesEqual AllKeys → allKeys * Adds list of MDM managed fields in the debug bundle	2026-06-12 12:28:49 +02:00
Maycon Santos	8ff3b06cf1	[client] Index peer tunnel IPs for faster PeerStateByIP lookup (#6412 ) * [client] Index peer tunnel IPs for O(1) PeerStateByIP lookup Replace the linear scan over all peers with an ipToKey map maintained by AddPeer/RemovePeer, covering both IPv4 and IPv6 tunnel addresses. Offline peers are intentionally no longer resolvable by IP: only active peers can carry traffic, so IdentityForIP and the DNS disconnected-peer filter now treat them as unknown, same as foreign IPs. Skip the DNS answer filter for single-record responses; dropping the only answer was always restored by the empty-answer escape hatch, so the fast path is behavior-neutral. * Ensure `ipToKey` entries are only removed if they match the peer being deleted, preventing accidental removal of unrelated mappings. v0.72.4	2026-06-12 10:24:15 +02:00
Maycon Santos	d7703767d5	[client, proxy] cancel context before stopping engine on embedded client (#6397 ) - Engine.Start takes syncMsgMux with a deferred unlock (engine.go:445) and parks in receiveSignalEvents → WaitStreamConnected (engine.go:1762), which only wakes on signal-stream connect or client-context cancellation. - When signal never connects, the 30s startup timeout fires and embed.Client.Start's rollback (embed.go:281) called client.Stop() → Engine.Stop, which blocks acquiring syncMsgMux (engine.go:318). The cancel() that would unpark Start was deferred until Start returned: a permanent cycle. RemovePeer calls (g43/g385) then queue behind the lifecycle mutex. - Notably, embed.Client.Stop and the daemon's cleanupConnection both cancel before stopping; the startup rollback was the only path that didn't. v0.72.3	2026-06-10 21:26:54 +02:00
Maycon Santos	7feda907ca	[management] fix L4 service update when no custom port (#6396 ) This fixes an issue where L4 service update is not possible when proxy clusters don't support custom ports	2026-06-10 18:55:24 +02:00
Maycon Santos	62da482133	[management] Add version gate to stop sending deprecated RemotePeers field (#6371 ) * [management] Add version gate to stop sending deprecated RemotePeers field don't send top-level remote peers to peers running v0.29.3 or newer * precompute deprecated remote peers version constraint * [management] update tests to validate network map-based remote peers * [management] move deprecatedRemotePeersVersion constant closer to its usage * fix misplaced precomputed constraint definition * ensure top-level RemotePeers is empty for v0.29.3+ clients	2026-06-10 16:59:09 +02:00
Philip Laine	079bce3c2f	Add commands to discover and write Kubernetes configuration (#6260 )	2026-06-10 15:00:10 +02:00
Maycon Santos	1a09aa6715	[misc] Update Go toolchain version in go.mod (#6377 )	2026-06-10 14:50:57 +02:00
Maycon Santos	61abf5b9ea	[proxy] Use UUID for proxy ID generation (#6391 ) Use UUID for proxy ID instead of the second to avoid race conditions when running multiple nodes at the same time.	2026-06-10 13:35:26 +02:00
Boris Dolgov	e229050ba3	[proxy] Notify certificate ready for domains covered by the static certificate (#6389 )	2026-06-10 12:05:34 +02:00
Zoltan Papp	e919b2d55d	[client] Preserve posture checks on config-only sync updates (#6373 ) * [client] Preserve posture checks on config-only sync updates When management sends a MessageTypeControlConfig update (e.g. relay token rotation), the SyncResponse carries no NetworkMap and no Checks. Moving the updateChecksIfNew call after the nm == nil guard ensures posture checks are only updated when a full network map is present, preventing relay token rotation from silently clearing the previously applied posture check state. * [client] Clarify posture check update logic with explicit comment * [client] Extract NetBird config and sync persistence into helpers Move the NetbirdConfig handling block out of handleSync into updateNetbirdConfig and the sync response persistence into persistSyncResponse, mirroring updateChecksIfNew. This flattens handleSync and makes the individual update steps unit-testable.	2026-06-10 11:43:24 +02:00
Pascal Fischer	a40028092d	[management] log user agent and return request id (#6380 )	2026-06-09 15:24:26 +02:00
Pascal Fischer	13200265d8	[proxy] Add no-blocking mapping updates (#6369 )	2026-06-09 13:57:17 +02:00
Viktor Liu	ed7a9363aa	[management] Emit IPv6 default permit firewall rule for exit node routes (#6368 )	2026-06-09 13:26:43 +02:00
Viktor Liu	d56859dc5d	[client] Filter DNS fallback upstreams matching our server IP to prevent loops (#6183 )	2026-06-09 12:26:03 +02:00
Viktor Liu	367d37050b	[relay, client] Fall back to WebSocket relay transport on oversized QUIC datagrams (#6339 )	2026-06-09 10:25:46 +02:00
Viktor Liu	106527182f	[client] Snapshot iptables rule maps before persisting state (#6345 )	2026-06-09 10:24:51 +02:00
Viktor Liu	8e1d5b78c2	[client] Preserve user deselect-all across management route sync (#6363 )	2026-06-09 10:24:17 +02:00
PizzaLovingNerd	d3b63c6be9	[infrastructure] Better support for atomic distros in install.sh, docker fixes in getting-started.sh (#6139 ) * Made the docker check first for getting-started.sh, better atomic support for install.sh * Check for docker socket perms * Added fallback for systems without rpm-ostree or bootc. * macOS fix for docker socket check * Change error message for docker group. No longer using a blanket recommendation for the docker group.	2026-06-08 21:38:46 +02:00
Maycon Santos	60d2fa08b0	[client] Mask sensitive data in debug bundle creation (#6364 ) * [client] Mask sensitive data in debug bundle creation * Avoid nil reference in turn and use masked constant	2026-06-08 13:17:04 +02:00
Maycon Santos	1e7b16db0a	[management] resolve private services on custom domains in synthesized DNS zones (#6348 ) private services on a custom domain didn't resolve on clients: the synthesized DNS zone was anchored to the cluster, and the account's custom domains weren't even loaded. - account.go: SynthesizePrivateServiceZones now keys zones by a resolved apex (privateServiceDomainZone): cluster suffix → registered account.Domains (filtered by matching TargetCluster, longest wins) → skip if none. One zone per apex; custom-domain services group under their registered domain. - sql_store.go: GetAccount now loads account.Domains on both loaders (gorm Preload("Domains") + pgx goroutine via ListCustomDomains; errChan buffer bumped 12→16). This was the reason the deploy didn't work; the relation was empty in prod. - Tests: custom-domain zone synthesis cases (apex resolution, free+custom separation, sibling collapse, cluster mismatch, mixed cluster/custom/public) + GetAccount domain-preload tests on sqlite and Postgres. v0.72.2	2026-06-06 12:56:01 +02:00
Maycon Santos	b377d99933	[management] Copy private field on shallowCloneMapping (#6347 ) * [management] Copy private field on shallowCloneMapping added test to ensure clone handles new fields * Remove unnecessary debug logs from proxy service * Increase Wasm binary size limit to 60MB in build validation v0.72.1	2026-06-05 22:45:49 +02:00
Theodor Midtlien	512899d82d	[client] Prevent corruption from competing log rotation and improve debug bundle (#6214 ) * Adds heuristic to detect an edge case on Linux where a system has configured logrotate as a separate service to rotate log files which would mangle our client log files. If we detect logrotate being configured for netbird, we disable our rotation. * Adds new env var to disable log rotation: NB_LOG_DISABLE_ROTATION * Adds compressed and plain logrotate files to debug bundle. * Replaces lumberjack with timberjack (maintained fork with bug fixes and extra features). * Clarifies which daemon version is running in the bundle stats. * Change logging for client service status to console v0.72.0	2026-06-04 17:36:45 +02:00
Theodor Midtlien	5993ec6e43	[client] Allow wireguard port to be zero in UI and show port in status command (#6158 ) * Allow wireguard port to be set to 0 in UI * Add wireguard port to cmd status * Correct protoc version	2026-06-04 15:04:11 +02:00
Maycon Santos	eac6d501c3	[infrastructure] allow docker image overrides for getting started (#6335 ) * [infrastructure] allow docker image overrides for getting started Make dashboard and server image configurations overrideable via environment variables * [infrastructure] update Traefik gRPC rule to include ProxyService PathPrefix * make Traefik and CrowdSec images configurable via environment variables	2026-06-04 11:24:47 +02:00
Maycon Santos	deeae30612	[misc] Add Codecov integration and coverage reporting across workflows (#6333 )	2026-06-03 19:08:45 +02:00
Bethuel Mmbaga	f3cdf163e1	[management] Export ResolveDomain (#6334 )	2026-06-03 19:53:57 +03:00
Zoltan Papp	3e61ccb162	[client] Persist sync response via pluggable store (disk on iOS) (#6331 ) * Persist sync response via pluggable store (disk on iOS) The latest Management sync response (which carries the network map) was kept in memory for debug bundle generation. On memory-constrained platforms like iOS the network map can be large enough to matter. Introduce a syncstore package with a Store interface and two backends: a memory backend (the previous behavior) and a disk backend that serializes the response to a file in the state directory. The backend is selected per-platform at build time: disk on iOS, memory elsewhere. The disk store clears any leftover file on construction so a fresh store never reads stale data from an earlier run (e.g. another profile's network map). In the engine, drop the separate persistSyncResponse bool: the store is only instantiated while persistence is enabled, and its presence is what marks persistence as active. The store is also cleared on engine close so the file does not linger on disk. * syncstore: silence nilnil linter on "nothing stored" returns Get returns (nil, nil) to signal that nothing is stored, which is part of the Store contract and preserves the original behaviour. Annotate both backends with //nolint:nilnil so golangci-lint does not flag it. * syncstore: hold syncRespMux for the whole store Set/Get Both handleSync and GetLatestSyncResponse snapshotted e.syncStore under the read lock and then released it before calling Set/Get. That allowed SetSyncResponsePersistence(false) or engine close to clear the store mid-call. In particular a concurrent Clear()+nil followed by a late Set could re-create the file that was just removed, defeating the leak/lingering protection. Hold syncRespMux for the duration of the store operation in both spots so the store cannot be cleared while a Set/Get is in flight. * syncstore: avoid StateDir "." 
when state path is empty On mobile the state path may be empty (the engine tolerates a missing state file). filepath.Dir("") returns ".", which would make a disk-backed syncstore write into the working directory instead of letting NewDiskStore fall back to os.TempDir(). Only set engineConfig.StateDir when path is non-empty.	2026-06-03 14:18:50 +02:00
Viktor Liu	a48c20d8d8	[client] Gate DNS forwarder on BlockInbound (#6257 )	2026-06-03 11:33:29 +02:00
Riccardo Manfrin	2b57a7d43b	[client, management, misc] expose VCS revision in dev build version output (#6263 ) * Refactor to use a common checker for development version * Adds commit sha to development version for cobra command only Leave dashboard unaffected * Adjust for "v0.31.1-dev" test case which must be considered pre-release * Drop synthetic "dev"/"0.50.0-dev" firewall feature-gate fixtures These test cases encoded the loose strings.Contains(v, "dev") semantics inherited from peerSupportedFirewallFeatures, but NetbirdVersion() never produces those values; only the literal "development" (and now "development-<sha>[-dirty]") ever flows through the wire. The agent owns the semantics of an ephemeral development build, so the tests should exercise the strings we actually emit. Replaced with development, development-<sha> and development-<sha>-dirty cases that match the HasPrefix("development") predicate introduced upstream. * Remove nonexistent tests on wire format The sha / dirty flag are added only when the CLI asks the version. Account versions are unaffected and can only strictly match "development" * Adds tests for IsDevelopmentVersion	2026-06-03 08:56:50 +02:00
Maycon Santos	fa1e241aea	[management, client, proxy] Follow-up fixes for private reverse-proxy services (#6268 ) * fix(proxy): gate tunnel-peer fast-path on inbound listener marker forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT source IP, so a public client whose address happened to fall in those ranges could bypass the configured operator auth scheme by colliding with a known tunnel IP. The fast-path is now gated on TunnelLookupFromContext(r.Context()) being present — that context value is attached only by the per-account inbound (overlay) listener, so the host-facing listener never enters this branch. Tests updated to reflect the new requirement: requests that don't carry the inbound marker now fall through to the regular auth flow. * fix(proxy): harden inbound listener resource + startup-ctx handling Three correctness fixes on the per-account inbound path, with tests: - Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands back an io.PipeWriter backed by a pipe + scanner goroutine that the caller owns; the two writers per account (https + plain) were never closed, leaking the pipe and goroutine on every teardown. - Run the post-Start hooks on context.Background(). runClientStartup is launched in a goroutine from AddPeer and was inheriting the caller's request-scoped ctx, so a cancelled request could abort the inbound bring-up or fail the management status notification. The tail is split into notifyClientReady so the contract is testable. Tests cover the PipeWriter close behaviour and assert the readyHandler + NotifyStatus calls receive a non-cancelled background context. feat(proxy): short-circuit peer-own-target loops with 421 When a peer that hosts the target of a private service dials its own service URL the request was being looped through the proxy and back over WireGuard to the same peer — twice the WG round-trip for no benefit, with no signal to the caller that something was wrong. 
Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request arrived on the per-account overlay listener (IsOverlayOrigin) and the source tunnel IP matches the target host, refuse the request with 421 Misdirected Request and a body pointing the operator at the backend directly. The gate is scoped to overlay origin so requests on the public listener that happen to share a source IP with the target host are forwarded normally. * fix(management): private-service validation + tunnel-IP lookup semantics - Require an explicit port for L4 cluster targets. validateL4Target exempted TargetTypeCluster from the port check, but buildPathMappings serializes every L4 target via net.JoinHostPort(host, port) — port=0 shipped a ":0" upstream. Cluster targets use the same Host/Port fields, so the same requirement applies. - GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping every error to Internal. The proxy's ValidateTunnelPeer probes IPs that legitimately aren't in the roster; the miss is expected and now distinguishable from a real store failure. - Thread ctx into getClusterCapability's gorm query so a cancelled request doesn't keep the store busy. Tests updated for the L4-cluster port requirement and the GetPeerByIP NotFound path. * fix(client): include offlinePeers in PeerStateByIP lookup ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP only scanned d.peers. Callers (the local DNS filter via localPeerConnectivity, embed.Client.IdentityForIP used by the proxy's tunnel-peer validator) were treating known-but-offline peers as unknown, which: - causes the DNS filter to keep returning records pointing at peers that have no live tunnel, AND - makes the proxy's local-roster check deny a request from such a peer rather than letting the cached management RPC carry the authorisation decision. Search both slices in PeerStateByIP. Adds a unit test for the IPv4 and IPv6 offline-match paths. 
* fix(rest): reject empty Delete path params in reverse-proxy clients ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed the path parameter into url.PathEscape without an empty check. PathEscape("") returns "" which collapses the request onto the collection endpoint ("/api/reverse-proxies/clusters/" / "/api/reverse-proxies/proxy-tokens/"), so a caller bug (a delete with no id) reached a routable URL with surprising semantics (typically 405). Short-circuit with a typed error before the request is built. Tests mount a handler on the collection path that fails the test if hit, so the regression is impossible to reintroduce silently. * chore(api,ci,docs,test): private-service schema, proto-check, fixups Non-functional cleanups and contract/CI hardening around the private-service work: API schema (openapi.yml): - Require a non-empty access_groups and mode=http when private=true, on both Service and ServiceRequest, mirroring validatePrivateRequirements. mode stays optional-but-constrained (empty defaults to http server-side), matching runtime. CI (proto-version-check.yml): - Cover renamed .pb.go files (read base via previous_filename). - Match protoc-gen-go-grpc version headers (optional "- " prefix and -gen-go-grpc suffix) so grpc-generated files are in scope. Docs / comments: - Reword Config field docs to say defaults are applied at Server.Start (initDefaults), not New. - Rename the obsolete --private-inbound flag to --private across comments and the proto doc. Pre-existing test fixups surfaced by review: - Repair the integration-tagged validate_session_test.go (SignToken signature growth + new Manager interface methods). - Fix the CI-skip boolean precedence so Windows isn't skipped unconditionally. - Guard the router.HTTPListener type assertion with comma-ok.
* fix(proxy): background ctx for already-started AddPeer notification The earlier ctx fix covered the async runClientStartup path but missed the synchronous branch: when a service is added to an already-started client, AddPeer called NotifyStatus with the caller's request-scoped ctx. A cancelled request/stream could drop the connected notification to management. Use context.Background() here too, matching notifyClientReady. Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to pass a pre-cancelled caller ctx and assert the notification still ran on a non-cancelled context. * use the cmd context for roundtripper	2026-06-02 13:40:09 +02:00
Viktor Liu	e7c9182ff9	[client] Offer injected ICMPv6 echo replies to packet capture (#6321 )	2026-06-01 19:38:00 +02:00
Pascal Fischer	9189625487	[management] enrich context in permissions manager (#6286 )	2026-05-29 16:36:38 +02:00
Bethuel Mmbaga	e9dbf9db6f	[management] Extend combined server initialization (#6156 )	2026-05-29 17:35:35 +03:00
Theodor Midtlien	5a9e9e7bc9	[Infrastructure] Pin actions with SHA and improve workflows (#6249 ) * Pin actions with SHA, replace unmaintained, add dependabot for actions * Update FreeBSD to version 15 for tests * Use shared actions * Update sign-pipelines version	2026-05-29 15:24:30 +02:00
Viktor Liu	43e041cf9f	[client] Apply netroute unspecified-destination workaround on android (#6192 )	2026-05-29 15:15:22 +02:00
Viktor Liu	77e5693200	[client] Recognize NetBird DNS forwarder port in capture text format (#6177 )	2026-05-29 15:14:32 +02:00
Zoltan Papp	174dc24867	[management] Add SSO session extend flow (management) (#6197 ) * add SSO session extend flow (management) Adds the management-server half of the SSO session-extension feature: - New ExtendAuthSession gRPC RPC that refreshes a peer's session expiry using a fresh JWT, validated through the same pipeline as Login but without tearing down the tunnel or redoing the NetworkMap sync. - Per-peer SessionExpiresAt timestamp on every LoginResponse and SyncResponse so connected clients learn the deadline on the existing long-lived stream, and admin-side changes (toggling expiration, changing the expiration window) reach every peer within seconds. - SessionExpiresAt(...) helper on Peer that derives the absolute UTC deadline from LastLogin + the account-level PeerLoginExpiration setting, returning zero when the peer is not SSO-tracked or expiration is disabled. The matching client-side consumer of these fields lands separately. * encode SessionExpiresAt as 3-state on the wire Previously the `sessionExpiresAt` field on LoginResponse, SyncResponse and ExtendAuthSessionResponse was 2-state: a valid timestamp meant "new deadline", and nil meant "clear". That conflated two distinct meanings — "no info in this snapshot" vs "expiry is explicitly off / peer is not SSO-tracked" — so a Sync push that legitimately couldn't compute the deadline (settings lookup failed) would silently clear the client's anchor and lose the warning window. Three states now, encoded on the same field number (no .proto schema churn — only comments and the server-side encoder change): - nil pointer (field absent) → "no info"; client preserves anchor - &Timestamp{} (seconds=0, nanos=0) → explicit "disabled / not SSO" sentinel; client clears - valid timestamp → new absolute UTC deadline A new encodeSessionExpiresAt helper centralises the zero/non-zero encoding and is shared by the Sync, Login and ExtendAuthSession builders. The Sync builder still emits nil when settings are missing. 
Login and ExtendAuthSession always carry an authoritative value. The matching client-side decoder lands on feature/session-extend. * add UserExtendedPeerSession activity event ExtendAuthSession previously reused UserLoggedInPeer for its audit record, which conflated two distinct user actions: a full interactive SSO login (tunnel re-established, network map resync) versus an in-place deadline refresh (tunnel untouched). Auditors reading the log couldn't tell which one happened, and downstream dashboards/alerts on "login" volume were polluted by routine extends. Adds a dedicated UserExtendedPeerSession Activity (code 125, "user.peer.session.extend") and switches ExtendPeerSession over to it. The peer-extend audit trail is now distinguishable from interactive logins. * make ExtendAuthSession JWT-retry backoff cancellable Skip the retry log and 200ms wait on the final attempt, and replace the uncancellable time.Sleep with a select on time.After/ctx.Done so an upstream cancellation aborts the wait instead of running it to completion.	2026-05-28 19:14:14 +02:00
Riccardo Manfrin	7ea5e37dd4	[client] Improve rosenpass support (#6136 ) * Updates rosenpass version go-rosenpass v0.4.0 → v0.5.42 bump — detailed findings Change summary cunicu.li/go-rosenpass v0.4.0 → v0.5.42 (target) cilium/ebpf v0.15.0 → v0.19.0 (transitive) gopacket/gopacket v1.1.1 → v1.4.0 (transitive) wireguard 2023-07 → 2023-12 (transitive) wireguard/wgctrl 2023-04 → 2024-12 (transitive) Wire interop v0.4.0 (in v0.70.5) <-> v0.5.42 OK v0.5.42 <-> v0.5.42 OK Quantum resistance: true both ends --- Replay error eliminated. Before (on v0.4.0): `ERROR Failed to handle message: failed to load biscuit (ICR1): detected replay` Recurring every ~50ms for minutes at a time. Gone entirely after both ends upgraded to v0.5.42. Upstream fix in biscuit/replay handling between v0.4.x and v0.5.x series. * Fixup [::]:port socket trying to send to v4 * Adds more tests on netbird<->rosenpass interactions * Anticipates rp handler creation before generateConfig * [client] Moves deterministic key gen into rosenpass * go mod tidy * Adds reminder to reason about rosenpass surface area * Apply code rabbit suggestions	2026-05-28 09:01:18 +02:00

2957 Commits