* [client] propagate exit-node deselect to synthesized v6 (::/0) route
When a client deselects an IPv4 exit node, the auto-generated IPv6 default
route (::/0) was still selected and pushed onto the tunnel interface, even
though the user disabled the exit node. On an exit node without a real IPv6
egress this blackholes IPv6 traffic, and because clients prefer IPv6 (happy
eyeballs) it can break general connectivity.
Root cause: the synthesized v6 route gets a different NetID than its v4 base
(base + "-v6"). The route selector keys deselects by NetID and defaults
unknown NetIDs to selected, so the "-v6" entry was never matched by the v4
deselect. The effectiveNetID() mirror that solves exactly this is used by
HasUserSelectionForRoute and FilterSelectedExitNodes, but categorizeUserSelection
called the raw IsSelected(), bypassing it and mis-categorizing the v6 pair as
user-selected.
Add RouteSelector.IsSelectedForExitNode(), which applies effectiveNetID before
the selection check, and use it in categorizeUserSelection. IsSelected() is left
untouched so non-exit code paths don't make unrelated "*-v6" routes inherit v4
state. Adds regression tests for the v4/v6 deselect mirror and explicit-v6
override.
* [client] add DIAG logging to trace exit-node v6 (::/0) route filtering
Temporary diagnostics to find why a deselected v4 exit node's synthesized
::/0 route still reaches the tunnel. Logs the full install path: incoming
client networks, route-selector state before/after the management-driven
update, what updateExitNodeSelections deselects/selects, and per-route
KEEP/SKIP/DROP decisions in FilterSelectedExitNodes and applyExitNodeFilter.
To be reverted once the real root cause is confirmed from a client log.
* [client] clear orphaned v6 exit selection when v4 pair is toggled
Root cause of the leaking ::/0 route, confirmed from client logs: the
synthesized "-v6" exit route could stay explicitly selected in the persisted
route-selector state while its v4 base was deselected (selected=[...-v6],
deselected=[...v4base]). Because the v6 entry then has its own explicit state,
effectiveNetID stops mirroring the v4 base, so FilterSelectedExitNodes keeps
::/0 and it is installed on the tunnel even though the user disabled the exit
node. This happened because the iOS SDK's deselect only pairs the "-v6" sibling
via ExpandV6ExitPairs when the v6 route is present in the current routesMap; a
deselect at a moment it wasn't expanded left the v6 selection orphaned.
Fix at the selector write path so it is independent of routesMap timing: when a
v4 exit NetID is selected or deselected, clear any orphaned explicit state on
its "-v6" sibling (clearPairedV6Locked), unless the sibling is part of the same
batch (the deliberate ExpandV6ExitPairs case). The v6 then falls back to
inheriting the v4 base via effectiveNetID, so a v4 deselect also drops ::/0 and
a v4 select brings both back.
Adds regression tests: a stale explicit v6 selection is cleared by a later v4
deselect, and an explicit v6 select made in the same batch is preserved.
* [ios] compute route connection status in the bridge
The iOS bridge exposed a route's Network as a possibly comma-joined string
("0.0.0.0/0, ::/0" for a merged exit node) but no connection status, forcing
the UI to infer status by string-matching that joined value against peer
routes — which never matched for the merged exit node, leaving it stuck as
not-connected. Android already computes status in the core (findBestRoutePeer).
Mirror that here: add a Status field to RoutesSelectionInfo and compute it from
the connected peers' route tables, matching the route's primary prefix, a merged
exit node's extra v6 prefix, or a dynamic route's domain pattern (the key the
route manager records). The UI can now read the status directly.
* [client] remove exit-node v6 DIAG logging and tidy routeselector
Drop the temporary DIAG diagnostics added to trace the leaking ::/0 route
(the root cause is fixed and confirmed). Also reorganize routeselector.go so
the exit-node helpers (clearPairedV6Locked, isExitNode) sit next to the
exit-node code paths and MarshalJSON/UnmarshalJSON are grouped together.
* [client] mirror v4 exit selection onto v6 pair at write time
The synthesized "-v6" exit route shares its v4 base's NetID plus a "-v6"
suffix. Selection state was reconciled at read time via effectiveNetID, a
mirror that could only be applied on exit-node code paths, which forced a
parallel IsSelectedForExitNode() alongside IsSelected() and a clearPairedV6Locked()
orphan cleanup on every toggle. That machinery still missed the case observed
in the field: a persisted state with the v4 base deselected but its "-v6"
sibling explicitly selected (orphaned). Because effectiveNetID returns the v6
entry itself once it carries explicit state, and clearPairedV6Locked only fires
on a live toggle, the loaded orphan survived and the ::/0 route leaked onto the
tunnel despite the exit node being disabled, breaking IPv6 (happy eyeballs).
Treat the v4/v6 exit pair as a single toggle and keep state consistent at write
time instead. RouteSelector.SyncPairedSelection forces the "-v6" entry to match
its v4 base unconditionally, resetting any orphaned explicit state. The route
manager, which knows the route prefixes, computes the pairs (V6ExitMergeSet) and
calls it from updateRouteSelectorFromManagement before selection is read, so both
collectExitNodeInfo and FilterSelectedExitNodes see consistent state, including
pairs loaded from persisted selector state.
This removes effectiveNetID, IsSelectedForExitNode and clearPairedV6Locked; the
selector is literal again and no longer needs the "exit-node paths only" caveat.
HasUserSelectionForRoute and applyExitNodeFilter use the raw NetID.
Adds a selector test for SyncPairedSelection (including the orphaned-v6 case) and
a route-manager test reproducing the persisted-orphan scenario from the field log.
* [client] add DIAG logging to trace v6 exit-pair mirror
The write-time mirror did not eliminate the leak in field testing. Re-add the
DIAG diagnostics around the exit-node selection flow to capture a fresh trace:
- UpdateRoutes: incoming client networks, selector state before/after the
management update, and the networks remaining after FilterSelectedExitNodes.
- mirrorV6ExitPairSelections: the NetIDs present in this update and the v6 pairs
V6ExitMergeSet derives from them (reveals whether the v4 base and its ::/0 pair
are present in the same update so the pair can be matched).
- SyncPairedSelection: the base/paired state before and after the sync.
- FilterSelectedExitNodes / applyExitNodeFilter: per-route SKIP/KEEP/DROP and the
selection lookups behind each decision.
- updateExitNodeSelections / logExitNodeUpdate: categorization and deselect set.
Temporary; to be removed once the root cause is confirmed.
* [client] remove v6 exit-pair mirror DIAG logging
Drop the temporary DIAG diagnostics added to trace the v4/v6 exit-pair mirror.
The field log confirmed the write-time mirror keeps the pair consistent (the
::/0 route is only ever applied alongside its v4 base and is dropped on deselect),
so the diagnostics are no longer needed.
* fix(management): treat ci- builds as development for remote jobs
CI snapshot builds use a "ci-<sha>" version string that did not match
IsDevelopmentVersion, so the remote-jobs minimum-version gate rejected
them. Recognize the "ci-" prefix as a development build.
* fix(management): treat dev- builds as development for remote jobs
Dev snapshot builds use a "dev-<sha>" version string that did not match
IsDevelopmentVersion, so the remote-jobs minimum-version gate rejected
them. Recognize the "dev-" prefix as a development build, alongside the
existing "ci-" prefix.
When JWT group sync is enabled with a restrictive JWTAllowGroups list, the local owner of an embedded-IdP (Dex) deployment can get locked out. The allow-groups check runs account-wide but local password users do not receive
external IdP group claims, so they can't satisfy the allowed list.
This skips JWT group evaluation for local Dex users so the restriction and JWT group sync continue to apply to external-IdP users as intended.
* Initial scaffolding
* Applies MDM override
* Unit tests
* Helpers business logic
* Return error if trying to modify any config that is gated by MDM
* Add ManagedFields to returned config over GetConfig
* Adds initial 101 MDM policy business logic testing
* gRPC MDM changes
* MDM Name scoping for clarity
* Implements windows loading of MDM policy
* Adds missing WGPort config
* Cleanup setupKey to align to linear
* Align split tunnel code
* Adds some log
* Prefix every log with MDM
* Adds debug config cobra command
This can be useful for troubleshooting and checking config
now that its resolution is not trivial
defaults > config > env cars > CLI/UI > MDM
* Adds MDM 1m diff checker & reloader
* Adds also up/start after cancel
* Publishes event for UI to sync upon MDM changes
* Add events to resync UI to actual config
This also provide fixup for UI no aligning to changed config when coming from cli up with config flags.
* UI behavior conflicts relaxation
UI sends full config snapshot with all values. It doesn't
make sense to block it if the values are aligned with the
values constrained by the MDM policy. It's just simplier
to allow values that are compliant. (this goes for the CLI
as well at this point)
* Lock toggle Settngs
* Advanced Settings locking
* Fixup presharedkey
* Apply MDM locks
* Toggle gray in/out for Advanced Settings
* Adds support for disabling of Profiles and UpdateSettings feature flags
* Adds Gate Login as well when --disable-update-settings=true is given to service
This commit tries to settle things with an old PR-4237 which had relaxed
the case where the SetConfig returned an `Unavailable` code error.
Under this circumnstance the PR allowed the upFunc to just emit a warning and
progress further with the login gRPC. Since the login call is consuming
the --management-url coming from the `up` command, it might be possible
to abuse the "Unavailable" code to inject a management URL that is different
from the configured one even though the --disable-update-settings is set
to true (?)
* Evaluate disable-update-settings errors only when there's an actual override
* [UI] Fixup advanced Settings
* [UI] Fixup for preshared key
* [UI] Fixup for profile enable/disable toggle
We need to align the initial state to evaluate the delta in case.
The initial state has to be "true" since the profile starts visible.
Then we receive MDM and transition the cache bool value to the actual
MDM imposed state
* Enforces disable networks
* [UI] Aligns to "enable/disable once on change only"
* Fixup: MDM wins. always
* Removes --disable-advanced-settings
It was a typo in our meetings. the actual thing is --disable-update-settings
* [PROTO] Removes --disable-advanced-settings
* [UI] Removes --disable-advanced-settings
* Pins feat profile retrieval to notif event
* [UI] Fix for "hide" not working when propagating to parent with children
* Adds dep for reading plist files
* Introduces support for darwing plist loading
* Tests MDM config reload via ticker
* [PROVISIONING] ADMX/ADML/PS/bash scripts/templates
* CI fixes
- Add docstrings to `mdm_integration`
- refactor for cognitive complexity
- mod tidy
* Linting
* Add docstrings to `mdm_integration`
* nil,nil is no policy and no error. Allow it
* nil,nil is no policy and no error. Allow it
* exclude MDM profile adminstrated keys data from debug bundle
* Fixes Rosenpass left disable after MDM unlock
* Partial revert coderabbit added docstrings
* Renaming fix
* Avoid locking on clientRunning bool when the connection is aborted for whatever reason
We want to just signal this through the giveUpChan, we will manage the signal from
the waiter side and in case set it to false there. THis way we avoid locking,
which should allow the MDM down+wait_for_term_chan_signal_+up procedure
clientRunning is used to signal two different conditions here:
1. the initialization procedure is over (we have an engine)
2. the connection being up (or being attempted)
Probably these two functionalities should not alias, and the failure of the second condition
(because of any error) should just drive a reconnection (currently it's not happening,
and we silently go idle).
OR, mor probably, the two things are the SAME and there should not exist a case where
we did the "Up" initialization and connection attempt but we are not still attempting it.
* Moves test helper at te very bottom
* Addresses github comments
* No lock no copy
* Prevents engine not stopping within 10 secs from being paired by another instance
We instead juts SKIP updating the policy, so
1. the MDM ticker will kick in 1 minute time,
2. find the policy misaligned,
3. enter the onMDMPolicyChange,
4. find the s.clientRunning == true
(because it is set to false only in server cleanupConnection,
and not by s.actCancel())
5. call s.actCancel() again if not nil
6. immediately return from <-s.clientGiveUpChan
7. finally call s.restartEngineForMDMLocked()
* Since we ARE running there should be a config
If the config was cancelled midflight, connect will abort later on
* DisableAutoConnect should not stop a running connection.
DisableAutoConnect should just avoid the connection attempts *when the service starts*.
If we are started and we are up and running, DisableAutoConnect should not kick in.
Another PR will follow about this topic
* Removes unused vars
* Moves callback into Run method arg
* align comment to removal of DisableAutoConnect
DisableAutoConnect should just avoid the connection attempts *when the service starts*.
If we are started and we are up and running, DisableAutoConnect should not kick in
* Removes unused managed_fields data.
This was initially used to drive the UI but approach changed
to reload config/features upon notifications which makes this data redundant.
* Reorder stuff
* Unexport unrequired vars/functions
PoliciesEqual → policiesEqual
AllKeys → allKeys
* Adds list of MDM managed fields in the debug bundle
* [client] Index peer tunnel IPs for O(1) PeerStateByIP lookup
Replace the linear scan over all peers with an ipToKey map maintained
by AddPeer/RemovePeer, covering both IPv4 and IPv6 tunnel addresses.
Offline peers are intentionally no longer resolvable by IP: only active
peers can carry traffic, so IdentityForIP and the DNS disconnected-peer
filter now treat them as unknown, same as foreign IPs.
Skip the DNS answer filter for single-record responses; dropping the
only answer was always restored by the empty-answer escape hatch, so
the fast path is behavior-neutral.
* Ensure `ipToKey` entries are only removed if they match the peer being deleted, preventing accidental removal of unrelated mappings.
- Engine.Start takes syncMsgMux with a deferred unlock (engine.go:445) and parks in receiveSignalEvents → WaitStreamConnected (engine.go:1762), which only wakes on
signal-stream connect or client-context cancellation.
- When signal never connects, the 30s startup timeout fires and embed.Client.Start's rollback (embed.go:281) called client.Stop() → Engine.Stop, which blocks acquiring
syncMsgMux (engine.go:318). The cancel() that would unpark Start was deferred until Start returned — permanent cycle. RemovePeer calls (g43/g385) then queue behind the
lifecycle mutex.
- Notably, embed.Client.Stop and the daemon's cleanupConnection both cancel before stopping — the startup rollback was the only path that didn't.
- Engine.Start takes syncMsgMux with a deferred unlock (engine.go:445) and parks in receiveSignalEvents → WaitStreamConnected (engine.go:1762), which only wakes on
signal-stream connect or client-context cancellation.
- When signal never connects, the 30s startup timeout fires and embed.Client.Start's rollback (embed.go:281) called client.Stop() → Engine.Stop, which blocks acquiring
syncMsgMux (engine.go:318). The cancel() that would unpark Start was deferred until Start returned — permanent cycle. RemovePeer calls (g43/g385) then queue behind the
lifecycle mutex.
- Notably, embed.Client.Stop and the daemon's cleanupConnection both cancel before stopping — the startup rollback was the only path that didn't.
* [management] Add version gate to stop sending deprecated RemotePeers field
don't send top-level remote peers on peers in the v0.29.3 or newer
* precompute deprecated remote peers version constraint
* [management] update tests to validate network map-based remote peers
* [management] move deprecatedRemotePeersVersion constant closer to its usage
* fix misplaced precomputed constraint definition
* ensure top-level RemotePeers is empty for v0.29.3+ clients
* [client] Preserve posture checks on config-only sync updates
When management sends a MessageTypeControlConfig update (e.g. relay token
rotation), the SyncResponse carries no NetworkMap and no Checks. Moving the
updateChecksIfNew call after the nm == nil guard ensures posture checks are
only updated when a full network map is present, preventing relay token
rotation from silently clearing the previously applied posture check state.
* [client] Clarify posture check update logic with explicit comment
* [client] Extract NetBird config and sync persistence into helpers
Move the NetbirdConfig handling block out of handleSync into
updateNetbirdConfig and the sync response persistence into
persistSyncResponse, mirroring updateChecksIfNew. This flattens
handleSync and makes the individual update steps unit-testable.
* Made the docker check first for getting-started.sh, better atomic support for install.sh
* Check for docker socket perms
* Added fallback for systems without rpm-ostree or bootc.
* macOS fix for docker socket check
* Change error message for docker group.
No longer using a blanket recommendation for the docker group.
private services on a custom domain didn't resolve on clients — the synthesized DNS zone was anchored to the cluster, and the account's custom domains weren't even
loaded.
- account.go — SynthesizePrivateServiceZones now keys zones by a resolved apex (privateServiceDomainZone): cluster suffix → registered account.Domains (filtered by matching
TargetCluster, longest wins) → skip if none. One zone per apex; custom-domain services group under their registered domain.
- sql_store.go — GetAccount now loads account.Domains on both loaders (gorm Preload("Domains") + pgx goroutine via ListCustomDomains; errChan buffer bumped 12→16). This was
the reason the deploy didn't work — the relation was empty in prod.
- Tests — custom-domain zone synthesis cases (apex resolution, free+custom separation, sibling collapse, cluster mismatch, mixed cluster/custom/public) + GetAccount
domain-preload tests on sqlite and Postgres.
* [management] Copy private field on shallowCloneMapping
added test to ensure clone handles new fields
* Remove unnecessary debug logs from proxy service
* Increase Wasm binary size limit to 60MB in build validation
* Adds heuristic to detect an edge case on Linux where a system has configured logrotate as a separate service to rotate log files which would mangle our client log files. If we detect logrotate being configured for netbird, we disable our rotation.
* Adds new env var to disable log rotation: NB_LOG_DISABLE_ROTATION
* Adds compressed and plain logrotate files to debug bundle.
* Replaces lumberjack with timberjack (maintained fork with bug fixes and extra features).
* Clarifies which daemon version is running in the bundle stats.
* Change logging for client service status to console
* [infrastructure] allow docker image overrides for getting started
Make dashboard and server image configurations overrideable via environment variables
* [infrastructure] update Traefik gRPC rule to include ProxyService PathPrefix
* make Traefik and CrowdSec images configurable via environment variables
* Persist sync response via pluggable store (disk on iOS)
The latest Management sync response (which carries the network map) was
kept in memory for debug bundle generation. On memory-constrained
platforms like iOS the network map can be large enough to matter.
Introduce a syncstore package with a Store interface and two backends:
a memory backend (the previous behavior) and a disk backend that
serializes the response to a file in the state directory. The backend
is selected per-platform at build time: disk on iOS, memory elsewhere.
The disk store clears any leftover file on construction so a fresh
store never reads stale data from an earlier run (e.g. another
profile's network map).
In the engine, drop the separate persistSyncResponse bool: the store is
only instantiated while persistence is enabled, and its presence is
what marks persistence as active. The store is also cleared on engine
close so the file does not linger on disk.
* syncstore: silence nilnil linter on "nothing stored" returns
Get returns (nil, nil) to signal that nothing is stored, which is part
of the Store contract and preserves the original behaviour. Annotate
both backends with //nolint:nilnil so golangci-lint does not flag it.
* syncstore: hold syncRespMux for the whole store Set/Get
Both handleSync and GetLatestSyncResponse snapshotted e.syncStore under
the read lock and then released it before calling Set/Get. That allowed
SetSyncResponsePersistence(false) or engine close to clear the store
mid-call. In particular a concurrent Clear()+nil followed by a late
Set could re-create the file that was just removed, defeating the
leak/lingering protection.
Hold syncRespMux for the duration of the store operation in both spots
so the store cannot be cleared while a Set/Get is in flight.
* syncstore: avoid StateDir "." when state path is empty
On mobile the state path may be empty (the engine tolerates a missing
state file). filepath.Dir("") returns ".", which would make a
disk-backed syncstore write into the working directory instead of
letting NewDiskStore fall back to os.TempDir().
Only set engineConfig.StateDir when path is non-empty.
* Refactor to use a common checker for development version
* Adds commit sha to development version for cobra command only
Leave dashboard unaffected
* Adjust for "v0.31.1-dev" test case
which must be considered pre-release
* Drop synthetic "dev"/"0.50.0-dev" firewall feature-gate fixtures
These test cases encoded the loose strings.Contains(v, "dev")
semantics inherited from peerSupportedFirewallFeatures, but
NetbirdVersion() never produces those values — only the literal
"development" (and now "development-<sha>[-dirty]") ever flows
through the wire. The agent owns the semantics of an ephemeral
development build, so the tests should exercise the strings we
actually emit.
Replaced with development, development-<sha> and
development-<sha>-dirty cases that match the HasPrefix("development")
predicate introduced upstream.
* Remove unexistent tests on wire format
The sha / dirty flag are added only when the CLI asks the version.
Account versions is unaffacted and can only strictly match "development"
* Adds tests for IsDevelopmentVersion
* fix(proxy): gate tunnel-peer fast-path on inbound listener marker
forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT
source IP, so a public client whose address happened to fall in those
ranges could bypass the configured operator auth scheme by colliding
with a known tunnel IP. The fast-path is now gated on
TunnelLookupFromContext(r.Context()) being present — that context value
is attached only by the per-account inbound (overlay) listener, so the
host-facing listener never enters this branch.
Tests updated to reflect the new requirement: requests that don't
carry the inbound marker now fall through to the regular auth flow.
* fix(proxy): harden inbound listener resource + startup-ctx handling
Three correctness fixes on the per-account inbound path, with tests:
- Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands
back an *io.PipeWriter backed by a pipe + scanner goroutine that the
caller owns; the two writers per account (https + plain) were never
closed, leaking the pipe and goroutine on every teardown.
- Run the post-Start hooks on context.Background(). runClientStartup
is launched in a goroutine from AddPeer and was inheriting the
caller's request-scoped ctx, so a cancelled request could abort the
inbound bring-up or fail the management status notification. The
tail is split into notifyClientReady so the contract is testable.
Tests cover the PipeWriter close behaviour and assert the readyHandler
+ NotifyStatus calls receive a non-cancelled background context.
* feat(proxy): short-circuit peer-own-target loops with 421
When a peer that hosts the target of a private service dials its own
service URL the request was being looped through the proxy and back
over WireGuard to the same peer — twice the WG round-trip for no
benefit, with no signal to the caller that something was wrong.
Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request
arrived on the per-account overlay listener (IsOverlayOrigin) and the
source tunnel IP matches the target host, refuse the request with 421
Misdirected Request and a body pointing the operator at the backend
directly.
The gate is scoped to overlay origin so requests on the public
listener that happen to share a source IP with the target host are
forwarded normally.
* fix(management): private-service validation + tunnel-IP lookup semantics
- Require an explicit port for L4 cluster targets. validateL4Target
exempted TargetTypeCluster from the port check, but buildPathMappings
serializes every L4 target via net.JoinHostPort(host, port) — port=0
shipped a ":0" upstream. Cluster targets use the same Host/Port
fields, so the same requirement applies.
- GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping
every error to Internal. The proxy's ValidateTunnelPeer probes IPs
that legitimately aren't in the roster; the miss is expected and now
distinguishable from a real store failure.
- Thread ctx into getClusterCapability's gorm query so a cancelled
request doesn't keep the store busy.
Tests updated for the L4-cluster port requirement and the GetPeerByIP
NotFound path.
* fix(client): include offlinePeers in PeerStateByIP lookup
ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP
only scanned d.peers. Callers (the local DNS filter via
localPeerConnectivity, embed.Client.IdentityForIP used by the
proxy's tunnel-peer validator) were treating known-but-offline peers
as unknown, which:
- causes the DNS filter to keep returning records pointing at peers
that have no live tunnel, AND
- makes the proxy's local-roster check deny a request from such a
peer rather than letting the cached management RPC carry the
authorisation decision.
Search both slices in PeerStateByIP. Adds a unit test for the IPv4
and IPv6 offline-match paths.
* fix(rest): reject empty Delete path params in reverse-proxy clients
ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed
the path parameter into url.PathEscape without an empty check.
PathEscape("") returns "" which collapses the request onto the
collection endpoint ("/api/reverse-proxies/clusters/" /
"/api/reverse-proxies/proxy-tokens/"), so a caller bug delete with no
id reached a routable URL with surprising semantics (typically 405).
Short-circuit with a typed error before the request is built. Tests
mount a handler on the collection path that fails the test if hit, so
the regression is impossible to reintroduce silently.
* chore(api,ci,docs,test): private-service schema, proto-check, fixups
Non-functional cleanups and contract/CI hardening around the
private-service work:
API schema (openapi.yml):
- Require a non-empty access_groups and mode=http when private=true,
on both Service and ServiceRequest, mirroring
validatePrivateRequirements. mode stays optional-but-constrained
(empty defaults to http server-side), matching runtime.
CI (proto-version-check.yml):
- Cover renamed .pb.go files (read base via previous_filename).
- Match protoc-gen-go-grpc version headers (optional "- " prefix and
-gen-go-grpc suffix) so grpc-generated files are in scope.
Docs / comments:
- Reword Config field docs to say defaults are applied at Server.Start
(initDefaults), not New.
- Rename the obsolete --private-inbound flag to --private across
comments and the proto doc.
Pre-existing test fixups surfaced by review:
- Repair the integration-tagged validate_session_test.go (SignToken
signature growth + new Manager interface methods).
- Fix the CI-skip boolean precedence so Windows isn't skipped
unconditionally.
- Guard the router.HTTPListener type assertion with comma-ok.
* fix(proxy): background ctx for already-started AddPeer notification
The earlier ctx fix covered the async runClientStartup path but missed
the synchronous branch: when a service is added to an already-started
client, AddPeer called NotifyStatus with the caller's request-scoped
ctx. A cancelled request/stream could drop the connected notification
to management. Use context.Background() here too, matching
notifyClientReady.
Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to
pass a pre-cancelled caller ctx and assert the notification still ran
on a non-cancelled context.
* use the cmd context for roundtripper
* Pin actions with SHA, replace unmaintained, add dependabot for actions
* Update FreeBSD to version 15 for tests
* Use shared actions
* Update sign-pipelines version
* add SSO session extend flow (management)
Adds the management-server half of the SSO session-extension feature:
- New ExtendAuthSession gRPC RPC that refreshes a peer's session expiry
using a fresh JWT, validated through the same pipeline as Login but
without tearing down the tunnel or redoing the NetworkMap sync.
- Per-peer SessionExpiresAt timestamp on every LoginResponse and
SyncResponse so connected clients learn the deadline on the existing
long-lived stream, and admin-side changes (toggling expiration,
changing the expiration window) reach every peer within seconds.
- SessionExpiresAt(...) helper on Peer that derives the absolute UTC
deadline from LastLogin + the account-level PeerLoginExpiration
setting, returning zero when the peer is not SSO-tracked or expiration
is disabled.
The matching client-side consumer of these fields lands separately.
* encode SessionExpiresAt as 3-state on the wire
Previously the `sessionExpiresAt` field on LoginResponse, SyncResponse
and ExtendAuthSessionResponse was 2-state: a valid timestamp meant
"new deadline", and nil meant "clear". That conflated two distinct
meanings — "no info in this snapshot" vs "expiry is explicitly off /
peer is not SSO-tracked" — so a Sync push that legitimately couldn't
compute the deadline (settings lookup failed) would silently clear the
client's anchor and lose the warning window.
Three states now, encoded on the same field number (no .proto schema
churn — only comments and the server-side encoder change):
- nil pointer (field absent) → "no info"; client preserves anchor
- &Timestamp{} (seconds=0, nanos=0) → explicit "disabled / not SSO"
sentinel; client clears
- valid timestamp → new absolute UTC deadline
A new encodeSessionExpiresAt helper centralises the zero/non-zero
encoding and is shared by the Sync, Login and ExtendAuthSession
builders. The Sync builder still emits nil when settings are missing.
Login and ExtendAuthSession always carry an authoritative value.
The matching client-side decoder lands on feature/session-extend.
* add UserExtendedPeerSession activity event
ExtendAuthSession previously reused UserLoggedInPeer for its audit
record, which conflated two distinct user actions: a full interactive
SSO login (tunnel re-established, network map resync) versus an
in-place deadline refresh (tunnel untouched). Auditors reading the log
couldn't tell which one happened, and downstream dashboards/alerts on
"login" volume were polluted by routine extends.
Adds a dedicated UserExtendedPeerSession Activity (code 125,
"user.peer.session.extend") and switches ExtendPeerSession over to it.
The peer-extend audit trail is now distinguishable from interactive
logins.
* make ExtendAuthSession JWT-retry backoff cancellable
Skip the retry log and 200ms wait on the final attempt, and replace the
uncancellable time.Sleep with a select on time.After/ctx.Done so an
upstream cancellation aborts the wait instead of running it to
completion.
* Updates rosenpass version
go-rosenpass v0.4.0 → v0.5.42 bump — detailed findings
Change summary
cunicu.li/go-rosenpass v0.4.0 → v0.5.42 (target)
cilium/ebpf v0.15.0 → v0.19.0 (transitive)
gopacket/gopacket v1.1.1 → v1.4.0 (transitive)
wireguard 2023-07 → 2023-12 (transitive)
wireguard/wgctrl 2023-04 → 2024-12 (transitive)
Wire interop
v0.4.0 (in v0.70.5) <-> v0.5.42 OK
v0.5.42 <-> v0.5.42 OK
Quantum resistance: true both ends
---
**Replay error eliminated.**
Before (on v0.4.0):
`ERROR Failed to handle message: failed to load biscuit (ICR1): detected replay`
Recurring every ~50ms for minutes at a time. Gone entirely after both ends upgraded to v0.5.42. Upstream fix in biscuit/replay handling between v0.4.x and v0.5.x series.
* Fixup [::]:port socket trying to send to v4
* Adds more tests on netbird<->rosenpass interactions
* Anticipates rp handler creation before generateConfig
* [client] Moves deterministic key gen into rosenpass
* go mod tidy
* Adds reminder to reason about rosenpass surface area
* Apply code rabbit suggestions