* Migrate to profile ids
* Migrate android profile manager
* Clean up
* Fix review
* Add ID type
* Fix test and runes in ShortID()
* Fix profile switch on up and android comments
* Revert android profile to string id
* Fix feedback
* Fix UI feedback
* Fix id assignment
* Add renaming of profiles
* Fix review
* Remove ui binary
* Fix getProfileConfigPath not validating id
* Change resolve handle order and fix server merge problems
* Fix mdm test
- introduce variables to avoid publishing latest docker tags and installers
- Refactor .goreleaser.yaml to simplify docker configurations and add environment-driven flags
- removed management debug containers (it was doing only log var)
- Stopped building arm v6 32bits in favor of v7 32 bits for services (not client)
- Add target argument to docker files
* [client] fix iOS route-update reordering that black-holed IPv6 on exit-node disable
On iOS the route notifier delivered each prefix update from its own
fire-and-forget goroutine (notify -> `go func`), so Go provided no ordering
guarantee between consecutive updates. It also read currentPrefixes inside
that goroutine without holding the lock, racing the next OnNewPrefixes write.
On exit-node disable the core removes the default routes as two separate
prefix updates (0.0.0.0/0, then the synthesized ::/0). When the two
goroutines were reordered, the stale snapshot still containing ::/0 was
delivered last and clobbered the correct default-free one. iOS then kept the
::/0 default route on the tunnel with no exit node to carry it, black-holing
all IPv6 traffic while IPv4 recovered correctly.
Fix: deliver updates through a single worker goroutine fed by a buffered
channel, preserving production order, and snapshot the joined prefix string
under the mutex so it can't race a concurrent update. Buffered so producers
(which run under the route manager lock) don't block on the listener callback.
* [client] close iOS notifier delivery goroutine on Stop, unbounded queue
The delivery goroutine was never stopped, leaking on every engine
restart. Add Notifier.Close, called from the route manager Stop after
routing cleanup.
Replace the buffered update channel with a cond-driven linked-list
queue so route-update producers (running under the route manager lock)
never block when the listener callback is slow.
Expose a network-free login-required check backed by the in-memory status
recorder. Unlike IsLoginRequired(), which creates a fresh auth client and
performs a blocking network call, IsLoginRequiredCached() reports whether the
LAST observed management error was an auth failure (PermissionDenied/
InvalidArgument).
This lets the iOS connection listener detect a mid-session token expiry from
within onDisconnected during teardown without blocking on a slow or
unavailable network.
* Add iOS debug bundle support in Go
Thread cacheDir through NewClient -> RunOniOS -> MobileDependency.TempDir
so the iOS client can pass its sandbox-writable cache directory for
debug bundle zip file creation instead of os.TempDir().
Move log collection into platform-dispatched addPlatformLog():
- iOS: adds the file-based Go client log (with rotation, stderr/stdout
companions and anonymization handled by addLogfile) plus the Swift app
log (swift-log.log) written by the iOS app into the same log directory
- Other non-Android platforms: existing file-based log + systemd fallback
Narrow the debug_nonandroid.go build tag to !android && !ios so iOS no
longer attempts the systemd journal fallback.
Add a DebugBundle() entry point to the iOS Go client that generates a
bundle, uploads it and returns the upload key. It works with or without
a running engine: when the engine is up it reuses the live config, sync
response and client metrics; otherwise it loads the config from disk (or
the preloaded tvOS config). Guard the live config/ConnectClient behind a
state mutex since DebugBundle may run on a different thread.
* Include the iOS state file in the debug bundle
addStateFile() resolved the state path via ServiceManager.GetStatePath(),
which on iOS points at a hard-coded default that does not exist in the app
sandbox, so the state file was silently skipped.
Add an optional StatePath to GeneratorDependencies and use it when set,
falling back to the ServiceManager default otherwise. The iOS DebugBundle
passes the client's actual state file path (the App Group profile state),
matching the Android bundle which includes the state file.
* ios: enable sync response persistence for debug bundle
Turn on sync response persistence before starting the engine so
DebugBundle can include the network map. On iOS the store is disk-backed
(see syncstore) to keep the map out of the constrained process memory.
* ios: pass log file path through NewClient constructor (#6393)
Add logFilePath field to Client struct and expose it as a parameter
in NewClient so callers provide the Go log path at construction time.
Wire it into DebugBundle via GeneratorDependencies.LogPath so the
debug bundle includes client.log and swift-log.log regardless of
whether the bundle is triggered by the app or the management server.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* ios: pass log file path to engine for remote debug bundles
RunOniOS started the engine with an empty LogPath, so EngineConfig.LogPath
was never set. Management-triggered (jobs) debug bundles read the log path
from the engine config, so they collected no client logs (client.log,
rotated logs, swift-log.log). The GUI path was unaffected because it passes
c.logFilePath directly to the bundle generator.
Thread c.logFilePath through RunOniOS into the engine config so remote
bundles include the client logs too.
---------
Co-authored-by: evgeniyChepelev <68751844+evgeniyChepelev@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* [client] propagate exit-node deselect to synthesized v6 (::/0) route
When a client deselects an IPv4 exit node, the auto-generated IPv6 default
route (::/0) was still selected and pushed onto the tunnel interface, even
though the user disabled the exit node. On an exit node without a real IPv6
egress this blackholes IPv6 traffic, and because clients prefer IPv6 (happy
eyeballs) it can break general connectivity.
Root cause: the synthesized v6 route gets a different NetID than its v4 base
(base + "-v6"). The route selector keys deselects by NetID and defaults
unknown NetIDs to selected, so the "-v6" entry was never matched by the v4
deselect. The effectiveNetID() mirror that solves exactly this is used by
HasUserSelectionForRoute and FilterSelectedExitNodes, but categorizeUserSelection
called the raw IsSelected(), bypassing it and mis-categorizing the v6 pair as
user-selected.
Add RouteSelector.IsSelectedForExitNode(), which applies effectiveNetID before
the selection check, and use it in categorizeUserSelection. IsSelected() is left
untouched so non-exit code paths don't make unrelated "*-v6" routes inherit v4
state. Adds regression tests for the v4/v6 deselect mirror and explicit-v6
override.
* [client] add DIAG logging to trace exit-node v6 (::/0) route filtering
Temporary diagnostics to find why a deselected v4 exit node's synthesized
::/0 route still reaches the tunnel. Logs the full install path: incoming
client networks, route-selector state before/after the management-driven
update, what updateExitNodeSelections deselects/selects, and per-route
KEEP/SKIP/DROP decisions in FilterSelectedExitNodes and applyExitNodeFilter.
To be reverted once the real root cause is confirmed from a client log.
* [client] clear orphaned v6 exit selection when v4 pair is toggled
Root cause of the leaking ::/0 route, confirmed from client logs: the
synthesized "-v6" exit route could stay explicitly selected in the persisted
route-selector state while its v4 base was deselected (selected=[...-v6],
deselected=[...v4base]). Because the v6 entry then has its own explicit state,
effectiveNetID stops mirroring the v4 base, so FilterSelectedExitNodes keeps
::/0 and it is installed on the tunnel even though the user disabled the exit
node. This happened because the iOS SDK's deselect only pairs the "-v6" sibling
via ExpandV6ExitPairs when the v6 route is present in the current routesMap; a
deselect at a moment it wasn't expanded left the v6 selection orphaned.
Fix at the selector write path so it is independent of routesMap timing: when a
v4 exit NetID is selected or deselected, clear any orphaned explicit state on
its "-v6" sibling (clearPairedV6Locked), unless the sibling is part of the same
batch (the deliberate ExpandV6ExitPairs case). The v6 then falls back to
inheriting the v4 base via effectiveNetID, so a v4 deselect also drops ::/0 and
a v4 select brings both back.
Adds regression tests: a stale explicit v6 selection is cleared by a later v4
deselect, and an explicit v6 select made in the same batch is preserved.
* [ios] compute route connection status in the bridge
The iOS bridge exposed a route's Network as a possibly comma-joined string
("0.0.0.0/0, ::/0" for a merged exit node) but no connection status, forcing
the UI to infer status by string-matching that joined value against peer
routes — which never matched for the merged exit node, leaving it stuck as
not-connected. Android already computes status in the core (findBestRoutePeer).
Mirror that here: add a Status field to RoutesSelectionInfo and compute it from
the connected peers' route tables, matching the route's primary prefix, a merged
exit node's extra v6 prefix, or a dynamic route's domain pattern (the key the
route manager records). The UI can now read the status directly.
* [client] remove exit-node v6 DIAG logging and tidy routeselector
Drop the temporary DIAG diagnostics added to trace the leaking ::/0 route
(the root cause is fixed and confirmed). Also reorganize routeselector.go so
the exit-node helpers (clearPairedV6Locked, isExitNode) sit next to the
exit-node code paths and MarshalJSON/UnmarshalJSON are grouped together.
* [client] mirror v4 exit selection onto v6 pair at write time
The synthesized "-v6" exit route shares its v4 base's NetID plus a "-v6"
suffix. Selection state was reconciled at read time via effectiveNetID, a
mirror that could only be applied on exit-node code paths, which forced a
parallel IsSelectedForExitNode() alongside IsSelected() and a clearPairedV6Locked()
orphan cleanup on every toggle. That machinery still missed the case observed
in the field: a persisted state with the v4 base deselected but its "-v6"
sibling explicitly selected (orphaned). Because effectiveNetID returns the v6
entry itself once it carries explicit state, and clearPairedV6Locked only fires
on a live toggle, the loaded orphan survived and the ::/0 route leaked onto the
tunnel despite the exit node being disabled, breaking IPv6 (happy eyeballs).
Treat the v4/v6 exit pair as a single toggle and keep state consistent at write
time instead. RouteSelector.SyncPairedSelection forces the "-v6" entry to match
its v4 base unconditionally, resetting any orphaned explicit state. The route
manager, which knows the route prefixes, computes the pairs (V6ExitMergeSet) and
calls it from updateRouteSelectorFromManagement before selection is read, so both
collectExitNodeInfo and FilterSelectedExitNodes see consistent state, including
pairs loaded from persisted selector state.
This removes effectiveNetID, IsSelectedForExitNode and clearPairedV6Locked; the
selector is literal again and no longer needs the "exit-node paths only" caveat.
HasUserSelectionForRoute and applyExitNodeFilter use the raw NetID.
Adds a selector test for SyncPairedSelection (including the orphaned-v6 case) and
a route-manager test reproducing the persisted-orphan scenario from the field log.
* [client] add DIAG logging to trace v6 exit-pair mirror
The write-time mirror did not eliminate the leak in field testing. Re-add the
DIAG diagnostics around the exit-node selection flow to capture a fresh trace:
- UpdateRoutes: incoming client networks, selector state before/after the
management update, and the networks remaining after FilterSelectedExitNodes.
- mirrorV6ExitPairSelections: the NetIDs present in this update and the v6 pairs
V6ExitMergeSet derives from them (reveals whether the v4 base and its ::/0 pair
are present in the same update so the pair can be matched).
- SyncPairedSelection: the base/paired state before and after the sync.
- FilterSelectedExitNodes / applyExitNodeFilter: per-route SKIP/KEEP/DROP and the
selection lookups behind each decision.
- updateExitNodeSelections / logExitNodeUpdate: categorization and deselect set.
Temporary; to be removed once the root cause is confirmed.
* [client] remove v6 exit-pair mirror DIAG logging
Drop the temporary DIAG diagnostics added to trace the v4/v6 exit-pair mirror.
The field log confirmed the write-time mirror keeps the pair consistent (the
::/0 route is only ever applied alongside its v4 base and is dropped on deselect),
so the diagnostics are no longer needed.
* Initial scaffolding
* Applies MDM override
* Unit tests
* Helpers business logic
* Return error if trying to modify any config that is gated by MDM
* Add ManagedFields to returned config over GetConfig
* Adds initial 101 MDM policy business logic testing
* gRPC MDM changes
* MDM Name scoping for clarity
* Implements windows loading of MDM policy
* Adds missing WGPort config
* Cleanup setupKey to align to linear
* Align split tunnel code
* Adds some log
* Prefix every log with MDM
* Adds debug config cobra command
This can be useful for troubleshooting and checking config
now that its resolution is not trivial
defaults > config > env cars > CLI/UI > MDM
* Adds MDM 1m diff checker & reloader
* Adds also up/start after cancel
* Publishes event for UI to sync upon MDM changes
* Add events to resync UI to actual config
This also provide fixup for UI no aligning to changed config when coming from cli up with config flags.
* UI behavior conflicts relaxation
UI sends full config snapshot with all values. It doesn't
make sense to block it if the values are aligned with the
values constrained by the MDM policy. It's just simplier
to allow values that are compliant. (this goes for the CLI
as well at this point)
* Lock toggle Settngs
* Advanced Settings locking
* Fixup presharedkey
* Apply MDM locks
* Toggle gray in/out for Advanced Settings
* Adds support for disabling of Profiles and UpdateSettings feature flags
* Adds Gate Login as well when --disable-update-settings=true is given to service
This commit tries to settle things with an old PR-4237 which had relaxed
the case where the SetConfig returned an `Unavailable` code error.
Under this circumnstance the PR allowed the upFunc to just emit a warning and
progress further with the login gRPC. Since the login call is consuming
the --management-url coming from the `up` command, it might be possible
to abuse the "Unavailable" code to inject a management URL that is different
from the configured one even though the --disable-update-settings is set
to true (?)
* Evaluate disable-update-settings errors only when there's an actual override
* [UI] Fixup advanced Settings
* [UI] Fixup for preshared key
* [UI] Fixup for profile enable/disable toggle
We need to align the initial state to evaluate the delta in case.
The initial state has to be "true" since the profile starts visible.
Then we receive MDM and transition the cache bool value to the actual
MDM imposed state
* Enforces disable networks
* [UI] Aligns to "enable/disable once on change only"
* Fixup: MDM wins. always
* Removes --disable-advanced-settings
It was a typo in our meetings. the actual thing is --disable-update-settings
* [PROTO] Removes --disable-advanced-settings
* [UI] Removes --disable-advanced-settings
* Pins feat profile retrieval to notif event
* [UI] Fix for "hide" not working when propagating to parent with children
* Adds dep for reading plist files
* Introduces support for darwing plist loading
* Tests MDM config reload via ticker
* [PROVISIONING] ADMX/ADML/PS/bash scripts/templates
* CI fixes
- Add docstrings to `mdm_integration`
- refactor for cognitive complexity
- mod tidy
* Linting
* Add docstrings to `mdm_integration`
* nil,nil is no policy and no error. Allow it
* nil,nil is no policy and no error. Allow it
* exclude MDM profile adminstrated keys data from debug bundle
* Fixes Rosenpass left disable after MDM unlock
* Partial revert coderabbit added docstrings
* Renaming fix
* Avoid locking on clientRunning bool when the connection is aborted for whatever reason
We want to just signal this through the giveUpChan, we will manage the signal from
the waiter side and in case set it to false there. THis way we avoid locking,
which should allow the MDM down+wait_for_term_chan_signal_+up procedure
clientRunning is used to signal two different conditions here:
1. the initialization procedure is over (we have an engine)
2. the connection being up (or being attempted)
Probably these two functionalities should not alias, and the failure of the second condition
(because of any error) should just drive a reconnection (currently it's not happening,
and we silently go idle).
OR, mor probably, the two things are the SAME and there should not exist a case where
we did the "Up" initialization and connection attempt but we are not still attempting it.
* Moves test helper at te very bottom
* Addresses github comments
* No lock no copy
* Prevents engine not stopping within 10 secs from being paired by another instance
We instead juts SKIP updating the policy, so
1. the MDM ticker will kick in 1 minute time,
2. find the policy misaligned,
3. enter the onMDMPolicyChange,
4. find the s.clientRunning == true
(because it is set to false only in server cleanupConnection,
and not by s.actCancel())
5. call s.actCancel() again if not nil
6. immediately return from <-s.clientGiveUpChan
7. finally call s.restartEngineForMDMLocked()
* Since we ARE running there should be a config
If the config was cancelled midflight, connect will abort later on
* DisableAutoConnect should not stop a running connection.
DisableAutoConnect should just avoid the connection attempts *when the service starts*.
If we are started and we are up and running, DisableAutoConnect should not kick in.
Another PR will follow about this topic
* Removes unused vars
* Moves callback into Run method arg
* align comment to removal of DisableAutoConnect
DisableAutoConnect should just avoid the connection attempts *when the service starts*.
If we are started and we are up and running, DisableAutoConnect should not kick in
* Removes unused managed_fields data.
This was initially used to drive the UI but approach changed
to reload config/features upon notifications which makes this data redundant.
* Reorder stuff
* Unexport unrequired vars/functions
PoliciesEqual → policiesEqual
AllKeys → allKeys
* Adds list of MDM managed fields in the debug bundle
* [client] Index peer tunnel IPs for O(1) PeerStateByIP lookup
Replace the linear scan over all peers with an ipToKey map maintained
by AddPeer/RemovePeer, covering both IPv4 and IPv6 tunnel addresses.
Offline peers are intentionally no longer resolvable by IP: only active
peers can carry traffic, so IdentityForIP and the DNS disconnected-peer
filter now treat them as unknown, same as foreign IPs.
Skip the DNS answer filter for single-record responses; dropping the
only answer was always restored by the empty-answer escape hatch, so
the fast path is behavior-neutral.
* Ensure `ipToKey` entries are only removed if they match the peer being deleted, preventing accidental removal of unrelated mappings.
- Engine.Start takes syncMsgMux with a deferred unlock (engine.go:445) and parks in receiveSignalEvents → WaitStreamConnected (engine.go:1762), which only wakes on
signal-stream connect or client-context cancellation.
- When signal never connects, the 30s startup timeout fires and embed.Client.Start's rollback (embed.go:281) called client.Stop() → Engine.Stop, which blocks acquiring
syncMsgMux (engine.go:318). The cancel() that would unpark Start was deferred until Start returned — permanent cycle. RemovePeer calls (g43/g385) then queue behind the
lifecycle mutex.
- Notably, embed.Client.Stop and the daemon's cleanupConnection both cancel before stopping — the startup rollback was the only path that didn't.
- Engine.Start takes syncMsgMux with a deferred unlock (engine.go:445) and parks in receiveSignalEvents → WaitStreamConnected (engine.go:1762), which only wakes on
signal-stream connect or client-context cancellation.
- When signal never connects, the 30s startup timeout fires and embed.Client.Start's rollback (embed.go:281) called client.Stop() → Engine.Stop, which blocks acquiring
syncMsgMux (engine.go:318). The cancel() that would unpark Start was deferred until Start returned — permanent cycle. RemovePeer calls (g43/g385) then queue behind the
lifecycle mutex.
- Notably, embed.Client.Stop and the daemon's cleanupConnection both cancel before stopping — the startup rollback was the only path that didn't.
* [client] Preserve posture checks on config-only sync updates
When management sends a MessageTypeControlConfig update (e.g. relay token
rotation), the SyncResponse carries no NetworkMap and no Checks. Moving the
updateChecksIfNew call after the nm == nil guard ensures posture checks are
only updated when a full network map is present, preventing relay token
rotation from silently clearing the previously applied posture check state.
* [client] Clarify posture check update logic with explicit comment
* [client] Extract NetBird config and sync persistence into helpers
Move the NetbirdConfig handling block out of handleSync into
updateNetbirdConfig and the sync response persistence into
persistSyncResponse, mirroring updateChecksIfNew. This flattens
handleSync and makes the individual update steps unit-testable.
* Adds heuristic to detect an edge case on Linux where a system has configured logrotate as a separate service to rotate log files which would mangle our client log files. If we detect logrotate being configured for netbird, we disable our rotation.
* Adds new env var to disable log rotation: NB_LOG_DISABLE_ROTATION
* Adds compressed and plain logrotate files to debug bundle.
* Replaces lumberjack with timberjack (maintained fork with bug fixes and extra features).
* Clarifies which daemon version is running in the bundle stats.
* Change logging for client service status to console
* Persist sync response via pluggable store (disk on iOS)
The latest Management sync response (which carries the network map) was
kept in memory for debug bundle generation. On memory-constrained
platforms like iOS the network map can be large enough to matter.
Introduce a syncstore package with a Store interface and two backends:
a memory backend (the previous behavior) and a disk backend that
serializes the response to a file in the state directory. The backend
is selected per-platform at build time: disk on iOS, memory elsewhere.
The disk store clears any leftover file on construction so a fresh
store never reads stale data from an earlier run (e.g. another
profile's network map).
In the engine, drop the separate persistSyncResponse bool: the store is
only instantiated while persistence is enabled, and its presence is
what marks persistence as active. The store is also cleared on engine
close so the file does not linger on disk.
* syncstore: silence nilnil linter on "nothing stored" returns
Get returns (nil, nil) to signal that nothing is stored, which is part
of the Store contract and preserves the original behaviour. Annotate
both backends with //nolint:nilnil so golangci-lint does not flag it.
* syncstore: hold syncRespMux for the whole store Set/Get
Both handleSync and GetLatestSyncResponse snapshotted e.syncStore under
the read lock and then released it before calling Set/Get. That allowed
SetSyncResponsePersistence(false) or engine close to clear the store
mid-call. In particular a concurrent Clear()+nil followed by a late
Set could re-create the file that was just removed, defeating the
leak/lingering protection.
Hold syncRespMux for the duration of the store operation in both spots
so the store cannot be cleared while a Set/Get is in flight.
* syncstore: avoid StateDir "." when state path is empty
On mobile the state path may be empty (the engine tolerates a missing
state file). filepath.Dir("") returns ".", which would make a
disk-backed syncstore write into the working directory instead of
letting NewDiskStore fall back to os.TempDir().
Only set engineConfig.StateDir when path is non-empty.
* Refactor to use a common checker for development version
* Adds commit sha to development version for cobra command only
Leave dashboard unaffected
* Adjust for "v0.31.1-dev" test case
which must be considered pre-release
* Drop synthetic "dev"/"0.50.0-dev" firewall feature-gate fixtures
These test cases encoded the loose strings.Contains(v, "dev")
semantics inherited from peerSupportedFirewallFeatures, but
NetbirdVersion() never produces those values — only the literal
"development" (and now "development-<sha>[-dirty]") ever flows
through the wire. The agent owns the semantics of an ephemeral
development build, so the tests should exercise the strings we
actually emit.
Replaced with development, development-<sha> and
development-<sha>-dirty cases that match the HasPrefix("development")
predicate introduced upstream.
* Remove unexistent tests on wire format
The sha / dirty flag are added only when the CLI asks the version.
Account versions is unaffacted and can only strictly match "development"
* Adds tests for IsDevelopmentVersion
* fix(proxy): gate tunnel-peer fast-path on inbound listener marker
forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT
source IP, so a public client whose address happened to fall in those
ranges could bypass the configured operator auth scheme by colliding
with a known tunnel IP. The fast-path is now gated on
TunnelLookupFromContext(r.Context()) being present — that context value
is attached only by the per-account inbound (overlay) listener, so the
host-facing listener never enters this branch.
Tests updated to reflect the new requirement: requests that don't
carry the inbound marker now fall through to the regular auth flow.
* fix(proxy): harden inbound listener resource + startup-ctx handling
Three correctness fixes on the per-account inbound path, with tests:
- Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands
back an *io.PipeWriter backed by a pipe + scanner goroutine that the
caller owns; the two writers per account (https + plain) were never
closed, leaking the pipe and goroutine on every teardown.
- Run the post-Start hooks on context.Background(). runClientStartup
is launched in a goroutine from AddPeer and was inheriting the
caller's request-scoped ctx, so a cancelled request could abort the
inbound bring-up or fail the management status notification. The
tail is split into notifyClientReady so the contract is testable.
Tests cover the PipeWriter close behaviour and assert the readyHandler
+ NotifyStatus calls receive a non-cancelled background context.
* feat(proxy): short-circuit peer-own-target loops with 421
When a peer that hosts the target of a private service dials its own
service URL the request was being looped through the proxy and back
over WireGuard to the same peer — twice the WG round-trip for no
benefit, with no signal to the caller that something was wrong.
Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request
arrived on the per-account overlay listener (IsOverlayOrigin) and the
source tunnel IP matches the target host, refuse the request with 421
Misdirected Request and a body pointing the operator at the backend
directly.
The gate is scoped to overlay origin so requests on the public
listener that happen to share a source IP with the target host are
forwarded normally.
* fix(management): private-service validation + tunnel-IP lookup semantics
- Require an explicit port for L4 cluster targets. validateL4Target
exempted TargetTypeCluster from the port check, but buildPathMappings
serializes every L4 target via net.JoinHostPort(host, port) — port=0
shipped a ":0" upstream. Cluster targets use the same Host/Port
fields, so the same requirement applies.
- GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping
every error to Internal. The proxy's ValidateTunnelPeer probes IPs
that legitimately aren't in the roster; the miss is expected and now
distinguishable from a real store failure.
- Thread ctx into getClusterCapability's gorm query so a cancelled
request doesn't keep the store busy.
Tests updated for the L4-cluster port requirement and the GetPeerByIP
NotFound path.
* fix(client): include offlinePeers in PeerStateByIP lookup
ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP
only scanned d.peers. Callers (the local DNS filter via
localPeerConnectivity, embed.Client.IdentityForIP used by the
proxy's tunnel-peer validator) were treating known-but-offline peers
as unknown, which:
- causes the DNS filter to keep returning records pointing at peers
that have no live tunnel, AND
- makes the proxy's local-roster check deny a request from such a
peer rather than letting the cached management RPC carry the
authorisation decision.
Search both slices in PeerStateByIP. Adds a unit test for the IPv4
and IPv6 offline-match paths.
* fix(rest): reject empty Delete path params in reverse-proxy clients
ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed
the path parameter into url.PathEscape without an empty check.
PathEscape("") returns "" which collapses the request onto the
collection endpoint ("/api/reverse-proxies/clusters/" /
"/api/reverse-proxies/proxy-tokens/"), so a caller bug delete with no
id reached a routable URL with surprising semantics (typically 405).
Short-circuit with a typed error before the request is built. Tests
mount a handler on the collection path that fails the test if hit, so
the regression is impossible to reintroduce silently.
* chore(api,ci,docs,test): private-service schema, proto-check, fixups
Non-functional cleanups and contract/CI hardening around the
private-service work:
API schema (openapi.yml):
- Require a non-empty access_groups and mode=http when private=true,
on both Service and ServiceRequest, mirroring
validatePrivateRequirements. mode stays optional-but-constrained
(empty defaults to http server-side), matching runtime.
CI (proto-version-check.yml):
- Cover renamed .pb.go files (read base via previous_filename).
- Match protoc-gen-go-grpc version headers (optional "- " prefix and
-gen-go-grpc suffix) so grpc-generated files are in scope.
Docs / comments:
- Reword Config field docs to say defaults are applied at Server.Start
(initDefaults), not New.
- Rename the obsolete --private-inbound flag to --private across
comments and the proto doc.
Pre-existing test fixups surfaced by review:
- Repair the integration-tagged validate_session_test.go (SignToken
signature growth + new Manager interface methods).
- Fix the CI-skip boolean precedence so Windows isn't skipped
unconditionally.
- Guard the router.HTTPListener type assertion with comma-ok.
* fix(proxy): background ctx for already-started AddPeer notification
The earlier ctx fix covered the async runClientStartup path but missed
the synchronous branch: when a service is added to an already-started
client, AddPeer called NotifyStatus with the caller's request-scoped
ctx. A cancelled request/stream could drop the connected notification
to management. Use context.Background() here too, matching
notifyClientReady.
Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to
pass a pre-cancelled caller ctx and assert the notification still ran
on a non-cancelled context.
* use the cmd context for roundtripper
* Pin actions with SHA, replace unmaintained, add dependabot for actions
* Update FreeBSD to version 15 for tests
* Use shared actions
* Update sign-pipelines version
* Updates rosenpass version
go-rosenpass v0.4.0 → v0.5.42 bump — detailed findings
Change summary
cunicu.li/go-rosenpass v0.4.0 → v0.5.42 (target)
cilium/ebpf v0.15.0 → v0.19.0 (transitive)
gopacket/gopacket v1.1.1 → v1.4.0 (transitive)
wireguard 2023-07 → 2023-12 (transitive)
wireguard/wgctrl 2023-04 → 2024-12 (transitive)
Wire interop
v0.4.0 (in v0.70.5) <-> v0.5.42 OK
v0.5.42 <-> v0.5.42 OK
Quantum resistance: true both ends
---
**Replay error eliminated.**
Before (on v0.4.0):
`ERROR Failed to handle message: failed to load biscuit (ICR1): detected replay`
Recurring every ~50ms for minutes at a time. Gone entirely after both ends upgraded to v0.5.42. Upstream fix in biscuit/replay handling between v0.4.x and v0.5.x series.
* Fixup [::]:port socket trying to send to v4
* Adds more tests on netbird<->rosenpass interactions
* Anticipates rp handler creation before generateConfig
* [client] Moves deterministic key gen into rosenpass
* go mod tidy
* Adds reminder to reason about rosenpass surface area
* Apply code rabbit suggestions
Adds a new "private" service mode for the reverse proxy: services reachable exclusively over the embedded WireGuard tunnel, gated by per-peer group membership instead of operator auth schemes.
Wire contract
- ProxyMapping.private (field 13): the proxy MUST call ValidateTunnelPeer and fail closed; operator schemes are bypassed.
- ProxyCapabilities.private (4) + supports_private_service (5): capability gate. Management never streams private mappings to proxies that don't claim the capability; the broadcast path applies the same filter via filterMappingsForProxy.
- ValidateTunnelPeer RPC: resolves an inbound tunnel IP to a peer, checks the peer's groups against service.AccessGroups, and mints a session JWT on success. checkPeerGroupAccess fails closed when a private service has empty AccessGroups.
- ValidateSession/ValidateTunnelPeer responses now carry peer_group_ids + peer_group_names so the proxy can authorise policy-aware middlewares without an extra management round-trip.
- ProxyInboundListener + SendStatusUpdate.inbound_listener: per-account inbound listener state surfaced to dashboards.
- PathTargetOptions.direct_upstream (11): bypass the embedded NetBird client and dial the target via the proxy host's network stack for upstreams reachable without WireGuard.
Data model
- Service.Private (bool) + Service.AccessGroups ([]string, JSON- serialised). Validate() rejects bearer auth on private services. Copy() deep-copies AccessGroups. pgx getServices loads the columns.
- DomainConfig.Private threaded into the proxy auth middleware. Request handler routes private services through forwardWithTunnelPeer and returns 403 on validation failure.
- Account-level SynthesizePrivateServiceZones (synthetic DNS) and injectPrivateServicePolicies (synthetic ACL) gate on len(svc.AccessGroups) > 0.
Proxy
- /netbird proxy --private (embedded mode) flag; Config.Private in proxy/lifecycle.go.
- Per-account inbound listener (proxy/inbound.go) binding HTTP/HTTPS on the embedded NetBird client's WireGuard tunnel netstack.
- proxy/internal/auth/tunnel_cache: ValidateTunnelPeer response cache with single-flight de-duplication and per-account eviction.
- Local peerstore short-circuit: when the inbound IP isn't in the account roster, deny fast without an RPC.
- proxy/server.go reports SupportsPrivateService=true and redacts the full ProxyMapping JSON from info logs (auth_token + header-auth hashed values now only at debug level).
Identity forwarding
- ValidateSessionJWT returns user_id, email, method, groups, group_names. sessionkey.Claims carries Email + Groups + GroupNames so the proxy can stamp identity onto upstream requests without an extra management round-trip on every cookie-bearing request.
- CapturedData carries userEmail / userGroups / userGroupNames; the proxy stamps X-NetBird-User and X-NetBird-Groups on r.Out from the authenticated identity (strips client-supplied values first to prevent spoofing).
- AccessLog.UserGroups: access-log enrichment captures the user's group memberships at write time so the dashboard can render group context without reverse-resolving stale memberships.
OpenAPI/dashboard surface
- ReverseProxyService gains private + access_groups; ReverseProxyCluster gains private + supports_private. ReverseProxyTarget target_type enum gains "cluster". ServiceTargetOptions gains direct_upstream. ProxyAccessLog gains user_groups.