[management, client, proxy] Follow-up fixes for private reverse-proxy services (#6268)

* fix(proxy): gate tunnel-peer fast-path on inbound listener marker

forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT
source IP, so a public client whose address happened to fall in those
ranges could bypass the configured operator auth scheme by colliding
with a known tunnel IP. The fast-path is now gated on
TunnelLookupFromContext(r.Context()) being present — that context value
is attached only by the per-account inbound (overlay) listener, so the
host-facing listener never enters this branch.

Tests updated to reflect the new requirement: requests that don't
carry the inbound marker now fall through to the regular auth flow.

* fix(proxy): harden inbound listener resource + startup-ctx handling

Three correctness fixes on the per-account inbound path, with tests:

- Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands
  back an *io.PipeWriter backed by a pipe + scanner goroutine that the
  caller owns; the two writers per account (https + plain) were never
  closed, leaking the pipe and goroutine on every teardown.
- Run the post-Start hooks on context.Background(). runClientStartup
  is launched in a goroutine from AddPeer and was inheriting the
  caller's request-scoped ctx, so a cancelled request could abort the
  inbound bring-up or fail the management status notification. The
  tail is split into notifyClientReady so the contract is testable.

Tests cover the PipeWriter close behaviour and assert the readyHandler
+ NotifyStatus calls receive a non-cancelled background context.

* feat(proxy): short-circuit peer-own-target loops with 421

When a peer that hosts the target of a private service dials its own
service URL the request was being looped through the proxy and back
over WireGuard to the same peer — twice the WG round-trip for no
benefit, with no signal to the caller that something was wrong.

Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request
arrived on the per-account overlay listener (IsOverlayOrigin) and the
source tunnel IP matches the target host, refuse the request with 421
Misdirected Request and a body pointing the operator at the backend
directly.

The gate is scoped to overlay origin so requests on the public
listener that happen to share a source IP with the target host are
forwarded normally.

* fix(management): private-service validation + tunnel-IP lookup semantics

- Require an explicit port for L4 cluster targets. validateL4Target
  exempted TargetTypeCluster from the port check, but buildPathMappings
  serializes every L4 target via net.JoinHostPort(host, port) — port=0
  shipped a ":0" upstream. Cluster targets use the same Host/Port
  fields, so the same requirement applies.
- GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping
  every error to Internal. The proxy's ValidateTunnelPeer probes IPs
  that legitimately aren't in the roster; the miss is expected and now
  distinguishable from a real store failure.
- Thread ctx into getClusterCapability's gorm query so a cancelled
  request doesn't keep the store busy.

Tests updated for the L4-cluster port requirement and the GetPeerByIP
NotFound path.

* fix(client): include offlinePeers in PeerStateByIP lookup

ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP
only scanned d.peers. Callers (the local DNS filter via
localPeerConnectivity, embed.Client.IdentityForIP used by the
proxy's tunnel-peer validator) were treating known-but-offline peers
as unknown, which:

- causes the DNS filter to keep returning records pointing at peers
  that have no live tunnel, AND
- makes the proxy's local-roster check deny a request from such a
  peer rather than letting the cached management RPC carry the
  authorisation decision.

Search both slices in PeerStateByIP. Adds a unit test for the IPv4
and IPv6 offline-match paths.

* fix(rest): reject empty Delete path params in reverse-proxy clients

ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed
the path parameter into url.PathEscape without an empty check.
PathEscape("") returns "" which collapses the request onto the
collection endpoint ("/api/reverse-proxies/clusters/" /
"/api/reverse-proxies/proxy-tokens/"), so a caller bug delete with no
id reached a routable URL with surprising semantics (typically 405).

Short-circuit with a typed error before the request is built. Tests
mount a handler on the collection path that fails the test if hit, so
the regression is impossible to reintroduce silently.

* chore(api,ci,docs,test): private-service schema, proto-check, fixups

Non-functional cleanups and contract/CI hardening around the
private-service work:

API schema (openapi.yml):
- Require a non-empty access_groups and mode=http when private=true,
  on both Service and ServiceRequest, mirroring
  validatePrivateRequirements. mode stays optional-but-constrained
  (empty defaults to http server-side), matching runtime.

CI (proto-version-check.yml):
- Cover renamed .pb.go files (read base via previous_filename).
- Match protoc-gen-go-grpc version headers (optional "- " prefix and
  -gen-go-grpc suffix) so grpc-generated files are in scope.

Docs / comments:
- Reword Config field docs to say defaults are applied at Server.Start
  (initDefaults), not New.
- Rename the obsolete --private-inbound flag to --private across
  comments and the proto doc.

Pre-existing test fixups surfaced by review:
- Repair the integration-tagged validate_session_test.go (SignToken
  signature growth + new Manager interface methods).
- Fix the CI-skip boolean precedence so Windows isn't skipped
  unconditionally.
- Guard the router.HTTPListener type assertion with comma-ok.

* fix(proxy): background ctx for already-started AddPeer notification

The earlier ctx fix covered the async runClientStartup path but missed
the synchronous branch: when a service is added to an already-started
client, AddPeer called NotifyStatus with the caller's request-scoped
ctx. A cancelled request/stream could drop the connected notification
to management. Use context.Background() here too, matching
notifyClientReady.

Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to
pass a pre-cancelled caller ctx and assert the notification still ran
on a non-cancelled context.

* use the cmd context for roundtripper
This commit is contained in:
Maycon Santos
2026-06-02 13:40:09 +02:00
committed by GitHub
parent e7c9182ff9
commit fa1e241aea
31 changed files with 711 additions and 105 deletions

View File

@@ -932,7 +932,11 @@ func (s *Service) validateL4Target(target *Target) error {
if target.TargetId == "" {
return errors.New("target_id is required for L4 services")
}
if target.TargetType != TargetTypeCluster && target.Port == 0 {
// Cluster targets resolve their upstream host:port from the target's
// own Host/Port fields just like the other L4 types — buildPathMappings
// emits net.JoinHostPort(target.Host, target.Port) for every L4
// target, so allowing port=0 here would let ":0" reach the proxy.
if target.Port == 0 {
return errors.New("target port is required for L4 services")
}
switch target.TargetType {

View File

@@ -1176,7 +1176,12 @@ func TestValidate_HTTPClusterTarget_RequiresDirectUpstream(t *testing.T) {
assert.ErrorContains(t, rp.Validate(), "direct upstream disabled", "cluster target must reject direct_upstream=false")
}
func TestValidate_L4ClusterTarget(t *testing.T) {
// TestValidate_L4ClusterTarget_RequiresPort confirms that an L4 cluster
// target without an explicit port is rejected. buildPathMappings emits
// net.JoinHostPort(target.Host, target.Port) for every L4 target — so
// allowing port=0 would let the proxy ship ":0" upstreams. The port
// requirement is the same as every other L4 target type.
func TestValidate_L4ClusterTarget_RequiresPort(t *testing.T) {
rp := validProxy()
rp.Mode = ModeTCP
rp.ListenPort = 9000
@@ -1186,7 +1191,12 @@ func TestValidate_L4ClusterTarget(t *testing.T) {
Protocol: "tcp",
Enabled: true,
}}
require.NoError(t, rp.Validate(), "L4 cluster target must validate without an explicit port")
assert.ErrorContains(t, rp.Validate(), "port is required",
"L4 cluster target must require an explicit port like other L4 target types")
rp.Targets[0].Port = 5432
rp.Targets[0].Host = "db.lan"
require.NoError(t, rp.Validate(), "L4 cluster target with host:port must validate")
}
func TestService_Copy_RoundtripsPrivate(t *testing.T) {

View File

@@ -102,7 +102,7 @@ func generateSessionKeyPair(t *testing.T) (string, string) {
func createSessionToken(t *testing.T, privKeyB64, userID, domain string) string {
t.Helper()
token, err := sessionkey.SignToken(privKeyB64, userID, domain, auth.MethodOIDC, nil, time.Hour)
token, err := sessionkey.SignToken(privKeyB64, userID, "", domain, auth.MethodOIDC, nil, nil, time.Hour)
require.NoError(t, err)
return token
}
@@ -394,6 +394,10 @@ func (m *testValidateSessionProxyManager) ClusterSupportsCrowdSec(_ context.Cont
return nil
}
func (m *testValidateSessionProxyManager) ClusterSupportsPrivate(_ context.Context, _ string) *bool {
return nil
}
type testValidateSessionUsersManager struct {
store store.Store
}
@@ -401,3 +405,24 @@ type testValidateSessionUsersManager struct {
func (m *testValidateSessionUsersManager) GetUser(ctx context.Context, userID string) (*types.User, error) {
return m.store.GetUserByUserID(ctx, store.LockingStrengthNone, userID)
}
func (m *testValidateSessionUsersManager) GetUserWithGroups(ctx context.Context, userID string) (*types.User, []*types.Group, error) {
user, err := m.store.GetUserByUserID(ctx, store.LockingStrengthNone, userID)
if err != nil {
return nil, nil, err
}
if len(user.AutoGroups) == 0 {
return user, nil, nil
}
groupsMap, err := m.store.GetGroupsByIDs(ctx, store.LockingStrengthNone, user.AccountID, user.AutoGroups)
if err != nil {
return nil, nil, err
}
groups := make([]*types.Group, 0, len(user.AutoGroups))
for _, id := range user.AutoGroups {
if g, ok := groupsMap[id]; ok && g != nil {
groups = append(groups, g)
}
}
return user, groups, nil
}

View File

@@ -4734,7 +4734,13 @@ func (s *SqlStore) GetPeerByIP(ctx context.Context, lockStrength LockingStrength
result := tx.
Take(&peer, fmt.Sprintf("account_id = ? AND %s = ?", column), accountID, jsonValue)
if result.Error != nil {
// no logging here
// A tunnel-IP miss is an expected outcome (e.g. the proxy's
// ValidateTunnelPeer probing an address that isn't in the
// account roster); surface it as NotFound so callers can tell
// it apart from a real store failure.
if errors.Is(result.Error, gorm.ErrRecordNotFound) {
return nil, status.Errorf(status.NotFound, "peer with ip %s not found", ip.String())
}
return nil, status.Errorf(status.Internal, "failed to get peer from store")
}
@@ -5962,6 +5968,7 @@ func (s *SqlStore) getClusterCapability(ctx context.Context, clusterAddr, column
}
err := s.db.
WithContext(ctx).
Model(&proxy.Proxy{}).
Select("COUNT(CASE WHEN "+column+" IS NOT NULL THEN 1 END) > 0 AS has_capability, "+
"COALESCE(MAX(CASE WHEN "+column+" = true THEN 1 ELSE 0 END), 0) = 1 AS any_true").

View File

@@ -13,7 +13,7 @@ import (
)
func TestSqlStore_GetAccount_PrivateServiceRoundtrip(t *testing.T) {
if (os.Getenv("CI") == "true" && runtime.GOOS == "darwin") || runtime.GOOS == "windows" {
if os.Getenv("CI") == "true" && (runtime.GOOS == "darwin" || runtime.GOOS == "windows") {
t.Skip("skip CI tests on darwin and windows")
}

View File

@@ -491,6 +491,27 @@ func Test_GetAccount(t *testing.T) {
})
}
// TestSqlStore_GetPeerByIP_NotFound pins the not-found semantics the
// proxy's ValidateTunnelPeer relies on: a tunnel-IP that isn't in the
// account roster must surface as a NotFound error (not a generic
// Internal) so callers can distinguish an expected miss from a real
// store failure. A known IP still resolves.
func TestSqlStore_GetPeerByIP_NotFound(t *testing.T) {
runTestForAllEngines(t, "../testdata/store.sql", func(t *testing.T, store Store) {
const accountID = "bf1c8084-ba50-4ce7-9439-34653001fc3b"
peer, err := store.GetPeerByIP(context.Background(), LockingStrengthNone, accountID, net.ParseIP("192.168.0.0"))
require.NoError(t, err, "known tunnel IP must resolve")
require.NotNil(t, peer)
_, err = store.GetPeerByIP(context.Background(), LockingStrengthNone, accountID, net.ParseIP("100.65.0.99"))
require.Error(t, err, "unknown tunnel IP must error")
parsedErr, ok := status.FromError(err)
require.True(t, ok, "error must be a status error")
require.Equal(t, status.NotFound, parsedErr.Type(), "tunnel-IP miss must be NotFound, not Internal")
})
}
func TestSqlStore_SavePeer(t *testing.T) {
store, cleanUp, err := NewTestStoreFromSQL(context.Background(), "../testdata/store.sql", t.TempDir())
t.Cleanup(cleanUp)