Conversation

voidzcy

Currently each subchannel implicitly refreshes the name resolution when its connection is broken. That is, this feature is built into subchannel's internal implementation. Although it eliminates the burden of having LB implementations refreshing the resolver when connections to backends are broken, this is gives LB policies no chance to disable or override this refresh (e.g., in some complex load balancing hierarchy like xDS, LB policies may embed a resolver inside for resolving backends so the refreshing resolution operation should be hooked to the resolver embedded in the LB policy instead of the one in Channel).

This is likely to be a breaking change for users implementing their own LB policies (performing load balancing directly on backends). In order to make this transition smoothly, we add a check to SubchannelImpl that checks if the LoadBalancer has explicitly called Helper.refreshNameResolution for broken subchannels created by it. If not, it logs a warning and do the refresh.

A temporary LoadBalancer.Helper API ignoreRefreshNameResolution() is added to avoid false-positive warnings for xDS that intentionally does not want a refresh. Once the migration is done, this should be deleted.

See details in #8088.

@voidzcyvoidzcy changed the title core, grpclb: let leaf LB policies explicitly refresh name resolution when subchannel connection is broken core, grpclb,xds: let leaf LB policies explicitly refresh name resolution when subchannel connection is broken Apr 9, 2021
@voidzcyvoidzcy requested a review from ejona86 April 9, 2021 22:18

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any ideas how we're supposed to notice bugs with LB behavior? It would seem trivial for a LB policy to not be triggering refreshes and no test would fail.

…l's refresh for cases LoadBalancer intentionally does not want a refresh.
…when handling subchannel state changes. Log a warning and do a refresh if it is not.
… a name resolution refresh for the resolver in Channel, so use ignoreRefreshNameResolutionCheck() to avoid false-positive warnings.
@voidzcyvoidzcy force-pushed the bugfix/move_subchannel_refreshing_ns_to_lb branch from 9256ec1 to 2ff23c6 Compare April 15, 2021 19:34
@voidzcy

Do we have any ideas how we're supposed to notice bugs with LB behavior? It would seem trivial for a LB policy to not be triggering refreshes and no test would fail.

Fixed by adding a flag. The Channel will still do the automatic refresh if the LoadBalancer did not do so. PTAL.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to help check that we've updated our code appropriately, replace the warning with an exception/panic and run the tests to make sure they pass. After a few releases with the warning, I wouldn't be surprised if we want to do something similar for a grpc-java release.

@voidzcyvoidzcy merged commit 9614738 into grpc:master Apr 16, 2021
@github-actions-actions bot locked as resolved and limited conversation to collaborators Jul 16, 2021
Sign up for free to subscribe to this conversation on . Already have an account? Sign in.
None yet
None yet

Successfully merging this pull request may close these issues.