
ETCD learner members cannot be defragmented, causing persistent storage bloat. #21740

@amolmishra23

Bug report criteria

What happened?

Observed Behavior:

  • Learner member DB size remains at 30 MB while voting members are successfully defragmented to ~7 MB
  • etcdctl defrag silently skips learner endpoints
  • Defragmenting the learner endpoint directly fails with: rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner
  • The learner's database grows without bound over time because no maintenance operation can be run against it

Error Message:

etcdserver: rpc not supported for learner
Failed to defragment etcd member[https://181140267029446:25687] (etcdserver: rpc not supported for learner)

What did you expect to happen?

Learner members should support defragmentation so that their database size can be kept under control. Any one of the following would address this:

  • etcdctl defrag should include learner endpoints by default, OR
  • etcdctl defrag --endpoints=<learner> should work without errors, OR
  • Provide a supported method to defragment learner members (e.g., --include-learners flag)
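
Until one of these options exists, a per-endpoint loop at least makes the skipped learner visible instead of silent. The following is a minimal sketch, not a workaround: on 3.5.25 the learner call is still expected to fail with the error above. The function name defrag_each and the ETCDCTL override variable are hypothetical and used only for illustration; the only etcdctl invocation is the --endpoints=<endpoint> defrag form already shown in this report.

```shell
#!/bin/sh
# Defragment each endpoint in its own etcdctl call so that a failure on one
# endpoint (for example a learner) is reported rather than silently skipped.
# ETCDCTL is a hypothetical override for the binary; it defaults to "etcdctl".
defrag_each() {
  rc=0
  for ep in "$@"; do
    if "${ETCDCTL:-etcdctl}" --endpoints="$ep" defrag; then
      echo "ok: $ep"
    else
      echo "FAILED: $ep" >&2
      rc=1
    fi
  done
  # Non-zero if at least one endpoint could not be defragmented.
  return "$rc"
}
```

Called as defrag_each followed by every endpoint in the cluster, it returns a non-zero status if any single endpoint fails, which makes the learner failure easy to catch in automation.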

How can we reproduce it (as minimally and precisely as possible)?

Minimal Reproduction Steps:

# 1. Set up 3-node cluster + 1 learner member
# (cluster already running with learner)

# 2. Check initial status - note learner DB size
etcdctl endpoint status --write-out=table
# Result: Learner at 30MB, voting members at 8.5MB

# 3. Perform compaction
etcdctl compact 2885950

# 4. Defragment cluster
etcdctl defrag
# Result: Only voting members defragmented

# 5. Verify learner still has large DB
etcdctl endpoint status --write-out=table
# Result: Voting members ~7MB, learner still 30MB

# 6. Try to defrag learner directly
etcdctl --endpoints=https://181140267029446:25687 defrag
# Result: Fails with "rpc not supported for learner"

Cluster Topology:

  • 3 voting members (fb2f8c3838629cdb, 2228c0c31b9ff622, 63d10718366c821d)
  • 1 learner member (cf584302b1c47a59)

Anything else we need to know?

Impact:

  • Production issue causing learner storage to grow unbounded
  • No supported workaround for learner maintenance
  • Affects cluster operations in learner-promotion scenarios

Observations:

  • The CLI shows an --includeLearner flag, but it doesn't resolve the underlying RPC limitation
  • This appears to be an intentional restriction, but it creates operational problems

Etcd version (please run commands below)

etcd version: 3.5.25
etcdctl version: 3.5.25

Etcd configuration (command line flags or environment variables)

Cluster Configuration:

  • 4-member cluster (3 voting + 1 learner)
  • HTTPS endpoints on port 25687
  • Standard production setup

Relevant Settings:

--auto-compaction-retention (if applicable)
--quota-backend-bytes (if applicable)
--initial-cluster (4-member setup)

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

Member List:

etcdctl member list -w table

Endpoint Status (Before Defrag):

+-------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT            |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://181140264581536:25687 | fb2f8c3838629cdb |  3.5.25 |  8.5 MB |     false |      false |         9 |    3276978 |            3276978 |        |
| https://181140266914826:25687 | 2228c0c31b9ff622 |  3.5.25 |  8.5 MB |     false |      false |         9 |    3276978 |            3276978 |        |
| https://181140266590820:25687 | 63d10718366c821d |  3.5.25 |  8.5 MB |      true |      false |         9 |    3276978 |            3276978 |        |
| https://181140267029446:25687 | cf584302b1c47a59 |  3.5.25 |   30 MB |     false |       true |         9 |    3276980 |            3276980 |        |
+-------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Endpoint Status (After Defrag - Issue Persists):

+-------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT            |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://181140264581536:25687 | fb2f8c3838629cdb |  3.5.25 |  7.0 MB |     false |      false |         9 |    3278633 |            3278633 |        |
| https://181140266914826:25687 | 2228c0c31b9ff622 |  3.5.25 |  7.0 MB |     false |      false |         9 |    3278633 |            3278633 |        |
| https://181140266590820:25687 | 63d10718366c821d |  3.5.25 |  7.0 MB |      true |      false |         9 |    3278633 |            3278633 |        |
| https://181140267029446:25687 | cf584302b1c47a59 |  3.5.25 |   30 MB |     false |       true |         9 |    3278633 |            3278633 |        |
+-------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
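
To make the before/after comparison easier to repeat, the table output can be reduced to one line per member with its role and DB size. This is a rough sketch that assumes the column order shown in the tables above (ENDPOINT, ID, VERSION, DB SIZE, IS LEADER, IS LEARNER, ...); the function name db_size_by_role is hypothetical.

```shell
#!/bin/sh
# Read the table printed by `etcdctl endpoint status --write-out=table` on
# stdin and print "<role> <endpoint> <db size>" for each member row.
# Assumes the column layout shown in this report.
db_size_by_role() {
  awk -F'|' '
    # Border lines contain no "|" and the header row has no true/false value,
    # so only real member rows produce output.
    NF > 7 {
      endpoint = $2; size = $5; learner = $7
      gsub(/^ +| +$/, "", endpoint)
      gsub(/^ +| +$/, "", size)
      gsub(/^ +| +$/, "", learner)
      if (learner == "true")  print "learner", endpoint, size
      if (learner == "false") print "voting", endpoint, size
    }'
}
```

For example, piping the endpoint status table into db_size_by_role before and after the defrag step should show the voting members shrinking while the learner line stays the same.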

Relevant log output

Defragmentation Failure Logs:

{"level":"warn","ts":"2026-05-11T08:07:05.721070-0700","logger":"etcd-client","caller":"v3@v3.5.25/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000f61e0/181140264581536:25687","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
...
{"level":"warn","ts":"2026-05-11T08:07:06.622187-0700","logger":"etcd-client","caller":"v3@v3.5.25/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000f61e0/181140264581536:25687","attempt":99,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
Failed to defragment etcd member[https://181140267029446:25687] (etcdserver: rpc not supported for learner)
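
The elided lines between attempt 0 and attempt 99 are the client retrying the same rejected RPC. A small sketch for summarising such a log, assuming one JSON warning per line in the format shown above (the function name retry_summary is hypothetical):

```shell
#!/bin/sh
# Read etcd client log lines on stdin and report how many carry an
# "attempt" field and the highest attempt number seen.
retry_summary() {
  grep -o '"attempt":[0-9]*' | cut -d: -f2 | sort -n |
    awk '{ n++; max = $1 } END { if (n) printf "lines=%d max_attempt=%d\n", n, max }'
}
```

Feeding the full, unabridged log through retry_summary shows how many retries were spent on a call that the learner rejects every time.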
