[WIP] Cleanup Prometheus Metric Querying #382

Draft
Bslabe123 wants to merge 1 commit into kubernetes-sigs:main from Bslabe123:prom-query

@Bslabe123 (Contributor) commented Apr 1, 2026

Addresses: #386

Changes:

  • Added dedicated types for specifying tracked Prometheus metrics for model servers. These types build the queries for each metric type (Gauge, Histogram, etc.).
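A minimal sketch of what such per-type metric definitions might look like. It assumes each type exposes a `get_queries(query_duration)` method returning `(target_attr, query)` pairs, which is the shape consumed by the loop in `base.py` quoted later in this review. The class names `GaugeMetric` and `HistogramMetric`, the `_mean`/`_pNN` field naming, and the PromQL chosen per type are illustrative assumptions, not the PR's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (name of the field to fill on the metrics object, PromQL expression)
Query = Tuple[str, str]


def _selector(name: str, filters: str) -> str:
    # Only emit a label-matcher block when filters are provided.
    return f"{name}{{{filters}}}" if filters else name


@dataclass(frozen=True)
class GaugeMetric:
    name: str          # Prometheus metric name
    target_attr: str   # prefix of the (hypothetical) fields this metric fills
    filters: str = ""  # optional label matchers, e.g. 'model_name="m"'

    def get_queries(self, query_duration: int) -> List[Query]:
        series = f"{_selector(self.name, self.filters)}[{query_duration}s]"
        return [
            (f"{self.target_attr}_mean", f"avg_over_time({series})"),
            (f"{self.target_attr}_max", f"max_over_time({series})"),
        ]


@dataclass(frozen=True)
class HistogramMetric:
    name: str
    target_attr: str
    filters: str = ""
    quantiles: Tuple[float, ...] = (0.5, 0.9, 0.99)

    def _series(self, suffix: str, query_duration: int) -> str:
        # Histograms are exported as <name>_bucket, <name>_sum and <name>_count.
        full_name = f"{self.name}_{suffix}"
        return f"{_selector(full_name, self.filters)}[{query_duration}s]"

    def get_queries(self, query_duration: int) -> List[Query]:
        total = self._series("sum", query_duration)
        count = self._series("count", query_duration)
        buckets = self._series("bucket", query_duration)
        queries = [
            # Mean over the window: growth of the observed total / growth of the count.
            (f"{self.target_attr}_mean",
             f"sum(increase({total})) / sum(increase({count}))"),
        ]
        for q in self.quantiles:
            queries.append((
                f"{self.target_attr}_p{round(q * 100)}",
                f"histogram_quantile({q}, sum by (le) (rate({buckets})))",
            ))
        return queries
```

With this split, a collector loop only iterates over the returned pairs, and the choice of PromQL function for a given metric (for example `rate` versus `increase`) stays a per-type decision.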

@k8s-ci-robot added the do-not-merge/work-in-progress and cncf-cla: yes labels Apr 1, 2026
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Bslabe123
Once this PR has been reviewed and has the lgtm label, please assign sergeykanzhelev for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/XXL label Apr 1, 2026
@Bslabe123 force-pushed the prom-query branch 4 times, most recently from 84b52bc to 5d8f295 April 1, 2026 19:09
@Bslabe123 changed the title from [WIP] Cleanup Prometheus Metric Querying to Cleanup Prometheus Metric Querying Apr 2, 2026
@Bslabe123 changed the title from Cleanup Prometheus Metric Querying to [WIP] Cleanup Prometheus Metric Querying Apr 2, 2026
@k8s-ci-robot added the needs-rebase label Apr 6, 2026
@Bslabe123 force-pushed the prom-query branch 2 times, most recently from c2b727b to a75c0cb April 6, 2026 20:56
@k8s-ci-robot removed the needs-rebase label Apr 7, 2026
@Bslabe123 changed the title from [WIP] Cleanup Prometheus Metric Querying to Cleanup Prometheus Metric Querying Apr 7, 2026
@Bslabe123 marked this pull request as ready for review April 7, 2026 17:24
@k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 7, 2026
@k8s-ci-robot added the needs-rebase label Apr 9, 2026
@jjk-g (Collaborator) commented Apr 9, 2026

@Bslabe123 please address merge conflicts

@k8s-ci-robot added and removed the needs-rebase label Apr 9, 2026
@k8s-ci-robot removed the needs-rebase label Apr 16, 2026
@k8s-ci-robot added the needs-rebase label Apr 25, 2026
@Bslabe123 force-pushed the prom-query branch 2 times, most recently from c9e2eed to 93ab0d6 April 30, 2026 15:26
@k8s-ci-robot removed the needs-rebase label Apr 30, 2026
@Bslabe123 changed the title from Cleanup Prometheus Metric Querying to [WIP] Cleanup Prometheus Metric Querying Apr 30, 2026
@k8s-ci-robot added the do-not-merge/work-in-progress label Apr 30, 2026
@k8s-ci-robot added the needs-rebase label May 11, 2026
@k8s-ci-robot removed the needs-rebase label May 11, 2026
@Bslabe123 force-pushed the prom-query branch 2 times, most recently from ab4f0b3 to 35f831a May 11, 2026 19:59
@Bslabe123 changed the title from [WIP] Cleanup Prometheus Metric Querying to Cleanup Prometheus Metric Querying May 11, 2026
@k8s-ci-robot removed the do-not-merge/work-in-progress label May 11, 2026
@jjk-g (Collaborator) left a comment

Do we have a test that ensures metric fidelity? For example, one that checks that all vLLM metrics are collected as expected via the names we have defined.
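One possible shape for such a fidelity check, sketched under assumptions: the test obtains the raw Prometheus text exposition from a model server (for example the simulator, or a live vLLM `/metrics` endpoint; fetching it is left out) and asserts that every metric name the PR defines is actually exported. The helper names `exported_families` and `missing_metric_names` are hypothetical, and metric names used with it should be checked against the vLLM version under test.

```python
import re
from typing import Iterable, Set

# Series suffixes that histogram, summary and counter samples add to the family name.
_SUFFIXES = ("_bucket", "_sum", "_count", "_total", "_created")
# A sample line starts with the metric name, followed by optional {labels} and a value.
_NAME_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def exported_families(exposition: str) -> Set[str]:
    """Collect metric names (and their family names) from Prometheus text format."""
    names: Set[str] = set()
    for raw in exposition.splitlines():
        line = raw.strip()
        parts = line.split()
        if line.startswith("# TYPE ") and len(parts) >= 3:
            # "# TYPE <name> <type>" declares the family name directly.
            names.add(parts[2])
        elif line and not line.startswith("#"):
            match = _NAME_RE.match(line)
            if match:
                name = match.group(1)
                names.add(name)
                # Also record the family name behind suffixed series.
                for suffix in _SUFFIXES:
                    if name.endswith(suffix):
                        names.add(name[: -len(suffix)])
    return names


def missing_metric_names(expected: Iterable[str], exposition: str) -> Set[str]:
    """Return the expected metric names the server did not export."""
    return set(expected) - exported_families(exposition)
```

Stripping suffixes can only over-report presence (a gauge that happens to end in `_count` also registers its prefix), so an empty result means "nothing obviously missing" rather than proof that every query will return data.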

Review thread on inference_perf/client/server_metrics/prometheus_client/base.py (outdated):
```python
for metric in metrics_to_process:
    for target_attr, query in metric.get_queries(query_duration):
        if not hasattr(model_server_metrics, target_attr):
            logger.debug(f"Attribute {target_attr} not found in ModelServerMetrics, skipping.")
```
Collaborator comment:

Can you describe this error condition, and explain why it is only a debug log rather than an error log?
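For comparison, a sketch of the stricter alternative the question points toward: check every `target_attr` against the metrics type once, up front, and fail loudly instead of logging at debug level per query. It assumes `ModelServerMetrics` is a dataclass (the PR may define it differently); the trimmed field set and the `validate_targets` helper are hypothetical.

```python
import logging
from dataclasses import dataclass, fields
from typing import Iterable, List, Tuple

logger = logging.getLogger(__name__)


@dataclass
class ModelServerMetrics:
    # Hypothetical, trimmed field set for illustration only.
    num_requests_running_mean: float = 0.0
    ttft_p90: float = 0.0


def validate_targets(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Raise if any (target_attr, query) pair names a field that does not exist.

    A typo in a target attribute is a configuration bug, so surfacing it once
    before querying is easier to notice than a per-query debug message.
    """
    known = {f.name for f in fields(ModelServerMetrics)}
    pairs = list(pairs)
    unknown = sorted({attr for attr, _ in pairs if attr not in known})
    if unknown:
        logger.error("Unknown ModelServerMetrics fields: %s", unknown)
        raise ValueError(f"unknown ModelServerMetrics fields: {unknown}")
    return pairs
```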

Review thread on inference_perf/client/server_metrics/base.py (outdated):
```python
            continue

        result = self.execute_query(query, str(query_eval_time))
        if result is not None:
```
Collaborator comment:

Can we enforce a strict type like BaseMetrics and use descriptive function implementations instead of getattr, setattr, etc.?
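A minimal sketch of what the suggested strict typing could look like: a typed `BaseMetrics` container plus an enum of metric fields, so each query result is written through a named, annotated method instead of `setattr`. `MetricField`, `apply_result`, the setter methods, and the field names are hypothetical illustrations, not names taken from the PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class MetricField(Enum):
    # Hypothetical fields; the PR's real set would live on its metrics type.
    NUM_REQUESTS_RUNNING_MEAN = "num_requests_running_mean"
    TTFT_P90 = "ttft_p90"


@dataclass
class BaseMetrics:
    num_requests_running_mean: float = 0.0
    ttft_p90: float = 0.0

    def set_num_requests_running_mean(self, value: float) -> None:
        self.num_requests_running_mean = value

    def set_ttft_p90(self, value: float) -> None:
        self.ttft_p90 = value


# Explicit dispatch table: every write goes through a named, type-annotated method.
_SETTERS: Dict[MetricField, Callable[[BaseMetrics, float], None]] = {
    MetricField.NUM_REQUESTS_RUNNING_MEAN: BaseMetrics.set_num_requests_running_mean,
    MetricField.TTFT_P90: BaseMetrics.set_ttft_p90,
}
# Fail at import time if a field was added to the enum without a setter.
assert set(_SETTERS) == set(MetricField), "every MetricField needs a setter"


def apply_result(metrics: BaseMetrics, field: MetricField, value: float) -> None:
    """Store one query result on the metrics object without getattr/setattr."""
    _SETTERS[field](metrics, value)
```

The trade-off is more boilerplate per field, in exchange for a type checker being able to flag a misspelled field that a string-based attribute lookup would only reveal at runtime.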

@Bslabe123 (Contributor, Author)

> Do we have a test that ensures metric fidelity? For example, one that checks that all vLLM metrics are collected as expected via the names we have defined.

For now, our options are either relying on the simulator or ad-hoc testing against live vLLM instances. Testing is also non-trivial: malformed queries fail silently, so validation has to be done on the report as a whole.
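Since a misspelled metric name or label filter simply matches no series, Prometheus returns an empty result rather than an error, which is one way queries "fail silently". A report-level check along these lines could at least flag which metrics never received a value. This is a sketch under assumptions: it treats `None` as "no data", and the `ModelServerMetricsReport` shape and `unpopulated_fields` helper are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ModelServerMetricsReport:
    # Hypothetical report fields; None means "no query result was stored".
    num_requests_running_mean: Optional[float] = None
    ttft_p90: Optional[float] = None
    ttft_mean: Optional[float] = None


def unpopulated_fields(report: ModelServerMetricsReport) -> List[str]:
    """List report fields (in declaration order) that no query ever filled in."""
    return [name for name, value in asdict(report).items() if value is None]
```

In a run against the simulator, asserting that this list is empty (or equals an allowlist of metrics the server is known not to export) would turn silent query failures into a failing test.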

@Bslabe123 force-pushed the prom-query branch 3 times, most recently from 6a6ff5a to de6113b May 18, 2026 18:25
@k8s-ci-robot added the needs-rebase label May 18, 2026
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Bslabe123 force-pushed the prom-query branch 3 times, most recently from c7e078c to 6eea523 May 18, 2026 18:27
@Bslabe123 changed the title from Cleanup Prometheus Metric Querying to [WIP] Cleanup Prometheus Metric Querying May 21, 2026
@k8s-ci-robot added the do-not-merge/work-in-progress label May 21, 2026
@Bslabe123 marked this pull request as draft May 21, 2026 17:29
Labels

  • cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
  • do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress)
  • needs-rebase (indicates a PR cannot be merged because it has merge conflicts with HEAD)
  • size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants