[WIP] Cleanup Prometheus Metric Querying#382
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Bslabe123 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
84b52bc to
5d8f295
Compare
c2b727b to
a75c0cb
Compare
|
@Bslabe123 please address merge conflicts |
c9e2eed to
93ab0d6
Compare
ab4f0b3 to
35f831a
Compare
jjk-g
left a comment
There was a problem hiding this comment.
Do we have a test that ensures the metric fidelity. For example, all vllm metrics are collected as expected via the names we have defined.
| for metric in metrics_to_process: | ||
| for target_attr, query in metric.get_queries(query_duration): | ||
| if not hasattr(model_server_metrics, target_attr): | ||
| logger.debug(f"Attribute {target_attr} not found in ModelServerMetrics, skipping.") |
There was a problem hiding this comment.
Can you describe this error condition and why this would be only a debug log vs error log?
| continue | ||
|
|
||
| result = self.execute_query(query, str(query_eval_time)) | ||
| if result is not None: |
There was a problem hiding this comment.
Can we enforce a strict type like BaseMetrics and use descriptive function implementations instead of getattr setattr, etc
For now our options are either relying on the sim or ad-hoc testing against live vLLM instances, testing is also non-trivial since malformed queries fail silently so report validation needs to be done on the report as a whole. |
6a6ff5a to
de6113b
Compare
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
c7e078c to
6eea523
Compare
Addresses: #386
Changes: