
regenerate system prompts per stage in conversation replay #480

Open
zetxqx wants to merge 4 commits into kubernetes-sigs:main from zetxqx:conversationreplay

Conversation

@zetxqx
Contributor

zetxqx commented May 14, 2026

Fixes: #479

This PR updates the ConversationReplayDataGenerator to automatically regenerate system prompts at the start of every new benchmark stage.
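A minimal sketch of per-stage regeneration, assuming the shared system prompt is synthesized from an RNG seeded with a base seed plus the stage id. The names `SharedPromptCache`, `make_prompt`, and `VOCAB` are hypothetical illustrations, not the actual ConversationReplayDataGenerator API.

```python
import random
from typing import Optional

# Hypothetical word pool used to synthesize filler prompt text.
VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]


def make_prompt(rng: random.Random, num_words: int) -> str:
    """Build a synthetic prompt of num_words words drawn from VOCAB."""
    return " ".join(rng.choice(VOCAB) for _ in range(num_words))


class SharedPromptCache:
    """Hypothetical helper (not the PR's code): regenerates the shared
    system prompt whenever a new stage id is observed."""

    def __init__(self, seed: int, num_words: int) -> None:
        self.seed = seed
        self.num_words = num_words
        self._stage_id: Optional[int] = None
        self._prompt = ""

    def get(self, stage_id: int) -> str:
        # Regenerate only on a stage transition; within a stage the prompt is stable.
        if stage_id != self._stage_id:
            rng = random.Random(f"{self.seed}stage{stage_id}")
            self._prompt = make_prompt(rng, self.num_words)
            self._stage_id = stage_id
        return self._prompt
```

Seeding from a string that includes the stage id keeps each stage's prompt reproducible across runs while making it differ between stages; Python's `random.Random` accepts `str` seeds and hashes them deterministically.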

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zetxqx
Once this PR has been reviewed and has the lgtm label, please assign arangogutierrez for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/M and cncf-cla: yes labels May 14, 2026
Comment thread on inference_perf/datagen/conversation_replay_datagen.py (outdated)
@achandrasekar
Contributor

@alonh @LoganVegnaSHOP @Navjot10 Any thoughts on the behavior here?

k8s-ci-robot added the size/L label and removed the size/M label May 15, 2026
zetxqx force-pushed the conversationreplay branch from 7ea8a09 to eb304b5 May 15, 2026 01:39
zetxqx force-pushed the conversationreplay branch from 9dadb00 to 44abf8f May 15, 2026 03:42
@alonh
Contributor

alonh commented May 20, 2026

This looks good overall — the cross-stage regeneration approach makes sense.

The seed derivation in load_lazy_data uses f"{self.cr_config.seed}stage{data.stage_idx}" without any per-conversation component. When dynamic_system_prompt_len is configured, each conversation originally gets a unique suffix at init time. But on stage transitions, all conversations derive the same RNG from the same seed string. Wouldn't that cause all conversations to collapse to the same prompt, losing the per-conversation uniqueness?
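
To make the concern concrete, here is a sketch assuming each conversation's dynamic suffix is re-derived on a stage transition from an RNG seeded only by the base seed and stage id, mirroring the seed expression quoted above. The helper `regen_suffix` and the word pool are hypothetical.

```python
import random

# Hypothetical word pool for synthetic prompt text.
VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]


def regen_suffix(seed: int, stage_id: int, num_words: int) -> str:
    """Hypothetical per-stage regeneration: the seed string has no
    per-conversation component."""
    rng = random.Random(f"{seed}stage{stage_id}")
    return " ".join(rng.choice(VOCAB) for _ in range(num_words))


# At init each conversation had a distinct suffix; after a stage transition every
# conversation re-derives its suffix from the identical seed string.
suffixes = {conv_id: regen_suffix(seed=42, stage_id=1, num_words=8) for conv_id in range(5)}
distinct = set(suffixes.values())
print(len(distinct))  # 1: all five conversations collapsed to the same suffix
```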

Nit: the new field is called stage_idx but the rest of the codebase uses stage_id — would be nice to keep consistent.

@zetxqx
Contributor Author

zetxqx commented May 21, 2026

Good point.

  1. Updated the logic: the implementation now regenerates only the shared prompt for each stage, while the dynamic (per-conversation) prompt is reused, so per-conversation uniqueness is preserved.
  2. Renamed stage_idx to stage_id.
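
A minimal sketch of the behavior described in item 1, assuming the system prompt is a stage-specific shared prefix followed by a per-conversation dynamic suffix generated once at init. `Conversation`, `init_conversations`, `shared_prefix`, and `build_system_prompt` are hypothetical names, not the PR's actual code.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical word pool for synthetic prompt text.
VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]


def make_text(rng: random.Random, num_words: int) -> str:
    """Build synthetic text of num_words words drawn from VOCAB."""
    return " ".join(rng.choice(VOCAB) for _ in range(num_words))


@dataclass
class Conversation:
    conv_id: int
    dynamic_suffix: str  # generated once at init and reused across stages


def init_conversations(seed: int, n: int, suffix_len: int) -> List[Conversation]:
    # A single RNG advances across conversations, so each one draws a different suffix.
    rng = random.Random(seed)
    return [Conversation(i, make_text(rng, suffix_len)) for i in range(n)]


def shared_prefix(seed: int, stage_id: int, prefix_len: int) -> str:
    # Only the shared part is re-derived per stage; it is identical for all conversations.
    return make_text(random.Random(f"{seed}stage{stage_id}"), prefix_len)


def build_system_prompt(conv: Conversation, prefix: str) -> str:
    # New stage prefix plus the conversation's unchanged dynamic suffix.
    return f"{prefix} {conv.dynamic_suffix}"
```

Because the per-stage seed never touches the dynamic suffix, switching stages changes every prompt's shared prefix while each conversation keeps its own suffix.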

zetxqx force-pushed the conversationreplay branch from 3830dee to f7ba153 May 21, 2026 07:30

Labels

cncf-cla: yes, size/L

Development

Successfully merging this pull request may close these issues.

system prompts across stages in ConversationReplayDataGenerator should be different

5 participants