Fix and polish seed warnings in tests #5822
Open
nickmuoh wants to merge 6 commits into
…ed-model Revert unintended change in tests/core/test_model.py
Description
This PR fixes two DataFrame handling issues that made the warning fixes incomplete and hard to verify.
The first issue was in seed batch reading. CsvSeedReader.read() yielded slices of the original DataFrame without copying them, so a consumer that mutated one returned batch could accidentally affect later reads from the same seed. For example, if a consumer changed a value in the first batch, a later full read could reflect that mutation even though the seed content itself had not changed. This makes the behavior fragile, and it makes warning-related regressions hard to reason about because the returned batches are not isolated from one another.
The solution is to return a copy of each batch instead of the slice itself. With that change, each batch is independent. Mutating one returned batch no longer changes subsequent batches or later reads from the same seed source.
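A minimal sketch of the idea, assuming a plain batching generator. The `read_batches` function and its `batch_size` parameter are hypothetical stand-ins for illustration, not the actual CsvSeedReader code. Whether a mutation leaks when the copy is omitted depends on the pandas version and its copy-on-write settings, so the sketch only demonstrates the copied case.

```python
import pandas as pd


def read_batches(df, batch_size):
    """Yield consecutive row batches as independent copies.

    Hypothetical stand-in for a seed reader; not SQLMesh's implementation.
    """
    for start in range(0, len(df), batch_size):
        # .copy() detaches the batch from the source DataFrame, so callers
        # can mutate it without touching the seed data.
        yield df.iloc[start:start + batch_size].copy()


seed = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]})

# Mutate the first returned batch.
first = next(read_batches(seed, batch_size=2))
first.loc[0, "name"] = "changed"

# A later full read still reflects the original seed content.
reread = pd.concat(list(read_batches(seed, batch_size=2)))
```

The copy costs extra memory per batch, but it makes each batch safe to modify in place.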
The second issue was in the Snowflake DataFrame upload path. When SQLMesh prepared a pandas DataFrame for write_pandas, it preserved any existing non-default index from the source DataFrame. For example, a DataFrame with row labels [1, 2] was passed through with that index intact. The upload logic only intends to load the declared columns, so carrying a custom index into the Snowflake write path is unnecessary and can lead to warnings or inconsistent behavior.
The fix is to normalize the DataFrame before upload by preserving the declared column order and resetting the index with drop=True. As a result, Snowflake now receives a clean DataFrame with the expected columns and a standard RangeIndex, regardless of how the input DataFrame was indexed.
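The normalization step can be sketched roughly as follows. The helper name `normalize_for_upload` and its `columns` argument are assumptions made for illustration; the actual SQLMesh code path may be structured differently.

```python
import pandas as pd


def normalize_for_upload(df, columns):
    """Select the declared columns in order and discard any custom index.

    Hypothetical helper; not the actual SQLMesh function.
    """
    # Column selection returns a new DataFrame in the declared order;
    # reset_index(drop=True) replaces the row labels with a RangeIndex
    # instead of turning the old index into a column.
    return df[list(columns)].reset_index(drop=True)


# Source DataFrame with row labels [1, 2] and columns out of declared order.
source = pd.DataFrame({"b": ["x", "y"], "a": [10, 20]}, index=[1, 2])
clean = normalize_for_upload(source, ["a", "b"])
```

Because `reset_index` returns a new object, the caller's DataFrame keeps its original index.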
Test Plan
The accompanying tests demonstrate both cases directly. The seed test shows that modifying one returned batch no longer leaks into other batches or later reads. The Snowflake test shows that a DataFrame with a non-default index is normalized before write_pandas is called, while the row data and column order remain unchanged.
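One way such a check could look, assuming the writer is injected so that a mock can capture what it receives. The `upload` function and `fake_write_pandas` are hypothetical; this is not the test from the PR, and the real write_pandas takes additional arguments such as a connection and table name.

```python
from unittest.mock import MagicMock

import pandas as pd


def upload(df, columns, write_pandas):
    """Hypothetical upload step: normalize, then hand off to the writer."""
    write_pandas(df[list(columns)].reset_index(drop=True))


# A mock stands in for the Snowflake writer and records its arguments.
fake_write_pandas = MagicMock()
source = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=[1, 2])
upload(source, ["a", "b"], fake_write_pandas)

# Inspect the DataFrame that was actually passed to the writer.
sent = fake_write_pandas.call_args[0][0]
assert isinstance(sent.index, pd.RangeIndex)
assert list(sent.columns) == ["a", "b"]
assert sent["a"].tolist() == [1, 2]
```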
Checklist
- Ran make style and fixed any issues
- Tests pass (make fast-test)
- Commits are signed off (git commit -s) per the DCO