Run logs arrive corrupted #185

Closed
kengz opened this issue Feb 12, 2023 · 8 comments · Fixed by #192

@kengz

kengz commented Feb 12, 2023

Describe the bug
When running a Conda install with large dependencies (pytorch) and creating artifacts out of it, the run occasionally fails with 'utf-8' codec can't decode bytes in position 4094-4095: unexpected end of data

Version

  • The dstack CLI version: 0.1
  • The operating system version: macOS 13.1 (Apple M2)
  • The Python version: 3.10.4
  • (Optional) Other Python packages versions (pip freeze or conda list)

Minimal example
Use the main branch commit kengz/lean-dl-example@f4c06f0
Run dstack run setup-conda

Steps to reproduce
Use the main branch commit kengz/lean-dl-example@f4c06f0
Run dstack run setup-conda
Try it multiple times, since the error is intermittent. It only happens when running with dstack local, never when running conda directly on the user's machine.

Expected behavior
The run should finish normally. It does when rerun, so the error is intermittent.

Logs
6cda13c1f54b4a31b8e1a5038b4a4433.zip

Screenshots
Screenshot 2023-02-12 at 9 40 56 AM

@kengz kengz added the bug (Something isn't working) label Feb 12, 2023
@peterschmidt85
Contributor

I've seen this bug quite a few times but was never able to find what causes it or a way to reproduce it consistently.

@peterschmidt85
Contributor

@kengz Does the workflow itself fail, or is the problem only with the output?

@kengz
Author

kengz commented Feb 12, 2023

The workflow fails too

@peterschmidt85
Contributor

peterschmidt85 commented Feb 12, 2023

Hmm. Are you sure? According to the runner logs, the job was marked as Done.
Just want to be sure where the problem is.

@kengz
Author

kengz commented Feb 12, 2023

That's where the log ended. I had it print "done" at the end and it didn't get there.

@kengz
Author

kengz commented Feb 12, 2023

Went back to double-check: yep, the log is right, for the run named silent-sloth-0.
Screenshot 2023-02-12 at 4 08 14 PM

interestingly, dstack ps shows it's completed:
Screenshot 2023-02-12 at 4 09 00 PM

@r4victor r4victor self-assigned this Feb 15, 2023
@r4victor
Collaborator

I found the cause of the issue. dstack's runner sends logs in chunks as UTF-8 websocket text messages, but a multibyte Unicode character can fall on the boundary between two messages, which makes each of those messages invalid UTF-8 on its own:

_ = connection.WriteMessage(websocket.TextMessage, s.buf[currentPos])

The solution would be to send raw bytes over websocket.
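
For anyone following along, here is a minimal Go sketch of the failure mode described above. It uses only the standard library and no real websocket connection. The helper names (splitFixed, allValidUTF8, reassemble), the sample log line, and the chunk size of 6 are made up for illustration and are not taken from the dstack runner.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitFixed cuts data into chunks of at most size bytes and pays no
// attention to character boundaries (hypothetical helper).
func splitFixed(data []byte, size int) [][]byte {
	if size <= 0 {
		return [][]byte{data}
	}
	var chunks [][]byte
	for len(data) > size {
		chunks = append(chunks, data[:size])
		data = data[size:]
	}
	if len(data) > 0 {
		chunks = append(chunks, data)
	}
	return chunks
}

// allValidUTF8 reports whether every chunk is valid UTF-8 on its own,
// which is what a text message needs in order to be decoded separately.
func allValidUTF8(chunks [][]byte) bool {
	for _, c := range chunks {
		if !utf8.Valid(c) {
			return false
		}
	}
	return true
}

// reassemble joins the chunks as raw bytes and decodes only once at the end.
func reassemble(chunks [][]byte) string {
	var out []byte
	for _, c := range chunks {
		out = append(out, c...)
	}
	return string(out)
}

func main() {
	// The check mark is a 3-byte character; a chunk size of 6 cuts through it.
	logLine := []byte("done ✓ ok")
	chunks := splitFixed(logLine, 6)

	fmt.Println("every chunk is valid UTF-8:", allValidUTF8(chunks))
	fmt.Println("joined bytes match original:", reassemble(chunks) == string(logLine))
}
```

Decoding each chunk separately fails as soon as a character straddles a boundary, while joining the raw bytes first and decoding once does not. Assuming the runner uses gorilla/websocket, as the WriteMessage call suggests, sending raw bytes would presumably mean passing websocket.BinaryMessage instead of websocket.TextMessage; whether the eventual fix does exactly that is not stated in this thread.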

@r4victor
Copy link
Collaborator

@kengz, thanks for the issue. The bug will be fixed with the next release.

@peterschmidt85 peterschmidt85 changed the title from "decoding error when create large Conda artifacts" to "Run logs arrive corrupted" Feb 26, 2023