• 5 Posts
  • 58 Comments
Joined 2 years ago
Cake day: June 11th, 2023

  • It's a paradigm shift from pandas. In Polars, you define a pipeline, a set of instructions to perform on a dataframe, and execute them all at once at the end of your transformation. In other words, it's lazy. Pandas is eager: every step of the transformation runs immediately and in isolation, one after another. Polars also has an eager API, but you'll likely want the lazy API in a production script.

    Because it's lazy, Polars can optimize the whole query before running it, much like a database optimizes a SQL query (for example, by applying filters early and reading only the columns it needs). The upshot: if you're using Polars for data engineering or in a pipeline, it will likely run much faster and use memory more efficiently. Polars also executes operations in parallel.

  • I’ve used MinIO briefly, and I’ve never used any other self-hosted object storage. Spinning it up with Docker is pretty easy. The difficult part in my project was that I wanted some buckets predefined. The Docker image doesn’t provide this directly, so I had to run an adjacent container with the MinIO CLI that created the buckets automatically every time I started MinIO.

    For your use case, though, you would create buckets manually from the UI. That seems straightforward enough, and I have no complaints. I think it would work for you, but I can’t say whether it’s any better or worse than the alternatives.
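
The bucket-bootstrapping step described above was done with the MinIO CLI in a sidecar container. As an alternative, here is a rough sketch of the same idea in Python, assuming the `minio` Python SDK (its `Minio` client with `bucket_exists` and `make_bucket`). The endpoint, the `minioadmin` default credentials, the `MINIO_*` environment variable names, and the bucket names are all hypothetical placeholders. It waits for the server to come up, then creates only the buckets that are missing, so it is safe to run on every startup.

```python
import os
import time


def wait_until_ready(client, probe_bucket, retries=30, delay=1.0):
    """Poll the server until a simple call succeeds (MinIO may still be starting)."""
    for _ in range(retries):
        try:
            client.bucket_exists(probe_bucket)
            return True
        except Exception:  # sketch: treat any error as "not up yet"
            time.sleep(delay)
    return False


def ensure_buckets(client, bucket_names):
    """Create each bucket that does not exist yet; return the ones created.

    `client` only needs bucket_exists(name) and make_bucket(name), which is
    the shape of the minio SDK's Minio client.
    """
    created = []
    for name in bucket_names:
        if not client.bucket_exists(name):
            client.make_bucket(name)
            created.append(name)
    return created


def main():
    from minio import Minio  # assumption: `pip install minio`

    # Hypothetical env var names and defaults; adjust to your compose setup.
    client = Minio(
        os.environ.get("MINIO_ENDPOINT", "localhost:9000"),
        access_key=os.environ.get("MINIO_ACCESS_KEY", "minioadmin"),
        secret_key=os.environ.get("MINIO_SECRET_KEY", "minioadmin"),
        secure=False,
    )
    buckets = os.environ.get("MINIO_BUCKETS", "raw,processed").split(",")
    if not wait_until_ready(client, buckets[0]):
        raise SystemExit("MinIO did not become ready")
    print("created:", ensure_buckets(client, buckets))


if __name__ == "__main__":
    main()
```

Keeping `ensure_buckets` independent of the SDK import makes it idempotent and easy to test with a fake client; the script itself could run as the one-shot sidecar in place of the CLI container.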