6 Python Tools I'd Like To Try In 2021
I could probably have tried all these tools in the time it took me to write this ...
In looking ahead at the projects I’d like to take on in 2021, I realized I had a growing list of new (to me) Python tools I wanted to incorporate into my work. I’ve also been trying to find more bite-sized content to cover, so I decided to write about these tools and what I find interesting about them.
My list of 6 Python packages contains 3 tools for setting up and maintaining your project and 3 frameworks for building out your application. I think this reflects my growing interest in learning how to maintain project quality across my personal and professional work. The tools are presented in roughly the order I’d like to try them out. If you think I’m missing something please let me know!
Cookiecutter
In my last newsletter I did a deep dive on my Python documentation toolchain and how I use it to encourage software best practices for myself and my team. As I mentioned in that article, this is part of a larger set of ideals I’ve been honing over the past few years about how to set up a modern Python project for a data team. Just like with documentation, I’ve been particularly focused on configuring projects that encourage best practices.
I recently discovered that most of what I’ve arrived at (in addition to a bunch of other stuff) is covered by Claudio Jolowicz’s excellent (and wonderfully illustrated) Hypermodern Python series. My ideal project setup includes “hypermodern” tools like Black, pyenv, and Poetry but also more traditional tools like pytest and Flake8, build pipelines in Makefiles and GitLab CI/CD config files, as well as more mundane things like README layouts.
At work, I’ve been slowly adding these things to a template project but this is mostly used as a reference implementation for individual tools that we gradually back port to other projects as time allows. In previous roles, we tried to take this a step further by using a GitHub-style development model to fork new projects off of a base project but this never quite got traction.
I thought about building a separate template project for my personal projects on GitHub, but something about it just felt inefficient. I could hear Raymond Hettinger’s catchphrase from his PyCon talks in the back of my mind: “There MUST be a better way!”
Enter Cookiecutter. From their website:
A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python package project template.
This looks very promising and is the top project I want to try this year. I’m an advocate of getting folks out of notebooks workflows and into a proper project format when work starts to move beyond exploratory analysis. Giving everyone a dynamic template to start with is a huge enabler for that. You can even find maintained templates for things like Python packages and data science.
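To make the idea of a "dynamic template" concrete, here is a toy sketch that uses only Python's standard library. It is not how Cookiecutter is implemented (Cookiecutter renders Jinja2 templates from a template directory); the TEMPLATE dict, the render_project helper, and the context keys are all hypothetical names made up for illustration.

```python
from pathlib import Path
from string import Template
from typing import Dict, List

# Hypothetical in-memory template: relative path -> file contents.
# $placeholders may appear in both the paths and the contents.
TEMPLATE: Dict[str, str] = {
    "$project_slug/README.md": "# $project_name\n\n$description\n",
    "$project_slug/$project_slug/__init__.py": '"""$project_name."""\n',
}


def render_project(template: Dict[str, str], context: Dict[str, str], dest: str) -> List[Path]:
    """Fill in every placeholder and write the resulting files under dest."""
    created = []
    for raw_path, raw_body in template.items():
        path = Path(dest) / Template(raw_path).substitute(context)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(Template(raw_body).substitute(context), encoding="utf-8")
        created.append(path)
    return created
```

Calling render_project(TEMPLATE, {"project_slug": "demo", "project_name": "Demo", "description": "A test project."}, "some/output/dir") would write a small demo/ project skeleton. The appeal of a real tool like Cookiecutter is that the template lives in its own repository and the answers are collected for you at the command line.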
pre-commit
pre-commit is another newish package that feels like an emerging standard to me (check out the Hypermodern post). Currently, I wrap all my “pre-commit” steps like Black and Flake8 into a Makefile. This works pretty well, but you still have to remember to run make pre-commit before you push. While this isn’t a huge issue in my team’s current workflow, I don’t like anything that relies on the user having to remember something.
One solution to this issue is to write a pre-commit hook in git. These are scripts that git runs automatically before committing code. I’ve known about pre-commit hooks for a long time but never quite felt compelled to write my own.
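For a sense of what a hand-written hook involves, here is a minimal sketch of a trailing-whitespace check in Python. The function names are hypothetical, and it assumes the caller passes in the file paths to check: git itself hands no file names to a pre-commit hook, so a real script would first have to ask git for the staged files. The piece that matters to git is the exit status, since a non-zero exit from the hook aborts the commit.

```python
from pathlib import Path
from typing import List


def trailing_whitespace_lines(text: str) -> List[int]:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def main(paths: List[str]) -> int:
    """Report offending lines and return a shell-style exit status."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for number in trailing_whitespace_lines(text):
            print(f"{path}:{number}: trailing whitespace")
            failed = True
    return 1 if failed else 0

# As a hook script, the entry point would end with something like:
#     sys.exit(main(sys.argv[1:]))
```

Even this small check needs file discovery, encoding handling, and installation into .git/hooks on every clone, which is a fair amount of plumbing to maintain by hand.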
Enter pre-commit. From their website:
We built pre-commit to solve our hook issues. It is a multi-language package manager for pre-commit hooks. You specify a list of hooks you want and pre-commit manages the installation and execution of any hook written in any language before every commit.
I learned about pre-commit sometime last year and it seemed like something nice to get around to eventually. But what I’m now realizing is that people are using pre-commit as a way to share git-hook recipes. For example, instead of showing everyone how to set up their text editors to remove trailing whitespace, I can just turn on the trailing-whitespace hook like in this Hypermodern post. This connected the dots for me with cookiecutter as a way to increase standardization and reusability at my job. I think the possibilities really became clear to me when I read about the 3rd project on my list.
pre-commit-dbt
Our data warehouse at work is centered around the dbt package, its recommended design patterns, and even the concept of an Analytics Engineer. I’ll be honest, my team has a pretty good handle on dbt which means I’m largely hands-off on this project. But, from our team meetings my engineering sense was telling me that we could be doing more in terms of enforcing things like testing and documentation. So I was excited to connect the dots back to the pre-commit project (maybe framework is a better word here?) in the latest dbt community newsletter.
From the pre-commit-dbt GitHub page:
List of pre-commit hooks to ensure the quality of your dbt projects.
You’ve got my attention! It continues:
dbt is awesome, but when a number of models, sources, and macros grow it starts to be challenging to maintain quality. People often forget to update columns in schema files, add descriptions, or test. Besides, with the growing number of objects, dbt slows down, users stop running models/tests (because they want to deploy the feature quickly), and the demands on reviews increase.
Perfect. These are exactly the kinds of code and project quality issues I’m trying to improve on my current team, and I think some of these dbt pre-commit hooks could really help improve the overall quality of our data warehouse as a platform.
And thinking about our data warehouse as a platform leads me to the next interesting project on our list.
Dagster
One of the benefits of doing the Advent of Code puzzles this last December is that it really helped me understand my personal software design preferences. Getting into a daily rhythm of completing a mini-project and then discussing it in a Discord channel with my friends allowed me to refine and articulate some programming ideas I had been taking for granted (link to my code repo). Upon reflection, I also realized that these were ideas I was failing to communicate effectively while mentoring other team members.
From my point of view, the Advent of Code problems are all essentially mini data pipelines with an input file and a single summary output. This had me advocating to my friends about the merits of highly-decoupled and functional programming paradigms assembled using composition and with strong test coverage (there’s a draft of a newsletter just about this).
You can almost directly scale those ideas of composing decoupled functions with no side effects from toy Python pipelines into full data architectures. At a certain scale this argument runs into pushback about the practical (not computational) scalability of microservices architectures. However, I think there’s still a lot of value in the concept of highly independent chunks of code that do exactly one thing perfectly.
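As a rough illustration of what I mean by composing decoupled, side-effect-free functions, here is a toy pipeline in the Advent of Code mold: one text input, one summary output. The step names and the compose helper are made up for this sketch and are not taken from any particular library.

```python
from functools import reduce
from typing import Callable, Iterable, List


def parse(raw: str) -> List[int]:
    """Turn raw puzzle input into a list of integers, skipping blank lines."""
    return [int(line) for line in raw.splitlines() if line.strip()]


def keep_even(values: Iterable[int]) -> List[int]:
    """Keep only the even values."""
    return [value for value in values if value % 2 == 0]


def total(values: Iterable[int]) -> int:
    """Reduce the values to a single summary number."""
    return sum(values)


def compose(*steps: Callable) -> Callable:
    """Chain steps left to right, so compose(f, g)(x) == g(f(x))."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


# Each step can be tested on its own; the pipeline is just their composition.
pipeline = compose(parse, keep_even, total)
```

Because no step touches shared state, swapping keep_even for a different filter, or testing parse in isolation, requires no changes anywhere else.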
I was thinking about these types of patterns when I remembered a recent issue of Tristan Handy’s Data Science Roundup (Tristan is the founder of Fishtown Analytics which maintains dbt core) talking about a new data orchestration tool called Dagster. From their website:
Dagster is a data orchestrator for machine learning, analytics, and ETL
Continuing:
With Dagster’s pluggable execution, the same pipeline can run in-process, against your local file system or on a distributed work queue, against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premise or in any cloud.
You can read more from one of the project organizers in this Medium post, including some helpful sample code. I’ve scanned their docs and there seems to be a lot of overlap with how I’m already thinking about pipelines. I’d have to read more to see how this would compare to something like Airflow, Databricks, or Matillion but I could see this being a very cool project. In fact, this is the project that has me the most excited of anything on my list. I see a lot of possibilities for composing various scripts and processes that are starting to pop up in our team’s work.
There’s even typing:
Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.
Type hints in Python are something I’m increasingly pulling into my work. In working on my Advent of Code puzzles I finally started using mypy and type hints and loved the structure and constraints they added to my code. Unfortunately, everything I’ve read and tried has led me to believe that type hint support just isn’t quite there yet for Pandas + mypy. Nonetheless, the next two projects on my list for 2021 are interesting in and of themselves but also because of how they leverage type hints for autogeneration of functionality and documentation.
Typer
I’ve been writing command line interfaces since the beginning of my career when I was creating image analysis pipelines for the Hubble Space Telescope and I’ll always have a soft spot for a solid CLI framework. I started out like everyone, just manually parsing sys.argv, then moved to the now deprecated optparse, then argparse, and finally moved on to Click a few years ago, which is the modern Python standard and a great tool.
Typer is a newish CLI framework and the self-described “little sibling” of FastAPI, the next and final project on this list. From their website:
Typer is a library for building CLI applications that users will love using and developers will love creating. Based on Python 3.6+ type hints
Check out this “Hello world” example:
import typer


def main(name: str):
    typer.echo(f"Hello {name}")


if __name__ == "__main__":
    typer.run(main)
That’s it. That’s enough to generate a CLI complete with Unix-style --help flags and everything:
python main.py --help

Usage: main.py [OPTIONS] NAME

Arguments:
  NAME  [required]

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or customize the installation.
  --help                Show this message and exit.
Amazing! This reminds me of the Flask API hello world! Which brings us to our final project I’ve got on my radar for 2021.
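For contrast with the Typer example above, here is roughly what the same greeting looks like with argparse from the standard library. This is my own sketch rather than anything from the Typer docs; the build_parser and greet helper names are just illustrative.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI by hand: one required positional argument."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("name", help="who to greet")
    return parser


def greet(name: str) -> str:
    """Build the greeting; kept separate so it can be tested without a CLI."""
    return f"Hello {name}"


def main(argv: Optional[List[str]] = None) -> int:
    """Parse argv (defaults to sys.argv[1:] when None) and print the greeting."""
    args = build_parser().parse_args(argv)
    print(greet(args.name))
    return 0
```

The argument is declared twice here, once for the parser and once in the function signature. With Typer, the signature is the declaration.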
FastAPI
I spent a lot of time early in my engineering career bouncing back and forth between Django and Flask. Both are great projects with their own merits. Django is the go-to batteries-included option but a lot of times my projects ended up needing something much more lightweight. Flask offered us that minimal code footprint (at first) but as our projects scaled I found we had to spend more and more time thinking about which design principles we wanted to impose on our app. Or, to be more honest, thinking about how to reconcile the various patterns we had half-implemented throughout our app.
Some of this is inevitable as projects start to scale but Flask’s hands-off nature seems to make the issue especially acute from the get-go. Oftentimes, while I still felt like I didn’t want to go full Django with its CRM and ORM, I did still wish Flask was making more decisions for me.
Since then I’ve been increasingly interested in how opinionated projects can enforce a certain structure that helps guide your work, and this is one of the things that drew me to FastAPI. From their website:
FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints.
FastAPI is Typer’s (older?) sibling. The idea is the same, using type hints allows you to more richly define boilerplate for your API. In the same way that Typer generates a command line help interface, FastAPI generates rich Swagger docs.
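To show how type hints can drive this kind of autogeneration, here is a toy sketch using only the standard library's inspect and typing modules. It is not how Typer or FastAPI are actually implemented; usage, call_with_strings, and repeat are hypothetical names for illustration, and the sketch only handles simple types like str and int.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def usage(func: Callable) -> str:
    """Build a one-line usage string from a function's signature and hints."""
    hints = get_type_hints(func)
    parts = []
    for name, param in inspect.signature(func).parameters.items():
        label = f"{name.upper()}:{hints.get(name, str).__name__}"
        # Parameters with a default value are optional, so bracket them.
        required = param.default is inspect.Parameter.empty
        parts.append(label if required else f"[{label}]")
    return f"{func.__name__} " + " ".join(parts)


def call_with_strings(func: Callable, raw: Dict[str, str]) -> Any:
    """Convert raw string values using the annotated types, then call func."""
    hints = get_type_hints(func)
    converted = {key: hints.get(key, str)(value) for key, value in raw.items()}
    return func(**converted)


def repeat(word: str, times: int = 2) -> str:
    """Example function whose annotations carry all the interface details."""
    return " ".join([word] * times)
```

Here usage(repeat) produces a help line and call_with_strings(repeat, {"word": "hi", "times": "3"}) converts "3" to an int before calling, all without repeat containing any parsing or documentation code of its own.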
As the name would suggest, FastAPI is also fast. Built on top of another project called Starlette (its main application class is actually a Starlette subclass), it’s one of the fastest Python frameworks. I don’t see myself building too many APIs in the near future but if I do I’m certainly going to reach for this project.
Odds and Ends
Around the Internet:
Welcome to the Center of the Universe: This is a long read about the Deep Space Network, the network of satellite dishes that has given us 24/7 contact with all of our deep space missions for over 50 years. It’s an incredible feat of engineering and commitment characteristic of our space program. It also had me reminiscing about my first career in astronomy, when I was doing overnight shifts of thermal vacuum testing on the Hubble WFC3 camera at Goddard Space Flight Center.
Reading:
Nature’s Metropolis: I finally wrapped up this long but fantastic read about the economic history of Chicago and the greater midwest. I learned about the many historic economic connections between the places I’ve lived, studied, visited, and vacationed, down to individual buildings I’ve been in. It changed the way I think of my city and the region it’s situated in.
Go Ahead in the Rain: Notes to A Tribe Called Quest: I’m looking forward to poet and writer Hanif Abdurraqib’s part history and part personal memoir of ATCQ. I haven’t done a deep dive on the group’s history since a tumultuous 2016 when in a few short months I watched the Beats, Rhymes & Life documentary for the first time, only to learn of Phife Dawg’s untimely death a few months later, closely followed by the release of their last album.
Listening:
Chick Corea: It’s unfortunate that so recently after losing MF DOOM I have to say goodbye to another treasured musician: jazz pianist Chick Corea. I’ve been fascinated by Chick’s playing since I came across Return to Forever’s Romantic Warrior in high school. Here he is with his long-time collaborator Gary Burton in a 2016 Tiny Desk Concert:
And here he is in 1992 at the Tokyo Blue Note with his New Akoustic Band trio.
Chick knew he was on his way out with a rare form of cancer soon after it was discovered so I’ll close out this newsletter with an excerpt from his goodbye message to his fans.
It is my hope that those who have an inkling to play, write, perform or otherwise, do so. If not for yourself then for the rest of us.
Thanks for reading! You can catch me on twitter at @AlexVianaPro, on GitHub, or on LinkedIn. If you want to subscribe there’s probably a button on this page to do that. Thanks to my wonderful wife for helping with my innumerable typos.