Fold proteins with Chai-1

In biology, function follows form quite literally: the physical shapes of proteins dictate their behavior. Measuring those shapes directly is difficult, and first-principles physical simulation is prohibitively expensive.

Predicting protein shape from sequence, that is, determining how the one-dimensional chain of amino acids encoded by DNA folds into a 3D object, has therefore emerged as a key application of machine learning and neural networks in biology.
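Concretely, that one-dimensional chain is a string of one-letter amino acid codes, which structure predictors like Chai-1 read from FASTA-formatted text. The sketch below is a hedged illustration, not code from this example: `protein_fasta` is a hypothetical helper, and the `>protein|name=...` header convention is an assumption based on the chai_lab README, so check it against the version you install.

```python
# Hypothetical helper: build a FASTA record for one protein chain.
# Assumption: chai_lab reads headers of the form ">protein|name=<id>";
# verify against the chai_lab README for your installed version.

VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def protein_fasta(name: str, sequence: str) -> str:
    """Return a single-record FASTA string, rejecting non-standard residues."""
    sequence = sequence.strip().upper()
    unknown = set(sequence) - VALID_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues in {name!r}: {sorted(unknown)}")
    return f">protein|name={name}\n{sequence}\n"


if __name__ == "__main__":
    # prints two lines: ">protein|name=example-peptide" and "GAAL"
    print(protein_fasta("example-peptide", "gaal"), end="")
```

Validating residues before submitting a job is cheap and avoids paying for GPU time on an input the model will reject.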

In this example, we demonstrate how to run the open source Chai-1 protein structure prediction model on Modal’s flexible serverless infrastructure. For details on how the Chai-1 model works and what it can be used for, see the authors’ technical report on bioRxiv.

This simple script is meant as a starting point showing how to handle fiddly bits like installing dependencies, loading weights, and formatting outputs so that you can get on with the fun stuff. To experience the full power of Modal, try scaling inference up and running on hundreds or thousands of structures!

Setup


    import hashlib
    import json
    from pathlib import Path
    from uuid import uuid4
    
    import modal
    
    here = Path(__file__).parent  # the directory of this file
    
    MINUTES = 60  # seconds
    
    app = modal.App(name="example-chai1-inference")
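The setup imports `hashlib`, `json`, and `uuid4` ahead of their use. As a hedged illustration of one common way to combine them (not necessarily how this example uses them), a content hash gives results a deterministic name, so a rerun with identical inputs can find earlier outputs, while `uuid4` gives every run a fresh name. `content_key`, `unique_run_id`, and the option names are hypothetical.

```python
import hashlib
import json
from uuid import uuid4


def content_key(fasta: str, options: dict) -> str:
    """Deterministic key: identical inputs and options give the same digest."""
    # sort_keys makes the encoding independent of dict insertion order
    payload = json.dumps({"fasta": fasta, "options": options}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def unique_run_id() -> str:
    """Random key: every call differs, even for identical inputs."""
    return uuid4().hex[:16]


if __name__ == "__main__":
    # hypothetical option names, for illustration only
    a = content_key(">protein|name=p\nGAAL\n", {"seed": 42, "steps": 200})
    b = content_key(">protein|name=p\nGAAL\n", {"steps": 200, "seed": 42})
    print(a == b)  # True: key order does not change the digest
    print(unique_run_id() != unique_run_id())
```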

Fold a protein from the command line

The logic for running Chai-1 is encapsulated in the function below, which you can trigger from the command line by running:


    modal run chai1

This will set up the environment for running Chai-1 inference in Modal’s cloud, run it, and then save the results remotely and locally. The results are returned in the Crystallographic Information File format, which you can render with the online Molstar Viewer.
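Saving results locally amounts to writing each returned structure to its own `.cif` file. The sketch below assumes, hypothetically, that the remote function hands back a list of CIF files as raw bytes; `save_cif_outputs` and the file-naming scheme are illustrative, not the example's actual code.

```python
import tempfile
from pathlib import Path


def save_cif_outputs(structures: list[bytes], out_dir: Path, run_id: str) -> list[Path]:
    """Write each predicted structure to its own .cif file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, cif_bytes in enumerate(structures):
        path = out_dir / f"chai1_{run_id}_model_{i}.cif"
        path.write_bytes(cif_bytes)  # CIF is text, but bytes avoid re-encoding
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_cif_outputs([b"data_model_0\n", b"data_model_1\n"], Path(tmp), "demo")
        print([p.name for p in saved])
```

Each saved file can then be opened in a structure viewer such as Molstar.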

To see more options, run the command with the --help flag.

Installing Chai-1 Python dependencies on Modal

Code on Modal runs inside containers built from container images that include that code’s dependencies.

Because Modal images include GPU drivers by default, installation of higher-level packages like chai_lab that require GPUs is painless.

Here, we do it with one line, using the uv package manager for extra speed.

    
    image = modal.Image.debian_slim(python_version="3.12").run_commands(
        "uv pip install --system --compile-bytecode chai_lab==0.5.0 hf_transfer==0.1.8"
    )

Storing Chai-1 model weights on Modal with Volumes

Not all “dependencies” belong in a container image. Chai-1, for example, depends on the weights of several models.

Rather than loading them dynamically at run-time (which would add several minutes of GPU time to each inference), or installing them into the image (which would require they be re-downloaded any time the other dependencies changed), we load them onto a Modal Volume. A Modal Volume is a file system that all of your code running on Modal (or elsewhere!) can access. For more on storing model weights on Modal, see this guide.
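The payoff of a Volume comes from downloading once and reusing thereafter. Here is a minimal sketch of that check-then-download pattern, assuming a hypothetical `download` callable and marker-file name; it leaves out Modal-specific steps such as committing the Volume.

```python
import tempfile
from pathlib import Path
from typing import Callable

MARKER = ".download-complete"  # hypothetical marker file name


def ensure_weights(models_dir: Path, download: Callable[[Path], None]) -> bool:
    """Download weights into models_dir unless a previous run already finished.

    Returns True if a download happened, False if cached weights were reused.
    """
    marker = models_dir / MARKER
    if marker.exists():
        return False  # weights already present: skip the slow download
    models_dir.mkdir(parents=True, exist_ok=True)
    download(models_dir)
    marker.touch()  # written last, so an interrupted download is retried next time
    return True


def fake_download(target: Path) -> None:
    """Stand-in for a real weight download, for demonstration only."""
    (target / "weights.pt").write_bytes(b"\x00")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_weights(Path(tmp) / "chai1", fake_download))  # True
        print(ensure_weights(Path(tmp) / "chai1", fake_download))  # False
```

Writing the marker only after the download succeeds means a crash mid-download leaves no marker, so the next container retries instead of loading partial weights.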

    
    chai_model_volume = modal.Volume.from_name(
        "chai1-models",
        create_if_missing=True,  # create the distributed filesystem for model weights on first use
    )
    models_dir = Path("/models/chai1")

The details of how we handle the download here (e.g. running concurrently for extra speed) are in the Addenda.
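Those Addenda are not part of this section, so the following is only a generic sketch of concurrent downloading, not the example's own code. Fetching weight files is network-bound, so a thread pool can overlap the waiting. `download_all`, the `fetch` callable, and the file names are hypothetical stand-ins for a real HTTP client and the actual weight files.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def download_all(
    names: list[str],
    dest: Path,
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> list[Path]:
    """Fetch every file on a pool of threads and write it under dest."""

    def fetch_one(name: str) -> Path:
        path = dest / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(name))  # network-bound, so threads overlap the waits
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map yields results in input order and re-raises a worker's
        # exception when that worker's result is reached
        return list(pool.map(fetch_one, names))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        paths = download_all(["a.pt", "sub/b.pt"], Path(tmp), lambda n: n.encode())
        print([p.name for p in paths])
```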
