Make.py

Copyright 2021 Brian Davis - CC-BY-NC-SA

In a number of cases I've needed to distribute a python application to computers that might not have python installed. There are numerous tools and methods for this. Py2Exe fit my use case for a time but more recently PyInstaller has been a better fit, producing executables for multiple platforms and with more options for including resources.

Why not use X?

Setuptools/distools have never made sense to me and don't target bundling the interpreter with the source files (AFAIKT). I looked into using Cython, which does have a mode for including the python interpreter but it is more finicky, requiring a full build environment on any platform you wish to distribute to.

As far as automating the build and packaging, there's always some steps outside the build tool that need to be done. Copying in resources that were hard to include, zipping up the final package with the proper naming scheme, writing out the changelog, etc. To make these steps automatic and hence repeatable I tried shell scripts and makefiles. Both could do the job but at the end of the day, I write python mainly because I like writing python. I find it pleasant to work with. Might as well use it for anything it's good for and it's actually pretty good at this kind of thing.

Repeatible Builds

Another thing I wanted was to provide a repository tag to the build script and have it produce a byte-for-byte identical artifact every time. That was surprisingly hard but a python build script made it easier.

Build VMs

I do have to run this process on every platform I want to distribute to, which means I need a working python development evironment on each operating system. I use VirtualBox for that.

Annotated Code

I'm going to present a complete example make.py script here with comments as the annotations.

import os           # various shell related things and of course os.path
import shutil as sh
import subprocess as sub    # calling out to subprocesses is easy with Python3
import tempfile             # a temporary folder is used for the build
import datetime             # See below notes on file timestamps
import time
import zipfile      # I use pyinstallers folder mode but like to zip that up. Users must
                    # extract it to install but that the only step.



NAME = "MYPROG"     # Passed to Pyinstaller as the application name
SEED = "11111"      # Python uses this as essentially a random seed
MAIN = "main"       # the entry point script


def makeExe():
    """ This process makes the generated EXE file and zipfile distribution 
        reproducible 
    """
    repo = os.getcwd()      # usually I keep this script in the repo with the project. 
                            # Often there are minor customizations that needed and it 
                            # makes it so others can build executables if they need to


    # make a temporary directory for the build
    tempdir = tempfile.TemporaryDirectory()
    cwd = tempdir.name
    print("Make temp directory: ", cwd)     # simple print statements give me a sense of
                                            # how the build is progressing


    # create virtual environment            
    print("Creating venv")
    # a bit of a hack to make it work on debian
    pythonbin = (os.name == "posix") and "python3" or "python"

    # TODO: I should specify interpreter version number

    sub.run([pythonbin, "-m", "venv", cwd])  # this makes the python environment   
                                             # reproducible and isn't too slow once the 
                                             # packages are cached.

    os.environ["VIRTUAL_ENV"] = cwd
    # activate the virtual environment
    os.environ["PATH"] = os.path.join(cwd, "bin") + os.pathsep + os.environ["PATH"]     
    # I suspected I would need to unset the below environmental variables but I haven't
    # encountered them on my build systems
    # unset PYTHONHOME, PYTHONLIBS, PYTHONPATH?
    # not set on my systems (Debian 10, Win 10)


    # you must specify versions of libraries you use in requirements.txt
    sub.run(["pip", "install", "-r", "requirements.txt"])


    # get the name of the current checkout

    # I used mercurial exclusively when I first wrote this.
    """checkout = sub.run(
        ["hg", "identify"], capture_output=True
    ).stdout.decode("latin-1")
    """

    # and have since ported it to use git, I'll include both versions.
    checkout = sub.run(
        ["git", "describe", "--tags", "--always"], capture_output=True
    ).stdout.decode()
    tag = checkout.strip()


    # I could modify the script to accept the correct tag from the user and then check   
    # that out but most often the current checkout is the build I want.
    yesno = input("Is ({}) the correct tag? (y/n)".format(tag)).lower().strip()
    if yesno != "y":
        print("Abort")
        return

    # clone the source
    cwd = os.path.join(cwd, "source")   # this makes a good check that I have everything
                                        # of importance in the repo and leaves my working
                                        # repo alone


    os.mkdir(cwd)                       # create a "source" folder inside our temp folder 
                                        # to house the repo

    #["hg", "clone", "--updaterev=" + tag, repo, cwd], capture_output=True
    result = sub.run(
        ["git", "clone", repo, cwd], capture_output=True
    )
    #if len(result.stderr) != 0:    # hg version
    if not result.stderr.decode().endswith("done.\n"):
        print("Failed to clone {}".format(repo))
        print(result.stderr)
        return
    os.chdir(cwd)
    print("Cloned ", repo)


    # hg didn't need this step thanks to --updaterev=
    result = sub.run(
        ["git", "checkout", f"tags/{tag}"], capture_output=True
    )
    if not result.stderr.decode().startswith("Note: switching"):
        print("Failed to checkout {}".format(tag))
        print(result.stderr)
        return
    print(f"Checkout {tag}")


    # get the date of the current checkout
    #result = sub.run(["hg", "log", "-r", tag], capture_output=True)
    result = sub.run(["git", "log", f"tags/{tag}"], capture_output=True)
    #str_date = result.stdout.decode("latin-1").splitlines()[3][5:].strip() # hg version
    str_date = result.stdout.decode().splitlines()[2][5:].strip()
    co_date = time.mktime(datetime.datetime.strptime(str_date, "%c %z").timetuple())
    # you'll see why the commit date is important in a minute


    # python basically uses a random seed for generating dictionary hashes. Without 
    # setting this seed your .pyc files will never be binary reproducible.
    os.environ["PYTHONHASHSEED"] = SEED

    # TODO: add excludes

    # Build the pyinstaller command
    args = ["pyinstaller", "--windowed", "--noconfirm"]
    # args += ["--onefile"] # onefile version
    # usually I use the folder option (default)
    # separate icon files in appropriate formats are needed for windows and OSX
    if sys.platform == "darwin":
        args += ["--osx-bundle-identifier", "com.dev.app", "--icon", "Icon.icns"]
    else:
        args += ["--icon", "ico.ico"]
    args += ["--name", NAME, MAIN + ".py"]


    # run pyinstaller
    print("Running PyInstaller")
    result = sub.run(args)


    # build.txt to dist
    # I like to include the commit hash and tag in a text file as part of the package.
    open(os.path.join("dist", "build.txt"), "w").write(
        "Build: {} {}".format(checkout, str_date)
    )


    # copy data files
    # settings files, images, other assets, etc
    print("Copy files in data")
    sh.copytree("data", os.path.join("dist", "data"))


    # I usually distribute the changelog along with the package. If I'm providing support 
    # I'll also convert this into a RELEASE_NOTES.txt with bulleted list of features and 
    # bugs fixed for each release.
    # In the release notes I try to translate my often terse commit messages into 
    # something more descriptive and provide any necessary context.
    # write the changelog
    print("Writing changelog.txt")
    #result = sub.run(["hg", "log"], capture_output=True)
    result = sub.run(["git", "log"], capture_output=True)
    open(os.path.join("dist", "changelog.txt"), "w").write(
        result.stdout.decode("latin-1")
    )



    # zip up the dist folder
    # TODO: Make self-extracting
    print("Create .zip file")
    sh.move("dist", NAME)


    # by using python's zipfile library instead of subbing out to zip I reduce the    
    # dependencies of the the build environment. 
    # Remember I need to run this on all the platforms I build for.

    zip_filename = NAME + "_v" + tag + "_" + os.name + ".zip"
    zip_archive = zipfile.ZipFile(zip_filename, "w", zipfile.ZIP_DEFLATED)
    filenames = []
    for root, dirs, files in os.walk(NAME):
        for f in files:
            filenames.append(os.path.join(root, f))
    # filenames are sorted so that the zipfile has consistent order (and hash)
    filenames = sorted(filenames)


    # the final key to getting reproducible artefacts are the file modified times in 
    # the zipfile they will be set to time that the repo was cloned so here we overwrite 
    # that with the time from the commit log. Now you will get a consistent zip file for 
    # the commit you're building.

    for fullname in filenames:
        print("Adding: ", fullname)
        # file modified times are set to the date of the selected checkout
        os.utime(fullname, times=(co_date, co_date))
        zip_archive.write(fullname)


    zip_archive.close()
    sh.copy(zip_filename, repo)

    print("****COMPLETE****")


if __name__ == "__main__":
    makeExe()