Antoine Prouvost
Gerad, Feb 20th 2020
Think Like (a) Git is a lesser-known resource for thinking about Git as graphs.
#!/usr/bin/env python
import sys
count = 0
for line in sys.stdin:
    count += len(line.split())
print(count)
echo "Hello world!" | python word_counter.py
2
echo "Hello world!" | wc -w
2
counter/
├── counter.py
├── word_counter.py
└── line_counter.py
#!/usr/bin/env python
import word_counter
#!/usr/bin/env python
# SyntaxError: a relative import must use the "from . import" form
import .word_counter
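#!/usr/bin/env python
# Run as a script, this fails with
# "ImportError: attempted relative import with no known parent package"
from . import word_counter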
#!/usr/bin/env python
import sys
# FIXME: "." is the current working directory, not the directory of this file
sys.path.append(".")
import word_counter
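#!/usr/bin/env python
import os
import sys
# __file__ is the path of this file; its directory is what must go on sys.path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import word_counter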
counter/
├── __init__.py
├── counter.py
├── word_counter.py
└── line_counter.py
ipython
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
In [1]:
python -m IPython
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
In [1]:
pip show ipython
Name: ipython
Version: 7.12.0
Summary: IPython: Productive Interactive Computing
Location: .../miniconda3/lib/python3.7/site-packages
...
A collection of
that is
The PyPA (Python Packaging Authority) is the reference on packaging.
For example, the packages
are all available on PyPI (Python Package Index).
This is what you get with pip install (by default).
counter-project/
├── counter/
│ ├── __init__.py
│ ├── word_counter.py
│ └── line_counter.py
├── setup.py
├── README.md
├── LICENSE
└── .gitignore
from setuptools import setup, find_packages

setup(
    name="counter",  # Name shown in Pip
    version="0.1.0",
    packages=find_packages(),  # Importable packages, here counter
    author="Václav Chvátal",
    install_requires=["numpy"],  # Resolved by Pip
    # ... more information
)
pip install counter-project/
python -m pip install counter-project/
pip install git+https://github.com/Chvatal/counter-project
Making a package does not mean you have to upload it to PyPI.
You can use absolute imports anywhere (inside the package code, in a script, a test, Jupyter...).
import counter.line_counter
You can only use relative imports inside the package (e.g. in word_counter).
from . import line_counter
Never hack sys.path ever again.
pwd
/home/chvatal/counter-project
ls
counter/ setup.py README.md LICENSE
pip install .
...
vim counter/word_counter.py
...
ipython
In [1]: import counter.word_counter
In [2]:
The module imported is the local file, not the installed package!
Packages are self-contained: Pip made a copy of the counter project code.
Local files have higher priority: the current directory comes first in sys.path, so the local counter/ directory is imported instead of the installed package (as in the scripting example).
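To see which copy Python actually picked, inspect the module's __file__ attribute. A small sketch, where where_is is a hypothetical helper and json stands in for counter.word_counter so the block runs anywhere. In the session above, counter.word_counter.__file__ would be expected to point into the local counter/ directory rather than into site-packages.

```python
import importlib
import sys


def where_is(module_name):
    """Return the file a module was loaded from, or None for built-in modules."""
    module = importlib.import_module(module_name)
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # sys.path is searched in order; the first entry is typically the current
    # directory ("" in an interactive session) or the running script's directory.
    for entry in sys.path[:3]:
        print(repr(entry))
    print(where_is("json"))  # stand-in for "counter.word_counter"
```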
pip install --editable counter-project/
src layout.
Optional, but recommended for more complex projects (e.g. compiled extensions).
src layout
The src layout is debated.
Is it overkill?
Probably for simple projects.
But for a more complex setup.py, it becomes relevant to always use the installed package.
Is it hard?
Literally two lines.
counter-project/
├── src/
│ └── counter/
│ ├── __init__.py
│ ├── word_counter.py
│ └── line_counter.py
├── setup.py
├── README.md
├── LICENSE
└── .gitignore
from setuptools import setup, find_packages

setup(
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    # ...
)
counter-project/
├── src/
│ └── counter/
│ ├── __init__.py
│ ├── word_counter.py
│ └── line_counter.py
├── test/
│ └── test_word_counter.py
├── setup.py
├── README.md
├── LICENSE
└── .gitignore
Tests can be executables, or part of a testing framework (the standard library's unittest, PyTest).
We can import counter.word_counter in our tests as easily as import numpy.
from counter.word_counter import WordCounter

def test_WordCounter_default():
    counter = WordCounter()
    assert counter("Hello world!\n I am alive!") == 5

if __name__ == "__main__":
    test_WordCounter_default()
python test/test_word_counter.py
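The tests assume that counter/word_counter.py defines a WordCounter class, which the slides do not show. A hypothetical minimal version that would make them pass, reusing the split() logic of the first word_counter.py script:

```python
class WordCounter:
    """Hypothetical callable that counts whitespace-separated words."""

    def __call__(self, text):
        # str.split() with no argument splits on any run of whitespace,
        # newlines included, and ignores leading/trailing whitespace.
        return len(text.split())
```

With this sketch, WordCounter()("Hello world!") returns 2, matching the wc -w output shown earlier.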
A test runner
A test framework, with utilities and customization
PyTest collects as tests all functions starting with test_ in all Python files starting with test_.
Use assert to test for anything.
from counter.word_counter import WordCounter

def test_WordCounter_default():
    counter = WordCounter()
    assert counter("Hello world!\n I am alive!") == 5
python -m pytest test/
Create fake objects that imitate a class and record their own usage.
Available in the standard library as unittest.mock.
Complex optimization algorithms with an auxiliary function for the gradient descent step (also user implemented).
A failure in the gradient update leads to errors in the overall algorithm (integration test).
By mocking the gradient update, we can test the algorithm independently (unit test).
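A minimal sketch of the idea with unittest.mock, assuming a hypothetical optimize function that receives the gradient step as a parameter. The real step is replaced by a Mock whose side_effect halves its input, so the loop can be checked on its own: the expected result and the recorded calls (call_count, assert_called_with) are known in advance.

```python
from unittest import mock


def optimize(x0, step, n_iter=10):
    """Hypothetical optimization loop that repeatedly applies a user-supplied step."""
    x = x0
    for _ in range(n_iter):
        x = step(x)
    return x


def test_optimize_calls_step():
    # Fake gradient step: halves x and records every call it receives
    fake_step = mock.Mock(side_effect=lambda x: x / 2)
    result = optimize(8.0, fake_step, n_iter=3)
    assert result == 1.0                 # 8 -> 4 -> 2 -> 1
    assert fake_step.call_count == 3
    fake_step.assert_called_with(2.0)    # checks the last call only
```

When the step function is looked up inside a module instead of being passed in, mock.patch can substitute it in the same way.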
counter-project/
├── src/counter/...
├── test/...
├── script/
│ └── word-counter
├── setup.py
├── README.md
├── LICENSE
└── .gitignore
word-counter script
#!/usr/bin/env python
import argparse
import sys
from counter.word_counter import WordCounter
# Some command line arguments parsing ...
counter = WordCounter()
print(counter(sys.stdin.read()))
Scripts are only visible locally from the source directory.
Add them to the package with:
from setuptools import setup, find_packages

setup(
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    scripts=["script/word-counter"],
    # ...
)
Anywhere the package is installed, you can run
echo "Hello world!" | word-counter
In the package src/counter/word_counter.py
...

def main():
    import argparse
    import sys
    # Some command line arguments parsing ...
    counter = WordCounter()
    print(counter(sys.stdin.read()))

if __name__ == "__main__":
    main()
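The elided argument parsing could look like the following minimal sketch with the standard library's argparse. The --unique flag and the count helper are hypothetical, and the counting is inlined instead of using WordCounter so the block runs on its own.

```python
import argparse
import sys


def count(text, unique=False):
    """Number of whitespace-separated words in text (distinct ones if unique)."""
    words = text.split()
    return len(set(words)) if unique else len(words)


def main(argv=None):
    # --unique is a hypothetical option standing in for the elided parsing
    parser = argparse.ArgumentParser(description="Count the words read from stdin.")
    parser.add_argument("--unique", action="store_true",
                        help="count distinct words only")
    args = parser.parse_args(argv)  # argv=None makes argparse read sys.argv[1:]
    print(count(sys.stdin.read(), unique=args.unique))


if __name__ == "__main__":
    main()
```

Defaulting argv to None keeps main() callable with no arguments, which is how a console_scripts entry point invokes it, while tests can pass an explicit list such as ["--unique"].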
Anywhere the package is installed, you can run
echo "Hello world!" | python -m counter.word_counter
from setuptools import setup, find_packages

setup(
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    entry_points={
        "console_scripts": [
            "word-counter=counter.word_counter:main",
        ],
    },
    # ...
)
Anywhere the package is installed, you can run
echo "Hello world!" | word-counter
The scripts option can ship non-Python scripts.
Use the script/ directory for development scripts.
ArgParse, or check out Click.
grep
, a website).
Both are fine!
Write modular algorithms in the package
As configurable as possible.
No experiment definitions
Do not impose any choice, e.g. no filenames or config reading.
Write experiments as scripts
Use an experiment/ folder for experiment scripts.
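A sketch of this split, with hypothetical names (count_words, run_experiment). In a real project the experiment script would import count_words from the installed counter package; both sides sit in one block here only so it runs standalone.

```python
import io


# Package side (e.g. in src/counter/): a configurable algorithm that reads no
# files and parses no config; the caller decides everything.
def count_words(lines, lowercase=False):
    """Count occurrences of each word in any iterable of strings."""
    counts = {}
    for line in lines:
        for word in line.split():
            key = word.lower() if lowercase else word
            counts[key] = counts.get(key, 0) + 1
    return counts


# Experiment side (e.g. a script in experiment/): the concrete choices live here.
def run_experiment(stream):
    return count_words(stream, lowercase=True)


if __name__ == "__main__":
    # A real script would open a data file; an in-memory stream keeps this runnable.
    print(run_experiment(io.StringIO("Hello hello world\n")))
```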
CPython is the most popular implementation of the Python interpreter, and is written in C.
Everything that can be done in Python can be done in C through the Python/C API.
In particular, it is possible to create a Python function that runs arbitrary C code.
from setuptools import setup, find_packages, Extension

ext = Extension(
    name="counter.fast_word_counter",
    sources=["src/counter/fast_word_counter.c"],  # Hypothetical C source file
    # ... more compiling options
)

setup(
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    ext_modules=[ext],
    # ...
)
You need to reinstall every time the extension is recompiled (not editable for that part).
Extension options
ext = Extension(
    name="counter.fast_word_counter",
    language="c++",
    sources=...,  # Files to compile
    include_dirs=...,  # -I
    library_dirs=...,  # -L
    runtime_library_dirs=...,  # -R
    libraries=...,  # -l
    define_macros=...,  # -D
    # ...
)
Python libraries are added automatically.
Cython makes it easy to write compiled extensions without knowing the CPython API.
.pyx and .pxd files can be added directly to the Extension sources and include_dirs.
cythonize will be called automatically.
However, Cython needs to be installed.
Cython is required to build (compile) the extension but not to run it.
from setuptools import setup, find_packages, Extension

ext = Extension(
    name="counter.fast_word_counter",
    sources=["src/counter/fast_word_counter.pyx"],  # Hypothetical Cython source file
    # ... more compiling options
)

setup(
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    ext_modules=[ext],
    setup_requires=["cython"],
    # ...
)
PyBind11 is much more appropriate than Cython for binding C++ code.
Great for object-oriented extensions and modern C++ (e.g. smart pointers).
General: should allow as wide a range of versions as supported.
from setuptools import setup, find_packages

setup(
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    install_requires=["numpy", "scipy>=1.2"],
    # ...
)
Pip will install the dependencies and their recursive dependencies.
Should be as precise as possible (locked) for full reproducibility.
Package APIs change between versions.
Locking dependencies is complicated. Multiple solutions exist.
pip freeze
pip freeze
numpy==1.18.1
scipy==1.4.1
pip freeze > requirements.txt
pip install -r requirements.txt
...
pip freeze
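A frozen requirements.txt can also be checked against what is currently installed. A minimal sketch, assuming Python 3.8+ for importlib.metadata (the slides use 3.7, where the importlib_metadata backport offers the same API). parse_pins and find_mismatches are hypothetical helpers, not part of pip, and only plain name==version lines are handled.

```python
from importlib import metadata  # Python 3.8+


def parse_pins(lines):
    """Parse 'name==version' lines as written by pip freeze."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and non-pinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (locked, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, locked in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != locked:
            mismatches[name] = (locked, installed)
    return mismatches
```

Usage: with open("requirements.txt") as f: print(find_mismatches(parse_pins(f))). An empty dictionary means the environment matches the lock file.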
Isolated Python installations.
Each with installed packages.
Disposable if a dependency file (e.g. requirements.txt) is kept along.
Minimal, to avoid version conflicts.
Ideal to install locked dependencies.
Ideally one per project.
python -m venv counter-venv/
source counter-venv/bin/activate
which pip
counter-venv/bin/pip
pip install .
...
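From inside Python, one way to check that a venv is active is to compare sys.prefix with sys.base_prefix, which differ in environments created by venv (PEP 405). A small sketch; in_virtualenv is a hypothetical helper, and Conda environments are not detected this way.

```python
import sys


def in_virtualenv():
    """True inside an environment created by venv.

    venv sets sys.prefix to the environment directory, while sys.base_prefix
    still points at the Python installation the environment was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("sys.prefix:", sys.prefix)
    print("inside a venv:", in_virtualenv())
```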
Pipenv is a tool combining virtual environments and pip, with stronger locking mechanisms.
Unfortunately, the future of the project is uncertain.
Dependencies can be stated in setup.py and additionally locked (frozen) for experiment reproducibility.
Having both general and locked dependencies makes it easier to manage & upgrade.
Lots of libraries depend on external libraries (e.g. C).
On PyPI (pip), developers do their best to package them.
Conda is a general purpose package manager (Python, JavaScript, C/C++, general Linux tooling).
Anaconda is Conda with lots of pre-downloaded packages.
Conda-Forge is a Conda channel with lots of open-source projects.
conda
.
name: environment_name
channels:
- conda-forge
- pytorch
dependencies:
- python=3.8
- numpy
conda env create --file environment_name.yml
Or for an already existing environment
conda env update --file environment_name.yml
conda list --export > environment_name.txt
Or stronger (OS dependent)
conda list --explicit > environment_name.txt
conda create --name environment_name --file environment_name.txt
Or for an already existing environment
conda install --name environment_name --file environment_name.txt
Is more complicated, and outside the scope of this tutorial.
"Light virtual machines"
OS-level reproducibility
E.g. Docker and Singularity.
"If all you have is a hammer, everything looks like a nail"
Using type hints, Mypy can detect inconsistencies without running the code.
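A small illustration of what Mypy checks: the annotated function below runs as plain Python, and Mypy (run separately, e.g. python -m mypy on the file) would flag the commented-out call because a str is not a List[str]. At runtime that call would not fail; it would iterate over characters and return a meaningless count. The count_words name is hypothetical.

```python
from typing import List


def count_words(lines: List[str]) -> int:
    """Total number of whitespace-separated words in lines."""
    return sum(len(line.split()) for line in lines)


total: int = count_words(["Hello world!", "I am alive!"])

# Mypy would reject this call before anything runs (str is not List[str]):
#     count_words("Hello world!")
```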
Black is the uncompromising Python code formatter.