And cleaning them up.
TL;DR: Download snakefood and look at this gist.
Recently I started working on a Python project with a large and mature codebase. As the codebase evolved, some parts were left behind, unused.
To clean things up, I wanted to find all unused Python modules in the project.
Vulture is a great tool for finding unused code inside your files, but I wanted to find whole files that weren’t being used. For this, I needed a dependency analysis tool.
Coverage shows which code is used when the project runs, but we have many entry points and code flows, so no single run exercises everything.
Snakefood is a command line tool that analyses dependencies and can create graphs from them. I decided to use the dependency analysis to find which files weren’t imported by any of our code.
A summary is available in this gist, and I explain the details below.
How It Works
To generate a dependency file, we run sfood with the internal flag -i on our directory. The internal flag excludes any files outside of our project. Since this can take some time, we cache the results in a temporary file:
sfood -i <dir> > /tmp/out.deps
The dependency file contains many dependency lines, like the following:
# file main.py depends on common/redis.py
(('/tmp', 'project/main.py'), ('/tmp', 'project/common/redis.py'))
We’re interested in the files that are depended on. Using pipes, we can build a command that extracts them:
cat /tmp/out.deps | \
grep -v test | \
cut -d"'" -f8 | \
sort | \
uniq > /tmp/required.txt
- Drop dependency lines that mention test.
- Split each dependency line by ' and take the 8th field, the filename.
- Sort the filenames and keep only the unique ones.
- Save the result for later comparison.
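The same extraction can be sketched in Python, which avoids counting quote-delimited fields. This is only a rough sketch, assuming each line of the dependency file is a Python literal shaped like the example above; the function name `required_files` is made up for illustration and is not part of snakefood. It also assumes a file without dependencies is reported with `None` as its target.

```python
import ast


def required_files(dep_lines):
    """Return the sorted, unique files that some other file depends on."""
    required = set()
    for line in dep_lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Mirror `grep -v test`: drop any line that mentions "test".
        if "test" in line:
            continue
        # Each line is a pair: ((root, source), (root, target)).
        (_, _source), (_, target) = ast.literal_eval(line)
        # Assumption: a missing dependency shows up as None.
        if target is None:
            continue
        required.add(target)
    return sorted(required)
```

Calling `required_files(open("/tmp/out.deps"))` should then give roughly the same list as the shell pipeline.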
We can generate a list of all our Python files using some bash scripting:
find . -name '*.py' | \
grep -v "__init__.py" | \
grep -v "test" | \
sort > /tmp/modules.txt
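One thing to watch: `find .` prints every path with a leading `./`, so the two lists only line up if both use the same prefix. Here is a hedged Python sketch of the same listing that returns paths relative to a root directory instead. The function name `list_modules` is hypothetical, and it assumes the dependency file reports paths relative to that same root.

```python
import os


def list_modules(root):
    """Return sorted .py paths under root, relative to root."""
    modules = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Keep Python files only, and skip package markers.
            if not name.endswith(".py") or name == "__init__.py":
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            # Mirror `grep -v "test"`: drop any path that mentions "test".
            if "test" in rel:
                continue
            modules.append(rel)
    return sorted(modules)
```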
Comparing those files, we get what we wanted, the list of files that nothing depends on:
> diff /tmp/modules.txt /tmp/required.txt
project/main.py
project/common/unused.py
Filtering out our entry point project/main.py and framework files, we get our list of dead files.
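The comparison and the final filtering amount to a set difference. A minimal sketch, assuming both lists use the same path format; `unused_modules` and its `ignore` parameter are names invented for this example:

```python
def unused_modules(modules, required, ignore=()):
    """Return modules that nothing depends on, minus known entry points."""
    return sorted(set(modules) - set(required) - set(ignore))


print(unused_modules(
    ["project/common/redis.py", "project/common/unused.py", "project/main.py"],
    ["project/common/redis.py"],
    ignore=["project/main.py"],
))
# prints ['project/common/unused.py']
```

Unlike diff, this only reports files that are in the module list but missing from the required list, which is the direction we care about here.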
EOF
That’s it. I hope it helps someone; there wasn’t much information about this when I was searching. Comment below if you know a better way, I’ll be glad to hear it.