Package Managers in Python
In data science, working with various libraries, packages, tools, and environments is essential. Package managers help streamline this process by allowing you to install, manage, and keep track of these software components.
Package Manager Installers
Python distributions such as Anaconda are not the only way to get started with Python and data science. Miniconda and Miniforge are lightweight alternatives that focus solely on the conda package manager. While Anaconda requires more than 4 GB of disk space, Miniconda and Miniforge need only about 400 MB. These installers are smaller because they ship far less: for example, they do not include a built-in integrated development environment (IDE). To use them properly, however, you need to learn the basics of the command-line interface (CLI) and become familiar with your computer's terminal.
Miniconda
Miniconda is the free minimal installer for conda, Python, and a few other basic packages such as pip. It provides the core conda functionality for creating and managing environments without the pre-installed packages that come with Anaconda. This makes it a good choice if you already know which packages (and which versions) you need for a project. Miniconda uses Anaconda's channel to retrieve libraries and packages, and is free to use for individuals and small organizations. If you are unsure whether to use Anaconda or Miniconda, check this page: "Should I use Anaconda Distribution or Miniconda?"
To install Miniconda using a graphical user interface (GUI), follow the steps indicated here: Miniconda installation (GUI). If you prefer to use the terminal, check Miniconda installation (CLI).
Miniforge
Miniforge is a minimal Python installer, comparable to Miniconda, that uses the conda-forge channel by default instead of Anaconda's. Conda-forge is an open, community-driven repository known for its extensive collection of scientific Python packages. To install Miniforge, go to the Miniforge download page and select the installer for your device, or use the CLI (only for macOS and Linux devices). For detailed instructions on download and installation, read the conda-forge GitHub repository.
Miniconda vs Miniforge
Now you are probably wondering which one to use. For a straightforward decision, consider that Miniforge is more suitable for ARM architectures (e.g., Apple M1-M4 or Snapdragon processors) than for x86 architectures (e.g., Intel processors). For example, if you have a MacBook Air with an M2 chip, you will get better performance with Miniforge. Moreover, the conda-forge channel contains a more curated and up-to-date set of packages. This doesn't mean Miniconda is a bad option: it is certainly a good choice if you are used to, or prefer, the Anaconda ecosystem and don't need the specific advantages of conda-forge.
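If you are not sure which architecture your computer has, Python's standard library can report it. The following is a minimal sketch; the function name `cpu_family` and the list of machine strings it recognizes are illustrative assumptions, not part of conda or Miniforge.

```python
import platform


def cpu_family(machine=None):
    """Classify a machine string as 'arm', 'x86', or 'other'.

    The recognized strings are common values of platform.machine();
    the list is an assumption and is not exhaustive.
    """
    name = (machine or platform.machine()).lower()
    if name in {"arm64", "aarch64"} or name.startswith("arm"):
        return "arm"
    if name in {"x86_64", "amd64", "i386", "i686", "x86"}:
        return "x86"
    return "other"


if __name__ == "__main__":
    # Report the operating system, the raw machine string, and its family.
    print(platform.system(), platform.machine(), "->", cpu_family())
```

Note that the result describes the Python interpreter that runs the script: an x86 build of Python running under emulation on an ARM machine may still report an x86 value.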
Package managers
After installing Anaconda, Miniconda, or Miniforge, you will have Python and the conda package manager installed on your computer. You can check the installation by typing python --version and conda --version in your terminal. To open a terminal on Windows, press the Windows key + R, type cmd.exe, and press Enter. On macOS, open Launchpad, type terminal into the search box, and click the icon when it appears.
Verifying Python:
(base) user91@mydevice ~ % python --version
Verifying conda:
(base) user91@mydevice ~ % conda --version
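The same checks can also be run from a Python script, which is handy when you want to verify several tools at once. This is only a sketch: the helper name `tool_version` is hypothetical, and it assumes that each tool accepts a `--version` flag, as python, conda, and pip do.

```python
import shutil
import subprocess
import sys


def tool_version(executable):
    """Return the output of `<executable> --version`, or None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    print("Running interpreter:", sys.version.split()[0])
    for name in ("python", "conda", "pip"):
        print(name, "->", tool_version(name) or "not found on PATH")
```

A tool reported as "not found on PATH" may still be installed; it usually means the terminal session was opened before the installation finished or that the installer did not add it to the PATH.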
Conda
Conda is a command-line tool used in a terminal to interact with Anaconda. It is package and environment management software: it can be used to install or update packages and to create, save, and load environments. To start using conda, open a terminal, type conda, and press Enter.
Conda is a popular package manager specifically designed for scientific computing in Python. It's often included with Anaconda, a pre-configured Python distribution that comes bundled with a vast array of data science packages.
Conda offers features like:
- **Comprehensive package ecosystem:** Conda includes channels like conda-forge, which cater specifically to scientific Python packages and their non-Python dependencies.
- **Environment management:** Conda excels at creating and managing isolated environments for your projects, ensuring compatibility between different package versions.
- **Binary packages:** Conda provides pre-built binary packages for many libraries, which can be faster to install than packages that pip has to build from source.
However, Conda also has some drawbacks:
- **Complexity:** Compared to pip, Conda's command-line interface can be more complex for beginners.
- **Large package size:** Anaconda, which includes Conda, can be quite large to download due to the pre-installed packages.
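As an illustration of the environment management described above, the sketch below builds a `conda create` command and runs it only if conda is available. The helper names, the environment name `ds-project`, and the package pins are placeholders chosen for this example; `--name`, `--channel`, and `--yes` are options of `conda create`.

```python
import shutil
import subprocess


def conda_create_command(env_name, packages, channel=None):
    """Build the argument list for `conda create` without running it."""
    command = ["conda", "create", "--name", env_name, "--yes"]
    if channel:
        command += ["--channel", channel]
    return command + list(packages)


def create_environment(env_name, packages, channel=None):
    """Run the command only when a conda executable is found on PATH."""
    command = conda_create_command(env_name, packages, channel)
    conda = shutil.which("conda")
    if conda is None:
        print("conda not found; would have run:", " ".join(command))
        return False
    subprocess.run([conda] + command[1:], check=True)
    return True


if __name__ == "__main__":
    # "ds-project" and the package pins are placeholder values.
    print(" ".join(
        conda_create_command("ds-project", ["python=3.11", "pandas"], "conda-forge")
    ))
```

Running the script only prints the command. Typing the printed command in a terminal, followed by `conda activate ds-project`, would create and then enter the new environment.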
Pip
Pip (Package Installer for Python) is the official package manager for Python. It's a simple and widely used tool that comes bundled with most Python installations (Python 3.4 onwards). Pip connects to the Python Package Index (PyPI), a vast repository containing hundreds of thousands of free and open-source Python packages for various purposes.
Here's what pip offers:
- **Easy installation:** Install packages with a simple `pip install <package_name>` command.
- **Dependency management:** Pip automatically downloads and installs any dependencies required by the package you're installing.
- **Package updates:** Easily update packages to their latest versions using `pip install --upgrade <package_name>`.
- **Works with virtual environments:** Pip can be used within virtual environments to isolate project dependencies.
While pip is great for general Python packages, it might not be ideal for data science specifically due to:
- **Limited package selection:** PyPI primarily focuses on general-purpose Python packages. While it includes many data science libraries, it might not have the most specialized tools for niche areas.
- **Dependency conflicts:** With a vast number of packages, managing dependencies across different projects can sometimes lead to conflicts.
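The install and upgrade commands listed above can also be issued from a script. A common pattern is to call pip through the interpreter that is currently running (`python -m pip`), so the package lands in the same environment as the script. The function names below are hypothetical, and `requests` is only a placeholder package name.

```python
import subprocess
import sys


def pip_install_command(package, upgrade=False):
    """Build a `python -m pip install` command for the running interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.append(package)
    return command


def pip_install(package, upgrade=False):
    """Run the command; raises CalledProcessError if pip reports a failure."""
    subprocess.run(pip_install_command(package, upgrade), check=True)


if __name__ == "__main__":
    # Only print the command here; call pip_install("requests") to run it.
    print(pip_install_command("requests", upgrade=True))
```

Using `sys.executable` instead of a bare `pip` avoids accidentally installing into a different Python installation when several are present on the same machine.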
Pip vs. Conda: Choosing the Right Tool
Both pip and Conda are valuable tools for data science, but the best choice depends on your specific needs:
- **For beginners or smaller projects:** Pip is a simpler option with a wider user base and extensive documentation. Its ease of use and focus on core Python packages make it a great starting point.
- **For data science projects:** Conda offers a wider range of scientific computing libraries and excels at managing complex environments. If you're working on data science projects that require specialized tools and precise control over package versions, Conda might be a better fit.
Ultimately, many data scientists use both tools: Conda creates the environment and installs the main data science libraries, while pip adds packages that are available only on PyPI.
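When mixing the two tools, it helps to confirm which environment is active before installing anything with pip. The sketch below relies on two facts: conda sets the `CONDA_DEFAULT_ENV` environment variable when an environment is activated, and inside a `venv` virtual environment `sys.prefix` differs from `sys.base_prefix`. The function name `active_environment` and its return format are assumptions made for this example.

```python
import os
import sys


def active_environment(environ=None):
    """Return a (kind, name_or_path) pair describing the active environment."""
    env = os.environ if environ is None else environ
    conda_env = env.get("CONDA_DEFAULT_ENV")
    if conda_env:
        # An activated conda environment (including "base").
        return ("conda", conda_env)
    if sys.prefix != sys.base_prefix:
        # A virtual environment created with the venv module.
        return ("venv", sys.prefix)
    return ("system", sys.prefix)


if __name__ == "__main__":
    kind, where = active_environment()
    print(f"Active environment: {kind} ({where})")
```

If the result is "system", packages installed with pip go into the main Python installation rather than an isolated environment, which is usually worth avoiding for project work.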