NumPy User Guide (2): Installing NumPy
This series is a translation of, and commentary on, version 1.2.1 of the official NumPy User Guide (where a literal translation would be unclear, some of my own interpretation is added).
Installing NumPy
The only prerequisite for installing NumPy is Python itself. If you don't have Python yet and want the simplest way to get started with NumPy, we recommend the Anaconda distribution. It includes Python, NumPy, and many other commonly used packages for scientific computing and data science.
NumPy can be installed with conda, with pip, with a package manager on macOS and Linux, or from source. The specific commands are covered below in the Python and NumPy installation guide.
CONDA
If you use conda, you can install NumPy from the defaults or conda-forge channels:
# Best practice, use an environment rather than install in the base env
conda create -n my-env
conda activate my-env
# If you want to install from conda-forge
conda config --env --add channels conda-forge
# The actual install command
conda install numpy
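After the install, a quick sanity check is to import NumPy and print its version (this assumes the my-env environment created above is still active):

```shell
# verify that NumPy is importable and report its version
python3 -c "import numpy; print(numpy.__version__)"
```

Inside a conda environment, plain `python` works as well.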
PIP
If you use pip, you can install NumPy with:
pip install numpy
When installing with pip, it is good practice to use a virtual environment.
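A minimal sketch of that workflow, using Python's built-in venv module (the environment name .venv is only a convention):

```shell
# create and activate an isolated environment
python3 -m venv .venv
. .venv/bin/activate            # on Windows: .venv\Scripts\activate
# NumPy is now installed into .venv only, not system-wide
python3 -m pip install numpy
```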
Python and NumPy installation guide
Installing and managing packages in Python is complicated; for most tasks there are many alternative solutions. This guide picks the best (or most popular) solution for each situation and gives clear recommendations. It focuses on users of Python, NumPy, and the PyData (or numerical computing) stack on common operating systems and hardware.
Recommendations
We’ll start with recommendations based on the user’s experience level and operating system of interest. If you’re in between “beginning” and “advanced”, please go with “beginning” if you want to keep things simple, and with “advanced” if you want to work according to best practices that go a longer way in the future.
Beginning users
On all of Windows, macOS, and Linux:
- Install Anaconda (it installs all the required software, plus the other tools mentioned below).
- For writing and executing code: use notebooks in JupyterLab for exploratory and interactive computing, and Spyder or Visual Studio Code for writing scripts and packages.
- Use Anaconda Navigator to manage packages and to launch JupyterLab, Spyder, or Visual Studio Code.
Advanced users
Windows or macOS
- Install Miniconda.
- Keep the base conda environment minimal, and use one or more conda environments to install the packages needed for the current project or task.
- Unless you only want packages from the defaults channel, set conda-forge as your default channel via channel priority.
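Concretely, that setup might look like the following sketch (the environment name my-project is illustrative; strict channel priority keeps packages from being mixed across channels):

```shell
# a minimal per-project environment
conda create -n my-project
conda activate my-project
# prefer conda-forge, with strict priority, for this environment only
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
conda install numpy
```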
Linux
If you can accept slightly outdated packages, and prefer stability over the newest versions of libraries:
- Use the OS package manager for as much as possible (Python itself, NumPy, and other libraries).
- Install packages not provided by the package manager with pip install somepackage --user.
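For example, on a Debian/Ubuntu system this might look as follows (the package names are the usual Debian ones; somepackage stands for anything your distribution doesn't ship):

```shell
# Python, NumPy and pip from the OS package manager
sudo apt install python3 python3-numpy python3-pip
# anything the package manager doesn't provide, installed per-user
pip install somepackage --user
```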
If you use a GPU:
- Install Miniconda.
- Keep the base conda environment minimal, and use one or more conda environments to install the packages needed for the current project or task.
- Use the defaults conda channel (at the moment, packages in conda-forge have weaker GPU support).
Otherwise:
- Install Miniforge.
- Keep the base conda environment minimal, and use one or more conda environments to install the packages needed for the current project or task.
Alternative if you prefer pip/PyPI
For users who know, from personal preference or reading about the main differences between conda and pip below, they prefer a pip/PyPI-based solution, we recommend:
Install Python from python.org, Homebrew, or your Linux package manager.
Use Poetry as the most well-maintained tool that provides a dependency resolver and environment management capabilities in a similar fashion as conda does.
Python package management
Managing packages is a challenging problem, and, as a result, there are lots of tools. For web and general purpose Python development there’s a whole host of tools complementary with pip. For high-performance computing (HPC), Spack is worth considering. For most NumPy users though, conda and pip are the two most popular tools.
PIP & CONDA
The two main tools that install Python packages are pip and conda. Their functionality partially overlaps (e.g. both can install numpy), however, they can also work together. We’ll discuss the major differences between pip and conda here - this is important to understand if you want to manage packages effectively.
The first difference is that conda is cross-language and it can install Python, while pip is installed for a particular Python on your system and installs other packages to that same Python install only. This also means conda can install non-Python libraries and tools you may need (e.g. compilers, CUDA, HDF5), while pip can’t.
The second difference is that pip installs from the Python Packaging Index (PyPI), while conda installs from its own channels (typically “defaults” or “conda-forge”). PyPI is the largest collection of packages by far, however, all popular packages are available for conda as well.
The third difference is that conda is an integrated solution for managing packages, dependencies and environments, while with pip you may need another tool (there are many!) for dealing with environments or complex dependencies.
REPRODUCIBLE INSTALLS
As libraries get updated, results from running your code can change, or your code can break completely. It’s important to be able to reconstruct the set of packages and versions you’re using. Best practice is to:
use a different environment per project you’re working on,
record package names and versions using your package installer; each has its own metadata format for this:
Conda: conda environments and environment.yml
Pip: virtual environments and requirements.txt
Poetry: virtual environments and pyproject.toml
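With pip, for example, recording and replaying an environment is a two-command round trip (conda users would use conda env export and an environment.yml instead):

```shell
# record the exact package set of the active environment
python3 -m pip freeze > requirements.txt
# later, recreate it in a fresh virtual environment
python3 -m pip install -r requirements.txt
```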
NUMPY PACKAGES & ACCELERATED LINEAR ALGEBRA LIBRARIES
NumPy doesn’t depend on any other Python packages, however, it does depend on an accelerated linear algebra library - typically Intel MKL or OpenBLAS. Users don’t have to worry about installing those (they’re automatically included in all NumPy install methods). Power users may still want to know the details, because the used BLAS can affect performance, behavior and size on disk:
The NumPy wheels on PyPI, which is what pip installs, are built with OpenBLAS. The OpenBLAS libraries are included in the wheel. This makes the wheel larger, and if a user installs (for example) SciPy as well, they will now have two copies of OpenBLAS on disk.
In the conda defaults channel, NumPy is built against Intel MKL. MKL is a separate package that will be installed in the users’ environment when they install NumPy.
In the conda-forge channel, NumPy is built against a dummy “BLAS” package. When a user installs NumPy from conda-forge, that BLAS package then gets installed together with the actual library - this defaults to OpenBLAS, but it can also be MKL (from the defaults channel), or even BLIS or reference BLAS.
The MKL package is a lot larger than OpenBLAS, it’s about 700 MB on disk while OpenBLAS is about 30 MB.
MKL is typically a little faster and more robust than OpenBLAS.
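On conda-forge, the BLAS flavor can be chosen through the libblas metapackage; a sketch using the pinning syntax documented by conda-forge:

```shell
# select the MKL build of BLAS for this environment
conda install "libblas=*=*mkl" numpy   # or *openblas, *blis, *netlib
```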
Besides install sizes, performance and robustness, there are two more things to consider:
Intel MKL is not open source. For normal use this is not a problem, but if a user needs to redistribute an application built with NumPy, this could be an issue.
Both MKL and OpenBLAS will use multi-threading for function calls like np.dot, with the number of threads being determined by both a build-time option and an environment variable. Often all CPU cores will be used. This is sometimes unexpected for users; NumPy itself doesn’t auto-parallelize any function calls. It typically yields better performance, but can also be harmful - for example when using another level of parallelization with Dask, scikit-learn or multiprocessing.
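The thread count can be capped with environment variables before starting Python; which variable applies depends on the BLAS your NumPy build uses (OPENBLAS_NUM_THREADS for OpenBLAS, MKL_NUM_THREADS for MKL):

```shell
# limit BLAS to one thread, e.g. when Dask, scikit-learn or
# multiprocessing already parallelizes at a higher level
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
# the setting is visible to any Python process started afterwards
python3 -c "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"
```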
TROUBLESHOOTING
If your installation fails with the message below, see Troubleshooting ImportError.
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy c-extensions failed. This error can happen for
different reasons, often due to issues with your setup.