Racing against time: what is Python capable of in single-threaded computations?

Contents:

The main features of PyPy:
Creating virtual environments
Compatibility with CPython
Application Configuration
Project history
Project goals
- Translation
- PyPy as a tool for implementing interpreters
Unicode Issues
Supporting Older (<2.2) Versions of Python
The Application/Framework Side
Experiment results

The main features of PyPy:

Speed

PyPy comes with a Just-in-Time compiler. It is
really fast in running most benchmarks, including very large and
complicated Python applications, not just 10-liners.

There are two cases you should be aware of where PyPy will not be
able to speed up your code:

- Short-running processes: if a process doesn’t run for at least a few
  seconds, then the JIT compiler won’t have enough time to warm up.
- If all the time is spent in run-time libraries (i.e. in C functions),
  and not actually running Python code, the JIT compiler will not help.

So the case where PyPy works best is when executing long-running
programs where a significant fraction of the time is spent executing
Python code. This is the case covered by the majority of our
benchmarks, but not all of them — the goal of PyPy is to get speed
but still support (ideally) any Python program.
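The warm-up effect described above can be observed with a small timing sketch. It is only an illustration: the function names and the workload are made up for this example, and the actual numbers depend entirely on the interpreter and machine.

```python
import time


def hot_loop(n):
    """Pure-Python arithmetic loop: the kind of code a JIT compiler can speed up."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def time_rounds(n, rounds):
    """Run hot_loop several times; return its result and the seconds taken per round."""
    timings = []
    result = None
    for _ in range(rounds):
        start = time.perf_counter()
        result = hot_loop(n)
        timings.append(time.perf_counter() - start)
    return result, timings


if __name__ == "__main__":
    result, timings = time_rounds(1_000_000, 5)
    for round_no, seconds in enumerate(timings, 1):
        print(f"round {round_no}: {seconds:.3f} s")
```

Running the same file under CPython and under PyPy gives a rough feel for both cases: on an interpreter with a JIT the first round is expected to include the warm-up cost, while later rounds run the compiled loop.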

Memory usage

Memory-hungry Python programs (several hundreds of MBs or more) might
end up taking less space than they do in CPython. It is not always
the case, though, as it depends on a lot of details. Also note that
the baseline is higher than CPython’s.

Stackless

Support for Stackless and greenlets is now integrated in the normal
PyPy. More detailed information is available in the PyPy documentation.

Other features

PyPy has many secondary features and semi-independent
projects. We will mention here:

Other languages: we have also implemented other languages that make
use of our RPython toolchain: Prolog (almost complete), as
well as Smalltalk, JavaScript, Io, Scheme and Gameboy.
There is also a Ruby implementation called Topaz and a PHP implementation
called HippyVM.

Sandboxing

PyPy’s sandboxing is a working prototype for the idea of running untrusted
user programs. Unlike other sandboxing approaches for Python, PyPy’s does not
try to limit language features considered "unsafe". Instead we replace all
calls to external libraries (C or platform) with a stub that communicates
with an external process handling the policy.

Note

Please be aware that it is a prototype only. It needs work to become
more complete, and you are welcome to help. In particular, almost none
of the extension modules work, and the pypy_interact.py script used
below is merely a demo. Also, a more complete system would include a way
to do the same as pypy_interact.py from languages other than Python,
to embed a sandboxed interpreter inside programs written in other
languages.

To run the sandboxed process, you need to get the full sources and
build from them (see the build instructions). These
instructions give you a binary that you should rename to
pypy-sandbox to avoid future confusion. Then run:

cd pypy/sandbox
pypy_interact.py path/to/pypy-sandbox
# don't confuse it with pypy/goal/pyinteractive.py!

You get a fully sandboxed interpreter, in its own filesystem hierarchy.
For example, you would run an untrusted
script as follows:

mkdir virtualtmp
cp untrusted.py virtualtmp/
pypy_interact.py --tmp=virtualtmp pypy-sandbox /tmp/untrusted.py

Note that the path /tmp/untrusted.py is a path inside the sandboxed
filesystem. You don’t have to put untrusted.py in the real /tmp
directory at all.

To read more about its features, go to
our documentation site.

Creating virtual environments

This PEP also proposes adding a new venv module to the standard
library which implements the creation of virtual environments. This
module can be executed using the -m flag:

python3 -m venv /path/to/new/virtual/environment

A pyvenv installed script is also provided to make this more
convenient:

pyvenv /path/to/new/virtual/environment

Running this command creates the target directory (creating any parent
directories that don’t exist already) and places a pyvenv.cfg file
in it with a home key pointing to the Python installation the
command was run from. It also creates a bin/ (or Scripts on
Windows) subdirectory containing a copy (or symlink) of the python3
executable, and the pysetup3 script from the packaging standard
library module (to facilitate easy installation of packages from PyPI
into the new venv). And it creates an (initially empty)
lib/pythonX.Y/site-packages (or Lib\site-packages on Windows)
subdirectory.

If the target directory already exists an error will be raised, unless
the --clear option was provided, in which case the target
directory will be deleted and virtual environment creation will
proceed as usual.

The created pyvenv.cfg file also includes the
include-system-site-packages key, set to true if pyvenv is
run with the --system-site-packages option, false by default.
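The same steps can be tried from Python through the standard library's venv module as it exists today. The sketch below is an illustration under a few assumptions: the helper name create_env is made up, with_pip=False is chosen only to keep the example fast and offline, and the files written by current Python versions may differ in detail from the layout proposed in the text.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, system_site_packages=False):
    """Create a virtual environment and return the contents of its pyvenv.cfg as a dict."""
    # with_pip=False: do not bootstrap pip, so nothing is installed or downloaded.
    venv.create(env_dir, system_site_packages=system_site_packages, with_pip=False)
    config = {}
    cfg_text = Path(env_dir, "pyvenv.cfg").read_text(encoding="utf-8")
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not "key = value"
            config[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = create_env(Path(tmp) / "demo-env")
        print("home:", config["home"])
        print("include-system-site-packages:", config["include-system-site-packages"])
```

Reading the file back shows the two keys discussed above: home points at the installation the environment was created from, and include-system-site-packages reflects the option passed at creation time.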

Multiple paths can be given to pyvenv, in which case an identical
venv will be created, according to the given options, at each
provided path.

The venv module also places "shell activation scripts" for POSIX and
Windows systems in the bin or Scripts directory of the
venv. These scripts simply add the virtual environment’s bin (or
Scripts) directory to the front of the user’s shell PATH. This is
not strictly necessary for use of a virtual environment (as an explicit
path to the venv’s python binary or scripts can just as well be used),
but it is convenient.

In order to allow pysetup and other Python package managers to
install packages into the virtual environment the same way they would
install into a normal Python installation, and avoid special-casing
virtual environments in sysconfig beyond using sys.base_prefix
in place of sys.prefix where appropriate, the internal virtual
environment layout mimics the layout of the Python installation itself
on each platform. So a typical virtual environment layout on a POSIX
system would be:

pyvenv.cfg
bin/python3
bin/python
bin/pysetup3
include/
lib/python3.3/site-packages/

While on a Windows system:

pyvenv.cfg
Scripts/python.exe
Scripts/python3.dll
Scripts/pysetup3.exe
Scripts/pysetup3-script.py
        ... other DLLs and pyds...
Include/
Lib/site-packages/

Third-party packages installed into the virtual environment will have
their Python modules placed in the site-packages directory, and
their executables placed in bin/ or Scripts.
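From inside a running interpreter, the relationship between sys.prefix and sys.base_prefix mentioned above gives a simple way to tell whether a venv is in use. This is a minimal sketch for environments created by the venv module; the two helper names are invented for the example.

```python
import sys
import sysconfig


def in_virtual_environment():
    """True when the interpreter runs from a venv: sys.prefix then differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def site_packages_dir():
    """Directory where pure-Python third-party modules are installed for this interpreter."""
    return sysconfig.get_path("purelib")


if __name__ == "__main__":
    print("running in a venv:", in_virtual_environment())
    print("prefix:           ", sys.prefix)
    print("base prefix:      ", sys.base_prefix)
    print("site-packages:    ", site_packages_dir())
```

Outside a virtual environment both prefixes are the same; inside one, sys.prefix points at the environment while sys.base_prefix still points at the underlying installation.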

Note

On a normal Windows system-level installation, the Python binary
itself wouldn’t go inside the "Scripts/" subdirectory, as it does
in the default venv layout. This is useful in a virtual
environment so that a user only has to add a single directory to
their shell PATH in order to effectively "activate" the virtual
environment.

Compatibility with CPython

Version 5.6.0 is compatible with Python 2.7.12 and runs on 32- and 64-bit platforms (except Windows, where only the 32-bit version is supported). PyPy fully supports modules written in pure Python. For binary extensions (.so and .pyd), PyPy provides good support for the CPython API through a separate module, cpyext. These extensions must be recompiled to work properly.

PyPy3 version 5.5 is compatible with CPython 3.3.5.

PyPy3.5, which implements Python 3.5, is also under active development.

The following libraries and frameworks are known to work on PyPy:

ctypes
django
sqlalchemy
flask
twisted
pylons
IPython
Selenium
nevow
pyglet
pillow (a fork of the Python Imaging Library)
lxml
NumPy (partial compatibility)
and many other (less popular) libraries

Application Configuration

This specification does not define how a server selects or obtains an
application to invoke. These and other configuration options are
highly server-specific matters. It is expected that server/gateway
authors will document how to configure the server to execute a
particular application object, and with what options (such as
threading options).

Framework authors, on the other hand, should document how to create an
application object that wraps their framework’s functionality. The
user, who has chosen both the server and the application framework,
must connect the two together. However, since both the framework and
the server now have a common interface, this should be merely a
mechanical matter, rather than a significant engineering effort for
each new server/framework pair.

Finally, some applications, frameworks, and middleware may wish to
use the environ dictionary to receive simple string configuration
options. Servers and gateways should support this by allowing
an application’s deployer to specify name-value pairs to be placed in
environ. In the simplest case, this support can consist merely of
copying all operating system-supplied environment variables from
os.environ into the environ dictionary, since the deployer in
principle can configure these externally to the server, or in the
CGI case they may be able to be set via the server’s configuration
files.

Applications should try to keep such required variables to a
minimum, since not all servers will support easy configuration of
them. Of course, even in the worst case, persons deploying an
application can create a script to supply the necessary configuration
values:

from the_app import application

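def new_app(environ, start_response):
    # Add the configuration value to environ before delegating to the application.
    environ['the_app.configval1'] = 'something'
    return application(environ, start_response)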
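The configuration approach described in this section can also be sketched in a self-contained form. Everything named here is hypothetical: the sample application, the the_app.greeting key, and the with_config and call_once helpers exist only for illustration, and Python 3 conventions (text status, bytes body) are assumed.

```python
import os
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Hypothetical application that reads one configuration value from environ."""
    greeting = environ.get("the_app.greeting", "hello")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [greeting.encode("iso-8859-1")]


def with_config(app, config, include_os_environ=False):
    """Wrap app so that every request's environ carries the given name-value pairs."""
    def configured_app(environ, start_response):
        if include_os_environ:
            # Simplest case from the text: copy OS environment variables,
            # without overwriting keys the server has already set.
            for name, value in os.environ.items():
                environ.setdefault(name, value)
        environ.update(config)
        return app(environ, start_response)
    return configured_app


def call_once(app):
    """Invoke a WSGI app once with a default test environ; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_once(with_config(application, {"the_app.greeting": "hi"})))
```

The wrapper plays the role of the deployer's script: the application itself only looks at environ and does not need to know where the values came from.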

Project history

PyPy is a continuation of the Psyco project, a JIT compiler for Python developed by Armin Rigo. The aim of PyPy is to provide a JIT compiler with broader coverage than Psyco could offer. PyPy began as a research project for developers.

After the project matured and reached its official 1.0 release in mid-2007, the focus shifted to releasing a production-ready version with greater CPython compatibility.

Version 1.1 was released on 28 April 2009.

Version 1.2, released in March 2010, paid particular attention to speed. It includes a JIT compiler that works but is not recommended for production use.

Version 1.4 was released on 26 November 2010. It was the first version that, in JIT mode, outperforms CPython in speed. The developers also consider this version ready for production use.

Within PyPy, a special version of the interpreter, pypy-stm, is being developed that implements software transactional memory. Transactional memory would make it possible to get rid of the GIL and would simplify parallelizing Python applications on multi-core systems.

The second version of PyPy was released on 9 May 2013; its new features include a stackless mode and cffi, a new interface for calling external C functions.

The fifth version of PyPy was released on 10 March 2016, with improved performance and many improvements to its CPython API support.

On 9 August 2016, PyPy received $200,000 in funding from Mozilla to support Python 3.5.

PyPy2 v5.6 was released on 12 November 2016; its most important change is the Python 2.7.12 standard library.

Project goals

PyPy was conceived as an implementation of Python written in Python. Because PyPy is implemented in a high-level language, it is more flexible than CPython, makes it easier to experiment with new features, and makes it easy to identify areas where it can be improved.

PyPy aims to provide a common translation framework. It supports a framework for implementing dynamic programming languages and maintains a clear separation between the language specification and its implementation.

It also aims to provide a compatible, flexible, and fast implementation of the Python programming language, and allows new features to be implemented without programming in a low-level language.

Translation

PyPy consists of a standard interpreter and a translator.

The interpreter fully implements the Python language. The interpreter itself is written in a restricted subset of that same language, called RPython (Restricted Python). Unlike standard Python, RPython is statically typed, which allows more efficient compilation.

The translator is a set of tools that analyzes RPython code and translates it into lower-level languages such as C, Java bytecode, or CIL. It also supports pluggable garbage collectors and can optionally enable Stackless. In addition, it includes a JIT compiler that translates code into machine instructions while the program is running.

PyPy as a tool for implementing interpreters

The RPython compiler can also be used to write interpreters for other programming languages. By adding to such an interpreter's code an import of the JitDriver class and the creation of an instance of it, then passing to this class the lists of global variables that do and do not change during program execution, and making a few more straightforward declarations, we obtain, after translation with the appropriate flag, a working JIT compiler for the language.
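The declarations described above can be sketched for a toy interpreter. This is an illustration under assumptions rather than a ready-to-translate program: the toy language, the run function, and the variable names are invented, and the fallback JitDriver class is a no-op stand-in that lets the sketch run on an ordinary Python interpreter where the RPython toolchain is not installed.

```python
try:
    # Provided by the RPython toolchain (part of the PyPy source tree).
    from rpython.rlib.jit import JitDriver
except Exception:
    class JitDriver(object):
        """No-op stand-in so the sketch also runs without RPython."""

        def __init__(self, greens=None, reds=None):
            self.greens = greens or []
            self.reds = reds or []

        def jit_merge_point(self, **live_vars):
            pass


# "Green" variables identify a position in the interpreted program and stay
# constant for that position; "red" variables are the changing run-time state.
jitdriver = JitDriver(greens=['pc', 'program'], reds=['acc'])


def run(program, acc):
    """Interpret a toy language: '+' increments and '-' decrements the accumulator."""
    pc = 0
    while pc < len(program):
        # Marks the top of the dispatch loop for the JIT.
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
        op = program[pc]
        if op == '+':
            acc += 1
        elif op == '-':
            acc -= 1
        pc += 1
    return acc


if __name__ == "__main__":
    print(run('++-+', 0))
```

The interpreter loop itself is ordinary code; the JIT-related additions are limited to the import, the JitDriver instance with its two variable lists, and the merge-point call inside the loop.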

Unicode Issues

HTTP does not directly support Unicode, and neither does this
interface. All encoding/decoding must be handled by the application;
all strings passed to or from the server must be standard Python byte
strings, not Unicode objects. The result of using a Unicode object
where a string object is required, is undefined.

Note also that strings passed to start_response() as a status or
as response headers must follow RFC 2616 with respect to encoding.
That is, they must either be ISO-8859-1 characters, or use RFC 2047
MIME encoding.

On Python platforms where the str or StringType type is in
fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
"strings" referred to in this specification must contain only
code points representable in ISO-8859-1 encoding (\u0000 through
\u00FF, inclusive). It is a fatal error for an application to
supply strings containing any other Unicode character or code point.
Similarly, servers and gateways must not supply
strings to an application containing any other Unicode characters.
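On a platform where str is Unicode-based, the restriction to code points \u0000 through \u00FF can be checked by attempting an ISO-8859-1 encode. The sketch below is one possible way to do it; both function names are made up for the example.

```python
def is_latin1_representable(text):
    """Return True if every code point in text lies in \\u0000 through \\u00FF."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def check_header_value(value):
    """Reject strings that cannot be represented in ISO-8859-1."""
    if not is_latin1_representable(value):
        raise ValueError("not representable in ISO-8859-1: %r" % (value,))
    return value


if __name__ == "__main__":
    print(is_latin1_representable("caf\u00e9"))   # U+00E9 is inside the range
    print(is_latin1_representable("\u20ac"))      # the euro sign is outside it
```

A server or middleware could run such a check on status and header strings before passing them on, so that an out-of-range character is reported early.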

Supporting Older (<2.2) Versions of Python

Some servers, gateways, or applications may wish to support older
(<2.2) versions of Python. This is especially important if Jython
is a target platform, since as of this writing a production-ready
version of Jython 2.2 is not yet available.

For servers and gateways, this is relatively straightforward:
servers and gateways targeting pre-2.2 versions of Python must
simply restrict themselves to using only a standard "for" loop to
iterate over any iterable returned by an application. This is the
only way to ensure source-level compatibility with both the pre-2.2
iterator protocol (discussed further below) and "today’s" iterator
protocol (see PEP 234).

(Note that this technique necessarily applies only to servers,
gateways, or middleware that are written in Python. Discussion of
how to use iterator protocol(s) correctly from other languages is
outside the scope of this PEP.)

For applications, supporting pre-2.2 versions of Python is slightly
more complex:

- You may not return a file object and expect it to work as an iterable,
  since before Python 2.2, files were not iterable. (In general, you
  shouldn’t do this anyway, because it will perform quite poorly most
  of the time!) Use wsgi.file_wrapper or an application-specific
  file wrapper class. (See elsewhere in this PEP for more on
  wsgi.file_wrapper, and an example class you can use
  to wrap a file as an iterable.)
- If you return a custom iterable, it must implement the pre-2.2
  iterator protocol. That is, provide a __getitem__ method that
  accepts an integer key, and raises IndexError when exhausted.
  (Note that built-in sequence types are also acceptable, since they
  also implement this protocol.)
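The second recommendation can be illustrated with a small class. The name ChunkSequence is made up for the example; the point is only that an object with a suitable __getitem__ works in a plain "for" loop, which current Python versions still support as a fallback.

```python
class ChunkSequence:
    """Iterable that relies only on the old __getitem__-based iteration protocol."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def __getitem__(self, index):
        # A "for" loop asks for index 0, 1, 2, ... until IndexError is raised.
        if index >= len(self._chunks):
            raise IndexError(index)
        return self._chunks[index]


if __name__ == "__main__":
    for chunk in ChunkSequence([b"Hello ", b"world!\n"]):
        print(chunk)
```

No __iter__ method is defined here; iteration works because the loop calls __getitem__ with increasing integers and stops at the first IndexError.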

Finally, middleware that wishes to support pre-2.2 versions of Python,
and iterates over application return values or itself returns an
iterable (or both), must follow the appropriate recommendations above.

The Application/Framework Side

The application object is simply a callable object that accepts
two arguments. The term "object" should not be misconstrued as
requiring an actual object instance: a function, method, class,
or instance with a __call__ method are all acceptable for
use as an application object. Application objects must be able
to be invoked more than once, as virtually all servers/gateways
(other than CGI) will make such repeated requests.

(Note: although we refer to it as an "application" object, this
should not be construed to mean that application developers will use
WSGI as a web programming API! It is assumed that application
developers will continue to use existing, high-level framework
services to develop their applications. WSGI is a tool for
framework and server developers, and is not intended to directly
support application developers.)

Here are two example application objects; one is a function, and the
other is a class:
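A minimal sketch of what two such application objects could look like is given here. It follows the two-argument calling convention described above and assumes Python 3 conventions (text for the status and headers, bytes for the body); it is an illustration, not a verbatim copy of the specification's examples.

```python
def simple_app(environ, start_response):
    """Application object as a plain function."""
    status = "200 OK"
    response_headers = [("Content-Type", "text/plain")]
    start_response(status, response_headers)
    return [b"Hello world!\n"]


class AppClass:
    """Application object as a class.

    The class itself is the callable: calling AppClass(environ, start_response)
    creates an instance, and that instance is the iterable the server consumes.
    """

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = "200 OK"
        response_headers = [("Content-Type", "text/plain")]
        self.start(status, response_headers)
        yield b"Hello world!\n"


if __name__ == "__main__":
    def fake_start_response(status, headers, exc_info=None):
        print(status, headers)

    print(b"".join(simple_app({}, fake_start_response)))
    print(b"".join(AppClass({}, fake_start_response)))
```

Both objects can be invoked any number of times, once per request, which matches the requirement that application objects be callable more than once.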

Experiment results

Without optimizations	Constants in namedtuple	Fewer function calls	Local variables	PyPy	C++
338.19	215	25.04	20.56	3.29	0.44

Through optimization we managed to speed up the calculation by more than a factor of 100. Even so, using the low-level language C++ still turned out to be more efficient in our case. The conclusion is fairly predictable: Python is not suited to heavy single-threaded computation. It is always worth remembering the limits of a language's applicability and choosing the right tools for the task at hand. Optimizing and hunting for workarounds is a very engaging process, but the time and effort can be spent more usefully.
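The "more than a factor of 100" figure can be checked directly from the table. The sketch below only computes ratios, since the table does not state a unit; the dictionary labels are English renderings of the column headers.

```python
# Timings from the results table, in the table's own (unstated) unit.
timings = {
    "Without optimizations": 338.19,
    "Constants in namedtuple": 215,
    "Fewer function calls": 25.04,
    "Local variables": 20.56,
    "PyPy": 3.29,
    "C++": 0.44,
}

baseline = timings["Without optimizations"]

# Speedup of each variant relative to the unoptimized run.
speedups = {name: baseline / value for name, value in timings.items()}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name}: {factor:.1f}x")
    print(f"C++ vs PyPy: {timings['PyPy'] / timings['C++']:.1f}x")
```

The PyPy run comes out at roughly 103 times faster than the unoptimized version, while the C++ version is still about 7.5 times faster than the PyPy run.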