Tuning Linux kernel parameters to optimize PostgreSQL

Disabling Transparent HugePages (RHEL6/OL6 and RHEL7/OL7)

Starting from RHEL6/OL6, Transparent HugePages are implemented and enabled by default. They are meant to improve memory management by allowing HugePages to be allocated dynamically by the «khugepaged» kernel thread, rather than at boot time like conventional HugePages. That sounds like a good idea, but unfortunately Transparent HugePages don’t play well with Oracle databases and are associated with node reboots in RAC installations and performance problems on both single instance and RAC installations. As a result Oracle recommends disabling Transparent HugePages on all servers running Oracle databases, as described in this MOS note.

ALERT: Disable Transparent HugePages on SLES11, RHEL6, RHEL7, OL6, OL7 and UEK2 Kernels (Doc ID 1557478.1)

The following examples use the base path of «/sys/kernel/mm/transparent_hugepage/», which is used by OL6/OL7 and RHEL7. For RHEL6 use «/sys/kernel/mm/redhat_transparent_hugepage/» as the base path.

You can check the current setting using the following command, which displays the current value of «enabled».

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
#

For Oracle Linux 6 the preferred method to disable Transparent HugePages is to add «transparent_hugepage=never» to the kernel boot line in the «/boot/grub/grub.conf» file.

title Oracle Linux Server (2.6.39-400.24.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.39-400.24.1.el6uek.x86_64 ro root=/dev/mapper/vg_ol6112-lv_root rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=uk LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_DM rd_LVM_LV=vg_ol6112/lv_swap rd_LVM_LV=vg_ol6112/lv_root rhgb quiet numa=off transparent_hugepage=never
        initrd /initramfs-2.6.39-400.24.1.el6uek.x86_64.img

Oracle Linux 7 is similar, but uses GRUB2, so you need to edit the «/boot/grub2/grub.cfg» file using the «grubby» command.

# grubby --default-kernel
/boot/vmlinuz-4.1.12-61.1.6.el7uek.x86_64

# grubby --args="transparent_hugepage=never" --update-kernel /boot/vmlinuz-4.1.12-61.1.6.el7uek.x86_64

# grubby --info /boot/vmlinuz-4.1.12-61.1.6.el7uek.x86_64
index=2
kernel=/boot/vmlinuz-4.1.12-61.1.6.el7uek.x86_64
args="ro vconsole.font=latarcyrheb-sun16 rd.lvm.lv=ol/swap rd.lvm.lv=ol/root crashkernel=auto  vconsole.keymap=uk rhgb quiet LANG=en_GB.UTF-8 transparent_hugepage=never"
root=/dev/mapper/ol-root
initrd=/boot/initramfs-4.1.12-61.1.6.el7uek.x86_64.img
title=Oracle Linux Server 7.2, with Unbreakable Enterprise Kernel 4.1.12-61.1.6.el7uek.x86_64

The server must be rebooted for this to take effect.

Alternatively, add the following lines into the «/etc/rc.local» file and reboot the server.

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

Whichever method you choose, remember to check that the change has worked after the reboot.

# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
#

In OL7/RHEL7 you also need to consider the «tuned profile». The following script shows how to create and enable an amended version of the currently active tuned profile.

# # Check the active profile
# tuned-adm active
Current active profile: virtual-guest
#

# # Create directory to hold revised profile.
# mkdir /etc/tuned/virtual-guest-nothp

# # Create new profile based on the current active profile.
# cat <<EOF >> /etc/tuned/virtual-guest-nothp/tuned.conf
[main]
include= virtual-guest

[vm]
transparent_hugepages=never
EOF

# # Make the script executable.
# chmod +x /etc/tuned/virtual-guest-nothp/tuned.conf 

# # Enable the new profile.
# tuned-adm profile virtual-guest-nothp

Thanks to Mor for pointing this out and directing me to the relevant notes.

With Transparent HugePages disabled, you should proceed to configure conventional HugePages, as described in the following sections.

Enabling HugePages in Linux

Now let us move on to the second step: enabling the HugePages mechanism in Linux. This stage consists of two parts: reserving memory for HugePages and mounting the special hugetlbfs file system, the interface through which programs work with HugePages.

To reserve a certain amount of memory for HugePages, you need to edit sysctl.conf, the file that holds kernel parameters. On some systems, such as Debian 7-8 and Ubuntu 12.04, it is located in /etc (/etc/sysctl.conf); on newer systems, such as Arch Linux, you will not find it in /etc. If it does not exist, create a file in /etc/sysctl.d/ with a name such as hugepages.conf. Once the sysctl file is sorted out, add (or change) the following line

vm.nr_hugepages = 2048

where 2048 tells the kernel how many memory pages to reserve for HugePages. Calculating this value is not hard. Suppose we need to start a virtual machine and give it 4 gigabytes. 4 gigabytes = 4096 megabytes. 4096 megabytes / 2 megabytes per page (the size of one hugepage) = 2048 pages. This is the number to put after the «=» sign for the «vm.nr_hugepages» parameter. If you later need to give the VM more memory, or start another VM, you will have to increase the number of pages. Naturally, the amount of memory must be sufficient for this operation. Keep in mind that memory reserved for HugePages can no longer be used by ordinary programs that lack HugePages support. And after every change of this parameter you should reboot the system.
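For reference, the same arithmetic can be done in the shell. This is only a sketch: it reads the actual huge page size from /proc/meminfo and assumes we want 4096 MB for the VM.

# Desired memory for HugePages, in MB (assumption: 4 GB for the VM)
VM_MB=4096
# Huge page size in kB, as reported by the kernel
HP_KB=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)
# Number of huge pages to reserve: convert MB to kB and divide by the page size
echo $(( VM_MB * 1024 / HP_KB ))    # prints 2048 for 2048 kB pages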

There is also a way to change the number of HugePages without rebooting. To do so, run the command

echo 2048 > /proc/sys/vm/nr_hugepages

After you press «Enter», the kernel will try to allocate 2048 huge pages, and if there is enough free physical memory, everything will go fine. However, only changing the «vm.nr_hugepages» parameter in sysctl.conf is permanent (it survives a reboot), which is why we recommend the first method.

You can verify that the kernel has reserved the required number of pages with the command

cat /proc/meminfo | grep Huge

It will show how many huge pages are ready to accept data in total, how many are free, how many are reserved, the page size, and so on. Logically, at this point the total number of huge pages should equal the number of free huge pages.
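On a machine where 2048 pages have been reserved and nothing is using them yet, the output will look roughly like this (exact values and fields depend on your kernel):

HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB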

The next step is to mount the special hugetlbfs file system, the interface through which programs work with HugePages. First check whether it is already mounted; your distribution may have taken care of it:

mount | grep huge

If the output is not empty (in Debian 8 and later this is the default) and contains an entry showing where hugetlbfs is mounted, you can safely skip this step. If not, mount the file system manually:

mount -t hugetlbfs hugetlbfs /hugepages

Naturally, the mount point, the /hugepages directory, must exist. To have hugetlbfs mounted automatically at every boot on init-based systems, add an entry like the following to /etc/fstab

hugetlbfs /hugepages hugetlbfs defaults 0 0
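If the mount point does not exist yet, a minimal way to create it and activate the new fstab entry without rebooting is:

mkdir -p /hugepages
mount -a
mount | grep hugetlbfs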

If your system is based on systemd, you should instead create a mount unit file with the following content

[Unit]
Description=Huge Pages File System
DefaultDependencies=no
Before=sysinit.target
ConditionPathExists=/sys/kernel/mm/hugepages
ConditionCapability=CAP_SYS_ADMIN

[Mount]
What=hugetlbfs
Where=/hugepages
Type=hugetlbfs
Name it hugepages.mount (the .mount suffix is mandatory) and put it in /lib/systemd/system/.
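Once the unit file is in place, it can be enabled and started right away with standard systemctl commands:

systemctl daemon-reload
systemctl enable --now hugepages.mount
systemctl status hugepages.mount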

Virtualisation

Some considerations about hugepages and virtualisation.

  1. Before enabling hugepages in a virtual machine, you should make sure that your virtualization tool can handle it.
  2. Whether a virtualization tool supports hugepages for its guests or for itself are probably two different aspects.

KVM

  • (TODO), see:

  • Get a performance boost by backing your KVM guest with hugetlbfs http://www.linux-kvm.com/content/get-performance-boost-backing-your-kvm-guest-hugetlbfs (not tested)

Xen

(TODO)

It is unclear which version of Xen supports huge pages, and how it is used.

See https://wiki.xenproject.org/wiki/Huge_Page_Support

Tips and tricks


Nested virtualization


Nested virtualization enables existing virtual machines to be run on third-party hypervisors and on other clouds without any modifications to the original virtual machines or their networking.

On the host, enable the nested feature for kvm_intel:

# modprobe -r kvm_intel
# modprobe kvm_intel nested=1

To make it permanent:

/etc/modprobe.d/kvm_intel.conf
options kvm_intel nested=1

Verify that feature is activated:

$ systool -m kvm_intel -v | grep nested
nested              = "Y"

Enable the «host passthrough» mode to forward all CPU features to the guest system:

  1. If using QEMU, run the guest virtual machine with the following command: qemu-system-x86_64 -enable-kvm -cpu host.
  2. If using virt-manager, change the CPU model to host-passthrough (it will not be in the list, just write it in the box).
  3. If using virsh, use virsh edit vm-name and change the CPU line to <cpu mode='host-passthrough'/>

Boot VM and check if vmx flag is present:

$ grep -E --color=auto 'vmx|svm' /proc/cpuinfo

Enabling huge pages


You may also want to enable hugepages to improve the performance of your virtual machine.
With an up-to-date Arch Linux and a running KVM you probably already have everything you need. Check if you have the directory /dev/hugepages. If not, create it.
Now we need the right permissions to use this directory. The default permission is root’s uid and gid with 0755, but we want anyone in the kvm group to have access to hugepages.

Add to your /etc/fstab:

hugetlbfs       /dev/hugepages  hugetlbfs       mode=01770,gid=78        0 0

Of course the gid must match that of the kvm group. The mode of 01770 allows anyone in the group to create files but not unlink or rename each other’s files. Make sure /dev/hugepages is mounted properly:

# umount /dev/hugepages
# mount /dev/hugepages
$ mount | grep huge
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,mode=1770,gid=78)

Now you can calculate how many hugepages you need. Check how large your hugepages are:

$ grep Hugepagesize /proc/meminfo

Normally that should be 2048 kB ≙ 2 MB. Let us say you want to run your virtual machine with 1024 MB. 1024 / 2 = 512. Add a few extra so we can round this up to 550. Now tell your machine how many hugepages you want:

# echo 550 > /proc/sys/vm/nr_hugepages

If you had enough free memory you should see:

$ grep HugePages_Total /proc/meminfo
HugePages_Total:     550

If the number is smaller, close some applications or start your virtual machine with less memory (number_of_pages x 2):

$ qemu-system-x86_64 -enable-kvm -m 1024 -mem-path /dev/hugepages -hda <disk_image> 

Note the -mem-path parameter. This will make use of the hugepages.

Now you can check, while your virtual machine is running, how many pages are used:

$ grep HugePages /proc/meminfo
HugePages_Total:     550
HugePages_Free:       48
HugePages_Rsvd:        6
HugePages_Surp:        0

Now that everything seems to work you can enable hugepages by default if you like. Add to your sysctl configuration (for example, a file under /etc/sysctl.d/):

vm.nr_hugepages = 550

See also:

  • Debian Wiki — Hugepages

Hugepages in tmpfs/shmem

You can control hugepage allocation policy in tmpfs with the mount option
huge=. It can have the following values:

always
Attempt to allocate huge pages every time we need a new page;
never
Do not allocate huge pages;
within_size
Only allocate huge page if it will be fully within i_size.
Also respect fadvise()/madvise() hints;
advise
Only allocate huge pages if requested with fadvise()/madvise();

The default policy is never.

The huge= mount option works fine after mount: remounting with huge=never
will not attempt to break up huge pages at all, just stop more
from being allocated.
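As an illustration, one of these policies can be selected at mount time; this hypothetical example mounts a 4 GB tmpfs at /mnt/mytmpfs with the within_size policy (adjust the size and path to your needs):

mount -t tmpfs -o huge=within_size,size=4G tmpfs /mnt/mytmpfs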

There is also a sysfs knob to control hugepage allocation policy for the
internal shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled. The mount
is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or
MAP_ANONYMOUS), GPU drivers’ DRM objects, Ashmem.

In addition to policies listed above, shmem_enabled allows two further
values:

deny
For use in emergencies, to force the huge option off from all mounts;
force
Force the huge option on for all, very useful for testing;
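For example, to inspect the current policy (the active value is typically shown in brackets) and switch it to advise (writing requires root):

cat /sys/kernel/mm/transparent_hugepage/shmem_enabled
echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled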

Monitoring usage

The number of anonymous transparent huge pages currently used by the
system is available by reading the AnonHugePages field in /proc/meminfo.
To identify what applications are using anonymous transparent huge pages,
it is necessary to read /proc/PID/smaps and count the AnonHugePages fields
for each mapping.
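As a sketch (1234 is a hypothetical PID), the system-wide figure and a per-process total can be obtained like this:

# System-wide anonymous THP usage
grep AnonHugePages /proc/meminfo
# Per-process total, summed over all mappings of PID 1234
awk '/AnonHugePages/ {sum += $2} END {print sum " kB"}' /proc/1234/smaps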

The number of file transparent huge pages mapped to userspace is available
by reading ShmemPmdMapped and ShmemHugePages fields in /proc/meminfo.
To identify what applications are mapping file transparent huge pages, it
is necessary to read /proc/PID/smaps and count the FileHugeMapped fields
for each mapping.

Note that reading the smaps file is expensive and reading it
frequently will incur overhead.

There are a number of counters in /proc/vmstat that may be used to
monitor how successfully the system is providing huge pages for use.

thp_fault_alloc
is incremented every time a huge page is successfully
allocated to handle a page fault. This applies to both the
first time a page is faulted and for COW faults.
thp_collapse_alloc
is incremented by khugepaged when it has found
a range of pages to collapse into one huge page and has
successfully allocated a new huge page to store the data.
thp_fault_fallback
is incremented if a page fault fails to allocate
a huge page and instead falls back to using small pages.
thp_collapse_alloc_failed
is incremented if khugepaged found a range
of pages that should be collapsed into one huge page but failed
the allocation.
thp_file_alloc
is incremented every time a file huge page is successfully
allocated.
thp_file_mapped
is incremented every time a file huge page is mapped into
user address space.
thp_split_page
is incremented every time a huge page is split into base
pages. This can happen for a variety of reasons but a common
reason is that a huge page is old and is being reclaimed.
This action implies splitting all PMD the page mapped with.
thp_split_page_failed
is incremented if kernel fails to split huge
page. This can happen if the page was pinned by somebody.
thp_deferred_split_page
is incremented when a huge page is put onto split
queue. This happens when a huge page is partially unmapped and
splitting it would free up some memory. Pages on split queue are
going to be split under memory pressure.
thp_split_pmd
is incremented every time a PMD is split into a table of PTEs.
This can happen, for instance, when application calls mprotect() or
munmap() on part of huge page. It doesn’t split huge page, only
page table entry.
thp_zero_page_alloc
is incremented every time a huge zero page is
successfully allocated. It includes allocations which were
dropped due to a race with other allocations. Note, it doesn’t count
every map of the huge zero page, only its allocation.
thp_zero_page_alloc_failed
is incremented if kernel fails to allocate
huge zero page and falls back to using small pages.
thp_swpout
is incremented every time a huge page is swapped out in one
piece without splitting.
thp_swpout_fallback
is incremented if a huge page has to be split before swapout.
Usually because the kernel failed to allocate some contiguous swap space
for the huge page.
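All of these counters can be read from /proc/vmstat, so a quick snapshot of the THP-related ones can be taken with a simple grep (counter names vary somewhat between kernel versions):

grep ^thp_ /proc/vmstat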

As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in /proc/vmstat to help
monitor this overhead.

compact_stall
is incremented every time a process stalls to run
memory compaction so that a huge page is free for use.
compact_success
is incremented if the system compacted memory and
freed a huge page for use.
compact_fail
is incremented if the system tries to compact memory
but failed.
compact_pages_moved
is incremented each time a page is moved. If
this value is increasing rapidly, it implies that the system
is copying a lot of data to satisfy the huge page allocation.
It is possible that the cost of copying exceeds any savings
from reduced TLB misses.
compact_pagemigrate_failed
is incremented when the underlying mechanism
for moving a page failed.
compact_blocks_moved
is incremented each time memory compaction examines
a huge page aligned range of pages.

Using Huge Pages

If the user applications are going to request huge pages using mmap system
call, then it is required that system administrator mount a file system of
type hugetlbfs:

mount -t hugetlbfs \
      -o uid=<value>,gid=<value>,mode=<value>,pagesize=<value>,size=<value>,\
      min_size=<value>,nr_inodes=<value> none /mnt/huge

This command mounts a (pseudo) filesystem of type hugetlbfs on the directory
/mnt/huge. Any file created on /mnt/huge uses huge pages.

The uid and gid options set the owner and group of the root of the
file system. By default the uid and gid of the current process
are taken.

The mode option sets the mode of the root of the file system to value & 01777.
This value is given in octal. By default the value 0755 is picked.

If the platform supports multiple huge page sizes, the pagesize option can
be used to specify the huge page size and associated pool. pagesize
is specified in bytes. If pagesize is not specified the platform’s
default huge page size and associated pool will be used.

The size option sets the maximum value of memory (huge pages) allowed
for that filesystem (/mnt/huge). The size option can be specified
in bytes, or as a percentage of the specified huge page pool (nr_hugepages).
The size is rounded down to HPAGE_SIZE boundary.

The min_size option sets the minimum value of memory (huge pages) allowed
for the filesystem. min_size can be specified in the same way as size,
either bytes or a percentage of the huge page pool.
At mount time, the number of huge pages specified by min_size are reserved
for use by the filesystem.
If there are not enough free huge pages available, the mount will fail.
As huge pages are allocated to the filesystem and freed, the reserve count
is adjusted so that the sum of allocated and reserved huge pages is always
at least min_size.

The nr_inodes option sets the maximum number of inodes that /mnt/huge
can use.

If the size, min_size or nr_inodes option is not provided on the
command line then no limits are set.

For the pagesize, size, min_size and nr_inodes options, you can
use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo.
For example, size=2K has the same meaning as size=2048.

While read system calls are supported on files that reside on hugetlb
file systems, write system calls are not.

Regular chown, chgrp, and chmod commands (with right permissions) could be
used to change the file attributes on hugetlbfs.

Also, it is important to note that no such mount command is required if
applications are going to use only shmat/shmget system calls or mmap with
MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see
below.

Users who wish to use hugetlb memory via shared memory segment should be
members of a supplementary group and system admin needs to configure that gid
into /proc/sys/vm/hugetlb_shm_group. It is possible for same or different
applications to use any combination of mmaps and shm* calls, though the mount of
filesystem will be required for using mmap calls without MAP_HUGETLB.
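As an illustration, a concrete form of the generic mount command above might look like this (hypothetical uid/gid and size values; adjust them to your environment):

mkdir -p /mnt/huge
mount -t hugetlbfs -o uid=1000,gid=1000,mode=0770,size=1G none /mnt/huge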


Enabling HugeTlbPage

Currently, there is no standard way to enable HugeTLBfs, mainly because the FHS has no provision for this kind of virtual file system. (Fedora mounts it in /dev/hugepages/, so don’t be surprised if you find some examples on the web that use this location.)

Linux support for «Huge page tables» (HugeTlb) has been available in Debian since DebianLenny (actually, since kernel 2.6.23). A good introduction to large pages is available from ibm.com.

  1. Create a group for users of hugepages, and retrieve its GID (in this example, 2021), then add yourself to the group.
    Note: this should not be needed for libvirt (see /etc/libvirt/qemu.conf)

    % groupadd my-hugetlbfs
    
    % getent group my-hugetlbfs
    my-hugetlbfs:x:2021:
    
    % adduser franklin my-hugetlbfs
    Adding user `franklin' to group `my-hugetlbfs' ...
    Adding user franklin to group my-hugetlbfs
    Done.
  2. Edit /etc/sysctl.conf and add this text to specify the number of pages you want to reserve

    # Allocate 256*2MiB for HugePageTables (YMMV)
    vm.nr_hugepages = 256
    
    # Members of group my-hugetlbfs(2021) can allocate "huge" Shared memory segment 
    vm.hugetlb_shm_group = 2021
  3. Create a mount point for the file system

    % mkdir /hugepages
  4. Add this line in /etc/fstab (The mode of 1770 allows anyone in the group to create files but not unlink or rename each other’s files.)

    hugetlbfs /hugepages hugetlbfs mode=1770,gid=2021 0 0
  5. Reboot (This is the most reliable method of allocating huge pages before the memory gets fragmented. You don’t necessarily have to reboot: you can try to run sysctl -p to apply the changes. If grep "Huge" /proc/meminfo doesn’t show all the pages, you can try to free the cache with sync ; echo 3 > /proc/sys/vm/drop_caches (where «3» stands for «purge pagecache, dentries and inodes»), then try sysctl -p again.)


How to allocate HugePages?

You can allocate hugepages at runtime from the command line using «sysctl». Before making the reservation, let us check the current hugepage state

# grep -i huge /proc/meminfo
AnonHugePages:     10240 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

So there is no reservation for hugepages; below are the available and used memory details

# free -m
              total        used        free      shared  buff/cache   available
Mem:           3790         194        3318          46         277        3314
Swap:           759           0         759

IMPORTANT NOTE: This will only work if contiguous memory is available, so you may have to do this right at startup of your machine, because otherwise you risk running out of contiguous memory.

Let us reserve 512MB for Huge Pages

# sysctl -w vm.nr_hugepages=512
vm.nr_hugepages = 512

If you check immediately, you will see that some of the memory that was previously free is no longer available; it is now reserved for hugepages

# free -m
              total        used        free      shared  buff/cache   available
Mem:           3790        1220        2292          46         277        2289
Swap:           759           0         759

Validate the hugepage reservation again

# grep -i huge /proc/meminfo
AnonHugePages:     14336 kB
HugePages_Total:     512
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

To make the changes permanent, add these values to the sysctl configuration. I will create a new file «10-hugepages.conf» under «/etc/sysctl.d/»

# cat /etc/sysctl.d/10-hugepages.conf
vm.nr_hugepages=512
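The new file can be applied without a reboot by pointing sysctl at it (the path matches the file created above):

# sysctl -p /etc/sysctl.d/10-hugepages.conf
vm.nr_hugepages = 512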

IMPORTANT NOTE: When working with huge pages, always remember that memory reserved for huge pages is no longer available as general memory, so it cannot be used for anything else.
