Reference sites


http://hwengineer.blogspot.com

http://shareithw.blogspot.kr

http://leearmykr.blogspot.com/






Procedure

  1. Download the ISO file for the Linux distribution and create a bootable USB device or DVD.
  2. Insert your USB device into the front USB port.
  3. Verify that the USB device is shown as an option in Petitboot, for example as USB: sdc:
    USB: sdc /
    		Rescue mode
    		Install
    Note: Select Rescan devices if the USB device does not appear.

    If you are installing Red Hat Enterprise Linux 7.2, you must provide some additional details to the installer. Follow these steps:

    1. Record the UUID of the USB device. For example, the UUID of the USB device in the following example is 2015-10-30-11-05-03-00.
      [USB: sdb1 / 2015-10-30-11-05-03-00]
                  Rescue a Red Hat Enterprise Linux system (64-bit kernel)
                  Test this media & install Red Hat Enterprise Linux 7.2  (64-bit kernel)
               *  Install Red Hat Enterprise Linux 7.2 (64-bit kernel)
    2. Select Install Red Hat Enterprise Linux 7.2 (64-bit kernel) and press e (Edit) to open the Petitboot Option Editor window.
    3. Move the cursor to the Boot arguments section and add the following information:
      inst.stage2=hd:UUID=your_UUID
      where your_UUID is the UUID that you recorded.
      Petitboot Option Editor
                    ------------------------------------------------------------------------------
      
                    Device:    ( ) sda2 [f8437496-78b8-4b11-9847-bb2d8b9f7cbd]
                                    (*) sdb1 [2015-10-30-11-05-03-00]
                                    ( ) Specify paths/URLs manually
      
                    Kernel:         /ppc/ppc64/vmlinuz
                    Initrd:         /ppc/ppc64/initrd.img
                    Device tree:
                    Boot arguments: ro inst.stage2=hd:UUID=2015-10-30-11-05-03-00    
      
                       [    OK    ]  [   Help   ]  [  Cancel  ]
    4. Select OK to save your options and return to the Main menu.
  4. Select Install.
    Note: Be patient. It can take a couple of minutes for the installation to begin.
  5. Follow the installation wizard for your Linux distribution to set up disk options, your user name and password, time zones, and so on. The last step is to restart your system.
  6. After the system restarts, Petitboot displays the option to boot the Linux distribution that you installed. Select this option and press Enter.

 




OS environment configuration

 

 ● Disabling SELinux

 

    Check the current state

[root@ac922 ~]# getenforce

Enforcing

->  Enforcing means SELinux is enabled and its security policy is being applied

 

    Temporarily disable SELinux

[root@ac922 ~]# setenforce 0

[root@ac922 ~]# getenforce

Permissive

->  The setenforce command switches the running SELinux to permissive mode; it reverts to enforcing when the OS reboots (temporary)

 

    Keep SELinux from starting at boot

[root@ac922 ~]# vi /etc/selinux/config

 

Change

SELINUX=enforcing

to

SELINUX=disabled

and save the file.

->  SELinux is no longer used when the OS boots
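
The same edit can be made without opening an editor. The following is a minimal sketch, not part of the original procedure: the function name `disable_selinux_at_boot` and its optional path argument are made up for illustration, and it assumes GNU `sed` (as shipped with RHEL 7).

```bash
#!/usr/bin/env bash
# Sketch: set SELINUX=disabled in an SELinux config file.
# The function name and the optional path argument are illustrative.
disable_selinux_at_boot() {
    local config="${1:-/etc/selinux/config}"
    # Rewrite only the SELINUX= line (SELINUXTYPE= does not match) and
    # keep a .bak copy of the original file next to it.
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$config"
    # Print the resulting setting for confirmation.
    grep '^SELINUX=' "$config"
}
```

Called as root with no argument (`disable_selinux_at_boot`), it edits /etc/selinux/config; the change takes effect at the next boot.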

 

 

 ● Disabling the firewall (iptables)

 

 

    Check the firewall service status

[root@ac922 ~]# systemctl status firewalld.service

firewalld.service - firewalld - dynamic firewall daemon

   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)

   Active: active (running) since Thu 2018-02-22 15:40:27 KST; 19h ago

     Docs: man:firewalld(1)

 Main PID: 2012 (firewalld)

   CGroup: /system.slice/firewalld.service

           └─2012 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid

 

->  If the state is active (running), firewalld.service must be shut down

 

    Disable the firewall service

[root@ac922 ~]# systemctl disable firewalld.service

Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.

Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

 

->  The firewall (iptables) service will no longer start when the OS reboots; run systemctl stop firewalld.service to stop it right away

 

    Flush the firewall rules

[root@ac922 ~]# iptables -F

->  Clear the firewall rules

[root@ac922 ~]# iptables -L

->  List the rules currently applied

 

 

 

 ● Setting up a local repository without Internet access

 

 

    Configure a local repository from the CD-ROM

 

[root@ac922 ~]# mkdir /cdrom

[root@ac922 ~]# mount -t iso9660 /dev/sr0 /cdrom

->  Mount the OS installation CD-ROM on /cdrom

 

[root@ac922 ~]# cd /etc/yum.repos.d/

[root@ac922 yum.repos.d]# vi local.repo

 

[root@ac922 yum.repos.d]# cat local.repo

[local-repository]

name=local-repository

baseurl=file:///cdrom/

gpgcheck=0
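
As an alternative to typing the file in `vi`, the repository definition can be written by a small helper. This is only a sketch under assumptions: the function name `write_local_repo` and its two optional arguments are illustrative, and the contents simply mirror the `local.repo` listing shown here.

```bash
#!/usr/bin/env bash
# Sketch: generate the local repository definition for a mounted install medium.
# Function name and arguments are illustrative, not part of yum.
write_local_repo() {
    local mount_point="${1:-/cdrom}"
    local repo_file="${2:-/etc/yum.repos.d/local.repo}"
    # file:// plus an absolute mount point yields file:///cdrom/ by default.
    cat > "$repo_file" <<EOF
[local-repository]
name=local-repository
baseurl=file://${mount_point}/
gpgcheck=0
EOF
}
```

After `write_local_repo` has been run as root, `yum repolist` should list `local-repository` if the medium is mounted.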

 

    Update the YUM repository metadata

 

[root@ac922 yum.repos.d]# yum update

Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager

This system is not registered with an entitlement server. You can use subscription-manager to register.

local-repository                                                                                    | 4.1 kB  00:00:00

(1/2): local-repository/group_gz                                                                    | 129 kB  00:00:00

(2/2): local-repository/primary_db                                                                  | 3.2 MB  00:00:00

No packages marked for update

 

->  Run the YUM repository update

 

 

 

 

 

 

 ● Installing EPEL on Red Hat

- Extra Packages for Enterprise Linux (EPEL) repository configuration

- Install epel-release when you need extra packages beyond those that Red Hat (CentOS) provides by default

 

[root@ac922 ~]# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

 

[root@ac922 ~]# rpm -ihv epel-release-latest-7.noarch.rpm

 

[root@ac922 ~]# yum update

 

 

[root@ac922 ~]# yum install nmon

Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager

This system is not registered with an entitlement server. You can use subscription-manager to register.

Resolving Dependencies

--> Running transaction check

---> Package nmon.ppc64le 0:16g-3.el7 will be installed

--> Finished Dependency Resolution

 

Dependencies Resolved

 

===========================================================================================================================

 Package                    Arch                          Version                         Repository                  Size

===========================================================================================================================

Installing:

 nmon                       ppc64le                       16g-3.el7                       epel                        70 k

 

Transaction Summary

===========================================================================================================================

Install  1 Package

 

Total download size: 70 k

Installed size: 199 k

Is this ok [y/d/N]: y

 

->  Install the nmon package as a test

 

 

 

 

 

 ●  Enabling the Red Hat subscription


-> If you do not have a subscription yet, apply for the 30-day trial at redhat.com and register it; it can then be used for 30 days

 




### Operating System and Repository Setup


   1. Enable 'optional' and 'extra' repo channels


          $ sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms

          $ sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms


   2. Install packages needed for the installation


          $ sudo yum -y install wget nano bzip2


   3. Enable EPEL repo (already installed above; skip)


          $ wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

          $ sudo rpm -ihv epel-release-latest-7.noarch.rpm


   4. Load the latest kernel


          $ sudo yum update kernel kernel-tools kernel-tools-libs kernel-bootwrapper

          $ sudo reboot # This reboot may be deferred until after the NVIDIA steps below.


      Or do a full update


          $ sudo yum update

          $ sudo reboot # This reboot may be deferred until after the NVIDIA steps below.

 


 

 ● POWER9 NVIDIA driver setup




### NVIDIA Components


Before installing the NVIDIA components the udev Memory Auto-Onlining Rule must

be disabled for the CUDA driver to function properly. To disable it:


   1. Edit the /lib/udev/rules.d/40-redhat.rules file.


          $ sudo nano /lib/udev/rules.d/40-redhat.rules


   2. Comment out the following line and save the change:


          SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"


   3. Reboot the system for the changes to take effect.


          $ sudo reboot
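
The commenting-out step can also be scripted. The sketch below is an illustration rather than the documented method: the function name `comment_out_memory_online_rule` is hypothetical, GNU `sed` is assumed, and the match pattern assumes the rule starts exactly as quoted in step 2 (other RHEL releases may format it differently).

```bash
#!/usr/bin/env bash
# Sketch: comment out the udev memory auto-onlining rule.
# The function name and optional path argument are illustrative.
comment_out_memory_online_rule() {
    local rules="${1:-/lib/udev/rules.d/40-redhat.rules}"
    # Only lines that still start with the active rule are prefixed with '# ',
    # so running this twice does not double-comment. A .bak copy is kept.
    sed -i.bak '/^SUBSYSTEM=="memory", ACTION=="add"/ s/^/# /' "$rules"
    # Show the affected line(s) so the result can be checked by eye.
    grep -n 'SUBSYSTEM=="memory"' "$rules"
}
```

Run it as root with no argument to edit the default rules file, then reboot as in step 3.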

 




The Deep Learning packages require CUDA, cuDNN, and GPU driver packages

from NVIDIA.


The required and recommended versions of these components are:


    | Component    | Required | Recommended |

    |--------------|----------|-------------|

    | CUDA Toolkit | 9.1      | 9.1.85      |

    | cuDNN        | 7.0.5    | 7.0.5       |

    | GPU Driver   | 387.36   | 387.36      |



-------------------------------------------------------------------------------------------------------------------------

 ● Installing CUDA 9.1 and cuDNN

-> cuDNN is installed just by extracting the archive over the CUDA directory (no separate package installation)



   1. Download and install NVIDIA CUDA 9.1 from [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)

      - Select *Operating System:* **Linux**

      - Select *Architecture:* **ppc64le**

      - Select *Distribution* **RHEL**

      - Select *Version* **7**

      - Select the *Installer Type* that best fits your needs

      - Follow the **Linux POWER9** installation instructions in the *CUDA Quick

        Start Guide* (linked from

        [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)),

        including the steps describing how to set up the CUDA development

        environment by updating `PATH` and `LD_LIBRARY_PATH`.


   2. Download NVIDIA cuDNN 7.0.5 for CUDA 9.1 from

      [https://developer.nvidia.com/cudnn](https://developer.nvidia.com/cudnn)

      (Registration in NVIDIA's Accelerated Computing Developer Program is required)


      - cuDNN v7.0.5 Library for Linux (Power8/Power9)


   3. Install the cuDNN v7.0 packages


          $ sudo tar -C /usr/local --no-same-owner -xzvf cudnn-9.1-linux-ppc64le-v7.0.5.tgz
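
Because cuDNN is only extracted, not installed as a package, there is no `rpm -q` entry to check afterwards. One way to confirm which version landed is to read the version macros from the header. This is a sketch under assumptions: the function name `cudnn_version` is illustrative, the default header path `/usr/local/cuda/include/cudnn.h` is assumed to be where the archive ends up on this system, and the macro names are those used by cuDNN 7 headers.

```bash
#!/usr/bin/env bash
# Sketch: print the cuDNN version recorded in a cudnn.h header.
# The function name and the default header path are assumptions.
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    # Read the three version macros and join them with dots.
    awk '/^#define[ \t]+CUDNN_MAJOR[ \t]/      { major = $3 }
         /^#define[ \t]+CUDNN_MINOR[ \t]/      { minor = $3 }
         /^#define[ \t]+CUDNN_PATCHLEVEL[ \t]/ { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}
```

If the header is somewhere else, pass its path as the first argument, for example `cudnn_version /usr/local/cuda-9.1/targets/ppc64le-linux/include/cudnn.h`.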


 


 ● Installing Anaconda

- Install both Anaconda 2 and Anaconda 3 (the example below installs Anaconda2; Anaconda3 follows the same procedure)
- During installation, changing only the default install location is recommended: from "/root/anaconda2" to "/opt/anaconda2"



A number of the Deep Learning frameworks require Anaconda. Anaconda is a platform-agnostic data science distribution with a collection of 1,000+ open source packages with free community support.


Download and Install Anaconda. Installation requires input for license agreement, install location (default is `$HOME/anaconda2`) and permission to modify the `PATH` environment variable (via `.bashrc`).


        $ wget https://repo.continuum.io/archive/Anaconda2-5.0.0-Linux-ppc64le.sh

        $ bash Anaconda2-5.0.0-Linux-ppc64le.sh

        $ source ~/.bashrc

 



 


 ● Installing the Deep Learning Frameworks ( PowerAI installation )


### IBM Spectrum MPI Install



Download IBM Spectrum MPI from the ESP download site.


   1. Install the rpms

   

          $ sudo rpm -ihv ibm_smpi_lic_s-10.02.00*.ppc64le.rpm ibm_smpi-10.02.00*.ppc64le.rpm


### Software Repository Setup


IBM TensorFlow ESP for Power AC922 Deep Learning packages are distributed in an rpm file, which is available from the ESP download site.

Installing the rpm creates an installation repository on the local machine.


   1. Install the repository package:


          $ sudo rpm -ihv mldl-repo-local*.rpm


### Installing all frameworks at once


All the Deep Learning frameworks can be installed at once using the

`power-mldl-esp` meta-package:


        $ sudo yum install power-mldl-esp



### Installing frameworks individually  

 -> Skip this section if you already installed everything with power-mldl-esp above


The Deep Learning frameworks can be installed individually if preferred.

The framework packages are:


   - `tensorflow` - Google TensorFlow, v1.4.0

   - `tensorboard` - Web Applications for inspecting TensorFlow runs and graphs, v0.4.0rc3

   - `ddl-tensorflow` - Distributed Deep Learning custom operator for TensorFlow


Each can be installed with:


        $ sudo yum install <framework>-cuda9.1


### Accept the License Agreement


Read the license agreements and accept the terms and conditions before using Spectrum MPI or any of the frameworks.


        $ sudo IBM_SPECTRUM_MPI_LICENSE_ACCEPT=no /opt/ibm/spectrum_mpi/lap_se/bin/accept_spectrum_mpi_license.sh

        $ sudo /opt/DL/license/bin/accept-powerai-license.sh


After reading the license agreements, future installs may be automated to silently accept the license agreements.


        $ sudo IBM_SPECTRUM_MPI_LICENSE_ACCEPT=yes /opt/ibm/spectrum_mpi/lap_se/bin/accept_spectrum_mpi_license.sh

        $ sudo IBM_POWERAI_LICENSE_ACCEPT=yes /opt/DL/license/bin/accept-powerai-license.sh


 


 ● OS tuning for the PowerAI environment




## Tuning Recommendations


Recommended settings for optimal Deep Learning performance on the IBM Power System AC922 are:


   - Enable Performance Governor


         $ sudo yum install kernel-tools

         $ sudo cpupower -c all frequency-set -g performance


   - Enable GPU persistence mode

      

         $ sudo systemctl enable nvidia-persistenced

         $ sudo systemctl start nvidia-persistenced


   - For TensorFlow, set the SMT mode


         $ sudo ppc64_cpu --smt=4


   - For TensorFlow with DDL, set the SMT mode


         $ sudo ppc64_cpu --smt=2



## Getting Started with MLDL Frameworks


### General Setup


Most of the PowerAI packages install outside the normal system search

paths (to `/opt/DL/...`), so each framework package provides a shell

script to simplify environmental setup (e.g. `PATH`, `LD_LIBRARY_PATH`,

`PYTHONPATH`).


We recommend users update their shell rc file (e.g. `.bashrc`) to source

the desired setup scripts. For example:


    $ source /opt/DL/<framework>/bin/<framework>-activate


Each framework also provides a test script to verify basic function:


    $ <framework>-test
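
The rc-file recommendation in this section can be applied without producing duplicate lines when it is repeated. The helper below is a minimal sketch; `add_activate_to_rc` is a made-up name, and the path layout `/opt/DL/<framework>/bin/<framework>-activate` is taken from the text above.

```bash
#!/usr/bin/env bash
# Sketch: append a framework's activate line to a shell rc file, once.
# The function name and argument order are illustrative.
add_activate_to_rc() {
    local framework="$1"
    local rc_file="${2:-$HOME/.bashrc}"
    local line="source /opt/DL/${framework}/bin/${framework}-activate"
    # -x matches whole lines and -F treats the line as a fixed string,
    # so the line is appended only when it is not already present.
    grep -qxF -- "$line" "$rc_file" 2>/dev/null || echo "$line" >> "$rc_file"
}
```

For example, `add_activate_to_rc tensorflow` adds the TensorFlow activation line to `~/.bashrc`; calling it again leaves the file unchanged.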


### Note about dependencies


A number of the PowerAI frameworks (for example, TensorFlow, and TensorBoard)

have their dependencies satisfied via Anaconda packages.  These dependencies are validated

by the `<framework>-activate` script to ensure they are installed and, if not, the script will fail.


For these frameworks, the `/opt/DL/<framework>/bin/install_dependencies` script must be run

prior to activation to install the required packages.


For example:


    $ source /opt/DL/tensorflow/bin/tensorflow-activate

    Missing dependencies ['backports.weakref', 'mock', 'protobuf']

    Run "/opt/DL/tensorflow/bin/install_dependencies" to resolve this problem.


    $ /opt/DL/tensorflow/bin/install_dependencies

    Fetching package metadata ...........

    Solving package specifications: .


    Package plan for installation in environment /home/rhel/anaconda2:


    The following NEW packages will be INSTALLED:


        backports.weakref: 1.0rc1-py27_0

        libprotobuf:       3.4.0-hd26fab5_0

        mock:              2.0.0-py27_0

        pbr:               1.10.0-py27_0

        protobuf:          3.4.0-py27h7448ec6_0


    Proceed ([y]/n)? y


    libprotobuf-3. 100% |###############################| Time: 0:00:02   2.04 MB/s

    backports.weak 100% |###############################| Time: 0:00:00  12.83 MB/s

    protobuf-3.4.0 100% |###############################| Time: 0:00:00   2.20 MB/s

    pbr-1.10.0-py2 100% |###############################| Time: 0:00:00   3.35 MB/s

    mock-2.0.0-py2 100% |###############################| Time: 0:00:00   3.26 MB/s


    $ source /opt/DL/tensorflow/bin/tensorflow-activate

    $
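
The activate, install dependencies, activate again sequence shown above can be wrapped in one function. This is a sketch resting on one assumption that the source only implies: that the `<framework>-activate` script signals missing dependencies with a non-zero status. The function name `activate_framework` and the optional root argument are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: activate a PowerAI framework, installing dependencies first if the
# activate script fails. Assumes activation failure is reported through a
# non-zero status; the function name and root argument are illustrative.
activate_framework() {
    local framework="$1"
    local root="${2:-/opt/DL}"
    local bin_dir="${root}/${framework}/bin"
    # Probe in a subshell so a failing activate script cannot terminate
    # or half-configure the current shell.
    if ! ( source "${bin_dir}/${framework}-activate" ) >/dev/null 2>&1; then
        # install_dependencies may prompt for confirmation, as in the log above.
        "${bin_dir}/install_dependencies" || return 1
    fi
    # Activate for real in the current shell.
    source "${bin_dir}/${framework}-activate"
}
```

Usage would be `activate_framework tensorflow` in an interactive shell.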


 




 ● Getting Started with Tensorflow

- Basic TensorFlow test



The TensorFlow homepage

([https://www.tensorflow.org/](https://www.tensorflow.org/)) has a

variety of information, including Tutorials, How Tos, and a Getting

Started guide.


Additional tutorials and examples are available from the community, for

example:


   - [https://github.com/nlintz/TensorFlow-Tutorials](https://github.com/nlintz/TensorFlow-Tutorials)

   - [https://github.com/aymericdamien/TensorFlow-Examples](https://github.com/aymericdamien/TensorFlow-Examples)


#### Distributed Deep Learning (DDL) Custom Operator for TensorFlow


IBM TensorFlow ESP for Power AC922 includes a Technology Preview of the

IBM PowerAI Distributed Deep Learning (DDL) custom operator for TensorFlow.

The DDL custom operator uses IBM Spectrum MPI and NCCL to provide high-speed

communications for distributed TensorFlow.


The DDL custom operator can be found in the `ddl-tensorflow` package.

For more information about DDL and about the TensorFlow operator, see:


   - `/opt/DL/ddl-doc/doc/README.md`

   - `/opt/DL/ddl-tensorflow/doc/README.md`

   - `/opt/DL/ddl-tensorflow/doc/README-API.md`


The DDL TensorFlow operator makes it easy to enable models

for distribution. The package includes examples of models enabled

with DDL including TensorFlow High Performance and Slim models:


    $ source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate


    $ ddl-tensorflow-install-samples <somedir>


The Slim model examples are based on a specific commit of the TensorFlow models

repo with a small adjustment. If you prefer to work from an upstream clone,

rather than the packaged examples:


    $ git clone https://github.com/tensorflow/models.git

    $ cd models

    $ git checkout 11883ec6461afe961def44221486053a59f90a1b

    $ git revert fc7342bf047ec5fc7a707202adaf108661bd373d

    $ cp /opt/DL/ddl-tensorflow/examples/slim/train_image_classifier.py slim/


#### Additional TensorFlow Features


The PowerAI TensorFlow packages include TensorBoard. See:

[https://www.tensorflow.org/get_started/summaries_and_tensorboard](https://www.tensorflow.org/get_started/summaries_and_tensorboard)


The TensorFlow 1.4.0 package includes support for additional features:


   - HDFS

   - NCCL

   - experimental XLA JIT compilation

     (see [https://www.tensorflow.org/performance/xla/](https://www.tensorflow.org/performance/xla/))


## Uninstalling MLDL Frameworks


The MLDL Framework packages can be uninstalled individually the same way they were installed. To uninstall all MLDL packages and the repo used to install them, run:


    $ sudo yum remove powerai-license

    $ sudo yum remove mldl-repo-local-esp


 



 ●  Installing Docker / nvidia-docker


Reference -> http://hwengineer.blogspot.com/2018/02/ppc64le-docker-nvidia-docker-repository.html



1. On Red Hat

[root@ac922 ~]# cat /etc/yum.repos.d/docker.repo
[docker]
name=Docker
baseurl=http://ftp.unicamp.br/pub/ppc64el/rhel/7/docker-ppc64el/
enabled=1
gpgcheck=0

 

# systemctl enable docker.service
# systemctl start docker.service
-> Start the Docker service

 

# docker images

-> Confirm that Docker is running

 

///////////////////////////////////////////////////////////////////////////////

 

[nvidia-docker from here on]


RHEL-based distributions


# distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo

-> Add the nvidia-docker repository
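
The `distribution=` line sources `/etc/os-release` and concatenates its `ID` and `VERSION_ID` values; the result selects the directory in the repository URL. The sketch below isolates that step so the value can be inspected before running `curl`. The function name `distribution_string` and its optional file argument are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: print the ID+VERSION_ID string used in the nvidia-docker repo URL.
# The function name and optional path argument are illustrative.
distribution_string() {
    local os_release="${1:-/etc/os-release}"
    # Source inside a subshell so ID and VERSION_ID do not leak into the caller.
    ( . "$os_release" && echo "${ID}${VERSION_ID}" )
}
```

On a host whose os-release contains `ID="rhel"` and `VERSION_ID="7.4"`, `distribution_string` prints `rhel7.4`.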

 

# yum update

# yum install nvidia-docker

 

# nvidia-docker

-> Confirm that nvidia-docker is installed

 

# /usr/bin/nvidia-docker-plugin &

-> Run the plugin in the background so that nvidia-docker can be used

-> Register it in rc.local so that it also runs in the background after a reboot

 

# vi /etc/rc.d/rc.local

/usr/bin/nvidia-docker-plugin &

Add the line above to the file

 

# chmod +x /etc/rc.d/rc.local

-> rc.local must have execute permission

 

 

 

 


 ● Reference for installing TensorFlow 1.5


http://hwengineer.blogspot.com/2018/04/ac922-redhat-74-python-36-tensorflow.html


 ●  Additional steps and settings to check when nvidia-smi reports an unknown error on the AC922


https://hwengineer.blogspot.com/2018/04/ac922-cuda-91.html



 ●  Without Internet access -> disable the Red Hat Subscription-Manager repository



Disabling the Subscription-Manager Repository

When a system is registered using Subscription-Manager, the rhsmcertd process creates a special yum repository — redhat.repo. As “Enabling Supplementary and Optional Repositories” describes, as the system adds subscriptions, the product channels are added to the redhat.repo file.


Maintaining a redhat.repo file may not be desirable in some environments. It can create static in content management operations if that repository is not the one actually used for subscriptions, such as for a disconnected system or a system using a local content mirror.


This default redhat.repo repository can be disabled by editing the Subscription-Manager configuration and setting the manage_repos value to zero (0).



[root@server ~]# subscription-manager config --rhsm.manage_repos=0 
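
To confirm that the setting was stored, the value can be read back from the Subscription-Manager configuration file. This is a sketch under assumptions: `manage_repos_value` is a made-up helper name, and it assumes the setting is kept as a `manage_repos = <value>` line in `/etc/rhsm/rhsm.conf`.

```bash
#!/usr/bin/env bash
# Sketch: print the manage_repos value from an rhsm.conf-style file.
# The function name and optional path argument are illustrative.
manage_repos_value() {
    local conf="${1:-/etc/rhsm/rhsm.conf}"
    # Split on '=', strip blanks from the right-hand side, and print it.
    awk -F= '/^[ \t]*manage_repos[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }' "$conf"
}
```

A result of `0` means the redhat.repo file is no longer managed by Subscription-Manager.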



///////////// Test notes below ///////////////

 

 

 

newwell installation notes



- Extra Packages for Enterprise Linux (EPEL)
-> Register the extra package repository



Install the repository for Red Hat 7

# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# rpm -ivh epel-release-latest-7.noarch.rpm
# yum update

# yum repolist




- cuda 9.1

# rpm -ivh cuda-repo-rhel7-9-1-local-*.ppc64le.rpm

# yum install cuda

 


- cudnn 7.0 for cuda 9.1

# wget http://developer2.download.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.1_20171129/cudnn-9.1-linux-ppc64le-v7.tgz?9GVxLevnEbiZ58fRLwXMF4dcgjWPoUHm1vfRDm_87tF5yDIjNeOyAV5vZwaygOrMjgXlVlAeEPaB9CL2oPbggLw08gUYN8xq62eGOwbacmvE9X7Lyvdp7_yqzQQCMfyfGjHH40qyjLlMwt3l4CypdNdCtw4XyBRQdpOdUI8k5eAylpHnPnngkIcE9-ReD70rYBM50Oi75p75itEl

# mv cudnn-9.1-linux-ppc64le-v7.tgz\?* cudnn-9.1-linux-ppc64le-v7.tgz

# tar -xf cudnn-9.1-linux-ppc64le-v7.tgz

# cd cuda/targets/ppc64le-linux
# cp -rp include/ /usr/local/cuda-9.1/targets/ppc64le-linux/
# cp -rp lib/ /usr/local/cuda-9.1/targets/ppc64le-linux/

 

- Install Bazel 0.9.0

# unzip bazel-0.9.0-dist.zip

# yum install *jdk*

# ./compile.sh

# scp -rp output/bazel /usr/local/bin/


- Install protobuf

# yum install autoconf automake libtool

# git clone https://github.com/google/protobuf

 

- Tensorflow 1.4

# yum install -y git patch python-pip python-wheel numpy

# git clone --recurse-submodules https://github.com/tensorflow/tensorflow
# cd tensorflow
# git checkout r1.4
# ./configure

# bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
