Machine Learning and Big Data: differences between revisions

From Artem Aleksashkin's Wiki
[[Файл:Ai-brain.jpg|400px]]


= Software installation =


Latest versions:
* Anaconda - https://www.anaconda.com/products/individual
 
* Check all versions here: https://www.tensorflow.org/install/source
 
<pre>
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.6.1/local_installers/cuda-repo-ubuntu2004-11-6-local_11.6.1-510.47.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-6-local_11.6.1-510.47.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
</pre>
 
== Issue: Could not load dynamic library 'libcudart.so.11.0' ==
<pre>
Python 3.8.8 (default, Apr 13 2021, 19:58:26)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2022-03-24 01:07:07.010171: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-24 01:07:07.010189: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
</pre>
 
Fix: see https://github.com/tensorflow/tensorflow/issues/45930#issuecomment-770342299
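To check whether the dynamic loader can find the library named in the warning, a minimal Python sketch like the one below may help. The helper name `can_load_shared_library` is hypothetical; it relies only on the standard `ctypes` module and does not repair anything, it only reports what the loader sees.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the loader cannot open the library.
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the TensorFlow warning above.
    name = "libcudart.so.11.0"
    status = "found" if can_load_shared_library(name) else "NOT found"
    print(f"{name}: {status}")
```

If the library is reported as not found even though CUDA is installed, the loader search path is the likely place to look.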
 
= Hardware =
 
* Lenovo x230 + eGPU
** CPU: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
** RAM: 16 GB; with the eGPU attached only 8 GB works
** [https://aliexpress.ru/item/32983647923.html Expresscard V8.0 EXP GDC Beast PCIe PCI-E]
** Power supply: 350-600 W
** Nvidia GeForce GTX 760 4 GB
** [https://egpu.io/forums/builds/thinkpad-x230-express-card-2-0-5-gt-s-windows-10-by-boelly/ Similar setup]
** [https://egpu.io/forums/expresscard-mpcie-m-2-adapters/mpcieecngff-m2-resolving-detection-bootup-and-stability-problems/ Troubleshooting]
** With 16 GB of RAM the system lags. Remove one memory stick to drop to 8 GB
** Make sure all of the GPU's power connectors are plugged in (8+6 or 8+8 pin); otherwise it can produce error 43
** [https://www.youtube.com/watch?v=p59MNoqWY9c eGPU setup Lenovo Thinkpad x230 with GTX 760 Part 1 ( setup )]
** [https://www.youtube.com/watch?v=xJsHLTCo9Ho eGPU setup Lenovo Thinkpad x230 with GTX 760 Part 2 ( Fixing Error 12)]
** [https://www.youtube.com/watch?v=qOoY30pubBg eGPU setup Lenovo Thinkpad x230 with GTX 760 Part 3 ( Gameplay )]
** In Windows go to Control Panel, Hardware setup, Nvidia settings, 3D graphics. There you can select the default video adapter
*** This alone does not make games use the eGPU: connect an external screen to the GPU and disable the laptop screen. Games will then run on the eGPU.
 
= Software =
* [https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html NVIDIA CUDA Installation Guide for Linux]
<pre>
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda-repo-ubuntu2004-11-2-local_11.2.2-460.32.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-2-local_11.2.2-460.32.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-2-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda nvidia-cuda-toolkit
</pre>
* [https://www.tensorflow.org/install/gpu TensorFlow for GPU]
* [https://developer.nvidia.com/rdp/cudnn-download cuDNN SDK]
* [https://developer.nvidia.com/nvidia-tensorrt-7x-download TensorRT]
* [https://towardsdatascience.com/installing-tensorflow-gpu-in-ubuntu-20-04-4ee3ca4cb75d Installing TensorFlow GPU in Ubuntu 20.04]
* https://developer.nvidia.com/cuda-gpus
== Change Default Python ==
<pre>
sudo update-alternatives --install /usr/bin/python python /usr/bin/python2 1
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 2
sudo update-alternatives --config python
</pre>
== Define Your Software Versions ==
* Ubuntu 20.04. Kernel 5.4.0-70-generic
** All packages I downloaded were the Ubuntu 18.04 builds.
* '''Nvidia GeForce GTX 760 4gb''' -> '''Nvidia Kepler'''
* Nvidia Kepler -> CUDA SDK 10.0 – 10.2 support for compute capability 3.0 – 7.5 ('''Kepler''', Maxwell, Pascal, Volta, Turing). Last version with support for compute capability 3.x (Kepler). 10.2 is the last official release for macOS, as support will not be available for macOS in newer releases.
* Check all possible TensorFlow and Cuda versions here: https://www.tensorflow.org/install/source#gpu
* For me - '''tensorflow-2.3.2''', '''cuda 10.2''', '''nvidia-440.33.01''', '''cuDNN 7.6''', '''Bazel 3.1.0''', '''GCC 7.5.0''', '''TensorRT 6.0'''
** CUDA 10.1 won't install because the 418 driver is not compatible with the newer kernel 5.4.0-70-generic
 
== Installation ==
* https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=deblocal
<pre>
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
</pre>
* https://developer.nvidia.com/rdp/cudnn-archive -> Download cuDNN v7.6.5 (November 18th, 2019), for CUDA 10.2
** https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.2_20191118/Ubuntu18_04-x64/libcudnn7_7.6.5.32-1%2Bcuda10.2_amd64.deb
** https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.2_20191118/Ubuntu18_04-x64/libcudnn7-dev_7.6.5.32-1%2Bcuda10.2_amd64.deb
** https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.2_20191118/Ubuntu18_04-x64/libcudnn7-doc_7.6.5.32-1%2Bcuda10.2_amd64.deb
<pre>
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb
</pre>
* https://developer.nvidia.com/nvidia-tensorrt-6x-download
** Use version 6 only; otherwise configure fails with: Could not find any NvInferVersion.h matching version '6' in any subdirectory
<pre>
sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.2-trt6.0.1.8-ga-20191108_1-1_amd64.deb
# Add the repository key before refreshing the package index
sudo apt-key add /var/nv-tensorrt-repo-cuda10.2-trt6.0.1.8-ga-20191108/7fa2af80.pub
sudo apt-get update
sudo apt-get install tensorrt
</pre>
* ccache does not work with this build. Do not install it, or disable it. Bazel has its own built-in cache
** Otherwise the build fails with: C++ compilation of rule '@com_google_protobuf//:protobuf' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf/descriptor_database.d ... (remaining 48 argument(s) skipped) ccache: error: invalid size: D
* GCC 7
** Use GCC 7; otherwise the build fails with: # 138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
<pre>
sudo apt install gcc-7 g++-7
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 100 --slave /usr/bin/g++ g++ /usr/bin/g++-9 --slave /usr/bin/gcov gcov /usr/bin/gcov-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 90 --slave /usr/bin/g++ g++ /usr/bin/g++-7 --slave /usr/bin/gcov gcov /usr/bin/gcov-7
sudo update-alternatives --config gcc # gcc (Ubuntu 7.5.0-6ubuntu2) 7.5.0
</pre>
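Before starting a long build it can be worth confirming that the active compiler is within the limit quoted in the error above. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and the limit of 8 is taken only from the error message, so treat it as an assumption for other CUDA versions.

```python
import re


def gcc_major_version(version_line: str) -> int:
    """Extract the major version from the first line of `gcc --version`."""
    # The version number follows the parenthesised distribution tag,
    # e.g. "gcc (Ubuntu 7.5.0-6ubuntu2) 7.5.0".
    match = re.search(r"\)\s+(\d+)\.", version_line)
    if match is None:
        raise ValueError(f"cannot parse gcc version from: {version_line!r}")
    return int(match.group(1))


def is_accepted_by_cuda(major: int, max_supported: int = 8) -> bool:
    """Assumption: versions later than `max_supported` are rejected."""
    return major <= max_supported


if __name__ == "__main__":
    # Version line as shown in the update-alternatives comment above.
    line = "gcc (Ubuntu 7.5.0-6ubuntu2) 7.5.0"
    major = gcc_major_version(line)
    print(f"gcc major version {major}, accepted: {is_accepted_by_cuda(major)}")
```

In practice the input line would come from running `gcc --version` after selecting the alternative.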
* https://docs.bazel.build/versions/master/install-ubuntu.html
<pre>
sudo apt install apt-transport-https curl gnupg
curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > bazel.gpg
sudo mv bazel.gpg /etc/apt/trusted.gpg.d/
echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
sudo apt update
sudo apt install bazel-3.2.0
sudo ln -s /usr/bin/bazel-3.2.0 /usr/bin/bazel
bazel --version  # 3.2.0
</pre>
* https://github.com/tensorflow/tensorflow/issues/40688
* https://github.com/tensorflow/tensorflow/pull/40654
<pre>
ERROR: /home/artem/ml/tensorflow/tensorflow/python/BUILD:501:11: C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1)
tensorflow/python/lib/core/bfloat16.cc: In function ‘bool tensorflow::{anonymous}::Initialize()’:
tensorflow/python/lib/core/bfloat16.cc:664:36: error: no match for call to ‘(tensorflow::{anonymous}::Initialize()::<lambda(const char*, PyUFuncGenericFunction, const std::array<int, 3>&)>) (const char [6], <unresolved overloaded function type>, const std::array<int, 3>&)’
                      compare_types)) {
                                    ^
 
# Fix: download the upstream patch and apply it from the tensorflow source root
wget https://github.com/tensorflow/tensorflow/commit/782c2be595a8920019e6259fb12d876d4895a4a5.patch
patch -p1 < 782c2be595a8920019e6259fb12d876d4895a4a5.patch
</pre>
* Install numpy 1.20 to prevent the following error
* https://stackoverflow.com/questions/49756080/opencv-numpy-issue-module-compiled-against-api-version-x-but-this-version-of-n
<pre>
ERROR: /home/artem/ml/tensorflow/tensorflow/python/keras/api/BUILD:137:1: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2 failed (Aborted): bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)
2021-04-02 08:03:25.855138: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
 
pip install numpy==1.20
</pre>
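The two hexadecimal numbers in the error tell which side is older. As a small illustration (the function name `parse_api_mismatch` is hypothetical, and the sketch assumes the message format shown in the log above), they can be extracted and compared like this:

```python
import re


def parse_api_mismatch(message: str):
    """Return (compiled_against, installed) API versions as ints, or None."""
    match = re.search(
        r"compiled against API version (0x[0-9a-fA-F]+) "
        r"but this version of numpy is (0x[0-9a-fA-F]+)",
        message,
    )
    if match is None:
        return None
    # int() accepts the "0x" prefix when the base is 16.
    return int(match.group(1), 16), int(match.group(2), 16)


if __name__ == "__main__":
    msg = (
        "RuntimeError: module compiled against API version 0xe "
        "but this version of numpy is 0xd"
    )
    compiled, installed = parse_api_mismatch(msg)
    if installed < compiled:
        print("Installed NumPy is older than the build expects: upgrade NumPy.")
```

Here the installed NumPy reports the lower number, which is consistent with the fix of installing a newer NumPy.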
* '''Build it'''
* https://www.tensorflow.org/install/source
** You have to build from source because of this error: 2021-03-29 00:19:23.703486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1657] Ignoring visible gpu device (device: 0, name: GeForce GTX 760, pci bus id: 0000:04:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
** XLA does not work with compute capability 3.0: https://www.tensorflow.org/xla
** Be patient: compilation can take a long time, possibly several hours.
** Close other processes to have enough free memory.
** https://github.com/tensorflow/tensorflow/issues/46653
** Adjust the config as follows
<pre>
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r2.3
./configure
# Edit .tf_configure.bazelrc before building:
#   remove this line:
#     build --config=xla
#   add these lines:
#     build --define=with_xla_support=false
#     build --action_env TF_ENABLE_XLA=0
bazel build --config=cuda --config=opt --copt=-DTF_EXTRA_CUDA_CAPABILITIES=3.0 //tensorflow/tools/pip_package:build_pip_package
</pre>
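The manual edit of `.tf_configure.bazelrc` can also be scripted. This is a minimal sketch under the assumption that the file contains the XLA line exactly as shown above; `disable_xla` and `patch_file` are hypothetical helper names, not part of TensorFlow or Bazel.

```python
from pathlib import Path

XLA_LINE = "build --config=xla"
REPLACEMENT_LINES = [
    "build --define=with_xla_support=false",
    "build --action_env TF_ENABLE_XLA=0",
]


def disable_xla(bazelrc_text: str) -> str:
    """Drop the XLA config line and append the two XLA-off lines."""
    lines = [ln for ln in bazelrc_text.splitlines() if ln.strip() != XLA_LINE]
    for extra in REPLACEMENT_LINES:
        # Skip lines that are already present so the edit can be repeated.
        if extra not in lines:
            lines.append(extra)
    return "\n".join(lines) + "\n"


def patch_file(path: Path) -> None:
    """Rewrite the given bazelrc file in place."""
    path.write_text(disable_xla(path.read_text()))
```

It would be called as `patch_file(Path(".tf_configure.bazelrc"))` from the source root after `./configure` has generated the file.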
* '''Wait for success'''
<pre>
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v1:
2021-04-22 03:01:58.505888: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
INFO: From Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2:
2021-04-22 03:01:58.501818: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
INFO: From Executing genrule //tensorflow:tf_python_api_gen_v2:
2021-04-22 03:01:58.601217: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 338.340s, Critical Path: 32.48s
INFO: 203 processes: 203 local.
INFO: Build completed successfully, 252 total actions
</pre>
* '''Install'''
<pre>
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-2.3.2-cp38-cp38-linux_x86_64.whl
</pre>
 
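* To avoid the following error, leave the tensorflow source folder before importing; otherwise Python picks up the source tree instead of the installed package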
<pre>
artem@ThinkPad-X230:~/ml/tensorflow$ python
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/artem/ml/tensorflow/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/artem/ml/tensorflow/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/home/artem/ml/tensorflow/tensorflow/python/eager/context.py", line 32, in <module>
    from tensorflow.core.framework import function_pb2
ImportError: cannot import name 'function_pb2' from 'tensorflow.core.framework' (unknown location)
>>>
</pre>
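The traceback shows the import resolving to `/home/artem/ml/tensorflow/tensorflow/__init__.py`, that is, to the source checkout in the current directory. A small sketch of a guard against this follows; the helper name `shadows_installed_package` is hypothetical.

```python
from pathlib import Path


def shadows_installed_package(directory: Path, package: str = "tensorflow") -> bool:
    """True if `directory` holds a `package/__init__.py` that an interactive
    interpreter started there would import before the installed package."""
    return (directory / package / "__init__.py").is_file()


if __name__ == "__main__":
    if shadows_installed_package(Path.cwd()):
        print("Warning: a local 'tensorflow' folder shadows the installed package.")
        print("Change to another directory before running 'import tensorflow'.")
    else:
        print("No local 'tensorflow' folder in the current directory.")
```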
* '''Test it'''
<pre>
artem@ThinkPad-X230:~/ml$ python
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-04-22 13:15:11.586291: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
>>> tf.__version__
'2.3.2'
>>> tf.test.is_built_with_cuda()
True
</pre>
 
== Testing ==
<pre>
>>> import tensorflow as tf
>>> tf.__version__
'2.3.0'
>>> tf.test.is_built_with_cuda()
True
</pre>
 
<pre>
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
</pre>
 
<pre>
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices("GPU")
2021-03-29 00:19:23.520023: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-29 00:19:23.575281: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 00:19:23.575801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:04:00.0 name: GeForce GTX 760 computeCapability: 3.0
coreClock: 1.15GHz coreCount: 6 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 179.05GiB/s
2021-03-29 00:19:23.576789: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-29 00:19:23.581937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-29 00:19:23.583502: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-29 00:19:23.585776: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-29 00:19:23.591336: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-29 00:19:23.593271: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-29 00:19:23.701034: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-29 00:19:23.701468: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 00:19:23.702637: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-29 00:19:23.703486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1657] Ignoring visible gpu device (device: 0, name: GeForce GTX 760, pci bus id: 0000:04:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
[]
</pre>
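The last log line explains the empty device list: the card reports compute capability 3.0 while the prebuilt binary requires 3.5. As an illustration, the sketch below pulls both numbers out of such a line; the function name `parse_ignored_gpu` is hypothetical, and the regular expression assumes the message wording shown in the log above.

```python
import re

PATTERN = re.compile(
    r"with Cuda compute capability (\d+)\.(\d+)\. "
    r"The minimum required Cuda capability is (\d+)\.(\d+)\."
)


def parse_ignored_gpu(log_line: str):
    """Return ((device_major, device_minor), (min_major, min_minor)) or None."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    device = (int(match.group(1)), int(match.group(2)))
    required = (int(match.group(3)), int(match.group(4)))
    return device, required


if __name__ == "__main__":
    line = (
        "Ignoring visible gpu device (device: 0, name: GeForce GTX 760, "
        "pci bus id: 0000:04:00.0, compute capability: 3.0) with Cuda compute "
        "capability 3.0. The minimum required Cuda capability is 3.5."
    )
    device, required = parse_ignored_gpu(line)
    if device < required:
        print(f"GPU capability {device} is below the required {required}:")
        print("the prebuilt package ignores this GPU.")
```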
* https://medium.com/@mccann.matt/compiling-tensorflow-with-cuda-3-0-support-42d8fe0bf3b5
<pre>
git clone https://github.com/tensorflow/tensorflow.git
cd ./tensorflow
git checkout r2.2
 
sudo apt install apt-transport-https curl gnupg
curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > bazel.gpg
sudo mv bazel.gpg /etc/apt/trusted.gpg.d/
echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
sudo apt update && sudo apt install bazel-2.0.0
</pre>


= Courses =

Revision as of 02:05, 24 March 2022


Software installation

Courses

Big Data

Methods

  • Bayes' theorem
  • Loss functions and regularization
  • Kullback-Leibler divergence and cross-entropy
  • Gradient descent: the basics
  • Computation graph and differentiation on it
  • Perceptron
  • Deep neural networks
  • Classification
  • Clustering
  • Regression
  • Computer vision
  • k-means
  • word2vec

Libraries

Datasets

NLP

Hardware and drivers

Topics

Face Recognition

Speech Recognition

Image Object Recognition

Anomaly Detection

Prediction

StereoVision

Anaconda

Some useful resources