{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# GPT-2 private inference with SPU\n", "\n", "In the lab [Neural Network with SPU](./nn_with_spu.ipynb), we demonstrated how to use SecretFlow/SPU to train a neural network model privately.\n", "\n", "In this lab, we showcase how to run private inference on a pre-trained [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) model for text generation with SPU.\n", "\n", "First, we show how to use JAX and the Hugging Face Transformers library for text generation with the pre-trained GPT-2 model. After that, we show how to use SPU for private text generation with minor modifications to the plaintext counterpart." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">The following code is for demonstration only. It is **NOT for production** due to system security concerns; please **DO NOT** use it directly in production." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> This tutorial may need more resources than 16c48g (16 CPU cores, 48 GB of memory)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Text generation using GPT-2 with JAX/Flax\n", "### Install the transformers library" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "import sys\n", "\n", "!{sys.executable} -m pip install transformers[flax]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">The JAX version required by transformers conflicts with the one required by SPU, but it is OK to run this example with the conflicting JAX version." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the pre-trained GPT-2 Model\n", "\n", "Please refer to this [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/gpt2) for more details." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n" ] } ], "source": [ "from transformers import AutoTokenizer, FlaxGPT2LMHeadModel, GPT2Config\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\n", "pretrained_model = FlaxGPT2LMHeadModel.from_pretrained(\"gpt2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the text generation function\n", "\n", "We use a [greedy search strategy](https://huggingface.co/blog/how-to-generate) for text generation here." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import jax.numpy as jnp\n", "\n", "\n", "def text_generation(input_ids, params):\n", "    config = GPT2Config()\n", "    model = FlaxGPT2LMHeadModel(config=config)\n", "\n", "    # Greedy search: generate 10 tokens, one per iteration.\n", "    for _ in range(10):\n", "        outputs = model(input_ids=input_ids, params=params)\n", "        # Logits of the last position predict the next token.\n", "        next_token_logits = outputs[0][0, -1, :]\n", "        next_token = jnp.argmax(next_token_logits)\n", "        input_ids = jnp.concatenate([input_ids, jnp.array([[next_token]])], axis=1)\n", "    return input_ids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run text generation on CPU" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2023-06-15 17:07:55.627043: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "2023-06-15 17:07:55.627112: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic 
library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "2023-06-15 17:07:55.627118: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "-----------------------------------------------------------------\n", "Run on CPU:\n", "-----------------------------------------------------------------\n", "I enjoy walking with my cute dog, but I'm not sure if I'll ever\n", "-----------------------------------------------------------------\n" ] } ], "source": [ "import jax.numpy as jnp\n", "\n", "inputs_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='jax')\n", "outputs_ids = text_generation(inputs_ids, pretrained_model.params)\n", "\n", "print('-' * 65 + '\\nRun on CPU:\\n' + '-' * 65)\n", "print(tokenizer.decode(outputs_ids[0], skip_special_tokens=True))\n", "print('-' * 65)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we generate 10 tokens. Keep the generated text in mind; we will generate text on SPU in the next step." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Run text generation on SPU" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2023-06-15 17:08:14,157\tINFO worker.py:1538 -- Started a local Ray instance.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(pid=2109508)\u001b[0m Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n", "\u001b[2m\u001b[36m(pid=2109408)\u001b[0m Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n", "\u001b[2m\u001b[36m(pid=2121303)\u001b[0m Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n", "\u001b[2m\u001b[36m(pid=2121304)\u001b[0m Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n", "\u001b[2m\u001b[36m(pid=2121301)\u001b[0m Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n", "\u001b[2m\u001b[36m(_run pid=2109408)\u001b[0m INFO:absl:Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: \n", "\u001b[2m\u001b[36m(_run pid=2109408)\u001b[0m INFO:absl:Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: \"cuda\". 
Available platform names are: Interpreter Host\n", "\u001b[2m\u001b[36m(_run pid=2109408)\u001b[0m INFO:absl:Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.\n", "\u001b[2m\u001b[36m(_run pid=2109408)\u001b[0m WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n", "\u001b[2m\u001b[36m(_run pid=2109508)\u001b[0m INFO:absl:Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: \n", "\u001b[2m\u001b[36m(_run pid=2109508)\u001b[0m INFO:absl:Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: \"cuda\". Available platform names are: Interpreter Host\n", "\u001b[2m\u001b[36m(_run pid=2109508)\u001b[0m INFO:absl:Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.\n", "\u001b[2m\u001b[36m(_run pid=2109508)\u001b[0m WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(_run pid=2109408)\u001b[0m [2023-06-15 17:08:24.221] [info] [thread_pool.cc:30] Create a fixed thread pool with size 127\n" ] } ], "source": [ "import secretflow as sf\n", "\n", "# In case you have a running secretflow runtime already.\n", "sf.shutdown()\n", "\n", "sf.init(['alice', 'bob', 'carol'], address='local')\n", "\n", "alice, bob = sf.PYU('alice'), sf.PYU('bob')\n", "conf = sf.utils.testing.cluster_def(['alice', 'bob', 'carol'])\n", "conf['runtime_config']['fxp_exp_mode'] = 1\n", "conf['runtime_config']['experimental_disable_mmul_split'] = True\n", "spu = sf.SPU(conf)\n", "\n", "\n", "def get_model_params():\n", " pretrained_model = FlaxGPT2LMHeadModel.from_pretrained(\"gpt2\")\n", " return pretrained_model.params\n", "\n", "\n", "def get_token_ids():\n", " tokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\n", " return tokenizer.encode('I enjoy walking with my 
cute dog', return_tensors='jax')\n", "\n", "\n", "model_params = alice(get_model_params)()\n", "input_token_ids = bob(get_token_ids)()\n", "\n", "device = spu\n", "model_params_, input_token_ids_ = model_params.to(device), input_token_ids.to(device)\n", "\n", "output_token_ids = spu(text_generation)(input_token_ids_, model_params_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check the SPU output\n", "\n", "As you can see, it's very easy to run GPT-2 inference on SPU. Now let's reveal the generated text from the SPU program." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(_spu_compile pid=2109408)\u001b[0m 2023-06-15 17:09:12.722333: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(_spu_compile pid=2109408)\u001b[0m 2023-06-15 17:09:12.722414: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/rh/devtoolset-11/root/usr/lib64:/opt/rh/devtoolset-11/root/usr/lib:/opt/rh/devtoolset-11/root/usr/lib64/dyninst:/opt/rh/devtoolset-11/root/usr/lib/dyninst\n", "\u001b[2m\u001b[36m(_spu_compile pid=2109408)\u001b[0m 2023-06-15 17:09:12.722421: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(SPURuntime(device_id=None, party=bob) pid=2121303)\u001b[0m 2023-06-15 17:09:32.011 [info] [thread_pool.cc:ThreadPool:30] Create a fixed thread pool with size 127\n", "\u001b[2m\u001b[36m(SPURuntime(device_id=None, party=alice) pid=2121301)\u001b[0m 2023-06-15 17:09:32.011 [info] [thread_pool.cc:ThreadPool:30] Create a fixed thread pool with size 127\n", "\u001b[2m\u001b[36m(SPURuntime(device_id=None, party=carol) pid=2121304)\u001b[0m 2023-06-15 17:09:32.011 [info] [thread_pool.cc:ThreadPool:30] Create a fixed thread pool with size 127\n", "-----------------------------------------------------------------\n", "Run on SPU:\n", "-----------------------------------------------------------------\n", "I enjoy walking with my cute dog, but I'm not sure if I'll ever\n", "-----------------------------------------------------------------\n" ] } ], "source": [ "outputs_ids = sf.reveal(output_token_ids)\n", "print('-' * 65 + '\\nRun on SPU:\\n' + '-' * 65)\n", "print(tokenizer.decode(outputs_ids[0], skip_special_tokens=True))\n", "print('-' * 65)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, the text generated on SPU is exactly the same as the text generated on CPU!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the end of the lab." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15" }, "vscode": { "interpreter": { "hash": "db45a4cb4cd37a8de684dfb7fcf899b68fccb8bd32d97c5ad13e5de1245c0986" } } }, "nbformat": 4, "nbformat_minor": 2 }