SplitRec: Using Pipeline Parallelism in SecretFlow Split Learning

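The following code is for demonstration only. Do not use it directly in production.

This example builds on the "Split Learning: Bank Marketing" tutorial; we recommend reading that tutorial first.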

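In split learning the model is split across multiple devices, so during training the parties must repeatedly exchange intermediate results and gradients, and both computation and network communication spend a large amount of time idle. Following the paper "PipeLearn: Pipeline Parallelism for Collaborative Machine Learning", SecretFlow uses the task-scheduling capability of its underlying RayFed framework to implement pipeline parallelism, so that computation and communication overlap, part of the computation time is hidden, and resource utilization improves. Because computation and communication run concurrently, pipeline parallelism may cost some model accuracy; users can trade off performance against accuracy for their own scenario.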
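To make the overlap concrete, consider an idealized timing model. This is an illustrative assumption, not something taken from the paper or from SecretFlow's implementation: each of $n$ batches needs $c$ seconds of computation followed by $m$ seconds of communication, and the compute and network resources can work on different batches at the same time.

```latex
T_{\text{seq}} = n\,(c + m), \qquad
T_{\text{pipe}} = c + m + (n - 1)\,\max(c, m)
```

For instance, with $c = 1$, $m = 3$ and $n = 29$, this gives $T_{\text{seq}} = 116$ and $T_{\text{pipe}} = 88$. The shorter stage is hidden behind the longer one, and the longer stage bounds the achievable speedup.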

The example below shows how to use pipeline parallelism in SecretFlow split learning.

Environment Setup

First, we create two parties, alice and bob, in the secretflow environment.

[1]:
import secretflow as sf

sf.shutdown()
sf.init(['alice', 'bob'], address='local')
alice, bob = sf.PYU('alice'), sf.PYU('bob')
2023-09-26 19:49:23,600 INFO worker.py:1538 -- Started a local Ray instance.

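Prepare the Data

Next, we prepare the data to train on.

We use the same data preparation and processing as in "Split Learning: Bank Marketing": download the bank marketing dataset and process it. alice and bob play exactly the same roles as in that tutorial: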

[2]:
from secretflow.utils.simulation.datasets import load_bank_marketing
from secretflow.preprocessing.scaler import MinMaxScaler
from secretflow.preprocessing.encoder import LabelEncoder
from secretflow.data.split import train_test_split

random_state = 1234

data = load_bank_marketing(parts={alice: (0, 4), bob: (4, 16)}, axis=1)
label = load_bank_marketing(parts={alice: (16, 17)}, axis=1)

encoder = LabelEncoder()
data['job'] = encoder.fit_transform(data['job'])
data['marital'] = encoder.fit_transform(data['marital'])
data['education'] = encoder.fit_transform(data['education'])
data['default'] = encoder.fit_transform(data['default'])
data['housing'] = encoder.fit_transform(data['housing'])
data['loan'] = encoder.fit_transform(data['loan'])
data['contact'] = encoder.fit_transform(data['contact'])
data['poutcome'] = encoder.fit_transform(data['poutcome'])
data['month'] = encoder.fit_transform(data['month'])
label = encoder.fit_transform(label)

scaler = MinMaxScaler()
data = scaler.fit_transform(data)

train_data, test_data = train_test_split(
    data, train_size=0.8, random_state=random_state
)
train_label, test_label = train_test_split(
    label, train_size=0.8, random_state=random_state
)

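Define the Model Structure

Next we create the federated model. As before, we follow the modeling in "Split Learning: Bank Marketing" to build base_model and fuse_model, after which an SLModel can be defined for training: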

[3]:
def create_base_model(input_dim, output_dim, name='base_model'):
    # Create model
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        model = keras.Sequential(
            [
                keras.Input(shape=input_dim),
                layers.Dense(100, activation="relu"),
                layers.Dense(output_dim, activation="relu"),
            ]
        )
        # Compile model
        model.summary()
        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


# prepare model
hidden_size = 64

model_base_alice = create_base_model(4, hidden_size)
model_base_bob = create_base_model(12, hidden_size)


def create_fuse_model(input_dim, output_dim, party_nums, name='fuse_model'):
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        # input
        input_layers = []
        for i in range(party_nums):
            input_layers.append(
                keras.Input(
                    input_dim,
                )
            )

        merged_layer = layers.concatenate(input_layers)
        fuse_layer = layers.Dense(64, activation='relu')(merged_layer)
        output = layers.Dense(output_dim, activation='sigmoid')(fuse_layer)

        model = keras.Model(inputs=input_layers, outputs=output)
        model.summary()

        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


model_fuse = create_fuse_model(input_dim=hidden_size, party_nums=2, output_dim=1)

base_model_dict = {alice: model_base_alice, bob: model_base_bob}
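As a quick sanity check on the architecture above, the parameter counts can be worked out by hand and compared with the Keras model summaries printed during training. The sketch below is plain Python; the helper names `dense_params` and `mlp_params` are hypothetical and not part of SecretFlow or Keras.

```python
def dense_params(in_features: int, out_features: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return in_features * out_features + out_features


def mlp_params(layer_sizes):
    """Sum Dense-layer parameters over consecutive sizes, e.g. [4, 100, 64]."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


hidden_size = 64
alice_base = mlp_params([4, 100, hidden_size])   # alice holds 4 input features
bob_base = mlp_params([12, 100, hidden_size])    # bob holds 12 input features
fuse = mlp_params([2 * hidden_size, 64, 1])      # two concatenated embeddings -> 64 -> 1

print(alice_base, bob_base, fuse)
```

These totals (6,964, 7,764 and 8,321) should match the "Total params" lines in the summaries for the two base models and the fuse model.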

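Define the SLModel

To use pipeline parallelism, set strategy='pipeline' and set the pipeline_size parameter. A larger pipeline_size increases the degree of concurrency, but past a certain threshold, once one party's compute or network is saturated, performance stops improving. pipeline_size is typically set to 2-4.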
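The saturation effect can be illustrated with a toy two-stage timing simulation. This is a sketch under simplifying assumptions, not SecretFlow's actual scheduler: every batch is computed and then communicated, each resource handles one batch at a time, and at most `pipeline_size` batches are in flight. The function `simulate_total_time` and the timings are hypothetical.

```python
def simulate_total_time(n_batches, compute, comm, pipeline_size):
    """Toy two-stage model: each batch is computed, then communicated.

    At most `pipeline_size` batches may be in flight at once; the compute
    stage and the network each process one batch at a time.
    """
    compute_free = 0.0  # time at which the compute stage becomes idle
    comm_free = 0.0     # time at which the network becomes idle
    comm_end = []       # completion time of each batch
    for i in range(n_batches):
        # Wait for a free pipeline slot: batch i - pipeline_size must be done.
        slot_free = comm_end[i - pipeline_size] if i >= pipeline_size else 0.0
        start = max(compute_free, slot_free)
        compute_free = start + compute
        comm_start = max(compute_free, comm_free)
        comm_free = comm_start + comm
        comm_end.append(comm_free)
    return comm_end[-1]


for k in (1, 2, 3, 4):
    total = simulate_total_time(n_batches=29, compute=1.0, comm=3.0, pipeline_size=k)
    print(f"pipeline_size={k}: total time {total:.0f}")
```

With these assumed timings the network is the bottleneck: the total drops from 116 to 88 time units when pipeline_size goes from 1 to 2 and stays at 88 for larger values. Real workloads have more stages and variable latency, so the best value should be found by measurement.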

[4]:
from secretflow.ml.nn import SLModel

sl_model_origin = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
)

sl_model_pipeline = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
    strategy='pipeline',
    pipeline_size=2,
)
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.strategy.pipeline.PYUPipelineTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.strategy.pipeline.PYUPipelineTFModel'> with party bob.

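Start Training

We train the model without pipeline parallelism and the model with pipeline parallelism in turn, raising the number of training epochs to 40, and compare the results.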

[5]:
import time

histories = []
cost_time = []
for sl_model in [sl_model_origin, sl_model_pipeline]:
    begin = time.time()
    history = sl_model.fit(
        train_data,
        train_label,
        validation_data=(test_data, test_label),
        epochs=40,
        batch_size=128,
        shuffle=True,
        verbose=1,
        validation_freq=1,
    )
    end = time.time()
    cost_time.append((end - begin) / 60)
    histories.append(history)

print(cost_time)
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0aa04c0>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bcd0>}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0ac8430>}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bc10>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874b400>}, aligned=True), VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874be20>}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 5731, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f1368738df0>}
(PYUSLTFModel pid=1825002) Model: "sequential"
(PYUSLTFModel pid=1825002) _________________________________________________________________
(PYUSLTFModel pid=1825002)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=1825002) =================================================================
(PYUSLTFModel pid=1825002)  dense (Dense)               (None, 100)               500
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002) =================================================================
(PYUSLTFModel pid=1825002) Total params: 6,964
(PYUSLTFModel pid=1825002) Trainable params: 6,964
(PYUSLTFModel pid=1825002) Non-trainable params: 0
(PYUSLTFModel pid=1825002) _________________________________________________________________
(PYUSLTFModel pid=1825002) Model: "model"
(PYUSLTFModel pid=1825002) __________________________________________________________________________________________________
(PYUSLTFModel pid=1825002)  Layer (type)                   Output Shape         Param #     Connected to
(PYUSLTFModel pid=1825002) ==================================================================================================
(PYUSLTFModel pid=1825002)  input_2 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  input_3 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  concatenate (Concatenate)      (None, 128)          0           ['input_2[0][0]',
(PYUSLTFModel pid=1825002)                                                                   'input_3[0][0]']
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  dense_2 (Dense)                (None, 64)           8256        ['concatenate[0][0]']
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  dense_3 (Dense)                (None, 1)            65          ['dense_2[0][0]']
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002) ==================================================================================================
(PYUSLTFModel pid=1825002) Total params: 8,321
(PYUSLTFModel pid=1825002) Trainable params: 8,321
(PYUSLTFModel pid=1825002) Non-trainable params: 0
(PYUSLTFModel pid=1825002) __________________________________________________________________________________________________
(PYUSLTFModel pid=1825080) Model: "sequential"
(PYUSLTFModel pid=1825080) _________________________________________________________________
(PYUSLTFModel pid=1825080)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=1825080) =================================================================
(PYUSLTFModel pid=1825080)  dense (Dense)               (None, 100)               1300
(PYUSLTFModel pid=1825080)
(PYUSLTFModel pid=1825080)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=1825080)
(PYUSLTFModel pid=1825080) =================================================================
(PYUSLTFModel pid=1825080) Total params: 7,764
(PYUSLTFModel pid=1825080) Trainable params: 7,764
(PYUSLTFModel pid=1825080) Non-trainable params: 0
(PYUSLTFModel pid=1825080) _________________________________________________________________
(PYUPipelineTFModel pid=1825145) Model: "sequential"
(PYUPipelineTFModel pid=1825145) _________________________________________________________________
(PYUPipelineTFModel pid=1825145)  Layer (type)                Output Shape              Param #
(PYUPipelineTFModel pid=1825145) =================================================================
(PYUPipelineTFModel pid=1825145)  dense (Dense)               (None, 100)               500
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  dense_1 (Dense)             (None, 64)                6464
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145) =================================================================
(PYUPipelineTFModel pid=1825145) Total params: 6,964
(PYUPipelineTFModel pid=1825145) Trainable params: 6,964
(PYUPipelineTFModel pid=1825145) Non-trainable params: 0
(PYUPipelineTFModel pid=1825145) _________________________________________________________________
(PYUPipelineTFModel pid=1825145) Model: "model"
(PYUPipelineTFModel pid=1825145) __________________________________________________________________________________________________
(PYUPipelineTFModel pid=1825145)  Layer (type)                   Output Shape         Param #     Connected to
(PYUPipelineTFModel pid=1825145) ==================================================================================================
(PYUPipelineTFModel pid=1825145)  input_2 (InputLayer)           [(None, 64)]         0           []
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  input_3 (InputLayer)           [(None, 64)]         0           []
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  concatenate (Concatenate)      (None, 128)          0           ['input_2[0][0]',
(PYUPipelineTFModel pid=1825145)                                                                   'input_3[0][0]']
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  dense_2 (Dense)                (None, 64)           8256        ['concatenate[0][0]']
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  dense_3 (Dense)                (None, 1)            65          ['dense_2[0][0]']
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145) ==================================================================================================
(PYUPipelineTFModel pid=1825145) Total params: 8,321
(PYUPipelineTFModel pid=1825145) Trainable params: 8,321
(PYUPipelineTFModel pid=1825145) Non-trainable params: 0
(PYUPipelineTFModel pid=1825145) __________________________________________________________________________________________________
(PYUPipelineTFModel pid=1825269) Model: "sequential"
(PYUPipelineTFModel pid=1825269) _________________________________________________________________
(PYUPipelineTFModel pid=1825269)  Layer (type)                Output Shape              Param #
(PYUPipelineTFModel pid=1825269) =================================================================
(PYUPipelineTFModel pid=1825269)  dense (Dense)               (None, 100)               1300
(PYUPipelineTFModel pid=1825269)
(PYUPipelineTFModel pid=1825269)  dense_1 (Dense)             (None, 64)                6464
(PYUPipelineTFModel pid=1825269)
(PYUPipelineTFModel pid=1825269) =================================================================
(PYUPipelineTFModel pid=1825269) Total params: 7,764
(PYUPipelineTFModel pid=1825269) Trainable params: 7,764
(PYUPipelineTFModel pid=1825269) Non-trainable params: 0
(PYUPipelineTFModel pid=1825269) _________________________________________________________________
100%|██████████| 29/29 [00:08<00:00,  3.54it/s, epoch: 1/40 -  train_loss:0.44887787103652954  train_accuracy:0.850215494632721  train_auc_1:0.5317299365997314  val_loss:0.39494723081588745  val_accuracy:0.8729282021522522  val_auc_1:0.5657897591590881 ]
100%|██████████| 29/29 [00:01<00:00, 16.09it/s, epoch: 2/40 -  train_loss:0.3432118892669678  train_accuracy:0.8857954740524292  train_auc_1:0.6473174095153809  val_loss:0.363627165555954  val_accuracy:0.8729282021522522  val_auc_1:0.6689268350601196 ]
100%|██████████| 29/29 [00:01<00:00, 26.94it/s, epoch: 3/40 -  train_loss:0.32648009061813354  train_accuracy:0.8863146305084229  train_auc_1:0.7191672921180725  val_loss:0.35098856687545776  val_accuracy:0.8729282021522522  val_auc_1:0.7191359400749207 ]
100%|██████████| 29/29 [00:01<00:00, 26.13it/s, epoch: 4/40 -  train_loss:0.31278952956199646  train_accuracy:0.8833512663841248  train_auc_1:0.7800465226173401  val_loss:0.34081292152404785  val_accuracy:0.8729282021522522  val_auc_1:0.7596642971038818 ]
100%|██████████| 29/29 [00:04<00:00,  6.83it/s, epoch: 5/40 -  train_loss:0.29879459738731384  train_accuracy:0.8844026327133179  train_auc_1:0.8006787300109863  val_loss:0.32317325472831726  val_accuracy:0.8729282021522522  val_auc_1:0.801051139831543 ]
100%|██████████| 29/29 [00:01<00:00, 26.10it/s, epoch: 6/40 -  train_loss:0.29146328568458557  train_accuracy:0.875  train_auc_1:0.8463853597640991  val_loss:0.3064562678337097  val_accuracy:0.8696132302284241  val_auc_1:0.8238029479980469 ]
100%|██████████| 29/29 [00:01<00:00, 26.24it/s, epoch: 7/40 -  train_loss:0.24985690414905548  train_accuracy:0.8960176706314087  train_auc_1:0.8685527443885803  val_loss:0.30176323652267456  val_accuracy:0.8718231916427612  val_auc_1:0.839279055595398 ]
100%|██████████| 29/29 [00:01<00:00, 26.07it/s, epoch: 8/40 -  train_loss:0.25740572810173035  train_accuracy:0.8872159123420715  train_auc_1:0.8768747448921204  val_loss:0.2859904170036316  val_accuracy:0.8806629776954651  val_auc_1:0.8433902263641357 ]
100%|██████████| 29/29 [00:01<00:00, 27.53it/s, epoch: 9/40 -  train_loss:0.24906550347805023  train_accuracy:0.8995150923728943  train_auc_1:0.8653609156608582  val_loss:0.2812442481517792  val_accuracy:0.8828729391098022  val_auc_1:0.8512052893638611 ]
100%|██████████| 29/29 [00:01<00:00, 27.23it/s, epoch: 10/40 -  train_loss:0.2445402294397354  train_accuracy:0.892699122428894  train_auc_1:0.8836410641670227  val_loss:0.2827773094177246  val_accuracy:0.8773480653762817  val_auc_1:0.8469785451889038 ]
100%|██████████| 29/29 [00:01<00:00, 26.02it/s, epoch: 11/40 -  train_loss:0.2518855035305023  train_accuracy:0.8954645991325378  train_auc_1:0.8761004209518433  val_loss:0.295387327671051  val_accuracy:0.8784530162811279  val_auc_1:0.8485745787620544 ]
100%|██████████| 29/29 [00:01<00:00, 27.55it/s, epoch: 12/40 -  train_loss:0.22354044020175934  train_accuracy:0.9102909564971924  train_auc_1:0.8964704275131226  val_loss:0.30353987216949463  val_accuracy:0.8795580267906189  val_auc_1:0.8534507155418396 ]
100%|██████████| 29/29 [00:01<00:00, 28.33it/s, epoch: 13/40 -  train_loss:0.22443315386772156  train_accuracy:0.9079092741012573  train_auc_1:0.8922196626663208  val_loss:0.2777591645717621  val_accuracy:0.8795580267906189  val_auc_1:0.8531700372695923 ]
100%|██████████| 29/29 [00:01<00:00, 25.85it/s, epoch: 14/40 -  train_loss:0.21603752672672272  train_accuracy:0.9125000238418579  train_auc_1:0.9022694230079651  val_loss:0.2857709228992462  val_accuracy:0.8817679286003113  val_auc_1:0.8522068858146667 ]
100%|██████████| 29/29 [00:01<00:00, 26.38it/s, epoch: 15/40 -  train_loss:0.2281351238489151  train_accuracy:0.9126105904579163  train_auc_1:0.8849776983261108  val_loss:0.27802714705467224  val_accuracy:0.8773480653762817  val_auc_1:0.853461742401123 ]
100%|██████████| 29/29 [00:01<00:00, 27.39it/s, epoch: 16/40 -  train_loss:0.2165425419807434  train_accuracy:0.9150568246841431  train_auc_1:0.8966040015220642  val_loss:0.2783280313014984  val_accuracy:0.8828729391098022  val_auc_1:0.8554485440254211 ]
100%|██████████| 29/29 [00:01<00:00, 28.48it/s, epoch: 17/40 -  train_loss:0.21801750361919403  train_accuracy:0.9081858396530151  train_auc_1:0.8976572155952454  val_loss:0.2761484980583191  val_accuracy:0.8784530162811279  val_auc_1:0.8577655553817749 ]
100%|██████████| 29/29 [00:01<00:00, 28.16it/s, epoch: 18/40 -  train_loss:0.2221526950597763  train_accuracy:0.907866358757019  train_auc_1:0.9024293422698975  val_loss:0.2797839343547821  val_accuracy:0.8828729391098022  val_auc_1:0.8589048385620117 ]
100%|██████████| 29/29 [00:01<00:00, 27.62it/s, epoch: 19/40 -  train_loss:0.2285359650850296  train_accuracy:0.9059734344482422  train_auc_1:0.8901480436325073  val_loss:0.27858439087867737  val_accuracy:0.8828729391098022  val_auc_1:0.860638439655304 ]
100%|██████████| 29/29 [00:01<00:00, 27.14it/s, epoch: 20/40 -  train_loss:0.2182772159576416  train_accuracy:0.915099561214447  train_auc_1:0.9078006744384766  val_loss:0.291841983795166  val_accuracy:0.8751381039619446  val_auc_1:0.8562520742416382 ]
100%|██████████| 29/29 [00:01<00:00, 27.61it/s, epoch: 21/40 -  train_loss:0.20946133136749268  train_accuracy:0.9164772629737854  train_auc_1:0.9116370677947998  val_loss:0.2933138310909271  val_accuracy:0.8828729391098022  val_auc_1:0.8598623275756836 ]
100%|██████████| 29/29 [00:01<00:00, 26.59it/s, epoch: 22/40 -  train_loss:0.23493120074272156  train_accuracy:0.9054203629493713  train_auc_1:0.8970139026641846  val_loss:0.27499568462371826  val_accuracy:0.8773480653762817  val_auc_1:0.8625756502151489 ]
100%|██████████| 29/29 [00:00<00:00, 29.05it/s, epoch: 23/40 -  train_loss:0.21671472489833832  train_accuracy:0.9101216793060303  train_auc_1:0.9046225547790527  val_loss:0.2828799784183502  val_accuracy:0.8850829005241394  val_auc_1:0.8609355688095093 ]
100%|██████████| 29/29 [00:01<00:00, 28.19it/s, epoch: 24/40 -  train_loss:0.22586138546466827  train_accuracy:0.9110991358757019  train_auc_1:0.90799880027771  val_loss:0.28323644399642944  val_accuracy:0.8850829005241394  val_auc_1:0.8611777424812317 ]
100%|██████████| 29/29 [00:01<00:00, 28.10it/s, epoch: 25/40 -  train_loss:0.21121767163276672  train_accuracy:0.9133522510528564  train_auc_1:0.9088505506515503  val_loss:0.27677592635154724  val_accuracy:0.8861878514289856  val_auc_1:0.860908031463623 ]
100%|██████████| 29/29 [00:01<00:00, 27.28it/s, epoch: 26/40 -  train_loss:0.20616813004016876  train_accuracy:0.9184659123420715  train_auc_1:0.908971905708313  val_loss:0.2886063754558563  val_accuracy:0.8828729391098022  val_auc_1:0.8584039807319641 ]
100%|██████████| 29/29 [00:01<00:00, 25.97it/s, epoch: 27/40 -  train_loss:0.24042247235774994  train_accuracy:0.8976293206214905  train_auc_1:0.8970732688903809  val_loss:0.2905130684375763  val_accuracy:0.8817679286003113  val_auc_1:0.8628398180007935 ]
100%|██████████| 29/29 [00:01<00:00, 28.27it/s, epoch: 28/40 -  train_loss:0.2106049805879593  train_accuracy:0.9131637215614319  train_auc_1:0.9120412468910217  val_loss:0.2760363817214966  val_accuracy:0.8883978128433228  val_auc_1:0.8631425499916077 ]
100%|██████████| 29/29 [00:01<00:00, 27.53it/s, epoch: 29/40 -  train_loss:0.19771815836429596  train_accuracy:0.9181034564971924  train_auc_1:0.9175819158554077  val_loss:0.2815971374511719  val_accuracy:0.8872928023338318  val_auc_1:0.8594056367874146 ]
100%|██████████| 29/29 [00:01<00:00, 27.45it/s, epoch: 30/40 -  train_loss:0.21977882087230682  train_accuracy:0.9065265655517578  train_auc_1:0.908697247505188  val_loss:0.2787911891937256  val_accuracy:0.8806629776954651  val_auc_1:0.8623830080032349 ]
100%|██████████| 29/29 [00:01<00:00, 26.70it/s, epoch: 31/40 -  train_loss:0.2060454785823822  train_accuracy:0.9121767282485962  train_auc_1:0.909172534942627  val_loss:0.29584282636642456  val_accuracy:0.8806629776954651  val_auc_1:0.8634011149406433 ]
100%|██████████| 29/29 [00:01<00:00, 27.61it/s, epoch: 32/40 -  train_loss:0.20517688989639282  train_accuracy:0.9191810488700867  train_auc_1:0.907102108001709  val_loss:0.28416600823402405  val_accuracy:0.8839778900146484  val_auc_1:0.8632196187973022 ]
100%|██████████| 29/29 [00:01<00:00, 27.90it/s, epoch: 33/40 -  train_loss:0.21313920617103577  train_accuracy:0.9112278819084167  train_auc_1:0.9166741371154785  val_loss:0.2824288308620453  val_accuracy:0.8817679286003113  val_auc_1:0.8626307249069214 ]
100%|██████████| 29/29 [00:01<00:00, 27.56it/s, epoch: 34/40 -  train_loss:0.20695945620536804  train_accuracy:0.9164823293685913  train_auc_1:0.9163408279418945  val_loss:0.2820662260055542  val_accuracy:0.8861878514289856  val_auc_1:0.8618767261505127 ]
100%|██████████| 29/29 [00:01<00:00, 27.02it/s, epoch: 35/40 -  train_loss:0.2142862230539322  train_accuracy:0.9129849076271057  train_auc_1:0.9152473211288452  val_loss:0.2856462597846985  val_accuracy:0.8806629776954651  val_auc_1:0.8672426342964172 ]
100%|██████████| 29/29 [00:01<00:00, 27.22it/s, epoch: 36/40 -  train_loss:0.18980328738689423  train_accuracy:0.92578125  train_auc_1:0.9262045621871948  val_loss:0.2957724332809448  val_accuracy:0.8850829005241394  val_auc_1:0.855277955532074 ]
100%|██████████| 29/29 [00:01<00:00, 27.44it/s, epoch: 37/40 -  train_loss:0.2022796869277954  train_accuracy:0.9156526327133179  train_auc_1:0.9175050854682922  val_loss:0.28874075412750244  val_accuracy:0.8828729391098022  val_auc_1:0.8578150272369385 ]
100%|██████████| 29/29 [00:01<00:00, 26.17it/s, epoch: 38/40 -  train_loss:0.20826201140880585  train_accuracy:0.917588472366333  train_auc_1:0.9161174893379211  val_loss:0.28712138533592224  val_accuracy:0.889502763748169  val_auc_1:0.8563125729560852 ]
100%|██████████| 29/29 [00:01<00:00, 28.39it/s, epoch: 39/40 -  train_loss:0.20791961252689362  train_accuracy:0.9195243120193481  train_auc_1:0.9068150520324707  val_loss:0.28275591135025024  val_accuracy:0.8828729391098022  val_auc_1:0.8621078133583069 ]
100%|██████████| 29/29 [00:01<00:00, 26.34it/s, epoch: 40/40 -  train_loss:0.2018997073173523  train_accuracy:0.9161931872367859  train_auc_1:0.9205420613288879  val_loss:0.2885696589946747  val_accuracy:0.8839778900146484  val_auc_1:0.8595322370529175 ]
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0aa04c0>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bcd0>}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0ac8430>}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bc10>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874b400>}, aligned=True), VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874be20>}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 57815, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f1368738e20>}
100%|██████████| 29/29 [00:03<00:00,  9.21it/s, epoch: 1/40 -  train_loss:0.4289693236351013  train_accuracy:0.841261088848114  train_auc_1:0.5522745847702026  val_loss:0.4278267025947571  val_accuracy:0.8729282021522522  val_auc_1:0.596114456653595 ]
100%|██████████| 29/29 [00:01<00:00, 27.58it/s, epoch: 2/40 -  train_loss:0.3457517921924591  train_accuracy:0.8894886374473572  train_auc_1:0.5915822982788086  val_loss:0.36710259318351746  val_accuracy:0.8729282021522522  val_auc_1:0.6552668809890747 ]
100%|██████████| 29/29 [00:01<00:00, 27.34it/s, epoch: 3/40 -  train_loss:0.33160045742988586  train_accuracy:0.8857758641242981  train_auc_1:0.6996316909790039  val_loss:0.3565421402454376  val_accuracy:0.8729282021522522  val_auc_1:0.6939350962638855 ]
100%|██████████| 29/29 [00:01<00:00, 26.56it/s, epoch: 4/40 -  train_loss:0.3203859031200409  train_accuracy:0.8879978060722351  train_auc_1:0.7228385210037231  val_loss:0.34700706601142883  val_accuracy:0.8729282021522522  val_auc_1:0.7337919473648071 ]
100%|██████████| 29/29 [00:01<00:00, 27.51it/s, epoch: 5/40 -  train_loss:0.29412248730659485  train_accuracy:0.8941271305084229  train_auc_1:0.7773510217666626  val_loss:0.3330070972442627  val_accuracy:0.8729282021522522  val_auc_1:0.7747275233268738 ]
100%|██████████| 29/29 [00:01<00:00, 26.64it/s, epoch: 6/40 -  train_loss:0.284542053937912  train_accuracy:0.8840909004211426  train_auc_1:0.8406177759170532  val_loss:0.3239462077617645  val_accuracy:0.8729282021522522  val_auc_1:0.8016015291213989 ]
100%|██████████| 29/29 [00:01<00:00, 28.29it/s, epoch: 7/40 -  train_loss:0.2678506672382355  train_accuracy:0.8896570801734924  train_auc_1:0.8536190986633301  val_loss:0.2891094386577606  val_accuracy:0.8773480653762817  val_auc_1:0.8471270799636841 ]
100%|██████████| 29/29 [00:01<00:00, 27.18it/s, epoch: 8/40 -  train_loss:0.24808000028133392  train_accuracy:0.9051136374473572  train_auc_1:0.8591896891593933  val_loss:0.29273611307144165  val_accuracy:0.8718231916427612  val_auc_1:0.8348816633224487 ]
100%|██████████| 29/29 [00:01<00:00, 26.80it/s, epoch: 9/40 -  train_loss:0.252238392829895  train_accuracy:0.8917025923728943  train_auc_1:0.8791320323944092  val_loss:0.2933768332004547  val_accuracy:0.8773480653762817  val_auc_1:0.8507869243621826 ]
100%|██████████| 29/29 [00:01<00:00, 27.14it/s, epoch: 10/40 -  train_loss:0.24061305820941925  train_accuracy:0.8982300758361816  train_auc_1:0.8796927332878113  val_loss:0.28491833806037903  val_accuracy:0.8784530162811279  val_auc_1:0.843962550163269 ]
100%|██████████| 29/29 [00:01<00:00, 27.45it/s, epoch: 11/40 -  train_loss:0.24184739589691162  train_accuracy:0.9035560488700867  train_auc_1:0.8807085752487183  val_loss:0.29128599166870117  val_accuracy:0.8773480653762817  val_auc_1:0.855217456817627 ]
100%|██████████| 29/29 [00:01<00:00, 28.02it/s, epoch: 12/40 -  train_loss:0.24324959516525269  train_accuracy:0.8994318246841431  train_auc_1:0.8777059316635132  val_loss:0.2829277217388153  val_accuracy:0.8762431144714355  val_auc_1:0.8485085368156433 ]
100%|██████████| 29/29 [00:01<00:00, 28.26it/s, epoch: 13/40 -  train_loss:0.24976494908332825  train_accuracy:0.9017045497894287  train_auc_1:0.8789190649986267  val_loss:0.3205900490283966  val_accuracy:0.8762431144714355  val_auc_1:0.8470830917358398 ]
100%|██████████| 29/29 [00:01<00:00, 27.93it/s, epoch: 14/40 -  train_loss:0.25038978457450867  train_accuracy:0.9027478694915771  train_auc_1:0.87300044298172  val_loss:0.2858302593231201  val_accuracy:0.8817679286003113  val_auc_1:0.8585360646247864 ]
100%|██████████| 29/29 [00:01<00:00, 28.00it/s, epoch: 15/40 -  train_loss:0.22572343051433563  train_accuracy:0.9065194129943848  train_auc_1:0.8839739561080933  val_loss:0.27690911293029785  val_accuracy:0.8795580267906189  val_auc_1:0.8565822839736938 ]
100%|██████████| 29/29 [00:01<00:00, 26.32it/s, epoch: 16/40 -  train_loss:0.21374236047267914  train_accuracy:0.9098557829856873  train_auc_1:0.8846680521965027  val_loss:0.28506413102149963  val_accuracy:0.8817679286003113  val_auc_1:0.8479802012443542 ]
100%|██████████| 29/29 [00:01<00:00, 27.78it/s, epoch: 17/40 -  train_loss:0.2231924682855606  train_accuracy:0.9081858396530151  train_auc_1:0.8896657228469849  val_loss:0.28099507093429565  val_accuracy:0.8817679286003113  val_auc_1:0.8554320335388184 ]
100%|██████████| 29/29 [00:01<00:00, 27.45it/s, epoch: 18/40 -  train_loss:0.21955524384975433  train_accuracy:0.9089439511299133  train_auc_1:0.8989342451095581  val_loss:0.2811129093170166  val_accuracy:0.8795580267906189  val_auc_1:0.8558282852172852 ]
100%|██████████| 29/29 [00:01<00:00, 28.06it/s, epoch: 19/40 -  train_loss:0.2427460104227066  train_accuracy:0.90625  train_auc_1:0.8821660280227661  val_loss:0.2881392240524292  val_accuracy:0.8762431144714355  val_auc_1:0.853461742401123 ]
100%|██████████| 29/29 [00:01<00:00, 27.76it/s, epoch: 20/40 -  train_loss:0.22428719699382782  train_accuracy:0.9095686078071594  train_auc_1:0.8950278759002686  val_loss:0.27665039896965027  val_accuracy:0.8773480653762817  val_auc_1:0.859268069267273 ]
100%|██████████| 29/29 [00:01<00:00, 28.75it/s, epoch: 21/40 -  train_loss:0.23987431824207306  train_accuracy:0.8982300758361816  train_auc_1:0.8865035772323608  val_loss:0.2787191569805145  val_accuracy:0.8795580267906189  val_auc_1:0.8540340662002563 ]
100%|██████████| 29/29 [00:01<00:00, 28.38it/s, epoch: 22/40 -  train_loss:0.23535579442977905  train_accuracy:0.9059734344482422  train_auc_1:0.868804931640625  val_loss:0.2837145924568176  val_accuracy:0.8850829005241394  val_auc_1:0.852718710899353 ]
100%|██████████| 29/29 [00:01<00:00, 26.92it/s, epoch: 23/40 -  train_loss:0.22587361931800842  train_accuracy:0.9102272987365723  train_auc_1:0.9052188396453857  val_loss:0.31282860040664673  val_accuracy:0.8751381039619446  val_auc_1:0.8482223749160767 ]
100%|██████████| 29/29 [00:01<00:00, 28.01it/s, epoch: 24/40 -  train_loss:0.23454731702804565  train_accuracy:0.9004424810409546  train_auc_1:0.8923315405845642  val_loss:0.2845218777656555  val_accuracy:0.8817679286003113  val_auc_1:0.8535387516021729 ]
100%|██████████| 29/29 [00:01<00:00, 27.66it/s, epoch: 25/40 -  train_loss:0.21877069771289825  train_accuracy:0.9135237336158752  train_auc_1:0.8983669877052307  val_loss:0.2817881405353546  val_accuracy:0.8872928023338318  val_auc_1:0.8562355637550354 ]
100%|██████████| 29/29 [00:01<00:00, 28.56it/s, epoch: 26/40 -  train_loss:0.23317019641399384  train_accuracy:0.9047897458076477  train_auc_1:0.8926054835319519  val_loss:0.2959703505039215  val_accuracy:0.8806629776954651  val_auc_1:0.8549916744232178 ]
100%|██████████| 29/29 [00:01<00:00, 27.97it/s, epoch: 27/40 -  train_loss:0.22137722373008728  train_accuracy:0.9076327681541443  train_auc_1:0.9086238145828247  val_loss:0.2853763699531555  val_accuracy:0.8806629776954651  val_auc_1:0.8527958989143372 ]
100%|██████████| 29/29 [00:00<00:00, 30.42it/s, epoch: 28/40 -  train_loss:0.21273373067378998  train_accuracy:0.915409505367279  train_auc_1:0.9035125970840454  val_loss:0.2835741639137268  val_accuracy:0.8773480653762817  val_auc_1:0.8541442155838013 ]
100%|██████████| 29/29 [00:00<00:00, 30.00it/s, epoch: 29/40 -  train_loss:0.2242117077112198  train_accuracy:0.9054203629493713  train_auc_1:0.9061750173568726  val_loss:0.28261181712150574  val_accuracy:0.8806629776954651  val_auc_1:0.8538470268249512 ]
100%|██████████| 29/29 [00:01<00:00, 28.50it/s, epoch: 30/40 -  train_loss:0.23390451073646545  train_accuracy:0.90625  train_auc_1:0.8956843614578247  val_loss:0.2910405099391937  val_accuracy:0.8773480653762817  val_auc_1:0.856081485748291 ]
100%|██████████| 29/29 [00:01<00:00, 28.54it/s, epoch: 31/40 -  train_loss:0.21303458511829376  train_accuracy:0.9097546935081482  train_auc_1:0.9092994928359985  val_loss:0.2907060384750366  val_accuracy:0.8795580267906189  val_auc_1:0.854667067527771 ]
100%|██████████| 29/29 [00:01<00:00, 28.50it/s, epoch: 32/40 -  train_loss:0.2017047256231308  train_accuracy:0.9213067889213562  train_auc_1:0.9215606451034546  val_loss:0.2792257070541382  val_accuracy:0.8784530162811279  val_auc_1:0.8622398972511292 ]
100%|██████████| 29/29 [00:01<00:00, 27.78it/s, epoch: 33/40 -  train_loss:0.21978729963302612  train_accuracy:0.90625  train_auc_1:0.9032835364341736  val_loss:0.2790915071964264  val_accuracy:0.8828729391098022  val_auc_1:0.8596147894859314 ]
100%|██████████| 29/29 [00:01<00:00, 28.67it/s, epoch: 34/40 -  train_loss:0.2279532253742218  train_accuracy:0.900053858757019  train_auc_1:0.9082023501396179  val_loss:0.28917449712753296  val_accuracy:0.8850829005241394  val_auc_1:0.8572481870651245 ]
100%|██████████| 29/29 [00:01<00:00, 28.98it/s, epoch: 35/40 -  train_loss:0.20137761533260345  train_accuracy:0.9197198152542114  train_auc_1:0.9088310599327087  val_loss:0.2941972017288208  val_accuracy:0.8839778900146484  val_auc_1:0.860407292842865 ]
100%|██████████| 29/29 [00:01<00:00, 27.61it/s, epoch: 36/40 -  train_loss:0.21837779879570007  train_accuracy:0.9113685488700867  train_auc_1:0.9138096570968628  val_loss:0.31745481491088867  val_accuracy:0.8795580267906189  val_auc_1:0.8539350032806396 ]
100%|██████████| 29/29 [00:01<00:00, 27.85it/s, epoch: 37/40 -  train_loss:0.22294917702674866  train_accuracy:0.9103982448577881  train_auc_1:0.9038949012756348  val_loss:0.28447607159614563  val_accuracy:0.8806629776954651  val_auc_1:0.8571106195449829 ]
100%|██████████| 29/29 [00:01<00:00, 28.83it/s, epoch: 38/40 -  train_loss:0.21166759729385376  train_accuracy:0.9135237336158752  train_auc_1:0.922845184803009  val_loss:0.3054659366607666  val_accuracy:0.8872928023338318  val_auc_1:0.8537644147872925 ]
100%|██████████| 29/29 [00:00<00:00, 29.38it/s, epoch: 39/40 -  train_loss:0.21334536373615265  train_accuracy:0.9162057638168335  train_auc_1:0.9161925315856934  val_loss:0.30080586671829224  val_accuracy:0.8872928023338318  val_auc_1:0.8560484647750854 ]
100%|██████████| 29/29 [00:01<00:00, 27.09it/s, epoch: 40/40 -  train_loss:0.21065743267536163  train_accuracy:0.9156249761581421  train_auc_1:0.9142652750015259  val_loss:0.28072354197502136  val_accuracy:0.8872928023338318  val_auc_1:0.8580352067947388 ]
[1.012537411848704, 0.7321080048878987]

[6]:
import matplotlib.pyplot as plt

for history in histories:
    plt.plot(history['train_auc_1'])
    plt.plot(history['val_auc_1'])

plt.title('Model Area Under Curve')
plt.ylabel('Area Under Curve')
plt.xlabel('Epoch')
plt.legend(
    ['origin_train', 'origin_val', 'pipeline_train', 'pipeline_val'], loc='lower right'
)
plt.show()
[Figure: "Model Area Under Curve", AUC versus epoch for origin_train, origin_val, pipeline_train, and pipeline_val]

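As the plot shows, the validation AUC of both models fluctuates around 0.85, so pipeline parallelism has little effect on model accuracy for this task, while the training time printed above drops from about 1.01 minutes to about 0.73 minutes.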
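As a rough sanity check on these figures, the sketch below computes the relative time saving and the speedup from the two timings printed above, and averages the validation AUC over the last five epochs of each run. It assumes the timing list is in minutes and ordered [original, pipeline], matching the plot legend, and that the first training log belongs to the original run and the second to the pipeline run. The helper names and the hand-copied, rounded AUC values are illustrative only and not part of SecretFlow.

```python
from statistics import mean

# Timings copied from the list printed above. Assumption: the values are in
# minutes and ordered [original, pipeline], as in the plot legend.
origin_minutes, pipeline_minutes = 1.012537411848704, 0.7321080048878987


def relative_saving(baseline: float, improved: float) -> float:
    """Fraction of the baseline time saved by the improved run."""
    return 1.0 - improved / baseline


def speedup(baseline: float, improved: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline / improved


# val_auc_1 for epochs 36-40 of each run, hand-copied from the logs above and
# rounded to four decimals. Assumption: first log = original, second = pipeline.
origin_val_auc = [0.8553, 0.8578, 0.8563, 0.8621, 0.8595]
pipeline_val_auc = [0.8539, 0.8571, 0.8538, 0.8560, 0.8580]

saving = relative_saving(origin_minutes, pipeline_minutes)
ratio = speedup(origin_minutes, pipeline_minutes)
origin_mean_auc = mean(origin_val_auc)
pipeline_mean_auc = mean(pipeline_val_auc)

print(f"time saved: {saving:.1%}, speedup: {ratio:.2f}x")
print(f"mean val AUC (last 5 epochs): origin={origin_mean_auc:.4f}, "
      f"pipeline={pipeline_mean_auc:.4f}")
```

A single run on a local simulation is noisy, so these numbers are best read as an indication rather than a benchmark; repeating the comparison with several random seeds would give a more reliable estimate.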

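Summary#

This example showed how to use pipeline parallelism in SecretFlow split learning. Usage is the same as for ordinary split learning: when defining the SLModel, specify strategy='pipeline' and set pipeline_num.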