SplitRec: Using Pipeline Parallelism in SecretFlow Split Learning
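The following code is for demonstration only. Do not use it directly in production.
This example builds on the "Split Learning: Bank Marketing" tutorial; we recommend reading that tutorial first.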
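In split learning the model is partitioned across multiple devices, so during training the parties must exchange intermediate results and gradients many times, and both computation and network communication spend a large share of their time idle. Following the paper "PipeLearn: Pipeline Parallelism for Collaborative Machine Learning", we use the task-scheduling capability of RayFed, the framework underlying SecretFlow, to implement pipeline parallelism. Computation and communication then overlap, which hides part of the computation time and improves resource utilization. Because computation and communication run concurrently, pipeline parallelism may cost some model accuracy, so users should balance performance against accuracy for their own scenario.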
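As a rough way to reason about the possible gain, consider an idealized two-stage timing model. It is an assumption made here for illustration and is not taken from the PipeLearn paper or from the SecretFlow implementation. Suppose an epoch has N batches, each batch needs compute time t_c and communication time t_n, and there is one compute resource and one network channel. Then the sequential and fully overlapped epoch times are approximately:

```latex
T_{\text{seq}} = N\,(t_c + t_n),
\qquad
T_{\text{pipe}} \approx t_c + t_n + (N - 1)\,\max(t_c,\, t_n),
\qquad
\frac{T_{\text{seq}}}{T_{\text{pipe}}} \;\xrightarrow{\;N \to \infty\;}\; \frac{t_c + t_n}{\max(t_c,\, t_n)} \le 2
```

In this simplified model the gain is largest when t_c and t_n are similar, and it disappears when one of them dominates. A real training step has more stages than this, so the formula is only a first-order intuition.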
Below we walk through an example of using pipeline parallelism in SecretFlow split learning.
Environment Setup
First, we create two parties, alice and bob, in the secretflow environment.
[1]:
import secretflow as sf
sf.shutdown()
sf.init(['alice', 'bob'], address='local')
alice, bob = sf.PYU('alice'), sf.PYU('bob')
2023-09-26 19:49:23,600 INFO worker.py:1538 -- Started a local Ray instance.
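Prepare the Data
Next we prepare the data to train on.
We follow the data preparation and processing steps from "Split Learning: Bank Marketing": download the bank marketing dataset and preprocess it. The roles of alice and bob are exactly the same as in that tutorial: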
[2]:
from secretflow.utils.simulation.datasets import load_bank_marketing
from secretflow.preprocessing.scaler import MinMaxScaler
from secretflow.preprocessing.encoder import LabelEncoder
from secretflow.data.split import train_test_split
random_state = 1234
# Vertically partition the features: alice holds columns 0-3, bob holds columns 4-15.
data = load_bank_marketing(parts={alice: (0, 4), bob: (4, 16)}, axis=1)
# alice also holds the label (column 16).
label = load_bank_marketing(parts={alice: (16, 17)}, axis=1)
# Encode the categorical columns as integers.
encoder = LabelEncoder()
data['job'] = encoder.fit_transform(data['job'])
data['marital'] = encoder.fit_transform(data['marital'])
data['education'] = encoder.fit_transform(data['education'])
data['default'] = encoder.fit_transform(data['default'])
data['housing'] = encoder.fit_transform(data['housing'])
data['loan'] = encoder.fit_transform(data['loan'])
data['contact'] = encoder.fit_transform(data['contact'])
data['poutcome'] = encoder.fit_transform(data['poutcome'])
data['month'] = encoder.fit_transform(data['month'])
label = encoder.fit_transform(label)
# Scale every feature to the range [0, 1].
scaler = MinMaxScaler()
data = scaler.fit_transform(data)
# Use the same random_state for both splits so features and labels stay aligned.
train_data, test_data = train_test_split(
    data, train_size=0.8, random_state=random_state
)
train_label, test_label = train_test_split(
    label, train_size=0.8, random_state=random_state
)
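To make the preprocessing concrete, here is a minimal pure-Python sketch of what label encoding and min-max scaling do to a single column, and of how half-open column ranges such as (0, 4) and (4, 16) divide one record between the parties. The helper names (label_encode, min_max_scale, split_columns) and the sample values are hypothetical and are not part of SecretFlow; the real LabelEncoder and MinMaxScaler operate on federated data rather than Python lists.

```python
def label_encode(values):
    """Map each distinct value to an integer index, using sorted class order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values]


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_columns(row, parts):
    """Slice one row into per-party pieces given half-open (start, end) ranges."""
    return {party: row[start:end] for party, (start, end) in parts.items()}


marital = ["married", "single", "divorced", "married"]  # illustrative values
encoded = label_encode(marital)
scaled = min_max_scale(encoded)

row = list(range(17))  # stand-in for one record: 16 features plus 1 label
pieces = split_columns(row, {"alice": (0, 4), "bob": (4, 16)})

print(encoded)  # [1, 2, 0, 1]
print(scaled)   # [0.5, 1.0, 0.0, 0.5]
print({party: len(cols) for party, cols in pieces.items()})  # {'alice': 4, 'bob': 12}
```

The 4 and 12 feature counts are the input dimensions that the two base models need.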
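Define the Model Structure
Next we create the federated model. As before, we reuse the modeling from "Split Learning: Bank Marketing" to build base_model and fuse_model, which can then be used to define an SLModel for training: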
[3]:
def create_base_model(input_dim, output_dim, name='base_model'):
    # Create model
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        model = keras.Sequential(
            [
                keras.Input(shape=input_dim),
                layers.Dense(100, activation="relu"),
                layers.Dense(output_dim, activation="relu"),
            ]
        )
        # Compile model
        model.summary()
        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


# prepare model
hidden_size = 64
model_base_alice = create_base_model(4, hidden_size)
model_base_bob = create_base_model(12, hidden_size)


def create_fuse_model(input_dim, output_dim, party_nums, name='fuse_model'):
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        # input
        input_layers = []
        for i in range(party_nums):
            input_layers.append(
                keras.Input(
                    input_dim,
                )
            )
        merged_layer = layers.concatenate(input_layers)
        fuse_layer = layers.Dense(64, activation='relu')(merged_layer)
        output = layers.Dense(output_dim, activation='sigmoid')(fuse_layer)
        model = keras.Model(inputs=input_layers, outputs=output)
        model.summary()
        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


model_fuse = create_fuse_model(input_dim=hidden_size, party_nums=2, output_dim=1)
base_model_dict = {alice: model_base_alice, bob: model_base_bob}
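As a quick sanity check on the model sizes, the parameter count of a fully connected layer is inputs × units + units (weights plus biases). The sketch below applies that rule to the layer sizes defined above; dense_params is a hypothetical helper written for this illustration, not a Keras or SecretFlow function.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


hidden_size = 64

# Base models: input -> Dense(100) -> Dense(hidden_size)
alice_base = dense_params(4, 100) + dense_params(100, hidden_size)
bob_base = dense_params(12, 100) + dense_params(100, hidden_size)

# Fuse model: concatenation of 2 hidden vectors -> Dense(64) -> Dense(1)
fuse = dense_params(2 * hidden_size, 64) + dense_params(64, 1)

print(alice_base, bob_base, fuse)  # 6964 7764 8321
```

These totals should agree with the "Total params" lines that model.summary() prints when training starts.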
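Define the SLModel
To use pipeline parallelism, set strategy='pipeline' and set the pipeline_size parameter. A larger pipeline_size increases the degree of concurrency, but beyond a certain threshold, once one party's compute or network is saturated, performance stops improving. pipeline_size is usually set to 2-4.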
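The saturation effect can be illustrated with a toy schedule model. This is a sketch under stated assumptions, not SecretFlow's scheduler: each batch passes through a fixed sequence of stages, every stage handles one batch at a time, and at most pipeline_size batches are in flight. The function simulate and the stage durations are hypothetical values chosen for illustration.

```python
def simulate(num_batches, stage_times, pipeline_size):
    """Return the finish time of the last batch in a toy multi-stage pipeline.

    stage_times[s] is the time stage s needs per batch. Each stage processes
    one batch at a time, and at most `pipeline_size` batches are in flight.
    """
    finish = []  # finish[i][s]: time at which batch i leaves stage s
    for i in range(num_batches):
        # A new batch may enter only after batch (i - pipeline_size) has left.
        t = finish[i - pipeline_size][-1] if i >= pipeline_size else 0
        row = []
        for s, duration in enumerate(stage_times):
            stage_free = finish[i - 1][s] if i > 0 else 0
            t = max(t, stage_free) + duration
            row.append(t)
        finish.append(row)
    return finish[-1][-1]


# Made-up stage durations: base compute, send hidden, fuse compute, send gradient.
stage_times = (2, 3, 2, 3)
epoch_time = {p: simulate(29, stage_times, p) for p in (1, 2, 3, 4, 5)}
print(epoch_time)  # {1: 290, 2: 150, 3: 103, 4: 94, 5: 94}
```

With these numbers the gain from each extra slot shrinks quickly, and nothing improves past pipeline_size=4 because the slowest stage (3 time units per batch) is already busy all the time.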
[4]:
from secretflow.ml.nn import SLModel
sl_model_origin = SLModel(
base_model_dict=base_model_dict,
device_y=alice,
model_fuse=model_fuse,
)
sl_model_pipeline = SLModel(
base_model_dict=base_model_dict,
device_y=alice,
model_fuse=model_fuse,
strategy='pipeline',
pipeline_size=2,
)
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.strategy.pipeline.PYUPipelineTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.strategy.pipeline.PYUPipelineTFModel'> with party bob.
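Start Training
We train both the model without pipeline parallelism and the model with pipeline parallelism, raising the number of training epochs to 40, and compare the results.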
[5]:
import time

histories = []
cost_time = []
# Train the baseline model first, then the pipelined one, timing each run.
for sl_model in [sl_model_origin, sl_model_pipeline]:
    begin = time.time()
    history = sl_model.fit(
        train_data,
        train_label,
        validation_data=(test_data, test_label),
        epochs=40,
        batch_size=128,
        shuffle=True,
        verbose=1,
        validation_freq=1,
    )
    end = time.time()
    # Record the elapsed wall-clock time in minutes.
    cost_time.append((end - begin) / 60)
    histories.append(history)
print(cost_time)
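Once both runs finish, the two entries of cost_time can be turned into a speedup figure, and the final validation metrics of the two runs can be compared side by side. The sketch below is a hypothetical helper, not part of SecretFlow. It assumes that each history behaves like a dict mapping metric names (such as val_loss) to per-epoch lists, and the inputs in the usage example are placeholders rather than measured results.

```python
def summarize(cost_time, histories, metric="val_loss"):
    """Compare a baseline run (index 0) against a pipelined run (index 1)."""
    baseline_minutes, pipeline_minutes = cost_time
    baseline_final = histories[0][metric][-1]
    pipeline_final = histories[1][metric][-1]
    return {
        "speedup": baseline_minutes / pipeline_minutes,
        "baseline_final": baseline_final,
        "pipeline_final": pipeline_final,
        "metric_gap": pipeline_final - baseline_final,
    }


# Placeholder inputs for illustration only (not results from this tutorial).
example_cost_time = [2.0, 1.0]
example_histories = [
    {"val_loss": [0.5, 0.25]},
    {"val_loss": [0.5, 0.375]},
]
report = summarize(example_cost_time, example_histories)
print(report)
```

A speedup above 1 together with a small metric_gap indicates that pipelining saved time without a large accuracy penalty; how much gap is acceptable depends on the application.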
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0aa04c0>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bcd0>}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0ac8430>}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bc10>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874b400>}, aligned=True), VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874be20>}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 5731, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f1368738df0>}
(PYUSLTFModel pid=1825002) Model: "sequential"
(PYUSLTFModel pid=1825002) _________________________________________________________________
(PYUSLTFModel pid=1825002)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=1825002) =================================================================
(PYUSLTFModel pid=1825002)  dense (Dense)               (None, 100)               500
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002) =================================================================
(PYUSLTFModel pid=1825002) Total params: 6,964
(PYUSLTFModel pid=1825002) Trainable params: 6,964
(PYUSLTFModel pid=1825002) Non-trainable params: 0
(PYUSLTFModel pid=1825002) _________________________________________________________________
(PYUSLTFModel pid=1825002) Model: "model"
(PYUSLTFModel pid=1825002) __________________________________________________________________________________________________
(PYUSLTFModel pid=1825002)  Layer (type)                   Output Shape         Param #     Connected to
(PYUSLTFModel pid=1825002) ==================================================================================================
(PYUSLTFModel pid=1825002)  input_2 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  input_3 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  concatenate (Concatenate)      (None, 128)          0           ['input_2[0][0]',
(PYUSLTFModel pid=1825002)                                                                   'input_3[0][0]']
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  dense_2 (Dense)                (None, 64)           8256        ['concatenate[0][0]']
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002)  dense_3 (Dense)                (None, 1)            65          ['dense_2[0][0]']
(PYUSLTFModel pid=1825002)
(PYUSLTFModel pid=1825002) ==================================================================================================
(PYUSLTFModel pid=1825002) Total params: 8,321
(PYUSLTFModel pid=1825002) Trainable params: 8,321
(PYUSLTFModel pid=1825002) Non-trainable params: 0
(PYUSLTFModel pid=1825002) __________________________________________________________________________________________________
(PYUSLTFModel pid=1825080) Model: "sequential"
(PYUSLTFModel pid=1825080) _________________________________________________________________
(PYUSLTFModel pid=1825080)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=1825080) =================================================================
(PYUSLTFModel pid=1825080)  dense (Dense)               (None, 100)               1300
(PYUSLTFModel pid=1825080)
(PYUSLTFModel pid=1825080)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=1825080)
(PYUSLTFModel pid=1825080) =================================================================
(PYUSLTFModel pid=1825080) Total params: 7,764
(PYUSLTFModel pid=1825080) Trainable params: 7,764
(PYUSLTFModel pid=1825080) Non-trainable params: 0
(PYUSLTFModel pid=1825080) _________________________________________________________________
(PYUPipelineTFModel pid=1825145) Model: "sequential"
(PYUPipelineTFModel pid=1825145) _________________________________________________________________
(PYUPipelineTFModel pid=1825145)  Layer (type)                Output Shape              Param #
(PYUPipelineTFModel pid=1825145) =================================================================
(PYUPipelineTFModel pid=1825145)  dense (Dense)               (None, 100)               500
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  dense_1 (Dense)             (None, 64)                6464
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145) =================================================================
(PYUPipelineTFModel pid=1825145) Total params: 6,964
(PYUPipelineTFModel pid=1825145) Trainable params: 6,964
(PYUPipelineTFModel pid=1825145) Non-trainable params: 0
(PYUPipelineTFModel pid=1825145) _________________________________________________________________
(PYUPipelineTFModel pid=1825145) Model: "model"
(PYUPipelineTFModel pid=1825145) __________________________________________________________________________________________________
(PYUPipelineTFModel pid=1825145)  Layer (type)                   Output Shape         Param #     Connected to
(PYUPipelineTFModel pid=1825145) ==================================================================================================
(PYUPipelineTFModel pid=1825145)  input_2 (InputLayer)           [(None, 64)]         0           []
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  input_3 (InputLayer)           [(None, 64)]         0           []
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  concatenate (Concatenate)      (None, 128)          0           ['input_2[0][0]',
(PYUPipelineTFModel pid=1825145)                                                                   'input_3[0][0]']
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  dense_2 (Dense)                (None, 64)           8256        ['concatenate[0][0]']
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145)  dense_3 (Dense)                (None, 1)            65          ['dense_2[0][0]']
(PYUPipelineTFModel pid=1825145)
(PYUPipelineTFModel pid=1825145) ==================================================================================================
(PYUPipelineTFModel pid=1825145) Total params: 8,321
(PYUPipelineTFModel pid=1825145) Trainable params: 8,321
(PYUPipelineTFModel pid=1825145) Non-trainable params: 0
(PYUPipelineTFModel pid=1825145) __________________________________________________________________________________________________
(PYUPipelineTFModel pid=1825269) Model: "sequential"
(PYUPipelineTFModel pid=1825269) _________________________________________________________________
(PYUPipelineTFModel pid=1825269)  Layer (type)                Output Shape              Param #
(PYUPipelineTFModel pid=1825269) =================================================================
(PYUPipelineTFModel pid=1825269)  dense (Dense)               (None, 100)               1300
(PYUPipelineTFModel pid=1825269)
(PYUPipelineTFModel pid=1825269)  dense_1 (Dense)             (None, 64)                6464
(PYUPipelineTFModel pid=1825269)
(PYUPipelineTFModel pid=1825269) =================================================================
(PYUPipelineTFModel pid=1825269) Total params: 7,764
(PYUPipelineTFModel pid=1825269) Trainable params: 7,764
(PYUPipelineTFModel pid=1825269) Non-trainable params: 0
(PYUPipelineTFModel pid=1825269) _________________________________________________________________
100%|██████████| 29/29 [00:08<00:00, 3.54it/s, epoch: 1/40 - train_loss:0.44887787103652954 train_accuracy:0.850215494632721 train_auc_1:0.5317299365997314 val_loss:0.39494723081588745 val_accuracy:0.8729282021522522 val_auc_1:0.5657897591590881 ]
100%|██████████| 29/29 [00:01<00:00, 16.09it/s, epoch: 2/40 - train_loss:0.3432118892669678 train_accuracy:0.8857954740524292 train_auc_1:0.6473174095153809 val_loss:0.363627165555954 val_accuracy:0.8729282021522522 val_auc_1:0.6689268350601196 ]
100%|██████████| 29/29 [00:01<00:00, 26.94it/s, epoch: 3/40 - train_loss:0.32648009061813354 train_accuracy:0.8863146305084229 train_auc_1:0.7191672921180725 val_loss:0.35098856687545776 val_accuracy:0.8729282021522522 val_auc_1:0.7191359400749207 ]
100%|██████████| 29/29 [00:01<00:00, 26.13it/s, epoch: 4/40 - train_loss:0.31278952956199646 train_accuracy:0.8833512663841248 train_auc_1:0.7800465226173401 val_loss:0.34081292152404785 val_accuracy:0.8729282021522522 val_auc_1:0.7596642971038818 ]
100%|██████████| 29/29 [00:04<00:00, 6.83it/s, epoch: 5/40 - train_loss:0.29879459738731384 train_accuracy:0.8844026327133179 train_auc_1:0.8006787300109863 val_loss:0.32317325472831726 val_accuracy:0.8729282021522522 val_auc_1:0.801051139831543 ]
100%|██████████| 29/29 [00:01<00:00, 26.10it/s, epoch: 6/40 - train_loss:0.29146328568458557 train_accuracy:0.875 train_auc_1:0.8463853597640991 val_loss:0.3064562678337097 val_accuracy:0.8696132302284241 val_auc_1:0.8238029479980469 ]
100%|██████████| 29/29 [00:01<00:00, 26.24it/s, epoch: 7/40 - train_loss:0.24985690414905548 train_accuracy:0.8960176706314087 train_auc_1:0.8685527443885803 val_loss:0.30176323652267456 val_accuracy:0.8718231916427612 val_auc_1:0.839279055595398 ]
100%|██████████| 29/29 [00:01<00:00, 26.07it/s, epoch: 8/40 - train_loss:0.25740572810173035 train_accuracy:0.8872159123420715 train_auc_1:0.8768747448921204 val_loss:0.2859904170036316 val_accuracy:0.8806629776954651 val_auc_1:0.8433902263641357 ]
100%|██████████| 29/29 [00:01<00:00, 27.53it/s, epoch: 9/40 - train_loss:0.24906550347805023 train_accuracy:0.8995150923728943 train_auc_1:0.8653609156608582 val_loss:0.2812442481517792 val_accuracy:0.8828729391098022 val_auc_1:0.8512052893638611 ]
100%|██████████| 29/29 [00:01<00:00, 27.23it/s, epoch: 10/40 - train_loss:0.2445402294397354 train_accuracy:0.892699122428894 train_auc_1:0.8836410641670227 val_loss:0.2827773094177246 val_accuracy:0.8773480653762817 val_auc_1:0.8469785451889038 ]
100%|██████████| 29/29 [00:01<00:00, 26.02it/s, epoch: 11/40 - train_loss:0.2518855035305023 train_accuracy:0.8954645991325378 train_auc_1:0.8761004209518433 val_loss:0.295387327671051 val_accuracy:0.8784530162811279 val_auc_1:0.8485745787620544 ]
100%|██████████| 29/29 [00:01<00:00, 27.55it/s, epoch: 12/40 - train_loss:0.22354044020175934 train_accuracy:0.9102909564971924 train_auc_1:0.8964704275131226 val_loss:0.30353987216949463 val_accuracy:0.8795580267906189 val_auc_1:0.8534507155418396 ]
100%|██████████| 29/29 [00:01<00:00, 28.33it/s, epoch: 13/40 - train_loss:0.22443315386772156 train_accuracy:0.9079092741012573 train_auc_1:0.8922196626663208 val_loss:0.2777591645717621 val_accuracy:0.8795580267906189 val_auc_1:0.8531700372695923 ]
100%|██████████| 29/29 [00:01<00:00, 25.85it/s, epoch: 14/40 - train_loss:0.21603752672672272 train_accuracy:0.9125000238418579 train_auc_1:0.9022694230079651 val_loss:0.2857709228992462 val_accuracy:0.8817679286003113 val_auc_1:0.8522068858146667 ]
100%|██████████| 29/29 [00:01<00:00, 26.38it/s, epoch: 15/40 - train_loss:0.2281351238489151 train_accuracy:0.9126105904579163 train_auc_1:0.8849776983261108 val_loss:0.27802714705467224 val_accuracy:0.8773480653762817 val_auc_1:0.853461742401123 ]
100%|██████████| 29/29 [00:01<00:00, 27.39it/s, epoch: 16/40 - train_loss:0.2165425419807434 train_accuracy:0.9150568246841431 train_auc_1:0.8966040015220642 val_loss:0.2783280313014984 val_accuracy:0.8828729391098022 val_auc_1:0.8554485440254211 ]
100%|██████████| 29/29 [00:01<00:00, 28.48it/s, epoch: 17/40 - train_loss:0.21801750361919403 train_accuracy:0.9081858396530151 train_auc_1:0.8976572155952454 val_loss:0.2761484980583191 val_accuracy:0.8784530162811279 val_auc_1:0.8577655553817749 ]
100%|██████████| 29/29 [00:01<00:00, 28.16it/s, epoch: 18/40 - train_loss:0.2221526950597763 train_accuracy:0.907866358757019 train_auc_1:0.9024293422698975 val_loss:0.2797839343547821 val_accuracy:0.8828729391098022 val_auc_1:0.8589048385620117 ]
100%|██████████| 29/29 [00:01<00:00, 27.62it/s, epoch: 19/40 - train_loss:0.2285359650850296 train_accuracy:0.9059734344482422 train_auc_1:0.8901480436325073 val_loss:0.27858439087867737 val_accuracy:0.8828729391098022 val_auc_1:0.860638439655304 ]
100%|██████████| 29/29 [00:01<00:00, 27.14it/s, epoch: 20/40 - train_loss:0.2182772159576416 train_accuracy:0.915099561214447 train_auc_1:0.9078006744384766 val_loss:0.291841983795166 val_accuracy:0.8751381039619446 val_auc_1:0.8562520742416382 ]
100%|██████████| 29/29 [00:01<00:00, 27.61it/s, epoch: 21/40 - train_loss:0.20946133136749268 train_accuracy:0.9164772629737854 train_auc_1:0.9116370677947998 val_loss:0.2933138310909271 val_accuracy:0.8828729391098022 val_auc_1:0.8598623275756836 ]
100%|██████████| 29/29 [00:01<00:00, 26.59it/s, epoch: 22/40 - train_loss:0.23493120074272156 train_accuracy:0.9054203629493713 train_auc_1:0.8970139026641846 val_loss:0.27499568462371826 val_accuracy:0.8773480653762817 val_auc_1:0.8625756502151489 ]
100%|██████████| 29/29 [00:00<00:00, 29.05it/s, epoch: 23/40 - train_loss:0.21671472489833832 train_accuracy:0.9101216793060303 train_auc_1:0.9046225547790527 val_loss:0.2828799784183502 val_accuracy:0.8850829005241394 val_auc_1:0.8609355688095093 ]
100%|██████████| 29/29 [00:01<00:00, 28.19it/s, epoch: 24/40 - train_loss:0.22586138546466827 train_accuracy:0.9110991358757019 train_auc_1:0.90799880027771 val_loss:0.28323644399642944 val_accuracy:0.8850829005241394 val_auc_1:0.8611777424812317 ]
100%|██████████| 29/29 [00:01<00:00, 28.10it/s, epoch: 25/40 - train_loss:0.21121767163276672 train_accuracy:0.9133522510528564 train_auc_1:0.9088505506515503 val_loss:0.27677592635154724 val_accuracy:0.8861878514289856 val_auc_1:0.860908031463623 ]
100%|██████████| 29/29 [00:01<00:00, 27.28it/s, epoch: 26/40 - train_loss:0.20616813004016876 train_accuracy:0.9184659123420715 train_auc_1:0.908971905708313 val_loss:0.2886063754558563 val_accuracy:0.8828729391098022 val_auc_1:0.8584039807319641 ]
100%|██████████| 29/29 [00:01<00:00, 25.97it/s, epoch: 27/40 - train_loss:0.24042247235774994 train_accuracy:0.8976293206214905 train_auc_1:0.8970732688903809 val_loss:0.2905130684375763 val_accuracy:0.8817679286003113 val_auc_1:0.8628398180007935 ]
100%|██████████| 29/29 [00:01<00:00, 28.27it/s, epoch: 28/40 - train_loss:0.2106049805879593 train_accuracy:0.9131637215614319 train_auc_1:0.9120412468910217 val_loss:0.2760363817214966 val_accuracy:0.8883978128433228 val_auc_1:0.8631425499916077 ]
100%|██████████| 29/29 [00:01<00:00, 27.53it/s, epoch: 29/40 - train_loss:0.19771815836429596 train_accuracy:0.9181034564971924 train_auc_1:0.9175819158554077 val_loss:0.2815971374511719 val_accuracy:0.8872928023338318 val_auc_1:0.8594056367874146 ]
100%|██████████| 29/29 [00:01<00:00, 27.45it/s, epoch: 30/40 - train_loss:0.21977882087230682 train_accuracy:0.9065265655517578 train_auc_1:0.908697247505188 val_loss:0.2787911891937256 val_accuracy:0.8806629776954651 val_auc_1:0.8623830080032349 ]
100%|██████████| 29/29 [00:01<00:00, 26.70it/s, epoch: 31/40 - train_loss:0.2060454785823822 train_accuracy:0.9121767282485962 train_auc_1:0.909172534942627 val_loss:0.29584282636642456 val_accuracy:0.8806629776954651 val_auc_1:0.8634011149406433 ]
100%|██████████| 29/29 [00:01<00:00, 27.61it/s, epoch: 32/40 - train_loss:0.20517688989639282 train_accuracy:0.9191810488700867 train_auc_1:0.907102108001709 val_loss:0.28416600823402405 val_accuracy:0.8839778900146484 val_auc_1:0.8632196187973022 ]
100%|██████████| 29/29 [00:01<00:00, 27.90it/s, epoch: 33/40 - train_loss:0.21313920617103577 train_accuracy:0.9112278819084167 train_auc_1:0.9166741371154785 val_loss:0.2824288308620453 val_accuracy:0.8817679286003113 val_auc_1:0.8626307249069214 ]
100%|██████████| 29/29 [00:01<00:00, 27.56it/s, epoch: 34/40 - train_loss:0.20695945620536804 train_accuracy:0.9164823293685913 train_auc_1:0.9163408279418945 val_loss:0.2820662260055542 val_accuracy:0.8861878514289856 val_auc_1:0.8618767261505127 ]
100%|██████████| 29/29 [00:01<00:00, 27.02it/s, epoch: 35/40 - train_loss:0.2142862230539322 train_accuracy:0.9129849076271057 train_auc_1:0.9152473211288452 val_loss:0.2856462597846985 val_accuracy:0.8806629776954651 val_auc_1:0.8672426342964172 ]
100%|██████████| 29/29 [00:01<00:00, 27.22it/s, epoch: 36/40 - train_loss:0.18980328738689423 train_accuracy:0.92578125 train_auc_1:0.9262045621871948 val_loss:0.2957724332809448 val_accuracy:0.8850829005241394 val_auc_1:0.855277955532074 ]
100%|██████████| 29/29 [00:01<00:00, 27.44it/s, epoch: 37/40 - train_loss:0.2022796869277954 train_accuracy:0.9156526327133179 train_auc_1:0.9175050854682922 val_loss:0.28874075412750244 val_accuracy:0.8828729391098022 val_auc_1:0.8578150272369385 ]
100%|██████████| 29/29 [00:01<00:00, 26.17it/s, epoch: 38/40 - train_loss:0.20826201140880585 train_accuracy:0.917588472366333 train_auc_1:0.9161174893379211 val_loss:0.28712138533592224 val_accuracy:0.889502763748169 val_auc_1:0.8563125729560852 ]
100%|██████████| 29/29 [00:01<00:00, 28.39it/s, epoch: 39/40 - train_loss:0.20791961252689362 train_accuracy:0.9195243120193481 train_auc_1:0.9068150520324707 val_loss:0.28275591135025024 val_accuracy:0.8828729391098022 val_auc_1:0.8621078133583069 ]
100%|██████████| 29/29 [00:01<00:00, 26.34it/s, epoch: 40/40 - train_loss:0.2018997073173523 train_accuracy:0.9161931872367859 train_auc_1:0.9205420613288879 val_loss:0.2885696589946747 val_accuracy:0.8839778900146484 val_auc_1:0.8595322370529175 ]
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0aa04c0>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bcd0>}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f13d0ac8430>}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874bc10>, PYURuntime(bob): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874b400>}, aligned=True), VDataFrame(partitions={PYURuntime(alice): <secretflow.data.partition.pandas.partition.PdPartition object at 0x7f136874be20>}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 57815, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f1368738e20>}
100%|██████████| 29/29 [00:03<00:00, 9.21it/s, epoch: 1/40 - train_loss:0.4289693236351013 train_accuracy:0.841261088848114 train_auc_1:0.5522745847702026 val_loss:0.4278267025947571 val_accuracy:0.8729282021522522 val_auc_1:0.596114456653595 ]
100%|██████████| 29/29 [00:01<00:00, 27.58it/s, epoch: 2/40 - train_loss:0.3457517921924591 train_accuracy:0.8894886374473572 train_auc_1:0.5915822982788086 val_loss:0.36710259318351746 val_accuracy:0.8729282021522522 val_auc_1:0.6552668809890747 ]
100%|██████████| 29/29 [00:01<00:00, 27.34it/s, epoch: 3/40 - train_loss:0.33160045742988586 train_accuracy:0.8857758641242981 train_auc_1:0.6996316909790039 val_loss:0.3565421402454376 val_accuracy:0.8729282021522522 val_auc_1:0.6939350962638855 ]
100%|██████████| 29/29 [00:01<00:00, 26.56it/s, epoch: 4/40 - train_loss:0.3203859031200409 train_accuracy:0.8879978060722351 train_auc_1:0.7228385210037231 val_loss:0.34700706601142883 val_accuracy:0.8729282021522522 val_auc_1:0.7337919473648071 ]
100%|██████████| 29/29 [00:01<00:00, 27.51it/s, epoch: 5/40 - train_loss:0.29412248730659485 train_accuracy:0.8941271305084229 train_auc_1:0.7773510217666626 val_loss:0.3330070972442627 val_accuracy:0.8729282021522522 val_auc_1:0.7747275233268738 ]
100%|██████████| 29/29 [00:01<00:00, 26.64it/s, epoch: 6/40 - train_loss:0.284542053937912 train_accuracy:0.8840909004211426 train_auc_1:0.8406177759170532 val_loss:0.3239462077617645 val_accuracy:0.8729282021522522 val_auc_1:0.8016015291213989 ]
100%|██████████| 29/29 [00:01<00:00, 28.29it/s, epoch: 7/40 - train_loss:0.2678506672382355 train_accuracy:0.8896570801734924 train_auc_1:0.8536190986633301 val_loss:0.2891094386577606 val_accuracy:0.8773480653762817 val_auc_1:0.8471270799636841 ]
100%|██████████| 29/29 [00:01<00:00, 27.18it/s, epoch: 8/40 - train_loss:0.24808000028133392 train_accuracy:0.9051136374473572 train_auc_1:0.8591896891593933 val_loss:0.29273611307144165 val_accuracy:0.8718231916427612 val_auc_1:0.8348816633224487 ]
100%|██████████| 29/29 [00:01<00:00, 26.80it/s, epoch: 9/40 - train_loss:0.252238392829895 train_accuracy:0.8917025923728943 train_auc_1:0.8791320323944092 val_loss:0.2933768332004547 val_accuracy:0.8773480653762817 val_auc_1:0.8507869243621826 ]
100%|██████████| 29/29 [00:01<00:00, 27.14it/s, epoch: 10/40 - train_loss:0.24061305820941925 train_accuracy:0.8982300758361816 train_auc_1:0.8796927332878113 val_loss:0.28491833806037903 val_accuracy:0.8784530162811279 val_auc_1:0.843962550163269 ]
100%|██████████| 29/29 [00:01<00:00, 27.45it/s, epoch: 11/40 - train_loss:0.24184739589691162 train_accuracy:0.9035560488700867 train_auc_1:0.8807085752487183 val_loss:0.29128599166870117 val_accuracy:0.8773480653762817 val_auc_1:0.855217456817627 ]
100%|██████████| 29/29 [00:01<00:00, 28.02it/s, epoch: 12/40 - train_loss:0.24324959516525269 train_accuracy:0.8994318246841431 train_auc_1:0.8777059316635132 val_loss:0.2829277217388153 val_accuracy:0.8762431144714355 val_auc_1:0.8485085368156433 ]
100%|██████████| 29/29 [00:01<00:00, 28.26it/s, epoch: 13/40 - train_loss:0.24976494908332825 train_accuracy:0.9017045497894287 train_auc_1:0.8789190649986267 val_loss:0.3205900490283966 val_accuracy:0.8762431144714355 val_auc_1:0.8470830917358398 ]
100%|██████████| 29/29 [00:01<00:00, 27.93it/s, epoch: 14/40 - train_loss:0.25038978457450867 train_accuracy:0.9027478694915771 train_auc_1:0.87300044298172 val_loss:0.2858302593231201 val_accuracy:0.8817679286003113 val_auc_1:0.8585360646247864 ]
100%|██████████| 29/29 [00:01<00:00, 28.00it/s, epoch: 15/40 - train_loss:0.22572343051433563 train_accuracy:0.9065194129943848 train_auc_1:0.8839739561080933 val_loss:0.27690911293029785 val_accuracy:0.8795580267906189 val_auc_1:0.8565822839736938 ]
100%|██████████| 29/29 [00:01<00:00, 26.32it/s, epoch: 16/40 - train_loss:0.21374236047267914 train_accuracy:0.9098557829856873 train_auc_1:0.8846680521965027 val_loss:0.28506413102149963 val_accuracy:0.8817679286003113 val_auc_1:0.8479802012443542 ]
100%|██████████| 29/29 [00:01<00:00, 27.78it/s, epoch: 17/40 - train_loss:0.2231924682855606 train_accuracy:0.9081858396530151 train_auc_1:0.8896657228469849 val_loss:0.28099507093429565 val_accuracy:0.8817679286003113 val_auc_1:0.8554320335388184 ]
100%|██████████| 29/29 [00:01<00:00, 27.45it/s, epoch: 18/40 - train_loss:0.21955524384975433 train_accuracy:0.9089439511299133 train_auc_1:0.8989342451095581 val_loss:0.2811129093170166 val_accuracy:0.8795580267906189 val_auc_1:0.8558282852172852 ]
100%|██████████| 29/29 [00:01<00:00, 28.06it/s, epoch: 19/40 - train_loss:0.2427460104227066 train_accuracy:0.90625 train_auc_1:0.8821660280227661 val_loss:0.2881392240524292 val_accuracy:0.8762431144714355 val_auc_1:0.853461742401123 ]
100%|██████████| 29/29 [00:01<00:00, 27.76it/s, epoch: 20/40 - train_loss:0.22428719699382782 train_accuracy:0.9095686078071594 train_auc_1:0.8950278759002686 val_loss:0.27665039896965027 val_accuracy:0.8773480653762817 val_auc_1:0.859268069267273 ]
100%|██████████| 29/29 [00:01<00:00, 28.75it/s, epoch: 21/40 - train_loss:0.23987431824207306 train_accuracy:0.8982300758361816 train_auc_1:0.8865035772323608 val_loss:0.2787191569805145 val_accuracy:0.8795580267906189 val_auc_1:0.8540340662002563 ]
100%|██████████| 29/29 [00:01<00:00, 28.38it/s, epoch: 22/40 - train_loss:0.23535579442977905 train_accuracy:0.9059734344482422 train_auc_1:0.868804931640625 val_loss:0.2837145924568176 val_accuracy:0.8850829005241394 val_auc_1:0.852718710899353 ]
100%|██████████| 29/29 [00:01<00:00, 26.92it/s, epoch: 23/40 - train_loss:0.22587361931800842 train_accuracy:0.9102272987365723 train_auc_1:0.9052188396453857 val_loss:0.31282860040664673 val_accuracy:0.8751381039619446 val_auc_1:0.8482223749160767 ]
100%|██████████| 29/29 [00:01<00:00, 28.01it/s, epoch: 24/40 - train_loss:0.23454731702804565 train_accuracy:0.9004424810409546 train_auc_1:0.8923315405845642 val_loss:0.2845218777656555 val_accuracy:0.8817679286003113 val_auc_1:0.8535387516021729 ]
100%|██████████| 29/29 [00:01<00:00, 27.66it/s, epoch: 25/40 - train_loss:0.21877069771289825 train_accuracy:0.9135237336158752 train_auc_1:0.8983669877052307 val_loss:0.2817881405353546 val_accuracy:0.8872928023338318 val_auc_1:0.8562355637550354 ]
100%|██████████| 29/29 [00:01<00:00, 28.56it/s, epoch: 26/40 - train_loss:0.23317019641399384 train_accuracy:0.9047897458076477 train_auc_1:0.8926054835319519 val_loss:0.2959703505039215 val_accuracy:0.8806629776954651 val_auc_1:0.8549916744232178 ]
100%|██████████| 29/29 [00:01<00:00, 27.97it/s, epoch: 27/40 - train_loss:0.22137722373008728 train_accuracy:0.9076327681541443 train_auc_1:0.9086238145828247 val_loss:0.2853763699531555 val_accuracy:0.8806629776954651 val_auc_1:0.8527958989143372 ]
100%|██████████| 29/29 [00:00<00:00, 30.42it/s, epoch: 28/40 - train_loss:0.21273373067378998 train_accuracy:0.915409505367279 train_auc_1:0.9035125970840454 val_loss:0.2835741639137268 val_accuracy:0.8773480653762817 val_auc_1:0.8541442155838013 ]
100%|██████████| 29/29 [00:00<00:00, 30.00it/s, epoch: 29/40 - train_loss:0.2242117077112198 train_accuracy:0.9054203629493713 train_auc_1:0.9061750173568726 val_loss:0.28261181712150574 val_accuracy:0.8806629776954651 val_auc_1:0.8538470268249512 ]
100%|██████████| 29/29 [00:01<00:00, 28.50it/s, epoch: 30/40 - train_loss:0.23390451073646545 train_accuracy:0.90625 train_auc_1:0.8956843614578247 val_loss:0.2910405099391937 val_accuracy:0.8773480653762817 val_auc_1:0.856081485748291 ]
100%|██████████| 29/29 [00:01<00:00, 28.54it/s, epoch: 31/40 - train_loss:0.21303458511829376 train_accuracy:0.9097546935081482 train_auc_1:0.9092994928359985 val_loss:0.2907060384750366 val_accuracy:0.8795580267906189 val_auc_1:0.854667067527771 ]
100%|██████████| 29/29 [00:01<00:00, 28.50it/s, epoch: 32/40 - train_loss:0.2017047256231308 train_accuracy:0.9213067889213562 train_auc_1:0.9215606451034546 val_loss:0.2792257070541382 val_accuracy:0.8784530162811279 val_auc_1:0.8622398972511292 ]
100%|██████████| 29/29 [00:01<00:00, 27.78it/s, epoch: 33/40 - train_loss:0.21978729963302612 train_accuracy:0.90625 train_auc_1:0.9032835364341736 val_loss:0.2790915071964264 val_accuracy:0.8828729391098022 val_auc_1:0.8596147894859314 ]
100%|██████████| 29/29 [00:01<00:00, 28.67it/s, epoch: 34/40 - train_loss:0.2279532253742218 train_accuracy:0.900053858757019 train_auc_1:0.9082023501396179 val_loss:0.28917449712753296 val_accuracy:0.8850829005241394 val_auc_1:0.8572481870651245 ]
100%|██████████| 29/29 [00:01<00:00, 28.98it/s, epoch: 35/40 - train_loss:0.20137761533260345 train_accuracy:0.9197198152542114 train_auc_1:0.9088310599327087 val_loss:0.2941972017288208 val_accuracy:0.8839778900146484 val_auc_1:0.860407292842865 ]
100%|██████████| 29/29 [00:01<00:00, 27.61it/s, epoch: 36/40 - train_loss:0.21837779879570007 train_accuracy:0.9113685488700867 train_auc_1:0.9138096570968628 val_loss:0.31745481491088867 val_accuracy:0.8795580267906189 val_auc_1:0.8539350032806396 ]
100%|██████████| 29/29 [00:01<00:00, 27.85it/s, epoch: 37/40 - train_loss:0.22294917702674866 train_accuracy:0.9103982448577881 train_auc_1:0.9038949012756348 val_loss:0.28447607159614563 val_accuracy:0.8806629776954651 val_auc_1:0.8571106195449829 ]
100%|██████████| 29/29 [00:01<00:00, 28.83it/s, epoch: 38/40 - train_loss:0.21166759729385376 train_accuracy:0.9135237336158752 train_auc_1:0.922845184803009 val_loss:0.3054659366607666 val_accuracy:0.8872928023338318 val_auc_1:0.8537644147872925 ]
100%|██████████| 29/29 [00:00<00:00, 29.38it/s, epoch: 39/40 - train_loss:0.21334536373615265 train_accuracy:0.9162057638168335 train_auc_1:0.9161925315856934 val_loss:0.30080586671829224 val_accuracy:0.8872928023338318 val_auc_1:0.8560484647750854 ]
100%|██████████| 29/29 [00:01<00:00, 27.09it/s, epoch: 40/40 - train_loss:0.21065743267536163 train_accuracy:0.9156249761581421 train_auc_1:0.9142652750015259 val_loss:0.28072354197502136 val_accuracy:0.8872928023338318 val_auc_1:0.8580352067947388 ]
[1.012537411848704, 0.7321080048878987]
[6]:
import matplotlib.pyplot as plt

for history in histories:
    plt.plot(history['train_auc_1'])
    plt.plot(history['val_auc_1'])
plt.title('Model Area Under Curve')
plt.ylabel('Area Under Curve')
plt.xlabel('Epoch')
plt.legend(
    ['origin_train', 'origin_val', 'pipeline_train', 'pipeline_val'], loc='lower right'
)
plt.show()
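As the plot shows, the validation AUC of both models fluctuates between roughly 0.85 and 0.86, so pipeline parallelism has little effect on model quality for this task. Meanwhile, the training time drops from about 1.01 minutes to about 0.73 minutes (the two values printed before the plotting cell).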
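The comparison can be checked directly from the numbers printed in this notebook. The sketch below is only an illustration: it copies the two training times and the `val_auc_1` values of epochs 36 to 40 from the logs above, and the helper names `relative_saving` and `mean` are hypothetical, not part of SecretFlow. It assumes the times are in minutes and ordered as in the plot legend (without pipeline first, with pipeline second).

```python
# Training times copied from the printed output; assumed to be in minutes
# and ordered [without pipeline, with pipeline].
training_time = [1.012537411848704, 0.7321080048878987]

# val_auc_1 for epochs 36-40 of each run, copied from the training logs.
val_auc_tail = {
    "origin": [
        0.855277955532074,
        0.8578150272369385,
        0.8563125729560852,
        0.8621078133583069,
        0.8595322370529175,
    ],
    "pipeline": [
        0.8539350032806396,
        0.8571106195449829,
        0.8537644147872925,
        0.8560484647750854,
        0.8580352067947388,
    ],
}


def relative_saving(baseline, optimized):
    """Fraction of the baseline time that was saved (hypothetical helper)."""
    return (baseline - optimized) / baseline


def mean(values):
    """Arithmetic mean of a non-empty list (hypothetical helper)."""
    return sum(values) / len(values)


saving = relative_saving(training_time[0], training_time[1])
speedup = training_time[0] / training_time[1]
mean_auc = {name: mean(values) for name, values in val_auc_tail.items()}
auc_gap = mean_auc["origin"] - mean_auc["pipeline"]

print(f"time saved: {saving:.1%}, speedup: {speedup:.2f}x")
print(f"mean val AUC (epochs 36-40): {mean_auc}, gap: {auc_gap:.4f}")
```

A single run on a local simulation is noisy, so these figures are best read as a rough indication rather than a benchmark.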
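Summary#
This example showed how to use pipeline parallelism in SecretFlow split learning. Usage is the same as for ordinary split learning: when defining the SLModel, specify strategy='pipeline' and set pipeline_num.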