SplitRec: Using Communication Compression in SecretFlow Split Learning#
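The following code is for demonstration only. Do not use it directly in a production environment.
This example builds on the "Split Learning: Bank Marketing" tutorial; we recommend going through that tutorial first.
In split learning, the model is split across multiple devices, so during training the parties must exchange features and gradients many times, which incurs a high network communication cost. To reduce the amount of data transferred, the data can be compressed.
SecretFlow provides Compressor for compressing the data exchanged in split learning. It also provides several base classes on which you can implement your own compression algorithms.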
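To give a feel for what a compression algorithm of this kind does, here is a minimal, self-contained sketch of top-k sparsification written with NumPy only. The class name `TopKSketch` and its `compress`/`decompress` methods are hypothetical and chosen for illustration; this is not SecretFlow's base-class interface, which should be checked in the library's own documentation.

```python
import numpy as np


class TopKSketch:
    """Hypothetical compressor: keep only the largest-magnitude entries."""

    def __init__(self, keep_ratio: float):
        if not 0.0 < keep_ratio <= 1.0:
            raise ValueError("keep_ratio must be in (0, 1]")
        self.keep_ratio = keep_ratio

    def compress(self, data: np.ndarray):
        flat = data.ravel()
        k = max(1, int(round(flat.size * self.keep_ratio)))
        # Indices of the k entries with the largest absolute value.
        idx = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
        idx = np.sort(idx).astype(np.int32)
        # Only the indices, the kept values and the shape need to be sent.
        return idx, flat[idx], data.shape

    def decompress(self, packed):
        idx, values, shape = packed
        flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
        flat[idx] = values  # every dropped entry is restored as zero
        return flat.reshape(shape)


# Toy tensor standing in for a hidden-layer output (placeholder values).
hidden = np.array(
    [[0.1, -2.0, 0.0, 0.5], [3.0, 0.2, -0.3, 0.05]], dtype=np.float32
)
comp = TopKSketch(keep_ratio=0.25)
packed = comp.compress(hidden)
restored = comp.decompress(packed)
print(restored)
```

The sketch trades accuracy for size: the receiver sees zeros wherever an entry was dropped, so the keep ratio controls how much information survives the transfer.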
Let's try out some of these algorithms. First, we create two parties, alice and bob, in the SecretFlow environment.
[1]:
import secretflow as sf
sf.shutdown()
sf.init(['alice', 'bob'], address='local')
alice, bob = sf.PYU('alice'), sf.PYU('bob')
2023-08-16 01:43:59,294 WARNING services.py:1732 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=3.92gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2023-08-16 01:43:59,444 INFO worker.py:1538 -- Started a local Ray instance.
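Next, we prepare the data to learn from.
We follow the data preparation and processing steps from "Split Learning: Bank Marketing": download the bank marketing dataset and preprocess it. alice and bob play exactly the same roles as in that tutorial: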
[2]:
from secretflow.utils.simulation.datasets import load_bank_marketing
from secretflow.preprocessing.scaler import MinMaxScaler
from secretflow.preprocessing.encoder import LabelEncoder
from secretflow.data.split import train_test_split
random_state = 1234
data = load_bank_marketing(parts={alice: (0, 4), bob: (4, 16)}, axis=1)
label = load_bank_marketing(parts={alice: (16, 17)}, axis=1)
encoder = LabelEncoder()
data['job'] = encoder.fit_transform(data['job'])
data['marital'] = encoder.fit_transform(data['marital'])
data['education'] = encoder.fit_transform(data['education'])
data['default'] = encoder.fit_transform(data['default'])
data['housing'] = encoder.fit_transform(data['housing'])
data['loan'] = encoder.fit_transform(data['loan'])
data['contact'] = encoder.fit_transform(data['contact'])
data['poutcome'] = encoder.fit_transform(data['poutcome'])
data['month'] = encoder.fit_transform(data['month'])
label = encoder.fit_transform(label)
scaler = MinMaxScaler()
data = scaler.fit_transform(data)
train_data, test_data = train_test_split(
    data, train_size=0.8, random_state=random_state
)
train_label, test_label = train_test_split(
    label, train_size=0.8, random_state=random_state
)
(_run pid=27337) /usr/local/lib/python3.8/site-packages/sklearn/base.py:443: UserWarning: X has feature names, but MinMaxScaler was fitted without feature names
(_run pid=27337) warnings.warn(
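The `parts` argument in the cell above splits the table by column: alice holds columns 0 to 3, bob holds columns 4 to 15, and column 16 (the label) stays with alice. The following NumPy sketch reproduces the same slicing on a toy array. It assumes the `(start, end)` ranges are half-open, like Python slices, which matches the input widths of 4 and 12 used for the base models in this tutorial. The array contents are placeholders, not the real dataset.

```python
import numpy as np

# Placeholder table: 5 rows, 16 feature columns plus 1 label column.
n_rows, n_cols = 5, 17
table = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)

# Same column ranges as in the tutorial, applied as ordinary slices.
parts = {"alice": (0, 4), "bob": (4, 16)}
features = {party: table[:, start:end] for party, (start, end) in parts.items()}
label = table[:, 16:17]  # the label column is held by alice only

for party, block in features.items():
    print(party, block.shape)
print("label", label.shape)
```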
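Next, we create the federated model. As before, we reuse the modeling from "Split Learning: Bank Marketing" to build base_model and fuse_model, and then define an SLModel for training: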
[3]:
def create_base_model(input_dim, output_dim, name='base_model'):
    # Create model
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        model = keras.Sequential(
            [
                keras.Input(shape=input_dim),
                layers.Dense(100, activation="relu"),
                layers.Dense(output_dim, activation="relu"),
            ]
        )
        # Compile model
        model.summary()
        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


# prepare model
hidden_size = 64
model_base_alice = create_base_model(4, hidden_size)
model_base_bob = create_base_model(12, hidden_size)


def create_fuse_model(input_dim, output_dim, party_nums, name='fuse_model'):
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        # input
        input_layers = []
        for i in range(party_nums):
            input_layers.append(
                keras.Input(
                    input_dim,
                )
            )
        merged_layer = layers.concatenate(input_layers)
        fuse_layer = layers.Dense(64, activation='relu')(merged_layer)
        output = layers.Dense(output_dim, activation='sigmoid')(fuse_layer)
        model = keras.Model(inputs=input_layers, outputs=output)
        model.summary()
        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


model_fuse = create_fuse_model(input_dim=hidden_size, party_nums=2, output_dim=1)
base_model_dict = {alice: model_base_alice, bob: model_base_bob}

from secretflow.ml.nn import SLModel

sl_model_origin = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
)
2023-08-16 01:44:03.512175: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-08-16 01:44:04.209189: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-08-16 01:44:04.209381: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-08-16 01:44:04.209397: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
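With these models, the tensors that cross the network at each training step are, roughly, the `hidden_size`-wide output of a base model that does not sit on the label party (here bob, since `device_y` is alice) and a gradient of the same shape sent back. The following is a back-of-the-envelope estimate of that payload. It assumes float32 tensors and a batch size of 128 (the value passed to `fit` later in this tutorial), and it ignores serialization overhead. The function name is hypothetical, and whether a given compressor is applied in both directions is also an assumption.

```python
def payload_bytes(batch_size: int, hidden_size: int, bits_per_value: int) -> int:
    """Bytes needed for one batch_size x hidden_size tensor at a given bit width."""
    return batch_size * hidden_size * bits_per_value // 8


batch_size, hidden_size = 128, 64
fp32 = payload_bytes(batch_size, hidden_size, 32)
int8 = payload_bytes(batch_size, hidden_size, 8)

# One forward activation plus one backward gradient for a remote party, per step.
print(f"float32 round trip: {2 * fp32} bytes")
print(f"8-bit round trip:   {2 * int8} bytes (before any scale or offset metadata)")
print(f"reduction factor:   {fp32 / int8:.1f}x")
```

Under these assumptions an 8-bit representation carries a quarter of the float32 payload, which is the kind of saving the compressors in the next section aim for.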
Using Communication Compression Algorithms#
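SecretFlow provides Compressor, which implements a range of basic communication compression algorithms that can be used directly.
Import the compression algorithm you want and instantiate it, then pass the instance as an argument when defining the SLModel; communication compression is then applied during training.
We take QuantizedFP as an example. This algorithm quantizes floating-point numbers to 8 bits to reduce transmission cost.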
[4]:
from secretflow.utils.compressor import QuantizedFP

qfp = QuantizedFP()
sl_model_compress = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
    compressor=qfp,  # pass the instantiated compressor here
)
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
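The idea of quantizing floats to 8 bits can be illustrated with a simple min-max (affine) scheme in NumPy. This is a generic sketch for intuition only: it is an assumption that it resembles what `QuantizedFP` does internally, and the function names `quantize_uint8` and `dequantize_uint8` are hypothetical.

```python
import numpy as np


def quantize_uint8(x: np.ndarray):
    """Map float values linearly onto the 256 levels of an unsigned byte."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale


def dequantize_uint8(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Approximate inverse of quantize_uint8."""
    return q.astype(np.float32) * scale + lo


# Random tensor with the shape of one batch of hidden outputs (128 x 64).
rng = np.random.default_rng(0)
hidden = rng.standard_normal((128, 64)).astype(np.float32)

q, lo, scale = quantize_uint8(hidden)
restored = dequantize_uint8(q, lo, scale)
max_err = float(np.abs(restored - hidden).max())

print("bytes before:", hidden.nbytes, "bytes after:", q.nbytes)
print("max abs error:", max_err, "half a quantization step:", scale / 2)
```

The payload shrinks by a factor of four relative to float32, and the reconstruction error is bounded by about half a quantization step, which is why a modest loss in model quality is the expected price of this kind of compression.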
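We train both the model without communication compression and the model with quantization compression, raising the number of training epochs to 40, and see how they compare.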
[5]:
histories = []
for sl_model in [sl_model_origin, sl_model_compress]:
    history = sl_model.fit(
        train_data,
        train_label,
        validation_data=(test_data, test_label),
        epochs=40,
        batch_size=128,
        shuffle=True,
        verbose=1,
        validation_freq=1,
    )
    histories.append(history)
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4db5310>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de31c0>)}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675a60>)}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f3769f921f0>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de3190>)}, aligned=True), VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675dc0>)}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 11819, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f37ec6ec6d0>}
(PYUSLTFModel pid=28114) 2023-08-16 01:44:11.224977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
(PYUSLTFModel pid=28114) 2023-08-16 01:44:11.225041: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
(PYUSLTFModel pid=28114) Model: "sequential"
(PYUSLTFModel pid=28114) _________________________________________________________________
(PYUSLTFModel pid=28114)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=28114) =================================================================
(PYUSLTFModel pid=28114)  dense (Dense)               (None, 100)               500
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114) =================================================================
(PYUSLTFModel pid=28114) Total params: 6,964
(PYUSLTFModel pid=28114) Trainable params: 6,964
(PYUSLTFModel pid=28114) Non-trainable params: 0
(PYUSLTFModel pid=28114) _________________________________________________________________
(PYUSLTFModel pid=28114) Model: "model"
(PYUSLTFModel pid=28114) __________________________________________________________________________________________________
(PYUSLTFModel pid=28114)  Layer (type)                 Output Shape       Param #    Connected to
(PYUSLTFModel pid=28114) ==================================================================================================
(PYUSLTFModel pid=28114)  input_2 (InputLayer)         [(None, 64)]       0          []
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  input_3 (InputLayer)         [(None, 64)]       0          []
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  concatenate (Concatenate)    (None, 128)        0          ['input_2[0][0]',
(PYUSLTFModel pid=28114)                                                              'input_3[0][0]']
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  dense_2 (Dense)              (None, 64)         8256       ['concatenate[0][0]']
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  dense_3 (Dense)              (None, 1)          65         ['dense_2[0][0]']
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114) ==================================================================================================
(PYUSLTFModel pid=28114) Total params: 8,321
(PYUSLTFModel pid=28114) Trainable params: 8,321
(PYUSLTFModel pid=28114) Non-trainable params: 0
(PYUSLTFModel pid=28114) __________________________________________________________________________________________________
(PYUSLTFModel pid=28127) Model: "sequential"
(PYUSLTFModel pid=28127) _________________________________________________________________
(PYUSLTFModel pid=28127)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=28127) =================================================================
(PYUSLTFModel pid=28127)  dense (Dense)               (None, 100)               1300
(PYUSLTFModel pid=28127)
(PYUSLTFModel pid=28127)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=28127)
(PYUSLTFModel pid=28127) =================================================================
(PYUSLTFModel pid=28127) Total params: 7,764
(PYUSLTFModel pid=28127) Trainable params: 7,764
(PYUSLTFModel pid=28127) Non-trainable params: 0
(PYUSLTFModel pid=28127) _________________________________________________________________
100%|██████████| 29/29 [00:05<00:00, 4.90it/s, epoch: 1/40 - train_loss:0.4123900532722473 train_accuracy:0.8816371560096741 train_auc_1:0.5304562449455261 val_loss:0.3779788911342621 val_accuracy:0.8729282021522522 val_auc_1:0.6028343439102173 ]
100%|██████████| 29/29 [00:00<00:00, 29.61it/s, epoch: 2/40 - train_loss:0.346932053565979 train_accuracy:0.8819137215614319 train_auc_1:0.6658823490142822 val_loss:0.36548909544944763 val_accuracy:0.8729282021522522 val_auc_1:0.6796367764472961 ]
100%|██████████| 29/29 [00:00<00:00, 29.64it/s, epoch: 3/40 - train_loss:0.3372674584388733 train_accuracy:0.8816371560096741 train_auc_1:0.7067811489105225 val_loss:0.35295170545578003 val_accuracy:0.8729282021522522 val_auc_1:0.7270445823669434 ]
100%|██████████| 29/29 [00:03<00:00, 7.81it/s, epoch: 4/40 - train_loss:0.32340219616889954 train_accuracy:0.8769886493682861 train_auc_1:0.7885534763336182 val_loss:0.33063772320747375 val_accuracy:0.8729282021522522 val_auc_1:0.7887451648712158 ]
100%|██████████| 29/29 [00:00<00:00, 30.04it/s, epoch: 5/40 - train_loss:0.26995792984962463 train_accuracy:0.8907632827758789 train_auc_1:0.838128924369812 val_loss:0.3076745867729187 val_accuracy:0.870718240737915 val_auc_1:0.8215299844741821 ]
100%|██████████| 29/29 [00:00<00:00, 29.33it/s, epoch: 6/40 - train_loss:0.2533552348613739 train_accuracy:0.8924225568771362 train_auc_1:0.8705887794494629 val_loss:0.29530927538871765 val_accuracy:0.8773480653762817 val_auc_1:0.8317886590957642 ]
100%|██████████| 29/29 [00:00<00:00, 30.32it/s, epoch: 7/40 - train_loss:0.24668139219284058 train_accuracy:0.8990597128868103 train_auc_1:0.8558321595191956 val_loss:0.28804725408554077 val_accuracy:0.8839778900146484 val_auc_1:0.8480352163314819 ]
100%|██████████| 29/29 [00:00<00:00, 29.50it/s, epoch: 8/40 - train_loss:0.23031719028949738 train_accuracy:0.9137167930603027 train_auc_1:0.8611728549003601 val_loss:0.3139592111110687 val_accuracy:0.8762431144714355 val_auc_1:0.846015453338623 ]
100%|██████████| 29/29 [00:00<00:00, 30.08it/s, epoch: 9/40 - train_loss:0.23515202105045319 train_accuracy:0.900053858757019 train_auc_1:0.8805654048919678 val_loss:0.28104230761528015 val_accuracy:0.8795580267906189 val_auc_1:0.8522399663925171 ]
100%|██████████| 29/29 [00:01<00:00, 26.20it/s, epoch: 10/40 - train_loss:0.241227924823761 train_accuracy:0.9048295617103577 train_auc_1:0.8812973499298096 val_loss:0.2837042808532715 val_accuracy:0.8762431144714355 val_auc_1:0.8458613157272339 ]
100%|██████████| 29/29 [00:00<00:00, 29.39it/s, epoch: 11/40 - train_loss:0.24319183826446533 train_accuracy:0.8978987336158752 train_auc_1:0.8969712257385254 val_loss:0.2814089357852936 val_accuracy:0.8806629776954651 val_auc_1:0.8504238128662109 ]
100%|██████████| 29/29 [00:00<00:00, 29.38it/s, epoch: 12/40 - train_loss:0.23649084568023682 train_accuracy:0.9022090435028076 train_auc_1:0.8886930346488953 val_loss:0.2808994650840759 val_accuracy:0.8806629776954651 val_auc_1:0.8507044315338135 ]
100%|██████████| 29/29 [00:00<00:00, 29.70it/s, epoch: 13/40 - train_loss:0.2257165014743805 train_accuracy:0.912446141242981 train_auc_1:0.8892974853515625 val_loss:0.2844206690788269 val_accuracy:0.8817679286003113 val_auc_1:0.8516015410423279 ]
100%|██████████| 29/29 [00:00<00:00, 29.98it/s, epoch: 14/40 - train_loss:0.2239973098039627 train_accuracy:0.9110991358757019 train_auc_1:0.8925117254257202 val_loss:0.27834010124206543 val_accuracy:0.8828729391098022 val_auc_1:0.8576555252075195 ]
100%|██████████| 29/29 [00:00<00:00, 30.14it/s, epoch: 15/40 - train_loss:0.22855830192565918 train_accuracy:0.9065265655517578 train_auc_1:0.9059909582138062 val_loss:0.27655595541000366 val_accuracy:0.8828729391098022 val_auc_1:0.8548376560211182 ]
100%|██████████| 29/29 [00:00<00:00, 29.99it/s, epoch: 16/40 - train_loss:0.23442411422729492 train_accuracy:0.8992456793785095 train_auc_1:0.8952087759971619 val_loss:0.29822733998298645 val_accuracy:0.8773480653762817 val_auc_1:0.8496862649917603 ]
100%|██████████| 29/29 [00:00<00:00, 30.22it/s, epoch: 17/40 - train_loss:0.22274373471736908 train_accuracy:0.9148706793785095 train_auc_1:0.8883383870124817 val_loss:0.2906903922557831 val_accuracy:0.8795580267906189 val_auc_1:0.8574408292770386 ]
100%|██████████| 29/29 [00:00<00:00, 30.02it/s, epoch: 18/40 - train_loss:0.23235483467578888 train_accuracy:0.908462405204773 train_auc_1:0.8890659809112549 val_loss:0.2833332121372223 val_accuracy:0.8784530162811279 val_auc_1:0.853417694568634 ]
100%|██████████| 29/29 [00:00<00:00, 30.11it/s, epoch: 19/40 - train_loss:0.21570773422718048 train_accuracy:0.9125000238418579 train_auc_1:0.9087664484977722 val_loss:0.28136932849884033 val_accuracy:0.8773480653762817 val_auc_1:0.852614164352417 ]
100%|██████████| 29/29 [00:00<00:00, 30.17it/s, epoch: 20/40 - train_loss:0.22992058098316193 train_accuracy:0.9043141603469849 train_auc_1:0.9015873670578003 val_loss:0.2777860164642334 val_accuracy:0.8861878514289856 val_auc_1:0.8568739891052246 ]
100%|██████████| 29/29 [00:00<00:00, 29.56it/s, epoch: 21/40 - train_loss:0.2279340922832489 train_accuracy:0.9051437973976135 train_auc_1:0.8971817493438721 val_loss:0.2807583212852478 val_accuracy:0.8751381039619446 val_auc_1:0.8563731908798218 ]
100%|██████████| 29/29 [00:00<00:00, 30.77it/s, epoch: 22/40 - train_loss:0.21565255522727966 train_accuracy:0.9123831987380981 train_auc_1:0.9007200002670288 val_loss:0.2829255759716034 val_accuracy:0.8861878514289856 val_auc_1:0.8530654907226562 ]
100%|██████████| 29/29 [00:01<00:00, 28.86it/s, epoch: 23/40 - train_loss:0.2037818878889084 train_accuracy:0.9172952771186829 train_auc_1:0.9117729067802429 val_loss:0.283769816160202 val_accuracy:0.8806629776954651 val_auc_1:0.8584370613098145 ]
100%|██████████| 29/29 [00:00<00:00, 29.94it/s, epoch: 24/40 - train_loss:0.20522674918174744 train_accuracy:0.9189712405204773 train_auc_1:0.9200270175933838 val_loss:0.2849152088165283 val_accuracy:0.8872928023338318 val_auc_1:0.8554760217666626 ]
100%|██████████| 29/29 [00:00<00:00, 29.87it/s, epoch: 25/40 - train_loss:0.21887396275997162 train_accuracy:0.9092133641242981 train_auc_1:0.8964812755584717 val_loss:0.28434526920318604 val_accuracy:0.8817679286003113 val_auc_1:0.8562740087509155 ]
100%|██████████| 29/29 [00:00<00:00, 30.06it/s, epoch: 26/40 - train_loss:0.21427837014198303 train_accuracy:0.9110991358757019 train_auc_1:0.9081460237503052 val_loss:0.2802579998970032 val_accuracy:0.8806629776954651 val_auc_1:0.8563676476478577 ]
100%|██████████| 29/29 [00:00<00:00, 29.95it/s, epoch: 27/40 - train_loss:0.21740949153900146 train_accuracy:0.9131637215614319 train_auc_1:0.900717556476593 val_loss:0.288614422082901 val_accuracy:0.8795580267906189 val_auc_1:0.8571600914001465 ]
100%|██████████| 29/29 [00:03<00:00, 7.90it/s, epoch: 28/40 - train_loss:0.20581252872943878 train_accuracy:0.9156526327133179 train_auc_1:0.9221852421760559 val_loss:0.2895594835281372 val_accuracy:0.8861878514289856 val_auc_1:0.853269100189209 ]
100%|██████████| 29/29 [00:00<00:00, 29.86it/s, epoch: 29/40 - train_loss:0.21550066769123077 train_accuracy:0.9065265655517578 train_auc_1:0.9204001426696777 val_loss:0.2893003225326538 val_accuracy:0.8773480653762817 val_auc_1:0.8543698191642761 ]
100%|██████████| 29/29 [00:00<00:00, 30.12it/s, epoch: 30/40 - train_loss:0.2129545956850052 train_accuracy:0.9103982448577881 train_auc_1:0.9062888622283936 val_loss:0.2805260717868805 val_accuracy:0.8861878514289856 val_auc_1:0.857974648475647 ]
100%|██████████| 29/29 [00:00<00:00, 30.07it/s, epoch: 31/40 - train_loss:0.21476531028747559 train_accuracy:0.9113636612892151 train_auc_1:0.9071635007858276 val_loss:0.28486552834510803 val_accuracy:0.8861878514289856 val_auc_1:0.8532305955886841 ]
100%|██████████| 29/29 [00:00<00:00, 30.47it/s, epoch: 32/40 - train_loss:0.21274054050445557 train_accuracy:0.9136363863945007 train_auc_1:0.9150363802909851 val_loss:0.28660014271736145 val_accuracy:0.8828729391098022 val_auc_1:0.8550137877464294 ]
100%|██████████| 29/29 [00:00<00:00, 29.68it/s, epoch: 33/40 - train_loss:0.19922088086605072 train_accuracy:0.9162057638168335 train_auc_1:0.925368070602417 val_loss:0.28454411029815674 val_accuracy:0.8839778900146484 val_auc_1:0.8589598536491394 ]
100%|██████████| 29/29 [00:01<00:00, 28.91it/s, epoch: 34/40 - train_loss:0.19305925071239471 train_accuracy:0.9264547228813171 train_auc_1:0.9245292544364929 val_loss:0.2927177846431732 val_accuracy:0.8850829005241394 val_auc_1:0.8632690906524658 ]
100%|██████████| 29/29 [00:00<00:00, 29.83it/s, epoch: 35/40 - train_loss:0.18927669525146484 train_accuracy:0.9245793223381042 train_auc_1:0.9269171953201294 val_loss:0.28897616267204285 val_accuracy:0.8839778900146484 val_auc_1:0.8587452173233032 ]
100%|██████████| 29/29 [00:01<00:00, 28.43it/s, epoch: 36/40 - train_loss:0.20300477743148804 train_accuracy:0.917640209197998 train_auc_1:0.921332836151123 val_loss:0.2856888175010681 val_accuracy:0.8817679286003113 val_auc_1:0.8576774597167969 ]
100%|██████████| 29/29 [00:01<00:00, 28.36it/s, epoch: 37/40 - train_loss:0.21484363079071045 train_accuracy:0.9117809534072876 train_auc_1:0.9073271155357361 val_loss:0.28348052501678467 val_accuracy:0.8795580267906189 val_auc_1:0.8578591346740723 ]
100%|██████████| 29/29 [00:00<00:00, 29.46it/s, epoch: 38/40 - train_loss:0.2097211331129074 train_accuracy:0.9109513163566589 train_auc_1:0.9167930483818054 val_loss:0.28846046328544617 val_accuracy:0.8828729391098022 val_auc_1:0.8501706123352051 ]
100%|██████████| 29/29 [00:00<00:00, 30.17it/s, epoch: 39/40 - train_loss:0.211452916264534 train_accuracy:0.9207974076271057 train_auc_1:0.9071869254112244 val_loss:0.2857007682323456 val_accuracy:0.8850829005241394 val_auc_1:0.8534011840820312 ]
100%|██████████| 29/29 [00:00<00:00, 29.72it/s, epoch: 40/40 - train_loss:0.1950661838054657 train_accuracy:0.9189712405204773 train_auc_1:0.9238503575325012 val_loss:0.287675678730011 val_accuracy:0.8839778900146484 val_auc_1:0.8539405465126038 ]
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4db5310>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de31c0>)}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675a60>)}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f3769f921f0>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de3190>)}, aligned=True), VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675dc0>)}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 50480, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f37c4624cd0>}
100%|██████████| 29/29 [00:03<00:00, 7.41it/s, epoch: 1/40 - train_loss:0.4217776954174042 train_accuracy:0.8659462332725525 train_auc_1:0.5435447692871094 val_loss:0.40626364946365356 val_accuracy:0.8729282021522522 val_auc_1:0.5905393362045288 ]
100%|██████████| 29/29 [00:01<00:00, 15.59it/s, epoch: 2/40 - train_loss:0.3423333764076233 train_accuracy:0.8874446749687195 train_auc_1:0.6285374164581299 val_loss:0.3637339770793915 val_accuracy:0.8729282021522522 val_auc_1:0.670577883720398 ]
100%|██████████| 29/29 [00:01<00:00, 16.02it/s, epoch: 3/40 - train_loss:0.31453219056129456 train_accuracy:0.8949353694915771 train_auc_1:0.6967648267745972 val_loss:0.35318124294281006 val_accuracy:0.8729282021522522 val_auc_1:0.7181453108787537 ]
100%|██████████| 29/29 [00:01<00:00, 15.70it/s, epoch: 4/40 - train_loss:0.2924026548862457 train_accuracy:0.8968473672866821 train_auc_1:0.771354079246521 val_loss:0.3476685583591461 val_accuracy:0.8729282021522522 val_auc_1:0.7567088603973389 ]
100%|██████████| 29/29 [00:01<00:00, 15.96it/s, epoch: 5/40 - train_loss:0.3236430585384369 train_accuracy:0.8690732717514038 train_auc_1:0.8049758076667786 val_loss:0.32425957918167114 val_accuracy:0.8729282021522522 val_auc_1:0.8028783798217773 ]
100%|██████████| 29/29 [00:01<00:00, 15.77it/s, epoch: 6/40 - train_loss:0.2683410346508026 train_accuracy:0.8920454382896423 train_auc_1:0.8347899317741394 val_loss:0.3059132695198059 val_accuracy:0.8696132302284241 val_auc_1:0.8184260129928589 ]
100%|██████████| 29/29 [00:01<00:00, 15.99it/s, epoch: 7/40 - train_loss:0.24226166307926178 train_accuracy:0.9022727012634277 train_auc_1:0.850990891456604 val_loss:0.30843329429626465 val_accuracy:0.8729282021522522 val_auc_1:0.832201361656189 ]
100%|██████████| 29/29 [00:01<00:00, 15.66it/s, epoch: 8/40 - train_loss:0.23420202732086182 train_accuracy:0.9053977131843567 train_auc_1:0.8667846322059631 val_loss:0.2918694317340851 val_accuracy:0.8795580267906189 val_auc_1:0.8382883071899414 ]
100%|██████████| 29/29 [00:01<00:00, 15.87it/s, epoch: 9/40 - train_loss:0.24281850457191467 train_accuracy:0.8993362784385681 train_auc_1:0.8600778579711914 val_loss:0.28592929244041443 val_accuracy:0.8773480653762817 val_auc_1:0.8522564172744751 ]
100%|██████████| 29/29 [00:01<00:00, 16.07it/s, epoch: 10/40 - train_loss:0.25411662459373474 train_accuracy:0.8985795378684998 train_auc_1:0.8763052225112915 val_loss:0.27862876653671265 val_accuracy:0.8795580267906189 val_auc_1:0.8518161773681641 ]
100%|██████████| 29/29 [00:01<00:00, 15.97it/s, epoch: 11/40 - train_loss:0.2467927783727646 train_accuracy:0.9008620977401733 train_auc_1:0.8637750148773193 val_loss:0.27538853883743286 val_accuracy:0.8850829005241394 val_auc_1:0.8585635423660278 ]
100%|██████████| 29/29 [00:01<00:00, 15.81it/s, epoch: 12/40 - train_loss:0.24046260118484497 train_accuracy:0.9030172228813171 train_auc_1:0.8943703174591064 val_loss:0.2793208956718445 val_accuracy:0.8872928023338318 val_auc_1:0.8582884073257446 ]
100%|██████████| 29/29 [00:01<00:00, 15.74it/s, epoch: 13/40 - train_loss:0.2232421338558197 train_accuracy:0.9109513163566589 train_auc_1:0.9031308889389038 val_loss:0.27965837717056274 val_accuracy:0.8773480653762817 val_auc_1:0.857512354850769 ]
100%|██████████| 29/29 [00:01<00:00, 15.98it/s, epoch: 14/40 - train_loss:0.2226562350988388 train_accuracy:0.9120911359786987 train_auc_1:0.8835855722427368 val_loss:0.28520363569259644 val_accuracy:0.8806629776954651 val_auc_1:0.854595422744751 ]
100%|██████████| 29/29 [00:01<00:00, 15.90it/s, epoch: 15/40 - train_loss:0.23515889048576355 train_accuracy:0.904902994632721 train_auc_1:0.8961691856384277 val_loss:0.28021782636642456 val_accuracy:0.8850829005241394 val_auc_1:0.8563291430473328 ]
100%|██████████| 29/29 [00:01<00:00, 15.79it/s, epoch: 16/40 - train_loss:0.23402053117752075 train_accuracy:0.9024784564971924 train_auc_1:0.8906980752944946 val_loss:0.27909329533576965 val_accuracy:0.8850829005241394 val_auc_1:0.859708309173584 ]
100%|██████████| 29/29 [00:01<00:00, 15.93it/s, epoch: 17/40 - train_loss:0.2111150622367859 train_accuracy:0.9189712405204773 train_auc_1:0.8960785865783691 val_loss:0.27899590134620667 val_accuracy:0.8817679286003113 val_auc_1:0.8576114177703857 ]
100%|██████████| 29/29 [00:01<00:00, 15.81it/s, epoch: 18/40 - train_loss:0.20241659879684448 train_accuracy:0.915678858757019 train_auc_1:0.9157640933990479 val_loss:0.28282174468040466 val_accuracy:0.8784530162811279 val_auc_1:0.8583985567092896 ]
100%|██████████| 29/29 [00:01<00:00, 15.87it/s, epoch: 19/40 - train_loss:0.23259153962135315 train_accuracy:0.9071022868156433 train_auc_1:0.8956990242004395 val_loss:0.2828892171382904 val_accuracy:0.8817679286003113 val_auc_1:0.8546835780143738 ]
100%|██████████| 29/29 [00:01<00:00, 15.82it/s, epoch: 20/40 - train_loss:0.22440506517887115 train_accuracy:0.9034845232963562 train_auc_1:0.8989371657371521 val_loss:0.28058287501335144 val_accuracy:0.8784530162811279 val_auc_1:0.8598239421844482 ]
100%|██████████| 29/29 [00:01<00:00, 15.94it/s, epoch: 21/40 - train_loss:0.23205137252807617 train_accuracy:0.9051724076271057 train_auc_1:0.8899630308151245 val_loss:0.2741439938545227 val_accuracy:0.8795580267906189 val_auc_1:0.8636598587036133 ]
100%|██████████| 29/29 [00:01<00:00, 15.72it/s, epoch: 22/40 - train_loss:0.22656284272670746 train_accuracy:0.9030172228813171 train_auc_1:0.907919704914093 val_loss:0.2767719030380249 val_accuracy:0.8839778900146484 val_auc_1:0.8592514991760254 ]
100%|██████████| 29/29 [00:01<00:00, 15.98it/s, epoch: 23/40 - train_loss:0.22055070102214813 train_accuracy:0.9109228849411011 train_auc_1:0.913796067237854 val_loss:0.2815714180469513 val_accuracy:0.8928176760673523 val_auc_1:0.855701744556427 ]
100%|██████████| 29/29 [00:01<00:00, 15.60it/s, epoch: 24/40 - train_loss:0.23475250601768494 train_accuracy:0.9043141603469849 train_auc_1:0.9076265692710876 val_loss:0.2773815095424652 val_accuracy:0.8839778900146484 val_auc_1:0.8560759425163269 ]
100%|██████████| 29/29 [00:01<00:00, 16.07it/s, epoch: 25/40 - train_loss:0.2359710931777954 train_accuracy:0.9005681872367859 train_auc_1:0.9041743278503418 val_loss:0.28951746225357056 val_accuracy:0.8806629776954651 val_auc_1:0.8590589165687561 ]
100%|██████████| 29/29 [00:01<00:00, 15.76it/s, epoch: 26/40 - train_loss:0.21646590530872345 train_accuracy:0.9094827771186829 train_auc_1:0.9059643745422363 val_loss:0.27530720829963684 val_accuracy:0.8806629776954651 val_auc_1:0.8600990772247314 ]
100%|██████████| 29/29 [00:01<00:00, 15.82it/s, epoch: 27/40 - train_loss:0.21936063468456268 train_accuracy:0.9137930870056152 train_auc_1:0.9077043533325195 val_loss:0.2782182991504669 val_accuracy:0.8861878514289856 val_auc_1:0.8611392974853516 ]
100%|██████████| 29/29 [00:01<00:00, 16.03it/s, epoch: 28/40 - train_loss:0.21766482293605804 train_accuracy:0.9098451137542725 train_auc_1:0.9155865907669067 val_loss:0.2878170311450958 val_accuracy:0.8806629776954651 val_auc_1:0.8582608103752136 ]
100%|██████████| 29/29 [00:01<00:00, 15.72it/s, epoch: 29/40 - train_loss:0.2088153064250946 train_accuracy:0.9126105904579163 train_auc_1:0.9115303754806519 val_loss:0.28278136253356934 val_accuracy:0.8817679286003113 val_auc_1:0.8566923141479492 ]
100%|██████████| 29/29 [00:01<00:00, 15.74it/s, epoch: 30/40 - train_loss:0.2089204490184784 train_accuracy:0.9156526327133179 train_auc_1:0.9117385149002075 val_loss:0.2774920165538788 val_accuracy:0.8861878514289856 val_auc_1:0.8575839996337891 ]
100%|██████████| 29/29 [00:01<00:00, 15.52it/s, epoch: 31/40 - train_loss:0.20840761065483093 train_accuracy:0.9170354008674622 train_auc_1:0.909948468208313 val_loss:0.29270535707473755 val_accuracy:0.8817679286003113 val_auc_1:0.8584149479866028 ]
100%|██████████| 29/29 [00:01<00:00, 15.67it/s, epoch: 32/40 - train_loss:0.21289651095867157 train_accuracy:0.9139933586120605 train_auc_1:0.9137017130851746 val_loss:0.2861045300960541 val_accuracy:0.8872928023338318 val_auc_1:0.8621078729629517 ]
100%|██████████| 29/29 [00:01<00:00, 16.06it/s, epoch: 33/40 - train_loss:0.20959915220737457 train_accuracy:0.9116379022598267 train_auc_1:0.9085273146629333 val_loss:0.2869407832622528 val_accuracy:0.8828729391098022 val_auc_1:0.8602972030639648 ]
100%|██████████| 29/29 [00:01<00:00, 15.91it/s, epoch: 34/40 - train_loss:0.20927441120147705 train_accuracy:0.91731196641922 train_auc_1:0.9242825508117676 val_loss:0.2853357195854187 val_accuracy:0.8872928023338318 val_auc_1:0.8595817685127258 ]
100%|██████████| 29/29 [00:01<00:00, 15.88it/s, epoch: 35/40 - train_loss:0.21317821741104126 train_accuracy:0.9164719581604004 train_auc_1:0.9171379208564758 val_loss:0.2900511920452118 val_accuracy:0.8795580267906189 val_auc_1:0.8606053590774536 ]
100%|██████████| 29/29 [00:01<00:00, 15.76it/s, epoch: 36/40 - train_loss:0.22284917533397675 train_accuracy:0.909375011920929 train_auc_1:0.903253436088562 val_loss:0.2755865752696991 val_accuracy:0.8839778900146484 val_auc_1:0.8629168272018433 ]
100%|██████████| 29/29 [00:01<00:00, 15.70it/s, epoch: 37/40 - train_loss:0.19849534332752228 train_accuracy:0.9175646305084229 train_auc_1:0.923616886138916 val_loss:0.289157897233963 val_accuracy:0.8784530162811279 val_auc_1:0.8603851795196533 ]
100%|██████████| 29/29 [00:01<00:00, 15.88it/s, epoch: 38/40 - train_loss:0.20322787761688232 train_accuracy:0.9178650379180908 train_auc_1:0.9194050431251526 val_loss:0.2820649743080139 val_accuracy:0.8850829005241394 val_auc_1:0.8655640482902527 ]
100%|██████████| 29/29 [00:01<00:00, 15.71it/s, epoch: 39/40 - train_loss:0.1862594485282898 train_accuracy:0.9291487336158752 train_auc_1:0.9243948459625244 val_loss:0.2956363558769226 val_accuracy:0.8872928023338318 val_auc_1:0.8605613708496094 ]
100%|██████████| 29/29 [00:01<00:00, 15.88it/s, epoch: 40/40 - train_loss:0.20305216312408447 train_accuracy:0.915409505367279 train_auc_1:0.912611722946167 val_loss:0.2843382656574249 val_accuracy:0.8806629776954651 val_auc_1:0.8598018288612366 ]
[6]:
import matplotlib.pyplot as plt
for history in histories:
    plt.plot(history['train_auc_1'])
    plt.plot(history['val_auc_1'])
plt.title('Model Area Under Curve')
plt.ylabel('Area Under Curve')
plt.xlabel('Epoch')
plt.legend(
    ['origin', 'origin_val', 'fp8_compressed', 'fp8_compressed_val'], loc='lower right'
)
plt.show()
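As the plot shows, the validation AUC of both models fluctuates around 0.85, so 8-bit quantization has little effect on training accuracy for this task, while the theoretical communication cost drops by 3/4 (each transmitted value shrinks from 32 bits to 8 bits).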
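The 3/4 figure can be checked with a little arithmetic. The sketch below assumes one hidden batch of shape (128, 64), matching the batch_size and the base model output size shown in the training log above, and it ignores any small metadata (such as scale factors) that a real compressor sends alongside the payload.

```python
import numpy as np

# Assumed shapes: batch_size=128 and a 64-unit base model output.
batch_size, hidden_dim = 128, 64

hidden_fp32 = np.zeros((batch_size, hidden_dim), dtype=np.float32)
hidden_int8 = np.zeros((batch_size, hidden_dim), dtype=np.int8)

bytes_fp32 = hidden_fp32.nbytes  # 4 bytes per value
bytes_int8 = hidden_int8.nbytes  # 1 byte per value
saving = 1 - bytes_int8 / bytes_fp32

print(bytes_fp32, bytes_int8, saving)  # 32768 8192 0.75
```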
Customizing a communication compression algorithm#
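We can also define our own compression algorithm. SecretFlow provides the base classes SparseCompressor and QuantizedCompressor, which correspond to sparsification and quantization methods respectively.
Here we take quantization as an example and implement a K-means-based compression algorithm.
K-means compression is one step of the method proposed in the paper "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding". The idea is to cluster the values to be transmitted, store the cluster centroids, and represent every value by the index of the cluster it belongs to.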
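To make the idea concrete, here is a minimal NumPy-only sketch of centroid/index quantization. It is an illustration, not the SecretFlow implementation: the helper name kmeans_quantize_1d, the 16 clusters, the iteration count, and the random input are all assumptions chosen for the example. It starts from evenly spaced centroids (plain uniform quantization) and refines them with Lloyd iterations, so the reconstruction error should not be worse than the uniform starting point.

```python
import numpy as np


def kmeans_quantize_1d(values, n_clusters=16, n_iter=20):
    """Cluster the flattened values; return (centroids, labels, initial centroids)."""
    flat = values.ravel().astype(np.float64)
    # Start from evenly spaced centroids, i.e. plain uniform quantization.
    init = np.linspace(flat.min(), flat.max(), n_clusters)
    centroids = init.copy()
    labels = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    for _ in range(n_iter):
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = flat[labels == k]
            if members.size:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean()
        # Assignment step: nearest centroid for every value.
        labels = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    # uint8 is enough for the indices as long as n_clusters <= 256.
    return centroids, labels.astype(np.uint8).reshape(values.shape), init


rng = np.random.default_rng(0)
x = rng.normal(size=(128, 64)).astype(np.float32)  # stand-in for a hidden batch

centroids, idx, init = kmeans_quantize_1d(x)
recon = centroids[idx]  # decompress: look up the centroid for each index

# Compare with uniform quantization over the same number of levels.
uniform_idx = np.abs(x.ravel()[:, None] - init[None, :]).argmin(axis=1)
mse_uniform = float(np.mean((x.ravel() - init[uniform_idx]) ** 2))
mse_kmeans = float(np.mean((x - recon) ** 2))
print(f"uniform MSE={mse_uniform:.5f}  k-means MSE={mse_kmeans:.5f}")
```

Only idx (one byte per value) and the small centroids array would need to be transmitted; the receiver rebuilds the tensor with a single lookup.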
After subclassing QuantizedCompressor, you only need to implement _compress_one (which packs a numpy array into a QuantizedCompressedData) and _decompress_one (which restores a numpy array from a QuantizedCompressedData).
[7]:
from secretflow.utils.compressor import QuantizedCompressor
from secretflow.utils.compressor.quantized_compressor import QuantizedCompressedData
import numpy as np
class QuantizedKmeans(QuantizedCompressor):
    """Quantized compressor based on K-means: each float is replaced by the index of its nearest centroid.
    Reference: "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" (2016).
    Link: https://arxiv.org/abs/1510.00149
    """
    class KmeansCompressData(QuantizedCompressedData):
        def __init__(self, compressed_data, quant_bits, origin_type=None, q=None):
            super().__init__(compressed_data, quant_bits, origin_type)
            self.q = q  # cluster centroids, needed for decompression
    def __init__(self, quant_bits: int = 8, n_clusters=None):
        super().__init__(quant_bits)
        from sklearn.cluster import KMeans
        if n_clusters is None:
            # One cluster per representable index: 2**quant_bits (256 for 8 bits).
            self.n_clusters = 1 << quant_bits
        else:
            self.n_clusters = n_clusters
        self.km = KMeans(self.n_clusters, n_init=1, max_iter=50)
    def _compress_one(self, data: np.ndarray, **kwargs) -> "KmeansCompressData":
        # Too few values to cluster: pass the data through unchanged.
        if data.flatten().shape[0] <= self.n_clusters:
            return self.KmeansCompressData(data, self.quant_bits)
        ori_shape = data.shape
        self.km.fit(np.expand_dims(data.flatten(), axis=1))
        # Shift labels from [0, 2**bits) into the signed range of the container type.
        quantized = self.km.labels_ - (1 << (self.quant_bits - 1))
        quantized = np.reshape(quantized, ori_shape)
        q = self.km.cluster_centers_
        return self.KmeansCompressData(
            quantized.astype(self.np_type), self.quant_bits, data.dtype, q
        )
    def _decompress_one(self, data: "KmeansCompressData") -> np.ndarray:
        # Data that was passed through uncompressed is returned as-is.
        if data.compressed_data.flatten().shape[0] <= self.n_clusters:
            return data.compressed_data
        # Undo the label shift, then replace every label with its centroid value.
        label = data.compressed_data.astype(data.origin_type) + (
            1 << (self.quant_bits - 1)
        )
        dequantized = np.zeros_like(label)
        for i in range(data.q.shape[0]):
            dequantized[label == i] = data.q[i]
        return dequantized
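The label shift by 1 << (quant_bits - 1) may look odd at first. The following standalone sketch, which does not depend on SecretFlow or scikit-learn, illustrates why it is there, assuming the 8-bit container type is a signed int8. The labels and centroids are random stand-ins for KMeans.labels_ and KMeans.cluster_centers_. It also shows that the per-centroid loop is equivalent to a single indexed lookup.

```python
import numpy as np

quant_bits = 8
offset = 1 << (quant_bits - 1)  # 128 for 8 bits

# Hypothetical cluster labels in [0, 255] and centroids of shape (256, 1).
rng = np.random.default_rng(0)
labels = rng.integers(0, 1 << quant_bits, size=(4, 6))
centroids = rng.normal(size=(1 << quant_bits, 1)).astype(np.float32)

# Compress: shift labels into [-128, 127] so they fit in a signed int8.
packed = (labels - offset).astype(np.int8)

# Decompress: undo the shift, then look up each centroid.
restored_labels = packed.astype(np.int64) + offset
lookup = centroids[restored_labels, 0]  # vectorized lookup

# Same result as the explicit loop over centroids.
looped = np.zeros(labels.shape, dtype=np.float32)
for i in range(centroids.shape[0]):
    looped[restored_labels == i] = centroids[i, 0]

print(np.array_equal(restored_labels, labels), np.array_equal(lookup, looped))
```

Without the shift, labels above 127 would overflow when cast to int8.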
Let's instantiate this compressor and train the split learning model again:
[8]:
qkm = QuantizedKmeans()
sl_model_kmeans = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
    compressor=qkm,
)
history_kmeans = sl_model_kmeans.fit(
    train_data,
    train_label,
    validation_data=(test_data, test_label),
    epochs=40,
    batch_size=128,
    shuffle=True,
    verbose=1,
    validation_freq=1,
)
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4db5310>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de31c0>)}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675a60>)}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f3769f921f0>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de3190>)}, aligned=True), VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675dc0>)}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 91222, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f3554274a30>}
(pid=30232) 2023-08-16 01:46:19.445573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(pid=30250) 2023-08-16 01:46:19.623566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(pid=30232) 2023-08-16 01:46:20.187158: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
(pid=30232) 2023-08-16 01:46:20.187299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
(pid=30232) 2023-08-16 01:46:20.187315: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
(pid=30250) 2023-08-16 01:46:20.343446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
(pid=30250) 2023-08-16 01:46:20.343542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
(pid=30250) 2023-08-16 01:46:20.343556: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
(PYUSLTFModel pid=30232) 2023-08-16 01:46:22.002165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
(PYUSLTFModel pid=30232) 2023-08-16 01:46:22.002204: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
(PYUSLTFModel pid=30232) Model: "sequential"
(PYUSLTFModel pid=30232) _________________________________________________________________
(PYUSLTFModel pid=30232) Layer (type) Output Shape Param #
(PYUSLTFModel pid=30232) =================================================================
(PYUSLTFModel pid=30232) dense (Dense) (None, 100) 500
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) dense_1 (Dense) (None, 64) 6464
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) =================================================================
(PYUSLTFModel pid=30232) Total params: 6,964
(PYUSLTFModel pid=30232) Trainable params: 6,964
(PYUSLTFModel pid=30232) Non-trainable params: 0
(PYUSLTFModel pid=30232) _________________________________________________________________
(PYUSLTFModel pid=30232) Model: "model"
(PYUSLTFModel pid=30232) __________________________________________________________________________________________________
(PYUSLTFModel pid=30232) Layer (type) Output Shape Param # Connected to
(PYUSLTFModel pid=30232) ==================================================================================================
(PYUSLTFModel pid=30232) input_2 (InputLayer) [(None, 64)] 0 []
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) input_3 (InputLayer) [(None, 64)] 0 []
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) concatenate (Concatenate) (None, 128) 0 ['input_2[0][0]',
(PYUSLTFModel pid=30232) 'input_3[0][0]']
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) dense_2 (Dense) (None, 64) 8256 ['concatenate[0][0]']
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) dense_3 (Dense) (None, 1) 65 ['dense_2[0][0]']
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) ==================================================================================================
(PYUSLTFModel pid=30232) Total params: 8,321
(PYUSLTFModel pid=30232) Trainable params: 8,321
(PYUSLTFModel pid=30232) Non-trainable params: 0
(PYUSLTFModel pid=30232) __________________________________________________________________________________________________
(PYUSLTFModel pid=30250) 2023-08-16 01:46:22.167943: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
(PYUSLTFModel pid=30250) 2023-08-16 01:46:22.167979: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
(PYUSLTFModel pid=30250) Model: "sequential"
(PYUSLTFModel pid=30250) _________________________________________________________________
(PYUSLTFModel pid=30250) Layer (type) Output Shape Param #
(PYUSLTFModel pid=30250) =================================================================
(PYUSLTFModel pid=30250) dense (Dense) (None, 100) 1300
(PYUSLTFModel pid=30250)
(PYUSLTFModel pid=30250) dense_1 (Dense) (None, 64) 6464
(PYUSLTFModel pid=30250)
(PYUSLTFModel pid=30250) =================================================================
(PYUSLTFModel pid=30250) Total params: 7,764
(PYUSLTFModel pid=30250) Trainable params: 7,764
(PYUSLTFModel pid=30250) Non-trainable params: 0
(PYUSLTFModel pid=30250) _________________________________________________________________
100%|██████████| 29/29 [00:13<00:00, 2.39it/s](_run pid=27350) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (248) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:15<00:00, 1.83it/s, epoch: 1/40 - train_loss:0.4416384696960449 train_accuracy:0.8701704740524292 train_auc_1:0.518036961555481 val_loss:0.40735140442848206 val_accuracy:0.8729282021522522 val_auc_1:0.5592570304870605 ]
100%|██████████| 29/29 [00:13<00:00, 2.10it/s, epoch: 2/40 - train_loss:0.36653003096580505 train_accuracy:0.8817349076271057 train_auc_1:0.5584944486618042 val_loss:0.3673045337200165 val_accuracy:0.8729282021522522 val_auc_1:0.6474628448486328 ]
(_run pid=27350) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (237) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:11<00:00, 2.35it/s](_run pid=27349) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (251) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:13<00:00, 2.18it/s, epoch: 3/40 - train_loss:0.32427504658699036 train_accuracy:0.890625 train_auc_1:0.6890991926193237 val_loss:0.35910260677337646 val_accuracy:0.8729282021522522 val_auc_1:0.6932856440544128 ]
100%|██████████| 29/29 [00:13<00:00, 2.23it/s, epoch: 4/40 - train_loss:0.31370726227760315 train_accuracy:0.8875584006309509 train_auc_1:0.7440165281295776 val_loss:0.34864872694015503 val_accuracy:0.8729282021522522 val_auc_1:0.7281122803688049 ]
0%| | 0/29 [00:00<?, ?it/s](_run pid=27349) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (251) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:12<00:00, 2.24it/s, epoch: 5/40 - train_loss:0.2876177728176117 train_accuracy:0.8971962332725525 train_auc_1:0.7722644209861755 val_loss:0.33951109647750854 val_accuracy:0.8729282021522522 val_auc_1:0.7569069862365723 ]
100%|██████████| 29/29 [00:13<00:00, 2.12it/s, epoch: 6/40 - train_loss:0.29279187321662903 train_accuracy:0.8887392282485962 train_auc_1:0.7948960065841675 val_loss:0.3284510374069214 val_accuracy:0.8729282021522522 val_auc_1:0.8004623055458069 ]
100%|██████████| 29/29 [00:13<00:00, 2.13it/s, epoch: 7/40 - train_loss:0.27801549434661865 train_accuracy:0.8846591114997864 train_auc_1:0.8384666442871094 val_loss:0.3052524924278259 val_accuracy:0.8696132302284241 val_auc_1:0.815767765045166 ]
100%|██████████| 29/29 [00:13<00:00, 2.15it/s, epoch: 8/40 - train_loss:0.26859837770462036 train_accuracy:0.88606196641922 train_auc_1:0.8598557710647583 val_loss:0.2872508466243744 val_accuracy:0.8773480653762817 val_auc_1:0.8434892296791077 ]
100%|██████████| 29/29 [00:13<00:00, 2.16it/s, epoch: 9/40 - train_loss:0.2744479179382324 train_accuracy:0.8841261267662048 train_auc_1:0.8505929708480835 val_loss:0.3062475025653839 val_accuracy:0.8729282021522522 val_auc_1:0.8401155471801758 ]
100%|██████████| 29/29 [00:13<00:00, 2.12it/s, epoch: 10/40 - train_loss:0.248819500207901 train_accuracy:0.8949353694915771 train_auc_1:0.8706037998199463 val_loss:0.2888694405555725 val_accuracy:0.8751381039619446 val_auc_1:0.8512933850288391 ]
100%|██████████| 29/29 [00:13<00:00, 2.17it/s, epoch: 11/40 - train_loss:0.2316252887248993 train_accuracy:0.908462405204773 train_auc_1:0.875359058380127 val_loss:0.2812938690185547 val_accuracy:0.8806629776954651 val_auc_1:0.8486737012863159 ]
100%|██████████| 29/29 [00:13<00:00, 2.16it/s, epoch: 12/40 - train_loss:0.2373391091823578 train_accuracy:0.9034845232963562 train_auc_1:0.8871505260467529 val_loss:0.2884312868118286 val_accuracy:0.8817679286003113 val_auc_1:0.8496752977371216 ]
100%|██████████| 29/29 [00:13<00:00, 2.16it/s, epoch: 13/40 - train_loss:0.23225976526737213 train_accuracy:0.9103982448577881 train_auc_1:0.8738927841186523 val_loss:0.28070011734962463 val_accuracy:0.8795580267906189 val_auc_1:0.8505558967590332 ]
100%|██████████| 29/29 [00:13<00:00, 2.14it/s, epoch: 14/40 - train_loss:0.2326977550983429 train_accuracy:0.904633641242981 train_auc_1:0.8821461200714111 val_loss:0.2945939600467682 val_accuracy:0.8762431144714355 val_auc_1:0.8517225980758667 ]
100%|██████████| 29/29 [00:13<00:00, 2.17it/s, epoch: 15/40 - train_loss:0.23820573091506958 train_accuracy:0.8987832069396973 train_auc_1:0.8843868374824524 val_loss:0.28626009821891785 val_accuracy:0.8784530162811279 val_auc_1:0.8519262075424194 ]
100%|██████████| 29/29 [00:13<00:00, 2.13it/s, epoch: 16/40 - train_loss:0.2329801470041275 train_accuracy:0.9027478694915771 train_auc_1:0.8967185616493225 val_loss:0.27662354707717896 val_accuracy:0.8762431144714355 val_auc_1:0.8568078875541687 ]
100%|██████████| 29/29 [00:14<00:00, 2.05it/s, epoch: 17/40 - train_loss:0.2397279292345047 train_accuracy:0.9022727012634277 train_auc_1:0.8815232515335083 val_loss:0.2774234414100647 val_accuracy:0.8784530162811279 val_auc_1:0.8531590700149536 ]
100%|██████████| 29/29 [00:14<00:00, 2.04it/s, epoch: 18/40 - train_loss:0.23408104479312897 train_accuracy:0.9023783206939697 train_auc_1:0.8883072137832642 val_loss:0.2731465995311737 val_accuracy:0.8795580267906189 val_auc_1:0.8594001531600952 ]
100%|██████████| 29/29 [00:14<00:00, 2.05it/s, epoch: 19/40 - train_loss:0.23964105546474457 train_accuracy:0.897400438785553 train_auc_1:0.8906112313270569 val_loss:0.28690865635871887 val_accuracy:0.8850829005241394 val_auc_1:0.8540396690368652 ]
100%|██████████| 29/29 [00:13<00:00, 2.09it/s, epoch: 20/40 - train_loss:0.22502458095550537 train_accuracy:0.9051437973976135 train_auc_1:0.9018193483352661 val_loss:0.28959548473358154 val_accuracy:0.8773480653762817 val_auc_1:0.8575343489646912 ]
100%|██████████| 29/29 [00:14<00:00, 2.04it/s, epoch: 21/40 - train_loss:0.2249988317489624 train_accuracy:0.907866358757019 train_auc_1:0.9009451270103455 val_loss:0.2748246490955353 val_accuracy:0.8762431144714355 val_auc_1:0.8615685701370239 ]
100%|██████████| 29/29 [00:13<00:00, 2.10it/s, epoch: 22/40 - train_loss:0.22449716925621033 train_accuracy:0.9081858396530151 train_auc_1:0.8942176699638367 val_loss:0.2766781151294708 val_accuracy:0.8839778900146484 val_auc_1:0.863351583480835 ]
100%|██████████| 29/29 [00:13<00:00, 2.09it/s, epoch: 23/40 - train_loss:0.22895343601703644 train_accuracy:0.9034845232963562 train_auc_1:0.9039328098297119 val_loss:0.28054457902908325 val_accuracy:0.8817679286003113 val_auc_1:0.8572096824645996 ]
100%|██████████| 29/29 [00:13<00:00, 2.08it/s, epoch: 24/40 - train_loss:0.22407710552215576 train_accuracy:0.9059734344482422 train_auc_1:0.9005993008613586 val_loss:0.27575767040252686 val_accuracy:0.8850829005241394 val_auc_1:0.8586461544036865 ]
100%|██████████| 29/29 [00:14<00:00, 2.04it/s, epoch: 25/40 - train_loss:0.23382873833179474 train_accuracy:0.8992456793785095 train_auc_1:0.9045984745025635 val_loss:0.27955323457717896 val_accuracy:0.8795580267906189 val_auc_1:0.8569509983062744 ]
100%|██████████| 29/29 [00:14<00:00, 2.05it/s, epoch: 26/40 - train_loss:0.2256137877702713 train_accuracy:0.903761088848114 train_auc_1:0.9021925926208496 val_loss:0.2896490693092346 val_accuracy:0.8762431144714355 val_auc_1:0.8572261929512024 ]
100%|██████████| 29/29 [00:13<00:00, 2.15it/s, epoch: 27/40 - train_loss:0.20892585813999176 train_accuracy:0.9138434529304504 train_auc_1:0.9012246131896973 val_loss:0.28507480025291443 val_accuracy:0.8784530162811279 val_auc_1:0.854832112789154 ]
100%|██████████| 29/29 [00:13<00:00, 2.08it/s, epoch: 28/40 - train_loss:0.2042495459318161 train_accuracy:0.9186946749687195 train_auc_1:0.9083205461502075 val_loss:0.28037697076797485 val_accuracy:0.8828729391098022 val_auc_1:0.8549861907958984 ]
100%|██████████| 29/29 [00:13<00:00, 2.08it/s, epoch: 29/40 - train_loss:0.2143721729516983 train_accuracy:0.9090154767036438 train_auc_1:0.918034553527832 val_loss:0.28719428181648254 val_accuracy:0.889502763748169 val_auc_1:0.8539240956306458 ]
100%|██████████| 29/29 [00:14<00:00, 2.07it/s, epoch: 30/40 - train_loss:0.23188023269176483 train_accuracy:0.9043141603469849 train_auc_1:0.8931484818458557 val_loss:0.28120020031929016 val_accuracy:0.8795580267906189 val_auc_1:0.8607484102249146 ]
100%|██████████| 29/29 [00:13<00:00, 2.08it/s, epoch: 31/40 - train_loss:0.214926615357399 train_accuracy:0.9139933586120605 train_auc_1:0.9071189165115356 val_loss:0.27989909052848816 val_accuracy:0.8817679286003113 val_auc_1:0.8585194945335388 ]
100%|██████████| 29/29 [00:13<00:00, 2.09it/s, epoch: 32/40 - train_loss:0.19993817806243896 train_accuracy:0.9156526327133179 train_auc_1:0.918624758720398 val_loss:0.29084789752960205 val_accuracy:0.8883978128433228 val_auc_1:0.8593835830688477 ]
100%|██████████| 29/29 [00:14<00:00, 2.06it/s, epoch: 33/40 - train_loss:0.21098265051841736 train_accuracy:0.9143319129943848 train_auc_1:0.910923182964325 val_loss:0.3034096658229828 val_accuracy:0.8806629776954651 val_auc_1:0.8596697449684143 ]
100%|██████████| 29/29 [00:14<00:00, 1.98it/s, epoch: 34/40 - train_loss:0.21316346526145935 train_accuracy:0.908462405204773 train_auc_1:0.9148238897323608 val_loss:0.281110942363739 val_accuracy:0.8850829005241394 val_auc_1:0.8626692891120911 ]
100%|██████████| 29/29 [00:14<00:00, 2.06it/s, epoch: 35/40 - train_loss:0.19225820899009705 train_accuracy:0.9240301847457886 train_auc_1:0.9190618991851807 val_loss:0.29221484065055847 val_accuracy:0.8795580267906189 val_auc_1:0.856224536895752 ]
100%|██████████| 29/29 [00:14<00:00, 2.03it/s, epoch: 36/40 - train_loss:0.21308253705501556 train_accuracy:0.9129849076271057 train_auc_1:0.9214832782745361 val_loss:0.2819535434246063 val_accuracy:0.8839778900146484 val_auc_1:0.8588112592697144 ]
100%|██████████| 29/29 [00:13<00:00, 2.09it/s, epoch: 37/40 - train_loss:0.20776230096817017 train_accuracy:0.9178650379180908 train_auc_1:0.9141819477081299 val_loss:0.2892824411392212 val_accuracy:0.8795580267906189 val_auc_1:0.8580352067947388 ]
100%|██████████| 29/29 [00:13<00:00, 2.07it/s, epoch: 38/40 - train_loss:0.20729485154151917 train_accuracy:0.915099561214447 train_auc_1:0.9170703887939453 val_loss:0.28393277525901794 val_accuracy:0.8861878514289856 val_auc_1:0.8550798296928406 ]
100%|██████████| 29/29 [00:13<00:00, 2.10it/s, epoch: 39/40 - train_loss:0.21065551042556763 train_accuracy:0.9147727489471436 train_auc_1:0.9200801253318787 val_loss:0.3028808534145355 val_accuracy:0.8883978128433228 val_auc_1:0.8555586338043213 ]
100%|██████████| 29/29 [00:14<00:00, 2.05it/s, epoch: 40/40 - train_loss:0.20016591250896454 train_accuracy:0.9172952771186829 train_auc_1:0.9134470224380493 val_loss:0.2854197919368744 val_accuracy:0.8861878514289856 val_auc_1:0.8552834391593933 ]
[9]:
plt.plot(history_kmeans['train_auc_1'])
plt.plot(history_kmeans['val_auc_1'])
plt.title('Model Area Under Curve')
plt.ylabel('Area Under Curve')
plt.xlabel('Epoch')
plt.legend(['kmeans', 'kmeans_val'], loc='lower right')
plt.show()
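The final validation AUC is around 0.855, which is also quite good.
Compression quality of the compression algorithms#
Taking a ResNet pretrained on ImageNet as an example, we apply the Int8, Fp8, and Kmeans methods to the model parameters and compare how well each one compresses them.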
[10]:
from secretflow.utils.compressor import QuantizedZeroPoint, QuantizedFP, QuantizedKmeans
from torchvision import models
import ssl
import time
import numpy as np
import matplotlib.pyplot as plt
ssl._create_default_https_context = ssl._create_unverified_context
net = models.resnet50(pretrained=True)
net_params = [p.detach().numpy().flatten() for p in net.parameters()]
coms = [
    QuantizedZeroPoint(8),
    QuantizedFP(8, format='E4M3'),
    QuantizedFP(8, format='E5M2'),
    QuantizedKmeans(8, n_clusters=100),
]
losses = []
durations = []
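for c in coms:
    # Time only the compress/decompress round trip, not the error computation.
    start = time.time()
    c_params = c.compress(net_params)
    dc_params = c.decompress(c_params)
    durations.append(time.time() - start)
    # Sum of squared errors between the original and reconstructed parameters.
    losses.append(sum(np.sum((a - b) ** 2) for a, b in zip(net_params, dc_params)))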
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
[11]:
plt.figure(figsize=(12.8, 4.8))
x = [1, 2, 3, 4]
x_label = ['Int8', 'Fp8-E4M3', 'Fp8-E5M2', 'Kmeans']
plt.subplot(121)
p1 = plt.bar(x, losses, color='deepskyblue')
plt.bar_label(p1, label_type='edge')
plt.xticks(x, x_label)
plt.title('SSE loss in compressing ResNet50')
plt.ylabel('Sum Square Error')
plt.subplot(122)
p2 = plt.bar(x, durations, color='salmon')
plt.bar_label(p2, label_type='edge')
plt.xticks(x, x_label)
plt.title('Time consumed in compressing ResNet50')
plt.ylabel('Time (s)')
plt.show()
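As the plots show, Kmeans compression keeps the precision loss lowest, but it takes far longer than the other methods.
Floating-point compression (Fp8-E4M3) compresses the ResNet parameters slightly better than integer (Int8) compression, at about 3 times the running time of Int8.
When applying a compression algorithm in practice, balance the available compute resources against the compression precision you need.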
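The trade-off between Kmeans and uniform quantization can be illustrated with a minimal NumPy sketch. This is not SecretFlow's `QuantizedKmeans` or `QuantizedZeroPoint` implementation; the helper names `uniform_quantize` and `kmeans_quantize` are hypothetical, and the data is synthetic. Both helpers use the same number of levels. The k-means version starts from the uniform grid and repeatedly moves each level to the mean of the values assigned to it, so its squared error cannot end up above that of the uniform grid, but every iteration rescans all values.

```python
import numpy as np


def uniform_quantize(x, n_levels=16):
    """Snap each value to the nearest of n_levels evenly spaced levels."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (n_levels - 1) if hi > lo else 1.0
    q = np.round((x - lo) / scale)  # a single pass over the data
    return q * scale + lo


def kmeans_quantize(x, n_clusters=16, n_iter=10):
    """1-D Lloyd's k-means: replace each value by its nearest centroid."""
    # Start from the same evenly spaced grid as uniform_quantize.
    centroids = np.linspace(float(x.min()), float(x.max()), n_clusters)
    for _ in range(n_iter):  # each iteration rescans the whole array
        idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            members = x[idx == k]
            if members.size:  # empty clusters keep their old centroid
                centroids[k] = members.mean()
    idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[idx]


rng = np.random.default_rng(0)
x = rng.normal(size=5000)  # synthetic stand-in for model parameters

x_uniform = uniform_quantize(x, n_levels=16)
x_kmeans = kmeans_quantize(x, n_clusters=16)

sse_uniform = float(np.sum((x - x_uniform) ** 2))
sse_kmeans = float(np.sum((x - x_kmeans) ** 2))
print(f"uniform SSE: {sse_uniform:.3f}, kmeans SSE: {sse_kmeans:.3f}")
```

The extra passes over the data are where the additional running time comes from, which is consistent with the ranking seen in the plots above.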
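Summary#
This example introduced communication compression and, on top of split learning, used both the compression algorithms provided by SecretFlow and custom-designed ones.
The experiments show that compressing 32-bit values to 8 bits costs little precision, while the theoretical communication volume drops to 1/4 of the uncompressed volume. Adding communication compression is therefore a good choice in split learning, where data and gradients are transmitted frequently.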
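The 1/4 figure follows directly from the element width: a float32 value takes 4 bytes and an 8-bit code takes 1 byte. The sketch below checks this with plain NumPy on a synthetic tensor. It is an illustration under stated assumptions, not SecretFlow's compressor: `quantize_uint8` and `dequantize_uint8` are hypothetical helpers, and the two extra floats (`scale` and `lo`) that must travel with the payload are ignored in the ratio.

```python
import numpy as np


def quantize_uint8(x):
    """Map float values linearly onto the 256 codes of a uint8."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)
    return q, scale, lo


def dequantize_uint8(q, scale, lo):
    """Recover approximate float32 values from the uint8 codes."""
    return q.astype(np.float32) * scale + lo


rng = np.random.default_rng(0)
# Synthetic stand-in for a batch of hidden-layer outputs.
hidden = rng.normal(size=(64, 128)).astype(np.float32)

q, scale, lo = quantize_uint8(hidden)
restored = dequantize_uint8(q, scale, lo)

ratio = q.nbytes / hidden.nbytes  # payload size relative to float32
max_err = float(np.abs(hidden - restored).max())
print(f"size ratio: {ratio}, max abs error: {max_err:.4f}, step: {scale:.4f}")
```

The reconstruction error of each element is bounded by about half a quantization step, which is why the precision loss stays small when the value range is narrow.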
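This tutorial uses plaintext aggregation for demonstration and does not address leakage from the hidden layers. SecretFlow provides an aggregation layer, AggLayer, which avoids the leakage caused by transmitting hidden-layer outputs in plaintext by means of MPC, TEE, HE, and DP. If you are interested, see the related documentation.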