SplitRec: Using Communication Compression in SecretFlow Split Learning

The following code is for demonstration only. Do not use it directly in a production environment.

This example builds on the "Split Learning: Bank Marketing" tutorial; we recommend reading that tutorial first.

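In split learning the model is split across multiple devices, so during training the parties must exchange features (hidden-layer outputs) and gradients many times, which incurs a high network communication cost. To reduce the amount of data transferred, the exchanged tensors can be compressed.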
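To make the cost concrete, the following is a rough back-of-the-envelope sketch rather than a measurement. The batch size of 128 and hidden size of 64 are illustrative values, and the helper `tensor_bytes` is a hypothetical name introduced here. The sketch assumes one hidden output is sent forward and one gradient of the same shape is sent back per training step, and it ignores serialization overhead.

```python
import numpy as np


def tensor_bytes(batch_size: int, hidden_size: int, dtype=np.float32) -> int:
    """Bytes needed to send one (batch_size, hidden_size) tensor of the given dtype."""
    return batch_size * hidden_size * np.dtype(dtype).itemsize


# Illustrative values (assumptions for this sketch).
batch_size, hidden_size = 128, 64

fp32_bytes = tensor_bytes(batch_size, hidden_size, np.float32)
int8_bytes = tensor_bytes(batch_size, hidden_size, np.uint8)

# Assumed traffic per step: hidden output forward + gradient of the same shape back.
per_step_fp32 = 2 * fp32_bytes
per_step_int8 = 2 * int8_bytes

print(f"float32 per step: {per_step_fp32} bytes")
print(f"8-bit per step:   {per_step_int8} bytes (plus a little scale metadata)")
print(f"reduction:        {per_step_fp32 / per_step_int8:.1f}x")
```

Whether both directions are actually compressed depends on the framework and on the chosen algorithm, so the numbers should be read as an upper bound on the savings for this setting.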

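SecretFlow provides Compressor for compressing the data exchanged in split learning. It also provides several base classes on which you can implement your own compression algorithms.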
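As a rough idea of what a custom compression algorithm involves, here is a minimal NumPy sketch of top-k sparsification: only the largest-magnitude entries of a tensor are transmitted. The class `TopKSketch` and its `compress`/`decompress` methods are hypothetical names for illustration; they do not subclass any SecretFlow base class and are not claimed to match the interface SecretFlow expects.

```python
import numpy as np


class TopKSketch:
    """Hypothetical compressor: keep only the k largest-magnitude entries."""

    def __init__(self, keep_ratio: float = 0.1):
        if not 0.0 < keep_ratio <= 1.0:
            raise ValueError("keep_ratio must be in (0, 1]")
        self.keep_ratio = keep_ratio

    def compress(self, x: np.ndarray):
        flat = x.ravel()
        k = max(1, int(round(flat.size * self.keep_ratio)))
        # Indices of the k entries with the largest absolute value.
        idx = np.argpartition(np.abs(flat), flat.size - k)[-k:]
        # Payload: positions, the kept values, and the original shape.
        return idx.astype(np.int32), flat[idx], x.shape

    def decompress(self, payload) -> np.ndarray:
        idx, values, shape = payload
        out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
        out[idx] = values  # every dropped entry is restored as zero
        return out.reshape(shape)


x = np.array([[0.1, -5.0, 0.2], [3.0, 0.0, -0.3]])
compressor = TopKSketch(keep_ratio=1 / 3)
restored = compressor.decompress(compressor.compress(x))
print(restored)
```

The receiver gets an approximation of the original tensor, so this kind of compression is lossy; the keep ratio trades communication volume against accuracy.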

Let's try out some of these algorithms. First, we create two parties, alice and bob, in the secretflow environment.

[1]:
import secretflow as sf

sf.shutdown()
sf.init(['alice', 'bob'], address='local')
alice, bob = sf.PYU('alice'), sf.PYU('bob')

Next, we prepare the data to learn from.

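We reuse the data preparation and preprocessing steps from "Split Learning: Bank Marketing" to download and process the bank marketing dataset. The roles of alice and bob are exactly the same as in that tutorial: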

[2]:
from secretflow.utils.simulation.datasets import load_bank_marketing
from secretflow.preprocessing.scaler import MinMaxScaler
from secretflow.preprocessing.encoder import LabelEncoder
from secretflow.data.split import train_test_split

random_state = 1234

data = load_bank_marketing(parts={alice: (0, 4), bob: (4, 16)}, axis=1)
label = load_bank_marketing(parts={alice: (16, 17)}, axis=1)

encoder = LabelEncoder()
data['job'] = encoder.fit_transform(data['job'])
data['marital'] = encoder.fit_transform(data['marital'])
data['education'] = encoder.fit_transform(data['education'])
data['default'] = encoder.fit_transform(data['default'])
data['housing'] = encoder.fit_transform(data['housing'])
data['loan'] = encoder.fit_transform(data['loan'])
data['contact'] = encoder.fit_transform(data['contact'])
data['poutcome'] = encoder.fit_transform(data['poutcome'])
data['month'] = encoder.fit_transform(data['month'])
label = encoder.fit_transform(label)

scaler = MinMaxScaler()
data = scaler.fit_transform(data)

train_data, test_data = train_test_split(
    data, train_size=0.8, random_state=random_state
)
train_label, test_label = train_test_split(
    label, train_size=0.8, random_state=random_state
)

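Next, we create the federated model. As before, we follow the modeling in "Split Learning: Bank Marketing" to build base_model and fuse_model, and then define an SLModel for training: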

[3]:
def create_base_model(input_dim, output_dim, name='base_model'):
    # Create model
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        model = keras.Sequential(
            [
                keras.Input(shape=input_dim),
                layers.Dense(100, activation="relu"),
                layers.Dense(output_dim, activation="relu"),
            ]
        )
        # Compile model
        model.summary()
        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


# prepare model
hidden_size = 64

model_base_alice = create_base_model(4, hidden_size)
model_base_bob = create_base_model(12, hidden_size)


def create_fuse_model(input_dim, output_dim, party_nums, name='fuse_model'):
    def create_model():
        from tensorflow import keras
        from tensorflow.keras import layers
        import tensorflow as tf

        # input
        input_layers = []
        for i in range(party_nums):
            input_layers.append(
                keras.Input(
                    input_dim,
                )
            )

        merged_layer = layers.concatenate(input_layers)
        fuse_layer = layers.Dense(64, activation='relu')(merged_layer)
        output = layers.Dense(output_dim, activation='sigmoid')(fuse_layer)

        model = keras.Model(inputs=input_layers, outputs=output)
        model.summary()

        model.compile(
            loss='binary_crossentropy',
            optimizer='adam',
            metrics=["accuracy", tf.keras.metrics.AUC()],
        )
        return model

    return create_model


model_fuse = create_fuse_model(input_dim=hidden_size, party_nums=2, output_dim=1)

base_model_dict = {alice: model_base_alice, bob: model_base_bob}


from secretflow.ml.nn import SLModel

sl_model_origin = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
)
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
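The data flow of this split model can be sketched in plain NumPy, which makes it easier to see which tensor crosses the network. This is only an illustration of the forward pass with random weights: the helper names (`dense`, `init`, `base_forward`) are hypothetical, no training happens, and SecretFlow is not involved. Layer sizes follow the model definitions above, with alice acting as the label holder (`device_y`).

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dense(x, w, b, activation):
    """One fully connected layer followed by an activation."""
    return activation(x @ w + b)


def init(n_in, n_out):
    """Random weights and zero bias; stands in for trained parameters."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


def base_forward(x, params):
    """Base model: input -> Dense(100, relu) -> Dense(hidden, relu)."""
    (w1, b1), (w2, b2) = params
    return dense(dense(x, w1, b1, relu), w2, b2, relu)


batch, hidden = 8, 64
x_alice = rng.random((batch, 4))   # alice holds 4 feature columns
x_bob = rng.random((batch, 12))    # bob holds 12 feature columns

alice_params = [init(4, 100), init(100, hidden)]
bob_params = [init(12, 100), init(100, hidden)]
(w3, b3), (w4, b4) = init(2 * hidden, 64), init(64, 1)

h_alice = base_forward(x_alice, alice_params)  # stays on alice
h_bob = base_forward(x_bob, bob_params)        # sent from bob to alice

# Fuse model on alice: concatenate -> Dense(64, relu) -> Dense(1, sigmoid).
merged = np.concatenate([h_alice, h_bob], axis=1)
pred = dense(dense(merged, w3, b3, relu), w4, b4, sigmoid)

print(h_bob.shape, merged.shape, pred.shape)
```

In this sketch `h_bob`, a tensor of shape (batch, hidden), is the forward-pass payload; during training a gradient of the same shape would travel back to bob. These are the tensors a communication compressor would act on.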

Using Communication Compression Algorithms

SecretFlow provides Compressor, which implements a variety of basic communication compression algorithms that can be used directly.

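Import the compression algorithm you want to use, instantiate it, and pass the instance as the compressor argument when defining SLModel; communication compression is then applied during training.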

We take QuantizedFP as an example. This algorithm quantizes floating-point numbers to 8 bits to reduce transmission cost.

[4]:
from secretflow.utils.compressor import QuantizedFP

qfp = QuantizedFP()

sl_model_compress = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
    compressor=qfp,  # pass the instantiated compressor here
)
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
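To illustrate what quantizing floating-point numbers to 8 bits means, here is a minimal NumPy sketch of linear min-max quantization. It is a generic textbook scheme and an assumption of this example; the actual QuantizedFP implementation may differ in its details. The function names `quantize_8bit` and `dequantize_8bit` are hypothetical.

```python
import numpy as np


def quantize_8bit(x: np.ndarray):
    """Map float values linearly onto the 256 levels of uint8."""
    lo, hi = float(x.min()), float(x.max())
    # Guard against a constant tensor, where hi == lo.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale


def dequantize_8bit(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale + lo


rng = np.random.default_rng(0)
hidden_out = rng.normal(size=(128, 64)).astype(np.float32)

q, lo, scale = quantize_8bit(hidden_out)
restored = dequantize_8bit(q, lo, scale)

print("bytes before:", hidden_out.nbytes, "bytes after:", q.nbytes)
print("max abs error:", float(np.abs(restored - hidden_out).max()))
```

The payload shrinks by a factor of four relative to float32, at the price of a rounding error of at most about half a quantization step per element.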

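We train both the model without communication compression and the model with quantization compression, raising the number of training epochs to 40, to see how they compare.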

[5]:
histories = []
for sl_model in [sl_model_origin, sl_model_compress]:
    history = sl_model.fit(
        train_data,
        train_label,
        validation_data=(test_data, test_label),
        epochs=40,
        batch_size=128,
        shuffle=True,
        verbose=1,
        validation_freq=1,
    )

    histories.append(history)
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4db5310>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de31c0>)}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675a60>)}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f3769f921f0>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de3190>)}, aligned=True), VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675dc0>)}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 11819, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f37ec6ec6d0>}
(PYUSLTFModel pid=28114) Model: "sequential"
(PYUSLTFModel pid=28114) _________________________________________________________________
(PYUSLTFModel pid=28114)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=28114) =================================================================
(PYUSLTFModel pid=28114)  dense (Dense)               (None, 100)               500
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114) =================================================================
(PYUSLTFModel pid=28114) Total params: 6,964
(PYUSLTFModel pid=28114) Trainable params: 6,964
(PYUSLTFModel pid=28114) Non-trainable params: 0
(PYUSLTFModel pid=28114) _________________________________________________________________
(PYUSLTFModel pid=28114) Model: "model"
(PYUSLTFModel pid=28114) __________________________________________________________________________________________________
(PYUSLTFModel pid=28114)  Layer (type)                   Output Shape         Param #     Connected to
(PYUSLTFModel pid=28114) ==================================================================================================
(PYUSLTFModel pid=28114)  input_2 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  input_3 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  concatenate (Concatenate)      (None, 128)          0           ['input_2[0][0]',
(PYUSLTFModel pid=28114)                                                                   'input_3[0][0]']
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  dense_2 (Dense)                (None, 64)           8256        ['concatenate[0][0]']
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114)  dense_3 (Dense)                (None, 1)            65          ['dense_2[0][0]']
(PYUSLTFModel pid=28114)
(PYUSLTFModel pid=28114) ==================================================================================================
(PYUSLTFModel pid=28114) Total params: 8,321
(PYUSLTFModel pid=28114) Trainable params: 8,321
(PYUSLTFModel pid=28114) Non-trainable params: 0
(PYUSLTFModel pid=28114) __________________________________________________________________________________________________
(PYUSLTFModel pid=28127) Model: "sequential"
(PYUSLTFModel pid=28127) _________________________________________________________________
(PYUSLTFModel pid=28127)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=28127) =================================================================
(PYUSLTFModel pid=28127)  dense (Dense)               (None, 100)               1300
(PYUSLTFModel pid=28127)
(PYUSLTFModel pid=28127)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=28127)
(PYUSLTFModel pid=28127) =================================================================
(PYUSLTFModel pid=28127) Total params: 7,764
(PYUSLTFModel pid=28127) Trainable params: 7,764
(PYUSLTFModel pid=28127) Non-trainable params: 0
(PYUSLTFModel pid=28127) _________________________________________________________________
(PYUSLTFModel pid=28181) Model: "sequential"
(PYUSLTFModel pid=28181) _________________________________________________________________
(PYUSLTFModel pid=28181)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=28181) =================================================================
(PYUSLTFModel pid=28181)  dense (Dense)               (None, 100)               500
(PYUSLTFModel pid=28181)
(PYUSLTFModel pid=28181)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=28181)
(PYUSLTFModel pid=28181) =================================================================
(PYUSLTFModel pid=28181) Total params: 6,964
(PYUSLTFModel pid=28181) Trainable params: 6,964
(PYUSLTFModel pid=28181) Non-trainable params: 0
(PYUSLTFModel pid=28181) _________________________________________________________________
(PYUSLTFModel pid=28181) Model: "model"
(PYUSLTFModel pid=28181) __________________________________________________________________________________________________
(PYUSLTFModel pid=28181)  Layer (type)                   Output Shape         Param #     Connected to
(PYUSLTFModel pid=28181) ==================================================================================================
(PYUSLTFModel pid=28181)  input_2 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=28181)
(PYUSLTFModel pid=28181)  input_3 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=28181)
(PYUSLTFModel pid=28181)  concatenate (Concatenate)      (None, 128)          0           ['input_2[0][0]',
(PYUSLTFModel pid=28181)                                                                   'input_3[0][0]']
(PYUSLTFModel pid=28181)
(PYUSLTFModel pid=28181)  dense_2 (Dense)                (None, 64)           8256        ['concatenate[0][0]']
(PYUSLTFModel pid=28181)
(PYUSLTFModel pid=28181)  dense_3 (Dense)                (None, 1)            65          ['dense_2[0][0]']
(PYUSLTFModel pid=28181)
(PYUSLTFModel pid=28181) ==================================================================================================
(PYUSLTFModel pid=28181) Total params: 8,321
(PYUSLTFModel pid=28181) Trainable params: 8,321
(PYUSLTFModel pid=28181) Non-trainable params: 0
(PYUSLTFModel pid=28181) __________________________________________________________________________________________________
(PYUSLTFModel pid=28235) Model: "sequential"
(PYUSLTFModel pid=28235) _________________________________________________________________
(PYUSLTFModel pid=28235)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=28235) =================================================================
(PYUSLTFModel pid=28235)  dense (Dense)               (None, 100)               1300
(PYUSLTFModel pid=28235)
(PYUSLTFModel pid=28235)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=28235)
(PYUSLTFModel pid=28235) =================================================================
(PYUSLTFModel pid=28235) Total params: 7,764
(PYUSLTFModel pid=28235) Trainable params: 7,764
(PYUSLTFModel pid=28235) Non-trainable params: 0
(PYUSLTFModel pid=28235) _________________________________________________________________
100%|██████████| 29/29 [00:05<00:00,  4.90it/s, epoch: 1/40 -  train_loss:0.4123900532722473  train_accuracy:0.8816371560096741  train_auc_1:0.5304562449455261  val_loss:0.3779788911342621  val_accuracy:0.8729282021522522  val_auc_1:0.6028343439102173 ]
100%|██████████| 29/29 [00:00<00:00, 29.61it/s, epoch: 2/40 -  train_loss:0.346932053565979  train_accuracy:0.8819137215614319  train_auc_1:0.6658823490142822  val_loss:0.36548909544944763  val_accuracy:0.8729282021522522  val_auc_1:0.6796367764472961 ]
100%|██████████| 29/29 [00:00<00:00, 29.64it/s, epoch: 3/40 -  train_loss:0.3372674584388733  train_accuracy:0.8816371560096741  train_auc_1:0.7067811489105225  val_loss:0.35295170545578003  val_accuracy:0.8729282021522522  val_auc_1:0.7270445823669434 ]
100%|██████████| 29/29 [00:03<00:00,  7.81it/s, epoch: 4/40 -  train_loss:0.32340219616889954  train_accuracy:0.8769886493682861  train_auc_1:0.7885534763336182  val_loss:0.33063772320747375  val_accuracy:0.8729282021522522  val_auc_1:0.7887451648712158 ]
100%|██████████| 29/29 [00:00<00:00, 30.04it/s, epoch: 5/40 -  train_loss:0.26995792984962463  train_accuracy:0.8907632827758789  train_auc_1:0.838128924369812  val_loss:0.3076745867729187  val_accuracy:0.870718240737915  val_auc_1:0.8215299844741821 ]
100%|██████████| 29/29 [00:00<00:00, 29.33it/s, epoch: 6/40 -  train_loss:0.2533552348613739  train_accuracy:0.8924225568771362  train_auc_1:0.8705887794494629  val_loss:0.29530927538871765  val_accuracy:0.8773480653762817  val_auc_1:0.8317886590957642 ]
100%|██████████| 29/29 [00:00<00:00, 30.32it/s, epoch: 7/40 -  train_loss:0.24668139219284058  train_accuracy:0.8990597128868103  train_auc_1:0.8558321595191956  val_loss:0.28804725408554077  val_accuracy:0.8839778900146484  val_auc_1:0.8480352163314819 ]
100%|██████████| 29/29 [00:00<00:00, 29.50it/s, epoch: 8/40 -  train_loss:0.23031719028949738  train_accuracy:0.9137167930603027  train_auc_1:0.8611728549003601  val_loss:0.3139592111110687  val_accuracy:0.8762431144714355  val_auc_1:0.846015453338623 ]
100%|██████████| 29/29 [00:00<00:00, 30.08it/s, epoch: 9/40 -  train_loss:0.23515202105045319  train_accuracy:0.900053858757019  train_auc_1:0.8805654048919678  val_loss:0.28104230761528015  val_accuracy:0.8795580267906189  val_auc_1:0.8522399663925171 ]
100%|██████████| 29/29 [00:01<00:00, 26.20it/s, epoch: 10/40 -  train_loss:0.241227924823761  train_accuracy:0.9048295617103577  train_auc_1:0.8812973499298096  val_loss:0.2837042808532715  val_accuracy:0.8762431144714355  val_auc_1:0.8458613157272339 ]
100%|██████████| 29/29 [00:00<00:00, 29.39it/s, epoch: 11/40 -  train_loss:0.24319183826446533  train_accuracy:0.8978987336158752  train_auc_1:0.8969712257385254  val_loss:0.2814089357852936  val_accuracy:0.8806629776954651  val_auc_1:0.8504238128662109 ]
100%|██████████| 29/29 [00:00<00:00, 29.38it/s, epoch: 12/40 -  train_loss:0.23649084568023682  train_accuracy:0.9022090435028076  train_auc_1:0.8886930346488953  val_loss:0.2808994650840759  val_accuracy:0.8806629776954651  val_auc_1:0.8507044315338135 ]
100%|██████████| 29/29 [00:00<00:00, 29.70it/s, epoch: 13/40 -  train_loss:0.2257165014743805  train_accuracy:0.912446141242981  train_auc_1:0.8892974853515625  val_loss:0.2844206690788269  val_accuracy:0.8817679286003113  val_auc_1:0.8516015410423279 ]
100%|██████████| 29/29 [00:00<00:00, 29.98it/s, epoch: 14/40 -  train_loss:0.2239973098039627  train_accuracy:0.9110991358757019  train_auc_1:0.8925117254257202  val_loss:0.27834010124206543  val_accuracy:0.8828729391098022  val_auc_1:0.8576555252075195 ]
100%|██████████| 29/29 [00:00<00:00, 30.14it/s, epoch: 15/40 -  train_loss:0.22855830192565918  train_accuracy:0.9065265655517578  train_auc_1:0.9059909582138062  val_loss:0.27655595541000366  val_accuracy:0.8828729391098022  val_auc_1:0.8548376560211182 ]
100%|██████████| 29/29 [00:00<00:00, 29.99it/s, epoch: 16/40 -  train_loss:0.23442411422729492  train_accuracy:0.8992456793785095  train_auc_1:0.8952087759971619  val_loss:0.29822733998298645  val_accuracy:0.8773480653762817  val_auc_1:0.8496862649917603 ]
100%|██████████| 29/29 [00:00<00:00, 30.22it/s, epoch: 17/40 -  train_loss:0.22274373471736908  train_accuracy:0.9148706793785095  train_auc_1:0.8883383870124817  val_loss:0.2906903922557831  val_accuracy:0.8795580267906189  val_auc_1:0.8574408292770386 ]
100%|██████████| 29/29 [00:00<00:00, 30.02it/s, epoch: 18/40 -  train_loss:0.23235483467578888  train_accuracy:0.908462405204773  train_auc_1:0.8890659809112549  val_loss:0.2833332121372223  val_accuracy:0.8784530162811279  val_auc_1:0.853417694568634 ]
100%|██████████| 29/29 [00:00<00:00, 30.11it/s, epoch: 19/40 -  train_loss:0.21570773422718048  train_accuracy:0.9125000238418579  train_auc_1:0.9087664484977722  val_loss:0.28136932849884033  val_accuracy:0.8773480653762817  val_auc_1:0.852614164352417 ]
100%|██████████| 29/29 [00:00<00:00, 30.17it/s, epoch: 20/40 -  train_loss:0.22992058098316193  train_accuracy:0.9043141603469849  train_auc_1:0.9015873670578003  val_loss:0.2777860164642334  val_accuracy:0.8861878514289856  val_auc_1:0.8568739891052246 ]
100%|██████████| 29/29 [00:00<00:00, 29.56it/s, epoch: 21/40 -  train_loss:0.2279340922832489  train_accuracy:0.9051437973976135  train_auc_1:0.8971817493438721  val_loss:0.2807583212852478  val_accuracy:0.8751381039619446  val_auc_1:0.8563731908798218 ]
100%|██████████| 29/29 [00:00<00:00, 30.77it/s, epoch: 22/40 -  train_loss:0.21565255522727966  train_accuracy:0.9123831987380981  train_auc_1:0.9007200002670288  val_loss:0.2829255759716034  val_accuracy:0.8861878514289856  val_auc_1:0.8530654907226562 ]
100%|██████████| 29/29 [00:01<00:00, 28.86it/s, epoch: 23/40 -  train_loss:0.2037818878889084  train_accuracy:0.9172952771186829  train_auc_1:0.9117729067802429  val_loss:0.283769816160202  val_accuracy:0.8806629776954651  val_auc_1:0.8584370613098145 ]
100%|██████████| 29/29 [00:00<00:00, 29.94it/s, epoch: 24/40 -  train_loss:0.20522674918174744  train_accuracy:0.9189712405204773  train_auc_1:0.9200270175933838  val_loss:0.2849152088165283  val_accuracy:0.8872928023338318  val_auc_1:0.8554760217666626 ]
100%|██████████| 29/29 [00:00<00:00, 29.87it/s, epoch: 25/40 -  train_loss:0.21887396275997162  train_accuracy:0.9092133641242981  train_auc_1:0.8964812755584717  val_loss:0.28434526920318604  val_accuracy:0.8817679286003113  val_auc_1:0.8562740087509155 ]
100%|██████████| 29/29 [00:00<00:00, 30.06it/s, epoch: 26/40 -  train_loss:0.21427837014198303  train_accuracy:0.9110991358757019  train_auc_1:0.9081460237503052  val_loss:0.2802579998970032  val_accuracy:0.8806629776954651  val_auc_1:0.8563676476478577 ]
100%|██████████| 29/29 [00:00<00:00, 29.95it/s, epoch: 27/40 -  train_loss:0.21740949153900146  train_accuracy:0.9131637215614319  train_auc_1:0.900717556476593  val_loss:0.288614422082901  val_accuracy:0.8795580267906189  val_auc_1:0.8571600914001465 ]
100%|██████████| 29/29 [00:03<00:00,  7.90it/s, epoch: 28/40 -  train_loss:0.20581252872943878  train_accuracy:0.9156526327133179  train_auc_1:0.9221852421760559  val_loss:0.2895594835281372  val_accuracy:0.8861878514289856  val_auc_1:0.853269100189209 ]
100%|██████████| 29/29 [00:00<00:00, 29.86it/s, epoch: 29/40 -  train_loss:0.21550066769123077  train_accuracy:0.9065265655517578  train_auc_1:0.9204001426696777  val_loss:0.2893003225326538  val_accuracy:0.8773480653762817  val_auc_1:0.8543698191642761 ]
100%|██████████| 29/29 [00:00<00:00, 30.12it/s, epoch: 30/40 -  train_loss:0.2129545956850052  train_accuracy:0.9103982448577881  train_auc_1:0.9062888622283936  val_loss:0.2805260717868805  val_accuracy:0.8861878514289856  val_auc_1:0.857974648475647 ]
100%|██████████| 29/29 [00:00<00:00, 30.07it/s, epoch: 31/40 -  train_loss:0.21476531028747559  train_accuracy:0.9113636612892151  train_auc_1:0.9071635007858276  val_loss:0.28486552834510803  val_accuracy:0.8861878514289856  val_auc_1:0.8532305955886841 ]
100%|██████████| 29/29 [00:00<00:00, 30.47it/s, epoch: 32/40 -  train_loss:0.21274054050445557  train_accuracy:0.9136363863945007  train_auc_1:0.9150363802909851  val_loss:0.28660014271736145  val_accuracy:0.8828729391098022  val_auc_1:0.8550137877464294 ]
100%|██████████| 29/29 [00:00<00:00, 29.68it/s, epoch: 33/40 -  train_loss:0.19922088086605072  train_accuracy:0.9162057638168335  train_auc_1:0.925368070602417  val_loss:0.28454411029815674  val_accuracy:0.8839778900146484  val_auc_1:0.8589598536491394 ]
100%|██████████| 29/29 [00:01<00:00, 28.91it/s, epoch: 34/40 -  train_loss:0.19305925071239471  train_accuracy:0.9264547228813171  train_auc_1:0.9245292544364929  val_loss:0.2927177846431732  val_accuracy:0.8850829005241394  val_auc_1:0.8632690906524658 ]
100%|██████████| 29/29 [00:00<00:00, 29.83it/s, epoch: 35/40 -  train_loss:0.18927669525146484  train_accuracy:0.9245793223381042  train_auc_1:0.9269171953201294  val_loss:0.28897616267204285  val_accuracy:0.8839778900146484  val_auc_1:0.8587452173233032 ]
100%|██████████| 29/29 [00:01<00:00, 28.43it/s, epoch: 36/40 -  train_loss:0.20300477743148804  train_accuracy:0.917640209197998  train_auc_1:0.921332836151123  val_loss:0.2856888175010681  val_accuracy:0.8817679286003113  val_auc_1:0.8576774597167969 ]
100%|██████████| 29/29 [00:01<00:00, 28.36it/s, epoch: 37/40 -  train_loss:0.21484363079071045  train_accuracy:0.9117809534072876  train_auc_1:0.9073271155357361  val_loss:0.28348052501678467  val_accuracy:0.8795580267906189  val_auc_1:0.8578591346740723 ]
100%|██████████| 29/29 [00:00<00:00, 29.46it/s, epoch: 38/40 -  train_loss:0.2097211331129074  train_accuracy:0.9109513163566589  train_auc_1:0.9167930483818054  val_loss:0.28846046328544617  val_accuracy:0.8828729391098022  val_auc_1:0.8501706123352051 ]
100%|██████████| 29/29 [00:00<00:00, 30.17it/s, epoch: 39/40 -  train_loss:0.211452916264534  train_accuracy:0.9207974076271057  train_auc_1:0.9071869254112244  val_loss:0.2857007682323456  val_accuracy:0.8850829005241394  val_auc_1:0.8534011840820312 ]
100%|██████████| 29/29 [00:00<00:00, 29.72it/s, epoch: 40/40 -  train_loss:0.1950661838054657  train_accuracy:0.9189712405204773  train_auc_1:0.9238503575325012  val_loss:0.287675678730011  val_accuracy:0.8839778900146484  val_auc_1:0.8539405465126038 ]
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4db5310>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de31c0>)}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675a60>)}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f3769f921f0>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de3190>)}, aligned=True), VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675dc0>)}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 50480, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f37c4624cd0>}
100%|██████████| 29/29 [00:03<00:00,  7.41it/s, epoch: 1/40 -  train_loss:0.4217776954174042  train_accuracy:0.8659462332725525  train_auc_1:0.5435447692871094  val_loss:0.40626364946365356  val_accuracy:0.8729282021522522  val_auc_1:0.5905393362045288 ]
100%|██████████| 29/29 [00:01<00:00, 15.59it/s, epoch: 2/40 -  train_loss:0.3423333764076233  train_accuracy:0.8874446749687195  train_auc_1:0.6285374164581299  val_loss:0.3637339770793915  val_accuracy:0.8729282021522522  val_auc_1:0.670577883720398 ]
100%|██████████| 29/29 [00:01<00:00, 16.02it/s, epoch: 3/40 -  train_loss:0.31453219056129456  train_accuracy:0.8949353694915771  train_auc_1:0.6967648267745972  val_loss:0.35318124294281006  val_accuracy:0.8729282021522522  val_auc_1:0.7181453108787537 ]
100%|██████████| 29/29 [00:01<00:00, 15.70it/s, epoch: 4/40 -  train_loss:0.2924026548862457  train_accuracy:0.8968473672866821  train_auc_1:0.771354079246521  val_loss:0.3476685583591461  val_accuracy:0.8729282021522522  val_auc_1:0.7567088603973389 ]
100%|██████████| 29/29 [00:01<00:00, 15.96it/s, epoch: 5/40 -  train_loss:0.3236430585384369  train_accuracy:0.8690732717514038  train_auc_1:0.8049758076667786  val_loss:0.32425957918167114  val_accuracy:0.8729282021522522  val_auc_1:0.8028783798217773 ]
100%|██████████| 29/29 [00:01<00:00, 15.77it/s, epoch: 6/40 -  train_loss:0.2683410346508026  train_accuracy:0.8920454382896423  train_auc_1:0.8347899317741394  val_loss:0.3059132695198059  val_accuracy:0.8696132302284241  val_auc_1:0.8184260129928589 ]
100%|██████████| 29/29 [00:01<00:00, 15.99it/s, epoch: 7/40 -  train_loss:0.24226166307926178  train_accuracy:0.9022727012634277  train_auc_1:0.850990891456604  val_loss:0.30843329429626465  val_accuracy:0.8729282021522522  val_auc_1:0.832201361656189 ]
100%|██████████| 29/29 [00:01<00:00, 15.66it/s, epoch: 8/40 -  train_loss:0.23420202732086182  train_accuracy:0.9053977131843567  train_auc_1:0.8667846322059631  val_loss:0.2918694317340851  val_accuracy:0.8795580267906189  val_auc_1:0.8382883071899414 ]
100%|██████████| 29/29 [00:01<00:00, 15.87it/s, epoch: 9/40 -  train_loss:0.24281850457191467  train_accuracy:0.8993362784385681  train_auc_1:0.8600778579711914  val_loss:0.28592929244041443  val_accuracy:0.8773480653762817  val_auc_1:0.8522564172744751 ]
100%|██████████| 29/29 [00:01<00:00, 16.07it/s, epoch: 10/40 -  train_loss:0.25411662459373474  train_accuracy:0.8985795378684998  train_auc_1:0.8763052225112915  val_loss:0.27862876653671265  val_accuracy:0.8795580267906189  val_auc_1:0.8518161773681641 ]
100%|██████████| 29/29 [00:01<00:00, 15.97it/s, epoch: 11/40 -  train_loss:0.2467927783727646  train_accuracy:0.9008620977401733  train_auc_1:0.8637750148773193  val_loss:0.27538853883743286  val_accuracy:0.8850829005241394  val_auc_1:0.8585635423660278 ]
100%|██████████| 29/29 [00:01<00:00, 15.81it/s, epoch: 12/40 -  train_loss:0.24046260118484497  train_accuracy:0.9030172228813171  train_auc_1:0.8943703174591064  val_loss:0.2793208956718445  val_accuracy:0.8872928023338318  val_auc_1:0.8582884073257446 ]
100%|██████████| 29/29 [00:01<00:00, 15.74it/s, epoch: 13/40 -  train_loss:0.2232421338558197  train_accuracy:0.9109513163566589  train_auc_1:0.9031308889389038  val_loss:0.27965837717056274  val_accuracy:0.8773480653762817  val_auc_1:0.857512354850769 ]
100%|██████████| 29/29 [00:01<00:00, 15.98it/s, epoch: 14/40 -  train_loss:0.2226562350988388  train_accuracy:0.9120911359786987  train_auc_1:0.8835855722427368  val_loss:0.28520363569259644  val_accuracy:0.8806629776954651  val_auc_1:0.854595422744751 ]
100%|██████████| 29/29 [00:01<00:00, 15.90it/s, epoch: 15/40 -  train_loss:0.23515889048576355  train_accuracy:0.904902994632721  train_auc_1:0.8961691856384277  val_loss:0.28021782636642456  val_accuracy:0.8850829005241394  val_auc_1:0.8563291430473328 ]
100%|██████████| 29/29 [00:01<00:00, 15.79it/s, epoch: 16/40 -  train_loss:0.23402053117752075  train_accuracy:0.9024784564971924  train_auc_1:0.8906980752944946  val_loss:0.27909329533576965  val_accuracy:0.8850829005241394  val_auc_1:0.859708309173584 ]
100%|██████████| 29/29 [00:01<00:00, 15.93it/s, epoch: 17/40 -  train_loss:0.2111150622367859  train_accuracy:0.9189712405204773  train_auc_1:0.8960785865783691  val_loss:0.27899590134620667  val_accuracy:0.8817679286003113  val_auc_1:0.8576114177703857 ]
100%|██████████| 29/29 [00:01<00:00, 15.81it/s, epoch: 18/40 -  train_loss:0.20241659879684448  train_accuracy:0.915678858757019  train_auc_1:0.9157640933990479  val_loss:0.28282174468040466  val_accuracy:0.8784530162811279  val_auc_1:0.8583985567092896 ]
100%|██████████| 29/29 [00:01<00:00, 15.87it/s, epoch: 19/40 -  train_loss:0.23259153962135315  train_accuracy:0.9071022868156433  train_auc_1:0.8956990242004395  val_loss:0.2828892171382904  val_accuracy:0.8817679286003113  val_auc_1:0.8546835780143738 ]
100%|██████████| 29/29 [00:01<00:00, 15.82it/s, epoch: 20/40 -  train_loss:0.22440506517887115  train_accuracy:0.9034845232963562  train_auc_1:0.8989371657371521  val_loss:0.28058287501335144  val_accuracy:0.8784530162811279  val_auc_1:0.8598239421844482 ]
100%|██████████| 29/29 [00:01<00:00, 15.94it/s, epoch: 21/40 -  train_loss:0.23205137252807617  train_accuracy:0.9051724076271057  train_auc_1:0.8899630308151245  val_loss:0.2741439938545227  val_accuracy:0.8795580267906189  val_auc_1:0.8636598587036133 ]
100%|██████████| 29/29 [00:01<00:00, 15.72it/s, epoch: 22/40 -  train_loss:0.22656284272670746  train_accuracy:0.9030172228813171  train_auc_1:0.907919704914093  val_loss:0.2767719030380249  val_accuracy:0.8839778900146484  val_auc_1:0.8592514991760254 ]
100%|██████████| 29/29 [00:01<00:00, 15.98it/s, epoch: 23/40 -  train_loss:0.22055070102214813  train_accuracy:0.9109228849411011  train_auc_1:0.913796067237854  val_loss:0.2815714180469513  val_accuracy:0.8928176760673523  val_auc_1:0.855701744556427 ]
100%|██████████| 29/29 [00:01<00:00, 15.60it/s, epoch: 24/40 -  train_loss:0.23475250601768494  train_accuracy:0.9043141603469849  train_auc_1:0.9076265692710876  val_loss:0.2773815095424652  val_accuracy:0.8839778900146484  val_auc_1:0.8560759425163269 ]
100%|██████████| 29/29 [00:01<00:00, 16.07it/s, epoch: 25/40 -  train_loss:0.2359710931777954  train_accuracy:0.9005681872367859  train_auc_1:0.9041743278503418  val_loss:0.28951746225357056  val_accuracy:0.8806629776954651  val_auc_1:0.8590589165687561 ]
100%|██████████| 29/29 [00:01<00:00, 15.76it/s, epoch: 26/40 -  train_loss:0.21646590530872345  train_accuracy:0.9094827771186829  train_auc_1:0.9059643745422363  val_loss:0.27530720829963684  val_accuracy:0.8806629776954651  val_auc_1:0.8600990772247314 ]
100%|██████████| 29/29 [00:01<00:00, 15.82it/s, epoch: 27/40 -  train_loss:0.21936063468456268  train_accuracy:0.9137930870056152  train_auc_1:0.9077043533325195  val_loss:0.2782182991504669  val_accuracy:0.8861878514289856  val_auc_1:0.8611392974853516 ]
100%|██████████| 29/29 [00:01<00:00, 16.03it/s, epoch: 28/40 -  train_loss:0.21766482293605804  train_accuracy:0.9098451137542725  train_auc_1:0.9155865907669067  val_loss:0.2878170311450958  val_accuracy:0.8806629776954651  val_auc_1:0.8582608103752136 ]
100%|██████████| 29/29 [00:01<00:00, 15.72it/s, epoch: 29/40 -  train_loss:0.2088153064250946  train_accuracy:0.9126105904579163  train_auc_1:0.9115303754806519  val_loss:0.28278136253356934  val_accuracy:0.8817679286003113  val_auc_1:0.8566923141479492 ]
100%|██████████| 29/29 [00:01<00:00, 15.74it/s, epoch: 30/40 -  train_loss:0.2089204490184784  train_accuracy:0.9156526327133179  train_auc_1:0.9117385149002075  val_loss:0.2774920165538788  val_accuracy:0.8861878514289856  val_auc_1:0.8575839996337891 ]
100%|██████████| 29/29 [00:01<00:00, 15.52it/s, epoch: 31/40 -  train_loss:0.20840761065483093  train_accuracy:0.9170354008674622  train_auc_1:0.909948468208313  val_loss:0.29270535707473755  val_accuracy:0.8817679286003113  val_auc_1:0.8584149479866028 ]
100%|██████████| 29/29 [00:01<00:00, 15.67it/s, epoch: 32/40 -  train_loss:0.21289651095867157  train_accuracy:0.9139933586120605  train_auc_1:0.9137017130851746  val_loss:0.2861045300960541  val_accuracy:0.8872928023338318  val_auc_1:0.8621078729629517 ]
100%|██████████| 29/29 [00:01<00:00, 16.06it/s, epoch: 33/40 -  train_loss:0.20959915220737457  train_accuracy:0.9116379022598267  train_auc_1:0.9085273146629333  val_loss:0.2869407832622528  val_accuracy:0.8828729391098022  val_auc_1:0.8602972030639648 ]
100%|██████████| 29/29 [00:01<00:00, 15.91it/s, epoch: 34/40 -  train_loss:0.20927441120147705  train_accuracy:0.91731196641922  train_auc_1:0.9242825508117676  val_loss:0.2853357195854187  val_accuracy:0.8872928023338318  val_auc_1:0.8595817685127258 ]
100%|██████████| 29/29 [00:01<00:00, 15.88it/s, epoch: 35/40 -  train_loss:0.21317821741104126  train_accuracy:0.9164719581604004  train_auc_1:0.9171379208564758  val_loss:0.2900511920452118  val_accuracy:0.8795580267906189  val_auc_1:0.8606053590774536 ]
100%|██████████| 29/29 [00:01<00:00, 15.76it/s, epoch: 36/40 -  train_loss:0.22284917533397675  train_accuracy:0.909375011920929  train_auc_1:0.903253436088562  val_loss:0.2755865752696991  val_accuracy:0.8839778900146484  val_auc_1:0.8629168272018433 ]
100%|██████████| 29/29 [00:01<00:00, 15.70it/s, epoch: 37/40 -  train_loss:0.19849534332752228  train_accuracy:0.9175646305084229  train_auc_1:0.923616886138916  val_loss:0.289157897233963  val_accuracy:0.8784530162811279  val_auc_1:0.8603851795196533 ]
100%|██████████| 29/29 [00:01<00:00, 15.88it/s, epoch: 38/40 -  train_loss:0.20322787761688232  train_accuracy:0.9178650379180908  train_auc_1:0.9194050431251526  val_loss:0.2820649743080139  val_accuracy:0.8850829005241394  val_auc_1:0.8655640482902527 ]
100%|██████████| 29/29 [00:01<00:00, 15.71it/s, epoch: 39/40 -  train_loss:0.1862594485282898  train_accuracy:0.9291487336158752  train_auc_1:0.9243948459625244  val_loss:0.2956363558769226  val_accuracy:0.8872928023338318  val_auc_1:0.8605613708496094 ]
100%|██████████| 29/29 [00:01<00:00, 15.88it/s, epoch: 40/40 -  train_loss:0.20305216312408447  train_accuracy:0.915409505367279  train_auc_1:0.912611722946167  val_loss:0.2843382656574249  val_accuracy:0.8806629776954651  val_auc_1:0.8598018288612366 ]
[6]:
import matplotlib.pyplot as plt

for history in histories:
    plt.plot(history['train_auc_1'])
    plt.plot(history['val_auc_1'])

plt.title('Model Area Under Curve')
plt.ylabel('Area Under Curve')
plt.xlabel('Epoch')
plt.legend(
    ['origin', 'origin_val', 'fp8_compressed', 'fp8_compressed_val'], loc='lower right'
)
plt.show()
[Figure: Model Area Under Curve. Train and validation AUC per epoch for the original and fp8-compressed models.]

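As the plot shows, the validation AUC of both models fluctuates around 0.85, so 8-bit quantization has little impact on training accuracy for this task, while the theoretical communication cost drops by 3/4 (from 32 bits to 8 bits per value).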
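
As a rough check of the 3/4 figure, the sketch below computes the raw payload of a single hidden-layer transfer. The shapes are assumptions taken from this tutorial's run (batch size 128 and a 64-unit base-model output); serialization overhead and any quantization metadata sent alongside the values are ignored.

```python
# Raw payload of one hidden-layer transfer, before and after 8-bit quantization.
# Assumed shapes: batch_size=128 and a 64-unit base-model output, as in this run.
BATCH_SIZE = 128
HIDDEN_DIM = 64


def payload_bytes(num_values: int, bits_per_value: int) -> int:
    """Raw size in bytes of num_values numbers stored at bits_per_value each."""
    return num_values * bits_per_value // 8


num_values = BATCH_SIZE * HIDDEN_DIM
fp32_bytes = payload_bytes(num_values, 32)
int8_bytes = payload_bytes(num_values, 8)
saving = 1 - int8_bytes / fp32_bytes

print(f"float32: {fp32_bytes} bytes, 8-bit: {int8_bytes} bytes, saving: {saving:.0%}")
```

The saving is per transferred tensor, so it applies both to the forward hidden representations and to the gradients sent back, provided both directions are compressed.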

Custom communication compression algorithms#

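We can also define our own compression algorithm. SecretFlow provides the SparseCompressor and QuantizedCompressor base classes, which correspond to sparsification methods and quantization methods respectively.

Here we take quantization as an example and implement a compression algorithm based on K-means.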

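K-means compression is one step of the method proposed in the paper "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding". The idea is to cluster the values to be transmitted, keep the value of each cluster centroid, and represent every value by the index of its cluster.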
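
To make the idea concrete, here is a minimal, dependency-free sketch of K-means quantization on a list of scalars. It is not the SecretFlow implementation: `kmeans_1d` is a hypothetical helper written for illustration, and the cluster count of 16 is an arbitrary choice. The sender would transmit `centroids` (the codebook) and `labels` (the indices), and the receiver rebuilds an approximation by looking each index up in the codebook.

```python
import random


def kmeans_1d(values, n_clusters, n_iter=20, seed=0):
    """Tiny Lloyd's algorithm for scalars; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, n_clusters)  # initialise from the data
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each value.
        labels = [
            min(range(n_clusters), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = sum(members) / len(members)
    return centroids, labels


data_rng = random.Random(42)
values = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

# "Compress": a 16-entry codebook plus one small index per value.
centroids, labels = kmeans_1d(values, n_clusters=16)

# "Decompress": look every index up in the codebook.
restored = [centroids[lab] for lab in labels]

mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
mse = sum((a - b) ** 2 for a, b in zip(values, restored)) / len(values)
print(f"variance of data: {variance:.4f}, reconstruction MSE: {mse:.4f}")
```

The reconstruction is lossy: every value is replaced by the mean of its cluster, so the error shrinks as the number of clusters grows, at the cost of more bits per index and a larger codebook.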

After inheriting from QuantizedCompressor, we only need to implement _compress_one (which packs a numpy array into a QuantizedCompressedData) and _decompress_one (which restores a QuantizedCompressedData to a numpy array).

[7]:
from secretflow.utils.compressor import QuantizedCompressor
from secretflow.utils.compressor.quantized_compressor import QuantizedCompressedData
import numpy as np


class QuantizedKmeans(QuantizedCompressor):
    """Quantized compressor based on K-means: each float is replaced by the index of its nearest centroid.

    Reference paper 2016 "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding".

    Link: https://arxiv.org/abs/1510.00149
    """

    class KmeansCompressData(QuantizedCompressedData):
        def __init__(self, compressed_data, quant_bits, origin_type=None, q=None):
            super().__init__(compressed_data, quant_bits, origin_type)
            self.q = q

    def __init__(self, quant_bits: int = 8, n_clusters=None):
        super().__init__(quant_bits)
        from sklearn.cluster import KMeans

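        if n_clusters is None:
            # One cluster per representable index: 2**quant_bits (256 for 8 bits),
            # matching the n_clusters (256) reported in the training log.
            self.n_clusters = 1 << quant_bits
        else:
            self.n_clusters = n_clusters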
        self.km = KMeans(self.n_clusters, n_init=1, max_iter=50)

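    def _compress_one(self, data: np.ndarray, **kwargs) -> "KmeansCompressData":
        # Too few values to cluster: pass the array through uncompressed.
        if data.flatten().shape[0] <= self.n_clusters:
            return self.KmeansCompressData(data, self.quant_bits)
        ori_shape = data.shape
        # Cluster all scalar values; KMeans expects a 2-D (n_samples, 1) array.
        self.km.fit(np.expand_dims(data.flatten(), axis=1))

        # Shift the cluster indices down by 2**(quant_bits - 1) before casting to np_type.
        quantized = self.km.labels_ - (1 << (self.quant_bits - 1))

        quantized = np.reshape(quantized, ori_shape)
        # Codebook of centroid values, shape (n_clusters, 1).
        q = self.km.cluster_centers_

        return self.KmeansCompressData(
            quantized.astype(self.np_type), self.quant_bits, data.dtype, q
        )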

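    def _decompress_one(self, data: "KmeansCompressData") -> np.ndarray:
        # Small arrays were passed through uncompressed by _compress_one.
        if data.compressed_data.flatten().shape[0] <= self.n_clusters:
            return data.compressed_data
        # Undo the offset to recover the cluster indices (held in the original dtype).
        label = data.compressed_data.astype(data.origin_type) + (
            1 << (self.quant_bits - 1)
        )
        dequantized = np.zeros_like(label)
        # Replace every index with the value of its centroid.
        for i in range(data.q.shape[0]):
            dequantized[label == i] = data.q[i]

        return dequantized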
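
The subtraction of `1 << (quant_bits - 1)` in `_compress_one` and the matching addition in `_decompress_one` can be illustrated in isolation. The sketch below assumes that `np_type` is a signed integer type of `quant_bits` width (the offset suggests this, but it is not confirmed here) and uses the standard-library `array` module with typecode `'b'` as a stand-in for a signed 8-bit buffer.

```python
from array import array

QUANT_BITS = 8
OFFSET = 1 << (QUANT_BITS - 1)  # 128 for 8 bits

# Cluster indices produced by K-means lie in [0, 2**QUANT_BITS).
labels = [0, 1, 127, 128, 255]

# Shift into [-128, 127] so that each index fits a signed 8-bit slot.
shifted = [lab - OFFSET for lab in labels]

# Typecode 'b' is a signed char; array() raises OverflowError if a value does not fit.
packed = array("b", shifted)

# The receiver adds the offset back to recover the original indices.
restored = [value + OFFSET for value in packed]

print(list(packed), restored)
```

Without the shift, any index of 128 or above would not fit a signed 8-bit value, so the round trip would not return the original cluster index.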

Let's instantiate this compressor and train the split learning model again:

[8]:
qkm = QuantizedKmeans()

sl_model_kmeans = SLModel(
    base_model_dict=base_model_dict,
    device_y=alice,
    model_fuse=model_fuse,
    compressor=qkm,
)

history_kmeans = sl_model_kmeans.fit(
    train_data,
    train_label,
    validation_data=(test_data, test_label),
    epochs=40,
    batch_size=128,
    shuffle=True,
    verbose=1,
    validation_freq=1,
)
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party alice.
INFO:root:Create proxy actor <class 'secretflow.ml.nn.sl.backend.tensorflow.sl_base.PYUSLTFModel'> with party bob.
INFO:root:SL Train Params: {'x': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4db5310>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de31c0>)}, aligned=True), 'y': VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675a60>)}, aligned=True), 'batch_size': 128, 'epochs': 40, 'verbose': 1, 'callbacks': None, 'validation_data': (VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f3769f921f0>), PYURuntime(bob): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4de3190>)}, aligned=True), VDataFrame(partitions={PYURuntime(alice): Partition(data=<secretflow.device.device.pyu.PYUObject object at 0x7f37c4675dc0>)}, aligned=True)), 'shuffle': True, 'sample_weight': None, 'validation_freq': 1, 'dp_spent_step_freq': None, 'dataset_builder': None, 'audit_log_params': {}, 'random_seed': 91222, 'audit_log_dir': None, 'self': <secretflow.ml.nn.sl.sl_model.SLModel object at 0x7f3554274a30>}
(pid=30232) 2023-08-16 01:46:19.445573: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(pid=30250) 2023-08-16 01:46:19.623566: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(pid=30232) 2023-08-16 01:46:20.187158: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
(pid=30232) 2023-08-16 01:46:20.187299: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
(pid=30232) 2023-08-16 01:46:20.187315: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
(pid=30250) 2023-08-16 01:46:20.343446: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
(pid=30250) 2023-08-16 01:46:20.343542: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
(pid=30250) 2023-08-16 01:46:20.343556: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
(PYUSLTFModel pid=30232) 2023-08-16 01:46:22.002165: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
(PYUSLTFModel pid=30232) 2023-08-16 01:46:22.002204: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
(PYUSLTFModel pid=30232) Model: "sequential"
(PYUSLTFModel pid=30232) _________________________________________________________________
(PYUSLTFModel pid=30232)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=30232) =================================================================
(PYUSLTFModel pid=30232)  dense (Dense)               (None, 100)               500
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) =================================================================
(PYUSLTFModel pid=30232) Total params: 6,964
(PYUSLTFModel pid=30232) Trainable params: 6,964
(PYUSLTFModel pid=30232) Non-trainable params: 0
(PYUSLTFModel pid=30232) _________________________________________________________________
(PYUSLTFModel pid=30232) Model: "model"
(PYUSLTFModel pid=30232) __________________________________________________________________________________________________
(PYUSLTFModel pid=30232)  Layer (type)                   Output Shape         Param #     Connected to
(PYUSLTFModel pid=30232) ==================================================================================================
(PYUSLTFModel pid=30232)  input_2 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232)  input_3 (InputLayer)           [(None, 64)]         0           []
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232)  concatenate (Concatenate)      (None, 128)          0           ['input_2[0][0]',
(PYUSLTFModel pid=30232)                                                                   'input_3[0][0]']
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232)  dense_2 (Dense)                (None, 64)           8256        ['concatenate[0][0]']
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232)  dense_3 (Dense)                (None, 1)            65          ['dense_2[0][0]']
(PYUSLTFModel pid=30232)
(PYUSLTFModel pid=30232) ==================================================================================================
(PYUSLTFModel pid=30232) Total params: 8,321
(PYUSLTFModel pid=30232) Trainable params: 8,321
(PYUSLTFModel pid=30232) Non-trainable params: 0
(PYUSLTFModel pid=30232) __________________________________________________________________________________________________
(PYUSLTFModel pid=30250) 2023-08-16 01:46:22.167943: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
(PYUSLTFModel pid=30250) 2023-08-16 01:46:22.167979: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
(PYUSLTFModel pid=30250) Model: "sequential"
(PYUSLTFModel pid=30250) _________________________________________________________________
(PYUSLTFModel pid=30250)  Layer (type)                Output Shape              Param #
(PYUSLTFModel pid=30250) =================================================================
(PYUSLTFModel pid=30250)  dense (Dense)               (None, 100)               1300
(PYUSLTFModel pid=30250)
(PYUSLTFModel pid=30250)  dense_1 (Dense)             (None, 64)                6464
(PYUSLTFModel pid=30250)
(PYUSLTFModel pid=30250) =================================================================
(PYUSLTFModel pid=30250) Total params: 7,764
(PYUSLTFModel pid=30250) Trainable params: 7,764
(PYUSLTFModel pid=30250) Non-trainable params: 0
(PYUSLTFModel pid=30250) _________________________________________________________________
100%|██████████| 29/29 [00:13<00:00,  2.39it/s](_run pid=27350) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (248) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:15<00:00,  1.83it/s, epoch: 1/40 -  train_loss:0.4416384696960449  train_accuracy:0.8701704740524292  train_auc_1:0.518036961555481  val_loss:0.40735140442848206  val_accuracy:0.8729282021522522  val_auc_1:0.5592570304870605 ]
100%|██████████| 29/29 [00:13<00:00,  2.10it/s, epoch: 2/40 -  train_loss:0.36653003096580505  train_accuracy:0.8817349076271057  train_auc_1:0.5584944486618042  val_loss:0.3673045337200165  val_accuracy:0.8729282021522522  val_auc_1:0.6474628448486328 ]
(_run pid=27350) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (237) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:11<00:00,  2.35it/s](_run pid=27349) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (251) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:13<00:00,  2.18it/s, epoch: 3/40 -  train_loss:0.32427504658699036  train_accuracy:0.890625  train_auc_1:0.6890991926193237  val_loss:0.35910260677337646  val_accuracy:0.8729282021522522  val_auc_1:0.6932856440544128 ]
100%|██████████| 29/29 [00:13<00:00,  2.23it/s, epoch: 4/40 -  train_loss:0.31370726227760315  train_accuracy:0.8875584006309509  train_auc_1:0.7440165281295776  val_loss:0.34864872694015503  val_accuracy:0.8729282021522522  val_auc_1:0.7281122803688049 ]
  0%|          | 0/29 [00:00<?, ?it/s](_run pid=27349) /tmp/ipykernel_25795/600430472.py:13: ConvergenceWarning: Number of distinct clusters (251) found smaller than n_clusters (256). Possibly due to duplicate points in X.
100%|██████████| 29/29 [00:12<00:00,  2.24it/s, epoch: 5/40 -  train_loss:0.2876177728176117  train_accuracy:0.8971962332725525  train_auc_1:0.7722644209861755  val_loss:0.33951109647750854  val_accuracy:0.8729282021522522  val_auc_1:0.7569069862365723 ]
100%|██████████| 29/29 [00:13<00:00,  2.12it/s, epoch: 6/40 -  train_loss:0.29279187321662903  train_accuracy:0.8887392282485962  train_auc_1:0.7948960065841675  val_loss:0.3284510374069214  val_accuracy:0.8729282021522522  val_auc_1:0.8004623055458069 ]
100%|██████████| 29/29 [00:13<00:00,  2.13it/s, epoch: 7/40 -  train_loss:0.27801549434661865  train_accuracy:0.8846591114997864  train_auc_1:0.8384666442871094  val_loss:0.3052524924278259  val_accuracy:0.8696132302284241  val_auc_1:0.815767765045166 ]
100%|██████████| 29/29 [00:13<00:00,  2.15it/s, epoch: 8/40 -  train_loss:0.26859837770462036  train_accuracy:0.88606196641922  train_auc_1:0.8598557710647583  val_loss:0.2872508466243744  val_accuracy:0.8773480653762817  val_auc_1:0.8434892296791077 ]
100%|██████████| 29/29 [00:13<00:00,  2.16it/s, epoch: 9/40 -  train_loss:0.2744479179382324  train_accuracy:0.8841261267662048  train_auc_1:0.8505929708480835  val_loss:0.3062475025653839  val_accuracy:0.8729282021522522  val_auc_1:0.8401155471801758 ]
100%|██████████| 29/29 [00:13<00:00,  2.12it/s, epoch: 10/40 -  train_loss:0.248819500207901  train_accuracy:0.8949353694915771  train_auc_1:0.8706037998199463  val_loss:0.2888694405555725  val_accuracy:0.8751381039619446  val_auc_1:0.8512933850288391 ]
100%|██████████| 29/29 [00:13<00:00,  2.17it/s, epoch: 11/40 -  train_loss:0.2316252887248993  train_accuracy:0.908462405204773  train_auc_1:0.875359058380127  val_loss:0.2812938690185547  val_accuracy:0.8806629776954651  val_auc_1:0.8486737012863159 ]
100%|██████████| 29/29 [00:13<00:00,  2.16it/s, epoch: 12/40 -  train_loss:0.2373391091823578  train_accuracy:0.9034845232963562  train_auc_1:0.8871505260467529  val_loss:0.2884312868118286  val_accuracy:0.8817679286003113  val_auc_1:0.8496752977371216 ]
100%|██████████| 29/29 [00:13<00:00,  2.16it/s, epoch: 13/40 -  train_loss:0.23225976526737213  train_accuracy:0.9103982448577881  train_auc_1:0.8738927841186523  val_loss:0.28070011734962463  val_accuracy:0.8795580267906189  val_auc_1:0.8505558967590332 ]
100%|██████████| 29/29 [00:13<00:00,  2.14it/s, epoch: 14/40 -  train_loss:0.2326977550983429  train_accuracy:0.904633641242981  train_auc_1:0.8821461200714111  val_loss:0.2945939600467682  val_accuracy:0.8762431144714355  val_auc_1:0.8517225980758667 ]
100%|██████████| 29/29 [00:13<00:00,  2.17it/s, epoch: 15/40 -  train_loss:0.23820573091506958  train_accuracy:0.8987832069396973  train_auc_1:0.8843868374824524  val_loss:0.28626009821891785  val_accuracy:0.8784530162811279  val_auc_1:0.8519262075424194 ]
100%|██████████| 29/29 [00:13<00:00,  2.13it/s, epoch: 16/40 -  train_loss:0.2329801470041275  train_accuracy:0.9027478694915771  train_auc_1:0.8967185616493225  val_loss:0.27662354707717896  val_accuracy:0.8762431144714355  val_auc_1:0.8568078875541687 ]
100%|██████████| 29/29 [00:14<00:00,  2.05it/s, epoch: 17/40 -  train_loss:0.2397279292345047  train_accuracy:0.9022727012634277  train_auc_1:0.8815232515335083  val_loss:0.2774234414100647  val_accuracy:0.8784530162811279  val_auc_1:0.8531590700149536 ]
100%|██████████| 29/29 [00:14<00:00,  2.04it/s, epoch: 18/40 -  train_loss:0.23408104479312897  train_accuracy:0.9023783206939697  train_auc_1:0.8883072137832642  val_loss:0.2731465995311737  val_accuracy:0.8795580267906189  val_auc_1:0.8594001531600952 ]
100%|██████████| 29/29 [00:14<00:00,  2.05it/s, epoch: 19/40 -  train_loss:0.23964105546474457  train_accuracy:0.897400438785553  train_auc_1:0.8906112313270569  val_loss:0.28690865635871887  val_accuracy:0.8850829005241394  val_auc_1:0.8540396690368652 ]
100%|██████████| 29/29 [00:13<00:00,  2.09it/s, epoch: 20/40 -  train_loss:0.22502458095550537  train_accuracy:0.9051437973976135  train_auc_1:0.9018193483352661  val_loss:0.28959548473358154  val_accuracy:0.8773480653762817  val_auc_1:0.8575343489646912 ]
100%|██████████| 29/29 [00:14<00:00,  2.04it/s, epoch: 21/40 -  train_loss:0.2249988317489624  train_accuracy:0.907866358757019  train_auc_1:0.9009451270103455  val_loss:0.2748246490955353  val_accuracy:0.8762431144714355  val_auc_1:0.8615685701370239 ]
100%|██████████| 29/29 [00:13<00:00,  2.10it/s, epoch: 22/40 -  train_loss:0.22449716925621033  train_accuracy:0.9081858396530151  train_auc_1:0.8942176699638367  val_loss:0.2766781151294708  val_accuracy:0.8839778900146484  val_auc_1:0.863351583480835 ]
100%|██████████| 29/29 [00:13<00:00,  2.09it/s, epoch: 23/40 -  train_loss:0.22895343601703644  train_accuracy:0.9034845232963562  train_auc_1:0.9039328098297119  val_loss:0.28054457902908325  val_accuracy:0.8817679286003113  val_auc_1:0.8572096824645996 ]
100%|██████████| 29/29 [00:13<00:00,  2.08it/s, epoch: 24/40 -  train_loss:0.22407710552215576  train_accuracy:0.9059734344482422  train_auc_1:0.9005993008613586  val_loss:0.27575767040252686  val_accuracy:0.8850829005241394  val_auc_1:0.8586461544036865 ]
100%|██████████| 29/29 [00:14<00:00,  2.04it/s, epoch: 25/40 -  train_loss:0.23382873833179474  train_accuracy:0.8992456793785095  train_auc_1:0.9045984745025635  val_loss:0.27955323457717896  val_accuracy:0.8795580267906189  val_auc_1:0.8569509983062744 ]
100%|██████████| 29/29 [00:14<00:00,  2.05it/s, epoch: 26/40 -  train_loss:0.2256137877702713  train_accuracy:0.903761088848114  train_auc_1:0.9021925926208496  val_loss:0.2896490693092346  val_accuracy:0.8762431144714355  val_auc_1:0.8572261929512024 ]
100%|██████████| 29/29 [00:13<00:00,  2.15it/s, epoch: 27/40 -  train_loss:0.20892585813999176  train_accuracy:0.9138434529304504  train_auc_1:0.9012246131896973  val_loss:0.28507480025291443  val_accuracy:0.8784530162811279  val_auc_1:0.854832112789154 ]
100%|██████████| 29/29 [00:13<00:00,  2.08it/s, epoch: 28/40 -  train_loss:0.2042495459318161  train_accuracy:0.9186946749687195  train_auc_1:0.9083205461502075  val_loss:0.28037697076797485  val_accuracy:0.8828729391098022  val_auc_1:0.8549861907958984 ]
100%|██████████| 29/29 [00:13<00:00,  2.08it/s, epoch: 29/40 -  train_loss:0.2143721729516983  train_accuracy:0.9090154767036438  train_auc_1:0.918034553527832  val_loss:0.28719428181648254  val_accuracy:0.889502763748169  val_auc_1:0.8539240956306458 ]
100%|██████████| 29/29 [00:14<00:00,  2.07it/s, epoch: 30/40 -  train_loss:0.23188023269176483  train_accuracy:0.9043141603469849  train_auc_1:0.8931484818458557  val_loss:0.28120020031929016  val_accuracy:0.8795580267906189  val_auc_1:0.8607484102249146 ]
100%|██████████| 29/29 [00:13<00:00,  2.08it/s, epoch: 31/40 -  train_loss:0.214926615357399  train_accuracy:0.9139933586120605  train_auc_1:0.9071189165115356  val_loss:0.27989909052848816  val_accuracy:0.8817679286003113  val_auc_1:0.8585194945335388 ]
100%|██████████| 29/29 [00:13<00:00,  2.09it/s, epoch: 32/40 -  train_loss:0.19993817806243896  train_accuracy:0.9156526327133179  train_auc_1:0.918624758720398  val_loss:0.29084789752960205  val_accuracy:0.8883978128433228  val_auc_1:0.8593835830688477 ]
100%|██████████| 29/29 [00:14<00:00,  2.06it/s, epoch: 33/40 -  train_loss:0.21098265051841736  train_accuracy:0.9143319129943848  train_auc_1:0.910923182964325  val_loss:0.3034096658229828  val_accuracy:0.8806629776954651  val_auc_1:0.8596697449684143 ]
100%|██████████| 29/29 [00:14<00:00,  1.98it/s, epoch: 34/40 -  train_loss:0.21316346526145935  train_accuracy:0.908462405204773  train_auc_1:0.9148238897323608  val_loss:0.281110942363739  val_accuracy:0.8850829005241394  val_auc_1:0.8626692891120911 ]
100%|██████████| 29/29 [00:14<00:00,  2.06it/s, epoch: 35/40 -  train_loss:0.19225820899009705  train_accuracy:0.9240301847457886  train_auc_1:0.9190618991851807  val_loss:0.29221484065055847  val_accuracy:0.8795580267906189  val_auc_1:0.856224536895752 ]
100%|██████████| 29/29 [00:14<00:00,  2.03it/s, epoch: 36/40 -  train_loss:0.21308253705501556  train_accuracy:0.9129849076271057  train_auc_1:0.9214832782745361  val_loss:0.2819535434246063  val_accuracy:0.8839778900146484  val_auc_1:0.8588112592697144 ]
100%|██████████| 29/29 [00:13<00:00,  2.09it/s, epoch: 37/40 -  train_loss:0.20776230096817017  train_accuracy:0.9178650379180908  train_auc_1:0.9141819477081299  val_loss:0.2892824411392212  val_accuracy:0.8795580267906189  val_auc_1:0.8580352067947388 ]
100%|██████████| 29/29 [00:13<00:00,  2.07it/s, epoch: 38/40 -  train_loss:0.20729485154151917  train_accuracy:0.915099561214447  train_auc_1:0.9170703887939453  val_loss:0.28393277525901794  val_accuracy:0.8861878514289856  val_auc_1:0.8550798296928406 ]
100%|██████████| 29/29 [00:13<00:00,  2.10it/s, epoch: 39/40 -  train_loss:0.21065551042556763  train_accuracy:0.9147727489471436  train_auc_1:0.9200801253318787  val_loss:0.3028808534145355  val_accuracy:0.8883978128433228  val_auc_1:0.8555586338043213 ]
100%|██████████| 29/29 [00:14<00:00,  2.05it/s, epoch: 40/40 -  train_loss:0.20016591250896454  train_accuracy:0.9172952771186829  train_auc_1:0.9134470224380493  val_loss:0.2854197919368744  val_accuracy:0.8861878514289856  val_auc_1:0.8552834391593933 ]
[9]:
plt.plot(history_kmeans['train_auc_1'])
plt.plot(history_kmeans['val_auc_1'])

plt.title('Model Area Under Curve')
plt.ylabel('Area Under Curve')
plt.xlabel('Epoch')
plt.legend(['kmeans', 'kmeans_val'], loc='lower right')
plt.show()
[Figure: training and validation AUC per epoch with the Kmeans compressor]

The final validation AUC is about 0.855, which is also quite good.

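Compression quality of the algorithms#

Taking a ResNet pretrained on ImageNet as an example, we apply the Int8, Fp8, and Kmeans methods to the model parameters and compare the results.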

[10]:
from secretflow.utils.compressor import QuantizedZeroPoint, QuantizedFP, QuantizedKmeans
from torchvision import models
import ssl
import time

import numpy as np
import matplotlib.pyplot as plt

# Skip HTTPS certificate verification so the weights download works in this demo.
# This is insecure; do not do this in production.
ssl._create_default_https_context = ssl._create_unverified_context
# `pretrained=True` is deprecated since torchvision 0.13; this weights enum is the equivalent.
net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
# Flatten every parameter tensor into a 1-D NumPy array.
net_params = [p.detach().numpy().flatten() for p in net.parameters()]

coms = [
    QuantizedZeroPoint(8),
    QuantizedFP(8, format='E4M3'),
    QuantizedFP(8, format='E5M2'),
    QuantizedKmeans(8, n_clusters=100),
]
losses = []
durations = []

for c in coms:
    start = time.time()
    c_params = c.compress(net_params)
    dc_params = c.decompress(c_params)
    # Stop the clock before computing the error, so only compress/decompress is timed.
    durations.append(time.time() - start)
    # Sum of squared errors between the original and the restored parameters.
    losses.append(sum(np.sum((a - b) ** 2) for a, b in zip(net_params, dc_params)))
[11]:
plt.figure(figsize=(12.8, 4.8))
x = [1, 2, 3, 4]
x_label = ['Int8', 'Fp8-E4M3', 'Fp8-E5M2', 'Kmeans']

plt.subplot(121)
p1 = plt.bar(x, losses, color='deepskyblue')
plt.bar_label(p1, label_type='edge')
plt.xticks(x, x_label)
plt.title('SSE loss in compressing ResNet50')
plt.ylabel('Sum of squared errors')

plt.subplot(122)
p2 = plt.bar(x, durations, color='salmon')
plt.bar_label(p2, label_type='edge')
plt.xticks(x, x_label)
plt.title('Time consumed in compressing ResNet50')
plt.ylabel('Time (s)')

plt.show()
[Figure: SSE loss (left) and time consumed (right) when compressing ResNet50 with Int8, Fp8-E4M3, Fp8-E5M2, and Kmeans]

As the figure shows, Kmeans compression keeps the precision loss lowest, but it takes by far the longest to run.
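To give an intuition for this trade-off, here is a minimal NumPy sketch of codebook quantization with Lloyd's k-means on scalar values. It is not SecretFlow's `QuantizedKmeans` implementation; the function name `kmeans_quantize_1d`, the quantile initialisation, and the synthetic Gaussian "parameters" are all assumptions made for illustration.

```python
import numpy as np


def kmeans_quantize_1d(x, n_clusters=16, n_iters=10):
    """Hypothetical helper: Lloyd's algorithm on scalars, returns (codes, centroids)."""
    # Initialise the codebook at evenly spaced quantiles of the data.
    centroids = np.quantile(x, np.linspace(0.0, 1.0, n_clusters)).astype(np.float32)
    for _ in range(n_iters):
        # Assignment step: nearest centroid for every value, O(n * k) work per pass.
        codes = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Update step: move each non-empty centroid to the mean of its members.
        for k in range(n_clusters):
            members = x[codes == k]
            if members.size:
                centroids[k] = members.mean()
    # Final assignment against the fitted codebook.
    codes = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    # uint8 codes are enough as long as n_clusters <= 256.
    return codes.astype(np.uint8), centroids


rng = np.random.default_rng(0)
# Synthetic stand-in for model parameters (assumption, not real ResNet weights).
params = rng.normal(0.0, 0.05, size=5_000).astype(np.float32)

codes, centroids = kmeans_quantize_1d(params)
restored = centroids[codes]  # decompression is a plain table lookup
sse_kmeans = float(np.sum((params - restored) ** 2))
print(f"SSE after k-means quantization: {sse_kmeans:.6f}")
```

The codebook adapts to where the values actually lie, which tends to reduce the error, but fitting it needs several full passes over the data. A fixed-grid quantizer only needs one pass, which is consistent with the timing gap observed above.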

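Floating-point compression (Fp8-E4M3) compresses the ResNet parameters slightly better than integer (Int8) compression, but takes about three times as long.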
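For reference, the kind of integer quantization compared here can be sketched in a few lines of NumPy. This is a generic affine (zero-point) scheme written for illustration, not the code behind `QuantizedZeroPoint`; the helper names and the synthetic input are assumptions.

```python
import numpy as np


def quantize_int8(x):
    """Hypothetical helper: affine (zero-point) quantization of floats to int8."""
    qmin, qmax = -128, 127
    x_min, x_max = float(x.min()), float(x.max())
    # Width of one integer step; fall back to 1.0 for a constant array.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    # Integer offset chosen so that x_min maps to qmin.
    zero_point = qmin - round(x_min / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Hypothetical helper: map int8 codes back to approximate float32 values."""
    return ((q.astype(np.float32) - zero_point) * scale).astype(np.float32)


rng = np.random.default_rng(0)
# Synthetic stand-in for model parameters (assumption, not real ResNet weights).
params = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)

q, scale, zero_point = quantize_int8(params)
restored = dequantize_int8(q, scale, zero_point)
# Same error metric as the experiment above: sum of squared errors.
sse = float(np.sum((params - restored) ** 2))
print(f"scale={scale:.6f}, zero_point={zero_point}, SSE={sse:.6f}")
```

The integer grid is uniform, so the absolute error is roughly the same for small and large values. Floating-point formats such as E4M3 instead spend bits on an exponent, giving finer steps near zero, which may explain part of the difference seen on weight distributions concentrated around zero.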

In practice, choose a compression algorithm by balancing the available compute resources against the precision you need.

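Summary#

This example introduced communication compression algorithms and applied both the compressors provided by SecretFlow and a custom-designed one on top of split learning.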

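The experiments show that compressing 32-bit values to 8 bits costs little precision, while the theoretical communication cost drops to 1/4 of the uncompressed cost. Communication compression is therefore a good choice for split learning, which transmits data and gradients frequently.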
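The 1/4 figure can be checked with a back-of-the-envelope calculation. The batch size, hidden dimension, and per-tensor metadata size below are hypothetical values picked for illustration, not taken from the experiments above.

```python
import numpy as np

# Hypothetical shape of the hidden-layer output sent in one training step.
batch_size, hidden_dim = 128, 64

hidden_fp32 = np.zeros((batch_size, hidden_dim), dtype=np.float32)
hidden_int8 = np.zeros((batch_size, hidden_dim), dtype=np.int8)

# Assumed per-tensor metadata: one float32 scale and one float32 zero point.
meta_bytes = 2 * 4

raw_bytes = hidden_fp32.nbytes
compressed_bytes = hidden_int8.nbytes + meta_bytes
ratio = compressed_bytes / raw_bytes

print(f"raw: {raw_bytes} B, compressed: {compressed_bytes} B, ratio: {ratio:.4f}")
```

With these assumed sizes the payload shrinks from 32768 bytes to 8200 bytes, so the metadata overhead is negligible and the ratio stays very close to 1/4.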

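This tutorial uses plaintext aggregation for demonstration and does not address leakage from the hidden layer. SecretFlow provides an aggregation layer, AggLayer, which uses MPC, TEE, HE, or DP to avoid leaking hidden-layer outputs that would otherwise be transmitted in plaintext. If you are interested, see the related documentation.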