secretflow.utils.simulation package#

Subpackages#

Submodules#

secretflow.utils.simulation.datasets module#

Functions:

dataset(name[, cache_dir])

Get the specific dataset file path.

load_iris(parts[, axis, aggregator, comparator])

Load iris dataset to federated dataframe.

load_dermatology(parts[, axis, ...])

Load dermatology dataset to federated dataframe.

load_bank_marketing(parts[, axis, full, ...])

Load bank marketing dataset to federated dataframe.

load_mnist(parts[, normalized_x, ...])

Load mnist dataset to federated ndarrays.

load_linear(parts)

Load the linear dataset to federated dataframe.

load_cora(parts[, data_dir, add_self_loop])

Load the cora dataset for split learning GNN.

secretflow.utils.simulation.datasets.dataset(name: str, cache_dir: Optional[str] = None) str[source]#

Get the specific dataset file path.

Parameters

name – the dataset name, should be one of [‘iris’, ‘dermatology’, ‘bank_marketing’, ‘mnist’, ‘linear’].

Returns

the dataset file path.

secretflow.utils.simulation.datasets.load_iris(parts: Union[List[PYU], Dict[PYU, Union[float, Tuple]]], axis=0, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) Union[HDataFrame, VDataFrame][source]#

Load iris dataset to federated dataframe.

This dataset includes columns:
  1. sepal_length

  2. sepal_width

  3. petal_length

  4. petal_width

  5. class

This dataset originated from Iris.

Parameters
  • parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict {PYU: value}, each value must be either 1) a float, or 2) an interval given as a tuple, closed on the left and open on the right.

  • axis – optional, 0 or 1. 0 splits by row and returns a horizontally partitioned federated DataFrame. 1 splits by column and returns a vertically partitioned federated DataFrame.

  • aggregator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

  • comparator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

Returns

An HDataFrame if axis is 0, otherwise a VDataFrame.
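
The three accepted forms of parts can be illustrated without SecretFlow installed. The sketch below is a stand-alone approximation, not the library's implementation: resolve_parts is a hypothetical helper, plain strings stand in for PYU objects, a float is assumed to be a fraction of the rows assigned consecutively, and a tuple is assumed to be a row-index interval [start, end).

```python
from typing import Dict, List, Tuple, Union

Part = Union[float, Tuple[int, int]]


def resolve_parts(
    parts: Union[List[str], Dict[str, Part]], total: int
) -> Dict[str, Tuple[int, int]]:
    """Map each party to a half-open row range [start, end). Hypothetical helper."""
    if isinstance(parts, list):
        # Even split: boundaries at k * total / n, rounded to whole rows.
        n = len(parts)
        bounds = [round(k * total / n) for k in range(n + 1)]
        return {p: (bounds[i], bounds[i + 1]) for i, p in enumerate(parts)}

    ranges = {}
    start = 0
    for party, value in parts.items():
        if isinstance(value, float):
            # Assumption: a float is a fraction of the rows, assigned consecutively.
            end = start + round(total * value)
            ranges[party] = (start, end)
            start = end
        else:
            # Assumption: a tuple is already a [start, end) row interval.
            ranges[party] = (value[0], value[1])
    return ranges


even = resolve_parts(["alice", "bob"], total=150)
by_fraction = resolve_parts({"alice": 0.2, "bob": 0.8}, total=150)
by_interval = resolve_parts({"alice": (0, 50), "bob": (50, 150)}, total=150)
```

In real use the keys would be PYU devices and the result would be passed to a loader such as load_iris rather than computed by hand.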

secretflow.utils.simulation.datasets.load_dermatology(parts: Union[List[PYU], Dict[PYU, Union[float, Tuple]]], axis=0, class_starts_from_zero: bool = True, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) Union[HDataFrame, VDataFrame][source]#

Load dermatology dataset to federated dataframe.

This dataset consists of dermatology cancer diagnosis. For the original dataset please refer to Dermatology.

Parameters
  • parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict {PYU: value}, each value must be either 1) a float, or 2) an interval given as a tuple, closed on the left and open on the right.

  • axis – optional, 0 or 1. 0 splits by row and returns a horizontally partitioned federated DataFrame. 1 splits by column and returns a vertically partitioned federated DataFrame.

  • class_starts_from_zero – optional, class starts from zero if True.

  • aggregator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

  • comparator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

Returns

An HDataFrame if axis is 0, otherwise a VDataFrame.
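
The difference between axis 0 and axis 1 can be shown on a small in-memory table. This is only a sketch of the two partitioning directions: split_table is a hypothetical helper, the table is a plain list of rows, and strings stand in for PYU objects. It does not build an HDataFrame or VDataFrame.

```python
from typing import Dict, List

Table = List[List[float]]


def split_table(table: Table, parties: List[str], axis: int = 0) -> Dict[str, Table]:
    """Split a row-major table evenly between parties. Hypothetical helper."""
    n = len(parties)
    size = len(table) if axis == 0 else len(table[0])
    bounds = [round(k * size / n) for k in range(n + 1)]
    out = {}
    for i, party in enumerate(parties):
        lo, hi = bounds[i], bounds[i + 1]
        if axis == 0:
            # Horizontal partitioning: each party holds some rows, all columns.
            out[party] = table[lo:hi]
        else:
            # Vertical partitioning: each party holds all rows, some columns.
            out[party] = [row[lo:hi] for row in table]
    return out


table = [[1, 2, 3, 4], [5, 6, 7, 8]]
horizontal = split_table(table, ["alice", "bob"], axis=0)
vertical = split_table(table, ["alice", "bob"], axis=1)
```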

secretflow.utils.simulation.datasets.load_bank_marketing(parts: Union[List[PYU], Dict[PYU, Union[float, Tuple]]], axis=0, full=False, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) Union[HDataFrame, VDataFrame][source]#

Load bank marketing dataset to federated dataframe.

This dataset is related to direct marketing campaigns. For the original dataset please refer to Bank marketing.

Parameters
  • parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict {PYU: value}, each value must be either 1) a float, or 2) an interval given as a tuple, closed on the left and open on the right.

  • axis – optional, 0 or 1. 0 splits by row and returns a horizontally partitioned federated DataFrame. 1 splits by column and returns a vertically partitioned federated DataFrame.

  • full – optional, whether to load the full version of the dataset.

  • aggregator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

  • comparator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

Returns

An HDataFrame if axis is 0, otherwise a VDataFrame.

secretflow.utils.simulation.datasets.load_mnist(parts: Union[List[PYU], Dict[PYU, Union[float, Tuple]]], normalized_x: bool = True, categorical_y: bool = False, is_torch: bool = False) Tuple[Tuple[FedNdarray, FedNdarray], Tuple[FedNdarray, FedNdarray]][source]#

Load mnist dataset to federated ndarrays.

This dataset has a training set of 60,000 examples, and a test set of 10,000 examples. Each example is a 28x28 grayscale image of the 10 digits. For the original dataset please refer to MNIST.

Parameters
  • parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict {PYU: value}, each value must be either 1) a float, or 2) an interval given as a tuple, closed on the left and open on the right.

  • normalized_x – optional, normalize x if True. Defaults to True.

  • categorical_y – optional, one-hot encode y if True. Defaults to False.

Returns

A tuple consisting of two tuples, (x_train, y_train) and (x_test, y_test).
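
The effect of normalized_x and categorical_y can be sketched in plain Python. The exact scaling used by load_mnist is not stated here, so dividing 8-bit pixel values by 255 is an assumption; normalize and one_hot are hypothetical helpers, not SecretFlow functions.

```python
from typing import List


def normalize(pixels: List[int]) -> List[float]:
    """Scale 8-bit grayscale values into [0, 1] (assumed normalization)."""
    return [p / 255.0 for p in pixels]


def one_hot(label: int, num_classes: int = 10) -> List[int]:
    """Encode a digit label as a one-hot vector of length num_classes."""
    return [1 if i == label else 0 for i in range(num_classes)]


x = normalize([0, 51, 255])
y = one_hot(3)
```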

secretflow.utils.simulation.datasets.load_linear(parts: Union[List[PYU], Dict[PYU, Union[float, Tuple]]]) VDataFrame[source]#

Load the linear dataset to federated dataframe.

This dataset is randomly generated and includes columns:
  1. id

  2. 20 features: [x1, x2, x3, …, x19, x20]

  3. y

Parameters

parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict {PYU: value}, each value must be either 1) a float, or 2) an interval given as a tuple, closed on the left and open on the right.

Returns

A VDataFrame.

secretflow.utils.simulation.datasets.load_cora(parts: List[PYU], data_dir: Optional[str] = None, add_self_loop: bool = True) Tuple[FedNdarray, FedNdarray, FedNdarray, FedNdarray, FedNdarray, FedNdarray, FedNdarray, FedNdarray][source]#

Load the cora dataset for split learning GNN.

Parameters

parts (List[PYU]) – the parties across which the paper features are partitioned evenly.

Returns

edge, x, Y_train, Y_val, Y_test, index_train, index_val, index_test. Note that Y is bound to the first participant.

Return type

A tuple of FedNdarray
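
The add_self_loop flag is not described above. In GNN preprocessing, adding self loops commonly means setting the diagonal of the adjacency matrix to 1 so that each node's own features take part in aggregation. The sketch below illustrates that general idea under this assumption; add_self_loops is a hypothetical helper and may differ from what load_cora does internally.

```python
from typing import List


def add_self_loops(adj: List[List[int]]) -> List[List[int]]:
    """Return a copy of a square adjacency matrix with its diagonal set to 1."""
    out = [row[:] for row in adj]
    for i in range(len(out)):
        out[i][i] = 1
    return out


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
with_loops = add_self_loops(adj)
```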

secretflow.utils.simulation.tf_gnn_model module#

Classes:

GraphAttention(*args, **kwargs)

ServerNet(*args, **kwargs)

class secretflow.utils.simulation.tf_gnn_model.GraphAttention(*args, **kwargs)[source]#

Bases: Layer

Methods:

__init__(F_[, attn_heads, ...])

build(input_shape)

Creates the variables of the layer (optional, for subclass implementers).

call(inputs)

This is where the layer's logic lives.

compute_output_shape(input_shape)

Computes the output shape of the layer.

get_config()

Returns the config of the layer.

__init__(F_, attn_heads=1, attn_heads_reduction='average', dropout_rate=0.5, activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', attn_kernel_initializer='glorot_uniform', kernel_regularizer=None, bias_regularizer=None, attn_kernel_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, attn_kernel_constraint=None, **kwargs)[source]#

build(input_shape)[source]#

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
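
The build-before-first-call lifecycle can be sketched without TensorFlow. LazyLayer below is a hypothetical stand-in, not a Keras class: it only shows that state creation is deferred until the input shape is known and that build runs once.

```python
class LazyLayer:
    """Minimal stand-in for a layer that creates its state lazily (not Keras)."""

    def __init__(self, units):
        self.units = units
        self.built = False
        self.kernel = None
        self.build_calls = 0

    def build(self, input_shape):
        # State creation: the weight shape depends on the last input dimension.
        in_dim = input_shape[-1]
        self.kernel = [[0.0] * self.units for _ in range(in_dim)]
        self.build_calls += 1
        self.built = True

    def __call__(self, inputs):
        # Build once, right before the first call, using the input's shape.
        if not self.built:
            self.build((len(inputs), len(inputs[0])))
        return self.call(inputs)

    def call(self, inputs):
        # Plain matrix product of inputs (batch x in_dim) with the kernel.
        return [
            [sum(x * w for x, w in zip(row, col)) for col in zip(*self.kernel)]
            for row in inputs
        ]


layer = LazyLayer(units=2)
first = layer([[1.0, 2.0, 3.0]])
second = layer([[4.0, 5.0, 6.0]])
```

After both calls, build has run a single time and the kernel has shape 3 x 2, fixed by the first input.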

call(inputs)[source]#

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or the build() method that is called automatically before call() executes the first time.

Parameters
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns

A tensor or list/tuple of tensors.

compute_output_shape(input_shape)[source]#

Computes the output shape of the layer.

This method will cause the layer’s state to be built, if that has not happened before. This requires that the layer will later be used with inputs that match the input shape provided here.

Parameters

input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.

Returns

An output shape tuple.
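
As an illustration of shape tuples with free dimensions, the sketch below computes the output shape of a dense-like layer, where only the last dimension changes. dense_output_shape is a hypothetical helper chosen for simplicity; it is not the shape rule of GraphAttention or ServerNet.

```python
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]


def dense_output_shape(input_shape: Shape, units: int) -> Shape:
    """Output shape of a dense-like layer: only the last dimension changes."""
    return tuple(input_shape[:-1]) + (units,)


# None marks a free dimension, typically the batch size.
shape = dense_output_shape((None, 16), units=8)
```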

get_config()[source]#

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns

Python dictionary.
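
Because the returned dict may be shared, a caller that wants to edit a config should copy it first. The sketch below uses ConfiguredLayer, a hypothetical stand-in rather than a Keras layer, to show the copy-before-modify pattern.

```python
import copy


class ConfiguredLayer:
    """Stand-in object whose get_config() returns its internal dict (not Keras)."""

    def __init__(self, units, dropout):
        self._config = {"units": units, "dropout": dropout}

    def get_config(self):
        # Returns the same dict object on every call, not a fresh copy.
        return self._config


layer = ConfiguredLayer(units=8, dropout=0.5)

# Copy before editing so the layer's own config stays untouched.
config = copy.deepcopy(layer.get_config())
config["dropout"] = 0.1
```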

class secretflow.utils.simulation.tf_gnn_model.ServerNet(*args, **kwargs)[source]#

Bases: Layer

Methods:

__init__(in_channel, hidden_size, num_layer, ...)

build(input_shape)

Creates the variables of the layer (optional, for subclass implementers).

call(inputs)

This is where the layer's logic lives.

compute_output_shape(input_shape)

Computes the output shape of the layer.

get_config()

Returns the config of the layer.

__init__(in_channel: int, hidden_size: int, num_layer: int, num_class: int, dropout: float, **kwargs)[source]#

build(input_shape)[source]#

Creates the variables of the layer (optional, for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]#

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state in __init__(), or the build() method that is called automatically before call() executes the first time.

Parameters
  • inputs

    Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:

    • inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.

    • NumPy array or Python scalar values in inputs get cast as tensors.

    • Keras mask metadata is only collected from inputs.

    • Layers are built (build(input_shape) method) using shape info from inputs only.

    • input_spec compatibility is only checked against inputs.

    • Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.

    • The SavedModel input specification is generated using inputs only.

    • Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc is only supported for inputs and not for tensors in positional and keyword arguments.

  • *args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.

  • **kwargs

    Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:

    • training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.

    • mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns

A tensor or list/tuple of tensors.

compute_output_shape(input_shape)[source]#

Computes the output shape of the layer.

This method will cause the layer’s state to be built, if that has not happened before. This requires that the layer will later be used with inputs that match the input shape provided here.

Parameters

input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.

Returns

An output shape tuple.

get_config()[source]#

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns

Python dictionary.

Module contents#