secretflow.utils.simulation.data

secretflow.utils.simulation.data.dataframe

Functions:

create_df(source, parts[, axis, shuffle, ...])

Create a federated dataframe from a single data source.

create_hdf(source, parts[, shuffle, ...])

Create a HDataFrame from a single dataset source.

create_vdf(source, parts[, shuffle])

Create a VDataFrame from a single dataset source.

secretflow.utils.simulation.data.dataframe.create_df(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], axis: int = 0, shuffle: bool = False, random_state: Optional[int] = None, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) -> Union[HDataFrame, VDataFrame]

Create a federated dataframe from a single data source.

Parameters
  • source – the dataset source: a file path, a pandas.DataFrame, or a callable that returns a pandas.DataFrame.

  • parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict mapping each PYU to a value, the value shall be one of the following: (1) a float giving that PYU's share of the data, or (2) an interval given as a tuple, closed on the left and open on the right.

  • axis – optional, 0 or 1. 0 splits by row and returns a horizontally partitioned federated DataFrame. 1 splits by column and returns a vertically partitioned federated DataFrame.

  • shuffle – optional, whether to shuffle the dataset before splitting.

  • random_state – optional, the random state for shuffle.

  • aggregator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

  • comparator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.

Returns

HDataFrame if axis is 0, otherwise VDataFrame.

Return type

Union[HDataFrame, VDataFrame]

Examples

>>> df = pd.DataFrame({'f1': [1, 2, 3, 4], 'f3': [11, 12, 13, 14]})
>>> # Create an evenly split HDataFrame.
>>> hdf = create_df(df, [alice, bob], axis=0)
>>> # Create a VDataFrame with given proportions.
>>> vdf = create_df(df, {alice: 0.3, bob: 0.7}, axis=1)
>>> # Create a HDataFrame with given index intervals.
>>> hdf = create_df(df, {alice: (0, 1), bob: (1, 4)})
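
The three forms of parts described above (a list, a dict of floats, a dict of intervals) can be pictured as index ranges over the rows or columns. The following is a minimal sketch using plain pandas only. The helper split_ranges is hypothetical and is not part of secretflow, string labels stand in for PYU devices, and the rounding policy for fractions is an assumption that may differ from the library's actual behavior.

```python
import math

import pandas as pd


def split_ranges(parts, total):
    """Hypothetical helper: map a `parts` spec to {party: (start, end)}."""
    if isinstance(parts, (list, tuple)):
        # Even split: each party gets ceil(total / n) items; the last may get fewer.
        size = math.ceil(total / len(parts))
        return {
            p: (min(i * size, total), min((i + 1) * size, total))
            for i, p in enumerate(parts)
        }
    ranges, start = {}, 0
    items = list(parts.items())
    for i, (party, value) in enumerate(items):
        if isinstance(value, float):
            # Fractions become consecutive ranges; the last party takes the
            # remainder (assumed rounding policy, not confirmed by the docs).
            end = total if i == len(items) - 1 else start + round(total * value)
            ranges[party] = (start, min(end, total))
            start = ranges[party][1]
        else:
            # Interval [start, end): closed on the left, open on the right.
            ranges[party] = tuple(value)
    return ranges


df = pd.DataFrame({'f1': [1, 2, 3, 4], 'f3': [11, 12, 13, 14]})

even = split_ranges(['alice', 'bob'], len(df))
by_fraction = split_ranges({'alice': 0.3, 'bob': 0.7}, len(df))
by_interval = split_ranges({'alice': (0, 1), 'bob': (1, 4)}, len(df))

# Row-wise (axis=0) pieces for the interval spec.
row_parts = {p: df.iloc[s:e] for p, (s, e) in by_interval.items()}
```

Because the intervals are half-open, (0, 1) and (1, 4) do not overlap: the first party holds row 0 and the second holds rows 1 to 3.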

secretflow.utils.simulation.data.dataframe.create_hdf(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], shuffle: bool = False, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) -> HDataFrame

Create a HDataFrame from a single dataset source.

Refer to create_df() for full documentation.

secretflow.utils.simulation.data.dataframe.create_vdf(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], shuffle: bool = False) -> VDataFrame

Create a VDataFrame from a single dataset source.

Refer to create_df() for full documentation.

secretflow.utils.simulation.data.ndarray

Functions:

create_ndarray(source, parts[, axis, ...])

Create a federated ndarray from a single data source.

secretflow.utils.simulation.data.ndarray.create_ndarray(source: Union[str, ndarray, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], axis: int = 0, shuffle: bool = False, random_state: Optional[int] = None, allow_pickle: bool = False, is_torch: bool = False) -> FedNdarray

Create a federated ndarray from a single data source.

Parameters
  • source – the dataset source: a file path, a numpy.ndarray, or a callable that returns a numpy.ndarray.

  • parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict {PYU: value}, the value shall be one of the following: (1) a float giving that PYU's share of the data, or (2) an interval given as a tuple, closed on the left and open on the right.

  • axis – optional, 0 or 1. 0 splits by row and returns a horizontally partitioned FedNdarray. 1 splits by column and returns a vertically partitioned FedNdarray.

  • shuffle – optional, whether to shuffle the dataset before splitting.

  • random_state – optional, the random state for shuffle.

  • allow_pickle – optional, passed to np.load as allow_pickle when source is a file path.

Returns

a FedNdarray.

Examples

>>> arr = np.array([[1, 2, 3, 4], [11, 12, 13, 14]])
>>> # Create an evenly split, horizontally partitioned FedNdarray.
>>> h_arr = create_ndarray(arr, [alice, bob], axis=0)
>>> # Create a vertically partitioned FedNdarray with given proportions.
>>> v_arr = create_ndarray(arr, {alice: 0.3, bob: 0.7}, axis=1)
>>> # Create a horizontally partitioned FedNdarray with given index intervals.
>>> h_arr = create_ndarray(arr, {alice: (0, 1), bob: (1, 4)})
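
To make the source, axis, shuffle, random_state and allow_pickle parameters more concrete, here is a rough sketch in plain NumPy of what resolving a source and slicing it along an axis might look like. The helpers load_source and split_array are hypothetical and are not secretflow functions, string labels stand in for PYU devices, and the shuffle shown here (a seeded permutation along the split axis) is an assumption about the behavior rather than the library's confirmed implementation.

```python
import os
import tempfile

import numpy as np


def load_source(source, allow_pickle=False):
    """Hypothetical helper: resolve a path, ndarray, or callable to an ndarray."""
    if isinstance(source, str):
        # File path: allow_pickle is forwarded to np.load.
        return np.load(source, allow_pickle=allow_pickle)
    if callable(source):
        return source()
    return source


def split_array(arr, ranges, axis=0, shuffle=False, random_state=None):
    """Hypothetical helper: cut `arr` into per-party slices along `axis`."""
    if shuffle:
        # Assumed behavior: permute along the split axis with a seeded generator.
        rng = np.random.default_rng(random_state)
        arr = np.take(arr, rng.permutation(arr.shape[axis]), axis=axis)
    return {
        party: np.take(arr, np.arange(start, end), axis=axis)
        for party, (start, end) in ranges.items()
    }


arr = np.array([[1, 2, 3, 4], [11, 12, 13, 14]])

# The same data reached through a file path and through a callable.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arr.npy')
    np.save(path, arr)
    from_file = load_source(path, allow_pickle=False)
from_callable = load_source(lambda: arr)

# axis=0 slices rows, axis=1 slices columns; intervals are half-open.
h_parts = split_array(from_file, {'alice': (0, 1), 'bob': (1, 2)}, axis=0)
v_parts = split_array(from_file, {'alice': (0, 1), 'bob': (1, 4)}, axis=1)

# A fixed random_state makes the shuffled split reproducible.
s1 = split_array(arr, {'alice': (0, 2), 'bob': (2, 4)}, axis=1,
                 shuffle=True, random_state=0)
s2 = split_array(arr, {'alice': (0, 2), 'bob': (2, 4)}, axis=1,
                 shuffle=True, random_state=0)
```

With axis=0 each party keeps all columns of its rows, and with axis=1 each party keeps all rows of its columns, which mirrors the horizontal and vertical partitioning described in the parameter list.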