secretflow.utils.simulation.data package

Submodules

secretflow.utils.simulation.data.dataframe module

Functions:

`create_df`(source, parts[, axis, shuffle, ...])	Create a federated dataframe from a single data source.
`create_hdf`(source, parts[, shuffle, ...])	Create a HDataFrame from a single dataset source.
`create_vdf`(source, parts[, shuffle])	Create a VDataFrame from a single dataset source.

secretflow.utils.simulation.data.dataframe.create_df(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], axis: int = 0, shuffle: bool = False, random_state: Optional[int] = None, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) → Union[HDataFrame, VDataFrame]

Create a federated dataframe from a single data source.

Parameters

source – the dataset source: a file path, a pandas.DataFrame, or a callable that returns a pandas.DataFrame.
parts – the data partitions. If parts is a list of PYUs, the dataset is distributed as evenly as possible across them. If parts is a dict keyed by PYU, each value shall be one of the following: 1. a float; 2. an interval given as a tuple, closed on the left and open on the right.
axis – optional, either 0 or 1. 0 splits by row and returns a horizontally partitioned federated DataFrame. 1 splits by column and returns a vertically partitioned federated DataFrame.
shuffle – optional, whether to shuffle the dataset before splitting.
random_state – optional, the random state used for shuffling.
aggregator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.
comparator – optional, shall be provided only when axis is 0. For details, please refer to secretflow.data.horizontal.HDataFrame.
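The three accepted forms of parts can be pictured as different ways of producing half-open index ranges. The following is a minimal pure-Python sketch of that idea, not SecretFlow's implementation: resolve_parts is a hypothetical helper, plain strings stand in for PYU devices, and the rounding and remainder rules are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical stand-in types: party names are plain strings here, not PYUs.
Parts = Union[List[str], Dict[str, Union[float, Tuple[int, int]]]]


def resolve_parts(total: int, parts: Parts) -> Dict[str, Tuple[int, int]]:
    """Map each party to a half-open [start, end) index range (illustrative only)."""
    ranges: Dict[str, Tuple[int, int]] = {}
    if isinstance(parts, list):
        # Even split; assumption: earlier parties absorb the remainder.
        base, extra = divmod(total, len(parts))
        start = 0
        for i, party in enumerate(parts):
            end = start + base + (1 if i < extra else 0)
            ranges[party] = (start, end)
            start = end
        return ranges

    items = list(parts.items())
    start = 0
    for i, (party, value) in enumerate(items):
        if isinstance(value, float):
            # Proportion; assumption: proportions sum to 1 and the last party
            # takes whatever remains, so rounding never drops an item.
            is_last = i == len(items) - 1
            end = total if is_last else min(start + round(total * value), total)
            ranges[party] = (start, end)
            start = end
        else:
            # Explicit interval, already closed on the left and open on the right.
            lo, hi = value
            ranges[party] = (lo, hi)
    return ranges


even = resolve_parts(4, ["alice", "bob"])
by_ratio = resolve_parts(10, {"alice": 0.3, "bob": 0.7})
by_interval = resolve_parts(4, {"alice": (0, 1), "bob": (1, 4)})
```

With four items, the interval form assigns index 0 to alice and indices 1 to 3 to bob, because the right end of each interval is excluded.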

Returns

An HDataFrame if axis is 0, otherwise a VDataFrame.

Return type

Union[HDataFrame, VDataFrame]
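The difference between the two return types comes down to which dimension is sliced. The sketch below illustrates this on a toy table held as a list of rows. It is an assumption-laden illustration only: split_rows and split_columns are hypothetical helpers, strings stand in for PYU devices, and no SecretFlow objects are created.

```python
# Toy table: 4 rows and 2 columns (f1, f3), held as a list of rows.
rows = [[1, 11], [2, 12], [3, 13], [4, 14]]


def split_rows(table, bounds):
    """axis=0 (horizontal): each party keeps every column for a slice of rows."""
    return {party: table[lo:hi] for party, (lo, hi) in bounds.items()}


def split_columns(table, bounds):
    """axis=1 (vertical): each party keeps every row for a slice of columns."""
    return {
        party: [row[lo:hi] for row in table]
        for party, (lo, hi) in bounds.items()
    }


# Intervals are half-open: (1, 4) covers indices 1, 2 and 3.
horizontal = split_rows(rows, {"alice": (0, 1), "bob": (1, 4)})
vertical = split_columns(rows, {"alice": (0, 1), "bob": (1, 2)})
```

In the horizontal case both parties share the same columns but hold different rows; in the vertical case both hold all four rows but different columns.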

Examples

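>>> import pandas as pd
>>> from secretflow.utils.simulation.data.dataframe import create_df
>>> # alice and bob are PYU devices assumed to have been created beforehand.
>>> df = pd.DataFrame({'f1': [1, 2, 3, 4], 'f3': [11, 12, 13, 14]})

>>> # Create a HDataFrame evenly.
>>> hdf = create_df(df, [alice, bob], axis=0)

>>> # Create a VDataFrame with given proportions.
>>> vdf = create_df(df, {alice: 0.3, bob: 0.7}, axis=1)

>>> # Create a HDataFrame from explicit index intervals (left-closed, right-open).
>>> hdf = create_df(df, {alice: (0, 1), bob: (1, 4)})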

secretflow.utils.simulation.data.dataframe.create_hdf(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], shuffle: bool = False, aggregator: Optional[Aggregator] = None, comparator: Optional[Comparator] = None) → HDataFrame

Create a HDataFrame from a single dataset source.

Refer to create_df() for full documentation.

secretflow.utils.simulation.data.dataframe.create_vdf(source: Union[str, DataFrame, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], shuffle: bool = False) → VDataFrame

Create a VDataFrame from a single dataset source.

Refer to create_df() for full documentation.

secretflow.utils.simulation.data.ndarray module

Functions:

`create_ndarray`(source, parts[, axis, ...])	Create a federated ndarray from a single data source.

secretflow.utils.simulation.data.ndarray.create_ndarray(source: Union[str, ndarray, Callable], parts: Union[List[PYU], Dict[PYU, Union[float, tuple]]], axis: int = 0, shuffle: bool = False, random_state: Optional[int] = None, allow_pickle: bool = False, is_torch: bool = False) → FedNdarray
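
The signature pairs shuffle with random_state, which suggests a seeded shuffle applied before the data is partitioned. The sketch below illustrates that pattern with the standard library only. It is not the library's code: shuffle_then_split is a hypothetical helper, strings stand in for PYU devices, and a plain list stands in for an ndarray.

```python
import random


def shuffle_then_split(items, bounds, shuffle=False, random_state=None):
    """Optionally shuffle with a fixed seed, then slice into half-open ranges."""
    data = list(items)
    if shuffle:
        # A dedicated generator keeps the result reproducible for a given seed
        # without touching the global random state.
        random.Random(random_state).shuffle(data)
    return {party: data[lo:hi] for party, (lo, hi) in bounds.items()}


bounds = {"alice": (0, 2), "bob": (2, 4)}
plain = shuffle_then_split(range(4), bounds)
first = shuffle_then_split(range(4), bounds, shuffle=True, random_state=1234)
second = shuffle_then_split(range(4), bounds, shuffle=True, random_state=1234)
```

Here first and second are identical because both runs use the same seed, and every item still ends up with exactly one party.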