secretflow.data.mix package#

Submodules#

secretflow.data.mix.dataframe module#

Classes:

PartitionWay(value)

The partitioning.

MixDataFrame([partitions])

Mixed DataFrame consisting of HDataFrame/VDataFrame.

class secretflow.data.mix.dataframe.PartitionWay(value)[source]#

Bases: Enum

The partitioning. HORIZONTAL: horizontal partitioning. VERTICAL: vertical partitioning.

Attributes:

HORIZONTAL

VERTICAL

HORIZONTAL = 'horizontal'#
VERTICAL = 'vertical'#
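
As a minimal sketch of how the two members behave, the block below defines a local mirror of the enum using only the names and values listed above. It does not import secretflow; the mirror class is an illustration, not the library's source.

```python
from enum import Enum


class PartitionWay(Enum):
    """Local mirror of the documented enum, for illustration only."""

    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


# Members can be looked up by value or by name.
by_value = PartitionWay("horizontal")
by_name = PartitionWay["VERTICAL"]

print(by_value is PartitionWay.HORIZONTAL)  # True
print(by_name.value)                        # vertical
```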
class secretflow.data.mix.dataframe.MixDataFrame(partitions: Optional[Tuple[Union[HDataFrame, VDataFrame]]] = None)[source]#

Bases: object

Mixed DataFrame consisting of HDataFrame/VDataFrame.

MixDataFrame provides two perspectives based on how the data is partitioned. Let’s illustrate with an example, assuming the following partitions: alice_part0, alice_part1, bob, carol, dave_part0, dave_part1.

Among them, (alice_part0, bob, dave_part0) are aligned, and (alice_part1, carol, dave_part1) are aligned.

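+-------------+------------+------------+
| col1        | col2, col3 | col4, col5 |
+=============+============+============+
| alice_part0 | bob        | dave_part0 |
+-------------+------------+------------+
| alice_part1 | carol      | dave_part1 |
+-------------+------------+------------+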

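1. If horizontally partitioned (PartitionWay.HORIZONTAL), the mixed DataFrame is viewed as follows:

+--------------------------------+
| col1, col2, col3, col4, col5   |
+================================+
| alice_part0, bob, dave_part0   |
+--------------------------------+
| alice_part1, carol, dave_part1 |
+--------------------------------+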

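2. If vertically partitioned (PartitionWay.VERTICAL), the mixed DataFrame is viewed as follows:

+-------------+------------+------------+
| col1        | col2, col3 | col4, col5 |
+=============+============+============+
| alice_part0 | bob        | dave_part0 |
| alice_part1 | carol      | dave_part1 |
+-------------+------------+------------+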

MixDataFrame has the following characteristics:

1. Multiple partitions corresponding to a column can be provided by different parties or by the same party.

2. The number of partitions corresponding to each column is the same.

3. The number of samples in aligned partitions is the same.
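
The two perspectives and the characteristics above can be sketched with plain Python data. Everything in this block is a hypothetical stand-in: each partition is a tuple of owner, column names, and row count (the row counts 3 and 2 are made up), and none of the helper functions belong to secretflow.

```python
from typing import List, Tuple

# Hypothetical stand-in: (owner, column names, number of rows).
Partition = Tuple[str, List[str], int]

# One inner list per aligned group, one position per column group.
grid: List[List[Partition]] = [
    [("alice_part0", ["col1"], 3), ("bob", ["col2", "col3"], 3),
     ("dave_part0", ["col4", "col5"], 3)],
    [("alice_part1", ["col1"], 2), ("carol", ["col2", "col3"], 2),
     ("dave_part1", ["col4", "col5"], 2)],
]


def check_grid(grid: List[List[Partition]]) -> None:
    """Check the documented characteristics on the stand-in grid."""
    # Each column group has the same number of partitions,
    # which here means every aligned group has the same width.
    if len({len(row) for row in grid}) != 1:
        raise ValueError("column groups have different partition counts")
    # Aligned partitions hold the same number of samples.
    for row in grid:
        if len({n_rows for _, _, n_rows in row}) != 1:
            raise ValueError("aligned partitions differ in sample count")


def horizontal_view(grid: List[List[Partition]]) -> List[List[str]]:
    """One block per aligned group; each block spans all columns."""
    return [[owner for owner, _, _ in row] for row in grid]


def vertical_view(grid: List[List[Partition]]) -> List[List[str]]:
    """One block per column group; each block spans all samples."""
    return [[owner for owner, _, _ in col] for col in zip(*grid)]


def shape(grid: List[List[Partition]]) -> Tuple[int, int]:
    """Overall (rows, columns) of the combined data."""
    n_rows = sum(row[0][2] for row in grid)
    n_cols = sum(len(cols) for _, cols, _ in grid[0])
    return (n_rows, n_cols)


check_grid(grid)
print(horizontal_view(grid))
print(vertical_view(grid))
print(shape(grid))  # (5, 5)
```

Both views describe the same data; only the grouping of the partitions into blocks differs.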

Attributes:

partitions

The blocks that make up a mixed DataFrame.

partition_way

Data partitioning.

values

dtypes

Returns the dtypes in the DataFrame.

columns

The column labels of the DataFrame.

shape

Returns a tuple representing the dimensionality of the DataFrame.

Methods:

mean(*args, **kwargs)

Returns the mean of the values over the requested axis.

min(*args, **kwargs)

Returns the min of the values over the requested axis.

max(*args, **kwargs)

Returns the max of the values over the requested axis.

count(*args, **kwargs)

Count non-NA cells for each column or row.

isna()

quantile([q, axis])

kurtosis(*args, **kwargs)

skew(*args, **kwargs)

sem(*args, **kwargs)

std(*args, **kwargs)

var(*args, **kwargs)

astype(dtype[, copy, errors])

Cast object to a specified dtype.

copy()

Shallow copy of this dataframe.

drop([labels, axis, index, columns, level, ...])

Drop specified labels from rows or columns.

fillna([value, method, axis, inplace, ...])

Fill NA/NaN values using the specified method.

__init__([partitions])

partitions: Tuple[Union[HDataFrame, VDataFrame]] = None#

The blocks that make up a mixed DataFrame. All partitions must be HDataFrame or all must be VDataFrame; the two types must not be mixed.
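
The "all of one type" rule can be sketched as a small validation helper. The classes FakeHDataFrame and FakeVDataFrame and the function check_homogeneous are hypothetical stand-ins, not secretflow APIs, and rejecting an empty sequence is an assumption of this sketch rather than documented behavior.

```python
from typing import Sequence


class FakeHDataFrame:
    """Hypothetical stand-in for HDataFrame."""


class FakeVDataFrame:
    """Hypothetical stand-in for VDataFrame."""


def check_homogeneous(partitions: Sequence[object]) -> type:
    """Return the common partition type, or raise if the types are mixed."""
    if not partitions:
        # Assumption of this sketch: an empty sequence is rejected.
        raise ValueError("partitions must not be empty")
    kinds = {type(p) for p in partitions}
    if len(kinds) != 1:
        names = sorted(k.__name__ for k in kinds)
        raise TypeError(f"partitions mix several types: {names}")
    kind = kinds.pop()
    if kind not in (FakeHDataFrame, FakeVDataFrame):
        raise TypeError(f"unsupported partition type: {kind.__name__}")
    return kind


common = check_homogeneous([FakeHDataFrame(), FakeHDataFrame()])
print(common.__name__)  # FakeHDataFrame
```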

property partition_way: PartitionWay#

Data partitioning.

mean(*args, **kwargs) → Series[source]#

Returns the mean of the values over the requested axis.

All arguments are the same as for pandas.DataFrame.mean().

Returns

pd.Series
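
When rows of one column are spread over several partitions, a mean over all rows cannot be obtained by averaging the per-partition means unless the partitions are equally sized. The sketch below shows the usual way to combine partial results, using per-partition sums and counts. It illustrates the arithmetic only and makes no claim about how MixDataFrame.mean is implemented; the function name and sample values are made up.

```python
from typing import List, Tuple


def combined_mean(partials: List[Tuple[float, int]]) -> float:
    """Combine (sum, count) pairs from several partitions into one mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no samples to average")
    return total / count


# Two row partitions of the same column, with different sizes.
part0 = [1.0, 2.0, 3.0]
part1 = [10.0, 20.0]

partials = [(sum(p), len(p)) for p in (part0, part1)]
overall = combined_mean(partials)

# Averaging the two partition means ignores the partition sizes.
mean_of_means = (sum(part0) / len(part0) + sum(part1) / len(part1)) / 2

print(overall)        # 7.2
print(mean_of_means)  # 8.5
```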

min(*args, **kwargs) → Series[source]#

Returns the min of the values over the requested axis.

All arguments are the same as for pandas.DataFrame.min().

Returns

pd.Series

max(*args, **kwargs) → Series[source]#

Returns the max of the values over the requested axis.

All arguments are the same as for pandas.DataFrame.max().

Returns

pd.Series

count(*args, **kwargs) → Series[source]#

Count non-NA cells for each column or row.

All arguments are the same as for pandas.DataFrame.count().

Returns

pd.Series
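
As a rough illustration of a per-column non-NA count over data split into row blocks, the sketch below sums the counts of each block. The blocks are plain dictionaries with None standing in for NA; the names and values are hypothetical and this is not the library's implementation.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in: a block maps column name to its values.
Block = Dict[str, List[Optional[float]]]


def count_non_na(blocks: List[Block]) -> Dict[str, int]:
    """Sum the per-block counts of non-missing values for each column."""
    totals: Dict[str, int] = {}
    for block in blocks:
        for column, values in block.items():
            present = sum(1 for v in values if v is not None)
            totals[column] = totals.get(column, 0) + present
    return totals


blocks: List[Block] = [
    {"col1": [1.0, None, 3.0], "col2": [None, None, 6.0]},
    {"col1": [4.0, 5.0], "col2": [7.0, None]},
]

counts = count_non_na(blocks)
print(counts)  # {'col1': 4, 'col2': 2}
```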

isna()[source]#
quantile(q=0.5, axis=0)[source]#
kurtosis(*args, **kwargs)[source]#
skew(*args, **kwargs)[source]#
sem(*args, **kwargs)[source]#
std(*args, **kwargs)[source]#
var(*args, **kwargs)[source]#
property values#
property dtypes: Series#

Returns the dtypes in the DataFrame.

Returns

the data type of each column.

Return type

pd.Series

astype(dtype, copy: bool = True, errors: str = 'raise')[source]#

Cast object to a specified dtype.

All arguments are the same as for pandas.DataFrame.astype().

property columns#

The column labels of the DataFrame.

property shape: Tuple#

Returns a tuple representing the dimensionality of the DataFrame.

copy() → MixDataFrame[source]#

Shallow copy of this dataframe.

Returns

MixDataFrame.

drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') → Optional[MixDataFrame][source]#

Drop specified labels from rows or columns.

All arguments are the same as for pandas.DataFrame.drop().

Returns

MixDataFrame without the removed index or column labels, or None if inplace=True.
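
The Optional return type follows the pandas convention: with inplace=False a new object is returned and the original is left alone, while with inplace=True the object is modified and None is returned. The sketch below mimics that convention with a hypothetical ToyFrame class; it is not MixDataFrame and supports only a columns argument.

```python
from typing import Dict, List, Optional


class ToyFrame:
    """Hypothetical stand-in that mimics the inplace return convention."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        # Copy the lists so separate frames do not share storage.
        self.data = {name: list(values) for name, values in data.items()}

    def drop(self, columns: List[str], inplace: bool = False) -> Optional["ToyFrame"]:
        kept = {k: v for k, v in self.data.items() if k not in columns}
        if inplace:
            self.data = kept
            return None
        return ToyFrame(kept)


frame = ToyFrame({"col1": [1.0], "col2": [2.0]})

# inplace=False: a new object comes back, the original is unchanged.
trimmed = frame.drop(columns=["col2"])

# inplace=True: the original is modified and the call returns None.
result = frame.drop(columns=["col1"], inplace=True)

print(sorted(trimmed.data))  # ['col1']
print(sorted(frame.data))    # ['col2']
print(result)                # None
```

A common mistake under this convention is assigning the result of an inplace=True call back to the variable, which replaces the frame with None.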

fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None) → Optional[MixDataFrame][source]#

Fill NA/NaN values using the specified method.

All arguments are the same as for pandas.DataFrame.fillna().

Returns

MixDataFrame with missing values filled, or None if inplace=True.

__init__(partitions: Optional[Tuple[Union[HDataFrame, VDataFrame]]] = None) → None#

Module contents#

Classes:

MixDataFrame([partitions])

Mixed DataFrame consisting of HDataFrame/VDataFrame.

PartitionWay(value)

The partitioning.

class secretflow.data.mix.MixDataFrame(partitions: Optional[Tuple[Union[HDataFrame, VDataFrame]]] = None)[source]#

Bases: object

Mixed DataFrame consisting of HDataFrame/VDataFrame.

MixDataFrame provides two perspectives based on how the data is partitioned. Let’s illustrate with an example, assuming the following partitions: alice_part0, alice_part1, bob, carol, dave_part0, dave_part1.

Among them, (alice_part0, bob, dave_part0) are aligned, and (alice_part1, carol, dave_part1) are aligned.

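+-------------+------------+------------+
| col1        | col2, col3 | col4, col5 |
+=============+============+============+
| alice_part0 | bob        | dave_part0 |
+-------------+------------+------------+
| alice_part1 | carol      | dave_part1 |
+-------------+------------+------------+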

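1. If horizontally partitioned (PartitionWay.HORIZONTAL), the mixed DataFrame is viewed as follows:

+--------------------------------+
| col1, col2, col3, col4, col5   |
+================================+
| alice_part0, bob, dave_part0   |
+--------------------------------+
| alice_part1, carol, dave_part1 |
+--------------------------------+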

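2. If vertically partitioned (PartitionWay.VERTICAL), the mixed DataFrame is viewed as follows:

+-------------+------------+------------+
| col1        | col2, col3 | col4, col5 |
+=============+============+============+
| alice_part0 | bob        | dave_part0 |
| alice_part1 | carol      | dave_part1 |
+-------------+------------+------------+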

MixDataFrame has the following characteristics:

1. Multiple partitions corresponding to a column can be provided by different parties or by the same party.

2. The number of partitions corresponding to each column is the same.

3. The number of samples in aligned partitions is the same.

Attributes:

partitions

The blocks that make up a mixed DataFrame.

partition_way

Data partitioning.

values

dtypes

Returns the dtypes in the DataFrame.

columns

The column labels of the DataFrame.

shape

Returns a tuple representing the dimensionality of the DataFrame.

Methods:

mean(*args, **kwargs)

Returns the mean of the values over the requested axis.

min(*args, **kwargs)

Returns the min of the values over the requested axis.

max(*args, **kwargs)

Returns the max of the values over the requested axis.

count(*args, **kwargs)

Count non-NA cells for each column or row.

isna()

quantile([q, axis])

kurtosis(*args, **kwargs)

skew(*args, **kwargs)

sem(*args, **kwargs)

std(*args, **kwargs)

var(*args, **kwargs)

astype(dtype[, copy, errors])

Cast object to a specified dtype.

copy()

Shallow copy of this dataframe.

drop([labels, axis, index, columns, level, ...])

Drop specified labels from rows or columns.

fillna([value, method, axis, inplace, ...])

Fill NA/NaN values using the specified method.

__init__([partitions])

partitions: Tuple[Union[HDataFrame, VDataFrame]] = None#

The blocks that make up a mixed DataFrame. All partitions must be HDataFrame or all must be VDataFrame; the two types must not be mixed.

property partition_way: PartitionWay#

Data partitioning.

mean(*args, **kwargs) → Series[source]#

Returns the mean of the values over the requested axis.

All arguments are the same as for pandas.DataFrame.mean().

Returns

pd.Series

min(*args, **kwargs) → Series[source]#

Returns the min of the values over the requested axis.

All arguments are the same as for pandas.DataFrame.min().

Returns

pd.Series

max(*args, **kwargs) → Series[source]#

Returns the max of the values over the requested axis.

All arguments are the same as for pandas.DataFrame.max().

Returns

pd.Series

count(*args, **kwargs) → Series[source]#

Count non-NA cells for each column or row.

All arguments are the same as for pandas.DataFrame.count().

Returns

pd.Series

isna()[source]#
quantile(q=0.5, axis=0)[source]#
kurtosis(*args, **kwargs)[source]#
skew(*args, **kwargs)[source]#
sem(*args, **kwargs)[source]#
std(*args, **kwargs)[source]#
var(*args, **kwargs)[source]#
property values#
property dtypes: Series#

Returns the dtypes in the DataFrame.

Returns

the data type of each column.

Return type

pd.Series

astype(dtype, copy: bool = True, errors: str = 'raise')[source]#

Cast object to a specified dtype.

All arguments are the same as for pandas.DataFrame.astype().

property columns#

The column labels of the DataFrame.

property shape: Tuple#

Returns a tuple representing the dimensionality of the DataFrame.

copy() → MixDataFrame[source]#

Shallow copy of this dataframe.

Returns

MixDataFrame.

drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') → Optional[MixDataFrame][source]#

Drop specified labels from rows or columns.

All arguments are the same as for pandas.DataFrame.drop().

Returns

MixDataFrame without the removed index or column labels, or None if inplace=True.

fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None) → Optional[MixDataFrame][source]#

Fill NA/NaN values using the specified method.

All arguments are the same as for pandas.DataFrame.fillna().

Returns

MixDataFrame with missing values filled, or None if inplace=True.

__init__(partitions: Optional[Tuple[Union[HDataFrame, VDataFrame]]] = None) → None#
class secretflow.data.mix.PartitionWay(value)[source]#

Bases: Enum

The partitioning. HORIZONTAL: horizontal partitioning. VERTICAL: vertical partitioning.

Attributes:

HORIZONTAL

VERTICAL

HORIZONTAL = 'horizontal'#
VERTICAL = 'vertical'#