secretflow.ml.linear.ss_sgd package

Submodules

secretflow.ml.linear.ss_sgd.model module

Classes:

`Penalty`(value)	An enumeration.
`SSRegression`(spu)	Linear and logistic regression models for vertically partitioned datasets, trained with a mini-batch SGD solver over secret sharing.

class secretflow.ml.linear.ss_sgd.model.Penalty(value)

Bases: Enum

An enumeration.

Attributes:

`NONE`
`L1`
`L2`

NONE = 'None'

L1 = 'l1'

L2 = 'l2'
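
The members above carry plain string values, which appear to match the strings accepted by the `penalty` argument of `fit` (whose default is `'None'`). The stand-alone mirror below only illustrates looking a member up by its value. The class name `PenaltySketch` is hypothetical; the real enum is the `Penalty` class documented here.

```python
from enum import Enum


class PenaltySketch(Enum):
    """Local mirror of the documented members; not the library class."""

    NONE = 'None'
    L1 = 'l1'
    L2 = 'l2'


# Look a member up from the string form used by the `penalty` argument.
chosen = PenaltySketch('l2')
print(chosen.name, chosen.value)
```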

class secretflow.ml.linear.ss_sgd.model.SSRegression(spu: SPU)

Bases: object

This class provides linear and logistic regression models for vertically partitioned datasets, trained with a mini-batch SGD solver over secret sharing. SS-SGD is short for secret-sharing SGD training.

More detail on SGD: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent

Linear regression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

More detail on linear regression: https://en.wikipedia.org/wiki/Linear_regression
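
As a rough illustration of the training loop described above, the following is a plaintext mini-batch SGD sketch for least-squares linear regression. It ignores secret sharing and vertical partitioning entirely, and the function name `minibatch_sgd_linear` and the toy data are hypothetical, not part of SecretFlow.

```python
import random


def minibatch_sgd_linear(xs, ys, epochs, learning_rate=0.1, batch_size=4, seed=0):
    """Plaintext mini-batch SGD for least squares; returns (weights, bias)."""
    rng = random.Random(seed)
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new order each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                pred = sum(wj * xj for wj, xj in zip(w, xs[i])) + b
                err = pred - ys[i]
                for j in range(n_features):
                    grad_w[j] += err * xs[i][j]
                grad_b += err
            # Average the gradient over the batch, then take one step.
            w = [wj - learning_rate * gj / len(batch) for wj, gj in zip(w, grad_w)]
            b -= learning_rate * grad_b / len(batch)
    return w, b


# Toy data generated from y = 2*x1 - 1*x2 + 0.5, with features in [-1, 1].
data_rng = random.Random(1)
xs = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(64)]
ys = [2.0 * x1 - 1.0 * x2 + 0.5 for x1, x2 in xs]
w, b = minibatch_sgd_linear(xs, ys, epochs=200, learning_rate=0.1, batch_size=8)
print(w, b)
```

Because the toy targets are noise-free, the recovered weights and bias should land very close to the generating values.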

Logistic regression, despite its name, is a linear model for classification rather than regression. It is also known in the literature as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. The probabilities describing the possible outcomes of a single trial are modeled using a logistic function. This class can fit binary classification models with optional L2 regularization.

More detail on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression
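
To make the logistic model and the optional L2 term concrete, here is a plaintext sketch of a single mini-batch update for binary logistic regression. The helper names are hypothetical, and the way the L2 term is scaled (as the gradient of `(l2_norm / 2) * ||w||^2`) is an assumption; this page does not state how the library applies it.

```python
import math


def sigmoid(z):
    """Exact logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_step(w, batch_x, batch_y, learning_rate=0.1, l2_norm=0.0):
    """One plaintext mini-batch update for binary logistic regression."""
    n = len(batch_x)
    grad = [0.0] * len(w)
    for x, y in zip(batch_x, batch_y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    # Mean log-loss gradient plus the gradient of (l2_norm / 2) * ||w||^2.
    return [wj - learning_rate * (gj / n + l2_norm * wj) for wj, gj in zip(w, grad)]


# Two linearly separable points; the weight should move in the positive direction.
w = [0.0]
for _ in range(100):
    w = logistic_step(w, [[1.0], [-1.0]], [1, 0], learning_rate=0.1, l2_norm=0.5)
print(w, sigmoid(w[0]))
```

The L2 term pulls the weight back toward zero, so on this separable toy problem it settles at a finite value instead of growing without bound.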

SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security.

More detail for SPU: https://spu.readthedocs.io/en/beta/

This class protects the original dataset and the final model by secret-sharing the dataset to the SPU device and fitting the model on the SPU.
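
The basic idea of secret sharing can be sketched with additive shares over a fixed ring: each party holds a random-looking share, and only the sum of all shares reveals the value. This is a simplified illustration, not the protocol SPU actually runs; the ring size, the two-party default, and all function names are assumptions made for the sketch.

```python
import secrets

MODULUS = 2 ** 64  # ring size chosen for the sketch


def share(value, n_parties=2):
    """Split an integer into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; every share is needed to recover the value."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Add two shared values locally, share by share, without revealing them."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


a = share(1234)
b = share(4321)
print(reconstruct(add_shared(a, b)))  # prints 5555
```

Real-valued features and weights would additionally need a fixed-point encoding, and multiplication requires interaction between the parties; both are omitted here.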

Parameters: spu – secure device.

Notes

The training dataset should be normalized or standardized; otherwise, the SGD solver will not converge.
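
A minimal plaintext sketch of column-wise standardization (zero mean, unit standard deviation) is shown below. The helper `standardize_columns` is hypothetical; in a vertically partitioned setting each party would presumably apply an equivalent transform to its own feature columns before training.

```python
import statistics


def standardize_columns(rows):
    """Return rows with each column shifted to mean 0 and scaled to unit std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is the population standard deviation; guard constant columns.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize_columns(rows)
print(scaled)
```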

Methods:

`__init__`(spu)
`fit`(x, y, epochs[, learning_rate, ...])	Fit the model according to the given training data.
`save_model`()	Save the fitted model in LinearModel format.
`load_model`(m)	Load a model in LinearModel format.
`predict`(x[, batch_size, to_pyu])	Predict using the model.

__init__(spu: SPU) → None

fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], epochs: int, learning_rate: float = 0.1, batch_size: int = 1024, sig_type: str = 't1', reg_type: str = 'logistic', penalty: str = 'None', l2_norm: float = 0.5) → None

Fit the model according to the given training data.

Parameters

x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.
y – {FedNdarray, VDataFrame} of shape (n_samples,). Target vector relative to x.
epochs – int. Number of iteration rounds.
learning_rate – float, default=0.1. Controls how much the model changes in one epoch.
batch_size – int, default=1024. Number of samples used in one calculation.
sig_type – str, default='t1'. Sigmoid approximation type.
reg_type – str, default='logistic'. Regression type: linear or logistic.
penalty – str, default='None'. The penalty (aka regularization term) to be used.
l2_norm – float, default=0.5. L2 regularization term.

Returns

Final weights in SPUObject.
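
Secret-sharing protocols generally handle additions and multiplications far more cheaply than exponentials, which is presumably why `sig_type` selects an approximation of the sigmoid. This page does not define what `'t1'` denotes. The sketch below shows one common choice, the first-order Taylor expansion around zero, 0.5 + 0.25x, clipped to [0, 1]; treating it as the meaning of `'t1'` is an assumption, and the function names are hypothetical.

```python
import math


def sigmoid_exact(x):
    """Reference logistic function, for comparison only."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_taylor1(x):
    """First-order Taylor expansion around 0, clipped to the range [0, 1]."""
    return min(1.0, max(0.0, 0.5 + 0.25 * x))


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, round(sigmoid_exact(x), 4), sigmoid_taylor1(x))
```

The approximation tracks the exact function closely near zero and saturates outside [-2, 2], which is one reason bounded, standardized inputs suit this kind of solver.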

save_model() → LinearModel: Save the fitted model in LinearModel format.

load_model(m: LinearModel) → None: Load a model in LinearModel format.

predict(x: Union[FedNdarray, VDataFrame], batch_size: int = 1024, to_pyu: Optional[PYU] = None) → Union[SPUObject, FedNdarray]

Predict using the model.

Parameters

x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Samples to predict.
batch_size – int, default=1024. Number of samples used in one calculation.
to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the result is kept secret and saved as an SPUObject.

Returns

Predicted scores as an SPUObject or FedNdarray of shape (n_samples,).
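
Putting `fit` and `predict` together, a call sequence might look like the sketch below. It uses only the keyword arguments listed in this reference. The helper name `train_and_score` and the argument values are illustrative assumptions, and creating the SPU device and the federated `x` and `y` inputs is not shown.

```python
def train_and_score(model, x, y, to_pyu=None):
    """Fit and predict using only keyword arguments listed in this reference.

    `model` is expected to be an SSRegression(spu) instance; x and y are
    FedNdarray or VDataFrame objects prepared elsewhere (setup not shown).
    """
    model.fit(
        x,
        y,
        epochs=3,
        learning_rate=0.1,
        batch_size=1024,
        sig_type='t1',
        reg_type='logistic',
        penalty='l2',
        l2_norm=0.5,
    )
    # With to_pyu=None the scores stay secret-shared as an SPUObject;
    # passing a PYU device reveals them to that party as a FedNdarray.
    return model.predict(x, batch_size=1024, to_pyu=to_pyu)
```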

Module contents

Classes:

SSRegression(spu)

Linear and logistic regression models for vertically partitioned datasets, trained with a mini-batch SGD solver over secret sharing.

class secretflow.ml.linear.ss_sgd.SSRegression(spu: SPU)

Bases: object

More detail on SGD: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent

More detail on linear regression: https://en.wikipedia.org/wiki/Linear_regression

More detail on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression

SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security.

More detail for SPU: https://spu.readthedocs.io/en/beta/

This class protects the original dataset and the final model by secret-sharing the dataset to the SPU device and fitting the model on the SPU.

Parameters: spu – secure device.

Notes

The training dataset should be normalized or standardized; otherwise, the SGD solver will not converge.

Methods:

`__init__`(spu)
`fit`(x, y, epochs[, learning_rate, ...])	Fit the model according to the given training data.
`save_model`()	Save the fitted model in LinearModel format.
`load_model`(m)	Load a model in LinearModel format.
`predict`(x[, batch_size, to_pyu])	Predict using the model.

__init__(spu: SPU) → None

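fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], epochs: int, learning_rate: float = 0.1, batch_size: int = 1024, sig_type: str = 't1', reg_type: str = 'logistic', penalty: str = 'None', l2_norm: float = 0.5) → None

Fit the model according to the given training data.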

Parameters

x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.
y – {FedNdarray, VDataFrame} of shape (n_samples,). Target vector relative to x.
epochs – int. Number of iteration rounds.
learning_rate – float, default=0.1. Controls how much the model changes in one epoch.
batch_size – int, default=1024. Number of samples used in one calculation.
sig_type – str, default='t1'. Sigmoid approximation type.
reg_type – str, default='logistic'. Regression type: linear or logistic.
penalty – str, default='None'. The penalty (aka regularization term) to be used.
l2_norm – float, default=0.5. L2 regularization term.

Returns

Final weights in SPUObject.

save_model() → LinearModel: Save the fitted model in LinearModel format.

load_model(m: LinearModel) → None: Load a model in LinearModel format.

predict(x: Union[FedNdarray, VDataFrame], batch_size: int = 1024, to_pyu: Optional[PYU] = None) → Union[SPUObject, FedNdarray]

Predict using the model.

Parameters

x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Samples to predict.
batch_size – int, default=1024. Number of samples used in one calculation.
to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the result is kept secret and saved as an SPUObject.

Returns

Predicted scores as an SPUObject or FedNdarray of shape (n_samples,).