secretflow.ml.linear.ss_sgd package

Submodules

secretflow.ml.linear.ss_sgd.model module

Classes:

Penalty(value)

An enumeration.

SSRegression(spu)

This class provides linear and logistic regression models for vertically split datasets, using secret sharing with a mini-batch SGD training solver.

class secretflow.ml.linear.ss_sgd.model.Penalty(value)

Bases: Enum

An enumeration.

Attributes:

NONE

L1

L2

NONE = 'None'
L1 = 'l1'
L2 = 'l2'
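
To illustrate how the string values above map to enum members, here is a minimal standalone sketch. It redefines a local `Penalty` that mirrors the documented members rather than importing it from secretflow, so it runs without the library installed; the variable names are illustrative only.

```python
from enum import Enum


class Penalty(Enum):
    """Local mirror of the documented members (not imported from secretflow)."""

    NONE = 'None'
    L1 = 'l1'
    L2 = 'l2'


# Look up a member from its string value, e.g. the string passed as penalty='l2'.
chosen = Penalty('l2')
is_l2 = chosen is Penalty.L2

# The "no penalty" value is the string 'None', not Python's None object.
no_penalty = Penalty('None')
```

Note that `Penalty(None)` would raise a `ValueError` in this sketch, because only the string `'None'` is a valid value.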
class secretflow.ml.linear.ss_sgd.model.SSRegression(spu: SPU)

Bases: object

This class provides linear and logistic regression models for vertically split datasets, using secret sharing with a mini-batch SGD training solver. SS-SGD is short for secret-sharing SGD training.

More detail on SGD: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent

Linear regression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

More detail on linear regression: https://en.wikipedia.org/wiki/Linear_regression
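
To make mini-batch SGD and the residual-sum-of-squares objective concrete, the following is a minimal plain-Python sketch on cleartext data. It is not the library's implementation: SSRegression trains on secret-shared values inside the SPU, and the function name `minibatch_sgd_linear`, the toy data, and the fixed batch order are assumptions made for illustration.

```python
def minibatch_sgd_linear(xs, ys, epochs, learning_rate=0.1, batch_size=2):
    """Fit y ~ w . x + b by mini-batch gradient descent on squared error."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        # Walk over the data in consecutive batches (no shuffling in this sketch).
        for start in range(0, len(xs), batch_size):
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            m = len(batch_x)
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for x, y in zip(batch_x, batch_y):
                # Residual between the linear prediction and the observed target.
                err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
                for j in range(n_features):
                    grad_w[j] += err * x[j] / m
                grad_b += err / m
            # Move against the batch-averaged gradient.
            w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
            b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2*x1 - x2 + 0.5, with features already in [-1, 1].
xs = [[-1.0, 0.5], [-0.5, -1.0], [0.0, 1.0], [0.5, -0.5], [1.0, 0.0], [0.25, 0.75]]
ys = [2 * x1 - x2 + 0.5 for x1, x2 in xs]

w, b = minibatch_sgd_linear(xs, ys, epochs=500, learning_rate=0.1, batch_size=2)
```

Because the toy targets are noise-free, the fitted `w` and `b` should approach the generating coefficients as the number of epochs grows.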

Logistic regression, despite its name, is a linear model for classification rather than regression. It is also known in the literature as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. The probabilities describing the possible outcomes of a single trial are modeled using a logistic function. This class can fit binary logistic regression with optional L2 regularization.

More detail on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression
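
As a small illustration of how a logistic function turns a linear score into a class probability, consider the sketch below. It uses the exact logistic function on cleartext values; the weights, the sample, and the 0.5 decision threshold are hypothetical, and the library itself works with an approximation of the sigmoid (see `sig_type` in `fit`).

```python
import math


def logistic(z):
    """Standard logistic (sigmoid) function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of the positive class for one sample under a linear score."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return logistic(score)


# Hypothetical weights and sample, chosen so that the linear score is exactly 0.
p = predict_proba([1.0, -2.0], 0.0, [2.0, 1.0])

# A common (assumed) decision rule for binary classification.
label = 1 if p >= 0.5 else 0
```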

SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security.

More detail for SPU: https://spu.readthedocs.io/en/beta/

This class protects the original dataset and the final model by secret-sharing the dataset to the SPU device and running model fitting on the SPU.
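
The idea behind secret sharing can be sketched with additive shares over a ring. This is only an illustration of the general technique, not SPU's actual protocol: the ring size `MODULUS`, the helper names, and the two-party default are assumptions, and real MPC protocols additionally need fixed-point encoding and interactive multiplication.

```python
import secrets

MODULUS = 2 ** 64  # ring size chosen for illustration only


def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Add two shared values; each party adds its own two shares locally."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


x_shares = share(42)
y_shares = share(100)

# No party sees 42 or 100 in the clear, yet the shared sum reconstructs to 142.
z_shares = add_shared(x_shares, y_shares)
total = reconstruct(z_shares)
```

Each individual share is uniformly random on its own, which is why holding a single share reveals nothing about the underlying value.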

Parameters

spu – the SPU secure device.

Notes

The training dataset should be normalized or standardized; otherwise the SGD solver may fail to converge.
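
One way to satisfy this note is z-score standardization of each feature column. The sketch below works on plain Python lists; in a vertically split setting one would presumably apply such a step on each party's own columns before training, which is an assumption here. The helper name `standardize_columns` is hypothetical.

```python
import statistics


def standardize_columns(rows):
    """Z-score each column: subtract its mean and divide by its std deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # Constant columns (std == 0) are mapped to 0.0 to avoid dividing by zero.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales, which tends to hurt plain SGD.
raw = [[1000.0, 0.1], [2000.0, 0.2], [3000.0, 0.3]]
scaled = standardize_columns(raw)
```

After this step every column has mean 0 and unit standard deviation, so a single learning rate suits all features.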

Methods:

__init__(spu)

fit(x, y, epochs[, learning_rate, ...])

Fit the model according to the given training data.

save_model()

Save the fitted model in LinearModel format.

load_model(m)

Load a model in LinearModel format.

predict(x[, batch_size, to_pyu])

Predict using the model.

__init__(spu: SPU) → None
fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], epochs: int, learning_rate: float = 0.1, batch_size: int = 1024, sig_type: str = 't1', reg_type: str = 'logistic', penalty: str = 'None', l2_norm: float = 0.5) → None

Fit the model according to the given training data.

Parameters
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y – {FedNdarray, VDataFrame} of shape (n_samples,). Target vector relative to x.

  • epochs – int. Number of iteration rounds.

  • learning_rate – float, default=0.1. Controls how much to change the model in one epoch.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • sig_type – str, default='t1'. Sigmoid approximation type.

  • reg_type – str, default='logistic'. Whether to fit linear or logistic regression.

  • penalty – str, default='None'. The penalty (aka regularization term) to be used.

  • l2_norm – float, default=0.5. L2 regularization term.

Returns

Final weights in SPUObject.
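
To show how `learning_rate`, `penalty`, and `l2_norm` can interact, here is a cleartext sketch of a single mini-batch update for logistic regression. It is an assumption-laden illustration, not the library's code: it uses the exact sigmoid rather than the approximation selected by `sig_type`, omits the bias term, and applies the L2 term as `l2_norm * w` in the gradient, which may differ from how SSRegression scales it. The function name `sgd_step` is hypothetical.

```python
import math


def sgd_step(w, batch_x, batch_y, learning_rate=0.1, penalty='None', l2_norm=0.5):
    """One mini-batch update for logistic regression with an optional L2 term."""
    m = len(batch_x)
    grad = [0.0] * len(w)
    for x, y in zip(batch_x, batch_y):
        score = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1.0 / (1.0 + math.exp(-score))
        for j, xj in enumerate(x):
            grad[j] += (pred - y) * xj / m
    if penalty == 'l2':
        # Gradient of (l2_norm / 2) * ||w||^2 is l2_norm * w (assumed scaling).
        grad = [g + l2_norm * wi for g, wi in zip(grad, w)]
    return [wi - learning_rate * g for wi, g in zip(w, grad)]


w0 = [1.0, -1.0]
batch_x = [[1.0, 0.0], [0.0, 1.0]]
batch_y = [1.0, 0.0]

w_plain = sgd_step(w0, batch_x, batch_y, penalty='None')
w_l2 = sgd_step(w0, batch_x, batch_y, penalty='l2', l2_norm=0.5)
```

With the L2 term enabled, the same batch produces weights that are pulled toward zero by an extra `learning_rate * l2_norm * w` per update.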

save_model() → LinearModel

Save the fitted model in LinearModel format.

load_model(m: LinearModel) → None

Load a model in LinearModel format.

predict(x: Union[FedNdarray, VDataFrame], batch_size: int = 1024, to_pyu: Optional[PYU] = None) → Union[SPUObject, FedNdarray]

Predict using the model.

Parameters
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Samples to predict.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the prediction result is kept secret and saved as an SPUObject.

Returns

Prediction scores in an SPUObject or FedNdarray, of shape (n_samples,).
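
The role of `batch_size` during prediction can be sketched as follows: samples are processed in consecutive slices and the per-sample scores are concatenated into one result of length n_samples. This is a cleartext illustration with hypothetical helper names; how SSRegression slices its inputs internally, and how it treats the last partial batch, is not stated on this page.

```python
def iter_batches(n_samples, batch_size=1024):
    """Yield (start, stop) index pairs covering n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def predict_in_batches(weights, bias, xs, batch_size=1024):
    """Compute linear scores batch by batch and return one flat list."""
    scores = []
    for start, stop in iter_batches(len(xs), batch_size):
        for x in xs[start:stop]:
            scores.append(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return scores


# Five samples with two features each; weights and bias are hypothetical.
xs = [[float(i), 1.0] for i in range(5)]
batches = list(iter_batches(len(xs), batch_size=2))
scores = predict_in_batches([2.0, -1.0], 0.5, xs, batch_size=2)
```

The batch size changes only how the work is chunked; the returned scores are the same for any positive `batch_size`.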

Module contents

Classes:

SSRegression(spu)

This class provides linear and logistic regression models for vertically split datasets, using secret sharing with a mini-batch SGD training solver.

class secretflow.ml.linear.ss_sgd.SSRegression(spu: SPU)

Bases: object

This class provides linear and logistic regression models for vertically split datasets, using secret sharing with a mini-batch SGD training solver. SS-SGD is short for secret-sharing SGD training.

More detail on SGD: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent

Linear regression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.

More detail on linear regression: https://en.wikipedia.org/wiki/Linear_regression

Logistic regression, despite its name, is a linear model for classification rather than regression. It is also known in the literature as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. The probabilities describing the possible outcomes of a single trial are modeled using a logistic function. This class can fit binary logistic regression with optional L2 regularization.

More detail on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression

SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security.

More detail for SPU: https://spu.readthedocs.io/en/beta/

This class protects the original dataset and the final model by secret-sharing the dataset to the SPU device and running model fitting on the SPU.

Parameters

spu – the SPU secure device.

Notes

The training dataset should be normalized or standardized; otherwise the SGD solver may fail to converge.

Methods:

__init__(spu)

fit(x, y, epochs[, learning_rate, ...])

Fit the model according to the given training data.

save_model()

Save the fitted model in LinearModel format.

load_model(m)

Load a model in LinearModel format.

predict(x[, batch_size, to_pyu])

Predict using the model.

__init__(spu: SPU) → None
fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], epochs: int, learning_rate: float = 0.1, batch_size: int = 1024, sig_type: str = 't1', reg_type: str = 'logistic', penalty: str = 'None', l2_norm: float = 0.5) → None

Fit the model according to the given training data.

Parameters
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y – {FedNdarray, VDataFrame} of shape (n_samples,). Target vector relative to x.

  • epochs – int. Number of iteration rounds.

  • learning_rate – float, default=0.1. Controls how much to change the model in one epoch.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • sig_type – str, default='t1'. Sigmoid approximation type.

  • reg_type – str, default='logistic'. Whether to fit linear or logistic regression.

  • penalty – str, default='None'. The penalty (aka regularization term) to be used.

  • l2_norm – float, default=0.5. L2 regularization term.

Returns

Final weights in SPUObject.

save_model() → LinearModel

Save the fitted model in LinearModel format.

load_model(m: LinearModel) → None

Load a model in LinearModel format.

predict(x: Union[FedNdarray, VDataFrame], batch_size: int = 1024, to_pyu: Optional[PYU] = None) → Union[SPUObject, FedNdarray]

Predict using the model.

Parameters
  • x – {FedNdarray, VDataFrame} of shape (n_samples, n_features). Samples to predict.

  • batch_size – int, default=1024. Number of samples used in one calculation.

  • to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the prediction result is kept secret and saved as an SPUObject.

Returns

Prediction scores in an SPUObject or FedNdarray, of shape (n_samples,).