secretflow.ml.linear package#
Subpackages#
Submodules#
secretflow.ml.linear.fl_lr_mix module#
Classes:
FlLogisticRegressionMix: SGD based logistic regression for mix partitioned data.
- class secretflow.ml.linear.fl_lr_mix.FlLogisticRegressionMix[source]#
Bases:
object
SGD based logistic regression for mix partitioned data.
The following is an example to illustrate the algorithm.
Suppose alice has features and label, while bob/carol/dave have features only.
The perspective of MixDataFrame X is as follows:
X
  VDataFrame_0: alice_x0 | bob_x | dave_x0
  VDataFrame_1: alice_x1 | carol_x | dave_x1
The perspective of MixDataFrame Y is as follows:
Y
  VDataFrame_0: alice_y0
  VDataFrame_1: alice_y1
When fitted with X and Y, two FlLogisticRegressionVertical instances are constructed. The first is fitted with VDataFrame_0 of X and Y, and the second with VDataFrame_1 of X and Y. The main steps of one epoch are:
1. Each FlLogisticRegressionVertical is fitted with its respective VDataFrame of X and Y.
2. The \({\theta}\) of the FlLogisticRegressionVertical instances are aggregated with SecureAggregator.
3. The aggregated \({\theta}\) is sent back to each FlLogisticRegressionVertical.
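The epoch steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the weights are ordinary lists, the aggregation is an element-wise mean (the source only says "aggregate"), and `aggregate_theta` is a hypothetical plaintext stand-in for SecureAggregator, which would hide the individual inputs.

```python
from typing import List


def aggregate_theta(thetas: List[List[float]]) -> List[float]:
    """Element-wise mean of several weight vectors.

    Plaintext stand-in for secure aggregation; averaging is an assumption.
    """
    n = len(thetas)
    return [sum(column) / n for column in zip(*thetas)]


# Hypothetical weights after each vertical model finished its local epoch (step 1).
theta_0 = [0.2, -0.4, 1.0]
theta_1 = [0.4, 0.0, 0.0]

# Step 2: aggregate the weights of both models.
theta = aggregate_theta([theta_0, theta_1])

# Step 3: both models continue the next epoch from the same aggregated weights.
theta_0, theta_1 = list(theta), list(theta)
```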
Methods:
fit(x, y, batch_size, epochs, aggregators, heus): Fit the model.
predict(x): Predict the score.
- fit(x: MixDataFrame, y: MixDataFrame, batch_size: int, epochs: int, aggregators: List[Aggregator], heus: List[HEU], fxp_bits: Optional[int] = 18, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1, agg_epochs: Optional[int] = 1, audit_log_dir: Optional[Dict[PYU, str]] = None)[source]#
Fit the model.
- Parameters
x – training vector. X should be a horizontally partitioned MixDataFrame, which consists of VDataFrames.
y – target vector relative to x. Y should also be a horizontally partitioned MixDataFrame. X and y should have the same number of VDataFrames.
batch_size – number of samples per gradient update.
epochs – number of epochs to train the model.
aggregators – a list of aggregators used to compute vertical lr. The number of aggregators should equal the number of VDataFrames in X.
heus – a list of HEUs used to compute vertical lr. The number of HEUs should equal the number of VDataFrames in X.
fxp_bits – the fraction bit length for encoding before sending to heu device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64).
tol – optional, tolerance for stopping criteria. Defaults to 1e-4.
learning_rate – optional, learning rate. Defaults to 0.1.
agg_epochs – aggregate weights for every {agg_epochs} epochs. Defaults to 1.
audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if it is None. Defaults to None. Leave it as None unless you are sure what the audit does and accept the risk.
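To make the fxp_bits parameter concrete, here is a minimal sketch of fixed-point encoding with 18 fractional bits. It assumes the usual scheme of scaling by 2**frac_bits and rounding to an integer; the function names mirror the `encode`/`decode` worker methods listed on this page, but the bodies are illustrative and not the library's implementation.

```python
def encode(value: float, frac_bits: int = 18) -> int:
    """Scale a float by 2**frac_bits and round to the nearest integer."""
    return round(value * (1 << frac_bits))


def decode(encoded: int, frac_bits: int = 18) -> float:
    """Undo the scaling applied by encode()."""
    return encoded / (1 << frac_bits)


x = 0.123456789
roundtrip = decode(encode(x))

# Rounding loses at most half of the smallest representable step, 2**-19.
max_error = 2.0 ** -19
```

A larger frac_bits keeps more precision but leaves fewer bits for the integer part of intermediate results.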
- predict(x: MixDataFrame) List[PYUObject] [source]#
Predict the score.
- Parameters
x – the samples to predict.
- Returns
a list of PYUObjects holding prediction results.
secretflow.ml.linear.fl_lr_v module#
Classes:
FlLrVWorker
PYUFlLrVWorker: alias of ActorProxy(PYUFlLrVWorker)
FlLogisticRegressionVertical: Vertical logistic regression.
- class secretflow.ml.linear.fl_lr_v.FlLrVWorker[source]#
Bases:
object
Methods:
init_train_data(x, batch_size, epochs[, y, ...]): Initialize the training data.
next_batch(): Get next batch of X and y.
compute_mul(x_batch): Compute Xi*Wi.
predict(mul): Do prediction.
compute_loss(y, h, avg_flag)
compute_residual(y, h)
encode(data, frac_bits)
decode(data, frac_bits)
generate_rand_mask(decode_frac)
set_weight(w)
update_weight(masked_gradient, learning_rate)
update_weight_agg(x_batch, residual, ...)
- init_train_data(x: Union[DataFrame, ndarray], batch_size: int, epochs: int, y: Optional[Union[DataFrame, ndarray]] = None, shuffle_seed: Optional[int] = None)[source]#
Initialize the training data.
- Parameters
x – the training vector.
batch_size – number of samples per gradient update.
epochs – number of epochs to train the model.
y – optional; the target vector relative to x.
shuffle_seed – optional; the data will be shuffled if not none.
- next_batch() Tuple[ndarray, ndarray] [source]#
Get next batch of X and y.
- Returns
A tuple of (x batch, y batch), where the y batch is None if no y was provided.
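The batching behaviour described by init_train_data and next_batch can be sketched as a generator. This is a sketch under assumptions: `iter_batches` is a hypothetical helper, and details such as reshuffling every epoch and keeping a shorter final batch are guesses rather than documented behaviour.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


def iter_batches(
    x: Sequence,
    batch_size: int,
    epochs: int,
    y: Optional[Sequence] = None,
    shuffle_seed: Optional[int] = None,
) -> Iterator[Tuple[List, Optional[List]]]:
    """Yield (x_batch, y_batch) pairs; y_batch is None when y is not given."""
    indices = list(range(len(x)))
    # Shuffle only when a seed is given, so all parties can use the same order.
    rng = random.Random(shuffle_seed) if shuffle_seed is not None else None
    for _ in range(epochs):
        if rng is not None:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            x_batch = [x[i] for i in chunk]
            y_batch = [y[i] for i in chunk] if y is not None else None
            yield x_batch, y_batch


features = [[1], [2], [3], [4], [5]]
labels = [10, 20, 30, 40, 50]

plain = list(iter_batches(features, batch_size=2, epochs=1))
shuffled = list(iter_batches(features, batch_size=2, epochs=1, y=labels, shuffle_seed=7))
```

Using the same shuffle_seed on every party keeps the rows of the vertically split features and the labels aligned after shuffling.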
- secretflow.ml.linear.fl_lr_v.PYUFlLrVWorker[source]#
alias of ActorProxy(PYUFlLrVWorker)
Methods:
__init__(*args, **kwargs): Abstraction device object base class.
compute_loss(y, h, avg_flag, *[, _ray_trace_ctx])
compute_mul(x_batch, *[, _ray_trace_ctx]): Compute Xi*Wi.
compute_residual(y, h, *[, _ray_trace_ctx])
decode(data, frac_bits, *[, _ray_trace_ctx])
encode(data, frac_bits, *[, _ray_trace_ctx])
generate_rand_mask(decode_frac, *[, ...])
get_weight(*[, _ray_trace_ctx])
init_train_data(x, batch_size, epochs[, y, ...]): Initialize the training data.
next_batch(*[, _ray_trace_ctx]): Get next batch of X and y.
predict(mul, *[, _ray_trace_ctx]): Do prediction.
set_weight(w, *[, _ray_trace_ctx])
update_weight(masked_gradient, learning_rate, *)
update_weight_agg(x_batch, residual, ...[, ...])
- class secretflow.ml.linear.fl_lr_v.FlLogisticRegressionVertical(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)[source]#
Bases:
object
Vertical logistic regression.
Implement the basic SGD based logistic regression among multiple vertical participants.
To explain this algorithm, suppose alice has features and label, while bob and charlie have features only. The main steps of SGD are:
1. Alice does prediction using secure aggregation.
2. Alice sends the residual to bob/charlie in HE (Homomorphic Encryption) ciphertext.
3. Bob and charlie compute gradients in HE ciphertext and send masked gradients to alice.
4. Alice decrypts the masked gradients and sends them back to bob/charlie.
5. Bob and charlie unmask the gradients and update their weights independently.
6. Alice also updates its weights.
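The masking in the steps above can be illustrated with a plaintext toy. This sketch leaves out the homomorphic encryption entirely and uses small integers so that masking and unmasking are exact; the helper `dot_columns` and all values are hypothetical. It only shows why alice, who decrypts, never sees bob's true gradient.

```python
import random


def dot_columns(x_rows, residual):
    """Compute X^T * residual for a row-major feature matrix."""
    n_features = len(x_rows[0])
    return [
        sum(row[j] * r for row, r in zip(x_rows, residual))
        for j in range(n_features)
    ]


rng = random.Random(0)

bob_x = [[1, 2], [3, 4], [5, 6]]  # bob's feature columns
residual = [2, -1, 1]             # computed by alice; encrypted in the real protocol

# bob: gradient on (what would be) ciphertext, then add a random non-zero mask
gradient = dot_columns(bob_x, residual)
mask = [rng.randrange(1, 1 << 32) for _ in gradient]
masked = [g + m for g, m in zip(gradient, mask)]

# alice: decrypts and returns the values, but only ever sees masked numbers
revealed = list(masked)

# bob: removes his own mask to recover the plaintext gradient
unmasked = [v - m for v, m in zip(revealed, mask)]
```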
Methods:
__init__(devices, aggregator, heu[, ...]): Init VanillaVerLogisticRegression.
init_train_data(x, y, epochs, batch_size[, ...])
predict(x): Predict the score.
compute_loss(x, y[, avg_flag]): Compute the loss.
get_weight(): Get weight from this estimator.
set_weight(weight): Set weight to this estimator.
fit(x, y, batch_size, epochs[, tol, ...]): Fit the model.
fit_in_steps(n_step, learning_rate, epoch): Fit in steps.
- __init__(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)[source]#
Init VanillaVerLogisticRegression.
- Parameters
devices – a list of PYU devices taking part in the computation.
aggregator – the aggregator instance.
heu – the heu device instance.
fxp_bits – the fraction bit length for encoding before sending to heu device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64).
audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if it is None. Defaults to None. Leave it as None unless you are sure what the audit does and accept the risk.
- init_train_data(x: FedNdarray, y: FedNdarray, epochs: int, batch_size: int, shuffle_seed: Optional[int] = None)[source]#
- predict(x: Union[VDataFrame, FedNdarray, List[PYUObject]]) PYUObject [source]#
Predict the score.
- Parameters
x – the samples to predict.
- Returns
a PYUObject holding the prediction results.
- Return type
PYUObject
- compute_loss(x: FedNdarray, y: FedNdarray, avg_flag: Optional[bool] = True) PYUObject [source]#
Compute the loss.
- Parameters
x – the samples.
y – the label.
avg_flag – whether to divide the loss by the number of samples. Defaults to True.
- Returns
a PYUObject holding the loss value.
- Return type
PYUObject
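The effect of avg_flag can be shown with a plaintext log loss. This is a sketch that assumes the loss is the standard binary cross-entropy; the page does not state the exact formula, and `log_loss` is a hypothetical helper.

```python
import math


def log_loss(y, h, avg_flag=True):
    """Binary cross-entropy between labels y and predicted probabilities h.

    With avg_flag=True the summed loss is divided by the number of samples.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for label, prob in zip(y, h):
        total -= label * math.log(max(prob, eps))
        total -= (1 - label) * math.log(max(1 - prob, eps))
    return total / len(y) if avg_flag else total


labels = [1, 0]
probs = [0.5, 0.5]

mean_loss = log_loss(labels, probs)                 # averaged over the two samples
sum_loss = log_loss(labels, probs, avg_flag=False)  # plain sum
```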
- get_weight() Dict[PYU, PYUObject] [source]#
Get weight from this estimator.
- Returns
A dict of pyu and its weight. Note that the intercept (w0) is the first column of the label device weight.
- set_weight(weight: Dict[PYU, Union[PYUObject, ndarray]])[source]#
Set weight to this estimator.
- Parameters
weight – a dict of pyu and its weight.
- fit(x: Union[VDataFrame, FedNdarray], y: Union[VDataFrame, FedNdarray], batch_size: int, epochs: int, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1)[source]#
Fit the model.
- Parameters
x – training vector.
y – target vector relative to x.
batch_size – number of samples per gradient update.
epochs – number of epochs to train the model.
tol – optional, tolerance for stopping criteria. Defaults to 1e-4.
learning_rate – optional, learning rate. Defaults to 0.1.
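How batch_size, epochs, tol and learning_rate interact can be seen in a single-party, plaintext mini-batch SGD loop. This is a sketch only: the stopping rule (stop when the loss changes by less than tol between epochs) is an assumption, and `fit_plain_lr` is a hypothetical helper without any of the secure aggregation or encryption of the real estimator.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(w, x, y):
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for row, target in zip(x, y):
        p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))
        total -= target * math.log(max(p, eps)) + (1 - target) * math.log(max(1 - p, eps))
    return total / len(x)


def fit_plain_lr(x, y, batch_size, epochs, tol=1e-4, learning_rate=0.1):
    w = [0.0] * (len(x[0]) + 1)  # w[0] is the intercept
    prev_loss = None
    epochs_run = 0
    for _ in range(epochs):
        for start in range(0, len(x), batch_size):
            xb, yb = x[start:start + batch_size], y[start:start + batch_size]
            residual = [
                sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))) - t
                for row, t in zip(xb, yb)
            ]
            w[0] -= learning_rate * sum(residual) / len(xb)
            for j in range(len(w) - 1):
                grad_j = sum(r * row[j] for r, row in zip(residual, xb)) / len(xb)
                w[j + 1] -= learning_rate * grad_j
        epochs_run += 1
        loss = mean_log_loss(w, x, y)
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break  # assumed stopping rule based on tol
        prev_loss = loss
    return w, epochs_run


x = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
weights, epochs_run = fit_plain_lr(x, y, batch_size=2, epochs=200)
```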
secretflow.ml.linear.linear_model module#
Classes:
RegType(value): An enumeration.
LinearModel(weights, reg_type, sig_type): Unified linear regression model.
- class secretflow.ml.linear.linear_model.RegType(value)[source]#
Bases:
Enum
An enumeration.
Attributes:
- Linear = 'linear'#
- Logistic = 'logistic'#
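Because the members carry string values, a string such as 'logistic' can be mapped to a member by value lookup. The sketch below defines a local stand-in enum with the two documented members instead of importing the library, so it only shows standard Python Enum behaviour.

```python
from enum import Enum


class RegType(Enum):
    # Local stand-in that mirrors the two documented members.
    Linear = 'linear'
    Logistic = 'logistic'


# Look up a member by its string value.
reg = RegType('logistic')

# An unknown value raises ValueError.
try:
    RegType('poisson')
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```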
- class secretflow.ml.linear.linear_model.LinearModel(weights: Union[SPUObject, List[PYUObject]], reg_type: RegType, sig_type: SigType)[source]#
Bases:
object
Unified linear regression model.
- weights#
For mpc lr, an SPUObject that holds all weights; for fl lr, a list of PYUObject.
- Type
Union[secretflow.device.device.spu.SPUObject, List[secretflow.device.device.pyu.PYUObject]]
- reg_type#
RegType. Whether this is a linear regression or a logistic regression model.
- sig_type#
SigType. Which sigmoid approximation to use; only used in mpc lr.
Attributes:
Methods:
__init__(weights, reg_type, sig_type)
Module contents#
Classes:
FlLogisticRegressionMix: SGD based logistic regression for mix partitioned data.
FlLogisticRegressionVertical: Vertical logistic regression.
HESSLogisticRegression: This method provides logistic regression linear models for vertical split dataset setting by using secret sharing and homomorphic encryption with mini batch SGD training solver.
SSRegression: This method provides both linear and logistic regression linear models for vertical split dataset setting by using secret sharing with mini batch SGD training solver.
LinearModel: Unified linear regression model.
RegType: An enumeration.
- class secretflow.ml.linear.FlLogisticRegressionMix[source]#
Bases:
object
SGD based logistic regression for mix partitioned data.
The following is an example to illustrate the algorithm.
Suppose alice has features and label, while bob/carol/dave have features only.
The perspective of MixDataFrame X is as follows:
X
  VDataFrame_0: alice_x0 | bob_x | dave_x0
  VDataFrame_1: alice_x1 | carol_x | dave_x1
The perspective of MixDataFrame Y is as follows:
Y
  VDataFrame_0: alice_y0
  VDataFrame_1: alice_y1
When fitted with X and Y, two FlLogisticRegressionVertical instances are constructed. The first is fitted with VDataFrame_0 of X and Y, and the second with VDataFrame_1 of X and Y. The main steps of one epoch are:
1. Each FlLogisticRegressionVertical is fitted with its respective VDataFrame of X and Y.
2. The \({\theta}\) of the FlLogisticRegressionVertical instances are aggregated with SecureAggregator.
3. The aggregated \({\theta}\) is sent back to each FlLogisticRegressionVertical.
Methods:
fit(x, y, batch_size, epochs, aggregators, heus): Fit the model.
predict(x): Predict the score.
- fit(x: MixDataFrame, y: MixDataFrame, batch_size: int, epochs: int, aggregators: List[Aggregator], heus: List[HEU], fxp_bits: Optional[int] = 18, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1, agg_epochs: Optional[int] = 1, audit_log_dir: Optional[Dict[PYU, str]] = None)[source]#
Fit the model.
- Parameters
x – training vector. X should be a horizontally partitioned MixDataFrame, which consists of VDataFrames.
y – target vector relative to x. Y should also be a horizontally partitioned MixDataFrame. X and y should have the same number of VDataFrames.
batch_size – number of samples per gradient update.
epochs – number of epochs to train the model.
aggregators – a list of aggregators used to compute vertical lr. The number of aggregators should equal the number of VDataFrames in X.
heus – a list of HEUs used to compute vertical lr. The number of HEUs should equal the number of VDataFrames in X.
fxp_bits – the fraction bit length for encoding before sending to heu device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64).
tol – optional, tolerance for stopping criteria. Defaults to 1e-4.
learning_rate – optional, learning rate. Defaults to 0.1.
agg_epochs – aggregate weights for every {agg_epochs} epochs. Defaults to 1.
audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if it is None. Defaults to None. Leave it as None unless you are sure what the audit does and accept the risk.
- predict(x: MixDataFrame) List[PYUObject] [source]#
Predict the score.
- Parameters
x – the samples to predict.
- Returns
a list of PYUObjects holding prediction results.
- class secretflow.ml.linear.FlLogisticRegressionVertical(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)[source]#
Bases:
object
Vertical logistic regression.
Implement the basic SGD based logistic regression among multiple vertical participants.
To explain this algorithm, suppose alice has features and label, while bob and charlie have features only. The main steps of SGD are:
1. Alice does prediction using secure aggregation.
2. Alice sends the residual to bob/charlie in HE (Homomorphic Encryption) ciphertext.
3. Bob and charlie compute gradients in HE ciphertext and send masked gradients to alice.
4. Alice decrypts the masked gradients and sends them back to bob/charlie.
5. Bob and charlie unmask the gradients and update their weights independently.
6. Alice also updates its weights.
Methods:
__init__(devices, aggregator, heu[, ...]): Init VanillaVerLogisticRegression.
init_train_data(x, y, epochs, batch_size[, ...])
predict(x): Predict the score.
compute_loss(x, y[, avg_flag]): Compute the loss.
get_weight(): Get weight from this estimator.
set_weight(weight): Set weight to this estimator.
fit(x, y, batch_size, epochs[, tol, ...]): Fit the model.
fit_in_steps(n_step, learning_rate, epoch): Fit in steps.
- __init__(devices: List[PYU], aggregator: Aggregator, heu: HEU, fxp_bits: Optional[int] = 18, audit_log_dir: Optional[Dict[PYU, str]] = None)[source]#
Init VanillaVerLogisticRegression.
- Parameters
devices – a list of PYU devices taking part in the computation.
aggregator – the aggregator instance.
heu – the heu device instance.
fxp_bits – the fraction bit length for encoding before sending to heu device. Defaults to spu_fxp_precision(spu.spu_pb2.FM64).
audit_log_dir – a dict specifying the audit log directory for each device. No audit log is written if it is None. Defaults to None. Leave it as None unless you are sure what the audit does and accept the risk.
- init_train_data(x: FedNdarray, y: FedNdarray, epochs: int, batch_size: int, shuffle_seed: Optional[int] = None)[source]#
- predict(x: Union[VDataFrame, FedNdarray, List[PYUObject]]) PYUObject [source]#
Predict the score.
- Parameters
x – the samples to predict.
- Returns
a PYUObject holding the prediction results.
- Return type
PYUObject
- compute_loss(x: FedNdarray, y: FedNdarray, avg_flag: Optional[bool] = True) PYUObject [source]#
Compute the loss.
- Parameters
x – the samples.
y – the label.
avg_flag – whether to divide the loss by the number of samples. Defaults to True.
- Returns
a PYUObject holding the loss value.
- Return type
PYUObject
- get_weight() Dict[PYU, PYUObject] [source]#
Get weight from this estimator.
- Returns
A dict of pyu and its weight. Note that the intercept (w0) is the first column of the label device weight.
- set_weight(weight: Dict[PYU, Union[PYUObject, ndarray]])[source]#
Set weight to this estimator.
- Parameters
weight – a dict of pyu and its weight.
- fit(x: Union[VDataFrame, FedNdarray], y: Union[VDataFrame, FedNdarray], batch_size: int, epochs: int, tol: Optional[float] = 0.0001, learning_rate: Optional[float] = 0.1)[source]#
Fit the model.
- Parameters
x – training vector.
y – target vector relative to x.
batch_size – number of samples per gradient update.
epochs – number of epochs to train the model.
tol – optional, tolerance for stopping criteria. Defaults to 1e-4.
learning_rate – optional, learning rate. Defaults to 0.1.
- class secretflow.ml.linear.HESSLogisticRegression(spu: SPU, heu_x: HEU, heu_y: HEU)[source]#
Bases:
object
This method provides logistic regression linear models for vertical split dataset setting by using secret sharing and homomorphic encryption with mini batch SGD training solver. HESS-SGD is short for HE & secret sharing SGD training.
During the calculation process, the HEU is used to protect the weights and calculate the predicted y, and the SPU is used to calculate the sigmoid and gradient.
SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security. More detail: https://spu.readthedocs.io/en/beta/
HEU is a secure computing device that implements HE encryption and decryption and provides matrix operations similar to numpy, lowering the barrier to use. More detail: https://heu.readthedocs.io/en/latest/
For more detail, please refer to the KDD’21 paper: https://dl.acm.org/doi/10.1145/3447548.3467210
- Parameters
spu – SPU. The SPU device.
heu_x – HEU. The HEU device without label.
heu_y – HEU. The HEU device with label.
Notes
The training dataset should be normalized or standardized; otherwise the SGD solver may not converge.
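As a reminder of what standardization means here, the following sketch rescales one feature column to zero mean and unit variance with the standard library. `standardize` is a hypothetical helper; in a federated setting each party would typically apply such scaling to its own columns before training.

```python
import statistics


def standardize(column):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


ages = [20.0, 35.0, 50.0]
scaled = standardize(ages)
```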
Methods:
__init__(spu, heu_x, heu_y)
fit(x, y[, learning_rate, epochs, batch_size]): Fit linear model with Stochastic Gradient Descent.
save_model(): Save fit model in LinearModel format.
load_model(m): Load LinearModel format model.
predict(x): Probability estimates.
- fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], learning_rate=0.001, epochs=1, batch_size=None)[source]#
Fit linear model with Stochastic Gradient Descent.
- Parameters
x – {FedNdarray, VDataFrame} Input data, must be colocated with SPU.
y – {FedNdarray, VDataFrame} Target data, must be located on self._heu_y.
learning_rate – float, default=1e-3. Learning rate.
epochs – int, default=1 Number of epochs to train the model
batch_size – int, default=None Number of samples per gradient update. If None, batch_size will default to number of all samples.
- save_model() LinearModel [source]#
Save fit model in LinearModel format.
- load_model(m: LinearModel) None [source]#
Load LinearModel format model.
- predict(x: Union[FedNdarray, VDataFrame]) PYUObject [source]#
Probability estimates.
- Parameters
x – {FedNdarray, VDataFrame} Predict samples.
- Returns
probability of the sample for each class in the model.
- Return type
PYUObject
- class secretflow.ml.linear.SSRegression(spu: SPU)[source]#
Bases:
object
This method provides both linear and logistic regression linear models for vertical split dataset setting by using secret sharing with mini batch SGD training solver. SS-SGD is short for secret sharing SGD training.
More detail on SGD: https://stats.stackexchange.com/questions/488017/understanding-mini-batch-gradient-descent
Linear regression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
More detail on linear regression: https://en.wikipedia.org/wiki/Linear_regression
Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. The probabilities describing the possible outcomes of a single trial are modeled using a logistic function. This method can fit binary logistic regression with optional L2 regularization.
More detail on logistic regression: https://en.wikipedia.org/wiki/Logistic_regression
SPU is a verifiable and measurable secure computing device that runs under various MPC protocols to provide provable security.
More detail on SPU: https://spu.readthedocs.io/en/beta/
This method protects the original dataset and the final model by secret sharing the dataset to SPU device and running model fit under SPU.
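The idea of secret sharing a value can be sketched with two-party additive sharing over a 64-bit ring. This is a textbook toy under assumptions, not the SPU's actual protocol; the helper names are hypothetical and the modulus is chosen only to echo the FM64 field mentioned on this page.

```python
import secrets

MODULUS = 1 << 64  # 64-bit ring, chosen for illustration


def share(value: int, n_parties: int = 2):
    """Split value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares) -> int:
    """Recombine all shares; any strict subset reveals nothing about the value."""
    return sum(shares) % MODULUS


a_shares = share(20)
b_shares = share(22)

# Each party adds its own shares locally; no party ever sees 20 or 22.
sum_shares = [(sa + sb) % MODULUS for sa, sb in zip(a_shares, b_shares)]
total = reconstruct(sum_shares)
```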
- Parameters
spu – secure device.
Notes
The training dataset should be normalized or standardized; otherwise the SGD solver may not converge.
Methods:
__init__(spu)
fit(x, y, epochs[, learning_rate, ...]): Fit the model according to the given training data.
save_model(): Save fit model in LinearModel format.
load_model(m): Load LinearModel format model.
predict(x[, batch_size, to_pyu]): Predict using the model.
- fit(x: Union[FedNdarray, VDataFrame], y: Union[FedNdarray, VDataFrame], epochs: int, learning_rate: float = 0.1, batch_size: int = 1024, sig_type: str = 't1', reg_type: str = 'logistic', penalty: str = 'None', l2_norm: float = 0.5) None [source]#
Fit the model according to the given training data.
- Parameters
x – {FedNdarray, VDataFrame} of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y – {FedNdarray, VDataFrame} of shape (n_samples,) Target vector relative to X.
epochs – int. Number of iteration rounds.
learning_rate – float, default=0.1 controls how much to change the model in one epoch.
batch_size – int, default=1024 number of samples used in one calculation.
sig_type – str, default=t1 sigmoid approximation type.
reg_type – str, default=logistic Linear or Logistic regression.
penalty – str, default=None The penalty (aka regularization term) to be used.
l2_norm – float, default=0.5 L2 regularization term.
- Returns
Final weights in SPUObject.
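The sig_type parameter exists because the exact sigmoid is expensive under MPC, so a cheaper approximation is used instead. As a sketch, the first-order Taylor expansion of the sigmoid around 0 is 0.5 + x/4. Treating this as what 't1' selects, and clamping the result to [0, 1], are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Exact logistic function, for comparison."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_taylor1(x: float) -> float:
    """First-order Taylor approximation around 0, clamped to [0, 1]."""
    approx = 0.5 + 0.25 * x
    return min(1.0, max(0.0, approx))


near_zero_error = abs(sigmoid_taylor1(0.5) - sigmoid(0.5))
far_error = abs(sigmoid_taylor1(4.0) - sigmoid(4.0))
```

The approximation is close near zero and degrades for large inputs, which is one reason the training data should be normalized or standardized.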
- save_model() LinearModel [source]#
Save fit model in LinearModel format.
- load_model(m: LinearModel) None [source]#
Load LinearModel format model.
- predict(x: Union[FedNdarray, VDataFrame], batch_size: int = 1024, to_pyu: Optional[PYU] = None) Union[SPUObject, FedNdarray] [source]#
Predict using the model.
- Parameters
x – {FedNdarray, VDataFrame} of shape (n_samples, n_features) Predict samples.
batch_size – int, default=1024 number of samples used in one calculation.
to_pyu – the prediction initiator. If not None, the prediction result is revealed to the to_pyu device and saved as a FedNdarray; otherwise, the prediction result is kept secret and saved as an SPUObject.
- Returns
pred scores in SPUObject or FedNdarray, shape (n_samples,)
- class secretflow.ml.linear.LinearModel(weights: Union[SPUObject, List[PYUObject]], reg_type: RegType, sig_type: SigType)[source]#
Bases:
object
Unified linear regression model.
- weights#
For mpc lr, an SPUObject that holds all weights; for fl lr, a list of PYUObject.
- Type
Union[secretflow.device.device.spu.SPUObject, List[secretflow.device.device.pyu.PYUObject]]
- reg_type#
RegType. Whether this is a linear regression or a logistic regression model.
- sig_type#
SigType. Which sigmoid approximation to use; only used in mpc lr.
Attributes:
Methods:
__init__(weights, reg_type, sig_type)