secretflow package
Subpackages
- secretflow.data package
- secretflow.device package
- secretflow.ml package
- secretflow.preprocessing package
- secretflow.security package
- secretflow.stats package
  - Subpackages
  - Submodules
    - secretflow.stats.biclassification_eval module
    - secretflow.stats.psi_eval module
    - secretflow.stats.pva_eval module
    - secretflow.stats.regression_eval module
    - secretflow.stats.score_card module
    - secretflow.stats.ss_pearsonr_v module
    - secretflow.stats.ss_pvalue_v module
    - secretflow.stats.ss_vif_v module
    - secretflow.stats.table_statistics module
  - Module contents
- secretflow.utils package
  - Subpackages
  - Submodules
    - secretflow.utils.compressor module
    - secretflow.utils.errors module
    - secretflow.utils.hash module
    - secretflow.utils.io module
    - secretflow.utils.ndarray_bigint module
    - secretflow.utils.ndarray_encoding module
    - secretflow.utils.sigmoid module
    - secretflow.utils.testing module
  - Module contents
Module contents
Classes:
HEU – Homomorphic encryption device
PYU – PYU is the device doing computation in a single domain.
SPU
Device
DeviceObject
HEUObject – HEU Object
PYUObject – PYU device object.
SPUObject
Functions:
init – Connect to an existing Ray cluster or start one and connect to it.
proxy – Define a device class which should accept DeviceObject as method parameters and return DeviceObject.
reveal – Get plaintext data from device.
shutdown – Disconnect the worker, and terminate processes started by secretflow.init().
to – Device object conversion.
wait – Wait for device objects until all are ready or an error occurs.
- class secretflow.HEU(config: dict, spu_field_type)
Bases: Device
Homomorphic encryption device
Methods:
__init__(config, spu_field_type) – Initialize HEU
init()
get_participant(party) – Get ray actor by name
has_party(party)
- __init__(config: dict, spu_field_type)
Initialize HEU
- Parameters
config – HEU init config, for example:
    {
        'sk_keeper': {
            'party': 'alice'
        },
        'evaluators': [{
            'party': 'bob'
        }],
        # The HEU working mode, choose from PHEU / LHEU / FHEU_ROUGH / FHEU
        'mode': 'PHEU',
        # TODO: cleartext_type should be migrated to HeObject.
        'encoding': {
            # DT_I1, DT_I8, DT_I16, DT_I32, DT_I64 or DT_FXP (default)
            'cleartext_type': "DT_FXP",
            # see https://heu.readthedocs.io/en/latest/getting_started/quick_start.html#id3 for detail
            # available encoders:
            #     - IntegerEncoder: Plaintext = Cleartext * scale
            #     - FloatEncoder (default): Plaintext = Cleartext * scale
            #     - BigintEncoder: Plaintext = Cleartext
            #     - BatchEncoder: Plaintext = Pack[Cleartext, Cleartext]
            'encoder': 'FloatEncoder'
        },
        'he_parameters': {
            'schema': 'paillier',
            'key_pair': {
                'generate': {
                    'bit_size': 2048,
                },
            }
        }
    }
spu_field_type – Field type in SPU. The Device.to operation requires the data scale of HEU to be aligned with SPU.
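The config layout above can be assembled programmatically. The following is a minimal sketch in plain Python that only builds the dict; `make_heu_config` and its parameters are hypothetical names introduced for illustration and are not part of SecretFlow.

```python
from typing import Any, Dict, List


def make_heu_config(
    sk_keeper: str,
    evaluators: List[str],
    mode: str = "PHEU",
    bit_size: int = 2048,
) -> Dict[str, Any]:
    """Build a dict shaped like the documented HEU config example.

    Hypothetical helper: it only mirrors the layout shown in the
    documentation and performs no validation of its own.
    """
    return {
        "sk_keeper": {"party": sk_keeper},
        "evaluators": [{"party": party} for party in evaluators],
        "mode": mode,
        "encoding": {
            "cleartext_type": "DT_FXP",
            "encoder": "FloatEncoder",
        },
        "he_parameters": {
            "schema": "paillier",
            "key_pair": {"generate": {"bit_size": bit_size}},
        },
    }


heu_config = make_heu_config("alice", ["bob"])
# With SecretFlow installed, the dict would be handed to the constructor
# documented above, together with the SPU field type:
#   heu = secretflow.HEU(heu_config, spu_field_type)
```

Keeping the config in a helper like this makes it easier to use the same party names for both the HEU and SPU definitions.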
- class secretflow.PYU(party: str, node: str = '')
Bases: Device
PYU is the device doing computation in a single domain.
Essentially, PYU is a Python worker that can execute any Python code.
Methods:
__init__(party[, node]) – PYU constructor.
- class secretflow.SPU(cluster_def: Dict, link_desc: Optional[Dict] = None, name: str = 'SPU')
Bases: Device
Methods:
__init__(cluster_def[, link_desc, name]) – SPU device constructor.
init() – Init SPU runtime in each party
reset() – Reset SPU to clear corrupted internal state, for test only
psi_df(key, dfs, receiver[, protocol, ...]) – Private set intersection with DataFrame.
psi_csv(key, input_path, output_path, receiver) – Private set intersection with csv file.
psi_join_df(key, dfs, receiver, join_party) – Private set intersection with DataFrame.
psi_join_csv(key, input_path, output_path, ...) – Private set intersection with csv file.
- __init__(cluster_def: Dict, link_desc: Optional[Dict] = None, name: str = 'SPU')
SPU device constructor.
- Parameters
cluster_def – SPU cluster definition. For more details, refer to the SPU runtime config.
For example:
    {
        'nodes': [
            {
                'party': 'alice',
                'id': 'local:0',
                # The address for other peers.
                'address': '127.0.0.1:9001',
                # The listen address of this node.
                # Optional. Address will be used if listen_address is empty.
                'listen_address': ''
            },
            {
                'party': 'bob',
                'id': 'local:1',
                'address': '127.0.0.1:9002',
                'listen_address': ''
            },
        ],
        'runtime_config': {
            'protocol': spu.spu_pb2.SEMI2K,
            'field': spu.spu_pb2.FM128,
            'sigmoid_mode': spu.spu_pb2.RuntimeConfig.SIGMOID_REAL,
        }
    }
link_desc – Optional. A dict that specifies the link parameters. Available parameters are:
connect_retry_times
connect_retry_interval_ms
recv_timeout_ms
http_max_payload_size
http_timeout_ms
throttle_window_size
brpc_channel_protocol (refer to https://github.com/apache/incubator-brpc/blob/master/docs/en/client.md#protocols)
brpc_channel_connection_type (refer to https://github.com/apache/incubator-brpc/blob/master/docs/en/client.md#connection-type)
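For local experiments, the node list in cluster_def can be generated from a list of party names. The sketch below is plain Python and assumes one node per party on consecutive ports; `make_cluster_def` is a hypothetical helper, the `link_desc` values are illustrative rather than recommended settings, and the empty `runtime_config` is a placeholder for the `spu.spu_pb2` constants shown in the example above.

```python
from typing import Any, Dict, List


def make_cluster_def(
    parties: List[str],
    runtime_config: Dict[str, Any],
    host: str = "127.0.0.1",
    base_port: int = 9001,
) -> Dict[str, Any]:
    """Build a cluster_def dict with one node per party (hypothetical helper)."""
    nodes = [
        {
            "party": party,
            "id": f"local:{rank}",
            # The address other peers use to reach this node.
            "address": f"{host}:{base_port + rank}",
            # Left empty, so the address above is used for listening too.
            "listen_address": "",
        }
        for rank, party in enumerate(parties)
    ]
    return {"nodes": nodes, "runtime_config": runtime_config}


# Placeholder runtime_config: real code would fill in protocol, field and
# sigmoid_mode with spu.spu_pb2 constants as in the documented example.
cluster_def = make_cluster_def(["alice", "bob"], runtime_config={})

# Illustrative values only; the keys come from the documented parameter list.
link_desc = {
    "connect_retry_times": 60,
    "connect_retry_interval_ms": 1000,
    "recv_timeout_ms": 120000,
}
# With SecretFlow installed:
#   spu_device = secretflow.SPU(cluster_def, link_desc=link_desc)
```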
- psi_df(key: Union[str, List[str], Dict[Device, List[str]]], dfs: List[PYUObject], receiver: str, protocol='KKRT_PSI_2PC', precheck_input=True, sort=True, broadcast_result=True, bucket_size=1048576, curve_type='CURVE_25519')
Private set intersection with DataFrame.
- Parameters
key (str, List[str], Dict[Device, List[str]]) – Column(s) used to join.
dfs (List[PYUObject]) – DataFrames to be joined, which should be colocated with SPU runtimes.
receiver (str) – Which party can get joined data; others will get None.
protocol (str) – PSI protocol.
precheck_input (bool) – Whether to check input data before join.
sort (bool) – Whether to sort data by key after join.
broadcast_result (bool) – Whether to broadcast joined data to all parties.
bucket_size (int) – Hash bucket size used in PSI. Larger values consume more memory.
curve_type (str) – Curve for ECDH PSI.
- Returns
Joined DataFrames with order preserved.
- Return type
List[PYUObject]
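To make the meaning of the result concrete, here is a plaintext reference of what an intersection on a key column produces. It is not the secure protocol and does not use SecretFlow: `plaintext_psi` is a hypothetical function operating on plain lists of dicts, and its duplicate-key check and sorting are assumptions modelled on the `precheck_input` and `sort` parameters.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def plaintext_psi(
    tables: List[List[Row]],
    key: str,
    precheck_input: bool = True,
    sort: bool = True,
) -> List[List[Row]]:
    """Keep, in every table, only the rows whose key appears in all tables."""
    if precheck_input:
        # Assumed check: reject tables with duplicated key values.
        for rows in tables:
            keys = [row[key] for row in rows]
            if len(keys) != len(set(keys)):
                raise ValueError(f"duplicate values in key column {key!r}")

    # Keys present in every party's table.
    common = set.intersection(*({row[key] for row in rows} for rows in tables))

    result = []
    for rows in tables:
        kept = [row for row in rows if row[key] in common]
        if sort:
            kept.sort(key=lambda row: row[key])
        result.append(kept)
    # One output table per input table, in the same order as the inputs.
    return result


alice_rows = [{"uid": "c", "age": 30}, {"uid": "a", "age": 41}, {"uid": "d", "age": 25}]
bob_rows = [{"uid": "a", "score": 0.9}, {"uid": "b", "score": 0.1}, {"uid": "c", "score": 0.5}]
alice_out, bob_out = plaintext_psi([alice_rows, bob_rows], key="uid")
```

In the real API the inputs are PYUObject references and no party sees the other's non-matching keys; the sketch only shows the shape of the output.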
- psi_csv(key: Union[str, List[str], Dict[Device, List[str]]], input_path: Union[str, Dict[Device, str]], output_path: Union[str, Dict[Device, str]], receiver: str, protocol='KKRT_PSI_2PC', precheck_input=True, sort=True, broadcast_result=True, bucket_size=1048576, curve_type='CURVE_25519')
Private set intersection with csv file.
- Parameters
key (str, List[str], Dict[Device, List[str]]) – Column(s) used to join.
input_path – CSV files to be joined, comma-separated and containing a header.
output_path – Joined CSV files, comma-separated and containing a header.
receiver (str) – Which party can get joined data. Others won't generate an output file, and their intersection_count is -1.
protocol (str) – PSI protocol.
precheck_input (bool) – Whether to check input data before joining. For now, it checks whether the key is duplicated.
sort (bool) – Whether to sort data by key after joining.
broadcast_result (bool) – Whether to broadcast joined data to all parties.
bucket_size (int) – Hash bucket size used in PSI. Larger values consume more memory.
- Returns
PSI reports output by SPU with order preserved.
- Return type
List[Dict]
- psi_join_df(key: Union[str, List[str], Dict[Device, List[str]]], dfs: List[PYUObject], receiver: str, join_party: str, protocol='KKRT_PSI_2PC', precheck_input=True, bucket_size=1048576, curve_type='CURVE_25519')
Private set intersection with DataFrame.
- Parameters
key (str, List[str], Dict[Device, List[str]]) – Column(s) used to join.
dfs (List[PYUObject]) – DataFrames to be joined, which should be colocated with SPU runtimes.
receiver (str) – Which party can get joined data. Others won't generate an output file, and their intersection_count is -1.
join_party (str) – The party that can get joined data.
protocol (str) – PSI protocol.
precheck_input (bool) – Whether to check input data before joining. For now, it checks whether the key is duplicated.
bucket_size (int) – Hash bucket size used in PSI. Larger values consume more memory.
curve_type (str) – Curve for ECDH PSI.
- Returns
Joined DataFrames with order preserved.
- Return type
List[PYUObject]
- psi_join_csv(key: Union[str, List[str], Dict[Device, List[str]]], input_path: Union[str, Dict[Device, str]], output_path: Union[str, Dict[Device, str]], receiver: str, join_party: str, protocol='KKRT_PSI_2PC', precheck_input=True, bucket_size=1048576, curve_type='CURVE_25519')
Private set intersection with csv file.
- Parameters
key (str, List[str], Dict[Device, List[str]]) – Column(s) used to join.
input_path – CSV files to be joined, comma-separated and containing a header.
output_path – Joined CSV files, comma-separated and containing a header.
receiver (str) – Which party can get joined data. Others won't generate an output file, and their intersection_count is -1.
join_party (str) – The party that can get joined data.
protocol (str) – PSI protocol.
precheck_input (bool) – Whether to check input data before joining. For now, it checks whether the key is duplicated.
bucket_size (int) – Hash bucket size used in PSI. Larger values consume more memory.
curve_type (str) – Curve for ECDH PSI.
- Returns
PSI reports output by SPU with order preserved.
- Return type
List[Dict]
- class secretflow.Device(device_type: DeviceType)
Bases: ABC
Methods:
__init__(device_type) – Abstract device base class.
Attributes:
device_type – Get underlying device type
- __init__(device_type: DeviceType)
Abstract device base class.
- Parameters
device_type (DeviceType) – underlying device type
- property device_type
Get underlying device type
- class secretflow.DeviceObject(device: Device)
Bases: ABC
Methods:
__init__(device) – Abstract device object base class.
to(device[, config]) – Device object conversion.
Attributes:
device_type – Get underlying device type
- __init__(device: Device)
Abstract device object base class.
- Parameters
device (Device) – Device where this object is located.
- property device_type
Get underlying device type
- to(device: Device, config: Optional[MoveConfig] = None)
Device object conversion.
- Parameters
device (Device) – Target device
config – configuration of this data movement
- Returns
Target device object.
- Return type
DeviceObject
- class secretflow.HEUObject(device, data: ObjectRef, location_party: str, is_plain: bool = False)
Bases: DeviceObject
HEU Object
- data
The data held by this HEU object
- location
The party where the data actually resides
- is_plain
Whether the data is plaintext (unencrypted)
Methods:
__init__(device, data, location_party[, ...]) – Abstract device object base class.
encrypt([heu_audit_log]) – Force encrypt if data is plaintext
sum() – Sum of HeObject elements over a given axis.
dump(path) – Dump ciphertext into files.
- class secretflow.PYUObject(device: PYU, data: ObjectRef)
Bases: DeviceObject
PYU device object.
- data
Reference to underlying data.
- Type
ray.ObjectRef
Methods:
__init__(device, data) – Abstract device object base class.
- class secretflow.SPUObject(device: Device, meta: ObjectRef, shares: Sequence[ObjectRef])
Bases: DeviceObject
Methods:
__init__(device, meta, shares) – SPUObject refers to a Python object which could be flattened to a list of SPU values.
- __init__(device: Device, meta: ObjectRef, shares: Sequence[ObjectRef])
SPUObject refers to a Python object which could be flattened to a list of SPU values. An SPU value is a NumPy array or equivalent. For example:
1. If the referred Python object is [1,2,3], then meta refers to a single SPUValueMeta, and shares is a list of references to the pieces of shares of [1,2,3].
2. If the referred Python object is {'a': 1, 'b': [3, np.array(…)]}, then meta refers to something like {'a': SPUValueMeta1, 'b': [SPUValueMeta2, SPUValueMeta3]}, and each element of shares refers to something like {'a': share1, 'b': [share2, share3]}.
3. shares is a list of ObjectRef to share slices, and these share slices are not necessarily located at the SPU device. Data transfer only happens when the SPU device consumes SPU objects.
- Parameters
meta – Ref to the metadata.
shares (Sequence[ray.ObjectRef]) – Refs to shares of data.
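The layout described in example 2 (one structure tree, plus one same-shaped tree of shares per party) can be imitated with a toy model. This is a sketch under stated assumptions, not SPU's implementation: it uses plain integers as leaves, additive secret sharing over a ring of size 2**64, and the hypothetical helpers `flatten`, `unflatten`, `share` and `reconstruct`.

```python
import secrets
from typing import Any, List, Tuple

MODULUS = 2 ** 64  # toy ring size, an assumption of this sketch


def flatten(obj: Any) -> Tuple[List[int], Any]:
    """Return (leaves, tree); tree mirrors obj with leaves replaced by indices."""
    leaves: List[int] = []

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, (list, tuple)):
            return [walk(v) for v in node]
        leaves.append(node)
        return len(leaves) - 1

    tree = walk(obj)
    return leaves, tree


def unflatten(tree: Any, leaves: List[int]) -> Any:
    """Rebuild a nested object by substituting leaf values for indices."""
    if isinstance(tree, dict):
        return {k: unflatten(v, leaves) for k, v in tree.items()}
    if isinstance(tree, list):
        return [unflatten(v, leaves) for v in tree]
    return leaves[tree]


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties additive shares modulo MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def reconstruct(parts: List[int]) -> int:
    return sum(parts) % MODULUS


obj = {"a": 1, "b": [3, 7]}
leaves, tree = flatten(obj)
per_leaf = [share(v, n_parties=2) for v in leaves]
# One same-shaped tree of shares per party, like {'a': share1, 'b': [share2, share3]}.
party_trees = [unflatten(tree, [parts[p] for parts in per_leaf]) for p in range(2)]
recovered = unflatten(tree, [reconstruct(parts) for parts in per_leaf])
```

Here `tree` plays the role of the metadata structure and each entry of `party_trees` plays the role of one element of `shares`; a single party's tree reveals nothing about `obj` on its own.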
- secretflow.init(parties: Optional[Union[str, List[str]]] = None, address: Optional[str] = None, num_cpus: Optional[int] = None, log_to_driver=False, omp_num_threads: Optional[int] = None, **kwargs)
Connect to an existing Ray cluster or start one and connect to it.
- Parameters
parties – Parties this node represents, e.g. 'alice' or ['alice', 'bob', 'carol'].
address – The address of the Ray cluster to connect to. If this address is not provided, then a raylet, a plasma store, a plasma manager, and some workers will be started.
num_cpus – Number of CPUs the user wishes to assign to each raylet.
log_to_driver – Whether to direct the output of worker processes on all nodes to the driver.
omp_num_threads – Sets the environment variable OMP_NUM_THREADS. It works only when address is None.
**kwargs – See ray.init() parameters.
- secretflow.proxy(device_object_type: Type[DeviceObject], max_concurrency=None)
Define a device class which should accept DeviceObject as method parameters and return DeviceObject.
This proxy function mainly does the following work:
1. Add an additional parameter device: Device to the init method __init__.
2. Wrap class methods to allow passing DeviceObject as parameters, which must be on the same device as the class instance.
3. According to the return annotation of class methods, return the corresponding number of DeviceObject.
    from typing import Tuple

    import numpy as np
    from secretflow import PYU, PYUObject, proxy

    @proxy(PYUObject)
    class Model:
        def __init__(self, builder):
            self.weights = builder()

        def build_dataset(self, x, y):
            self.dataset_x = x
            self.dataset_y = y

        def get_weights(self) -> np.ndarray:
            return self.weights

        def train_step(self, step) -> Tuple[np.ndarray, int]:
            return self.weights, 100

    # builder and load_data are user-supplied callables.
    alice = PYU('alice')
    model = Model(builder, device=alice)
    x, y = alice(load_data)()
    model.build_dataset(x, y)
    w = model.get_weights()
    w, n = model.train_step(10)
- Parameters
device_object_type (Type[DeviceObject]) – DeviceObject type, e.g. PYUObject.
max_concurrency (int) – Actor threadpool size.
- Returns
Wrapper function.
- Return type
Callable
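Step 3 above, deriving the number of returned objects from a method's return annotation, can be sketched with the standard library alone. This is an illustration of the idea, not SecretFlow's code: `count_returns` is a hypothetical helper, and treating an unannotated method as returning nothing is an assumption.

```python
import inspect
from typing import Tuple, get_args, get_origin


def count_returns(method) -> int:
    """Count the values a method declares in its return annotation."""
    annotation = inspect.signature(method).return_annotation
    # Assumption: no annotation (or None) means no returned object.
    if annotation is inspect.Signature.empty or annotation is None:
        return 0
    # Tuple[A, B, ...] declares one returned value per element.
    if get_origin(annotation) is tuple:
        return len(get_args(annotation))
    return 1


class Model:
    def build_dataset(self, x, y):
        self.dataset = (x, y)

    def get_weights(self) -> list:
        return [0.0]

    def train_step(self, step) -> Tuple[list, int]:
        return [0.0], 100


counts = {
    name: count_returns(getattr(Model, name))
    for name in ("build_dataset", "get_weights", "train_step")
}
```

This is why the example class annotates `train_step` with a `Tuple`: the caller can then unpack two separate device objects.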
- secretflow.reveal(func_or_object)
Get plaintext data from device.
NOTE: Use this function with extreme caution, as it may cause privacy leaks. In SecretFlow, we recommend that data flow between different devices and rarely be revealed to the driver. Only use this function when data-dependent control flow occurs.
- Parameters
func_or_object – May be a callable or any Python object that contains device objects.
- secretflow.shutdown()
Disconnect the worker, and terminate processes started by secretflow.init().
This will automatically run at the end when a Python process that uses Ray exits. It is ok to run this twice in a row. The primary use case for this function is to clean up state between tests.
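Putting init, PYU, reveal and shutdown together, a session might look like the sketch below. It assumes SecretFlow is installed and that calling init without an address starts a local cluster, as described for the address parameter; `run_demo` and `add` are hypothetical names, and the `num_cpus` value is illustrative.

```python
def add(a, b):
    """Plain Python function that will be executed on alice's PYU."""
    return a + b


def run_demo():
    """Start a cluster, compute on alice's PYU, reveal the result, shut down."""
    # Imported lazily so this module can be loaded without SecretFlow.
    import secretflow as sf

    sf.init(parties=["alice", "bob"], num_cpus=4, log_to_driver=False)
    try:
        alice = sf.PYU("alice")
        # Calling the device wraps the function; the result is a PYUObject,
        # not a plain Python value.
        total = alice(add)(40, 2)
        # reveal() brings the plaintext to the driver; use it sparingly.
        return sf.reveal(total)
    finally:
        # Always release the processes started by sf.init().
        sf.shutdown()

# Call run_demo() on a machine where SecretFlow is installed.
```

Wrapping the body in try/finally keeps the cluster from lingering when a step fails, which matters most when the same process runs several tests.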
- secretflow.to(device: Device, data: Any, spu_vis: str = 'secret')
Device object conversion.
- Parameters
device (Device) – Target device.
data (Any) – DeviceObject or plaintext data.
spu_vis (str) – Device object visibility, SPU device only. secret: Secret sharing with protocol spdz-2k, aby3, etc. public: Public sharing, which means data will be replicated to each node.
- Returns
Target device object.
- Return type