Data Streams Generator
This module contains the class for generating multiple synthetic data streams with various change points.
Multiple Streams Generator Class
- class source.generator.ds_generator.MultiDataStreams(num_streams: int = 2, dict_streams: list = [])
Bases: object
Class to generate and manage multiple data streams with change points.
- Parameters:
num_streams (int) – The number of data streams to generate.
dict_streams (list) – A list of dictionaries, each containing parameters for a ChangePointGenerator.
- __init__(num_streams: int = 2, dict_streams: list = [])
Initialize MultiDataStreams with a list of ChangePointGenerator instances.
- Parameters:
num_streams (int) – The number of data streams to generate.
dict_streams (list) – A list of dictionaries, each containing parameters for a ChangePointGenerator.
- generate_data_streams(dict_missing=None)
Generate data for all ChangePointGenerator instances and store the results.
- Parameters:
dict_missing (list, optional) – A list with one entry per stream specifying the missing data parameters for that stream. If None, no missing data will be introduced. As the usage example shows, an individual entry may also be None to leave that stream untouched. Each dictionary can have the following keys:
    - 'type': 'point' or 'block'
    - 'percentage': float, percentage of data to be made missing
    - 'min_block_size': int, minimum size of blocks for block missingness (only for 'block' type)
    - 'max_block_size': int, maximum size of blocks for block missingness (only for 'block' type)
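To make the two missingness types concrete, here is a minimal NumPy sketch of what 'point' and 'block' masking could look like. It is not the module's implementation: the helper name `apply_missing`, the use of NaN as the missing-value marker, and the reading of 'percentage' as a fraction between 0 and 1 (suggested by the value 0.4 in the usage example) are all assumptions.

```python
import numpy as np


def apply_missing(data, spec, seed=0):
    """Hypothetical helper: return a copy of `data` with NaNs inserted.

    `spec` uses the documented keys ('type', 'percentage',
    'min_block_size', 'max_block_size'); None leaves the data unchanged.
    """
    out = np.asarray(data, dtype=float).copy()
    if spec is None:
        return out

    rng = np.random.default_rng(seed)
    n = out.size
    # Assumption: 'percentage' is a fraction in [0, 1].
    target = int(round(spec["percentage"] * n))

    if spec["type"] == "point":
        # Scatter individual missing values uniformly at random.
        idx = rng.choice(n, size=target, replace=False)
        out[idx] = np.nan
    elif spec["type"] == "block":
        # Blank out contiguous runs until the target count is reached.
        # The final block is shortened if needed, so it may be smaller
        # than 'min_block_size'.
        mask = np.zeros(n, dtype=bool)
        while mask.sum() < target:
            size = int(rng.integers(spec["min_block_size"],
                                    spec["max_block_size"] + 1))
            size = min(size, target - int(mask.sum()))
            start = int(rng.integers(0, n - size + 1))
            mask[start:start + size] = True
        out[mask] = np.nan
    else:
        raise ValueError(f"unknown missingness type: {spec['type']!r}")
    return out
```

Point missingness removes isolated samples, while block missingness removes contiguous runs, which is closer to what a sensor outage produces. Both variants in this sketch hit the same overall share of missing values.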
- get_all_streams()
Get the list of all generated data streams.
- Returns:
A list of all generated data streams.
- Return type:
list
- get_data_streams_as_array()
Get all generated data streams as a transposed NumPy array.
- Returns:
A transposed NumPy array of all generated data streams. Shape: (num_data_points, num_streams)
- Return type:
np.ndarray
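The documented shape follows from stacking the streams as rows and then transposing. The sketch below reproduces that layout with stand-in arrays. Whether the method itself is implemented this way is an assumption, as is the requirement that every stream has the same length.

```python
import numpy as np

# Stand-ins for the per-stream arrays returned by get_all_streams().
rng = np.random.default_rng(0)
streams = [rng.normal(size=3000) for _ in range(3)]

# Stacking gives (num_streams, num_data_points); transposing yields the
# documented (num_data_points, num_streams) layout.
stacked = np.array(streams).T

print(stacked.shape)  # (3000, 3)

# Column j holds stream j; row t holds every stream at time step t.
assert np.array_equal(stacked[:, 1], streams[1])
```

With this layout, each row is one multivariate observation, which is the orientation most array-based detectors and estimators expect.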
- plot_all_streams()
Plot the data for all ChangePointGenerator instances.
Example Usage
Generate Multiple Data Streams
Without Missing Data
from source.generator.ds_generator import MultiDataStreams

# One parameter dictionary per stream.
dict_streams = [
    {"num_segments": 3,
     "segment_length": 1000,
     "change_point_type": "sudden_shift",
     "seed": 2},
    {"num_segments": 6,
     "segment_length": 500,
     "change_point_type": "sudden_shift",
     "seed": 11},
]

many_data_streams = MultiDataStreams(dict_streams=dict_streams)
many_data_streams.generate_data_streams()
list_data_streams = many_data_streams.get_all_streams()
many_data_streams.plot_all_streams()
With Missing Data
from source.generator.ds_generator import MultiDataStreams

dict_streams = [
    {"num_segments": 3,
     "segment_length": 1000,
     "change_point_type": "sudden_shift",
     "seed": 2},
    {"num_segments": 6,
     "segment_length": 500,
     "change_point_type": "sudden_shift",
     "seed": 11},
    {"num_segments": 4,
     "segment_length": 750,
     "change_point_type": "gradual_drift",
     "seed": 7},
]

dict_missing = [
    {"type": "point",        # Point missingness for the first stream
     "percentage": 0.4},
    None,                    # No missing data for the second stream
    {"type": "block",        # Block missingness for the third stream
     "percentage": 0.3,
     "min_block_size": 5,
     "max_block_size": 20},
]

many_data_streams = MultiDataStreams(dict_streams=dict_streams)
many_data_streams.generate_data_streams(dict_missing=dict_missing)
list_data_streams = many_data_streams.get_all_streams()
many_data_streams.plot_all_streams()
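Because the entries of dict_missing are matched to dict_streams by position in this example, it can help to check the list before passing it in. The helper below is a hypothetical pre-flight check built only from the keys documented for generate_data_streams. It is not part of the module, and the rule that both lists must have the same length is an assumption drawn from the example.

```python
ALLOWED_KEYS = {"type", "percentage", "min_block_size", "max_block_size"}


def check_missing_specs(dict_streams, dict_missing):
    """Hypothetical check; raises ValueError on the first problem found."""
    if dict_missing is None:
        return  # documented: None means no missing data at all
    if len(dict_missing) != len(dict_streams):
        raise ValueError("expected one entry (dict or None) per stream")

    for i, spec in enumerate(dict_missing):
        if spec is None:
            continue  # this stream is left untouched
        unknown = set(spec) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"stream {i}: unknown keys {sorted(unknown)}")
        if spec.get("type") not in ("point", "block"):
            raise ValueError(f"stream {i}: 'type' must be 'point' or 'block'")
        lo, hi = spec.get("min_block_size"), spec.get("max_block_size")
        if lo is not None and hi is not None and lo > hi:
            raise ValueError(f"stream {i}: min_block_size exceeds max_block_size")


# Usage with the same structure as the example (stream dicts abbreviated).
streams = [{"seed": 2}, {"seed": 11}, {"seed": 7}]
missing = [
    {"type": "point", "percentage": 0.4},
    None,
    {"type": "block", "percentage": 0.3,
     "min_block_size": 5, "max_block_size": 20},
]
check_missing_specs(streams, missing)  # passes silently
```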