Data Streams Generator

This module contains the class for generating multiple synthetic data streams with various change points.

Multiple Streams Generator Class

class source.generator.ds_generator.MultiDataStreams(num_streams: int = 2, dict_streams: list = [])

Bases: object

Class to generate and manage multiple data streams with change points.

Parameters:
  • num_streams (int) – The number of data streams to generate.

  • dict_streams (list) – A list of dictionaries, each containing parameters for a ChangePointGenerator.

__init__(num_streams: int = 2, dict_streams: list = [])

Initialize MultiDataStreams, creating one ChangePointGenerator instance per dictionary in dict_streams.

Parameters:
  • num_streams (int) – The number of data streams to generate.

  • dict_streams (list) – A list of dictionaries, each containing parameters for a ChangePointGenerator.

generate_data_streams(dict_missing=None)

Generate data for all ChangePointGenerator instances and store the results.

Parameters:

dict_missing (list, optional) – A list with one entry per stream, in the same order as dict_streams, specifying the missing-data parameters for that stream. An entry may be None to leave that stream complete. Each dictionary can have the following keys:

  • ‘type’: ‘point’ or ‘block’

  • ‘percentage’: float, percentage of data to be made missing

  • ‘min_block_size’: int, minimum size of blocks for block missingness (only for ‘block’ type)

  • ‘max_block_size’: int, maximum size of blocks for block missingness (only for ‘block’ type)

If None, no missing data will be introduced.
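The difference between the two missingness types can be illustrated with a minimal sketch. This is not the module's implementation: the helper apply_missing is hypothetical, and it assumes that ‘percentage’ is a fraction between 0 and 1 (as the example values 0.4 and 0.3 suggest) and that missing values are stored as NaN.

```python
import numpy as np


def apply_missing(data, spec, rng):
    """Hypothetical helper: return a float copy of `data` with NaNs inserted.

    Assumes spec["percentage"] is a fraction in [0, 1] and that missing
    values are encoded as NaN; neither is confirmed by this module.
    """
    out = np.asarray(data, dtype=float).copy()
    if spec is None:
        return out  # None entry: leave this stream complete
    n = out.size
    target = int(round(spec["percentage"] * n))
    if spec["type"] == "point":
        # Point missingness: blank out `target` individual positions at random.
        idx = rng.choice(n, size=target, replace=False)
        out[idx] = np.nan
    elif spec["type"] == "block":
        # Block missingness: blank out contiguous runs of random length in
        # [min_block_size, max_block_size] until at least `target` values are
        # missing (the last block may overshoot slightly; blocks may overlap).
        lo, hi = spec["min_block_size"], spec["max_block_size"]
        while np.isnan(out).sum() < target:
            size = min(int(rng.integers(lo, hi + 1)), n)
            start = int(rng.integers(0, n - size + 1))
            out[start:start + size] = np.nan
    else:
        raise ValueError(f"unknown missingness type: {spec['type']!r}")
    return out


rng = np.random.default_rng(0)
stream = np.sin(np.linspace(0, 20, 3000))
point = apply_missing(stream, {"type": "point", "percentage": 0.4}, rng)
block = apply_missing(stream, {"type": "block", "percentage": 0.3,
                               "min_block_size": 5, "max_block_size": 20}, rng)
```

Point missingness scatters isolated gaps, whereas block missingness produces contiguous gaps, which is usually the harder case for change point detectors that rely on local windows.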

get_all_streams()

Get the list of all generated data streams.

Returns:

A list of all generated data streams.

Return type:

list

get_data_streams_as_array()

Get all generated data streams as a transposed NumPy array.

Returns:

A transposed NumPy array of all generated data streams. Shape: (num_data_points, num_streams)

Return type:

np.ndarray
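The shape (num_data_points, num_streams) means one column per stream. The sketch below shows that layout on toy data only; it assumes each stream is a 1-D array of the same length (in both examples every stream has num_segments × segment_length = 3000 points, assuming that product is the stream length). The module's own code for this method is not shown here.

```python
import numpy as np

# Hypothetical stand-ins for get_all_streams(): three 1-D streams of equal length.
streams = [np.full(3000, 0.0), np.full(3000, 1.0), np.full(3000, 2.0)]

# Stacking gives shape (num_streams, num_data_points); transposing puts
# one stream per column, i.e. (num_data_points, num_streams).
array = np.array(streams).T
print(array.shape)  # (3000, 3)
print(array[0])     # [0. 1. 2.]  (first time step across all streams)
```

With this layout, each row is one time step across all streams, which is the usual input format for multivariate tools. Streams of unequal length cannot be stacked this way; recent NumPy versions raise a ValueError for such ragged input.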

plot_all_streams()

Plot the data for all ChangePointGenerator instances.
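plot_all_streams() takes no arguments, and its plotting backend is not documented here. Assuming a Matplotlib-style figure, one plausible layout is one row per stream with a shared time axis. plot_streams below is a hypothetical helper, not this method's code.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_streams(streams, titles=None):
    """Hypothetical helper: draw each stream in its own row, sharing the x-axis."""
    fig, axes = plt.subplots(len(streams), 1, sharex=True, squeeze=False,
                             figsize=(8, 2 * len(streams)))
    for i, (ax, stream) in enumerate(zip(axes[:, 0], streams)):
        ax.plot(stream, linewidth=0.8)  # NaN values are drawn as gaps in the line
        ax.set_title(titles[i] if titles else f"Stream {i + 1}")
    axes[-1, 0].set_xlabel("Time index")
    fig.tight_layout()
    return fig


rng = np.random.default_rng(2)
demo = [rng.normal(size=3000).cumsum(), rng.normal(size=3000)]
demo[1][1000:1050] = np.nan  # simulate a block of missing data
fig = plot_streams(demo)
```

Sharing the x-axis keeps time indices aligned across rows, so change points in different streams can be compared visually.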

Example Usage

Generate Multiple Data Streams

  • Without Missing Data

from source.generator.ds_generator import MultiDataStreams

# One parameter dictionary per stream
dict_streams = [
    {"num_segments": 3,
     "segment_length": 1000,
     "change_point_type": "sudden_shift",
     "seed": 2},
    {"num_segments": 6,
     "segment_length": 500,
     "change_point_type": "sudden_shift",
     "seed": 11},
]

many_data_streams = MultiDataStreams(dict_streams=dict_streams)
many_data_streams.generate_data_streams()
list_data_streams = many_data_streams.get_all_streams()
many_data_streams.plot_all_streams()

Figure: Multiple Data Streams Generator
  • With Missing Data

from source.generator.ds_generator import MultiDataStreams

# Three streams; each dictionary configures one ChangePointGenerator
dict_streams = [
    {"num_segments": 3,
     "segment_length": 1000,
     "change_point_type": "sudden_shift",
     "seed": 2},
    {"num_segments": 6,
     "segment_length": 500,
     "change_point_type": "sudden_shift",
     "seed": 11},
    {"num_segments": 4,
     "segment_length": 750,
     "change_point_type": "gradual_drift",
     "seed": 7},
]

# One entry per stream, in the same order as dict_streams
dict_missing = [
    {"type": "point",   # Point missingness for the first stream
     "percentage": 0.4},
    None,               # No missing data for the second stream
    {"type": "block",   # Block missingness for the third stream
     "percentage": 0.3,
     "min_block_size": 5,
     "max_block_size": 20},
]

# num_streams defaults to 2, so pass it explicitly for three streams
many_data_streams = MultiDataStreams(num_streams=len(dict_streams),
                                     dict_streams=dict_streams)
many_data_streams.generate_data_streams(dict_missing=dict_missing)
list_data_streams = many_data_streams.get_all_streams()
many_data_streams.plot_all_streams()

Figure: Multiple Data Streams Generator with Missing Data
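The examples request ‘sudden_shift’ and ‘gradual_drift’ change points. ChangePointGenerator itself is not documented in this section, so the following is only a toy illustration of the general difference, using a hypothetical function toy_stream: a sudden shift jumps the mean at each segment boundary, while a gradual drift ramps the mean linearly across a segment. It also assumes that a stream's length is num_segments × segment_length and that change points sit at segment boundaries.

```python
import numpy as np


def toy_stream(num_segments, segment_length, change_point_type, seed):
    """Hypothetical stand-in for one stream; not ChangePointGenerator's model."""
    rng = np.random.default_rng(seed)
    means = rng.uniform(-5, 5, size=num_segments)  # one target mean per segment
    pieces = []
    prev = means[0]
    for m in means:
        if change_point_type == "sudden_shift":
            level = np.full(segment_length, m)  # mean jumps at the boundary
        elif change_point_type == "gradual_drift":
            level = np.linspace(prev, m, segment_length)  # mean ramps to the new value
        else:
            raise ValueError(f"unsupported change_point_type: {change_point_type!r}")
        pieces.append(level)
        prev = m
    signal = np.concatenate(pieces)
    noise = rng.normal(scale=0.5, size=signal.size)
    # Assumed change point locations: the start of every segment after the first.
    change_points = [k * segment_length for k in range(1, num_segments)]
    return signal + noise, change_points


data, cps = toy_stream(num_segments=3, segment_length=1000,
                       change_point_type="sudden_shift", seed=2)
print(data.shape, cps)  # (3000,) [1000, 2000]
```

Fixing the seed, as the dict_streams entries do, makes each toy stream reproducible across runs.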