Create a single (coalesced) CSV file for Training Data

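import hsfs
# Connect to Hopsworks (no arguments are needed when running inside the Hopsworks cluster)
connection = hsfs.connection()
# Handle to the current project's feature store
fs = connection.get_feature_store()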
SparkSession available as 'spark'.
Connected. Call `.close()` to terminate connection gracefully.

You should have already created the sales_fg and exogenous_fg feature groups by running the hsfs/basics/feature-engineering.ipynb notebook.

# Pin the versions explicitly; omitting `version` raises a VersionWarning and defaults to version 1
sales_fg = fs.get_feature_group('sales_fg', version=1)
exogenous_fg = fs.get_feature_group('exogenous_fg', version=1)

# Join all sales features with the exogenous features, leaving out exogenous_fg's is_holiday column.
# The result is an hsfs Query (not yet materialized), which td.save() accepts below.
df = sales_fg.select_all().join(exogenous_fg.select_except(["is_holiday"]))
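For intuition about what the joined query returns, here is a plain-Python sketch of an inner join on shared key columns that drops an excluded column from the right-hand side. According to the hsfs documentation, `Query.join` performs an inner join by default and, when no `on` keys are given, joins on the primary keys the two feature groups share. The key names `store` and `date` and all row values below are hypothetical, and the real query is executed by Spark in the feature store, not like this.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row], keys: Sequence[str],
               exclude: Sequence[str] = ()) -> List[Row]:
    """Inner-join two lists of rows on `keys`, dropping `exclude` columns from `right`."""
    # Index the right-hand rows by their join-key tuple.
    index: Dict[tuple, List[Row]] = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined: List[Row] = []
    for lrow in left:
        # Rows without a match on the right are dropped (inner-join semantics).
        for rrow in index.get(tuple(lrow[k] for k in keys), []):
            merged = dict(lrow)
            # Add right-hand features, skipping the join keys and excluded columns.
            for col, val in rrow.items():
                if col not in keys and col not in exclude:
                    merged[col] = val
            joined.append(merged)
    return joined


# Hypothetical rows; the real feature groups' schemas and primary keys may differ.
sales = [
    {"store": 1, "date": "2021-01-01", "weekly_sales": 100.0, "is_holiday": False},
    {"store": 2, "date": "2021-01-01", "weekly_sales": 200.0, "is_holiday": True},
]
exogenous = [
    {"store": 1, "date": "2021-01-01", "temperature": 5.0, "is_holiday": False},
]

rows = inner_join(sales, exogenous, keys=["store", "date"], exclude=["is_holiday"])
print(rows)
```

Store 2 has no exogenous row, so the inner join drops it; excluding `is_holiday` on the right keeps a single copy of that column, presumably the reason the notebook uses `select_except(["is_holiday"])`.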

Set coalesce=True when creating the training dataset so that it is written as a single CSV file instead of one file per Spark partition.

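# Storage connector pointing at the project's Training_Datasets directory
# (named td_connector rather than sc, which PySpark notebooks use for the SparkContext)
td_connector = fs.get_storage_connector("demo_fs_meb10000_Training_Datasets")
td = fs.create_training_dataset(name="sales_model_one",
                               description="Single CSV file to train the sales model",
                               data_format="csv",
                               coalesce=True,  # write one CSV file instead of one per partition
                               version=2,
                               storage_connector=td_connector)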

td.save(df)
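With coalesce left at its default (False), Spark writes the training data as several part files, one per partition. Setting coalesce=True spares you the manual step sketched below: stitching those part files into one CSV with a single header. This is an illustrative standard-library sketch only; the `part-*.csv` naming and the assumption that every part file starts with the same header row describe typical Spark CSV output, and `merge_part_files` is a hypothetical helper that hsfs neither needs nor provides.

```python
import csv
import glob
import os
import tempfile
from typing import List


def merge_part_files(part_dir: str, out_path: str) -> int:
    """Concatenate Spark-style part-*.csv files into one CSV with a single header.

    Returns the number of data rows written. Assumes every non-empty part file
    starts with the same header row.
    """
    part_paths = sorted(glob.glob(os.path.join(part_dir, "part-*.csv")))
    header: List[str] = []
    n_rows = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in part_paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # empty partition file: nothing to copy
                if not header:
                    # First non-empty part: write its header once.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"header mismatch in {path}")
                # Copy the data rows, skipping this part's own header.
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical partition files standing in for Spark's per-partition output.
    parts = {
        "part-00000.csv": [["store", "weekly_sales"], ["1", "100.0"]],
        "part-00001.csv": [["store", "weekly_sales"], ["2", "200.0"]],
    }
    for name, part_rows in parts.items():
        with open(os.path.join(tmp, name), "w", newline="") as f:
            csv.writer(f).writerows(part_rows)

    merged_path = os.path.join(tmp, "merged.csv")
    written = merge_part_files(tmp, merged_path)
    with open(merged_path, newline="") as f:
        merged_rows = list(csv.reader(f))
```

The trade-off runs the other way too: coalescing into a single partition means one Spark task writes every row, so for large training datasets the single-file output can take longer to produce than the default partitioned write.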