Create a single (coalesced) CSV file for Training Data
import hsfs
connection = hsfs.connection()
fs = connection.get_feature_store()
Starting Spark application
ID | YARN Application ID | Kind | State | Spark UI | Driver log |
---|---|---|---|---|---|
22 | application_1624018572921_0012 | pyspark | idle | Link | Link |
SparkSession available as 'spark'.
Connected. Call `.close()` to terminate connection gracefully.
You should have already created the sales_fg and exogenous_fg feature groups by running the hsfs/basics/feature-engineering.ipynb notebook.
sales_fg = fs.get_feature_group('sales_fg')
exogenous_fg = fs.get_feature_group('exogenous_fg')
df = sales_fg.select_all().join(exogenous_fg.select_except(["is_holiday"]))
VersionWarning: No version provided for getting feature group `sales_fg`, defaulting to `1`.
VersionWarning: No version provided for getting feature group `exogenous_fg`, defaulting to `1`.
Set `coalesce=True` when creating the training dataset so that it is written as a single CSV file. Without it, Spark writes one file per partition.
sc = fs.get_storage_connector("demo_fs_meb10000_Training_Datasets")
td = fs.create_training_dataset(name="sales_model_one",
                                description="Single CSV file to train the sales model",
                                data_format="csv",
                                coalesce=True,
                                version=2,
                                storage_connector=sc)
td.save(df)
<hsfs.training_dataset.TrainingDataset object at 0x7f879de41790>
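Once the save job has finished, the training dataset directory should contain a single data file. The sketch below shows one way to check for that file and read it with the Python standard library. It rests on several assumptions that are not part of this notebook: the dataset directory has been copied or mounted to a local path, it follows Spark's usual output layout (one `part-*.csv` file next to marker files such as `_SUCCESS`), and the file starts with a header row. The helper name `read_single_csv` is hypothetical and is not part of hsfs.

```python
import csv
import glob
import os


def read_single_csv(dataset_dir):
    """Return (header, rows) from the only part-*.csv file in dataset_dir.

    Hypothetical helper, not part of hsfs. Assumes a local directory in
    Spark's usual output layout and a CSV file that begins with a header
    row. Raises ValueError unless exactly one part file is present.
    """
    part_files = sorted(glob.glob(os.path.join(dataset_dir, "part-*.csv")))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file in {dataset_dir!r}, "
            f"found {len(part_files)}"
        )
    with open(part_files[0], newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        # Empty file: no header and no data rows.
        return [], []
    return rows[0], rows[1:]


# Example usage (the path is a placeholder for a local copy of the dataset):
# header, rows = read_single_csv("/tmp/sales_model_one_2")
# print(header, len(rows))
```

The check on the number of part files makes the helper fail loudly if the dataset was written without coalescing, rather than silently reading only part of the data.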