4. Create a training dataset from feature groups enabled for the online feature store
Establish a connection with your Hopsworks feature store.
import hsfs
connection = hsfs.connection()
# Get a reference to the feature store. You can also access shared feature stores by providing the feature store name.
fs = connection.get_feature_store()
Starting Spark application
ID | YARN Application ID | Kind | State | Spark UI | Driver log |
---|---|---|---|---|---|
3 | application_1619309085643_0004 | pyspark | idle | Link | Link |
SparkSession available as 'spark'.
Connected. Call `.close()` to terminate connection gracefully.
Get feature groups
card_transactions_10m_agg = fs.get_feature_group("card_transactions_10m_agg", version=1)
card_transactions_1h_agg = fs.get_feature_group("card_transactions_1h_agg", version=1)
card_transactions_12h_agg = fs.get_feature_group("card_transactions_12h_agg", version=1)
Create training dataset
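# Select the aggregate features from each window and join them into one query.
# No join key is passed, so the join relies on the primary keys the feature groups share.
query = card_transactions_10m_agg.select(["stdev_amt_per_10m", "avg_amt_per_10m", "num_trans_per_10m"])\
    .join(card_transactions_1h_agg.select(["stdev_amt_per_1h", "avg_amt_per_1h", "num_trans_per_1h"]))\
    .join(card_transactions_12h_agg.select(["stdev_amt_per_12h", "avg_amt_per_12h", "num_trans_per_12h"]))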
td_meta = fs.create_training_dataset(name="card_fraud_model",
                                     description="Training dataset to train card fraud model",
                                     data_format="tfrecord",
                                     version=1)
td_meta.save(query)
<hsfs.training_dataset.TrainingDataset object at 0x7f49f3111410>
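To build intuition for what the chained `.join()` calls contribute to the training dataset, here is a minimal plain-Python sketch. It is not hsfs code. It assumes the three feature groups share a primary key (a hypothetical card identifier) and that the join is an inner join, which we believe is the hsfs default. The helper `inner_join` and all data values are invented for illustration.

```python
# Toy stand-ins for the three feature groups, keyed by a hypothetical
# primary key (a card identifier). All values are made up.
agg_10m = {
    "card_a": {"avg_amt_per_10m": 12.0, "num_trans_per_10m": 2},
    "card_b": {"avg_amt_per_10m": 40.5, "num_trans_per_10m": 1},
}
agg_1h = {
    "card_a": {"avg_amt_per_1h": 15.2, "num_trans_per_1h": 5},
    "card_b": {"avg_amt_per_1h": 38.0, "num_trans_per_1h": 3},
    "card_c": {"avg_amt_per_1h": 7.5, "num_trans_per_1h": 1},
}
agg_12h = {
    "card_a": {"avg_amt_per_12h": 14.1, "num_trans_per_12h": 20},
    "card_c": {"avg_amt_per_12h": 9.9, "num_trans_per_12h": 4},
}


def inner_join(left, right):
    """Keep only keys present in both inputs and merge their feature dicts."""
    return {key: {**left[key], **right[key]} for key in left if key in right}


# Chain the joins in the same order as the query above: 10m, then 1h, then 12h.
training_rows = inner_join(inner_join(agg_10m, agg_1h), agg_12h)

for card, features in training_rows.items():
    print(card, features)
```

Under these assumptions only `card_a` survives, because it is the only key present in all three inputs. In the real pipeline this would mean that a card missing from any one aggregation window contributes no row to the training dataset.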