Snowflake: getting started

Get started with Snowflake and the Hopsworks Feature Store

This tutorial notebook will help you get started working with the Hopsworks Feature Store and Snowflake.

  • We assume that you already have a Snowflake account. If not, please follow the tutorial to set up your Snowflake account and load the telecom churn dataset.

Create Snowflake connector in Hopsworks

Create a Snowflake storage connector in a feature store.

POST /hopsworks-api/api/project/<project id>/featurestores/<feature store id>/storageconnectors HTTP/1.1
{
 "name": "sfconnector",
 "description": "snowflake connector",
 "type": "featurestoreSnowflakeConnectorDTO",
 "storageConnectorType": "SNOWFLAKE",
 "featurestoreId": <feature storre id>,
 "url": "https://12345.west-europe.azure.snowflakecomputing.com",
 "user": "HOPSWORKS",
 "password": "snowflake password",
 "database": "ML_WORKSHOP",
 "schema": "PUBLIC",
 "warehouse": "COMPUTE_WH",
 "role": "HOPSWORKS_ROLE"
}
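
For example, a minimal sketch of issuing this request with Python's requests library, assuming you have generated a Hopsworks API key and that ApiKey authorization is enabled on your cluster (the host, project id, feature store id, and key below are placeholders):

import requests

HOPSWORKS_HOST = "https://my-hopsworks-instance"  # placeholder host
PROJECT_ID = 119                                  # placeholder project id
FEATURE_STORE_ID = 67                             # placeholder feature store id
API_KEY = "..."                                   # your Hopsworks API key

payload = {
    "name": "sfconnector",
    "description": "snowflake connector",
    "type": "featurestoreSnowflakeConnectorDTO",
    "storageConnectorType": "SNOWFLAKE",
    "featurestoreId": FEATURE_STORE_ID,
    "url": "https://12345.west-europe.azure.snowflakecomputing.com",
    "user": "HOPSWORKS",
    "password": "snowflake password",
    "database": "ML_WORKSHOP",
    "schema": "PUBLIC",
    "warehouse": "COMPUTE_WH",
    "role": "HOPSWORKS_ROLE",
}

# POST the connector definition to the feature store's storageconnectors endpoint
response = requests.post(
    f"{HOPSWORKS_HOST}/hopsworks-api/api/project/{PROJECT_ID}"
    f"/featurestores/{FEATURE_STORE_ID}/storageconnectors",
    headers={"Authorization": f"ApiKey {API_KEY}"},
    json=payload,
)
response.raise_for_status()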

The following options are required to create a Snowflake connector:

- url: the hostname for your account, in the format <account_identifier>.snowflakecomputing.com.
- user: login name for the Snowflake user.
- password: password of the Snowflake user (required if token is not set).
- token: OAuth token that can be used to access Snowflake (required if password is not set).
- database: the database to use for the session after connecting.
- schema: the schema to use for the session after connecting.

The remaining options are optional:

- warehouse: the default virtual warehouse to use for the session after connecting.
- role: the default security role to use for the session after connecting.
- table: the table that data is written to or read from.

Additional Snowflake options can be added as a list of key-value pairs in sfOptions:

{
    ...
    "table": "CUSTOMER_CHURN",
    "sfOptions": [{name:"sfTimezone", value:"spark"},
                  {name: "sfCompress", value: "true"},
                  {name: "s3MaxFileSize", value: "100MB"},
                  ...
                 ]
}

Using the connector

After creating the connector you can use it with your Python, Scala, or PySpark programs.
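
For example, a minimal PySpark sketch that reads the table through the Snowflake Spark data source, assuming the connector exposes a spark_options() method (as in recent hsfs versions) and that a SparkSession named spark is already available, as in a Hopsworks notebook:

import hsfs

connection = hsfs.connection()
fs = connection.get_feature_store()
connector = fs.get_storage_connector("sfconnector")

# Read the table via the Snowflake Spark data source using the connector's options
df = (spark.read
      .format("net.snowflake.spark.snowflake")
      .options(**connector.spark_options())
      .option("dbtable", "CUSTOMER_CHURN")
      .load())
df.show(5)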

Python

Additional options are not supported in Python. If your connector has additional options, you need to add them to sfConnectorOptions yourself, using the correct key names.

import hsfs
import snowflake.connector
import pandas as pd

# Connect to the Hopsworks Feature Store and fetch the storage connector
connection = hsfs.connection()
fs = connection.get_feature_store()
connector = fs.get_storage_connector("sfconnector")

# Build the keyword arguments for the Snowflake Python connector
sfConnectorOptions = connector.snowflake_connector_options()
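# If the connector defines additional options, add them to this dict using
# the key names expected by snowflake.connector.connect(), e.g. (an assumed
# extra setting, not part of the connector created above):
# sfConnectorOptions["login_timeout"] = 30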
ctx = snowflake.connector.connect(**sfConnectorOptions)
# Query the Snowflake data
cs = ctx.cursor()
allrows = cs.execute("""select CUSTOMER_ID, GENDER, SENIOR_CITIZEN, PARTNER, DEPENDENTS, TENURE, PHONE_SERVICE,
                               MULTIPLE_LINES, INTERNET_SERVICE, ONLINE_SECURITY, ONLINE_BACKUP, DEVICE_PROTECTION,
                               TECH_SUPPORT, STREAMING_TV, STREAMING_MOVIES, CONTRACT, PAPERLESS_BILLING,
                               PAYMENT_METHOD, MONTHLY_CHARGES, TOTAL_CHARGES, CHURN from CUSTOMER_CHURN""").fetchall()

churn = pd.DataFrame(allrows)
churn.columns=['Customer_Id','Gender','Senior_Citizen','Partner','Dependents','Tenure','Phone_Service',
               'Multiple_Lines','Internet_Service','Online_Security','Online_Backup','Device_Protection',
               'Tech_Support','Streaming_Tv','Streaming_Movies','Contract','Paperless_Billing',
               'Payment_Method','Monthly_Charges','Total_Charges', 'Churn']

pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 10)         # Keep the output on one page
churn
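
From here you can persist the DataFrame to the feature store. A minimal sketch, assuming hsfs's create_feature_group API and a hypothetical feature group name:

# "telcom_churn" is a hypothetical name; hsfs lowercases column names,
# so the primary key is given in lowercase
fg = fs.create_feature_group(name="telcom_churn",
                             version=1,
                             description="Telco churn features loaded from Snowflake",
                             primary_key=["customer_id"])
fg.save(churn)  # create the feature group and insert the DataFrame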