Pandas Example with HopsFS
Pandas - read/write CSV files with HopsFS (HDFS)
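from hops import hdfs
from hops import pandas_helper as pandas  # HopsFS-aware CSV helpers, not the pandas package itself
import pandas as pd

# Column names for the census "adult" dataset; adult.data has no header row.
features = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Marital Status",
            "Occupation", "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
            "Hours per week", "Country", "Target"]

# Read the CSV from the project's HopsFS directory. The regex separator strips
# whitespace around each comma, and "?" is parsed as a missing value.
train_data = pandas.read_csv(
    hdfs.project_path() + "/TourData/census/adult.data",
    names=features, sep=r'\s*,\s*', engine='python', na_values="?")
train_data.info()

# Write the DataFrame back to HopsFS, first with a relative path, then with a full path.
pandas.write_csv("Resources/relative-path.csv", train_data)
pandas.write_csv(hdfs.project_path() + "/Resources/full-path.csv", train_data)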
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Age             32561 non-null  int64
 1   Workclass       30725 non-null  object
 2   fnlwgt          32561 non-null  int64
 3   Education       32561 non-null  object
 4   Education-Num   32561 non-null  int64
 5   Marital Status  32561 non-null  object
 6   Occupation      30718 non-null  object
 7   Relationship    32561 non-null  object
 8   Race            32561 non-null  object
 9   Sex             32561 non-null  object
 10  Capital Gain    32561 non-null  int64
 11  Capital Loss    32561 non-null  int64
 12  Hours per week  32561 non-null  int64
 13  Country         31978 non-null  object
 14  Target          32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB
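The non-null counts above are lower for Workclass, Occupation and Country because `na_values="?"` turns the dataset's "?" placeholders into missing values, while the regex separator removes the space that follows each comma. The sketch below reproduces that parsing with plain pandas on an in-memory sample, so it can be run without a Hopsworks cluster. The three sample rows and the shortened column list are made up for illustration, and the sketch assumes that the `read_csv` helper in `hops.pandas_helper` passes these keyword arguments through to `pandas.read_csv` unchanged.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as adult.data:
# fields separated by a comma plus a space, "?" marking a missing value.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors\n"
    "50, ?, 83311, Bachelors\n"
    "38, Private, 215646, HS-grad\n"
)

# Shortened, illustrative column list (the notebook uses 15 columns).
columns = ["Age", "Workclass", "fnlwgt", "Education"]

df = pd.read_csv(
    sample,
    names=columns,      # the data has no header row, so names are supplied
    sep=r"\s*,\s*",     # regex separator: a comma with optional whitespace around it
    engine="python",    # regex separators require the Python parsing engine
    na_values="?",      # treat "?" as a missing value
)

# Count the missing values per column, analogous to reading the
# "Non-Null Count" column of DataFrame.info().
missing_per_column = df.isna().sum()
print(df)
print(missing_per_column)
```

Without the regex separator, the string fields would keep their leading space (" ?" instead of "?"), so the `na_values` match would not apply.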