myfm.utils.benchmark_data.MovieLens100kDataManager

class myfm.utils.benchmark_data.MovieLens100kDataManager(zippath: Optional[pathlib.Path] = None)

Bases: myfm.utils.benchmark_data.loader_base.MovieLensBase

The data manager for the MovieLens 100k dataset.

__init__(zippath: Optional[pathlib.Path] = None)

Methods

`__init__`([zippath])
`genres`()
`load_movie_info`()	Load movie meta-information.
`load_rating_all`()	Load the entire rating dataset.
`load_rating_kfold_split`(K, fold[, random_state])	Load the entire dataset and split it into train/test set.
`load_rating_predefined_split`(fold)	Read the pre-defined train/test split.
`load_user_info`()	Load user meta-information.

Attributes

`DEFAULT_PATH`
`DOWNLOAD_URL`

load_movie_info() → pandas.core.frame.DataFrame

Load movie meta-information.

Returns: A DataFrame containing meta-information (id, title, release_date, url, genres) about the movies. Multiple genres for a single movie are concatenated with “|”.
Return type: pd.DataFrame
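
Because multiple genres are stored as a single “|”-joined string, downstream code usually has to split them again. The following is a minimal sketch of that step in plain Python. The rows in `sample_movies` and the helper `split_genres` are hypothetical and only mimic the documented columns; they are not taken from the dataset or from myfm.

```python
from typing import Dict, List

# Hypothetical rows that mimic the documented columns; the values are illustrative only.
sample_movies = [
    {"id": 1, "title": "Movie A", "genres": "Animation|Comedy"},
    {"id": 2, "title": "Movie B", "genres": "Drama"},
]


def split_genres(genres: str) -> List[str]:
    """Split a "|"-joined genre string into a list of genre names."""
    # Drop empty pieces so that an empty string yields an empty list.
    return [genre for genre in genres.split("|") if genre]


# Map each movie id to its list of genres.
genres_by_id: Dict[int, List[str]] = {
    row["id"]: split_genres(row["genres"]) for row in sample_movies
}
```

On the DataFrame itself, the same split can be expressed with pandas as `movies["genres"].str.split("|")`, assuming `movies` holds the result of `load_movie_info()`.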

load_rating_all() → pandas.core.frame.DataFrame

Load the entire rating dataset.

Returns: All available ratings.
Return type: pd.DataFrame

load_rating_kfold_split(K: int, fold: int, random_state: Optional[int] = 0) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Load the entire dataset and split it into train and test sets using K-fold splitting.

Parameters

K (int) – K in the K-fold splitting scheme.
fold (int) – Zero-based fold index; must satisfy 0 <= fold < K.
random_state (Union[np.RandomState, int, None], optional) – Controls the random state of the split. Defaults to 0.

Returns

Train and test DataFrames.

Return type

Tuple[pd.DataFrame, pd.DataFrame]

Raises

ValueError – If the condition 0 <= fold < K is not satisfied.
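
To make the roles of `K`, `fold`, and `random_state` concrete, here is a minimal sketch of a K-fold split in plain Python. It is not the implementation used by myfm: the function name `kfold_split`, the use of the standard-library `random` module, and the strided assignment of shuffled indices to folds are all assumptions made for illustration. Only the parameter names and the `ValueError` condition come from the documentation above.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def kfold_split(
    rows: Sequence[T], K: int, fold: int, random_state: int = 0
) -> Tuple[List[T], List[T]]:
    """Return (train, test) where the `fold`-th of K parts is the test set.

    Illustrative sketch only; not myfm's actual splitting code.
    """
    if not 0 <= fold < K:
        raise ValueError("fold must satisfy 0 <= fold < K")

    # Shuffle the row indices reproducibly using the given seed.
    indices = list(range(len(rows)))
    random.Random(random_state).shuffle(indices)

    # Every K-th shuffled index, starting at `fold`, goes to the test set.
    test_indices = set(indices[fold::K])

    train = [row for i, row in enumerate(rows) if i not in test_indices]
    test = [row for i, row in enumerate(rows) if i in test_indices]
    return train, test
```

With the same `random_state`, the K test sets obtained for `fold = 0, ..., K - 1` are disjoint and together cover every row, which is the property that makes the folds usable for cross-validation.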

load_rating_predefined_split(fold: int) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Read the pre-defined train/test split. Fold index ranges from 1 to 5.

Parameters: fold (int) – specifies the fold index.
Returns: Train and test DataFrames.
Return type: Tuple[pd.DataFrame, pd.DataFrame]
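
Note that the fold index here is one-based (1 to 5), unlike the zero-based index of `load_rating_kfold_split`. The MovieLens 100k archive ships its predefined splits as file pairs named `u1.base`/`u1.test` through `u5.base`/`u5.test`. The sketch below maps a fold index to such a pair. The helper name `predefined_split_members`, the `ml-100k/` path prefix, and the choice to raise `ValueError` for an out-of-range fold are assumptions for illustration, not documented behaviour of this class.

```python
from typing import Tuple


def predefined_split_members(fold: int) -> Tuple[str, str]:
    """Return the (train, test) file names for a predefined fold.

    Hypothetical helper; the path prefix and the error type are assumptions.
    """
    # The predefined folds are numbered from 1 to 5 inclusive.
    if not 1 <= fold <= 5:
        raise ValueError("fold must be between 1 and 5")
    return f"ml-100k/u{fold}.base", f"ml-100k/u{fold}.test"
```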

load_user_info() → pandas.core.frame.DataFrame

Load user meta-information.

Returns: User information.
Return type: pd.DataFrame