myfm.utils.benchmark_data.MovieLens100kDataManager
- class myfm.utils.benchmark_data.MovieLens100kDataManager(zippath: Optional[pathlib.Path] = None)
Bases: myfm.utils.benchmark_data.loader_base.MovieLensBase
The data manager for the MovieLens 100k dataset.
- __init__(zippath: Optional[pathlib.Path] = None)
Methods
__init__([zippath])
genres()
load_movie_info() – Load movie meta information.
load_rating_all() – Load the entire rating dataset.
load_rating_kfold_split(K, fold[, random_state]) – Load the entire dataset and split it into train/test set.
load_rating_predefined_split(fold) – Read the pre-defined train/test split.
load_user_info() – Load user meta information.
Attributes
DEFAULT_PATH
DOWNLOAD_URL
- load_movie_info() → pandas.core.frame.DataFrame
Load movie meta information.
- Returns
A dataframe containing meta-information (id, title, release_date, url, genres) about the movies. Multiple genres for a movie are concatenated with “|”.
- Return type
pd.DataFrame
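Because multiple genres arrive as one “|”-separated string, downstream code usually needs to split them again. The following is a minimal sketch using only the standard library; the helper names split_genres and count_genres are hypothetical and not part of myfm, and the example genre strings are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List


def split_genres(genres: str) -> List[str]:
    """Split a '|'-concatenated genre string into individual genre names."""
    # Drop empty pieces so that an empty string yields an empty list.
    return [g for g in genres.split("|") if g]


def count_genres(genre_column: Iterable[str]) -> Counter:
    """Count how often each genre occurs across an iterable of genre strings."""
    counts: Counter = Counter()
    for genres in genre_column:
        counts.update(split_genres(genres))
    return counts
```

Assuming the returned dataframe exposes the column under the name genres, as the description above suggests, count_genres(movie_info["genres"]) would tally genres over all movies.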
- load_rating_all() → pandas.core.frame.DataFrame
Load the entire rating dataset.
- Returns
A dataframe containing all the available ratings.
- Return type
pd.DataFrame
- load_rating_kfold_split(K: int, fold: int, random_state: Optional[int] = 0) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Load the entire dataset and split it into train and test sets using K-fold splitting.
- Parameters
K (int) – K in the K-fold splitting scheme.
fold (int) – Fold index; must satisfy 0 <= fold < K.
random_state (Union[np.RandomState, int, None], optional) – Controls the random state of the split.
- Returns
train and test dataframes.
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
- Raises
ValueError – When 0 <= fold < K is not met.
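To make the roles of K, fold and random_state concrete, here is a minimal pure-Python sketch of a K-fold split over a list of rows. It is not myfm's implementation, which may shuffle and partition differently; the function name kfold_split is hypothetical, and only the 0 <= fold < K check and the ValueError mirror the documented behaviour.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def kfold_split(
    rows: Sequence[T], K: int, fold: int, random_state: Optional[int] = 0
) -> Tuple[List[T], List[T]]:
    """Return (train, test), where test is the fold-th of K disjoint parts."""
    if not 0 <= fold < K:
        raise ValueError("fold must satisfy 0 <= fold < K")
    # Shuffle the row indices reproducibly; the same seed gives the same folds.
    indices = list(range(len(rows)))
    random.Random(random_state).shuffle(indices)
    # Every K-th shuffled index, starting at offset `fold`, goes to the test set.
    test_idx = set(indices[fold::K])
    train = [rows[i] for i in range(len(rows)) if i not in test_idx]
    test = [rows[i] for i in range(len(rows)) if i in test_idx]
    return train, test
```

With a fixed random_state, calling the function once for each fold in range(K) yields test sets that are disjoint and together cover every row exactly once.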
- load_rating_predefined_split(fold: int) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Read the pre-defined train/test split. The fold index ranges from 1 to 5.
- Parameters
fold (int) – Fold index, from 1 to 5.
- Returns
train and test dataframes.
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
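Note that the pre-defined folds are numbered from 1, whereas load_rating_kfold_split counts folds from 0. A small sketch of iterating over all five pre-defined splits follows; the helper name load_all_predefined_splits is hypothetical, and it accepts any object that provides load_rating_predefined_split(fold), so it can be tried without downloading the dataset.

```python
from typing import Any, Dict, Tuple


def load_all_predefined_splits(data_manager: Any) -> Dict[int, Tuple[Any, Any]]:
    """Collect the (train, test) pair for each pre-defined fold, keyed by fold index."""
    splits: Dict[int, Tuple[Any, Any]] = {}
    # Pre-defined fold indices run from 1 to 5 inclusive.
    for fold in range(1, 6):
        splits[fold] = data_manager.load_rating_predefined_split(fold)
    return splits
```

With the real class, this would presumably be called as load_all_predefined_splits(MovieLens100kDataManager()), after importing MovieLens100kDataManager from myfm.utils.benchmark_data.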