myfm.utils.encoders.MultipleValuesToSparseEncoder

class myfm.utils.encoders.MultipleValuesToSparseEncoder(items: typing.Iterable[str], min_freq: int = 1, sep: str = ',', normalize: bool = True, handle_unknown: typing_extensions.Literal[create, ignore, raise] = 'create')[source]

Bases: myfm.utils.encoders.categorical.CategoryValueToSparseEncoder[str]

Encoder that N-hot encodes lists of items into a sparse matrix representation.

__init__(items: typing.Iterable[str], min_freq: int = 1, sep: str = ',', normalize: bool = True, handle_unknown: typing_extensions.Literal[create, ignore, raise] = 'create')[source]

Construct the encoder from an iterable of strings, each of which is a list of items joined by sep.

Parameters
  • items (Iterable[str]) – Iterable of strings, each of which is a concatenated list of possibly multiple items.

  • min_freq (int, optional) – The minimum frequency an item must have to be retained in the list of known items. Defaults to 1.

  • sep (str, optional) – The separator used to split each string back into a list of items. Defaults to ‘,’.

  • normalize (bool, optional) – If True, each non-zero entry in the encoded matrix has the value 1 / N ** 0.5, where N is the number of non-zero entries in that row. Defaults to True.

  • handle_unknown (Literal["create", "ignore", "raise"], optional) – How to handle previously unseen values during encoding. If “create”, there is a single category named “__UNK__” for unknown values, and it is treated as the 0th category. If “ignore”, such an item is ignored. If “raise”, a KeyError is raised. Defaults to “create”.
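
The parameters above can be illustrated with a minimal pure-Python sketch of N-hot encoding. This is not the library's implementation: the helper names fit_vocabulary and encode_row are hypothetical, and counting each item once per row and assigning column indices in sorted order are assumptions made for the example. It only mirrors the documented behaviour of min_freq, sep, normalize, and handle_unknown="create".

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def fit_vocabulary(
    items: Iterable[str], min_freq: int = 1, sep: str = ","
) -> Dict[str, int]:
    """Map each item seen in at least `min_freq` rows to a column index.

    Column 0 is left free for unknown values ("__UNK__"), mirroring the
    documented handle_unknown="create" behaviour.
    """
    counts = Counter(
        token
        for row in items
        for token in set(row.split(sep))  # assumption: count once per row
        if token  # skip empty strings produced by the split
    )
    kept = sorted(token for token, freq in counts.items() if freq >= min_freq)
    return {token: index for index, token in enumerate(kept, start=1)}


def encode_row(
    row: str, vocabulary: Dict[str, int], sep: str = ",", normalize: bool = True
) -> List[Tuple[int, float]]:
    """Return the (column, value) pairs of one encoded row.

    Items missing from `vocabulary` fall into column 0. With normalize=True
    every non-zero entry is 1 / N ** 0.5, where N is the number of non-zero
    entries in the row.
    """
    columns = sorted({vocabulary.get(token, 0) for token in row.split(sep) if token})
    if not columns:
        return []
    weight = 1.0 / len(columns) ** 0.5 if normalize else 1.0
    return [(column, weight) for column in columns]


vocabulary = fit_vocabulary(["a,b", "b,c", "b"])
known_row = encode_row("a,b", vocabulary)    # two entries, each 1 / 2 ** 0.5
unknown_row = encode_row("b,z", vocabulary)  # "z" is unseen, so it maps to column 0
```

With normalize=True every encoded row has unit Euclidean norm regardless of how many items it lists, so rows with many items do not dominate rows with few.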

Methods

__init__(items[, min_freq, sep, normalize, ...])

Construct the encoder from an iterable of strings, each of which is a list of items joined by sep.

names()

Description of each non-zero entry.

to_sparse(items)

names() → List[str]

Description of each non-zero entry.