cudf.core.column.string.StringMethods.extract
- StringMethods.extract(pat: str, flags: int = 0, expand: bool = True) -> SeriesOrIndex
Extract capture groups in the regex pat as columns in a DataFrame.
For each subject string in the Series, extract groups from the first match of regular expression pat.
- Parameters
- pat : str
Regular expression pattern with capturing groups.
- flags : int, default 0 (no flags)
Flags to pass through to the regex engine (e.g. re.MULTILINE).
- expand : bool, default True
If True, return a DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group, or a DataFrame if there are multiple capture groups.
- Returns
- DataFrame or Series/Index
A DataFrame with one row for each subject string, and one column for each group. If expand=False and pat has only one capture group, then return a Series/Index.
Notes
The flags parameter currently only supports re.DOTALL and re.MULTILINE.
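As a rough illustration of what the two supported flags change, the sketch below uses Python's standard `re` module rather than cuDF itself. The helper `first_group` is hypothetical and exists only for this example; cuDF's regex engine may differ from `re` in other details, so treat this as a description of the flag semantics, not of the cuDF implementation.

```python
import re

def first_group(pat, text, flags=0):
    """Hypothetical helper: group 1 of the first match, or None if no match."""
    m = re.search(pat, text, flags)
    return m.group(1) if m else None

text = "first line\nsecond line"

# Without flags, '^' anchors only at the very start of the string.
no_flag_anchor = first_group(r"^(second)", text)
# With re.MULTILINE, '^' also matches at the start of each line.
multiline_anchor = first_group(r"^(second)", text, re.MULTILINE)

# Without flags, '.' does not match a newline character.
no_flag_dot = first_group(r"first(.+)second", text)
# With re.DOTALL, '.' matches newlines too.
dotall_dot = first_group(r"first(.+)second", text, re.DOTALL)

print(no_flag_anchor, multiline_anchor, no_flag_dot, repr(dotall_dot))
```

With cuDF, the same flag constants would be passed through the `flags` argument of `extract`.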
Examples
>>> import cudf
>>> s = cudf.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
      0     1
0     a     1
1     b     2
2  <NA>  <NA>
A pattern with one group will return a DataFrame with one column if expand=True.
>>> s.str.extract(r'[ab](\d)', expand=True)
      0
0     1
1     2
2  <NA>
A pattern with one group will return a Series if expand=False.
>>> s.str.extract(r'[ab](\d)', expand=False)
0       1
1       2
2    <NA>
dtype: object
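The "first match only" behaviour and the row/column layout described above can be mimicked on the CPU with Python's standard `re` module. This is a minimal sketch of the semantics, not cuDF's implementation; `extract_first` is a hypothetical helper, and `None` stands in for `<NA>`.

```python
import re

def extract_first(strings, pat, flags=0):
    """Hypothetical helper: one row per string, one entry per capture group.

    Only the first match in each string is used; strings with no match
    produce a row of None values.
    """
    regex = re.compile(pat, flags)
    rows = []
    for s in strings:
        m = regex.search(s)  # first match only
        rows.append(m.groups() if m else (None,) * regex.groups)
    return rows

# Same data and pattern as the first example: 'c3' has no match.
rows = extract_first(['a1', 'b2', 'c3'], r'([ab])(\d)')

# A string with two possible matches still yields a single row.
first_only = extract_first(['a1b2'], r'([ab])(\d)')

print(rows)
print(first_only)
```

Each tuple corresponds to one row of the result, and each tuple position to one capture-group column.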