a-pandas-ex-string-to-dtypes 0.1

What is it used for?
Converts a pandas DataFrame/Series with dtype str/string/object to the best available dtypes.

pip install a-pandas-ex-string-to-dtypes

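from a_pandas_ex_string_to_dtypes import pd_add_string_to_dtypes
import pandas as pd

# Assumption: calling the imported function registers ds_string_to_best_dtype
# on DataFrame/Series; the import is otherwise unused in this example.
pd_add_string_to_dtypes()

df = pd.read_csv("https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv")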



     PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0              1         0       3  ...   7.2500    NaN         S
1              2         1       1  ...  71.2833    C85         C
2              3         1       3  ...   7.9250    NaN         S
3              4         1       1  ...  53.1000   C123         S
4              5         0       3  ...   8.0500    NaN         S
..           ...       ...     ...  ...      ...    ...       ...
886          887         0       2  ...  13.0000    NaN         S
887          888         1       1  ...  30.0000    B42         S
888          889         0       3  ...  23.4500    NaN         S
889          890         1       1  ...  30.0000   C148         C
890          891         0       3  ...   7.7500    NaN         Q
[891 rows x 12 columns]

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

dfstring = pd.concat(
    [df[x].astype("string") for x in df.columns], axis=1, ignore_index=True
)
dfstring.columns = df.columns  # ignore_index=True replaced the labels with 0..11

    PassengerId Survived Pclass  ...     Fare Cabin Embarked
0             1        0      3  ...     7.25  <NA>        S
1             2        1      1  ...  71.2833   C85        C
2             3        1      3  ...    7.925  <NA>        S
3             4        1      1  ...     53.1  C123        S
4             5        0      3  ...     8.05  <NA>        S
..          ...      ...    ...  ...      ...   ...      ...
886         887        0      2  ...     13.0  <NA>        S
887         888        1      1  ...     30.0   B42        S
888         889        0      3  ...    23.45  <NA>        S
889         890        1      1  ...     30.0  C148        C
890         891        0      3  ...     7.75  <NA>        Q
[891 rows x 12 columns]

PassengerId    string
Survived       string
Pclass         string
Name           string
Sex            string
Age            string
SibSp          string
Parch          string
Ticket         string
Fare           string
Cabin          string
Embarked       string
dtype: object
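
An all-string frame like the one above is usually much heavier in memory than the same data stored as small integers or categories, which is the practical reason for converting back. The snippet below is a rough illustration using plain pandas only; the sample values and variable names are made up for the demonstration and do not come from the package.

```python
import pandas as pd

# Hypothetical sample data: 1000 short numeric strings with only 3 distinct values.
values = pd.Series([str(i % 3) for i in range(1000)], dtype="string")

# Bytes used by the same data under three different dtypes.
as_string = values.memory_usage(deep=True)
as_uint8 = pd.to_numeric(values.astype(object), downcast="unsigned").memory_usage(deep=True)
as_category = values.astype("category").memory_usage(deep=True)

print(as_string, as_uint8, as_category)
```

The exact byte counts depend on the pandas version and string backend, so none are quoted here.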

converted = dfstring.ds_string_to_best_dtype()



     PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0              1         0       3  ...   7.2500   <NA>         S
1              2         1       1  ...  71.2833    C85         C
2              3         1       3  ...   7.9250   <NA>         S
3              4         1       1  ...  53.1000   C123         S
4              5         0       3  ...   8.0500   <NA>         S
..           ...       ...     ...  ...      ...    ...       ...
886          887         0       2  ...  13.0000   <NA>         S
887          888         1       1  ...  30.0000    B42         S
888          889         0       3  ...  23.4500   <NA>         S
889          890         1       1  ...  30.0000   C148         C
890          891         0       3  ...   7.7500   <NA>         Q
[891 rows x 12 columns]

PassengerId      uint16
Survived          uint8
Pclass            uint8
Name             string
Sex            category
Age              object
SibSp             uint8
Parch             uint8
Ticket           object
Fare            float64
Cabin          category
Embarked       category
dtype: object
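
For readers curious how this kind of inference can work, here is a minimal sketch using only plain pandas. It is not the package's implementation: the function name `sketch_best_dtypes` and the `max_unique_ratio` threshold are hypothetical, and its results will differ from the output above in places (for example, it does not reproduce the `object` result for `Age` and `Ticket`).

```python
import pandas as pd


def sketch_best_dtypes(frame: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: guess a compact dtype for each string column."""
    out = {}
    for name in frame.columns:
        col = frame[name].astype(object)
        try:
            # Succeeds only if every value parses as a number.
            numeric = pd.to_numeric(col)
        except (ValueError, TypeError):
            numeric = None

        if numeric is not None:
            if pd.api.types.is_integer_dtype(numeric):
                # Pick the smallest integer type that can hold all values.
                kind = "unsigned" if (numeric >= 0).all() else "integer"
                out[name] = pd.to_numeric(numeric, downcast=kind)
            else:
                # Keep floats at full precision.
                out[name] = numeric
        elif col.nunique() / max(len(col), 1) <= max_unique_ratio:
            # Few distinct values relative to the row count: use a category.
            out[name] = col.astype("category")
        else:
            out[name] = col.astype("string")
    return pd.DataFrame(out, index=frame.index)


# Made-up demo data, loosely modelled on the Titanic columns above.
demo = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4"],
        "fare": ["7.25", "71.2833", "7.925", "53.1"],
        "sex": ["male", "female", "female", "male"],
        "name": ["A", "B", "C", "D"],
    }
)
result = sketch_best_dtypes(demo)
print(result.dtypes)
```

The cardinality threshold is an arbitrary choice for the sketch; a real implementation would likely also need to decide how to treat missing values and mixed columns.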


Parameters:
    df: Union[pd.DataFrame, pd.Series]

Returns:
    Union[pd.DataFrame, pd.Series]


