redshift-connector 2.1.3

Creator: bradpython12

Last updated:

Add to Cart

Description:

redshiftconnector 2.1.3

redshift_connector is the Amazon Redshift connector for
Python. Easy integration with pandas and numpy, as well as support for numerous Amazon Redshift specific features help you get the most out of your data
Supported Amazon Redshift features include:

IAM authentication
Identity provider (IdP) authentication
Redshift specific data types

This pure Python connector implements Python Database API Specification 2.0.

Getting Started

Install from Binary


Package Manager
Downloads
Installation Command



PyPi

pip install redshift_connector

Conda

conda install -c conda-forge redshift_connector





Install from Source
You may install from source by cloning this repository.
$ git clone https://github.com/aws/amazon-redshift-python-driver.git
$ cd redshift_connector
$ pip install .


Tutorials

001 - Connecting to Amazon Redshift
002 - Data Science Library Integrations
003 - Amazon Redshift Feature Support
004 - Amazon Redshift Datatypes

We are working to add more documentation and would love your feedback. Please reach out to the team by opening an issue or starting a discussion to help us fill in the gaps in our documentation.


Integrations
redshift_connector integrates with various open source projects to provide an interface to Amazon Redshift. Please open an issue with our project to request new integrations or get support for a redshift_connector issue seen in an existing integration.

apache-airflow
querybook
sqlalchemy-redshift



Basic Example
import redshift_connector

# Connects to Redshift cluster using AWS credentials
conn = redshift_connector.connect(
host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
database='dev',
user='awsuser',
password='my_password'
)

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
[
('One Hundred Years of Solitude', 'Gabriel García Márquez'),
('A Brief History of Time', 'Stephen Hawking')
]
)
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])


Enabling autocommit
Following the DB-API specification, autocommit is off by default. It can be turned on by using the autocommit property of the connection.
# Make sure we're not in a transaction
conn.rollback()
conn.autocommit = True
conn.run("VACUUM")
conn.autocommit = False


Configuring paramstyle
Paramstyle can be set on both a module and cursor level. When paramstyle is set on a module level e.g. redshift_connector.paramstyle = 'qmark', the user specified paramstyle is used for all subsequent cursors unless set on the cursor.
When paramstyle is set on the cursor e.g. `cursor.paramstyle = 'qmark' the user specified paramstyle is only used for that cursor object.
# setting paramstyle to qmark on a module level
redshift_connector.paramstyle = 'qmark'


with redshift_connector.connect() as conn1:
with conn1.cursor() as cursor1: # this cursor will use qmark paramstyle as it's been set on the module level
pass

with conn1.cursor() as cursor2:
# setting paramstyle to numeric on the cursor level only this cursor will use numeric paramstyle
cursor.paramstyle = 'numeric'

with conn1.cursor() as cursor3: # this cursor will use qmark paramstyle as it's been set on the module level
pass

with redshift_connector.connect() as conn2:
with conn2.cursor() as cursor1: # this cursor will use qmark paramstyle as it's been set on the module level
pass
The module level default paramstyle used is format. Valid values for paramstyle include qmark, numeric, named, format, pyformat. The below example shows how to use various paramstyles after the paramstyle is set on the cursor.
When paramstyle is set to named or pyformat, parameters must be passed as a Python dictionary to the execute() method. Other paramstyles require parameters to be passed as a Python tuple or list.
# qmark
cursor.paramstyle = 'qmark'
sql = 'insert into foo(bar, jar) VALUES(?, ?)'
cursor.execute(sql, (1, "hello world"))

# numeric
cursor.paramstyle = 'numeric'
sql = 'insert into foo(bar, jar) VALUES(:1, :2)'
cursor.execute(sql, (1, "hello world"))

# named
cursor.paramstyle = 'named'
sql = 'insert into foo(bar, jar) VALUES(:p1, :p2)'
cursor.execute(sql, {"p1":1, "p2":"hello world"})

# format
cursor.paramstyle = 'format'
sql = 'insert into foo(bar, jar) VALUES(%s, %s)'
cursor.execute(sql, (1, "hello world"))

# pyformat
cursor.paramstyle = 'pyformat'
sql = 'insert into foo(bar, jar) VALUES(%(bar)s, %(jar)s)'
cursor.execute(sql, {"bar": 1, "jar": "hello world"})


Exception Handling
redshift_connector uses the guideline for exception handling specified in the Python DB-API. For exception definitions, please see redshift_connector/error.py


Example using IAM Credentials
IAM Credentials can be supplied directly to connect(...) using an AWS profile as shown below:
import redshift_connector

# Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
conn = redshift_connector.connect(
iam=True,
database='dev',
db_user='awsuser',
password='',
user='',
cluster_identifier='examplecluster',
profile='default'
)
# ~/.aws/credentials
[default]
aws_access_key_id="my_aws_access_key_id"
aws_secret_access_key="my_aws_secret_access_key"
aws_session_token="my_aws_session_token"

# ~/.aws/config
[default]
region=us-west-2
If a region is not provided in ~/.aws/config or you would like to override its value, region may be passed to connect(...).
Alternatively, IAM credentials can be supplied directly to connect(...) using AWS credentials as shown below:
import redshift_connector

# Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
conn = redshift_connector.connect(
iam=True,
database='dev',
db_user='awsuser',
password='',
user='',
cluster_identifier='examplecluster',
access_key_id="my_aws_access_key_id",
secret_access_key="my_aws_secret_access_key",
session_token="my_aws_session_token",
region="us-east-2"
)


Integration with pandas
Retrieving query results as a pandas.DataFrame
import pandas
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
[
('One Hundred Years of Solitude', 'Gabriel García Márquez'),
('A Brief History of Time', 'Stephen Hawking')

])
cursor.execute("select * from book")
result: pandas.DataFrame = cursor.fetch_dataframe()
print(result)
>> bookname author
>> 0 One Hundred Years of Solitude Gabriel García Márquez
>> 1 A Brief History of Time Stephen Hawking
Insert data stored in a pandas.DataFrame into an Amazon Redshift table
import numpy as np
import pandas as pd

df = pd.DataFrame(
np.array(
[
["One Hundred Years of Solitude", "Gabriel García Márquez"],
["A Brief History of Time", "Stephen Hawking"],
]
),
columns=["bookname", "author‎"],
)
with conn.cursor() as cursor:
cursor.write_dataframe(df, "book")
cursor.execute("select * from book; ")
result = cursor.fetchall()


Integration with numpy
import numpy
cursor.execute("select * from book")

result: numpy.ndarray = cursor.fetch_numpy_array()
print(result)
>> [['One Hundred Years of Solitude' 'Gabriel García Márquez']
>> ['A Brief History of Time' 'Stephen Hawking']]


Query using functions
cursor.execute("SELECT CURRENT_TIMESTAMP")
print(cursor.fetchone())
>> [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)]


Connection Parameters


Name
Type
Description
Default Value
Required



access_key_id
str
The access key for the IAM role or IAM user configured for IAM database authentication
None
No

allow_db_user_override
bool
True specifies the driver uses the DbUser value from the SAML assertion while False indicates the value in the DbUser connection parameter is used
FALSE
No

app_name
str
The name of the IdP application used for authentication
None
No

auth_profile
str
The name of an Amazon Redshift Authentication profile having connection properties as JSON. See the RedshiftProperty class to learn how connection parameters should be named.
None
No

auto_create
bool
Indicates whether the user should be created if they do not exist
FALSE
No

client_id
str
The client id from Azure IdP
None
No

client_secret
str
The client secret from Azure IdP
None
No

cluster_identifier
str
The cluster identifier of the Amazon Redshift Cluster
None
No

credentials_provider
str
The IdP that will be used for authenticating with Amazon Redshift.
None
No

database
str
The name of the database to connect to
None
No

database_metadata_current_db_only
bool
Indicates if application supports multi-database datashare catalogs. Default value of True indicates application does not support multi-database datashare catalogs for backwards compatibility
TRUE
No

db_groups
list
A comma-separated list of existing database group names that the DbUser joins for the current session
None
No

db_user
str
The user ID to use with Amazon Redshift
None
No

endpoint_url
str
The Amazon Redshift endpoint url. This option is only used by AWS internal teams.
None
No

group_federation
bool
Use the IdP Groups in the Redshift. Default value False.
False
No

host
str
The hostname of Amazon Redshift cluster
None
No

iam
bool
If IAM Authentication is enabled
FALSE
No

iam_disable_cache
bool
This option specifies whether the IAM credentials are cached. By default the IAM credentials are cached. This improves performance when requests to the API gateway are throttled.
FALSE
No

idc_client_display_name
str
The client display name to be used in user consent in IdC browser auth. This is an optional value. The default value is “Amazon Redshift Python connector”.
None
No

idc_region
str
The AWS region where AWS identity center instance is located. It is required for the IdC browser auth plugin.
None
No

identity_namespace
str
The identity namespace to be used for the IdC browser auth plugin and IdP token auth plugin. It is an optional value if there is only one IdC instance existing or if default identity namespace is set on the cluster - else it is required.
None
No

idp_response_timeout
int
The timeout for retrieving SAML assertion from IdP
120
No

idp_tenant
str
The IdP tenant
None
No

issuer_url
str
The issuer url for the AWS IdC access portal. It is required for the IdC browser auth plugin.
None
No

listen_port
int
The listen port IdP will send the SAML assertion to
7890
No

login_to_rp
str
Only for AdfsCredentialsProvider. Used to specify the loginToRp when performing IdpInitiatedSignOn as apart of form based authentication.
urn:amazon:webservices
No

login_url
str
The SSO Url for the IdP
None
No

max_prepared_statements
int
The maximum number of prepared statements that can be open at once
1000
No

numeric_to_float
bool
Specifies if NUMERIC datatype values will be converted from decimal.Decimal to float. By default NUMERIC values are received as decimal.Decimal. Enabling this option is not recommended for use cases which prefer the most precision as results may be rounded. Please reference the Python docs on decimal.Decimal to see the tradeoffs between decimal.Decimal and float before enabling this option.
False
No

partner_sp_id
str
The Partner SP Id used for authentication with Ping
None
No

password
str
The password to use for authentication
None
No

port
Int
The port number of the Amazon Redshift cluster
5439
No

preferred_role
str
The IAM role preferred for the current connection
None
No

principal_arn
str
The ARN of the IAM entity (user or role) for which you are generating a policy
None
No

profile
str
The name of a profile in a AWS credentials file that contains AWS credentials.
None
No

provider_name
str
The name of the Redshift Native Auth Provider.
None
No

region
str
The AWS region where the cluster is located
None
No

role_arn
str
The Amazon Resource Name (ARN) of the role that the caller is assuming. This parameter is used by JwtCredentialsProvider. For this provider, this is a mandatory parameter.
None
No

role_session_name
str
An identifier for the assumed role session. Typically, you pass the name or identifier that is associated with the user who is using your application. That way, the temporary security credentials that your application will use are associated with that user. This parameter is used by JwtCredentialsProvider. For this provider, this is an optional parameter.
jwt_redshift_session
No

scope
str
Scope for BrowserAzureOauth2CredentialsProvider authentication.
“”
No

secret_access_key_id
str
The secret access key for the IAM role or IAM user configured for IAM database authentication
None
No

serverless_acct_id
str
The account ID of the serverless. Default value None
None
No

serverless_work_group
str
The name of work group for serverless end point. Default value None.
None
No

session_token
str
The access key for the IAM role or IAM user configured for IAM database authentication. Not required unless temporary AWS credentials are being used.
None
No

ssl
bool
If SSL is enabled
TRUE
No

ssl_insecure
bool
Specifies if IDP hosts server certificate will be verified
TRUE
No

sslmode
str
The security of the connection to Amazon Redshift. verify-ca and verify-full are supported.
verify_ca
No

timeout
int
The number of seconds before the connection to the server will timeout.
None
No

token
str
The access token required for the IdP token auth plugin.
None
No

token_type
str
The token type required for the IdP token auth plugin.
ACCESS_TOKEN
No

user
str
The username to use for authentication
None
No

web_identity_token
str
The OAuth 2.0 access token or OpenID Connect ID token that is provided by the identity provider. Your application must get this token by authenticating the user who is using your application with a web identity provider. This parameter is used by JwtCredentialsProvider. For this provider, this is a mandatory parameter.
None
No





Supported Datatypes
redshift_connector supports the following Amazon Redshift datatypes. redshift_connector will attempt to treat unsupported datatypes as strings.
Incoming data from Amazon Redshift is treated as follows:


Amazon Redshift Datatype
Python Datatype



ACLITEM
str

BOOLEAN
bool

INT8
int

INT4
int

INT2
int

VARCHAR
str

OID
int

REGPROC
int

XID
int

FLOAT4
float

FLOAT8
float

TEXT
str

CHAR
str

DATE
datetime.date

TIME
datetime.time

TIMETZ
datetime.time

TIMESTAMP
datetime.datetime

TIMESTAMPTZ
datetime.datetime

NUMERIC
decimal.Decimal

GEOMETRY
str

SUPER
str

VARBYTE
bytes

GEOGRAPHY
str





Logging
redshift_connector uses logging for providing detailed error messages regarding IdP authentication. A do-nothing handler is enabled by default as to prevent logs from being output to sys.stderr.
Enable logging in your application to view logs output by redshift_connector as described in
the documentation for Python logging module.


Client Transfer Protocol
redshift_connector requests the Amazon Redshift server use the highest transfer protocol version supported. As of v2.0.879 binary transfer protocol is requested by default. If necessary, the requested transfer protocol can be modified via the client_protocol_version parameter of redshift_connector.connect(...). Please see the Connection Parameters table for more details.


Getting Help

Ask a question on Stack Overflow and tag it with redshift_connector
Open a support ticket with AWS Support
If you may have found a bug, please open an issue



Contributing
We look forward to collaborating with you! Please read through CONTRIBUTING before submitting any issues or pull requests.


Changelog Generation
An entry in the changelog is generated upon release using gitchangelog. Please use the configuration file, .gitchangelog.rc when generating the changelog.



Running Tests
You can run tests by using pytest test/unit. This will run all unit tests. Integration tests require providing credentials for an Amazon Redshift cluster as well as IdP attributes in test/config.ini.

Additional Resources

LICENSE
Python Database API Specification v2.0 (PEP 249)
PostgreSQL Frontend/Backend Protocol

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

Customer Reviews

There are no reviews.