pyspark.sql.DataFrameReader.jdbc
- DataFrameReader.jdbc(url, table, column=None, lowerBound=None, upperBound=None, numPartitions=None, predicates=None, properties=None)
Construct a DataFrame representing the database table named table, accessible via the JDBC URL url and the connection properties.
Partitions of the table will be retrieved in parallel if either column or predicates is specified. lowerBound, upperBound and numPartitions are all needed when column is specified.
If both column and predicates are specified, column will be used.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
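As an illustration of the column-based form described above, here is a minimal sketch. The helper names, the "orders" table and the "order_id" column are hypothetical placeholders, and spark is assumed to be an existing SparkSession with a suitable JDBC driver on its classpath; none of this is prescribed by the API itself.

```python
def column_partition_args(column, lower_bound, upper_bound, num_partitions):
    """Collect the keyword arguments for a column-partitioned jdbc() read.

    Hypothetical helper: it only mirrors the documented rule that
    lowerBound, upperBound and numPartitions are needed with column.
    """
    bounds = (lower_bound, upper_bound, num_partitions)
    if any(value is None for value in bounds):
        raise ValueError(
            "lowerBound, upperBound and numPartitions are all needed "
            "when column is specified"
        )
    return {
        "column": column,
        "lowerBound": lower_bound,
        "upperBound": upper_bound,
        "numPartitions": num_partitions,
    }


def read_orders(spark, url, properties):
    """Read the placeholder "orders" table in 8 parallel partitions.

    `spark` is assumed to be an existing SparkSession; the table and
    column names are illustrative only.
    """
    args = column_partition_args("order_id", 1, 1_000_000, 8)
    return spark.read.jdbc(url=url, table="orders", properties=properties, **args)
```

Note that, per the Spark JDBC data source documentation, the bounds only decide the partition stride; they do not filter rows, so values outside the range still end up in the first or last partition.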
- Parameters
- table : str
the name of the table
- column : str, optional
alias of the partitionColumn option. Refer to partitionColumn in Data Source Option for the version you use.
- predicates : list, optional
a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame
- properties : dict, optional
a dictionary of JDBC database connection arguments, normally at least the properties “user” and “password” with their corresponding values. For example: {'user': 'SYSTEM', 'password': 'mypassword'}
- Returns
- DataFrame
- Other Parameters
- Extra options
For the extra options, refer to Data Source Option for the version you use.
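The predicates form can be sketched as follows: each expression in the list becomes the WHERE clause of one partition, so the expressions should normally be non-overlapping and together cover every row you want. The helper names, the "events" table and the "event_date" column are hypothetical, spark is assumed to be an existing SparkSession, and the date literal syntax is an assumption that may need adjusting for your database.

```python
from datetime import date


def monthly_predicates(column, year):
    """Build twelve non-overlapping WHERE-clause expressions, one per month."""
    predicates = []
    for month in range(1, 13):
        start = date(year, month, 1)
        # The upper bound is exclusive: the first day of the following month.
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        predicates.append(
            f"{column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"
        )
    return predicates


def read_events_by_month(spark, url, properties, year=2023):
    """Read the placeholder "events" table with one partition per month.

    `spark` is assumed to be an existing SparkSession; table and column
    names are illustrative only.
    """
    return spark.read.jdbc(
        url=url,
        table="events",
        predicates=monthly_predicates("event_date", year),
        properties=properties,
    )
```

Rows that match none of the predicates are not read at all, and rows that match more than one are read more than once, which is why half-open ranges are used here.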
Notes
Don’t create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
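One way to act on this note is to cap the partition count before calling jdbc(), since numPartitions also bounds the number of concurrent JDBC connections Spark opens. The sketch below is only an illustration: capped_num_partitions and the connection budget are hypothetical and would have to come from what your database can actually sustain.

```python
def capped_num_partitions(requested, max_db_connections):
    """Return a partition count that stays within a connection budget.

    Hypothetical helper: `max_db_connections` is a limit you choose for
    the external database, not a value Spark provides.
    """
    if requested < 1 or max_db_connections < 1:
        raise ValueError("both arguments must be positive integers")
    return min(requested, max_db_connections)
```

The result can then be passed as the numPartitions argument, for example capped_num_partitions(200, 16) on a cluster that could otherwise open far more connections than the database tolerates.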