pyspark.sql.datasource.DataSource.schema

DataSource.schema()

Returns the schema of the data source.

It can refer to any field initialized in the DataSource.__init__() method to infer the data source’s schema when users do not explicitly specify it. This method is invoked once when calling spark.read.format(...).load() to get the schema for a data source read operation. If this method is not implemented and the user does not provide a schema when reading the data source, an exception is thrown.

Returns
schema : StructType or str

The schema of this data source, or a DDL string that represents the schema.

Examples

Returns a DDL string:

>>> def schema(self):
...     return "a INT, b STRING"

Returns a StructType:

>>> from pyspark.sql.types import StructType
>>> def schema(self):
...     return StructType().add("a", "int").add("b", "string")
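
Infers the schema from a field initialized in __init__():

The following is a minimal sketch rather than a definitive implementation. The class name ColumnsDataSource, the short name "columns_example", and the "columns" option are hypothetical and chosen only for illustration. The fallback DataSource stub is likewise an assumption, included so the sketch runs where PySpark 4.0+ is not installed; it copies only the constructor behavior of storing the options.

```python
try:
    from pyspark.sql.datasource import DataSource
except ImportError:
    # Stand-in (assumption) so the sketch runs without PySpark 4.0+.
    # It mirrors only the constructor, which stores the user options.
    class DataSource:
        def __init__(self, options):
            self.options = options


class ColumnsDataSource(DataSource):
    """Hypothetical source whose columns come from a `columns` option."""

    @classmethod
    def name(cls):
        # Short name that would be passed to spark.read.format(...).
        return "columns_example"

    def __init__(self, options):
        super().__init__(options)
        # Field initialized in __init__ and later read by schema().
        self.columns = options.get("columns", "id").split(",")

    def schema(self):
        # Build a DDL string; every column is STRING in this sketch.
        return ", ".join(f"{c.strip()} STRING" for c in self.columns)


source = ColumnsDataSource({"columns": "a, b"})
print(source.schema())  # a STRING, b STRING
```

Returning a DDL string keeps the sketch free of type imports. A StructType built from the same field would serve equally well.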