pyspark.sql.datasource.DataSource.schema
DataSource.schema()
Returns the schema of the data source.
It can refer to any field initialized in the DataSource.__init__() method to infer the data source’s schema when users do not explicitly specify it. This method is invoked once when calling spark.read.format(...).load() to get the schema for a data source read operation. If this method is not implemented and a user does not provide a schema when reading the data source, an exception will be thrown.
Returns
schema : StructType or str
    The schema of this data source, or a DDL string that represents the schema.
Examples
Returns a DDL string:
>>> def schema(self):
...     return "a INT, b STRING"
Returns a StructType:
>>> from pyspark.sql.types import StructType
>>> def schema(self):
...     return StructType().add("a", "int").add("b", "string")
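The snippets above show schema() in isolation. As a minimal sketch of how it can refer to a field initialized in __init__(), the following defines a hypothetical RangeDataSource whose schema depends on a user-supplied option. The class name, the "valueType" option, and the "range_example" format name are assumptions made for illustration and are not part of PySpark. The fallback DataSource stub is only there so the sketch can run where PySpark 4.0+ is not installed.

```python
try:
    from pyspark.sql.datasource import DataSource
except ImportError:
    # Assumption: minimal stand-in used only when PySpark 4.0+ is unavailable.
    class DataSource:
        def __init__(self, options):
            self.options = options


class RangeDataSource(DataSource):
    """Hypothetical data source whose schema depends on an option."""

    def __init__(self, options):
        super().__init__(options)
        # Field initialized in __init__ and later read by schema().
        self.value_type = options.get("valueType", "INT")

    @classmethod
    def name(cls):
        # Hypothetical short name for spark.read.format(...).
        return "range_example"

    def schema(self):
        # Build a DDL string from the field set in __init__.
        return f"id INT, value {self.value_type}"
```

Calling RangeDataSource({"valueType": "STRING"}).schema() directly returns the DDL string built from that option. In a Spark session, a complete source would also need to implement reader() before spark.read.format("range_example").load() could use this schema.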