Bases: DataSource
        A DataSource for reading tables from public Google Sheets.
Name: googlesheets
Schema: By default, all columns are treated as strings and the header row defines the column names.
Options
- url: The URL of the Google Sheets document.
- path: The ID of the Google Sheets document.
- sheet_id: The ID of the worksheet within the document.
- has_header: Whether the sheet has a header row. Default is true.
Either url or path must be specified, but not both.
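The documented contract is that exactly one of `url` or `path` is given. A minimal standalone sketch of that check (the function name `resolve_sheet` is illustrative; the data source's own option handling lives in its `__init__`, shown in the source listing):

```python
def resolve_sheet(options: dict) -> tuple:
    # Enforce the documented contract: exactly one of `url` or `path`.
    if "url" in options and "path" in options:
        raise ValueError("Specify either `url` or `path`, not both.")
    if "url" in options:
        return ("url", options.pop("url"))
    if "path" in options:
        return ("path", options.pop("path"))
    raise ValueError("You must specify either `url` or `path` (spreadsheet ID).")
```

Note that `pop` removes the consumed key, mirroring how the source below strips recognized options from the dict.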
 
Examples:

Register the data source.

>>> from pyspark_datasources import GoogleSheetsDataSource
>>> spark.dataSource.register(GoogleSheetsDataSource)

Load data from a public Google Sheets document using path and optional sheet_id.

>>> spreadsheet_id = "10pD8oRN3RTBJq976RKPWHuxYy0Qa_JOoGFpsaS0Lop0"
>>> spark.read.format("googlesheets").options(sheet_id="0").load(spreadsheet_id).show()
+-------+---------+---------+-------+
|country| latitude|longitude|   name|
+-------+---------+---------+-------+
|     AD|42.546245| 1.601554|Andorra|
|    ...|      ...|      ...|    ...|
+-------+---------+---------+-------+

Load data from a public Google Sheets document using url.

>>> url = "https://docs.google.com/spreadsheets/d/10pD8oRN3RTBJq976RKPWHuxYy0Qa_JOoGFpsaS0Lop0/edit?gid=0#gid=0"
>>> spark.read.format("googlesheets").options(url=url).load().show()
+-------+---------+---------+-------+
|country| latitude|longitude|   name|
+-------+---------+---------+-------+
|     AD|42.546245| 1.601554|Andorra|
|    ...|      ...|      ...|    ...|
+-------+---------+---------+-------+

Specify a custom schema.

>>> schema = "id string, lat double, long double, name string"
>>> spark.read.format("googlesheets").schema(schema).options(url=url).load().show()
+---+---------+--------+-------+
| id|      lat|    long|   name|
+---+---------+--------+-------+
| AD|42.546245|1.601554|Andorra|
|...|      ...|     ...|    ...|
+---+---------+--------+-------+

Treat the first row as data instead of a header.

>>> schema = "c1 string, c2 string, c3 string, c4 string"
>>> spark.read.format("googlesheets").schema(schema).options(url=url, has_header="false").load().show()
+-------+---------+---------+-------+
|     c1|       c2|       c3|     c4|
+-------+---------+---------+-------+
|country| latitude|longitude|   name|
|     AD|42.546245| 1.601554|Andorra|
|    ...|      ...|      ...|    ...|
+-------+---------+---------+-------+
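When only a full Sheets URL is at hand, the spreadsheet ID and worksheet gid can be pulled out with a short helper. The regex below is an illustrative sketch of that parsing, not the library's actual `Sheet.from_url` implementation:

```python
import re

def parse_sheets_url(url: str):
    # The spreadsheet ID sits between /spreadsheets/d/ and the next slash;
    # the worksheet id follows gid= in the query string or fragment.
    m = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if m is None:
        raise ValueError(f"Not a Google Sheets URL: {url}")
    gid = re.search(r"gid=(\d+)", url)
    return m.group(1), gid.group(1) if gid else None

spreadsheet_id, sheet_id = parse_sheets_url(
    "https://docs.google.com/spreadsheets/d/10pD8oRN3RTBJq976RKPWHuxYy0Qa_JOoGFpsaS0Lop0/edit?gid=0#gid=0"
)
# spreadsheet_id -> "10pD8oRN3RTBJq976RKPWHuxYy0Qa_JOoGFpsaS0Lop0", sheet_id -> "0"
```

The extracted pair maps directly onto the `path` and `sheet_id` options from the first example.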
Source code in pyspark_datasources/googlesheets.py

class GoogleSheetsDataSource(DataSource):
    """
    A DataSource for reading tables from public Google Sheets.
    Name: `googlesheets`
    Schema: By default, all columns are treated as strings and the header row defines the column names.
    Options
    --------
    - `url`: The URL of the Google Sheets document.
    - `path`: The ID of the Google Sheets document.
    - `sheet_id`: The ID of the worksheet within the document.
    - `has_header`: Whether the sheet has a header row. Default is `true`.
    Either `url` or `path` must be specified, but not both.
    Examples
    --------
    Register the data source.
    >>> from pyspark_datasources import GoogleSheetsDataSource
    >>> spark.dataSource.register(GoogleSheetsDataSource)
    Load data from a public Google Sheets document using `path` and optional `sheet_id`.
    >>> spreadsheet_id = "10pD8oRN3RTBJq976RKPWHuxYy0Qa_JOoGFpsaS0Lop0"
    >>> spark.read.format("googlesheets").options(sheet_id="0").load(spreadsheet_id).show()
    +-------+---------+---------+-------+
    |country| latitude|longitude|   name|
    +-------+---------+---------+-------+
    |     AD|42.546245| 1.601554|Andorra|
    |    ...|      ...|      ...|    ...|
    +-------+---------+---------+-------+
    Load data from a public Google Sheets document using `url`.
    >>> url = "https://docs.google.com/spreadsheets/d/10pD8oRN3RTBJq976RKPWHuxYy0Qa_JOoGFpsaS0Lop0/edit?gid=0#gid=0"
    >>> spark.read.format("googlesheets").options(url=url).load().show()
    +-------+---------+---------+-------+
    |country| latitude|longitude|   name|
    +-------+---------+---------+-------+
    |     AD|42.546245| 1.601554|Andorra|
    |    ...|      ...|      ...|    ...|
    +-------+---------+---------+-------+
    Specify custom schema.
    >>> schema = "id string, lat double, long double, name string"
    >>> spark.read.format("googlesheets").schema(schema).options(url=url).load().show()
    +---+---------+--------+-------+
    | id|      lat|    long|   name|
    +---+---------+--------+-------+
    | AD|42.546245|1.601554|Andorra|
    |...|      ...|     ...|    ...|
    +---+---------+--------+-------+
    Treat first row as data instead of header.
    >>> schema = "c1 string, c2 string, c3 string, c4 string"
    >>> spark.read.format("googlesheets").schema(schema).options(url=url, has_header="false").load().show()
    +-------+---------+---------+-------+
    |     c1|       c2|       c3|     c4|
    +-------+---------+---------+-------+
    |country| latitude|longitude|   name|
    |     AD|42.546245| 1.601554|Andorra|
    |    ...|      ...|      ...|    ...|
    +-------+---------+---------+-------+
    """
    @classmethod
    def name(cls):
        return "googlesheets"
    def __init__(self, options: Dict[str, str]):
        if "url" in options:
            sheet = Sheet.from_url(options.pop("url"))
        elif "path" in options:
            sheet = Sheet(options.pop("path"), options.pop("sheet_id", None))
        else:
            raise ValueError(
                "You must specify either `url` or `path` (spreadsheet ID)."
            )
        has_header = options.pop("has_header", "true").lower() == "true"
        self.parameters = Parameters(sheet, has_header)
    def schema(self) -> StructType:
        if not self.parameters.has_header:
            raise ValueError("Custom schema is required when `has_header` is false")
        import pandas as pd
        # Read schema from the first row of the sheet
        df = pd.read_csv(self.parameters.sheet.get_query_url("select * limit 1"))
        return StructType([StructField(col, StringType()) for col in df.columns])
    def reader(self, schema: StructType) -> DataSourceReader:
        return GoogleSheetsReader(self.parameters, schema)