Class BlobAccessor (2.6.0)

BlobAccessor(*args, **kwargs)

Blob functions for Series and Index.

Properties

session

API documentation for session property.

Methods

audio_transcribe

audio_transcribe(
    *,
    connection: typing.Optional[str] = None,
    model_name: typing.Optional[
        typing.Literal["gemini-2.0-flash-001", "gemini-2.0-flash-lite-001"]
    ] = None,
    verbose: bool = False
) -> bigframes.series.Series

Transcribe audio content using a Gemini multimodal model.

Parameters
Name | Description
connection: str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if "dst" is a str. If None, uses the default connection of the session.

model_name: str or None, default None

The model for natural language tasks. Accepted values are "gemini-2.0-flash-lite-001" and "gemini-2.0-flash-001". See "https://ai.google.dev/gemini-api/docs/models" for model choices.

verbose: bool, default False

Controls the verbosity of the output. When set to True, the output includes both error messages and the transcribed content. When set to False, the output includes only the transcribed content, and error messages are suppressed.

Returns
Type | Description
bigframes.series.Series: str or struct[str, str], depending on the "verbose" parameter. Contains the transcribed text from the audio file. Includes error messages if verbose is True.

authorizer

authorizer() -> bigframes.series.Series

Authorizers of the Blob.

Returns
Type | Description
bigframes.series.Series: Authorizers (connection) as string.

content_type

content_type() -> bigframes.series.Series

Retrieve the content type of the Blob.

Returns
Type | Description
bigframes.series.Series: string of the content type.

display

display(
    n: int = 3,
    *,
    content_type: str = "",
    width: typing.Optional[int] = None,
    height: typing.Optional[int] = None
)

Display the blob content in an IPython notebook environment. Currently only works for image types.

Parameters
Name | Description
n: int, default 3

Number of sample blob objects to display.

content_type: str, default ""

Content type of the blob. If unset, the blob metadata from storage is used. Possible values are "image", "audio" and "video".

width: int or None, default None

Width in pixels that the image/video is constrained to. If unset, the global setting in bigframes.options.display.blob_display_width is used; if that is also unset, the image/video's original size or ratio is used. No-op for other content types.

height: int or None, default None

Height in pixels that the image/video is constrained to. If unset, the global setting in bigframes.options.display.blob_display_height is used; if that is also unset, the image/video's original size or ratio is used. No-op for other content types.

exif

exif(
    *,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 8192,
    container_cpu: typing.Union[float, int] = 0.33,
    container_memory: str = "512Mi"
) -> bigframes.series.Series

Extract EXIF data. Currently supports only image types.

Parameters
Name | Description
connection: str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if "dst" is a str. If None, uses the default connection of the session.

max_batching_rows: int, default 8,192

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu: int or float, default 0.33

Number of container CPUs. Possible values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory: str, default "512Mi"

Container memory size. String of the format

Returns
Type | Description
bigframes.series.Series: JSON series of key-value pairs.

get_runtime_json_str

get_runtime_json_str(
    mode: str = "R", *, with_metadata: bool = False
) -> bigframes.series.Series

Get the runtime (contains signed URLs to access GCS data) and apply the ToJSONString transformation.

Parameters
Name | Description
mode: str, default "R"

The mode for accessing the runtime. Possible values are "R" (read-only) and "RW" (read-write).

with_metadata: bool, default False

Whether to include metadata in the JSON string.

Returns
Type | Description
str: the runtime object as a JSON string.

image_blur

image_blur(
    ksize: tuple[int, int],
    *,
    dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 8192,
    container_cpu: typing.Union[float, int] = 0.33,
    container_memory: str = "512Mi"
) -> bigframes.series.Series

Blurs images.

Parameters
Name | Description
ksize: tuple(int, int)

Kernel size.

dst: str or bigframes.series.Series or None, default None

Output destination. Can be one of:
str: GCS folder path. The output filenames are the same as the input filenames.
blob Series: The output file paths are determined by the URIs of the blob Series.
None: Output to BigQuery as bytes.

Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename doesn't have an extension, ".jpeg" is used for encoding.

connection: str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if "dst" is a str. If None, uses the default connection of the session.

max_batching_rows: int, default 8,192

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu: int or float, default 0.33

Number of container CPUs. Possible values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory: str, default "512Mi"

Container memory size. String of the format

Returns
Type | Description
bigframes.series.Series: blob Series if the destination is GCS, or bytes Series if the destination is BigQuery.
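
The encoding rule described for "dst" can be sketched in plain Python. This is only an illustration of the documented rule, not the library's implementation; the helper name pick_encoding_ext and the example URIs in the usage note are hypothetical.

```python
import posixpath
from typing import Optional


def pick_encoding_ext(input_uri: str, output_uri: Optional[str] = None) -> str:
    """Return the file extension that selects the image encoding.

    Hypothetical helper: it mirrors the rule stated for "dst" and is not
    part of bigframes.
    """
    # Use the output filename when there is one, otherwise the input filename.
    filename = output_uri if output_uri else input_uri
    # splitext only inspects the last path component, so dots in the bucket
    # or folder names are ignored.
    ext = posixpath.splitext(filename)[1]
    # A filename without an extension falls back to JPEG encoding.
    return ext if ext else ".jpeg"
```

For example, calling pick_encoding_ext("gs://bucket/in/cat.png", "gs://bucket/out/cat") returns ".jpeg" under this reading, because the output filename takes precedence and has no extension.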

image_normalize

image_normalize(
    *,
    alpha: float = 1.0,
    beta: float = 0.0,
    norm_type: str = "l2",
    dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 8192,
    container_cpu: typing.Union[float, int] = 0.33,
    container_memory: str = "512Mi"
) -> bigframes.series.Series

Normalize images.

Parameters
Name | Description
alpha: float, default 1.0

Norm value to normalize to, or the lower range boundary in the case of range normalization.

beta: float, default 0.0

Upper range boundary in the case of range normalization; it is not used for norm normalization.

norm_type: str, default "l2"

Normalization type. Accepted values are "inf", "l1", "l2" and "minmax".

dst: str or bigframes.series.Series or None, default None

Output destination. Can be one of:
str: GCS folder path. The output filenames are the same as the input filenames.
blob Series: The output file paths are determined by the URIs of the blob Series.
None: Output to BigQuery as bytes.

Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename doesn't have an extension, ".jpeg" is used for encoding.

connection: str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if "dst" is a str. If None, uses the default connection of the session.

max_batching_rows: int, default 8,192

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu: int or float, default 0.33

Number of container CPUs. Possible values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory: str, default "512Mi"

Container memory size. String of the format

Returns
Type | Description
bigframes.series.Series: blob Series if the destination is GCS, or bytes Series if the destination is BigQuery.
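
To make the roles of alpha, beta and norm_type concrete, here is a minimal sketch on a flat list of pixel values. It assumes the parameters follow the usual OpenCV-style normalize semantics (an assumption, since this page does not say which backend is used); the function name normalize_values is hypothetical and the real method operates on images, not lists.

```python
import math
from typing import List


def normalize_values(values: List[float], alpha: float = 1.0,
                     beta: float = 0.0, norm_type: str = "l2") -> List[float]:
    """Illustrative normalization of a flat list (not the library's code)."""
    if norm_type == "minmax":
        # Range normalization: map [min, max] of the data onto the range
        # spanned by alpha and beta.
        lo, hi = min(alpha, beta), max(alpha, beta)
        vmin, vmax = min(values), max(values)
        if vmax == vmin:
            return [lo for _ in values]
        scale = (hi - lo) / (vmax - vmin)
        return [lo + (v - vmin) * scale for v in values]

    # Norm normalization: scale so the chosen norm equals alpha; beta is unused.
    if norm_type == "inf":
        norm = max(abs(v) for v in values)
    elif norm_type == "l1":
        norm = sum(abs(v) for v in values)
    elif norm_type == "l2":
        norm = math.sqrt(sum(v * v for v in values))
    else:
        raise ValueError(f"unsupported norm_type: {norm_type!r}")
    if norm == 0:
        return [0.0 for _ in values]
    return [v * alpha / norm for v in values]
```

With the default alpha and beta, "minmax" rescales the values to the range 0 to 1, while the three norm types rescale them so that the corresponding norm equals 1.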

image_resize

image_resize(
    dsize: tuple[int, int] = (0, 0),
    *,
    fx: float = 0.0,
    fy: float = 0.0,
    dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 8192,
    container_cpu: typing.Union[float, int] = 0.33,
    container_memory: str = "512Mi"
)

Resize images.

Parameters
Name | Description
dsize: tuple(int, int), default (0, 0)

Destination size. If set to 0, the fx and fy parameters determine the size.

fx: float, default 0.0

Scale factor along the horizontal axis. If set to 0.0, the dsize parameter determines the output size.

fy: float, default 0.0

Scale factor along the vertical axis. If set to 0.0, the dsize parameter determines the output size.

dst: str or bigframes.series.Series or None, default None

Output destination. Can be one of:
str: GCS folder path. The output filenames are the same as the input filenames.
blob Series: The output file paths are determined by the URIs of the blob Series.
None: Output to BigQuery as bytes.

Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename doesn't have an extension, ".jpeg" is used for encoding.

connection: str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if "dst" is a str. If None, uses the default connection of the session.

max_batching_rows: int, default 8,192

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu: int or float, default 0.33

Number of container CPUs. Possible values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory: str, default "512Mi"

Container memory size. String of the format

Returns
Type | Description
bigframes.series.Series: blob Series if the destination is GCS, or bytes Series if the destination is BigQuery.
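
The interplay between dsize, fx and fy can be sketched as a small size calculation. This assumes that dsize is ordered (width, height) and that scaled sizes are rounded to the nearest integer, as in OpenCV-style resize; both are assumptions not stated on this page, and the helper name output_size is hypothetical.

```python
from typing import Tuple


def output_size(width: int, height: int,
                dsize: Tuple[int, int] = (0, 0),
                fx: float = 0.0, fy: float = 0.0) -> Tuple[int, int]:
    """Return the (width, height) a resize would produce (illustration only)."""
    # An explicit destination size takes effect whenever it is non-zero.
    if tuple(dsize) != (0, 0):
        return (dsize[0], dsize[1])
    # Otherwise both scale factors must be set to derive the size.
    if fx <= 0.0 or fy <= 0.0:
        raise ValueError("set either dsize or both fx and fy")
    # Assumed rounding: nearest integer of the scaled dimensions.
    return (round(width * fx), round(height * fy))
```

Under these assumptions, a 640x480 input with fx=0.5 and fy=0.25 yields (320, 120), and leaving dsize, fx and fy all at their defaults is rejected because no output size can be derived.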

md5_hash

md5_hash() -> bigframes.series.Series

Retrieve the md5 hash of the Blob.

Returns
Type | Description
bigframes.series.Series: string of the md5 hash.

metadata

metadata() -> bigframes.series.Series

Retrieve the metadata of the Blob.

Returns
Type | Description
bigframes.series.Series: JSON metadata of the Blob. Contains the fields content_type, md5_hash, size and updated (time).
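
The metadata-style methods (uri, content_type, size, updated and so on) are called through the blob accessor of a Series. A minimal usage sketch follows. It assumes the blob column is created with bigframes.pandas.from_glob_path, which is not documented on this page, and that the default session can read the bucket; the function name summarize_blobs and the output column names are hypothetical.

```python
def summarize_blobs(gcs_glob: str):
    """Return a DataFrame with one row per object matched by gcs_glob.

    Sketch only: running it needs bigframes installed and Google Cloud
    credentials with access to the bucket.
    """
    # Imported lazily so this snippet can be loaded without bigframes.
    import bigframes.pandas as bpd

    # Assumption: from_glob_path builds a DataFrame with one blob column.
    df = bpd.from_glob_path(gcs_glob, name="blob")

    # Each accessor method returns a Series aligned with the blob column.
    blobs = df["blob"].blob
    df["uri"] = blobs.uri()
    df["content_type"] = blobs.content_type()
    df["size"] = blobs.size()
    df["updated"] = blobs.updated()
    return df
```

Calling summarize_blobs with a pattern such as "gs://your-bucket/images/*" (a placeholder path) would then give a table of URIs, content types, sizes in bytes and update times.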

pdf_chunk

pdf_chunk(
    *,
    connection: typing.Optional[str] = None,
    chunk_size: int = 2000,
    overlap_size: int = 200,
    max_batching_rows: int = 1,
    container_cpu: typing.Union[float, int] = 2,
    container_memory: str = "1Gi",
    verbose: bool = False
) -> bigframes.series.Series

Extracts and chunks text from PDF URLs and saves the text as arrays of strings.

Parameters
Name | Description
connection: str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if "dst" is a str. If None, uses the default connection of the session.

chunk_size: int, default 2000

The desired size of each text chunk (number of characters).

overlap_size: int, default 200

The number of overlapping characters between consecutive chunks. This helps to ensure context is preserved across chunk boundaries.

max_batching_rows: int, default 1

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu: int or float, default 2

Number of container CPUs. Possible values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory: str, default "1Gi"

Container memory size. String of the format

verbose: bool, default False

Controls the verbosity of the output. When set to True, the output includes both error messages and the extracted content. When set to False, the output includes only the extracted content, and error messages are suppressed.

Returns
Type | Description
bigframes.series.Series: array[str] or struct[str, array[str]], depending on the "verbose" parameter, where each string is a chunk of text extracted from the PDF. Includes error messages if verbose is True.
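
The effect of chunk_size and overlap_size can be illustrated with a simple character-based splitter. This is a sketch of overlapping chunking in general, not the library's implementation (which may, for example, treat page boundaries differently); the function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 2000,
               overlap_size: int = 200) -> List[str]:
    """Split text into chunks that share overlap_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one,
    # so consecutive chunks repeat the last overlap_size characters.
    step = chunk_size - overlap_size
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With chunk_size=4 and overlap_size=1, the text "abcdefghij" splits into "abcd", "defg" and "ghij": each chunk begins with the last character of the previous one, which is how the overlap carries context across boundaries.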

pdf_extract

pdf_extract(
    *,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 1,
    container_cpu: typing.Union[float, int] = 2,
    container_memory: str = "1Gi",
    verbose: bool = False
) -> bigframes.series.Series

Extracts text from PDF URLs and saves the text as a string.

Parameters
Name | Description
connection: str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if "dst" is a str. If None, uses the default connection of the session.

max_batching_rows: int, default 1

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu: int or float, default 2

Number of container CPUs. Possible values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory: str, default "1Gi"

Container memory size. String of the format

verbose: bool, default False

Controls the verbosity of the output. When set to True, the output includes both error messages and the extracted content. When set to False, the output includes only the extracted content, and error messages are suppressed.

Returns
Type | Description
bigframes.series.Series: str or struct[str, str], depending on the "verbose" parameter. Contains the extracted text from the PDF file. Includes error messages if verbose is True.

read_url

read_url() -> bigframes.series.Series

Retrieve the read URL of the Blob.

Returns
Type | Description
bigframes.series.Series: Read-only URLs.

size

size() -> bigframes.series.Series

Retrieve the file size of the Blob.

Returns
Type | Description
bigframes.series.Series: file size in bytes.

updated

updated() -> bigframes.series.Series

Retrieve the updated time of the Blob.

Returns
Type | Description
bigframes.series.Series: updated time as UTC datetime.

uri

uri() -> bigframes.series.Series

URIs of the Blob.

Returns
Type | Description
bigframes.series.Series: URIs as string.

version

version() -> bigframes.series.Series

Versions of the Blob.

Returns
Type | Description
bigframes.series.Series: Version as string.

write_url

write_url() -> bigframes.series.Series

Retrieve the write URL of the Blob.

Returns
Type | Description
bigframes.series.Series: Writable URLs.