Concept Readers

An embeddings-building pipeline needs a component that reads Concepts from its input source.
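Both readers documented below expose the same two methods, which suggests a shared interface along these lines. This is a sketch under assumptions: ConceptReader is modeled as an abstract base class, Concept as a small dataclass (its two fields here are hypothetical), and load_concepts as the concatenation of all batches. InMemoryConceptReader is a hypothetical toy subclass used only to exercise the interface.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class Concept:
    # Hypothetical fields; the real Concept class may differ.
    concept_id: int
    concept_name: str


class ConceptReader(ABC):
    """Hypothetical base class inferred from the two readers documented here."""

    @abstractmethod
    def load_concept_batch(self) -> Iterator[list[Concept]]:
        """Yield lists of at most batch_size concepts."""

    def load_concepts(self) -> list[Concept]:
        # Materialize every batch into one list; this can be very large.
        return list(chain.from_iterable(self.load_concept_batch()))


class InMemoryConceptReader(ConceptReader):
    """Toy reader over an in-memory list, used only to exercise the interface."""

    def __init__(self, concepts: list[Concept], batch_size: int) -> None:
        self.concepts = concepts
        self.batch_size = batch_size

    def load_concept_batch(self) -> Iterator[list[Concept]]:
        # Slice the list into consecutive chunks of batch_size items.
        for start in range(0, len(self.concepts), self.batch_size):
            yield self.concepts[start:start + self.batch_size]
```

Building load_concepts on top of load_concept_batch means each concrete reader only has to implement batching.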

Classes

PostgresConceptExtractor

class PostgresConceptExtractor(ConceptReader):
    db_connector: PGConnector
    batch_size: int
    logger: Logger

A ConceptReader that connects to a PostgreSQL database and reads the concepts.

Attribute     Type         Description
db_connector  PGConnector  A PGConnector that configures the connection to the database
batch_size    int          The number of concepts to fetch from the database per batch
logger        Logger       Logs database interactions

Methods

load_concept_batch
load_concept_batch() -> Generator[list[Concept]]

Returns a generator that yields batches of Concepts from the database

Yields list[Concept]

Lists of up to batch_size concepts from the database
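A minimal sketch of batched reading through a DB-API 2.0 cursor's fetchmany. It assumes PGConnector ultimately provides a DB-API connection and that the concept table has the five columns selected here; the column names and Concept fields are hypothetical. An in-memory SQLite database, built by the hypothetical make_demo_connection helper, stands in for PostgreSQL so the sketch runs without a server.

```python
import sqlite3
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Concept:
    # Hypothetical fields matching the (int, str, str, str, str) rows parse_rows accepts.
    concept_id: int
    concept_name: str
    domain_id: str
    vocabulary_id: str
    concept_code: str


def load_concept_batch(conn: Any, batch_size: int) -> Iterator[list[Concept]]:
    """Yield lists of up to batch_size concepts from a DB-API 2.0 connection."""
    cursor = conn.cursor()
    try:
        cursor.execute(
            "SELECT concept_id, concept_name, domain_id, vocabulary_id, concept_code "
            "FROM concept ORDER BY concept_id"
        )
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:  # an empty result means the query is exhausted
                break
            yield [Concept(*row) for row in rows]
    finally:
        cursor.close()


def make_demo_connection(n_rows: int) -> sqlite3.Connection:
    """In-memory SQLite stand-in for the PostgreSQL concept table (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, "
        "domain_id TEXT, vocabulary_id TEXT, concept_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
        [(i, f"concept {i}", "Condition", "SNOMED", str(i)) for i in range(n_rows)],
    )
    return conn
```

With psycopg2, a plain cursor transfers the whole result set to the client on execute, so fetchmany alone only bounds the size of each Python list; streaming rows from the server needs a named (server-side) cursor.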

load_concepts
load_concepts() -> list[Concept]

Returns all concepts from the database in a single list. This can be several million concepts, so load_concept_batch is the better choice when memory is limited.

Returns list[Concept]

All the concepts from the database

CsvConceptExtractor

class CsvConceptExtractor(ConceptReader):
    path: Path
    batch_size: int
    table_schema: dict[str, polars.DataType] = CONCEPT_SCHEMA

A ConceptReader that reads concepts from a CSV file in the Athena standard format.

Attribute     Type                        Description
path          Path                        The path to a CSV file of concepts
batch_size    int                         The number of concepts to retrieve at a time
table_schema  dict[str, polars.DataType]  A schema for the concept table; the default works for files from Athena
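The file-based reader can batch the same way. This sketch uses the standard library csv module and itertools.islice in place of polars, so it is not the class's actual implementation. It assumes a tab-delimited file with a header row (pass a different delimiter if your export differs), and the three Concept fields are a hypothetical subset of the Athena columns.

```python
import csv
from collections.abc import Iterator
from dataclasses import dataclass
from itertools import islice
from pathlib import Path


@dataclass(frozen=True)
class Concept:
    # Hypothetical subset of the Athena CONCEPT columns.
    concept_id: int
    concept_name: str
    vocabulary_id: str


def load_concept_batch(
    path: Path, batch_size: int, delimiter: str = "\t"
) -> Iterator[list[Concept]]:
    """Yield lists of up to batch_size concepts, streaming the file row by row."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        while True:
            # islice pulls the next batch_size rows without reading the rest of the file.
            rows = list(islice(reader, batch_size))
            if not rows:
                break
            yield [
                Concept(int(r["concept_id"]), r["concept_name"], r["vocabulary_id"])
                for r in rows
            ]
```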

Methods

load_concept_batch
load_concept_batch() -> Generator[list[Concept]]

Returns a generator that yields batches of Concepts from the file

Yields list[Concept]

Lists of up to batch_size concepts from the file

load_concepts
load_concepts() -> list[Concept]

Returns all concepts from the file in a single list. This can be several million concepts, so load_concept_batch is the better choice when memory is limited.

Returns list[Concept]

All the concepts from the file

Functions

parse_rows

def parse_rows(rows: list[tuple[int, str, str, str, str]]) -> list[Concept]:

Builds a Concept from each row of the concept table

Parameter  Type                                  Description
rows       list[tuple[int, str, str, str, str]]  Rows from the concept table

Returns list[Concept]

The concepts as Concept objects
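A minimal sketch of parse_rows, assuming Concept is a dataclass whose five fields appear in the same order as the row columns; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    # Hypothetical field names for the five columns parse_rows receives.
    concept_id: int
    concept_name: str
    domain_id: str
    vocabulary_id: str
    concept_code: str


def parse_rows(rows: list[tuple[int, str, str, str, str]]) -> list[Concept]:
    """Build one Concept per row by unpacking the columns positionally."""
    return [Concept(*row) for row in rows]
```

Positional unpacking keeps the function short but ties it to the column order of whatever query or file produced the rows.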

Constants

CONCEPT_SCHEMA

A schema (dict[str, polars.DataType]) that polars uses to parse the concept CSV files downloaded from Athena
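CONCEPT_SCHEMA maps column names to polars data types. As a standard-library analogue only, the hypothetical CONCEPT_CONVERTERS below maps each column to a Python converter, and the hypothetical apply_schema applies it to one parsed row. The column set and the YYYYMMDD date format are assumptions about the Athena export.

```python
from collections.abc import Callable
from datetime import datetime

# Hypothetical stand-in for CONCEPT_SCHEMA: each column maps to a converter
# from the raw string to a typed value, instead of to a polars data type.
CONCEPT_CONVERTERS: dict[str, Callable[[str], object]] = {
    "concept_id": int,
    "concept_name": str,
    "vocabulary_id": str,
    "valid_start_date": lambda s: datetime.strptime(s, "%Y%m%d").date(),
}


def apply_schema(
    row: dict[str, str], schema: dict[str, Callable[[str], object]]
) -> dict[str, object]:
    """Convert each raw field with its column's converter; unknown columns stay strings."""
    return {name: schema.get(name, str)(value) for name, value in row.items()}
```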