artvee_scraper package
Submodules
artvee_scraper.artvee_client module
- class artvee_scraper.artvee_client.ArtveeClient(conn_timeout=3.05, read_timeout=10, max_attempts=3)
Bases: object
HTTP client for interacting with the Artvee API.
- Constants:
- _HTTP_CONN_TIMEOUT_SEC (float):
Default number of seconds to wait to establish a connection to a remote machine.
- _HTTP_READ_TIMEOUT_SEC (float):
Default number of seconds the client will wait for the server to send a response.
- _ITEMS_PER_PAGE (int):
Maximum number of items to retrieve per page in API requests.
- _TITLE_PATTERN (re.Pattern):
Regex pattern for extracting title and date. ex: Landscape with Weather Vane (1935); group 1 = title (ex: Landscape with Weather Vane), group 2 = date (ex: 1935)
- _ARTIST_PATTERN (re.Pattern):
Regex pattern for extracting artist name and origin. ex: Arthur Dove(American, 1880-1946); group 1 = artist name (ex: Arthur Dove), group 2 = origin (ex: American, 1880-1946)
- _RESOURCE_PATTERN (re.Pattern):
Regex pattern for extracting the resource name. ex: https://artvee.com/dl/zwei-tanzende/; group 1 = resource (ex: zwei-tanzende)
- _IMG_DIMENSION_PATTERN (re.Pattern):
Regex pattern for extracting image dimensions. ex: 1800 x 1185px; group 1 = width (ex: 1800), group 2 = height (ex: 1185)
- _IMG_FILE_SIZE_PATTERN (re.Pattern):
Regex pattern for extracting image file size and unit. ex: 1.82 MB; group 1 = size (ex: 1.82), group 2 = unit (ex: MB)
- Attributes:
- _timeout (tuple[float, float]):
Timeouts to use for HTTP requests.
- _session (Session):
Allows parameters to persist across HTTP requests.
- Args:
- conn_timeout (float, optional):
Number of seconds to wait to establish a connection to a remote machine. Defaults to 3.05 seconds.
- read_timeout (float, optional):
Number of seconds the client will wait for the server to send a response. Defaults to 10 seconds.
- max_attempts (int, optional):
The maximum number of attempts (including the initial call). Must be between 1 and 10. Defaults to 3 (initial call + two retries).
- Raises:
- ValueError:
If conn_timeout or read_timeout is not positive, or if max_attempts is not in the range [1, 10].
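The constructor rules above can be sketched in plain Python. This is a minimal illustration, not the package's code: `validate_client_args` and `call_with_attempts` are hypothetical helpers, and the real client may instead delegate retries to its requests `Session` (for example, urllib3's `Retry` mounted through an `HTTPAdapter`), which is an assumption.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def validate_client_args(conn_timeout: float, read_timeout: float,
                         max_attempts: int) -> Tuple[float, float]:
    """Apply the documented ValueError rules and build a (connect, read) timeout pair."""
    if conn_timeout <= 0:
        raise ValueError("conn_timeout must be positive")
    if read_timeout <= 0:
        raise ValueError("read_timeout must be positive")
    if not 1 <= max_attempts <= 10:
        raise ValueError("max_attempts must be in the range [1, 10]")
    # requests accepts a (connect, read) tuple as its `timeout` argument.
    return (conn_timeout, read_timeout)


def call_with_attempts(fn: Callable[[], T], max_attempts: int,
                       retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                       backoff_sec: float = 0.0) -> T:
    """Hypothetical helper: call fn at most max_attempts times in total (the first call counts)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # attempts exhausted: surface the last error
            time.sleep(backoff_sec * 2 ** (attempt - 1))  # exponential backoff
            attempt += 1
```

With max_attempts=3, `call_with_attempts` makes the initial call plus at most two retries, matching the documented meaning of max_attempts.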
- get_image(img_metadata)
Retrieve the image data.
- Return type:
bytes
- Args:
- img_metadata (ImageMetadata):
Information that describes attributes of the image file to retrieve.
- Returns:
- bytes:
The raw JPG image data.
- Raises:
- requests.exceptions.HTTPError:
If the HTTP request returns an unsuccessful status code.
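Because get_image returns raw bytes rather than a file, the caller has to persist them. A minimal sketch, assuming the payload is JPEG as documented; `looks_like_jpeg` and `save_image` are hypothetical helpers, and naming the file after the artwork's resource is only one possible choice.

```python
from pathlib import Path

# JPEG files begin with the SOI marker FF D8, immediately followed by another marker (FF ..).
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check on bytes returned by something like get_image()."""
    return data.startswith(JPEG_MAGIC)


def save_image(data: bytes, resource: str, out_dir: Path, format_name: str = "jpg") -> Path:
    """Write raw image bytes to <out_dir>/<resource>.<format_name> after a header check."""
    if not looks_like_jpeg(data):
        raise ValueError(f"{resource}: payload does not start with a JPEG header")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{resource}.{format_name}"
    path.write_bytes(data)
    return path
```

Checking the header before writing catches cases such as an HTML error page served with a success status.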
- get_metadata(category, page)
Retrieve artwork metadata for a specified category and page.
- Return type:
List[Tuple[ArtworkMetadata, ImageMetadata]]
- Args:
- category (CategoryType):
The category for which to retrieve artwork metadata.
- page (int):
The page number to retrieve the metadata from. Pages are indexed starting at 1.
- Returns:
- List[Tuple[ArtworkMetadata, ImageMetadata]]:
A list with one tuple per artwork. ArtworkMetadata describes the attributes of the artwork; ImageMetadata describes the attributes of its image file.
- Raises:
- requests.exceptions.HTTPError:
If the HTTP request returns an unsuccessful status code.
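get_metadata presumably relies on the pattern constants listed for ArtveeClient to split scraped strings into fields. The patterns below are hypothetical stand-ins written only to match the documented examples; the package's real expressions may differ, and the helper functions are illustrative.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-ins for the documented class constants.
# Title "Name (1935)" and artist "Name(American, 1880-1946)" share one shape:
# text followed by a final parenthesized group.
TRAILING_PARENS_PATTERN = re.compile(r"^(.*?)\s*\(([^()]*)\)\s*$")
RESOURCE_PATTERN = re.compile(r"/dl/([^/]+)/?$")
IMG_DIMENSION_PATTERN = re.compile(r"(\d+)\s*x\s*(\d+)\s*px")
IMG_FILE_SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([KMG]B)")


def split_trailing_parens(text: str, default: Optional[str]) -> Tuple[str, Optional[str]]:
    """Split 'Title (date)' or 'Artist(origin)'; fall back to (text, default)."""
    match = TRAILING_PARENS_PATTERN.match(text.strip())
    if match is None:
        return text.strip(), default
    return match.group(1), match.group(2)


def parse_dimensions(text: str) -> Tuple[int, int]:
    """'1800 x 1185px' -> (1800, 1185); raises ValueError if absent."""
    match = IMG_DIMENSION_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no dimensions in {text!r}")
    return int(match.group(1)), int(match.group(2))


def parse_file_size(text: str) -> Tuple[float, str]:
    """'1.82 MB' -> (1.82, 'MB'); raises ValueError if absent."""
    match = IMG_FILE_SIZE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no file size in {text!r}")
    return float(match.group(1)), match.group(2)
```

The fallback in `split_trailing_parens` lets a caller apply the documented defaults, e.g. date 'n.d.' when a title carries no parenthesized date.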
- get_page_count(category)
Retrieve the total number of webpages for a given category.
- Return type:
int
- Args:
- category (CategoryType):
The category for which to retrieve the page count.
- Returns:
- int:
The total number of pages available for the specified category.
- Raises:
- requests.exceptions.HTTPError:
If the HTTP request returns an unsuccessful status code.
- ValueError:
If the total item count cannot be parsed or converted to an integer.
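get_page_count presumably derives the page count from the site's total item count and _ITEMS_PER_PAGE. The sketch below assumes the count appears as text such as '1,234 results' (a hypothetical format) and rounds up; both helpers are hypothetical.

```python
import re


def parse_total_items(text: str) -> int:
    """Pull the first integer (commas allowed) out of text such as '1,234 results'."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no item count found in {text!r}")  # mirrors the documented ValueError
    return int(match.group(0).replace(",", ""))


def page_count(total_items: int, items_per_page: int) -> int:
    """Ceiling division: the number of pages needed to hold total_items."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    return (total_items + items_per_page - 1) // items_per_page


# Pages are 1-indexed, so a full crawl of one category would visit:
# for page in range(1, page_count(total, items_per_page) + 1): ...
```

Integer ceiling division avoids floating-point rounding for large counts.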
artvee_scraper.artwork module
- class artvee_scraper.artwork.Artwork(url, resource, title, category, artist='Unknown Artist', date='n.d.', origin=None, image=None)
Bases: ArtworkMetadata
Represents an artistic work.
- Attributes:
- image (Image, optional):
The image, including associated metadata. Defaults to None
- class artvee_scraper.artwork.ArtworkMetadata(url, resource, title, category, artist='Unknown Artist', date='n.d.', origin=None)
Bases: object
Information that describes attributes of an artwork.
- Attributes:
- url (str):
Artwork URL (ex: https://artvee.com/dl/zwei-tanzende/)
- resource (str):
Unique name of artwork; extracted from the URL (ex: zwei-tanzende)
- title (str):
Name of the artwork (ex: Zwei Tanzende)
- category (str):
The category the artwork belongs to (ex: abstract)
- artist (str, optional):
Name of the person that created this artwork. Defaults to Unknown Artist
- date (str, optional):
Year or date range the artwork was completed (ex: 2012 - 2019). Defaults to n.d. (no date)
- origin (str, optional):
Artist nationality and lifespan (ex: Austrian, 1834-1921). Defaults to None
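The constructor signatures of ArtworkMetadata and Artwork fit a dataclass-style layout in which Artwork adds one optional field. Whether the package actually uses dataclasses is an assumption; the `...Sketch` classes and `attach_image` are hypothetical mirrors of the documented fields and defaults.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ArtworkMetadataSketch:
    """Hypothetical mirror of ArtworkMetadata's documented fields and defaults."""
    url: str
    resource: str
    title: str
    category: str
    artist: str = "Unknown Artist"
    date: str = "n.d."  # "no date"
    origin: Optional[str] = None


@dataclass
class ArtworkSketch(ArtworkMetadataSketch):
    """Adds the optional image, as Artwork does on top of ArtworkMetadata."""
    image: Optional[object] = None  # an Image instance in the real package


def attach_image(meta: ArtworkMetadataSketch, image: object) -> ArtworkSketch:
    """Promote metadata to a full artwork by copying its fields and adding the image."""
    return ArtworkSketch(**asdict(meta), image=image)
```

Keeping the metadata and the full artwork as separate types makes a partially populated Artwork (metadata only, image still None) easy to represent.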
- class artvee_scraper.artwork.CategoryType(value)
Bases: Enum
Enumeration for different categories of art.
- Attributes:
- ABSTRACT (str):
Art that uses shapes, colors, forms, and gestural marks rather than aiming for an accurate representation of visual reality.
- FIGURATIVE (str):
Art that represents recognizable subjects, particularly the human figure, focusing on real-world forms.
- LANDSCAPE (str):
Art that depicts natural scenes, often focusing on the beauty and atmosphere of the environment.
- RELIGION (str):
Art that conveys spiritual themes or depicts subjects related to faith, spirituality, and religious practices.
- MYTHOLOGY (str):
Art that illustrates or interprets themes, characters, and stories from myths and legends, often exploring cultural beliefs and narratives.
- POSTERS (str):
Art designed for display and promotion, often featuring bold imagery and text to convey a message or advertise events, products, or causes.
- ANIMALS (str):
Art that focuses on the representation of animals, capturing their form, behavior, and characteristics.
- ILLUSTRATION (str):
Art that creates images to accompany text or convey a narrative, often found in books, magazines, and advertising.
- STILL_LIFE (str):
Art that depicts inanimate objects, such as fruits, flowers, and everyday items.
- BOTANICAL (str):
Art that focuses on the representation of plants and flowers, often emphasizing accuracy and detail to depict their beauty and scientific characteristics.
- DRAWINGS (str):
Art form created using various mediums, such as pencil, charcoal, or ink, to render images through lines and shading, often capturing ideas, sketches, or detailed representations.
- ASIAN_ART (str):
Art that represents styles and techniques from Asian cultures, often reflecting traditional themes, motifs, and cultural significance.
- ABSTRACT = 'abstract'
- ANIMALS = 'animals'
- ASIAN_ART = 'asian-art'
- BOTANICAL = 'botanical'
- DRAWINGS = 'drawings'
- FIGURATIVE = 'figurative'
- ILLUSTRATION = 'illustration'
- LANDSCAPE = 'landscape'
- MYTHOLOGY = 'mythology'
- POSTERS = 'posters'
- RELIGION = 'religion'
- STILL_LIFE = 'still-life'
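Members can be looked up by value (the category slug) or by name. The snippet re-declares the enum from the values listed above so it runs without the package; `parse_categories` is a hypothetical helper, for example for turning command-line input into the categories tuple accepted by ArtveeScraper.

```python
from enum import Enum
from typing import Tuple


class CategoryType(Enum):
    """Standalone mirror of artvee_scraper.artwork.CategoryType (values copied from the docs)."""
    ABSTRACT = "abstract"
    ANIMALS = "animals"
    ASIAN_ART = "asian-art"
    BOTANICAL = "botanical"
    DRAWINGS = "drawings"
    FIGURATIVE = "figurative"
    ILLUSTRATION = "illustration"
    LANDSCAPE = "landscape"
    MYTHOLOGY = "mythology"
    POSTERS = "posters"
    RELIGION = "religion"
    STILL_LIFE = "still-life"


def parse_categories(spec: str) -> Tuple[CategoryType, ...]:
    """Turn 'abstract, still-life' into members; unknown values raise ValueError."""
    return tuple(CategoryType(part.strip()) for part in spec.split(",") if part.strip())
```

Looking up an unknown value such as CategoryType("cubism") raises ValueError, so invalid input fails before any scraping starts.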
- class artvee_scraper.artwork.Image(source_url=None, width=0, height=0, file_size=0, file_size_unit=None, raw=None, format_name='jpg')
Bases: ImageMetadata
Represents a graphical image.
- Attributes:
- raw (bytes, optional):
The raw binary data of the image. Defaults to None
- format_name (str, optional):
The image file format; identifies how the image should be processed or displayed. Defaults to jpg.
- class artvee_scraper.artwork.ImageMetadata(source_url=None, width=0, height=0, file_size=0, file_size_unit=None)
Bases: object
Information that describes attributes of an image file.
- Attributes:
- source_url (str, optional):
The URL where the image is sourced from. Defaults to None
- width (int, optional):
The width of the image in pixels. Defaults to 0
- height (int, optional):
The height of the image in pixels. Defaults to 0
- file_size (float, optional):
The size of the image file. Defaults to 0
- file_size_unit (str, optional):
The unit of the file size (ex: “KB”, “MB”). Defaults to None
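Since file_size and file_size_unit are stored separately, comparing images by size needs a conversion step. A minimal sketch: the `...Sketch` dataclasses are hypothetical mirrors of the documented fields, and the 1024-based unit factors are an assumption (the site may use decimal units).

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: binary (1024-based) multiples; the site may use decimal units instead.
_UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


@dataclass
class ImageMetadataSketch:
    """Hypothetical mirror of ImageMetadata's documented fields and defaults."""
    source_url: Optional[str] = None
    width: int = 0
    height: int = 0
    file_size: float = 0
    file_size_unit: Optional[str] = None

    def approx_bytes(self) -> Optional[int]:
        """Estimate the file size in bytes, or None if the unit is missing or unknown."""
        if self.file_size_unit is None:
            return None
        factor = _UNIT_FACTORS.get(self.file_size_unit.upper())
        return None if factor is None else round(self.file_size * factor)


@dataclass
class ImageSketch(ImageMetadataSketch):
    """Adds the payload fields that Image layers on top of ImageMetadata."""
    raw: Optional[bytes] = None
    format_name: str = "jpg"
```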
artvee_scraper.scraper module
- class artvee_scraper.scraper.ArtveeScraper(artvee_client=<artvee_scraper.artvee_client.ArtveeClient object>, worker_threads=3, categories=(CategoryType.ABSTRACT, CategoryType.FIGURATIVE, CategoryType.LANDSCAPE, CategoryType.RELIGION, CategoryType.MYTHOLOGY, CategoryType.POSTERS, CategoryType.ANIMALS, CategoryType.ILLUSTRATION, CategoryType.STILL_LIFE, CategoryType.BOTANICAL, CategoryType.DRAWINGS, CategoryType.ASIAN_ART))
Bases: object
A web scraper that concurrently extracts artwork from Artvee. Registered callbacks are notified asynchronously for each scraped artwork, so user-defined actions can be taken.
- Attributes:
- _artvee_client (ArtveeClient):
An HTTP client for accessing artwork.
- _worker_pool (ThreadPoolExecutor):
A pool of threads to manage concurrent scraping tasks.
- _categories (Tuple[CategoryType]):
Category types to scrape.
- _boss_thread (Thread):
The coordinating (boss) thread that executes the scraping logic and delegates tasks to the workers.
- _stop_event (Event):
Signal to indicate the scraping process should be halted.
- _listener_lock (Lock):
A lock that provides thread-safe access to the listeners.
- _listeners (dict):
A dict used as a hash set of callbacks to invoke asynchronously.
- Args:
- artvee_client (ArtveeClient, optional):
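An HTTP client for accessing artwork. Defaults to an ArtveeClient with default settings; because it is a default argument value, that one instance is shared by every scraper constructed without an explicit client.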
- worker_threads (int, optional):
The number of worker threads to use for processing. Must be between 1 and 10. Defaults to 3.
- categories (Tuple[CategoryType], optional):
Category types to scrape. Defaults to all categories.
- Raises:
- ValueError:
If worker_threads is not in the range [1, 10].
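The attributes above describe a boss/worker design: one coordinating thread walks the categories and pages and hands each page to a thread pool, while a shared Event allows the crawl to be halted. The sketch below is a simplified, hypothetical version of that pattern; `MiniScraper`, its `start` and `stop` methods, and the injected fetch functions are assumptions, not the package's API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional

Listener = Callable[[object, Optional[Exception]], None]


class MiniScraper:
    """Hypothetical boss/worker skeleton; not the package's implementation."""

    def __init__(self, fetch_page_count: Callable[[str], int],
                 fetch_page: Callable[[str, int], List[object]],
                 categories: Iterable[str], worker_threads: int = 3) -> None:
        if not 1 <= worker_threads <= 10:
            raise ValueError("worker_threads must be in the range [1, 10]")
        self._fetch_page_count = fetch_page_count
        self._fetch_page = fetch_page
        self._categories = tuple(categories)
        self._worker_pool = ThreadPoolExecutor(max_workers=worker_threads)
        self._stop_event = threading.Event()
        self._boss_thread = threading.Thread(target=self._run, daemon=True)
        self._listeners: List[Listener] = []  # register before start(); no lock here

    def register_listener(self, listener: Listener) -> "MiniScraper":
        self._listeners.append(listener)
        return self

    def start(self) -> "MiniScraper":
        self._boss_thread.start()
        return self

    def stop(self) -> None:
        self._stop_event.set()  # boss stops submitting new pages

    def join(self) -> None:
        self._boss_thread.join()  # wait until the boss has submitted all pages
        self._worker_pool.shutdown(wait=True)  # then wait for queued worker tasks

    def _run(self) -> None:
        # Boss: enumerate (category, page) pairs, pages indexed from 1.
        for category in self._categories:
            for page in range(1, self._fetch_page_count(category) + 1):
                if self._stop_event.is_set():
                    return
                self._worker_pool.submit(self._work, category, page)

    def _work(self, category: str, page: int) -> None:
        # Worker: fetch one page and notify listeners per item, or once on failure.
        try:
            items = self._fetch_page(category, page)
        except Exception as exc:
            for listener in self._listeners:
                listener(None, exc)  # the real package passes a partial Artwork here
            return
        for item in items:
            for listener in self._listeners:
                listener(item, None)
```

Joining the boss thread before shutting down the pool guarantees that every page has been submitted before the pool stops accepting work.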
- deregister_listener(on_event_listener)
Deregisters a callback so that it will no longer be notified of events.
- Return type:
Self
- Args:
- on_event_listener (Callable[[Artwork, Exception | None], None]):
The callback that should no longer be notified when an event occurs.
- Returns:
self for method chaining
- join()
Blocks the calling thread until all active tasks have been completed.
- Return type:
None
- Returns:
None
- register_listener(on_event_listener)
Registers a callback to be notified of events asynchronously.
A callback may only be registered once; registering the same callback again overwrites the previous registration. On success, the callback receives a fully populated Artwork; on failure, it receives a partially populated Artwork and the associated exception.
- Return type:
Self
- Args:
- on_event_listener (Callable[[Artwork, Exception | None], None]):
A callback function that will be notified asynchronously when an event occurs.
- Raises:
- ValueError:
If the provided on_event_listener argument is not callable.
- Returns:
self for method chaining
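The documented listener behavior (a lock, a dict used as a hash set, overwrite on re-registration, ValueError for non-callables, and self returned for chaining) can be sketched as follows. `ListenerRegistry` and `notify` are hypothetical; this sketch notifies synchronously, whereas the package notifies asynchronously, and silently ignoring deregistration of an unknown callback is an assumption.

```python
import threading
from typing import Callable, Dict, Optional

Listener = Callable[[object, Optional[Exception]], None]


class ListenerRegistry:
    """Hypothetical thread-safe listener set modeled on the documented behavior."""

    def __init__(self) -> None:
        self._listener_lock = threading.Lock()
        # dict used as an insertion-ordered hash set: key = callback, value unused
        self._listeners: Dict[Listener, None] = {}

    def register_listener(self, on_event_listener: Listener) -> "ListenerRegistry":
        if not callable(on_event_listener):
            raise ValueError("on_event_listener must be callable")
        with self._listener_lock:
            # Re-registering the same callback overwrites its existing key.
            self._listeners[on_event_listener] = None
        return self

    def deregister_listener(self, on_event_listener: Listener) -> "ListenerRegistry":
        with self._listener_lock:
            self._listeners.pop(on_event_listener, None)  # assumption: no error if absent
        return self

    def notify(self, artwork: object, exc: Optional[Exception] = None) -> None:
        # Snapshot under the lock so callbacks run without holding it.
        with self._listener_lock:
            listeners = list(self._listeners)
        for listener in listeners:
            listener(artwork, exc)
```

Copying the listeners before invoking them keeps a slow or re-entrant callback (one that deregisters itself, for instance) from blocking or deadlocking other threads.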