asrch.commands._open package
Usage examples
Open Command Usage Examples
Open text from URL:
$ asrch open text --url https://scrapethissite.com/simple/pages
Output: [body text of the specified URL]
Browser mode (new in 7.4.0):
$ asrch open text --browse https://scrapethissite.com/simple/pages
Output: [example]
>>> Body text
[body text example]
[page text example last line]
>>> (1) https://example.com/page1, (2) https://example.com/page2, (3) https://example.com/page3,
>>> [q]uit, [h]istory ([<] back in history 1) or Enter page number: 3
[q]: quit program
[h]: navigate through your session history, e.g., h1 (first element in history), h2 (second element), h3 (third)
<: go to last opened page in history
Enter page number: go to new page from output above
>>> [q]uit, [h]istory ([<] back in history 1) or Enter page number: 3
>>> Navigating to https://example.com/page3
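The prompt grammar shown in the browser-mode session (q, h followed by a number, <, or a page number) can be sketched as a small parser. This is a hypothetical illustration, not asrch's implementation: the function name `parse_browse_input`, its `link_count` parameter, and the returned tuples are all assumptions made for the example.

```python
from __future__ import annotations


def parse_browse_input(raw: str, link_count: int) -> tuple[str, int | None]:
    """Classify one line typed at the browse prompt (hypothetical helper).

    Returns an (action, index) pair; index is 1-based where it applies.
    """
    text = raw.strip().lower()
    if text == "q":
        return ("quit", None)
    if text == "<":
        # Go back to the last opened page in the session history.
        return ("back", None)
    if text.startswith("h") and text[1:].isdecimal() and int(text[1:]) >= 1:
        # h1 is the first history entry, h2 the second, and so on.
        return ("history", int(text[1:]))
    if text.isdecimal() and 1 <= int(text) <= link_count:
        # A bare number picks one of the links listed in the output.
        return ("open", int(text))
    return ("invalid", None)


links = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]
action, index = parse_browse_input("3", len(links))
if action == "open" and index is not None:
    print(f"Navigating to {links[index - 1]}")
```

Returning a plain (action, index) pair keeps the input handling separate from the navigation itself, so the prompt loop can decide what to do with each action.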
Subpackages
Module contents
open command
- asrch.commands._open.get_html(url: str, proxy: str | None, header: bool = False, log=True) -> str
Get the HTML source code of a web page.
- Parameters:
url (str) – The URL of the web page.
proxy (str, optional) – Proxy to be used for the request, defaults to None.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.
- Returns:
The HTML source code of the web page.
- Return type:
str
- asrch.commands._open.get_image(url: str | None, proxy: str | None, header: bool, download: bool | None = False, workspace: str | None = None) -> collections.abc.Generator[str, None, None]
Get the image from a URL.
- Parameters:
url – The URL of the image. If None, returns an empty generator.
proxy – Proxy to be used for the request, defaults to None.
header – Flag indicating whether to include headers in the request.
download – Flag indicating whether to download the image, defaults to False.
- Raises:
ValueError – If the URL is empty.
NoSuchElementException – If the element can’t be found.
ElementNotVisibleException – If the element isn’t visible in the DOM.
StaleElementReferenceException – If the element can no longer be accessed.
- Returns:
A generator yielding the image content as strings.
- asrch.commands._open.get_js(url: str, proxy: str | None, header: bool = False, log=True) -> collections.abc.Generator[str, None, None]
Get the JavaScript content from a web page.
- Parameters:
url (str) – The URL of the web page.
proxy (str, optional) – Proxy to be used for the request, defaults to None.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.
- Returns:
A generator yielding the JavaScript content as strings.
- Return type:
Generator[str, None, None]
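Because get_js returns a generator, callers consume the scripts lazily, one string at a time. The sketch below shows that calling pattern with a stand-in: `iter_inline_scripts` is a hypothetical helper built on the standard library's `html.parser`. It is an assumption about the general shape of the result, not asrch's implementation.

```python
from collections.abc import Generator
from html.parser import HTMLParser


class _ScriptCollector(HTMLParser):
    """Collect the text found inside each <script> element."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: list[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def iter_inline_scripts(html: str) -> Generator[str, None, None]:
    """Yield the body of every non-empty inline script (hypothetical stand-in)."""
    collector = _ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        if script.strip():
            yield script.strip()


page = "<html><script>var a = 1;</script><p>hi</p><script src='x.js'></script></html>"
for js in iter_inline_scripts(page):
    print(js)
```

Wrapping the call in `list(...)` collects every script at once when lazy iteration is not needed.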
- asrch.commands._open.get_page(url: str | int, header: bool = False, proxy: str | None = None, log=True, parser='html.parser', browse: bool = False) -> str | int
Get the content of a web page.
- Parameters:
url (str | int) – The URL of the web page.
header (bool, optional) – Flag indicating whether to include headers in the request, defaults to False.
proxy (Optional[str], optional) – Proxy to be used for the request, defaults to None.
log (bool, optional) – Flag indicating whether to log actions, defaults to True.
parser (str, optional) – The parser to use for BeautifulSoup, defaults to ‘html.parser’.
browse (bool, optional) – Flag indicating whether to enter browse mode, defaults to False.
- Raises:
ValueError – If the URL is empty.
Exception – If there are issues with the request.
- Returns:
The content of the web page if successful, otherwise an error code.
- Return type:
Union[str, int]
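Since get_page returns either the page content or an error code, callers need to branch on the type of the result. The following is a minimal sketch of that check, using a stand-in `fake_get_page` in place of the real function. The stand-in, the helper `describe_result`, and the code 404 are assumptions made for illustration; the documentation does not say which integer codes are returned.

```python
from __future__ import annotations


def fake_get_page(url: str) -> str | int:
    """Stand-in for get_page: text on success, an int code otherwise."""
    if not url:
        # Mirrors the documented ValueError for an empty URL.
        raise ValueError("URL must not be empty")
    if url.endswith("/missing"):
        return 404  # assumed error code, for illustration only
    return "Body text"


def describe_result(result: str | int) -> str:
    """Turn a str | int result into a short status message."""
    if isinstance(result, int):
        return f"request failed with code {result}"
    return f"received {len(result)} characters"


print(describe_result(fake_get_page("https://example.com/page1")))
print(describe_result(fake_get_page("https://example.com/missing")))
```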
- asrch.commands._open.get_screenshot(url: str, proxy: str | None, header: bool = False, log=True) -> None
Take a screenshot of a web page.
- Parameters:
url (str) – The URL of the web page.
proxy (str, optional) – Proxy to be used for the request, defaults to None.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.
- asrch.commands._open.highlight_elements(soup)
Highlight all elements in a BeautifulSoup object.
- asrch.commands._open.inspect(url: str, proxy: str | None, header: bool = False, log=True, parser='html.parser') -> str
Get the HTML source code of a web page.
- Parameters:
url (str) – The URL of the web page.
proxy (str, optional) – Proxy to be used for the request, defaults to None.
header (bool) – Flag indicating whether to include headers in the request, defaults to False.
log (bool) – Flag indicating whether to log actions, defaults to True.
- Returns:
The HTML source code of the web page.
- Return type:
str