main

Entry point for asrch

Main file for asrch. It handles CLI argument parsing and command handling for all modules.

asrch.__main__.browse(url: str = <typer.models.ArgumentInfo object>, proxy: str = <typer.models.OptionInfo object>, log: bool = <typer.models.OptionInfo object>, nodriver: bool = <typer.models.OptionInfo object>, parser: str = <typer.models.OptionInfo object>, images: bool = <typer.models.OptionInfo object>, debug: bool = <typer.models.OptionInfo object>)

Browse the web.

Parameters:
  • url (str) – The URL to start browsing from (default is “https://asrch.bitbucket.io”).

  • proxy (str, optional) – Proxy to send requests through (<ip:port>).

  • log (bool, optional) – Suppress all logs (for emacs mode).

  • nodriver (bool, optional) – Use requests and BS4 instead of selenium (faster but more detectable).

  • parser (str, optional) – HTML parser to use for nodriver. Choices: html.parser, lxml (default is html.parser).

Returns:

None

Return type:

None

Raises:

ValueError – If invalid parser option is provided.

Example:

To browse using default settings:

>>> browse()

To browse using a proxy and suppress logs:

>>> browse(proxy="127.0.0.1:8080", log=True)

To browse without using a webdriver and specify a parser:

>>> browse(nodriver=True, parser="lxml")
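
The ValueError described above can be illustrated with a small, standalone check. This is only a sketch: validate_parser and ALLOWED_PARSERS are hypothetical names, and the way asrch actually validates the option is not shown in this reference. The two allowed values are the choices listed for parser.

```python
# Hypothetical helper: asrch's real validation code is not shown in this reference.
ALLOWED_PARSERS = ("html.parser", "lxml")  # the two choices listed for `parser`


def validate_parser(parser: str = "html.parser") -> str:
    """Return `parser` unchanged if it is a documented choice, else raise ValueError."""
    if parser not in ALLOWED_PARSERS:
        raise ValueError(
            f"invalid parser {parser!r}; expected one of {', '.join(ALLOWED_PARSERS)}"
        )
    return parser


print(validate_parser("lxml"))
try:
    validate_parser("html5lib")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Checking the value before any request is made keeps a mistyped parser name from surfacing later as a less obvious parsing error.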

asrch.__main__.ccache()

Clear the browser cache.

asrch.__main__.conf()

Output the current config.

asrch.__main__.create_workspace(name: str, current_datetime: str, config: str)

asrch.__main__.ensure_ws_doesnt_exist(ws_name: str) → bool

asrch.__main__.find(url: typing.Annotated[str, <typer.models.OptionInfo object>], element: typing.Annotated[str, <typer.models.OptionInfo object>], proxy: typing.Annotated[str | None, <typer.models.OptionInfo object>] = '', header: typing.Annotated[bool, <typer.models.OptionInfo object>] = False, log: typing.Annotated[bool, <typer.models.OptionInfo object>] = False, locator: typing.Annotated[str, <typer.models.OptionInfo object>] = 'tag_name')

Find an element on a web page.

Parameters:
  • url (str) – URL to retrieve.

  • element (str) – Element to return.

  • proxy (Optional[str]) – Proxy to send the request through (<ip:port>).

  • header (bool) – Show browser header.

  • log (bool) – Suppress all logs.

  • locator (str) – Locator to find the element.

Defaults:
  • proxy – “”

  • header – False

  • log – False

  • locator – “tag_name”

Returns:

None

Return type:

None
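
To make the default “tag_name” locator concrete, the following sketch collects the text of every element with a given tag from an HTML string, using only the standard library. It is an illustration of what locating by tag name selects, not asrch's implementation, which is not shown here and may rely on a webdriver. TagCollector and find_by_tag_name are hypothetical names.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the text content of every element with a given tag name."""

    def __init__(self, tag_name: str) -> None:
        super().__init__()
        self.tag_name = tag_name
        self.matches: list[str] = []
        self._depth = 0  # greater than 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self._depth += 1
            self.matches.append("")  # start a new text buffer for this element

    def handle_endtag(self, tag):
        if tag == self.tag_name and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.matches[-1] += data  # text of nested tags is included too


def find_by_tag_name(html: str, element: str) -> list[str]:
    """Hypothetical helper: return the stripped text of each matching element."""
    collector = TagCollector(element)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.matches]


page = "<html><body><h1>Title</h1><p>First</p><p>Second <b>bold</b></p></body></html>"
print(find_by_tag_name(page, "p"))
```

Other locator strategies (for example by id or class) would need a different matching rule in handle_starttag; only the tag name case is sketched here.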

asrch.__main__.get_ws()

asrch.__main__.open_(mode: asrch.utils.constants.OpenModes = <typer.models.ArgumentInfo object>, url: str = <typer.models.ArgumentInfo object>, proxy: str = <typer.models.OptionInfo object>, header: bool = <typer.models.OptionInfo object>, browse: bool = <typer.models.OptionInfo object>, pager: bool = <typer.models.OptionInfo object>, download: bool = <typer.models.OptionInfo object>, log: bool = <typer.models.OptionInfo object>, nodriver: bool = <typer.models.OptionInfo object>, parser: str = <typer.models.OptionInfo object>, silent: bool = <typer.models.OptionInfo object>, inspect_mode: str = <typer.models.OptionInfo object>)

Open a URL and perform various operations.

Parameters:
  • mode (OpenModes) – Mode for the command.

  • url (str) – URL to retrieve.

  • proxy (str, optional) – Proxy to send the request through (<ip:port>).

  • header (bool, optional) – Show browser header.

  • browse (bool, optional) – Enable browsing (using keyboard inputs to open URLs).

  • pager (bool, optional) – Output [JS] in pager.

  • download (bool, optional) – Download all scraped images.

  • log (bool, optional) – Suppress all logs (for emacs mode).

  • nodriver (bool, optional) – Use requests and BS4 instead of selenium (faster but more detectable).

  • parser (str, optional) – HTML parser to use for nodriver. Choices: html.parser, lxml.

  • silent (bool, optional) – Output to text file instead of terminal (defaults to workspace folder).

  • inspect_mode (str, optional) – What part of the page you would like to inspect.

Note:
  • log is intended for emacs mode. It is not needed in normal CLI mode, but it can be used there.

  • inspect_mode with -M js can produce a large output (~5k line history record).
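
Several commands accept proxy in the form <ip:port>. As a rough sketch of how such a string could be turned into the proxies mapping that the requests library accepts, consider the helper below. It assumes a plain HTTP proxy and assumes the nodriver path forwards the value to requests in this way; neither is confirmed by this reference, and proxies_from_option is a hypothetical name.

```python
def proxies_from_option(proxy: str = "") -> dict[str, str]:
    """Turn an '<ip:port>' string into a requests-style proxies mapping."""
    if not proxy:
        return {}  # an empty string is treated as "no proxy"
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <ip:port>, got {proxy!r}")
    # Assumption: the proxy speaks plain HTTP for both http and https traffic.
    url = f"http://{host}:{port}"
    return {"http": url, "https": url}


print(proxies_from_option("127.0.0.1:8080"))
```

The returned dictionary has the shape expected by the proxies argument of requests.get, so it could be passed along unchanged in a requests-based fetch.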

asrch.__main__.search(query: str = <typer.models.ArgumentInfo object>, proxy: str = <typer.models.OptionInfo object>, browse: bool = <typer.models.OptionInfo object>, header: bool = <typer.models.OptionInfo object>, log: bool = <typer.models.OptionInfo object>)

Perform a search operation for the given query.

Parameters:
  • header (bool) – Show browser header.

  • proxy (str) – Proxy to send the request through (<IP:port>) [optional].

  • log (bool) – Toggle log message visibility (default: False). This flag is intended for the emacs plugin rather than the normal CLI mode, but it can be used there as well.

asrch.__main__.workspace(initialize: bool = <typer.models.OptionInfo object>, create: bool = <typer.models.OptionInfo object>, config: str = <typer.models.OptionInfo object>, name: str = <typer.models.OptionInfo object>, delete: bool = <typer.models.OptionInfo object>)

Perform operations related to workspaces. Depending on the options provided, this function can initialize, create, delete, or perform other actions related to workspaces.

Parameters:
  • initialize (bool) – Flag to initialize workspace.

  • create (bool) – Flag to create workspace.

  • config (str) – The name of the workspace to act on.

  • name (str) – The name of the workspace to act on.

  • delete (bool) – Flag to delete workspace.
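
The create path can be pictured as "check that the name is free, then create it". The sketch below shows that pattern with the standard library. It is an assumption-heavy illustration: workspace_is_free and make_workspace are hypothetical stand-ins loosely modelled on the ensure_ws_doesnt_exist and create_workspace signatures, and the root directory, the workspace.txt file, and the meaning of the boolean return value are all invented for the example.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def workspace_is_free(root: Path, ws_name: str) -> bool:
    """Return True when no workspace directory named `ws_name` exists under `root`."""
    return not (root / ws_name).exists()


def make_workspace(root: Path, name: str, current_datetime: str, config: str) -> Path:
    """Create the workspace directory and record when and how it was created."""
    if not workspace_is_free(root, name):
        raise FileExistsError(f"workspace {name!r} already exists")
    ws_dir = root / name
    ws_dir.mkdir(parents=True)
    # Hypothetical metadata file; the real on-disk layout is not documented here.
    (ws_dir / "workspace.txt").write_text(
        f"created={current_datetime}\nconfig={config}\n", encoding="utf-8"
    )
    return ws_dir


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    stamp = datetime.now().isoformat(timespec="seconds")
    created = make_workspace(root, "demo", stamp, "default")
    print(created.name, workspace_is_free(root, "demo"))
```

Refusing to overwrite an existing workspace, rather than silently reusing it, is one reasonable design choice for a create flag; the actual behaviour of asrch may differ.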