src.checks package¶

Submodules¶

src.checks.checked_in_binaries module¶

Check that determines the file type for every file in the project and compares it to a blacklist of binary executable file formats

class src.checks.checked_in_binaries.FactHelperFileFileType(ft: Dict[str, str])[source]¶

Bases: FileTypeInterface

Parameters:: ft (Dict[str, str]) –

__init__(ft: Dict[str, str]) → None[source]¶

Parameters:: ft (Dict[str, str]) –
Return type:: None

_key() → Tuple[Hashable, ...][source]¶

Used to decide equality in comparisons and hash based lookups

Return type:: Tuple[Hashable, …]

__module__¶

class src.checks.checked_in_binaries.FactHelperFile[source]¶

Bases: FileTypeToolInterface

file_type_of(file: Path) → FactHelperFileFileType[source]¶

Parameters:: file (Path) –
Return type:: FactHelperFileFileType

__module__¶

class src.checks.checked_in_binaries.CheckedInBinaries(*args, **kwargs)[source]¶

Bases: CheckInterface

Represents a check that determines the file type for every file in the project and compares it to a blacklist of binary executable file formats

blacklist_dir: Path¶

exclude: Pattern¶

blacklist: Set[FileTypeInterface]¶

whitelist: Set[FileTypeInterface]¶

fileTypeTools: List[Type[FileTypeToolInterface]]¶

__init__(*args, **kwargs)[source]¶

__update_blacklist() → None¶

Return type:: None

__init_blacklist() → None¶

Return type:: None

__is_too_generic(file_type: FileTypeInterface) → bool¶

Parameters:: file_type (FileTypeInterface) –
Return type:: bool

_run_all_tools() → None[source]¶

For each available tool or library the set of detected file types is determined. All files with illegal file types are recorded.

Return type:: None

_format_findings() → Dict[str, Dict[str, List[str]]][source]¶

Return type:: Dict[str, Dict[str, List[str]]]

__format_findings(findings: Dict[FileTypeInterface, List[Path]]) → Dict[str, List[str]]¶

Parameters:: findings (Dict[FileTypeInterface, List[Path]]) –
Return type:: Dict[str, List[str]]

_is_ok(file_type: FileTypeInterface) → bool[source]¶

Parameters:: file_type (FileTypeInterface) –
Return type:: bool

_calc_score() → float[source]¶

Return type:: float

_determine_violations()[source]¶

run(args_dict: Dict[str, Any] | None = None) → Dict[str, Any][source]¶

Parameters:: args_dict (Dict[str, Any] | None) –
Return type:: Dict[str, Any]

__annotations__¶

__module__¶

src.checks.comments_in_code module¶

Implementation of the “Comments in Code” check

class src.checks.comments_in_code.CommentsInCode(*args: Any, **kwargs: Dict[str, Any])[source]¶

Bases: CheckInterface

Implementation of the “Comments in Code” check

This check essentially just runs tokei and divides comment lines by combined comment and code lines. There is some additional logic to handle programming languages.
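
The ratio described above can be sketched as follows. This is a minimal illustration rather than the check's actual code: the function name `comment_ratio` and the treatment of the empty case are assumptions.

```python
def comment_ratio(comment_lines: int, code_lines: int) -> float:
    """Share of comment lines among all comment and code lines."""
    total = comment_lines + code_lines
    if total == 0:
        # Assumption: a language without any counted lines contributes 0.0.
        return 0.0
    return comment_lines / total


print(comment_ratio(25, 75))
```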

Parameters:

args (Any) –
kwargs (Dict[str, Any]) –

__init__(*args: Any, **kwargs: Dict[str, Any]) → None[source]¶

Parameters:

args (Any) –
kwargs (Dict[str, Any]) –

Return type:

None

__load_tokei_to_linguist() → Dict[str, str | None]¶

Return type:: Dict[str, str | None]

__compute_l_check(tokei_to_linguist: Dict[str, str | None]) → Set[str]¶

Compute L_check via relation L_check = Im(A) \ {None}

Parameters:: tokei_to_linguist (Dict[str, str | None]) –
Return type:: Set[str]
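
Read as "the image of the tokei-to-linguist map with None removed", the relation can be sketched like this. The function name and the sample mapping are hypothetical and only serve the illustration.

```python
from typing import Dict, Optional, Set


def compute_l_check(tokei_to_linguist: Dict[str, Optional[str]]) -> Set[str]:
    """Collect every mapped language name, skipping entries mapped to None."""
    return {name for name in tokei_to_linguist.values() if name is not None}


# Hypothetical mapping: one entry has no linguist counterpart.
mapping = {"Python": "Python", "Rust": "Rust", "Plain Text": None}
print(sorted(compute_l_check(mapping)))
```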

__have_tokei() → bool¶

Return type:: bool

__fetch_linguist() → Dict[str, float]¶

Return type:: Dict[str, float]

__compute_l_repo(l_of_r: Set[str], l_check: Set[str]) → Set[str]¶

Parameters:

l_of_r (Set[str]) –
l_check (Set[str]) –

Return type:

Set[str]

__run_tokei() → Dict[str, Dict[str, int]]¶

Return type:: Dict[str, Dict[str, int]]

_tokei(lang: str) → float[source]¶

Map that takes a language in L_repo to its comments to code ratio

Parameters:: lang (str) –
Return type:: float

_sigma(value: float) → float[source]¶

Scoring function that receives the average comments to code ratio as an input and maps it to the final score.

Needed since we cannot expect a project to be 100% comments to receive a perfect score

Parameters:: value (float) –
Return type:: float

_compute_tokei() → Dict[str, float][source]¶

Returns:: The computed tokei map
Return type:: Dict[str, float]

_compute_score(lang_ratios: Dict[str, float]) → float[source]¶

Parameters:: lang_ratios (Dict[str, float]) –
Return type:: float

run(args_dict: Dict[str, Any] | None = None) → Dict[str, Any][source]¶

Parameters:: args_dict (Dict[str, Any] | None) –
Return type:: Dict[str, Any]

__annotations__¶

__module__¶

src.checks.existence_of_documentation_infrastructure module¶

This module contains the implementation of the Existence of Documentation Infrastructure check.

The check performs a bunch of heuristics to estimate the amount of documentation that exists for the piece of software developed in a repository.

src.checks.existence_of_documentation_infrastructure.logger: Logger¶: Def: “Plain in-tree documentation” is defined as software documentation that is directly managed by the git vcs. In particular, no further steps are necessary to obtain the final form of the documentation after a checkout of the repository has been performed.

class src.checks.existence_of_documentation_infrastructure.PlainInTreeFile(repo: Repo, api: Project)[source]¶

Bases: DocumentationTypeInterface

Def: Plain in-tree documentation is said to be “file” if it is contained within plain text files that are placed in the same (sub)tree as non-documentation related files.

In practice this check does two things:

1. It has a simple whitelist of file names that are automatically considered to contain documentation when they are found in the repository.
2. It searches text files in the repository for links that point to files in the repository itself. It then uses those links to find the files locally.

Finally, the set of all these files (set is deduplicated) is used to compute the amount of documentation.

Parameters:

repo (Repo) –
api (Project) –

DOC_FILE_NAME_WHITELIST: Set[str]¶

Generated on a local dump of OpenCoDE using the following shell loop (with some manual curation):

    for f in $(fd -t f -i --regex '.(md|txt|rst)$'); do
        b=$(basename "$f" | rg -vi '(license|changelog|security|contrib|test|release|conduct)')
        [ ! -z "$b" ] && echo "$b"
    done | sort | uniq -c | sort -n | tail -n 100

_url_to_file(url: str) → Path | None[source]¶

Tries its very best to convert a url (probably a link to a file in the remote GitLab repository) to a local file in our checkout. It is kinda important to keep in mind that the input is entirely untrusted and path traversal issues must be avoided (even though we would only open the file and report its number of characters).

Parameters:: url (str) – url that points to some file in the remote repo
Returns:: local copy of the file
Return type:: Path | None
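
One way to guard against path traversal when mapping an untrusted URL to a local file is sketched below. It is not the method's actual implementation: the function name, the `repo_root` parameter, and the `/-/blob/<ref>/` URL layout are assumptions, and refs that contain a slash are not handled. The sketch requires Python 3.9 or newer for `Path.is_relative_to`.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlparse


def url_to_file(url: str, repo_root: Path, blob_marker: str = "/-/blob/") -> Optional[Path]:
    """Map a repository URL to a file inside repo_root, or return None."""
    path = unquote(urlparse(url).path)
    if blob_marker not in path:
        return None
    # Everything after the marker is "<ref>/<path in repo>"; drop the ref.
    _, _, rest = path.partition(blob_marker)
    _, _, relative = rest.partition("/")
    if not relative:
        return None
    root = repo_root.resolve()
    candidate = (root / relative).resolve()
    # Reject anything that escapes the checkout, for example via "..".
    if not candidate.is_relative_to(root):
        return None
    return candidate if candidate.is_file() else None
```

Resolving the candidate before the containment check means that encoded dot segments and symlinks pointing outside the checkout are rejected as well.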

_find_doc_files_from_links() → Set[Path][source]¶

Checks for links to documentation that point back to the repository itself, both in the publiccode.yml and in text files. Then it tries to find the respective files locally.

Returns:: Set of local documentation files that were found in that way.
Return type:: Set[Path]

static _doc_file_filter(file_name: str) → bool[source]¶

Used to filter out all non-documentation files when iterating over a repository.

Parameters:: file_name (str) – name of the file to decide
Returns:: True iff file should be skipped
Return type:: bool

delta() → Tuple[float, int][source]¶

Restriction of the delta map to the documentation type represented by the implementor and the repository specified during the construction of this instance.

Returns:: confidence in the result, and amount of documentation detected
Return type:: Tuple[float, int]

__annotations__¶

__module__¶

class src.checks.existence_of_documentation_infrastructure.PlainInTreeFolder(repo: Repo, api: Project)[source]¶

Bases: DocumentationTypeInterface

Def: Plain in-tree documentation is said to be “folder” if there exists a subtree that is solely comprised of documentation.

In practice, we simply look for folders that are named something like *doc*. We then recursively count characters in this subtree (text files only).

Parameters:

repo (Repo) –
api (Project) –

DOC_FOLDER_RE¶: used to match directory names that contain documentation

classmethod _doc_dir_predicate(dir_name: str) → bool[source]¶

Parameters:: dir_name (str) – name of the directory to decide
Returns:: True iff the directory name indicates that the directory holds documentation.
Return type:: bool
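
A predicate of this kind could look roughly like the sketch below. The actual DOC_FOLDER_RE is not shown in this reference, so the pattern used here is an assumption.

```python
import re

# Hypothetical pattern: any directory name containing "doc", case-insensitive.
DOC_FOLDER_RE = re.compile(r"doc", re.IGNORECASE)


def doc_dir_predicate(dir_name: str) -> bool:
    """Return True if the directory name suggests it holds documentation."""
    return DOC_FOLDER_RE.search(dir_name) is not None


print([d for d in ["docs", "Documentation", "src"] if doc_dir_predicate(d)])
```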

_find_doc_dirs() → Iterable[Path][source]¶

Returns all subtrees that are likely to hold only documentation.

Return type:: Iterable[Path]

_count_docs() → int[source]¶

Returns:: number of non-whitespace characters in some kinds of text files that live within directories that may contain documentation.
Return type:: int

delta() → Tuple[float, int][source]¶

Restriction of the delta map to the documentation type represented by the implementor and the repository specified during the construction of this instance.

Returns:: confidence in the result, and amount of documentation detected
Return type:: Tuple[float, int]

__annotations__¶

__module__¶

class src.checks.existence_of_documentation_infrastructure.OutOfTreeExternal(repo: Repo, api: Project)[source]¶

Bases: DocumentationTypeInterface

Def: “External out-of-tree documentation” is defined as any software documentation that can not be generated from the contents of the source code repository of the software and is not integrated into a management software that is wrapping the git repository. For example, this includes manually curated documentation that is hosted on an external website.

In practice, this check does two things:

1. Regex: It searches links with something like “docs” in their preview text within some kinds of text files. It then takes all links that are not pointing at the repository itself and marks them as external documentation (with low confidence).
2. It searches the ‘publiccode.yml’ and checks for keys that point at documentation. If they do not point at the project, it counts them as external documentation with high confidence.

As we can not (and don’t want to) scrape the referenced websites in some way, the “amount” of documentation behind an external link is just a hard-coded value.

Parameters:

repo (Repo) –
api (Project) –

HARD_CODED_SCORE: float¶: If some docs are found, the amount returned by the delta method will always evaluate to this score.

_get_amount() → int[source]¶

Generates a value for the amount of documentation that was found. Since we do not want to scrape websites this is just some hard-coded value that leads to a score we like.

Returns:: The amount that leads to the hard-coded score
Return type:: int

delta() → Tuple[float, int][source]¶

Restriction of the delta map to the documentation type represented by the implementor and the repository specified during the construction of this instance.

Returns:: confidence in the result, and amount of documentation detected
Return type:: Tuple[float, int]

__annotations__¶

__module__¶

class src.checks.existence_of_documentation_infrastructure.OutOfTreeWiki(repo: Repo, api: Project)[source]¶

Bases: DocumentationTypeInterface

Def: “Wiki out-of-tree documentation” is defined as any software documentation that is integrated into the management software that wraps the software’s git repository. In particular, this encompasses documentation that was generated from the source code repository and then made available via this method.

In the case of OpenCoDE, we check the Wiki pages of a project.

API: https://python-gitlab.readthedocs.io/en/stable/gl_objects/wikis.html

Parameters:

repo (Repo) –
api (Project) –

_fetch_wiki_pages() → Dict[str, int][source]¶

Returns:: mapping of wiki page names to number of non-whitespace characters on the page
Return type:: Dict[str, int]

delta() → Tuple[float, int][source]¶

Restriction of the delta map to the documentation type represented by the implementor and the repository specified during the construction of this instance.

Returns:: confidence in the result, and amount of documentation detected
Return type:: Tuple[float, int]

__annotations__¶

__module__¶

class src.checks.existence_of_documentation_infrastructure.ExistenceOfDocumentationInfrastructure(proj: Project, repo: Repo, api: Gitlab)[source]¶

Bases: CheckInterface

Implementation of the Existence of Documentation Infrastructure check.

The class only contains the high-level logic of this check. It computes the mapping delta with the help of the specialized documentation type classes and then performs the score calculation based on that.

Parameters:

proj (Project) –
repo (Repo) –
api (Gitlab) –

DEFAULT_RISE: float¶: 0.5.

doc_types: List[Type[DocumentationTypeInterface]]¶: Specialized classes for detecting and counting the different kinds of documentation that we defined.

static _sigma(x: int, a: float = 9.967226258835993) → float[source]¶

Parameters:

x (int) – amount in [0, +infty)
a (float) –

Returns:

score in [0,1)

Return type:

float

static sigma_inv(y: float, a: float = 9.967226258835993) → float[source]¶

Parameters:

y (float) – score in [0,1)
a (float) –

Returns:

amount in [0, +infty)

Return type:

float

static _sigma_inv_2(x: int, y: float) → float[source]¶

Parameters:

x (int) – amount in [0, +infty)
y (float) – score in [0,1)

Returns:

exponent such that y = sigma(x)

Return type:

float

_compute_delta() → List[Tuple[str, float, int]][source]¶

Returns:: Pre-computed mapping delta for the current repository.
Return type:: List[Tuple[str, float, int]]

_score(delta: List[Tuple[str, float, int]]) → float[source]¶

Returns:: The final score of the current repository.
Parameters:: delta (List[Tuple[str, float, int]]) –
Return type:: float

_detailed_results(delta: List[Tuple[str, float, int]]) → Dict[str, List[Dict[str, Any]]][source]¶

Returns:: Check specific results with some details about the different kinds of documentation that were detected.
Parameters:: delta (List[Tuple[str, float, int]]) –
Return type:: Dict[str, List[Dict[str, Any]]]

__annotations__¶

__module__¶

run(args_dict: Dict[str, Any] | None = None) → Dict[str, Any][source]¶

Parameters:: args_dict (Dict[str, Any] | None) –
Return type:: Dict[str, Any]

src.checks.interfaces_checked_in_binaries module¶

class src.checks.interfaces_checked_in_binaries.FileTypeInterface[source]¶

Bases: object

Represents a recognized file type

__init__()[source]¶

_key() → Tuple[Hashable, ...][source]¶

Used to decide equality in comparisons and hash based lookups

Return type:: Tuple[Hashable, …]

__repr__() → str[source]¶

Return repr(self).

Return type:: str

__str__() → str[source]¶

Return str(self).

Return type:: str

__hash__() → int[source]¶

Return hash(self).

Return type:: int

__eq__(other) → bool[source]¶

Return self==value.

Return type:: bool
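
The usual pattern behind a `_key()` method is to route both `__hash__` and `__eq__` through the same tuple. The sketch below shows that pattern with a hypothetical class and hypothetical fields; it is not the interface's actual code.

```python
from typing import Hashable, Tuple


class KeyedFileType:
    """Hypothetical stand-in for a FileTypeInterface implementation."""

    def __init__(self, mime: str, description: str) -> None:
        self.mime = mime
        self.description = description

    def _key(self) -> Tuple[Hashable, ...]:
        # The tuple that defines identity for comparisons and hashing.
        return (self.mime, self.description)

    def __hash__(self) -> int:
        return hash(self._key())

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, KeyedFileType):
            return NotImplemented
        return self._key() == other._key()


a = KeyedFileType("application/x-executable", "ELF")
b = KeyedFileType("application/x-executable", "ELF")
print(a == b, len({a, b}))
```

Because equal objects hash identically, instances can be stored in sets such as a blacklist or whitelist and looked up by value.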

__annotations__¶

__dict__¶

__module__¶

__weakref__¶: list of weak references to the object (if defined)

class src.checks.interfaces_checked_in_binaries.FileTypeToolInterface[source]¶

Bases: object

Represents a tool that implements the mapping file -> file_type

__init__()[source]¶

class property name: str¶

__repr__() → str[source]¶

Return repr(self).

Return type:: str

__str__() → str[source]¶

Return str(self).

Return type:: str

file_type_of(file: Path) → FileTypeInterface[source]¶

Parameters:: file (Path) –
Return type:: FileTypeInterface

__annotations__¶

__dict__¶

__module__¶

__weakref__¶: list of weak references to the object (if defined)

src.checks.interfaces_existence_of_documentation_infrastructure module¶

This module contains the interfaces and common functionality used by the Existence of Documentation Infrastructure check.

class src.checks.interfaces_existence_of_documentation_infrastructure.DocumentationTypeInterface(repo: Repo, api: Project)[source]¶

Bases: Named

Abstracts over the different kinds of documentation that a project might have. The business logic for finding and scoring documentation is in the implementing classes, this interface is used by the main check class to compute the final score.

The class also contains some helpers for common operations.

Parameters:

repo (Repo) –
api (Project) –

TEXT_FILE_REGEX: Pattern[str]¶: used to filter files that are likely not plain text

LINK_PATTERN: Pattern[str]¶: used to find markdown links to documentation

class PubbliccodeymlDocLink(type, url)¶

Bases: tuple

returned by methods that collect links to documentation

Parameters:

type (str) –
url (str) –

__annotations__¶

__getnewargs__()¶: Return self as a plain tuple. Used by copy and pickle.

__match_args__¶

__module__¶

static __new__(_cls, type: str, url: str)¶

Create new instance of PubbliccodeymlDocLink(type, url)

Parameters:

type (str) –
url (str) –

__repr__()¶: Return a nicely formatted representation string

__slots__¶

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults¶

_fields¶

classmethod _make(iterable)¶: Make a new PubbliccodeymlDocLink object from a sequence or iterable

_replace(**kwds)¶: Return a new PubbliccodeymlDocLink object replacing specified fields with new values

type: str¶: Alias for field number 0

url: str¶: Alias for field number 1

class ScrapedDocLink(file, preview, url)¶

Bases: tuple

Parameters:

file (str) –
preview (str) –
url (str) –

__annotations__¶

__getnewargs__()¶: Return self as a plain tuple. Used by copy and pickle.

__match_args__¶

__module__¶

static __new__(_cls, file: str, preview: str, url: str)¶

Create new instance of ScrapedDocLink(file, preview, url)

Parameters:

file (str) –
preview (str) –
url (str) –

__repr__()¶: Return a nicely formatted representation string

__slots__¶

_asdict()¶: Return a new dict which maps field names to their values.

_field_defaults¶

_fields¶

classmethod _make(iterable)¶: Make a new ScrapedDocLink object from a sequence or iterable

_replace(**kwds)¶: Return a new ScrapedDocLink object replacing specified fields with new values

file: str¶: Alias for field number 0

preview: str¶: Alias for field number 1

url: str¶: Alias for field number 2

RM_WHITESPACE_MAP: Dict[int, Literal[None]]¶

__init__(repo: Repo, api: Project) → None[source]¶

Parameters:

repo (Repo) –
api (Project) –

Return type:

None

_is_external_url(url: str | None) → bool[source]¶

Checks if a link points to a target outside of OpenCoDE.

Parameters:: url (str | None) – url to decide
Returns:: True iff the url does not point to OpenCoDE
Return type:: bool

_docs_in_publiccodeyml(only_external: bool = False, only_internal: bool = False) → List[PubbliccodeymlDocLink][source]¶

Checks if the publiccode.yaml exists, and if it does, whether it contains links to documentation. Optionally returns only links that point back to the project itself, or only links that point to a URL outside of OpenCoDE.

Returns:

Tuples of (documentation type, link target) for all doc links that were found.

Parameters:

only_external (bool) –
only_internal (bool) –

Return type:

List[PubbliccodeymlDocLink]

_collect_doc_links(only_external: bool = False, only_internal: bool = False) → List[ScrapedDocLink][source]¶

Scans some kinds of text files in the repository for links that have something like *docs* in the preview text. Optionally returns only links that point back to the project itself, or only links that point to a URL outside of OpenCoDE.

Parameters:

only_internal (bool) – return only links that point back to the project itself
only_external (bool) – return only links that point to a location outside of OpenCoDE

Returns:

Tuples of (file name, link preview text, link target) for all doc links that were found.

Return type:

List[ScrapedDocLink]
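
The scan for documentation links can be sketched as below. The regular expressions are assumptions, since the actual LINK_PATTERN is not shown in this reference, and the function takes the file content as a string to stay self-contained.

```python
import re
from typing import List, NamedTuple


class ScrapedDocLink(NamedTuple):
    file: str
    preview: str
    url: str


# Assumed patterns for markdown links and for "docs"-like preview text.
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
DOC_HINT = re.compile(r"doc", re.IGNORECASE)


def collect_doc_links(file_name: str, text: str) -> List[ScrapedDocLink]:
    """Return all markdown links whose preview text hints at documentation."""
    links = []
    for preview, url in MARKDOWN_LINK.findall(text):
        if DOC_HINT.search(preview):
            links.append(ScrapedDocLink(file_name, preview, url))
    return links


sample = "See the [API docs](https://example.org/api) and the [issue tracker](https://example.org/issues)."
print(collect_doc_links("README.md", sample))
```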

_amount(files: Iterable[Path]) → int[source]¶

Returns:: total number of non-whitespace characters in files.
Parameters:: files (Iterable[Path]) –
Return type:: int

classmethod _text_file_filter(file_name: str) → bool[source]¶

Parameters:: file_name (str) –
Return type:: bool

_get_publiccodeyml() → Dict[str, Any] | None[source]¶

Try to find and parse the project’s publiccode.yaml.

Returns:: a mapping that contains the parsed file
Return type:: Dict[str, Any] | None

_remove_whitespace(s: str) → str[source]¶

Returns:: input string with all whitespace characters removed
Parameters:: s (str) –
Return type:: str
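
The type of RM_WHITESPACE_MAP (code points mapped to None) suggests a `str.translate` table. A minimal sketch under that assumption follows; the map built here covers only ASCII whitespace, which may differ from the real one.

```python
import string
from typing import Dict

# Assumption: every ASCII whitespace code point is mapped to None (deleted).
RM_WHITESPACE_MAP: Dict[int, None] = {ord(c): None for c in string.whitespace}


def remove_whitespace(s: str) -> str:
    """Delete all characters listed in the translation table."""
    return s.translate(RM_WHITESPACE_MAP)


print(len(remove_whitespace("a b\tc\n")))
```

Counting the length of the result gives the number of non-whitespace characters, which is the quantity the `_amount` helper reports.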

delta() → Tuple[float, int][source]¶

Restriction of the delta map to the documentation type represented by the implementor and the repository specified during the construction of this instance.

Returns:: confidence in the result, and amount of documentation detected
Return type:: Tuple[float, int]

__annotations__¶

__module__¶

src.checks.interfaces_sast_usage_basic module¶

src.checks.interfaces_secrets module¶

class src.checks.interfaces_secrets.SecretInterface[source]¶

Bases: object

Represents a single secret that was found by some tool

__init__()[source]¶

summarize() → dict[source]¶

Return type:: dict

__dict__¶

__module__¶

__weakref__¶: list of weak references to the object (if defined)

class src.checks.interfaces_secrets.SecretsToolInterface[source]¶

Bases: object

Represents a tool that can be applied to a project in order to discover secrets

__init__()[source]¶

class property name¶

check_file(f: Path) → None[source]¶

Checks the file ‘f’ for secrets and adds any found secrets to the internal state of the tool.

Parameters:: f (Path) –
Return type:: None

check_files(files: Iterable[Path]) → None[source]¶

Checks the ‘files’ for secrets and adds any found secrets to the internal state of the tool.

Parameters:: files (Iterable[Path]) –
Return type:: None

create_or_overwrite_baseline(project_id: int)[source]¶

Creates a new, or overwrites an existing, baseline using the tool’s internal state

Parameters:: project_id (int) –

update_baseline(project_id: int) → None[source]¶

Uses the tool’s internal state to update an existing or create a new baseline on disk. Essentially performs an intersection between the internal state and the baseline if there is one, else it just writes the internal state to disk.

Important: This method might change the tool’s internal state.

Parameters:: project_id (int) –
Return type:: None

diff_vs_baseline(project_id: int) → Iterable[SecretInterface][source]¶

The list of all secrets the tool found in a project that are not in the baseline.

Parameters:: project_id (int) –
Return type:: Iterable[SecretInterface]
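
The baseline operations described for `update_baseline` and `diff_vs_baseline` amount to set intersection and set difference. The sketch below models secrets as plain strings and baselines as in-memory sets, which is a simplification; the real interface persists baselines per project.

```python
from typing import Optional, Set


def update_baseline(found: Set[str], baseline: Optional[Set[str]]) -> Set[str]:
    """Intersect with an existing baseline, else start from the findings."""
    if baseline is None:
        return set(found)
    return found & baseline


def diff_vs_baseline(found: Set[str], baseline: Optional[Set[str]]) -> Set[str]:
    """Findings that are not covered by the baseline."""
    return found - (baseline or set())


found = {"token-a", "token-b"}
baseline = {"token-b", "token-c"}
print(update_baseline(found, baseline), diff_vs_baseline(found, baseline))
```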

delete_baseline(project_id: int) → None[source]¶

Removes any persisted baselines this tool has for the given project. No-op if there is no baseline.

Parameters:: project_id (int) –
Return type:: None

property detected_secrets: Iterable[SecretInterface]¶: The list of all secrets the tool found in a project. Essentially the tool’s internal state.

__dict__¶

__module__¶

__weakref__¶: list of weak references to the object (if defined)

src.checks.sast_usage_basic module¶

Implementation of the SastUsageBasic check

class src.checks.sast_usage_basic.SastToolKind(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶

Bases: Enum

Enumerates the different classes of SAST tools that we differentiate between in our check. Each SastTool has one and it determines “how good” it is if we detect it in a project.

LINTER¶

SECURITY¶

SECRET¶

SCA¶

classmethod weight(kind: SastToolKind) → float[source]¶

Encodes “how good” it is if we detect the tool in a project.

Parameters:: kind (SastToolKind) –
Return type:: float
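
An enum that carries a per-kind weight can be sketched as follows. The member values and the weights are hypothetical; the reference lists the members but not the numbers behind them.

```python
from enum import Enum


class ToolKind(Enum):
    """Hypothetical stand-in for SastToolKind."""

    LINTER = 1
    SECURITY = 2
    SECRET = 3
    SCA = 4

    @classmethod
    def weight(cls, kind: "ToolKind") -> float:
        # Assumed weights, chosen only for illustration.
        weights = {cls.LINTER: 0.5, cls.SECURITY: 1.0, cls.SECRET: 1.0, cls.SCA: 1.0}
        return weights[kind]


print(ToolKind.weight(ToolKind.LINTER))
```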

__module__¶

class src.checks.sast_usage_basic.SastTool(tool_json: Dict[str, Any])[source]¶

Bases: object

Represents a tool that we can hope to detect in a project.

Parameters:: tool_json (Dict[str, Any]) –

default_special_regex_values: Dict[str, Pattern[str]]¶

default_language_regex_values: Dict[str, str]¶

__init__(tool_json: Dict[str, Any])[source]¶

Parameters:: tool_json (Dict[str, Any]) – A map describing the tool; can be accessed by indexing the instance; domain is defined by the tool JSON schema

__add_source_file_regex(tool_json: Dict[str, Any]) → Dict[str, Any]¶

Adds a source_file_regex key to the input map and populates the value with a regex that should match the names of source files of the languages listed in the input map’s languages array.

Returns:: The updated map
Parameters:: tool_json (Dict[str, Any]) –
Return type:: Dict[str, Any]

__compile_regex(tool_json: Dict[str, Any]) → Dict[str, Any]¶

Replaces string values in the input dict that represent regular expressions with their compiled versions.

Returns:: The updated dict
Parameters:: tool_json (Dict[str, Any]) –
Return type:: Dict[str, Any]
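
Compiling the regex-valued entries of a tool description might look like the sketch below. Selecting keys by a `_regex` suffix is an assumption, and unlike the method described above the sketch returns a copy instead of updating its input.

```python
import re
from typing import Any, Dict


def compile_regex_values(tool_json: Dict[str, Any], suffix: str = "_regex") -> Dict[str, Any]:
    """Return a copy in which string values under regex keys are compiled."""
    compiled = dict(tool_json)
    for key, value in tool_json.items():
        if key.endswith(suffix) and isinstance(value, str):
            compiled[key] = re.compile(value)
    return compiled


# Hypothetical tool description.
tool = {"name": "example-linter", "source_file_regex": r"\.py$"}
print(compile_regex_values(tool)["source_file_regex"].search("main.py") is not None)
```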

__getitem__(index: str) → Any[source]¶

Parameters:: index (str) –
Return type:: Any

classmethod from_file_validate(schema: Dict[str, Any], file: Path) → SastTool[source]¶

Constructs an instance from a JSON file describing a tool. Validates the file against the expected schema before using it.

Parameters:

schema (Dict[str, Any]) –
file (Path) –

Return type:

SastTool

check_file(f: Path) → bool[source]¶

Returns:: True iff the file ‘f’ indicates that the SAST tool is being used in the project
Parameters:: f (Path) –
Return type:: bool

_check_file(f: Path, name_regex: Pattern[str] | None, content_regex: Pattern[str] | None = None) → bool[source]¶

Returns:

True iff a file’s name and contents match the respective regular expressions; if no content regex is supplied only the filename is checked

Parameters:

f (Path) –
name_regex (Pattern[str] | None) –
content_regex (Pattern[str] | None) –

Return type:

bool
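
The matching rule stated above (file name first, contents only if a content regex is supplied) can be sketched like this. The function name is hypothetical, the name regex is required here for simplicity, and unreadable files are treated as non-matching, which is an assumption.

```python
import re
from pathlib import Path
from typing import Optional


def file_matches(
    f: Path,
    name_regex: "re.Pattern[str]",
    content_regex: Optional["re.Pattern[str]"] = None,
) -> bool:
    """True if the file name matches and, when given, the contents match too."""
    if name_regex.search(f.name) is None:
        return False
    if content_regex is None:
        return True
    try:
        text = f.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        # Assumption: unreadable files never count as a match.
        return False
    return content_regex.search(text) is not None
```

Checking the name before opening the file keeps the scan cheap, since most files in a repository are rejected without being read.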

property weight: float¶

Not all tools are equally good. Here we supply a rather arbitrary weight to influence how much effect the presence of the tool has on the final score. The higher the weight, the more I like the tool.

Returns:: The weight

__annotations__¶

__dict__¶

__module__¶

__weakref__¶: list of weak references to the object (if defined)

class src.checks.sast_usage_basic.SastUsageBasic(*args: Any, **kwargs: Dict[str, Any])[source]¶

Bases: CheckInterface

Parameters:

args (Any) –
kwargs (Dict[str, Any]) –

exclude: re.Pattern[str]¶

__init__(*args: Any, **kwargs: Dict[str, Any]) → None[source]¶

Parameters:

args (Any) –
kwargs (Dict[str, Any]) –

Return type:

None

__load_tool_schema() → Dict[str, Any]¶

Loads the JSON schema of the tool definitions from permanent storage.

config: tool_schema

Returns:: JSON schema of a single tool
Return type:: Dict[str, Any]

__generate_tools() → None¶

Generates the individual JSON tool definitions from a CSV file that describes all of them.

config: tools_csv

effect: populates the directory config::tools_dir

note: no-op if directory is not empty

Return type:: None

__load_tools() → List[SastTool]¶

Loads the JSON tool definitions from permanent storage.

config: tools_dir

Returns:: list of tools
Return type:: List[SastTool]

__build_lang_tools() → Dict[str, List[SastTool]]¶

Constructs a mapping from programming languages to tools.

Returns:: The mapping
Return type:: Dict[str, List[SastTool]]

_detect_sast_tools() → Dict[str, List[SastTool]][source]¶

Performs the actual “analysis”. Builds map that takes programming languages to the set of SAST tools that the project uses for this language.

Returns:: The mapping
Return type:: Dict[str, List[SastTool]]

_calc_score(detected_tools: Dict[str, List[SastTool]]) → float[source]¶

Consumes the result of `_detect_sast_tools` and calculates the final score out of it.

Returns:: score
Parameters:: detected_tools (Dict[str, List[SastTool]]) –
Return type:: float

run(args_dict: Dict[str, Any] | None = None) → Dict[str, Any][source]¶

Parameters:: args_dict (Dict[str, Any] | None) –
Return type:: Dict[str, Any]

__annotations__¶

__module__¶

src.checks.secrets module¶

Implementation of the Secrets check, which attempts to find leaked secrets in a git repository. It is really just running a bunch of open source tools and collecting their output.

class src.checks.secrets.DetectSecretsSecret(potential_secret: PotentialSecret)[source]¶

Bases: SecretInterface

Parameters:: potential_secret (PotentialSecret) –

__init__(potential_secret: PotentialSecret)[source]¶

Parameters:: potential_secret (PotentialSecret) –

summarize() → dict[source]¶

Return type:: dict

__annotations__¶

__module__¶

class src.checks.secrets.DetectSecrets[source]¶

Bases: SecretsToolInterface

__init__() → None[source]¶

Return type:: None

check_file(f: Path) → None[source]¶

Checks the file ‘f’ for secrets and adds any found secrets to the internal state of the tool.

Parameters:: f (Path) –
Return type:: None

_get_baseline_dir(project_id: int) → Path[source]¶

Parameters:: project_id (int) –
Return type:: Path

_get_baseline_file(project_id: int) → Path[source]¶

Parameters:: project_id (int) –
Return type:: Path

maybe_load_baseline(project_id: int) → SecretsCollection | None[source]¶

Parameters:: project_id (int) –
Return type:: SecretsCollection | None

create_or_overwrite_baseline(project_id: int) → None[source]¶

Creates a new, or overwrites an existing, baseline using the tool’s internal state

Parameters:: project_id (int) –
Return type:: None

update_baseline(project_id: int) → None[source]¶

Uses the tool’s internal state to update an existing or create a new baseline on disk. Essentially performs an intersection between the internal state and the baseline if there is one, else it just writes the internal state to disk.

Important: This method might change the tool’s internal state.

Parameters:: project_id (int) –
Return type:: None

diff_vs_baseline(project_id: int) → Iterable[SecretInterface][source]¶

The list of all secrets the tool found in a project that are not in the baseline.

Parameters:: project_id (int) –
Return type:: Iterable[SecretInterface]

delete_baseline(project_id: int) → None[source]¶

Removes any persisted baselines this tool has for the given project. No-op if there is no baseline.

Parameters:: project_id (int) –
Return type:: None

check_files(files: Iterable[Path]) → None[source]¶

Checks the ‘files’ for secrets and adds any found secrets to the internal state of the tool.

Parameters:: files (Iterable[Path]) –
Return type:: None

property detected_secrets: Generator[SecretInterface, None, None]¶: The list of all secrets the tool found in a project. Essentially the tool’s internal state.

__annotations__¶

__module__¶

class src.checks.secrets.Secrets(proj: Project, repo: Repo, api: Gitlab)[source]¶

Bases: CheckInterface

Class which represents a check that runs a bunch of secret detection tools against a given project and spits out a ‘score’.

Parameters:

proj (Project) –
repo (Repo) –
api (Gitlab) –

exclude: Pattern¶

secretsTools: List[Type[SecretsToolInterface]]¶

_detect_secrets() → Dict[str, Iterable[SecretInterface]][source]¶

Generates the set of results that are not in the baseline, i.e., ‘R \ (R ∩ B)’, for each tool. Returns the union ‘V’ of these sets. Also updates or creates baselines along the way.

Return type:: Dict[str, Iterable[SecretInterface]]

_calc_score(detected_secrets: Dict[str, Iterable[SecretInterface]]) → float[source]¶

Parameters:: detected_secrets (Dict[str, Iterable[SecretInterface]]) –
Return type:: float

_process_args(args_dict: Dict[str, Any] | None) → None[source]¶

Parameters:: args_dict (Dict[str, Any] | None) –
Return type:: None

run(args_dict: Dict[str, Any] | None = None) → Dict[str, Any][source]¶

Parameters:: args_dict (Dict[str, Any] | None) –
Return type:: Dict[str, Any]

__annotations__¶

__module__¶

src.checks.secrets._custom_settings() → Iterator[Settings][source]¶

Return type:: Iterator[Settings]

Module contents¶

Module that allows instantiating any selection of available checks on a project

src.checks.results_valid(results: Dict[str, Any]) → bool[source]¶

Validates the results that a check instance’s run method returned.

Parameters:: results (Dict[str, Any]) –
Return type:: bool

src.checks.validate_args(args_dict: Dict[str, Any]) → bool[source]¶

Validates the set of arguments passed to the ‘check’ subcommand

Parameters:: args_dict (Dict[str, Any]) –
Return type:: bool

src.checks.transform_args(args_dict: Dict[str, Any]) → Tuple[Path | None, int | None][source]¶

Transforms the set of arguments passed to the check subcommand

Parameters:: args_dict (Dict[str, Any]) –
Return type:: Tuple[Path | None, int | None]

src.checks.iter_checks(proj: ~gitlab.v4.objects.projects.Project, repo: ~git.repo.base.Repo, api: ~gitlab.client.Gitlab, filter_func: ~typing.Callable[[~typing.Type[~src.interfaces.CheckInterface]], bool] = <function <lambda>>) → Iterable[CheckInterface][source]¶

This method is used to get instances of all checks in the set filter_func(availableChecks) for a single project.

Yields:: check instances for the given project

Parameters:

proj (Project) –
repo (Repo) –
api (Gitlab) –
filter_func (Callable[[Type[CheckInterface]], bool]) –

Return type:

Iterable[CheckInterface]
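
The filtering behaviour of `iter_checks` can be illustrated with stand-in classes. Everything below is hypothetical apart from the idea itself: a predicate over check classes selects which ones get instantiated for a project.

```python
from typing import Callable, Iterable, List, Type


class Check:
    """Hypothetical stand-in for CheckInterface."""

    def __init__(self, project: str) -> None:
        self.project = project


class SecretsCheck(Check):
    pass


class CommentsCheck(Check):
    pass


AVAILABLE_CHECKS: List[Type[Check]] = [SecretsCheck, CommentsCheck]


def iter_checks(
    project: str,
    filter_func: Callable[[Type[Check]], bool] = lambda _: True,
) -> Iterable[Check]:
    """Lazily instantiate every available check accepted by filter_func."""
    for check_cls in filter(filter_func, AVAILABLE_CHECKS):
        yield check_cls(project)


selected = list(iter_checks("demo", lambda c: c.__name__.startswith("Secrets")))
print([type(c).__name__ for c in selected])
```

With the default predicate every available check is instantiated; passing a predicate narrows the selection without touching the list of checks.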
