Merged
35 commits
d141a67 feat: add get_document and list_document functions (galz10)
1de3f02 fixed tests (galz10)
1b9324f lint fix (galz10)
446e187 added tests and changed DocumentWrapper (galz10)
ed86e84 fixed failing test (galz10)
9fc848b updated tests (galz10)
88001da changed DocStrings and tests (galz10)
1090173 fixed failing test (galz10)
0cec6b5 changed name and return type of list_documents (galz10)
0b67124 updated failing tests (galz10)
01e551f lint fix (galz10)
c25faf3 chore: updated comments (galz10)
cf8abef updating naming for get_document to get_shards (galz10)
a8bbf7a revert get_document changes (galz10)
3c547e0 Merge branch 'main' into update_comments (galz10)
8df5b0b lint fix (galz10)
6ac3b43 added code-block to comments (galz10)
0b1b52a feat: add TableWrapper and helper functions (galz10)
fb81212 wrapped lines and paragraphs (galz10)
eaabc42 Merge branch 'main' into wrap-table (galz10)
efc302f Merge branch 'main' into wrap-table (galz10)
88bbe01 added tests for new features (galz10)
f7ede09 lint fix (galz10)
86c753c feat: added helper functions to DocumentWrapper (galz10)
8bcf3e2 lint fix (galz10)
1c88552 fixed failing test (galz10)
5ffd706 refactored code (galz10)
224b589 lint fix (galz10)
69e847c lint fix (galz10)
f9975bc refactored code (galz10)
63e00fd Merge branch 'main' into add-helpers (galz10)
4dc1ec6 fixed failing test (galz10)
91f348f refactored code (galz10)
328ee4c Merge branch 'main' into add-helpers (galz10)
249983c added text fixture to simplify testing (galz10)
@@ -36,8 +36,8 @@ def _entities_from_shards(
             Required. List of document shards.
     Returns:
-        List[wrapped_entity.Entity]:
-            a list of Entitys.
+        List[Entity]:
+            a list of Entity.
     """
     result = []
@@ -55,8 +55,8 @@ def _pages_from_shards(shards: documentai.Document) -> List[Page]:
             Required. List of document shards.
     Returns:
-        List[wrapped_page.Page]:
-            A list of Pages.
+        List[Page]:
+            A list of Page.
     """
     result = []
@@ -227,3 +227,54 @@ def __post_init__(self):
         self._shards = _get_shards(gcs_prefix=self.gcs_prefix)
         self.pages = _pages_from_shards(shards=self._shards)
         self.entities = _entities_from_shards(shards=self._shards)
+    def search_pages(
+        self, target_string: str = None, pattern: str = None
+    ) -> List[Page]:
+        r"""Returns the list of Page containing target_string.
+        Args:
+            target_string (str):
+                Optional. target str.
+            pattern (str):
+                Optional. regex str.
+        Returns:
+            List[Page]:
+                A list of Page.
+        """
+        if (target_string is None and pattern is None) or (
+            target_string is not None and pattern is not None
+        ):
+            raise ValueError(
+                "Exactly one of target_string and pattern must be specified."
+            )
+        found_pages = []
+        for page in self.pages:
+            for paragraph in page.paragraphs:
+                if target_string is not None and target_string in paragraph.text:
+                    found_pages.append(page)
+                    break
+                elif (
+                    pattern is not None
+                    and re.search(pattern, paragraph.text) is not None
+                ):
+                    found_pages.append(page)
+                    break
+        return found_pages
+    def get_entity_by_type(self, target_type: str) -> List[Entity]:
+        r"""Returns a list of wrapped entities matching target_type.
+        Args:
+            target_type (str):
+                Required. target_type.
+        Returns:
+            List[Entity]:
+                A list of Entity matching target_type.
+        """
+        return [entity for entity in self.entities if entity.type_ == target_type]
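For readers who want to try the two new helpers without a Document AI output bucket, here is a minimal standalone sketch of the same matching logic. `Paragraph`, `Page`, and `Entity` are hypothetical stand-in dataclasses, not the toolbox's wrapped classes, and the functions take the lists as arguments instead of reading them from `self`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the wrapped classes used in the diff.
@dataclass
class Paragraph:
    text: str


@dataclass
class Page:
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class Entity:
    type_: str
    mention_text: str


def search_pages(
    pages: List[Page],
    target_string: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[Page]:
    """Return pages with at least one paragraph matching the given argument."""
    if (target_string is None and pattern is None) or (
        target_string is not None and pattern is not None
    ):
        raise ValueError(
            "Exactly one of target_string and pattern must be specified."
        )
    found_pages = []
    for page in pages:
        for paragraph in page.paragraphs:
            if target_string is not None:
                matched = target_string in paragraph.text
            else:
                matched = re.search(pattern, paragraph.text) is not None
            if matched:
                found_pages.append(page)
                break  # one hit is enough; avoid adding the page twice
    return found_pages


def get_entity_by_type(entities: List[Entity], target_type: str) -> List[Entity]:
    """Return the entities whose type_ equals target_type."""
    return [entity for entity in entities if entity.type_ == target_type]


# Made-up sample data for illustration only.
sample_pages = [
    Page([Paragraph("Invoice total: 42 USD")]),
    Page([Paragraph("Thank you for your business")]),
]
sample_entities = [Entity("total", "42"), Entity("date", "2022-01-01")]

by_substring = search_pages(sample_pages, target_string="Invoice")
by_regex = search_pages(sample_pages, pattern=r"\d+ USD")
totals = get_entity_by_type(sample_entities, "total")
```

The `break` after the first matching paragraph is what keeps a page from appearing more than once in the result, which mirrors the loop in the diff.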
galz10 marked this conversation as resolved.
Maybe "or" instead of "and".
This checks whether both target_string and pattern are populated, so it needs an "and" to confirm that neither one is None.
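The exchange above can be checked with a small truth table. The sketch below assumes the suggestion referred to the second clause of the condition in `search_pages`; the helper names `rejects_with_and` and `rejects_with_or` are hypothetical and exist only for this comparison.

```python
from typing import List, Optional, Tuple


def rejects_with_and(target_string: Optional[str], pattern: Optional[str]) -> bool:
    """Condition as merged: reject when neither or both arguments are given."""
    return (target_string is None and pattern is None) or (
        target_string is not None and pattern is not None
    )


def rejects_with_or(target_string: Optional[str], pattern: Optional[str]) -> bool:
    """Variant with "or" in the second clause (assumed reading of the suggestion)."""
    return (target_string is None and pattern is None) or (
        target_string is not None or pattern is not None
    )


# All four combinations of "given" and "not given".
CASES: List[Tuple[Optional[str], Optional[str]]] = [
    (None, None),
    ("abc", None),
    (None, r"a.c"),
    ("abc", r"a.c"),
]

and_results = [rejects_with_and(t, p) for t, p in CASES]
or_results = [rejects_with_or(t, p) for t, p in CASES]

for (t, p), with_and, with_or in zip(CASES, and_results, or_results):
    print(f"target_string={t!r:6} pattern={p!r:6} and={with_and} or={with_or}")
```

With "and", only the neither and both cases are rejected, which matches the error message "Exactly one of target_string and pattern must be specified." With "or" in the second clause, every combination is rejected, including the two valid single-argument calls.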