Advanced tools for Python dictionaries, including search and serialization tools.
BSD-3-CLAUSE License
Included Tools:
DictSearch: Search large and complex Python dictionaries/JSON files.
Serializer: Make custom Python classes JSON serializable (safe for conversion to JSON).
Pip installable package available.
pip install dictpy
Imagine you have some big ugly Python dictionary (like the one produced by PubChem when you download the JSON file for CID 6) and you want to extract some specific piece of information. This section will show how DictSearch can make this easy.
To perform the search, pass the Python dictionary and a search target (discussed in more detail below) to DictSearch. It will find all valid objects for the search. The results of the search are stored in .result.
import dictpy
search = dictpy.DictSearch(data=json_data, target=target)
print(search.result)
The return object is a list[list[tree, obj]]
tree: shows the navigation to get to the data ('.' separated). Example: Record.Section.1.Description
{"Record": {
    "Section": [
        ######,
        {"Description": #####}  # A match to the search!
    ]
}}
obj: the returned object. By default, this is the matched object itself:
Target: {"dog": "*"}; returns: {"dog": "golden retriever"}
Target: "dog"; returns: {"dog": "golden retriever"}
Target: {"dog": "golden retriever"}; returns: {"dog": "golden retriever"}
The returned object can be changed with return_func. For example, to return the parent object of each match:
search = dictpy.DictSearch(data=json_data, target=target, return_func=dictpy.DictSearch.return_parent_object)
Target: {"dog": "*"}; returns:
{
    "dog": "golden retriever",
    "cat": "bengal",
    "fish": "goldfish"
}
Target: "dog"; returns:
{
    "dog": "golden retriever",
    "cat": "bengal",
    "fish": "goldfish"
}
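The difference between returning the matched object and returning its parent can be illustrated with a small stand-alone sketch. The function `find_key` below is hypothetical and is not dictpy's implementation; it only mimics the list[list[tree, obj]] result shape for a simple key search, and it assumes the tree string stays the same whichever object is returned.

```python
def find_key(data, key, return_parent=False, tree=""):
    """Hypothetical stand-in for a key search; not dictpy's implementation."""
    results = []
    if isinstance(data, dict):
        for k, v in data.items():
            path = f"{tree}.{k}" if tree else str(k)
            if k == key:
                # Current object: only the matching pair.
                # Parent object: the whole dictionary that holds the pair.
                results.append([path, data if return_parent else {k: v}])
            results.extend(find_key(v, key, return_parent, path))
    elif isinstance(data, list):
        for i, item in enumerate(data):
            path = f"{tree}.{i}" if tree else str(i)
            results.extend(find_key(item, key, return_parent, path))
    return results


pets = {"home": {"pets": [{"dog": "golden retriever", "cat": "bengal", "fish": "goldfish"}]}}

current = find_key(pets, "dog")
parent = find_key(pets, "dog", return_parent=True)
print(current)  # [['home.pets.0.dog', {'dog': 'golden retriever'}]]
print(parent)
```

Returning the parent is handy when the matched key is only a marker and the neighbouring keys hold the data you actually want.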
target
The target can be a string, int, float, single-line dictionary, or regex (regular expression). Wildcards (*) can also be used for partial dictionary searches.
Example Targets:
{"RecordType": "CID"}
{"RecordNumber": 6}
op_convert_str_to_num=False
2526
3D Conformer
{"MoveToTop": "*"}
{"*": "Chemical Safety"}
"^[A-I]{3}$"
{"^RecordT": "*"}
For more examples see tests/test_dict_search.py.
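One plausible reading of how wildcard and regex targets behave is sketched below. The helpers `part_matches` and `pair_matches` are hypothetical names introduced for illustration, and the rules are an assumption: "*" accepts anything, strings are treated as regular expressions, and other types are compared by equality. dictpy's actual matching rules may differ, for example in how it converts strings to numbers.

```python
import re


def part_matches(pattern, actual):
    """Compare one target part (a key or a value) against the actual key or value."""
    if pattern == "*":
        return True  # wildcard: accept anything
    if isinstance(pattern, str) and isinstance(actual, str):
        # Treat string patterns as regular expressions; plain text matches itself.
        return re.search(pattern, actual) is not None
    return pattern == actual  # numbers and other types: plain equality


def pair_matches(target, key, value):
    """Check a single-entry dict target such as {"MoveToTop": "*"} against one key/value pair."""
    ((t_key, t_value),) = target.items()
    return part_matches(t_key, key) and part_matches(t_value, value)


print(pair_matches({"RecordType": "CID"}, "RecordType", "CID"))      # True
print(pair_matches({"*": "Chemical Safety"}, "Heading", "Chemical Safety"))  # True
print(pair_matches({"^RecordT": "*"}, "RecordType", "CID"))          # True
print(part_matches("^[A-I]{3}$", "XYZ"))                             # False
```

In this sketch a plain string such as "CID" would also match longer strings that contain it, because `re.search` is used; anchoring with ^ and $ makes the match exact.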
This example will extract data from a JSON file for "1-Chloro-2,4-dinitrobenzene" downloaded from PubChem.
First, we will load the example file mentioned above (change "/path/to/data/" to the location of the file on your machine):
import json

with open("C:/path/to/data/cid_6.json", "r") as f:
    text = f.read()
    json_data = json.loads(text)  # parse the JSON text into a Python dictionary
print(json_data)
You will get a massive printout of the 12,000 line JSON file.
Dictionary search target:
import dictpy
search = dictpy.DictSearch(data=json_data, target={"RecordType": "CID"})
print(search.result)
Print out:
[['Record.RecordType', {'RecordType': 'CID'}]]
Integer search target:
search = dictpy.DictSearch(data=json_data, target=2526)
print(search.result)
Print out:
[
['Record.Section.3.Section.1.Section.14.Information.1.Value.Number', 2526],
['Record.Section.3.Section.1.Section.14.Information.1.Value.Number', 2526]
]
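A tree string from the results can be walked back through the original data to reach the matched value. The helper `follow_tree` below is a hypothetical convenience function, not part of dictpy; it assumes that integer steps are zero-based list positions (as in the Record.Section.1.Description example) and that no dictionary key itself contains a '.'.

```python
def follow_tree(data, tree):
    """Walk a '.'-separated tree string through nested dicts and lists."""
    obj = data
    for step in tree.split("."):
        if isinstance(obj, list):
            obj = obj[int(step)]  # list positions appear as integer steps
        else:
            obj = obj[step]  # dictionary keys are used as-is
    return obj


example = {"Record": {"Section": [{"Name": "first"}, {"Description": "a match"}]}}
print(follow_tree(example, "Record.Section.1.Description"))  # prints: a match
```

Dropping the last step of the tree string gives the container that holds the match, which is useful for inspecting sibling values.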
Serializer is useful for turning custom Python classes into JSON-compatible dictionaries.
This serialization class is a useful pre-processing step for complex custom Python classes that contain objects that are not JSON serializable (for example: datetime objects, custom classes, classes from other packages, ObjectIDs, etc.).
To use it, inherit from Serializer in your custom Python class.
import json
import datetime

import dictpy


class Example(dictpy.Serializer):
    def __init__(self, datetime_obj, stuff2):
        self.datetime_obj = datetime_obj  # NOT a JSON serializable object
        self.stuff2 = stuff2
        self.stuff3 = None


example = Example(datetime.time(), "stuff2")
# json_output = json.dumps(example)  # This would fail: example is not JSON serializable
dict_of_example = example.as_dict()
dict_of_example = dictpy.Serializer.dict_cleanup(dict_of_example)  # converts non-JSON-serializable objects to strings
dict_of_example = dictpy.Serializer.remove_none(dict_of_example)  # Optional: remove None values; self.stuff3 is removed
json_output = json.dumps(dict_of_example)
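To see roughly what the clean-up steps amount to, here is a standard-library-only sketch. The function `to_json_safe` is a hypothetical helper, not dictpy's implementation; it assumes that anything json cannot handle is simply converted with str(), and that None values are dropped from dictionaries on request.

```python
import datetime
import json


def to_json_safe(obj, drop_none=False):
    """Recursively convert a value into something json.dumps accepts (hypothetical helper)."""
    if isinstance(obj, dict):
        cleaned = {str(k): to_json_safe(v, drop_none) for k, v in obj.items()}
        if drop_none:
            # Optionally drop entries whose value is None.
            cleaned = {k: v for k, v in cleaned.items() if v is not None}
        return cleaned
    if isinstance(obj, (list, tuple)):
        return [to_json_safe(v, drop_none) for v in obj]
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj  # already JSON compatible
    return str(obj)  # fallback: anything else becomes its string form


record = {"when": datetime.time(12, 30), "stuff2": "stuff2", "stuff3": None}
safe = to_json_safe(record, drop_none=True)
json_output = json.dumps(safe)
print(json_output)  # {"when": "12:30:00", "stuff2": "stuff2"}
```

Converting to strings is lossy: loading the JSON back gives "12:30:00" as text, not a time object, so any round trip needs its own parsing step.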