How to Use JSONPath in Python: A Step-by-Step Guide
TutorialsLearn how to use Python JSONPath in our step-by-step guide that covers JSONPath setup, expression language syntax, and a working extraction code example.

Nerijus Kriaučiūnas
Key Takeaways
-
JSONPath is a query language with expressions that are first-class objects for navigating complex JSON structures.
-
JSONPath syntax uses atomic expressions and various operators, combining them with dot or bracket notations.
-
Extracting specific deeply nested data requires combining different operators, expressions, and, in some cases, adding filters or conditional logic.
It’s common to scrape JSON data when available and resort to HTML only out of necessity. It’s not enough to parse JSON with Python , as more complex and deeply nested JSON objects might require JSONPath. It’s a great tool to streamline development and enable more powerful data extraction.
What Is JSONPath and Why Use It?
Python JSONPath is a query expression language for navigating and extracting JSON structures. As one might guess from the name, JSONPath is to JSON what XPath is to XML and HTML.
Just like XPath uses expressions to traverse XML hierarchies, JSONPath expressions query JSON objects and arrays. XPath navigates documents by targeting elements, attributes, and text nodes within a hierarchical structure. JSONPath applies the same logic to JSON structures, which use JSON objects and arrays.
Its advantage lies in a standardized way to parse JSON data from complex JSON files without using complicated Python code. As such, JSONPath is used in various cases where more reliable and maintainable web scrapers are needed.
- API data collection
Modern web services expose API endpoints that return JSON data. JSONPath can help to extract specific fields from these APIs with simple expressions.
- Web scraping
JSON data is frequently embedded within HTML, which enables the extraction of JSON strings. Then they can be queried with JSONPath rather than parsing the entire HTML DOM.
- Testing
API testing frameworks use JSONPath to verify specific fields in JSON responses. It reduces the manual work needed to navigate through response objects.
- Data pipelines
Data processing systems frequently use JSONPath to filter JSON data at a large scale. Without constant code modifications, JSONPath enables data pipelines that adapt to JSON structure changes.
Installing JSONPath Library in Python
Before installing JSONPath, ensure you have the newest version of Python installed on your system. Open terminal or command prompt and verify your Python version:
python --version
Consider setting up a virtual environment for Python projects to keep dependency isolation, ensuring that your Python JSONPath experiments won't conflict with other work. Here's how to do it on Windows Terminal, if you don’t intend to use an IDE:
python -m venv jsonpath_env
jsonpath_env\Scripts\activate
Now your terminal should show the environment name - (jsonpath_env), and you're ready to install the JSONPath library. We'll be using jsonpath-ng as it's the most popular and actively maintained JSONPath library for Python.
pip install jsonpath-ng
One last step before using JSONPath as a parsing method is to verify that your installation works. Start Python terminal (with the virtual environment still activated):
Python
Then, use a Python script to test whether the JSONPath installation works.
from jsonpath_ng import parse
data = {"name": "IPRoyal", "City": "New York"}
expression = parse('$.name')
result = [match.value for match in expression.find(data)]
print(result)
If the output is ['IPRoyal'] without any errors, your installation is successful. Before starting to use JSONPath in your projects, you need to learn about its syntax.
JSONPath Syntax Basics
JSONPath differs from other Python libraries, such as BeautifulSoup, in that JSONPath expressions are first-class objects. It's a full language implementation with expressions that can be passed as arguments in functions, assigned to variables, stored in lists, and used in many other operations.
JSONPath syntax is one of the main reasons why it's powerful for web scraping projects with dynamic queries. You can extract specific data from complex hierarchies before converting it into a table-like structure. Yet, the subsequent conversion to CSV is also straightforward, leaving the choice between JSON and CSV in the final database open.
There are two equivalent ways to navigate JSON structures with Python JSONPath.
$.store.book[0].title # Dot notation
$['price-usd'] # Bracket notation
Both notations can produce the same results. The choice depends on the type of property names you're working with.
- Dot notation is easier to read but is mainly suited for simple property names.
- Bracket notation is better when property names contain special characters.
You should use whichever notation is more convenient and can even mix them in the same JSONPath expression. For example, the expression below can be written in at least three ways:
# Purely dot notation
$.store.book[0].title
# Purely bracket notation
$['store']['book'][0]['title']
# Mixed notation
$.store['book'][0].title
Python JSONPath is used by chaining atomic expressions and binary comparison operators to form a path in a JSON data structure. Expressions start with the root operator ($). In our example, the expression navigates to the store JSON object (.store), its book array (['book']), selects the first object ([0]), and extracts the property title (.title).
The basis of JSONPath queries is atomic expressions that can be modified with operators and filtered to select elements based on conditions. Learning atomic expressions is a good start when learning JSONPath.
| Name | Syntax | Definition | Example |
|---|---|---|---|
| Root Node | $ | References the root object of the entire JSON document. All paths start here. | $ returns the entire book catalog |
| Current Node | @ | References the current node being processed (used within filters and iterations). | @.title refers to the title of the current book item |
| Child Property (Dot) | .property | Accesses a child property using dot notation. Property name must be a valid identifier. | $.store.books accesses the books array |
| Child Property (Bracket) | ['property'] | Accesses a child property using bracket notation. Allows special characters, spaces, and reserved words. | $.store['books'] [0]['author-name'] accesses the author name (with hyphen) of the first book |
| Wildcard | * | Matches all properties in an object or all elements in an array. | $.store.books[*].title selects all book titles in the catalog |
| Recursive Descent | .. | Recursively searches for a property at any depth in the document tree. | $..price finds all price fields across all items (books, bundled products, discounts, etc.) |
| This | this |
Named operator referencing the current object (jsonpath-ng specific alternative to @). | $.store.books[?(this.price < 15)].title returns titles of books under $15 |
Python JSONPath Example: Extracting Values
Suppose you want to scrape an online bookstore and have extracted a JSON file embedded in its HTML. For JSONPath expressions to work, you need Python dictionaries or lists, not raw JSON.
So, your first step is always to get a Python-readable format using json.loads() (for JSON strings) or json.load() (for JSON files) (If you need to do the reverse, see our post on converting Python strings to JSON ):
import json
from jsonpath_ng import parse
# JSON as a string (as you might receive from an API or scrape from HTML)
json_string = '''
{
"store": {
"name": "Python Books Store",
"location": "Online",
"books": [
{
"id": 1,
"title": "Web Scraping with Python Programming",
"author": "JAMES SMITH",
"price": 29.99,
"category": "programming",
"in_stock": true,
"ratings": {
"average": 4.5,
"count": 127
}
},
{
"id": 2,
"title": "JSON at Work",
"author": "Tom Marrs",
"price": 24.99,
"category": "programming",
"in_stock": true,
"ratings": {
"average": 4.8,
"count": 95
}
},
{
"id": 3,
"title": "Hands-On Machine Learning",
"author": "Aurélien Géron",
"price": 39.99,
"category": "data-science",
"in_stock": false,
"ratings": {
"average": 4.2,
"count": 203
}
}
]
}
}
'''
data = json.loads(json_string)
Now we can use parse() to extract all book titles and then print them.
title_expression = parse('$.store.books[*].title')
matches = title_expression.find(data)
titles = [match.value for match in matches]
print("All book titles:")
for title in titles:
print(f" - {title}")
The output should look like this:
All book titles:
Web Scraping with Python Programming
JSON at Work
Hands-On Machine Learning
Similarly, we can extract specific book data, such as the title of the first book in the list:
first_book_expression = parse('$.store.books[0].title')
first_title = first_book_expression.find(data)[0].value
print(f"\nFirst book: {first_title}")
Or use JSONPath to extract nested data from a combination of list items:
rating_expression = parse('$.store.books[*].ratings.average')
ratings = [match.value for match in rating_expression.find(data)]
print(f"\nBook ratings: {ratings}")
We can change the paths to get different results, as it instructs the scraper to look elsewhere or select different items. For example, narrower paths match fewer nodes ($.store.book.title selects a single title) while broader paths use wildcards ($.store.book[].title)* or recursive descent ($..title) to include more nodes. Here are more examples:
print("\n--- Path Variations ---")
print("All books:", [m.value for m in parse('$.store.books[*].title').find(data)])
print("First 2 books:", [m.value for m in parse('$.store.books[0:2].title').find(data)])
print("Last book:", [m.value for m in parse('$.store.books[-1].title').find(data)])
print("All prices (recursive):", [m.value for m in parse('$..price').find(data)])
Parsing Nested JSON
Real-world JSON structures, especially those from APIs, often have multiple layers of objects and arrays, making manual extraction tedious. If you want to minimize errors while parsing with JSONPath, practice with open JSON file sources.
The Open Weather Map and the NASA API are two great options. However, even with open JSON sources, the complexity of your data extraction grows the more deeply nested the values you need. Suppose we have such a JSON file and need to find rain data.
from jsonpath_ng import parse
weather_data = {
"coord": {"lon": 7.367, "lat": 45.133},
"weather": [{"id": 501, "main": "Rain", "description": "moderate rain", "icon": "10d"}],
"base": "stations",
"main": {
"temp": 284.2,
"feels_like": 282.93,
"temp_min": 283.06,
"temp_max": 286.82,
"pressure": 1021,
"humidity": 60,
"sea_level": 1021,
"grnd_level": 910
},
"visibility": 10000,
"wind": {"speed": 4.09, "deg": 121, "gust": 3.47},
"rain": {"1h": 2.73},
"clouds": {"all": 83},
"dt": 1726660758,
"sys": {"type": 1, "id": 6736, "country": "IT", "sunrise": 1726636384, "sunset": 1726680975},
"timezone": 7200,
"id": 3165523,
"name": "Province of Turin",
"cod": 200
}
print("=== Recursive Search Combined with Filtering ===\n")
print("1. Finding rain data:")
rain_data = parse('$..rain').find(weather_data)
if rain_data:
rain_1h = parse("$.rain['1h']").find(weather_data)[0].value
print(f" Rainfall in last hour: {rain_1h}mm")
else:
print(" No rain data found")
print("\n2. Finding wind data with conditions:")
wind_measurements = parse('$..wind').find(weather_data)
if wind_measurements:
wind_speed = parse('$.wind.speed').find(weather_data)[0].value
wind_gust = parse('$.wind.gust').find(weather_data)[0].value
if wind_gust > wind_speed:
print(f" Strong gusts detected: {wind_gust} m/s (avg: {wind_speed} m/s)")
else:
print(f" Normal wind: {wind_speed} m/s (gusts: {wind_gust} m/s)")
print("\n3. Checking for severe conditions:")
humidity_data = parse('$..humidity').find(weather_data)
if humidity_data:
humidity = humidity_data[0].value
if humidity > 70:
print(f" High humidity: {humidity}%")
else:
print(f" Normal humidity: {humidity}%")
if rain_data:
rain_1h_value = parse("$.rain['1h']").find(weather_data)[0].value
if rain_1h_value > 2.5:
print(f" Heavy rain detected: {rain_1h_value}mm/h")
else:
print(f" Light rain: {rain_1h_value}mm/h")
Depending on your target, a real-world scraper that uses Python JSONPath should include certain precautions, which we omitted for explanation purposes:
- Missing or null values
Using try-except blocks and implementing default values is recommended if you expect the output to contain empty matches.
- Array index errors
Check if find() returns any results before accessing arrays and writing paths to find them.
- Path specificity
Be specific with JSONPath if the structure is known. For example, path $.books[*].title[*] is better than $..title.
- Check JSON files beforehand
Parse JSON with json.loads() before applying JSONPath - this will catch malformed JSON and convert it to Python format.
- JSONPath testers
Use online JSONPath testers to verify the expressions before implementing them in your scraper.
Conclusion
Unless your project is extremely simple, it's almost guaranteed that learning JSONPath will be easier than parsing JSON without it. JSONPath provides much-needed flexibility and code clarity, which is essential not only in the initial stages but also when maintaining your code in the long term.