A* - All Informative Post

A* (pronounced “A-star”) is a popular heuristic search algorithm used for pathfinding and graph traversal problems. It is an extension of the uniform-cost search algorithm, combining the advantages of both uniform-cost search and greedy best-first search.

The A* algorithm uses a heuristic function, denoted as h(n), which provides an estimate of the cost from a given node to the goal node. It also uses a cost function, denoted as g(n), which represents the actual cost incurred to reach a particular node from the start node. The algorithm maintains a priority queue of nodes to be expanded, ordered by the sum of g(n) and h(n) values.

The steps of the A* algorithm are as follows:

Initialize the open set with the start node and set its g-value to 0.
Initialize the closed set as an empty set.
While the open set is not empty:

Select the node with the lowest f(n) = g(n) + h(n) value from the open set.
If the selected node is the goal node, the path has been found. Terminate.
Move the selected node from the open set to the closed set.
Expand the selected node by considering its neighbors.
For each neighbor:

Calculate its tentative g-value by adding the cost of reaching the neighbor from the selected node.
If the neighbor is already in the closed set and the tentative g-value is higher than its current g-value, skip it.
If the neighbor is not in the open set or the tentative g-value is lower than its current g-value:

Set the neighbor’s g-value to the tentative g-value.
Set the neighbor’s parent as the selected node.
If the neighbor is not in the open set, add it to the open set.

If the open set is empty and a goal node has not been reached, then a path from the start node to the goal node does not exist.

A* guarantees to find the shortest path from the start node to the goal node, provided that the heuristic function h(n) satisfies certain conditions. The heuristic function must be admissible, meaning it never overestimates the actual cost, and it must be consistent (or monotonic), ensuring that the estimated cost from a node to its neighbor is not greater than the sum of the actual cost and the estimated cost from the neighbor to the goal.

The efficiency of A* depends on the quality of the heuristic function. If the heuristic provides accurate estimates, the algorithm can significantly reduce the search space and find the optimal path more quickly compared to uninformed search algorithms. However, if the heuristic is not well-suited for the problem domain, A* may expand unnecessary nodes and become less efficient.

The A* search algorithm, builds on the principles of Dijkstra’s shortest path algorithm to provide a faster solution when faced with the problem of finding the shortest path between two nodes. It achieves this by introducing a heuristic element to help decide the next node to consider as it moves along the path. You can read more about heuristics in the topic on complexity.

Dijkstra’s algorithm finds the shortest path between the start node and all other nodes. As well as being faster, the A* algorithm differs from Dijkstra’s in that it seeks only the shortest path between the start node and the target node.

Heuristics, sometimes called heuristic functions, are used to provide ‘good enough’ solutions to very complex problems where finding a perfect solution would take too much time. When you use heuristics, you trade accuracy, correctness, and exactness for speedy processing.

One of the drawbacks with Dijkstra’s algorithm is that it can (and will) evaluate paths that will never provide the shortest option. Imagine tring to find the shortest route on a map between London and Edinburgh. Anyone with a reasonable grasp of UK geography, would not bother to evaluate a route that went via Plymouth. The trade off between speed and accuracy is important. In some applications, accuracy is less important than computational time. For example, in a SatNav application a route that is calculated in seconds and is “short enough” is preferable to having to wait 10 minutes for the perfect route.

The A* algorithm uses a heuristic function to help decide which path to follow next. The heuristic function provides an estimate of the minimum cost between a given node and the target node. The algorithm will combine the actual cost from the start node – referred to as g(n) – with the estimated cost to the target node – referred to as h(n) – and uses the result to select the next node to evaluate. This is explained in more detail in the step-by-step method that follows.

Choosing a heuristic function

There is no single best heuristic to use in path finding, as every application is different. For example, if the cost relates to a distance, it could be estimated by calculating the “straight line” distance between the nodes, perhaps by using one of the following methods:

However, do remember that the weights on a graph do not always represent distance.

It is very important that the heuristic function does not overestimate costs. However, so long as the heuristic function provides an estimate that is less than or equal to the actual cost, A* will always find an optimal path and will generally find it much faster than Dijkstra’s algorithm.

In this example, you will consider a small graph and use the A* algorithm to find the shortest path from A to F.

If you have not yet already studied Dijkstra’s shortest path algorithm, it is recommended that you start there, as much of the terminology is introduced and explained in that topic.

The heuristic function that is used here is a black box. This means that you do not see the algorithm (the detail is abstracted so that there is one less thing to think about). However, the function will return the estimated distance from the given node to the target node (F), as follows:

On the graph, these heuristic values appear adjacent to the node so that you have all of the important information (that you need to follow the method) in one place.

`g-score` and `f-score`

In Dijkstra’s algorithm, you kept track of the cost of each path as you evaluated all of the possible routes. Every time a shorter path was found, you updated the cost and the previous node.

In A*, the cost of the route from the start node is calculated, stored and updated in a similar way but is referred to as g-score. There is also an f-score, which is the cost of the path (so far) + the estimated cost (provided by the heuristic) of reaching the target node. It is the f-score (rather than the g-score) that is used to select the next current node.

Step 1

The algorithm starts by initialising the g-score of all of the nodes to infinity (or a very large number) to show that the score has not yet been calculated. The value for f-score can also be set to infinity.

Create the unvisited list with the following headings
- node
- g-score (cost from start)
- f-score (cost from start + heuristic)
- previous
In the node column list all of the nodes, starting with the start node (labelled node A in this example)
set the g-score of the start node (A) to 0,0 and the f-score to 10,10. This is the value returned by the heuristic function for node A.
for all other nodes, set the g-score and f-score to infinity and previous to none.

Step 2

Pick the node that has the lowest f-score in the unvisited list. The f-score of A is currently the lowest in the unvisited list; all of the other nodes have a cost of infinity. A becomes the current node.
Examine the nodes that can be reached directly from A (A’s neighbours) that have not yet been visited:
- The edge from A to B has a weight of 10,10. The g-score currently recorded in your unvisited list is infinity. A value of 10,10 is less than infinity, so you erase that and write 10,10 instead, and record the previous node as A. The heuristic h(B) is 15,15. The f-score is calculated by adding the heuristic value to g-score; this gives you an f-score for node B of 25,25 (10,10 + 15,15 = 25,25) and this is recorded in the unvisited list.
- The edge from A to C has a weight of 12,12. The g-score currently recorded in your unvisited list is infinity. A cost of 12,12 is less than infinity, so you erase that and write 12,12 instead, and record the previous node as A. The heuristic h(C) is 5. This gives you an f-score for node C of 17,17 (12,12 + 5,5 = 17,17) which you record.
- The edge from A to D has a weight of 5,5. The g-score currently recorded in your unvisited list is infinity. A cost of 5,5 is less than infinity, so you erase that and write 5,5 instead, and record the previous node as A. The heuristic h(D) is 5,5. This gives you an f-score for node B of 10,10 (5,5 + 5,5 = 10,10) which you record.
You have now evaluated all of the neighbours of the current node. Remove A from the unvisited list and add it to the visited list with its g-score (0,0), f-score (10,10) and previous node (none).

Step 3

Pick the node that has the lowest f-score in the unvisited list. The f-score of D (10,10) is currently the lowest in the unvisited list. D becomes the current node.
Examine the nodes that can be reached directly from D (D’s neighbours) that have not yet been visited. Note that you don’t go back to nodes that you have already placed on the visited list.
- The edge from D to C has a weight of 6,6. The g-score for D is 5,5, so you add 5,5 and 6,6 to produce the cost (g-score) of using that path (i.e. to get from A to C via D), which is 11,11. This is lower than the current g-score for C, so you update the entry in the unvisited list and record the previous node as D. The heuristic h(C) is 5,5. This gives you a new f-score for node C of 16,16 (11,11 + 5,5 = 16,16); you update the f-score for C in the unvisited list.
- The edge from D to F has a weight of 14,14. The g-score for D is 5,5, so you add 5,5 and 14,14 to produce the cost (g-score) of using that path (i.e. to get from A to F via D), which is 19,19. The g-score currently recorded for F is infinity; 19,19 is less than infinity, so you erase that and write 19,19 instead, and record the previous node as D. The heuristic h(F) is 0,0. This gives you a new f-score for node F of 19,19 (19,19 + 0,0 = 19,19); this is updated in the visited list.

Note that by following this method, you always maintain the lowest values for each node’s g-score and f-score in the unvisited list, which will help you eventually find the shortest path.

You have now evaluated all of the neighbours of the current node. Remove D from the unvisited list and add it to the visited list with its g-score (5,5), f-score (10,10) and previous node (A).

Step 4

Pick the node that has the lowest f-score in the unvisited list. The f-score of C (16,16) is currently the lowest in the unvisited list. C becomes the current node.
Examine the nodes that can be reached directly from C (C’s neighbours) that have not yet been visited.
- The edge from C to E has a weight of 11,11. The g-score for C is 11,11, so you add 11,11 and 11,11 to produce the cost (g-score) of using that path (i.e. to get from A to E via C), which is 22,22; 22,22 is less than infinity, so you erase that and write 22,22 instead, and record the previous node as C. The heuristic h(E) is 10. This gives you a new f-score for node E of 32,32 (22,22 + 10,10 = 32,32) which is recorded in the unvisited list.
- The edge from C to F has a weight of 8,8. The g-score for C is 11,11, so you add 11,11 and 8,8 to produce the cost (g-score) of using that path (i.e. to get from A to F via C), which is 19,19. The ‘g-score for F is already 19,19, so no changes are made.
You have now evaluated all of the neighbours of the current node. Remove C from the unvisited list and add it to the visited list with its g-score (11,11), f-score (16,16) and previous node (D).

Step 5

Pick the node that has the lowest f-score in the unvisited list. The f-score of F (19,19) is currently the lowest in the unvisited list. F becomes the current node.
F is the target node so the search is complete. Remove F from the unvisited list and add it to the visited list with its g-score (19,19), f-score (19,19) and previous node (D).
The g-score for node F provides the cost of the shortest path from A to F which is 19,19.

Finding the shortest path between two nodes

In the visited list you have a set of data for all of the nodes that you examined. Observe that, unlike Dijkstra’s algorithm, the A* algorithm does not necessarily produce the information for all of the nodes in the graph as it uses the heuristic function to direct the path more efficiently towards the target node.

To find the route of the shortest path from A to F:

first, you look at F and see that the previous node is D
from D, the previous node is A

Therefore, the shortest path from A to F is A → D → F.

Expressing an algorithm in structured English is often useful as it allows much of the implementation detail (for example, how to store the graph data) to be hidden. This is a form of abstraction.

In the algorithm, the h-score is obtained by getting the heuristic value for the node, i.e. h, left bracket, n, right bracket,h(n).

    Declare the visited list

    Declare the unvisited list

    For each node in graph:

        Add the node to the unvisited list with a g-score of infinity, an f-score of infinity and previous node of null

    Set the start node's g-score to 0 in the unvisited list

    Set the start node's f-score to its h-score in the unvisited list

    While the unvisited list is not empty:

        Set current node to the node in the unvisited list with the lowest f-score

        If the current node is the target node:

           End the while loop

           Copy the values for the current node from the unvisited list to the visited list

        Else:

            For each neighbour of current node:

                If neighbour node is not in the visited list:

                    Calculate new g-score = weight of edge + g-score of current node

                    If new g-score is less than neighbour's g-score in unvisited list:

                        Update the neighbour's g-score with the new g-score

                        Update the neighbour's f-score to new g-score + h_score

                        Update the neighbour's previous node to the current node

            Copy the values for the current node from the unvisited list to the visited list

            Remove the current node from the unvisited list

    Return the visited list

Implementation considerations

There will be several complexities to consider before you try to implement the algorithm. Here are a few of them:

how to store the graph
how to store the unvisited list
how to store the visited list
what value to use to represent infinity
what value to use to represent the absence of a previous node

Books on A*

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.