|

ANSWERED on Thu 16 Apr 2009 - 7:33 am UTC by mathtalk

## Question: Search for zipcodes within a radius distance using latitude and longitude.

Priced at \$10.00

Customer

10 Apr 2009 09:04 UTCFri 10 Apr 2009 - 9:04 am UTC

This problem seemed simple at first, but then I found that it involves some tricky math.

I have a table of zipcodes. Each zipcode has a latitude and longitude associated with it. Given a specific lat,long coordinate, I want to find all the zipcodes that are within a given radius (in US miles).

I found information about the Haversine formula, but that doesn't seem to quite give me what I want.

The most efficient way to do this is to first query for a set of zipcodes withing a "square". Then I would calculate the distance for each individual zipcode in that square and keep only the ones that occupy the circle which represents the radius distance from the original point.

For instance, here is my zipcode query.

select * from share.zipcode where
(latitude between (MINLAT) and (MAXLAT))
and (longitude between (MINLON) and (MAXLON))
;

First, given a lat,long coordinate I need to calculate MINLAT,MAXLAT,MINLON, and MAXLON.

Then, given the resulting zipcodes, I need to calculate the distance between the original lat,long and the lat,long of each zipcode and then keep the ones that are inside the max radius.

Researcher

10 Apr 2009 12:27 UTCFri 10 Apr 2009 - 12:27 pm UTC

Hi happy,

Have you looked at Kevin Roth's code?

Code Library > Zip Code Search
"Description: Everything you need to perform a zip code radius search."
http://www.kevinroth.com/projects/project.php?id=85

Patricia

Researcher

10 Apr 2009 17:12 UTCFri 10 Apr 2009 - 5:12 pm UTC

An alternative approach is to do this with online look-ups.  There are sites where you can enter lat/long, and get back the relevant Zip+4, and other sites where you can enter the Zip and get all other zips within a given radius.

Not the most elegant solution, but I thought I'd mention it, just in case...

Researcher

11 Apr 2009 13:27 UTCSat 11 Apr 2009 - 1:27 pm UTC

Hi,

Perhaps you'd like a discussion of the math and how to implement it in SQL?

It would be good to know what database manager you plan to use (e.g. MySQL, SQLServer, etc.).

regards, mathtalk

Researcher

11 Apr 2009 15:55 UTCSat 11 Apr 2009 - 3:55 pm UTC

Hi happyengineer,

It would also be useful to know what degree of accuracy is required. If approximate results are acceptable, then the application can be greatly simplified.

Regards,
eiffel

Researcher

12 Apr 2009 05:12 UTCSun 12 Apr 2009 - 5:12 am UTC

Doesn't this assume zipcodes are assigned mathematically? Australian postcodes are assigned to geographical areas by someone drawing lines along the streets on a map, and correspond roughly to population density.

Customer

15 Apr 2009 21:14 UTCWed 15 Apr 2009 - 9:14 pm UTC

Rough accuracy is fine. It should be close enough that users won't say "hey, I know that that's within 5 miles of me!". So being accurate to a tenth of a mile is probably fine.

I'm using postgres, but I don't need the math to be done inside the query unless necessary. I'm doing most of the calculations on the java side which makes the query to the db.

Here is what I came up with. It seems to work, but I'm not 100% certain I did it correctly.

First, I run this query:
FROM Zipcode WHERE (latitude between :minLat and :maxLat) and (longitude between :minLon and :maxLon)

The parameters are assigned as follows:

double milesPerDegree = 0.868976242 / 60.0 * 1.2;
double degrees = milesPerDegree * milesDistance;

double minLon = lon - degrees;
double maxLon = lon + degrees;
double minLat = lat - degrees;
double maxLat = lat + degrees;

where milesDistance is the actual radius requires. The milesPerDegree is a conversion of degrees to nautical miles. I then multiplied it by 1.2 which seemed to result in an area large enough to get all the zipcodes I need.

The query returns a list of zipcodes which are inside that box. Then, on the java side I run the following calculation for each zipcode to find out its exact distance. If it's <= milesDistance then I keep it and use it. If it's outside then I throw it away.

for(Zipcode z : zipcodes) {
double lat2 = z.getLatitude();
double lon2 = z.getLongitude();

// This calculation seems to be close to correct. Google maps seems to confirm that the given results are
// actually roughly within the distance they are supposed to be.
double PI_180 = Math.PI / 180;
double distance =
3956 * (2 * Math.asin(Math.sqrt(
Math.pow(Math.sin(((lat-lat2)*PI_180)/2),2)
+ Math.cos(lat*0.017453293) * Math.cos(lat2*0.017453293) *
Math.pow(Math.sin(((lon-lon2)*PI_180)/2),2)
)));

if(distance <= milesDistance) {
}
}

In short, I query for a box which covers all the zipcodes that are in a square which covers the circle with radius milesDistance. Then I do the full calculation for all zipcodes inside that square to find the ones inside the circle.

I want the DB query to put minimal load on the DB and return as few zipcodes as possible without missing any valid ones.

Customer

15 Apr 2009 21:17 UTCWed 15 Apr 2009 - 9:17 pm UTC

@angy, it's not mathematical. I have a database table of US zipcodes and their corresponding lat,lon coordinates.

16 Apr 2009 07:33 UTCThu 16 Apr 2009 - 7:33 am UTC

Hi, happyengineer:

I answered a similar kind of Question here:

although the geometry is different and the coding was
pushed into the database layer.  A more quantitative
difference is that where evidently you used a value of
3956 miles for Earth's radius, I used 3958.

You present a formula (or the Java implementation of it)
which finds the distance between two points identified
by longitude & latitude coordinates.

To make the SQL query quick & easy, you propose to look
up places whose longitude & latitude lie between a pair
of (respective) bounds, then check actual distance on
the Java side of things.

This is especially easy to work out for the latitude
coordinates because a degree of latitude would have
constant length if the Earth were a perfect sphere.
The shape is more precisely an oblate spheroid, but
that is merely a refinement in approximation. Radius
at the equator is around 3963 miles, i.e. greater
than at the poles (and elsewhere).

For radius R of 3956 miles we get a circumference of
2*pi*R = 24,856 miles, which would make 1 degree of
latitude be 69 miles long.

While I'm happy sticking to the perfect sphere model,
the oblate spheriod details are presented here:

[Geodesy Page by James Q. Jacobs]
http://www.jqjacobs.net/astro/geodesy.html

Note that for an oblate spheroid there are actually
two notions of latitude, one geodetic (based on the
angle that a normal to the surface makes with the
equatorial plane) and one geocentric (based on the
angles lines drawn to the Earth's center make with
the equatorial plane).  In the geodetic case the
length of a degree of latitude increases slightly
from the equator (68.7 miles) to the poles (69.4
miles).

US ZIP codes lie mostly in the western hemisphere
(longitude) and entirely in the norther hemisphere
(latitude),  For the continental US (excluding
Alaska and Hawaii) the latitudes are between 25
and 50 degrees north, and the variation in length
of a degree of latitude is at most 0.4%.  So the
simple sphere model and 69 miles per degree of
latitude should work well enough here:

double degrees = milesDistance/69.0;

double minLat = lat - degrees;
double maxLat = lat + degrees;

One improvement I suggest in your formulas is to
consider that the length of a degree of longitude
shrinks with increasing latitude (toward a pole).

If you don't mind reusing the value degrees after
we've found minLat,maxLat above:

double PI_180 = Math.PI / 180;

degrees /= Math.cos(lat * PI_180);

double minLon = lon - degrees;
double maxLon = lon + degrees;

Note that the result of the above is to increase
degrees, because it's being divided by the cosine
of the latitude, i.e. by a positive number less
than 1.  The effect of a degree of longitude
being shorter than a degree of latitude is that
it takes more of them to make up a given distance.
when 40 degrees North latitude is reached.

The intervals minLat & maxLat, minLon & maxLon can
then be used in your SQL query just as you meant.

regards, mathtalk

P.S. This approach also works reasonably well for
ZIP codes in Hawaii and Alaska, although a very
small portion of the Aleutian Islands are beyond
180 degrees West & technically in the Eastern
hemisphere.  I don't expect you need to deal with
them, but a simple approach would be to adjust
the longitude from degrees East to a corresponding
"degrees West" value greater than 180, and similarly
for any US possessions lying further west.  Usually
in a database such East longitudes are represented
as negative numbers.  As such the adjustment would
be to add them to 360 degrees (producing an angle
greater than 180 degrees.  If there were ZIP codes
in the Southern hemisphere, these would probably
be represented as negative degrees latitude, which
is fine for the purposes of our calculations above.

User

16 Apr 2009 11:19 UTCThu 16 Apr 2009 - 11:19 am UTC

From the question:

"I have a table of zipcodes. Each zipcode has a latitude and longitude
associated with it. Given a specific lat,long coordinate, I want to find
all the zipcodes that are within a given radius (in US miles)."

If this means there is just one lat/long coordinate for a zipcode, a point, will the mathematic solution capture zipcodes with areas within the radius whose own lat/long coordinates lie outside the radius?

Researcher

16 Apr 2009 19:53 UTCThu 16 Apr 2009 - 7:53 pm UTC

Hi, myo:

No, it will miss a ZIP code unless the corresponding latitude/longitude is within the radius (specified distance) of a given origin.

To compensate for this, users of such a database are typically generous with the radius value.  If a set of ZIP codes is to be used more than once, i.e. for repeated mailings, then it might pay to refine the resulting address sets by another means.  But one can try to "err" on the side of inclusion at the level of ZIP codes considered.

regards, mathtalk

User

16 Apr 2009 22:03 UTCThu 16 Apr 2009 - 10:03 pm UTC

Hi Mathtalk,

Many thanks for your explanation. In my simplistic mind, ... No, I won't bother you both with how I was envisaging catching borderline zipcodes, which would have required a lot of handwork.

Regards, Myo

Researcher

19 Apr 2009 23:39 UTCSun 19 Apr 2009 - 11:39 pm UTC

Hi, happyengineer:

the SQL query by latitude and longitude), I'd taken a look
at your distance formula (implemented on the Java side) and
decided it looked right (it is), esp. given your mention of
finding the haversine formula for yourself.  But to round

Here's a derivation (with ASCII graphics!!) of the formula:

[Deriving the Haversine Formula -- Ask Dr. Math]
http://mathforum.org/library/drmath/view/51879.html

This is a formula for the "great circle" distance between
two points on a sphere, consistent with our earlier talk
about whether the oblate spheroid refinement is worth the
effort.  For future reference, if you wanted to address it:

[Geodesic distance... using Vincenty ellipsoid formula]
http://www.movable-type.co.uk/scripts/latlong-vincenty.html

One minor code improvement I'd suggest is using PI_180 in a
consistent way.  A couple of places you used 0.017453293,
and this ought to be the same thing as pi/180, converting

The haversine formula is preferred for accuracy at short
distance scales.  In many implementations of the haversine
formula you'll see an upper bound of 1 placed on:

a = sin²((lat-lat2)/2) + cos(lat)*cos(lat2)*sin²((lon-lon2)/2)

so that distance = 2R * asin(sqrt(min(1,a))) where R is the

The purpose of it is to guard against potential breakdown at
the other extreme of the distance scale, where two points
lie at nearly opposite locations on the sphere.  With rounding
it might compute that a > 1, which would cause an exception in
calculating arcsine of sqrt(a).  Assuming these extremes of
distance are irrelevant to your application, I see no need to
include that guard.  [Indeed I went further and suggested you
are likely only interested in distances between points in the
continental USA.]

The quantity a above is the square of half the chordal distance
between two points on a unit sphere, so 2R*sqrt(a) approximates
(Euclidean/chordal distance vs. great circle distance) when a
is quite small.  But you can have essentially the same savings
in computation _without_ resorting to an approximation, and in
the bargain have code that reads more simply!

The idea is to convert your distance limit into a corresponding
bound on a.  That is, distance is a monotone increasing function
of a, so let's figure out which aMax gives the limit on distance
and then compare ZIP code locations using aMax:

a < aMax = sin²(distance/(2R))

This saves all but computing "a" for each ZIP code location (and
doing the comparison).  Of course you may still wish to compute
for display purposes the corresponding distances of any "surviving"