ANSWERED on Thu 16 Apr 2009 - 7:33 am UTC by mathtalk

**Actions: **
Add Comment

This problem seemed simple at first, but then I found that it involves some tricky math.

I have a table of zipcodes. Each zipcode has a latitude and longitude associated with it. Given a specific lat,long coordinate, I want to find all the zipcodes that are within a given radius (in US miles).

I found information about the Haversine formula, but that doesn't seem to quite give me what I want.

The most efficient way to do this is to first query for a set of zipcodes withing a "square". Then I would calculate the distance for each individual zipcode in that square and keep only the ones that occupy the circle which represents the radius distance from the original point.

For instance, here is my zipcode query.

select * from share.zipcode where

(latitude between (MINLAT) and (MAXLAT))

and (longitude between (MINLON) and (MAXLON))

;

First, given a lat,long coordinate I need to calculate MINLAT,MAXLAT,MINLON, and MAXLON.

Then, given the resulting zipcodes, I need to calculate the distance between the original lat,long and the lat,long of each zipcode and then keep the ones that are inside the max radius.

Hi happy,

Have you looked at Kevin Roth's code?

Code Library > Zip Code Search

"Description: Everything you need to perform a zip code radius search."

http://www.kevinroth.com/projects/project.php?id=85

Patricia

An alternative approach is to do this with online look-ups. There are sites where you can enter lat/long, and get back the relevant Zip+4, and other sites where you can enter the Zip and get all other zips within a given radius.

Not the most elegant solution, but I thought I'd mention it, just in case...

Hi,

Perhaps you'd like a discussion of the math and how to implement it in SQL?

It would be good to know what database manager you plan to use (e.g. MySQL, SQLServer, etc.).

regards, mathtalk

Hi happyengineer,

It would also be useful to know what degree of accuracy is required. If approximate results are acceptable, then the application can be greatly simplified.

Regards,

eiffel

Rough accuracy is fine. It should be close enough that users won't say "hey, I know that that's within 5 miles of me!". So being accurate to a tenth of a mile is probably fine.

I'm using postgres, but I don't need the math to be done inside the query unless necessary. I'm doing most of the calculations on the java side which makes the query to the db.

Here is what I came up with. It seems to work, but I'm not 100% certain I did it correctly.

First, I run this query:

FROM Zipcode WHERE (latitude between :minLat and :maxLat) and (longitude between :minLon and :maxLon)

The parameters are assigned as follows:

double milesPerDegree = 0.868976242 / 60.0 * 1.2;

double degrees = milesPerDegree * milesDistance;

double minLon = lon - degrees;

double maxLon = lon + degrees;

double minLat = lat - degrees;

double maxLat = lat + degrees;

where milesDistance is the actual radius requires. The milesPerDegree is a conversion of degrees to nautical miles. I then multiplied it by 1.2 which seemed to result in an area large enough to get all the zipcodes I need.

The query returns a list of zipcodes which are inside that box. Then, on the java side I run the following calculation for each zipcode to find out its exact distance. If it's <= milesDistance then I keep it and use it. If it's outside then I throw it away.

for(Zipcode z : zipcodes) {

double lat2 = z.getLatitude();

double lon2 = z.getLongitude();

// This calculation seems to be close to correct. Google maps seems to confirm that the given results are

// actually roughly within the distance they are supposed to be.

double PI_180 = Math.PI / 180;

double distance =

3956 * (2 * Math.asin(Math.sqrt(

Math.pow(Math.sin(((lat-lat2)*PI_180)/2),2)

+ Math.cos(lat*0.017453293) * Math.cos(lat2*0.017453293) *

Math.pow(Math.sin(((lon-lon2)*PI_180)/2),2)

)));

if(distance <= milesDistance) {

nearbyZips.add(z.getZip());

}

}

In short, I query for a box which covers all the zipcodes that are in a square which covers the circle with radius milesDistance. Then I do the full calculation for all zipcodes inside that square to find the ones inside the circle.

I want the DB query to put minimal load on the DB and return as few zipcodes as possible without missing any valid ones.

@angy, it's not mathematical. I have a database table of US zipcodes and their corresponding lat,lon coordinates.

Hi, happyengineer:

I answered a similar kind of Question here:

[Google Answers: Identifying zip codes between two points]

http://answers.google.com/answers/threadview?id=247783

although the geometry is different and the coding was

pushed into the database layer. A more quantitative

difference is that where evidently you used a value of

3956 miles for Earth's radius, I used 3958.

For the sake of this discussion I'll adopt your value.

You present a formula (or the Java implementation of it)

which finds the distance between two points identified

by longitude & latitude coordinates.

To make the SQL query quick & easy, you propose to look

up places whose longitude & latitude lie between a pair

of (respective) bounds, then check actual distance on

the Java side of things.

This is especially easy to work out for the latitude

coordinates because a degree of latitude would have

constant length if the Earth were a perfect sphere.

The shape is more precisely an oblate spheroid, but

that is merely a refinement in approximation. Radius

at the equator is around 3963 miles, i.e. greater

than at the poles (and elsewhere).

For radius R of 3956 miles we get a circumference of

2*pi*R = 24,856 miles, which would make 1 degree of

latitude be 69 miles long.

While I'm happy sticking to the perfect sphere model,

the oblate spheriod details are presented here:

[Geodesy Page by James Q. Jacobs]

http://www.jqjacobs.net/astro/geodesy.html

Note that for an oblate spheroid there are actually

two notions of latitude, one geodetic (based on the

angle that a normal to the surface makes with the

equatorial plane) and one geocentric (based on the

angles lines drawn to the Earth's center make with

the equatorial plane). In the geodetic case the

length of a degree of latitude increases slightly

from the equator (68.7 miles) to the poles (69.4

miles).

US ZIP codes lie mostly in the western hemisphere

(longitude) and entirely in the norther hemisphere

(latitude), For the continental US (excluding

Alaska and Hawaii) the latitudes are between 25

and 50 degrees north, and the variation in length

of a degree of latitude is at most 0.4%. So the

simple sphere model and 69 miles per degree of

latitude should work well enough here:

double degrees = milesDistance/69.0;

double minLat = lat - degrees;

double maxLat = lat + degrees;

One improvement I suggest in your formulas is to

consider that the length of a degree of longitude

shrinks with increasing latitude (toward a pole).

If you don't mind reusing the value degrees after

we've found minLat,maxLat above:

double PI_180 = Math.PI / 180;

degrees /= Math.cos(lat * PI_180);

double minLon = lon - degrees;

double maxLon = lon + degrees;

Note that the result of the above is to increase

degrees, because it's being divided by the cosine

of the latitude, i.e. by a positive number less

than 1. The effect of a degree of longitude

being shorter than a degree of latitude is that

it takes more of them to make up a given distance.

It can be a substantial adjustment, already 30.5%

when 40 degrees North latitude is reached.

The intervals minLat & maxLat, minLon & maxLon can

then be used in your SQL query just as you meant.

regards, mathtalk

P.S. This approach also works reasonably well for

ZIP codes in Hawaii and Alaska, although a very

small portion of the Aleutian Islands are beyond

180 degrees West & technically in the Eastern

hemisphere. I don't expect you need to deal with

them, but a simple approach would be to adjust

the longitude from degrees East to a corresponding

"degrees West" value greater than 180, and similarly

for any US possessions lying further west. Usually

in a database such East longitudes are represented

as negative numbers. As such the adjustment would

be to add them to 360 degrees (producing an angle

greater than 180 degrees. If there were ZIP codes

in the Southern hemisphere, these would probably

be represented as negative degrees latitude, which

is fine for the purposes of our calculations above.

From the question:

"I have a table of zipcodes. Each zipcode has a latitude and longitude

associated with it. Given a specific lat,long coordinate, I want to find

all the zipcodes that are within a given radius (in US miles)."

If this means there is just one lat/long coordinate for a zipcode, a point, will the mathematic solution capture zipcodes with areas within the radius whose own lat/long coordinates lie outside the radius?

Hi, myo:

No, it will miss a ZIP code unless the corresponding latitude/longitude is within the radius (specified distance) of a given origin.

To compensate for this, users of such a database are typically generous with the radius value. If a set of ZIP codes is to be used more than once, i.e. for repeated mailings, then it might pay to refine the resulting address sets by another means. But one can try to "err" on the side of inclusion at the level of ZIP codes considered.

regards, mathtalk

Hi Mathtalk,

Many thanks for your explanation. In my simplistic mind, ... No, I won't bother you both with how I was envisaging catching borderline zipcodes, which would have required a lot of handwork.

Regards, Myo

Hi, happyengineer:

When I posted my answer (about finding a "box" for limiting

the SQL query by latitude and longitude), I'd taken a look

at your distance formula (implemented on the Java side) and

decided it looked right (it is), esp. given your mention of

finding the haversine formula for yourself. But to round

out the discussion, I want to add a few comments about that.

Here's a derivation (with ASCII graphics!!) of the formula:

[Deriving the Haversine Formula -- Ask Dr. Math]

http://mathforum.org/library/drmath/view/51879.html

This is a formula for the "great circle" distance between

two points on a sphere, consistent with our earlier talk

about whether the oblate spheroid refinement is worth the

effort. For future reference, if you wanted to address it:

[Geodesic distance... using Vincenty ellipsoid formula]

http://www.movable-type.co.uk/scripts/latlong-vincenty.html

One minor code improvement I'd suggest is using PI_180 in a

consistent way. A couple of places you used 0.017453293,

and this ought to be the same thing as pi/180, converting

from degrees to radians.

The haversine formula is preferred for accuracy at short

distance scales. In many implementations of the haversine

formula you'll see an upper bound of 1 placed on:

a = sin²((lat-lat2)/2) + cos(lat)*cos(lat2)*sin²((lon-lon2)/2)

so that distance = 2R * asin(sqrt(min(1,a))) where R is the

Earth's radius (and I'm assuming angles in radians).

The purpose of it is to guard against potential breakdown at

the other extreme of the distance scale, where two points

lie at nearly opposite locations on the sphere. With rounding

it might compute that a > 1, which would cause an exception in

calculating arcsine of sqrt(a). Assuming these extremes of

distance are irrelevant to your application, I see no need to

include that guard. [Indeed I went further and suggested you

are likely only interested in distances between points in the

continental USA.]

The quantity a above is the square of half the chordal distance

between two points on a unit sphere, so 2R*sqrt(a) approximates

(Euclidean/chordal distance vs. great circle distance) when a

is quite small. But you can have essentially the same savings

in computation _without_ resorting to an approximation, and in

the bargain have code that reads more simply!

The idea is to convert your distance limit into a corresponding

bound on a. That is, distance is a monotone increasing function

of a, so let's figure out which aMax gives the limit on distance

and then compare ZIP code locations using aMax:

a < aMax = sin²(distance/(2R))

This saves all but computing "a" for each ZIP code location (and

doing the comparison). Of course you may still wish to compute

for display purposes the corresponding distances of any "surviving"

points inside the radius "distance".

regards, mathtalk

**Actions: **
Add Comment

Frequently Asked Questions | Terms & Conditions | Disclaimer | Privacy Policy | Contact Us | Spread the word!

Tue 17 Jan 2017 - 11:25 pm UTC - © 2017 Uclue Ltd

angy

Researcher

12 Apr 2009 05:12 UTCSun 12 Apr 2009 - 5:12 am UTC

Doesn't this assume zipcodes are assigned mathematically? Australian postcodes are assigned to geographical areas by someone drawing lines along the streets on a map, and correspond roughly to population density.