2025-06-02_koordinatval

When extending training data with observation points in multiple directions and at various distances, select distances that reflect the spatial scales of the weather phenomena being modeled. The analysis and recommendations below build on three elements:

  • The eight principal compass directions (N, NE, E, SE, S, SW, W, NW) to achieve even coverage in all directions.

  • A “near” distance for short-term horizons (e.g., 5 min, 1 h).

  • A “far” distance for capturing larger, synoptic patterns relevant to 12 h and 24 h forecasts.

  1. Principles for Distance Selection

1.1 Spatial Scales of Weather Phenomena

Convective and mesoscale systems (e.g., local heat islands, cumulus clouds, rain showers):

  • Typically limited to a few kilometers in diameter.

  • Have a significant impact on very short time horizons (5–60 min).

  • In light winds these systems drift only a few kilometers over these horizons (the exact speed depends on the wind), so distances under 5 km are most influential.

Synoptic-scale systems (e.g., frontal systems, low-pressure areas with precipitation or storms):

  • Can span 100 to 1,000 km.

  • Primarily important for 12–24 h forecasts, as these systems usually move at around 10–50 km/h.

  • To capture their influence, data may need to be collected 50–200 km away.

Regional systems (e.g., sea breezes or larger mountain weather in Scandinavia):

  • Often extend 10–50 km, making them relevant for mid-range forecasts (1–12 h), especially in complex terrain or near large water bodies.

1.2 Directions and Symmetry

Selecting the eight principal compass directions (N, NE, E, SE, S, SW, W, NW) ensures symmetric coverage around a central point. This allows for capturing anisotropy in wind direction or topography. For example, if weather systems typically move west to east in your region, data from W and E will be especially informative.

  2. Proposed Distance Sets

We need two sets of distances:

  • Near-range (shorter distances) to capture rapid local changes for 5 min and 1 h.

  • Far-range (longer distances) to capture slower, synoptic changes for 12 h and 24 h.

2.1 Near Distances – Short Horizons

For 5 min and 1 h forecasts, convective and mesoscale effects dominate. We recommend:

Distance A₁ = 1 km

  • Captures immediate neighboring air and cloud conditions.

  • Important when microclimates (e.g., urban heat islands or forest shading) influence the forecast.

Distance A₂ = 5 km

  • Covers the area a convective cell can cross within the forecast window.

  • Suited to 1 h forecasts, where a shower cloud can drift a few kilometers within the hour.

Summary of short distances:

  • 1 km in N, NE, E, SE, S, SW, W, NW

  • 5 km in N, NE, E, SE, S, SW, W, NW

In practice, you may not have sensors exactly 1 km away in the field, but by using data from public weather stations (e.g., SMHI, METAR, or private IoT sensors), you can interpolate or select the nearest stations within these approximate radii.

2.2 Far Distances – Long Horizons

For 12 h and 24 h forecasts, larger synoptic patterns dominate. We recommend:

Distance B₁ = 50 km

  • Many frontal systems move at around 20–30 km/h, so an observation 50 km away can indicate a front that will arrive in approximately 2 h.

  • Regional valley and coastal systems can also have indirect impacts.

Distance B₂ = 150 km

  • Captures larger low- or high-pressure areas moving through the region.

  • Ideal for detecting pressure changes that may affect local weather in the next 12–24 h.

Summary of long distances:

  • 50 km in eight directions (N, NE, E, …)

  • 150 km in eight directions (N, NE, E, …)
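The two distance sets can be written down as a small configuration. The following is a minimal sketch, not a prescribed layout; the names `DIRECTIONS`, `NEAR_KM`, `FAR_KM`, and `sampling_points` are illustrative assumptions.

```python
from itertools import product

# Eight principal compass directions, clockwise from north.
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

# Distance sets in kilometers, as recommended above.
NEAR_KM = [1, 5]     # A1, A2: short horizons (5 min, 1 h)
FAR_KM = [50, 150]   # B1, B2: long horizons (12 h, 24 h)


def sampling_points():
    """Return every (distance_km, direction) pair around the central point."""
    return list(product(NEAR_KM + FAR_KM, DIRECTIONS))


points = sampling_points()
print(len(points))  # 4 distances x 8 directions = 32
```

Keeping the distances in two separate lists makes it easy to change one set without touching the other.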

  3. Justification and Concrete Examples

3.1 Why 1 km and 5 km for Short Horizons?

Local wind gusts, small rain showers, or shading effects can change temperature and cloudiness very rapidly.

  • Within 1 km, you can observe the effect of solar heating over a city street or a lake breeze.

  • Within 5 km, a convective cell formed 20 minutes earlier can move into your area.

3.2 Why 50 km and 150 km for Long Horizons?

A Scandinavian low-pressure system often moves at 20–40 km/h over archipelagos and land.

  • To understand how the weather will change in 12 h, you need to know if a front 50 km away can reach you in a few hours.

  • For 24 h forecasts, large high- and low-pressure systems (300–800 km in diameter) generally dominate; a radius of up to 150 km is therefore relevant.
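The link between distance and warning time is simple arithmetic: lead time equals distance divided by system speed. The sketch below works this out for the 20–40 km/h range quoted above, assuming the system moves straight toward the central point at constant speed; the function name `lead_time_hours` is illustrative.

```python
def lead_time_hours(distance_km: float, speed_kmh: float) -> float:
    """Hours until a system distance_km away arrives, moving straight toward you."""
    if speed_kmh <= 0:
        raise ValueError("speed_kmh must be positive")
    return distance_km / speed_kmh


for distance_km in (50, 150):
    fastest = lead_time_hours(distance_km, 40)  # upper end of 20-40 km/h
    slowest = lead_time_hours(distance_km, 20)  # lower end of 20-40 km/h
    print(f"{distance_km} km: {fastest:.2f} to {slowest:.2f} h")
```

Under these assumptions the 50 km ring gives 1.25 to 2.5 h of lead time and the 150 km ring gives 3.75 to 7.5 h.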

3.3 Example Data Collection

Assume you have a central point (centroid) at coordinates (lat₀, lon₀). You can:

Calculate offsets for coordinate shifts:

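  • 1 km ≈ 0.0090° latitude (1° of latitude ≈ 111 km at any latitude)

  • 5 km ≈ 0.045° latitude

  • 50 km ≈ 0.45° latitude

  • 150 km ≈ 1.35° latitude

  • Longitude offsets must be divided by cos(lat₀), because a degree of longitude shrinks toward the poles: at 60°N, 1 km ≈ 0.018° longitude.

Generate eight points per distance. For example, for 5 km:

  • N: (lat₀ + Δlat, lon₀) where Δlat ≈ +0.045

  • NE: (lat₀ + 0.045 × cos 45°, lon₀ + 0.045 × sin 45° / cos lat₀)

  • E: (lat₀, lon₀ + 0.045 / cos lat₀)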

  • … and so on for SE, S, SW, W, NW.
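The offset calculation can be sketched as follows. This is a flat-earth approximation that should be adequate for rings of 1–150 km away from the poles, not a geodesic solution. It divides the longitude offset by cos(lat₀), because a degree of longitude is shorter than a degree of latitude away from the equator. The constant `KM_PER_DEG_LAT` and the names `BEARINGS`, `offset_point`, and `ring` are assumptions made for illustration.

```python
import math

KM_PER_DEG_LAT = 111.32  # assumed mean length of one degree of latitude

# Compass bearings in degrees, clockwise from north.
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}


def offset_point(lat0, lon0, distance_km, direction):
    """Return (lat, lon) roughly distance_km from (lat0, lon0) along direction."""
    bearing = math.radians(BEARINGS[direction])
    dlat = distance_km * math.cos(bearing) / KM_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude), so scale accordingly.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(lat0))
    dlon = distance_km * math.sin(bearing) / km_per_deg_lon
    return lat0 + dlat, lon0 + dlon


def ring(lat0, lon0, distance_km):
    """Return the eight offset points at one distance, keyed by direction."""
    return {d: offset_point(lat0, lon0, distance_km, d) for d in BEARINGS}
```

For instance, `ring(60.0, 18.0, 5)` returns eight (lat, lon) pairs about 5 km from a hypothetical centroid at 60°N, 18°E.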

Fetch weather data (OpenWeatherMap or SMHI) for each point and store it in the database as separate “features”:

  • Example: temp_1km_N, temp_5km_NE, …, temp_50km_S, wind_speed_150km_SW, etc.

  4. How These Extra Points Are Used in the Model

By collecting additional variables per timestamp (instead of just “local temperature, cloudiness, wind”), you can build a richer feature vector:

Local (0 km):

  • temp_local, humidity_local, pressure_local, cloudiness_local, wind_speed_local, …

Near (A₁=1 km) in 8 directions:

  • temp_1km_N, temp_1km_NE, …, cloudiness_1km_SE, …

Near (A₂=5 km) in 8 directions:

  • temp_5km_N, temp_5km_NE, …, wind_speed_5km_W, …

Far (B₁=50 km) in 8 directions:

  • temp_50km_N, humidity_50km_SE, …

Far (B₂=150 km) in 8 directions:

  • temp_150km_NE, pressure_150km_S, …

This yields 1 + (8×2) + (8×2) = 1 + 16 + 16 = 33 variables for temperature alone (plus corresponding features for humidity, cloudiness, wind, etc.). In total, you could have 33×X features, where X is the number of different measurements (temperature, humidity, wind speed, …).
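The feature count can be checked by generating the column names directly. This is a small sketch that follows the naming pattern used in the lists above (`temp_local`, `temp_5km_N`, and so on); the function name `feature_names` and the exact list of measurements are assumptions.

```python
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
DISTANCES_KM = [1, 5, 50, 150]


def feature_names(measurements):
    """Build one local name plus one name per distance and direction."""
    names = []
    for measurement in measurements:
        names.append(f"{measurement}_local")
        for dist in DISTANCES_KM:
            for direction in DIRECTIONS:
                names.append(f"{measurement}_{dist}km_{direction}")
    return names


names = feature_names(["temp", "humidity", "pressure", "cloudiness", "wind_speed"])
print(len(names))  # 5 measurements x 33 = 165
```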

4.1 Consider Correlation and Redundant Variables

Including many nearby points in the feature set can lead to highly correlated variables (e.g., temp_local and temp_1km_N during calm conditions).

Use a simple correlation analysis (e.g., pandas df.corr()) or PCA/feature selection on the dataset to identify overly redundant variables.
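A redundancy check of this kind might look like the sketch below. It uses only the standard library so that it runs without pandas; with a DataFrame, `df.corr()` computes the same Pearson coefficients by default. The 0.95 threshold, the function names, and the sample values are illustrative assumptions, not real observations.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (sx * sy)


def redundant_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold."""
    pairs = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Made-up illustrative values.
columns = {
    "temp_local":   [10.0, 11.0, 12.5, 13.0, 12.0],
    "temp_1km_N":   [10.1, 11.1, 12.4, 13.2, 12.1],
    "temp_150km_S": [14.0, 9.0, 13.0, 8.5, 12.0],
}
print(redundant_pairs(columns))
```

With these sample values only the `temp_local` / `temp_1km_N` pair exceeds the threshold, which matches the expectation that the nearest ring is the most redundant.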

In the initial phase, you can still include all features and let a model (Ridge regression or neural network) learn the weights.

  5. Distance Recommendations – Summary

Horizon               | Spatial Scale of Phenomena            | Recommended Distances
0–1 hour (5 min, 1 h) | Mesoscale and convective (1–10 km)    | 1 km & 5 km
1–12 hours (12 h)     | Regional/transient fronts (10–100 km) | 50 km
12–24 hours (24 h)    | Synoptic systems (100–500 km)         | 150 km

And in all cases:

  • Include the eight compass directions (N, NE, E, SE, S, SW, W, NW).

  • For each measured quantity, this provides 16 variables for short distances (two distances × 8 directions) plus 16 variables for long distances, in addition to the local value.
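The summary can also be expressed as a lookup from forecast horizon to recommended ring distances. This is a minimal sketch that treats each row's upper bound as inclusive, so 1 h falls in the first row and 12 h in the second; `HORIZON_RULES` and `distances_for_horizon` are hypothetical names.

```python
# (upper bound of horizon in hours, recommended ring distances in km),
# transcribed from the summary above.
HORIZON_RULES = [
    (1, [1, 5]),
    (12, [50]),
    (24, [150]),
]


def distances_for_horizon(horizon_hours):
    """Return the recommended distances (km) for a forecast horizon in hours."""
    if horizon_hours <= 0:
        raise ValueError("horizon_hours must be positive")
    for max_hours, distances_km in HORIZON_RULES:
        if horizon_hours <= max_hours:
            return distances_km
    raise ValueError("horizons beyond 24 h are not covered by the summary")


print(distances_for_horizon(5 / 60))  # 5-minute horizon
print(distances_for_horizon(24))      # 24-hour horizon
```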