Our article on Statistics NZ's May migration statistics, and whether or not there's a brain drain, prompted a lively thread of comments from our readers.
In particular there was considerable discussion about the methodology Statistics NZ uses to compile its migration data every month/quarter/year and how well this represents current migration trends.
To assist readers who have an interest in this matter, Statistics NZ has provided interest.co.nz with an overview of how the migration statistics are compiled.
We are publishing it here unabridged, and hopefully it will provide readers with a better understanding of the migration data.
Readers are of course welcome to continue their discussion on this topic in the comment section beneath this article.
Here it is:
Is it better to measure migration using passenger card intentions or actual duration of stay/absence?
The immediacy of having all travellers indicate their intended duration of stay/absence on arrival/departure cards is an advantage of relying on passenger card intentions. Indeed, this was the basis of official migration statistics up until the departure card was dropped in November 2018.
However, these intentions have been shown to significantly understate the true number of migrant arrivals and departures (see International migration uses new official measure). Aside from the hassle of travellers having to fill in departure cards – 7 million a year before they were dropped – there was also the cost of distribution and processing.
With the ability to precisely link arrivals and departures, we can now measure the exact time that each traveller spends in New Zealand or overseas after crossing the border. This is the most accurate way of measuring migration, which is a critical input into other population statistics and for policy monitoring.
Where does the predictive model come in?
We use a statistical machine-learning model to classify travellers whose migrant status is uncertain. Usually only a minority of travellers in each month have an uncertain migrant status, because simple logic is enough to classify the rest: most travellers are coming and going within a few weeks.
For example, a traveller who arrives in New Zealand for the first time, then departs 3 weeks later, cannot possibly be classified as a migrant on departure. They can, however, potentially still be a migrant on that arrival, if they return to New Zealand and satisfy the 12/16-month rule. If, 4 months after departure, they have not returned to New Zealand, they cannot possibly have been a migrant on arrival, as they cannot satisfy the 12/16-month rule.
The provisional migration estimates therefore become more certain over time because fewer travellers need to be classified using the model. Eventually, all travellers can be classified as a migrant or non-migrant without the model.
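The reasoning above can be sketched in a few lines of code. This is an illustrative sketch only, not Stats NZ's implementation: the function name `arrival_status` and the day counts (365 days standing in for 12 months, 487 days for 16 months) are assumptions made for the example.

```python
# Assumed day-count approximations of the 12/16-month rule (not from Stats NZ).
THRESHOLD_DAYS = 365  # roughly 12 months
WINDOW_DAYS = 487     # roughly 16 months


def arrival_status(days_in_nz: int, days_elapsed: int) -> str:
    """Classify an arrival as far as simple logic allows.

    days_in_nz:   cumulative days spent in New Zealand since the arrival
    days_elapsed: days since the arrival (capped at the 16-month window)
    """
    days_elapsed = min(days_elapsed, WINDOW_DAYS)
    days_remaining = WINDOW_DAYS - days_elapsed
    if days_in_nz >= THRESHOLD_DAYS:
        return "migrant"
    if days_in_nz + days_remaining < THRESHOLD_DAYS:
        # Even staying in NZ for the rest of the window would not be enough.
        return "non-migrant"
    return "uncertain"  # only these cases would need the predictive model


print(arrival_status(21, 21))        # just departed after a 3-week stay
print(arrival_status(21, 21 + 130))  # still away more than 4 months later
print(arrival_status(370, 400))      # has already spent over 12 months in NZ
```

In this sketch the 3-week visitor starts out as "uncertain" on the day they depart and becomes a definite "non-migrant" once they have been away for more than about 4 months, which is why fewer travellers need the model as time passes.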
Given the revisions to provisional estimates, when should those estimates be used?
Ultimately this is a choice for people to balance their need for timeliness versus accuracy. However, people do not need to wait 17 months for robust migration estimates.
Initial provisional estimates are clearly susceptible to the largest revisions. The nature of the 12/16-month rule means that once the 4-month threshold is passed, a majority of travellers can be definitively classified as non-migrants, so the revisions are typically smaller after this point.
A similar threshold is reached once the 12-month threshold is passed, as a majority of travellers can then be definitively classified as migrants (except for those who have made multiple trips across the border).
We note that the model assigns probabilities of being a migrant/non-migrant, not a binary yes/no assignment.
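One way to picture what probabilities mean for a published total is shown below. This is a hypothetical illustration, not a description of how Stats NZ aggregates its results: the function name, the idea of summing probabilities into an expected count, and all the numbers are assumptions made for the example.

```python
def provisional_migrant_estimate(confirmed_migrants: int,
                                 probabilities: list[float]) -> float:
    """Add already-confirmed migrants to the expected number of migrants
    among travellers whose status is still uncertain.

    Each uncertain traveller contributes their modelled probability of
    being a migrant, not a hard yes/no.
    """
    return confirmed_migrants + sum(probabilities)


# Invented figures: 1,000 confirmed migrants plus three uncertain travellers.
estimate = provisional_migrant_estimate(1000, [0.9, 0.5, 0.1])
print(round(estimate, 1))
```

Under this assumption, the estimate is revised as uncertain travellers are resolved: each probability is eventually replaced by a 1 (migrant) or a 0 (non-migrant).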
What training data is used in the model?
A 3-year rolling window of historic arrivals and departures is used to train the model. For the migration estimates up to May 2022 published in July 2022, this window ran up to January 2021, the latest month for which the migrant status of all travellers had been finalised (in June 2022). This window is therefore a mix of pre-COVID and post-COVID travellers, although the volume of travellers was of course much higher pre-COVID.
We note that the latest border crossings are used in the model each month. For example, the May 2022 estimates published in July 2022 take account of border crossings through June 2022.
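The dates in the training-window example can be checked with simple month arithmetic. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the 17-month lag is inferred from the January 2021 and June 2022 dates given above, and treating the window as 36 calendar months ending in the latest finalised month is an assumption.

```python
FINALISATION_LAG_MONTHS = 17  # inferred: January 2021 finalised in June 2022
TRAINING_WINDOW_MONTHS = 36   # assumed reading of "3-year rolling window"


def shift_month(year: int, month: int, delta: int) -> tuple[int, int]:
    """Move a (year, month) pair forwards or backwards by delta months."""
    index = year * 12 + (month - 1) + delta
    new_year, new_month0 = divmod(index, 12)
    return new_year, new_month0 + 1


def training_window(finalised_in: tuple[int, int]):
    """Return (start, end) months of the training window, both inclusive."""
    end = shift_month(*finalised_in, -FINALISATION_LAG_MONTHS)
    start = shift_month(*end, -(TRAINING_WINDOW_MONTHS - 1))
    return start, end


print(training_window((2022, 6)))  # ((2018, 2), (2021, 1))
```

Under these assumptions the window used for the July 2022 release would run from February 2018 to January 2021.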
Can’t you just calculate net migration from all arrivals and departures across the border?
Calculating net migration from all arrivals and departures is misleading, especially over short periods of time, as most border crossings are by short-term travellers. Pre-COVID, about 1 in every 50 people crossing the border was a migrant. The ratio dropped in 2020-22 as the volume of inbound and outbound tourism shrank, but migrants are still a minority of travellers crossing the New Zealand border.
The difference between all arrivals and all departures is interesting, as this indicates how the number of people physically in New Zealand is changing.
But most population statistics relate to the resident population – the population usually living in an area – as this is most pertinent to education and health needs, the labour and housing markets, the tax base, superannuation, and so on. For these purposes it is important to measure migrant arrivals and migrant departures.
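The distinction between the two measures can be made concrete with a small calculation. The figures below are invented purely for illustration and are not Stats NZ or Customs data; the class and field names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCrossings:
    """Border crossings for one month (all values here are made up)."""
    total_arrivals: int
    total_departures: int
    migrant_arrivals: int
    migrant_departures: int

    @property
    def net_crossings(self) -> int:
        # Change in the number of people physically in New Zealand.
        return self.total_arrivals - self.total_departures

    @property
    def net_migration(self) -> int:
        # Change in the resident population due to migration.
        return self.migrant_arrivals - self.migrant_departures


month = MonthlyCrossings(
    total_arrivals=500_000,
    total_departures=520_000,
    migrant_arrivals=10_000,
    migrant_departures=9_000,
)
print(month.net_crossings)  # -20000
print(month.net_migration)  # 1000
```

In this made-up month more people leave than arrive, yet net migration is positive, because most of the people crossing the border are short-term travellers who will cross back again.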
What is a migrant again?
A migrant is essentially someone who changes their country of residence. UN guidelines around measurement suggest a 12-month threshold to differentiate between migrants and non-migrants.
And what is the 12/16-month rule?
The 12/16-month rule is a way of classifying border crossings as short-term or long-term on the basis of whether travellers spend 12 months (or more) of the following 16 months in New Zealand:
- A migrant arrival is an overseas resident who arrives in New Zealand and cumulatively spends 12 out of the next 16 months in New Zealand.
- A migrant departure is a New Zealand resident who departs New Zealand and cumulatively spends 12 out of the next 16 months out of New Zealand.
Overseas residents and New Zealand residents both include New Zealand citizens and non-New Zealand citizens.
The 12/16-month rule gives a transparent and objective method of classifying travellers, rather than relying on the stated intentions of travellers on passenger cards. Both Australia and New Zealand use this method to measure international migration.
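As a rough sketch of how the 12/16-month rule could be applied to one traveller's border crossings, the code below counts the days spent in New Zealand within the 16 months after an arrival. It is not Stats NZ's implementation: the function names, the representation of stays as date pairs, and the day counts (365 and 487 days) are all assumptions made for the example. The migrant departure case is the mirror image, counting days spent outside New Zealand.

```python
from datetime import date, timedelta

THRESHOLD_DAYS = 365  # assumed stand-in for 12 months
WINDOW_DAYS = 487     # assumed stand-in for 16 months


def days_in_nz(stays: list[tuple[date, date]], window_start: date) -> int:
    """Count days spent in NZ during the window that opens at window_start.

    stays: (arrival_date, departure_date) pairs; the departure day itself
    is not counted as a day in New Zealand.
    """
    window_end = window_start + timedelta(days=WINDOW_DAYS)
    total = 0
    for arrived, departed in stays:
        start = max(arrived, window_start)
        end = min(departed, window_end)
        if end > start:
            total += (end - start).days
    return total


def is_migrant_arrival(stays: list[tuple[date, date]], first_arrival: date) -> bool:
    """True if the traveller cumulatively spends at least 12 of the next 16 months in NZ."""
    return days_in_nz(stays, first_arrival) >= THRESHOLD_DAYS


# A short first visit followed by a long second stay: a migrant on the first arrival.
long_stayer = [(date(2021, 1, 10), date(2021, 3, 1)),
               (date(2021, 4, 1), date(2022, 6, 1))]
print(is_migrant_arrival(long_stayer, date(2021, 1, 10)))  # True

# A single 3-week visit: not a migrant.
visitor = [(date(2021, 1, 10), date(2021, 1, 31))]
print(is_migrant_arrival(visitor, date(2021, 1, 10)))  # False
```

Because the count is cumulative, the first traveller qualifies even though they left New Zealand for a month during the window.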
More information: Stats NZ’s Migration Data Transformation page has a lot of information about the new migration estimates. However, if you can’t find what you are looking for or have other questions, please drop us an email to demography@stats.govt.nz
The comment stream on this story is now closed.
28 Comments
Very helpful article, thanks.
People are waiting for a plane that'll outrun global inflation.
Not everything is about inflation, or even NZ.
Awesome stuff Greg. It does sound as though the methodology will make it difficult to identify a shift in trends using the latest month's data, and we may need to wait 2-3 months for it to become apparent.
The reality is the world is really only just opening up to international travel, entry criteria and air transport availability are super patchy, so if there's a brain drain story to be told it'll be more like years in the making.
They haven't even opened up the strata lounge at Auckland, makes for an uncomfortable draining.
If I read that correctly, it seems they do not equate NZ'ers leaving as emigrants -> because they are covered by rules derived by the last 3 years of data, two years of which has had very low net migration.
Their uncertainty seems to be modeled around only confirming whether immigrants are here to stay or not - and the training data has been tainted by two years of people sheltering in the hermit kingdom from Covid. And why train on only 3 years of data, when they [theoretically] have decades of data they can use. My guess is their predictive model is over-fit, and has no ability to handle a change in large numbers of existing kiwis emigrating until there has been a significant delay - and as such any numbers they are currently publishing in the face of massive negative customs numbers are, quite frankly, hard to believe.
Just because a method is politically convenient ("both Aus and NZ use this method"), doesn't mean it gives the true value (much like our massaged CPI and unemployment stats).
It's also no surprise the number of migrants declaring immigration intention via card was understated, when for 15+ years we ran a Student Visa->Work Visa->to residency scheme.
They do measure the number of NZ citizens leaving the country permanently or long term. That's what yesterday's story was about: https://www.interest.co.nz/property/116717/talk-mass-exodus-new-zealand…
Thank you for taking the time to respond Greg, I appreciate the work you do.
They have stated here how they define that-> someone who has been out of the country 12 months out of 16 -> so anyone out for 12 consecutive we know about at 12 months is easy to classify but with a delay of 12/13 months to verify, and up to 16 months with a delay of 16/17 months to verify.
My point re: the model is this: they trained the current prediction with the 3 years of data up until January 2021. From the graphs in the article you linked:
There were only 2 months with negative migration in their dataset Dec 2021 (-2121) and Jan 2021 (-841). All the rest of the data is massively positive inwards migration. Given they assign a probability to the traveller's intention, I am hazarding to suggest their model is currently heavily skewed towards immigration. Very few of the months since, with fairly balanced net migration (ever-so-slightly negative) have been validated as yet, rendering the model untrustworthy in the face of potentially significant emigration, especially since January this year.
It would perhaps be more interesting to see the stats broken down a bit more (emigrated, returned, immigrated, left) per month. (It may exist, I haven't looked).
An interesting exercise (and I admit I haven't done it) would be to compare customs excess data with stats published migration numbers. As it stands at the moment, we know there is about a 17-month lag in the way stats validate their model. So the last year's excess departures haven't really had a chance to improve the model, which is still heavily informed by the previous years.
And I could be totally wrong, and my argument nonsense, and I'll own that if history proves me so :)
Greg,
They don't measure them at all.
What they do is estimate them as the output of a model.
And in the current situation, that model, based on historical relationships, is seriously flawed.
KeithW
Too many juicy articles on Interest today, this one is probably the most insightful. Interesting they use 3 years of data as Chaosinflesh pointed out, longer term rolling dataset may produce clearer results at this point in time given we're rolling through the other side of a pandemic.
Kind of solidifies that the statistics are estimated and we won't know the true picture until after the dust has settled
So basically, the data doesn't really tell us enough for a fair few months? Makes sense: when trends are shifting dramatically, it will take some time for the machine learning to catch up.
Thanks Greg
But all of this confirms that neither the Stats Dept nor any of us have data of the number of Kiwis who have been departing migrants in the last three months, and arguably back to the start of the year. The machine learning algorithm has not had recent data to do its learning from and the world together with associated behaviours has sure changed over the last 2.5 years.
My own impression is that outward migration is considerable, and growing, for young people who are not currently on the property ladder and who have sound employment prospects overseas. There are alternative hypotheses as to what is occurring and will occur, but there is really no information to either confirm or refute what is happening
KeithW
I thought it was actually pretty well calculated. The model only really needs to know what percentage of people who have left for a while and not come back are migrants. It’s hard to imagine that percentage has changed significantly. If we had a big brain drain I would expect the total leaving and not promptly returning to increase a lot, this is obviously not occurring.
It’s quite possible that you are seeing what the media is telling you to see. It’s actually quite common (pre-COVID) for young NZers to go overseas. Of course now there will be pent-up demand for that, but it’s probably not a significant proportion of the workforce.
There is a high likelihood the percentage has changed, because outward tourism (the denominator) has not yet ramped up. But none of us know the answer because there are no data.
KeithW
https://www.customs.govt.nz/covid-19/more-information/passenger-arrival… Here are the raw figures for arrivals vs. departures since January 2020. Customs has suddenly stopped reporting the daily movements, but you can add up the totals and see the trends; this month's daily departures exceeded arrivals by a number of thousands, and over the time of these stats, there is a significant (130,000 plus) excess of departures.
Nailed it.
I agree with this conclusion.
So what's the confidence levels? Have they done sensitivity analysis?
You cannot do sensitivity estimates on these types of model. It would be meaningless.
KeithW
One thing you could do is see how sensitive the estimates are to different training windows, e.g. if they train it only on pre-COVID data versus including the COVID period. If it makes a big difference then you'd be even more cautious about reading into the results, since current propensities for a departure to be an emigrant versus a holidaymaker are unlikely to be the same as they were during COVID (or even the same as pre-COVID).
Having monitored the revisions to Stats NZ's estimates of migration for a given month from release to release (vintage analysis), I wouldn't be reporting on even the last 4 available months of data. It gets revised so much.
I agree.
It is a misuse of the model to report the last four months.
KeithW
Sounds to me like they have no idea if people are immigrating or emigrating, but will have a guess in 12 to 16 months. Just impossible to see trends when you have no data... Very loose accounting.
Wow, you guys, thank you!
3 years of data to train the model is way too low and doesn't account for long term trends, but would account for seasonality. Not sure the data set, without other inputs, is helpful really.
They are essentially saying "Based on what happened with people going in/out of the country for the past 3 years, we are projecting the same things to happen going forward so can make assumptions about arrivals/departures". That's not how migration works. People make migration decisions based on their perceived change in lifestyle of source compared to destination and their ability to leave. Changes to a wide range of policies can have sudden effects, as can things like pandemics, which will distort the model's conclusions about every single movement. For instance now, we will be using data from a very small data set (hardly anyone travelled the last 2 years), and the travellers used to train the model don't represent those that are coming/going now. To say that they do is frankly stupid; it's recency bias trained into a model.
I have worked in the area of machine learning/predictive modelling and while I like the idea from a tech perspective, I think they have asked "Can we do this" when maybe they should have asked "Should we do this".
Customs figures for arrivals and departures going back 18 months show a consistent pattern of departures exceeding arrivals to a total today of, I believe, around 130,000, so if I understand this correctly, the differences at that time are likely to be people migrating as they apparently have not returned. Even in 2020, the monthly differences were in the thousands; now this is the figure daily, or at least was until Customs suddenly stopped reporting figures just a few days ago. I believe the exodus is far greater than Stats want to let on, and it poses some interesting questions for the future.
In the 3 months of Apr-Jun since borders were opened, there has been a net loss of 30,000 people - people who have left and not returned. In the first week of July alone, there has been a net loss of 9000 people! Now it should be apparent to everyone why Customs has stopped reporting the daily numbers. Any fictional model predicated on the last 3 years of emigration statistics is nowhere near accurately reporting what is going on currently. People are fleeing this country en masse. Shame we have to wait 16 months for the "official" recognition of it. But hey, that 16 months will probably be long enough to get us to the 2023 election without the Labour Govt having to take responsibility for it.