Issue
I don't speak Russian, so I'm having trouble validating whether the months are spelled correctly. To be honest, I'm not even sure that my input is Russian (Russian is the language detected by Google Translate).
I have some Kotlin code that makes a best-effort attempt to parse dates specified in various formats and languages. I'm struggling with Russian dates, however. Here's the relevant part of my code:
sequenceOf(
    "ru-RU", // Russian
    "sr", // Serbian
).forEach {
    val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
        .withLocale(Locale.forLanguageTag(it))
    try {
        return listOf(LocalDate.parse(dateString, format))
    } catch (e: Exception) {
        // Ignore and move on
    }
}
This code correctly parses "27 апр. 2018" and "24 мая. 2013", but fails on "28 фев. 2019".
What's special about "28 фев. 2019", and/or how can I parse this value correctly?
If you provide answers in Java, I can translate it to Kotlin fairly easily.
EDIT: Here's an SSCCE in Kotlin:
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.*

println("System.getProperty - " + System.getProperty("java.version"))
println("Runtime.version - " + Runtime.version())
val dateString = "28 фев. 2019"
sequenceOf(
    "ru-RU", // Russian
    "sr", // Serbian
).forEach {
    val format = DateTimeFormatter.ofPattern("d MMM. yyyy")
        .withLocale(Locale.forLanguageTag(it))
    try {
        println("Parse successful - " + LocalDate.parse(dateString, format))
    } catch (e: Exception) {
        println("Parse failed - " + e)
    }
}
Output on my system:
System.getProperty - 17.0.4.1
Runtime.version - 17.0.4.1+7-b469.62
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3
Parse failed - java.time.format.DateTimeParseException: Text '28 фев. 2019' could not be parsed at index 3
Solution
Since you are parsing user input, I believe the only option is to normalize that input before parsing it; appealing to standards is not an option here.
In Russian, dates use the genitive form of month names (pattern letters M(M)+ as opposed to the standalone L(L)+ in Java's DateTimeFormatter). Short forms are normally produced by the rules below (please do not confuse these with programming standards, conventions, habits, tricks, UI/UX guides, etc.):
- A . (dot) marks the short form of a word. Compare мая. vs мая: the first form looks ridiculous because мая is already the full genitive form of May. Another case is июн. vs июня: both have the same length, but июня is the full genitive form of June.
- Successive consonants are typically kept if they are followed by a vowel in the full form (there are some exceptions for double consonants). This seems to be your case: фев. vs февр.
- A short form should not end in a vowel, й, ь or ъ.
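To check which short forms a given runtime actually expects, it can help to format each month and compare the result with the input. The following is a minimal diagnostic sketch in Java (which the question allows); `RussianMonthForms` and `shortForms` are hypothetical names chosen for illustration. The printed text depends on the JDK version and the locale data it ships with, so no particular output is assumed here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper; the class and method names are illustrative only.
final class RussianMonthForms {

    // Returns one row per month: number, "MMM" (format) text, "LLL" (standalone) text.
    static List<String> shortForms(Locale locale) {
        DateTimeFormatter formatForm = DateTimeFormatter.ofPattern("MMM", locale);
        DateTimeFormatter standaloneForm = DateTimeFormatter.ofPattern("LLL", locale);
        List<String> rows = new ArrayList<>();
        for (int month = 1; month <= 12; month++) {
            // Any date in the month works; only the month field is printed.
            LocalDate sample = LocalDate.of(2019, month, 1);
            rows.add(month + ": MMM=" + formatForm.format(sample)
                    + " LLL=" + standaloneForm.format(sample));
        }
        return rows;
    }

    public static void main(String[] args) {
        shortForms(Locale.forLanguageTag("ru-RU")).forEach(System.out::println);
    }
}
```

If the February row does not show фев, that mismatch would explain why the month text fails to match at index 3 of the input.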
Based on that, and taking into account possible user mistakes, typos, common sense and programming habits, you may encounter the following "short genitive forms" of month names in the wild:
- January: янв, янв.
- February: фев, февр, фев., февр.
- March: мар, марта, мар., март.
- April: апр, апр.
- May: мая, мая.
- June: июн, июня, июн.
- July: июл, июля, июл.
- August: авг, авг.
- September: сен, сент, сен., сент.
- October: окт, окт.
- November: ноя, нояб, ноя., нояб.
- December: дек, дек.
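One possible way to normalize such input is sketched below in Java (which the question allows). It assumes the input always has the shape "day, month token, four-digit year". The sketch strips the optional trailing dot, looks the month token up in a table built from the list above, and constructs the `LocalDate` directly, so it does not depend on the locale data of the JDK. `RussianShortDates` and `parse` are hypothetical names, not part of any library. The table holds only the forms listed above, so other spellings (for example the full genitive февраля) would need to be added.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names and input shape are assumptions made for this sketch.
final class RussianShortDates {

    // Month tokens from the list above, stored without the trailing dot.
    private static final Map<String, Integer> MONTHS = new HashMap<>();

    static {
        register(1, "янв");
        register(2, "фев", "февр");
        register(3, "мар", "март", "марта");
        register(4, "апр");
        register(5, "мая");
        register(6, "июн", "июня");
        register(7, "июл", "июля");
        register(8, "авг");
        register(9, "сен", "сент");
        register(10, "окт");
        register(11, "ноя", "нояб");
        register(12, "дек");
    }

    private static void register(int month, String... tokens) {
        for (String token : tokens) {
            MONTHS.put(token, month);
        }
    }

    // Day, whitespace, month letters, optional dot, whitespace, four-digit year.
    private static final Pattern DATE =
            Pattern.compile("(\\d{1,2})\\s+(\\p{L}+)\\.?\\s+(\\d{4})");

    // Returns the parsed date, or empty if the text or month token is not recognized.
    static Optional<LocalDate> parse(String text) {
        Matcher m = DATE.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        Integer month = MONTHS.get(m.group(2).toLowerCase(Locale.ROOT));
        if (month == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.of(
                    Integer.parseInt(m.group(3)), month, Integer.parseInt(m.group(1))));
        } catch (DateTimeException e) {
            // Recognized month but impossible day, e.g. "31 фев. 2019".
            return Optional.empty();
        }
    }
}
```

In the question's best-effort chain, a call such as `RussianShortDates.parse(dateString)` could be tried before falling through to the locale-based formatters, with an empty result treated the same way as a caught exception.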
Answered By - Andrey B. Panfilov
Answer Checked By - Clifford M. (JavaFixing Volunteer)