Sunday 19 July 2015

A Calendar for Your Date — Part II



In A Calendar for Your Date — Part I, I gave a brief tour of the variations in current and historical calendar systems. I now want to approach the question of how we should represent dates that are not expressed according to our Gregorian calendar.

The different calendar systems may be categorised as follows:[1]

Empirical. The start of the months or years is determined by direct observation and intercalary days or months are inserted on an ad hoc basis.

Calculated. These are rule-based and so are predictable. Lunisolar and Solar calendars may be astronomical rather than arithmetic in that the start of months and years may be determined through astronomical calculation rather than purely by using a fixed rule. Calendars with “wandering years”, such as the Egyptian civil calendar and the Mayan calendar, have a simple fixed number of days per year.

Conversion between calculated calendars can be approached algorithmically, if enough information is known, but empirical calendars require the use of tables, and it is rarely the case that enough historical information survives to make this an accurate process.



In general, when we see an historical date, we cannot always convert it directly and unambiguously to our modern Gregorian calendar; it requires some interpretation, and that in turn requires information about the actual calendar variant that was being used, the social group involved, their political and religious leanings, the weather, and maybe even the geographical coordinates.

As a modern-day analogy of this problem, consider if I’d written the date 9/6/2015. Now did I mean June 9th or September 6th? If the date had been written in the US then you might believe that the latter alternative is more likely. However, some knowledge of the author, and the fact he has worked in the US but that he is English by birth, might suggest the former alternative is more likely. Hopefully, you see the problem: the converted value cannot always be faithful to the original source information, and it might require an update based on the analysis of new information, or the availability of a revised algorithm or tables.
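
The ambiguity is easy to reproduce in software. The following is a minimal Python sketch, not a recommended parser: the function name `candidate_readings` and the two convention labels are my own, chosen purely for illustration. It parses the same written date under both conventions and shows that each yields a perfectly valid result, so nothing in the text itself can settle the question.

```python
from datetime import date, datetime


def candidate_readings(text: str) -> dict[str, date]:
    """Return every valid reading of a slash-separated numeric date."""
    conventions = {
        "day/month/year": "%d/%m/%Y",   # e.g. British usage
        "month/day/year": "%m/%d/%Y",   # e.g. US usage
    }
    readings = {}
    for name, fmt in conventions.items():
        try:
            readings[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # not a valid date under this convention
    return readings


for name, value in candidate_readings("9/6/2015").items():
    print(f"{name}: {value.isoformat()}")
# day/month/year: 2015-06-09
# month/day/year: 2015-09-06
```

Only when one reading is impossible (a "month" of 25, say) does the text disambiguate itself; otherwise the choice has to come from context outside the date.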

A number of resources exist for calendar conversions — both algorithms and documentation — although some seem to have disappeared since I first saw them:

  • Converter for historical calendars
  • Calendars and Their History (status: currently inaccessible; note: see "inexact" in sec. 1.3, observational calendars)
  • Indian Calendrical Calculations
  • Convert a Date
  • Pancanga (version 3.14)

These resources involve specific calendar variants that would have to be known in advance, and at least one acknowledges the inexact nature of general calculations. In effect, a calculated date can never be more accurate than the original, and there will be many cases where it will be less accurate.

What I’m suggesting is that wholesale conversion of historical dates to the Gregorian calendar is a very bad approach for genealogy, and for historians in general. Obviously we need the ability to put dates from different calendars on the same timeline, but that does not mean discarding the original information in favour of a calculated alternative, a process that may involve an increased degree of uncertainty or imprecision (as differentiated in Warm Fuzzy Dates), as well as some loss of evidence. Furthermore, if software is going to collate these dates then it needs a representation that it can understand; the written evidential form and the calculated variant are not enough on their own.

Let me explain what this last sentence means by using a small example. STEMMA applies a bilateral approach to all data items assimilated into a computer-readable form, as explained in Returning to Normalised Names and Dates and in Is That a Fact?. What this means is that it holds a transcript of the original evidential form and a separate computer-readable (normalised) form. A computer-readable date would be used for sorting and collation, but also for generating a display form — say for a report or a chart — according to the regional settings and personal preferences of the current end-user. For instance:

Evidential form:     25 Dec 56
Normalised form:     1856-12-25
Display form:        25th December 1856

There are a couple of points to note in this simple Gregorian example. Firstly, those software developers who believe that it is possible to automatically convert written (Gregorian-)dates to a normalised form (ISO 8601 in this case) would probably have interpreted this date as 1956 rather than 1856, thus emphasising the importance of the context of the information. I could have used an evidential form such as “Christmas 56” to hammer that home but I wanted to give a sense of its subtlety. Similarly, an evidential form of “my birthday” is also referencing a date, but whose birthday, and in which year — all of which is contextual information that a researcher would use to apply a conversion. My second point is that the display form is generally (in modern software) produced according to a “short”, “medium”, “long”, or “full” request, and that request would examine the end-user’s settings in order to generate a consistent representation for readability. This is an approach that could be applied to all calendars, in principle.
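
Both points can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class `RecordedDate`, the `ordinal` helper, and the three fixed British-style display templates are hypothetical names and choices of mine (real software would take the templates from the end-user's settings), and none of this is STEMMA's actual data model. Note that Python's own two-digit-year rule gives 2056 for this date rather than 1956; a different wrong answer, but wrong for the same reason.

```python
from dataclasses import dataclass
from datetime import date, datetime

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 25 -> '25th'."""
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


@dataclass(frozen=True)
class RecordedDate:
    evidential: str   # transcript of the source, never altered
    normalised: date  # the researcher's interpretation, used for sorting

    def display(self, style: str = "long") -> str:
        """Generate a display form; the templates here are fixed assumptions."""
        d = self.normalised
        if style == "short":
            return f"{d.day:02d}/{d.month:02d}/{d.year}"
        if style == "medium":
            return f"{d.day} {MONTHS[d.month - 1][:3]} {d.year}"
        if style == "long":
            return f"{ordinal(d.day)} {MONTHS[d.month - 1]} {d.year}"
        raise ValueError(f"unknown style: {style}")


# Naive automatic conversion. strptime applies the POSIX pivot, so two-digit
# years 00-68 become 20xx (assumes English month abbreviations, as in the
# default C locale).
naive = datetime.strptime("25 Dec 56", "%d %b %y").date()
print(naive.isoformat())        # 2056-12-25

# The researcher supplies the century from context; the transcript is kept.
record = RecordedDate(evidential="25 Dec 56", normalised=date(1856, 12, 25))
print(record.display("long"))   # 25th December 1856
```

The evidential string is carried alongside the normalised value rather than being replaced by it, so a later reinterpretation only has to change the normalised field.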

In the case of a calendar conversion, a missing item would be a normalised value in the alternative calendar; one that must be flagged as being “calculated” to avoid ambiguity. The next example includes a date from the French Republican calendar converted to the Gregorian calendar.

Evidential form:     18 Brum an VIII
Normalised form:     #FR#08-02-18
Display form:        18 Brumaire An 8
Normalised form:     1799-11-09         (Calc)
Display form:        9th November 1799  (Calc)

The normalised form is the STEMMA one since there is no standard that I am aware of. The associated display form uses Arabic year numbers rather than Roman numerals. Coinage of the time often used these rather than the Roman numerals used elsewhere, but it would obviously be a display setting. The two extra fields show the equivalent normalised date after conversion to the Gregorian calendar, and its associated display form. The two normalised forms are therefore distinct in that one is a direct implementation of the evidential form whereas the other is a derivation. The second should therefore be flagged as a calculated datum using something akin to the GEDCOM CAL flag (see DATE_APPROXIMATED in the specification). STEMMA would allow the two forms to be bound using the DATE_ENTITY that’s also used for synchronised dates (i.e. its generalised form of dual dates).
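
For one of the "easy cases", the calculation behind that derived value can be sketched in a few lines of Python. This is a sketch under several assumptions of mine, not STEMMA's implementation: it covers only years I to XIV (the period when the calendar was in official use, with years III, VII and XI as the leap years), it treats the complementary days as a month 13, and it assumes the `#FR#yy-mm-dd` layout seen in the example above. The function names are hypothetical.

```python
from datetime import date, timedelta

YEAR_ONE_START = date(1792, 9, 22)   # 1 Vendémiaire An I
SEXTILE_YEARS = (3, 7, 11)           # years with a sixth complementary day


def parse_fr(normalised: str) -> tuple[int, int, int]:
    """Split an assumed '#FR#yy-mm-dd' string into (year, month, day)."""
    prefix = "#FR#"
    if not normalised.startswith(prefix):
        raise ValueError(f"not a French Republican value: {normalised!r}")
    year, month, day = (int(p) for p in normalised[len(prefix):].split("-"))
    return year, month, day


def french_republican_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a date from years I-XIV; month 13 holds the complementary days."""
    if not 1 <= year <= 14:
        raise ValueError("only years I to XIV are handled in this sketch")
    if not 1 <= month <= 13:
        raise ValueError("month must be 1 to 13")
    if month <= 12:
        days_in_month = 30
    else:
        days_in_month = 6 if year in SEXTILE_YEARS else 5
    if not 1 <= day <= days_in_month:
        raise ValueError("day out of range for that month")
    leap_days = sum(1 for y in SEXTILE_YEARS if y < year)
    offset = 365 * (year - 1) + leap_days + 30 * (month - 1) + (day - 1)
    return YEAR_ONE_START + timedelta(days=offset)


calculated = french_republican_to_gregorian(*parse_fr("#FR#08-02-18"))
print(calculated.isoformat(), "(Calc)")   # 1799-11-09 (Calc)
```

The result is a derivation and should carry its "calculated" flag wherever it is stored; the `#FR#` value remains the direct encoding of the evidence.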

Unfortunately, there are no data standards to accommodate the normalised representation of dates in other calendars. All we have is the ISO 8601 date standard[2], which is specific to the Gregorian calendar and largely the result of an amalgamation of previous standards. Much of its content gets ignored in favour of the pure representation of a Gregorian date and/or time, and that includes ranges, ordinal dates, etc. A critique of that standard may be found at: Is the ISO Date Standard Bad?.

GEDCOM 5.5 includes a small set of “date escapes”[3] that can prefix a date value in order to address different calendars:

@#DGREGORIAN@ — Gregorian calendar
@#DJULIAN@ — Julian calendar
@#DHEBREW@ — Jewish calendar
@#DFRENCH R@ — French Republican calendar
@#DROMAN@ — for future definition
@#DUNKNOWN@ — for unknown calendars

This sounds like a step in the right direction although the specification offers little help on the encoding of year numbers or month names for the non-Gregorian calendars. It does acknowledge the ambiguity of using words rather than numbers via the statement: “No future calendar types will use words (e.g. month names) from this list: FROM, TO, BEF, AFT, BET, AND, ABT, EST, CAL, or INT”.
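
Recognising the escape itself is straightforward, as this Python sketch suggests. It assumes that a value without any escape is Gregorian, and it deliberately leaves the remainder of the value unparsed, since the encoding of that remainder is exactly where the difficulty lies. The names `CALENDAR_ESCAPES` and `split_calendar_escape` are mine, for illustration only.

```python
import re

# The escape keywords listed above, mapped to a readable calendar name.
CALENDAR_ESCAPES = {
    "GREGORIAN": "Gregorian",
    "JULIAN": "Julian",
    "HEBREW": "Jewish",
    "FRENCH R": "French Republican",
    "ROMAN": "Roman (for future definition)",
    "UNKNOWN": "Unknown",
}

ESCAPE_PATTERN = re.compile(r"^@#D([A-Z ]+)@\s*(.*)$")


def split_calendar_escape(value: str) -> tuple[str, str]:
    """Return (calendar keyword, remaining date text) for a DATE value."""
    value = value.strip()
    match = ESCAPE_PATTERN.match(value)
    if match is None:
        # Assumption: no escape means the Gregorian calendar.
        return "GREGORIAN", value
    calendar, rest = match.groups()
    if calendar not in CALENDAR_ESCAPES:
        raise ValueError(f"unrecognised calendar escape: {calendar!r}")
    return calendar, rest


print(split_calendar_escape("@#DJULIAN@ 1 MAR 1740"))   # ('JULIAN', '1 MAR 1740')
print(split_calendar_escape("25 DEC 1856"))             # ('GREGORIAN', '25 DEC 1856')
```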

In February of 2015, Bob Coret analysed the usage of this calendar feature in a sample of 82.9 million DATE lines from about 7000 GEDCOM files.[4] He reported the following very low permille (i.e. tenths of a percent) usage, with all other escapes being zero:

@#DJULIAN@      0.123 ‰
@#DHEBREW@      0.013 ‰
@#DFRENCH R@    0.006 ‰

Clearly this feature is very underutilised, but what is the reason? Is it that few people have dates in alternative calendars, or that they only store the Gregorian equivalents, or that their software does not support this feature?
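
To put those figures into absolute terms, the following Python fragment converts each permille value back into an approximate number of DATE lines. The results are only estimates of mine: both the sample size and the permille values are rounded as quoted, so the true counts will differ slightly.

```python
TOTAL_DATE_LINES = 82_900_000   # "82.9 million", as quoted (rounded)

usage_permille = {
    "@#DJULIAN@": 0.123,
    "@#DHEBREW@": 0.013,
    "@#DFRENCH R@": 0.006,
}


def estimated_lines(permille: float, total: int = TOTAL_DATE_LINES) -> int:
    """Approximate line count for a usage figure given in permille."""
    return round(total * permille / 1000)


for escape, permille in usage_permille.items():
    print(f"{escape:<14} {permille:.3f} permille  ~{estimated_lines(permille):,} lines")
# @#DJULIAN@     0.123 permille  ~10,197 lines
# @#DHEBREW@     0.013 permille  ~1,078 lines
# @#DFRENCH R@   0.006 permille  ~497 lines
```

That is roughly twelve thousand escaped dates in total out of almost 83 million.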

Family Historian uses a “[J]” prefix for entering dates in the Julian calendar, and this has also become a display option in some other products (e.g. TNG). For instance: “[J] 1 Mar 1740”. A consequence is that this alternative syntax occasionally creeps into exported GEDCOM dates and muddies the waters.
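
An importing program could tidy such values before interpreting them. The Python sketch below shows one possible clean-up, rewriting a leading “[J]” as the GEDCOM Julian escape; whether that is the right policy is a decision for the importer, and the function name `normalise_julian_prefix` is hypothetical.

```python
import re

BRACKET_J = re.compile(r"^\[J\]\s*")


def normalise_julian_prefix(value: str) -> str:
    """Rewrite a leading '[J]' as the GEDCOM @#DJULIAN@ escape."""
    if BRACKET_J.match(value):
        return "@#DJULIAN@ " + BRACKET_J.sub("", value, count=1)
    return value   # anything else is passed through untouched


print(normalise_julian_prefix("[J] 1 Mar 1740"))   # @#DJULIAN@ 1 Mar 1740
```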

The Unicode Common Locale Data Repository (CLDR) has also proposed a set of calendar names for computer use at: http://unicode.org/repos/cldr/trunk/common/bcp47/calendar.xml, although I cannot see any details of corresponding date encodings. It appears to be part of an extension to BCP47 ("Tags for Identifying Languages") defined in RFC 6067 for "subtags that specify language and/or locale-based behaviour or refinements to language tags, according to work done by the Unicode Consortium".

The MARC Extended Date/Time Format (EDTF) makes no mention of calendars as it is applicable only to the Gregorian calendar.

The Society of American Archivists (SAA) adopted DACS (Describing Archives, A Content Standard) in 2004. This mentions alternative calendar systems but only from a written point of view as opposed to a digital one. Their Standards for Archival Description, Chapter 7 (Codes), does mention the Julian calendar but only in the context of ordinal dates.

The MSS Working Group discusses a number of issues related to date/time representation, including dates from non-Gregorian calendars.

The ISO 8601 standard that addresses the Gregorian calendar has a few attractive core features:

  • It uses fixed-length all-numeric fields and so avoids language issues and textual ambiguities (see GEDCOM list of avoided names, above).
  • The resultant text is implicitly sortable without the host software having to understand dates at all.
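
The second feature is easily demonstrated. In this small Python sketch (the sample dates are arbitrary ones of mine), a plain text sort of the ISO 8601 strings gives chronological order, whereas the same dates in a written form do not sort usefully at all.

```python
from datetime import date

iso_dates = ["1856-12-25", "1799-11-09", "1856-02-03", "2015-07-19"]
written_dates = ["25 Dec 1856", "9 Nov 1799", "3 Feb 1856", "19 Jul 2015"]

# Plain string sort: no date logic involved at all.
print(sorted(iso_dates))
# ['1799-11-09', '1856-02-03', '1856-12-25', '2015-07-19']

print(sorted(written_dates))
# ['19 Jul 2015', '25 Dec 1856', '3 Feb 1856', '9 Nov 1799']

# Cross-check against a genuine chronological sort of the parsed dates.
chronological = [d.isoformat() for d in sorted(date.fromisoformat(s) for s in iso_dates)]
print(sorted(iso_dates) == chronological)   # True
```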

Ideally, any standard for the computer-readable date formats in the other calendars should adopt a similar approach. This was STEMMA’s goal from its inception. However, it found that it had to adopt a variation of the ISO 8601 format in order to support missing levels of granularity (such as yearly quarters) and to correctly sort differing granularities with respect to each other — two criticisms in the aforementioned article. Apart from the easy cases of the Julian and French Republican calendars, it has made no further headway. What it has done, though, is to create a generic Date entity that can be back-filled with the encodings for any number of calendars — once they’ve been defined — and without changing its overall data model. This is an approach that I strongly recommend to FHISO in order to avoid prematurely dismissing this issue, and then later finding that some method of date escapes is required, similar to GEDCOM.

A number of papers were received by FHISO on the subject of calendars, and their approaches and coverage appear to be very constructive. At the time of writing, there was no associated Exploratory Group established for research into this field.

  • Proposal to support dates BC as negative years: This paper presents a case for allowing dates BC to be recorded using the standard Julian and Gregorian calendars, and proposes a representation for such dates that is naturally sortable.
  • Proposal to extend the calendar style mechanism of CFPS 43 into an abstract formatting model: CFPS 43's style mechanism is extended into an abstract formatting model that would allow applications to format correctly dates written in many unknown calendar systems.
  • Proposal to support the Julian calendar similarly to CFPS 17: Proposal for a Julian calendar with years starting on 1 Jan.
  • Proposal to add style to the wholly-numeric representation of dates in CFPS 13: Proposal to separate presentation from representation in calendars in order to avoid a proliferation of calendars.
  • Proposal for compound calendars to resolve a difficulty with default calendars: Proposal to allow the default calendar to be dependent on the date.
  • Proposal for a Generalised Dual-Date Representation: Proposal for a generalised dual-date representation that applies to multiple calendars.
  • Proposal to Accommodate Alternative World Calendar Systems: Proposed adoption of a date syntax applicable to multiple world calendars, both historical and modern-day.


A question I have heard before is why those uncertainties and inaccuracies should be relevant to genealogists. What difference does it make if you’re a few days out, or a month, or possibly even a year? I entirely disagree with this thinking. Even if you’re only building a family tree then the relationships and vital events might not be supported by direct and non-conflicting evidence; there may have to be some interpretation, and some correlation with information from elsewhere in order to justify them.

A bigger question people may pose is why the historical calendars are of interest to genealogists. After all, there is at least some agreed synchronisation between the six principal calendars that are in use today. Not many people can trace their lineage back to, say, Caesar. Well, even historical characters had lineage, and family history, so whether you’re studying modern genealogy or ancient genealogy should be irrelevant. More than this, though, I do not consider genealogy (including family trees and family history) to be a special case that needs its own standards and methodologies. It is a form of micro-history, which in turn is a form of history. The information that we uncover and analyse in our research does not come from a world of its own, and it cannot be considered in isolation. All those events — both large-scale and small-scale — relate to the real world, and will affect each other. Historical research needs a consistent scheme that respects the integrity of our sources and the information found therein. To suggest that software standards, or the Internet, or populist genealogy products, must stick to Gregorian dates would be a case of the tail wagging the dog.



[1] E. G. Richards, Mapping Time: The Calendar and its History (1998; reprint, Oxford University Press, 2005), p.99.
[2] Data elements and interchange formats — Information interchange — Representation of dates and times, International Standard, ISO 8601:2004(E), 3rd ed. 1 Dec 2004; online copy obtained from http://dotat.at/tmp/ISO_8601-2004_E.pdf (accessed 11 Jul 2015).
[3] Actually, version 5.3 also contained this feature but version 5.4 omitted it with the following explanatory statement: “The Lineage-Linked GEDCOM Form is restricted to Gregorian calendar forms. This version of GEDCOM chose not to support multiple calendars. The reason is that support of multiple calendars would require each receiving system to handle multiple calendar conversions”. Source: Tamura Jones, "FamilySearch GEDCOM Specifications", Modern Software Experience (http://www.tamurajones.net/FamilySearchGEDCOMSpecifications.xhtml : accessed 19 Jul 2015).
[4] Bob Coret, "Usage of calendars in GEDCOM", Bob Coret in English, posted 5 Feb 2015 (http://blog-en.coret.org/2015/02/usage-of-calendars-in-gedcom.html : accessed 19 Jul 2015).
