Patch draft.py

`owner:esanchez@yaco.es` `resolution_fixed` `type_task`   |    by esanchez@yaco.es

___


Henrik provided a patch to draft.py that solves the following issues:

  * Title extraction fails for drafts where the draft name is not set off
    from the title by a blank line.  See for instance this example:
    http://www.ietf.org/staging/draft-ma-cdni-publisher-use-cases-00.txt
    The regex should maybe be updated to permit, but not require, a newline
    before the draft filename:
    `'(?:\n\s*\n\s*)((.+\n){1,2}(.+\n?))(\s+<?draft-\S+\s*\n)\s*\n'`
    Fixed in patch.

  * If there are blank lines before the start of the author list on the
    first page, the author extraction will fail.  This sometimes happens
    when there's junk at the start of a draft, see for instance
    http://www.ietf.org/id/draft-ietf-mpls-tp-process-00.txt .  Fixed in
    patch.

  * Sometimes the Authors' Addresses section lists authors with the same
    workplace address on the same line: "Sam Spade and Joe Smith".  This
    needs a fix in the author extraction code.  Provided in the patch.

  * Sometimes the order of given name and surname differs between the
    first page and the author list, and sometimes the surname is uppercase
    in one place but not in the other.  This also needs a fix in the
    author extraction code.  Provided in the patch.

  * The header stripping code had a bug, where multiple blank lines could
    be replaced by a single blank line in the stripped text, which could
    mess up title extraction.  Fixed in the patch.

  * Title space normalization should be done also for titles from the
    'unusual title format' code branch of the title extraction code.
    Fix provided in the patch.

  * Company names on the first page are sometimes rendered with different
    case than in the Authors' Addresses section.  Fixed in the patch.

  * Some drafts list the draft filename _before_ the title, rather than
    after the title.  Permit this too. Covered in the patch.

  * Spanish names can be shown either as
    `<given_name> <fathers_first_surname> <mothers_first_surname>`
    or, less formally, as
    `<given_name> <fathers_first_surname>`
    If the first form is used in the Authors' Addresses section but the
    second form (with the given name possibly abbreviated to its first
    letter) is used on the first page, the author extraction will fail.
    Fix provided in patch.

  * Drafts containing tabs will be caught by idnits during I-D submission,
    but in case the draft.py module is used independently of idnits, tabs
    should be converted to spaces so that the author extraction and other
    methods work as expected.  Example: the recently submitted draft
    draft-bergeron-payload-rtpfec-rs-00.txt.  Fix provided in patch.

  * Found a draft with a previously unhandled header/footer format:
    draft-fang-mpls-tp-oam-toolset-01.txt.  A tweak was needed in the
    header/footer stripping.  Fix provided in patch.
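
To illustrate the first item, here is a small Python sketch that applies the regex quoted above (the one that makes the blank line before the draft filename optional) to an invented title page. The sample text, the `extract_title` helper, and the whitespace normalization are assumptions for illustration, not code taken from draft.py or the patch.

```python
import re

# Regex quoted in the issue: one to three title lines followed by the draft
# filename, with a blank line between them permitted but not required.
TITLE_RE = re.compile(r'(?:\n\s*\n\s*)((.+\n){1,2}(.+\n?))(\s+<?draft-\S+\s*\n)\s*\n')


def extract_title(text):
    """Return (title, draft_name), or None if no match (hypothetical helper)."""
    match = TITLE_RE.search(text)
    if match is None:
        return None
    # Collapse the line breaks and indentation captured inside the title.
    title = " ".join(match.group(1).split())
    # Group 4 holds the filename line; drop whitespace and optional <...>.
    draft_name = match.group(4).strip().strip("<>")
    return title, draft_name


# Invented sample: the filename follows the title with no blank line.
sample = (
    "\n\n   An Example Title That Wraps\n"
    "   Onto a Second Line\n"
    "   draft-example-title-00\n\nAbstract\n"
)
print(extract_title(sample))
```

Normalizing whitespace after the match matters because, without a blank line, the regex can pull a few leading spaces of the filename line into the title group.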
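
For the author-matching items (several authors on one line, differing name order and case, and the shorter form of Spanish names), one possible approach is sketched below. `split_author_line` and `names_match` are hypothetical helpers, not functions from draft.py, and the matching rules (case-insensitive, order-independent, single letters treated as initials, extra words allowed in the longer form) are assumptions about what such a fix might look like.

```python
import re


def split_author_line(line):
    """Split a line such as 'Sam Spade and Joe Smith' into individual names."""
    return [name.strip() for name in re.split(r"\s+and\s+", line.strip()) if name.strip()]


def _tokens(name):
    """Lowercase a name and split it into words, dropping commas and periods."""
    return [tok for tok in re.split(r"[\s,.]+", name.lower()) if tok]


def names_match(name_a, name_b):
    """Return True if the shorter name form is contained in the longer one.

    Word order and case are ignored, single letters are treated as initials,
    and the longer form may carry extra words (e.g. a second surname).
    """
    short, long_ = _tokens(name_a), _tokens(name_b)
    if len(short) > len(long_):
        short, long_ = long_, short
    words = [tok for tok in short if len(tok) > 1]
    initials = [tok for tok in short if len(tok) == 1]
    if not words:
        return False  # require at least one full word, e.g. the surname
    remaining = list(long_)
    # Match full words first so that an initial cannot consume a surname.
    for word in words:
        if word not in remaining:
            return False
        remaining.remove(word)
    for initial in initials:
        hit = next((w for w in remaining if w.startswith(initial)), None)
        if hit is None:
            return False
        remaining.remove(hit)
    return True


print(split_author_line("Sam Spade and Joe Smith"))  # ['Sam Spade', 'Joe Smith']
```

Matching full words before initials keeps an initial such as `J` from swallowing a surname like `Jones` when the two forms list the words in different orders.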
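
For the tab item, Python's built-in `str.expandtabs` is enough to convert tabs to spaces before the text is parsed. The helper name and the tab size of 8 (the conventional plain-text tab stop) are assumptions; the patch may handle this differently.

```python
def normalize_tabs(text: str, tabsize: int = 8) -> str:
    """Replace tabs with spaces, aligning to tab stops every `tabsize` columns.

    str.expandtabs resets its column count at each newline, so every line of
    the draft is expanded independently.
    """
    return text.expandtabs(tabsize)


print(repr(normalize_tabs("Joe\tSmith\nSam\tSpade")))
```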


___
_Issue migrated from trac:624 at 2022-03-04 01:46:42 +0000_
