Looking at the raw XML from a .fods file:
<table:table-column table:style-name="co1" table:default-cell-style-name="ce17"/>
<table:table-row table:style-name="ro1">
<table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
<text:p>John Smith</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
<text:p>(123) 456-7890</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>123 Main Street</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell office:value-type="string" calcext:value-type="string">
<text:p>Anywhere, ZZ 12345-6789</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
<table:table-cell table:style-name="ce15" office:value-type="string" calcext:value-type="string">
<text:p>Jane Doe</text:p>
</table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro2">
<table:table-cell table:style-name="ce16" office:value-type="string" calcext:value-type="string">
<text:p>(234) 567-8901</text:p>
When opened in LibreOffice, the names are in bold. Where would that be reflected in the above XML? I'm only seeing value-type="string", with no markup for bold, underline, etc.
Everything is in a single column, so I'm not quite sure what the default-cell-style-name="ce17" attribute indicates.
While the data originated as a .doc file, I'm working on the file in LibreOffice.
I'm looking to extract the names from the XML. They are really only distinguished from the phone numbers and addresses in that they're in bold. I suppose the names contain no digits, either, but I'd like to select the bold data from the spreadsheet.
The formatting information seems somewhat vague:
Formatting
The style and formatting controls are numerous, providing a number of
controls over the display of information.
Page layout is controlled by a variety of attributes. These include
page size, number format, paper tray, print orientation, margins,
border (and its line width), padding, shadow, background, columns,
print page order, first page number, scale, table centering, maximum
footnote height and separator, and many layout grid properties.
Headers and footers can have defined fixed and minimum heights,
margins, border line width, padding, background, shadow, and dynamic
spacing.
There are many attributes for specific text, paragraphs, ruby text,
sections, tables, columns, lists, and fills. Specific characters can
have their fonts, sizes, generic font family names (roman – serif,
swiss – sans-serif, modern – monospace, decorative, script or system),
and other properties set. Paragraphs can have their vertical space
controlled through attributes on keep together, widow, and orphan, and
have other attributes such as "drop caps" to provide special
formatting. The list is extremely extensive; see the references (in
particular the actual standard) for details.
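As a possible way into the question above: in an OpenDocument flat file, bold is not recorded on the cell itself. The table:style-name attribute (ce15 on the name cells) refers to a named style defined elsewhere in the same file, usually under office:automatic-styles, and that style's style:text-properties child can carry fo:font-weight="bold". The excerpt does not include the style definitions, so treating ce15 as the bold style is an assumption here. The Python sketch below collects every style whose text properties say bold, then returns the text of the cells that reference one of those styles. It is only a minimal sketch: it ignores style:parent-style-name inheritance and the table:default-cell-style-name fallback used by cells that have no table:style-name of their own, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def q(prefix, local):
    """Build the '{namespace}local' form ElementTree uses for qualified names."""
    return "{%s}%s" % (NS[prefix], local)


def bold_style_names(root):
    """Return the names of all styles that directly set fo:font-weight="bold".

    Assumption: bold is set on the style itself, not inherited from a parent.
    """
    names = set()
    for style in root.iter(q("style", "style")):
        props = style.find("style:text-properties", NS)
        if props is not None and props.get(q("fo", "font-weight")) == "bold":
            names.add(style.get(q("style", "name")))
    return names


def bold_cell_texts(root):
    """Return the text of every cell whose own table:style-name is a bold style."""
    bold = bold_style_names(root)
    results = []
    for cell in root.iter(q("table", "table-cell")):
        # Cells without table:style-name yield None here and are skipped.
        if cell.get(q("table", "style-name")) in bold:
            paragraphs = ["".join(p.itertext()) for p in cell.findall("text:p", NS)]
            results.append("\n".join(paragraphs))
    return results
```

Assuming the file is saved as contacts.fods (a hypothetical name), calling bold_cell_texts(ET.parse("contacts.fods").getroot()) would return the text of the bold cells, which should be the names if ce15 is indeed the bold style.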