
Analysis of the information expression principle and characteristics of PDF (Part 1)

I. PDF overview

PDF (Portable Document Format) is a structured document format. It was first released in 1993 (version 1.0) by Adobe, the well-known American developer of typesetting and image-processing software, and the accompanying software product line, Adobe Acrobat 1.0, was launched the same year. Adobe then revised and upgraded the format, releasing version 1.1 in 1994 together with the Adobe Acrobat 2.0 and 2.1 product series. PDF 1.2 followed on November 27, 1996, and Adobe Acrobat was upgraded to version 3.0 at the same time. By the end of 1997, the International Organization for Standardization (ISO) had begun to consider adopting PDF as an international standard.

Comparison between PDF and PS

The PS language (PostScript, a page description language) is also owned by Adobe and is a de facto standard in the printing industry. It can describe sophisticated page layouts and holds a dominant position in the printing field. PDF was developed from PS: in terms of page description, the two have almost the same capabilities and similar description methods. PDF uses the same imaging model as PS to represent text and graphics. As in PS, PDF's page description operators draw a page by painting selected areas. A painted area can be the outline of a character, a region bounded by lines and curves, or a bitmap image. Areas can be painted in any color, and any graphic on the page can be clipped to another shape. A page starts out empty; each operator paints a figure onto it, and because new paint is opaque, it covers whatever was painted underneath.
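To make the opaque, painter's-style imaging model concrete, the following Python sketch does two things under simplifying assumptions. It builds a short content stream from real PDF operators (`q`/`Q` save and restore the graphics state, `g` sets a gray fill color, `re` appends a rectangle, `f` fills it, and `W n` clips), and it runs the same fills through a toy raster in which later paint simply overwrites earlier paint. The helper names `fill_rect`, `clip_rect`, and `rasterize` are hypothetical, and the toy raster ignores clipping, anti-aliasing, and coordinate transforms.

```python
def fill_rect(x, y, w, h, gray):
    # "g" sets the nonstroking (fill) gray level, "re" appends a rectangle path, "f" fills it
    return f"{gray} g {x} {y} {w} {h} re f"


def clip_rect(x, y, w, h):
    # "re W n": append a rectangle, intersect it with the clipping path, end the path unpainted
    return f"{x} {y} {w} {h} re W n"


def rasterize(width, height, fills):
    """Toy painter's model: the page starts empty and each opaque fill overwrites earlier ones."""
    canvas = [[None] * width for _ in range(height)]
    for x, y, w, h, gray in fills:
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                canvas[row][col] = gray
    return canvas


fills = [(0, 0, 3, 3, 0.2), (2, 2, 2, 2, 0.8)]   # dark square first, light square second

content = "\n".join(
    ["q", clip_rect(0, 0, 4, 4)]                  # save state, clip to a 4x4 area
    + [fill_rect(*f) for f in fills]              # painting order = order in the stream
    + ["Q"]                                       # restore state (drops the clip)
)
canvas = rasterize(4, 4, fills)
print(content)
print(canvas[2][2])
```

In the toy raster, the cell where the two squares overlap ends up with the second square's gray level, which is the "new graphics are opaque" behavior described above; reordering the two `fill_rect` lines in the stream would reverse which square appears on top.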

Nevertheless, PDF differs from PS in several important ways. ① PDF files can contain interactive objects such as hyperlinks and interactive forms; PS files cannot. ② PDF is a file format, whereas PS is a programming language, so PDF can be processed more efficiently than PS. ③ The strict structural definition of PDF lets an application program access any object at random, whereas a PS file must be interpreted sequentially from the beginning. For example, to reach page 100 of a PS file, an interpreter must first process the preceding 99 pages; in PDF, every page can be reached equally directly. ④ PDF files also carry font description information, such as font metrics and sizes, so that when a font is not available, a substitute can be synthesized (rather than simply replaced with another font) to keep the document's appearance consistent.

Characteristics of PDF


The characteristics of PDF can be summarized as follows. ① Portability in transmission. PDF files can be encoded as 7-bit ASCII text or as binary, so they can be transmitted correctly across different network environments. ② Support for interactive operation. PDF can contain interactive objects such as interactive forms and hyperlinks. ③ Support for sound and animation. ④ Random access to page content, which speeds up operations on pages. ⑤ Incremental updates: modifications can be appended to the end of the file, which makes small changes efficient. ⑥ Support for a variety of compression encodings, which makes files more compact. ⑦ Font independence. A PDF file can carry font description information, so the document still displays correctly even when the user's system lacks the required fonts. ⑧ Platform independence. PDF files are independent of both software and hardware platforms, which makes them well suited to exchanging information over networks without garbled text. ⑨ Security control. PDF files support several levels of security control, which is important for protecting the copyright of electronic publications; different levels can be set according to each publication's security requirements.
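Characteristics ① and ⑥ often work together in practice: a stream can be compressed and then re-encoded as 7-bit ASCII text. The Python sketch below assumes the standard `FlateDecode` (zlib/deflate) and `ASCII85Decode` filters and uses only the standard-library `zlib` and `base64` modules; `encode_stream`, `decode_stream`, and `stream_object` are hypothetical helper names. Because the filters in a `/Filter` array are listed in decoding order, the data is Flate-compressed first and ASCII85-encoded second, so the array reads `[/ASCII85Decode /FlateDecode]`.

```python
import base64
import zlib


def encode_stream(data: bytes) -> bytes:
    compressed = zlib.compress(data)               # FlateDecode data is a zlib (deflate) stream
    return base64.a85encode(compressed) + b"~>"    # ASCII85 text; "~>" marks end of data


def decode_stream(encoded: bytes) -> bytes:
    # Undo the filters in the order they are listed: ASCII85Decode, then FlateDecode
    body = encoded[:-2] if encoded.endswith(b"~>") else encoded
    return zlib.decompress(base64.a85decode(body))


def stream_object(data: bytes) -> bytes:
    # Wrap the encoded bytes in a stream dictionary whose /Length is the encoded size
    encoded = encode_stream(data)
    header = b"<< /Length %d /Filter [/ASCII85Decode /FlateDecode] >>" % len(encoded)
    return header + b"\nstream\n" + encoded + b"\nendstream"


page_ops = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 20   # repetitive sample content
obj = stream_object(page_ops)
encoded = encode_stream(page_ops)
print(len(page_ops), len(encoded))
```

ASCII85 on its own keeps a file 7-bit clean at the cost of about 25% expansion (every 4 bytes become 5 characters); placing Flate compression underneath it usually recovers far more than that for text-like content streams.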

II. PDF principle structure

PDF file structure

The PDF file structure (i.e., the physical structure) consists of four parts: the file header, the file body, the cross-reference table, and the file trailer. See Figure 1.


The file header, which appears on the first line of a PDF file, indicates the version of the PDF specification that the file conforms to.

The file body consists of a series of PDF indirect objects.

The cross-reference table is an index of the byte addresses of the indirect objects, set up so that any indirect object can be accessed at random.

The file trailer gives the location of the cross-reference table, identifies the root object (the catalog) of the file body, and stores security information such as encryption settings.
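The four-part physical structure, and the way the cross-reference table enables random access, can be illustrated with a deliberately minimal Python sketch. `build_pdf` writes the header, numbered indirect objects, a classic `xref` table whose entries are exactly 20 bytes each, and a trailer whose `startxref` value points at the table; `read_object` goes the other way, starting from the end of the file and jumping straight to one object's byte offset. Both function names are hypothetical, the sample document is a single blank page without contents or resources, and the reader assumes one `xref` subsection with no incremental updates or cross-reference streams (introduced in PDF 1.5), so it is not a general-purpose parser.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Object number i + 1 holds objects[i]; object 1 is assumed to be the catalog."""
    out = bytearray(b"%PDF-1.4\n")                       # file header: specification version
    offsets = []
    for num, body in enumerate(objects, start=1):        # file body: indirect objects
        offsets.append(len(out))                         # byte offset where "num 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)          # cross-reference table, one subsection
    out += b"0000000000 65535 f \n"                      # entry for object 0, head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off                 # 10-digit offset, generation 0, in use
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos        # trailer ends with the xref location
    return bytes(out)


def read_object(pdf: bytes, num: int) -> bytes:
    """Random access: trailer -> xref table -> object offset, without scanning earlier objects."""
    tail = pdf[pdf.rindex(b"startxref"):].split()
    xref_pos = int(tail[1])                              # byte offset of the "xref" keyword
    lines = pdf[xref_pos:].split(b"\n")
    first, count = map(int, lines[1].split())            # subsection: first object number, count
    if not first <= num < first + count:
        raise KeyError(num)
    offset = int(lines[2 + num - first][:10])           # first 10 digits are the byte offset
    return pdf[offset:pdf.index(b"endobj", offset)]


catalog = b"<< /Type /Catalog /Pages 2 0 R >>"
pages = b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>"
page = b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>"
pdf = build_pdf([catalog, pages, page])
print(read_object(pdf, 3).decode())
```

The fixed 20-byte entry width is what makes the lookup cheap: a reader can compute where entry *n* sits inside the table instead of parsing everything before it, which is the random-access property contrasted with PS earlier in this article.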

PDF document structure

The PDF document structure is the logical organization of a PDF file's content; it reflects the hierarchical relationships among the indirect objects in the file body. The document structure of PDF is a tree, as shown in Figure 2. The root node of the tree is the root object (catalog) of the PDF file. Beneath the root are four subtrees: the page tree, the bookmark tree, the article thread tree, and the named destination tree.

In the page tree, every page object is a leaf node, and a page inherits certain attribute values (such as the page size and resources) from its parent nodes as defaults for the corresponding attributes. The bookmark tree organizes bookmarks hierarchically. Each bookmark links a bookmark title to a specific location on a page, so that users can navigate the document by bookmark name. The article thread tree organizes article threads, together with the article beads that make up each thread, in a tree structure. The name tree establishes a correspondence between strings (names) and page areas. Each leaf node stores strings and their corresponding page areas, while non-leaf nodes serve only as indexes that let applications reach the leaf nodes quickly. The purpose of the name tree is to allow other objects in the PDF file to refer to a page area by a string name.
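Two of the tree mechanisms above, attribute inheritance in the page tree and key lookup in a name tree, can be sketched in Python with plain dictionaries standing in for PDF dictionaries. The layouts follow the PDF convention of `/Parent` and `/Kids` links, leaf `/Names` arrays of alternating keys and values, and `/Limits` giving each intermediate node's smallest and largest key. The function names, the sample nodes, and the `(page index, fit mode)` destination values are hypothetical simplifications; the set of inheritable page attributes (`Resources`, `MediaBox`, `CropBox`, `Rotate`) is the one defined in the PDF specification.

```python
from bisect import bisect_left

INHERITABLE = {"Resources", "MediaBox", "CropBox", "Rotate"}


def page_attr(page: dict, key: str):
    """Look up a page attribute, climbing /Parent links only for inheritable keys."""
    node = page
    while node is not None:
        if key in node:
            return node[key]
        if key not in INHERITABLE:
            return None
        node = node.get("Parent")
    return None


def name_tree_lookup(node: dict, key: str):
    """Descend through /Kids using each child's /Limits, then binary-search the leaf /Names."""
    while "Kids" in node:
        for kid in node["Kids"]:
            low, high = kid["Limits"]
            if low <= key <= high:
                node = kid
                break
        else:
            return None                      # key falls outside every child's range
    names = node.get("Names", [])
    keys = names[0::2]                       # keys at even positions, values at odd positions
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return names[2 * i + 1]
    return None


# Page tree: root Pages node -> intermediate Pages node -> one page leaf (only /Parent links shown)
root = {"Type": "Pages", "MediaBox": [0, 0, 612, 792], "Rotate": 0}
section = {"Type": "Pages", "Parent": root, "Rotate": 90}
leaf = {"Type": "Page", "Parent": section}

# Name tree mapping destination names to hypothetical (page index, fit mode) values
dests = {"Kids": [
    {"Limits": ["chapter1", "chapter2"], "Names": ["chapter1", (0, "Fit"), "chapter2", (4, "Fit")]},
    {"Limits": ["intro", "summary"], "Names": ["intro", (1, "FitH"), "summary", (9, "Fit")]},
]}

print(page_attr(leaf, "MediaBox"), page_attr(leaf, "Rotate"))
print(name_tree_lookup(dests, "summary"), name_tree_lookup(dests, "appendix"))
```

The nearest ancestor wins, so the page above takes its rotation from the intermediate node but its page size from the root. For the name tree, the `/Limits` check lets the lookup skip whole subtrees, and Python string comparison stands in for the byte-wise ordering that real name trees use; the two agree for ASCII keys like these.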

(to be continued)

Copyright © 2011 JIN SHI