Converting Word documents with OpenOffice, and the problems encountered - Blog


One: Requirements

The company needs to store contract documents. Users upload contracts as Word files; OpenOffice converts each Word file to PDF, the PDF is then converted to images, and each format is stored separately. Because the OpenOffice conversion uses a lot of memory, it is designed as a scheduled task that runs automatically in the early morning.
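One way to run the conversion in the early morning is to compute the delay until the next run time and hand it to a scheduler. The sketch below is only an illustration under assumptions: the post does not say which scheduler or which hour was used, so the class name, the method names and the 02:00 start time in the usage note are all hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not taken from the original project.
final class NightlyConversionScheduler {

    /** Time from 'now' until the next occurrence of 'runAt' (today if still ahead, otherwise tomorrow). */
    static Duration delayUntilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    /** Schedules 'task' to run once a day, starting at the next occurrence of 'runAt'. */
    static ScheduledExecutorService scheduleDaily(Runnable task, LocalTime runAt) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMillis = delayUntilNextRun(LocalDateTime.now(), runAt).toMillis();
        scheduler.scheduleAtFixedRate(
                task, initialDelayMillis, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

For example, `NightlyConversionScheduler.scheduleDaily(task, LocalTime.of(2, 0))` would start the task at the next 02:00 and repeat it every 24 hours. The returned scheduler should be shut down when the application stops.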

This post records the problems encountered while implementing this requirement.

Two: Process

1: Local development (Windows)

Step one: Development was done locally on Windows, so the first task was installing OpenOffice. The service started without problems.

Step two: Dependencies required for the conversion:

    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.2</version>
    </dependency>

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>1.4</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>juh</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>jurt</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>ridl</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-jdk14</artifactId>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>unoil</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>com.thoughtworks.xstream</groupId>
        <artifactId>xstream</artifactId>
        <version>1.3.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.8</version>
    </dependency>

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.8</version>
    </dependency>

Problem 1: Some key dependencies could not be found in the Maven central repository.

The jodconverter-cli jar is not available in the central repository, and jodconverter there only goes up to version 2.2.1. Versions up to 2.2.1 do not support docx conversion; 2.2.2 and later do.

After discussing it with a senior colleague, the jar was added to the company's intranet Maven repository.

　　

Step three: Utility classes

    import java.io.File;
    import java.net.ConnectException;

    import com.artofsolving.jodconverter.DocumentConverter;
    import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
    import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
    import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

    /**
     * Word to PDF.
     * @author GH
     */
    public class WordToPdf {
        /**
         * @param inputFile  input file
         * @param outputFile output file
         */
        public static void docToPdf(File inputFile, File outputFile) {
            // The OpenOffice service must already be listening on port 8100
            OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
            try {
                connection.connect();
                DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
                converter.convert(inputFile, outputFile);
            } catch (ConnectException cex) {
                cex.printStackTrace();
            } finally {
                // Disconnect only if connect() actually succeeded
                if (connection.isConnected()) {
                    connection.disconnect();
                }
            }
        }
    }

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import javax.imageio.ImageIO;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.rendering.PDFRenderer;

    /**
     * PDF to images.
     * @author GH
     */
    public class PdfToImage {
        /**
         * @param srcFile         location of the PDF to convert
         * @param contractFromSrc directory where the converted images are stored
         * @param name            name spliced into each image file name
         * @return names of the converted images
         */
        public static List<String> pdfToImagePath(String srcFile, String contractFromSrc, String name) {
            List<String> list = new ArrayList<>();
            File dir = new File(contractFromSrc);
            if (!dir.exists()) {
                dir.mkdirs(); // also creates missing parent directories
            }
            // try-with-resources closes the document even if rendering fails
            try (PDDocument doc = PDDocument.load(new File(srcFile))) {
                PDFRenderer renderer = new PDFRenderer(doc);
                int pageCount = doc.getNumberOfPages();
                for (int i = 0; i < pageCount; i++) {
                    // Option 1: the second parameter is the resolution in DPI
                    // BufferedImage image = renderer.renderImageWithDPI(i, 296);
                    // Option 2: the second parameter is a scale factor (1 = 72 DPI);
                    // larger values give a sharper image but take longer to convert
                    BufferedImage image = renderer.renderImage(i, 2f);
                    String imageName = name + "-" + i + ".jpg";
                    // contractFromSrc is expected to end with a file separator;
                    // write JPEG data so the content matches the .jpg extension
                    ImageIO.write(image, "jpg", new File(contractFromSrc + imageName));
                    list.add(imageName);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return list;
        }
    }
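
In PDFBox, `renderImage(pageIndex, scale)` treats a scale of 1 as 72 DPI, so the `2f` used above corresponds to 144 DPI. The sketch below only shows that arithmetic, which can help when estimating image size and conversion time. The class and method names are illustrative, the A4 page width in the usage note is just an example, and the pixel count is approximate because the library's exact rounding may differ.

```java
// Hypothetical helper for reasoning about render sizes; not part of PDFBox.
final class RenderScale {

    /** PDF user space uses 72 points per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** DPI that results from a renderImage scale factor. */
    static float scaleToDpi(float scale) {
        return scale * POINTS_PER_INCH;
    }

    /** Scale factor equivalent to a renderImageWithDPI value. */
    static float dpiToScale(float dpi) {
        return dpi / POINTS_PER_INCH;
    }

    /** Approximate pixel length of a page side given in points. */
    static int approximatePixels(float points, float scale) {
        return Math.round(points * scale);
    }
}
```

For an A4 page, which is about 595 points wide, `approximatePixels(595f, 2f)` gives 1190 pixels. Doubling the scale roughly quadruples the number of pixels per page.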

Step four: Code

First, read the set of unconverted documents from the database, then loop over them, downloading each OSS object into the designated temporary folder.

Use the utility class to convert the downloaded Word file to PDF, insert a record for the PDF into the database, and upload the PDF to OSS.

Use the utility class to convert the PDF to images, insert a record for each image path into the database, and upload the converted images to OSS.

Wrap the work in try/catch. If an exception occurs, roll back the database and delete the files already uploaded to OSS.

Finally, set the Word document's conversion status to converted.
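
The flow above can be sketched as follows. This is not the original project's code: `ObjectStore`, `Records` and `Converter` are hypothetical interfaces standing in for the OSS client, the database access layer and the two utility classes, and the sketch assumes one contract is handled at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outline of the conversion task; all types here are illustrative.
final class ContractConversionTask {

    /** Stand-in for the OSS client. */
    interface ObjectStore {
        void upload(String key);
        void delete(String key);
    }

    /** Stand-in for the database layer. */
    interface Records {
        void savePath(String contractId, String key);
        void markConverted(String contractId);
        void rollback(String contractId);
    }

    /** Stand-in for Word -> PDF -> images; returns the keys of the files produced. */
    interface Converter {
        List<String> convert(String contractId) throws Exception;
    }

    /** Converts one contract; on any failure undoes the records and removes uploaded files. */
    static boolean convertOne(String contractId, Converter converter,
                              ObjectStore store, Records records) {
        List<String> uploaded = new ArrayList<>();
        try {
            for (String key : converter.convert(contractId)) {
                store.upload(key);
                uploaded.add(key);               // remember it so it can be cleaned up
                records.savePath(contractId, key);
            }
            records.markConverted(contractId);   // only after everything succeeded
            return true;
        } catch (Exception e) {
            records.rollback(contractId);
            for (String key : uploaded) {
                store.delete(key);
            }
            return false;
        }
    }
}
```

Tracking the uploaded keys in a local list is what makes the cleanup possible: a database rollback alone would leave orphaned files in object storage.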

　　
Problem 2: Both the test and production environments are ultimately Linux, and the code works with files, but Linux and Windows use different file paths. For example, a Windows path looks like C:\tmp\test.txt, while the Linux equivalent is /tmp/test.txt.

So the paths are defined like this:

    // Temporary storage path during Word-to-image conversion (Windows)
    public final static String Convert_Tmp_Url = "C:" + File.separator + "temp" + File.separator + "contractToImg" + File.separator;
    // Temporary storage path during Word-to-image conversion (Linux)
    public final static String Convert_Tmp_Url2 = File.separator + "tmp" + File.separator + "contractToImg" + File.separator;

File.separator is the system-dependent default name separator, represented as a string for convenience. Its value is '/' on Linux and '\' on Windows.
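
An alternative that avoids maintaining one constant per operating system is to let the JVM supply the temporary directory and build the path with `java.nio.file.Paths`. This is a sketch of a different technique, not the approach used in this project. The class name is hypothetical, `contractToImg` is the folder name from the constants above, and the resulting location is whatever `java.io.tmpdir` points to, which on Windows is usually not C:\temp.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; resolves the working directory the same way on every OS.
final class ConvertTempDir {

    /** Returns <java.io.tmpdir>/contractToImg, creating it if it does not exist. */
    static Path resolve() {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "contractToImg");
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the path of a file inside the working directory without manual separators. */
    static Path fileIn(String fileName) {
        return resolve().resolve(fileName);
    }
}
```

With this, `ConvertTempDir.fileIn("contract.pdf")` yields a valid path on both Windows and Linux, and no separator has to be concatenated by hand.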

Step five: Local testing passed without problems.

2: Test environment (Linux)

Problem 3: On Linux, Chinese text in the converted documents came out garbled or blank. The cause is that the Linux system was missing Chinese fonts.

Solution:

Step 1: Create a directory.

On CentOS, create a new directory named fallback under /usr/java/jdk1.8.0_91/jre/lib/fonts.

Step 2: Upload the fonts.

Upload the fonts simhei.ttf (SimHei) and simsun.ttc (SimSun) to /usr/java/jdk1.8.0_91/jre/lib/fonts/fallback. On Windows they can be located with the Everything search tool.

Step 3: Find the system font directories.

Check the font configuration:

    [root@80ec6 fallback]# cat /etc/fonts/fonts.conf

    <dir>/usr/share/fonts</dir>
    <dir>/usr/share/X11/fonts/Type1</dir>
    <dir>/usr/share/X11/fonts/TTF</dir>
    <dir>/usr/local/share/fonts</dir>
    <dir>~/.fonts</dir>

Step 4: Copy the fonts.

Copy everything under /usr/java/jdk1.8.0_91/jre/lib/fonts into one of the directories found in step 3. My font directory is /usr/share/fonts.

Step 5: Update the font cache.

Run the command: fc-cache

Step 6: Kill the OpenOffice process.

    [root@80ec6 fonts]# ps -ef | grep openoffice
    root 3045 3031 0 06:19 pts/1 00:00:03 /opt/openoffice4/program/soffice.bin -headless -accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard

Then kill it: kill -9 3045

Step 7: Restart OpenOffice in the background.

    [root@a3cf78780ec6 openoffice4]# soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
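
After a restart it can be useful to confirm that OpenOffice is actually listening on port 8100 before the conversion task tries to connect. A minimal sketch using a plain TCP connection attempt; the class and method names are illustrative and not part of JODConverter or OpenOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight check; it only tests that something accepts TCP connections.
final class OpenOfficePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For the command above the call would be `OpenOfficePortCheck.isListening("127.0.0.1", 8100, 2000)`. A successful connection does not prove that the listener is OpenOffice, only that the port is open.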
　　　

3: The test environment differs from production (a different kernel), so the installation packages differ.

The test environment was installed from deb files: use the dpkg command to install all of the deb files, then start the service and use it.

In the production environment the dpkg command was not found, so the rpm files were installed instead. After installation the service would not start. The cause turned out to be that the installation was incomplete: the RPMS directory contains a desktop-integration folder holding four rpm files. Enter the desktop-integration directory and install the appropriate one; I chose the redhat version.

Run: rpm -ivh openoffice4.1.5-redhat-menus-4.1.5-9789.noarch.rpm

Welcome to share your ideas .
