Converting Word documents with OpenOffice, and the problems encountered


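I. Requirements

The company needs to store contract files. Users upload contracts as Word documents; OpenOffice converts each Word file to PDF, the PDF is then converted to images, and each format is stored separately. Because OpenOffice conversion consumes a lot of memory, the conversion is designed as a scheduled task that runs automatically in the early morning.

This post records the problems encountered while implementing this requirement.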
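
The post does not say which scheduler runs the early-morning task. As a minimal sketch, assuming plain JDK scheduling rather than whatever framework the project actually used, the delay until the next run can be computed and handed to a `ScheduledExecutorService`. The class name `NightlyConversionScheduler` and its methods are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler for the nightly conversion task (not from the original project). */
final class NightlyConversionScheduler {

    /** Time from now until the next occurrence of runAt that is strictly after now. */
    static Duration untilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has already passed, use tomorrow's
        }
        return Duration.between(now, next);
    }

    /** Runs the task once a day at runAt on a single background thread. */
    static ScheduledExecutorService schedule(Runnable task, LocalTime runAt) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = untilNextRun(LocalDateTime.now(), runAt).toMillis();
        executor.scheduleAtFixedRate(
                task, initialDelay, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

For example, `NightlyConversionScheduler.schedule(job, LocalTime.of(2, 0))` would start the job at 02:00 each day. The fixed 24-hour period ignores daylight-saving shifts, which may or may not matter for a batch job like this one.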

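II. Process

1. Local development (Windows)

Step 1: Because development was done locally on Windows, installing OpenOffice and starting its service caused no difficulties.

Step 2: The dependencies needed for the conversion: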

    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.2</version>
    </dependency>

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>1.4</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>juh</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>jurt</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>ridl</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-jdk14</artifactId>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>unoil</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>com.thoughtworks.xstream</groupId>
        <artifactId>xstream</artifactId>
        <version>1.3.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.8</version>
    </dependency>

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.8</version>
    </dependency>

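Problem 1: This is where the first problem appeared: a key dependency could not be found in the Maven Central repository.

The jodconverter-cli jar is not available there, and jodconverter in Central only goes up to 2.2.1. Version 2.2.1 and earlier cannot convert .docx files; support starts with 2.2.2.

After discussing it with a senior colleague, we added the jar to the company's internal Maven repository.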

　　

Step 3: Utility classes
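
    // WordToPdf.java
    import java.io.File;
    import java.net.ConnectException;

    import com.artofsolving.jodconverter.DocumentConverter;
    import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
    import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
    import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

    /**
     * Converts a Word document to PDF through a running OpenOffice service.
     *
     * @author GH
     */
    public class WordToPdf {

        /**
         * @param inputFile  the Word file to convert
         * @param outputFile the PDF file to write
         */
        public static void docToPdf(File inputFile, File outputFile) {
            // Connect to the OpenOffice service listening on port 8100.
            OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
            try {
                connection.connect();
                DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
                converter.convert(inputFile, outputFile);
            } catch (ConnectException cex) {
                cex.printStackTrace();
            } finally {
                // Only disconnect if the connection was actually established.
                if (connection.isConnected()) {
                    connection.disconnect();
                }
            }
        }
    }

    // PdfToImage.java
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import javax.imageio.ImageIO;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.rendering.PDFRenderer;

    /**
     * Renders each page of a PDF to an image file.
     *
     * @author GH
     */
    public class PdfToImage {

        /**
         * @param srcFile         path of the PDF to convert
         * @param contractFromSrc directory in which to store the images
         * @param name            name used as the prefix of each image file
         * @return the names of the generated image files
         */
        public static List<String> pdfToImagePath(String srcFile, String contractFromSrc, String name) {
            List<String> list = new ArrayList<>();
            File dir = new File(contractFromSrc);
            if (!dir.exists()) {
                dir.mkdirs(); // also creates missing parent directories
            }
            // try-with-resources closes the document even if rendering fails.
            try (PDDocument doc = PDDocument.load(new File(srcFile))) {
                PDFRenderer renderer = new PDFRenderer(doc);
                int pageCount = doc.getNumberOfPages();
                for (int i = 0; i < pageCount; i++) {
                    // Option 1: the second argument is the resolution in DPI.
                    // BufferedImage image = renderer.renderImageWithDPI(i, 296);
                    // Option 2: the second argument is a scale factor; the larger it is,
                    // the higher the image resolution and the longer the conversion takes.
                    BufferedImage image = renderer.renderImage(i, 2f);
                    String imageName = name + "-" + i + ".jpg";
                    // Write JPEG data so the content matches the .jpg extension.
                    ImageIO.write(image, "jpg", new File(dir, imageName));
                    list.add(imageName);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return list;
        }
    }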
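
The two rendering options in the PDF-to-image code are related by a fixed factor: PDF user space is 72 units per inch, so a scale of 1 corresponds to 72 DPI. The small helper below is only an illustration of that arithmetic; the class `RenderScale` and its methods are hypothetical names and are not part of PDFBox.

```java
/** Illustrative arithmetic for PDFBox's scale factor versus DPI (hypothetical helper). */
final class RenderScale {

    /** PDF user space is defined at 72 units per inch. */
    static final float PDF_UNITS_PER_INCH = 72f;

    /** DPI that results from passing the given scale to renderImage. */
    static float dpiForScale(float scale) {
        return scale * PDF_UNITS_PER_INCH;
    }

    /** Scale that is equivalent to passing the given DPI to renderImageWithDPI. */
    static float scaleForDpi(float dpi) {
        return dpi / PDF_UNITS_PER_INCH;
    }

    /** Approximate pixel length of a page side; the library's exact rounding may differ. */
    static int approxPixels(float pageSidePoints, float scale) {
        return (int) Math.floor(pageSidePoints * scale);
    }
}
```

With the scale of 2 used in the code, pages are rendered at 144 DPI, so the commented-out 296 DPI variant would produce images roughly twice as large in each dimension.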

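Step 4: Implementation

First, read the list of contracts that have not yet been converted from the database, and download each file from OSS object storage into a designated temporary folder.

Use the utility class to convert the downloaded Word file to PDF, insert a PDF record into the database, and upload the PDF to OSS.

Use the utility class to convert the resulting PDF to images, insert the image records into the database, and upload the images to OSS.

Wrap the work in try/catch; if an exception occurs, roll back the database changes and delete the files already uploaded to OSS.

Finally, mark the Word document as converted.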
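
The upload, record, and roll-back part of these steps can be sketched as follows. This is a minimal sketch, not the project's actual code: `ObjectStore` and `ContractRepository` are hypothetical stand-ins for the OSS client and the database layer, and the download and conversion steps are assumed to have already produced the file keys.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the per-contract workflow; every type and method name here is hypothetical. */
final class ContractConversionJob {

    /** Stand-in for the OSS client. */
    interface ObjectStore {
        void upload(String key) throws Exception;
        void delete(String key);
    }

    /** Stand-in for the database work, assumed to run inside one transaction. */
    interface ContractRepository {
        void savePdfRecord(String contractId, String pdfKey);
        void saveImageRecord(String contractId, String imageKey);
        void markConverted(String contractId);
        void commit();
        void rollback();
    }

    /** Returns true on success; on failure rolls back the DB and deletes uploaded files. */
    static boolean process(String contractId, String pdfKey, List<String> imageKeys,
                           ObjectStore store, ContractRepository repo) {
        List<String> uploaded = new ArrayList<>(); // remembered so they can be removed again
        try {
            store.upload(pdfKey);
            uploaded.add(pdfKey);
            repo.savePdfRecord(contractId, pdfKey);

            for (String imageKey : imageKeys) {
                store.upload(imageKey);
                uploaded.add(imageKey);
                repo.saveImageRecord(contractId, imageKey);
            }

            repo.markConverted(contractId);
            repo.commit();
            return true;
        } catch (Exception e) {
            repo.rollback();
            for (String key : uploaded) {
                store.delete(key); // compensate for uploads the transaction cannot undo
            }
            return false;
        }
    }
}
```

Tracking uploaded keys explicitly matters because a database rollback does not affect object storage; the deletes act as a manual compensation step.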

　　
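Problem 2: The test and production environments both run Linux, and the task works with files, but Linux and Windows file paths differ. For example, a Windows path looks like C:\tmp\test.txt, while the Linux equivalent is /tmp/test.txt.

So the paths are defined like this:

    // Temporary directory used during the Word-to-image conversion (Windows)
    public final static String Convert_Tmp_Url = "C:" + File.separator + "temp" + File.separator + "contractToImg" + File.separator;
    // Temporary directory used during the Word-to-image conversion (Linux)
    public final static String Convert_Tmp_Url2 = File.separator + "tmp" + File.separator + "contractToImg" + File.separator;

File.separator is the system-dependent default name separator, represented as a string for convenience. Its value is '/' on Linux and '\' on Windows.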
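
With two constants, the calling code still has to decide which one applies. One possible alternative, shown here only as a sketch, is to pick the base directory once from the `os.name` system property and let `java.nio.file.Paths` supply the separators. The class `ConvertTmpDir` and its method names are hypothetical; the directory names are the ones used above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical helper that resolves the temporary conversion directory for the current OS. */
final class ConvertTmpDir {

    /** Chooses the base directory from an os.name value such as "Windows 10" or "Linux". */
    static String baseFor(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "C:/temp" : "/tmp";
    }

    /** Resolves the contractToImg directory for the OS the JVM is running on. */
    static Path resolve() {
        String base = baseFor(System.getProperty("os.name"));
        // Paths.get joins the segments with the platform's own separator.
        return Paths.get(base, "contractToImg");
    }
}
```

Another option is the `java.io.tmpdir` system property, which points at the platform's default temporary directory without any OS check.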

Step 5: Local testing passed with no problems.

2. Testing in the test environment (Linux)

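Problem 3: On Linux, Chinese text in the converted Word documents came out garbled or blank, because the Linux system lacked Chinese fonts.

Solution:

Step 1: Create a directory.

On CentOS, create a new directory named fallback under /usr/java/jdk1.8.0_91/jre/lib/fonts.

Step 2: Upload the fonts.

Upload the fonts simhei.ttf (SimHei) and simsun.ttc (SimSun), which can be located on Windows with Everything, to /usr/java/jdk1.8.0_91/jre/lib/fonts/fallback.

Step 3: Check the system font directories.

To view them: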

    [root@80ec6 fallback]# cat /etc/fonts/fonts.conf

    <dir>/usr/share/fonts</dir>
    <dir>/usr/share/X11/fonts/Type1</dir>
    <dir>/usr/share/X11/fonts/TTF</dir>
    <dir>/usr/local/share/fonts</dir>
    <dir>~/.fonts</dir>

Step 4: Copy the fonts.

Copy the entire contents of /usr/java/jdk1.8.0_91/jre/lib/fonts into the directory found in step 3. In my case the font directory is /usr/share/fonts.

Step 5: Refresh the font cache.

Run the command: fc-cache

Step 6: Kill the OpenOffice process.

    [root@80ec6 fonts]# ps -ef | grep openoffice
    root 3045 3031 0 06:19 pts/1 00:00:03 /opt/openoffice4/program/soffice.bin -headless -accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard

Run kill:

    kill -9 3045

Step 7: Restart OpenOffice in the background.

    [root@a3cf78780ec6 openoffice4]# soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
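
After a restart it can be useful to confirm that something is accepting connections on port 8100 again before the conversion task tries to connect. The check below is a minimal sketch using only JDK sockets; the class `OfficePortCheck` is a hypothetical helper, and it only verifies that the port is open, not that the listener is actually OpenOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
final class OfficePortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }
}
```

For the service started above, the call would be `OfficePortCheck.isListening("127.0.0.1", 8100, 1000)`.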
　　　

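3. The test and production environments run different Linux distributions, so the installation packages differ.

In the test environment the deb packages were used: install all of the deb files with the dpkg command, start the service, and it works.

In production the dpkg command is not available, so the rpm packages were installed instead. After installation the service would not start. It turned out the installation was incomplete: the RPMS directory contains a desktop-integration folder holding four rpm files, and the one matching the system must be installed as well. I chose the redhat version.

Run: rpm -ivh openoffice4.1.5-redhat-menus-4.1.5-9789.noarch.rpm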

Feel free to share your own thoughts.
