One: Requirements
The company needs to store contract documents. Users upload contracts as Word files; OpenOffice converts each Word file to PDF, the PDF is then converted to images, and each format is stored separately. Because the OpenOffice conversion uses a lot of memory, the work is designed as a scheduled task that runs automatically in the early morning.
This post records the problems encountered while implementing this requirement.
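The early-morning schedule described above could be sketched with the plain JDK as follows. This is only an illustration: the post does not say which scheduler the project uses, and the 02:00 run time and the class name NightlyConversionScheduler are assumptions.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class NightlyConversionScheduler {

    // Assumed run time; the post only says "early morning".
    static final LocalTime RUN_AT = LocalTime.of(2, 0);

    /** Time to wait from 'now' until the next occurrence of RUN_AT. */
    static Duration delayUntilNextRun(LocalDateTime now) {
        LocalDateTime next = now.toLocalDate().atTime(RUN_AT);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has passed, use tomorrow's
        }
        return Duration.between(now, next);
    }

    /** Runs 'task' every 24 hours, starting at the next RUN_AT, on one background thread. */
    static ScheduledExecutorService schedule(Runnable task) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMillis = delayUntilNextRun(LocalDateTime.now()).toMillis();
        executor.scheduleAtFixedRate(
                task, initialDelayMillis, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

A single worker thread means only one conversion batch runs at a time, which suits a memory-hungry job. A fixed 24-hour period ignores daylight-saving shifts, so a cron-style scheduler may be a better fit where that matters.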
Two: Process
1: Coding in the local environment (Windows)
Step one: Because the code was written locally on Windows, I started by installing OpenOffice. The service started without any problems.
Step two: Dependencies required for the conversion:
    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.2</version>
    </dependency>

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>1.4</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>juh</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>jurt</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>ridl</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-jdk14</artifactId>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>unoil</artifactId>
        <version>3.0.1</version>
    </dependency>

    <dependency>
        <groupId>com.thoughtworks.xstream</groupId>
        <artifactId>xstream</artifactId>
        <version>1.3.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.8</version>
    </dependency>

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.8</version>
    </dependency>
Problem 1: The first problem was that a key jar could not be found in the Maven central repository.
The jodconverter-cli jar is not available in the central repository, and the jodconverter version there only goes up to 2.2.1. (Versions before 2.2.2 do not support converting the docx format; 2.2.2 and later do.)
After discussing it with a senior colleague, we added the jar to the company's internal Maven repository.
Step three: Utility classes
    import java.io.File;
    import java.net.ConnectException;

    import com.artofsolving.jodconverter.DocumentConverter;
    import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
    import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
    import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

    /**
     * Converts a Word file to PDF.
     * @author GH
     */
    public class WordToPdf {

        // inputFile: the Word file to convert; outputFile: the PDF to create
        public static void docToPdf(File inputFile, File outputFile) {
            // Connect to the OpenOffice service listening on port 8100
            OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
            try {
                connection.connect();
                DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
                converter.convert(inputFile, outputFile);
            } catch (ConnectException cex) {
                cex.printStackTrace();
            } finally {
                // Only disconnect if connect() succeeded
                if (connection.isConnected()) {
                    connection.disconnect();
                }
            }
        }
    }

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import javax.imageio.ImageIO;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.rendering.PDFRenderer;

    /**
     * Converts a PDF to one image per page.
     * @author GH
     */
    public class PdfToImage {

        /**
         * @param srcFile         path of the PDF to convert
         * @param contractFromSrc directory in which to store the converted images
         * @param name            name used as the prefix of each image file
         * @return the file names of the converted images
         */
        public static List<String> pdfToImagePath(String srcFile, String contractFromSrc, String name) {
            List<String> list = new ArrayList<>();
            File dir = new File(contractFromSrc);
            if (!dir.exists()) {
                dir.mkdirs(); // also creates missing parent directories
            }
            // try-with-resources closes the document even if rendering fails
            try (PDDocument doc = PDDocument.load(new File(srcFile))) {
                PDFRenderer renderer = new PDFRenderer(doc);
                int pageCount = doc.getNumberOfPages();
                for (int i = 0; i < pageCount; i++) {
                    // Option 1: the second parameter is the resolution in DPI
                    // BufferedImage image = renderer.renderImageWithDPI(i, 296);
                    // Option 2: the second parameter is the scale factor (1 = 72 DPI).
                    // The larger it is, the higher the resolution of the generated
                    // image and the longer the conversion takes.
                    BufferedImage image = renderer.renderImage(i, 2f);
                    String fileName = name + "-" + i + ".jpg";
                    // Write JPEG data so the content matches the .jpg extension
                    ImageIO.write(image, "jpg", new File(contractFromSrc + fileName));
                    list.add(fileName);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return list;
        }
    }
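The trade-off between the scale parameter, image resolution and conversion cost can be made concrete with a little arithmetic. The sketch below is not part of the original utility classes: the class name RenderSizeEstimate is hypothetical, and the memory figure assumes roughly 4 bytes per pixel for an int-backed RGB image. It relies only on the fact that PDF page sizes are measured in points (72 per inch) and that a scale of 1 corresponds to 72 DPI.

```java
final class RenderSizeEstimate {

    static final float POINTS_PER_INCH = 72f;

    /** Scale factor that gives the same output as rendering at the given DPI. */
    static float dpiToScale(float dpi) {
        return dpi / POINTS_PER_INCH;
    }

    /** Approximate pixel length of a page side given in points. */
    static int pixels(float lengthInPoints, float scale) {
        return (int) Math.floor(lengthInPoints * scale);
    }

    /** Rough in-memory size of one rendered page, assuming 4 bytes per pixel. */
    static long approxBytes(float widthInPoints, float heightInPoints, float scale) {
        return (long) pixels(widthInPoints, scale) * pixels(heightInPoints, scale) * 4L;
    }

    public static void main(String[] args) {
        // An A4 page is roughly 595 x 842 points.
        System.out.println(pixels(595f, 2f) + " x " + pixels(842f, 2f));
        System.out.println(approxBytes(595f, 842f, 2f) + " bytes at scale 2");
        System.out.println(approxBytes(595f, 842f, dpiToScale(296f)) + " bytes at 296 DPI");
    }
}
```

Since the pixel count grows with the square of the scale, rendering at 296 DPI (a scale of about 4.1) produces roughly four times as many pixels per page as a scale of 2, which is one reason the lower setting converts faster.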
Step four: Code
First, read the list of unconverted contracts from the database, then loop over it and download each OSS object into the specified temporary folder.
Convert the downloaded Word file to PDF with the utility class, insert the PDF record into the database, and upload the PDF to OSS.
Convert the PDF to images with the utility class, insert the image path records into the database, and upload the converted images to OSS.
Catch exceptions with try/catch. If an exception occurs, roll back the database and delete the files already uploaded to OSS.
Finally, update the Word file's conversion status to converted.
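The flow above, including the clean-up on failure, might look roughly like the following. None of these types come from the original project: ObjectStore, ContractRecords and Converter are hypothetical stand-ins for the OSS client, the database layer and the two utility classes, so treat this as a sketch of the control flow rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class ConversionJobSketch {

    /** Hypothetical stand-in for the OSS client; not a real SDK interface. */
    interface ObjectStore {
        void upload(String key);
        void delete(String key);
    }

    /** Hypothetical stand-in for the database operations on one contract. */
    interface ContractRecords {
        void savePdfRecord(String contractId, String pdfKey);
        void saveImageRecords(String contractId, List<String> imageKeys);
        void markConverted(String contractId);
        void rollback(String contractId);
    }

    /** Hypothetical stand-in for the WordToPdf and PdfToImage utilities. */
    interface Converter {
        String wordToPdf(String contractId) throws Exception;      // returns the PDF key
        List<String> pdfToImages(String pdfKey) throws Exception;  // returns the image keys
    }

    /** Converts one contract; returns false if the work had to be undone. */
    static boolean convertOne(String contractId, Converter converter,
                              ObjectStore store, ContractRecords records) {
        List<String> uploaded = new ArrayList<>(); // everything to delete on failure
        try {
            String pdfKey = converter.wordToPdf(contractId);
            store.upload(pdfKey);
            uploaded.add(pdfKey);
            records.savePdfRecord(contractId, pdfKey);

            List<String> imageKeys = converter.pdfToImages(pdfKey);
            for (String key : imageKeys) {
                store.upload(key);
                uploaded.add(key);
            }
            records.saveImageRecords(contractId, imageKeys);

            records.markConverted(contractId);
            return true;
        } catch (Exception e) {
            // Undo both sides: database changes and files already in the store
            records.rollback(contractId);
            for (String key : uploaded) {
                store.delete(key);
            }
            return false;
        }
    }
}
```

Tracking uploaded keys in a list as they are created keeps the clean-up correct even when the failure happens halfway through the image uploads.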
Problem 2: Both the test and production environments ultimately run Linux, and the code works with files, but Linux and Windows use different file paths. For example, a Windows path looks like C:\tmp\test.txt, while a Linux path looks like /tmp/test.txt.
So the paths are defined like this:
    // Temporary storage path during the Word-to-image conversion (Windows)
    public final static String Convert_Tmp_Url = "C:" + File.separator + "temp" + File.separator + "contractToImg" + File.separator;
    // Temporary storage path during the Word-to-image conversion (Linux)
    public final static String Convert_Tmp_Url2 = File.separator + "tmp" + File.separator + "contractToImg" + File.separator;
File.separator is the system-dependent default name separator, represented as a string for convenience. Its value is '/' on Linux and '\' on Windows.
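The post defines one constant per platform but does not show how the code picks between them. One possible approach, sketched here with a hypothetical class name ConvertTempDir, is to choose the directory from the os.name system property; the underSystemTemp() variant is an alternative that lets the JVM supply the platform's temporary directory, which is not necessarily the same location as the constants above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class ConvertTempDir {

    static final String SUB_DIR = "contractToImg";

    /** Mirrors the two constants: C:\temp\contractToImg on Windows, /tmp/contractToImg otherwise. */
    static Path forOs(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? Paths.get("C:", "temp", SUB_DIR) : Paths.get("/tmp", SUB_DIR);
    }

    /** Directory for the operating system the JVM is currently running on. */
    static Path current() {
        return forOs(System.getProperty("os.name"));
    }

    /** Alternative: a sub-directory of whatever temp directory the JVM reports. */
    static Path underSystemTemp() {
        return Paths.get(System.getProperty("java.io.tmpdir"), SUB_DIR);
    }
}
```

Building paths with Paths.get also avoids concatenating File.separator by hand, since the separator is inserted by the file system implementation.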
Step five: Local testing passed with no problems.
2: Testing in the test environment (Linux)
Problem 3: On Linux, Chinese text in the converted Word document came out garbled or blank. The cause is that the Linux system was missing Chinese fonts.
Solution:
Step 1: Create the path.
On CentOS, create a new directory named fallback under /usr/java/jdk1.8.0_91/jre/lib/fonts.
Step 2: Upload the fonts.
Upload the fonts simhei.ttf (SimHei) and simsun.ttc (SimSun), which can be found on Windows with the Everything search tool, to /usr/java/jdk1.8.0_91/jre/lib/fonts/fallback.
Step 3: Check the system font directories.
To view them:
    [root@80ec6 fallback]# cat /etc/fonts/fonts.conf
    <!-- Font directory list -->
    <dir>/usr/share/fonts</dir>
    <dir>/usr/share/X11/fonts/Type1</dir> <dir>/usr/share/X11/fonts/TTF</dir>
    <dir>/usr/local/share/fonts</dir>
    <dir>~/.fonts</dir>
Step 4: Copy the fonts.
Copy everything under /usr/java/jdk1.8.0_91/jre/lib/fonts to the directory found in step 3.
My font directory is /usr/share/fonts.
Step 5: Update the font cache.
Run the command: fc-cache
Step 6: Kill the OpenOffice process.
    [root@80ec6 fonts]# ps -ef | grep openoffice
    root 3045 3031 0 06:19 pts/1 00:00:03 /opt/openoffice4/program/soffice.bin -headless -accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard
Run kill: kill -9 3045
Step 7: Restart OpenOffice in the background.
    [root@a3cf78780ec6 openoffice4]# soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
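After restarting the service, it can be useful for the conversion task to confirm that something is actually listening on port 8100 before it tries to connect. A minimal sketch using only the JDK is shown below; the class name OfficeServiceCheck and the timeout value are assumptions, and a successful TCP connection only shows that the port is open, not that OpenOffice is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class OfficeServiceCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // Host and port match the soffice -accept option used above
        boolean up = isListening("127.0.0.1", 8100, 2000);
        System.out.println(up ? "OpenOffice port is open" : "OpenOffice port is not reachable");
    }
}
```

Running this check at the start of the nightly task lets the job fail fast with a clear message instead of surfacing a ConnectException in the middle of a batch.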
3: The test environment and the production environment run different Linux systems, so the installation packages differ.
The test environment was installed from deb files: use the dpkg command to install all of the deb files, then start the service and use it.
In the production environment, the dpkg command was not found, so I switched to installing the rpm files. After the installation, the service would not start. The cause turned out to be that the installation was incomplete: the RPMS directory contains a desktop-integration folder with four rpm files in it. Enter the desktop-integration directory and install the appropriate one. I chose the redhat version here.
Run: rpm -ivh openoffice4.1.5-redhat-menus-4.1.5-9789.noarch.rpm
Feel free to share your ideas.