FileInputFormat.setInputPaths multi-path read rule

2022-07-23 07:11:00 Full stack programmer webmaster

Hello everyone, nice to see you again. I'm your friend Quan Jun.

FileInputFormat.setInputPaths(job, input1, input2);

When reading files, by default the path that contains a single large file is read first (everything under that path is read in one go), and the path that contains the small files is read afterwards.

While writing a collaborative filtering job, I wanted setInputPaths to read the first input path, input1, first, and then the second input path, input2.

Even after swapping the positions of the two paths, the read order was still wrong.
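A likely explanation for this: as far as I know, Hadoop's job submission code sorts the input splits by size, largest first, before the map tasks are scheduled, so the order of the arguments to setInputPaths does not decide which data is processed first. Below is a minimal plain-Java sketch of that ordering. It is not Hadoop code; the Split class, the processingOrder method, and the file names and sizes are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SplitOrderSketch {
    /** Hypothetical stand-in for an input split: only a path and a length in bytes. */
    static final class Split {
        final String path;
        final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Returns the paths in the order a largest-first sort would place them. */
    static List<String> processingOrder(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        // Largest split first; the position in the input list plays no role
        // unless two splits have exactly the same length.
        sorted.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        List<String> order = new ArrayList<>();
        for (Split s : sorted) {
            order.add(s.path);
        }
        return order;
    }

    public static void main(String[] args) {
        Split small = new Split("input1/small.txt", 10L);
        Split big = new Split("input2/big.txt", 1000L);
        // Both argument orders give the same processing order.
        System.out.println(processingOrder(Arrays.asList(small, big)));
        System.out.println(processingOrder(Arrays.asList(big, small)));
    }
}
```

Under this assumption, swapping input1 and input2 changes nothing, which matches what I saw.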

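public static class myMapper extends Mapper<LongWritable, Text, Text, Text> {
		// Co-occurrence matrix, keyed by the first item ID. It is static, so it is
		// only shared between map() calls that run in the same JVM.
		private final static Map<Integer, List<Cooccurrence>> cooccurrenceMatrix = new HashMap<Integer, List<Cooccurrence>>();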

		Text k = new Text();
		Text v = new Text();

		@SuppressWarnings("unchecked")
		@Override
		protected void map(LongWritable key, Text values,
				Mapper<LongWritable, Text, Text, Text>.Context context)
				throws IOException, InterruptedException {

			String[] lists = values.toString().split("[\t,]");

			// Co-occurrence matrix record: v1 holds "itemID1:itemID2", v2 holds the weight.
			// User rating matrix record: v1 holds the item ID, v2 holds "userID:rating".
			String[] v1 = lists[0].split(":");
			String[] v2 = lists[1].split(":");

			if (v1.length > 1) {// Co-occurrence matrix record; these must be read first
				int itemID1 = Integer.parseInt(v1[0]);// horizontal axis
				int itemID2 = Integer.parseInt(v1[1]);// vertical axis

				int num = Integer.parseInt(v2[0]);// weight

				List<Cooccurrence> list = null;
				if (!cooccurrenceMatrix.containsKey(itemID1))
					list = new ArrayList<Cooccurrence>();
				else
					list = cooccurrenceMatrix.get(itemID1);

				list.add(new Cooccurrence(itemID1, itemID2, num));
				cooccurrenceMatrix.put(itemID1, list);// store the co-occurrence entries keyed by the horizontal-axis item ID
			}

			if (v2.length > 1) {// userVector: user rating matrix record

				int itemID = Integer.parseInt(v1[0]);// item ID
				String userID = v2[0];// user ID
				double pref = Double.parseDouble(v2[1]);// user rating

				k.set(userID);

				if (cooccurrenceMatrix.containsKey(itemID)) {
					Iterator<Cooccurrence> iterator = cooccurrenceMatrix.get(
							itemID).iterator();

					while (iterator.hasNext()) {
						Cooccurrence co = iterator.next();

						v.set(co.getItemID2() + "," + pref * co.getNum());
						context.write(k, v);
					}

				}
			}
		}
	}
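To see why the read order matters for this mapper, here is a small plain-Java sketch that pushes records through the same two branches. It assumes, like the mapper above, that all records go through one shared in-memory map. The class name JoinOrderSketch, the run method, and the two sample records are made up for illustration, and an int[] pair stands in for the Cooccurrence class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinOrderSketch {
    /** Processes records in the given order and returns "userID<TAB>itemID2,score" lines. */
    static List<String> run(List<String> records) {
        // itemID1 -> list of {itemID2, weight} pairs
        Map<Integer, List<int[]>> cooccurrence = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String[] fields = record.split("[\t,]");
            String[] v1 = fields[0].split(":");
            String[] v2 = fields[1].split(":");
            if (v1.length > 1) { // co-occurrence record: "itemID1:itemID2<TAB>weight"
                int item1 = Integer.parseInt(v1[0]);
                int item2 = Integer.parseInt(v1[1]);
                int weight = Integer.parseInt(v2[0]);
                cooccurrence.computeIfAbsent(item1, id -> new ArrayList<>())
                        .add(new int[] {item2, weight});
            }
            if (v2.length > 1) { // rating record: "itemID<TAB>userID:rating"
                int item = Integer.parseInt(v1[0]);
                String user = v2[0];
                double pref = Double.parseDouble(v2[1]);
                // Only co-occurrence entries that were already read can contribute.
                for (int[] co : cooccurrence.getOrDefault(item, new ArrayList<>())) {
                    out.add(user + "\t" + co[0] + "," + pref * co[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String coRecord = "101:102\t3";      // hypothetical sample data
        String ratingRecord = "101\t7:5.0";  // hypothetical sample data
        System.out.println(run(Arrays.asList(coRecord, ratingRecord)));
        System.out.println(run(Arrays.asList(ratingRecord, coRecord)));
    }
}
```

With the co-occurrence record first, the rating record produces one output line. With the order reversed, the map is still empty when the rating record arrives, so nothing is written.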

I searched Baidu for a long time without finding an answer, so I had to work it out myself.

After half a day of experimenting, I cut the large file into pieces. After that, the files were read in the right order.
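I did not record exactly how the file was cut, so the following is only a sketch of one way to do it, assuming line-oriented records and assuming the goal is to make every piece smaller than the file that should be read first. The class name ChunkSketch, the chunkBySize method, and the maxBytes threshold are made up for illustration. The sketch works on lines in memory; writing each piece to its own file is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkSketch {
    /**
     * Groups lines into pieces of at most maxBytes each (counting one byte per
     * newline). A single line longer than maxBytes still gets a piece of its own.
     */
    static List<List<String>> chunkBySize(List<String> lines, long maxBytes) {
        List<List<String>> pieces = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String line : lines) {
            long lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1;
            // Start a new piece when adding this line would exceed the limit.
            if (!current.isEmpty() && currentBytes + lineBytes > maxBytes) {
                pieces.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(line);
            currentBytes += lineBytes;
        }
        if (!current.isEmpty()) {
            pieces.add(current);
        }
        return pieces;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("aaaa", "bbbb", "cccc");
        // Each line counts as 5 bytes, so a 10-byte limit fits two lines per piece.
        System.out.println(chunkBySize(lines, 10));
    }
}
```

Records are never split across pieces, so each piece can be read line by line just like the original file.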

OK, the conclusion:

With FileInputFormat.setInputPaths(job, input1, input2);, the path that contains the single large file is read first by default (it is read completely in one go), and the path that contains the small files is read afterwards.

The collaborative filtering code is based on https://blog.csdn.net/pang_hailong/article/details/53046330?locationNum=12&fps=1

At first I did not understand why step3 rewrites the co-occurrence matrix. Only when I analyzed the 70 MB Douban dataset did I see the difference between step2 and step3_2: the data is cut so that the data in step3_1 is guaranteed to be read first and the data in step3_2 afterwards.

That's all. I'm a beginner and this is my first blog post, so any advice is welcome.

Publisher: Full stack programmer webmaster. When reposting, please credit the source: https://javaforall.cn/125996.html Original link: https://javaforall.cn

Copyright notice
This article was written by [Full stack programmer webmaster]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/204/202207221940224440.html