FileInputFormat.setInputPaths multi-path read rule
2022-07-23 07:11:00 [Full stack programmer webmaster]
FileInputFormat.setInputPaths(job, input1, input2);
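When several input paths are given, they are not read in the order they are listed. By default the path holding the single largest file is read first (that file is read in one go), and the paths with the smaller files are read afterwards. This is consistent with Hadoop's job submission, which sorts the input splits by size so that the largest splits are scheduled first; and once more than one map task runs at a time, there is no ordering guarantee at all.
While writing a collaborative-filtering job, I wanted setInputPaths to read the first input path, input1, completely before the second input path, input2.
Even after swapping the two paths in the call, the files were still read in the wrong order.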
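The largest-first behaviour can be illustrated without a cluster. The sketch below assumes the ordering rule described above (splits sorted by length, largest first, before map tasks are scheduled) and uses a hypothetical Split class in place of Hadoop's InputSplit; the paths and sizes are made up. It shows why swapping input1 and input2 in the call changes nothing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitOrderDemo {
    // Hypothetical stand-in for a Hadoop InputSplit: a path and a length in bytes.
    static final class Split {
        final String path;
        final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    // Assumed scheduling order: largest split first, regardless of the listed order.
    static List<Split> scheduleOrder(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sizes: the rating data is larger than the co-occurrence data.
        Split cooccurrence = new Split("input1/cooccurrence", 5_000_000L);
        Split ratings = new Split("input2/ratings", 70_000_000L);
        // The same two splits listed in both orders, like swapping input1 and input2.
        List<List<Split>> listings = List.of(
                List.of(cooccurrence, ratings),
                List.of(ratings, cooccurrence));
        for (List<Split> listed : listings) {
            StringBuilder line = new StringBuilder();
            for (Split s : scheduleOrder(listed)) {
                line.append(s.path).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Both listings print the rating split first, which is the behaviour described above.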
// Imports for the enclosing driver file.
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the driver class. Cooccurrence (itemID1, itemID2, num) is the value
// class from the referenced collaborative-filtering code and is not shown here.
public static class myMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Co-occurrence matrix, keyed by the row (first) book ID.
    // Note: static state is only shared by map tasks running in the same JVM (e.g. local mode).
    private static final Map<Integer, List<Cooccurrence>> cooccurrenceMatrix = new HashMap<>();
    private final Text k = new Text();
    private final Text v = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("[\t,]");
        // Co-occurrence line: v1 = "bookID1:bookID2", v2 = weight.
        // User-rating line:   v1 = bookID,            v2 = "userID:rating".
        String[] v1 = fields[0].split(":");
        String[] v2 = fields[1].split(":");
        if (v1.length > 1) { // co-occurrence row: these must be read first
            int itemID1 = Integer.parseInt(v1[0]); // horizontal axis (row)
            int itemID2 = Integer.parseInt(v1[1]); // vertical axis (column)
            int num = Integer.parseInt(v2[0]);     // co-occurrence weight
            // Store the row under its book ID, creating the list on first use.
            cooccurrenceMatrix.computeIfAbsent(itemID1, id -> new ArrayList<>())
                    .add(new Cooccurrence(itemID1, itemID2, num));
        }
        if (v2.length > 1) { // userVector: user rating row
            int itemID = Integer.parseInt(v1[0]);    // book ID
            String userID = v2[0];                   // user ID
            double pref = Double.parseDouble(v2[1]); // user rating
            List<Cooccurrence> row = cooccurrenceMatrix.get(itemID);
            if (row != null) { // nothing is emitted if this matrix row has not been read yet
                k.set(userID);
                for (Cooccurrence co : row) {
                    v.set(co.getItemID2() + "," + pref * co.getNum());
                    context.write(k, v);
                }
            }
        }
    }
}

I searched Baidu for a long time without finding an answer, so I worked it out myself.
After half a day of tinkering, I cut the big file into pieces, and the files were then read in the right order.
OK, the conclusion is:
With FileInputFormat.setInputPaths(job, input1, input2), the path holding the single largest file is read first by default (that file is read in one go), and the paths with the smaller files are read after it.
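Why the order matters for the mapper above can be shown with plain Java. This is a simplified, hypothetical re-creation of its map logic (no Hadoop types; the process method and the sample lines are made up for illustration): a rating line only produces output if the matching co-occurrence rows were seen earlier, so the same input gives different results depending on which file is read first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadOrderDemo {
    // Runs the mapper's logic over the lines in the given order and returns what it would emit.
    static List<String> process(List<String> lines) {
        // Row book ID -> list of {column book ID, weight}, filled as co-occurrence lines arrive.
        Map<Integer, List<int[]>> matrix = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("[\t,]");
            String[] v1 = fields[0].split(":");
            String[] v2 = fields[1].split(":");
            if (v1.length > 1) { // co-occurrence line "book1:book2<TAB>weight"
                int row = Integer.parseInt(v1[0]);
                matrix.computeIfAbsent(row, id -> new ArrayList<>())
                        .add(new int[] {Integer.parseInt(v1[1]), Integer.parseInt(v2[0])});
            }
            if (v2.length > 1) { // rating line "book<TAB>user:rating"
                int item = Integer.parseInt(v1[0]);
                double pref = Double.parseDouble(v2[1]);
                // Only rows already in the matrix can be joined with this rating.
                for (int[] cell : matrix.getOrDefault(item, List.of())) {
                    emitted.add(v2[0] + "\t" + cell[0] + "," + pref * cell[1]);
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> cooccurrence = List.of("101:102\t3", "101:103\t1");
        List<String> ratings = List.of("101\t1:5.0");

        List<String> matrixFirst = new ArrayList<>(cooccurrence);
        matrixFirst.addAll(ratings);
        List<String> ratingsFirst = new ArrayList<>(ratings);
        ratingsFirst.addAll(cooccurrence);

        System.out.println("matrix first:  " + process(matrixFirst));
        System.out.println("ratings first: " + process(ratingsFirst));
    }
}
```

With the matrix read first, the rating for book 101 is joined with both of its rows; with the ratings read first, nothing is emitted.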
The collaborative-filtering code is based on https://blog.csdn.net/pang_hailong/article/details/53046330?locationNum=12&fps=1
At first I did not understand why step3 rewrites the co-occurrence matrix. After comparing step2 and step3_2 when reading the 70 MB Douban data set, I realised the point is to cut the data so that the step3_1 data is guaranteed to be read before the step3_2 data.
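The workaround of cutting the data can be sketched the same way. Assuming splits are scheduled largest-first, cutting the large rating data into pieces that are each smaller than the co-occurrence file moves the co-occurrence split to the front. The Split class, the cut helper, the paths, and all sizes below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CutFileDemo {
    // Hypothetical stand-in for an input split: a path and a length in bytes.
    static final class Split {
        final String path;
        final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    // Hypothetical helper: cut one file into pieces of at most pieceSize bytes.
    static List<Split> cut(String path, long size, long pieceSize) {
        List<Split> pieces = new ArrayList<>();
        int part = 0;
        for (long offset = 0; offset < size; offset += pieceSize) {
            pieces.add(new Split(path + "-part" + part++, Math.min(pieceSize, size - offset)));
        }
        return pieces;
    }

    // Assumed scheduling order: largest split first.
    static List<Split> scheduleOrder(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("step3_1/cooccurrence", 5_000_000L));
        // Made-up sizes: rating data cut into 4 MB pieces, each smaller than the 5 MB matrix file.
        splits.addAll(cut("step3_2/ratings", 70_000_000L, 4_000_000L));
        System.out.println(scheduleOrder(splits).get(0).path);
    }
}
```

The choice of piece size is the whole trick here: any size below the co-occurrence file's size puts that file first under the assumed ordering.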
That's it. I'm a newcomer and this is my first blog post, so any advice is welcome.
Publisher: Full stack programmer stack length. Please credit the source when reprinting: https://javaforall.cn/125996.html Original article: https://javaforall.cn