A Statistics Problem Solved with Shell Scripts
2022-06-27 12:55:00 【用户3147702】
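1. Problem description
1.1. Input format:
a. Several lines of data; each line has 3 columns separated by \t.
b. The first column is attribute 1, the second column is attribute 2, and the third column is attribute 3.
c. Within a single run, the possible values of each attribute are fixed. For example, attribute 1 can only be 0, 1, 2 or 4, and attribute 2 can only be 29, 35, 55 or 70.
d. The possible values of an attribute may change between runs. For example, if attribute 1 can only be 0, 1, 2 or 4 in the first run, a second run may add the value 5, so that attribute 1 can only be 0, 1, 2, 4 or 5.
1.2. Input example:
0	29	50
1	35	60
0	29	60
1.3. Output:
Flag	29	35
0	2	0
1	0	1
1.4. Explanation of the output:
a. Flag is fixed and is printed literally.
b. Apart from Flag, the first row lists all possible values of attribute 2.
c. Apart from Flag, the first column lists all possible values of attribute 1.
d. Every other cell is a count. For example, the 2 in the second row, second column means that 2 input lines have attribute 1 equal to 0 and attribute 2 equal to 29.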
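To make the meaning of a single cell concrete, the count for attribute 1 = 0 and attribute 2 = 29 can be checked by hand. The following is a minimal sketch, assuming the sample input above is recreated in a temporary file; the variable names `sample` and `cell` are illustrative and are not part of the scripts in this post.

```shell
# Recreate the sample input in a temporary file (tab-separated).
sample=$(mktemp)
printf '0\t29\t50\n1\t35\t60\n0\t29\t60\n' > "$sample"

# Count the lines whose attribute 1 is 0 and whose attribute 2 is 29.
cell=$(awk -F'\t' '$1 == 0 && $2 == 29 { n++ } END { print n + 0 }' "$sample")
echo "$cell"   # the value of the cell in row "0", column "29"

rm -f "$sample"
```

The full table is simply this count repeated for every (attribute 1, attribute 2) combination.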
2. Solution code
2.1. main.sh
#!/bin/bash
bash count.sh > output.txt &&
bash count_result.sh > result.txt &&
cat result.txt
echo
2.2. count.sh
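#!/bin/bash
# Count how many input lines share each (attribute 1, attribute 2) pair.
# Sort numerically so that equal pairs are adjacent and in ascending order.
sort -k1,1n -k2,2n input.txt > output_a.txt || exit 1
b="<![INITED]>"   # previous pair; the sentinel means "no line read yet"
i=1               # number of lines seen for the current pair
while read -r line; do
    y=$(echo "$line" | awk '{ print $1 "\t" $2 }')
    if [ "$b" != "<![INITED]>" ]; then
        if [ "$y" != "$b" ]; then
            # The pair changed: emit the finished pair with its count.
            echo -e "${b}\t${i}"
            i=1
        else
            i=$((i+1))
        fi
    fi
    b=$y
done < output_a.txt
# Emit the last pair.
echo -e "${b}\t${i}"
2.3. count_result.sh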
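#!/bin/bash
# Print the table from output.txt, whose lines are "attr1 attr2 count",
# sorted numerically by attr1 and then by attr2.
echo -e "Flag\t\c"
# Collect the distinct attribute-1 values (row labels).
i=0
x="<![INITED]>"
while read -r line; do
    b=$(echo "$line" | awk '{ print $1 }')
    if [ "$b" != "$x" ]; then
        array_1[i]=$b
        x=$b
        i=$((i+1))
    fi
done < output.txt
# Collect the distinct attribute-2 values (column labels), in numeric order
# so that the -gt comparison below walks the columns from left to right.
sort -k2,2n output.txt > out2.txt
i=0
x="<![INITED]>"
while read -r line; do
    b=$(echo "$line" | awk '{ print $2 }')
    if [ "$b" != "$x" ]; then
        array_2[i]=$b
        x=$b
        i=$((i+1))
    fi
done < out2.txt
for var in "${array_2[@]}"; do
    echo -e "${var}\t\c"
done
echo
echo -e "${array_1[0]}\t\c"
i=0   # index of the next column to print in the current row
j=0   # index of the current row
while read -r line; do
    e1=$(echo "$line" | awk '{ print $1 }')
    e2=$(echo "$line" | awk '{ print $2 }')
    e3=$(echo "$line" | awk '{ print $3 }')
    if [ "$e1" != "${array_1[j]}" ]; then
        # Attribute 1 changed: pad the finished row with zeros and start a new row.
        while [ "$i" -lt "${#array_2[@]}" ]; do
            echo -e "0\t\c"
            i=$((i+1))
        done
        i=0
        j=$((j+1))
        echo
        echo -e "${array_1[j]}\t\c"
    fi
    # Print 0 for every column that precedes this line's attribute-2 value.
    while [ "$e2" -gt "${array_2[i]}" ]; do
        echo -e "0\t\c"
        i=$((i+1))
    done
    echo -e "${e3}\t\c"
    i=$((i+1))
done < output.txt
# Pad the last row with zeros.
while [ "$i" -lt "${#array_2[@]}" ]; do
    echo -e "0\t\c"
    i=$((i+1))
done
# Remove the intermediate files (output.txt, output_a.txt, out2.txt).
rm -f out*.txt
3. Shortcoming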
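Reading the file with while read line; do ... done is too inefficient: for inputs of a hundred thousand lines or more, the running time is unacceptable. This solution therefore had to be abandoned.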
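A likely contributor to the slowness is the loop body rather than `read` itself: every iteration starts a subshell and an awk process just to split one line. As a hedged illustration (this is not the route the post takes next), the shell's own `read` can split the fields, which removes the per-line process spawn. The function name `count_pairs` is hypothetical, and the input is assumed to be tab-separated and already sorted.

```shell
#!/bin/bash
# Hypothetical helper: count consecutive identical (attr1, attr2) pairs read
# from stdin without starting awk for every line.
# Assumes tab-separated input that is already sorted by the first two columns.
tab=$(printf '\t')

count_pairs() {
    prev=""
    n=0
    while IFS="$tab" read -r a1 a2 rest; do
        if [ "$n" -gt 0 ] && [ "${a1}${tab}${a2}" != "$prev" ]; then
            # The pair changed: emit the finished pair with its count.
            printf '%s\t%d\n' "$prev" "$n"
            n=0
        fi
        prev="${a1}${tab}${a2}"
        n=$((n+1))
    done
    # Emit the last pair, if any line was read at all.
    if [ "$n" -gt 0 ]; then
        printf '%s\t%d\n' "$prev" "$n"
    fi
}

# Usage on the (sorted) sample input from section 1.2.
result=$(printf '0\t29\t50\n0\t29\t60\n1\t35\t60\n' | count_pairs)
echo "$result"
```

Even in this form a shell loop tends to be much slower than a single awk process on large files, which is consistent with the move to awk in the next section.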
4. Improvement with awk
4.1. awk_main.sh
# First table: cross-tabulate column 2 against column 4.
awk -F"\t" -f count.awk -v a=2 -v b=4 zeyu_test_input > out.txt &&
sort -k1,1n -k2,2n out.txt > output.txt &&
bash count_result.sh > result.txt &&
echo >> result.txt
echo >> result.txt
# Second table: cross-tabulate column 2 against column 5.
awk -F"\t" -f count.awk -v a=2 -v b=5 zeyu_test_input > out.txt &&
sort -k1,1n -k2,2n out.txt > output.txt &&
bash count_result.sh >> result.txt &&
cat result.txt &&
echo
4.2. count.awk
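# a and b are the column numbers to cross-tabulate, passed in on the command line.
{
    # Count the lines for each (column a, column b) pair.
    A[$a "\t" $b]++
}
END {
    # Print every pair with its count; the order is unspecified, so the caller sorts.
    for (k in A) {
        print k "\t" A[k]
    }
}
4.3. count_result.sh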
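#!/bin/bash
# Same table as before, with a "total" column and a "total" row added.
# output.txt holds "attr1 attr2 count" lines sorted numerically by attr1, attr2.
echo -e "Flag\t\c"
# Collect the distinct attribute-1 values (row labels).
i=0
x="<![INITED]>"
while read -r line; do
    b=$(echo "$line" | awk '{ print $1 }')
    if [ "$b" != "$x" ]; then
        array_1[i]=$b
        x=$b
        i=$((i+1))
    fi
done < output.txt
# Collect the distinct attribute-2 values (column labels), in numeric order
# so that the -gt comparison below walks the columns from left to right.
sort -k2,2n output.txt > out2.txt
i=0
x="<![INITED]>"
while read -r line; do
    b=$(echo "$line" | awk '{ print $2 }')
    if [ "$b" != "$x" ]; then
        array_2[i]=$b
        x=$b
        i=$((i+1))
    fi
done < out2.txt
for var in "${array_2[@]}"; do
    echo -e "${var}\t\c"
done
echo "total"
echo -e "${array_1[0]}\t\c"
i=0     # index of the next column to print in the current row
j=0     # index of the current row
x_t=0   # running total of the current row
while read -r line; do
    e1=$(echo "$line" | awk '{ print $1 }')
    e2=$(echo "$line" | awk '{ print $2 }')
    e3=$(echo "$line" | awk '{ print $3 }')
    if [ "$e1" != "${array_1[j]}" ]; then
        # Attribute 1 changed: pad the finished row with zeros and print its total.
        while [ "$i" -lt "${#array_2[@]}" ]; do
            echo -e "0\t\c"
            i=$((i+1))
        done
        i=0
        j=$((j+1))
        echo "$x_t"
        x_t=0
        echo -e "${array_1[j]}\t\c"
    fi
    # Print 0 for every column that precedes this line's attribute-2 value.
    while [ "$e2" -gt "${array_2[i]}" ]; do
        echo -e "0\t\c"
        i=$((i+1))
    done
    echo -e "${e3}\t\c"
    y_t[i]=$(( ${y_t[i]:-0} + e3 ))   # running total of this column
    x_t=$((x_t + e3))
    i=$((i+1))
done < output.txt
# Pad the last row with zeros and print its total.
while [ "$i" -lt "${#array_2[@]}" ]; do
    echo -e "0\t\c"
    i=$((i+1))
done
echo "$x_t"
# Final row: the column totals followed by the grand total.
i=0
echo -e "total\t\c"
for var in "${y_t[@]}"; do
    echo -e "${var}\t\c"
    i=$((i+var))
done
echo "$i"
# Remove the intermediate files (out.txt, output.txt, out2.txt).
rm -f out*.txt
5. Execution results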
Processing an input of 1,620,590 lines took 16 seconds. The original post shows the result as a screenshot.
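For comparison, the table from section 1.3 can also be produced in a single awk process, without the intermediate files or the shell loops in count_result.sh. This is a minimal sketch rather than the script used for the timing above: the function name `crosstab` is hypothetical, attribute values are assumed to be numeric, the totals are left out, and the ordering is done with a small insertion sort so that only standard awk features are needed.

```shell
#!/bin/bash
# Hypothetical one-pass alternative: build the Flag table in one awk run.
# $1 and $2 are the column numbers to cross-tabulate (default: 1 and 2).
crosstab() {
    awk -F'\t' -v a="${1:-1}" -v b="${2:-2}" '
    # Insertion sort of v[1..n] into ascending numeric order.
    function isort(v, n,    i, j, t) {
        for (i = 2; i <= n; i++) {
            t = v[i]
            for (j = i - 1; j >= 1 && v[j] + 0 > t + 0; j--) v[j + 1] = v[j]
            v[j + 1] = t
        }
    }
    {
        # Remember each new row and column label, and count the pair.
        if (!($a in rows)) { rows[$a] = 1; r[++nr] = $a }
        if (!($b in cols)) { cols[$b] = 1; c[++nc] = $b }
        cnt[$a, $b]++
    }
    END {
        isort(r, nr)
        isort(c, nc)
        line = "Flag"
        for (j = 1; j <= nc; j++) line = line "\t" c[j]
        print line
        for (i = 1; i <= nr; i++) {
            line = r[i]
            # Pairs that never occurred evaluate to 0.
            for (j = 1; j <= nc; j++) line = line "\t" (cnt[r[i], c[j]] + 0)
            print line
        }
    }'
}

# Usage on the sample input from section 1.2.
table=$(printf '0\t29\t50\n1\t35\t60\n0\t29\t60\n' | crosstab 1 2)
echo "$table"
```

The insertion sort is quadratic in the number of distinct values, which seems acceptable here because the problem statement says each attribute takes only a handful of values.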